Text Analytics Software: Choosing the Right Fit
Tom Reamy, Chief Knowledge Architect
KAPS Group
http://www.kapsgroup.com
Text Analytics World
October 20 New York
Agenda
Introduction – Text Analytics Basics
Evaluation Process & Methodology
– Two Stages – Initial Filters & POC
Proof of Concept
– Methodology
– Results
Text Analytics and “Text Analytics”
Conclusions
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Evaluation of Enterprise Search, Text Analytics
– Text Analytics Assessment, Fast Start
– Technology Consulting – Search, CMS, Portals, etc.
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics: Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Rules – Objects and phrases
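To make "catalogs with variants" concrete, here is a minimal Python sketch of catalog-based entity extraction. The catalog entries, the class names, and the functions `build_patterns` and `extract_entities` are all assumptions made for this illustration; they do not reflect any vendor product named in the talk. Each canonical entity carries a class and a list of variant spellings, and the output groups hit counts by class, which is the kind of structure that could feed facets.

```python
import re
from collections import defaultdict

# Hypothetical catalog: canonical entity -> (entity class, known variants).
CATALOG = {
    "International Business Machines": ("organization", ["IBM", "I.B.M."]),
    "Government Accountability Office": ("organization", ["GAO"]),
    "New York": ("place", ["NYC", "New York City"]),
}


def build_patterns(catalog):
    """Compile one case-insensitive pattern per canonical entity."""
    patterns = {}
    for canonical, (entity_class, variants) in catalog.items():
        # Longest names first so "New York City" is tried before "New York".
        names = sorted([canonical, *variants], key=len, reverse=True)
        alternation = "|".join(re.escape(name) for name in names)
        # Lookarounds keep matches from starting or ending inside a word.
        regex = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
        patterns[canonical] = (entity_class, regex)
    return patterns


def extract_entities(text, patterns):
    """Return {entity_class: {canonical_name: hit_count}} for catalog hits."""
    facets = defaultdict(dict)
    for canonical, (entity_class, regex) in patterns.items():
        hits = regex.findall(text)
        if hits:
            facets[entity_class][canonical] = len(hits)
    return dict(facets)


patterns = build_patterns(CATALOG)
sample = "IBM and the GAO met in New York City. I.B.M. later left NYC."
facets = extract_entities(sample, patterns)
```

A rule-based or dynamic extractor would go beyond a fixed catalog; this sketch covers only the catalog-with-variants case.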
Introduction to Text Analytics: Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean
– Full search syntax – AND, OR, NOT
– Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction
– If any of list of entities and other words
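The Boolean and proximity operators listed above can be sketched in plain Python. This is an illustration only: the functions `dist` and `orddist` borrow the operator names from the slide, but their exact semantics here (token-distance windows) are an assumption and may differ from any vendor's rule language. The rule `is_billing_complaint` and its word lists are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, term):
    """Indexes at which term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def dist(tokens, a, b, max_gap):
    """True if a and b occur within max_gap tokens of each other, any order."""
    return any(abs(i - j) <= max_gap
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def orddist(tokens, a, b, max_gap):
    """True if a occurs before b and b is at most max_gap tokens later."""
    return any(0 < j - i <= max_gap
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def is_billing_complaint(text):
    """Hypothetical rule: (bill OR invoice) near (wrong OR error), NOT thank."""
    tokens = tokenize(text)
    subjects = ("bill", "invoice")
    problems = ("wrong", "error")
    near = any(dist(tokens, s, p, 5) for s in subjects for p in problems)
    return near and "thank" not in tokens
```

For example, `is_billing_complaint("My invoice shows the wrong amount again")` is true, while a note that also contains "thank" is excluded by the NOT clause. Real rules of this kind usually need several refinement rounds against actual content.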
Case Study – Categorization & Sentiment
Evaluation Process & Methodology: Overview
Start with Self Knowledge
– Think Big, Start Small, Scale Fast
Eliminate the unfit
– Filter One – Ask Experts – reputation, research – Gartner, etc.
  • Market strength of vendor, platforms, etc.
  • Feature scorecard – minimum, must have, filter to top 3
– Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
Deep POC (2) – advanced, integration, semantics
Focus on working relationship with vendor.
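One possible way to run the feature scorecard in Filter One is sketched below in Python. Everything in it is an assumption for illustration: the vendor names are placeholders, and the must-have features, criteria, weights, and 0 to 5 scores are invented. Must-have features act as a pass/fail filter, and only the vendors that pass are ranked by a weighted score and cut to the top 3.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    name: str
    features: set = field(default_factory=set)
    scores: dict = field(default_factory=dict)  # criterion -> score from 0 to 5


# Hypothetical must-have features and criterion weights.
MUST_HAVE = {"auto-categorization", "entity extraction"}
WEIGHTS = {"market strength": 1.0, "integration": 2.0, "language support": 1.5}


def weighted_total(vendor, weights=WEIGHTS):
    """Sum of weight * score; a missing criterion counts as 0."""
    return sum(w * vendor.scores.get(c, 0) for c, w in weights.items())


def shortlist(vendors, must_have=MUST_HAVE, weights=WEIGHTS, keep=3):
    """Drop vendors missing any must-have feature, then keep the top scorers."""
    qualified = [v for v in vendors if must_have <= v.features]
    ranked = sorted(qualified, key=lambda v: weighted_total(v, weights),
                    reverse=True)
    return ranked[:keep]


# Placeholder vendors with invented scores.
BOTH = {"auto-categorization", "entity extraction"}
VENDORS = [
    Vendor("Vendor A", BOTH | {"sentiment"},
           {"market strength": 4, "integration": 3, "language support": 2}),
    Vendor("Vendor B", {"auto-categorization"},
           {"market strength": 5, "integration": 5, "language support": 5}),
    Vendor("Vendor C", BOTH,
           {"market strength": 2, "integration": 5, "language support": 4}),
    Vendor("Vendor D", BOTH,
           {"market strength": 5, "integration": 1, "language support": 1}),
    Vendor("Vendor E", BOTH,
           {"market strength": 3, "integration": 4, "language support": 3}),
]

top = [v.name for v in shortlist(VENDORS)]
```

Note that Vendor B has the highest raw scores but is eliminated for lacking a must-have feature, which matches the idea that basic features serve as filters rather than scores.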
Design of the Text Analytics Selection Team
Traditional Candidates – IT, Business, Library
IT – Experience with software purchases, needs assessment, budget
– Search/Categorization is unlike other software, deeper look
Business – understand business, focus on business value
They can get executive sponsorship, support, and budget
– But don’t understand information behavior, semantic focus
Library, KM – Understand information structure
Experts in search experience and categorization
– But don’t understand business or technology
Design of the Text Analytics Selection Team
Interdisciplinary Team, headed by Information Professionals
Relative Contributions
– IT – Set necessary conditions, support tests
– Business – provide input into requirements, support project
– Library – provide input into requirements, add understanding of search semantics and functionality
Much more likely to make a good decision
Create the foundation for implementation
Evaluating Taxonomy/Text Analytics Software
Start with Self Knowledge
Strategic and Business Context
Info Problems – what, how severe
Strategic Questions – why, what value from the text analytics, how are you going to use it
– Platform or Applications?
Formal Process – KA audit – content, users, technology, business and information behaviors, applications – Or informal for smaller organization
Text Analytics Strategy/Model – forms, technology, people
– Existing taxonomic resources, software
Need this foundation to evaluate and to develop
Varieties of Taxonomy/Text Analytics Software
Taxonomy Management
– Synaptica, SchemaLogic
Full Platform
– SAS, SAP, Smart Logic, Linguamatics, Concept Searching, Expert System, IBM, GATE
Embedded – Search or Content Management
– FAST, Autonomy, Endeca, Exalead, etc.
– Nstein, Interwoven, Documentum, etc.
Specialty / Ontology (other semantic)
– Sentiment Analysis – Lexalytics, Clarabridge, Lots of players
– Ontology – extraction, plus ontology
Vendors of Taxonomy/Text Analytics Software
– Attensity
– Business Objects – Inxight
– Clarabridge
– ClearForest
– Concept Searching
– Data Harmony / Access Innovations
– Expert Systems
– GATE (Open Source)
– IBM Infosphere
– Lexalytics
– Multi-Tes
– Nstein
– SAS
– SchemaLogic
– Smart Logic
– Synaptica
Initial Evaluation – Factors
Traditional Software Evaluation – Deeper
Basic & Advanced Capabilities
Lack of Essential Feature
– No Sentiment Analysis, Limited language support
Customization vs. OOB
– Strongest OOB – highest customization cost
Company experience, multiple products vs. platform
Ease of integration – APIs, Java
– Internal and External Applications
– Technical Issues, Development Environment
Total Cost of Ownership and support, initial price
POC Candidates – 1-4
Initial Evaluation – Factors: Case Studies
Amdocs
– Customer Support Notes – short, badly written, millions of documents
– Total Cost, multiple languages, Integration with their application
– Distributed expertise
– Platform – resell full range of services, Sentiment Analysis
– Twenty to Four to POC (Two) to SAS
GAO
– Library of 200 page PDF formal documents, plus public web site
– People – library staff – 3-4 taxonomists – centralized expertise
– Enterprise search, general public
– Twenty to POC with SAS
Phase II - Proof Of Concept - POC
Measurable Quality of results is the essential factor
4 weeks POC – bake off / or short pilot
Real life scenarios, categorization with your content
2 rounds of development, test, refine / Not OOB
Need SMEs as test evaluators – also to do an initial categorization of content
Majority of time is on auto-categorization
Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
Taxonomy Developers – expert consultants plus internal taxonomists
POC Design: Evaluation Criteria & Issues
Basic Test Design – categorize test set
– Score – by file name, human testers
Categorization & Sentiment – Accuracy 80-90%
– Effort Level per accuracy level
Quantify development time – main elements
Comparison of two vendors – how score?
– Combination of scores and report
Quality of content & initial human categorization
– Normalize among different test evaluators
Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors
Quality of taxonomy – structure, overlapping categories
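Scoring by file name against human testers could look like the Python sketch below. The file names, categories, and labels are invented for illustration. `accuracy` compares a vendor's output to one evaluator's labels. `cohen_kappa` (Cohen's kappa, a standard chance-corrected agreement measure, swapped in here as one possible way to check how consistent two evaluators are before normalizing among them) is not something the talk prescribes.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Share of shared files whose predicted category matches the human label."""
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no files in common")
    return sum(predicted[f] == gold[f] for f in shared) / len(shared)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two evaluators on the same files."""
    files = labels_a.keys() & labels_b.keys()
    n = len(files)
    if n == 0:
        raise ValueError("no files in common")
    observed = sum(labels_a[f] == labels_b[f] for f in files) / n
    freq_a = Counter(labels_a[f] for f in files)
    freq_b = Counter(labels_b[f] for f in files)
    # Agreement expected if both evaluators labeled at random with
    # their own category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used a single identical category
    return (observed - expected) / (1 - expected)


# Invented labels keyed by file name.
evaluator_1 = {"doc1.txt": "billing", "doc2.txt": "billing",
               "doc3.txt": "outage", "doc4.txt": "outage"}
evaluator_2 = {"doc1.txt": "billing", "doc2.txt": "outage",
               "doc3.txt": "outage", "doc4.txt": "outage"}
vendor_output = {"doc1.txt": "billing", "doc2.txt": "billing",
                 "doc3.txt": "outage", "doc4.txt": "billing"}

vendor_accuracy = accuracy(vendor_output, evaluator_1)
agreement = cohen_kappa(evaluator_1, evaluator_2)
```

A low kappa between evaluators suggests that part of a vendor's apparent error comes from disagreement among the human labels rather than from the software.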
Text Analytics POC Outcomes: Evaluation Factors
Variety & Limits of Content
– Twitter to large formal libraries
Quality of Categorization
– Scores – Recall, Precision (harder)
– Operators – NOT, DIST, START,
Development Environment & Methodology
– Toolkit or Integrated Product
– Effort Level and Usability
Importance of relevancy – can be used for precision, applications
Combination of workbench, statistical modeling
Measures – scores, reports, discussions
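The recall and precision scores mentioned above can be computed per category as in this short Python sketch. The document IDs and category assignments are invented, and the function name `precision_recall` is an assumption for this example. Documents may carry more than one category, so each side maps a document to a set of categories.

```python
def precision_recall(predicted, gold, category):
    """Precision and recall for one category.

    predicted and gold map a document ID to the set of categories assigned
    by the software and by human evaluators, respectively.
    """
    pred_docs = {doc for doc, cats in predicted.items() if category in cats}
    gold_docs = {doc for doc, cats in gold.items() if category in cats}
    hits = len(pred_docs & gold_docs)
    # Precision: of the documents the software tagged, how many were right.
    precision = hits / len(pred_docs) if pred_docs else 0.0
    # Recall: of the documents that should be tagged, how many were found.
    recall = hits / len(gold_docs) if gold_docs else 0.0
    return precision, recall


# Invented example data.
gold = {"a": {"billing"}, "b": {"billing", "outage"},
        "c": {"outage"}, "d": {"billing"}}
predicted = {"a": {"billing"}, "b": {"outage"},
             "c": {"billing", "outage"}, "d": set()}

billing_precision, billing_recall = precision_recall(predicted, gold, "billing")
outage_precision, outage_recall = precision_recall(predicted, gold, "outage")
```

In this toy data the software finds every "outage" document but misses two of the three "billing" documents, so the two categories would call for different rule refinements.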
POC and Early Development: Risks and Issues
CTO Problem – This is not a regular software process
Semantics is messy, not just complex
– 30% accuracy isn’t 30% done – could be 90%
Variability of human categorization
Categorization is iterative, not “the program works”
– Need realistic budget and flexible project plan
Anyone can do categorization
– Librarians often overdo, SMEs often get lost (keywords)
Meta-language issues – understanding the results
– Need to educate IT and business in their language
Text Analytics and “Text Analytics” – Text Mining
TA is pre-processing for text mining
TA adds huge dimensions of unstructured text
– Now 85-90% of all content, Social Media
TA can improve the quality of text
– Categorization, Disambiguated metadata extraction
Unstructured text into data – What are the possibilities?
– New Kinds of Taxonomies – emotion, small smart modular
– Information Overload – search, facets, auto-tagging, etc.
– Behavior Prediction – individual actions (cancel or not?)
– Customer & Business Intelligence – new relationships
– Crowd sourcing – technical support
– Expertise Analysis – documents, authors, communities
Conclusion
Start with self-knowledge – what will you use it for?
– Current Environment – technology, information
Basic Features are only filters, not scores
Integration – need an integrated team (IT, Business, KA)
– For evaluation and development
POC – your content, real world scenarios – not scores
Foundation for development, experience with software
– Development is better, faster, cheaper
Categorization is essential, time consuming
Text Analytics opens up new worlds of applications
Questions?
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com