Text Analytics and Text Mining
Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Text Analytics Capabilities
Text Analytics Applications
Text Mining and Text Analytics
– Data and Unstructured Content
Case Study – Text Mining for Taxonomy Development
Conclusion
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Statistical, rules – full categorization set of operators
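The "catalogs with variants" idea under Noun Phrase Extraction can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `CATALOG` contents, the function names, and the regex-based matching are hypothetical and do not represent any vendor's implementation.

```python
import re

# Hypothetical catalog: canonical entity -> (entity class, known variants).
CATALOG = {
    "International Business Machines": (
        "organization",
        ["IBM", "I.B.M.", "International Business Machines"],
    ),
    "SAS Institute": ("organization", ["SAS", "SAS Institute"]),
}

def build_patterns(catalog):
    """Compile one regex per canonical entity covering all its variants."""
    patterns = []
    for canonical, (entity_class, variants) in catalog.items():
        # Try longer variants first so "SAS Institute" wins over "SAS".
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        regex = re.compile(r"(?<!\w)(?:%s)(?!\w)" % alternation)
        patterns.append((canonical, entity_class, regex))
    return patterns

def extract_entities(text, patterns):
    """Return (canonical name, class, matched string) in order of appearance."""
    found = []
    for canonical, entity_class, regex in patterns:
        for match in regex.finditer(text):
            found.append((match.start(), canonical, entity_class, match.group(0)))
    return [(c, k, s) for _, c, k, s in sorted(found)]

if __name__ == "__main__":
    text = "IBM and SAS Institute both sell text analytics; I.B.M. is older."
    for row in extract_entities(text, build_patterns(CATALOG)):
        print(row)
```

Grouping the extracted canonical names by class is one simple way such output could feed facets in a search interface.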
Introduction to Text Analytics
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction, Sentiment
Foundation for best text analytics & combination
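A rule-based categorizer using the Boolean and NEAR operators listed above might look roughly like the following sketch. The operator semantics (NEAR counted in tokens, case-insensitive matching), the helper names, and the "Vaccine Safety" rule are all assumptions made for illustration; commercial products define these operators in their own ways.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has(term):
    return lambda tokens: term in tokens

def all_of(*rules):            # AND
    return lambda tokens: all(rule(tokens) for rule in rules)

def any_of(*rules):            # OR
    return lambda tokens: any(rule(tokens) for rule in rules)

def not_(rule):                # NOT
    return lambda tokens: not rule(tokens)

def near(a, b, distance):      # NEAR (#): terms within `distance` tokens
    def rule(tokens):
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return rule

# Hypothetical rule attached to a hypothetical taxonomy node.
RULES = {
    "Vaccine Safety": all_of(
        near("vaccine", "reaction", 5),
        any_of(has("adverse"), has("fever")),
        not_(has("veterinary")),
    ),
}

def categorize(text, rules=RULES):
    """Return the names of all taxonomy nodes whose rule matches the text."""
    tokens = tokenize(text)
    return [name for name, rule in rules.items() if rule(tokens)]

if __name__ == "__main__":
    print(categorize("An adverse reaction to the vaccine was reported."))
    print(categorize("Veterinary vaccine caused an adverse reaction."))
```

Attaching one such rule to each taxonomy node is one way to read "Build on a Taxonomy": the taxonomy supplies the categories and the rules decide membership.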
Varieties of Taxonomy/ Text Analytics Software
Taxonomy Management
– Synaptica, SchemaLogic
Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
Content Management – embedded
Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology
Text Analytics Applications
Platform for Multiple Applications
Content Aggregation, Duplicate Documents – save millions!
Business intelligence, Customer Intelligence
Social Media – sentiment analysis, Voice of the Customer
Social – Hybrid folksonomy / taxonomy / auto-metadata
Social – expertise, categorize tweets and blogs, reputation
Ontology – travel assistant, semantic web, etc.
eDiscovery, Reputation management, Customer Experience
Expertise Location, Crowd sourcing
Technical support
Text Analytics Applications: Enterprise Search – Elements
Text Analytics can “solve” enterprise search
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine tune rules and taxonomy
Rich Search Results – context and conversation
Platform for search based applications
Text Analytics and Text Mining
Data and Unstructured Content
80% of content is unstructured – adding to semantic web is major
Text Analytics – content into data
– Big Data meets Big Content
Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
• Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
Semantic Web + Text Analytics = real world value
Linked Data + Text Analytics – best of both worlds
Build superior foundation elements – taxonomies, categorization
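The "Pipeline – oil & gas OR research" disambiguation example can be sketched as a simple context-overlap score. This is only one of many possible techniques, and the sense vocabularies and function name below are hypothetical; the slides do not say which method is used.

```python
import re

# Hypothetical context vocabularies for two senses of "pipeline".
SENSES = {
    "oil & gas": {"oil", "gas", "crude", "drilling", "refinery", "barrel"},
    "research": {"drug", "clinical", "trial", "candidate", "research", "compound"},
}

def disambiguate(text, senses=SENSES):
    """Pick the sense whose vocabulary overlaps most with the text.

    Returns (best_sense_or_None, scores). Ties go to the sense listed first.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(tokens & vocab) for sense, vocab in senses.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = None  # no context evidence either way
    return best, scores

if __name__ == "__main__":
    print(disambiguate("The pipeline carries crude oil to the refinery."))
    print(disambiguate("Three drug compounds are in the clinical pipeline."))
    print(disambiguate("The pipeline is long."))
```

In practice the sense vocabularies could come from taxonomy nodes or an ontology, which is one reading of "real integration of text and ontology".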
Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
Combine with Data Mining
New sources of information
– News stories, medical records
– Blogs, social
Find new connections, sources of knowledge
Vaccine Adverse Effects – disease, symptoms, variables
Unstructured text into a data source
Some preliminary analysis, content structure
Find unknown adverse effects and prevalence
Drug Discovery + search / research – 5 year story
Text Analytics Applications
Example – Vaccine Adverse Effects
Text Analytics and Text Mining
Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
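The bottom-up step (term frequency and co-occurring terms) can be sketched with the standard library. This is an illustrative sketch only: the stopword list, tokenizer, and function names are assumptions, and a real project at the scale of 200,000 documents would use a text mining tool rather than this code.

```python
from collections import Counter
from itertools import combinations
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "with"}

def terms(doc):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]

def term_stats(docs):
    """Return (document frequency per term, co-occurrence count per term pair)."""
    doc_freq = Counter()
    cooccur = Counter()
    for doc in docs:
        unique = sorted(set(terms(doc)))   # count each term once per document
        doc_freq.update(unique)
        cooccur.update(combinations(unique, 2))  # alphabetically ordered pairs
    return doc_freq, cooccur

if __name__ == "__main__":
    docs = [
        "Vaccine safety and adverse reaction",
        "Adverse reaction to the vaccine",
        "Taxonomy of vaccine research",
    ]
    doc_freq, cooccur = term_stats(docs)
    print(doc_freq.most_common(3))
    print(cooccur.most_common(3))
```

Frequent terms and strongly co-occurring pairs are the kind of evidence an editor could review as candidate taxonomy nodes or cluster labels.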
Text Analytics and Text Mining
Case Study – Taxonomy Development
Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
Add Data: PubDate, journalTitle, Taxonomy Node
Terms – Map to frequency, date, date ranges, Taxonomy Node
– New Terms, Trends
Relevance – frequency, Abstract, Title, human judgment
Entity Extraction – Authors, Organizations, Products
Categorization – build on clusters & taxonomy
Combination – reports, visualizations, interactive explorations
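Mapping terms to dates to surface new terms and trends could be sketched as follows. The record layout (dictionaries with `PubDate` in `YYYY-MM-DD` form and a `Title` string) and both function names are assumptions for illustration; the slides name the fields but not their format.

```python
from collections import defaultdict
import re

def terms_by_year(records):
    """Count, per term, how many records in each year mention it in the title.

    `records` is an iterable of dicts with assumed keys 'PubDate' and 'Title'.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        year = int(rec["PubDate"][:4])
        for term in set(re.findall(r"[a-z]+", rec["Title"].lower())):
            counts[term][year] += 1
    return counts

def new_terms(counts, since_year):
    """Terms whose earliest appearance is in `since_year` or later."""
    return sorted(t for t, years in counts.items() if min(years) >= since_year)

if __name__ == "__main__":
    records = [
        {"PubDate": "2008-03-01", "Title": "Taxonomy design"},
        {"PubDate": "2011-06-15", "Title": "Sentiment taxonomy"},
        {"PubDate": "2012-01-10", "Title": "Sentiment trends"},
    ]
    counts = terms_by_year(records)
    print(dict(counts["taxonomy"]))
    print(new_terms(counts, 2011))
```

The same per-year counts could be joined to a Taxonomy Node field to show which nodes are growing or going stale.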
Case Study – Taxonomy Development
Conclusion
Text Analytics impact is huge – solve information overload
Enterprise Search and Search Based Applications: Save millions and enhance productivity
Combination of Text Analytics & Text Mining – unlimited range of applications
Mutual Enrichment – more data, add structure to unstructured
Add Ontology = Richer Text Analytics – smarter, more useful
Text Analytics + Text Mining + Semantic Web
– Move from theory to new practical applications
The best is yet to come!
Questions?
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com