Keynote presentation to JADT.org, June 5, 2014

Text Analytics Past, Present & Future: An Industry View
Seth Grimes, Alta Plana Corporation
@sethgrimes
June 5, 2014

Text Analytics: An Industry View
JADT – June 5, 2014

Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.

Text analytics past:
Pioneers…

Document input and processing
Knowledge handling is key
Desk Set (1957): computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958

“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
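The scoring idea in Luhn's description can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Luhn's published procedure: word significance is taken to be raw frequency after dropping a small, assumed stop-word list, and a sentence's score is the average significance of its words. The names `STOP_WORDS`, `split_sentences`, `tokenize`, `auto_abstract`, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "the", "to", "with"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word significance: frequency across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Sentence significance: mean significance of its words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

sample = (
    "Text analytics extracts information from text. "
    "The weather was pleasant. "
    "Analytics of text supports business decisions."
)
print(auto_abstract(sample, n_sentences=1))
```

Averaging over sentence length keeps long sentences from winning on word count alone; other normalizations would be equally reasonable for a sketch like this.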




Pipelines and patterns
IBM’s MedTAKMI, 1997-
http://www.research.ibm.com/trl/projects/textmining/index_e.htm

Exhaustive extraction
An (old) Attensity example: NLP to identify roles and relationships, for a law-enforcement application.

Language engineering
GATE: General Architecture for Text Engineering.
http://gate.ac.uk/

Text analytics present:
Business, technology, applications, and solutions…

“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.”
-- Philip Russom, The Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx

Linguistics, statistics, and semantics
Text analytics (typically) involves linguistic modelling, statistical characterization, learned patterns, and semantic understanding of text-derived features –
• Named entities: people, companies, places, etc.
• Pattern-based features: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
– applied to business ends.
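Pattern-based features such as e-mail addresses and phone numbers are commonly extracted with regular expressions. The following is a minimal sketch, assuming deliberately simplified patterns: the e-mail expression is not a full address-syntax validator, and the phone expression only covers ten-digit numbers written as three groups. `EMAIL_RE`, `PHONE_RE`, and `extract_patterns` are illustrative names, not taken from any tool mentioned in the talk.

```python
import re

# Simplified e-mail pattern (assumption): local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Simplified phone pattern (assumption): 3-3-4 digits separated by space, dot, or hyphen.
PHONE_RE = re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")

def extract_patterns(text):
    """Return pattern-based features found in text, keyed by feature type."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }

features = extract_patterns("Contact jane.doe@example.com or call 555-123-4567.")
print(features)
```

Rule-based extraction like this suits features with a regular surface form; named entities, concepts, and sentiment usually need the linguistic or statistical methods listed on the slide.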

Sources
It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:
• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents....
Non-text “unstructured” content?
• Images
• Audio including speech
• Video
Value derives from patterns.

Value
What do we do with text, whether online, on-social, or in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract information and Analyze.

Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
Diagram: Search, BI/Big Data, and Applications, with text analytics as the inner circle.
• Semantic search (search + text)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• Search-based applications (search + text + apps)
• NextGen CRM, EFM, MR, marketing, apps…

Content, composites, connections 1

Content, composites, connections 2

Applications
Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.

Opinion, sentiment & emotion

Sentiment analysis
A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM = Enterprise Feedback Management).
• Market research.
• Product design/quality.
• Trend spotting.

Data exploration via dashboards and workbenches.

Text analytics present:
The market…

http://altaplana.com/TA2014

Chart: What are your primary applications where text comes into play?
• Military/national security/intelligence: 5%
• Law enforcement: 6%
• Intellectual property/patent analysis: 8%
• Financial services/capital markets: 9%
• Product/service design, quality assurance, or warranty claims: 10%
• Other: 11%
• Insurance, risk management, or fraud: 13%
• E-discovery: 14%
• Life sciences or clinical medicine: 15%
• Online commerce including shopping, price intelligence, reviews: 16%
• Content management or publishing: 25%
• Customer /CRM: 27%
• Search, information access, or Question Answering: 29%
• Competitive intelligence: 33%
• Brand/product/reputation management: 38%
• Research (not listed): 38%
• Voice of the Customer / Customer Experience Management: 39%

Voice of the Customer
Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog postings and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.
Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.

The commercial scene

Online commerce
Text analytics is applied for marketing, search optimization, competitive intelligence.
Analyze social media and enterprise feedback to understand the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.

E-Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications.
• Managing electronically stored information for production in event of litigation.
Sources include e-mail (!!), news, social media.
Risk avoidance and fraud detection are key to effective decision making.
• Text analytics mines critical data from unstructured sources.
• Integrated text-transactional analytics provides rich insights.

Chart: What textual information are you analyzing or do you plan to analyze?
Series in legend: 2014, 2011, 2009. Values as labeled on the chart:
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form+micro): 61%

Chart: What textual information are you analyzing or do you plan to analyze?
• insurance claims or underwriting notes: 5%
• video or animated images: 5%
• photographs or other graphical images: 7%
• field/intelligence reports: 11%
• patent/IP filings: 12%
• text messages/instant messages/SMS: 12%
• Web-site feedback: 16%
• chat: 20%
• contact-center notes or transcripts: 22%
• online reviews: 31%
• Facebook postings: 32%
• customer/market surveys: 37%
• news articles: 42%
• Twitter, Sina Weibo, or other microblogs: 46%
Bars without a surviving axis label: 5%, 5%, 9%, 11%, 12%, 13%, 19%, 20%, 26%, 31%, 36%, 38%, 43%.

Chart: Do you currently need (or expect to need) to extract or analyze...
• Events: current 33%, expect 21%
• Semantic annotations: current 31%, expect 24%
• Other entities – phone numbers, part/product numbers, e-mail & street addresses, etc.: current 34%, expect 23%
• Metadata such as document author, publication date, title, headers, etc.: current 47%, expect 23%
• Concepts, that is, abstract groups of entities: current 51%, expect 28%
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: current 56%, expect 25%
• Relationships and/or facts: current 47%, expect 33%
• Sentiment, opinions, attitudes, emotions, perceptions, intent: current 54%, expect 28%
• Topics and themes: current 66%, expect 22%

“The sharp rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/

Chart: Non-English language support? (series: Current, Within 2 years)
Languages labeled on the axis: Arabic; Chinese; French; Greek; Italian; Korean; Portuguese; Scandinavian or Baltic; Turkish or Turkic; Other Arabic script (including Urdu, Pashto, Farsi, Dari); Other European or Slavic/Cyrillic.

Software & platform options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface (API).
• Code library or component of a business/vertical application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other end-user applications.
The slides that follow next will present leading options in each category except Hosted…

Chart: What is important in a solution?
Series in legend: 2014 (n=139), 2011 (n=136), 2009 (n=78). Values as labeled on the chart:
• media monitoring/analysis interface: 22%
• hosted or Web service (on-demand "API") option: 25%
• supports data fusion / unified analytics: 28%
• sector adaptation (e.g., hospitality, insurance, retail, health care, communications, financial services): 30%
• BI (business intelligence) integration: 32%
• ability to create custom workflows or to create or change topics/categories yourself: 33%
• big data capabilities, e.g., via Hadoop/MapReduce: 33%
• predictive-analytics integration: 36%
• open source: 37%
• support for multiple languages: 40%
• sentiment scoring: 41%
• "real time" capabilities: 43%
• low cost: 44%
• deep sentiment/emotion/opinion/intent extraction: 45%
• document classification: 53%
• broad information extraction capability: 53%
• ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules: 54%
• ability to generate categories or taxonomies: 64%

User decision criteria
Primary considerations include –
Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e-mail, online news).
By-user customization possibilities: For instance, via custom taxonomies, rules, lexicons.
Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?)
What sentiment? Valence & what else? Emotion? Intent?
Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.
Usage mode: As-a-service (API), installed, or hosted/cloud.
Capacity: Volume, performance, throughput, latency.
Cost.

A few French companies

Academic spin-offs
People Pattern

Text analytics future:
Synthesis and sensemaking.

New York Times, September 8, 1957

Emotion in text

Emotion and outcomes

Beyond Text
• Audio including speech.
• Images.
• Video.
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/

The world of big data
• Machine data (e.g., logs, sensor outputs, clickstreams).
• Actions, interactions, and transactions: geolocation and time.
• Profiles: individual, demographic & behavioral.
• Text, audio, images, and video.
Facts and feelings.

(Accessible) data everywhere

http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
A big data analytics architecture (example)

http://searchuserinterfaces.com/
“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.”
– Marti Hearst, 2009
Sensemaking

http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm
En route
