Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis Steve Kramer, Ph.D. President & Chief Scientist Paragon Science, Inc. Copyright © 2013-2014 Paragon Science, Inc. and Digital Reasoning, Inc. All rights reserved. Matthew Russell Chief Technology Officer Digital Reasoning, Inc.

Upload: digital-reasoning

Post on 25-May-2015


DESCRIPTION

In this presentation, O'Reilly author and Digital Reasoning CTO Matthew Russell along with Dr. Steve Kramer, founder and chief scientist at Paragon Science, discuss how Digital Reasoning processed the Enron corpus with its advanced Natural Language Processing (NLP) technology - effectively transforming it into building blocks that are viable for data science. Then, Paragon Science used dynamic graph analysis inspired from particle physics to tease out insights from the data in order to better understand whether an enterprise fiasco such as the Enron scandal could have been thwarted.

TRANSCRIPT

Page 1: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Steve Kramer, Ph.D. President & Chief Scientist Paragon Science, Inc.

Copyright © 2013-2014 Paragon Science, Inc. and Digital Reasoning, Inc. All rights reserved.

Matthew Russell Chief Technology Officer Digital Reasoning, Inc.

Page 2

Overview

•  Matthew Russell
   •  Intro & Background
   •  Data Science @ Digital Reasoning
   •  Making Human Language Viable for Data Science

•  Dr. Steve Kramer
   •  Intro & Background
   •  Data Science @ Paragon Science
   •  Dynamic Graph Analysis on the Enron Corpus

Page 3

“Any sufficiently advanced technology is indistinguishable from magic.”

--Arthur C. Clarke

Page 4

Hello, My Name Is … Matthew

•  Background in Computer Science
   •  Data Mining & Machine Learning
•  CTO @ Digital Reasoning
   •  Natural Language Processing
•  Author @ O’Reilly Media
   •  Mining the Social Web
•  Fitness junkie
   •  CrossFit, triathlon, yoga
•  Twitter, etc.
   •  @ptwobrussell

Page 5

•  Flagship Product
   •  Synthesys
   •  Makes human language viable for data science
•  Customers
   •  Now: defense/intel, finance
   •  Soon: healthcare & alternative energy
•  Offices
   •  Nashville, NYC, DC, & London

Page 6

An “Oracle” for Human Language Data

[Figure: platform diagram comparing two systems. A Relational Database Management System (RDBMS) enables applications on structured data; Synthesys, a Knowledge Graph Management System (KGMS), enables applications on unstructured data. Diagram labels: Verticals (Legal, Financial, Government, Health); Platform (Data Organized & Application-Ready), with the stages Read, Resolve, Reason; Data Sources (External Data, Internal Data); Infrastructure (Processing Platforms and Data Stores).]

Page 7

Data Alchemy

•  Data
   •  Documents & document fragments
•  Information
   •  Assertions, extractions, etc.
•  Knowledge
   •  Aggregated information
•  Wisdom
   •  “Compressed” knowledge
•  Gold
Page 21

“Knowledge is a process of piling up facts; wisdom lies in their simplification.”

--Martin Fischer

Page 22

Natural Language Building Blocks

Page 23

Big Data Needs Big Understanding

Page 24

The End State: A Knowledge Graph

1. READ: Forge natural language building blocks & build an information graph
2. RESOLVE: Introduce hierarchy & compress the information graph into knowledge
3. REASON: Exploit the knowledge to solve business problems

[Figure: example knowledge graph. Node labels: Raj Rajaratnam, Galleon Group LLC, Galleon Tech Fund, Fund, Portfolio Manager, Rajat Gupta, Richard Schutte, Anil Kumar, Rajiv Goel, Paul Tudor Jones, Michael Cardillo, Kris Chellam, Krish Panu, Gary Cohn, David Loeb, Ian Horowitz, Stan Druckenmiller, Leon Shaulov, Goldman Sachs, GS Transactions, Procter & Gamble, P & G Board, P&G Transactions, McKinsey & Co., Intel Corp., SpotTail Capital Advisors, Voyager Capital Partners, Wharton School, Fantasy Football League, Lunch.]

Page 25

“In God we trust. All others must bring data.”

--W. Edwards Deming

Page 26

Overview of Enron

•  A Texas-based energy company that grew into a multibillion-dollar company between 1985 and 2001
•  The substance of the Enron scandal involved the use of financial instruments called Raptors to effectively hide accounting losses.
•  Soon after the scandal was revealed in 2001, Enron filed for bankruptcy to the tune of over $60 billion
   •  The largest bankruptcy in U.S. history (at the time)!

Page 27

Could It Have Been Prevented?

•  Maybe/Probably: Steve will share some insights
•  The answers were hiding in plain sight
•  Fast-forward till now…
   •  Any enterprise has significant potential liability lurking in digital forms of human language data
   •  “Deep” NLP can power applications to help thwart these types of situations and do significant moral good

Page 28

Now Introducing…

Dr. Steve Kramer

<Applause />

Page 29

Hello, My Name Is … Steve

•  Computational Physicist
   •  Nonlinear Dynamics and Chaos
   •  Dynamic Graph Analysis
•  Data Science Entrepreneur
   •  Paragon Science, Inc.
•  Group Fitness Instructor
   •  BODYPUMP
   •  BODYATTACK
   •  CXWORX
•  Food and Wine Aficionado
•  Twitter, etc.
   •  @ParagonSci_Inc
   •  @dr_steve_kramer

Page 30

Extracting Critical Business Intelligence from Data Streams

•  Digital Reasoning’s Synthesys platform for natural language processing
•  Paragon Science’s dynamic anomaly detection software
•  Business goals:
   •  Help seize opportunities
   •  Mitigate risks
•  Case study: Enron email data set
   •  Identify key individuals by position in network
   •  Highlight unusual changes in sentiments
   •  Drill down to relevant source messages

Page 31

Example: Enhanced Enron Email Data Set from Digital Reasoning

•  GraphML representation for each email message
•  343,134 email messages
•  28,905 email addresses
•  Natural language processing artifacts
   •  Messages
   •  Sentences
   •  Phrases
   •  Tokens
   •  Co-references
   •  Co-reference chains

Page 32

Paragon Science Enron Network Analysis

[Figure: analysis workflow. Box labels:]

•  Ingest Digital Reasoning Data
•  Find Key Individuals with K-Core Decomposition
•  Apply Dynamic Network Analysis
•  Construct Graph for Each Day
•  Perform Sentiment Analysis on Messages
•  Identify Anomalous Clusters and Time Periods
•  Relate Anomalies to Source Data
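The "Construct Graph for Each Day" step can be pictured with a minimal sketch. It assumes each message has already been reduced to a (timestamp, sender, recipient) record; the record layout, the function name `build_daily_graphs`, and the sample addresses are all hypothetical and are not taken from the Digital Reasoning data set.

```python
from collections import defaultdict
from datetime import datetime


def build_daily_graphs(messages):
    """Group (timestamp, sender, recipient) records into one directed graph per day.

    Each daily graph is a dict mapping (sender, recipient) to a message count.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for timestamp, sender, recipient in messages:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        daily[day][(sender, recipient)] += 1
    # Return plain dicts, ordered by day.
    return {day: dict(edges) for day, edges in sorted(daily.items())}


# Made-up records for illustration only (not from the Enron corpus).
messages = [
    ("2000-08-07 04:13:00", "alice@example.com", "bob@example.com"),
    ("2000-08-07 09:30:00", "alice@example.com", "bob@example.com"),
    ("2000-08-07 10:00:00", "bob@example.com", "carol@example.com"),
    ("2000-08-08 02:34:00", "carol@example.com", "alice@example.com"),
]
graphs = build_daily_graphs(messages)
for day, edges in graphs.items():
    print(day, edges)
```

Keeping one small edge-count dictionary per day makes it straightforward to compare consecutive days, which is what the later dynamic analysis needs.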

Page 33

K-Core Decomposition of the Enron Network

The users at the center are best-positioned to spread information and influence the network.

J.I. Alvarez-Hamelin, Alain Barrat, Alessandro Vespignani, Luca Dall'Asta, and Mariano Beiró http://lanet-vi.fi.uba.ar/
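For readers unfamiliar with k-core decomposition, here is a minimal sketch of the usual peeling idea: repeatedly remove the node with the lowest remaining degree and record the highest degree level reached so far. It is a plain-Python illustration on a made-up five-node graph, not the LaNet-vi implementation credited above, and the function name `core_numbers` is an assumption.

```python
def core_numbers(adjacency):
    """Return the core number of each node in an undirected graph.

    adjacency maps each node to the set of its neighbours. A node's core
    number is the largest k such that the node belongs to a subgraph in
    which every node has degree >= k.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Made-up graph: a triangle (a, b, c) with a two-node tail (d, e).
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
cores = core_numbers(adjacency)
print(cores)
```

Nodes with the highest core number form the central core; in this toy graph that is the triangle, while the tail nodes sit in the outer shell.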

Page 34

K-Core Decomposition: Central Core

Page 36

Enron Data Enhanced by NLP and Sentiment Analysis

•  For the central individuals in the Enron network, we construct dynamic networks using the phrases extracted from the emails by Digital Reasoning.
•  Use the LIWC (Linguistic Inquiry and Word Count) tool to classify the sentiment scores of the phrases.
   •  Prof. James Pennebaker from UT Austin (http://www.liwc.net/)
   •  Sample categories
      •  Positive emotion
      •  Negative emotion
      •  Anger
      •  Anxiety
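Dictionary-based scoring of this kind can be sketched as follows. The word lists below are a tiny hypothetical lexicon written for this example; the real LIWC dictionary is licensed and much larger, and the slides do not say how the phrase scores were normalized, so the "fraction of matching words" used here is only an assumption.

```python
import re

# Hypothetical mini-lexicon for illustration only (not the LIWC dictionary).
# A trailing "*" matches any word that starts with the given prefix.
LEXICON = {
    "anxiety": ["risk*", "worr*", "afraid", "hesitat*"],
    "anger": ["fight*", "blame*", "destroy*", "kill*"],
    "positive_emotion": ["great", "pleasure", "good", "thank*"],
}


def _matches(word, pattern):
    """Check one word against one lexicon pattern."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_scores(phrase, lexicon=LEXICON):
    """Return, per category, the fraction of words in the phrase that match."""
    words = re.findall(r"[a-z']+", phrase.lower())
    if not words:
        return {category: 0.0 for category in lexicon}
    return {
        category: sum(any(_matches(w, p) for p in patterns) for w in words) / len(words)
        for category, patterns in lexicon.items()
    }


scores = category_scores("we have a fighting chance")
print(scores)
```

With this toy lexicon only "fighting" matches, under the anger category; the numbers it produces are not comparable to the scores shown on the slides.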

Page 37

High-level Sentiment Analysis Results

Page 38

Positive Emotion Weight Totals

Peak on Feb. 28, 2001 for [email protected]

Page 39

Anxiety Weight Totals

Peak on Feb. 28, 2001 for [email protected]
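A per-day weight total and its peak can be computed with a few lines. This is a sketch under assumptions: the (sender, day, weight) record layout, the function names, and every value below are made up for illustration and do not reproduce the charts on these slides.

```python
from collections import defaultdict


def daily_totals(scored_phrases):
    """Sum phrase weights per (sender, day)."""
    totals = defaultdict(float)
    for sender, day, weight in scored_phrases:
        totals[(sender, day)] += weight
    return dict(totals)


def peak_day(totals, sender):
    """Return the (day, total) pair with the largest total for one sender."""
    per_day = {day: total for (s, day), total in totals.items() if s == sender}
    return max(per_day.items(), key=lambda item: item[1])


# Made-up weights for illustration only.
scored = [
    ("sender_a", "2001-01-01", 0.25),
    ("sender_a", "2001-01-02", 0.5),
    ("sender_a", "2001-01-02", 0.75),
    ("sender_b", "2001-01-02", 0.25),
]
totals = daily_totals(scored)
print(peak_day(totals, "sender_a"))
```

As a later slide notes, simple totals like these can miss important features, which is the motivation for the dynamic analysis that follows.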

Page 40

Related Message from Feb. 28, 2001

Message-ID: <8261377.1075848143751.JavaMail.evans@thyme>
Date: Wed, 28 Feb 2001 06:19:00 -0800 (PST)
From: [email protected]
To: [email protected]
Subject: Energy Issues

…

Forwarded by Steven J Kean/NA/Enron on 02/28/2001 02:19 PM -----
To: Ann M Schmidt/Corp/Enron@ENRON, Bryan Seyfried/LON/ECT@ECT, [email protected],

Riverside Press 2/23: "Power line plan has new foes"
Contra Costa Times, 2/23: "GOP in a fix over power crisis"
Sac Bee, Fri 2/23: "Davis says State has tentative deal for Edison grid"
SF Chron 2/23: "Utilities Search for Long-Term Fix During Breather...

GOP in a fix over power crisis
Leaders seek solutions that won't undercut past support for deregulation
By Daniel Borenstein
TIMES POLITICAL EDITOR

California Republican Party leaders are trying desperately to politicize the state's electricity crisis by blaming Gray Davis, but it seems that the harder they try the more popular the Democratic governor becomes. The latest GOP push will come today when party leaders, meeting in Sacramento for the start of their three-day state convention, hold a hastily organized workshop on energy. "We should be explaining to California that Gray Davis was asleep at the switch last year," said one of the scheduled speakers, Republican strategist Dan Schnur. "But that message can only work if it's coupled with a proposal for an alternative solution."

Page 41

Apply Paragon Science Dynamic Anomaly Detection

•  Simple measures such as counts can miss important features
•  Use dynamic network analysis to study the changes over time in:
   •  Who communicates with whom
   •  The distribution of the emotion-laden phrases in the sent emails
•  Extract business intelligence

Page 42

Dynamic Cluster Analysis

•  Which entities behave or evolve differently than others in the data set?
•  Which entities have shifted their behavior unexpectedly?

Page 43

Dynamic Anomaly Detection Overview

•  A general approach that incorporates nonlinear time series analysis methods
   •  Complexity measures
   •  Finite-time Lyapunov exponents (FTLEs)
•  Input data
   •  Communications or transactional data streams
   •  General time-dependent data sets
•  Key questions
   •  Which entities behave or evolve differently than others in the data set?
   •  Which entities have shifted their behavior unexpectedly?

Page 44

Finite-Time Lyapunov Exponents (FTLEs)

•  General dynamical system
•  Flow map
   •  Advects points in the state space
   •  Describes the time evolution of the system
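The equations from this slide did not survive in the transcript. In the standard textbook notation, which is assumed here and may differ from the symbols on the original slide, a general dynamical system and its flow map are written as:

```latex
% General (possibly time-dependent) dynamical system with initial condition x_0
\dot{\mathbf{x}}(t) = \mathbf{f}\bigl(\mathbf{x}(t), t\bigr),
\qquad \mathbf{x}(t_0) = \mathbf{x}_0

% Flow map: carries the initial point x_0 to its position after time T
\phi_{t_0}^{t_0+T} : \mathbf{x}_0 \mapsto \mathbf{x}(t_0 + T;\, t_0, \mathbf{x}_0)
```

Here \mathbf{f} is the velocity field in state space and T is the integration horizon; both symbols are notation chosen for this note.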

Page 45

Finite-Time Lyapunov Exponents (FTLEs)

•  FTLEs characterize the amount of stretching or contraction about a point x0 during a time interval T
   •  Stability
   •  Predictability
•  Definition
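The definition itself is not preserved in the transcript. The commonly used form, assumed here, is sigma = (1/|T|) ln sqrt(lambda_max(J^T J)), where J is the Jacobian of the flow map with respect to the initial point x0. The sketch below evaluates that formula numerically for a simple linear saddle chosen purely as a test case; the function names and the example system are illustrative and are not Paragon Science's software.

```python
import math


def flow_map(x0, y0, T):
    """Exact flow map of the linear saddle dx/dt = x, dy/dt = -y."""
    return x0 * math.exp(T), y0 * math.exp(-T)


def ftle(flow, x0, y0, T, h=1e-6):
    """Finite-time Lyapunov exponent at (x0, y0) over the horizon T.

    The Jacobian J of the flow map is estimated with central differences,
    and the result is (1/|T|) * ln(sqrt(lambda_max(J^T J))).
    """
    xp, yp = flow(x0 + h, y0, T)
    xm, ym = flow(x0 - h, y0, T)
    a, c = (xp - xm) / (2 * h), (yp - ym) / (2 * h)  # first column of J
    xp, yp = flow(x0, y0 + h, T)
    xm, ym = flow(x0, y0 - h, T)
    b, d = (xp - xm) / (2 * h), (yp - ym) / (2 * h)  # second column of J
    # Cauchy-Green tensor C = J^T J (symmetric 2x2).
    c11, c12, c22 = a * a + c * c, a * b + c * d, b * b + d * d
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    mean = 0.5 * (c11 + c22)
    lam_max = mean + math.sqrt((0.5 * (c11 - c22)) ** 2 + c12 ** 2)
    return math.log(math.sqrt(lam_max)) / abs(T)


sigma = ftle(flow_map, 0.5, 0.5, T=2.0)
print(sigma)
```

For this saddle the stretching rate along x is exactly 1, so the computed value should be very close to 1; a positive FTLE indicates that nearby starting points separate over the interval.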

Page 46

Derived Jacobian Vectors

•  Similarly, characteristic vectors derived from the flow map’s Jacobian can describe the generalized directions of the local stretching or contraction.
•  Possible derivation approaches:
   •  Weight-based column sampling
   •  Singular value decomposition (SVD)
   •  Principal component analysis (PCA)
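To make the SVD option concrete, the sketch below approximates the dominant right-singular vector of a Jacobian, which is the direction of strongest local stretching, using power iteration on J^T J. It is a plain-Python illustration with a made-up 2x2 matrix; the function name and the choice of power iteration (rather than a full SVD routine) are assumptions, not the method used in the talk.

```python
import math


def dominant_stretch_direction(J, iterations=200):
    """Approximate the top right-singular vector and singular value of J.

    Power iteration on J^T J converges to the direction of strongest
    stretching when the largest singular value is strictly dominant and
    the starting vector is not orthogonal to that direction.
    """
    rows, cols = len(J), len(J[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        Jv = [sum(J[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(J[i][j] * Jv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The singular value is the length of J applied to the unit vector v.
    Jv = [sum(J[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in Jv))
    return v, sigma


# Made-up Jacobian: stretches by 3 along x and contracts by 0.5 along y.
J = [[3.0, 0.0], [0.0, 0.5]]
direction, sigma = dominant_stretch_direction(J)
print(direction, sigma)
```

In practice a library SVD or PCA routine would return all directions at once; the iteration above only recovers the leading one.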

Page 47

Paragon Dynamic Anomaly Detection

[Figure: flowchart. Box labels: Representation of Data at t=ti; Clustering / Segmentation; Cluster Resolution; Feature Vector Encoding; Outlier Detection at t=ti; decision "More Time Intervals?" (Yes / No); Dynamic Anomaly Detection; Nonlinear Time Series Analysis (FTLEs, Dynamic Thresholds, etc.); Pattern Classification; Outlier Detection; Domain-Specific Filtering (Threat Signatures, Risk Profiles, etc.)]
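The loop over time intervals with a dynamic threshold can be illustrated with a much simpler stand-in than the method in the flowchart. The sketch below measures how far each entity's feature vector moves between consecutive intervals and flags moves that exceed median + k * MAD across all entities at that step. The change metric, the threshold rule, the function names, and the data are all assumptions for illustration; this is not Paragon Science's FTLE-based algorithm.

```python
import math
from statistics import median


def change_metric(prev, curr):
    """Euclidean distance between consecutive feature vectors."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))


def detect_anomalies(series, k=3.0, min_scale=1e-9):
    """Return (entity, time index, change) for outlying changes.

    series maps entity -> list of feature vectors, one per time interval.
    At each step the threshold is median + k * MAD of all entities'
    changes, so it adapts to how much the whole population is moving.
    """
    steps = min(len(vectors) for vectors in series.values())
    anomalies = []
    for t in range(1, steps):
        changes = {e: change_metric(v[t - 1], v[t]) for e, v in series.items()}
        med = median(changes.values())
        mad = median(abs(c - med) for c in changes.values())
        threshold = med + k * max(mad, min_scale)
        for entity, change in changes.items():
            if change > threshold:
                anomalies.append((entity, t, round(change, 3)))
    return anomalies


# Made-up feature vectors for four entities over three intervals.
series = {
    "a": [[1, 0], [1, 1], [1, 2]],
    "b": [[0, 0], [2, 0], [4, 0]],
    "c": [[0, 0], [0, 3], [0, 6]],
    "d": [[0, 0], [0, 2], [30, 42]],
}
anomalies = detect_anomalies(series)
print(anomalies)
```

Only entity "d" jumps far more than its peers in the last interval, so it is the only one flagged; the first interval produces no flags because every entity moves by a similar amount.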

Page 48

Summary of Top 100 Anxiety Anomalies

Sender | Peak Start Date | Peak End Date | Max Change Metric | # Anomalies
[email protected] | 2000-03-17 | 2001-05-04 | 3.018 | 48
[email protected] | 1999-12-24 | 2001-06-08 | 2.356 | 34
[email protected] | 2000-09-29 | 2001-12-14 | 2.164 | 2
[email protected] | 2001-05-18 | 2001-11-16 | 1.893 | 6
[email protected] | 2000-07-07 | 2001-08-10 | 1.862 | 5
[email protected] | 2001-06-15 | 2001-11-16 | 1.789 | 5

Callout: Dr. Vince Kaminski, Managing Director for Research at Enron

Page 49

Anomaly Detection Results for Anxiety Phrases


Anomalous period for [email protected] during August 2000

Page 50

Vince Kaminski’s Anxiety Phrase Network in August 2000

Page 51

Related Sentences Ranked by Anxiety Weight

Sentence | Message Date | Anxiety Score
the framework with modern credit risk measurement tools leads to a liquidity risk VAR measure | 2000-08-08 02:34:00 | 0.9
This event has rapidly establishing itself as the risk management industry ,s most important meeting point | 2000-08-17 04:22:00 | 0.7
This event has rapidly establishing itself as the risk management industry's most important meeting point | 2000-08-17 07:01:00 | 0.7
system providers this is THE financial risk management event of the year | 2000-08-17 04:22:00 | 0.6
system providers this is THE financial risk management event of the year | 2000-08-17 07:01:00 | 0.6
bank liquidity risk can be viewed as a variation of credit risk analysis | 2000-08-08 02:34:00 | 0.6
modeling, validation, EVT) Advanced Asset & Liability Management, Corporate/Energy Risk Management | 2000-08-17 04:22:00 | 0.5
the New Products Designed to Stabilise Volatility Energy Credit Risk Management GARP | 2000-08-17 04:22:00 | 0.5
this Convention will consider include Market Risk (stress testing, liquidity, jump diffusion | 2000-08-17 04:22:00 | 0.5
validation, EVT) Advanced Asset & Liability Management, Corporate/Energy Risk Management and the Insurance & Capital Markets | 2000-08-17 04:22:00 | 0.5
Volatility Energy Credit Risk Management GARP is a not-for-profit, independent organisation of risk management practitioner | 2000-08-17 04:22:00 | 0.5
modeling, validation, EVT) Advanced Asset & Liability Management, Corporate/Energy Risk Management | 2000-08-17 07:01:00 | 0.5
the New Products Designed to Stabilise Volatility Energy Credit Risk Management GARP | 2000-08-17 07:01:00 | 0.5
this Convention will consider include Market Risk (stress testing, liquidity, jump diffusion | 2000-08-17 07:01:00 | 0.5
validation, EVT) Advanced Asset & Liability Management, Corporate/Energy Risk Management and the Insurance & Capital Markets | 2000-08-17 07:01:00 | 0.5
Volatility Energy Credit Risk Management GARP is a not-for-profit, independent organisation of risk management practitioner | 2000-08-17 07:01:00 | 0.5

Page 52

Related Sentences Ranked by Anxiety Weight

Sentence | Message Date | Anxiety Score
any questions please do not hesitate to contact me | 2000-08-17 04:22:00 | 0.4
INVITATION TO SPEAK GARP 2001 The 2nd Annual Risk Management Convention 13th & 14th February, 2001 ) Marriott World | 2000-08-17 04:22:00 | 0.4
Three simultaneous streams address traditional pricing and risk management techniques | 2000-08-17 04:22:00 | 0.4
any questions, please do not hesitate to call me at ext | 2000-08-07 01:27:00 | 0.4
your risk factor will be reduced by a factor equal the sqrt(5 | 2000-07-28 10:05:00 | 0.4
I have referred him to the "Managing Energy Price Risk" book | 2000-08-08 02:29:00 | 0.4
any questions please do not hesitate to contact me | 2000-08-17 07:01:00 | 0.4
Three simultaneous streams address traditional pricing and risk management techniques | 2000-08-17 07:01:00 | 0.4
conflicts of schedule, please do not hesitate to contact me | 2000-07-31 08:46:00 | 0.4
modelling approaches to credit risk. | 2000-08-02 01:11:00 | 0.4
market risks, has received little attention in professional or academic journals | 2000-08-08 02:34:00 | 0.4
the Global Association of Risk Professionals, I have great pleasure | 2000-08-17 04:22:00 | 0.3
you to speak at our 2nd Annual Risk Management Convention ) GARP 2001 | 2000-08-17 04:22:00 | 0.3
we lack desperately technical talent | 2000-08-07 04:13:00 | 0.3
we lack desperately technical talent | 2000-08-08 02:39:00 | 0.3
Energy Risk - Tackling Price Volatility, Adapting VaR, Scenario Modelling and Regulatory Requirements Vince Kaminski | 2000-08-17 07:01:00 | 0.3
GARP 2000 reflected the key concerns of risk management experts | 2000-08-17 07:01:00 | 0.3
INVITATION TO SPEAK GARP 2001 The 2nd Annual Risk Management Convention 13(superscript | 2000-08-17 07:01:00 | 0.3
the Global Association of Risk Professionals, I have great pleasure | 2000-08-17 07:01:00 | 0.3
you to speak at our 2(superscript: nd) Annual Risk Management Convention | 2000-08-17 07:01:00 | 0.3
approaches for measuring these risks for EES deals | 2000-07-26 02:08:00 | 0.3
EES deals given the uncertainties in energy consumption | 2000-07-26 02:08:00 | 0.3
we lack desperately technical talent | 2000-08-11 00:42:00 | 0.3
a framework allowing to measure a bank's structural liquidity risk | 2000-08-08 02:34:00 | 0.3
I taking over responsibility for setting-up EES Europe's Risk Management activity | 2000-08-03 08:51:00 | 0.3

Page 53

Vince Kaminski’s Anxiety Sentences Network in August 2000

Page 54

Vince Kaminski’s Anxiety Sentences Network (Detail)

Page 55

Vince Kaminski’s Anxiety Sentences Network: Gephi Highlights

Page 56

Related Messages Ranked by Anxiety Weight

Sender | Source File | Anxiety Weight
[email protected] | ./part-m-00042.base64.stemmed/2369.txt | 1
[email protected] | ./part-m-00038.base64.stemmed/8080.txt | 1
[email protected] | ./part-m-00040.base64.stemmed/1643.txt | 0.8333
[email protected] | ./part-m-00030.base64.stemmed/2628.txt | 0.5
[email protected] | ./part-m-00030.base64.stemmed/3403.txt | 0.5
[email protected] | ./part-m-00032.base64.stemmed/7385.txt | 0.5
[email protected] | ./part-m-00035.base64.stemmed/763.txt | 0.5
[email protected] | ./part-m-00038.base64.stemmed/3677.txt | 0.5
[email protected] | ./part-m-00038.base64.stemmed/7057.txt | 0.5
[email protected] | ./part-m-00042.base64.stemmed/4641.txt | 0.5
[email protected] | ./part-m-00032.base64.stemmed/609.txt | 0.3333
[email protected] | ./part-m-00041.base64.stemmed/7337.txt | 0.3333

Page 57

Top-Ranked Anxiety Message from Aug. 17, 2000

Message-ID: <2250375.1075856554294.JavaMail.evans@thyme>
Date: Thu, 17 Aug 2000 08:55:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Re: Cairn Gas Purchase Bid

Douglas S Parsons@ENRON_DEVELOPMENT 08/15/2000 09:30 AM
To: Doug Leach/HOU/ECT@ECT
cc: Marc De La Roche/HOU/ECT@ECT
Subject: Re: Cairn Gas Purchase Bid

I can appreciate and share your objective. Earlier today I sent a separate note to Vince forwarding your concerns and asking again for his assistance.

…

To: Douglas S Parsons/ENRON_DEVELOPMENT@ENRON_DEVELOPMENT
Subject: Re: Cairn Gas Purchase Bid

I gave you several clear alternatives such as contacting Vince's structuring group, Michael Popkin's Southern Cone structuring group and a long discussion regarding the pricing and suggested "collar." I also asked if you had spoken to your customer about what they were willing to pay, but that was a non starter. Trust me, I have seen almost every bad deal Enron has entered into or attempted to enter into and I am trying to get Metgas to objectively relook at their offer to Cairn become it becomes another bad deal.

Source File | Anxiety Weight
./part-m-00042.base64.stemmed/2369.txt | 1

Page 58

Details on Dr. Vince Kaminski

http://www.nytimes.com/2006/01/29/business/businessspecial3/29profiles.html

Page 59

Summary of Top 100 Anger Anomalies

Sender | Peak Start Date | Peak End Date | Max Change Metric | # Anomalies
[email protected] | 2000-08-25 | 2002-01-04 | 3.39 | 47
[email protected] | 1999-12-31 | 2001-06-08 | 2.97 | 36
[email protected] | 2000-03-17 | 2002-01-11 | 2.51 | 11
[email protected] | 2001-06-15 | 2001-08-31 | 2.37 | 3
[email protected] | 2001-07-06 | 2001-09-21 | 2.03 | 3

Callouts: Kay Mann, Assistant General Counsel; Mark Taylor, VP & Assistant General Counsel

Page 60

Anomaly Detection Results for Anger Phrases


Anomalous period for [email protected] starting in September 2000

49 Copyright © 2013-2014 Paragon Science, Inc. and Digital Reasoning, Inc. All rights reserved.

Page 61: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Kay Mann’s Anger Phrase Network (Sept.-Nov. 2000)


Page 62: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Kay Mann’s Key Anger Sentences


Sentence | Message Date | Anger Score
we have a fighting chance | 2000-10-21 06:27:00 | 0.4
a critical point and needs to understand the Nigeria tax implications for the venture | 2000-09-01 05:05:00 | 0.3
the commercial team is now at a critical point | 2000-09-01 05:05:00 | 0.3
he had a heart attack | 2000-10-10 02:26:00 | 0.3
5 really bothers her | 2000-09-18 07:36:00 | 0.3
The Midwest is dominated by vertically integrated Transmission Providers ("TPs | 2000-11-10 08:56:00 | 0.3
the Aquarium closed - killed by bike lanes | 2000-09-14 02:54:00 | 0.3
Assignment language - new language being tortured within the Enron ranks, to GE | 2000-11-28 05:20:00 | 0.3
my gas won't be cut off this COLD November | 2000-11-22 02:55:00 | 0.3
such case, you should destroy this message | 2000-09-01 05:05:00 | 0.2
you tonight to trick or treat | 2000-10-30 07:45:00 | 0.2
michael cut is toe, I | 2000-10-16 06:46:00 | 0.2
the critical juncture, we should enlist external support | 2000-09-15 05:38:00 | 0.2
Enron should protest or intervene | 2000-11-09 01:06:00 | 0.2
Calif., who represents San Diego | 2000-09-12 02:13:00 | 0.2
State Mkt Structure Blamed For Crisis The hearing | 2000-09-12 02:13:00 | 0.2
the staff probe, appeared to place much of the blame | 2000-09-12 02:13:00 | 0.2
I will be leaving around 2:00 for suburban trick or treating | 2000-11-09 01:45:00 | 0.2
Enron should protest or intervene | 2000-11-06 07:34:00 | 0.2
it screwed up the parking | 2000-09-14 06:21:00 | 0.2
our critical final zoning vote is next Thursday, I | 2000-11-01 06:17:00 | 0.2
She's not a dumb dog | 2000-10-20 05:32:00 | 0.2
it just hasn't done the trick | 2000-10-27 01:02:00 | 0.2
FP&L revealed violations | 2000-11-10 08:56:00 | 0.2
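The slides do not state how the Anger Score was computed. Several rows above are plainly benign ("fighting chance", "heart attack", "trick or treat"), which is the kind of result a word-level emotion lexicon tends to produce. The sketch below shows that behavior with a tiny lexicon whose terms and weights are invented purely for illustration; `ANGER_LEXICON` and `anger_score` are hypothetical names, not part of either company's tooling.

```python
import re

# Hypothetical lexicon: both the terms and the weights are made up
# for this example and are not taken from the presentation.
ANGER_LEXICON = {
    "fighting": 0.4,
    "attack": 0.3,
    "killed": 0.3,
    "tortured": 0.3,
    "destroy": 0.2,
    "blame": 0.2,
    "dumb": 0.2,
}


def anger_score(sentence: str, lexicon=ANGER_LEXICON) -> float:
    """Score a sentence by its single strongest lexicon word.

    Context is ignored entirely, so idioms that contain a listed
    word score the same as genuinely angry sentences.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return max((lexicon.get(token, 0.0) for token in tokens), default=0.0)


if __name__ == "__main__":
    for text in (
        "we have a fighting chance",
        "he had a heart attack",
        "Thank you very much",
    ):
        print(f"{anger_score(text):.1f}  {text}")
```

Scoring on isolated words is cheap and transparent, but it cannot tell an idiom from real hostility, so results like these need human review before anyone draws conclusions about a sender.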


Page 63: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Related Top-Ranked Anger Message #1


Date: Sat, 21 Oct 2000 06:27:00 -0700 (PDT)
From: [email protected]
To: [email protected], [email protected]
Subject: Re: Power Plant Development Powerpoint Presentation

I don't know if you got this, so here 'tis. I was out of town and missed the presentation, although I've worked with Herman, Roger and David on these issues. I'm sure they would be glad to give the presentation again for others with a need to understand the complexities of off balance sheet treatment of power plants.

Kay

…

To: Suzanne Adams/HOU/ECT@ECT
cc: Karen E Jones/HOU/ECT@ECT, Bob Carter/HOU/ECT@ECT, Barton Clark/HOU/ECT@ECT, Dale Rasmussen/HOU/ECT@ECT, Ed B Hearn III/HOU/ECT@ECT, [email protected], [email protected], Stuart Zisman/HOU/ECT@ECT, Peggy Banczak/HOU/ECT@ECT, David Leboe/HOU/ECT@ECT
Subject: Re: Power Plant Development Powerpoint Presentation

Could you check with David and see if he could fax copies of the PowerPoint plates to me in Portland so we have a fighting chance of following along by tying in by phone? If that is possible, try to send them by Tuesday. Thank you very much.

Al Larsen


Page 64: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Related Top-Ranked Anger Message #2


Date: Fri, 1 Sep 2000 05:05:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Nigeria Tax Issues

Seyi,

James MacCallon has now left for Mexico. I am just returning from India and must step in where he left off. I see James has sent a few emails and faxes to you but I can find no reply from AA although there is a 14-July email from you indicating a reply would be sent by 16-July. Have you sent a reply and I have missed it?

Given that the commercial team is now at a critical point and needs to understand the Nigeria tax implications for the venture and bid, perhaps we should short-cut this Q&A process and have a conference call. I would suggest early AM Tuesday 20th, what works for you?


Page 65: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Related Top-Ranked Anger Message #8


From: [email protected]
To: Peter J Thompson <[email protected]>
Subject: RE: Weekly GE Conference Call

…

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, November 27, 2000 3:14 PM
To: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Cc: [email protected]
Subject: Weekly GE Conference Call

Dear Turbine Torture Club Members:

..

Assignment language - new language being tortured within the Enron ranks, to GE soon.
Limit of liability, indemnity, etc. - Enron owes GE a position on this.


Page 66: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis

Possible Next Steps

• Sentiment analysis using n-grams
• Influence calculations: Determine which users were most effective at spreading the various emotions
• Topic detection: Perform community detection on the combined user/phrase networks to determine groups of users and sets of related terms
• URL phrase analysis: Analyze the dynamics of URL sharing in the network as an independent indicator of topics and sentiment
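To make the first item concrete, here is a minimal sketch of n-gram-aware sentiment scoring. It matches the longest known phrase first, so a multiword entry such as "fighting chance" can override the weight of the single word "fighting". The phrase table, its weights, and the name `phrase_score` are assumptions invented for this example; a real system would learn or curate a far larger table.

```python
import re

# Hypothetical weights for illustration only. Multiword idioms are
# given 0.0 so they neutralize the angry-looking words inside them.
PHRASE_WEIGHTS = {
    "fighting chance": 0.0,
    "heart attack": 0.0,
    "trick or treat": 0.0,
    "fighting": 0.4,
    "attack": 0.3,
    "trick": 0.2,
}


def phrase_score(sentence: str, weights=PHRASE_WEIGHTS, max_n: int = 3) -> float:
    """Return the strongest weight found, preferring longer n-grams.

    At each position the longest matching n-gram (up to `max_n` words)
    is consumed, so words inside a matched phrase are never scored alone.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in weights:
                score = max(score, weights[gram])
                i += n  # skip past the whole matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next word
    return score


if __name__ == "__main__":
    for text in (
        "we have a fighting chance",
        "they kept fighting over the contract",
        "you tonight to trick or treat",
    ):
        print(f"{phrase_score(text):.1f}  {text}")
```

Longest-match-first is a deliberately simple design choice: it needs no training data and keeps every score traceable to one table entry, at the cost of maintaining the phrase list by hand.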


Page 67: Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis


What Are the Payoffs?

•  Quickly identify key influencers and trends in online networks with NLP, sentiment analysis, and graph analytics

•  Provide early warning of viral emails, videos, anomalous web events, or unusual network traffic

•  Enable enhanced business intelligence and proactive corporate compliance without having to specify normal vs. abnormal behavior in advance
