

Post on 16-May-2018


Demystifying Artificial Intelligence delivered by Data Science

Status Quo of Machine Learning, Cognitive and Advanced Analytics

The Three Legged Problem: Assets, People, and Process

Brian Ray
Cognitive Team Lead | Products & Solutions
Deloitte Consulting LLP
111 S Wacker Drive Chicago, IL 60606

@brianray or [email protected] or LinkedIn: http://www.linkedin.com/in/brianray

Switch and bait! I’m really here to promote my B&B…

OR my video series:

http://bit.ly/PythonLesson

Demystify

1. An incredibly brief intro to ML
2. The Status Quo of AI
3. The key factors in the Three Legged Problem:
   1. Process: project management aimed at being agile and practical
   2. People: Data Scientists, Engineers, and SMEs
   3. Assets: Data, Tools, and Platforms
4. Examples

ASSETS

PEOPLE

PROCESS

An incredibly brief intro to ML

In two slides

Identify existing Mail (unsupervised learning)

Explore Cluster

Feature modeling

Invoices

Personal

Junk

Mail features:

• Stock
• Size (Width x Height)
• Finish (matte, gloss)
• Thickness
• Says “open now”
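To make the unsupervised step concrete, here is a minimal k-means sketch over the mail features listed above. The slide does not name a clustering algorithm: k-means, the numeric encoding (gloss from 0 to 1, thickness in inches, “open now” as 0/1, Stock left out), and the sample rows are all assumptions for illustration. The algorithm only returns cluster ids; naming a cluster “Invoices”, “Personal”, or “Junk” is still a human step.

```python
import math

# Hypothetical numeric encoding of the mail features above:
# (width_in, height_in, gloss 0=matte..1=gloss, thickness_in, says_open_now 0/1).
MAIL = [
    (9.5, 4.1, 0.0, 0.010, 0),
    (9.5, 4.1, 0.0, 0.012, 0),
    (9.4, 4.0, 0.0, 0.010, 0),
    (7.25, 5.25, 0.0, 0.030, 0),
    (7.0, 5.0, 0.0, 0.035, 0),
    (7.25, 5.25, 0.5, 0.030, 0),
    (11.0, 6.0, 1.0, 0.020, 1),
    (11.0, 6.0, 1.0, 0.020, 1),
    (10.5, 6.0, 1.0, 0.020, 1),
]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, max_iter=50):
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from its nearest existing center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return [nearest(p, centers) for p in points], centers


labels, centers = kmeans(MAIL, k=3)
print(labels)  # → [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

The cluster ids are arbitrary; only the grouping matters.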

Identify a *new* piece of mail based on previous (supervised learning)

HISTORICAL

Invoices

Personal

Junk

Mail features:

• Stock: Cover
• Size (Width x Height): 8 x 9
• Finish (matte, gloss): semi-gloss
• Thickness: .02
• Says “open now”: No

Model

Is it junk? (165 examples, 20%)
• Yes and guessed yes: 100
• No and guessed no: 50
• No and guessed yes: 10
• Yes and guessed no: 5

Train

825 examples

Precision = .91 (100 / 110)
Recall = .95 (100 / 105)

F1 = 2 × (.91 × .95) / (.91 + .95) ≈ .93
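The precision, recall, and F1 figures follow directly from the four counts above. Here is a small sketch, assuming the “yes” class means junk. Computing from the unrounded ratios gives F1 = 2TP / (2TP + FP + FN) = 200 / 215 ≈ .930. Accuracy is included only as a comparison point.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive class ("junk")."""
    precision = tp / (tp + fp)  # of everything guessed junk, how much was junk
    recall = tp / (tp + fn)     # of all real junk, how much was caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts from the 165 held-out examples:
# yes/guessed yes = TP, no/guessed no = TN, no/guessed yes = FP, yes/guessed no = FN.
tp, tn, fp, fn = 100, 50, 10, 5
p, r, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
# → precision=0.909 recall=0.952 f1=0.930 accuracy=0.909
```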

Is it junk? YES (90% confident)
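Predicting the class of a *new* piece of mail from labelled history can be sketched with a k-nearest-neighbours classifier. This is a swapped-in illustration, not the model from the slide. The labelled history, the feature encoding (Stock is left out), and k = 3 are hypothetical. The vote share is only a crude stand-in for the confidence score shown above.

```python
import math
from collections import Counter

# Hypothetical labelled history: (width_in, height_in, gloss, thickness_in, says_open_now)
HISTORY = [
    ((9.5, 4.1, 0.0, 0.010, 0), "Invoices"),
    ((9.5, 4.1, 0.0, 0.012, 0), "Invoices"),
    ((9.4, 4.0, 0.0, 0.010, 0), "Invoices"),
    ((7.25, 5.25, 0.0, 0.030, 0), "Personal"),
    ((7.0, 5.0, 0.0, 0.035, 0), "Personal"),
    ((6.0, 4.5, 0.0, 0.030, 0), "Personal"),
    ((11.0, 6.0, 1.0, 0.020, 1), "Junk"),
    ((8.5, 11.0, 0.5, 0.020, 0), "Junk"),
    ((8.0, 10.0, 1.0, 0.020, 1), "Junk"),
    ((9.0, 12.0, 0.5, 0.025, 0), "Junk"),
]


def knn_predict(history, item, k=3):
    """Majority label among the k nearest historical examples, plus its vote share."""
    neighbours = sorted(history, key=lambda row: math.dist(row[0], item))[:k]
    # Ties go to the label encountered first among the neighbours.
    label, votes = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
    return label, votes / k


# The new piece from the slide: 8 x 9, semi-gloss (0.5), .02 thick, no "open now".
new_mail = (8.0, 9.0, 0.5, 0.02, 0)
print(knn_predict(HISTORY, new_mail))  # → ('Junk', 1.0)
```

Features are left unscaled here, so width and height dominate the distance. The data-processing step later in the deck (centering and scaling) would normally come first.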

4-tiered definition of "Analytics": ranging from "Traditional" to "Cognitive/AI"

A. Traditional Analytics / Statistical Modeling – datasets with homoscedasticity (variability of a variable is equal across the range of values[2]) where the distribution of variables is known.
B. Advanced Analytics – mostly done by machine learning via supervised and unsupervised learning; may be deterministic or probabilistic.
C. Predictive Analytics – same as ‘B’; however, the model is *wired* up to do real-time prediction. May also include retraining.
D. Cognitive Analytics (AI) – a new brand of Data Science analytics in practice that uses two or more predictive models (like those from ‘C’) to mimic human thinking, helping add insights and solve problems in business or daily life.

https://www.linkedin.com/pulse/new-4-tiered-definition-analytics-ranging-from-traditional-brian-ray

The Status Quo of AI

Spoiler: It’s here! Finally (AI winters: 1974–80 and 1987–93)

[Chart: Google Search interest worldwide, Nov 2014 to Nov 2017, for machine learning, IOT, blockchain, and big data (from @brianray)]

[Figures: Gartner Hype Cycle for Emerging Technologies, 2006 and 2017]

The evolution to AI

[Timeline: 1994 to 2028, in two-year steps]

In practice…

Cognitive Insights

Used for predictive decision making to answer probabilistic questions, ranging from finance planning and strategy to customer trends and interactions

“Augments Human Intelligence”

Process Automation

Rules-based, deterministic processes, such as invoice processing, leave of absence processing, etc.

“Mimics Human Actions”

Cognitive Engagement

“Mimics Human behavior and Intelligence”

Systems that completely replicate human behavior, emotions and interactions

Cognitive Automation

Software used to capture and interpret existing applications for the purpose of automating transaction processing, data manipulation, and communication across multiple IT systems

- Deloitte, “The Robots are Coming”

• Screen scraping data collection

• Rules-based business process management

• Tactical toolset to automate repetitive tasks

• Cheaper and faster step towards process efficiency, compliance improvement and error reduction

Cognitive Insights

Automate non-routine tasks involving intuition, judgment, creativity, persuasion, or problem solving

- Deloitte, “Automate This”

• Data input and output in any format

• Pattern recognition within unstructured data

• Replication of judgment based tasks through natural language processing

• Basic learning capabilities for continuous improvement in quality and speed, applying machine learning algorithms

Cognitive Engagement

“The theory and development of computer systems able to perform tasks that normally require human intelligence.”

- Deloitte, DU Press “Cognitive Technologies”

• Natural language recognition and processing

• Dealing with unstructured super data sets

• Hypothesis based predictive analysis

• Self-learning rules continuously rewritten to improve performance

Cognitive Automation

Comprehension of a sentence or multiple sentences in a document, such as an email or a commercial contract

“Comprehends Human Intelligence”

Deloitte is equipped with a wide spectrum of automation and cognitive technologies to deliver value through the Cognitive Advantage framework

Cognitive Advantage Capability Spectrum

PROCESS

It’s a three-legged stool

ASSETS

PEOPLE

Business Issues

Fraud

Customer Retention

Profitability

Reliability

Risk

Productivity

Customer Acquisition

Real time Fraud Detection with

Predict Bank defection

Price is Right?

Part Expiration

Predict High Risk Insurance

Shop floor optimizer

Assess Campaign Success

Understanding Regulation Reform Tool

ASSETS

PEOPLE

PROCESS

Data Scientists

Machine Learning

IoT

Blockchain

Deep Learning

Tensorflow

Traditional statistics

Engineers

Data Lake

SaaS Platforms

Cloud Computing

Automated Machine Learning

Streaming Data

Unlabeled and Unstructured Data

Subject Matter Experts

Business Analysts

Design UI/UX

TERMS

Agile

EDA

Models

Taxonomies

NLP

Assets

Data, Tools, Services, and Platforms

AI and ML

Assets

• Explore which data sets are available and where additional context can be created by new sources

Identify what data is important to your business owners and users

Initial data set

Outside data set owned by another business unit

Outside (free) data set

Outside data set that can be purchased

• Existing easily accessible datasets

• Accessible data sets owned by other business units or ones where there is an appetite to acquire

• Not easily accessible sets:

• Data that is not machine readable

• Unlabeled data

• Poor quality data

Assets

Information Sensing & Recognition

Knowledge Learning & Representation

Reasoning & Decision Making

Natural & Visual Interaction

HWR, IR, VR, NLP, ML, IRVL, SCE, TAE, PIE, DRE, NLG, VDA

COGNITIVE COMPUTING PLATFORM

HYBRID REFERENCE ARCHITECTURE

Integration, Workflow, Web Server, App Server

Database, Big Data, Cloud, Events

APIs / Services, Graphical UI

Analytics, Reporting

Legend

• CI: Cognitive Insights
• HWR: Handwriting Recognition
• IR: Image Recognition
• VR: Voice Recognition
• NLP: Natural Language Processing
• ML: Machine Learning
• IRVL: Information Retrieval
• SCE: Semantic Computing Engine
• TAE: Text Analytics Engine
• PIE: Probabilistic Inference Engine
• DRE: Deterministic Rules Engine
• NLG: Natural Language Generation
• VDA: Virtual Decision Assistant

INFORMATION AND DATA SOURCES

Data Stores: planning, procurement, and manufacturing KPIs

Social / Public Data: news and economic reports, Facebook

Text & Images: supplier catalogues, online pricing data

Paper / Fax / Prints: legal contracts

Customer Data: sales data, customer segmentation

COGNITIVELY AUGMENTED APPLICATIONS / USE CASES

Input Output

What does a proposed Cognitive Platform look like?

Capacity Demand: future demand prediction

Optimization Model: determine the best path for the build plan and the optimal service

Customer Service: automated customer interface for customer service requests

Cognitive Platform: AI platform to address all 17 capabilities and more


Obligatory “nascar” slide: Ecosystem Partners


Platform Partners Tools

Assets

People

Unicorn Hunting

People

Put together a team that will bring the right skills to each phase

Blend teams with business, technology + science talent

People

Why is making machine learning real at scale still somewhat elusive?

• Technical, business, and organizational challenges

• It feels risky and daunting. I don’t know how to begin

• My business stakeholders do not buy into it

• My Data Scientists are not able to communicate the value

• We don’t have enough test data that can be relied upon

• My techies don’t have the right skills for this

• How do I develop a business case?

• …..


Process

Agile, EDA, Data Science Modeling

How different are we? Business Process VS Engineering VS Data Science.

PROCESS

Business: Waterfall Process

Engineering: Iterative Process

Data Science: Recursive Process

Predictive Modeling

Feature Selection

Model Selection & Assessment

Model Ensembling

Error Analysis

Feature Engineering

Data Processing

Exploratory Data Analysis

Data Processing
• Imputing missing values
• Document conversion and decomposition
• Centering and scaling
• Transformations to resolve skewness
• Transformations to resolve outliers
• Dimensionality reduction
• Assessing assumptions
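A few of the data-processing steps above can be sketched in plain Python: mean imputation, a log transform for right skew, then centering and scaling. The column values are made up. Mean imputation and `log1p` are just one common choice among many for each step.

```python
import math
import statistics


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """log1p (log(1 + x)) to reduce right skew in non-negative data."""
    return [math.log1p(v) for v in values]


def center_and_scale(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical column with a missing value and a long right tail.
thickness = [0.01, 0.02, None, 0.02, 0.50]
filled = impute_mean(thickness)
scaled = center_and_scale(log_transform(filled))
print(filled)
print(scaled)
```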

Exploratory Data Analysis
• Sorting / aggregating data for discovering meaningful relationships
• Suggesting and verifying hypotheses
• Supporting model selection
• Providing a basis for further data collection
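For EDA, a group-and-sort pass is often the first look at the data. Here is a minimal sketch using hypothetical labelled mail records. It aggregates per category, then sorts by the share of pieces that say “open now”.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical labelled mail records: (category, thickness_in, says_open_now)
records = [
    ("Invoices", 0.010, 0), ("Invoices", 0.012, 0),
    ("Personal", 0.030, 0), ("Personal", 0.035, 0),
    ("Junk", 0.020, 1), ("Junk", 0.020, 1), ("Junk", 0.025, 0),
]

# Group rows by category.
groups = defaultdict(list)
for category, thickness, open_now in records:
    groups[category].append((thickness, open_now))

# Aggregate per category, then sort by "open now" share, highest first.
summary = sorted(
    (
        (cat, len(rows), fmean(t for t, _ in rows), fmean(o for _, o in rows))
        for cat, rows in groups.items()
    ),
    key=lambda row: row[3],
    reverse=True,
)
for cat, n, mean_thickness, open_now_share in summary:
    print(f"{cat:<9} n={n} mean_thickness={mean_thickness:.4f} open_now_share={open_now_share:.2f}")
```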

Feature Engineering
• Categorical encoding
• Adding (polynomial) terms
• Word embedding
• TF-IDF
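Two of the feature-engineering items, categorical encoding and TF-IDF, are short enough to sketch directly. This uses one-hot encoding and the unsmoothed textbook TF-IDF (tf = count / document length, idf = log(N / df)). Libraries such as scikit-learn default to smoothed and normalized variants, so their numbers will differ. The sample strings are hypothetical.

```python
import math
from collections import Counter


def one_hot(values):
    """Categorical encoding: one 0/1 column per distinct category (sorted)."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def tf_idf(docs):
    """Plain TF-IDF per document. Assumes every document is non-empty."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


cats, encoded = one_hot(["matte", "gloss", "semi-gloss", "matte"])
print(cats, encoded)
# → ['gloss', 'matte', 'semi-gloss'] [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

docs = ["open now limited offer", "invoice due now", "happy birthday"]
print(tf_idf(docs)[0])  # "now" appears in two documents, so it scores lower
```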

Feature Selection
• Wrapper methods (AIC, backward / forward / stepwise selection, genetic algorithms)
• Filter methods (Chi2, Bonferroni correction)
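A filter method pairs naturally with the Bonferroni correction. First score each feature independently, then tighten the significance threshold by the number of features tested. This is a minimal sketch for binary features against a binary label. It uses Pearson's chi-squared test on a 2×2 table without Yates' continuity correction. With 1 degree of freedom, the p-value is `erfc(sqrt(stat / 2))`. The feature names and data are hypothetical.

```python
import math


def chi2_2x2(feature, label):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for two binary (0/1) lists, using 1 degree of freedom."""
    n = len(feature)
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        observed[f][y] += 1
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        return 0.0, 1.0  # constant feature or label: no evidence of association
    stat = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(2)
    )
    return stat, math.erfc(math.sqrt(stat / 2))  # chi2 (1 df) survival function


def select_features(features, label, alpha=0.05):
    """Keep features whose p-value beats a Bonferroni-corrected threshold."""
    threshold = alpha / len(features)  # Bonferroni: divide alpha by number of tests
    return [name for name, column in features.items()
            if chi2_2x2(column, label)[1] < threshold]


label = [1] * 10 + [0] * 10                # 1 = junk
features = {
    "says_open_now": [1] * 10 + [0] * 10,  # tracks the label exactly
    "glossy": [1, 0] * 10,                 # unrelated to the label
}
print(select_features(features, label))    # → ['says_open_now']
```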

Predictive Modeling
• Linear models
• Basis expansions and regularization
• Additive models and tree-based models
• Neural networks
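Regularization can be illustrated with the smallest possible case: ridge regression on a single feature, where the L2 penalty shrinks the slope toward zero. This closed-form sketch leaves the intercept unpenalized. It is one illustrative instance of “basis expansions and regularization”, not a method taken from the slides.

```python
def fit_ridge_1d(xs, ys, lam=0.0):
    """Fit y ≈ w*x + b minimizing sum((y - w*x - b)**2) + lam * w**2."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Work with centered data so the unpenalized intercept drops out.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    b = y_mean - w * x_mean
    return w, b


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(fit_ridge_1d(xs, ys, lam=0.0))  # → (2.0, 0.0)
print(fit_ridge_1d(xs, ys, lam=5.0))  # → (1.0, 2.5)
```

A basis expansion would pass transformed inputs (for example `x**2`) through the same kind of fit.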

Model Selection & Assessment
• Model selection
• Model assessment
• Resampling techniques (k-fold cross validation, bootstrap)
• Bayesian approach and BIC
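k-fold cross validation, one of the resampling techniques listed above, can be sketched without any libraries. The `fit` argument is assumed to return a prediction function. The mean-baseline model and the synthetic data are placeholders.

```python
import random
from statistics import fmean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and split into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Mean validation score over k folds; fit(xs, ys) returns a predict function."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(xs[i]) for i in fold]
        scores.append(score([ys[i] for i in fold], preds))
    return fmean(scores)


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = fmean(ys)
    return lambda x: m


def mse(actual, predicted):
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


xs = list(range(20))
ys = [2 * x for x in xs]
print(cross_validate(xs, ys, fit_mean, mse, k=5))  # baseline error to beat
```

Model selection then means keeping the candidate `fit` with the lowest cross-validated error.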

Error Analysis
• Researching error patterns
• Fixing high variance problems
• Fixing high bias problems
• Comparison with state-of-the-art models where available
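Choosing between “fixing high variance” and “fixing high bias” usually starts with three numbers: training error, validation error, and a target such as a state-of-the-art or human-level baseline. The helper below is a rough heuristic sketch. The 0.02 tolerance and the suggested remedies are assumptions, not fixed rules.

```python
def diagnose(train_error, val_error, target_error, gap_tolerance=0.02):
    """Rough bias/variance read-out from error rates in the same units.

    target_error: an acceptable error, e.g. a state-of-the-art baseline.
    gap_tolerance: an arbitrary illustrative cutoff (hypothetical).
    """
    findings = []
    if train_error - target_error > gap_tolerance:
        findings.append("high bias: try a richer model or better features")
    if val_error - train_error > gap_tolerance:
        findings.append("high variance: try more data, regularization, or a simpler model")
    return findings or ["no obvious bias/variance problem"]


print(diagnose(train_error=0.15, val_error=0.16, target_error=0.05))  # underfitting
print(diagnose(train_error=0.02, val_error=0.12, target_error=0.05))  # overfitting
```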

Model Ensembling
• Model inference, averaging and voting
• Boosting
• Bagging
• Stacking
• Ensemble pruning
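Averaging, voting, and the bootstrap resampling behind bagging each take only a few lines. This sketch shows only those three pieces. Boosting, stacking, and ensemble pruning need a training loop and are left out. The labels and probabilities are hypothetical.

```python
import random
from collections import Counter
from statistics import fmean


def majority_vote(predictions):
    """Hard voting: most common label across models (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def average(probabilities):
    """Soft voting / averaging: mean of the models' predicted probabilities."""
    return fmean(probabilities)


def bootstrap_sample(rows, rng):
    """Bagging: draw a training set with replacement, same size as the data."""
    return [rng.choice(rows) for _ in rows]


print(majority_vote(["junk", "personal", "junk"]))  # → junk
print(average([0.9, 0.7, 0.8]))

rng = random.Random(42)
bags = [bootstrap_sample(list(range(10)), rng) for _ in range(3)]  # one per model
print(bags)
```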


Gino Tesei

PROCESS

Approach

1. Data Scientists interactively build models

2. Wrap Models to be Packaged using Engineering

3. Deployment Integration into Production

Allow scientists to use existing tools for developing predictive models on client data

Package models into containers to allow deployment

Enable real-time prediction and integration with client systems and workflows

Client Systems, Data

Results
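The three-step approach above (scientists build, engineering wraps, deployment integrates) can be sketched with two pieces. The first is a trained model saved as an artifact. The second is a thin JSON-in/JSON-out wrapper that a web server inside a container could call. `trained_params`, `ModelService`, and the logistic scoring rule are hypothetical. Only load `pickle` artifacts you trust.

```python
import json
import math
import pickle

# Step 1 (data scientist): a hypothetical trained model reduced to its parameters.
trained_params = {"weights": {"says_open_now": 2.0, "gloss": 1.5}, "bias": -1.0}
artifact = pickle.dumps(trained_params)  # the bytes that get packaged


class ModelService:
    """Step 2 (engineering): load the artifact once, answer JSON requests.

    Step 3 would expose handle() through a web server in a container.
    """

    def __init__(self, artifact_bytes):
        self.params = pickle.loads(artifact_bytes)  # only unpickle trusted artifacts

    def predict_proba(self, features):
        # Logistic score: sigmoid(bias + sum(weight * feature)); missing features count as 0.
        z = self.params["bias"] + sum(
            w * features.get(name, 0.0) for name, w in self.params["weights"].items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def handle(self, request_json):
        p = self.predict_proba(json.loads(request_json))
        return json.dumps({"junk": p >= 0.5, "confidence": round(p, 3)})


service = ModelService(artifact)
print(service.handle('{"says_open_now": 1, "gloss": 1}'))
# → {"junk": true, "confidence": 0.924}
```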

Implementation Example: Complaint System

1. Text Classification Machine Learning Models

2. Deloitte Open-text Classification Engine (DOTCE)

3. 133 models in a resource-limited environment, each run over 370,000 narratives: nearly 50,000,000 predictions in less than 2.5 hours, with accuracy between 70% and 90%

CRM Complaints

Results

NLP

Parts of speech

Term Frequency

Machine Learning

Rules

Random Forest

Elasticsearch

PCA

Each document is 1,000 words long. Reading them would have taken humans about 31,000 hours (370,000 narratives × 1,000 words ÷ 200 wpm ≈ 30,800 hours; average readers reach only around 200 wpm, with a typical comprehension of 60%).
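The scale figures for the complaint system can be checked with a few lines of arithmetic. This assumes one prediction per model per narrative and a single human reading pass per narrative. Because the runtime was under 2.5 hours, the throughput below is a lower bound.

```python
MODELS = 133
NARRATIVES = 370_000
WORDS_PER_DOC = 1_000
READING_WPM = 200
RUNTIME_HOURS = 2.5  # upper bound on runtime from the slide

predictions = MODELS * NARRATIVES                             # one per model per narrative
per_second = predictions / (RUNTIME_HOURS * 3600)             # at least this many per second
human_hours = NARRATIVES * WORDS_PER_DOC / READING_WPM / 60   # one reading pass

print(f"{predictions:,} predictions, at least {per_second:,.0f}/s; "
      f"humans would need {human_hours:,.0f} hours to read every narrative once")
# → 49,210,000 predictions, at least 5,468/s; humans would need 30,833 hours to read every narrative once
```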

Thank You Q&A
