leadership session: machine

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Leadership session: Machine learning

Swami Sivasubramanian

AIM218-L

VP, Amazon AI

Amazon Web Services

Machine learning is happening in companies of every size and industry

Tens of thousands of customers have chosen AWS for their ML workloads | More than twice as many customers use ML on AWS as on any other cloud provider

But the journey has only just begun

Photography: then and now

Machine learning in the hands of all developers

Common challenges

• Skills gap: not enough people can build ML models

• ML model building is a time-consuming and complex process

• Finding the right business use cases that could benefit from ML

The AWS ML Stack

Broadest and most complete set of machine learning capabilities

AI SERVICES
Use cases: Vision | Speech | Text | Search (new) | Chatbots | Personalization | Forecasting | Fraud (new) | Development (new) | Contact centers
• Amazon Rekognition
• Amazon Polly
• Amazon Transcribe (+Medical)
• Amazon Comprehend (+Medical)
• Amazon Translate
• Amazon Textract
• Amazon Kendra
• Amazon Lex
• Amazon Personalize
• Amazon Forecast
• Amazon Fraud Detector
• Amazon CodeGuru
• Amazon Connect with Contact Lens

ML SERVICES
• Amazon SageMaker
• SageMaker Studio IDE (new)
• Amazon SageMaker Ground Truth (data labelling)
• ML Marketplace
• Neo
• Built-in algorithms
• SageMaker Notebooks (new)
• SageMaker Experiments (new)
• Model tuning
• SageMaker Autopilot (new)
• Model hosting
• SageMaker Model Monitor (new)

ML FRAMEWORKS & INFRASTRUCTURE
• Deep Learning AMIs & Containers
• GPUs & CPUs
• Elastic Inference
• Inferentia
• FPGA

Easier to build | Easier to scale | Easier to apply

Easier to Build

Focus: ML frameworks & infrastructure (Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA)

Java is the language of choice for many customers
• #1 language since 2004 [1]
• >67% adoption in enterprise [2]

[1] TIOBE Programming Community Index, https://www.tiobe.com/tiobe-index/
[2] Stack Overflow Developer Survey 2019, https://insights.stackoverflow.com/survey/2019

Deep Java Library (DJL) gives Java users an end-to-end solution for ML development
• Simple Java API
• Streamlined workflow and tech stack
• Models for prototyping
• Simplifies and accelerates development

Amazon forecasting team
• Data science team: develops forecasting models in Python (“How many Instant Pot accessories will sell?”)
• ML infrastructure team: deploys the models to predict in Java

The team spends weeks refactoring ML models developed in Python; using DJL reduced development time by 30%.

SQL for ML

Source: Stack Overflow Developer Survey

How do you incorporate ML in a database-driven app?

Adding ML to an application is challenging

Typical steps require ML expertise & manual work:
1. Select and train the ML model
2. Write application code to read data from the database
3. Format the data for the ML model
4. Call an ML service to run the ML model on the formatted data
5. Format the output for the application
6. Load the results into the application
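To make the six steps concrete, here is a minimal Java sketch of the glue code an application typically carries for steps 2 to 6. It is an illustration only: the in-memory map stands in for a database table, and `SentimentModel` is a hypothetical interface standing in for a model already trained and hosted behind an ML service (step 1); neither is an AWS API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a trained model behind an ML service (step 1 happens elsewhere).
interface SentimentModel {
    List<String> predict(List<String> inputs);
}

class ManualMlPipeline {
    // Steps 2 to 6, written by hand in application code.
    static Map<Integer, String> run(Map<Integer, String> reviewsTable, SentimentModel model) {
        // Step 2: read data from the "database" (an in-memory map of id -> review text).
        List<Integer> ids = new ArrayList<>(reviewsTable.keySet());

        // Step 3: format the data for the model (trim and lower-case each review).
        List<String> inputs = new ArrayList<>();
        for (Integer id : ids) {
            inputs.add(reviewsTable.get(id).trim().toLowerCase());
        }

        // Step 4: call the ML service on the formatted data.
        List<String> labels = model.predict(inputs);
        if (labels.size() != ids.size()) {
            throw new IllegalStateException("model must return one label per input");
        }

        // Step 5: format the output for the application (re-attach ids, normalize labels).
        Map<Integer, String> results = new LinkedHashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            results.put(ids.get(i), labels.get(i).toUpperCase());
        }

        // Step 6: load the results into the application (here, return them to the caller).
        return results;
    }
}
```

Every line of this glue is application code someone has to write, test, and maintain, which is the cost the SQL-based approach on the next slides aims to remove.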

ML for database developers and BI analysts

From the six steps above to three steps:
1. (Optional) Select and configure the ML model with Autopilot
2. Run a SQL query to invoke the ML service
3. Use the results in the application

Use familiar SQL for training & prediction

From SQL to ML-driven insights

    SELECT * FROM product_reviews
    WHERE aws_comprehend.detect_sentiment(review_text, 'en') = 'NEGATIVE';

    CREATE TRIGGER insert_check
    BEFORE INSERT ON sales
    FOR EACH ROW
    BEGIN
      IF is_transaction_fraudulent(NEW.column1, NEW.column2, NEW.column3 …) = 'True' THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Transaction flagged as fraudulent';
      END IF;
    END;

    SELECT * FROM customers
    ORDER BY predicted_future_spend(column1, column2, ...);

Under the hood: optimized Aurora ML query processing

ID    User feedback
1     Great product!
      Good job
      Mediocre
      I didn’t like it
      Loved it
      Terrible service
…
50    Great service

    SELECT * FROM product_reviews
    WHERE aws_comprehend.detect_sentiment(review_text, 'en') = 'POSITIVE';
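The slide does not spell out what the optimization is. One plausible ingredient, assumed here rather than taken from the slide, is batching: instead of one ML call per row, rows are grouped so that the 50 reviews above need only a handful of round trips. A minimal Java sketch; `classifyAll`, `batchSize`, and the model function are hypothetical names, not an Aurora or Comprehend API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchedInference {
    // Sends rows to the model in chunks of at most batchSize, one call per chunk,
    // and returns the labels in the original row order.
    static List<String> classifyAll(List<String> rows, int batchSize,
                                    Function<List<String>, List<String>> model) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> labels = new ArrayList<>(rows.size());
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            List<String> batch = rows.subList(start, end);
            List<String> batchLabels = model.apply(batch);  // one round trip per batch
            if (batchLabels.size() != batch.size()) {
                throw new IllegalStateException("model must return one label per row");
            }
            labels.addAll(batchLabels);
        }
        return labels;
    }
}
```

With a `batchSize` of 20, the 50 rows in the table take 3 model calls (20 + 20 + 10) instead of 50.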

Easier to Scale

Focus: ML services (Amazon SageMaker: Ground Truth data labelling, ML Marketplace, Neo, built-in algorithms, Notebooks, Experiments, model tuning, Autopilot, model hosting, Model Monitor, and the SageMaker Studio IDE)

Successful ML requires complex, hard-to-discover combinations of algorithms, data, and parameters:
• Combinatorial
• Largely exploratory & iterative
• Requires broad and complete knowledge of the ML domain

The result is a time-consuming, error-prone process, even for ML experts.

Customers faced a false choice

DIY model training

• Manual effort by experts

• Fully controlled and auditable

• Experts make tradeoff decisions

• Gets better over time with experience

Automated ML

• Accessible to experts and non-experts alike

• No visibility into the training process

• Can’t make tradeoffs between accuracy and other characteristics

Customers now have a better choice: Amazon SageMaker Autopilot, integrated with SageMaker Studio

Automated machine learning with Amazon SageMaker Autopilot
• Specify the prediction target
• Automated feature engineering
• Regression & classification
• Automated algorithm selection & hyperparameter optimization (HPO)
• Commented notebook describing actions

Under the hood


#  Model                            Accuracy  Latency  Model size
1  churn-xgboost-1756-013-33398f0   95%       450 ms   9.1 MB
2  churn-xgboost-1756-014-53facc2   93%       200 ms   4.8 MB
3  churn-xgboost-1756-015-58bc692   92%       200 ms   4.3 MB
4  churn-linear-1756-016-db54598    91%        50 ms   1.3 MB
5  churn-xgboost-1756-017-af8d756   91%       190 ms   4.2 MB

Model training involves tradeoffs: the most accurate candidate is also the slowest and the largest.
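The tradeoff in the table can be made explicit as a selection rule: pick the most accurate candidate that fits a latency budget. A minimal Java sketch using the five rows above; the `Candidate` class and `pickBest` method are illustrative names, not part of the Autopilot API, and the tie-break (lower latency wins at equal accuracy) is an assumed policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class Candidate {
    final String name;
    final double accuracy;   // fraction, e.g. 0.95 for 95%
    final int latencyMs;
    final double sizeMb;

    Candidate(String name, double accuracy, int latencyMs, double sizeMb) {
        this.name = name;
        this.accuracy = accuracy;
        this.latencyMs = latencyMs;
        this.sizeMb = sizeMb;
    }
}

class Leaderboard {
    // The five candidates from the table above.
    static final List<Candidate> CANDIDATES = List.of(
        new Candidate("churn-xgboost-1756-013-33398f0", 0.95, 450, 9.1),
        new Candidate("churn-xgboost-1756-014-53facc2", 0.93, 200, 4.8),
        new Candidate("churn-xgboost-1756-015-58bc692", 0.92, 200, 4.3),
        new Candidate("churn-linear-1756-016-db54598", 0.91, 50, 1.3),
        new Candidate("churn-xgboost-1756-017-af8d756", 0.91, 190, 4.2));

    // Most accurate candidate within the latency budget; equal accuracy -> lower latency wins.
    static Optional<Candidate> pickBest(List<Candidate> candidates, int maxLatencyMs) {
        Comparator<Candidate> byAccuracy = Comparator.comparingDouble((Candidate c) -> c.accuracy);
        Comparator<Candidate> preferFaster = Comparator.comparingInt((Candidate c) -> c.latencyMs).reversed();
        return candidates.stream()
            .filter(c -> c.latencyMs <= maxLatencyMs)
            .max(byAccuracy.thenComparing(preferFaster));
    }
}
```

With a 200 ms budget this returns churn-xgboost-1756-014-53facc2 (93%); with a 100 ms budget only the linear model qualifies; with no practical limit the 95% model wins despite its 450 ms latency.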

With Autopilot, easily explore many hard-to-discover combinations of algorithms, data, and parameters:
• Many combinations explored
• Data-driven exploration & iteration
• Notebook based on analysis and best practices

An easy, transparent process with accurate results, even for ML novices

Orchestration: moving to production

• To go to production, ML workloads need repeatable workflows

• This orchestration spans traditional boundaries between data science and IT

• Data scientists desire a familiar “language” to build and orchestrate workflows

Multiple orchestration frameworks…

But orchestration isn’t enough.

Native integration of Amazon SageMaker with Kubeflow

• Train, tune, and deploy models from popular orchestration frameworks
• Fully managed infrastructure in Amazon SageMaker; no resource management or optimization required
• Improve data scientist productivity while DevOps orchestrates and automates

Metaflow: a data-scientist-friendly Python library for ML on AWS
• Open-source project by Netflix
• AWS services shown: AWS Batch, Amazon SageMaker, Amazon S3
• Now available at metaflow.org

Easier to Apply

Focus: AI services (Amazon Rekognition, Polly, Transcribe +Medical, Comprehend +Medical, Translate, Textract, Kendra, Lex, Personalize, Forecast, Fraud Detector, CodeGuru, and Amazon Connect with Contact Lens)

• 20%: employees spend 20% of their time looking for information. (McKinsey)
• 44%: 44% of the time, they cannot find the information they need to do their job. (IDC)

Key Challenges

Low accuracy
• 80% of data is unstructured
• Keyword-based search engines

Complexity
• Scattered data silos
• Stale search results
• Difficult to set up

Amazon Kendra: rethinking enterprise search
• Easy to find what you are looking for
• Simple and quick to set up

Capabilities: native connectors, natural language queries, NLU and ML core, simple API and console experiences, code samples, continuous improvement, domain expertise

Intelligent Digital Employee Experience Platform

Transforming the employee experience using AI and conversational interfaces.

MOBILE | SOCIAL | CLOUD | ANALYTICS

“Enterprises are paying their employees to waste time and frustrating them in the process.”

The Complexity of Today’s Workplace

Digitizing processes and transactions has led to enormous complexity.
• Consumer-like experiences: engage employees with the modern, consumer-like experiences they expect, anytime and anywhere.
• Single pane of glass: consolidates notifications, tasks, and messaging from disparate systems into a single, modern interface.
• Elevates communications outside email: shrink email volume with communications outside of email, ensuring employees never miss updates.
• Instant access to information & answers: instant access to information and automation of tasks, all through a natural language chatbot.

Workgrid Assistant: an employee experience platform
• Knowledge: corporate answers, general knowledge, knowledge extraction, knowledge connectors
• Tasks: action-based conversations, 3rd-party integrations
• Intelligence: memory, context, personalization
• Entertainment: small talk, games, quizzes, witty banter

A ‘good’ natural language assistant

AI in the workplace draws on many techniques:
• Learning approaches: supervised machine learning, unsupervised learning, reinforcement learning, deep learning
• Natural language processing: natural language understanding, text generation, question answering, context extraction, classification, machine translation, sentiment analysis, named entity recognition
• Speech: text to speech, transcription, automatic speech recognition, emotion recognition
• Machine vision: image recognition

Search is Changing…
• ML-based search: advances in NLP can help understand user intent.
• Goodbye keywords, hello queries: search is changing from keywords to natural language queries (who, what, why, how, when).
• Voice is on the rise: by 2020, 50% of all searches will be conducted via voice using natural language queries.

Amazon Kendra goes to the gym

Answers not links

Answers + Action = Assistance

Building and running high-quality software today

Write → Build + Test → Deploy → Measure → Improve

Difficult in practice
• Review
  1. Expert reviewers are hard to find
  2. Reviews may not catch all bugs
• Profiling
  - Application characteristics constantly evolve, so the runtime patterns in the application profile change
  - It is hard to find the “most expensive lines of code”: understanding runtime performance and availability characteristics requires deep expertise

How do you rethink software development processes with ML?

Automate code reviews and identify your most expensive lines of code with Amazon CodeGuru
• Automated code reviews with intelligent recommendations
• Seamlessly integrates with the pull request workflow, or runs anytime on demand
• Detects and optimizes the most expensive lines of code
• Immediately identifies application inefficiencies running in production

Use ML to build and run high-performing software
• Write + Review: built-in code reviews with intelligent recommendations
• Build + Test: detect and optimize the most expensive lines of code before production
• Deploy
• Measure: easily identify application inefficiencies in the production environment
• Improve
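How a profiler surfaces the “most expensive lines of code” is not described on these slides. A common technique, assumed here, is sampling: periodically record which code location is executing, then rank locations by how often they were seen. A minimal Java sketch over hypothetical pre-collected samples written as `Class.method:line` strings; this is not CodeGuru Profiler's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SampleProfile {
    // Ranks code locations by sample count (most frequent first) and returns the top n.
    static List<Map.Entry<String, Integer>> topLines(List<String> samples, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String location : samples) {
            counts.merge(location, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Sort by count descending, then by location name so ties are deterministic.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        int limit = Math.max(0, Math.min(n, ranked.size()));
        return new ArrayList<>(ranked.subList(0, limit));
    }

    // Share of all samples attributed to one location, as a percentage.
    static double percentOf(List<String> samples, String location) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        long hits = samples.stream().filter(location::equals).count();
        return 100.0 * hits / samples.size();
    }
}
```

Because each sample is a snapshot of where time is being spent, a location that appears in half the samples is, to a first approximation, responsible for about half of the sampled CPU time.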

CodeGuru Reviewer workflow (code repository and CodeGuru Reviewer)
1. Repository association (repo admin)
2. Pull request
3. Recommendation
4. Developer feedback

Amazon CodeGuru Reviewer – How it Works

1. Input: source code. The customer opens a pull request, for example:

    try (GZip gzip =
            GZIPInputStream.create(url.openStream())) {
        use(gzip);
    } catch (Exception e) {
        handle();
    }

2. Feature extraction: extract semantic features/patterns. For the snippet above, the extracted graph contains the nodes ENTRY, stream = url.openStream(), gzip = GZIPInputStream.create(stream), use(gzip), gzip.close(), handle(), throw Exception, and EXIT.

3. Machine learning: ML algorithms plus program analysis, drawing on a code corpus, identify code defects.

4. Output: recommendations. Customers see recommendations as pull request comments.
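The slide does not say which defect the analysis reports for this snippet. One well-known issue with this shape of code, assumed here for illustration: try-with-resources closes only resources whose declarations completed, and `url.openStream()` is a constructor argument rather than a declared resource, so if the GZIP wrapper's constructor throws (for example, a `ZipException` on a non-GZIP header), nothing closes the raw stream unless the wrapper does so itself. A minimal Java sketch using the real `java.util.zip.GZIPInputStream` constructor (the slide's `GZip` type and `GZIPInputStream.create` look like simplified pseudocode); `SafeGzip`, `StreamConsumer`, and `InputStreamSupplier` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class SafeGzip {
    // Hypothetical stand-in for whatever use(gzip) does on the slide.
    interface StreamConsumer {
        void accept(InputStream in) throws IOException;
    }

    // Hypothetical source of the raw stream, e.g. url::openStream.
    interface InputStreamSupplier {
        InputStream open() throws IOException;
    }

    // Declaring the raw stream as its own resource guarantees it is closed even if
    // the GZIPInputStream constructor throws while reading the header.
    static void readGzip(InputStreamSupplier source, StreamConsumer use) throws IOException {
        try (InputStream raw = source.open();
             GZIPInputStream gzip = new GZIPInputStream(raw)) {
            use.accept(gzip);
        }
    }

    // Convenience helper: read a stream fully and decode it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        in.transferTo(buf);
        return buf.toString("UTF-8");
    }
}
```

With a URL, the call would be `SafeGzip.readGzip(url::openStream, gzip -> use(gzip))`, where `use` is the slide's own placeholder. Resources are closed in reverse order, so the GZIP stream closes before the raw stream.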

CodeGuru Reviewer – Concurrency

Code:

    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        if (repo.containsKey(ip)) {
            return repo.get(ip);
        }
        …
    }

Recommendation: “repo” is a ConcurrentHashMap, and your usage of containsKey() and get() may not be thread-safe. Between containsKey() and get(), another thread can remove the key, and get() will then return null. Consider calling get() and using its result.

Developer feedback: Good catch of a potential race.

Fix:

    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        String str = repo.get(ip);
        if (str != null) {
            return str;
        }
        …
    }
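A runnable Java version of the fix, with hypothetical stand-ins for the parts the slide does not show: `IP_PATTERN` is assumed to be a simple dotted-quad pattern, and the elided `…` fallback is replaced by an injected `resolve` function. The essential change matches the recommendation: one atomic `get()` followed by a null check, instead of `containsKey()` followed by `get()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

class IpCache {
    // Assumed: a simple IPv4-like pattern; the slide does not show IP_PATTERN.
    private static final Pattern IP_PATTERN = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    private final ConcurrentHashMap<String, String> repo = new ConcurrentHashMap<>();
    // Hypothetical fallback standing in for the code elided on the slide.
    private final Function<String, String> resolve;

    IpCache(Function<String, String> resolve) {
        this.resolve = resolve;
    }

    void put(String ip, String value) {
        repo.put(ip, value);
    }

    void remove(String ip) {
        repo.remove(ip);
    }

    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        // Single atomic read: there is no window between a check and the read.
        String str = repo.get(ip);
        if (str != null) {
            return str;
        }
        return resolve.apply(ip);
    }
}
```

ConcurrentHashMap never stores null values, so a null from `get()` reliably means "absent at the moment of the read", which makes the null check a safe replacement for `containsKey()`.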

CodeGuru Reviewer – Concurrency

Code:

    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.put(name, obj);
        }
    }

Recommendation

Developer feedback: Correct.

Fix:

    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.putIfAbsent(name, obj);
        }
    }
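If `orderObject` is a `ConcurrentHashMap` (the slide does not say), the synchronized get-or-create block can be replaced by `computeIfAbsent`, which ConcurrentHashMap performs atomically per key. A minimal sketch of that alternative, which is not the fix shown on the slide; `OrderObjectMarkdown` is a hypothetical stand-in for the slide's `orderObjectMarkdown` class, whose definition is not shown, and `OrderRegistry` is likewise illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

class OrderObjectMarkdown {
    final String name;
    final String category;

    OrderObjectMarkdown(String name, String category) {
        this.name = name;
        this.category = category;
    }
}

class OrderRegistry {
    private final ConcurrentHashMap<String, OrderObjectMarkdown> orderObject = new ConcurrentHashMap<>();

    // Atomic get-or-create without an explicit lock: the factory runs at most once per
    // absent key, and every caller receives the single stored instance.
    OrderObjectMarkdown getOrCreate(String name, String category) {
        return orderObject.computeIfAbsent(name, n -> new OrderObjectMarkdown(n, category));
    }
}
```

Unlike `putIfAbsent`, which needs the new object constructed before the call, `computeIfAbsent` only constructs it when the key is actually missing.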

CodeGuru Reviewer – Pagination

Code:

    ScanResult sr = ddbClient.scan(scan);
    return (sr != null ? sr.getItems() : null);

Recommendation: This code might not produce accurate results if the operation returns paginated results instead of all results. Consider calling getLastEvaluatedKey() to check for additional results.

Developer feedback: This is right. Scan is paginated, so we should keep scanning until lastEvaluatedKey is empty in the response to do a complete scan.

Fix
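The developer's fix (keep scanning until no last evaluated key comes back) can be sketched as a loop. To keep the sketch self-contained, `Page`, `ScanClient`, and `fakeTable` are hypothetical in-memory stand-ins, not the AWS SDK; in the AWS SDK for Java 1.x, the corresponding pieces are `ScanRequest.withExclusiveStartKey(...)` and `ScanResult.getLastEvaluatedKey()`.

```java
import java.util.ArrayList;
import java.util.List;

class PaginatedScan {
    // Hypothetical page of results: items plus the key to resume after (null when done).
    static final class Page {
        final List<String> items;
        final Integer lastEvaluatedKey;

        Page(List<String> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Hypothetical client: returns one page starting after exclusiveStartKey (null = start).
    interface ScanClient {
        Page scan(Integer exclusiveStartKey);
    }

    // Keeps scanning until no lastEvaluatedKey is returned, collecting every page.
    static List<String> scanAll(ScanClient client) {
        List<String> all = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = client.scan(startKey);
            all.addAll(page.items);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }

    // In-memory fake table that returns its rows pageSize at a time.
    static ScanClient fakeTable(List<String> rows, int pageSize) {
        return startKey -> {
            int from = (startKey == null) ? 0 : startKey + 1;
            int to = Math.min(from + pageSize, rows.size());
            Integer last = (to < rows.size()) ? to - 1 : null;
            return new Page(new ArrayList<>(rows.subList(from, to)), last);
        };
    }
}
```

Against a five-row table paged two at a time, the original single call would return only the first two rows, while `scanAll` makes three calls and returns all five.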

CodeGuru Reviewer – Sensitive Information Leak

Code:

    try {
        updateJobStatus(context.getAwsRequestId(),
                request.getAwsAccountId(),
                request.getPredictorName(),
                request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, e);
        throw e;
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, e);
        retries++;
        continue;
    }

Recommendation: This code contains a potential information leak in the error handling for the call to getAwsAccountId(). You are handling this error with catch clauses that log the exception. Consider adding methods that handle sensitive data, such as masking and redaction. For more information about information leakage, see https://cwe.mitre.org/data/definitions/209.html

Developer feedback

Fix:

    try {
        updateJobStatus(context.getAwsRequestId(),
                request.getAwsAccountId(),
                request.getPredictorName(),
                request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        throw redact(e);
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        retries++;
        continue;
    }
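The fix calls a `redact` helper whose implementation is not shown. A minimal sketch of one possible version, assuming the sensitive value is a 12-digit AWS account ID appearing in exception messages: it masks all but the last four digits and builds a new exception without attaching the original as its cause, so the unmasked message is not reachable through the logged stack trace. The `Redactor` name and the masking policy are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Redactor {
    // AWS account IDs are 12-digit numbers; \b avoids matching inside longer digit runs.
    private static final Pattern ACCOUNT_ID = Pattern.compile("\\b\\d{12}\\b");

    // Masks every account ID in the text, keeping only the last four digits.
    static String redact(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = ACCOUNT_ID.matcher(text);
        return m.replaceAll(r -> "********" + r.group().substring(8));
    }

    // Returns a new unchecked exception that carries only the redacted message.
    // The original exception is deliberately not attached as the cause.
    static RuntimeException redact(Exception e) {
        RuntimeException safe = new RuntimeException(
            e.getClass().getSimpleName() + ": " + redact(e.getMessage()));
        safe.setStackTrace(e.getStackTrace());
        return safe;
    }
}
```

Keeping the original stack trace preserves debuggability while dropping the message text; a production version would likely also need patterns for other sensitive fields.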

Easier to build | Easier to scale | Easier to apply

Thank you!

