leadership session: machine


Post on 04-Jun-2022


Page 1: Leadership session: Machine
Page 2: Leadership session: Machine

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Leadership session: Machine learning

Swami Sivasubramanian

AIM218-L

VP, Amazon AI

Amazon Web Services

Page 3: Leadership session: Machine

MACHINE LEARNING IS HAPPENING IN COMPANIES OF EVERY SIZE AND INDUSTRY

Tens of thousands of customers have chosen AWS for their ML workloads | More than twice as many customers using ML as any other cloud provider

Page 4: Leadership session: Machine

But the journey has only just begun

Page 5: Leadership session: Machine

Photography

Then Now

Page 6: Leadership session: Machine

Machine learning in the hands of all developers

Page 7: Leadership session: Machine

Common challenges

• Skills gap—not enough people can build ML models

• ML model building is a time-consuming and complex process

• Finding the right business use cases that could benefit from ML

Page 8: Leadership session: Machine

The AWS ML Stack

Broadest and most complete set of Machine Learning capabilities

AI SERVICES

VISION | SPEECH | TEXT | SEARCH (NEW) | CHATBOTS | PERSONALIZATION | FORECASTING | FRAUD (NEW) | DEVELOPMENT (NEW) | CONTACT CENTERS

Amazon Rekognition | Amazon Polly | Amazon Transcribe + Medical | Amazon Comprehend + Medical | Amazon Translate | Amazon Textract | Amazon Kendra | Amazon Lex | Amazon Personalize | Amazon Forecast | Amazon Fraud Detector | Amazon CodeGuru | Amazon Connect with Contact Lens

ML SERVICES

SageMaker Studio IDE (NEW)

Amazon SageMaker Ground Truth data labelling | ML Marketplace | Amazon SageMaker Neo | Built-in algorithms | SageMaker Notebooks (NEW) | SageMaker Experiments (NEW) | Model tuning | SageMaker Autopilot (NEW) | Model hosting | SageMaker Model Monitor (NEW)

ML FRAMEWORKS & INFRASTRUCTURE

Deep Learning AMIs & Containers | GPUs & CPUs | Elastic Inference | Inferentia | FPGA

Page 9: Leadership session: Machine

Easier to build Easier to scale Easier to apply

Page 10: Leadership session: Machine

Easier to Build

ML FRAMEWORKS & INFRASTRUCTURE

Deep Learning AMIs & Containers | GPUs & CPUs | Elastic Inference | Inferentia | FPGA

Page 11: Leadership session: Machine

Java is the language of choice for many customers

#1 language since 2004 [1]

>67% adoption in enterprise [2]

[1] TIOBE Programming Community Index https://www.tiobe.com/tiobe-index/

[2] Stack Overflow Developer Survey https://insights.stackoverflow.com/survey/2019

Page 12: Leadership session: Machine

Simple Java API

Streamlined workflow and tech stack

DJL gives Java users an end-to-end solution for ML development

Models for prototyping

Simplify and accelerate development

Page 13: Leadership session: Machine

Amazon forecasting team spends weeks refactoring ML models developed in Python

A reduction of 30% in development time using DJL

ML infrastructure team

Deploys models to predict in Java

Data science team

Develops forecasting models in Python—“How many Instant

Pot accessories will sell?”

Page 14: Leadership session: Machine

SQL for ML

Stack Overflow Developer Survey

Page 15: Leadership session: Machine

How do you incorporate ML in a database driven app?

Page 16: Leadership session: Machine

Adding ML to an application is challenging

Typical steps require ML expertise & manual work

1. Select and train the ML model

2. Write application code to read data from the database

3. Format the data for the ML model

4. Call an ML service to run the ML model on the formatted data

5. Format the output for the application

6. Load the results to the application
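As a rough illustration of why these six steps mean manual glue code, the sketch below wires them together by hand. Everything in it is hypothetical: `ManualMlPipeline`, the keyword-based "model", and the in-memory rows stand in for a real database and a real ML service, and none of it is an AWS API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical glue code: every name here is illustrative, not an AWS API.
class ManualMlPipeline {

    // Step 1 stand-in: a "trained model" reduced to a keyword rule.
    static Function<String, Double> trainModel() {
        return text -> text.contains("terrible") || text.contains("didn't like") ? 0.1 : 0.9;
    }

    // Step 2 stand-in: rows the application would read from its database.
    static Map<Integer, String> readReviews() {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "Great product!");
        rows.put(2, "Terrible service");
        return rows;
    }

    // Step 3: format the data for the model (here: trim and lower-case).
    static String formatInput(String raw) {
        return raw.trim().toLowerCase();
    }

    // Step 5: format the model output for the application.
    static String formatOutput(double score) {
        return score >= 0.5 ? "POSITIVE" : "NEGATIVE";
    }

    // Steps 1 to 6 wired together by hand.
    static List<String> run() {
        Function<String, Double> model = trainModel();            // step 1
        List<String> results = new ArrayList<>();
        for (Map.Entry<Integer, String> row : readReviews().entrySet()) { // step 2
            String input = formatInput(row.getValue());           // step 3
            double score = model.apply(input);                    // step 4
            String label = formatOutput(score);                   // step 5
            results.add(row.getKey() + ":" + label);              // step 6
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1:POSITIVE, 2:NEGATIVE]
    }
}
```

Each numbered comment maps to one step in the list; in a real application each of these would be a separate piece of code to write and maintain.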

Page 17: Leadership session: Machine

ML for database developers and BI analysts

Page 18: Leadership session: Machine

From six steps

Typical steps require ML expertise & manual work

1. Select and train the ML model

2. Write application code to read data from the database

3. Format the data for the ML model

4. Call an ML service to run the ML model on the formatted data

5. Format the output for the application

6. Load the results to the application

Page 19: Leadership session: Machine

To three steps

1. (Optional) Select and configure the ML model with Autopilot

2. Run a SQL query to invoke the ML service

3. Use the results in the application

Use the familiar SQL language for training & prediction

Page 20: Leadership session: Machine

From SQL to ML-driven insights

SELECT * FROM product_reviews
WHERE aws_comprehend.detect_sentiment(review_text, 'EN') = 'NEGATIVE';

CREATE TRIGGER insert_check
BEFORE INSERT ON sales
FOR EACH ROW
BEGIN
    IF is_transaction_fraudulent(column1, column2, column3 …) = 'True' THEN
        rollback;
    END IF;
END;

SELECT * FROM customers
ORDER BY predicted_future_spend(column1, column2, ...);

Page 21: Leadership session: Machine

Under the hood: optimized Aurora ML query processing

ID User Feedback

1 Great product!

Good job

Mediocre

I didn’t like it

Loved it

Terrible service

50 Great service

Select * from product_reviews
where aws_comprehend.detect_sentiment(review_text, 'EN') = 'POSITIVE'

Page 22: Leadership session: Machine

Easier to Scale

ML SERVICES

SageMaker Studio IDE (NEW)

Amazon SageMaker Ground Truth data labelling | ML Marketplace | Amazon SageMaker Neo | Built-in algorithms | SageMaker Notebooks (NEW) | SageMaker Experiments (NEW) | Model tuning | SageMaker Autopilot (NEW) | Model hosting | SageMaker Model Monitor (NEW)

Page 23: Leadership session: Machine

Successful ML requires complex, hard-to-discover combinations of algorithms, data, parameters

Combinatorial + Largely explorative & iterative + Requires broad and complete knowledge of ML domain

Time consuming, error prone process even for ML experts

Page 24: Leadership session: Machine

Customers faced a false choice

DIY model training

• Manual effort by experts

• Fully controlled and auditable

• Experts make tradeoff decisions

• Gets better over time with experience

Automated ML

• Accessible to experts and non-experts alike

• No visibility into the training process

• Can’t make tradeoffs between accuracy and other characteristics

Page 25: Leadership session: Machine

Customers now have a better choice

Amazon SageMaker Autopilot

DIY model training

• Manual effort by experts

• Fully controlled and auditable

• Experts make tradeoff decisions

• Gets better over time with experience

Automated ML

• Accessible to experts and non-experts alike

• No visibility into the training process

• Can’t make tradeoffs between accuracy and other characteristics

Page 26: Leadership session: Machine

Automated machine learning with Amazon SageMaker Autopilot

• Integrated with Studio

• Commented notebook describing actions

• Specify prediction target

• Automated feature engineering

• Regression & classification

• Automated algorithm selection & HPO

Page 27: Leadership session: Machine

Under the hood

1 7 250

Page 28: Leadership session: Machine

#  Model                            Accuracy  Latency  Model Size
1  churn-xgboost-1756-013-33398f0   95%       450ms    9.1MB
2  churn-xgboost-1756-014-53facc2   93%       200ms    4.8MB
3  churn-xgboost-1756-015-58bc692   92%       200ms    4.3MB
4  churn-linear-1756-016-db54598    91%       50ms     1.3MB
5  churn-xgboost-1756-017-af8d756   91%       190ms    4.2MB

Model training involves tradeoffs
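One way to make the tradeoff concrete is to pick the most accurate candidate that fits a latency budget. The sketch below uses the five rows from the table; the `ModelTradeoff` class, the `pick` method, and the tie-break on model size are illustrative assumptions, not part of SageMaker Autopilot.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ModelTradeoff {

    // One row of the candidate table.
    static final class Candidate {
        final String name;
        final int accuracyPct;
        final int latencyMs;
        final double sizeMb;

        Candidate(String name, int accuracyPct, int latencyMs, double sizeMb) {
            this.name = name;
            this.accuracyPct = accuracyPct;
            this.latencyMs = latencyMs;
            this.sizeMb = sizeMb;
        }
    }

    static final List<Candidate> CANDIDATES = Arrays.asList(
        new Candidate("churn-xgboost-1756-013-33398f0", 95, 450, 9.1),
        new Candidate("churn-xgboost-1756-014-53facc2", 93, 200, 4.8),
        new Candidate("churn-xgboost-1756-015-58bc692", 92, 200, 4.3),
        new Candidate("churn-linear-1756-016-db54598", 91, 50, 1.3),
        new Candidate("churn-xgboost-1756-017-af8d756", 91, 190, 4.2));

    // Most accurate candidate within the latency budget.
    // Assumed tie-break: on equal accuracy, prefer the smaller model.
    static Optional<Candidate> pick(List<Candidate> candidates, int maxLatencyMs) {
        return candidates.stream()
            .filter(c -> c.latencyMs <= maxLatencyMs)
            .max(Comparator.comparingInt((Candidate c) -> c.accuracyPct)
                .thenComparing(Comparator.comparingDouble((Candidate c) -> c.sizeMb).reversed()));
    }

    public static void main(String[] args) {
        // With a 200 ms budget the 95% model is excluded by its 450 ms latency.
        System.out.println(pick(CANDIDATES, 200).get().name); // prints churn-xgboost-1756-014-53facc2
    }
}
```

Tightening the budget to 100 ms leaves only the linear model, which trades four points of accuracy for a much lower latency and size.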

Page 29: Leadership session: Machine

Easily explore many hard-to-discover combinations of algorithms, data, parameters

Many combinations explored + Data-driven exploration & iteration + Notebook based on analysis and best practices

Easy, transparent process with accurate results even for ML novices

Page 30: Leadership session: Machine

Orchestration: moving to production

• To go to production, ML workloads need repeatable workflows

• This orchestration spans traditional boundaries between data science and IT

• Data scientists desire a familiar “language” to build and orchestrate workflows

Page 31: Leadership session: Machine

Multiple orchestration frameworks…

But orchestration isn’t enough.

Page 32: Leadership session: Machine

Native integration of Amazon SageMaker with Kubeflow

• Train, tune, and deploy models from popular orchestration frameworks

• Fully managed infrastructure in Amazon SageMaker; no resource management or optimization required

• Improve data scientist productivity, while DevOps orchestrates and automates

Page 33: Leadership session: Machine

Amazon Batch | Amazon SageMaker | Amazon S3

A data-scientist-friendly Python library for ML on AWS

Open-source project by Netflix

Now available at metaflow.org

Page 34: Leadership session: Machine

Easier to Apply

AI SERVICES

VISION | SPEECH | TEXT | SEARCH (NEW) | CHATBOTS | PERSONALIZATION | FORECASTING | FRAUD (NEW) | DEVELOPMENT (NEW) | CONTACT CENTERS

Amazon Rekognition | Amazon Polly | Amazon Transcribe + Medical | Amazon Comprehend + Medical | Amazon Translate | Amazon Textract | Amazon Kendra | Amazon Lex | Amazon Personalize | Amazon Forecast | Amazon Fraud Detector | Amazon CodeGuru | Amazon Connect with Contact Lens

Page 35: Leadership session: Machine

Employees spend 20% of their time looking for information. (McKinsey)

44% of the time, they cannot find the information they need to do their job. (IDC)

Page 36: Leadership session: Machine

Key Challenges

Low Accuracy

• 80% of data is unstructured

• Keyword Engines

Complexity

• Scattered Data Silos

• Stale Search Results

• Difficult to set up

Page 37: Leadership session: Machine

Amazon Kendra: Rethinking Enterprise Search

Easy to find what you are looking for

Simple and quick to set up

• Native connectors

• Natural language queries

• NLU and ML core

• Simple API and console experiences

• Code samples

• Continuous improvement

• Domain expertise

Page 38: Leadership session: Machine


Page 39: Leadership session: Machine

Intelligent Digital Employee Experience Platform

Transforming the employee experience using AI and

conversational interfaces.

Page 40: Leadership session: Machine

MOBILE

SOCIAL

CLOUD

ANALYTICS

“Enterprises are paying their

employees to waste time and

frustrating them in the

process.”

The Complexity of Today’s Workplace

Digitizing processes and transactions has led to enormous complexity

Page 41: Leadership session: Machine

Workgrid Assistant - An Employee Experience Platform

Consumer-like Experiences: Engage employees with the modern consumer-like experiences they expect, anytime & anywhere.

Single Pane of Glass: Consolidates notifications, tasks, and messaging from disparate systems into a single, modern interface.

Elevates Communications Outside Email: Shrink email volume with communications outside of email, ensuring employees never miss updates.

Instant Access to Information & Answers: Instant access to information and automation of tasks, all through a natural language chatbot.

Page 42: Leadership session: Machine

A ‘good’ Natural Language Assistant

Knowledge: ✓ Corporate Answers ✓ General Knowledge ✓ Knowledge Extraction ✓ Knowledge Connectors

Tasks: ✓ Action Based Conversations ✓ 3rd Party Integrations

Intelligence: ✓ Memory ✓ Context ✓ Personalization

Entertainment: ✓ Small Talk ✓ Games ✓ Quizzes ✓ Witty Banter

Page 43: Leadership session: Machine

Unsupervised

Supervised Machine

Learning

Natural

Language

Processing

Speech

Deep Learning

Text Generation

Question Answering

Context Extraction

Classification

Machine Translation

Sentiment Analysis

Natural Language Understanding

Text to Speech

Transcription

Automatic Speech Recognition

Emotion Recognition

Named Entity Recognition

Reinforcement

AI

AI in the workplace

Image Recognition

Machine Vision

Page 44: Leadership session: Machine

Search is Changing…

ML Based Search: Advances in NLP can help understand user intent.

Goodbye Keywords, Hello Queries: Search is changing from keywords to natural language queries – who, what, why, how, when

Voice is on the Rise: By 2020, 50% of all searches will be conducted via voice using natural language queries.

Page 45: Leadership session: Machine

Amazon Kendra goes to the gym

Page 46: Leadership session: Machine
Page 47: Leadership session: Machine
Page 48: Leadership session: Machine

Answers not links

Answers + Action = Assistance

Page 49: Leadership session: Machine

Building and running high quality software today

Write | Review | Build + Test | Deploy | Measure | Improve

Difficult in practice

Review:

1. Expert reviewers are hard to find

2. May not catch all bugs

Application characteristics constantly evolve, so run-time patterns for the application profile change

Hard to find "most expensive lines of code": understanding run-time performance and availability characteristics requires deep expertise in profiling

Page 50: Leadership session: Machine

How do you rethink software development processes with ML?

Page 51: Leadership session: Machine

Automate code reviews and identify your most expensive lines of code with Amazon CodeGuru

• Automated code reviews with intelligent recommendations

• Seamlessly integrate with pull request workflow or performed anytime on-demand

• Detect and optimize the most expensive line of code

• Immediately identify application inefficiencies running in production

Page 52: Leadership session: Machine

Use ML to build and run high-performing software

Write + Review: Built-in code reviews with intelligent recommendations

Build + Test: Detect and optimize the expensive lines of code pre-prod

Deploy

Measure: Easily identify application inefficiencies in production environment

Improve

Page 53: Leadership session: Machine

CodeGuru Reviewer workflow

Code Repository | CodeGuru Reviewer | Repo admin

1. Repository association

2. Pull request

3. Recommendation

4. Developer feedback

Page 54: Leadership session: Machine

Amazon CodeGuru Reviewer – How it Works

Input: Source Code

Customer performs Pull Request

    try (GZip gzip = GZIPInputStream.create(url.openStream())) {
        use(gzip);
    } catch (Exception e) {
        handle();
    }

Feature Extraction

Extract semantic features/patterns

    ENTRY
    stream = url.openStream()
    gzip = GZIPInputStream.create(stream)
    use(gzip)
    gzip.close()
    throw Exception
    handle()
    EXIT

Machine Learning

ML algorithms + Program analysis identify code defects

Code corpus

Output: Recommendations

Customers see recommendations as Pull Request comments
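The slide's input snippet uses illustrative names (`GZip`, `GZIPInputStream.create`) that are not JDK APIs. A runnable analogue of the same try-with-resources pattern, using the JDK's `java.util.zip` classes, might look like the sketch below; `GzipRoundTrip`, `compress`, and `decompress` are hypothetical helper names, and an in-memory byte array stands in for `url.openStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

    // Gzip a string in memory; the stream is closed (and flushed) by try-with-resources.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read a gzip stream to a string; close() runs on both the normal and the
    // exceptional path, matching the gzip.close() node in the extracted graph.
    static String decompress(InputStream source) {
        try (GZIPInputStream gzip = new GZIPInputStream(source)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = compress("hello");
        System.out.println(decompress(new ByteArrayInputStream(packed))); // prints hello
    }
}
```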

Page 55: Leadership session: Machine

CodeGuru Reviewer – Concurrency

Code

    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        if (repo.containsKey(ip)) {
            return repo.get(ip);
        }
    }

Recommendation

“repo” is a ConcurrentHashMap and your usage of containsKey() and get() may not be thread-safe. In between containsKey() and get(), another thread can remove the key and the get() will return null. Consider calling get() and using its result.

Developer Feedback

Good catch of a potential race.

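Fix

    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        // A single get() call leaves no window between containsKey() and get().
        final String str = repo.get(ip);
        if (str != null) {
            return str;
        }
        // ... (rest of the method is not shown on the slide)
    }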

Page 56: Leadership session: Machine

CodeGuru Reviewer – Concurrency

Code

    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.put(name, obj);
        }
    }

Recommendation

Developer Feedback

Correct.

Fix

    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.putIfAbsent(name, obj);
            obj = orderObject.get(name);
        }
    }
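An alternative to the get / putIfAbsent / get sequence, assuming `orderObject` is a `ConcurrentHashMap` (the slide does not show its declaration), is a single `computeIfAbsent` call, which performs the lookup and the insert atomically without an external lock. `OrderObjectCache` and `Markdown` below are hypothetical stand-ins for the slide's types.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class OrderObjectCache {

    // Hypothetical stand-in for the slide's orderObjectMarkdown type.
    static final class Markdown {
        final String name;
        final String category;

        Markdown(String name, String category) {
            this.name = name;
            this.category = category;
        }
    }

    private final ConcurrentMap<String, Markdown> orderObject = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // ConcurrentHashMap applies the mapping function at most once per key,
    // so concurrent callers all receive the same instance.
    Markdown getOrCreate(String name, String category) {
        return orderObject.computeIfAbsent(name, key -> {
            created.incrementAndGet();
            return new Markdown(key, category);
        });
    }

    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) {
        OrderObjectCache cache = new OrderObjectCache();
        IntStream.range(0, 1000).parallel()
            .forEach(i -> cache.getOrCreate("shoes", "apparel"));
        System.out.println(cache.createdCount()); // prints 1
    }
}
```

The counter exists only to show that the value is constructed once; production code would not need it.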

Page 57: Leadership session: Machine

CodeGuru Reviewer – Pagination

Code

    ScanResult sr = ddbClient.scan(scan);
    return (sr != null ? sr.getItems() : null);

Recommendation

This code might not produce accurate results if the operation returns paginated results instead of all results. Consider adding an additional call to getLastEvaluatedKey() to check for additional results.

Developer Feedback

This is right - scan is paginated so we should do the scan until lastEvaluatedKey is empty in the response to do a complete scan.

Fix
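A minimal sketch of the loop the developer feedback describes: keep scanning until the last evaluated key is empty. `FakeClient`, `Page`, and `scanAll` are hypothetical in-memory stand-ins so the example runs on its own; they are not the DynamoDB SDK types on the slide (`ScanResult`, `ddbClient`), and the integer key is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PaginatedScan {

    // Hypothetical stand-in for one page of a scan response.
    static final class Page {
        final List<String> items;
        final Integer lastEvaluatedKey; // null when there are no more pages

        Page(List<String> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Hypothetical in-memory client that returns at most pageSize items per call.
    static final class FakeClient {
        private final List<String> table;
        private final int pageSize;

        FakeClient(List<String> table, int pageSize) {
            this.table = table;
            this.pageSize = pageSize;
        }

        Page scan(Integer exclusiveStartKey) {
            int from = exclusiveStartKey == null ? 0 : exclusiveStartKey;
            int to = Math.min(from + pageSize, table.size());
            Integer next = to < table.size() ? to : null;
            return new Page(new ArrayList<>(table.subList(from, to)), next);
        }
    }

    // Scan page by page until the response carries no last evaluated key.
    static List<String> scanAll(FakeClient client) {
        List<String> all = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = client.scan(startKey);
            all.addAll(page.items);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient(Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(client.scan(null).items.size()); // prints 2: a single call sees one page
        System.out.println(scanAll(client).size());         // prints 5: the loop sees every item
    }
}
```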

Page 58: Leadership session: Machine

CodeGuru Reviewer – Sensitive Information Leak

Code

    try {
        updateJobStatus(context.getAwsRequestId(),
            request.getAwsAccountId(),
            request.getPredictorName(),
            request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, e);
        throw e;
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, e);
        retries++;
        continue;
    }

Recommendation

This code contains a potential information leakage in the error handling for the following call: 'getAwsAccountId()'. You are handling this error with catch clauses. There are methods available that could be added to handle sensitive data, like masking and redaction. For more information about information leakage, see https://cwe.mitre.org/data/definitions/209.html

Developer Feedback

Fix

    try {
        updateJobStatus(context.getAwsRequestId(),
            request.getAwsAccountId(),
            request.getPredictorName(),
            request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        throw redact(e);
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        retries++;
        continue;
    }
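The slide does not show what `redact` does. One possible shape, sketched here as an assumption, masks any standalone 12-digit number (the length of an AWS account ID) in the exception message and returns a new exception that carries the original stack trace. The `Redactor` class, the `[REDACTED]` placeholder, and the choice of `RuntimeException` as the return type are all illustrative.

```java
import java.util.regex.Pattern;

class Redactor {

    // AWS account IDs are 12-digit numbers; match them only as standalone digit runs.
    private static final Pattern ACCOUNT_ID = Pattern.compile("(?<!\\d)\\d{12}(?!\\d)");

    // Mask every 12-digit run in a message; null messages pass through unchanged.
    static String redact(String message) {
        return message == null ? null : ACCOUNT_ID.matcher(message).replaceAll("[REDACTED]");
    }

    // Build a new exception with the masked message, keeping the original stack trace.
    // Assumption: callers can accept a generic RuntimeException in place of the original type.
    static RuntimeException redact(Exception e) {
        RuntimeException safe = new RuntimeException(redact(e.getMessage()));
        safe.setStackTrace(e.getStackTrace());
        return safe;
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("Account 123456789012 not found");
        System.out.println(redact(e).getMessage()); // prints Account [REDACTED] not found
    }
}
```

Masking by pattern is a blunt tool: it also hides unrelated 12-digit values, and it does nothing for sensitive data in other formats.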

Page 59: Leadership session: Machine

Easier to build Easier to scale Easier to apply

Page 60: Leadership session: Machine

Thank you!


Page 61: Leadership session: Machine
