TRANSCRIPT
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Leadership session: Machine learning
Swami Sivasubramanian
AIM218-L
VP, Amazon AI
Amazon Web Services
Machine learning is happening in companies of every size and industry
Tens of thousands of customers have chosen AWS for their ML workloads | More than twice as many customers use ML on AWS as on any other cloud provider
But the journey has only just begun
Photography: then and now
Machine learning in the hands of all developers
Common challenges
• Skills gap—not enough people can build ML models
• ML model building is a time-consuming and complex process
• Finding the right business use cases that could benefit from ML
The AWS ML Stack
Broadest and most complete set of machine learning capabilities

AI SERVICES
• Vision: Amazon Rekognition
• Speech: Amazon Polly, Amazon Transcribe (+Medical)
• Text: Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract
• Search (NEW): Amazon Kendra
• Chatbots: Amazon Lex
• Personalization: Amazon Personalize
• Forecasting: Amazon Forecast
• Fraud (NEW): Amazon Fraud Detector
• Development (NEW): Amazon CodeGuru
• Contact centers: Amazon Connect with Contact Lens

ML SERVICES
• Amazon SageMaker: Ground Truth (data labeling), ML Marketplace, Neo, built-in algorithms, Notebooks (NEW), Experiments (NEW), model tuning, Autopilot (NEW), model hosting, Model Monitor (NEW)
• SageMaker Studio IDE (NEW)

ML FRAMEWORKS & INFRASTRUCTURE
• Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA
Easier to build | Easier to scale | Easier to apply
Easier to build: ML frameworks & infrastructure (Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA)
Java is the language of choice for many customers
• #1 language since 2004 [1]
• >67% adoption in enterprise [2]
[1] TIOBE Programming Community Index, https://www.tiobe.com/tiobe-index/
[2] Stack Overflow Developer Survey 2019, https://insights.stackoverflow.com/survey/2019
Deep Java Library (DJL) gives Java users an end-to-end solution for ML development
• Simple Java API
• Streamlined workflow and tech stack
• Models for prototyping
• Simplify and accelerate development
The Amazon forecasting team spends weeks refactoring ML models developed in Python:
• Data science team: develops forecasting models in Python ("How many Instant Pot accessories will sell?")
• ML infrastructure team: deploys models to predict in Java
Result: a 30% reduction in development time using DJL
SQL for ML
Stack Overflow Developer Survey
How do you incorporate ML in a database-driven app?
Adding ML to an application is challenging
Typical steps require ML expertise & manual work:
1. Select and train the ML model
2. Write application code to read data from the database
3. Format the data for the ML model
4. Call an ML service to run the ML model on the formatted data
5. Format the output for the application
6. Load the results into the application
ML for database developers and BI analysts
From six steps to three steps:
1. (Optional) Select and configure the ML model with Autopilot
2. Run a SQL query to invoke the ML service
3. Use the results in the application
Use the familiar SQL language for training & prediction
From SQL to ML-driven insights
SELECT * FROM product_reviews
WHERE aws_comprehend.detect_sentiment(review_text, 'EN') = 'NEGATIVE';

CREATE TRIGGER insert_check
BEFORE INSERT ON sales
FOR EACH ROW
BEGIN
  -- MySQL triggers cannot ROLLBACK; SIGNAL aborts the INSERT instead
  IF is_transaction_fraudulent(NEW.column1, NEW.column2, NEW.column3 …) = 'True' THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Transaction rejected as fraudulent';
  END IF;
END;

SELECT * FROM customers
ORDER BY predicted_future_spend(column1, column2, ...);
Under the hood: optimized Aurora ML query processing

ID    User feedback
1     Great product!
…     Good job
…     Mediocre
…     I didn’t like it
…     Loved it
…     Terrible service
50    Great service

SELECT * FROM product_reviews
WHERE aws_comprehend.detect_sentiment(review_text, 'EN') = 'POSITIVE';
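The slide does not spell out how the query processing is optimized. One common technique for queries like this, sketched here under that assumption, is batching: instead of calling the ML service once per row, rows are grouped so that 50 reviews cost ceil(50 / batchSize) service calls rather than 50. `scoreInBatches` and `toySentiment` are hypothetical names; `toySentiment` is a keyword matcher standing in for a real sentiment service, not Amazon Comprehend or any Aurora API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class BatchedScoring {
    // Hypothetical helper: calls scoreBatch once per batch of at most batchSize rows
    // and returns one result per input row, in the original row order.
    static <T, R> List<R> scoreInBatches(List<T> rows, int batchSize,
                                         Function<List<T>, List<R>> scoreBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> results = new ArrayList<>(rows.size());
        for (int start = 0; start < rows.size(); start += batchSize) {
            List<T> batch = rows.subList(start, Math.min(start + batchSize, rows.size()));
            List<R> scored = scoreBatch.apply(batch);
            if (scored.size() != batch.size()) {
                throw new IllegalStateException("service returned wrong number of results");
            }
            results.addAll(scored);
        }
        return results;
    }

    // Toy keyword classifier standing in for a real sentiment service (NOT Amazon Comprehend).
    static List<String> toySentiment(List<String> batch) {
        List<String> labels = new ArrayList<>();
        for (String text : batch) {
            String t = text.toLowerCase();
            if (t.contains("great") || t.contains("good") || t.contains("loved")) {
                labels.add("POSITIVE");
            } else if (t.contains("terrible") || t.contains("didn't")) {
                labels.add("NEGATIVE");
            } else {
                labels.add("NEUTRAL");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> reviews = List.of("Great product!", "Good job", "Mediocre",
                "I didn't like it", "Loved it", "Terrible service", "Great service");
        int[] calls = {0};
        List<String> labels = scoreInBatches(reviews, 3, batch -> {
            calls[0]++;
            return toySentiment(batch);
        });
        // Seven rows with a batch size of 3 need three service calls instead of seven.
        System.out.println(labels + " using " + calls[0] + " calls");
    }
}
```

The batch size is the main design knob: larger batches mean fewer round trips but larger requests, and a real service will impose its own per-request limits.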
Easier to scale: ML services (Amazon SageMaker and SageMaker Studio IDE)
Successful ML requires complex, hard-to-discover combinations of algorithms, data, and parameters
Time-consuming, error-prone process even for ML experts
• Combinatorial
• Largely explorative and iterative
• Requires broad and complete knowledge of the ML domain
Customers faced a false choice
DIY model training
• Manual effort by experts
• Fully controlled and auditable
• Experts make tradeoff decisions
• Gets better over time with experience
Automated ML
• Accessible to experts and non-experts alike
• No visibility into the training process
• Can’t make tradeoffs between accuracy and other characteristics
Customers now have a better choice
Amazon SageMaker Autopilot, integrated with Studio
Automated machine learning with Amazon SageMaker Autopilot
• Commented notebook describing actions
• Specify prediction target
• Automated feature engineering
• Regression & classification
• Automated algorithm selection & hyperparameter optimization (HPO)
Under the hood
#   Model                             Accuracy   Latency   Model size
1   churn-xgboost-1756-013-33398f0    95%        450 ms    9.1 MB
2   churn-xgboost-1756-014-53facc2    93%        200 ms    4.8 MB
3   churn-xgboost-1756-015-58bc692    92%        200 ms    4.3 MB
4   churn-linear-1756-016-db54598     91%        50 ms     1.3 MB
5   churn-xgboost-1756-017-af8d756    91%        190 ms    4.2 MB
Model training involves tradeoffs
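To make the tradeoff concrete, here is a small Java sketch that picks a model from the table above under a latency budget. The budget, the tie-breaking rule (prefer the faster model when accuracy is equal), and the `Leaderboard`/`bestWithinLatency` names are assumptions for illustration; this is not how Autopilot itself ranks candidates.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class Leaderboard {
    // One row of the candidate table: accuracy in percent, latency in ms, size in MB.
    record Candidate(String name, int accuracyPct, int latencyMs, double sizeMb) {}

    // The five candidates from the table.
    static final List<Candidate> CANDIDATES = List.of(
            new Candidate("churn-xgboost-1756-013-33398f0", 95, 450, 9.1),
            new Candidate("churn-xgboost-1756-014-53facc2", 93, 200, 4.8),
            new Candidate("churn-xgboost-1756-015-58bc692", 92, 200, 4.3),
            new Candidate("churn-linear-1756-016-db54598", 91, 50, 1.3),
            new Candidate("churn-xgboost-1756-017-af8d756", 91, 190, 4.2));

    // Most accurate candidate within the latency budget; ties go to the faster model.
    static Optional<Candidate> bestWithinLatency(List<Candidate> candidates, int maxLatencyMs) {
        return candidates.stream()
                .filter(c -> c.latencyMs() <= maxLatencyMs)
                .max(Comparator.comparingInt(Candidate::accuracyPct)
                        .thenComparing(Comparator.comparingInt(Candidate::latencyMs).reversed()));
    }

    public static void main(String[] args) {
        for (int budget : new int[] {500, 250, 100}) {
            String pick = bestWithinLatency(CANDIDATES, budget).map(Candidate::name).orElse("none");
            System.out.println(budget + " ms -> " + pick);
        }
    }
}
```

With the table's numbers, a 500 ms budget selects the 95% XGBoost model, a 250 ms budget selects the 93% model, and a 100 ms budget falls back to the 91% linear model at 50 ms.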
Easily explore many hard-to-discover combinations of algorithms, data, and parameters
Easy, transparent process with accurate results, even for ML novices
• Many combinations explored
• Data-driven exploration and iteration
• Notebook based on analysis and best practices
Orchestration: moving to production
• To go to production, ML workloads need repeatable workflows
• This orchestration spans traditional boundaries between data science and IT
• Data scientists desire a familiar “language” to build and orchestrate workflows
Multiple orchestration frameworks…
But orchestration isn’t enough.
Native integration of Amazon SageMaker with Kubeflow
• Train, tune, and deploy models from popular orchestration frameworks
• Fully managed infrastructure in Amazon SageMaker; no resource management or optimization required
• Improve data scientist productivity while DevOps orchestrates and automates
AWS Batch | Amazon SageMaker | Amazon S3
Metaflow: a data-scientist-friendly Python library for ML on AWS
Open-source project by Netflix
Now available at metaflow.org
Easier to apply: AI services (vision, speech, text, search, chatbots, personalization, forecasting, fraud, development, contact centers)
• Employees spend 20% of their time looking for information. (McKinsey)
• 44% of the time, they cannot find the information they need to do their job. (IDC)
Key Challenges
Low Accuracy
• 80% of data is unstructured
• Keyword-based search engines
Complexity
• Scattered Data Silos
• Stale Search Results
• Difficult to set up
Amazon Kendra: rethinking enterprise search
Easy to find what you are looking for
• Natural language queries
• NLU (natural language understanding) and ML core
• Continuous improvement
• Domain expertise
Simple and quick to set up
• Native connectors
• Simple API and console experiences
• Code samples
Intelligent Digital Employee Experience Platform
Transforming the employee experience using AI and conversational interfaces.
Mobile | Social | Cloud | Analytics
“Enterprises are paying their employees to waste time and frustrating them in the process.”
The Complexity of Today’s Workplace
Digitizing processes and transactions has led to enormous complexity
• Consumer-like experiences: engage employees with the modern, consumer-like experiences they expect, anytime and anywhere.
• Single pane of glass: consolidates notifications, tasks, and messaging from disparate systems into a single, modern interface.
• Elevates communications outside email: shrinks email volume by moving communications outside of email, ensuring employees never miss updates.
• Instant access to information and answers: instant access to information and automation of tasks, all through a natural language chatbot.
Workgrid Assistant: an employee experience platform
Knowledge
✓ Corporate answers
✓ General knowledge
✓ Knowledge extraction
✓ Knowledge connectors
Tasks
✓ Action-based conversations
✓ 3rd-party integrations
Intelligence
✓ Memory
✓ Context
✓ Personalization
Entertainment
✓ Small talk
✓ Games
✓ Quizzes
✓ Witty banter
A ‘good’ natural language assistant
AI in the workplace
• Machine learning: unsupervised, supervised, reinforcement, deep learning
• Natural language processing: natural language understanding, text generation, question answering, context extraction, classification, machine translation, sentiment analysis, named entity recognition
• Speech: text to speech, transcription, automatic speech recognition, emotion recognition
• Machine vision: image recognition
Search is changing…
• ML-based search: advances in NLP can help understand user intent.
• Goodbye keywords, hello queries: search is changing from keywords to natural language queries (who, what, why, how, when).
• Voice is on the rise: by 2020, 50% of all searches will be conducted via voice using natural language queries.
Amazon Kendra goes to the gym
Answers, not links
Answers + Action = Assistance
Building and running high-quality software today
Write | Build + Test | Deploy | Measure | Improve
Difficult in practice
Review
1. Expert reviewers are hard to find
2. May not catch all bugs
Profiling
• Application characteristics constantly evolve, so run-time patterns in the application profile change
• Hard to find the "most expensive lines of code": understanding run-time performance and availability characteristics requires deep expertise
How do you rethink software development processes with ML?
Automate code reviews and identify your most expensive lines of code with Amazon CodeGuru
• Automated code reviews with intelligent recommendations
• Seamlessly integrate with the pull request workflow, or run anytime on demand
• Detect and optimize the most expensive lines of code
• Immediately identify application inefficiencies running in production
Use ML to build and run high-performing software
• Write + Review: built-in code reviews with intelligent recommendations
• Build + Test: detect and optimize the most expensive lines of code before production
• Deploy
• Measure: easily identify application inefficiencies in the production environment
• Improve
CodeGuru Reviewer workflow (code repository and CodeGuru Reviewer)
1. Repository association (repo admin)
2. Pull request
3. Recommendation
4. Developer feedback
Amazon CodeGuru Reviewer – How it works
1. Input: source code. The customer opens a pull request containing, for example:
    try (GZip gzip =
            GZIPInputStream.create(url.openStream())) {
        use(gzip);
    } catch (Exception e) {
        handle();
    }
2. Feature extraction: extract semantic features and patterns. The code above becomes a control-flow graph with the nodes ENTRY, stream = url.openStream(), gzip = GZIPInputStream.create(stream), use(gzip), gzip.close(), handle(), throw Exception, and EXIT.
3. Machine learning: ML algorithms plus program analysis, drawing on a code corpus, identify code defects.
4. Output: recommendations. Customers see the recommendations as pull request comments.
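As a toy illustration of the program-analysis half (not CodeGuru's implementation), the sketch below encodes the slide's control-flow graph and enumerates every path from ENTRY to EXIT, flagging paths where the stream is opened but never closed afterwards. The slide lists the nodes but not the edges, so the edges here are assumptions based on try-with-resources semantics. The flagged path is the one where `create(stream)` throws after `url.openStream()` succeeded: only `gzip` is declared as a resource, so nothing closes the inner stream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeakCheck {
    static final String ENTRY = "ENTRY";
    static final String OPEN = "stream = url.openStream()";
    static final String CREATE = "gzip = GZIPInputStream.create(stream)";
    static final String USE = "use(gzip)";
    static final String CLOSE = "gzip.close()";
    static final String THROW = "throw Exception";
    static final String HANDLE = "handle()";
    static final String EXIT = "EXIT";

    // Assumed edges; the slide shows only the nodes.
    static Map<String, List<String>> cfg() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put(ENTRY, List.of(OPEN, THROW));   // openStream() succeeds or throws
        g.put(OPEN, List.of(CREATE));
        g.put(CREATE, List.of(USE, THROW));   // create(stream) succeeds or throws
        g.put(USE, List.of(CLOSE));
        g.put(CLOSE, List.of(EXIT, HANDLE));  // gzip is closed on normal and exceptional exit
        g.put(THROW, List.of(HANDLE));
        g.put(HANDLE, List.of(EXIT));
        g.put(EXIT, List.of());
        return g;
    }

    // Depth-first enumeration of all ENTRY-to-EXIT paths (the graph is acyclic).
    private static void collectPaths(Map<String, List<String>> g, String node,
                                     List<String> current, List<List<String>> out) {
        current.add(node);
        if (node.equals(EXIT)) {
            out.add(new ArrayList<>(current));
        } else {
            for (String next : g.get(node)) {
                collectPaths(g, next, current, out);
            }
        }
        current.remove(current.size() - 1);
    }

    // A path leaks if the stream is opened and no close follows on that path.
    static List<List<String>> leakingPaths(Map<String, List<String>> g) {
        List<List<String>> all = new ArrayList<>();
        collectPaths(g, ENTRY, new ArrayList<>(), all);
        List<List<String>> leaks = new ArrayList<>();
        for (List<String> path : all) {
            int opened = path.indexOf(OPEN);
            if (opened >= 0 && !path.subList(opened, path.size()).contains(CLOSE)) {
                leaks.add(path);
            }
        }
        return leaks;
    }

    public static void main(String[] args) {
        for (List<String> path : leakingPaths(cfg())) {
            System.out.println("possible leak: " + path);
        }
    }
}
```

The conventional remedy is to declare both streams as resources, e.g. `try (InputStream stream = url.openStream(); GZIPInputStream gzip = new GZIPInputStream(stream))`, so each is closed even if the second one fails to initialize.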
CodeGuru Reviewer – Concurrency
Code:
    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        if (repo.containsKey(ip)) {
            return repo.get(ip);
        }
        …
    }
Recommendation: “repo” is a ConcurrentHashMap and your usage of containsKey() and get() may not be thread-safe. In between containsKey() and get(), another thread can remove the key and the get() will return null. Consider calling get() and using its result.
Developer feedback: Good catch of a potential race.
Fix:
    public String get(final String ip) {
        if (!IP_PATTERN.matcher(ip).matches()) {
            return ip;
        }
        // Single lookup: no window between the check and the read
        final String str = repo.get(ip);
        if (str != null) {
            return str;
        }
        …
    }
CodeGuru Reviewer – Concurrency
Code:
    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.put(name, obj);
        }
    }
Recommendation
Developer feedback: Correct.
Fix:
    synchronized (orderObject) {
        obj = orderObject.get(name);
        if (obj == null) {
            obj = new orderObjectMarkdown(name, category);
            orderObject.putIfAbsent(name, obj);
            obj = orderObject.get(name);
        }
    }
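If `orderObject` is a `ConcurrentHashMap` (an assumption; the slide does not show its type), the get-or-create step can be written without an explicit lock using `ConcurrentHashMap.computeIfAbsent`, which performs the check and the insertion atomically for the key. `OrderObjectCache` and its `Entry` record are hypothetical stand-ins for the slide's `orderObject` map and `orderObjectMarkdown` class; this is an alternative to the developer's fix, not CodeGuru's recommendation.

```java
import java.util.concurrent.ConcurrentHashMap;

final class OrderObjectCache {
    // Hypothetical stand-in for the slide's orderObjectMarkdown class.
    record Entry(String name, String category) {}

    // Assumption: the map is a ConcurrentHashMap, like "repo" in the previous example.
    private final ConcurrentHashMap<String, Entry> orderObject = new ConcurrentHashMap<>();

    // computeIfAbsent runs the check and the insert atomically and applies the
    // factory at most once per key, so every caller gets the same Entry instance.
    Entry getOrCreate(String name, String category) {
        return orderObject.computeIfAbsent(name, key -> new Entry(key, category));
    }

    int size() {
        return orderObject.size();
    }

    public static void main(String[] args) {
        OrderObjectCache cache = new OrderObjectCache();
        Entry a = cache.getOrCreate("widget", "tools");
        Entry b = cache.getOrCreate("widget", "toys"); // existing entry is returned; "toys" is ignored
        System.out.println(a == b); // true
    }
}
```

Keep the factory function short and side-effect free: other updates to the same key may block while it runs.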
CodeGuru Reviewer – Pagination
Code:
    ScanResult sr = ddbClient.scan(scan);
    return (sr != null ? sr.getItems() : null);
Recommendation: This code might not produce accurate results if the operation returns paginated results instead of all results. Consider adding an additional call to getLastEvaluatedKey() to check for additional results.
Developer feedback: This is right. Scan is paginated, so we should repeat the scan until lastEvaluatedKey is empty in the response to do a complete scan.
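The slide does not show the fix code. Below is a minimal sketch of the loop the developer describes: keep requesting pages, passing the previous response's last evaluated key as the start key, until that key comes back null or empty. `ScanClient` and `Page` are hypothetical stand-ins so the example runs without the AWS SDK; in the AWS SDK for Java 1.x, the corresponding calls are `ScanRequest.withExclusiveStartKey(...)` and `ScanResult.getLastEvaluatedKey()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PaginatedScan {
    // Hypothetical page shape: the items on this page plus the key to resume from
    // (null or empty when there are no more pages).
    record Page(List<String> items, Map<String, String> lastEvaluatedKey) {}

    // Hypothetical client; exclusiveStartKey is null for the first request.
    interface ScanClient {
        Page scan(Map<String, String> exclusiveStartKey);
    }

    // Keeps requesting pages until the response carries no lastEvaluatedKey.
    static List<String> scanAll(ScanClient client) {
        List<String> all = new ArrayList<>();
        Map<String, String> startKey = null;
        do {
            Page page = client.scan(startKey);
            all.addAll(page.items());
            startKey = page.lastEvaluatedKey();
        } while (startKey != null && !startKey.isEmpty());
        return all;
    }

    public static void main(String[] args) {
        // Fake client that ignores the start key and serves three pages in order;
        // the last page signals the end with a null key.
        List<Page> pages = List.of(
                new Page(List.of("a", "b"), Map.of("id", "b")),
                new Page(List.of("c"), Map.of("id", "c")),
                new Page(List.of("d"), null));
        int[] next = {0};
        System.out.println(scanAll(startKey -> pages.get(next[0]++))); // [a, b, c, d]
    }
}
```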
CodeGuru Reviewer – Sensitive information leak
Code:
    try {
        updateJobStatus(context.getAwsRequestId(),
                request.getAwsAccountId(),
                request.getPredictorName(),
                request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, e);
        throw e;
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, e);
        retries++;
        continue;
    }
Recommendation: This code contains a potential information leakage in the error handling for the following call: 'getAwsAccountId()'. You are handling this error with catch clauses. There are methods available that could be added to handle sensitive data, like masking and redaction. For more information about information leakage, see https://cwe.mitre.org/data/definitions/209.html
Fix:
    try {
        updateJobStatus(context.getAwsRequestId(),
                request.getAwsAccountId(),
                request.getPredictorName(),
                request.getInternalStatus());
    } catch (ValidationException e) {
        log.error(NON_RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        throw redact(e);
    } catch (InternalServerException e) {
        log.warn(RETRIABLE_LIST_ERROR_MESSAGE, redact(e));
        retries++;
        continue;
    }
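The slide leaves `redact(e)` undefined. Here is a minimal sketch of one way to implement its masking part, assuming account IDs appear in exception messages as standalone 12-digit numbers; `Redaction`, `redactMessage`, and this `redact` are hypothetical. This version returns a masked string suitable for logging; the slide's `throw redact(e)` would additionally need a variant that returns an exception.

```java
import java.util.regex.Pattern;

final class Redaction {
    // Assumption: AWS account IDs show up in messages as standalone 12-digit numbers.
    private static final Pattern ACCOUNT_ID = Pattern.compile("\\b\\d{12}\\b");

    // Replaces every standalone 12-digit number with twelve asterisks.
    static String redactMessage(String message) {
        return message == null ? null : ACCOUNT_ID.matcher(message).replaceAll("************");
    }

    // Hypothetical redact(e) for logging: "ClassName: message" like Throwable.toString(),
    // but with the message masked. The stack trace is not included.
    static String redact(Throwable e) {
        String message = e.getMessage();
        String type = e.getClass().getName();
        return message == null ? type : type + ": " + redactMessage(message);
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("Job failed for account 123456789012");
        System.out.println(redact(e)); // java.lang.IllegalStateException: Job failed for account ************
    }
}
```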
Easier to build | Easier to scale | Easier to apply
Thank you!