Intro to AWS Machine Learning

Upload: nvisia

Post on 23-Feb-2017

Page 1: Intro to AWS Machine Learning

AGENDA
• About me
• Predictive Analytics
• Amazon Machine Learning (ML)
• Amazon ML – Key Concepts
• Amazon ML – Datasources
• Amazon ML – Models
• Amazon ML – Evaluations
• Amazon ML – Demo

AN INTRO TO AWS MACHINE LEARNING

PREDICTIVE ANALYTICS

Page 2: Intro to AWS Machine Learning

ABOUT ME

NVISIA® Confidential 2016

Naveen VK
• Principal Architect at NVISIA, a regional software development company
• Worked for NVISIA for over 17 years
• Designed and built custom multi-tier applications using the Java Enterprise stack for various companies
• Involved in the entire application development lifecycle, including requirements gathering, architecture, design, implementation, integration, testing and deployment
• Some clients: ETF - State of WI, American Family, Harley Davidson, Cumulus Media
• Currently working at ETF (Employee Trust Fund)
  • Manage pensions, insurance and other benefits for state and local employees
  • Involved in multiple projects (5) and currently supporting multiple applications (7)
• Has deep expertise in databases like Oracle (since 1994) and DB2 (since 1999), and with SQL queries and PL/SQL stored procedures
• 3 fun facts about myself

Page 3: Intro to AWS Machine Learning

PREDICTIVE ANALYTICS

What is Predictive Analytics? Some use cases/examples

Page 4: Intro to AWS Machine Learning

PREDICTIVE ANALYTICS

What is it?
• Mining data, using statistical algorithms and machine learning to predict trends or probabilities
• Use historical data, and the patterns in it, to predict the future
• Create models based on patterns in data to predict the probability of something happening in the future
• The better the model and the training data, the better the prediction

Examples
• Is this email spam?
• Will this product sell?
• How many units of this product will sell?
• Is this product a piece of clothing, a book or a movie?
• What price will this house sell for?
• What will the temperature be here tomorrow?

Page 5: Intro to AWS Machine Learning

AMAZON MACHINE LEARNING (ML)

What is it? When to use it?

Page 6: Intro to AWS Machine Learning

AMAZON MACHINE LEARNING (ML)

• AWS (Amazon Web Services) cloud-based service for predictive analytics
• Use tools and wizards to create machine learning models
• Use simple APIs to obtain predictions for your application
• No need to write custom code or have supporting infrastructure
• Finds patterns in your existing data
• Use models to process new data and generate predictions

When to use ML?
• ML is not a solution for every type of problem
  • Do not use ML when the target value can be determined by coding simple rules, computations and steps without any data-driven learning
• Use ML when the rules cannot be programmed easily
  • Too many factors
  • Too many overlapping rules
  • Too much fine tuning of rules
• Use ML when a manual or rule-based solution cannot scale
  • 100s of millions vs. 100s (Example: manual vs. automated spam filter)

Page 7: Intro to AWS Machine Learning

AMAZON ML – KEY CONCEPTS

Terms and concepts

Page 8: Intro to AWS Machine Learning

AMAZON ML – KEY CONCEPTS

Datasources
• Contain metadata associated with data inputs to Amazon ML
• Spreadsheets, CSV files, streaming data, relational databases

ML Models
• Find patterns in data to generate predictions

Evaluations
• Measure the quality of ML models

Batch Predictions
• Multiple data inputs, aka batch data
• Asynchronous

Realtime Predictions
• Individual data inputs
• Synchronous

Page 9: Intro to AWS Machine Learning

AMAZON ML – DATASOURCES

Details of datasources in Amazon ML

Page 10: Intro to AWS Machine Learning

AMAZON ML – DATASOURCES

• In Amazon ML, a datasource contains only the metadata about the actual input data
• Actual data may be stored in
  • Amazon S3 buckets
  • Amazon Redshift databases
  • MySQL databases in Amazon Relational Database Service (RDS)
  • Amazon Kinesis
• Attributes
  • Column headings represent attributes
  • Unique
  • Required
• Target Attribute
  • The attribute whose value is being predicted
  • Training data must include the target attribute with its already known (labeled) value
• Observation
  • Single row of data
• Input data
  • All observations, aka rows in a spreadsheet/CSV file or database
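
To make these terms concrete, here is a minimal sketch that reads a small CSV and separates it into attributes, a target attribute and observations. The column names and values are made up for illustration and are not from the demo dataset.

```python
import csv
import io

# Hypothetical training data; column names and values are illustrative only.
CSV_TEXT = """product_id,price,category,sold
p1,9.99,book,1
p2,24.50,apparel,0
p3,14.00,movie,1
"""


def describe_input_data(csv_text, target):
    """Return (attributes, observations) for CSV input data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    attributes = list(reader.fieldnames)  # column headings are the attributes
    if len(set(attributes)) != len(attributes):
        raise ValueError("attribute names must be unique")
    if target not in attributes:
        raise ValueError("training data must contain the target attribute")
    observations = list(reader)  # each row is one observation
    return attributes, observations


attributes, observations = describe_input_data(CSV_TEXT, target="sold")
print("attributes:", attributes)
print("observations:", len(observations))
print("target of first observation:", observations[0]["sold"])
```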

Page 11: Intro to AWS Machine Learning

AMAZON ML – DATASOURCES CONTINUED

• Schema
  • All attributes and corresponding data types of input data
• Location
  • Location of input data stored in, say, an Amazon S3 bucket
• Row ID
  • Attribute flagged to be included in prediction output
  • Helps cross-reference the prediction with the observation
  • Unique for each observation
  • Optional
• Datasource Name
  • Human readable name of the datasource
  • Optional
• Statistics
  • Summary stats for each attribute of input data
• Status
  • Current state of the datasource (for example In Progress, Completed or Failed)
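
As a rough illustration of schema, row ID and statistics, the sketch below builds a schema as a plain Python dictionary and computes summary statistics for one numeric attribute. The field names (`targetAttributeName`, `rowId`, `attributeType` and so on) follow my recollection of the Amazon ML schema file format and should be checked against the AWS documentation. The attribute names and prices are hypothetical.

```python
import json
import statistics


def build_schema(attribute_types, target, row_id=None):
    """Build a schema dictionary: attribute names, data types, target, optional row ID."""
    schema = {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in attribute_types.items()
        ],
    }
    if row_id is not None:  # the row ID is optional
        schema["rowId"] = row_id
    return schema


# Hypothetical attributes for a "will this product sell?" dataset.
schema = build_schema(
    {
        "product_id": "CATEGORICAL",
        "price": "NUMERIC",
        "category": "CATEGORICAL",
        "sold": "BINARY",
    },
    target="sold",
    row_id="product_id",
)
print(json.dumps(schema, indent=2))

# Summary statistics for one numeric attribute (made-up prices).
prices = [9.99, 24.50, 14.00]
stats = {"min": min(prices), "max": max(prices), "mean": statistics.mean(prices)}
print(stats)
```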

Page 12: Intro to AWS Machine Learning

AMAZON ML – MODEL

Details of mathematical model in Amazon ML

Page 13: Intro to AWS Machine Learning

AMAZON ML – MODEL

• In Amazon ML, a model finds patterns in data and generates predictions
• Three distinct types of models
  • Binary
  • Multiclass
  • Regression
• Type of model is chosen based on the type of target to predict
• Binary Model
  • Predicts values that have 1 of 2 states: true/false, 1/0, win/lose, alive/dead, pass/fail, healthy/sick
  • Uses the industry-standard learning algorithm called Binary Logistic Regression
    • Statistical model used to predict the probability of a binary response based on certain variables
  • Examples
    • Is this email spam?
    • Will this product sell?
• Multiclass Model
  • Predicts values that belong to a pre-defined, limited set of states (1 of 3 or more states)
  • Uses the industry-standard learning algorithm called Multinomial Logistic Regression
  • Examples
    • Is this product a book, a movie or apparel?
    • Is this movie a thriller, a documentary or a comedy?
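
The two logistic models above can be sketched in a few lines of plain Python. This shows only how a trained model turns attribute values into probabilities. It is not Amazon ML's implementation, and the weights and feature values are made up.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_score(weights, bias, features):
    """Binary logistic regression: probability that the target is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def multiclass_probabilities(class_weights, class_biases, features):
    """Multinomial logistic regression: one probability per class (softmax)."""
    logits = [
        b + sum(w * x for w, x in zip(ws, features))
        for ws, b in zip(class_weights, class_biases)
    ]
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights; a real model learns these from training data.
print(binary_score([0.8, -0.5], 0.1, [1.0, 2.0]))
print(multiclass_probabilities(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0], [2.0, 0.0]
))
```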

Page 14: Intro to AWS Machine Learning

AMAZON ML – MODEL

• Regression Model
  • Predicts a numeric value
  • For regression problems
  • Uses the industry-standard learning algorithm called Linear Regression
    • Statistical model to predict the value of y based on a number of variables x1, x2, x3, etc.
  • Examples:
    • What will the temperature be tomorrow?
    • How many units of this product will sell?
    • How much will this house sell for?
• Recipe
  • Attributes and attribute transformations available to train the model
• Model size
  • In MB
  • Directly proportional to the number of patterns stored in the model
• Number of passes
  • The number of times the datasource is used when training the model
• Regularization
  • ML technique to get higher quality models by penalizing extreme weight values, which reduces overfitting to the training data
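
A minimal sketch of how linear regression, the number of passes and regularization fit together, assuming plain stochastic gradient descent with an L2 penalty. The function name, learning rate and data are illustrative and do not come from Amazon ML.

```python
def train_linear_regression(rows, targets, passes=300, learning_rate=0.05, l2=0.0):
    """Fit y = bias + w1*x1 + w2*x2 + ... with stochastic gradient descent.

    passes: how many times the whole training set is used.
    l2:     regularization strength; larger values shrink the weights.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(passes):
        for features, y in zip(rows, targets):
            predicted = bias + sum(w * x for w, x in zip(weights, features))
            error = predicted - y
            weights = [
                w - learning_rate * (error * x + l2 * w)
                for w, x in zip(weights, features)
            ]
            bias -= learning_rate * error
    return weights, bias


# Made-up observations that follow y = 2*x + 1 exactly.
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]

weights, bias = train_linear_regression(rows, targets)
print("weight:", round(weights[0], 3), "bias:", round(bias, 3))
```

Passing a positive `l2` pulls the learned weight toward zero, which is the trade-off regularization makes: a slightly worse fit on the training data in exchange for a model that is less likely to overfit.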

Page 15: Intro to AWS Machine Learning

AMAZON ML – EVALUATIONS

Evaluate the model in Amazon ML

Page 16: Intro to AWS Machine Learning

AMAZON ML – EVALUATIONS

• In Amazon ML, an evaluation measures the quality of the ML model
• Need to evaluate a model to determine if it will do a good job predicting the target on new/future data
• Need training data where the target value is already known to train/evaluate a model
  • Max size of training data: 100 GB (max size of each observation: 100 KB)
• Model Insight
  • Amazon ML provides metrics and insights to review the accuracy of the model
  • Overall success metric of the model
  • Visualizations to explore the accuracy of the model
  • Alerts to check the validity of the evaluation
• Focus on Binary Insights only for this presentation
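
Because an evaluation needs observations whose target is already known but which the model has not seen, labeled data is usually split into a training part and an evaluation part. The sketch below assumes a 70/30 split. The ratio, the seed and the function name are illustrative choices.

```python
import random


def split_observations(observations, train_fraction=0.7, seed=42):
    """Shuffle labeled observations and split them into training and evaluation sets."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten made-up observation IDs stand in for labeled rows.
labeled = list(range(10))
training, evaluation = split_observations(labeled)
print(len(training), "training observations,", len(evaluation), "evaluation observations")
```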

Page 17: Intro to AWS Machine Learning

AMAZON ML – EVALUATIONS – BINARY INSIGHTS

• Prediction score
  • Actual output of the binary prediction
  • Indicates the system's certainty that the given observation has a target value of 1
  • Output scores of observations are between 0 and 1
  • Default threshold score, aka cut-off, is 0.5; this can be changed
  • Any observation that scores above the cut-off is predicted as target = 1, and below the cut-off as target = 0
• Correct predictions
  • True Positive (TP)
    • Predicted value of target = 1, true value of target = 1
  • True Negative (TN)
    • Predicted value of target = 0, true value of target = 0
• Incorrect predictions
  • False Positive (FP)
    • Predicted value of target = 1, true value of target = 0
  • False Negative (FN)
    • Predicted value of target = 0, true value of target = 1
• Area Under the Curve (AUC)
  • Measures the ability of the model to give a higher score to positive examples than to negative examples
  • AUC near 1 indicates the model is highly accurate
  • AUC near 0.5 indicates the model is no better than random guessing
  • AUC near 0 is unusual and typically indicates a problem with the data (the predictions are flipped from reality)
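
The binary insights above can be reproduced with a short sketch: apply a cut-off to the prediction scores, count TP/TN/FP/FN, and compute AUC as the fraction of positive/negative pairs in which the positive example scores higher. The scores and labels are made up, and the pairwise AUC calculation is a standard definition, not Amazon ML's code.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count correct and incorrect predictions for a given cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, actual in zip(scores, labels):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up prediction scores and true target values.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]

print(confusion_counts(scores, labels))
print("AUC:", round(auc(scores, labels), 3))
```

Raising the cut-off trades false positives for false negatives. AUC does not depend on the cut-off at all, which is why it works as an overall quality metric.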

Page 18: Intro to AWS Machine Learning

AMAZON ML – EVALUATIONS – BINARY INSIGHTS – AUC (AWS TUTORIAL)

Page 19: Intro to AWS Machine Learning

AMAZON ML – DEMO – BINARY MODEL

• Demo
  • Simple – predicting will this product sell?
  • Not so simple – predicting will this person survive?
• Checklist
  • Predictive Analytics
  • Amazon Machine Learning (ML)
  • Amazon ML – Key Concepts
  • Amazon ML – Datasources
  • Amazon ML – Models
  • Amazon ML – Evaluations
  • Amazon ML – Demo
• Pricing
  • https://aws.amazon.com/machine-learning/pricing/
  • Data analysis and model building: $0.42/hr
  • Batch predictions: $0.10 per 1,000 predictions (rounded up to the next 1,000)
  • Realtime predictions: $0.0001 per prediction (rounded up to the nearest penny)
  • S3 Standard storage: $0.03/GB/month
• Questions
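
The pricing arithmetic on this slide can be sketched as follows. The rates are the ones quoted in the deck and may have changed since, so check the pricing page before relying on them. Amounts are kept in whole cents to avoid floating-point rounding surprises, and the sketch assumes fractions are rounded up.

```python
# Rates as quoted on the slide, expressed in cents.
COMPUTE_CENTS_PER_HOUR = 42        # data analysis and model building: $0.42/hr
BATCH_CENTS_PER_1000 = 10          # batch predictions: $0.10 per 1,000
REALTIME_PREDICTIONS_PER_CENT = 100  # $0.0001 per prediction = 100 per cent


def ceil_div(a, b):
    """Integer division that rounds up."""
    return -(-a // b)


def batch_prediction_cost_cents(predictions):
    """Batch predictions are billed per 1,000, rounded up to the next 1,000."""
    return ceil_div(predictions, 1000) * BATCH_CENTS_PER_1000


def realtime_prediction_cost_cents(predictions):
    """Realtime predictions are billed per prediction, rounded up to the next cent."""
    return ceil_div(predictions, REALTIME_PREDICTIONS_PER_CENT)


def compute_cost_cents(hours):
    """Compute time for data analysis and model building."""
    return round(hours * COMPUTE_CENTS_PER_HOUR)


print(batch_prediction_cost_cents(1500), "cents for 1,500 batch predictions")
print(realtime_prediction_cost_cents(250), "cents for 250 realtime predictions")
print(compute_cost_cents(2), "cents for 2 compute hours")
```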

Page 20: Intro to AWS Machine Learning

THANK YOU FOR COMING

Links:
http://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html
https://www.kaggle.com/

Contact Info:
Linked-In: Naveen VK
Email: [email protected] (work)
[email protected] (personal)
Github: https://github.com/navnoon23/