developers intro to data science · 2020. 7. 10. · • developers intro to data science learn...
TRANSCRIPT
-
Sarah Guthals, PhD - @sarahguthals
Francesca Lazzeri, PhD - @frlazzeri
Developer's Intro to Data Science
-
Introductions
• Who are we?
• Who are you?
• What is our goal?
-
Sarah Guthals, PhD
Principal Program Manager, Academic Ecosystems, Microsoft
• Obsessed with teaching and learning tech
• Goes to Disneyland whenever possible
• Spouse and Mother of a two-legged toddler and FOUR four-legged children
https://guthals.com/sarah
@sarahguthals
-
Francesca Lazzeri, PhD
Senior Cloud Advocate Lead, AI & ML for Academics + Research, Microsoft
• In love with machine learning algorithms and operations research
• Impressionism art addicted
• Jazz music connoisseur
https://medium.com/@francescalazzeri
@frlazzeri
https://twitter.com/frlazzeri
-
Who we represent
Sarah
• Developer
• Knows data is important
• No clue how to get started
Francesca
• Machine Learning Scientist
• Knows data is important
• Knows how to get started (and finish!)
-
You
Experience coding in a text-based programming language
• Python
• JavaScript
• C#
Experience as a developer
• Built an app from start to finish
• Completed a coding course or bootcamp
Looking to get started with data science
• Partner with data scientists
• Become a data scientist
• Make data-informed decisions in development
-
Goals of this Series
Why Data Science
• What is data science
• Who is involved
• How it can help development
How Data Science
• Explore and prepare data
• Elevate to use machine learning
Ethical Data Science
• How to stay ethical
-
Topics Covered
The Data Science Lifecycle
Formulating business questions
Intro to Machine Learning Algorithms
Preparing data in Visual Studio Code
Using AutoML to train and test data
Using Azure Machine Learning Workspace
Choosing the best model
Selecting ML algorithms
Ethics in data
Model Interpretability
-
• The Developer’s Introduction to Data Science GitHub Repo: aka.ms/DevIntroDS_GitHub
• Developers Intro to Data Science Learn Collection: aka.ms/DevIntroDS_Learn
• The Data Science Lifecycle: aka.ms/DataScienceLifecycle
• Algorithm Cheat Sheet: aka.ms/AlgorithmCheatSheet
• Automated Machine Learning: aka.ms/AutomatedML
• Azure Machine Learning service: aka.ms/AzureMLservice
• Auto ML Featurization: aka.ms/AutoMLfeaturization
• How to Select Machine Learning algorithms: aka.ms/SelectAlgos
• Model Interpretability: aka.ms/ModelInterpretability
Sarah Guthals, PhD - @sarahguthals
Francesca Lazzeri, PhD - @frlazzeri
-
Francesca Lazzeri, PhD
The Data Science Lifecycle
@frlazzeri
www.aka.ms/DataScienceLifecycle
-
Data Science Lifecycle www.aka.ms/DataScienceLifecycle
-
Sarah Guthals, PhD
Defining the Problem
@sarahguthals
https://aka.ms/DataScienceBusinessUnderstanding
-
Use data to improve my app
• Bike sharing app
• Data can inform decisions
• Learn how to iteratively improve
-
Goal: Make my bike sharing app better
Increase the number of bikes rented.
Make sure bikes are where they should be at peak commute times.
Predict how many bikes will be rented within the next hour.
-
Predict how many bikes will be rented in the next hour
-
Data Science Lifecycle
-
Francesca Lazzeri, PhD
Introduction to Machine Learning
@frlazzeri
www.aka.ms/AlgorithmCheatSheet
-
[Diagram: two panels, each labeled "Computation"]
www.aka.ms/AlgorithmCheatSheet
-
Supervised Learning vs Unsupervised Learning
• Regression: how much / how many?
• Classification: which class does it belong to?
• Clustering: are there different groups? Which does it belong to?
• Anomaly Detection: is this weird?
• Recommendation: which option should I choose?
[Diagram labels: supervised learning, unsupervised learning]
www.aka.ms/AlgorithmCheatSheet
-
The Machine Learning Model Building Process
Prepare Data
• Find, Select and/or Create Data
• Apply preprocessing
Train Model
• Apply learning algorithm
• Select Candidate model
Test Model
• Test Candidate Model with unseen data
• Select good enough model
Deploy Model
• Deploy Chosen Model
• Application posts to API
www.aka.ms/AlgorithmCheatSheet
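The four stages of the model building process can be sketched in plain Python. This is only an illustration under stated assumptions: the data is synthetic, the "model" is a simple per-hour average rather than a real learning algorithm, and the function names (prepare_data, train_model, evaluate_model, predict) are hypothetical, not part of any library.

```python
import random
from statistics import mean

def prepare_data(rows, test_fraction=0.25, seed=0):
    """Prepare: drop incomplete rows, then split into train and test sets."""
    clean = [r for r in rows if r["rentals"] is not None]
    rng = random.Random(seed)
    rng.shuffle(clean)
    cut = int(len(clean) * (1 - test_fraction))
    return clean[:cut], clean[cut:]

def train_model(train_rows):
    """Train: learn the average number of rentals for each hour of the day."""
    by_hour = {}
    for r in train_rows:
        by_hour.setdefault(r["hour"], []).append(r["rentals"])
    overall = mean(r["rentals"] for r in train_rows)
    hourly = {h: mean(v) for h, v in by_hour.items()}
    return {"hourly": hourly, "overall": overall}

def predict(model, hour):
    """Deploy: the function an application would reach through an API."""
    return model["hourly"].get(hour, model["overall"])

def evaluate_model(model, test_rows):
    """Test: mean absolute error on rows the model has not seen."""
    return mean(abs(predict(model, r["hour"]) - r["rentals"]) for r in test_rows)

# Synthetic, made-up data: rentals peak at 8:00 and 17:00.
rng = random.Random(42)
rows = []
for day in range(30):
    for hour in range(24):
        base = 100 if hour in (8, 17) else 30
        rows.append({"hour": hour, "rentals": base + rng.randint(-5, 5)})
rows.append({"hour": 3, "rentals": None})  # an incomplete record to clean out

train_rows, test_rows = prepare_data(rows)
model = train_model(train_rows)
error = evaluate_model(model, test_rows)
print(f"Mean absolute error on unseen data: {error:.2f}")
print(f"Predicted rentals at 8:00: {predict(model, 8):.0f}")
```

Keeping the test rows out of training is what makes the reported error a fair estimate of how the model behaves on new data.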
-
Francesca Lazzeri, PhD
Machine Learning Algorithms
@frlazzeri
www.aka.ms/AlgorithmCheatSheet
-
www.aka.ms/algorithmcheatsheet
-
Machine Learning Algorithms
• Understand images and natural language
• Predict between categories
• Predict results based on relationship between values
• Discover patterns in your data
www.aka.ms/algorithmcheatsheet
-
Predict between categories
Two-Class Classification
Goal: Predict between two categories
Answers two-choice questions, like yes or no, true or false
Is this a romantic movie or an adventure movie?
Multiclass Classification
Goal: Predict between several categories
Answers complex questions with multiple possible answers
Is this a romantic movie or an adventure movie or a musical?
www.aka.ms/algorithmcheatsheet
-
Discover patterns in your data
Recommenders
Goal: Generate recommendations
Predicts what someone will be interested in
What will customers buy next?
Clustering
Goal: Discover structure
Separates similar data points into intuitive groups
How can I segment my customers based on their preferences and run a better advertising strategy?
Anomaly Detection
Goal: Find unusual occurrences
Identifies and predicts rare or unusual data points
Can I detect equipment anomalies and predict maintenance operations in industrial plants?
www.aka.ms/algorithmcheatsheet
-
Understand images and natural language
Image Classification
Goal: Classify images
Able to identify images with popular networks
Does this image represent a dog or a cat?
Text Analytics
Goal: Extract information from text
Derives high-quality information from text
What are our customer feedback and reviews on the quality of our products?
www.aka.ms/algorithmcheatsheet
-
Predict results based on relationship between values
Regression
Goal: Predict Values
Makes forecasts by estimating the relationship between values
What are the forecasted sales quantities per item per store for the next 4 weeks?
www.aka.ms/algorithmcheatsheet
-
Predict how many bikes will be rented in the next hour
Regression: Predict outcomes based on relationship between values
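To make "relationship between values" concrete, here is a minimal least-squares sketch. The temperature and rental numbers are made up for illustration, and fit_line is a hypothetical helper, not a library function; a real project would more likely rely on a library or on automated ML.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations: temperature in degrees Celsius and bikes rented in that hour.
temperatures = [5, 10, 15, 20, 25]
rentals = [20, 30, 40, 50, 60]

slope, intercept = fit_line(temperatures, rentals)
predicted = slope * 18 + intercept  # forecast for an hour at 18 degrees
print(f"rentals = {slope:.1f} * temperature + {intercept:.1f}")
print(f"Predicted rentals at 18 degrees: {predicted:.0f}")
```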
-
Francesca Lazzeri, PhD
Automated ML
@frlazzeri
www.aka.ms/AutomatedML
-
www.aka.ms/algorithmcheatsheet
-
Challenges with Hyperparameter Selection
• The search space to explore (i.e. evaluating all possible combinations) is huge.
• Sparsity of good configurations: very few of all possible configurations are optimal.
• Evaluating each configuration is resource- and time-consuming.
• Time and resources are limited.
www.aka.ms/AutomatedML
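A small sketch shows how quickly the search space grows. The hyperparameter names, candidate values, and the five-minute training time are all assumptions chosen for illustration; random sampling is shown as one common way to work within a limited budget, not as what automated ML does internally.

```python
import itertools
import random

# Hypothetical hyperparameter grid for a tree-based model (names and values are illustrative).
grid = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9, 11],
    "n_estimators": [50, 100, 200, 400],
    "subsample": [0.6, 0.8, 1.0],
}

# Every combination of values is one configuration that would need training and evaluation.
all_configs = list(itertools.product(*grid.values()))
print(f"Configurations in the full grid: {len(all_configs)}")  # 4 * 5 * 4 * 3 = 240

minutes_per_run = 5  # assumed training time per configuration
exhaustive_hours = len(all_configs) * minutes_per_run / 60
print(f"Exhaustive search: {exhaustive_hours:.0f} hours")

# With a limited budget, sample a handful of configurations at random instead.
rng = random.Random(0)
budget = 10
sampled = [dict(zip(grid.keys(), combo)) for combo in rng.sample(all_configs, budget)]
print(f"Trying {len(sampled)} of {len(all_configs)} configurations")
```

Adding one more hyperparameter with five candidate values would multiply the grid by five, which is why exhaustive search stops being practical so quickly.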
-
Automated Machine Learning www.aka.ms/AutomatedML
• Automated machine learning is the process of automating the time-consuming, iterative tasks of machine learning model development.
• Data scientists, analysts and developers across industries can use automated ML to:
o Implement machine learning solutions without extensive programming knowledge
o Save time and resources
o Leverage data science best practices
o Provide agile problem-solving
-
How Automated ML works www.aka.ms/AutomatedML
-
AutoMLConfig class www.aka.ms/AutomatedML
It represents configuration for submitting an automated ML experiment in Azure Machine Learning.
www.aka.ms/AutoMLConfig-Class
-
Sarah Guthals, PhD
Azure Machine Learning
@sarahguthals
https://aka.ms/AzureMLGettingStarted
-
Sarah Guthals, PhD
Set Up Local Environment with VS Code
@sarahguthals
https://aka.ms/PythonInVSCode
-
Sarah Guthals, PhD
Jupyter Notebooks in VS Code
@sarahguthals
https://aka.ms/DataScienceInVSCode
-
Sarah Guthals, PhD
AzureML in VS Code Notebooks
@sarahguthals
https://aka.ms/IntroToAzureMLSDK
-
Sarah Guthals, PhD
Connect Data Between AzureML and VS Code
@sarahguthals
https://aka.ms/AzureMLDatastore
-
Sarah Guthals, PhD
Split Train and Test Data
@sarahguthals
https://aka.ms/AzureMLRepository
-
How to create your AutoML Config object
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/AutomatedML
-
AutoMLConfig www.aka.ms/AutomatedML
Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.
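The settings passed to AutoMLConfig might look like the sketch below. It assumes the Azure ML Python SDK v1 (the azureml.train.automl package) and a forecasting task; the column names "cnt" and "date" are hypothetical placeholders, and the exact set of accepted parameters varies by SDK version, so check the AutoMLConfig class reference before relying on any of them. The import is deferred so the sketch loads even without the SDK installed.

```python
# Settings for a forecasting experiment, collected as keyword arguments.
# Parameter names follow the v1 SDK as an assumption; verify them against the class reference.
automl_settings = {
    "task": "forecasting",
    "primary_metric": "normalized_root_mean_squared_error",
    "label_column_name": "cnt",    # hypothetical: column holding the rental count
    "time_column_name": "date",    # hypothetical: column holding the timestamp
    "n_cross_validations": 3,
    "experiment_timeout_hours": 0.3,
    "enable_early_stopping": True,
}

def build_automl_config(training_data, compute_target=None):
    """Create the AutoMLConfig object; requires the Azure ML SDK (v1) to be installed."""
    # Imported lazily so this sketch can be loaded without the SDK.
    from azureml.train.automl import AutoMLConfig
    return AutoMLConfig(training_data=training_data,
                        compute_target=compute_target,
                        **automl_settings)

print(sorted(automl_settings))
```

Keeping the settings in a dictionary makes it easy to log them alongside the experiment and to reuse them across runs.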
-
Sarah Guthals, PhD - @sarahguthals
Francesca Lazzeri, PhD - @frlazzeri
Configure and run AutoML Config
www.aka.ms/AutomatedML
-
Sarah Guthals, PhD
Train Model with AutoML
@sarahguthals
https://aka.ms/IntroToAutoML
-
Azure Machine Learning
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/AMLservice
-
Azure Machine Learning
Authoring tools: Jupyter Notebooks, Automated ML, Designer
Assets: Datasets, Experiments, ML Workflow Pipelines, Models, Deployments
Management: Compute, Datastores, Workspaces
www.aka.ms/AMLservice
-
Sarah Guthals, PhD
Set Up AzureML Studio
@sarahguthals
https://ml.azure.com/
-
Best Model Selection and Featurization Process
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/AutomatedML
www.aka.ms/AutoMLfeaturization
-
Best Model Selection www.aka.ms/AutomatedML
-
Data Featurization www.aka.ms/AutoMLfeaturization
-
Best Model Selection and Featurization in Azure ML
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/AutomatedML
-
Sarah Guthals, PhD
Evaluate and Retrieve Forecast
@sarahguthals
https://aka.ms/AzureMLRepository
-
Sarah Guthals, PhD
Score Model Using Metrics
@sarahguthals
https://aka.ms/EvaluateModel
-
Sarah Guthals, PhD
Deploy Model as Web Service
@sarahguthals
https://aka.ms/DeployModel
-
Sarah Guthals, PhD
What We Know
@sarahguthals
https://aka.ms/DeployModel
-
Metrics Help Guide Future Questions
Without a forecast horizon, our model is not accurate enough.
Using a forecast horizon we can see MAPE is closer to 10%, but we shouldn’t use a 3-day forecast.
Visualizing APE over Horizon, we see similarly that day-3 is not likely to yield an accurate prediction, but day-14 is.
Our webservice, using the best_run model, can (mostly) predict with under 10% absolute percentage error across the horizon.
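For readers new to these metrics, the sketch below computes absolute percentage error (APE) for each day of a forecast and the mean (MAPE) across the horizon. The rental counts and forecasts are made up for illustration and are not results from the series.

```python
def absolute_percentage_errors(actual, predicted):
    """APE for each point: |actual - predicted| / |actual| * 100."""
    return [abs(a - p) / abs(a) * 100 for a, p in zip(actual, predicted)]

def mape(actual, predicted):
    """Mean absolute percentage error across all points."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)

# Made-up daily rental counts and forecasts for a 4-day horizon.
actual = [200, 250, 400, 500]
predicted = [180, 275, 300, 490]

apes = absolute_percentage_errors(actual, predicted)
for day, ape in enumerate(apes, start=1):
    print(f"day {day}: APE = {ape:.1f}%")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

Looking at APE per day rather than only the average is what reveals that some points on the horizon are much less reliable than others.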
-
Predict how many bikes will be rented in the next hour
-
Predict how many bikes will be rented in the next two weeks
-
Model Deployment Recap
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/AzureMLservice
-
Model Deployment Recap www.aka.ms/AzureMLservice
The workflow is similar no matter where you deploy your model:
1. Register the model
2. Prepare to deploy
3. Deploy the model to the compute target
4. Test the deployed model, also called a web service
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where#target
-
Model Deployment Recap www.aka.ms/AzureMLservice
Step 1: Register the model
• Register a model from an experiment run (with SDK)
• Register a model from a local file (with SDK and ONNX)
-
Model Deployment Recap www.aka.ms/AzureMLservice
Step 2: Prepare to deploy
• Define inference environment
• Define scoring code
• Define inference configuration
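Scoring code for an Azure ML web service conventionally exposes two functions: init(), called once at startup, and run(), called for each request. The sketch below follows that shape but is only an illustration: MeanModel is a hypothetical stand-in for a trained model, and the "data" key in the request body is an assumed payload layout. A real script would load the registered model file inside init().

```python
import json

model = None  # populated once by init()

class MeanModel:
    """Stand-in for a trained model; a real script would load a registered model file."""
    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]

def init():
    """Called once when the service starts: load the model into memory."""
    global model
    model = MeanModel()

def run(raw_data):
    """Called for each request: parse the JSON body, score it, return a serializable result."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"result": model.predict(rows)}
    except Exception as exc:
        # Report the problem to the caller instead of crashing the service.
        return {"error": str(exc)}

# Local smoke test of the script.
init()
response = run(json.dumps({"data": [[1, 2, 3], [10, 20, 30]]}))
print(response)
```

Running the script locally like this before deployment catches parsing and loading mistakes much faster than debugging a deployed container.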
-
Model Deployment Recap www.aka.ms/AzureMLservice
Step 3: Deploy the model to the compute target
• Choose a compute target
• Define your deployment configuration
-
Choose a compute target
-
Model Deployment Recap www.aka.ms/AzureMLservice
Step 4: Test the deployed model, also called a web service
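Testing a deployed web service usually means posting JSON to its scoring URI. The sketch below only builds the request with the standard library and does not send it; the URI, the payload layout, and the helper name build_scoring_request are assumptions for illustration, and the Authorization header applies only when key-based authentication is enabled on the service.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build (but do not send) a POST request carrying rows to score as JSON."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")

# Hypothetical local endpoint; replace with the scoring URI of your own service.
request = build_scoring_request("http://localhost:8890/score", [[1, 2, 3]])
print(request.get_method(), request.full_url)

# Once a service is running, the request could be sent with:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```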
-
How to select Machine Learning Algorithms
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/SelectAlgos
-
www.aka.ms/SelectAlgos
-
Sarah Guthals, PhD
Ethics in Data
@sarahguthals
https://aka.ms/DeployModel
-
Machine Learning Algorithms
• Understand images and natural language
• Predict between categories
• Predict results based on relationship between values
• Discover patterns in your data
www.aka.ms/algorithmcheatsheet
-
Predict between categories
Two-Class Classification
Goal: Predict between two categories
Answers two-choice questions, like yes or no, true or false
Is this a romantic movie or an adventure movie?
Multiclass Classification
Goal: Predict between several categories
Answers complex questions with multiple possible answers
Is this a romantic movie or an adventure movie or a musical?
Ethical Question: What if adventure movies only had a male cast?
www.aka.ms/algorithmcheatsheet
-
Discover patterns in your data
Recommenders
Goal: Generate recommendations
Predicts what someone will be interested in
What will customers buy next?
Clustering
Goal: Discover structure
Separates similar data points into intuitive groups
How can I segment my customers based on their preferences and run a better advertising strategy?
Anomaly Detection
Goal: Find unusual occurrences
Identifies and predicts rare or unusual data points
Can I detect equipment anomalies and predict maintenance operations in industrial plants?
Ethical Question: What if my data only consisted of single men without children?
www.aka.ms/algorithmcheatsheet
-
Understand images and natural language
Image Classification
Goal: Classify images
Able to identify images with popular networks
Does this image represent a dog or a cat?
Text Analytics
Goal: Extract information from text
Derives high-quality information from text
What are our customer feedback and reviews on the quality of our products?
Ethical Question: What if my images were only of large dogs?
Ethical Question: What if we only received feedback from customers with iPhones?
www.aka.ms/algorithmcheatsheet
-
Predict results based on relationship between values
Regression
Goal: Predict Values
Makes forecasts by estimating the relationship between values
What are the forecasted sales quantities per item per store for the next 4 weeks?
Ethical Question: What if I based this off of a holiday season?
www.aka.ms/algorithmcheatsheet
-
Check in On Your Process
What data are we collecting, and why?
What data are we not collecting, and why?
What data sources are we missing?
What questions are we asking?
How are we ensuring we improve ethics in our process?
-
Model Interpretability
Francesca Lazzeri, PhD
@frlazzeri
www.aka.ms/ModelInterpretability
-
Interpretability is critical for data scientists, auditors, and business decision makers alike to ensure compliance with company policies, industry standards, and government regulations:
• Data scientists need the ability to explain their models to executives and stakeholders, so they can understand the value and accuracy of their findings. They also require interpretability to debug their models and make informed decisions about how to improve them.
• Legal auditors require tools to validate models with respect to regulatory compliance and monitor how models' decisions are impacting humans.
• Business decision makers need peace of mind by having the ability to provide transparency for end users. This allows them to earn and maintain trust.
www.aka.ms/ModelInterpretability
-
The interpretability classes are made available through multiple SDK packages:
• azureml.explain.model, the main package, containing functionalities supported by Microsoft
• azureml.contrib.explain.model, preview and experimental functionalities that you can try
• azureml.train.automl.automlexplainer, the package for interpreting automated machine learning models
Using the classes and methods in the SDK, you can:
• Explain model prediction by generating feature importance values for the entire model and/or individual datapoints.
• Achieve model interpretability on real-world datasets at scale, during training and inference.
• Use an interactive visualization dashboard to discover patterns in data and explanations at training time.
www.aka.ms/ModelInterpretability
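To give a feel for what a global feature importance value means, the sketch below uses permutation importance: shuffle one feature and measure how much the error grows. This is a plain-Python illustration of the general idea, not the API of the SDK packages listed above; the stand-in model, the feature names, and the data are all made up.

```python
import random

def model_predict(row):
    """Stand-in model: strong effect of temperature, weak effect of humidity, none of day_id."""
    return 3.0 * row["temperature"] - 0.5 * row["humidity"]

def mean_abs_error(rows, targets):
    return sum(abs(model_predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature, seed=0):
    """Increase in error after shuffling one feature's values across rows."""
    baseline = mean_abs_error(rows, targets)
    shuffled_values = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled_values)
    shuffled_rows = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return mean_abs_error(shuffled_rows, targets) - baseline

# Synthetic data; targets come straight from the stand-in model, so the baseline error is zero.
rng = random.Random(1)
rows = [{"temperature": rng.uniform(0, 30), "humidity": rng.uniform(20, 90), "day_id": i}
        for i in range(200)]
targets = [model_predict(r) for r in rows]

importances = {f: permutation_importance(rows, targets, f)
               for f in ("temperature", "humidity", "day_id")}
for feature, value in importances.items():
    print(f"{feature}: {value:.2f}")
```

A feature the model ignores, such as day_id here, scores zero because shuffling it cannot change any prediction.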
-
Global visualizations
www.aka.ms/ModelInterpretability
-
Local visualizations
www.aka.ms/ModelInterpretability
-
Interpretability in Automated ML
• To enable feature importance for a trained ensemble model, use the explain_model() function.
• To enable feature importance for each individual run prior to training, set the model_explainability parameter to True in the AutoMLConfig object, along with providing validation data. Then use the retrieve_model_explanation() function.
www.aka.ms/ModelInterpretability
-
Sarah Guthals, PhD - @sarahguthals
Francesca Lazzeri, PhD - @frlazzeri
Conclusions
-
• The Developer’s Introduction to Data Science GitHub Repo: aka.ms/DevIntroDS_GitHub
• Developers Intro to Data Science Learn Collection: aka.ms/DevIntroDS_Learn
• The Data Science Lifecycle: aka.ms/DataScienceLifecycle
• Algorithm Cheat Sheet: aka.ms/AlgorithmCheatSheet
• Automated Machine Learning: aka.ms/AutomatedML
• Azure Machine Learning service: aka.ms/AzureMLservice
• Auto ML Featurization: aka.ms/AutoMLfeaturization
• How to Select Machine Learning algorithms: aka.ms/SelectAlgos
• Model Interpretability: aka.ms/ModelInterpretability
Sarah Guthals, PhD - @sarahguthals
Francesca Lazzeri, PhD - @frlazzeri