developers intro to data science · 2020. 7. 10. · • developers intro to data science learn...

Sarah Guthals, PhD - @sarahguthals Francesca Lazzeri, PhD - @frlazzeri Developer's Intro to Data Science


Post on 29-Jan-2021


  • Sarah Guthals, PhD - @sarahguthals

    Francesca Lazzeri, PhD - @frlazzeri

    Developer's Intro to Data Science

  • Introductions

    • Who are we?

    • Who are you?

    • What is our goal?

  • Sarah Guthals, PhD

    Principal Program Manager, Academic Ecosystems, Microsoft

    • Obsessed with teaching and learning tech

    • Goes to Disneyland whenever possible

    • Spouse and Mother of a two-legged toddler and FOUR four-legged children

    https://guthals.com/sarah

    @sarahguthals

    https://twitter.com/sarahguthals

  • Francesca Lazzeri, PhD

    Senior Cloud Advocate Lead, AI & ML for Academics + Research, Microsoft

    • In love with machine learning algorithms and operations research

    • Addicted to Impressionist art

    • Jazz music connoisseur

    https://medium.com/@francescalazzeri

    @frlazzeri

    https://twitter.com/frlazzeri

  • Who we represent

    Sarah

    • Developer

    • Knows data is important

    • No clue how to get started

    Francesca

    • Machine Learning Scientist

    • Knows data is important

    • Knows how to get started (and finish!)

  • You

    Experience coding in a text-based programming language

    • Python

    • JavaScript

    • C#

    Experience as a developer

    • Built an app from start to finish

    • Completed a coding course or bootcamp

    Looking to get started with data science

    • Partner with data scientists

    • Become a data scientist

    • Make data-informed decisions in development

  • Goals of this Series

    Why Data Science

    • What is data science

    • Who is involved

    • How it can help development

    How Data Science

    • Explore and prepare data

    • Elevate to use machine learning

    Ethical Data Science

    • How to stay ethical

  • Topics Covered

    The Data Science Lifecycle

    Formulating business questions

    Intro to Machine Learning Algorithms

    Preparing data in Visual Studio Code

    Using AutoML to train and test data

    Using Azure Machine Learning Workspace

    Choosing the best model

    Selecting ML algorithms

    Ethics in data

    Model Interpretability

  • • The Developer’s Introduction to Data Science GitHub Repo: aka.ms/DevIntroDS_GitHub

    • Developers Intro to Data Science Learn Collection: aka.ms/DevIntroDS_Learn

    • The Data Science Lifecycle: aka.ms/DataScienceLifecycle

    • Algorithm Cheat Sheet: aka.ms/AlgorithmCheatSheet

    • Automated Machine Learning: aka.ms/AutomatedML

    • Azure Machine Learning service: aka.ms/AzureMLservice

    • Auto ML Featurization: aka.ms/AutoMLfeaturization

    • How to Select Machine Learning algorithms: aka.ms/SelectAlgos

    • Model Interpretability: aka.ms/ModelInterpretability

    Sarah Guthals, PhD - @sarahguthals

    Francesca Lazzeri, PhD - @frlazzeri

  • Francesca Lazzeri, PhD

    The Data Science Lifecycle

    @frlazzeri

    www.aka.ms/DataScienceLifecycle

    https://twitter.com/frlazzeri

  • Data Science Lifecycle www.aka.ms/DataScienceLifecycle


  • Sarah Guthals, PhD

    Defining the Problem

    @sarahguthals

    https://aka.ms/DataScienceBusinessUnderstanding

    https://twitter.com/sarahguthals

  • Use data to improve my app

    Bike sharing app

    Data can inform decisions

    Learn how to iteratively improve

  • Goal: Make my bike sharing app better

    Increase the number of bikes rented.

    Make sure bikes are where they should be at peak commute times.

    Predict how many bikes will be rented within the next hour.

  • Predict how many bikes will be rented in the next hour

  • Data Science Lifecycle

  • Francesca Lazzeri, PhD

    Introduction to Machine Learning

    @frlazzeri

    www.aka.ms/AlgorithmCheatSheet

    https://twitter.com/frlazzeri


  • Supervised Learning vs Unsupervised Learning

    Regression: how much / how many?

    Classification: which class does it belong to?

    Clustering: are there different groups? Which does it belong to?

    Anomaly Detection: is this weird?

    Recommendation: which option should I choose?

    [Diagram labels: supervised learning, unsupervised learning]

    www.aka.ms/AlgorithmCheatSheet

  • The Machine Learning Model Building Process

    Prepare Data

    • Find, select and/or create data

    • Apply preprocessing

    Train Model

    • Apply learning algorithm

    • Select candidate model

    Test Model

    • Test candidate model with unseen data

    • Select good enough model

    Deploy Model

    • Deploy chosen model

    • Application posts to API

    www.aka.ms/AlgorithmCheatSheet
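
The four phases of the model building process can be sketched end to end in plain Python. This is a minimal illustration and not the presenters' code: the hourly rental records, the per-hour mean baseline, and every function name are hypothetical, and "deploy" is reduced to a function that an application could call behind an API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hourly records: (hour_of_day, bikes_rented). None marks a missing count.
raw = [(8, 120), (9, 90), (17, 150), (8, 130), (9, None), (17, 160),
       (8, 125), (9, 95), (17, 155)]

def prepare(records):
    """Prepare data: drop rows with a missing target."""
    return [(h, y) for h, y in records if y is not None]

def train(rows):
    """Train model: learn the mean rentals for each hour of day."""
    by_hour = defaultdict(list)
    for h, y in rows:
        by_hour[h].append(y)
    return {h: mean(ys) for h, ys in by_hour.items()}

def evaluate(model, rows):
    """Test model: mean absolute error on rows the model has not seen."""
    return mean(abs(model[h] - y) for h, y in rows)

def predict(model, hour):
    """Deploy model: the function an application would call."""
    return model[hour]

clean = prepare(raw)
train_rows, test_rows = clean[:5], clean[5:]  # keep the last rows unseen
model = train(train_rows)
mae = evaluate(model, test_rows)
```

Whether `mae` is "good enough" is a judgment made against the business question, which is why the test phase ends with a selection step rather than a fixed threshold.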

  • Francesca Lazzeri, PhD

    Machine Learning Algorithms

    @frlazzeri

    www.aka.ms/AlgorithmCheatSheet

    https://twitter.com/frlazzeri

  • www.aka.ms/algorithmcheatsheet

    http://www.aka.ms/algorithmcheatsheet

  • Machine Learning Algorithms

    • Understand images and natural language

    • Predict between categories

    • Predict results based on relationship between values

    • Discover patterns in your data

    www.aka.ms/algorithmcheatsheet

  • Predict between categories

    Two-Class Classification

    Goal: Predict between two categories

    Answers two-choice questions, like yes or no, true or false

    Is this a romantic movie or an adventure movie?

    Multiclass Classification

    Goal: Predict between several categories

    Answers complex questions with multiple possible answers

    Is this a romantic movie or an adventure movie or a musical?

    www.aka.ms/algorithmcheatsheet

  • Discover patterns in your data

    Recommenders

    Goal: Generate recommendations

    Predicts what someone will be interested in

    What will customers buy next?

    Clustering

    Goal: Discover structure

    Separates similar data points into intuitive groups

    How can I segment my customers based on their preferences and run a better advertising strategy?

    Anomaly Detection

    Goal: Find unusual occurrences

    Identifies and predicts rare or unusual data points

    Can I detect equipment anomalies and predict maintenance operations in industrial plants?

    www.aka.ms/algorithmcheatsheet
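
Taking anomaly detection as one example of pattern discovery, the idea of "finding unusual occurrences" can be shown with a z-score rule. This is only a sketch of one simple technique, not the algorithm from the cheat sheet; the sensor readings and the threshold of 2 standard deviations are assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical equipment temperature readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2, 20.1]
anomalies = zscore_anomalies(readings)
```

No labels are needed here: the rule looks only at how the readings are distributed.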

  • Understand images and natural language

    Image Classification

    Goal: Classify images

    Able to identify images with popular networks

    Does this image represent a dog or a cat?

    Text Analytics

    Goal: Extract information from text

    Derives high-quality information from text

    What do customer feedback and reviews say about the quality of our products?

    www.aka.ms/algorithmcheatsheet

  • Predict results based on relationship between values

    Regression

    Goal: Predict values

    Makes forecasts by estimating the relationship between values

    What are the forecasted sales quantities per item per store for the next 4 weeks?

    www.aka.ms/algorithmcheatsheet

  • Predict how many bikes will be rented in the next hour

    Regression: Predict outcomes based on relationship between values
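
To make "predict outcomes based on relationship between values" concrete, here is a small ordinary least squares fit in plain Python. It is a sketch under stated assumptions: the temperature and rental figures are invented, and a single straight line is far simpler than anything a real bike-sharing forecast would use.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: temperature in Celsius vs bikes rented in the following hour.
temps = [10, 15, 20, 25, 30]
rentals = [50, 70, 90, 110, 130]

slope, intercept = fit_line(temps, rentals)
forecast = slope * 22 + intercept  # predicted rentals at 22 degrees
```

The fitted slope is the "relationship between values": how many more bikes are expected per extra degree.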

  • Francesca Lazzeri, PhD

    Automated ML

    @frlazzeri

    www.aka.ms/AutomatedML

    https://twitter.com/frlazzeri

  • www.aka.ms/algorithmcheatsheet

    http://www.aka.ms/algorithmcheatsheet

  • Challenges with Hyperparameter Selection

    • The search space to explore (i.e., evaluating all possible combinations) is huge.

    • Sparsity of good configurations. Very few of all possible configurations are optimal.

    • Evaluating each configuration is resource and time consuming.

    • Time and resources are limited.

    www.aka.ms/AutomatedML
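
One common response to these challenges is to sample the search space under a fixed budget instead of evaluating every combination. The sketch below shows plain random search; it is not a description of how Automated ML searches internally. The search space, the `validation_error` formula, and the budget of 10 runs are all hypothetical stand-ins.

```python
import math
import random

# Hypothetical search space: 4 * 4 * 3 = 48 possible configurations.
space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [2, 4, 8, 16],
    "n_estimators": [50, 100, 200],
}

def validation_error(config):
    """Stand-in for an expensive train-and-validate run (made-up formula)."""
    return (abs(config["learning_rate"] - 0.1)
            + abs(config["max_depth"] - 8) / 8
            + abs(config["n_estimators"] - 100) / 100)

def random_search(space, budget, seed=0):
    """Evaluate only `budget` randomly drawn configurations and keep the best one."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()} for _ in range(budget)]
    return min(candidates, key=validation_error)

total = math.prod(len(v) for v in space.values())
best = random_search(space, budget=10)
```

Even in this toy space the budget covers about a fifth of the combinations; with more hyperparameters the fraction shrinks quickly, which is the motivation for automating the search.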

  • Automated Machine Learning www.aka.ms/AutomatedML

    • Automated machine learning is the process of automating the time-consuming, iterative tasks of machine learning model development.

    • Data scientists, analysts and developers across industries can use automated ML to:

    o Implement machine learning solutions without extensive programming knowledge

    o Save time and resources

    o Leverage data science best practices

    o Provide agile problem-solving

  • How Automated ML works www.aka.ms/AutomatedML

  • AutoMLConfig class www.aka.ms/AutomatedML

    It represents configuration for submitting an automated ML experiment in Azure Machine Learning.

    www.aka.ms/AutoMLConfig-Class

  • Sarah Guthals, PhD

    Azure Machine Learning

    @sarahguthals

    https://aka.ms/AzureMLGettingStarted

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Set Up Local Environment with VS Code

    @sarahguthals

    https://aka.ms/PythonInVSCode

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Jupyter Notebooks in VS Code

    @sarahguthals

    https://aka.ms/DataScienceInVSCode

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    AzureML in VS Code Notebooks

    @sarahguthals

    https://aka.ms/IntroToAzureMLSDK

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Connect Data Between AzureML and VS Code

    @sarahguthals

    https://aka.ms/AzureMLDatastore

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Split Train and Test Data

    @sarahguthals

    https://aka.ms/AzureMLRepository

    https://twitter.com/sarahguthals
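
The slide only names the step, so the following is a sketch rather than the code from the session. For a forecasting problem, the test set is usually the most recent slice of the data rather than a random sample, so that the model is scored on a period it has not seen. The hourly rows, the field names, and the cutoff date are hypothetical.

```python
from datetime import datetime, timedelta

def split_by_time(rows, cutoff):
    """Rows before `cutoff` train the model; rows at or after it are held out."""
    train = [r for r in rows if r["timestamp"] < cutoff]
    test = [r for r in rows if r["timestamp"] >= cutoff]
    return train, test

# Hypothetical data: three days of hourly rental counts.
start = datetime(2020, 7, 1)
rows = [{"timestamp": start + timedelta(hours=h), "rentals": 100 + h % 24}
        for h in range(72)]

train, test = split_by_time(rows, cutoff=datetime(2020, 7, 3))
```

Here the first two days train the model and the third day is kept back for testing.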

  • How to create your AutoML Config object

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/AutomatedML

    https://twitter.com/frlazzeri

  • AutoMLConfig www.aka.ms/AutomatedML

    Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.
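
A hedged sketch of what such a configuration might look like for the bike rental forecast follows. The transcript does not include the presenters' settings, so every value here is an assumption: the column names `cnt` and `date`, the 14-day horizon, the metric, and the timeout. The parameter names are believed to match the Azure ML SDK v1 `AutoMLConfig` and `ForecastingParameters` classes, but they have changed between SDK releases and should be checked against the installed version.

```python
# Hypothetical settings for a bike-rental forecast; all values are assumptions.
automl_settings = {
    "task": "forecasting",
    "primary_metric": "normalized_root_mean_squared_error",
    "label_column_name": "cnt",        # assumed name of the rental-count column
    "n_cross_validations": 3,
    "experiment_timeout_hours": 0.3,
    "enable_early_stopping": True,
}
forecast_settings = {
    "time_column_name": "date",        # assumed name of the timestamp column
    "forecast_horizon": 14,
}

def build_automl_config(training_data, compute_target):
    """Create the config object; requires the Azure ML SDK v1 to be installed."""
    # Imported lazily so the settings above can be inspected without the SDK.
    from azureml.automl.core.forecasting_parameters import ForecastingParameters
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        forecasting_parameters=ForecastingParameters(**forecast_settings),
        **automl_settings,
    )
```

In the SDK v1 workflow the resulting object is typically passed to an experiment's `submit` method to start the run.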

  • Sarah Guthals, PhD - @sarahguthals

    Francesca Lazzeri, PhD - @frlazzeri

    Configure and run AutoML Config

    www.aka.ms/AutomatedML

  • Sarah Guthals, PhD

    Train Model with AutoML

    @sarahguthals

    https://aka.ms/IntroToAutoML

    https://twitter.com/sarahguthals

  • Azure Machine Learning

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/AMLservice

    https://twitter.com/frlazzeri

  • Azure Machine Learning

    Authoring tools: Jupyter Notebooks, Automated ML, Designer

    Assets: Datasets, Experiments, ML Workflow Pipelines, Models, Deployments

    Management: Compute, Datastores, Workspaces

    www.aka.ms/AMLservice

  • Sarah Guthals, PhD

    Set Up AzureML Studio

    @sarahguthals

    https://ml.azure.com/

    https://twitter.com/sarahguthals

  • Best Model Selection and Featurization Process

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/AutomatedML

    www.aka.ms/AutoMLfeaturization

    https://twitter.com/frlazzeri

  • Best Model Selection www.aka.ms/AutomatedML

  • Data Featurization www.aka.ms/AutoMLfeaturization

  • Best Model Selection and Featurization in Azure ML

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/AutomatedML

    https://twitter.com/frlazzeri

  • Sarah Guthals, PhD

    Evaluate and Retrieve Forecast

    @sarahguthals

    https://aka.ms/AzureMLRepository

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Score Model Using Metrics

    @sarahguthals

    https://aka.ms/EvaluateModel

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    Deploy Model as Web Service

    @sarahguthals

    https://aka.ms/DeployModel

    https://twitter.com/sarahguthals

  • Sarah Guthals, PhD

    What We Know

    @sarahguthals

    https://aka.ms/DeployModel

    https://twitter.com/sarahguthals

  • Metrics Help Guide Future Questions

    Without a forecast horizon, our model is not accurate enough.

    Using a forecast horizon, we can see MAPE is closer to 10%, but we shouldn’t use a 3-day forecast.

    Visualizing APE over horizon, we see similarly that day 3 is not likely to yield an accurate prediction, but day 14 is.

    Our web service, using the best_run model, can (mostly) predict with under 10% absolute percentage error across the horizon.
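
For readers unfamiliar with the two metrics, absolute percentage error (APE) is the size of one prediction's error relative to the actual value, and MAPE is the mean of those errors. The sketch below computes both, grouped by forecast-horizon day. The numbers are invented for illustration and are not the results shown in the session.

```python
from collections import defaultdict

def ape(actual, predicted):
    """Absolute percentage error for one prediction, in percent."""
    return abs(actual - predicted) / abs(actual) * 100

def mape_by_horizon(rows):
    """Average APE separately for each forecast-horizon day."""
    grouped = defaultdict(list)
    for horizon_day, actual, predicted in rows:
        grouped[horizon_day].append(ape(actual, predicted))
    return {day: sum(errs) / len(errs) for day, errs in sorted(grouped.items())}

# Hypothetical (horizon_day, actual, predicted) triples.
rows = [(1, 100, 95), (1, 200, 210), (2, 100, 120), (2, 50, 40)]

per_day = mape_by_horizon(rows)
overall = sum(ape(a, p) for _, a, p in rows) / len(rows)
```

Breaking the error down per horizon day is what reveals that some days in the horizon forecast better than others, which a single overall MAPE would hide.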

  • Predict how many bikes will be rented in the next hour

  • Predict how many bikes will be rented in the next two weeks

  • Model Deployment Recap

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/AzureMLservice

    https://twitter.com/frlazzeri

  • Model Deployment Recap www.aka.ms/AzureMLservice

    The workflow is similar no matter where you deploy your model:

    1. Register the model

    2. Prepare to deploy

    3. Deploy the model to the compute target

    4. Test the deployed model, also called a web service

    https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where#target

  • Model Deployment Recap www.aka.ms/AzureMLservice

    Step 1: Register the model

    • Register a model from an experiment run (with SDK)

    • Register a model from a local file (with SDK and ONNX)

  • Model Deployment Recap www.aka.ms/AzureMLservice

    Step 2: Prepare to deploy

    • Define inference environment

    • Define scoring code

    • Define inference configuration
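
The "scoring code" in this step is usually a small entry script. Azure ML entry scripts conventionally expose an `init()` function that runs once at start-up and a `run()` function that handles each request; the sketch below follows that shape in plain Python. Everything else is an assumption made for illustration: the hard-coded per-hour table standing in for a registered model, and the `{"data": [...]}` request payload.

```python
import json

model = None  # populated once by init()

def init():
    """Called once when the service starts: load the model into memory."""
    global model
    # Assumption: a real script would load the registered model file here.
    # A hard-coded per-hour table stands in for it in this sketch.
    model = {8: 125.0, 9: 90.0, 17: 155.0}

def run(raw_data):
    """Called per request: parse JSON input and return JSON-serializable output."""
    try:
        hours = json.loads(raw_data)["data"]
        return {"predictions": [model[h] for h in hours]}
    except Exception as exc:  # report the problem to the caller instead of crashing
        return {"error": str(exc)}

# Local smoke test of the two entry points.
init()
response = run(json.dumps({"data": [8, 17]}))
```

Calling `init()` and `run()` locally like this is a cheap way to catch errors before the script is packaged into a deployment.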

  • Model Deployment Recap www.aka.ms/AzureMLservice

    Step 3: Deploy the model to the compute target

    • Choose a compute target

    • Define your deployment configuration

  • Choose a compute target

  • Model Deployment Recap www.aka.ms/AzureMLservice

    Step 4: Test the deployed model, also called a web service

  • How to select Machine Learning Algorithms

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/SelectAlgos

    https://twitter.com/frlazzeri

  • www.aka.ms/SelectAlgos

    http://www.aka.ms/SelectAlgos

  • Sarah Guthals, PhD

    Ethics in Data

    @sarahguthals

    https://aka.ms/DeployModel

    https://twitter.com/sarahguthals

  • Machine Learning Algorithms

    • Understand images and natural language

    • Predict between categories

    • Predict results based on relationship between values

    • Discover patterns in your data

    www.aka.ms/algorithmcheatsheet


  • Predict between categories

    Two-Class Classification

    Goal: Predict between two categories

    Answers two-choice questions, like yes or no, true or false

    Is this a romantic movie or an adventure movie?

    Multiclass Classification

    Goal: Predict between several categories

    Answers complex questions with multiple possible answers

    Is this a romantic movie or an adventure movie or a musical?

    Ethical Question: What if adventure movies only had a male cast?

    www.aka.ms/algorithmcheatsheet


  • Discover patterns in your data

    Recommenders

    Goal: Generate recommendations

    Predicts what someone will be interested in

    What will customers buy next?

    Clustering

    Goal: Discover structure

    Separates similar data points into intuitive groups

    How can I segment my customers based on their preferences and run a better advertising strategy?

    Anomaly Detection

    Goal: Find unusual occurrences

    Identifies and predicts rare or unusual data points

    Can I detect equipment anomalies and predict maintenance operations in industrial plants?

    Ethical Question: What if my data only consisted of single men without children?

    www.aka.ms/algorithmcheatsheet


  • Understand images and natural language

    Image Classification

    Goal: Classify images

    Able to identify images with popular networks

    Does this image represent a dog or a cat?

    Text Analytics

    Goal: Extract information from text

    Derives high-quality information from text

    What do customer feedback and reviews say about the quality of our products?

    Ethical Question: What if my images were only of large dogs?

    Ethical Question: What if we only received feedback from customers with iPhones?

    www.aka.ms/algorithmcheatsheet


  • Predict results based on relationship between values

    Regression

    Goal: Predict values

    Makes forecasts by estimating the relationship between values

    What are the forecasted sales quantities per item per store for the next 4 weeks?

    Ethical Question: What if I based this off of a holiday season?

    www.aka.ms/algorithmcheatsheet

  • Check In on Your Process

    What data are we collecting, and why?

    What data are we not collecting, and why?

    What data sources are we missing?

    What questions are we asking?

    How are we ensuring we improve ethics in our process?
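
Part of "what data are we not collecting" can be checked mechanically: count how much of the dataset each group accounts for and flag groups that are missing or thin. The sketch below is one simple way to do that, under stated assumptions. The survey records, the `household` field, the list of expected groups, and the 15% threshold are all hypothetical, and a check like this supplements the questions above rather than answering them.

```python
from collections import Counter

def group_shares(records, key):
    """Fraction of records that fall into each value of `key`."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def underrepresented(shares, expected_groups, min_share=0.1):
    """Expected groups that are missing or fall below `min_share` of the data."""
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)

# Hypothetical survey records, not data from the talk.
records = ([{"household": "single, no children"}] * 18
           + [{"household": "couple with children"}] * 2)

shares = group_shares(records, "household")
gaps = underrepresented(
    shares,
    ["single, no children", "couple with children", "single parent"],
    min_share=0.15,
)
```

Note that the list of expected groups has to come from people who know the user base; the code cannot discover a group that nobody thought to name.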

  • Model Interpretability

    Francesca Lazzeri, PhD

    @frlazzeri

    www.aka.ms/ModelInterpretability

    https://twitter.com/frlazzeri

  • Interpretability is critical for data scientists, auditors, and business decision makers alike to ensure compliance with company policies, industry standards, and government regulations:

    • Data scientists need the ability to explain their models to executives and stakeholders, so they can understand the value and accuracy of their findings. They also require interpretability to debug their models and make informed decisions about how to improve them.

    • Legal auditors require tools to validate models with respect to regulatory compliance and monitor how models' decisions are impacting humans.

    • Business decision makers need peace of mind by having the ability to provide transparency for end users. This allows them to earn and maintain trust.

    www.aka.ms/ModelInterpretability

  • The interpretability classes are made available through multiple SDK packages:

    • azureml.explain.model, the main package, containing functionalities supported by Microsoft

    • azureml.contrib.explain.model, preview and experimental functionalities that you can try

    • azureml.train.automl.automlexplainer, the package for interpreting automated machine learning models

    Using the classes and methods in the SDK, you can:

    • Explain model prediction by generating feature importance values for the entire model and/or individual datapoints.

    • Achieve model interpretability on real-world datasets at scale, during training and inference.

    • Use an interactive visualization dashboard to discover patterns in data and explanations at training time.

    www.aka.ms/ModelInterpretability
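
To give a feel for what a "feature importance value" is, the sketch below implements permutation importance: shuffle one feature column, re-score the model, and see how much the error grows. This is a generic technique chosen for illustration and is not a description of how the azureml packages compute their explanations. The model, the data, and the function names are hypothetical.

```python
import random

def mean_abs_error(predict, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in error after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_abs_error(predict, shuffled, targets) - baseline)
    return importances

# Hypothetical model: rentals depend on temperature (feature 0) and ignore feature 1.
def predict(row):
    return 4 * row[0] + 10

rows = list(zip(range(10, 30, 2), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))
targets = [predict(r) for r in rows]

importances = permutation_importance(predict, rows, targets, n_features=2)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling temperature can only make the predictions worse. That difference is the global, whole-model view; explaining a single prediction requires other methods.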

  • Global visualizations

    www.aka.ms/ModelInterpretability

  • Local visualizations

    www.aka.ms/ModelInterpretability

  • Interpretability in Automated ML

    • To enable feature importance for a trained ensemble model, use the explain_model() function.

    • To enable feature importance for each individual run prior to training, set the model_explainability parameter to True in the AutoMLConfig object, along with providing validation data. Then use the retrieve_model_explanation() function.

    www.aka.ms/ModelInterpretability

    https://twitter.com/frlazzeri

  • Sarah Guthals, PhD - @sarahguthals

    Francesca Lazzeri, PhD - @frlazzeri

    Conclusions

  • • The Developer’s Introduction to Data Science GitHub Repo: aka.ms/DevIntroDS_GitHub

    • Developers Intro to Data Science Learn Collection: aka.ms/DevIntroDS_Learn

    • The Data Science Lifecycle: aka.ms/DataScienceLifecycle

    • Algorithm Cheat Sheet: aka.ms/AlgorithmCheatSheet

    • Automated Machine Learning: aka.ms/AutomatedML

    • Azure Machine Learning service: aka.ms/AzureMLservice

    • Auto ML Featurization: aka.ms/AutoMLfeaturization

    • How to Select Machine Learning algorithms: aka.ms/SelectAlgos

    • Model Interpretability: aka.ms/ModelInterpretability

    Sarah Guthals, PhD - @sarahguthals

    Francesca Lazzeri, PhD - @frlazzeri