TRANSCRIPT
Microsoft Professional Program: Data Science
Unit 1 – Fundamentals, Course 1: Introduction to Data
Science
Learn what it takes to become a data scientist. This is the first stop in the Data
Science curriculum from Microsoft. It will help you get started with the program, plan
your learning schedule, and connect with fellow students and teaching assistants.
Along the way, you’ll get an introduction to working with and exploring data using a
variety of visualization, analytical, and statistical techniques.
What you'll learn
• How the Microsoft Data Science curriculum works
• How to navigate the curriculum and plan your course schedule
• Basic data exploration and visualization techniques in Microsoft Excel
• Foundational statistics that can be used to analyze data
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Introductory
Prerequisite knowledge: none
Language: English, with a Q&A workshop in Croatian
Unit 1 – Fundamentals, Course 2a: Analyzing and
Visualizing Data with Excel
Excel is one of the most widely used solutions for analyzing and visualizing data. It
now includes tools that enable the analysis of more data, with improved
visualizations and more sophisticated business logic. In this data science course,
you will get an introduction to the latest versions of these new tools in Excel 2016
from an expert on the Excel Product Team at Microsoft.
Learn how to import data from different sources, create mashups between data
sources, and prepare data for analysis. After preparing the data, find out how
business calculations can be expressed using the DAX calculation engine. See how
the data can be visualized and shared to the Power BI cloud service, after which it
can be used in dashboards, queried using plain English sentences, and even
consumed on mobile devices.
Do you feel that the content of this course is a bit too advanced for you and you
need to fill some gaps in your Excel knowledge? Do you need a better understanding
of how pivot tables, pivot charts and slicers work together, and help in creating
dashboards? If so, check out DAT205x: Introduction to Data Analysis using Excel.
What you'll learn
• Gather and transform data from multiple sources
• Discover and combine data in mashups
• Learn about data model creation
• Explore, analyze, and visualize data
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Intermediate
Prerequisite knowledge: Understanding of Excel analytic tools such as tables, pivot
tables, and pivot charts. Some experience working with data from databases and
text files is also helpful.
Language: English, with a Q&A workshop in Croatian
Syllabus
Week 1
Set up the lab environment by installing Office applications. Learn how to perform
data analysis in Excel using classic tools, such as pivot tables, pivot charts, and
slicers, on data that is already in a worksheet (grid data). Explore an Excel data
model, its content, and its structure, using the Power Pivot add-in. Create your first
DAX expressions for calculated columns and measures.
Learn about queries (Power Query add-in in Excel 2013 and Excel 2010), and build
an Excel data model from a single flat table. Learn how to import multiple tables from
a SQL database, and create an Excel data model from the imported data. Create a
mashup between data from text files and data from a SQL database.
Week 2
Learn how measures are calculated for each cell, how filter context affects a
calculation, and explore several advanced DAX functions. Find out how to use
advanced text query to import data from a formatted Excel report. Perform queries
beyond the standard user interface.
Explore ways to create stunning visualizations in Excel. Use the cube functions to
perform year-over-year comparisons. Create timelines, hierarchies, and slicers to
enhance your visualizations. Learn how Excel can work together with Power BI.
Upload an Excel workbook to the Power BI service. Explore the use of Excel on the
mobile platform.
Unit 1 – Fundamentals, Course 2b: Analyzing and
Visualizing Data with Power BI
Learn Power BI, a powerful cloud-based service that helps data scientists visualize
and share insights from their data.
Power BI is quickly gaining popularity among professionals in data science as a
cloud-based service that helps them easily visualize and share insights from their
organizations’ data.
In this data science course, you will learn from the Power BI product team at
Microsoft with a series of short, lecture-based videos, complete with demos, quizzes,
and hands-on labs. You’ll walk through Power BI, end to end, starting from how to
connect to and import your data, author reports using Power BI Desktop, and publish
those reports to the Power BI service. Plus, learn to create dashboards and share
with business users—on the web and on mobile devices.
What you'll learn
• Connect, import, shape, and transform data for business intelligence (BI)
• Visualize data, author reports, and schedule automated refresh of your
reports
• Create and share dashboards based on reports in Power BI Desktop and
Excel
• Use natural language queries
• Create real-time dashboards
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Introductory
Prerequisite knowledge: Some experience in working with data from Excel,
databases, or text files.
Language: English, with a Q&A workshop in Croatian
Syllabus
Week 1
• Understanding key concepts in business intelligence, data analysis, and data
visualization
• Importing your data and automatically creating dashboards from services such
as Marketo, Salesforce, and Google Analytics
• Connecting to and importing your data, then shaping and transforming that
data
• Enriching your data with business calculations
• Visualizing your data and authoring reports
• Scheduling automated refresh of your reports
• Creating dashboards based on reports and natural language queries
• Sharing dashboards across your organization
• Consuming dashboards in mobile apps
Week 2
• Leveraging your Excel reports within Power BI
• Creating custom visualizations that you can use in dashboards and reports
• Collaborating within groups to author reports and dashboards
• Sharing dashboards effectively based on your organization’s needs
• Exploring live connections to data with Power BI
• Connecting directly to SQL Azure, HD Spark, and SQL Server Analysis
Services
• Introduction to Power BI Development API
• Leveraging custom visuals in Power BI
Unit 1 – Fundamentals, Course 3: Analytics Storytelling for
Impact
All analytics work begins and ends with a story. Storytelling with data is the analytics
professional’s missing link in delivering the essence of data signals and insights to
executives, management, and other stakeholders.
In this analytics storytelling course, you’ll learn effective strategies and tools to
master data communication in the most impactful way possible—through well-crafted
analytics stories.
You'll explore what a story is and, perhaps more importantly, what a story is not. Find
out how stories create value and why they matter. Learn to craft stories, command
the room, finish strong, and assess your impact. Get practical help applying these
ideas to your data analytics work. Plus, you'll learn guidelines and best practices for
creating high-impact reports and presentations.
edX offers financial assistance for learners who want to earn Verified Certificates but
who may not be able to pay the fee. To apply for financial assistance, enroll in the
course, then follow this link to complete an application for assistance.
What you'll learn
• How to apply storytelling principles to your analytics work
• How to improve your analytics presentations through storytelling
• Guidelines and best practices for creating high-impact reports and
presentations
Duration: 1 week
Total effort: 12 – 24 hours
Level: Introductory
Prerequisite knowledge: one of the following courses or equivalent knowledge and
skills:
• Analyzing and Visualizing Data with Excel
• Analyzing and Visualizing Data with Power BI
• Working knowledge of PowerPoint.
Language: English, with a Q&A workshop in Croatian
Unit 1 – Fundamentals, Course 4: Ethics and Law in Data
and Analytics
Analytics and AI are powerful tools that have real-world outcomes. Learn how to
apply practical, ethical, and legal constructs and scenarios so that you can be an
effective analytics professional.
Corporations, governments, and individuals have powerful tools in Analytics and AI to
create real-world outcomes, for good or for ill.
Data professionals today need both frameworks and methods in their work to
achieve optimal results while being good stewards of their critical role in society.
In this course, you'll learn to apply ethical and legal frameworks to initiatives in the
data profession. You'll explore practical approaches to data and analytics problems
posed by work in Big Data, Data Science, and AI. You'll also investigate applied data
methods for ethical and legal work in Analytics and AI.
edX offers financial assistance for learners who want to earn Verified Certificates but
who may not be able to pay the fee. To apply for financial assistance, enroll in the
course, then follow this link to complete an application for assistance.
What you'll learn
• Foundational abilities in applying ethical and legal frameworks for the data
profession
• Practical approaches to data and analytics problems, including Big Data and
Data Science and AI
• Applied data methods for ethical and legal work in Analytics and AI
Duration: 1 week
Total effort: 12 – 18 hours
Level: Introductory
Prerequisite knowledge: No prerequisites
Language: English, with a Q&A workshop in Croatian
Unit 1 – Fundamentals, Course 5: Querying Data with
Transact-SQL
From querying and modifying data in SQL Server or Azure SQL to programming with
Transact-SQL, learn essential skills that employers need.
Transact-SQL is an essential skill for data professionals and developers working with
SQL databases. With this combination of expert instruction, demonstrations, and
practical labs, step from your first SELECT statement through to implementing
transactional programmatic logic.
Work through multiple modules, each of which explores a key area of the
Transact-SQL language, with a focus on querying and modifying data in Microsoft
SQL Server or Azure SQL Database. The labs in this course use a sample database
that can be deployed easily in Azure SQL Database, so you get hands-on experience
with Transact-SQL without installing or configuring a database server.
What you'll learn
• Create Transact-SQL SELECT queries
• Work with data types and NULL
• Query multiple tables with JOIN
• Explore set operators
• Use functions and aggregate data
• Work with subqueries and APPLY
• Use table expressions
• Group sets and pivot data
• Modify data
• Program with Transact-SQL
• Implement error handling and transactions
Duration: 3 weeks
Total effort: 24 – 30 hours
Level: Intermediate
Prerequisite knowledge: Basic understanding of databases and IT systems.
Language: English, with a Q&A workshop in Croatian
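The query skills listed above (SELECT, JOIN, NULL handling, and aggregation) can be illustrated with a small sketch. The course labs use Transact-SQL against Azure SQL Database; this sketch instead assumes Python's built-in sqlite3 module with an in-memory database, so it runs without a server. The table and column names are hypothetical and are not taken from the course's sample database. The SQL uses only SELECT, LEFT JOIN, GROUP BY, ORDER BY, and COALESCE, all of which are also available in Transact-SQL.

```python
import sqlite3


def revenue_by_customer():
    """Return (customer name, order total) rows from a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical tables for illustration only.
    cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "Ana"), (2, "Ivan"), (3, "Maja")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)],
    )

    # LEFT JOIN keeps customers that have no orders. For those rows
    # SUM(o.amount) is NULL, and COALESCE replaces the NULL with 0.
    cur.execute(
        """
        SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
        FROM customer AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in revenue_by_customer():
        print(name, total)
```

An inner JOIN in place of the LEFT JOIN would drop the customer with no orders, which is the kind of difference the JOIN and NULL modules deal with.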
Unit 2 - Core Data Science, Course 6a: Introduction to R for
Data Science
Learn the R statistical programming language, the lingua franca of data science, in
this hands-on course.
R is rapidly becoming the leading language in data science and statistics. Today, R
is the tool of choice for data science professionals in every industry and field.
Whether you are a full-time number cruncher or just the occasional data analyst, R
will suit your needs.
This introduction to R programming course will help you master the basics of R. In
seven sections, you will cover its basic syntax, making you ready to undertake your
own first data analysis using R. Starting from variables and basic operations, you will
eventually learn how to handle data structures such as vectors, matrices, data
frames and lists. In the final section, you will dive deeper into the graphical
capabilities of R and create your own stunning data visualizations. No prior
knowledge in programming or data science is required.
What makes this course unique is that you will continuously practice your newly
acquired skills through interactive in-browser coding challenges using the DataCamp
platform. Instead of passively watching videos, you will solve real data problems
while receiving instant and personalized feedback that guides you to the correct
solution.
What you'll learn
• Introductory R language fundamentals and basic syntax
• What R is and how it’s used to perform data analysis
• Become familiar with the major R data structures
• Create your own visualizations using R
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Introductory
Prerequisite knowledge: none, but previous experience in basic mathematics is
helpful.
Language: English, with a Q&A workshop in Croatian
Syllabus
Module 1: Introduction to Basics
Take your first steps with R. Discover the basic data types in R and assign your first
variable.
Module 2: Vectors
Analyze gambling behaviour using vectors. Create, name and select elements from
vectors.
Module 3: Matrices
Learn how to work with matrices in R. Do basic computations with them and
demonstrate your knowledge by analyzing the Star Wars box office figures.
Module 4: Factors
R stores categorical data in factors. Learn how to create, subset and compare
categorical data.
Module 5: Data Frames
When working with R, you’ll probably deal with data frames all the time. Therefore, you
need to know how to create one, select the most interesting parts of it, and order
them.
Module 6: Lists
Lists allow you to store components of different types. Module 6 will show you how
to deal with lists.
Module 7: Basic Graphics
Discover R’s packages to do graphics and create your own data visualizations.
Unit 2 - Core Data Science, Course 6b: Introduction to
Python for Data Science
The ability to analyze data with Python is critical in data science. Learn the basics,
and move on to create stunning visualizations.
Python is a very powerful programming language used for many different
applications. Over time, the huge community around this open source language has
created quite a few tools to efficiently work with Python. In recent years, a number of
tools have been built specifically for data science. As a result, analyzing data with
Python has never been easier.
In this practical course, you will start from the very beginning, with basic arithmetic
and variables, and learn how to handle data structures, such as Python lists, Numpy
arrays, and Pandas DataFrames. Along the way, you’ll learn about Python functions
and control flow. Plus, you’ll look at the world of data visualizations with Python and
create your own stunning visualizations based on real data.
What you'll learn
• Explore Python language fundamentals, including basic syntax, variables,
and types
• Create and manipulate regular Python lists
• Use functions and import packages
• Build Numpy arrays, and perform interesting calculations
• Create and customize plots on real data
• Supercharge your scripts with control flow, and get to know the Pandas
DataFrame
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Introductory
Prerequisite knowledge: Some experience in working with data from Excel,
databases, or text files.
Language: English, with a Q&A workshop in Croatian
Syllabus
Module 1: Python Basics
Take your first steps in the world of Python. Discover the different data types and
create your first variable.
Module 2: Python Lists
Get to know the first way to store many different data points under a single name.
Create, subset and manipulate Lists in all sorts of ways.
Module 3: Functions and Packages
Learn how to get the most out of other people's efforts by importing Python
packages and calling functions.
Module 4: Numpy
Write superfast code with Numerical Python, a package to efficiently store and do
calculations with huge amounts of data.
Module 5: Matplotlib
Create different types of visualizations depending on the message you want to
convey. Learn how to build complex and customized plots based on real data.
Module 6: Control flow and Pandas
Write conditional constructs to tweak the execution of your scripts and get to know
the Pandas DataFrame: the key data structure for Data Science in Python.
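As a rough illustration of the first modules above, the sketch below touches variables, a regular Python list, a function imported from a package, and a conditional. It uses only the standard library, so NumPy, Matplotlib, and Pandas are deliberately left out. The variable names and values are hypothetical and are not taken from the course exercises.

```python
import math

# Module 1: variables and basic types.
course_weeks = 2          # int
course_name = "Python"    # str

# Module 2: a list stores many data points under a single name.
heights_cm = [172, 181, 165, 190]
first_two = heights_cm[:2]          # slicing selects a subset

# Module 3: call a function that comes from an imported package.
root = math.sqrt(16)


# Module 6: control flow decides which branch of the code runs.
def label(height_cm, threshold_cm=180):
    """Return a text label depending on a hypothetical height threshold."""
    if height_cm >= threshold_cm:
        return "tall"
    else:
        return "not tall"


labels = [label(h) for h in heights_cm]

if __name__ == "__main__":
    print(first_two, root, labels)
```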
Unit 2 - Core Data Science, Course 7a: Essential Statistics
for Data Analysis using Excel
Gain a solid understanding of statistics and basic probability, using Excel, and build
on your data analysis and data science foundation.
If you’re considering a career as a data analyst, you need to know about histograms,
Pareto charts, Boxplots, Bayes’ theorem, and much more. In this applied statistics
course, the second in our Microsoft Excel Data Analyst XSeries, use the powerful
tools built into Excel, and explore the core principles of statistics and basic
probability—from both the conceptual and applied perspectives. Learn about
descriptive statistics, basic probability, random variables, sampling and confidence
intervals, and hypothesis testing. And see how to apply these concepts and
principles using the environment, functions, and visualizations of Excel.
As a data science pro, the ability to analyze data helps you to make better decisions,
and a solid foundation in statistics and basic probability helps you to better
understand your data. Using real-world concepts applicable to many industries,
including medical, business, sports, insurance, and much more, learn from leading
experts why Excel is one of the top tools for data analysis and how its built-in
features make Excel a great way to learn essential skills.
Before taking this course, you should be familiar with organizing and summarizing
data using Excel analytic tools, such as tables, pivot tables, and pivot charts. You
should also be comfortable (or willing to try) creating complex formulas and
visualizations. Want to start with the basics? Check out DAT205x: Introduction to
Data Analysis using Excel. As you learn these concepts and get more experience
with this powerful tool that can be extremely helpful in your journey as a data analyst
or data scientist, you may want to also take the third course in our series, DAT206x
Analyzing and Visualizing Data with Excel. This course includes excerpts from
Microsoft Excel 2016: Data Analysis and Business Modeling from Microsoft Press
and authored by course instructor Wayne Winston.
What you'll learn
• Descriptive statistics
• Basic probability
• Random variables
• Sampling and confidence intervals
• Hypothesis testing
Duration: 2 weeks
Total effort: 12 – 24 hours
Level: Intermediate
Prerequisite knowledge: Secondary school (high school) algebra. Ability to work
with tables, formulas, and charts in Excel. Ability to organize and summarize data
using Excel analytic tools such as tables, pivot tables, and pivot charts.
Language: English, with a Q&A workshop in Croatian
System Requirements
Excel 2016 is required for the full course experience. Excel 2013 will work but will
not support all the visualizations and functions.
Syllabus
Module 1: Descriptive Statistics
You will learn how to describe data using charts and basic statistical measures. Full
use will be made of the new histograms, Pareto charts, Boxplots, and Treemap and
Sunburst charts in Excel 2016.
Module 2: Basic Probability
You will learn basic probability including the law of complements, independent
events, conditional probability, and Bayes’ theorem.
Module 3: Random Variables
You will learn how to find the mean and variance of random variables and then learn
about the binomial, Poisson, and Normal random variables. We close with a
discussion of the beautiful and important Central Limit Theorem.
Module 4: Sampling and Confidence Intervals
You will learn the mechanics of sampling, point estimation, and interval estimation of
population parameters.
Module 5: Hypothesis Testing
You will learn about null and alternative hypotheses, Type I and Type II errors,
one-sample tests for means and proportions, tests for the difference between the
means of two populations, and the chi-square test for independence.
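Bayes' theorem, listed under Module 2, states that P(A|B) = P(B|A) · P(A) / P(B). The course works in Excel; the sketch below uses Python only to show the arithmetic. The scenario (a diagnostic test) and all of its numbers are hypothetical and chosen purely for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    prior               -- P(condition)
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Law of total probability gives the denominator P(positive).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


if __name__ == "__main__":
    # Hypothetical inputs: 1% prevalence, 90% sensitivity, 5% false positives.
    print(round(posterior(0.01, 0.90, 0.05), 4))
```

With these assumed inputs the posterior is about 15%, far below the 90% sensitivity, because the condition is rare and false positives outnumber true positives.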
Unit 2 - Core Data Science, Course 7b: Essential Math for
Machine Learning: R Edition
Want to study machine learning or artificial intelligence, but worried that your math
skills may not be up to it? Do words like “algebra” and “calculus” fill you with dread?
Has it been so long since you studied math at school that you’ve forgotten much of
what you learned in the first place?
You’re not alone. Machine learning and AI are built on mathematical principles like
Calculus, Linear Algebra, Probability, Statistics, and Optimization; and many would-
be AI practitioners find this daunting. This course is not designed to make you a
mathematician. Rather, it aims to help you learn some essential foundational
concepts and the notation used to express them. The course provides a hands-on
approach to working with data and applying the techniques you’ve learned.
This course is not a full math curriculum. It’s not designed to replace school or
college math education. Instead, it focuses on the key mathematical concepts that
you’ll encounter in studies of machine learning. It is designed to fill the gaps for
students who missed these key concepts as part of their formal education, or who
need to refresh their memories after a long break from studying math.
What you'll learn
• Familiarity with Equations, Functions, and Graphs
• Differentiation and Optimization
• Vectors and Matrices
• Statistics and Probability
Duration: 3 weeks
Total effort: 36 – 48 hours
Level: Intermediate
Prerequisite knowledge: To complete this course successfully, you should have:
• A basic knowledge of math
• Some programming experience – R is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• Introduction
• Equations, Functions, and Graphs
• Differentiation and Optimization
• Vectors and Matrices
• Statistics and Probability
Unit 2 - Core Data Science, Course 7c: Essential Math for
Machine Learning: Python Edition
Want to study machine learning or artificial intelligence, but worried that your math
skills may not be up to it? Do words like “algebra” and “calculus” fill you with dread?
Has it been so long since you studied math at school that you’ve forgotten much of
what you learned in the first place?
You’re not alone. Machine learning and AI are built on mathematical principles like
Calculus, Linear Algebra, Probability, Statistics, and Optimization; and many would-
be AI practitioners find this daunting. This course is not designed to make you a
mathematician. Rather, it aims to help you learn some essential foundational
concepts and the notation used to express them. The course provides a hands-on
approach to working with data and applying the techniques you’ve learned.
This course is not a full math curriculum; it’s not designed to replace school or
college math education. Instead, it focuses on the key mathematical concepts that
you’ll encounter in studies of machine learning. It is designed to fill the gaps for
students who missed these key concepts as part of their formal education, or who
need to refresh their memories after a long break from studying math.
What you'll learn
After completing this course, you will be familiar with the following mathematical
concepts and techniques:
• Equations, Functions, and Graphs
• Differentiation and Optimization
• Vectors and Matrices
• Statistics and Probability
Duration: 3 weeks
Total effort: 36 – 48 hours
Level: Intermediate
Prerequisite knowledge: To complete this course successfully, you should have:
• A basic knowledge of math
• Some programming experience – Python is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• Introduction
• Equations, Functions, and Graphs
• Differentiation and Optimization
• Vectors and Matrices
• Statistics and Probability
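The link between differentiation and optimization in the syllabus above can be sketched with gradient descent: step repeatedly against the derivative until the function stops decreasing. The function f(x) = (x - 3)^2, the learning rate, and the step count below are assumptions chosen for illustration and are not taken from the course labs.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Move against the slope: downhill for a function being minimized.
        x -= learning_rate * grad(x)
    return x


def f_prime(x):
    """Derivative of the example function f(x) = (x - 3)**2."""
    return 2 * (x - 3)


# The minimum of f is at x = 3, where f'(x) = 0.
x_min = gradient_descent(f_prime, x0=0.0)

if __name__ == "__main__":
    print(x_min)
```

With a learning rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so the estimate converges quickly. A rate above 1.0 would overshoot and diverge for this function.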
Unit 3 - Applied Data Science, Course 8a: Data Science
Research Methods: Python Edition
Get hands-on experience with the science and research aspects of data science
work, from setting up a proper data study to making valid claims and inferences from
data experiments.
Data scientists are often trained in the analysis of data. However, the goal of data
science is to produce a good understanding of some problem or idea and build
useful models on this understanding. Because of the principle of “garbage in,
garbage out,” it is vital that a data scientist know how to evaluate the quality of
information that comes into a data analysis. This is especially the case when data
are collected specifically for some analysis (e.g., a survey).
In this course, you will learn the fundamentals of the research process—from
developing a good question to designing good data collection strategies to putting
results in context. Although a data scientist may often play a key part in data
analysis, the entire research process must work cohesively for valid insights to be
gleaned.
Developed as a powerful and flexible language used in everything from Data
Science to cutting-edge and scalable Artificial Intelligence solutions, Python has
become an essential tool for doing Data Science and Machine Learning. With this
edition of Data Science Research Methods, all of the labs are done with Python,
while the videos are language-agnostic. If you prefer your Data Science to be done
with R, please see Data Science Research Methods: R Edition.
What you'll learn
After completing this course, you will be familiar with the following concepts and
techniques:
• Data analysis and inference
• Data science research design
• Experimental data analysis and modeling
Duration: 2 weeks
Total effort: 12 – 18 hours
Level: Intermediate
Prerequisite knowledge:
• A basic knowledge of math
• Some programming experience – Python is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• The Research Process
• Planning for Analysis
• Research Claims
• Measurement
• Correlational and Experimental Design
Unit 3 - Applied Data Science, Course 8b: Data Science
Research Methods: R Edition
Get hands-on experience with the science and research aspects of data science
work, from setting up a proper data study to making valid claims and inferences from
data experiments.
Data scientists are often trained in the analysis of data. However, the goal of data
science is to produce a good understanding of some problem or idea and build useful
models on this understanding. Because of the principle of “garbage in, garbage out,”
it is vital that the data scientist know how to evaluate the quality of information that
comes into a data analysis. This is especially the case when data are collected
specifically for some analysis (e.g., a survey).
In this course, you will learn the fundamentals of the research process—from
developing a good question to designing good data collection strategies to putting
results in context. Although the data scientist may often play a key part in data
analysis, the entire research process must work cohesively for valid insights to be
gleaned.
Developed as a language with statistical analysis and modeling in mind, R has
become an essential tool for doing real-world Data Science. With this edition of Data
Science Research Methods, all of the labs are done with R, while the videos are
tool-agnostic. If you prefer your Data Science to be done with Python, please see
Data Science Research Methods: Python Edition.
What you'll learn
After completing this course, you will be familiar with the following concepts and
techniques:
• Data analysis and inference
• Data science research design
• Experimental data analysis and modeling
Duration: 2 weeks
Total effort: 12 – 18 hours
Level: Intermediate
Prerequisite knowledge:
• A basic knowledge of math
• Some programming experience – R is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• The Research Process
• Planning for Analysis
• Research Claims
• Measurement
• Correlational and Experimental Design
Unit 3 - Applied Data Science, Course 9a: Principles of
Machine Learning: R Edition
Get hands-on experience building and deriving insights from machine learning
models using R and Azure Notebooks.
Machine learning uses computers to run predictive models that learn from existing
data in order to forecast future behaviors, outcomes, and trends.
In this data science course, you will be given clear explanations of machine learning
theory combined with practical scenarios and hands-on experience building,
validating, and deploying machine learning models. You will learn how to build and
derive insights from these models using R and Azure Notebooks.
What you'll learn
After completing this course, you will be familiar with the following concepts and
techniques:
• Data exploration, preparation and cleaning
• Supervised machine learning techniques
• Unsupervised machine learning techniques
• Model performance improvement
Duration: 4 weeks
Total effort: 36 – 48 hours
Level: Intermediate
Prerequisite knowledge:
• A basic knowledge of math
• Some programming experience – R is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• Introduction to Machine Learning
• Exploring Data
• Data Preparation and Cleaning
• Getting Started with Supervised Learning
• Improving Model Performance
• Machine Learning Algorithms
• Unsupervised Learning
Unit 3 - Applied Data Science, Course 9b: Principles of
Machine Learning: Python Edition
Get hands-on experience building and deriving insights from machine learning
models using Python and Azure Notebooks.
Machine learning uses computers to run predictive models that learn from existing
data in order to forecast future behaviors, outcomes, and trends.
In this data science course, you will be given clear explanations of machine learning
theory combined with practical scenarios and hands-on experience building,
validating, and deploying machine learning models. You will learn how to build and
derive insights from these models using Python and Azure Notebooks.
What you'll learn
After completing this course, you will be familiar with the following concepts and
techniques:
• Data exploration, preparation and cleaning
• Supervised machine learning techniques
• Unsupervised machine learning techniques
• Model performance improvement
Duration: 4 weeks
Total effort: 36 – 48 hours
Level: Intermediate
Prerequisite knowledge:
• A basic knowledge of math
• Some programming experience – Python is preferred.
Language: English, with a Q&A workshop in Croatian
Syllabus
• Introduction to Machine Learning
• Exploring Data
• Data Preparation and Cleaning
• Getting Started with Supervised Learning
• Improving Model Performance
• Machine Learning Algorithms
• Unsupervised Learning
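The supervised learning workflow named in the syllabus above (learn from existing data, then validate on data the model has not seen) can be sketched with a least-squares line fit. This is a minimal standard-library illustration and not the course's lab code. The data points are hypothetical and generated from y = 2x + 1 without noise, and the helper names fit_line and mean_squared_error are introduced here for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the training set fits the model, the test set validates it.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
test_x = [4.0, 5.0]
test_y = [9.0, 11.0]

model = fit_line(train_x, train_y)
test_error = mean_squared_error(model, test_x, test_y)

if __name__ == "__main__":
    print(model, test_error)
```

Because the assumed data is noise-free, the held-out error is essentially zero. With real data, the gap between training error and test error is what the "Improving Model Performance" topic would examine.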
Unit 3 - Applied Data Science, Course 10a: Developing
Big Data Solutions with Azure Machine Learning
The past can often be the key to predicting the future. Big data from historical
sources is a valuable resource for identifying trends and building machine learning
models that apply statistical patterns and predict future outcomes.
This course introduces Azure Machine Learning, and explores techniques and
considerations for using it to build models from big data sources, and to integrate
predictive insights into big data processing workflows.
What you'll learn
• How to create predictive web services with Azure Machine Learning
• How to work with big data sources in Azure Machine Learning
• How to integrate Azure Machine Learning into big data batch processing
pipelines
• How to integrate Azure Machine Learning into real-time big data
processing solutions
Duration: 2 weeks
Total effort: 12 – 16 hours
Level: Intermediate
Prerequisite knowledge:
• Building data processing pipelines with Azure Data Factory
• Building real-time data processing solutions with Azure Stream
Analytics
Language: English, with a Q&A workshop in Croatian
Syllabus
• Module 1: Introduction to Azure Machine Learning
• Module 2: Building Predictive Models with Azure Machine Learning
• Module 3: Operationalizing Machine Learning Models
• Module 4: Using Azure Machine Learning in Big Data Solutions
Unit 3 - Applied Data Science, Course 10b:
Implementing Predictive Solutions with Spark in Azure
HDInsight
Learn how to use Spark in Microsoft Azure HDInsight to create predictive analytics
and machine learning solutions.
Are you ready for big data science? In this course, learn how to implement predictive
analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.
See how to work with Scala or Python to cleanse and transform data and build
machine learning models with Spark ML (the machine learning library in Spark).
What you'll learn
• Use Spark to explore data and prepare it for modeling
• Build supervised machine learning models
• Evaluate and optimize models
• Build recommenders and unsupervised machine learning models
Duration: 3 weeks
Total effort: 18 – 24 hours
Level: Intermediate
Prerequisite knowledge: Familiarity with Azure HDInsight. Familiarity with
databases and SQL. Some programming experience. A willingness to learn actively
in a self-paced manner.
Language: English, with a Q&A workshop in Croatian
System Requirements
To complete the hands-on elements in this course, you will require an Azure
subscription and a Windows client computer. You can sign up for a free Azure trial
subscription (a valid credit card is required for verification, but you will not be
charged for Azure services). Note that the free trial is not available in all regions.
Syllabus
Module 1: Introduction to Data Science with Spark
Get started with Spark clusters in Azure HDInsight, and use Spark to run Python or
Scala code to work with data.
Module 2: Getting Started with Machine Learning
Learn how to build classification and regression models using the Spark ML library.
Module 3: Evaluating Machine Learning Models
Learn how to evaluate supervised learning models, and how to optimize model
parameters.
Module 4: Recommenders and Unsupervised Models
Learn how to build recommenders and clustering models using Spark ML.
Unit 3 - Applied Data Science, Course 10c: Analyzing
Big Data with Microsoft R
Learn how to use Microsoft R Server to analyze large datasets using R, one of the
most powerful programming languages.
The open-source programming language R has long been popular
(particularly in academia) for data processing and statistical analysis. Among R's
strengths are that it's a succinct programming language and has an extensive
repository of third-party libraries for performing all kinds of analyses. Together, these
two features make it possible for a data scientist to go very quickly from raw data to
summaries, charts, and even full-blown reports. However, one deficiency with R is
that traditionally it uses a lot of memory, both because it needs to load a copy of the
data in its entirety as a data.frame object, and also because processing the data
often involves making further copies (sometimes referred to as copy-on-modify).
This is one of the reasons R has been more reluctantly received by industry
compared to academia.
The main component of Microsoft R Server (MRS) is the RevoScaleR package,
an R library that offers a set of functions for processing large datasets
without having to load them into memory all at once. RevoScaleR offers a rich set
of distributed statistical and machine learning algorithms, and that set grows over
time. Finally, RevoScaleR also offers a mechanism by which we can take code that
we developed on our laptop and deploy it on a remote server such as SQL Server or
Spark (where the infrastructure is very different under the hood), with minimal effort.
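As a rough illustration of the chunk-wise idea described above (and not of
RevoScaleR's actual API), the following Python sketch computes a column mean while
keeping only a few rows in memory at a time. The helper names iter_chunks and
chunked_mean, the sample data, and the fare column are all hypothetical and chosen
only for this example.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(rows, column, chunk_size=2):
    """Mean of a numeric column, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in iter_chunks(rows, chunk_size):
        values = [float(row[column]) for row in chunk]
        # Only running totals survive between chunks, not the rows themselves.
        total += sum(values)
        count += len(values)
    return total / count if count else float("nan")


# Hypothetical stand-in for a large flat file; a real file handle works the same way.
SAMPLE_CSV = "trip_id,fare\n1,10.0\n2,12.5\n3,7.5\n4,20.0\n5,5.0\n"

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
mean_fare = chunked_mean(reader, "fare", chunk_size=2)
print(mean_fare)
```

The same pattern extends to any statistic that can be built from per-chunk running
totals, such as counts, sums, and variances.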
In this course, we will show you how to use MRS to run an analysis on a large
dataset and provide some examples of how to deploy it on a Spark cluster or a SQL
Server database. Upon completion, you will know how to use R for big-data
problems.
Since RevoScaleR is an R package, we assume that the course participants are
familiar with R. A solid understanding of R data structures (vectors, matrices, lists,
data frames, environments) is required. Familiarity with third-party packages such as
dplyr is also helpful.
What you'll learn
You will learn how to use MRS to read, process, and analyze large
datasets, including how to:
• Read data from flat files into R’s data frame object, investigate the
structure of the dataset and make corrections, and store prepared
datasets for later use
• Prepare and transform the data
25
Microsoft Professional Program: Data Science
• Calculate essential summary statistics, do crosstabulation, write your own
summary functions, and visualize data with the ggplot2 package
• Build predictive models, evaluate and compare models, and generate
predictions on new data
Duration: 2 weeks
Total effort: 8 – 16 hours
Level: Intermediate
Prerequisite knowledge:
• Familiarity with R
Language: English, with a Q&A workshop in Croatian
Unit 4 - Capstone Project: Data Science
Solve a real-world data science problem in this capstone project for the Microsoft
Professional Program in Data Science.
Showcase the knowledge and skills you’ve acquired during the Microsoft
Professional Program for Data Science, and solve a real-world data science problem
in this program capstone project. The project takes the form of a challenge in which
you will explore a dataset and develop a machine learning solution that is tested and
scored to determine your grade.
Duration: 4 weeks
Total effort: 12 – 16 hours
Level: Advanced
Language: English, with a Q&A workshop in Croatian