about this specialization ask the right questions, … · · 2017-12-15video · statistical...
TRANSCRIPT
About This Specialization
Ask the right questions, manipulate data sets, and create visualizations to communicate results.
This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.
10 courses · Follow the suggested order or choose your own
Projects · Follow the suggested order or choose your own
Certificates · Follow the suggested order or choose your own
The Data Scientist's Toolbox
1-4 hours/week
English, French, Chinese (Simplified), Greek, Italian, Portuguese (Brazilian), Vietnamese, Russian, Turkish, Hebrew
About the Course
In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.
Week 1
During Week 1, you'll learn about the goals and objectives of the Data Science Specialization and each of its components. You'll also get an overview of the field as well as instructions on how to install R.
Reading · Welcome to the Data Scientist's Toolbox
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Specialization Textbooks
Video · Specialization Motivation
Reading · The Elements of Data Analytic Style
Video · The Data Scientist's Toolbox
Video · Getting Help
Video · Finding Answers
Video · R Programming Overview
Video · Getting Data Overview
Video · Exploratory Data Analysis Overview
Video · Reproducible Research Overview
Video · Statistical Inference Overview
Video · Regression Models Overview
Video · Practical Machine Learning Overview
Video · Building Data Products Overview
Video · Installing R on Windows {Roger Peng}
Video · Install R on a Mac {Roger Peng}
Video · Installing Rstudio {Roger Peng}
Video · Installing Outside Software on Mac (OS X Mavericks)
Quiz · Week 1 Quiz
Week 2: Installing the Toolbox
This is the most lecture-intensive week of the course. The primary goal is to get you set up with R, RStudio, GitHub, and the other tools we will use throughout the Data Science Specialization and your ongoing work as a data scientist.
Video · Tips from Coursera Users - Optional Video
Video · Command Line Interface
Video · Introduction to Git
Video · Introduction to Github
Video · Creating a Github Repository
Video · Basic Git Commands
Video · Basic Markdown
Video · Installing R Packages
Video · Installing Rtools
Quiz · Week 2 Quiz
Week 3: Conceptual Issues
The Week 3 lectures focus on conceptual issues behind study design and turning data into knowledge. If you have trouble or want to explore issues in more depth, please seek out answers on the forums. They are a great resource! If you happen to be a superstar who already gets it, please take the time to help your classmates by answering their questions as well. This is one of the best ways to practice using and explaining your skills to others. These are two of the key characteristics of excellent data scientists.
Video · Types of Questions
Video · What is Data?
Video · What About Big Data?
Video · Experimental Design
Quiz · Week 3 Quiz
Week 4: Course Project Submission & Evaluation
In Week 4, we'll focus on the Course Project. This is your opportunity to install the tools and set up the accounts that you'll need for the rest of the specialization and for work in data science.
Peer Review · Course Project
Reading · Post-Course Survey
R Programming
English, French, Japanese, Chinese (Simplified)
About the Course
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
Week 1: Background, Getting Started, and Nuts & Bolts
This week covers the basics to get you started with R. The Background Materials lesson contains information about course mechanics and some videos on installing R. The Week 1 videos cover the history of R and S, go over the basic data types in R, and describe the functions for reading and writing data. I recommend that you watch the videos in the listed order, but watching the videos out of order isn't going to ruin the story.
Reading · Welcome to R Programming
Reading · About the Instructor
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Course Textbook
Reading · Course Supplement: The Art of Data Science
Reading · Data Science Podcast: Not So Standard Deviations
Video · Installing R on a Mac
Video · Installing R on Windows
Video · Installing R Studio (Mac)
Video · Writing Code / Setting Your Working Directory (Windows)
Video · Writing Code / Setting Your Working Directory (Mac)
Reading · Getting Started and R Nuts and Bolts
Video · Introduction
Video · Overview and History of R
Video · Getting Help
Video · R Console Input and Evaluation
Video · Data Types - R Objects and Attributes
Video · Data Types - Vectors and Lists
Video · Data Types - Matrices
Video · Data Types - Factors
Video · Data Types - Missing Values
Video · Data Types - Data Frames
Video · Data Types - Names Attribute
Video · Data Types - Summary
Video · Reading Tabular Data
Video · Reading Large Tables
Video · Textual Data Formats
Video · Connections: Interfaces to the Outside World
Video · Subsetting - Basics
Video · Subsetting - Lists
Video · Subsetting - Matrices
Video · Subsetting - Partial Matching
Video · Subsetting - Removing Missing Values
Video · Vectorized Operations
Quiz · Week 1 Quiz
Video · Introduction to swirl
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Basic Building Blocks
Practice Programming Assignment · swirl Lesson 2: Workspace and Files
Practice Programming Assignment · swirl Lesson 3: Sequences of Numbers
Practice Programming Assignment · swirl Lesson 4: Vectors
Practice Programming Assignment · swirl Lesson 5: Missing Values
Practice Programming Assignment · swirl Lesson 6: Subsetting Vectors
Practice Programming Assignment · swirl Lesson 7: Matrices and Data Frames
Week 2: Programming with R
Welcome to Week 2 of R Programming. This week, we take the gloves off, and the lectures cover key topics like control structures and functions. We also introduce the first programming assignment for the course, which is due at the end of the week.
Reading · Week 2: Programming with R
Video · Control Structures - Introduction
Video · Control Structures - If-else
Video · Control Structures - For loops
Video · Control Structures - While loops
Video · Control Structures - Repeat, Next, Break
Video · Your First R Function
Video · Functions (part 1)
Video · Functions (part 2)
Video · Scoping Rules - Symbol Binding
Video · Scoping Rules - R Scoping Rules
Video · Scoping Rules - Optimization Example (OPTIONAL)
Video · Coding Standards
Video · Dates and Times
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Logic
Practice Programming Assignment · swirl Lesson 2: Functions
Practice Programming Assignment · swirl Lesson 3: Dates and Times
Quiz · Week 2 Quiz
Reading · Programming Assignment 1 INSTRUCTIONS: Air Pollution
Quiz · Programming Assignment 1: Quiz
Week 3: Loop Functions and Debugging
We have now entered the third week of R Programming, which also marks the halfway point. The lectures this week cover loop functions and the debugging tools in R. These aspects of R make R useful for both interactive work and writing longer code, and so they are commonly used in practice.
Reading · Week 3: Loop Functions and Debugging
Video · Loop Functions - lapply
Video · Loop Functions - apply
Video · Loop Functions - mapply
Video · Loop Functions - tapply
Video · Loop Functions - split
Video · Debugging Tools - Diagnosing the Problem
Video · Debugging Tools - Basic Tools
Video · Debugging Tools - Using the Tools
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: lapply and sapply
Practice Programming Assignment · swirl Lesson 2: vapply and tapply
Quiz · Week 3 Quiz
Peer Review · Programming Assignment 2: Lexical Scoping
Week 4: Simulation & Profiling
This week covers how to simulate data in R, which serves as the basis for doing simulation studies. We also cover the profiler in R, which lets you collect detailed information on how your R functions are running and identify bottlenecks that can be addressed. The profiler is a key tool in helping you optimize your programs. Finally, we cover the str function, which I personally believe is the most useful function in R.
Reading · Week 4: Simulation & Profiling
Video · The str Function
Video · Simulation - Generating Random Numbers
Video · Simulation - Simulating a Linear Model
Video · Simulation - Random Sampling
Video · R Profiler (part 1)
Video · R Profiler (part 2)
Quiz · Week 4 Quiz
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Looking at Data
Practice Programming Assignment · swirl Lesson 2: Simulation
Practice Programming Assignment · swirl Lesson 3: Base Graphics
Reading · Programming Assignment 3 INSTRUCTIONS: Hospital Quality
Quiz · Programming Assignment 3: Quiz
Reading · Post-Course Survey
Getting and Cleaning Data
English, Russian, French, Chinese (Simplified)
About the Course
Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
Week 1
In this first week of the course, we look at finding data and reading different file types.
Reading · Welcome to Week 1
Reading · Syllabus
Reading · Pre-Course Survey
Video · Obtaining Data Motivation
Video · Raw and Processed Data
Video · Components of Tidy Data
Video · Downloading Files
Video · Reading Local Files
Video · Reading Excel Files
Video · Reading XML
Video · Reading JSON
Video · The data.table Package
Reading · Practical R Exercises in swirl Part 1
Quiz · Week 1 Quiz
Week 2
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.
Video · Reading from MySQL
Video · Reading from HDF5
Video · Reading from The Web
Video · Reading From APIs
Video · Reading From Other Sources
Quiz · Week 2 Quiz
Week 3
Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging and managing the data you have collected using the lectures from Weeks 1 and 2.
Video · Subsetting and Sorting
Video · Summarizing Data
Video · Creating New Variables
Video · Reshaping Data
Video · Managing Data Frames with dplyr - Introduction
Video · Managing Data Frames with dplyr - Basic Tools
Video · Merging Data
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Manipulating Data with dplyr
Practice Programming Assignment · swirl Lesson 2: Grouping and Chaining with dplyr
Practice Programming Assignment · swirl Lesson 3: Tidying Data with tidyr
Quiz · Week 3 Quiz
Week 4
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.
Video · Editing Text Variables
Video · Regular Expressions I
Video · Regular Expressions II
Video · Working with Dates
Video · Data Resources
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Dates and Times with lubridate
Quiz · Week 4 Quiz
Peer Review · Getting and Cleaning Data Course Project
Reading · Post-Course Survey
Exploratory Data Analysis
English, Chinese (Simplified)
About the Course
This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
Week 1
This week covers the basics of analytic graphics and the base plotting system in R. We've also included some background material to help you install R if you haven't done so already.
Reading · Welcome to Exploratory Data Analysis
Reading · Syllabus
Reading · Pre-Course Survey
Video · Introduction
Reading · Exploratory Data Analysis with R Book
Reading · The Art of Data Science
Video · Installing R on Windows (3.2.1)
Video · Installing R on a Mac (3.2.1)
Video · Installing R Studio (Mac)
Video · Setting Your Working Directory (Windows)
Video · Setting Your Working Directory (Mac)
Video · Principles of Analytic Graphics
Video · Exploratory Graphs (part 1)
Video · Exploratory Graphs (part 2)
Video · Plotting Systems in R
Video · Base Plotting System (part 1)
Video · Base Plotting System (part 2)
Video · Base Plotting Demonstration
Video · Graphics Devices in R (part 1)
Video · Graphics Devices in R (part 2)
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Principles of Analytic Graphs
Practice Programming Assignment · swirl Lesson 2: Exploratory Graphs
Practice Programming Assignment · swirl Lesson 3: Graphics Devices in R
Practice Programming Assignment · swirl Lesson 4: Plotting Systems
Practice Programming Assignment · swirl Lesson 5: Base Plotting System
Quiz · Week 1 Quiz
Peer Review · Course Project 1
Week 2
Welcome to Week 2 of Exploratory Data Analysis. This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high dimensional data. The Lattice and ggplot2 systems also simplify the laying out of plots, making it a much less tedious process.
Video · Lattice Plotting System (part 1)
Video · Lattice Plotting System (part 2)
Video · ggplot2 (part 1)
Video · ggplot2 (part 2)
Video · ggplot2 (part 3)
Video · ggplot2 (part 4)
Video · ggplot2 (part 5)
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Lattice Plotting System
Practice Programming Assignment · swirl Lesson 2: Working with Colors
Practice Programming Assignment · swirl Lesson 3: GGPlot2 Part1
Practice Programming Assignment · swirl Lesson 4: GGPlot2 Part2
Practice Programming Assignment · swirl Lesson 5: GGPlot2 Extras
Quiz · Week 2 Quiz
Week 3
Welcome to Week 3 of Exploratory Data Analysis. This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high dimensional data (many, many variables). We also cover novel ways to specify colors in R so that you can use color as an important and useful dimension when making data graphics. All of this material is covered in chapters 9-12 of my book Exploratory Data Analysis with R.
Video · Hierarchical Clustering (part 1)
Video · Hierarchical Clustering (part 2)
Video · Hierarchical Clustering (part 3)
Video · K-Means Clustering (part 1)
Video · K-Means Clustering (part 2)
Video · Dimension Reduction (part 1)
Video · Dimension Reduction (part 2)
Video · Dimension Reduction (part 3)
Video · Working with Color in R Plots (part 1)
Video · Working with Color in R Plots (part 2)
Video · Working with Color in R Plots (part 3)
Video · Working with Color in R Plots (part 4)
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: Hierarchical Clustering
Practice Programming Assignment · swirl Lesson 2: K Means Clustering
Practice Programming Assignment · swirl Lesson 3: Dimension Reduction
Practice Programming Assignment · swirl Lesson 4: Clustering Example
Week 4
This week, we'll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I'm providing these videos to give you a sense of how you might proceed with a specific type of dataset.
Video · Clustering Case Study
Video · Air Pollution Case Study
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: CaseStudy
Peer Review · Course Project 2
Reading · Post-Course Survey
Reproducible Research
About the Course
This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.
English
6 weeks of study, 3-8 hours/week, the week will vary.
Week 1: Concepts, Ideas, & Structure
This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn't going to ruin the story.
Video · Introduction
Reading · Syllabus
Reading · Pre-course survey
Reading · Course Book: Report Writing for Data Science in R
Video · What is Reproducible Research About?
Video · Reproducible Research: Concepts and Ideas (part 1)
Video · Reproducible Research: Concepts and Ideas (part 2)
Video · Reproducible Research: Concepts and Ideas (part 3)
Video · Scripting Your Analysis
Video · Structure of a Data Analysis (part 1)
Video · Structure of a Data Analysis (part 2)
Video · Organizing Your Analysis
Quiz · Week 1 Quiz
Week 2: Markdown & knitr
This week we cover some of the core tools for developing reproducible documents. We cover the literate programming tool knitr and show how to integrate it with Markdown to publish reproducible web documents. We also introduce the first peer assessment which will require you to write up a reproducible data analysis using knitr.
Video · Coding Standards in R
Video · Markdown
Video · R Markdown
Video · R Markdown Demonstration
Video · knitr (part 1)
Video · knitr (part 2)
Video · knitr (part 3)
Video · knitr (part 4)
Quiz · Week 2 Quiz
Video · Introduction to Course Project 1
Peer Review · Course Project 1
Week 3: Reproducible Research Checklist & Evidence-based Data Analysis
This week covers what one could call a basic checklist for ensuring that a data analysis is reproducible. While it's not absolutely sufficient to follow the checklist, it provides a necessary minimum standard that would be applicable to almost any area of analysis.
Video · Communicating Results
Video · RPubs
Video · Reproducible Research Checklist (part 1)
Video · Reproducible Research Checklist (part 2)
Video · Reproducible Research Checklist (part 3)
Video · Evidence-based Data Analysis (part 1)
Video · Evidence-based Data Analysis (part 2)
Video · Evidence-based Data Analysis (part 3)
Video · Evidence-based Data Analysis (part 4)
Video · Evidence-based Data Analysis (part 5)
Statistical Inference
1-4 hours/week
English
About the Course
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data.
Week 1: Probability & Expected Values
This week, we'll focus on the fundamentals including probability, random variables, expectations and more.
Video · Introductory video
Reading · Welcome to Statistical Inference
Reading · Some introductory comments
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Course Book: Statistical Inference for Data Science
Reading · Data Science Specialization Community Site
Reading · Homework Problems
Reading · Probability
Video · 02 01 Introduction to probability
Video · 02 02 Probability mass functions
Video · 02 03 Probability density functions
Reading · Conditional probability
Video · 03 01 Conditional Probability
Video · 03 02 Bayes' rule
Video · 03 03 Independence
Reading · Expected values
Video · 04 01 Expected values
Video · 04 02 Expected values, simple examples
Video · 04 03 Expected values for PDFs
Reading · Practical R Exercises in swirl 1
Practice Programming Assignment · swirl Lesson 1: Introduction
Practice Programming Assignment · swirl Lesson 2: Probability1
Practice Programming Assignment · swirl Lesson 3: Probability2
Practice Programming Assignment · swirl Lesson 4: ConditionalProbability
Practice Programming Assignment · swirl Lesson 5: Expectations
Quiz · Quiz 1
Week 2: Variability, Distribution, & Asymptotics
We're going to tackle variability, distributions, limits, and confidence intervals.
Reading · Variability
Video · 05 01 Introduction to variability
Video · 05 02 Variance simulation examples
Video · 05 03 Standard error of the mean
Video · 05 04 Variance data example
Reading · Distributions
Video · 06 01 Binomial distribution
Video · 06 02 Normal distribution
Video · 06 03 Poisson
Reading · Asymptotics
Video · 07 01 Asymptotics and LLN
Video · 07 02 Asymptotics and the CLT
Video · 07 03 Asymptotics and confidence intervals
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Variance
Practice Programming Assignment · swirl Lesson 2: CommonDistros
Practice Programming Assignment · swirl Lesson 3: Asymptotics
Quiz · Quiz 2
Week 3: Intervals, Testing, & Pvalues
We will be taking a look at intervals, testing, and pvalues in this lesson.
Reading · Confidence intervals
Video · 08 01 T confidence intervals
Video · 08 02 T confidence intervals example
Video · 08 03 Independent group T intervals
Video · 08 04 A note on unequal variance
Reading · Hypothesis testing
Video · 09 01 Hypothesis testing
Video · 09 02 Example of choosing a rejection region
Video · 09 03 T tests
Video · 09 04 Two group testing
Reading · P-values
Video · 10 01 Pvalues
Video · 10 02 Pvalue further examples
Reading · Knitr
Video · Just enough knitr to do the project
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: T Confidence Intervals
Practice Programming Assignment · swirl Lesson 2: Hypothesis Testing
Practice Programming Assignment · swirl Lesson 3: P Values
Quiz · Quiz 3
Week 4: Power, Bootstrapping, & Permutation Tests
We will begin looking into power, bootstrapping, and permutation tests.
Reading · Power
Video · 11 01 Power
Video · 11 02 Calculating Power
Video · 11 03 Notes on power
Video · 11 04 T test power
Video · 12 01 Multiple Comparisons
Reading · Resampling
Video · 13 01 Bootstrapping
Video · 13 02 Bootstrapping example
Video · 13 03 Notes on the bootstrap
Video · 13 04 Permutation tests
Quiz · Quiz 4
Peer Review · Statistical Inference Course Project
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Power
Practice Programming Assignment · swirl Lesson 2: Multiple Testing
Practice Programming Assignment · swirl Lesson 3: Resampling
Reading · Post-Course Survey
Regression Models
English
About the Course
Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models including scatterplot smoothing.
Week 1: Least Squares and Linear Regression
This week, we focus on least squares and linear regression.
Reading · Welcome to Regression Models
Reading · Book: Regression Models for Data Science in R
Reading · Syllabus
Reading · Pre-Course Survey
Reading · Data Science Specialization Community Site
Reading · Where to get more advanced material
Reading · Regression
Video · Introduction to Regression
Video · Introduction: Basic Least Squares
Reading · Technical details
Video · Technical Details (Skip if you'd like)
Video · Introductory Data Example
Reading · Least squares
Video · Notation and Background
Video · Linear Least Squares
Video · Linear Least Squares Coding Example
Video · Technical Details (Skip if you'd like)
Reading · Regression to the mean
Video · Regression to the Mean
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Introduction
Practice Programming Assignment · swirl Lesson 2: Residuals
Practice Programming Assignment · swirl Lesson 3: Least Squares Estimation
Quiz · Quiz 1
Week 2: Linear Regression & Multivariable Regression
This week, we will work through the remainder of linear regression and then turn to the first part of multivariable regression.
Reading · *Statistical* linear regression models
Video · Statistical Linear Regression Models
Video · Interpreting Coefficients
Video · Linear Regression for Prediction
Reading · Residuals
Video · Residuals
Video · Residuals, Coding Example
Video · Residual Variance
Reading · Inference in regression
Video · Inference in Regression
Video · Coding Example
Video · Prediction
Reading · Looking ahead to the project
Video · Really, really quick intro to knitr
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Residual Variation
Practice Programming Assignment · swirl Lesson 2: Introduction to Multivariable Regression
Practice Programming Assignment · swirl Lesson 3: MultiVar Examples
Quiz · Quiz 2
Week 3: Multivariable Regression, Residuals, & Diagnostics
This week, we'll build on last week's introduction to multivariable regression with some examples and then cover residuals, diagnostics, variance inflation, and model comparison.
Reading · Multivariable regression
Video · Multivariable Regression part I
Video · Multivariable Regression part II
Video · Multivariable Regression Continued
Video · Multivariable Regression Examples part I
Video · Multivariable Regression Examples part II
Video · Multivariable Regression Examples part III
Video · Multivariable Regression Examples part IV
Reading · Adjustment
Video · Adjustment Examples
Reading · Residuals
Video · Residuals and Diagnostics part I
Video · Residuals and Diagnostics part II
Video · Residuals and Diagnostics part III
Reading · Model selection
Video · Model Selection part I
Video · Model Selection part II
Video · Model Selection part III
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: MultiVar Examples2
Practice Programming Assignment · swirl Lesson 2: MultiVar Examples3
Practice Programming Assignment · swirl Lesson 3: Residuals Diagnostics and Variation
Quiz · Quiz 3
Practice Quiz · (OPTIONAL) Data analysis practice with immediate feedback (NEW! 10/18/2017)
Week 4: Logistic Regression and Poisson Regression
This week, we will work on generalized linear models, including binary outcomes and Poisson regression.
Reading · GLMs
Video · GLMs
Reading · Logistic regression
Video · Logistic Regression part I
Video · Logistic Regression part II
Video · Logistic Regression part III
Reading · Count Data
Video · Poisson Regression part I
Video · Poisson Regression part II
Reading · Mishmash
Video · Hodgepodge
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Variance Inflation Factors
Practice Programming Assignment · swirl Lesson 2: Overfitting and Underfitting
Practice Programming Assignment · swirl Lesson 3: Binary Outcomes
Practice Programming Assignment · swirl Lesson 4: Count Outcomes
Quiz · Quiz 4
Peer Review · Regression Models Course Project
Reading · Post-Course Survey
Practical Machine Learning
English
About the Course
Prediction and machine learning are among the most common tasks performed by data scientists and data analysts. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
Week 1: Prediction, Errors, and Cross Validation
This week will cover prediction, the relative importance of steps, errors, and cross validation.
Reading · Welcome to Practical Machine Learning
Reading · Syllabus
Reading · Pre-Course Survey
Video · Prediction motivation
Video · What is prediction?
Video · Relative importance of steps
Video · In and out of sample errors
Video · Prediction study design
Video · Types of errors
Video · Receiver Operating Characteristic
Video · Cross validation
Video · What data should you use?
Quiz · Quiz 1
Week 2: The Caret Package
This week will introduce the caret package and tools for creating features and preprocessing.
Video · Caret package
Video · Data slicing
Video · Training options
Video · Plotting predictors
Video · Basic preprocessing
Video · Covariate creation
Video · Preprocessing with principal components analysis
Video · Predicting with Regression
Video · Predicting with Regression Multiple Covariates
Quiz · Quiz 2
Week 3: Predicting with trees, Random Forests, & Model Based Predictions
This week we introduce a number of machine learning algorithms you can use to complete your course project.
Video · Predicting with trees
Video · Bagging
Video · Random Forests
Video · Boosting
Video · Model Based Prediction
Quiz · Quiz 3
Week 4: Regularized Regression and Combining Predictors
This week, we will cover regularized regression and combining predictors.
Video · Regularized regression
Video · Combining predictors
Video · Forecasting
Video · Unsupervised Prediction
Quiz · Quiz 4
Reading · Course Project Instructions (READ FIRST)
Peer Review · Prediction Assignment Writeup
Quiz · Course Project Prediction Quiz
Reading · Post-Course Survey
Developing Data Products
English
About the Course
A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience.
Week 1: Course Overview
In this overview module, we'll go over some information and resources to help you get started and succeed in the course.
Shiny, GoogleVis, and Plotly
Now we can turn to the first substantive lessons. In this module, you'll learn how to develop basic applications and interactive graphics in shiny, compose interactive HTML graphics with GoogleVis, and prepare data visualizations with Plotly.
Video · Welcome to Developing Data Products
Reading · Syllabus
Reading · Welcome
Reading · Book: Developing Data Products in R
Reading · Community Site
Reading · R and RStudio Links & Tutorials
Reading · Shiny
Reading · Shinyapps.io Project
Video · Shiny 1.1
Video · Shiny 1.2
Video · Shiny 1.3
Video · Shiny 1.4
Video · Shiny 1.5
Video · Shiny 2.1
Video · Shiny 2.2
Video · Shiny 2.3
Video · Shiny 2.4
Video · Plotly 1.4
Video · Plotly 1.5
Video · Plotly 1.6
Video · Plotly 1.7
Video · Plotly 1.8
Quiz · Quiz 1
Video · Shiny 2.5
Video · Shiny 2.6
Video · Shiny Gadgets 1.1
Video · Shiny Gadgets 1.2
Video · Shiny Gadgets 1.3
Video · GoogleVis 1.1
Video · GoogleVis 1.2
Video · Plotly 1.1
Video · Plotly 1.2
Video · Plotly 1.3
Week 2: R Markdown and Leaflet
During this module, we'll learn how to create R Markdown files and embed R code in an Rmd. We'll also explore Leaflet and use it to create interactive annotated maps.
Video · R Markdown 1.1
Video · R Markdown 1.2
Video · R Markdown 1.3
Video · R Markdown 1.4
Video · R Markdown 1.5
Video · R Markdown 1.6
Reading · Three Ways to Share R Markdown Products
Video · Leaflet 1.1
Video · Leaflet 1.2
Video · Leaflet 1.3
Video · Leaflet 1.4
Video · Leaflet 1.5
Video · Leaflet 1.6
Quiz · Quiz 2
Peer Review · R Markdown and Leaflet
Week 3: R Packages
In this module, we'll dive into the world of creating R packages and practice developing an R Markdown presentation that includes a data visualization built using Plotly.
Reading · R Packages
Video · R Packages (Part 1)
Video · R Packages (Part 2)
Video · Building R Packages Demo
Video · R Classes and Methods (Part 1)
Video · R Classes and Methods (Part 2)
Quiz · Quiz 3
Peer Review · R Markdown Presentation & Plotly
Data Science Capstone
English
4-9 hours/week
About the Capstone Project
The capstone project class will allow students to create a usable, public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.
Week 1: Overview, Understanding the Problem, and Getting the Data
This week, we introduce the project so you can get a clear grip on the problem at hand and begin working with the dataset.
Video · Welcome to the Capstone Project
Reading · Project Overview
Video · Welcome from SwiftKey
Video · You Are a Data Scientist Now
Reading · Syllabus
Video · Introduction to Task 0: Understanding the Problem
Reading · Task 0 - Understanding the problem
Reading · About the Corpora
Video · Introduction to Task 1: Getting and Cleaning the Data
Reading · Task 1 - Getting and cleaning the data
Video · Regular Expressions: Part 1 (Optional)
Video · Regular Expressions: Part 2 (Optional)
Quiz · Quiz 1: Getting Started
Week 2: Exploratory Data Analysis and Modeling
This week, we move on to the next tasks, exploratory data analysis and modeling. You'll also submit your milestone report and review submissions from your classmates.
Video · Introduction to Task 2: Exploratory Data Analysis
Reading · Task 2 - Exploratory Data Analysis
Video · Introduction to Task 3: Modeling
Reading · Task 3 - Modeling
Peer Review · Milestone Report
Week 3: Prediction Model
This week, you'll build and evaluate your prediction model. The goal is to make your model efficient and accurate.
Video · Introduction to Task 4: Prediction Model
Reading · Task 4 - Prediction Model
Quiz · Quiz 2: Natural language processing I
Week 4: Creative Exploration
This week's goal is to improve the predictive accuracy while reducing computational runtime and model complexity.
Video · Introduction to Task 5: Creative Exploration
Reading · Task 5 - Creative Exploration
Quiz · Quiz 3: Natural language processing II
Week 5: Data Product
This week, you'll work on developing the first component of your final project, your data product.
Video · Introduction to Task 6: Data Product
Reading · Task 6 - Data Product
Week 6: Slide Deck
This week, you'll work on developing the second component of your final project, a slide deck to accompany your data product.
Video · Introduction to Task 7: Slide Deck
Reading · Task 7 - Slide Deck
Week 7: Final Project Submission and Evaluation
This week, you'll submit your final project and review the work of your classmates.
Peer Review · Final Project Submission
Video · Congratulations!