PART I: INTRODUCTION TO STATISTICAL LEARNING
Donglin Zeng, Department of Biostatistics, University of North Carolina
Statistical Decision Theory
Definition of statistical learning
I My definition: statistical learning is a framework of statistical methods and computational algorithms that use data generated from a probability distribution, with the goal of prediction or data extraction in future applications.
I Statistical learning consists of developing
– statistical methods;
– computational algorithms.
I Statistical learning concerns
– empirical data and their randomness.
I Statistical learning aims for
– future prediction;
– understanding future data patterns.
I Hence, many scientific disciplines play roles in statistical learning: probability and statistics, computer science, data science, informatics, and subject-area applications.
Other names for statistical learning
I Machine learning, data mining
I Pattern recognition
I Supervised learning and unsupervised learning
I Data analytics or predictive analytics
Comparisons between statistical learning and statistical inference
I Traditional statistical inference focuses on understanding the distributional behavior of data.
I In statistical inference, estimation and hypothesis testing (inference) of distribution parameters are of most interest; bias, consistency, and efficiency are the main concerns.
I In statistical learning, distribution estimation is less important compared to the learning goals such as prediction and feature extraction.
I Thus, prediction accuracy is most important in statistical learning.
I Prediction rule consistency and expected risk control are of more concern in statistical learning.
I However,
– both assume data randomly generated from some underlying distribution, so both account for random behavior in their procedures;
– both rely on data-dependent objective functions for estimation and inference;
– both, more or less, involve development of statistical models for data and computational algorithms for execution;
– more specifically, supervised learning is analogous to regression, and unsupervised learning to density estimation.
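The two viewpoints can be illustrated with a short simulation: inference asks whether an estimated coefficient is close to the true parameter, while learning asks how well the fitted rule predicts held-out data. This is a hypothetical sketch (not from the slides; the simulated model and names are illustrative, and Python is used here although the course examples are in R).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)          # true slope is 2, noise variance 1

# Split into training and test samples.
x_tr, y_tr = x[:150], y[:150]
x_te, y_te = x[150:], y[150:]

# Least-squares slope estimate from the training data.
beta_hat = np.sum(x_tr * y_tr) / np.sum(x_tr ** 2)

# Inference view: is beta_hat close to the true parameter 2?
# Learning view: squared prediction error on the held-out test sample.
test_mse = np.mean((y_te - beta_hat * x_te) ** 2)
```

Under this model the test error is dominated by the irreducible noise variance (about 1), regardless of how precisely the slope is estimated.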
Challenges in modern statistical learning
I Method challenges: what kinds of methods/models enable the achievement of prediction goals?
I Data challenges: how to deal with data complexity: dimensionality, heterogeneous structure, missing data, etc.
I Algorithm challenges: what kinds of computational algorithms are suitable for estimation with such data?
I Inference challenges: how well does a learned rule perform when applied to future data?
Example 1. Email Spam Data
Example 2. Prostate Cancer Data
Example 3. Handwritten Digit Data
Example 4. DNA Expression Data
Overview of lectures on statistical learning
– I will introduce a number of statistical or machine learning methods.
– I will discuss the probabilistic and statistical theory behind learning methods.
– Computational algorithms and examples will be used throughout the lectures.
What you should know
I Many data examples and figures are taken from Hastie, Tibshirani and Friedman’s book.
I A number of R algorithms and examples are taken from a variety of publicly available web sources.
I All errors in this course are mine.
Statistical Decision Theory (Supervised Learning)
I The goal of supervised learning is to learn a prediction rule to predict the outcome given a subject’s feature variables.
I The components in supervised learning
– X: feature variables
– Y: outcome variable (continuous, categorical, ordinal)
I We assume that (X, Y) follows some distribution.
I We aim to determine a prediction rule:
f : X → Y
using available data (X1, Y1), ..., (Xn, Yn), called the training data or training sample.
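As a concrete instance of such a rule, a one-nearest-neighbor predictor can be built directly from a training sample. This is a hedged illustration (one-nearest-neighbor is my choice of example, not a method prescribed by the slides; Python is used although the course examples are in R).

```python
import numpy as np

def fit_1nn(X_train, Y_train):
    """Learn a rule f: X -> Y that returns the outcome of the
    nearest training point (scalar features, for simplicity)."""
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train)

    def f(x):
        # Index of the training feature closest to x.
        i = np.argmin(np.abs(X_train - float(x)))
        return Y_train[i]

    return f

# Training sample (X_1, Y_1), ..., (X_n, Y_n).
f = fit_1nn([0.0, 1.0, 2.0], ["a", "b", "c"])
```

Calling `f` on a new feature value returns the outcome of its nearest training neighbor, so the rule is determined entirely by the training data.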
Loss function to assess a prediction rule
I A loss function quantifies, for a given rule f and a specific subject with (X, Y), the loss incurred due to imprecise prediction.
I General notation: L(y, x; f), but usually it is defined based on a certain metric between y and f(x). For the latter, we write L(y, f(x)).
I Examples of typical loss functions
– squared loss: L(y, f) = (y − f)²
– absolute deviation loss: L(y, f) = |y − f|
– Huber loss: L(y, f) = (y − f)² I(|y − f| < δ) + (2δ|y − f| − δ²) I(|y − f| ≥ δ)
– zero-one loss: L(y, f) = I(y ≠ f)
– preference loss: L(y1, y2, f1, f2) = 1 − I(y1 < y2, f1 < f2)
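The losses above translate directly into code. A minimal sketch (Python used for illustration, although the course examples are in R; `delta` is the Huber threshold δ from the slide):

```python
import numpy as np

def squared_loss(y, f):
    # (y - f)^2
    return (y - f) ** 2

def absolute_loss(y, f):
    # |y - f|
    return np.abs(y - f)

def huber_loss(y, f, delta=1.0):
    # Quadratic inside the threshold, linear outside; the two pieces
    # agree at |y - f| = delta, so the loss is continuous there.
    r = np.abs(y - f)
    return np.where(r < delta, (y - f) ** 2, 2 * delta * r - delta ** 2)

def zero_one_loss(y, f):
    # I(y != f), for categorical outcomes
    return np.asarray(y != f, dtype=float)
```

Note that the Huber loss behaves like squared loss for small residuals but grows only linearly for large ones, which makes it less sensitive to outliers.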
Plot of loss function
[Figure: the loss functions plotted against x over [−2, 2], vertical axis 0–4]
Statistical framework for supervised learning
I Feature variables: X; outcome: Y.
I Loss function: L(y, f(x)).
I The goal is to find the optimal prediction f∗ to minimize the expected prediction error:
EPE(f ) = E [L(Y, f (X))] .
I Training data: (X1,Y1), ..., (Xn,Yn).
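When the distribution of (X, Y) is known (or can be simulated from), EPE(f) can be approximated by Monte Carlo: draw many pairs and average the loss. This is a hedged sketch under an assumed illustrative distribution (Python used here; the course examples are in R).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000

# Illustrative assumption: X ~ N(0, 1) and Y = X^2 + noise.
X = rng.normal(size=m)
Y = X ** 2 + rng.normal(scale=0.5, size=m)

f = lambda x: x ** 2                    # candidate prediction rule

# Monte Carlo estimate of EPE(f) = E[L(Y, f(X))] under squared loss.
epe_hat = np.mean((Y - f(X)) ** 2)      # ~ Var(noise) = 0.25 here
```

Because f here matches the true regression function, the estimated EPE approaches the irreducible noise variance; any other rule would give a larger value.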