TRANSCRIPT
On prediction
Jussi Hakanen
Post-doctoral researcher [email protected]
April 22, 2014 TIES445 Data mining (guest lecture)
Learning outcomes
To understand the basic principles of
prediction
To understand linear regression in prediction
To be aware of the connection between least squares and optimization
Exercise
Find out issues related to prediction in data mining
Work in pairs or small groups
Time: 10 minutes
Summary of the exercise
How to measure quality of prediction?
How to avoid overlearning/overfitting?
Supervised learning
How to predict for missing data?
Different applications (stock markets, biology, sport prediction, income in finance, energy and water consumption, …)
Underfitting
Bias of the model
Concept drift (predictions become inaccurate as the data changes over time)
Obtaining new knowledge from given data
Motivation
How to estimate data
– Within the ranges of the dataset?
– Outside of the dataset?
Handling missing data, outliers
Predictive vs. descriptive models
Prediction for numerical values (cf. classification)
Concepts
Predictor (or independent or input) variables
$x = (x_1, \ldots, x_k)$ $(k \geq 1)$
Response (or dependent or output) variables
$y = (y_1, \ldots, y_m)$ $(m \geq 1)$
Regression model/function
– A model/function describing the prediction used
Linear regression
– The regression model/function is linear
Prediction
A set of data for which the values of predictor and response variables are known
– $n$ data points $(x^i, y^i)$, $i = 1, \ldots, n$
– $x^i = (x_1^i, \ldots, x_k^i)$
– $y^i = (y_1^i, \ldots, y_m^i)$
The idea is to use prediction models for predicting the response for such predictor variable values for which the response is not known
Note: $n$ should be greater than or equal to $k$!
Interpolation vs. extrapolation
Can give misleading results if not interpreted carefully!
Accuracy is important
– Different measures of accuracy exist
– Can be used e.g. to choose between different models and/or to choose values for the parameters of the models
– Can sometimes be sacrificed for a simpler model (see the sketch below)
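As a concrete illustration of one common accuracy measure, below is a minimal sketch (not part of the original lecture) that computes the mean squared error of predictions with NumPy; the arrays y_true and y_pred are hypothetical.

```python
import numpy as np

# Hypothetical observed responses and model predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Mean squared error: the average of the squared prediction errors.
mse = np.mean((y_true - y_pred) ** 2)
# Root mean squared error: same units as the response variable.
rmse = np.sqrt(mse)
print(f"MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```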
Regression analysis
Used for prediction and forecasting
Parametric/non-parametric regression
– Parametric: the regression function depends on a finite number of unknown parameters
– Non-parametric: the regression function belongs to a set of functions (which can be infinite dimensional)
Linear and nonlinear regression
– With respect to the parameters
Linear regression
Model is linear w.r.t. the parameters
– Not necessarily linear w.r.t. the predictor variables
$\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i$
– $\hat{y}$ is a predicted estimate of the mean value at $x$
– $b_0, \ldots, b_k$ are the parameters
Oldest and most widely used approach due to its simplicity
Typically, the model used is not exact, i.e. an error exists
– $y^i = \hat{y}^i + e^i$ for each data point $x^i$, $i = 1, \ldots, n$
– In matrix terms: $y = Xb + e$
How to select values for the parameters $b$?
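To make the matrix form $y = Xb + e$ concrete before turning to parameter selection, here is a minimal sketch (not from the lecture; all data values are hypothetical) of building the design matrix $X$ in NumPy with a leading column of ones for the intercept $b_0$.

```python
import numpy as np

# Hypothetical data: n = 5 data points, k = 2 predictor variables.
X_raw = np.array([[0.5, 1.2],
                  [1.0, 0.7],
                  [1.5, 2.1],
                  [2.0, 1.8],
                  [2.5, 3.0]])
y = np.array([2.1, 2.9, 5.0, 5.8, 7.9])

# Prepend a column of ones so that b[0] plays the role of the
# intercept b_0 in the matrix form y = Xb + e.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
print(X.shape)  # (5, 3): n rows, k + 1 columns
```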
Least squares
Determining orbits of bodies around the Sun from astronomical observations (Legendre, 1805; Gauss, 1809)
Idea: minimize the sum of the squared errors
For problems with $n > k$ (with the convention $x_0^i = 1$ for the intercept):
$$\sum_{i=1}^{n} (e^i)^2 = \sum_{i=1}^{n} \Big( y^i - \sum_{j=0}^{k} b_j x_j^i \Big)^2$$
$$\min_b \sum_{i=1}^{n} \Big( y^i - \sum_{j=0}^{k} b_j x_j^i \Big)^2$$
– An optimization problem!
Parameter values minimizing the above can be shown to be $b^* = (X^T X)^{-1} X^T y$
– Direct solution requires $X^T X$ to be invertible (problems if $n$ is small or there are linear dependences between the $x_i$)
– Typically $b^*$ is computed by numerical linear algebra
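A minimal sketch of both solution routes, reusing the hypothetical $X$ and $y$ from the previous sketch: the textbook normal-equations formula, and a numerical least-squares solver, which is the preferred approach in practice.

```python
import numpy as np

# Hypothetical X (n x (k+1), with a column of ones) and y (n,).
X = np.column_stack([np.ones(5), np.array([[0.5, 1.2],
                                           [1.0, 0.7],
                                           [1.5, 2.1],
                                           [2.0, 1.8],
                                           [2.5, 3.0]])])
y = np.array([2.1, 2.9, 5.0, 5.8, 7.9])

# Textbook formula b* = (X^T X)^{-1} X^T y; fails or loses accuracy
# when X^T X is singular or ill-conditioned.
b_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferred: an SVD-based least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_normal, b_lstsq)  # the two agree for well-conditioned data
```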
Example
$\hat{y} = -0.6777 + 3.0166x$
Example (cont.)
$\hat{y} = 4.1579 - 0.0057x + 0.3053x^2$
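The quadratic model above is still linear in the parameters, so the same least-squares machinery applies; a minimal sketch with hypothetical data (the data behind the lecture's example is not available).

```python
import numpy as np

# Hypothetical one-dimensional data for a quadratic fit.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([4.2, 4.1, 4.5, 4.9, 5.4, 6.1, 6.9])

# Design matrix with columns 1, x, x^2: the model
# y = b0 + b1*x + b2*x^2 is nonlinear in x but linear in b.
X = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # [b0, b1, b2]
```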
Notes
The parameter values in linear regression can be interpreted as follows:
– If the value of predictor variable $x_i$ is increased by one unit and the values of the other predictor variables remain the same, then $b_i$ denotes the change in the prediction
– $\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i$
Connection to optimization
Least squares leads to an unconstrained optimization problem:
$$\min_b \frac{1}{2} \sum_{i=1}^{n} (r_i(b))^2 = \min_b \frac{1}{2} \|r(b)\|^2$$
– Function $r_i(b) = y^i - h(b, x^i)$, where $h(b, x)$ is the model used
– E.g. $h(b, x) = b_0 + \sum_{i=1}^{k} b_i x_i$
Gauss-Newton method
– Taylor (1st order): $r(b, b^l) \approx r(b^l) + \nabla r(b^l)^T (b - b^l)$
– $\Rightarrow b^{l+1} = b^l - \big(\nabla r(b^l)\, \nabla r(b^l)^T\big)^{-1} \nabla r(b^l)\, r(b^l)$
Connection to Newton's method
– Hessian of $\frac{1}{2}\|r(b)\|^2$: $\nabla r(b^l)\, \nabla r(b^l)^T + \sum_{i=1}^{n} \nabla^2 r_i(b^l)\, r_i(b^l)$
– Gauss-Newton is equivalent to Newton's method except for the second-order term!
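A minimal sketch of the Gauss-Newton iteration on a nonlinear model, using the exponential model $h(b, x) = b_0 e^{b_1 x}$ purely for illustration; the data, the starting point, and the fixed iteration count are all hypothetical choices.

```python
import numpy as np

# Hypothetical data roughly following y = 2 * exp(0.5 * x).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.0, 2.6, 3.3, 4.2, 5.5])

def residuals(b):
    # r_i(b) = y^i - h(b, x^i) with h(b, x) = b0 * exp(b1 * x)
    return y - b[0] * np.exp(b[1] * x)

def jacobian(b):
    # J[i, j] = d r_i / d b_j (the transpose of the slides' grad r)
    J = np.empty((len(x), 2))
    J[:, 0] = -np.exp(b[1] * x)             # d r_i / d b_0
    J[:, 1] = -b[0] * x * np.exp(b[1] * x)  # d r_i / d b_1
    return J

b = np.array([1.0, 1.0])  # hypothetical starting point
for _ in range(20):
    r, J = residuals(b), jacobian(b)
    # Gauss-Newton step: solve (J^T J) step = -J^T r.
    b = b + np.linalg.solve(J.T @ J, -J.T @ r)
print(b)  # converges to roughly [2, 0.5] for this data
```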
Function approximation
Prediction can be used in optimization for approximating the objective function
Typically used when evaluating the objective function is time consuming
– E.g. if the model is a partial differential equation that takes a significant amount of time to solve numerically
– Reduces the time needed for optimization, since typically a large number of function evaluations is required
Examples of approximation models are polynomial approximation, radial basis functions (RBFs), Kriging, and support vector regression
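As one illustration, a minimal sketch of a radial basis function interpolant built by hand with Gaussian basis functions; expensive_f, the sample points, and the shape parameter eps are all hypothetical stand-ins.

```python
import numpy as np

def expensive_f(x):
    # Stand-in for an objective function that is costly to evaluate.
    return np.sin(3 * x) + x ** 2

# Evaluate the expensive function at a few sample points only.
centers = np.linspace(0.0, 2.0, 6)
values = expensive_f(centers)

eps = 2.0  # shape parameter of the Gaussian basis (hypothetical choice)
phi = lambda r: np.exp(-(eps * r) ** 2)

# Interpolation conditions: solve Phi w = values for the RBF weights.
Phi = phi(np.abs(centers[:, None] - centers[None, :]))
w = np.linalg.solve(Phi, values)

def surrogate(x):
    # Cheap approximation used in place of expensive_f during optimization.
    return phi(np.abs(x - centers)) @ w

print(surrogate(1.3), expensive_f(1.3))  # the two should be close
```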
Regularization
Previously, no requirements were made on the parameter values
– An unconstrained optimization problem
E.g. there may be a need to constrain the size of the parameters
Tikhonov regularization (ridge regression)
– Add a constraint that $\|b\|_2$, the $L_2$ norm of the parameter vector, is not greater than a given value
– Can be treated as an unconstrained optimization problem by adding a penalty term $\beta \|b\|_2^2$ to the objective function
Lasso method (least absolute shrinkage and selection operator)
– Add a constraint that $\|b\|_1$, the $L_1$ norm of the parameter vector, is not greater than a given value
– Prefers solutions with fewer non-zeros
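In penalty form, ridge regression still has a closed-form solution, $b^* = (X^T X + \beta I)^{-1} X^T y$, while lasso does not and is typically solved iteratively (e.g. by coordinate descent). Below is a minimal sketch of the ridge solution with the hypothetical data from the earlier sketches; for simplicity the intercept is penalized here too, which practical implementations usually avoid.

```python
import numpy as np

# Hypothetical data (same shapes as in the earlier sketches).
X = np.column_stack([np.ones(5), np.array([[0.5, 1.2],
                                           [1.0, 0.7],
                                           [1.5, 2.1],
                                           [2.0, 1.8],
                                           [2.5, 3.0]])])
y = np.array([2.1, 2.9, 5.0, 5.8, 7.9])

beta = 0.1  # regularization strength (hypothetical choice)

# Ridge solution of min_b ||y - Xb||^2 + beta * ||b||^2: the
# beta * I term keeps X^T X + beta * I invertible even when
# X^T X alone is singular.
b_ridge = np.linalg.solve(X.T @ X + beta * np.eye(X.shape[1]), X.T @ y)
print(b_ridge)
```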
Conclusions
What were the key points from your perspective? What do you remember best?
Extrapolation and interpolation can be dangerous
Regularization is important?
Thank You!
Dr. Jussi Hakanen
Industrial Optimization Group
http://www.mit.jyu.fi/optgroup/
Department of Mathematical Information Technology
P.O. Box 35 (Agora)
FI-40014 University of Jyväskylä
http://users.jyu.fi/~jhaka/en/