TRANSCRIPT
Variations on Regression Models
Prof. Bennett, Math Models of Data Science
2/02/06
Outline
- Steps in modeling
- Review of Least Squares model
- Model in E&K pg. 24-29
- Aquasol version of E&K
- Other loss functions
- Other regularization
Modeling Process
- Gather data: input and output pairs
- Represent data mathematically: (X, Y)
- Select parametric model: g(x)
- Select loss function: Loss
- Select regularization
- Optimize parameters with respect to given loss
- Estimate out-of-sample accuracy
Predict Drug Bioavailability
- Aqueous solubility = Aquasol
- 525 descriptors generated: Electronic TAE and Traditional
- 197 molecules with tested solubility
- $y_i \in \mathbb{R}$, $\mathbf{x}_i \in \mathbb{R}^{525}$, $\ell = 197$
1-d Regression with bias
$$g(x) = wx + b$$
(figure: fitted line over (x, y) data with intercept $b = 2$)
Linear Regression
Given training data (points and labels):
$$S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_\ell, y_\ell)\}, \quad \mathbf{x}_i \in \mathbb{R}^n,\ y_i \in \mathbb{R}$$
Construct linear function:
$$g(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w} \rangle + b = \mathbf{x}'\mathbf{w} + b = \sum_{i=1}^{n} w_i x_i + b$$
Goal for future data $(\mathbf{x}, y)$ with $y$ unknown:
$$g(\mathbf{x}) \approx y$$
Least Squares Approximation
Want
$$g(\mathbf{x}) \approx y$$
Define error
$$f(\mathbf{x}, y) = y - g(\mathbf{x}) = \xi$$
Minimize loss
$$L(g, S) = \sum_{i=1}^{\ell} \left(y_i - g(\mathbf{x}_i)\right)^2$$
Least Squares Loss
$$L(\mathbf{w}, b) = \sum_{i=1}^{\ell} \left(\langle \mathbf{w}, \mathbf{x}_i \rangle + b - y_i\right)^2 = \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_2^2 = (X\mathbf{w} + b\mathbf{e} - \mathbf{y})'(X\mathbf{w} + b\mathbf{e} - \mathbf{y})$$
where $\mathbf{e}$ is the vector of ones.
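As a quick sanity check (my own sketch with toy data, using numpy in place of the slides' Matlab), the per-point sum and the matrix form of this loss can be verified to agree numerically:

```python
import numpy as np

# Sketch (toy data, variable names are mine): check numerically that the
# per-point sum and the matrix form of the least-squares loss agree.
rng = np.random.default_rng(0)
ell, n = 6, 3
X = rng.standard_normal((ell, n))
y = rng.standard_normal(ell)
w = rng.standard_normal(n)
b = 0.5
e = np.ones(ell)                      # e is the vector of ones

loss_sum = sum((X[i] @ w + b - y[i]) ** 2 for i in range(ell))
r = X @ w + b * e - y                 # residual Xw + be - y
loss_mat = r @ r                      # (Xw + be - y)'(Xw + be - y)
```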
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S and all 0 ≤ λ ≤ 1,
$$f(\lambda x + (1-\lambda)y) \;(<)\leq\; \lambda f(x) + (1-\lambda)f(y).$$
(figure: chord from (x, f(x)) to (y, f(y)) lying above the graph at λx + (1−λ)y)
Theorem
Consider the unconstrained problem min f(x). If $\nabla f(x^*) = 0$ and f is convex, then $x^*$ is a global minimum.
Proof:
$$\forall y:\quad f(y) \geq f(x^*) + (y - x^*)'\nabla f(x^*) \quad \text{by convexity of } f$$
$$= f(x^*) \quad \text{since } \nabla f(x^*) = 0.$$
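A tiny numeric illustration of the theorem (my own sketch; the function, test points, and tolerances are made-up): for the convex function $f(x) = \|x\|^2$, the chord inequality holds and the stationary point $x^* = 0$ is the global minimum.

```python
import numpy as np

# My own numeric sketch (toy function and points): for the convex function
# f(x) = ||x||^2, the chord lies above the graph, and the stationary point
# x* = 0 (where the gradient 2x vanishes) is the global minimum.
f = lambda x: float(np.dot(x, x))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(4), rng.standard_normal(4)
for lam in np.linspace(0.0, 1.0, 11):
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12

x_star = np.zeros(4)
assert f(x_star) <= min(f(x), f(y))   # global minimum at the stationary point
```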
Stationary Points
Note that this condition is not sufficient: $\nabla f(x^*) = 0$ also holds at local maxima and saddle points.
Optimal Solution
Want:
$$\mathbf{y} \approx X\mathbf{w} + b\mathbf{e}, \quad \mathbf{e} \text{ a vector of ones}$$
Mathematical Model:
$$\min_{\mathbf{w}, b}\ L(\mathbf{w}, b, S) = \|\mathbf{y} - (X\mathbf{w} + b\mathbf{e})\|^2 + \lambda\|\mathbf{w}\|^2$$
Optimality Conditions:
$$\frac{\partial L(\mathbf{w}, b, S)}{\partial \mathbf{w}} = -2X'(\mathbf{y} - X\mathbf{w} - b\mathbf{e}) + 2\lambda\mathbf{w} = 0$$
$$\frac{\partial L(\mathbf{w}, b, S)}{\partial b} = -2\mathbf{e}'(\mathbf{y} - X\mathbf{w} - b\mathbf{e}) = 0$$
Optimal Solution
Thus:
$$(X'X + \lambda I)\mathbf{w} = X'\mathbf{y} - bX'\mathbf{e}$$
$$\mathbf{e}'\mathbf{e}\,b = \mathbf{e}'\mathbf{y} - \mathbf{e}'X\mathbf{w} \;\Rightarrow\; b = \mathrm{mean}(\mathbf{y}) - \mathrm{mean}(X)'\mathbf{w}$$
Idea: scale data so means are 0, e.g.
$$\mathbf{e}'\mathbf{y} = 0, \quad \mathbf{e}'X = 0$$
Recenter Data
Shift y by mean:
$$\mu = \frac{1}{\ell}\sum_{i=1}^{\ell} y_i, \quad y_i := y_i - \mu$$
Shift x by mean:
$$\bar{\mathbf{x}} = \frac{1}{\ell}\sum_{i=1}^{\ell} \mathbf{x}_i, \quad \mathbf{x}_i := \mathbf{x}_i - \bar{\mathbf{x}}$$
Ridge Regression with bias
- Center data by $\bar{\mathbf{x}}$ and $\mu$: $X := X - \mathbf{e}\bar{\mathbf{x}}'$, $\mathbf{y} := \mathbf{y} - \mu\mathbf{e}$
- Calculate $\mathbf{w} = (X'X + \lambda I)^{-1} X'\mathbf{y}$
- Calculate $b = \mu - \bar{\mathbf{x}}'\mathbf{w}$
- To predict a new point: $g(\mathbf{x}) = \mathbf{x}'\mathbf{w} + b$
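The ridge-with-bias recipe can be sketched in a few lines of numpy (my own sketch: the toy data and the value of lambda below are made up, not the Aquasol set):

```python
import numpy as np

# Minimal sketch of the ridge-with-bias recipe: center by xbar and mu,
# solve (X'X + lam*I) w = X'y on the centered data, then b = mu - xbar'w.
def ridge_with_bias(X, y, lam):
    xbar, mu = X.mean(axis=0), y.mean()
    Xc, yc = X - xbar, y - mu                     # center the data
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = mu - xbar @ w                             # b = mu - xbar'w
    return w, b

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * rng.standard_normal(50)
w, b = ridge_with_bias(X, y, lam=0.1)
pred = X @ w + b                                  # g(x) = x'w + b
```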
Generalization
To estimate generalization error:
- Divide the data into a training set Xtrain (100 points in Aquasol) and a test set Xtest (97 points in Aquasol)
- Create g(x) using Xtrain
- Evaluate on Xtest
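The same protocol can be sketched on synthetic data (my own sketch: the data, the value of lambda, and the 100/97 split below are stand-ins, not the actual Aquasol experiment):

```python
import numpy as np

# Sketch of the train/test protocol: fit ridge-with-bias on the training
# set only, then measure squared error on the held-out test set.
rng = np.random.default_rng(3)
X = rng.standard_normal((197, 5))
y = X @ rng.standard_normal(5) + 1.0 + 0.1 * rng.standard_normal(197)

Xtrain, ytrain = X[:100], y[:100]   # g(x) is built from these points only
Xtest, ytest = X[100:], y[100:]     # held out to estimate generalization

lam = 1.0
xbar, mu = Xtrain.mean(axis=0), ytrain.mean()
Xc, yc = Xtrain - xbar, ytrain - mu
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(5), Xc.T @ yc)
b = mu - xbar @ w

train_mse = np.mean((Xtrain @ w + b - ytrain) ** 2)
test_mse = np.mean((Xtest @ w + b - ytest) ** 2)
```

In practice the test error is tracked over a range of lambda values, as in the plot on the next slide.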
Train and Test for λ
(plot: squared error versus lambda, showing the train error and test error curves)
Could we do better?
- Other loss functions?
- Other types of regularization?
Nonlinear time series
Read E&K pg 24-25
What are the modeling tricks?
Linear Programming
Matlab can solve linear programs of the following form for given f, A, b, Aeq, beq, lb, ub:
$$\min\ f'x \quad \text{s.t.} \quad Ax \leq b, \quad A_{eq}x = b_{eq}, \quad lb \leq x \leq ub$$
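For readers working outside Matlab, SciPy's `linprog` solves the same standard form. A sketch with made-up toy values for f, A, b, and the bounds (assuming scipy is available):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: the same LP shape, solved with SciPy instead of Matlab.
# f, A, b and the bounds are made-up toy values.
f = np.array([1.0, 2.0])            # objective f'x
A = np.array([[-1.0, -1.0]])        # -x1 - x2 <= -1, i.e. x1 + x2 >= 1
b = np.array([-1.0])
res = linprog(c=f, A_ub=A, b_ub=b, bounds=[(0, 1), (0, 1)])
# the cheapest way to satisfy x1 + x2 >= 1 is to put all weight on x1
```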
Sample Problem
T = [1 2 3 4 5 6 7 8 9 10 11 12 13 14]'
R = [-1.75 -4.00 -5.75 -7.00 -7.75 -8.00 -7.75 -7.00 -5.75 -4.00 -1.75 1.00 4.25 8.00]'
(plot: data points R versus T)
Matlab
Let the ith row of B be $[t_i^2 \;\; t_i \;\; 1]$.
Problem becomes:
$$\min_{w,a,b,c}\ [1\ 0\ 0\ 0]\,[w\ a\ b\ c]'$$
$$\text{s.t.}\quad \begin{bmatrix} -\mathbf{e} & -B \\ -\mathbf{e} & B \end{bmatrix} \begin{bmatrix} w \\ a \\ b \\ c \end{bmatrix} \leq \begin{bmatrix} -R \\ R \end{bmatrix}$$
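This LP can be solved directly; here is a SciPy sketch of the same Chebyshev (∞-norm) fit on the sample data (assuming scipy is available; the R values happen to lie exactly on a quadratic, so the optimal w should be near zero):

```python
import numpy as np
from scipy.optimize import linprog

# SciPy sketch of the slide's Chebyshev fit: minimize w subject to
# |B [a b c]' - R| <= w e, where row i of B is [t_i^2  t_i  1].
T = np.arange(1.0, 15.0)
R = np.array([-1.75, -4.0, -5.75, -7.0, -7.75, -8.0, -7.75,
              -7.0, -5.75, -4.0, -1.75, 1.0, 4.25, 8.0])
B = np.column_stack([T**2, T, np.ones_like(T)])
e = np.ones((14, 1))

f = np.array([1.0, 0.0, 0.0, 0.0])        # objective picks out w
A_ub = np.block([[-e, -B],                # -w e - B theta <= -R
                 [-e,  B]])               # -w e + B theta <=  R
b_ub = np.concatenate([-R, R])
res = linprog(f, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 4)
w_opt, theta = res.x[0], res.x[1:]        # max residual and [a b c]
```

Note the explicit `bounds=[(None, None)] * 4`: SciPy's default restricts variables to be nonnegative, which would wrongly constrain a, b, c.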
Comparison of norms ||w||=1
(figure: unit balls for the 2-norm, 1-norm, and ∞-norm)
Inconsistent System of equations
Want Ax ≈ b.
How does this correspond to our framework?
What are the loss functions?
Review of norms
- 2-norm: $\|\mathbf{y}\|_2^2 = \sum_{i=1}^{n} y_i^2$
- 1-norm: $\|\mathbf{y}\|_1 = \sum_{i=1}^{n} |y_i|$
- ∞-norm: $\|\mathbf{y}\|_\infty = \max_i |y_i|$
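All three norms are available in numpy; a quick sketch on a made-up toy vector:

```python
import numpy as np

# Sketch: the three norms from the review, computed with numpy.
v = np.array([3.0, -4.0, 0.0])
norm2 = np.linalg.norm(v, 2)         # sqrt(3^2 + (-4)^2) = 5
norm1 = np.linalg.norm(v, 1)         # |3| + |-4| + |0| = 7
norminf = np.linalg.norm(v, np.inf)  # max |v_i| = 4
```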
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_\infty$$
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_1$$
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_1 + \lambda\|\mathbf{w}\|_1$$
Challenge problem
Popular loss function: ε-insensitive
$$\min_{\mathbf{w}, b}\ \sum_i \max(|\mathbf{x}_i'\mathbf{w} + b - y_i| - \varepsilon,\ 0)$$
(figure: ε-insensitive loss, zero on the interval [−ε, ε])
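Evaluating this loss is straightforward; a sketch with made-up toy data (the function name and values are mine):

```python
import numpy as np

# Sketch of the epsilon-insensitive loss: residuals inside [-eps, eps]
# cost nothing, and larger ones cost only their excess over eps.
def eps_insensitive_loss(X, y, w, b, eps):
    residuals = np.abs(X @ w + b - y)
    return float(np.sum(np.maximum(residuals - eps, 0.0)))

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.1, 1.9, 3.5])
w, b = np.array([1.0]), 0.0
# residuals are 0.1, 0.1, 0.5; with eps = 0.2 only the last is penalized
loss = eps_insensitive_loss(X, y, w, b, eps=0.2)
```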
Discussion Questions
- Sketch each type of loss function (1-norm, 2-norm, ∞-norm, ε-insensitive) and discuss the advantages and disadvantages of each.
- What would be the ramifications of using these same functions for regularization?