TRANSCRIPT
Variations on Regression Models
Prof. Bennett, Math Models of Data Science
2/02/06
Outline
- Steps in modeling
- Review of Least Squares model
- Model in E&K pg. 24-29
- Aquasol version of E&K
- Other loss functions
- Other regularization
Modeling Process
- Gather data: input and output pairs
- Represent data mathematically: (X, Y)
- Select parametric model: g(x)
- Select loss function: Loss
- Select regularization
- Optimize parameters with respect to given loss
- Estimate out-of-sample accuracy
Predict Drug Bioavailability
- Aqueous solubility = Aquasol
- 525 descriptors generated: Electronic TAE and Traditional
- 197 molecules with tested solubility
- $y_i \in \mathbb{R}$, $\mathbf{x}_i \in \mathbb{R}^{525}$, $\ell = 197$
1-d Regression with bias
$$g(x) = wx + b$$
(figure: fitted line over (x, y) data with intercept $b = 2$)
Linear Regression
Given training data (points and labels):
$$S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_\ell, y_\ell)\}, \quad \mathbf{x}_i \in \mathbb{R}^n,\ y_i \in \mathbb{R}$$
Construct linear function:
$$g(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w} \rangle + b = \mathbf{x}'\mathbf{w} + b = \sum_{i=1}^{n} w_i x_i + b$$
Goal for future data $(\mathbf{x}, y)$ with $y$ unknown:
$$g(\mathbf{x}) \approx y$$
Least Squares Approximation
Want
$$g(\mathbf{x}) \approx y$$
Define error
$$f(\mathbf{x}, y) = y - g(\mathbf{x}) = \xi$$
Minimize loss
$$L(g, S) = \sum_{i=1}^{\ell} \left(y_i - g(\mathbf{x}_i)\right)^2$$
Least Squares Loss
$$L(\mathbf{w}, b) = \sum_{i=1}^{\ell} \left(\langle \mathbf{w}, \mathbf{x}_i \rangle + b - y_i\right)^2 = \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_2^2 = (X\mathbf{w} + b\mathbf{e} - \mathbf{y})'(X\mathbf{w} + b\mathbf{e} - \mathbf{y})$$
where $\mathbf{e}$ is the vector of ones.
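As a quick sanity check (my own sketch with toy data, using numpy in place of the slides' Matlab), the per-point sum and the matrix form of this loss can be verified to agree numerically:

```python
import numpy as np

# Sketch (toy data, variable names are mine): check numerically that the
# per-point sum and the matrix form of the least-squares loss agree.
rng = np.random.default_rng(0)
ell, n = 6, 3
X = rng.standard_normal((ell, n))
y = rng.standard_normal(ell)
w = rng.standard_normal(n)
b = 0.5
e = np.ones(ell)                      # e is the vector of ones

loss_sum = sum((X[i] @ w + b - y[i]) ** 2 for i in range(ell))
r = X @ w + b * e - y                 # residual Xw + be - y
loss_mat = r @ r                      # (Xw + be - y)'(Xw + be - y)
```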
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S and all 0 ≤ λ ≤ 1,
$$f(\lambda x + (1-\lambda)y) \;(<)\leq\; \lambda f(x) + (1-\lambda)f(y).$$
(figure: chord from (x, f(x)) to (y, f(y)) lying above the graph at λx + (1−λ)y)
Theorem
Consider the unconstrained problem min f(x). If $\nabla f(x^*) = 0$ and f is convex, then $x^*$ is a global minimum.
Proof:
$$\forall y:\quad f(y) \geq f(x^*) + (y - x^*)'\nabla f(x^*) \quad \text{by convexity of } f$$
$$= f(x^*) \quad \text{since } \nabla f(x^*) = 0.$$
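A tiny numeric illustration of the theorem (my own sketch; the function, test points, and tolerances are made-up): for the convex function $f(x) = \|x\|^2$, the chord inequality holds and the stationary point $x^* = 0$ is the global minimum.

```python
import numpy as np

# My own numeric sketch (toy function and points): for the convex function
# f(x) = ||x||^2, the chord lies above the graph, and the stationary point
# x* = 0 (where the gradient 2x vanishes) is the global minimum.
f = lambda x: float(np.dot(x, x))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(4), rng.standard_normal(4)
for lam in np.linspace(0.0, 1.0, 11):
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12

x_star = np.zeros(4)
assert f(x_star) <= min(f(x), f(y))   # global minimum at the stationary point
```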
Stationary Points
Note that this condition is not sufficient: $\nabla f(x^*) = 0$ also holds at local maxima and saddle points.
Optimal Solution
Want:
$$\mathbf{y} \approx X\mathbf{w} + b\mathbf{e}, \quad \mathbf{e} \text{ a vector of ones}$$
Mathematical Model:
$$\min_{\mathbf{w}, b}\ L(\mathbf{w}, b, S) = \|\mathbf{y} - (X\mathbf{w} + b\mathbf{e})\|^2 + \lambda\|\mathbf{w}\|^2$$
Optimality Conditions:
$$\frac{\partial L(\mathbf{w}, b, S)}{\partial \mathbf{w}} = -2X'(\mathbf{y} - X\mathbf{w} - b\mathbf{e}) + 2\lambda\mathbf{w} = 0$$
$$\frac{\partial L(\mathbf{w}, b, S)}{\partial b} = -2\mathbf{e}'(\mathbf{y} - X\mathbf{w} - b\mathbf{e}) = 0$$
Optimal Solution
Thus:
$$(X'X + \lambda I)\mathbf{w} = X'\mathbf{y} - bX'\mathbf{e}$$
$$\mathbf{e}'\mathbf{e}\,b = \mathbf{e}'\mathbf{y} - \mathbf{e}'X\mathbf{w} \;\Rightarrow\; b = \mathrm{mean}(\mathbf{y}) - \mathrm{mean}(X)'\mathbf{w}$$
Idea: scale data so means are 0, e.g.
$$\mathbf{e}'\mathbf{y} = 0, \quad \mathbf{e}'X = 0$$
Recenter Data
Shift y by mean:
$$\mu = \frac{1}{\ell}\sum_{i=1}^{\ell} y_i, \quad y_i := y_i - \mu$$
Shift x by mean:
$$\bar{\mathbf{x}} = \frac{1}{\ell}\sum_{i=1}^{\ell} \mathbf{x}_i, \quad \mathbf{x}_i := \mathbf{x}_i - \bar{\mathbf{x}}$$
Ridge Regression with bias
- Center data by $\bar{\mathbf{x}}$ and $\mu$: $X := X - \mathbf{e}\bar{\mathbf{x}}'$, $\mathbf{y} := \mathbf{y} - \mu\mathbf{e}$
- Calculate $\mathbf{w} = (X'X + \lambda I)^{-1} X'\mathbf{y}$
- Calculate $b = \mu - \bar{\mathbf{x}}'\mathbf{w}$
- To predict a new point: $g(\mathbf{x}) = \mathbf{x}'\mathbf{w} + b$
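The ridge-with-bias recipe can be sketched in a few lines of numpy (my own sketch: the toy data and the value of lambda below are made up, not the Aquasol set):

```python
import numpy as np

# Minimal sketch of the ridge-with-bias recipe: center by xbar and mu,
# solve (X'X + lam*I) w = X'y on the centered data, then b = mu - xbar'w.
def ridge_with_bias(X, y, lam):
    xbar, mu = X.mean(axis=0), y.mean()
    Xc, yc = X - xbar, y - mu                     # center the data
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = mu - xbar @ w                             # b = mu - xbar'w
    return w, b

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * rng.standard_normal(50)
w, b = ridge_with_bias(X, y, lam=0.1)
pred = X @ w + b                                  # g(x) = x'w + b
```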
Generalization
To estimate generalization error:
- Divide the data into a training set Xtrain (100 points in Aquasol) and a test set Xtest (97 points in Aquasol)
- Create g(x) using Xtrain
- Evaluate on Xtest
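The same protocol can be sketched on synthetic data (my own sketch: the data, the value of lambda, and the 100/97 split below are stand-ins, not the actual Aquasol experiment):

```python
import numpy as np

# Sketch of the train/test protocol: fit ridge-with-bias on the training
# set only, then measure squared error on the held-out test set.
rng = np.random.default_rng(3)
X = rng.standard_normal((197, 5))
y = X @ rng.standard_normal(5) + 1.0 + 0.1 * rng.standard_normal(197)

Xtrain, ytrain = X[:100], y[:100]   # g(x) is built from these points only
Xtest, ytest = X[100:], y[100:]     # held out to estimate generalization

lam = 1.0
xbar, mu = Xtrain.mean(axis=0), ytrain.mean()
Xc, yc = Xtrain - xbar, ytrain - mu
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(5), Xc.T @ yc)
b = mu - xbar @ w

train_mse = np.mean((Xtrain @ w + b - ytrain) ** 2)
test_mse = np.mean((Xtest @ w + b - ytest) ** 2)
```

In practice the test error is tracked over a range of lambda values, as in the plot on the next slide.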
Train and Test for λ
(plot: squared error versus lambda, showing the train error and test error curves)
Could we do better?
- Other loss functions?
- Other types of regularization?
Nonlinear time series
Read E&K pg 24-25
What are the modeling tricks?
Linear Programming
Matlab can solve linear programs of the following form for given f, A, b, Aeq, beq, lb, ub:
$$\min\ f'x \quad \text{s.t.} \quad Ax \leq b, \quad A_{eq}x = b_{eq}, \quad lb \leq x \leq ub$$
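For readers working outside Matlab, SciPy's `linprog` solves the same standard form. A sketch with made-up toy values for f, A, b, and the bounds (assuming scipy is available):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: the same LP shape, solved with SciPy instead of Matlab.
# f, A, b and the bounds are made-up toy values.
f = np.array([1.0, 2.0])            # objective f'x
A = np.array([[-1.0, -1.0]])        # -x1 - x2 <= -1, i.e. x1 + x2 >= 1
b = np.array([-1.0])
res = linprog(c=f, A_ub=A, b_ub=b, bounds=[(0, 1), (0, 1)])
# the cheapest way to satisfy x1 + x2 >= 1 is to put all weight on x1
```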
Sample Problem
T = [1 2 3 4 5 6 7 8 9 10 11 12 13 14]'
R = [-1.75 -4.00 -5.75 -7.00 -7.75 -8.00 -7.75 -7.00 -5.75 -4.00 -1.75 1.00 4.25 8.00]'
(plot: data points R versus T)
Matlab
Let the ith row of B be $[t_i^2 \;\; t_i \;\; 1]$.
Problem becomes:
$$\min_{w,a,b,c}\ [1\ 0\ 0\ 0]\,[w\ a\ b\ c]'$$
$$\text{s.t.}\quad \begin{bmatrix} -\mathbf{e} & -B \\ -\mathbf{e} & B \end{bmatrix} \begin{bmatrix} w \\ a \\ b \\ c \end{bmatrix} \leq \begin{bmatrix} -R \\ R \end{bmatrix}$$
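This LP can be solved directly; here is a SciPy sketch of the same Chebyshev (∞-norm) fit on the sample data (assuming scipy is available; the R values happen to lie exactly on a quadratic, so the optimal w should be near zero):

```python
import numpy as np
from scipy.optimize import linprog

# SciPy sketch of the slide's Chebyshev fit: minimize w subject to
# |B [a b c]' - R| <= w e, where row i of B is [t_i^2  t_i  1].
T = np.arange(1.0, 15.0)
R = np.array([-1.75, -4.0, -5.75, -7.0, -7.75, -8.0, -7.75,
              -7.0, -5.75, -4.0, -1.75, 1.0, 4.25, 8.0])
B = np.column_stack([T**2, T, np.ones_like(T)])
e = np.ones((14, 1))

f = np.array([1.0, 0.0, 0.0, 0.0])        # objective picks out w
A_ub = np.block([[-e, -B],                # -w e - B theta <= -R
                 [-e,  B]])               # -w e + B theta <=  R
b_ub = np.concatenate([-R, R])
res = linprog(f, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 4)
w_opt, theta = res.x[0], res.x[1:]        # max residual and [a b c]
```

Note the explicit `bounds=[(None, None)] * 4`: SciPy's default restricts variables to be nonnegative, which would wrongly constrain a, b, c.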
Comparison of norms ||w||=1
(figure: unit balls for the 2-norm, 1-norm, and ∞-norm)
Inconsistent System of equations
Want Ax ≈ b.
How does this correspond to our framework?
What are the loss functions?
Review of norms
- 2-norm: $\|\mathbf{y}\|_2^2 = \sum_{i=1}^{n} y_i^2$
- 1-norm: $\|\mathbf{y}\|_1 = \sum_{i=1}^{n} |y_i|$
- ∞-norm: $\|\mathbf{y}\|_\infty = \max_i |y_i|$
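All three norms are available in numpy; a quick sketch on a made-up toy vector:

```python
import numpy as np

# Sketch: the three norms from the review, computed with numpy.
v = np.array([3.0, -4.0, 0.0])
norm2 = np.linalg.norm(v, 2)         # sqrt(3^2 + (-4)^2) = 5
norm1 = np.linalg.norm(v, 1)         # |3| + |-4| + |0| = 7
norminf = np.linalg.norm(v, np.inf)  # max |v_i| = 4
```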
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_\infty$$
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_1$$
Translate the ideas to our framework
Using the ideas in E&K, formulate the following problem as a linear program:
$$\min_{\mathbf{w}, b}\ \|X\mathbf{w} + b\mathbf{e} - \mathbf{y}\|_1 + \lambda\|\mathbf{w}\|_1$$
Challenge problem
Popular loss function: ε-insensitive
$$\min_{\mathbf{w}, b}\ \sum_i \max(|\mathbf{x}_i'\mathbf{w} + b - y_i| - \varepsilon,\ 0)$$
(figure: ε-insensitive loss, zero on the interval [−ε, ε])
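Evaluating this loss is straightforward; a sketch with made-up toy data (the function name and values are mine):

```python
import numpy as np

# Sketch of the epsilon-insensitive loss: residuals inside [-eps, eps]
# cost nothing, and larger ones cost only their excess over eps.
def eps_insensitive_loss(X, y, w, b, eps):
    residuals = np.abs(X @ w + b - y)
    return float(np.sum(np.maximum(residuals - eps, 0.0)))

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.1, 1.9, 3.5])
w, b = np.array([1.0]), 0.0
# residuals are 0.1, 0.1, 0.5; with eps = 0.2 only the last is penalized
loss = eps_insensitive_loss(X, y, w, b, eps=0.2)
```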
Discussion Questions
- Sketch each type of loss function (1-norm, 2-norm, ∞-norm, ε-insensitive) and discuss the advantages and disadvantages of each.
- What would be the ramifications of using these same functions for regularization?