Smoothing Spline ANOVA (users.stat.umn.edu/~helwig/notes/ssanova-Notes.pdf)


Smoothing Spline ANOVA

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)

Updated 04-Jan-2017

Nathaniel E. Helwig (U of Minnesota) Smoothing Spline ANOVA Updated 04-Jan-2017 : Slide 1


Copyright

Copyright © 2017 by Nathaniel E. Helwig


Outline of Notes

1) Introduction: parametric regression; nonparametric regression; smoothing splines

2) Background Theory: averaging operators; Hilbert spaces; reproducing kernels

3) Estimation & Inference: penalized least squares; smoothing parameter selection; Bayesian confidence intervals

4) SSANOVA in Practice: one-way SSANOVA; two-way SSANOVA (additive); two-way SSANOVA (interactive)

For a thorough treatment see:

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer-Verlag.


Introduction


Introduction Parametric Regression

Parametric Regression Model: Scalar Form

The multiple linear regression model has the form

yi = ∑_{j=1}^p bj xij + ei

for i ∈ {1, . . . , n}, where
yi ∈ R is the real-valued response for the i-th observation
bj ∈ R is the j-th predictor's regression slope
xij ∈ R is the j-th predictor for the i-th observation
ei ∼iid N(0, σ²) is Gaussian measurement error

Implies that (yi | xi1, . . . , xip) ∼ind N(∑_{j=1}^p bj xij, σ²)


Parametric Regression Model: Matrix Form

The multiple linear regression model has the form

y = Xb + e

where
y = (y1, . . . , yn)′ ∈ Rⁿ is the n × 1 response vector
X = [x1, . . . , xp] ∈ R^{n×p} is the n × p design matrix, where xj = (x1j, . . . , xnj)′ ∈ Rⁿ is the j-th predictor vector (n × 1)
b = (b1, . . . , bp)′ ∈ Rᵖ is the p × 1 vector of coefficients
e = (e1, . . . , en)′ ∈ Rⁿ is the n × 1 error vector

Implies that (y|x) ∼ N(Xb, σ²I_n)


Ordinary Least Squares Solution

The ordinary least squares (OLS) problem is

min_{b∈Rᵖ} (1/n) ‖y − Xb‖²  ←→  min_{b∈Rᵖ} (1/n) ∑_{i=1}^n (yi − ŷi)²

where ‖·‖ denotes the Euclidean norm and ŷi = ∑_{j=1}^p bj xij.

The OLS solution has the form

b̂ = (X′X)⁻¹X′y

and the fitted values corresponding to b̂ are given by

ŷ = Xb̂ = Hy

where H = X(X′X)⁻¹X′ is the hat matrix.
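These formulas can be checked numerically. The sketch below (Python with NumPy; an illustration added to these notes, not part of the original slides) verifies that the normal-equations solution matches the fitted values Hy, and that the hat matrix is symmetric and idempotent:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))              # n x p design matrix
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + rng.normal(scale=0.1, size=n)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix X(X'X)^{-1}X'
y_hat = H @ y

assert np.allclose(H, H.T)               # H is symmetric
assert np.allclose(H @ H, H)             # ...and idempotent
assert np.allclose(X @ b_hat, y_hat)     # fitted values agree

e = y - y_hat                            # residuals
sigma2_hat = e @ e / (n - p)             # MSE estimate of sigma^2
print(b_hat, sigma2_hat)
```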


Summary of Results

Using the model assumption (y|x) ∼ N(Xb, σ²I_n), we have

b̂ ∼ N(b, σ²(X′X)⁻¹)

ŷ ∼ N(Xb, σ²H)

ê ∼ N(0, σ²(I_n − H))

where ê = y − ŷ is the residual vector.

Typically σ² is unknown, so we use the MSE σ̂² = (1/(n − p)) ∑_{i=1}^n êi².


Introduction Nonparametric Regression

Nonparametric Regression Model

The Gaussian nonparametric regression model has the form

yi = η(xi) + ei

for i ∈ {1, . . . , n}, where
yi ∈ R is the real-valued response for the i-th observation
xi ∈ Rᵖ is the predictor vector for the i-th observation
η : Rᵖ → R is an unknown smooth function
ei ∼iid N(0, σ²) is Gaussian measurement error

Implies that (yi | xi1, . . . , xip) ∼ind N(η(xi), σ²)


Additive versus Interactive Models

Suppose that xi = (xi1, xi2) with xi1 ∈ X1 and xi2 ∈ X2.

We could fit one of two possible models:

Additive : η(xi) = η0 + η1(xi1) + η2(xi2)

Interaction : η(xi) = η0 + η1(xi1) + η2(xi2) + η12(xi1, xi2)

where
η0 is a constant function
η1 is the main effect of the first predictor
η2 is the main effect of the second predictor
η12 is the interaction effect


Example 1: Continuous and Nominal Covariates

xi = (xi1, xi2) with xi1 ∈ [0,1] and xi2 ∈ {a,b}.

[Figure: two panels ("Additive" and "Interaction") plotting y against x1 ∈ [0, 1], with separate curves for x2 = a (solid) and x2 = b (dashed).]


Example 1: R Code

addfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2
  funval
}
intfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2 + sin(4*pi*x1[idx])
  funval
}

dev.new(width=12, height=6, noRStudioGD=TRUE)
par(mfrow=c(1,2))
x1 = seq(0, 1, length=200)
plot(x1, addfun(x1, rep("a",200)), type="l", ylim=c(-2,4), main="Additive",
     ylab="y", cex.axis=1.25, cex.lab=1.5, cex.main=3)
lines(x1, addfun(x1, rep("b",200)), lty=2)
legend("bottomleft", legend=c(expression(x[2]*" = "*a), expression(x[2]*" = "*b)),
       lty=1:2, bty="n", cex=1.5)
plot(x1, intfun(x1, rep("a",200)), type="l", ylim=c(-2,4), main="Interaction",
     ylab="y", cex.axis=1.25, cex.lab=1.5, cex.main=3)
lines(x1, intfun(x1, rep("b",200)), lty=2)
legend("bottomleft", legend=c(expression(x[2]*" = "*a), expression(x[2]*" = "*b)),
       lty=1:2, bty="n", cex=1.5)


Example 2: Two Continuous Covariates

xi = (xi1, xi2) with xi1, xi2 ∈ [0,1].

[Figure: two image plots ("Additive" and "Interaction") of the function surface over x1, x2 ∈ [0, 1].]


Example 2: R Code

addfun = function(x1, x2){
  sin(2*pi*x1) + cos(4*pi*x2*(1-x2))
}
intfun = function(x1, x2){
  sin(2*pi*x1) + cos(4*pi*x2*(1-x2)) + 2*sin(pi*(x1-x2))
}

xs = seq(0, 1, length=50)
xg = expand.grid(xs, xs)
dev.new(width=12, height=6, noRStudioGD=TRUE)
par(mfrow=c(1,2))
zmat = matrix(addfun(xg[,1], xg[,2]), 50, 50)
image(xs, xs, zmat, xlab="x1", ylab="x2", main="Additive",
      cex.axis=1.25, cex.lab=1.5, cex.main=3)
zmat = matrix(intfun(xg[,1], xg[,2]), 50, 50)
image(xs, xs, zmat, xlab="x1", ylab="x2", main="Interaction",
      cex.axis=1.25, cex.lab=1.5, cex.main=3)


Introduction Smoothing Splines

Smoothing Splines on {1, . . . , K}

Suppose xi ∈ {1, . . . , K} and note that ηf is a vector of length K.
f = (f1, . . . , fK)′ ∈ Rᴷ is the vector corresponding to ηf:
ηf(1) = f1, ηf(2) = f2, . . . , ηf(K) = fK
Let η̄f = ∑_{x=1}^K ηf(x)/K denote the mean.

A nominal smoothing spline is the ηλ ∈ Rᴷ that minimizes

(1/n) ∑_{i=1}^n (yi − ηf(xi))² + λ J(ηf)

where λ ≥ 0 is the smoothing parameter and J(ηf) is a roughness penalty:
J(ηf) = ∑_{x=1}^K (ηf(x) − η̄f)² to shrink towards a constant
J(ηf) = ∑_{x=1}^K ηf(x)² to shrink towards zero
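A minimal numeric sketch of the shrink-towards-constant version (Python with NumPy; the function name and setup are illustrative assumptions, not from the original notes): with an indicator design matrix Z and penalty matrix P = I − (1/K)11′, the minimizer solves (Z′Z/n + λP)f = Z′y/n.

```python
import numpy as np

def nominal_spline(y, x, K, lam):
    """Penalized LS on {1,...,K}: minimize
    (1/n) sum_i (y_i - f[x_i])^2 + lam * sum_k (f_k - mean(f))^2."""
    n = len(y)
    Z = np.zeros((n, K))
    Z[np.arange(n), x - 1] = 1.0            # indicator (one-hot) design matrix
    P = np.eye(K) - np.ones((K, K)) / K     # penalizes deviations from the mean
    return np.linalg.solve(Z.T @ Z / n + lam * P, Z.T @ y / n)

rng = np.random.default_rng(1)
x = rng.integers(1, 4, size=90)             # K = 3 nominal levels
y = np.array([0.0, 2.0, 4.0])[x - 1] + rng.normal(scale=0.5, size=90)

f0 = nominal_spline(y, x, 3, lam=0.0)       # lam = 0: ordinary group means
fbig = nominal_spline(y, x, 3, lam=1e6)     # lam large: shrunk to a constant
print(f0, fbig)
```

At λ = 0 the estimate reduces to the per-level sample means; as λ grows the levels are pulled together toward the grand mean, exactly the trade-off the penalty describes.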


Polynomial Smoothing Splines on [0,1]

Suppose xi ∈ [0, 1] and let C^(m)[0,1] = {η : η^(m) ∈ L₂[0,1]}.
η^(m) = dᵐη/dxᵐ denotes the m-th derivative of η
L₂[0,1] = {η : ∫₀¹ η² dx < ∞}

A polynomial smoothing spline is the ηλ ∈ C^(m)[0,1] that minimizes

(1/n) ∑_{i=1}^n (yi − η(xi))² + λ ∫₀¹ (η^(m))² dx

where λ ≥ 0 is the smoothing parameter and m is the spline order.
Related to the natural spline in the numerical analysis literature.


Cubic Smoothing Splines

Setting m = 2 results in the classic cubic smoothing spline.
x1 < x2 < · · · < xq are the “knots” (distinct xi values)
ηλ is a piecewise cubic polynomial, and is linear beyond x1 and xq
ηλ is twice continuously differentiable; its third derivative jumps at the knots
As λ → 0, ηλ approaches the minimum curvature interpolant
As λ → ∞, ηλ approaches the simple linear regression fit

One can also view the cubic smoothing spline as the solution to

min_η (1/n) ∑_{i=1}^n (yi − η(xi))²  subject to  ∫₀¹ (η″)² dx ≤ ρ

for some ρ ≥ 0, which is least squares with a soft constraint on the roughness.
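A discrete analogue conveys the same trade-off: replace ∫(η″)² with the sum of squared second differences of the fitted vector (a Whittaker-style smoother; this Python sketch is an illustrative stand-in, not the exact cubic smoothing spline):

```python
import numpy as np

def discrete_smoother(y, lam):
    """Minimize ||y - f||^2 + lam * ||D2 f||^2, where D2 f is the vector of
    second differences of f (a rough analogue of the integral penalty)."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)         # (n-2) x n difference matrix
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 41)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=41)

f_interp = discrete_smoother(y, lam=0.0)   # lam -> 0: interpolates the data
f_linear = discrete_smoother(y, lam=1e8)   # lam -> inf: (nearly) linear fit
print(np.max(np.abs(np.diff(f_linear, n=2))))
```

The two limits mirror the slide: λ → 0 returns the data exactly, while a huge λ forces the second differences toward zero, i.e. a straight-line fit.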


Example with R’s spline Function

yi = sin(2πxi) + ei where xi = i/20 for i ∈ {0, 1, 2, . . . , 20} and
No Noise: ei = 0 ∀ i
Some Noise: ei ∼iid N(0, 0.15²)
More Noise: ei ∼iid N(0, 0.25²)

[Figure: three panels ("No Noise", "Some Noise", "More Noise") plotting y against x ∈ [0, 1], showing the data points, the true function sin(2πx) (dashed), and the interpolating natural spline fit (solid).]


spline Function (R code)

dev.new(width=12, height=4, noRStudioGD=TRUE)
par(mfrow=c(1,3))
x = seq(0, 1, length=21)
y = sin(2*pi*x)
mysp = spline(x, y, method="natural")
plot(x, y, main="No Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.15)
mysp = spline(x, y, method="natural")
plot(x, y, main="Some Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.25)
mysp = spline(x, y, method="natural")
plot(x, y, main="More Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)


Same Example with R’s smooth.spline Function

yi = sin(2πxi) + ei where xi = i/20 for i ∈ {0, 1, 2, . . . , 20} and
No Noise: ei = 0 ∀ i
Some Noise: ei ∼iid N(0, 0.15²)
More Noise: ei ∼iid N(0, 0.25²)

[Figure: three panels ("No Noise", "Some Noise", "More Noise") plotting y against x ∈ [0, 1], showing the data points, the true function sin(2πx) (dashed), and the smooth.spline fit (solid).]


smooth.spline Function (R code)

dev.new(width=12, height=4, noRStudioGD=TRUE)
par(mfrow=c(1,3))
set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x)
mysp = smooth.spline(x, y)
plot(x, y, main="No Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.15)
mysp = smooth.spline(x, y)
plot(x, y, main="Some Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.25)
mysp = smooth.spline(x, y)
plot(x, y, main="More Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)


Background Theory


Background Theory Averaging Operators

One-Way ANOVA Decomposition

Consider the standard one-way ANOVA model

yij = µj + eij

for i ∈ {1, . . . ,nj} and j ∈ {1, . . . ,K}.

Typically, we want to decompose the treatment effects such as

µj = µ+ αj

where µ is the overall mean and αj is the treatment effect, subject to either
α1 = 0 if the first group is the control, or
∑_{j=1}^K αj = 0 if using effect coding


One-Way ANOVA and Averaging Operators

Consider the standard one-way ANOVA model using a smoothing spline on xi ∈ {1, . . . , K}

yi = η(xi) + ei

for i ∈ {1, . . . , n} where n = ∑_{j=1}^K nj.

The ANOVA decomposition µj = µ+ αj can be written as

η = Aη + (I − A)η = η0 + ηc

where A “averages out” η to return a constant η0:
α1 = 0 corresponds to Aη = η(1)
∑_{j=1}^K αj = 0 corresponds to Aη = ∑_{x=1}^K η(x)/K


Averaging Operators on Continuous Domain

For a continuous domain X = [a,b] we can decompose η such as

η = Aη + (I − A)η = η0 + ηc

where A “averages out” η to return a constant η0.
We need an averaging operator A defined such that A(Aη) = Aη = η0
We need an identity operator I defined such that Iη = η

Note that η0 is overall constant, and ηc is treatment (contrast) effect.

For a function defined on X = [0, 1], we could define
Aη = η(0), or
Aη = ∫₀¹ η(z) dz
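A quick numeric check of the integral averaging operator (a Python sketch added for illustration): approximating Aη = ∫₀¹ η(z) dz on a grid, the decomposition η = η0 + ηc satisfies A(Aη) = Aη and Aηc = 0.

```python
import numpy as np

def trap(f, z):
    """Trapezoidal approximation of the integral of f over the grid z."""
    return np.sum((f[1:] + f[:-1]) * np.diff(z) / 2)

z = np.linspace(0, 1, 10001)
eta = np.sin(2 * np.pi * z) + z ** 2     # an arbitrary smooth function

eta0 = trap(eta, z)                      # A eta = ∫ eta dz (the constant part)
eta_c = eta - eta0                       # (I - A) eta (the contrast part)

# A(A eta) = A eta (averaging a constant returns it), and A eta_c = 0
assert np.isclose(trap(np.full_like(z, eta0), z), eta0)
assert np.isclose(trap(eta_c, z), 0.0, atol=1e-8)
print(eta0)    # close to 1/3, since ∫ sin(2πz) dz = 0 and ∫ z^2 dz = 1/3
```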


Two-Way ANOVA Decomposition

Consider the standard two-way ANOVA model

yijk = µjk + eijk

for i ∈ {1, . . . ,njk}, j ∈ {1, . . . ,a}, and k ∈ {1, . . . ,b}.

Typically, we want to decompose the treatment effects such as

µjk = µ+ αj + βk + γjk

where µ is the overall mean and
αj is the main effect of Factor A such that ∑_{j=1}^a αj = 0
βk is the main effect of Factor B such that ∑_{k=1}^b βk = 0
γjk is the interaction effect such that ∑_{j=1}^a γjk = ∑_{k=1}^b γjk = 0 ∀ j, k


Two-Way ANOVA and Averaging Operators

Consider the standard two-way ANOVA model using a smoothing spline on xi = (xi1, xi2) ∈ X1 × X2 = {1, . . . , a} × {1, . . . , b}

yi = η(xi) + ei

for i ∈ {1, . . . , n} where n = ∑_{j=1}^a ∑_{k=1}^b njk.

The ANOVA decomposition µjk = µ+ αj + βk + γjk can be written as

η = [A_X1 + (I − A_X1)][A_X2 + (I − A_X2)]η
  = A_X1 A_X2 η + (I − A_X1)A_X2 η + A_X1(I − A_X2)η + (I − A_X1)(I − A_X2)η
  =     η0      +        η1        +        η2        +          η12

where A_X1 and A_X2 are averaging operators such that
A_X1(A_X1 η) = A_X1 η is constant for all xi1 ∈ X1
A_X2(A_X2 η) = A_X2 η is constant for all xi2 ∈ X2
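On a finite grid, these averaging operators are just row and column means (the effect-coding choice), and the four-term decomposition can be verified directly (a Python sketch added for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 4, 5
eta = rng.normal(size=(a, b))     # eta evaluated on the grid {1..a} x {1..b}

# Averaging operators as row/column means (the effect-coding choice)
A1 = lambda f: np.broadcast_to(f.mean(axis=0, keepdims=True), f.shape)  # A_X1
A2 = lambda f: np.broadcast_to(f.mean(axis=1, keepdims=True), f.shape)  # A_X2

eta0  = A1(A2(eta))                               # constant term
eta1  = A2(eta) - A1(A2(eta))                     # (I - A_X1) A_X2 eta
eta2  = A1(eta) - A1(A2(eta))                     # A_X1 (I - A_X2) eta
eta12 = eta - A1(eta) - A2(eta) + A1(A2(eta))     # (I - A_X1)(I - A_X2) eta

# The four pieces reconstruct eta and satisfy sum-to-zero constraints
assert np.allclose(eta0 + eta1 + eta2 + eta12, eta)
assert np.allclose(eta1.mean(axis=0), 0)          # alpha_j sum to zero
assert np.allclose(eta2.mean(axis=1), 0)          # beta_k sum to zero
assert np.allclose(eta12.mean(axis=0), 0)
assert np.allclose(eta12.mean(axis=1), 0)
print(eta0[0, 0])
```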


Background Theory Hilbert Spaces

Linear Spaces and Functionals

Suppose that η, φ ∈ L where the set L satisfies:
η + φ ∈ L
aη ∈ L for any scalar a

If these two conditions are met, we say that L is a linear space.

A functional L in L operates on η ∈ L and returns a real number.
Linear functional: L(η + φ) = Lη + Lφ and L(aη) = aLη
Bilinear functional: a functional of two variables that is linear in each variable:
- J(aη + bφ, ψ) = aJ(η, ψ) + bJ(φ, ψ)
- J(η, aφ + bψ) = aJ(η, φ) + bJ(η, ψ)
Symmetry: J(η, φ) = J(φ, η) for all η, φ ∈ L
Positive definite: J(η) = J(η, η) > 0 for all nonzero η ∈ L
Non-negative definite: J(η) = J(η, η) ≥ 0 for all η ∈ L
Quadratic: bilinear, symmetric, and non-negative definite


Inner Products and Norms

In a linear space L, an inner-product is a positive definite bilinear form. We will use the notation 〈·, ·〉 to denote an inner-product.

The inner-product defines a norm in L, which provides a metric to measure the distance between two objects η, φ ∈ L.

We will use the notation ‖η‖ = √〈η, η〉 to denote the norm of η.

We will use the notation D[η, φ] = ‖η − φ‖ to denote the distance between η and φ in L.

In any inner-product space L we have the following two rules:
Cauchy-Schwarz: |〈η, φ〉| ≤ ‖η‖ ‖φ‖
Triangle: ‖η + φ‖ ≤ ‖η‖ + ‖φ‖


Null Spaces, Semi-Inner Products, and Semi-Norms

The null space of a non-negative definite bilinear form J in a linear space L is defined as N_J = {η : J(η, η) = 0, η ∈ L}, and note that
N_J = {0} if J is positive definite
N_J contains 0 and nonzero elements otherwise

A non-negative definite bilinear form J in a linear space L defines a semi-inner-product in L.
It induces a semi-norm √J(η) = √J(η, η) in L.
Similar to a norm, but J(η) = 0 does not imply η = 0.


Hilbert Spaces and Projections

A Hilbert space is a complete inner-product linear space.
A sequence with lim_{m,n→∞} ‖ηm − ηn‖ = 0 is a Cauchy sequence.
A linear space L is complete if every Cauchy sequence in L converges to some element in L.

Any closed linear subspace of H (denoted G ⊂ H) is a Hilbert space.
The distance between η ∈ H and G is D[η, G] = inf_{φ∈G} ‖η − φ‖.
There exists ηG ∈ G such that D[η, G] = ‖η − ηG‖; this ηG is the unique projection of η onto G.


Tensor Sum Decompositions

Given η ∈ H and G ⊂ H, we have that 〈η − ηG, φ〉 = 0 for all φ ∈ G.
Gᶜ = {η : 〈η, φ〉 = 0 ∀ φ ∈ G} is the orthogonal complement of G.
Tensor sum decomposition: H = G ⊕ Gᶜ and η = ηG + ηGᶜ.

If Hn and Hc are Hilbert spaces with inner products 〈·, ·〉n and 〈·, ·〉c, and if Hn ∩ Hc = {0}, then H = Hn ⊕ Hc is a Hilbert space with inner-product 〈·, ·〉 = 〈·, ·〉n + 〈·, ·〉c.

Consider a null space N_J corresponding to a semi-inner-product J in the space H, and define a second bilinear form J̃(·, ·) such that
1. J̃(·, ·) defines a full inner product in the space N_J
2. for every η ∈ H there exists φ ∈ N_J such that J̃(η − φ) = 0
Then (J + J̃)(η, φ) defines a full inner product in H.


Hilbert Space Example: Rᴷ

Note that a Hilbert space is a generalization of the Euclidean space Rᴷ.

For any vectors x, y ∈ Rᴷ, the inner product is 〈x, y〉 = x′y = ∑_{i=1}^K xi yi.

〈x, y〉 = 〈x, y〉n + 〈x, y〉c = x′[ (1/K)1_K 1′_K + (I_K − (1/K)1_K 1′_K) ]y

Hn = {η : η(1) = · · · = η(K)} and Hc = {η : ∑_{x=1}^K η(x) = 0}

This corresponds to the classic one-way ANOVA decomposition

µj = µ + αj

with the constraint ∑_j αj = 0.


Background Theory Reproducing Kernels

Riesz Representation Theorem

For every φ in a Hilbert space H, the functional Lφη = 〈φ, η〉 defines a continuous linear functional Lφ.
L is continuous if lim_{n→∞} Lηn = Lη whenever lim_{n→∞} ηn = η.

Every continuous linear functional L in H has a representation Lη = 〈φL, η〉 for some φL ∈ H, which is called the representer of L.

Theorem. For every continuous linear functional L in a Hilbert space H, there exists a unique φL ∈ H such that Lη = 〈φL, η〉 for all η ∈ H.


Reproducing Kernel Hilbert Spaces

To estimate an SSANOVA, we need to evaluate η for different x ∈ X.
We need continuity of the evaluation functional [x]η = η(x).

Consider a Hilbert space H of functions on the domain X.
If the evaluation functional [x]η = η(x) is continuous in H for all x ∈ X, then we say that H is a reproducing kernel Hilbert space (RKHS).
By the Riesz Representation Theorem, there exists ρx ∈ H, which is the representer of the evaluation functional [x]η = η(x).

The symmetric bivariate function ρ(x, y) = ρx(y) = 〈ρx, ρy〉 has the reproducing property 〈ρ(x, ·), η(·)〉 = η(x).

Consequently, ρ is called the reproducing kernel of the space H.


Examples of Reproducing Kernel Hilbert Spaces

Consider the Euclidean space Rᴷ, which is an RKHS:
The inner product is defined as 〈x, y〉 = ∑_{i=1}^K xi yi
The RK is defined as ρ(x, y) = I{x=y}, the indicator function

Consider the space L₂[0,1] = {η : ∫₀¹ η² dx < ∞}:
Elements of L₂[0,1] are defined via equivalence classes (not via individual functions)
NOT an RKHS because the evaluation functional is not well-defined

Consider the space C^(m)[0,1] = {η : η^(m) ∈ L₂[0,1]}:
Elements of C^(m)[0,1] are defined via individual functions
The evaluation functional is continuous, so we have an RKHS


Tensor Sum Decompositions of RKHS

Given the tensor sum decomposition H = Hn ⊕Hc, we have

ρ = ρn + ρc

where ρ is the RK of H, ρn is the RK of Hn, and ρc is the RK of Hc.

Furthermore, if ρ is the RK of H and ρ = ρn + ρc, where
ρn, ρc ∈ H are non-negative definite for all x ∈ X, and
〈ρn(x, ·), ρc(y, ·)〉 = 0 for all x, y ∈ X,
then the spaces Hn and Hc form a tensor sum decomposition of H.


Reproducing Kernel for Nominal Smoothing Splines

Suppose that xi ∈ X = {1, . . . , K} and η ∈ H = Rᴷ.

For any elements η, φ ∈ H, we have:
〈η, φ〉 = η′φ = ∑_{x=1}^K η(x)φ(x)
ρ(x, y) = I{x=y}, where I{·} is the indicator function

Using the averaging operator Aη = ∑_{x=1}^K η(x)/K:
〈η, φ〉 = 〈η, φ〉n + 〈η, φ〉c = η′[ (1/K)1_K 1′_K + (I_K − (1/K)1_K 1′_K) ]φ
ρ(x, y) = ρn(x, y) + ρc(x, y) = 1/K + (I{x=y} − 1/K)
Hn = {η : η(1) = · · · = η(K)} and Hc = {η : ∑_{x=1}^K η(x) = 0}
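The decomposition ρ = ρn + ρc can be verified numerically (a Python sketch added for illustration): as K × K matrices, ρn = (1/K)11′ reproduces constant functions and ρc = I − (1/K)11′ reproduces sum-to-zero functions.

```python
import numpy as np

K = 5
rho_n = np.ones((K, K)) / K        # rho_n(x, y) = 1/K
rho_c = np.eye(K) - rho_n          # rho_c(x, y) = I{x=y} - 1/K
assert np.allclose(rho_n + rho_c, np.eye(K))   # rho(x, y) = I{x=y}

eta_n = np.full(K, 2.5)                          # constant: lies in H_n
eta_c = np.array([1.0, -2.0, 0.5, 3.0, -2.5])    # sums to zero: lies in H_c
assert np.isclose(eta_c.sum(), 0.0)

# Each marginal kernel reproduces its own subspace under <eta, phi> = eta'phi
assert np.allclose(rho_n @ eta_n, eta_n)   # <rho_n(x, .), eta_n> = eta_n(x)
assert np.allclose(rho_c @ eta_c, eta_c)   # <rho_c(x, .), eta_c> = eta_c(x)

# ...and annihilates the complementary subspace
assert np.allclose(rho_n @ eta_c, 0.0)
assert np.allclose(rho_c @ eta_n, 0.0)
```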


Reproducing Kernel for Polynomial Smoothing Splines

Suppose that xi ∈ X = [0, 1] and η ∈ H = C^(m)[0,1].

Using the averaging operator Aη = ∫₀¹ η dx:

〈η, φ〉 = 〈η, φ〉n + 〈η, φ〉c = ∑_{ν=0}^{m−1} (∫₀¹ η^(ν) dx)(∫₀¹ φ^(ν) dx) + ∫₀¹ η^(m) φ^(m) dx

ρ(x, y) = ρn(x, y) + ρc(x, y) = ∑_{ν=0}^{m−1} kν(x)kν(y) + (−1)^{m−1} k_{2m}(|x − y|)

where kν(x) is a scaled Bernoulli polynomial.

Hn = {η : η^(m) = 0} and Hc = {η : ∫₀¹ η^(ν) dx = 0 for ν = 0, . . . , m − 1, and η^(m) ∈ L₂[0,1]}

Using the averaging operator Aη = η(0):

〈η, φ〉 = 〈η, φ〉n + 〈η, φ〉c = ∑_{ν=0}^{m−1} η^(ν)(0) φ^(ν)(0) + ∫₀¹ η^(m) φ^(m) dx

ρ(x, y) = ρn(x, y) + ρc(x, y) = ∑_{ν=0}^{m−1} (xᵛ/ν!)(yᵛ/ν!) + ∫₀¹ [(x − u)₊^{m−1}/(m − 1)!][(y − u)₊^{m−1}/(m − 1)!] du

Hn = {η : η^(m) = 0} and Hc = {η : η^(ν)(0) = 0 for ν = 0, . . . , m − 1, and η^(m) ∈ L₂[0,1]}


Tensor Product RKHS

Suppose that xi ∈ X where X = X1 × · · · × Xp is a product domain.
Suppose H_Xj is an RKHS of functions with RK ρ_Xj for all xj ∈ Xj.
Note that the marginal RKs have the form ρ_Xj = ρ_nj + ρ_cj.

We can define ρ_X = ∏_{j=1}^p ρ_Xj = ∏_{j=1}^p (ρ_nj + ρ_cj):
ρ_X is non-negative definite for all x ∈ X
ρ_X is the RK of the tensor product RKHS H = H_X1 ⊗ · · · ⊗ H_Xp

We can form functional spaces for any number of covariates.
We can constrain and/or remove subspaces to fit different models.


Need for Additional Smoothing Parameters

Given a tensor product RKHS H = H_X1 ⊗ · · · ⊗ H_Xp we have that:
H = ⊗_{j=1}^p (H_nj ⊕ H_cj) = ⊕_{k=1}^s H_k is a tensor sum decomposition
Each subspace H_k has inner product 〈·, ·〉k and RK ρk
The inner-products and RKs have different metrics

We can introduce additional smoothing parameters into the inner product:

〈·, ·〉 = ∑_{k=1}^s θk⁻¹ 〈·, ·〉k

which corresponds to the tensor product RK

ρ = ∑_{k=1}^s θk ρk


Estimation and Inference


Estimation and Inference Penalized Least Squares

Tensor Product Smoothing Spline

Given xi ∈ X = X1 × · · · × Xp, a tensor product smoothing spline is the ηλ ∈ H = H_X1 ⊗ · · · ⊗ H_Xp that minimizes

(1/n) ∑_{i=1}^n (yi − η(xi))² + λ J(η)

where
λ ≥ 0 is the overall (global) smoothing parameter
J is a quadratic functional quantifying the roughness of η
Additional smoothing parameters θ = (θ1, . . . , θs) exist within J


Estimation and Inference Penalized Least Squares

Representation of η

Let H = Hn ⊕ Hc denote the tensor sum decomposition of the tensor product RKHS H = ⊗_{j=1}^{p} HXj

Note that H has RK ρ = ρn + ρc where ρc = ∑_{k=1}^{s} θk ρk

Given fixed smoothing parameters θ, the η ∈ H that minimizes the penalized least-squares functional can be written as

η(x) = ∑_{v=1}^{m} dv φv(x) + ∑_{i=1}^{n} ci ρc(xi, x)    (1)

where {φv}_{v=1}^{m} is a set of known functions spanning Hn, ρc is the reproducing kernel (RK) of Hc, and d ≡ {dv}_{m×1} and c ≡ {ci}_{n×1} are the (unknown) basis function coefficient vectors


Estimation and Inference Penalized Least Squares

Penalty of η

Given the tensor sum decomposition H = Hn ⊕ Hc for some tensor product RKHS H = ⊗_{j=1}^{p} HXj, we define the penalty functional

J(η) = 〈η, η〉c

which is a semi-inner-product with null space Hn.

Using the representation of η(x) on the previous slide, we have

〈η, η〉c = 〈∑_{v=1}^{m} dv φv(x) + ∑_{i=1}^{n} ci ρc(xi, x), ∑_{v=1}^{m} dv φv(x) + ∑_{i=1}^{n} ci ρc(xi, x)〉c
         = 〈∑_{i=1}^{n} ci ρc(xi, x), ∑_{i=1}^{n} ci ρc(xi, x)〉c
         = ∑_{i=1}^{n} ∑_{j=1}^{n} ci cj 〈ρc(xi, x), ρc(xj, x)〉c
         = ∑_{i=1}^{n} ∑_{j=1}^{n} ci cj ρc(xi, xj)


Estimation and Inference Penalized Least Squares

Penalized Least Squares Problem

Using {x*_u}_{u=1}^{q} ⊂ {xi}_{i=1}^{n} as knots, the penalized least-squares functional can be approximated as

‖y − Kd − Jθ c‖² + nλ c′ Qθ c

where
y = (y1, . . . , yn)′ is the response vector
K = {φv(xi)}_{n×m} is the null space basis function matrix
Jθ = {ρc(xi, x*_u)}_{n×q} is the contrast space basis function matrix
  Note: Jθ = ∑_{k=1}^{s} θk Jk where Jk = {ρk(xi, x*_u)}_{n×q}
Qθ = {ρc(x*_t, x*_u)}_{q×q} is the penalty matrix
  Note: Qθ = ∑_{k=1}^{s} θk Qk where Qk = {ρk(x*_t, x*_u)}_{q×q}
d = (d1, . . . , dm)′ and c = (c1, . . . , cq)′ are the unknown coefficients


Estimation and Inference Penalized Least Squares

Coefficients and Smoothing Matrix

The coefficients minimizing the penalized least-squares function are

( d )   ( K′K    K′Jθ          )† ( K′  )
( c ) = ( Jθ′K   Jθ′Jθ + nλQθ  )  ( Jθ′ ) y

where (·)† denotes the Moore–Penrose pseudoinverse.

The fitted values are given by ŷ = Kd + Jθc = Sλ y where

Sλ = ( K  Jθ ) ( K′K    K′Jθ          )† ( K′  )
               ( Jθ′K   Jθ′Jθ + nλQθ  )  ( Jθ′ )

is the smoothing matrix, which depends on λ = (λ/θ1, . . . , λ/θs).
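The displays above can be checked numerically. The sketch below (Python/NumPy rather than the R used elsewhere in these notes, with a generic stand-in kernel min(x, z) in place of ρc and arbitrarily chosen knots) builds K, Jθ, and Qθ, solves for (d, c) via the pseudoinverse, and verifies that the fitted values equal Sλ y:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q, lam = 100, 2, 10, 1e-3
x = np.linspace(0, 1, n)
y = 2 + x + np.sin(2*np.pi*x) + rng.standard_normal(n)
knots = np.linspace(0.1, 1, q)

# Stand-in bases (a generic kernel min(a, b), not the notes' spline RK):
# K spans the null space {1, x}; J and Q evaluate the kernel at the knots.
K = np.column_stack([np.ones(n), x])          # n x m
J = np.minimum.outer(x, knots)                # n x q
Q = np.minimum.outer(knots, knots)            # q x q penalty matrix

# Pseudoinverse solution for the stacked coefficients (d, c)
A = np.block([[K.T @ K, K.T @ J],
              [J.T @ K, J.T @ J + n*lam*Q]])
B = np.column_stack([K, J])                   # n x (m+q)
dc = np.linalg.pinv(A) @ (B.T @ y)
d, c = dc[:m], dc[m:]

# Smoothing matrix: the fitted values are a linear function of y
S = B @ np.linalg.pinv(A) @ B.T
print(np.allclose(S @ y, K @ d + J @ c))  # True
```

The same algebra applies for any choice of bases; only the construction of K, Jθ, and Qθ changes when the actual spline RKs are used.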


Estimation and Inference Smoothing Parameter Selection

Smoothing Parameter Goldilocks Phenomenon

The selection of λ is the crucial step when fitting an SSANOVA model!

If λk ≡ λ/θk is too large, the penalty corresponding to Hk will be too severe, making it difficult to estimate ηk.
  Oversmooths the k-th contrast space

If λk ≡ λ/θk is too small, the penalty corresponding to Hk will be too lenient, making it difficult to estimate ηk (assuming noisy data).
  Undersmooths the k-th contrast space


Estimation and Inference Smoothing Parameter Selection

Cross-Validation

If σ² is unknown, a reasonable loss function for selecting λ is the cross-validated loss function

CV(λ | y, X, w) = (1/n) ∑_{i=1}^{n} wi (yi − ηλ^[i](xi))²

where wi > 0 is some weight, and ηλ^[i] is the function φ ∈ H that minimizes the delete-the-i-th-observation functional:

(1/n) ∑_{j≠i} (yj − φ(xj))² + λ J(φ)


Estimation and Inference Smoothing Parameter Selection

Cross-Validation (continued)

The form of the CV loss function might suggest that it is necessary to fit n different models (to obtain ηλ^[i] for i ∈ {1, . . . , n}).

However, the CV function can be rewritten as

CV(λ | y, X, w) = (1/n) ∑_{i=1}^{n} wi (yi − ηλ(xi))² / (1 − sii(λ))²

where sii(λ) is the i-th diagonal element of Sλ, which implies that the CV function can be minimized using the results of the full model.
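This leaving-out-one shortcut holds for linear smoothers obtained from a penalized least-squares fit with fixed smoothing parameters. A small check (Python/NumPy, using ridge regression as a stand-in linear smoother, where the identity is exact) compares direct leave-one-out refits against the shortcut formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 40, 5, 0.5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Ridge is a linear smoother: yhat = S y with S = X (X'X + lam I)^{-1} X'
S = X @ np.linalg.solve(X.T @ X + lam*np.eye(p), X.T)
resid = y - S @ y

# Direct leave-one-out refits: drop observation i, refit, predict y_i
loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta = np.linalg.solve(X[keep].T @ X[keep] + lam*np.eye(p),
                           X[keep].T @ y[keep])
    loo[i] = y[i] - X[i] @ beta

# Shortcut: LOO residual = ordinary residual / (1 - s_ii)
print(np.allclose(loo, resid / (1 - np.diag(S))))  # True
```

So one fit of the full model yields all n leave-one-out residuals, which is what makes CV cheap to evaluate over a grid of λ values.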


Estimation and Inference Smoothing Parameter Selection

Generalized Cross-Validation

Defining wi ≡ (1 − sii(λ))² / [n⁻¹ tr(In − Sλ)]² replaces each sii(λ) with its average value, producing the generalized cross-validation (GCV) criterion of Craven and Wahba (1979):

GCV(λ | y, X) = (1/n) ∑_{i=1}^{n} (yi − ηλ(xi))² / [n⁻¹ tr(In − Sλ)]²
             = (1/n) ‖(In − Sλ)y‖² / [1 − tr(Sλ)/n]²    (2)

The λ that minimizes the GCV score produces good estimates of η.
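The GCV criterion only needs the smoothing matrix of a linear smoother, so it can be evaluated on a grid of λ values from a single pass. A sketch (Python/NumPy, with a penalized polynomial fit standing in for the spline smoother) scans a grid and picks the GCV minimizer:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = np.linspace(0, 1, n)
y = 2 + x + np.sin(2*np.pi*x) + rng.standard_normal(n)

# A penalized polynomial basis stands in for the spline smoother; the GCV
# formula only needs the smoothing matrix S_lambda of a linear smoother.
B = np.vander(x, 8, increasing=True)

def gcv(lam):
    S = B @ np.linalg.solve(B.T @ B + n*lam*np.eye(B.shape[1]), B.T)
    resid = y - S @ y
    return (resid @ resid / n) / (1 - np.trace(S)/n)**2

grid = 10.0 ** np.arange(-8.0, 1.0)
scores = [gcv(lam) for lam in grid]
best = grid[int(np.argmin(scores))]
print(best)  # the grid value minimizing the GCV score
```

In practice the minimization is done by a proper optimizer rather than a coarse grid, but the criterion evaluated is the same.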


Estimation and Inference Bayesian Confidence Intervals

Gaussian Process Definition

A Gaussian process is a stochastic process {η(x) : x ∈ X} such that η(x) ∼ N(µx, σx²) for all x ∈ X where

µx = E(η(x)) is the mean function
γx,x′ = Cov(η(x), η(x′)) is the covariance function
σx² = Cov(η(x), η(x)) is the variance function

Note η(x) is a random variable that is normally distributed for all x ∈ X
Use the notation η(x) ∼ N(µx, σx²) for all x ∈ X
Mean and variance differ for each x ∈ X


Estimation and Inference Bayesian Confidence Intervals

Bayesian Interpretation of Smoothing Spline

Let η = ηn + ηc denote the null and contrast space functions and assume the following prior distributions:

ηn has a diffuse (vague) prior with mean zero
ηc is a zero-mean Gaussian process with covariance function proportional to ρc

Using these prior assumptions. . .
η̂ can be interpreted as the posterior mean of η given the data y
we can derive the posterior variance Var(η|y)


Estimation and Inference Bayesian Confidence Intervals

Bayesian Confidence Intervals

Using the Bayesian interpretation, we can form confidence intervals

η̂(x) ± Zα/2 √Var(η(x)|y)

where Zα/2 is the critical value from the standard normal distribution.

Bayesian CIs have approximate "across-the-function coverage" when the smoothing parameters are selected according to GCV.

On average they contain 100(1 − α)% of the true function realizations
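The interval construction can be sketched numerically. Below (Python/NumPy), a zero-mean Gaussian process prior with an assumed stand-in covariance — not the notes' ρc — yields a posterior mean and pointwise posterior variance, from which ±Zα/2 intervals are formed:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 80, 1.0
x = np.linspace(0.01, 1, n)
y = 2 + x + np.sin(2*np.pi*x) + rng.standard_normal(n)

# GP-regression sketch of the Bayesian interval (assumed Brownian-motion
# covariance stands in for rho_c; this is not the spline posterior itself).
Kxx = np.minimum.outer(x, x)
Ainv = np.linalg.inv(Kxx + sigma2*np.eye(n))
post_mean = Kxx @ Ainv @ y
post_var = np.clip(np.diag(Kxx - Kxx @ Ainv @ Kxx), 0, None)

z = 1.959964  # Z_{alpha/2} for alpha = 0.05
lo = post_mean - z*np.sqrt(post_var)
hi = post_mean + z*np.sqrt(post_var)
print(np.all(lo <= hi) and np.all(np.isfinite(post_mean)))  # True
```

The intervals are pointwise: the coverage statement on this slide is about average coverage across x, not simultaneous coverage of the whole curve.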


SSANOVA in Practice



SSANOVA in Practice One-Way SSANOVA

Unidimensional Smoothing Splines in R

Many options for unidimensional smoothing splines in R:
smooth.spline function (in stats package)
bigspline function (in bigsplines package)
bigssa function (in bigsplines package)
ssanova function (in gss package)
gam function (in mgcv package)

For unidimensional smoothing, we will focus on the smooth.spline and bigspline functions, which have simple syntax.


SSANOVA in Practice One-Way SSANOVA

smooth.spline: Overview

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y)
> smsp = smooth.spline(x,y)
> lines(x,smsp$y)
> lines(x,eta,lty=2)

[Figure: scatterplot of y versus x with the smooth.spline fit (solid) and the true η (dashed)]


SSANOVA in Practice One-Way SSANOVA

smooth.spline: Changing Smoothing Parameter

[Figure: three panels titled "spar=0.25", "spar=0.75", and "spar=1", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> smsp = smooth.spline(x,y,spar=0.25)
> plot(x,y,main="spar=0.25")
> lines(x,smsp$y)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

smooth.spline: Changing Number of Knots

[Figure: three panels titled "spar=0.5, nknots=10", "spar=0.5, nknots=20", and "spar=0.5, nknots=30", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> smsp = smooth.spline(x,y,spar=0.5,nknots=10)
> plot(x,y,main="spar=0.5, nknots=10")
> lines(x,smsp$y)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

smooth.spline: CV versus GCV

[Figure: two panels titled "nknots=20, cv=TRUE" and "nknots=20, cv=FALSE", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> smsp = smooth.spline(x,y,nknots=20,cv=TRUE)
> plot(x,y,main="nknots=20, cv=TRUE")
> lines(x,smsp$y)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

smooth.spline: Number of Knots (revisited)

[Figure: three panels titled "cv=FALSE, nknots=10", "cv=FALSE, nknots=20", and "cv=FALSE, nknots=30", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> smsp = smooth.spline(x,y,nknots=10)
> plot(x,y,main="cv=FALSE, nknots=10")
> lines(x,smsp$y)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

smooth.spline: Predicting for New Data

Given η̂ we can predict for a new sequence of data:

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y,main="Prediction")
> smsp = smooth.spline(x,y)
> newdata = seq(0,1,length=200)
> yhat = predict(smsp,newdata)
> lines(yhat)
> lines(x,eta,lty=2)

[Figure: "Prediction" — scatterplot of y versus x with the predicted curve (solid) and true η (dashed)]


SSANOVA in Practice One-Way SSANOVA

bigspline: Overview

For smoothing large samples. . .

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y)
> bigsp = bigspline(x,y)
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)

[Figure: scatterplot of y versus x with the bigspline fit (solid) and the true η (dashed)]


SSANOVA in Practice One-Way SSANOVA

bigspline: Changing Smoothing Parameter

[Figure: three panels titled "lambdas=10^-9", "lambdas=10^-5", and "lambdas=1", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> bigsp = bigspline(x,y,lambdas=10^-9)
> plot(x,y,main="lambdas=10^-9")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

bigspline: Changing Number of Knots

[Figure: three panels titled "lambdas=10^-5, nknots=10", "lambdas=10^-5, nknots=20", and "lambdas=10^-5, nknots=30", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> bigsp = bigspline(x,y,lambdas=10^-5,nknots=10)
> plot(x,y,main="lambdas=10^-5, nknots=10")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

bigspline: Number of Knots (revisited)

[Figure: three panels titled "GCV, nknots=10", "GCV, nknots=20", and "GCV, nknots=30", each showing y versus x with the fitted spline (solid) and true η (dashed)]

R code for leftmost plot:
> bigsp = bigspline(x,y,nknots=10)
> plot(x,y,main="GCV, nknots=10")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)


SSANOVA in Practice One-Way SSANOVA

bigspline: Predicting for New Data

Given η̂ we can predict for a new sequence of data:

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y,main="Prediction")
> bigsp = bigspline(x,y)
> newdata = seq(0,1,length=200)
> yhat = predict(bigsp,newdata)
> lines(newdata,yhat)
> lines(x,eta,lty=2)

[Figure: "Prediction" — scatterplot of y versus x with the predicted curve (solid) and true η (dashed)]


SSANOVA in Practice One-Way SSANOVA

bigspline: Predicting Linear and Non-Linear Effects

[Figure: three panels — "Full Prediction" (y versus x with the full fit), "Linear Effect" (true effect 2 + x), and "Non-Linear Effect" (true effect sin(2πx)) — with predicted effects (solid) and true effects (dashed)]

R code for center and rightmost plots:
> newdata = seq(0,1,length=200)
> plot(x,2+x,main="Linear Effect",type="l",lty=2)
> yhat = predict(bigsp,newdata,effect="0") + predict(bigsp,newdata,effect="lin")
> lines(newdata,yhat)
> plot(x,sin(2*pi*x),main="Non-Linear Effect",type="l",lty=2)
> yhat = predict(bigsp,newdata,effect="non")
> lines(newdata,yhat)


SSANOVA in Practice One-Way SSANOVA

bigspline: Bayesian Confidence Intervals

> dev.new(width=6,height=6,noRStudioGD=TRUE)
> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> bigsp = bigspline(x,y,se.fit=TRUE)
> cilo = bigsp$fit - qnorm(0.975)*bigsp$se
> cihi = bigsp$fit + qnorm(0.975)*bigsp$se
> plot(x,y)
> lines(x,eta)
> lines(bigsp$xunique,cilo,lty=2)
> lines(bigsp$xunique,cihi,lty=2)
> sum(eta>=cilo & eta<=cihi)/length(x)
[1] 1

[Figure: scatterplot of y versus x with the true η (solid) and the 95% Bayesian confidence interval limits (dashed)]


SSANOVA in Practice One-Way SSANOVA

Comparing smooth.spline and bigspline

Consider the model yi = 2 + xi + sin(2πxi) + ei where ei ~iid N(0, σ²).

Suppose that xi = i/n for i ∈ {0, . . . , n} and σ² = 1 so that ei ~iid N(0, 1).

Median true MSE = (1/n) ∑_{i=1}^{n} (η̂(xi) − η(xi))² using q = 20 knots:

                    n     100    1000   10000   1e+05   1e+06
    smooth.spline      0.13836 0.00504 0.00113   1e-04   2e-05
    bigspline          0.14030 0.00497 0.00110   1e-04   2e-05

Median runtimes (seconds) using q = 20 knots:

                    n     100    1000   10000   1e+05   1e+06
    smooth.spline        0.001   0.002   0.021  0.1965   2.233
    bigspline            0.009   0.009   0.011  0.0120   0.094


SSANOVA in Practice One-Way SSANOVA

R Code for Simulation (on previous slide)

nsamp = 10^c(2:6)
simresults = NULL
xnew = seq(0,1,length=200)
set.seed(1)
for(j in 1:5){
  for(k in 1:10){

    x = seq(0,1,length=nsamp[j])
    eta = 2 + x + sin(2*pi*x)
    y = eta + rnorm(nsamp[j])

    tic = proc.time()
    ssmod = smooth.spline(x,y,nknots=20)
    toc = proc.time() - tic
    tmse = sum( (ssmod$y - eta)^2 ) / nsamp[j]
    simsp = data.frame(method="smsp",n=nsamp[j],time=toc[3],tmse=tmse,row.names=k)

    tic = proc.time()
    ssmod = bigspline(x,y,nknots=20)
    toc = proc.time() - tic
    tmse = sum( (predict(ssmod) - eta)^2 ) / nsamp[j]
    simbig = data.frame(method="big",n=nsamp[j],time=toc[3],tmse=tmse,row.names=k+1)

    simresults = rbind(simresults,simsp,simbig)
  }
}

round(tapply(simresults$tmse,list(simresults$method,simresults$n),median),5)
round(tapply(simresults$time,list(simresults$method,simresults$n),median),5)


SSANOVA in Practice One-Way SSANOVA

bigspline: Linear and Non-Linear Effects (revisited)

[Figure: eight panels — "Linear: n = 100, 1000, 10000, 1e+05" (true effect 2 + xnew) and "Non-linear: n = 100, 1000, 10000, 1e+05" (true effect sin(2π·xnew)) — comparing estimated and true effects]


SSANOVA in Practice Two-Way SSANOVA (Additive)

Multidimensional Smoothing Splines in R

A few options for multidimensional smoothing splines in R:
bigssa function (in bigsplines package)
bigssp function (in bigsplines package)
ssanova function (in gss package)
gam function (in mgcv package)

We will focus on the ssanova and bigssa (or bigssp) functions, which fit tensor product smoothing splines.

Note that the gam function handles interactions in a different manner.


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: Definition

Suppose we have the following function defined for x = (x1, x2) ∈ [0,1] × {a,b}:

addfun = function(x1,x2){
  funval = sin(2*pi*x1)
  idx = which(x2=="a")
  funval[idx] = funval[idx] + 2
  funval
}

Note that the function is
η(x1, x2) = 2 + sin(2πx1) if x2 = a
η(x1, x2) = sin(2πx1) if x2 ≠ a


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: Visualization

[Figure: η(x1, x2) versus x1 for x2 = a and x2 = b]


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: bigssa fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = addfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v, type=list(x1v="cub",x2v="nom"), nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.04605668
> ssadd = bigssa(y~x1v+x2v, type=list(x1v="cub",x2v="nom"), nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.03305623
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats
         gcv       rsq      aic      bic
int 1.441561 0.6258559 319.3529 344.6341
add 1.386159 0.6134636 316.0127 332.6986


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: bigssa prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels titled "Interaction" and "Additive", each plotting the predicted curves for the two levels of x2 (solid and dashed) over x1]


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: ssanova fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = addfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = ssanova(y~x1v*x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> newdata = data.frame(x1v=x1v,x2v=x2v)
> sum((predict(ssint,newdata)-eta)^2) / length(eta)
[1] 0.01449173
> ssadd = ssanova(y~x1v+x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> sum((predict(ssadd,newdata)-eta)^2) / length(eta)
[1] 0.01432404


SSANOVA in Practice Two-Way SSANOVA (Additive)

Additive Function: ssanova prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels titled "Interaction" and "Additive", each plotting the predicted curves for the two levels of x2 (solid and dashed) over x1]


SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: Definition

Suppose we have the following function defined for x = (x1, x2) ∈ [0,1] × {a,b}:

intfun = function(x1,x2){
  funval = sin(2*pi*x1)
  idx = which(x2=="a")
  funval[idx] = funval[idx] + 2 + sin(4*pi*x1[idx])
  funval
}

Note that the function is
η(x1, x2) = 2 + sin(2πx1) + sin(4πx1) if x2 = a
η(x1, x2) = sin(2πx1) if x2 ≠ a


SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: Visualization

[Figure: η(x1, x2) versus x1 for x2 = a and x2 = b]


SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: bigssa fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.1081747
> ssadd = bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.1858098
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats
         gcv       rsq      aic      bic
int 1.522061 0.6509204 324.0861 356.6680
add 1.510616 0.6097741 324.5034 343.1147


SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: bigssa prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels titled "Interaction" and "Additive", each plotting the predicted curves for the two levels of x2 (solid and dashed) over x1]


SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: ssanova fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = ssanova(y~x1v*x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> newdata = data.frame(x1v=x1v,x2v=x2v)
> sum((predict(ssint,newdata)-eta)^2) / length(eta)
[1] 0.1624814
> ssadd = ssanova(y~x1v+x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> sum((predict(ssadd,newdata)-eta)^2) / length(eta)
[1] 0.1802812

Nathaniel E. Helwig (U of Minnesota) Smoothing Spline ANOVA Updated 04-Jan-2017 : Slide 84

Page 85: Smoothing Spline ANOVA - Statisticsusers.stat.umn.edu/~helwig/notes/ssanova-Notes.pdf · Smoothing Spline ANOVA Nathaniel E. Helwig Assistant Professor of Psychology and Statistics

SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: ssanova prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: predicted curves for x2v = "a" (solid) and x2v = "b" (dashed). Left panel "Interaction" (yint), right panel "Additive" (yadd); x-axis from 0 to 1, y-axis from -2 to 4.]

Nathaniel E. Helwig (U of Minnesota) Smoothing Spline ANOVA Updated 04-Jan-2017 : Slide 85

Page 86: Smoothing Spline ANOVA - Statisticsusers.stat.umn.edu/~helwig/notes/ssanova-Notes.pdf · Smoothing Spline ANOVA Nathaniel E. Helwig Assistant Professor of Psychology and Statistics

SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: Fitting with More Data

> n = 1000
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.03178251
> ssadd = bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.1311356
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats

         gcv       rsq      aic      bic
int 1.016522 0.6479397 2854.060 2923.757
add 1.081167 0.6236793 2915.779 2973.402

Nathaniel E. Helwig (U of Minnesota) Smoothing Spline ANOVA Updated 04-Jan-2017 : Slide 86
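The gcv column above is the generalized cross-validation score used for smoothing parameter selection. As a reminder of what is being compared, for a linear smoother with hat matrix H the score is GCV = n * RSS / (n - tr(H))^2, which a hedged stand-alone sketch (not the bigssa internals) could compute as:

```r
## Illustrative GCV computation for a linear smoother:
## GCV = n * RSS / (n - tr(H))^2, where tr(H) is the
## effective degrees of freedom (edf) of the fit.
gcv_score <- function(y, fitted, edf) {
  n <- length(y)
  rss <- sum((y - fitted)^2)   # residual sum of squares
  n * rss / (n - edf)^2
}
gcv_score(y = c(1, 2, 3), fitted = c(1, 2, 2), edf = 1)
```

Smaller GCV is better, so with n = 1000 the interaction model is preferred on every criterion in the table, consistent with the data being generated from an interaction function.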

Page 87: Smoothing Spline ANOVA - Statisticsusers.stat.umn.edu/~helwig/notes/ssanova-Notes.pdf · Smoothing Spline ANOVA Nathaniel E. Helwig Assistant Professor of Psychology and Statistics

SSANOVA in Practice Two-Way SSANOVA (Interaction)

Interaction Function: Predicting with More Data

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: predicted curves for x2v = "a" (solid) and x2v = "b" (dashed). Left panel "Interaction" (yint), right panel "Additive" (yadd); x-axis from 0 to 1, y-axis from -2 to 4.]

Nathaniel E. Helwig (U of Minnesota) Smoothing Spline ANOVA Updated 04-Jan-2017 : Slide 87