STA 2201: Applied Statistics II, lecture slides, March 14, 2014
Data presentation, In the news, Semi-parametric regression



Today
- data presentation: Michael
- In the news
- Semi-parametric regression
- HW 3: due March 21
- Final Exam: April 11, 2:00-5:00 pm, SS 1085
  - 4 questions
  - one theory question
  - one applied question
  - one question from HW
  - one question about a study
  - one question with computer output
  - detailed list of SM sections coming
  - see the Final Exam from 2012 on the course web page

Career outlook for actuaries (Society of Actuaries infographic)

General Insurance by the numbers. First practiced by ancient Chinese and Babylonian traders, General Insurance, also known as Property and Casualty or Non-Life Insurance, has played a critical role in the evolution of modern society. An immense field today, it continues to grow in size and complexity. As increased globalization and new economies and volatilities emerge, so do new opportunities, especially for risk-focused professionals who seek to solve the unprecedented challenges faced by companies and countries around the world.

Non-life insurance premiums [1]
- 2006: direct premiums written $1.55 trillion
- 2011: direct premiums written $1.97 trillion

Career outlook
- Workers in Property/Casualty insurance in 2010: 533,100 [6]
- Expected growth in the number of actuaries employed by US insurance companies between 2010 and 2020: 27% [12]
- Median pay for U.S. actuaries in 2010: $87,650 [12]

The winds of climate change are blowing. Over the last 30 years, weather events have driven up the value of insured losses around the globe, specifically in the US, a trend that is expected to keep growing. [2]
- 100% increase in average annual winter storm losses since the 80s [2]
- 25,000 new record high temperatures set in 2012 [4]
- $32 billion total loss to US property/casualty insurers due to extreme weather in 2011 [2]
- $1 billion in wildfire damage in 2010 [2]
- $18 billion in drought losses to publicly owned crop insurers in 2012 [3]

Covering catastrophes. With major natural and man-made catastrophes (earthquakes, floods, hurricanes, acts of terrorism and more), the last few decades have been costly for insured catastrophic losses.
[Figure: world insured catastrophe losses, 2003-2012, split into weather-related natural catastrophes, man-made disasters, and earthquake/tsunami; annual totals range from roughly $19.6 billion to $126.4 billion. [6]]

What else is driving the industry? As the number of vehicles on the road rises at record rates, the introduction of driverless cars could all but eliminate auto insurance premiums in the future, causing significant challenges for the market.
- World vehicle population in 2010: 1.0 billion [7]
- Cars on the road by 2035: 1.7 billion [8]
- Auto insurance premiums written each year in the US alone: $200 billion [10]
- Possible decrease in future claims as a result of driverless cars: 90% [11]

Auto market comparison [13] (vehicles per 1000 people)
- China: 16 in 2002, 269 projected for 2030; average annual growth rate 10.6%
- US: 812 in 2002, 849 projected for 2030; average annual growth rate 0.2%
Today, China is the 7th-largest general insurance market; in just 10 years, it will be #2. [9]

The General Insurance track to the FSA. The Society of Actuaries' (SOA) General Insurance track to Fellowship gives candidates the rigorous training and skills they need to make a difference in this fast-moving field.
- Practical and in-depth modules, unique to the SOA
- Rigorous and robust educational pathway to Fellowship
- 6 different tracks to Fellowship through the SOA: Corporate Finance and ERM; Individual Life and Annuities; Quantitative Finance and Investment; Retirement Benefits; Group and Health; General Insurance
- 4 exams, 4 modules
- 24,000+ global actuarial professionals in the SOA network
- Learn more about the General Insurance pathway to the FSA at soa.org/general-ins

Sources
1. Swiss Re, sigma, No. 2/2013.
2. McHale, Cynthia, and Sharlene Leurig. "Stormy Future for U.S. Property/Casualty Insurers: The Growing Costs and Risks of Extreme Weather Events." CERES, Sept. 2012.
3. Best's News Service, "University of Illinois: Crop Insurers Could Face Underwriting Losses of $18B," August 27, 2012. http://www3.ambest.com/ambv/bestnews/newscontent.aspx?AltSrc=105&RefNum=159603
4. NOAA National Climatic Data Center. http://www.ncdc.noaa.gov/extremes/records
5. Munich Re, Geo Risks Research, NatCatSERVICE, as of January 2013.
6. The Insurance Fact Book 2012. Insurance Information Institute, 2012.
7. Sousanis, John. "World Vehicle Population Tops 1 Billion Units." WardsAuto, 15 Aug 2011.
8. LeBeau, Philip. "Whoa! 1.7 Billion Cars on the Road by 2035." CNBC.com, 12 Nov 2012.
9. "China's Growing General Insurance Market: A Bright Future." Towers Watson Emphasis, Vol. 3 (2011): 21-25.
10. Mui, Chunka. "Will Auto Insurers Survive Their Collision with Driverless Cars? (Part 6)." Forbes.com, 28 March 2013.
11. Newman, Graeme. "A Future Filled With Driverless Cars." InsuranceJournal.com, 11 Feb 2013.
12. Bureau of Labor Statistics, U.S. Department of Labor, Occupational Outlook Handbook, 2012-13 Edition, Actuaries. http://www.bls.gov/ooh/math/actuaries.htm
13. Dargay, Joyce, Dermot Gately, and Martin Sommer. "Vehicle Ownership and Income Growth, Worldwide: 1960-2030." Energy Journal, 2007, Vol. 28, No. 4. http://www.iaee.org/en/publications/ejarticle.aspx?id=2234


Nature Medicine News Release


Nature Medicine Advance Publication


Nature Medicine
- 525 patients, followed for five years
- 46 patients had AD or pre-AD ("aMCI") at entry; 28 converters
- in year 3, 53 patients with aMCI/AD were selected for plasma testing
- matched with 53 normal controls
- looked for biomarkers of disease, using logistic regression and lasso
- used the most promising to test on a further set of 21 patients with 20 matched controls
- ROC curve: plot of sensitivity (true positives) against 1 - specificity (false positives) as the cut-off varies (a sketch of such a plot follows below)
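A sketch of how such a curve is traced out, using simulated classifier scores (hypothetical data, not the paper's biomarker model):

## ROC curve by sweeping the classification cut-off
set.seed(2)
score <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))   # scores: cases then controls
case  <- rep(c(1, 0), each = 50)
cuts  <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(cuts, function(u) mean(score[case == 1] >= u))   # sensitivity
fpr <- sapply(cuts, function(u) mean(score[case == 0] >= u))   # 1 - specificity
plot(fpr, tpr, type = "l", xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)                                          # reference line for a useless test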


... nature medicine


Lasso for choosing explanatory variables
- penalized least squares:
  $\min_\beta \left( \sum_i \{y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\}^2 + \lambda \sum_{j=1}^p |\beta_j| \right)$
- equivalent to
  $\min_\beta \sum_i \{y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\}^2$ subject to $\sum_{j=1}^p |\beta_j| \le t$
- a quadratic programming problem
- $\hat\beta_{\rm lasso}$ is a nonlinear function of $y$
- Tibshirani (1996), JRSS B and (2011), JRSS B
- http://www-stat.stanford.edu/~tibs/lasso.html
- extends to generalized linear models, implemented in glmnet (a minimal sketch follows below)
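A minimal glmnet sketch on simulated data (the data and variable names are hypothetical, purely to show the calling convention):

## lasso coefficient path and cross-validated lambda with glmnet
library(glmnet)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1.5, rep(0, p - 3))          # only three nonzero coefficients
y <- drop(x %*% beta) + rnorm(n)
fit <- glmnet(x, y, alpha = 1)                # alpha = 1: lasso penalty
plot(fit, xvar = "lambda")                    # coefficient paths as lambda varies
cv <- cv.glmnet(x, y)                         # lambda chosen by cross-validation
coef(cv, s = "lambda.min")                    # coefficients at the selected lambda
## logistic-regression version, as in the biomarker study: glmnet(x, ybin, family = "binomial")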


Extensions of semi-parametric regression
- original model $y_j = g(x_j) + \epsilon_j$
- fit by local polynomial regression:
  $g(x_j) \doteq \beta_0 + \beta_1(x_j - x_0) + \cdots + \beta_k(x_j - x_0)^k$, with $\hat g(x_0) = \hat\beta_0$
- $\hat\beta$ maximizes
  $\ell(\beta, \sigma; x_0, h) = \sum_j \frac{1}{h} w\left(\frac{x_j - x_0}{h}\right) \ell_j(\beta, \sigma; x_0)$
- $\ell_j(\beta, \sigma; x_0) = -\frac{1}{2\sigma^2}\{y_j - \beta_0 - \beta_1(x_j - x_0) - \cdots - \beta_k(x_j - x_0)^k\}^2 - \frac{1}{2}\log\sigma^2$
- local log-likelihood fitting
- extends to more general models by replacing $\ell_j$ by the appropriate log-likelihood contribution (a weighted least-squares sketch of the Gaussian case follows below)
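For the Gaussian model, maximizing this local log-likelihood is just kernel-weighted least squares; a minimal sketch (simulated data, arbitrary Gaussian kernel and bandwidth, not the code used in SM):

## local polynomial fit at a single point x0 via kernel-weighted least squares
local_poly <- function(x, y, x0, h, degree = 1) {
  w <- dnorm((x - x0) / h) / h                 # weights (1/h) w((x_j - x0)/h)
  X <- outer(x - x0, 0:degree, "^")            # columns 1, (x - x0), ..., (x - x0)^degree
  lm.wfit(X, y, w)$coefficients[1]             # beta0-hat estimates g(x0)
}

x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
grid <- seq(0, 1, length = 101)
ghat <- sapply(grid, function(x0) local_poly(x, y, x0, h = 0.1))
plot(x, y); lines(grid, ghat, col = "red")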


Example 10.32
- toxoplasmosis data; response: incidence; x: yearly rainfall (SM Figure 10.12)
- $y_j = r_j/m_j$, $r_j \sim {\rm Binom}\{m_j, \pi(x_j)\}$
- $\pi(x) = \exp\{\theta(x)\}/[1 + \exp\{\theta(x)\}]$
- $\theta(x) \doteq \beta_0 + \beta_1(x - x_0) + \cdots + \beta_k(x - x_0)^k/k!$, with $\theta(x_0) = \beta_0$
- local log-likelihood
  $\ell(\beta; x_0, h) = \sum_j \frac{1}{h} w\left(\frac{x_j - x_0}{h}\right) m_j \{y_j x_j^T\beta - \log(1 + e^{x_j^T\beta})\}$
- or possibly allow for over-dispersion
  $\ell(\beta, \phi; x_0, h) = \sum_j \frac{1}{h} w\left(\frac{x_j - x_0}{h}\right) \frac{m_j}{\phi} \{y_j x_j^T\beta - \log(1 + e^{x_j^T\beta})\}$
  (a kernel-weighted glm sketch of this local fit follows below)
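One way to compute this local fit is a kernel-weighted glm; a sketch using the quasibinomial family (which also covers the over-dispersed version and avoids warnings about non-integer weighted counts). The Gaussian kernel and the bandwidth h = 100 mm are arbitrary illustrative choices:

## local logistic fit of toxoplasmosis incidence at a rainfall value x0
library(SMPracticals)                           # toxo data: columns r, m, rain
data(toxo)
local_logit <- function(x0, h, data = toxo) {
  w <- dnorm((data$rain - x0) / h) / h          # kernel weights (1/h) w((x_j - x0)/h)
  z <- data$rain - x0                           # local linear term
  fit <- glm(cbind(r, m - r) ~ z, family = quasibinomial, weights = w, data = data)
  plogis(coef(fit)[1])                          # pi(x0) = expit(beta0-hat)
}
grid <- seq(min(toxo$rain), max(toxo$rain), length = 50)
pihat <- sapply(grid, local_logit, h = 100)
plot(r/m ~ rain, data = toxo, ylab = "proportion positive")
lines(grid, pihat, col = "blue")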


Example 10.32


... Ex 10.32

> library(mgcv)
> library(SMPracticals)
> data(toxo)
> ?gam
> toxo.gam <- gam(cbind(r, m - r) ~ s(rain), family = binomial, data = toxo)
> summary(toxo.gam)

Family: binomial
Link function: logit

Formula:
cbind(r, m - r) ~ s(rain)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.09015    0.08573  -1.052    0.293

Approximate significance of smooth terms:
          edf Ref.df Chi.sq p-value
s(rain) 6.515   7.57  23.05 0.00259 **

> par(mfrow = c(2,2))
> toxo.gam$sp
    s(rain)
0.008141828
> plot(gam(cbind(r, m - r) ~ s(rain), sp = toxo.gam$sp, family = binomial, data = toxo),
+      residuals = TRUE, pch = "*")
> plot(gam(cbind(r, m - r) ~ s(rain), sp = 0.05, family = binomial, data = toxo),
+      residuals = TRUE, pch = "*")
> plot(gam(cbind(r, m - r) ~ s(rain), sp = 0.5, family = binomial, data = toxo),
+      residuals = TRUE, pch = "*")
> plot(gam(cbind(r, m - r) ~ s(rain), sp = 1, family = binomial, data = toxo),
+      residuals = TRUE, pch = "*")


[Figure: the four fits from the previous slide, showing the estimated smooth of rain with partial residuals (*) for rainfall roughly 1600-2200 mm: s(rain, 6.51) with sp = 0.0081 (the GCV choice), s(rain, 4.68) with sp = 0.05, s(rain, 2.89) with sp = 0.5, and s(rain, 2.46) with sp = 1.]


... Ex 10.32
- gam uses spline smoothing terms, rather than local polynomials
- a smoothing parameter replaces the bandwidth h
- kgplm in library gplm computes kernel smooths, but for Bernoulli data
- note from the output that φ = 1
- quasibinomial is a valid choice of family
- this gives an estimate of φ of about 1.8 (with the default choice of smoothing), and the smooth fit is then no longer significant (a sketch follows below)
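A sketch of the over-dispersed fit referred to above (same smooth, quasibinomial family); the dispersion estimate appears on the "Scale est." line of the summary:

## toxoplasmosis smooth with over-dispersion allowed
library(mgcv); library(SMPracticals)
data(toxo)
toxo.qgam <- gam(cbind(r, m - r) ~ s(rain), family = quasibinomial, data = toxo)
summary(toxo.qgam)   # Scale est. gives phi-hat; the smooth term's p-value is larger than before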


... Ex 10.32
- estimation of the smoothing parameter using generalized cross-validation, or a generalization of AIC:
  ${\rm GCV}(h) = \sum_j \left\{ \frac{y_j - \hat g(x_j)}{1 - {\rm tr}(S_h)/n} \right\}^2$
  ${\rm AIC}_c(h) = n \log \hat\sigma^2(h) + n\,\frac{1 + {\rm tr}(S_h)/n}{1 - \{{\rm tr}(S_h) + 2\}/n}$
- for generalized linear models
  ${\rm AIC}_c(h) = \sum_j d_j\{y_j; \hat\mu_j(h)\} + n\,\frac{1 + {\rm tr}(S_h)/n}{1 - \{{\rm tr}(S_h) + 2\}/n}$
  (a small GCV helper is sketched below)
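GCV can be computed directly from the residuals and tr(S_h); a minimal helper, sketched here (not library code), using the equivalent degrees of freedom in place of tr(S_h):

## GCV for a linear smoother, following the formula above
gcv <- function(y, fitted, trS) {
  n <- length(y)
  sum(((y - fitted) / (1 - trS / n))^2)
}
## e.g. for a smoothing spline, fit$df plays the role of tr(S_h):
## fit <- smooth.spline(x, y, df = 5);  gcv(y, predict(fit, x)$y, trS = fit$df)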


Flexible modelling using basis expansions §10.7.2
- $y_j = g(x_j) + \epsilon_j$
- flexible linear modelling:
  $g(x) = \sum_{m=1}^{M} \beta_m h_m(x)$
- this is called a linear basis expansion, and $h_m$ is the $m$th basis function
- for example, if $X$ is one-dimensional: $g(x) = \beta_0 + \beta_1 x + \beta_2 x^2$, or $g(x) = \beta_0 + \beta_1 \sin(x) + \beta_2 \cos(x)$, etc.
- simple linear regression has $h_1(x) = 1$, $h_2(x) = x$ (a fitting sketch follows below)
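In R such an expansion is an ordinary linear model with constructed columns; a minimal sketch on hypothetical simulated data:

## linear basis expansions fitted by least squares
x <- runif(100, 0, 2 * pi)
y <- sin(x) + 0.3 * cos(2 * x) + rnorm(100, sd = 0.2)
fit.quad <- lm(y ~ x + I(x^2))          # basis {1, x, x^2}
fit.trig <- lm(y ~ sin(x) + cos(x))     # basis {1, sin x, cos x}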


Piecewise polynomials
- piecewise constant basis functions:
  $h_1(x) = I(x < \xi_1)$, $h_2(x) = I(\xi_1 \le x < \xi_2)$, $h_3(x) = I(\xi_2 \le x)$
- equivalent to fitting by local averaging
- piecewise linear basis functions, with constraints:
  $h_1(x) = 1$, $h_2(x) = x$, $h_3(x) = (x - \xi_1)_+$, $h_4(x) = (x - \xi_2)_+$
- windows defined by knots $\xi_1, \xi_2, \ldots$
- piecewise cubic basis functions:
  $h_1(x) = 1$, $h_2(x) = x$, $h_3(x) = x^2$, $h_4(x) = x^3$
- continuity: $h_5(x) = (x - \xi_1)^3_+$, $h_6(x) = (x - \xi_2)^3_+$
- gives a continuous function, with continuous first and second derivatives (a construction sketch follows below)
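A sketch constructing these truncated power bases directly (the data and knot positions are hypothetical):

## truncated power bases built by hand
pos <- function(u) pmax(u, 0)                                  # (u)_+
x <- runif(100); y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
xi <- c(1/3, 2/3)                                              # knots
H.lin <- cbind(1, x, pos(x - xi[1]), pos(x - xi[2]))           # continuous, piecewise linear
H.cub <- cbind(1, x, x^2, x^3,
               pos(x - xi[1])^3, pos(x - xi[2])^3)             # continuous 1st and 2nd derivatives
fit <- lm(y ~ H.cub - 1)                                       # constructed matrix as predictors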


Example: earthquake data

> data(quake, package = "SMPracticals")
> head(quake)
       time mag
1  40.08333 6.0
2 162.38889 6.9
3 210.22917 6.0
...
> with(quake, plot(log(1/time), mag))
## using a different measure of intensity here than in Figure 10.36

[Figure: scatterplot of mag against log(1/time); log(1/time) runs from about -10 to -4 and mag from 6.0 to 8.5.]


... earthquake

> eq.gam <- gam(mag ~ s(intensity), data = quake)   # assumes quake$intensity <- log(1/quake$time) has been added
> with(quake, lines(intensity, eq.gam$fitted.values))

[Figure: the scatterplot of mag against log(1/time) with the fitted gam curve added.]


... earthquake

> plot(eq.gam, residuals = TRUE, pch = "o")
# standard errors plotted by default

[Figure: the estimated smooth s(intensity, 2.28) plotted against intensity (about -10 to -4), with partial residuals (o) and standard error bands.]


Cubic splines
- truncated power basis of degree 3
- need to choose the number of knots K and the placement of the knots $\xi_1, \ldots, \xi_K$ (SM uses n knots)
- construct a features matrix using the truncated power basis set
- use the constructed matrix as the set of predictors


... cubic splines

> library(splines)   # bs() lives here
> with(quake, bs(log(1/time))[1:10,])
# bs(x) with no other arguments just gives a single cubic polynomial
              1         2         3
 [1,] 0.0000000 0.0000000 1.0000000
 [2,] 0.1018013 0.3903714 0.4989780
 [3,] 0.1359705 0.4189773 0.4303434
 [4,] 0.1884790 0.4408886 0.3437743
 [5,] 0.2056632 0.4436068 0.3189471
...
attr(,"degree")
[1] 3
attr(,"knots")
numeric(0)
attr(,"Boundary.knots")
[1] -10.454784  -3.690961
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "bs"    "basis" "matrix"


... cubic splines

> with(quake, bs(log(1/time), df = 5)[1:10,])
# gives a proper cubic spline basis, here with 5 df
      1          2         3         4         5
 [1,] 0 0.00000000 0.0000000 0.0000000 1.0000000
 [2,] 0 0.01110655 0.1250814 0.4247847 0.4390274
 [3,] 0 0.01846075 0.1661869 0.4486889 0.3666635
 [4,] 0 0.03370916 0.2283997 0.4600092 0.2778819
 [5,] 0 0.03989014 0.2484715 0.4585984 0.2530400
...
attr(,"degree")
[1] 3
attr(,"knots")
33.33333% 66.66667%
-9.943294 -9.520987
attr(,"Boundary.knots")
[1] -10.454784  -3.690961


... earthquake data

> quake.bs = lm(mag ~ bs(log(1/time), df = 5), data = quake)
> quake.pred = predict(quake.bs, se.fit = TRUE, interval = "confidence")
> quake.pred$fit
       fit      lwr      upr
1 5.962665 5.216283 6.709047
2 6.279641 5.979190 6.580092
3 6.323859 6.042772 6.604946
> lines(log(1/quake$time), quake.pred[[1]][,1])
> lines(log(1/quake$time), quake.pred[[1]][,2], lty = 2)
> lines(log(1/quake$time), quake.pred[[1]][,3], lty = 2)
> quake.lo = loess(mag ~ log(1/time), data = quake)
> quake.lopred = predict(quake.lo, se = TRUE)

[Figure: two overlay steps of the plot of mag against log(1/time), with the fitted curves from the code above added.]


Smoothing splines §10.7.2
- $y_j = g(t_j) + \epsilon_j$, $j = 1, \ldots, n$
- choose $g(\cdot)$ to solve
  $\min_g \; \sum_{j=1}^n \frac{\{y_j - g(t_j)\}^2}{2\sigma^2} + \frac{\lambda}{2\sigma^2}\int_a^b \{g''(t)\}^2\,dt, \qquad \lambda > 0$
- the solution is a cubic spline, with knots at each observed $t_j$ value
- see Figure 10.18 for a non-regularized solution
- has an explicit, finite-dimensional solution:
  $\hat g = \{\hat g(t_1), \ldots, \hat g(t_n)\} = (I + \lambda K)^{-1} y$
- $K$ is a symmetric $n \times n$ matrix of rank $n - 2$


[Figure: scatterplot of mag against log(1/time) for the earthquake data, used for the smoothing-spline fits on the next slide.]


... smoothing splines

> quake$int = log(1/quake$time)
> quake[1:4,]
       time mag       int
1  40.08333 6.0 -3.690961
2 162.38889 6.9 -5.089994
3 210.22917 6.0 -5.348198
4 303.85417 6.2 -5.716548
> attach(quake)
> plot(int, mag)
> quake.ss2 = smooth.spline(x = int, y = mag, df = 5)
> lines(quake.ss2, col = "red")
> quake.ss3 = smooth.spline(x = int, y = mag, cv = TRUE)   # call inferred from the printed output below
> quake.ss3
Call:
smooth.spline(x = int, y = mag, cv = TRUE)

Smoothing Parameter  spar= 1.499945  lambda= 0.0001340604 (25 iterations)
Equivalent Degrees of Freedom (Df): 11.35023
Penalized Criterion: 64.57512
PRESS: 0.1730025
> lines(quake.ss3, col = "blue")


... smoothing splines
An example from the R help file for smooth.spline:

> data(cars)
> attach(cars)
> plot(speed, dist, main = "data(cars) & smoothing splines")
> cars.spl <- smooth.spline(speed, dist)
> (cars.spl)
Call:
smooth.spline(x = speed, y = dist)

Smoothing Parameter  spar= 0.7801305  lambda= 0.1112206 (11 iterations)
Equivalent Degrees of Freedom (Df): 2.635278
Penalized Criterion: 4337.638
GCV: 244.1044
> lines(cars.spl, col = "blue")
> lines(smooth.spline(speed, dist, df = 10), lty = 2, col = "red")
> legend(5, 120, c(paste("default [C.V.] => df =", round(cars.spl$df, 1)),
+        "s( * , df = 10)"), col = c("blue", "red"), lty = 1:2,
+        bg = 'bisque')
> detach()


[Figure: "data(cars) & smoothing splines": dist against speed with the default cross-validated smoothing spline (df = 2.6, solid blue) and the df = 10 fit (dashed red), with legend.]

Multidimensional splines
- so far we are considering just one X at a time
- for regression splines, we replace each X by the new columns of the basis matrix
- for smoothing splines, we get a univariate regression
- it is possible to construct smoothing splines for two or more inputs simultaneously, but the computational difficulty increases rapidly
- these are called thin plate splines
- alternative: $E(Y \mid X_1, \ldots, X_p) = f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$: additive models
- binary response: ${\rm logit}\{E(Y \mid X_1, \ldots, X_p)\} = f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$: generalized additive models (an mgcv sketch follows below)
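A minimal mgcv sketch of these models on simulated (hypothetical) data:

## additive and generalized additive models in mgcv
library(mgcv)
set.seed(3)
n <- 200
x1 <- runif(n); x2 <- runif(n)
y <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.3)
fit.add <- gam(y ~ s(x1) + s(x2))                           # additive model
ybin <- rbinom(n, 1, plogis(2 * sin(2 * pi * x1) + x2))
fit.gam <- gam(ybin ~ s(x1) + s(x2), family = binomial)     # generalized additive model
fit.tps <- gam(y ~ s(x1, x2))                               # bivariate (thin plate) smooth instead
plot(fit.add, pages = 1)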


Which smoothing method?
- basis functions: natural splines, Fourier, wavelet bases
- regularization via cubic smoothing splines
- kernel smoothers: locally constant/linear/polynomial
- adaptive bandwidth, running medians, running M-estimates
- Dantzig selector, elastic net, rodeo (Lafferty & Wasserman, 2008)
- Faraway (2006), Extending the Linear Model:
  - with very little noise, a small amount of local smoothing (e.g. nearest neighbours)
  - with moderate amounts of noise, kernel and spline methods are effective
  - with large amounts of noise, parametric methods are more attractive
  - "It is not reasonable to claim that any one smoother is better than the rest"
- loess is robust to outliers, and provides smooth fits
- spline smoothers are more efficient, but potentially sensitive to outliers
- kernel smoothers are very sensitive to the bandwidth (a brief comparison on the earthquake data is sketched below)
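A brief illustration of a few of these smoothers on the earthquake data (a sketch; the smoothing choices are package defaults, not recommendations):

## loess, smoothing spline, and penalized regression spline on the same data
library(SMPracticals)   # quake data
library(mgcv)           # gam
data(quake)
quake$int <- log(1/quake$time)
ord <- order(quake$int)
plot(mag ~ int, data = quake, xlab = "log(1/time)", ylab = "mag")
lines(quake$int[ord], fitted(loess(mag ~ int, data = quake))[ord], col = "red")      # loess
lines(smooth.spline(quake$int, quake$mag), col = "blue")                             # smoothing spline (GCV)
lines(quake$int[ord], fitted(gam(mag ~ s(int), data = quake))[ord], col = "green")   # penalized regression spline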
