an introduction to the r package snazzalini.stat.unipd.it/sn/talk-pkg_sn-intro.pdfan introduction to...

29
Scheme Prob-std Prob-obj Stats q() An introduction to the R package sn Adelchi Azzalini Università di Padova, Italia Skewed world of data: Workshop in honor of Reinaldo B. Arellano-Valle’s 65th birthday October 2017 Pontificia Universidad Católica de Chile

Upload: others

Post on 25-Sep-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

An introduction to the R package sn

Adelchi AzzaliniUniversità di Padova, Italia

Skewed world of data:Workshop in honor of Reinaldo B. Arellano-Valle’s 65th birthday

October 2017Pontificia Universidad Católica de Chile

Page 2: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Scheme

1 / 28

Page 3: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Pacakge sn: a one-slide panorama

probabilityclassical-type functions for distributions:

univariate dist’n: {d,p,q,r}{sn,st,sc}multivariate dist’n: {d,p,r}{msn,mst,msc}utilities: dp2cp, cp2dp, {sn,st}.cumulants,. . .build your own dist’n: {d,r}{SymmModulated},{d,r}{mSymmModulated} and plot (if d = 2)

SEC distribution ‘objects’two classes: SECdistrUv, SECdistrMv; protocol: S4

create object: makeSECdistr (also extractSECdistr)manipulate it: marginalSECdistr, affineTransSECdistr,conditionalSECdistr (only for SN)methods: summary, plot, show, mean, sd, var, . . .

statisticscore general function: selm (two object types: selm, mselm)methods: show, plot, summary, coef, residuals,fitted, predict, logLik, profile, confint, . . .

2 / 28

Page 4: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Prob-std

3 / 28

Page 5: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Simple functions: dsn and plotting SN density

# plotting SN densitieslibrary(sn)x <- seq(-2, 4, length=301)y1 <- dsn(x, xi=0, omega=1.2, alpha=10)plot(x, y1, type="l", ylab="density")y2 <- dsn(x, 0, 1.2, 5)lines(x, y2, col=2)y3 <- dsn(x, dp=c(0, 1.2, 2))lines(x, y3, col=4, lty=2)

−2 −1 0 1 2 3 4

0.0

0.1

0.2

0.3

0.4

0.5

0.6

xde

nsity

4 / 28

Page 6: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Simple functions: cp2dp, dp2dp, rst, dst

# switch between parameterizationscpST <- c(1, 1.5, 1.5, 5.1)# CP = (mean, st.dev, gamma1, gamma2)dpST <- cp2dp(cpST, family="ST")print(dpST)# xi omega alpha nu# -0.6202 1.8754 3.6733 7.1797print(dp2cp(dpST, family="ST"))# back to cpST## sampling from ST and plotting densityy <- rst(1000, dp=dpST)hist(y, prob=TRUE, col="gray90")x <- seq(min(y), max(y), length=301)pdfST <- dst(x, dp=dpST)lines(x, pdfST, col=4)print(sd(y)) # about cpST[2]

Histogram of y

y

Den

sity

−2 0 2 4 6 8 100.

000.

050.

100.

150.

200.

250.

300.

35

5 / 28

Page 7: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Unleash your creativity: case d = 1

# SGN of Arellano-Valle et al. (2004)wSGN <- function(z, lambda) z * lambda[1]/sqrt(1 + lambda[2]*z^2)x <- seq(-1, 14, length=301)pdf <- dSymmModulated(x, 5, 2, f0="normal", G0="normal", w=wSGN,

lambda=c(3,5))plot(x, pdf, type="l")

0 5 10

0.00

0.05

0.10

0.15

0.20

0.25

0.30

x

pdf

now change f0, G0, w,... and

make your density

6 / 28

Page 8: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Unlash your creativity: case d > 1

x <- matrix((1:12)/3, 4, 3)S <- diag(1:3) + outer(1:3,1:3)/2wMvTrigs <- function(z, p, q) sin(z %*% p)/(1 + cos(z %*% q))pdf <- dmSymmModulated(x, xi=1:3, Omega=S, f0="t", G0="logistic",

w=wMvTrigs, par.f0=5, par.G0=NULL, p=c(2,3,-2), q=c(1,1, 0))

# plotting when d=2range <- cbind(c(-4,4), c(-4,4))plot2D.SymmModulated(range, xi=c(0,0), Omega=S[1:2,1:2],

f0="normal", G0="normal", w=wMvTrigs,par.f0=NULL, par.G0=NULL, p=c(1,-3), q=c(1,1), col=4)

y <- rmSymmModulated(2500, xi=c(0,0), Omega=S[1:2,1:2],f0="normal", G0="normal", w=wMvTrigs,par.f0=NULL, par.G0=NULL, p=c(1,-3), q=c(1,1))

points(y, cex=0.3, col="gray60")

7 / 28

Page 9: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Unlash your creativity: plotting if d = 2

−4 −2 0 2 4

−4

−2

02

4

8 / 28

Page 10: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Prob-obj

9 / 28

Page 11: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Working with distribution ‘objects’

Idea is to make a SEC distribution and work with this objectOnce the distribution object is created, we can:

extract/compute various characteristicsplot itmanipulate it to create a new distribution (if d > 1)

Technical note: distributions are S4-type objects

10 / 28

Page 12: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

How to create a distribution

Chief procedure is makeSECdistr(another route is extractSECdistr from selm or mselm object)

we must to specify DP parametersif univariate case, DP is assigned as a vectorif multivariate case, DP is assigned as a list

we must to specify the family,options are: "SN", "ESN", "ST", "SC"

optional: name of the distribution, names of components?makeSECdistr tells you everything

11 / 28

Page 13: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A simple illustration

f1 <- makeSECdistr(dp=c(3,2,5), family="SN", name="First-SN")#show(f1) # of just type ’f1’# Probability distribution of variable ’First-SN’# Skew-elliptically contoured distribution of univariate family SN# Direct parameters:# xi omega alpha# 3 2 5

summary(f1) # longer output#c(mean(f1), sd(f1)) # lends 4.565 1.246

plot(f1)

# many possible variants, such as:plot(f1, probs=c(0.1, 0.9))

?plot.SECdistr # says more, see method for signature ’SECdistrUv’12 / 28

Page 14: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A simple illustration – plot ’First-SN’

2 3 4 5 6 7 8 9

0.0

0.1

0.2

0.3

First−SN

prob

abili

ty d

ensi

ty

0.05 0.25 0.50 0.75 0.95

Probability density of First−SN

SN distribution, dp = (3, 2, 5)

13 / 28

Page 15: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

An illustration with d = 3

dp3 <- list(xi = c(3, -2, 0), Omega = diag(1:3) + outer(1:3,1:3)/2,alpha = c(5, -2, 6), nu = 5)

st3 <- makeSECdistr(dp=dp3, family="ST", name="Multiv.ST",compNames=c("X", "Y", "Z"))

show(st3) # of just type ’st3’

mean(st3)# X Y Z# 3.944 -1.253 2.194

vcov(st3) # 3x3 variance matrix

summary(st3) # longer output

plot(st3, col="blue", landmarks="", main=NULL) # note p=0.xx labels

?plot.SECdistr # look at method for signature ’SECdistrMv’

14 / 28

Page 16: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

An illustration with d = 3: matrix plot

X

−5 0 5 0 5 10

24

68

−5

05

Y

−5

05

05

10

2 4 6 8 −5 0 5

Z

15 / 28

Page 17: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Manipulation of a distribution

Refers to multivariate distributions onlyaffineTransSECdistr(object, a, A, <etc>)applies affine transformation a+ A′Y ,where a is m-vector and A is d ×m matrixmarginalSECdistr(object, comp, <etc>)get marginal distribution of comp components from object

conditionalSECdistr(object, fixed.comp,fixed.values, <etc>)applies conditioning on fixed.comp components(only for "SN" and "ESN" distributions)

16 / 28

Page 18: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Stats

17 / 28

Page 19: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Function selm for model fitting

Naming: lm + <Skew-Elliptical error> = selmAlso similar logic of lm: fit a linear model to location parameter

fit︸︷︷︸S4 obj

<- selm( response︸ ︷︷ ︸vector, matrix

∼ formula︸ ︷︷ ︸like in lm

, family ="SN"︸ ︷︷ ︸also "ST", "SC"

, <plus>)

Optional <plus> terms include:data, subset the same as in lmstart starting valuesmethod estimation methods are "MLE" and "MPLE";

the latter can be used to set a prior distribution of αpenalty only relevant for method="MPLE"fixed.param allows only limited specifications, such as nu=<value>

(if family="ST") and alpha=0... additional options

18 / 28

Page 20: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Methods for a selm|mselm object

classes selm returns a S4 object of class selm (for univariateresponse) or mselm (for multivariate response)

methods for each class, a bounch of ’methods’ are availablemethods like for lm S3-objects: summary, plot, residuals,fitted, coef, predict, confintadditional methods: logLik, profile, vcov

in addition extractSECdistr supplies a link to the probability section

Note: for simpler interpretability, the default parameterization is CP.This contrasts with probability section which uses DP.

19 / 28

Page 21: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A simple example: Barolo phenols

library(sn)data(wines)olo.ph <- wines[wines$wine=="Barolo", "phenols"]fit <- selm(olo.ph ~ 1, family="SN")plot(fit, which=2:3)## trysummary(fit) # works with CPsummary(fit, param.type="DP")

20 / 28

Page 22: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A simple example: Barolo phenols – plots

2.5 3.0 3.5 4.0

0.0

0.5

1.0

observed variable

prob

abili

ty d

ensi

ty

Empirical values and fitted distribution

0 1 2 3 4 5

02

46

8

Theoretical values

Em

piri

cal v

alue

s

Q−Q plot of (scaled DP residuals)^2

53

4

59

21 / 28

Page 23: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A multivariate example: más vino

library(sn)data(wines)fit2 <- selm(cbind(tartaric, malic, uronic) ~ colour + hue,

family="ST", data=wines, subset=(wine != "Grignolino"))

plot(fit2, which=2:3)

summary(fit2) # works with CPsummary(fit2, param.type="DP")

# constraint on degrees of freedom:fit3 <- selm(cbind(tartaric, malic, uronic) ~ colour + hue,

family="ST", fixed.param=list(nu=8),data=wines, subset=(wine != "Grignolino"))

22 / 28

Page 24: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

A multivariate example: más vino, plots

tartaric

88

73

63

−2 −1 0 1 2 3

88

73

63

−0.5 0.0 0.5

−1.

5−

1.0

−0.

50.

00.

51.

01.

5

88

73

63

−2

−1

01

23

malic

88

73

63

−2

−1

01

23

88

73

63

−0.

50.

00.

5

−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5

88

73

63

−2 −1 0 1 2 3

uronic

0 5 10 15 20 25

05

1015

theoretical values

empi

rica

l val

ues

Q−Q plot of Mahalanobis distances

88

73

63

23 / 28

Page 25: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

For the more adventurous: profile log L (LRT in fact)

# re-use earlier model for Barolo phenolsshow(fit)# Object class: selm# Call: selm(formula = ba.ph ~ 1, family = "SN")# [...omissis...]

summary(fit)# [...omissis...]# Parameters of the SEC random component# estimate std.err# s.d. 0.337 0.04# gamma1 0.703 0.26

pll <- profile(fit, "cp", param.name="gamma1", param.val=c(0, 0.97))

profile(fit, "dp", param.name=c("omega", "alpha"),param.val=list(c(0.25, 1), c(-1, 9)), npt=c(51,51) )

Note: works for selm-class objects, not mselm-class24 / 28

Page 26: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

For the more adventurous: profile log L, plots

Plots of (profile) Deviance ≡ LRT statistic

0.0 0.2 0.4 0.6 0.8 1.0

01

23

45

6

gamma1

2*{m

ax(lo

gLik

) −

logL

ik}

log(omega)

alph

a

−1.4 −1.2 −1.0 −0.8 −0.6 −0.4 −0.2 0.0

02

46

8

25 / 28

Page 27: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

Only for the more technically oriented people

selm is the user interface functionselm prepares work for the lower-level function selm.fit

however, not even selm.fit performs the actual fittingdepending on the fitted model, specific functions are called:sn.mple, st.mple, msn.mle, msm.mple, mst.mple

To improve efficiency, one can call selm.fit directly,at che cost of some more programming effortOne can even call the bottom-level functions, below selm.fit

For more details, see ?selm.fit

26 / 28

Page 28: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

q()

27 / 28

Page 29: An introduction to the R package snazzalini.stat.unipd.it/SN/talk-pkg_sn-intro.pdfAn introduction to the R package sn Author Adelchi Azzalini Università di Padova, Italia Created

Scheme Prob-std Prob-obj Stats q()

q()

28 / 28