Nicola Loperfido, Università degli Studi di Urbino "Carlo Bo", Dipartimento di Economia, Società e Politica, Via Saffi 42, Urbino (PU), ITALY. E-mail: [email protected]. Wirtschaftsuniversität Wien, 03-24-2017.


Page 1: statmath.wu.ac.at/research/talks/resources/2017_03_Loperfido.pdf

Nicola Loperfido

Università degli Studi di Urbino "Carlo Bo", Dipartimento di Economia, Società e Politica

Via Saffi 42, Urbino (PU), ITALY

e-mail: [email protected]

Wirtschaftsuniversität Wien, 03-24-2017

Page 2:

Outline

Finite mixtures

Third moment

Multivariate skewness

Decathlon data

Page 3:

Mixtures: problem

Are the data skewed because they come from different populations?

Page 4:

Mixtures: quotation

"I have at present been unable to find any general condition among the moments, which would be impossible for a skew curve and possible for a compound, and so indicate compoundness. I do not, however, despair of one being found". (Pearson, 1895)

Page 5:

Mixtures: blood pressure

Blood pressure data are skewed.

Platt (1963): hypertension is an illness present in a genetically defined subpopulation.

Pickering (1968): hypertension is a label for those in the upper tail of the population.

Page 6:

Mixtures: definition

A probability density function f(·) is a finite mixture if it can be represented as a weighted average of several probability density functions:

$$f(x) = \sum_{i=1}^{g} \pi_i f_i(x), \qquad \pi_i \ge 0, \quad \sum_{i=1}^{g} \pi_i = 1.$$
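As a minimal sketch of the definition, the following evaluates a two-component normal location mixture and checks numerically that it integrates to one and has a positive third central moment. The weights, means and helper names (`normal_pdf`, `mixture_pdf`, `integrate`) are illustrative assumptions, not taken from the talk.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, densities):
    """Finite mixture density: weighted average of the component densities."""
    return sum(w * f(x) for w, f in zip(weights, densities))

def integrate(func, lo, hi, steps=20000):
    """Trapezoidal rule on an evenly spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

# Hypothetical mixture: 70% N(0, 1) and 30% N(3, 1) (illustrative values only).
weights = [0.7, 0.3]
densities = [lambda x: normal_pdf(x, 0.0, 1.0),
             lambda x: normal_pdf(x, 3.0, 1.0)]

def f(x):
    return mixture_pdf(x, weights, densities)

total_mass = integrate(f, -10.0, 13.0)
mean = integrate(lambda x: x * f(x), -10.0, 13.0)
third_central = integrate(lambda x: (x - mean) ** 3 * f(x), -10.0, 13.0)
print(total_mass, mean, third_central)
```

Both components are symmetric, yet the mixture is skewed because the weights are unequal, which is the situation raised in the opening question of the talk.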

Page 7:

Mixtures: special cases

Normal: each component is normally distributed.

Two-component: there are only two components.

Location: components only differ in location.

Page 8:

Mixtures: options

Components

Densities

Parameters

Page 9:

Mixtures: advantages

Flexibility

Interpretability

Tractability

Page 10:

Mixtures: inference

Maximum likelihood overestimates the number of components when they are erroneously assumed to be symmetric.

How can we choose between different models?

Page 11:

Kullback-Leibler divergence

Let p and q be two probability density functions. The Kullback-Leibler divergence of p from q is

$$J(p, q) = \int p(x) \ln \frac{p(x)}{q(x)} \, dx.$$
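A small numerical sketch of this definition, assuming two univariate normal densities chosen only for illustration: the integral is approximated by the trapezoidal rule and compared with the well-known closed form for two normals. The function names are hypothetical.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def kl_divergence(logp, logq, lo, hi, steps=20000):
    """Trapezoidal approximation of J(p, q) = integral of p(x) ln(p(x)/q(x)) dx."""
    h = (hi - lo) / steps

    def integrand(x):
        lp = logp(x)
        return math.exp(lp) * (lp - logq(x))

    total = 0.5 * (integrand(lo) + integrand(hi))
    for k in range(1, steps):
        total += integrand(lo + k * h)
    return total * h

# Illustrative choice: p = N(0, 1), q = N(1, 2^2).
logp = lambda x: normal_logpdf(x, 0.0, 1.0)
logq = lambda x: normal_logpdf(x, 1.0, 2.0)

j_pq = kl_divergence(logp, logq, -12.0, 12.0)
j_qp = kl_divergence(logq, logp, -23.0, 25.0)

# Closed form for two normals: ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2.
closed_form = math.log(2.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
print(j_pq, closed_form, j_qp)
```

The two orderings give different values, a reminder that the divergence is not symmetric in p and q.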

Page 12:

Kronecker product

Page 13:

Third moment: definition

Page 14:

Kullback-Leibler approximation

Let p and q be two densities with identical means, identical covariances and possibly different third cumulants $K_{3,p}$ and $K_{3,q}$. Then their Kullback-Leibler divergence is approximately $\|K_{3,p} - K_{3,q}\|^2 / 12$, where $\|\cdot\|$ denotes the Euclidean norm (Lin et al., 1999).

Page 15:

Third moment: standardization

The third standardized moment (cumulant) of x is the third moment of $z = \Sigma^{-1/2}(x - \mu)$, where μ and Σ denote the mean and the covariance of x.
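The standardization step can be sketched as follows. The sketch assumes that the third moment is arranged as the p²×p matrix E(x ⊗ xᵀ ⊗ x), which matches the p²×p shape these slides use for M3,x; the sample data and the function names are hypothetical.

```python
import numpy as np

def third_moment(data):
    """Sample third moment as a p^2 x p matrix: average of (x kron x) x^T."""
    n, p = data.shape
    m3 = np.zeros((p * p, p))
    for x in data:
        m3 += np.outer(np.kron(x, x), x)
    return m3 / n

def standardize(data):
    """z = S^{-1/2} (x - mean), with S the sample covariance (divisor n)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigval, eigvec = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance matrix.
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return centered @ inv_sqrt

rng = np.random.default_rng(0)
data = rng.exponential(size=(500, 3))   # hypothetical skewed sample
z = standardize(data)
m3_z = third_moment(z)                  # third standardized moment, 9 x 3
print(m3_z.shape)
```

After standardization the sample mean is zero and the sample covariance is the identity, so third moment and third cumulant coincide.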

Page 16:

Tensors

A real tensor is a multidimensional array of real values identified by a vector of subscripts.

Page 17:

Third-order tensors

Page 18:

Tensor rank

The rank of an $n_1 \times n_2 \times n_3$ third-order tensor A is the smallest number r of rank-one tensors (outer products of three vectors) whose sum equals A.

Page 19:

Symmetric tensor rank

A p²×p third moment $M_{3,x}$ has symmetric tensor rank k if there are k p-dimensional real vectors $v_1, \ldots, v_k$ satisfying

$$M_{3,x} = v_1 \otimes v_1^T \otimes v_1 + \cdots + v_k \otimes v_k^T \otimes v_k.$$
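To make the decomposition concrete, this sketch builds a p²×p matrix from two arbitrary vectors (illustrative values, not from the talk) and checks that the corresponding p×p×p array is fully symmetric. The helper name `rank_one_term` is hypothetical.

```python
import numpy as np

def rank_one_term(v):
    """v kron v^T kron v, returned as a p^2 x p matrix."""
    col = v.reshape(-1, 1)
    return np.kron(np.kron(col, col.T), col)

# Two linearly independent vectors chosen only for illustration.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, -1.0, 1.0])
m3 = rank_one_term(v1) + rank_one_term(v2)

p = len(v1)
tensor = m3.reshape(p, p, p)             # entry [i, k, j] = sum over terms of v_i v_k v_j
matrix_rank = np.linalg.matrix_rank(m3)  # a lower bound for the symmetric tensor rank
print(m3.shape, matrix_rank)
```

Here the matrix rank equals the number of terms, so the symmetric tensor rank of this particular example is exactly two; in general the matrix rank only bounds it from below.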

Page 20:

Third moment: special cases

The symmetric tensor rank of the third moment is easily recovered when the underlying distribution is either a shape mixture of skew-normals or a location normal mixture.

Page 21:

Mardia’s skewness: definition

$x, y \sim P$, $E(x) = \mu$, $V(x) = \Sigma$, with x and y i.i.d.

$$b_{1,p}^{M} = E\left\{\left[(x-\mu)^T \Sigma^{-1} (y-\mu)\right]^3\right\}$$

Page 22:

Mardia’s skewness: standardization

Mardia’s skewness is the trace of the product of the third standardized moment and its transpose.
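The two characterizations of Mardia’s skewness can be compared numerically. This is a sketch on a hypothetical simulated sample: the first function averages the cubed inner products of all pairs of standardized observations (the sample analogue of the expectation over i.i.d. x and y), the second takes the trace of the third standardized moment times its transpose. Function names are assumptions.

```python
import numpy as np

def standardize(data):
    """z = S^{-1/2} (x - mean), with S the sample covariance (divisor n)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigval, eigvec = np.linalg.eigh(cov)
    return centered @ (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T)

def mardia_skewness_pairs(data):
    """Average of (z_a^T z_b)^3 over all ordered pairs of observations."""
    z = standardize(data)
    gram = z @ z.T
    return float(np.mean(gram ** 3))

def mardia_skewness_trace(data):
    """trace(M3^T M3), with M3 the p^2 x p third standardized moment."""
    z = standardize(data)
    n, p = z.shape
    m3 = np.einsum('ni,nj,nk->ijk', z, z, z).reshape(p * p, p) / n
    return float(np.trace(m3.T @ m3))

rng = np.random.default_rng(0)
data = rng.exponential(size=(200, 4))   # hypothetical skewed sample
b_pairs = mardia_skewness_pairs(data)
b_trace = mardia_skewness_trace(data)
print(b_pairs, b_trace)
```

The pairwise version needs an n×n matrix, while the trace version only needs the p²×p third moment, which is usually the cheaper route when n is large.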

Page 23:

Mardia’s skewness: application

The Kullback-Leibler divergence between a symmetric, standardized random vector and another standardized random vector is approximately proportional to Mardia’s skewness of the latter.

Page 24:

Vectorial skewness: definition

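The vectorial skewness of x is

$$E(z^T z \, z) = E\begin{pmatrix} (Z_1^2 + \cdots + Z_p^2) Z_1 \\ \vdots \\ (Z_1^2 + \cdots + Z_p^2) Z_p \end{pmatrix}, \qquad z = (Z_1, \ldots, Z_p)^T,$$

where z is the standardized version of x.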

Page 25:

Vectorial skewness: standardization

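The vectorial skewness equals $M_{3,z}^T \, \mathrm{vec}(I_p)$, where $M_{3,z}$ is the third standardized moment and $I_p$ is the p×p identity matrix.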
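A quick numerical check of this identity on a hypothetical sample: the vectorial skewness is computed directly as the average of (zᵀz)z and then as the transposed third standardized moment times vec(I_p). The p²×p arrangement of the third moment and all names are assumptions of the sketch.

```python
import numpy as np

def standardize(data):
    """z = S^{-1/2} (x - mean), with S the sample covariance (divisor n)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigval, eigvec = np.linalg.eigh(cov)
    return centered @ (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T)

rng = np.random.default_rng(0)
data = rng.exponential(size=(300, 3))   # hypothetical skewed sample
z = standardize(data)
n, p = z.shape

# Direct definition: average of (z^T z) z over the sample.
direct = np.mean(np.sum(z * z, axis=1)[:, None] * z, axis=0)

# Via the p^2 x p third standardized moment and vec(I_p).
m3 = np.einsum('ni,nj,nk->ijk', z, z, z).reshape(p * p, p) / n
via_m3 = m3.T @ np.eye(p).reshape(-1)
print(direct, via_m3)
```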

Page 26:

Vectorial skewness: clustering


When data come from a mixture of two weakly symmetric distributions with different means and proportional covariances, the projection of the standardized data onto the direction of the vectorial skewness consistently estimates the best linear discriminant projection.
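The following simulation is a rough sketch of this property, not a proof. It assumes a two-component normal mixture with a common within-component covariance; the mixing proportions, mean shift and covariance factor are made-up values. The discriminant direction is expressed in standardized coordinates as S^{-1/2} times the (known) mean shift, which is proportional to the usual linear discriminant direction under these assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20000, 3
prob_second = 0.2                         # hypothetical weight of the second component
shift = np.array([3.0, 1.0, -1.0])        # hypothetical difference between component means
chol = np.array([[1.0, 0.0, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.3, 1.0]])        # common covariance = chol @ chol.T

labels = rng.random(n) < prob_second      # True for observations from the second component
data = rng.standard_normal((n, p)) @ chol.T + np.outer(labels.astype(float), shift)

# Standardize with the sample mean and the sample covariance of the pooled data.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / n
eigval, eigvec = np.linalg.eigh(cov)
inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
z = centered @ inv_sqrt

# Direction of the sample vectorial skewness (labels are not used here).
skew_vec = np.mean(np.sum(z * z, axis=1)[:, None] * z, axis=0)
direction = skew_vec / np.linalg.norm(skew_vec)

# Discriminant direction in standardized coordinates (uses the known shift).
target = inv_sqrt @ shift
target /= np.linalg.norm(target)

cosine = abs(direction @ target)
scores = z @ direction
separation = abs(scores[labels].mean() - scores[~labels].mean())
print(cosine, separation)
```

With unequal weights the two directions are nearly parallel in this simulation; with equal weights the third cumulant of the mixture vanishes and the skewness direction carries no information about the groups.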

Page 27:

Directional skewness: definition

Page 28:

Directional skewness: standardization

Directional skewness is a function of the third standardized moment.
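One way to see this dependence, assuming the usual definition of directional skewness as the largest squared skewness among unit-norm projections of the data (Malkovich and Afifi, 1973): the skewness of any projection cᵀz can be read off the third standardized moment as (c ⊗ c)ᵀ M3 c. The sketch below checks this for one arbitrary direction on a hypothetical sample; it does not perform the maximization.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(size=(300, 3))    # hypothetical skewed sample

# Standardize: z = S^{-1/2} (x - mean), with S the sample covariance (divisor n).
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(data)
eigval, eigvec = np.linalg.eigh(cov)
z = centered @ (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T)
n, p = z.shape

# Third standardized moment as a p^2 x p matrix.
m3 = np.einsum('ni,nj,nk->ijk', z, z, z).reshape(p * p, p) / n

# An arbitrary unit-norm direction (illustrative choice).
c = np.array([1.0, 2.0, -1.0])
c /= np.linalg.norm(c)

# The projection z @ c has zero mean and unit variance, so its skewness is its third moment.
skew_direct = float(np.mean((z @ c) ** 3))
skew_from_m3 = float(np.kron(c, c) @ m3 @ c)
print(skew_direct, skew_from_m3)
```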

Page 29:

Directional skewness: fit

Let $b_{1,p}^{M}(q)$ and $b_{1,p}^{D}(q)$ be the total and directional skewness of the p-dimensional density q. Then the smallest Kullback-Leibler divergence of q from a location mixture of two symmetric densities is approximately $b_{1,p}^{M}(q) - b_{1,p}^{D}(q)$.

⇓

The fit to the data X of a location mixture of two symmetric densities might be assessed by the difference $b_{1,p}^{M}(X) - b_{1,p}^{D}(X)$.
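A crude sketch of how such a difference might be computed on a sample. Total skewness is taken as the squared norm of the third standardized moment; directional skewness is approximated from below by a plain random search over unit vectors, which is an assumption of this sketch and not the algorithm used in the talk or in any package. The simulated data and all names are hypothetical.

```python
import numpy as np

def standardized_third_moment(data):
    """Third standardized moment as a p x p x p array."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigval, eigvec = np.linalg.eigh(cov)
    z = centered @ (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T)
    return np.einsum('ni,nj,nk->ijk', z, z, z) / len(z)

def total_skewness(t3):
    """Sum of squared entries, i.e. trace(M3^T M3)."""
    return float(np.sum(t3 ** 2))

def directional_skewness_lower_bound(t3, n_directions=5000, seed=0):
    """Largest squared projection skewness over random unit vectors (a lower bound)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_directions, t3.shape[0]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    values = np.einsum('ijk,mi,mj,mk->m', t3, dirs, dirs, dirs)
    return float(np.max(values ** 2))

# Hypothetical sample from a location mixture of two normals (80% / 20%).
rng = np.random.default_rng(3)
n = 10000
labels = rng.random(n) < 0.2
data = rng.standard_normal((n, 3))
data[:, 0] += 3.0 * labels

t3 = standardized_third_moment(data)
b_total = total_skewness(t3)
b_dir = directional_skewness_lower_bound(t3)
gap = b_total - b_dir
print(b_total, b_dir, gap)
```

Because the search only gives a lower bound on the directional skewness, the reported gap is an upper bound on the difference; for data that really come from a two-component location mixture it should be small relative to the total skewness.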

Page 30:

Decathlon data: description

Units. The 23 athletes who scored points in all 10 events of the Olympic decathlon in Rio 2016.

Variables. Performances in each event, converted into decathlon points using IAAF scoring tables.

Source. The official website of the International Association of Athletics Federations (IAAF).

Page 31:

Decathlon data: original variables

Page 32:

Decathlon data: summaries

Data are slightly nonnormal, with low to moderate levels of skewness and kurtosis.

The multiple scatterplot does not show any particular features, such as outliers.

The number of variables is quite large with respect to the number of units.

Page 33:

Decathlon data: principal components

Page 34:

Decathlon data: summary PCA

The first two principal components account for about 55% of the total variation.

Their joint distribution is approximately normal.

They do not clearly suggest the presence of outliers.

Page 35:

Decathlon data: Skewed components

We computed the two most skewed and mutually orthogonal projections of the Decathlon data using the R package MaxSkew.

Page 36:

Decathlon data: skewed components

Page 37:

Decathlon data: outliers

Karl Robert Saluri (EST). He scored lowest due to lower-than-average performances in nearly all events.

Jeremy Taiwo (USA). He obtained an about-average score due to a very unusual pattern of performances.

Page 38:

Decathlon data: conclusion

The scatterplot of the two most skewed components clearly hints at the presence of two outliers.

Page 39:

Skewness & Mixtures

Finite Mixtures: Definition, Properties, Problem

Third Moments: Definition, Approximation, Tensor

Skewness Measures: Total, Vectorial, Directional

Decathlon data

Original variables: no outliers

Principal components: one outlier

Skewed components: two outliers

Page 40:

Future research

Package Multiskew

Fourth moment

Tensor approach

Page 41:

Essential references

Loperfido, N. (2015). Singular value decomposition of the third multivariate moment. Lin. Alg. Appl. 473, 202-216.

Malkovich, J.F. and Afifi, A.A. (1973). On tests for multivariate normality. J. Amer. Statist. Ass. 68, 176-179.

Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519-530.

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. John Wiley and Sons Inc, New York.

Mori, T.F., Rohatgi, V.K. and Székely, G.J. (1993). On multivariate skewness and kurtosis. Theory Probab. Appl. 38, 547-551.

Page 42:

Thank you for your attention
