nicola loperfido università degli studi di urbino carlo...

Post on 23-Sep-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Nicola LoperfidoUniversità degli Studi di Urbino "Carlo Bo“,

Dipartimento di Economia, Società e PoliticaVia Saffi 42, Urbino (PU), ITALY

e-mail: nicola.loperfido@uniurb.it

Wirtschaftsuniversität Wien, 03-24-20171/42

Nicola Loperfido, Urbino University

OutlineFinite mixtures

Third moment

Multivariate skewness

Decathlon data

Nicola Loperfido, Urbino UniversityWirtschaftsuniversität Wien,

03-24-20172/42

Mixtures: problem

Are the data skewed because they come fromdifferent populations?

Nicola Loperfido, Urbino University3/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: quotation"I have at present been unable to find any generalcondition among the moments, which would beimpossible for a skew curve and possible for acompound, and so indicate compoundness. I donot, however, despair of one being found".(Pearson, 1895)

Nicola Loperfido, Urbino University4/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: blood pressureBlood pressure data are skewed.

Platt (1963): hypertension is an illness present in agenetically defined subpopulation.

Pickering (1968): hypertension is a labeling forthose in the upper tail of the population.

Nicola Loperfido, Urbino University5/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: definition

A probability density function f(∙) is a finitemixture if it can be represented as a weightedaverage of several probability densityfunctions:

g

i

ii xfxf1

Nicola Loperfido, Urbino University6/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: special cases

Normal: each component is normally distributed.

Two-component: there are only two components.

Location: components only differ in location.

Nicola Loperfido, Urbino University7/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: options

Components

Densities

Parameters

Nicola Loperfido, Urbino University8/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: advantages

Flexibility

Interpretability

Tractability

Nicola Loperfido, Urbino University9/42Wirtschaftsuniversität Wien,

03-24-2017

Mixtures: inferenceMaximum likelihood overestimates the number ofcomponents when they are erroneously assumedto be symmetric.

How can we choose between different models?

Nicola Loperfido, Urbino University10/42Wirtschaftsuniversität Wien,

03-24-2017

Kullback-Leibler divergenceLet p and q be two probability density functions.The Kullback-Leibler divergence of p from q is

xdxq

xpxpqpJ ln,

Nicola Loperfido, Urbino University11/42Wirtschaftsuniversität Wien,

03-24-2017

Kronecker product

Nicola Loperfido, Urbino University12/42Wirtschaftsuniversität Wien,

03-24-2017

Third moment: definition

Nicola Loperfido, Urbino University13/42Wirtschaftsuniversität Wien,

03-24-2017

Kullback-Leibler approximationLet p and q be two densities with identical means,identical covariances and possibly different thirdcumulants K3,p and K3,q. Then their Kullback-Lieblerdivergence is approximately ||K3,p -K3,q||2/12, where||.|| denotes the Euclidean distance (Lin et al, 1999).

Nicola Loperfido, Urbino University14/42Wirtschaftsuniversität Wien,

03-24-2017

Third moment: standardization

The third standardized moment (cumulant) of x isthe third moment of z=-½(x-μ).

Nicola Loperfido, Urbino University15/42Wirtschaftsuniversität Wien,

03-24-2017

TensorsA real tensor is a multidimensional array ofreal values identified by a vector ofsubscripts:

Nicola Loperfido, Urbino University16/42Wirtschaftsuniversität Wien,

03-24-2017

Third-order tensors

Nicola Loperfido, Urbino University17/42Wirtschaftsuniversität Wien,

03-24-2017

Tensor rankThe rank of a n1 n2 n3 third-order tensorA is the smallest number r satisfying

Nicola Loperfido, Urbino University18/42Wirtschaftsuniversität Wien,

03-24-2017

Symmetric tensor rankA p²×p third moment M3,x has symmetric tensorrank k if there are k p-dimensional real vectors v1,..., vk satisfying

.111,3 k

T

kk

T

x vvvvvvM

Nicola Loperfido, Urbino University19/42Wirtschaftsuniversität Wien,

03-24-2017

Third moment: special cases

The symmetric tensor rank of the third moment is

easily recovered when the underlying distribution

is either a shape mixture of skew-normals or a

location normal mixture.

Nicola Loperfido, Urbino University20/42Wirtschaftsuniversität Wien,

03-24-2017

Mardia’s skewness: definition

x, y P E(x) = V(x) = x, y i.i.d.

b1,pM = E[(x- )T -1(y- )]3

Nicola Loperfido, Urbino University21/42Wirtschaftsuniversität Wien,

03-24-2017

Mardia’s skewness: standardizationMardia’s skewness is the trace of the product of

the third standardized moment and its transpose

Nicola Loperfido, Urbino University22/42Wirtschaftsuniversität Wien,

03-24-2017

Mardias skewness: application

The Kullback-Liebler divergence between asymmetric, standardized random vector andanother standardized random vector isapproximately proportional to the Mardia’sskewness of the latter.

Nicola Loperfido, Urbino University23/42Wirtschaftsuniversität Wien,

03-24-2017

Vectorial skewness: definition

22

1

22

11

pd

p

T

V

ZZZ

ZZZ

zzz

Nicola Loperfido, Urbino University24/42Wirtschaftsuniversität Wien,

03-24-2017

Vectorial skewness: standardization

p

T

zV IvecM ,3

Nicola Loperfido, Urbino University25/42Wirtschaftsuniversität Wien,

03-24-2017

Vectorial skewness: clustering

Nicola Loperfido, Urbino University26/42

When data come from a mixture of two weakly symmetricdistributions with different means and proportionalcovariances, the projection of the standardized data onto thedirection of a vectorial skewness consistently estimates thebest linear discriminant projection.

Wirtschaftsuniversität Wien, 03-24-2017

Directional skewness: definition

Nicola Loperfido, Urbino University27/42Wirtschaftsuniversität Wien,

03-24-2017

Directional skewness: standardization

Directional skewness is a function

of the third standardized moment

Nicola Loperfido, Urbino University28/42Wirtschaftsuniversität Wien,

03-24-2017

Directional skewness: fit Let be the total and directionalskewness of the p-dimensional density q. Then thesmallest Kullback-Liebler divergence of q from alocation mixture of two symmetric densities is

approximately .

⇓The fit to the data X of a location mixture of two symmetric densities might be assessed by the difference .

and ,2,2 qq D

p

M

p bb

- ,2,2 qq D

p

M

p bb

- ,1,2 XbXb D

p

M

p

Nicola Loperfido, Urbino University29/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: description Units. The 23 athletes who scored points in all 10

events of the Olympic decathlon in Rio 2016.

Variables. Performances in each event, convertedinto decathlon points using IAAF scoring tables.

Source. The official website of the InternationalAssociation of Athletics Federations (IAAF).

Nicola Loperfido, Urbino University30/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: original variables

Nicola Loperfido, Urbino University31/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: summaries Data are slightly nonnormal, with low to moderate

levels of skewness and kurtosis.

The multiple scatterplot does not show anyparticular features, as for example outliers.

The number of variables is quite large with respectto the number of units.

Nicola Loperfido, Urbino University32/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: principal components

Nicola Loperfido, Urbino University33/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: summary PCA The first two principal component account for

about 55% of the total variation.

Their joint distribution is approximately normal.

They do not clearly suggest the presence ofoutliers.

Nicola Loperfido, Urbino University34/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: Skewed components

We computed the two most skewed and mutuallyorthogonal projections of the Decathlon datausing the R package MaxSkew.

Nicola Loperfido, Urbino University35/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: skewed components

Nicola Loperfido, Urbino University36/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data: outliersKarl Robert Saluri (EST). He scored lowest

due to lower-than-average performances innearly all events.

Jeremi Taiwo (USA). He obtained an aboutaverage score due to a very unusual patternof performances.

Nicola Loperfido, Urbino University37/42Wirtschaftsuniversität Wien,

03-24-2017

Decathlon data : conclusion

The scatterplot of the first two most skewedcomponents clearly hints for the presence of twooutliers.

Nicola Loperfido, Urbino University38/42Wirtschaftsuniversität Wien,

03-24-2017

Skewness &

Mixtures

Finite Mixtures

Definition

Properties

Problem

Third MomentsDefinition

Approximation

Tensor

Skewness Measures

Total

Vectorial

Directional

Decathlon dataOriginal variables:

no outliers

Principal components: one outliers

Skewed components: two outliers

39/42Wirtschaftsuniversität Wien, 03-24-2017Nicola Loperfido, Urbino University

Future research

Package Multiskew

Fourth moment

Tensor approach

Nicola Loperfido, Urbino University40/42Wirtschaftsuniversität Wien,

03-24-2017

Essential references

Loperfido, N. (2015). Singular value decomposition of the thirdmultivariate moment. Lin. Alg. Appl. 473, 202-216.

Malkovich, J.F. and Afifi, A.A.(1973). On tests formultivariate normality. J. Amer. Statist. Ass. 68, 176-179.

Mardia, K.V. (1970). Measures of multivariate skewness andkurtosis with applications. Biometrika 57, 519-530.

• McLachlan, G. and Peel, D. (2000). Finite Mixture Models. JohnWiley and Sons Inc, New York.

Mori T.F., Rohatgi V.K. and Székely G.J. (1993). On multivariateskewness and kurtosis. Theory Probab. Appl. 38, 547-551.

Nicola Loperfido, Urbino University41/42Wirtschaftsuniversität Wien,

03-24-2017

Thank you for your attention

Nicola Loperfido, Urbino University42/42Wirtschaftsuniversität Wien,

03-24-2017

top related