research article the lambert way to gaussianize heavy

17
Research Article The Lambert Way to Gaussianize Heavy-Tailed Data with the Inverse of Tukey’s h Transformation as a Special Case Georg M. Goerg Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA Correspondence should be addressed to Georg M. Goerg; [email protected] Received 25 July 2014; Accepted 1 October 2014 Academic Editor: Taizhong Hu Copyright © 2015 Georg M. Goerg. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. I present a parametric, bijective transformation to generate heavy tail versions of arbitrary random variables. e tail behavior of this heavy tail Lambert W × random variable depends on a tail parameter ≥0: for =0, , for >0 has heavier tails than . For being Gaussian it reduces to Tukey’s distribution. e Lambert W function provides an explicit inverse transformation, which can thus remove heavy tails from observed data. It also provides closed-form expressions for the cumulative distribution (cdf) and probability density function (pdf). As a special case, these yield analytic expression for Tukey’s pdf and cdf. Parameters can be estimated by maximum likelihood and applications to S&P 500 log-returns demonstrate the usefulness of the presented methodology. e R package LambertW implements most of the introduced methodology and is publicly available on CRAN. 1. Introduction Statistical theory and practice are both tightly linked to Normality. In theory, many methods require Gaussian data or noise: (i) regression oſten assumes Gaussian errors; (ii) many time series models are based on Gaussian white noise [1–3]. In such cases, a model M N , parameter estimates and their standard errors, and other properties are then studied, all based on the ideal(istic) assumption of Normality. In practice, however, data/noise oſten exhibits asymmetry and heavy tails, for example, wind speed data [4], human dynamics [5], or Internet traffic data [6]. Particularly notable examples are financial data [7, 8] and speech signals [9], which almost exclusively exhibit heavy tails. us a model M N developed for the Gaussian case does not necessarily provide accurate inference anymore. One way to overcome this shortcoming is to replace M N with a new model M , where is a heavy tail distribution: (i) regression with Cauchy errors [10]; (ii) forecasting long memory processes with heavy tail innovations [11, 12], or ARMA modeling of electricity loads with hyperbolic noise [13]. See also Adler et al. [14] for a wide range of statistical applications and methodology for heavy-tailed data. While such fundamental approaches are attractive from a theoretical perspective, they can become unsatisfactory from a practical point of view. Many successful statistical models and techniques assume Normality, their theory is very well understood, and many algorithms are implemented for the simple and oſten much faster Gaussian case. us developing models based on an entirely unrelated distribution is like throwing out the (Gaussian) baby with the bathwater. It would be very useful to transform a Gaussian random variable to a heavy-tailed random variable and vice versa and thus rely on knowledge and algorithms for the well-understood Gaussian case, while still capturing heavy tails in the data. Optimally such a transformation should (a) be bijective, (b) include Normality as a special case for hypothesis testing, and (c) be parametric so the optimal transformation can be estimated efficiently. Figure 1 illustrates this pragmatic approach: researchers can make their observations y as Gaussian as possible (x ) before making inference based on their favorite Gaussian model M N . is avoids the development of, or the data analysts waiting for, a whole new theory of M and new implementations based on a particular heavy-tailed distri- bution , while still improving statistical inference from Hindawi Publishing Corporation e Scientific World Journal Volume 2015, Article ID 909231, 16 pages http://dx.doi.org/10.1155/2015/909231

Upload: others

Post on 01-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article The Lambert Way to Gaussianize Heavy

Research ArticleThe Lambert Way to Gaussianize Heavy-Tailed Data with theInverse of Tukeyrsquos h Transformation as a Special Case

Georg M Goerg

Department of Statistics Carnegie Mellon University Pittsburgh PA 15213 USA

Correspondence should be addressed to Georg M Goerg imgmgeorg

Received 25 July 2014 Accepted 1 October 2014

Academic Editor Taizhong Hu

Copyright copy 2015 Georg M Goerg This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

I present a parametric bijective transformation to generate heavy tail versions of arbitrary randomvariablesThe tail behavior of thisheavy tail Lambert W times 119865

119883random variable depends on a tail parameter 120575 ge 0 for 120575 = 0 119884 equiv 119883 for 120575 gt 0119884 has heavier tails than

119883 For119883 being Gaussian it reduces to Tukeyrsquos ℎ distributionThe Lambert W function provides an explicit inverse transformationwhich can thus remove heavy tails from observed data It also provides closed-form expressions for the cumulative distribution(cdf) and probability density function (pdf) As a special case these yield analytic expression for Tukeyrsquos ℎ pdf and cdf Parameterscan be estimated by maximum likelihood and applications to SampP 500 log-returns demonstrate the usefulness of the presentedmethodology The R package LambertW implements most of the introduced methodology and is publicly available on CRAN

1 Introduction

Statistical theory and practice are both tightly linked toNormality In theory many methods require Gaussian dataor noise (i) regression often assumes Gaussian errors (ii)many time series models are based on Gaussian white noise[1ndash3] In such cases a model MN parameter estimates andtheir standard errors and other properties are then studiedall based on the ideal(istic) assumption of Normality

In practice however datanoise often exhibits asymmetryand heavy tails for example wind speed data [4] humandynamics [5] or Internet traffic data [6] Particularly notableexamples are financial data [7 8] and speech signals [9]which almost exclusively exhibit heavy tails Thus a modelMN developed for the Gaussian case does not necessarilyprovide accurate inference anymore

One way to overcome this shortcoming is to replaceMN

with a new model M119866 where 119866 is a heavy tail distribution

(i) regression with Cauchy errors [10] (ii) forecasting longmemory processes with heavy tail innovations [11 12] orARMA modeling of electricity loads with hyperbolic noise[13] See also Adler et al [14] for a wide range of statisticalapplications and methodology for heavy-tailed data

While such fundamental approaches are attractive from atheoretical perspective they can become unsatisfactory froma practical point of view Many successful statistical modelsand techniques assume Normality their theory is very wellunderstood and many algorithms are implemented for thesimple and often much faster Gaussian caseThus developingmodels based on an entirely unrelated distribution 119866 is likethrowing out the (Gaussian) baby with the bathwater

It would be very useful to transform a Gaussian randomvariable 119883 to a heavy-tailed random variable 119884 and viceversa and thus rely on knowledge and algorithms for thewell-understood Gaussian case while still capturing heavytails in the data Optimally such a transformation should(a) be bijective (b) include Normality as a special case forhypothesis testing and (c) be parametric so the optimaltransformation can be estimated efficiently

Figure 1 illustrates this pragmatic approach researcherscan make their observations y as Gaussian as possible (x

120591)

before making inference based on their favorite Gaussianmodel MN This avoids the development of or the dataanalysts waiting for a whole new theory of M

119866and new

implementations based on a particular heavy-tailed distri-bution 119866 while still improving statistical inference from

Hindawi Publishing Corporatione Scientific World JournalVolume 2015 Article ID 909231 16 pageshttpdxdoiorg1011552015909231

2 The Scientific World Journal

InferenceInference

Latent (Gaussian) Observed (heavy tails)

Recovered

ldquoGaussianizerdquoheavy tails

Methodsmodelsassuming

NormalityExact

Methodsmodelsfor heavy-tailed

Approx

X and x

Y and y

X120591 and x120591 W120591(Y) = Hminus1120591 (Y)

FX and FX120591

Lambert W times FX(Tukeyrsquos h if FX = 119977)

on X120591H120591(middot)

H120591(X)

distributions (if exist)

on Y

Figure 1 Schematic view of the heavy tail LambertW times 119865119883framework Latent input119883 sim 119865

119883119867

120591(119883) from (6) transforms (solid arrows)119883 to

119884 sim Lambert W times 119865119883and generates heavy tails (right) Observed heavy-tailed 119884 and y (1) use119882

120591(sdot) to back-transform y to latent ldquoNormalrdquo

x120591 (2) use modelMN of your choice (regression time series models hypothesis testing etc) for inference on x

120591 and (3) convert results back

to the original ldquoheavy-tailed worldrdquo of y (right)

heavy-tailed data y For example consider y = (1199101 119910

500)

from a standard Cauchy distribution C(0 1) in Figure 2(a)modeling heavy tails by a transformation makes it evenpossible to Gaussianize this Cauchy sample (Figure 2(c))This ldquonicerdquo data x

120591can then be subsequently analyzed with

common techniques For example the location can now beestimated using the sample average (Figure 2(d)) For detailssee Section 61

Liu et al [15] use a semiparametric approach where119884 hasa nonparanormal distribution if 119891(119884) sim N(120583 1205902) and 119891(sdot)is an increasing smooth function they estimate 119891(sdot) usingnonparametric methods This leads to a greater flexibility inthe distribution of 119884 but it suffers from two drawbacks (i)nonparametric methods have slower convergence rates andthus need large samples and (ii) for identifiability of 119891(sdot)E119891(119884) equiv E119884 and Var(119891(119884)) equiv Var(119884) must hold While(i) is inherent to nonparametric methods point (ii) requires119884 to have finite mean and variance which is often not metfor heavy-tailed data Thus here we use parametric trans-formations which do not rely on restrictive identifiabilityconditions and also work well for small sample sizes

The main contributions of this work are threefold (a) ametafamily of heavy tail Lambert W times 119865

119883distributions (see

also [16]) with Tukeyrsquos ℎ distribution [17] as a special case (b)a bijective transformation to ldquoGaussianizerdquo heavy-tailed data(Section 2) and (c) simple expressions for the cumulativedistribution function (cdf) 119866

119884(119910) and probability density

function (pdf) 119892119884(119910) (Section 24) In particular analytic

expressions for the pdf and cdf for Tukeyrsquos ℎ (Section 3) arepresented here to the best of the authorrsquos knowledge for thefirst time in the literature

Section 4 introduces amethod ofmoments estimator andstudies the maximum likelihood estimator (MLE) Section 5shows their finite sample properties As has been shownin many case studies Tukeyrsquos ℎ distribution (heavy tailLambert W times Gaussian) is useful to model data with uni-modal heavy-tailed densities Applications to SampP 500 log-returns confirm the usefulness of the Lambert W framework(Section 6) Finally we discuss the new methodology andfuture work in Section 7 Detailed derivations and proofsare given in the Supplementary Material available online athttpdxdoiorg1011552015909231

Computations figures and simulations were done inR [18] The R package LambertW implements most of thepresented methodology and is publicly available on CRAN

11 Multivariate Extensions While this work focuses onthe univariate case multivariate extensions of the presentedmethods can be defined component-wise analogously to themultivariate version of Tukeyrsquos ℎ distribution [19] Whilethis may not make the transformed random variables jointlyGaussian it still provides a good starting point for more well-behaved multivariate estimation

The Scientific World Journal 3

0 100 200 300 400 500minus200

0

200

600

y

(a) Random sample y sim C(0 1)

0 100 200 300 400 5000

5

10

15

20

25

30

X120582

(b) Box-Cox transformed xMLE

with = 037

0 100 200 300 400 500

minus3

minus2

minus1

0

1

2

3

4

X120591

(c) Gaussianized x120591MLE with 120591 = (003 105 086)

0 100 200 300 400 500minus3

minus2

minus1

0

1

2

Number of samples

Cum

ulat

ive s

ampl

e ave

rage

CauchyGaussianized

(d) Cumulative sample average

Figure 2 Gaussianizing a standard Cauchy sample For (d) 120591(119899) was estimated for each fixed 119899 = 5 500 before Gaussianizing (1199101 119910

119899)

12 Box-Cox Transformation A popular method to deal withskewed high variance data is the Box-Cox transformation

x120582=

y120582 minus 1120582

if 120582 gt 0log y if 120582 = 0

(1)

A major limitation of (1) is the nonnegativity constraint ony which prohibits its use in many applications To avoidthis limitation it is common to shift the data y = y +|min(y)| ge 0 which restricts 119884 to a half-open intervalIf however the underlying process can occur on the entirereal line such a shift undermines statistical inference for yetunobserved data (see [20]) Even if out-of-sample predictionis not important for the practitioner Figure 2(b) shows thatthe Box-Cox transformation in fact fails to remove heavy tailsfrom the Cauchy sample (We use y = y + |min(y)| + 1 anduse boxcox from the MASS R package 120582 = 037)

Moreover the main purpose of the Box-Cox transfor-mation is to stabilize variance [21ndash23] and remove right tailskewness [24] a lower empirical kurtosis is merely a by-result of the variance stabilization In contrast the Lambert

W framework is designed primarily to model heavy-tailedrandom variables and remove heavy tails from data and hasno difficulties with negative values

2 Generating Heavy Tails UsingTransformations

Random variables exhibit heavy tails if more mass than for aGaussian random variable lies at the outer end of the densitysupport A random variable 119885 has a tail index 119886 if its cdfsatisfies 1 minus 119865

119885(119911) sim 119871(119911)119911

minus119886 where 119871(119911) is a slowly varyingfunction at infinity that is lim

119911rarrinfin119871(119905119911)119871(119911) = 1 for all

119905 gt 0 [25] (There are various similar definitions of heavy fator long tails for this work these differences are not essential)The heavy tail index 119886 is an important characteristic of 119885 forexample only moments up to order 119886 can exist

21 Tukeyrsquos ℎ Distribution A parametric transformation isthe basis of Tukeyrsquos ℎ random variables [17]

119885 = 119880 exp(ℎ2

1198802) ℎ ge 0 (2)

4 The Scientific World Journal

minus2 minus1 0 1 2 3 4minus2minus1

01234

u

z=G120575(u)

120575r = 110z = u

120575120001 = 0

(a) 120575ℓ120575119903 transformation (3)

minus2 minus1 0 1 2 3 4minus2

minus1

0

1

2

z

120575r = 110u = z

u=W

120575(z)

120575120001 = 0

(b) Inverse119882120575ℓ 120575119903(119911) in (80) in the supplemen-

tary material

z

01 02 03 04 05 06 07 08 09 1 1

1 12

13

14 15

16 17

18 19

2

23 2

4

00 05 10 1500051015202530

120575

W120575(z)

(c) Inverse119882120575(119911)(9) as a function of 120575 and 119911

Figure 3 Transformation and inverse transformation for 120575ℓ= 0 and 120575

119903= 110 identity on the left (same tail behavior) and a heavy-tailed

transformation in the right tail of input 119880

where 119880 is standard Normal random variable and ℎ isthe heavy tail parameter The random variable 119885 has tailparameter 119886 = 1ℎ [17] and reduces to the Gaussian for ℎ = 0Morgenthaler and Tukey [26] extend the ℎ distribution to theskewed heavy-tailed family of ℎℎ random variables

119885 =

119880 exp(120575ℓ

2

1198802) if 119880 le 0

119880 exp(120575119903

2

1198802) if 119880 gt 0

(3)

where again 119880 sim N(0 1) Here 120575ℓge 0 and 120575

119903ge 0 shape the

left and right tail of 119885 respectively thus transformation (3)can model skewed and heavy-tailed data see Figure 3(a) Forthe sake of brevity let119867

120575(119906) = 119906 exp((1205752)1199062)

However despite their great flexibility Tukeyrsquos ℎ andℎℎ distributions are not very popular in statistical practicebecause expressions for the cdf or pdf have not been availablein closed form Although Morgenthaler and Tukey [26]express the pdf of (2) as (ℎ equiv 120575)

119892119885 (119911) =

119891119880(119867

minus1

120575(119911))

1198671015840

120575(119867

minus1

120575(119911))

(4)

they fall short of making119867minus1

120575(119911) explicit So far the inverse of

(2) or (3) has been considered analytically intractable [4 27]Thus parameter inference relied on matching empirical andtheoretical quantiles [4 17 26] or by themethod ofmoments[28] Only recently Headrick et al [28] provided numericalapproximations to the inverse However numerical approx-imations can be slow and prohibit analytical derivationsThus a closed form analytically tractable pdf which canbe computed efficiently is essential for a widespread use ofTukeyrsquos ℎ (and variants)

In this work I present this long sought closed-forminverse which has a large body of literature in mathematicsand is readily available in standard statistics software Forease of notation and concision main results are shown for120575ℓ= 120575

119903= 120575 analogous results for 120575

ℓ= 120575

119903will be stated

without details

22 Heavy Tail Lambert W Random Variables Tukeyrsquos ℎtransformation (2) is strongly related to the approach takenby Goerg [16] to introduce skewness in continuous randomvariables 119883 sim 119865

119883(119909) In particular if 119885 sim Tukeyrsquos ℎ then

1198852sim skewed Lambert W times 120594

2

1with skew parameter 120574 = ℎ

Adapting the skew Lambert W times 119865119883inputoutput idea

(see Figure 1) Tukeyrsquos ℎ random variables can be generalizedto heavy-tailed Lambert W times 119865

119883random variables (Most

concepts and methods from the skew Lambert W times 119865119883case

transfer one-to-one to the heavy tail Lambert W randomvariables presented hereThus for the sake of concision I referto Goerg [16] for details of the Lambert W framework)

Definition 1 Let119880 be a continuous random variable with cdf119865119880(119906 | 120573) pdf 119891

119880(119906 | 120573) and parameter vector 120573 Then

119885 = 119880 exp(1205752

1198802) 120575 isin R (5)

is a noncentral nonscaled heavy tail LambertW times 119865119880random

variable with parameter vector 120579 = (120573 120575) where 120575 is the tailparameter

Tukeyrsquos ℎ distribution results for 119880 being a standardGaussianN(0 1)

Definition 2 For a continuous location-scale family randomvariable 119883 sim 119865

119883(119909 | 120573) define a location-scale heavy-tailed

LambertW times 119865119883random variable

119884 = 119880 exp(1205752

1198802)120590

119883+ 120583

119883 120575 isin R (6)

with parameter vector 120579 = (120573 120575) where119880 = (119883minus120583119883)120590

119883and

120583119883and120590

119883aremean and standard deviation of119883 respectively

The input is not necessarily Gaussian (Tukeyrsquos ℎ) but canbe any other location-scale continuous random variable forexample from a uniform distribution 119883 sim 119880(119886 119887) (seeFigure 4)

The Scientific World Journal 5

120575 0110

00 05 10 15 20 25 30000204060810

y

Pdf

00 05 10 15 20 25 300002040608

y

Cdf

(a) Lambert Wtimes1205942

119896with 120573=119896=1

13

Pdf

000204060810

Cdf

0 2 4 6 8 10 12000

010

020

y

0 2 4 6 8 10 12y

(b) Lambert W times Γ(119904 119903) with 120573 =(119904 119903) = (3 1)

1

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 40001020304

y

Pdf

(c) Lambert W times N(120583 1205902) with

120573 = (120583 120590) = (0 1)

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 4000102030405

y

Pdf

5

(d) Lambert W times 119880(119886 119887) with120573 = (119886 119887) = (minus1 1)

Figure 4 Pdf (left) and cdf (right) of a heavy tail (a) ldquononcentral nonscaledrdquo (b) ldquoscalerdquo and (c and d) ldquolocation-scalerdquo Lambert W times 119865119883

random variable 119884 for various degrees of heavy tails (color dashed lines)

Definition 3 Let 119883 sim 119865119883(119909119904 | 120573) be a continuous scale-

family random variable with scale parameter 119904 and standarddeviation 120590

119883 let 119880 = 119883120590

119883 Then

119884 = 119883 exp(1205752

1198802) 120575 isin R (7)

is a scaled heavy-tailed Lambert W times 119865119883random variable

with parameter 120579 = (120573 120575)

Let 120591 = (120583119883(120573) 120590

119883(120573) 120575) define transformation (6) (For

noncentral nonscale input set 120591 = (0 1 120575) for scale-familyinput 120591 = (0 120590

119883 120575)) The shape parameter 120575(= Tukeyrsquos ℎ)

governs the tail behavior of 119884 for 120575 gt 0 values further awayfrom 120583

119883are increasingly emphasized leading to a heavy-

tailed version of119883 for 120575 = 0 119884 equiv 119883 and for 120575 lt 0 values faraway from the mean are mapped back again closer to 120583

119883 For

120575 ge 0 and 119883 isin (minusinfininfin) also 119884 isin (minusinfininfin) For 120575 ge 0 and119883 isin [0infin) also 119884 isin [0infin)

Remark 4 (only nonnegative 120575) Although 120575 lt 0 givesinteresting properties for 119884 it defines a nonbijective trans-formation and leads to parameter-dependent support andnonunique inputThus for the remainder of this work assume120575 ge 0 unless stated otherwise

23 Inverse Transformation ldquoGaussianizerdquo Heavy-Tailed DataTransformation (6) is bijective and its inverse can beobtained via the Lambert W function which is the inverseof 119911 = 119906 exp(119906) that is that function which satisfies119882(119911) exp(119882(119911)) = 119911 It has been studied extensively inmathematics physics and other areas of science [29ndash31] andis implemented in the GNU Scientific Library (GSL) [32]Only recently the Lambert W function received attentionin the statistics literature [16 33ndash35] It has many usefulproperties (see Appendix A in the supplementary material

and Corless et al [29]) in particular 119882(119911) is bijective for119911 ge 0

Lemma 5 The inverse transformation of (6) is

119882120591 (119884) = 119882120575

(

119884 minus 120583119883

120590119883

)120590119883+ 120583

119883= 119880120590

119883+ 120583

119883= 119883 (8)

where

119882120575 (119911) = sgn (119911)(

119882(1205751199112)

120575

)

12

(9)

and sgn(119911) is the sign of 119911119882120575(119911) is bijective for all 120575 ge 0 and

all 119911 isin R

Lemma 5 gives for the first time an analytic bijectiveinverse of Tukeyrsquos ℎ transformation119867minus1

120575(119910) of Morgenthaler

and Tukey [26] is now analytically available as (8) Bijectivityimplies that for any data y and parameter 120591 the exact inputx120591= 119882

120591(y) sim 119865

119883(119909) can be obtained

In view of the importance and popularity of Normalitywe clearly want to back-transform heavy-tailed data todata from a Gaussian rather than yet another heavy-taileddistribution Tail behavior of random variables is typicallycompared by their kurtosis 120574

2(119883) = E(119883 minus 120583

119883)4120590

4

119883 which

equals 3 if 119883 is Gaussian Hence for the future when weldquonormalize yrdquo we cannot only center and scale but alsotransform it to x

120591with 120574

2(x120591) = 3 (see Figure 2(c)) While

1205742(119883) = 3 does not guarantee that 119883 is Gaussian it is a good

baseline for a Gaussian sample Furthermore it puts differentdata not only on the same scale but also on the same tail

This data-driven view of the Lambert W framework canalso be useful for kernel density estimation (KDE) wheremultivariate data is often prescaled to unit variance so thesame bandwidth can be used in each dimension [36 37]Thus

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article The Lambert Way to Gaussianize Heavy

2 The Scientific World Journal

InferenceInference

Latent (Gaussian) Observed (heavy tails)

Recovered

ldquoGaussianizerdquoheavy tails

Methodsmodelsassuming

NormalityExact

Methodsmodelsfor heavy-tailed

Approx

X and x

Y and y

X120591 and x120591 W120591(Y) = Hminus1120591 (Y)

FX and FX120591

Lambert W times FX(Tukeyrsquos h if FX = 119977)

on X120591H120591(middot)

H120591(X)

distributions (if exist)

on Y

Figure 1 Schematic view of the heavy tail LambertW times 119865119883framework Latent input119883 sim 119865

119883119867

120591(119883) from (6) transforms (solid arrows)119883 to

119884 sim Lambert W times 119865119883and generates heavy tails (right) Observed heavy-tailed 119884 and y (1) use119882

120591(sdot) to back-transform y to latent ldquoNormalrdquo

x120591 (2) use modelMN of your choice (regression time series models hypothesis testing etc) for inference on x

120591 and (3) convert results back

to the original ldquoheavy-tailed worldrdquo of y (right)

heavy-tailed data y For example consider y = (1199101 119910

500)

from a standard Cauchy distribution C(0 1) in Figure 2(a)modeling heavy tails by a transformation makes it evenpossible to Gaussianize this Cauchy sample (Figure 2(c))This ldquonicerdquo data x

120591can then be subsequently analyzed with

common techniques For example the location can now beestimated using the sample average (Figure 2(d)) For detailssee Section 61

Liu et al [15] use a semiparametric approach where119884 hasa nonparanormal distribution if 119891(119884) sim N(120583 1205902) and 119891(sdot)is an increasing smooth function they estimate 119891(sdot) usingnonparametric methods This leads to a greater flexibility inthe distribution of 119884 but it suffers from two drawbacks (i)nonparametric methods have slower convergence rates andthus need large samples and (ii) for identifiability of 119891(sdot)E119891(119884) equiv E119884 and Var(119891(119884)) equiv Var(119884) must hold While(i) is inherent to nonparametric methods point (ii) requires119884 to have finite mean and variance which is often not metfor heavy-tailed data Thus here we use parametric trans-formations which do not rely on restrictive identifiabilityconditions and also work well for small sample sizes

The main contributions of this work are threefold (a) ametafamily of heavy tail Lambert W times 119865

119883distributions (see

also [16]) with Tukeyrsquos ℎ distribution [17] as a special case (b)a bijective transformation to ldquoGaussianizerdquo heavy-tailed data(Section 2) and (c) simple expressions for the cumulativedistribution function (cdf) 119866

119884(119910) and probability density

function (pdf) 119892119884(119910) (Section 24) In particular analytic

expressions for the pdf and cdf for Tukeyrsquos ℎ (Section 3) arepresented here to the best of the authorrsquos knowledge for thefirst time in the literature

Section 4 introduces amethod ofmoments estimator andstudies the maximum likelihood estimator (MLE) Section 5shows their finite sample properties As has been shownin many case studies Tukeyrsquos ℎ distribution (heavy tailLambert W times Gaussian) is useful to model data with uni-modal heavy-tailed densities Applications to SampP 500 log-returns confirm the usefulness of the Lambert W framework(Section 6) Finally we discuss the new methodology andfuture work in Section 7 Detailed derivations and proofsare given in the Supplementary Material available online athttpdxdoiorg1011552015909231

Computations figures and simulations were done inR [18] The R package LambertW implements most of thepresented methodology and is publicly available on CRAN

11 Multivariate Extensions While this work focuses onthe univariate case multivariate extensions of the presentedmethods can be defined component-wise analogously to themultivariate version of Tukeyrsquos ℎ distribution [19] Whilethis may not make the transformed random variables jointlyGaussian it still provides a good starting point for more well-behaved multivariate estimation

The Scientific World Journal 3

0 100 200 300 400 500minus200

0

200

600

y

(a) Random sample y sim C(0 1)

0 100 200 300 400 5000

5

10

15

20

25

30

X120582

(b) Box-Cox transformed xMLE

with = 037

0 100 200 300 400 500

minus3

minus2

minus1

0

1

2

3

4

X120591

(c) Gaussianized x120591MLE with 120591 = (003 105 086)

0 100 200 300 400 500minus3

minus2

minus1

0

1

2

Number of samples

Cum

ulat

ive s

ampl

e ave

rage

CauchyGaussianized

(d) Cumulative sample average

Figure 2 Gaussianizing a standard Cauchy sample For (d) 120591(119899) was estimated for each fixed 119899 = 5 500 before Gaussianizing (1199101 119910

119899)

12 Box-Cox Transformation A popular method to deal withskewed high variance data is the Box-Cox transformation

x120582=

y120582 minus 1120582

if 120582 gt 0log y if 120582 = 0

(1)

A major limitation of (1) is the nonnegativity constraint ony which prohibits its use in many applications To avoidthis limitation it is common to shift the data y = y +|min(y)| ge 0 which restricts 119884 to a half-open intervalIf however the underlying process can occur on the entirereal line such a shift undermines statistical inference for yetunobserved data (see [20]) Even if out-of-sample predictionis not important for the practitioner Figure 2(b) shows thatthe Box-Cox transformation in fact fails to remove heavy tailsfrom the Cauchy sample (We use y = y + |min(y)| + 1 anduse boxcox from the MASS R package 120582 = 037)

Moreover the main purpose of the Box-Cox transfor-mation is to stabilize variance [21ndash23] and remove right tailskewness [24] a lower empirical kurtosis is merely a by-result of the variance stabilization In contrast the Lambert

W framework is designed primarily to model heavy-tailedrandom variables and remove heavy tails from data and hasno difficulties with negative values

2 Generating Heavy Tails UsingTransformations

Random variables exhibit heavy tails if more mass than for aGaussian random variable lies at the outer end of the densitysupport A random variable 119885 has a tail index 119886 if its cdfsatisfies 1 minus 119865

119885(119911) sim 119871(119911)119911

minus119886 where 119871(119911) is a slowly varyingfunction at infinity that is lim

119911rarrinfin119871(119905119911)119871(119911) = 1 for all

119905 gt 0 [25] (There are various similar definitions of heavy fator long tails for this work these differences are not essential)The heavy tail index 119886 is an important characteristic of 119885 forexample only moments up to order 119886 can exist

21 Tukeyrsquos ℎ Distribution A parametric transformation isthe basis of Tukeyrsquos ℎ random variables [17]

119885 = 119880 exp(ℎ2

1198802) ℎ ge 0 (2)

4 The Scientific World Journal

minus2 minus1 0 1 2 3 4minus2minus1

01234

u

z=G120575(u)

120575r = 110z = u

120575120001 = 0

(a) 120575ℓ120575119903 transformation (3)

minus2 minus1 0 1 2 3 4minus2

minus1

0

1

2

z

120575r = 110u = z

u=W

120575(z)

120575120001 = 0

(b) Inverse119882120575ℓ 120575119903(119911) in (80) in the supplemen-

tary material

z

01 02 03 04 05 06 07 08 09 1 1

1 12

13

14 15

16 17

18 19

2

23 2

4

00 05 10 1500051015202530

120575

W120575(z)

(c) Inverse119882120575(119911)(9) as a function of 120575 and 119911

Figure 3 Transformation and inverse transformation for 120575ℓ= 0 and 120575

119903= 110 identity on the left (same tail behavior) and a heavy-tailed

transformation in the right tail of input 119880

where 119880 is standard Normal random variable and ℎ isthe heavy tail parameter The random variable 119885 has tailparameter 119886 = 1ℎ [17] and reduces to the Gaussian for ℎ = 0Morgenthaler and Tukey [26] extend the ℎ distribution to theskewed heavy-tailed family of ℎℎ random variables

119885 =

119880 exp(120575ℓ

2

1198802) if 119880 le 0

119880 exp(120575119903

2

1198802) if 119880 gt 0

(3)

where again 119880 sim N(0 1) Here 120575ℓge 0 and 120575

119903ge 0 shape the

left and right tail of 119885 respectively thus transformation (3)can model skewed and heavy-tailed data see Figure 3(a) Forthe sake of brevity let119867

120575(119906) = 119906 exp((1205752)1199062)

However despite their great flexibility Tukeyrsquos ℎ andℎℎ distributions are not very popular in statistical practicebecause expressions for the cdf or pdf have not been availablein closed form Although Morgenthaler and Tukey [26]express the pdf of (2) as (ℎ equiv 120575)

119892119885 (119911) =

119891119880(119867

minus1

120575(119911))

1198671015840

120575(119867

minus1

120575(119911))

(4)

they fall short of making119867minus1

120575(119911) explicit So far the inverse of

(2) or (3) has been considered analytically intractable [4 27]Thus parameter inference relied on matching empirical andtheoretical quantiles [4 17 26] or by themethod ofmoments[28] Only recently Headrick et al [28] provided numericalapproximations to the inverse However numerical approx-imations can be slow and prohibit analytical derivationsThus a closed form analytically tractable pdf which canbe computed efficiently is essential for a widespread use ofTukeyrsquos ℎ (and variants)

In this work I present this long sought closed-forminverse which has a large body of literature in mathematicsand is readily available in standard statistics software Forease of notation and concision main results are shown for120575ℓ= 120575

119903= 120575 analogous results for 120575

ℓ= 120575

119903will be stated

without details

22 Heavy Tail Lambert W Random Variables Tukeyrsquos ℎtransformation (2) is strongly related to the approach takenby Goerg [16] to introduce skewness in continuous randomvariables 119883 sim 119865

119883(119909) In particular if 119885 sim Tukeyrsquos ℎ then

1198852sim skewed Lambert W times 120594

2

1with skew parameter 120574 = ℎ

Adapting the skew Lambert W times 119865119883inputoutput idea

(see Figure 1) Tukeyrsquos ℎ random variables can be generalizedto heavy-tailed Lambert W times 119865

119883random variables (Most

concepts and methods from the skew Lambert W times 119865119883case

transfer one-to-one to the heavy tail Lambert W randomvariables presented hereThus for the sake of concision I referto Goerg [16] for details of the Lambert W framework)

Definition 1 Let119880 be a continuous random variable with cdf119865119880(119906 | 120573) pdf 119891

119880(119906 | 120573) and parameter vector 120573 Then

119885 = 119880 exp(1205752

1198802) 120575 isin R (5)

is a noncentral nonscaled heavy tail LambertW times 119865119880random

variable with parameter vector 120579 = (120573 120575) where 120575 is the tailparameter

Tukeyrsquos ℎ distribution results for 119880 being a standardGaussianN(0 1)

Definition 2 For a continuous location-scale family randomvariable 119883 sim 119865

119883(119909 | 120573) define a location-scale heavy-tailed

LambertW times 119865119883random variable

119884 = 119880 exp(1205752

1198802)120590

119883+ 120583

119883 120575 isin R (6)

with parameter vector 120579 = (120573 120575) where119880 = (119883minus120583119883)120590

119883and

120583119883and120590

119883aremean and standard deviation of119883 respectively

The input is not necessarily Gaussian (Tukeyrsquos ℎ) but canbe any other location-scale continuous random variable forexample from a uniform distribution 119883 sim 119880(119886 119887) (seeFigure 4)

The Scientific World Journal 5

120575 0110

00 05 10 15 20 25 30000204060810

y

Pdf

00 05 10 15 20 25 300002040608

y

Cdf

(a) Lambert Wtimes1205942

119896with 120573=119896=1

13

Pdf

000204060810

Cdf

0 2 4 6 8 10 12000

010

020

y

0 2 4 6 8 10 12y

(b) Lambert W times Γ(119904 119903) with 120573 =(119904 119903) = (3 1)

1

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 40001020304

y

Pdf

(c) Lambert W times N(120583 1205902) with

120573 = (120583 120590) = (0 1)

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 4000102030405

y

Pdf

5

(d) Lambert W times 119880(119886 119887) with120573 = (119886 119887) = (minus1 1)

Figure 4 Pdf (left) and cdf (right) of a heavy tail (a) ldquononcentral nonscaledrdquo (b) ldquoscalerdquo and (c and d) ldquolocation-scalerdquo Lambert W times 119865119883

random variable 119884 for various degrees of heavy tails (color dashed lines)

Definition 3 Let 119883 sim 119865119883(119909119904 | 120573) be a continuous scale-

family random variable with scale parameter 119904 and standarddeviation 120590

119883 let 119880 = 119883120590

119883 Then

119884 = 119883 exp(1205752

1198802) 120575 isin R (7)

is a scaled heavy-tailed Lambert W times 119865119883random variable

with parameter 120579 = (120573 120575)

Let 120591 = (120583119883(120573) 120590

119883(120573) 120575) define transformation (6) (For

noncentral nonscale input set 120591 = (0 1 120575) for scale-familyinput 120591 = (0 120590

119883 120575)) The shape parameter 120575(= Tukeyrsquos ℎ)

governs the tail behavior of 119884 for 120575 gt 0 values further awayfrom 120583

119883are increasingly emphasized leading to a heavy-

tailed version of119883 for 120575 = 0 119884 equiv 119883 and for 120575 lt 0 values faraway from the mean are mapped back again closer to 120583

119883 For

120575 ge 0 and 119883 isin (minusinfininfin) also 119884 isin (minusinfininfin) For 120575 ge 0 and119883 isin [0infin) also 119884 isin [0infin)

Remark 4 (only nonnegative 120575) Although 120575 lt 0 givesinteresting properties for 119884 it defines a nonbijective trans-formation and leads to parameter-dependent support andnonunique inputThus for the remainder of this work assume120575 ge 0 unless stated otherwise

23 Inverse Transformation ldquoGaussianizerdquo Heavy-Tailed DataTransformation (6) is bijective and its inverse can beobtained via the Lambert W function which is the inverseof 119911 = 119906 exp(119906) that is that function which satisfies119882(119911) exp(119882(119911)) = 119911 It has been studied extensively inmathematics physics and other areas of science [29ndash31] andis implemented in the GNU Scientific Library (GSL) [32]Only recently the Lambert W function received attentionin the statistics literature [16 33ndash35] It has many usefulproperties (see Appendix A in the supplementary material

and Corless et al [29]) in particular 119882(119911) is bijective for119911 ge 0

Lemma 5 The inverse transformation of (6) is

119882120591 (119884) = 119882120575

(

119884 minus 120583119883

120590119883

)120590119883+ 120583

119883= 119880120590

119883+ 120583

119883= 119883 (8)

where

119882120575 (119911) = sgn (119911)(

119882(1205751199112)

120575

)

12

(9)

and sgn(119911) is the sign of 119911119882120575(119911) is bijective for all 120575 ge 0 and

all 119911 isin R

Lemma 5 gives for the first time an analytic bijectiveinverse of Tukeyrsquos ℎ transformation119867minus1

120575(119910) of Morgenthaler

and Tukey [26] is now analytically available as (8) Bijectivityimplies that for any data y and parameter 120591 the exact inputx120591= 119882

120591(y) sim 119865

119883(119909) can be obtained

In view of the importance and popularity of Normalitywe clearly want to back-transform heavy-tailed data todata from a Gaussian rather than yet another heavy-taileddistribution Tail behavior of random variables is typicallycompared by their kurtosis 120574

2(119883) = E(119883 minus 120583

119883)4120590

4

119883 which

equals 3 if 119883 is Gaussian Hence for the future when weldquonormalize yrdquo we cannot only center and scale but alsotransform it to x

120591with 120574

2(x120591) = 3 (see Figure 2(c)) While

1205742(119883) = 3 does not guarantee that 119883 is Gaussian it is a good

baseline for a Gaussian sample Furthermore it puts differentdata not only on the same scale but also on the same tail

This data-driven view of the Lambert W framework canalso be useful for kernel density estimation (KDE) wheremultivariate data is often prescaled to unit variance so thesame bandwidth can be used in each dimension [36 37]Thus

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 3

0 100 200 300 400 500minus200

0

200

600

y

(a) Random sample y sim C(0 1)

0 100 200 300 400 5000

5

10

15

20

25

30

X120582

(b) Box-Cox transformed xMLE

with = 037

0 100 200 300 400 500

minus3

minus2

minus1

0

1

2

3

4

X120591

(c) Gaussianized x120591MLE with 120591 = (003 105 086)

0 100 200 300 400 500minus3

minus2

minus1

0

1

2

Number of samples

Cum

ulat

ive s

ampl

e ave

rage

CauchyGaussianized

(d) Cumulative sample average

Figure 2 Gaussianizing a standard Cauchy sample For (d) 120591(119899) was estimated for each fixed 119899 = 5 500 before Gaussianizing (1199101 119910

119899)

12 Box-Cox Transformation A popular method to deal withskewed high variance data is the Box-Cox transformation

x120582=

y120582 minus 1120582

if 120582 gt 0log y if 120582 = 0

(1)

A major limitation of (1) is the nonnegativity constraint ony which prohibits its use in many applications To avoidthis limitation it is common to shift the data y = y +|min(y)| ge 0 which restricts 119884 to a half-open intervalIf however the underlying process can occur on the entirereal line such a shift undermines statistical inference for yetunobserved data (see [20]) Even if out-of-sample predictionis not important for the practitioner Figure 2(b) shows thatthe Box-Cox transformation in fact fails to remove heavy tailsfrom the Cauchy sample (We use y = y + |min(y)| + 1 anduse boxcox from the MASS R package 120582 = 037)

Moreover the main purpose of the Box-Cox transfor-mation is to stabilize variance [21ndash23] and remove right tailskewness [24] a lower empirical kurtosis is merely a by-result of the variance stabilization In contrast the Lambert

W framework is designed primarily to model heavy-tailedrandom variables and remove heavy tails from data and hasno difficulties with negative values

2 Generating Heavy Tails UsingTransformations

Random variables exhibit heavy tails if more mass than for aGaussian random variable lies at the outer end of the densitysupport A random variable 119885 has a tail index 119886 if its cdfsatisfies 1 minus 119865

119885(119911) sim 119871(119911)119911

minus119886 where 119871(119911) is a slowly varyingfunction at infinity that is lim

119911rarrinfin119871(119905119911)119871(119911) = 1 for all

119905 gt 0 [25] (There are various similar definitions of heavy fator long tails for this work these differences are not essential)The heavy tail index 119886 is an important characteristic of 119885 forexample only moments up to order 119886 can exist

21 Tukeyrsquos ℎ Distribution A parametric transformation isthe basis of Tukeyrsquos ℎ random variables [17]

119885 = 119880 exp(ℎ2

1198802) ℎ ge 0 (2)

4 The Scientific World Journal

minus2 minus1 0 1 2 3 4minus2minus1

01234

u

z=G120575(u)

120575r = 110z = u

120575120001 = 0

(a) 120575ℓ120575119903 transformation (3)

minus2 minus1 0 1 2 3 4minus2

minus1

0

1

2

z

120575r = 110u = z

u=W

120575(z)

120575120001 = 0

(b) Inverse119882120575ℓ 120575119903(119911) in (80) in the supplemen-

tary material

z

01 02 03 04 05 06 07 08 09 1 1

1 12

13

14 15

16 17

18 19

2

23 2

4

00 05 10 1500051015202530

120575

W120575(z)

(c) Inverse119882120575(119911)(9) as a function of 120575 and 119911

Figure 3 Transformation and inverse transformation for 120575ℓ= 0 and 120575

119903= 110 identity on the left (same tail behavior) and a heavy-tailed

transformation in the right tail of input 119880

where 119880 is standard Normal random variable and ℎ isthe heavy tail parameter The random variable 119885 has tailparameter 119886 = 1ℎ [17] and reduces to the Gaussian for ℎ = 0Morgenthaler and Tukey [26] extend the ℎ distribution to theskewed heavy-tailed family of ℎℎ random variables

119885 =

119880 exp(120575ℓ

2

1198802) if 119880 le 0

119880 exp(120575119903

2

1198802) if 119880 gt 0

(3)

where again 119880 sim N(0 1) Here 120575ℓge 0 and 120575

119903ge 0 shape the

left and right tail of 119885 respectively thus transformation (3)can model skewed and heavy-tailed data see Figure 3(a) Forthe sake of brevity let119867

120575(119906) = 119906 exp((1205752)1199062)

However despite their great flexibility Tukeyrsquos ℎ andℎℎ distributions are not very popular in statistical practicebecause expressions for the cdf or pdf have not been availablein closed form Although Morgenthaler and Tukey [26]express the pdf of (2) as (ℎ equiv 120575)

119892119885 (119911) =

119891119880(119867

minus1

120575(119911))

1198671015840

120575(119867

minus1

120575(119911))

(4)

they fall short of making119867minus1

120575(119911) explicit So far the inverse of

(2) or (3) has been considered analytically intractable [4 27]Thus parameter inference relied on matching empirical andtheoretical quantiles [4 17 26] or by themethod ofmoments[28] Only recently Headrick et al [28] provided numericalapproximations to the inverse However numerical approx-imations can be slow and prohibit analytical derivationsThus a closed form analytically tractable pdf which canbe computed efficiently is essential for a widespread use ofTukeyrsquos ℎ (and variants)

In this work I present this long sought closed-forminverse which has a large body of literature in mathematicsand is readily available in standard statistics software Forease of notation and concision main results are shown for120575ℓ= 120575

119903= 120575 analogous results for 120575

ℓ= 120575

119903will be stated

without details

22 Heavy Tail Lambert W Random Variables Tukeyrsquos ℎtransformation (2) is strongly related to the approach takenby Goerg [16] to introduce skewness in continuous randomvariables 119883 sim 119865

119883(119909) In particular if 119885 sim Tukeyrsquos ℎ then

1198852sim skewed Lambert W times 120594

2

1with skew parameter 120574 = ℎ

Adapting the skew Lambert W times 119865119883inputoutput idea

(see Figure 1) Tukeyrsquos ℎ random variables can be generalizedto heavy-tailed Lambert W times 119865

119883random variables (Most

concepts and methods from the skew Lambert W times 119865119883case

transfer one-to-one to the heavy tail Lambert W randomvariables presented hereThus for the sake of concision I referto Goerg [16] for details of the Lambert W framework)

Definition 1 Let119880 be a continuous random variable with cdf119865119880(119906 | 120573) pdf 119891

119880(119906 | 120573) and parameter vector 120573 Then

119885 = 119880 exp(1205752

1198802) 120575 isin R (5)

is a noncentral nonscaled heavy tail LambertW times 119865119880random

variable with parameter vector 120579 = (120573 120575) where 120575 is the tailparameter

Tukeyrsquos ℎ distribution results for 119880 being a standardGaussianN(0 1)

Definition 2 For a continuous location-scale family randomvariable 119883 sim 119865

119883(119909 | 120573) define a location-scale heavy-tailed

LambertW times 119865119883random variable

119884 = 119880 exp(1205752

1198802)120590

119883+ 120583

119883 120575 isin R (6)

with parameter vector 120579 = (120573 120575) where119880 = (119883minus120583119883)120590

119883and

120583119883and120590

119883aremean and standard deviation of119883 respectively

The input is not necessarily Gaussian (Tukeyrsquos ℎ) but canbe any other location-scale continuous random variable forexample from a uniform distribution 119883 sim 119880(119886 119887) (seeFigure 4)

The Scientific World Journal 5

120575 0110

00 05 10 15 20 25 30000204060810

y

Pdf

00 05 10 15 20 25 300002040608

y

Cdf

(a) Lambert Wtimes1205942

119896with 120573=119896=1

13

Pdf

000204060810

Cdf

0 2 4 6 8 10 12000

010

020

y

0 2 4 6 8 10 12y

(b) Lambert W times Γ(119904 119903) with 120573 =(119904 119903) = (3 1)

1

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 40001020304

y

Pdf

(c) Lambert W times N(120583 1205902) with

120573 = (120583 120590) = (0 1)

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 4000102030405

y

Pdf

5

(d) Lambert W times 119880(119886 119887) with120573 = (119886 119887) = (minus1 1)

Figure 4 Pdf (left) and cdf (right) of a heavy tail (a) ldquononcentral nonscaledrdquo (b) ldquoscalerdquo and (c and d) ldquolocation-scalerdquo Lambert W times 119865119883

random variable 119884 for various degrees of heavy tails (color dashed lines)

Definition 3 Let 119883 sim 119865119883(119909119904 | 120573) be a continuous scale-

family random variable with scale parameter 119904 and standarddeviation 120590

119883 let 119880 = 119883120590

119883 Then

119884 = 119883 exp(1205752

1198802) 120575 isin R (7)

is a scaled heavy-tailed Lambert W times 119865119883random variable

with parameter 120579 = (120573 120575)

Let 120591 = (120583119883(120573) 120590

119883(120573) 120575) define transformation (6) (For

noncentral nonscale input set 120591 = (0 1 120575) for scale-familyinput 120591 = (0 120590

119883 120575)) The shape parameter 120575(= Tukeyrsquos ℎ)

governs the tail behavior of 119884 for 120575 gt 0 values further awayfrom 120583

119883are increasingly emphasized leading to a heavy-

tailed version of119883 for 120575 = 0 119884 equiv 119883 and for 120575 lt 0 values faraway from the mean are mapped back again closer to 120583

119883 For

120575 ge 0 and 119883 isin (minusinfininfin) also 119884 isin (minusinfininfin) For 120575 ge 0 and119883 isin [0infin) also 119884 isin [0infin)

Remark 4 (only nonnegative 120575) Although 120575 lt 0 givesinteresting properties for 119884 it defines a nonbijective trans-formation and leads to parameter-dependent support andnonunique inputThus for the remainder of this work assume120575 ge 0 unless stated otherwise

23 Inverse Transformation ldquoGaussianizerdquo Heavy-Tailed DataTransformation (6) is bijective and its inverse can beobtained via the Lambert W function which is the inverseof 119911 = 119906 exp(119906) that is that function which satisfies119882(119911) exp(119882(119911)) = 119911 It has been studied extensively inmathematics physics and other areas of science [29ndash31] andis implemented in the GNU Scientific Library (GSL) [32]Only recently the Lambert W function received attentionin the statistics literature [16 33ndash35] It has many usefulproperties (see Appendix A in the supplementary material

and Corless et al [29]) in particular 119882(119911) is bijective for119911 ge 0

Lemma 5 The inverse transformation of (6) is

119882120591 (119884) = 119882120575

(

119884 minus 120583119883

120590119883

)120590119883+ 120583

119883= 119880120590

119883+ 120583

119883= 119883 (8)

where

119882120575 (119911) = sgn (119911)(

119882(1205751199112)

120575

)

12

(9)

and sgn(119911) is the sign of 119911119882120575(119911) is bijective for all 120575 ge 0 and

all 119911 isin R

Lemma 5 gives for the first time an analytic bijectiveinverse of Tukeyrsquos ℎ transformation119867minus1

120575(119910) of Morgenthaler

and Tukey [26] is now analytically available as (8) Bijectivityimplies that for any data y and parameter 120591 the exact inputx120591= 119882

120591(y) sim 119865

119883(119909) can be obtained

In view of the importance and popularity of Normalitywe clearly want to back-transform heavy-tailed data todata from a Gaussian rather than yet another heavy-taileddistribution Tail behavior of random variables is typicallycompared by their kurtosis 120574

2(119883) = E(119883 minus 120583

119883)4120590

4

119883 which

equals 3 if 119883 is Gaussian Hence for the future when weldquonormalize yrdquo we cannot only center and scale but alsotransform it to x

120591with 120574

2(x120591) = 3 (see Figure 2(c)) While

1205742(119883) = 3 does not guarantee that 119883 is Gaussian it is a good

baseline for a Gaussian sample Furthermore it puts differentdata not only on the same scale but also on the same tail

This data-driven view of the Lambert W framework canalso be useful for kernel density estimation (KDE) wheremultivariate data is often prescaled to unit variance so thesame bandwidth can be used in each dimension [36 37]Thus

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article The Lambert Way to Gaussianize Heavy

4 The Scientific World Journal

minus2 minus1 0 1 2 3 4minus2minus1

01234

u

z=G120575(u)

120575r = 110z = u

120575120001 = 0

(a) 120575ℓ120575119903 transformation (3)

minus2 minus1 0 1 2 3 4minus2

minus1

0

1

2

z

120575r = 110u = z

u=W

120575(z)

120575120001 = 0

(b) Inverse119882120575ℓ 120575119903(119911) in (80) in the supplemen-

tary material

z

01 02 03 04 05 06 07 08 09 1 1

1 12

13

14 15

16 17

18 19

2

23 2

4

00 05 10 1500051015202530

120575

W120575(z)

(c) Inverse119882120575(119911)(9) as a function of 120575 and 119911

Figure 3 Transformation and inverse transformation for 120575ℓ= 0 and 120575

119903= 110 identity on the left (same tail behavior) and a heavy-tailed

transformation in the right tail of input 119880

where 119880 is standard Normal random variable and ℎ isthe heavy tail parameter The random variable 119885 has tailparameter 119886 = 1ℎ [17] and reduces to the Gaussian for ℎ = 0Morgenthaler and Tukey [26] extend the ℎ distribution to theskewed heavy-tailed family of ℎℎ random variables

119885 =

119880 exp(120575ℓ

2

1198802) if 119880 le 0

119880 exp(120575119903

2

1198802) if 119880 gt 0

(3)

where again 119880 sim N(0 1) Here 120575ℓge 0 and 120575

119903ge 0 shape the

left and right tail of 119885 respectively thus transformation (3)can model skewed and heavy-tailed data see Figure 3(a) Forthe sake of brevity let119867

120575(119906) = 119906 exp((1205752)1199062)

However despite their great flexibility Tukeyrsquos ℎ andℎℎ distributions are not very popular in statistical practicebecause expressions for the cdf or pdf have not been availablein closed form Although Morgenthaler and Tukey [26]express the pdf of (2) as (ℎ equiv 120575)

119892119885 (119911) =

119891119880(119867

minus1

120575(119911))

1198671015840

120575(119867

minus1

120575(119911))

(4)

they fall short of making119867minus1

120575(119911) explicit So far the inverse of

(2) or (3) has been considered analytically intractable [4 27]Thus parameter inference relied on matching empirical andtheoretical quantiles [4 17 26] or by themethod ofmoments[28] Only recently Headrick et al [28] provided numericalapproximations to the inverse However numerical approx-imations can be slow and prohibit analytical derivationsThus a closed form analytically tractable pdf which canbe computed efficiently is essential for a widespread use ofTukeyrsquos ℎ (and variants)

In this work I present this long sought closed-forminverse which has a large body of literature in mathematicsand is readily available in standard statistics software Forease of notation and concision main results are shown for120575ℓ= 120575

119903= 120575 analogous results for 120575

ℓ= 120575

119903will be stated

without details

22 Heavy Tail Lambert W Random Variables Tukeyrsquos ℎtransformation (2) is strongly related to the approach takenby Goerg [16] to introduce skewness in continuous randomvariables 119883 sim 119865

119883(119909) In particular if 119885 sim Tukeyrsquos ℎ then

1198852sim skewed Lambert W times 120594

2

1with skew parameter 120574 = ℎ

Adapting the skew Lambert W times 119865119883inputoutput idea

(see Figure 1) Tukeyrsquos ℎ random variables can be generalizedto heavy-tailed Lambert W times 119865

119883random variables (Most

concepts and methods from the skew Lambert W times 119865119883case

transfer one-to-one to the heavy tail Lambert W randomvariables presented hereThus for the sake of concision I referto Goerg [16] for details of the Lambert W framework)

Definition 1 Let119880 be a continuous random variable with cdf119865119880(119906 | 120573) pdf 119891

119880(119906 | 120573) and parameter vector 120573 Then

119885 = 119880 exp(1205752

1198802) 120575 isin R (5)

is a noncentral nonscaled heavy tail LambertW times 119865119880random

variable with parameter vector 120579 = (120573 120575) where 120575 is the tailparameter

Tukeyrsquos ℎ distribution results for 119880 being a standardGaussianN(0 1)

Definition 2 For a continuous location-scale family randomvariable 119883 sim 119865

119883(119909 | 120573) define a location-scale heavy-tailed

LambertW times 119865119883random variable

119884 = 119880 exp(1205752

1198802)120590

119883+ 120583

119883 120575 isin R (6)

with parameter vector 120579 = (120573 120575) where119880 = (119883minus120583119883)120590

119883and

120583119883and120590

119883aremean and standard deviation of119883 respectively

The input is not necessarily Gaussian (Tukeyrsquos ℎ) but canbe any other location-scale continuous random variable forexample from a uniform distribution 119883 sim 119880(119886 119887) (seeFigure 4)

The Scientific World Journal 5

120575 0110

00 05 10 15 20 25 30000204060810

y

Pdf

00 05 10 15 20 25 300002040608

y

Cdf

(a) Lambert Wtimes1205942

119896with 120573=119896=1

13

Pdf

000204060810

Cdf

0 2 4 6 8 10 12000

010

020

y

0 2 4 6 8 10 12y

(b) Lambert W times Γ(119904 119903) with 120573 =(119904 119903) = (3 1)

1

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 40001020304

y

Pdf

(c) Lambert W times N(120583 1205902) with

120573 = (120583 120590) = (0 1)

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 4000102030405

y

Pdf

5

(d) Lambert W times 119880(119886 119887) with120573 = (119886 119887) = (minus1 1)

Figure 4 Pdf (left) and cdf (right) of a heavy tail (a) ldquononcentral nonscaledrdquo (b) ldquoscalerdquo and (c and d) ldquolocation-scalerdquo Lambert W times 119865119883

random variable 119884 for various degrees of heavy tails (color dashed lines)

Definition 3 Let 119883 sim 119865119883(119909119904 | 120573) be a continuous scale-

family random variable with scale parameter 119904 and standarddeviation 120590

119883 let 119880 = 119883120590

119883 Then

119884 = 119883 exp(1205752

1198802) 120575 isin R (7)

is a scaled heavy-tailed Lambert W times 119865119883random variable

with parameter 120579 = (120573 120575)

Let 120591 = (120583119883(120573) 120590

119883(120573) 120575) define transformation (6) (For

noncentral nonscale input set 120591 = (0 1 120575) for scale-familyinput 120591 = (0 120590

119883 120575)) The shape parameter 120575(= Tukeyrsquos ℎ)

governs the tail behavior of 119884 for 120575 gt 0 values further awayfrom 120583

119883are increasingly emphasized leading to a heavy-

tailed version of119883 for 120575 = 0 119884 equiv 119883 and for 120575 lt 0 values faraway from the mean are mapped back again closer to 120583

119883 For

120575 ge 0 and 119883 isin (minusinfininfin) also 119884 isin (minusinfininfin) For 120575 ge 0 and119883 isin [0infin) also 119884 isin [0infin)

Remark 4 (only nonnegative 120575) Although 120575 lt 0 givesinteresting properties for 119884 it defines a nonbijective trans-formation and leads to parameter-dependent support andnonunique inputThus for the remainder of this work assume120575 ge 0 unless stated otherwise

23 Inverse Transformation ldquoGaussianizerdquo Heavy-Tailed DataTransformation (6) is bijective and its inverse can beobtained via the Lambert W function which is the inverseof 119911 = 119906 exp(119906) that is that function which satisfies119882(119911) exp(119882(119911)) = 119911 It has been studied extensively inmathematics physics and other areas of science [29ndash31] andis implemented in the GNU Scientific Library (GSL) [32]Only recently the Lambert W function received attentionin the statistics literature [16 33ndash35] It has many usefulproperties (see Appendix A in the supplementary material

and Corless et al [29]) in particular 119882(119911) is bijective for119911 ge 0

Lemma 5 The inverse transformation of (6) is

119882120591 (119884) = 119882120575

(

119884 minus 120583119883

120590119883

)120590119883+ 120583

119883= 119880120590

119883+ 120583

119883= 119883 (8)

where

119882120575 (119911) = sgn (119911)(

119882(1205751199112)

120575

)

12

(9)

and sgn(119911) is the sign of 119911119882120575(119911) is bijective for all 120575 ge 0 and

all 119911 isin R

Lemma 5 gives for the first time an analytic bijectiveinverse of Tukeyrsquos ℎ transformation119867minus1

120575(119910) of Morgenthaler

and Tukey [26] is now analytically available as (8) Bijectivityimplies that for any data y and parameter 120591 the exact inputx120591= 119882

120591(y) sim 119865

119883(119909) can be obtained

In view of the importance and popularity of Normalitywe clearly want to back-transform heavy-tailed data todata from a Gaussian rather than yet another heavy-taileddistribution Tail behavior of random variables is typicallycompared by their kurtosis 120574

2(119883) = E(119883 minus 120583

119883)4120590

4

119883 which

equals 3 if 119883 is Gaussian Hence for the future when weldquonormalize yrdquo we cannot only center and scale but alsotransform it to x

120591with 120574

2(x120591) = 3 (see Figure 2(c)) While

1205742(119883) = 3 does not guarantee that 119883 is Gaussian it is a good

baseline for a Gaussian sample Furthermore it puts differentdata not only on the same scale but also on the same tail

This data-driven view of the Lambert W framework canalso be useful for kernel density estimation (KDE) wheremultivariate data is often prescaled to unit variance so thesame bandwidth can be used in each dimension [36 37]Thus

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 5

120575 0110

00 05 10 15 20 25 30000204060810

y

Pdf

00 05 10 15 20 25 300002040608

y

Cdf

(a) Lambert Wtimes1205942

119896with 120573=119896=1

13

Pdf

000204060810

Cdf

0 2 4 6 8 10 12000

010

020

y

0 2 4 6 8 10 12y

(b) Lambert W times Γ(119904 119903) with 120573 =(119904 119903) = (3 1)

1

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 40001020304

y

Pdf

(c) Lambert W times N(120583 1205902) with

120573 = (120583 120590) = (0 1)

minus4 minus2 0 2 4000204060810

y

Cdf

minus4 minus2 0 2 4000102030405

y

Pdf

5

(d) Lambert W times 119880(119886 119887) with120573 = (119886 119887) = (minus1 1)

Figure 4 Pdf (left) and cdf (right) of a heavy tail (a) ldquononcentral nonscaledrdquo (b) ldquoscalerdquo and (c and d) ldquolocation-scalerdquo Lambert W times 119865119883

random variable 119884 for various degrees of heavy tails (color dashed lines)

Definition 3 Let 119883 sim 119865119883(119909119904 | 120573) be a continuous scale-

family random variable with scale parameter 119904 and standarddeviation 120590

119883 let 119880 = 119883120590

119883 Then

119884 = 119883 exp(1205752

1198802) 120575 isin R (7)

is a scaled heavy-tailed Lambert W times 119865119883random variable

with parameter 120579 = (120573 120575)

Let 120591 = (120583119883(120573) 120590

119883(120573) 120575) define transformation (6) (For

noncentral nonscale input set 120591 = (0 1 120575) for scale-familyinput 120591 = (0 120590

119883 120575)) The shape parameter 120575(= Tukeyrsquos ℎ)

governs the tail behavior of 119884 for 120575 gt 0 values further awayfrom 120583

119883are increasingly emphasized leading to a heavy-

tailed version of119883 for 120575 = 0 119884 equiv 119883 and for 120575 lt 0 values faraway from the mean are mapped back again closer to 120583

119883 For

120575 ge 0 and 119883 isin (minusinfininfin) also 119884 isin (minusinfininfin) For 120575 ge 0 and119883 isin [0infin) also 119884 isin [0infin)

Remark 4 (only nonnegative 120575) Although 120575 lt 0 givesinteresting properties for 119884 it defines a nonbijective trans-formation and leads to parameter-dependent support andnonunique inputThus for the remainder of this work assume120575 ge 0 unless stated otherwise

23 Inverse Transformation ldquoGaussianizerdquo Heavy-Tailed DataTransformation (6) is bijective and its inverse can beobtained via the Lambert W function which is the inverseof 119911 = 119906 exp(119906) that is that function which satisfies119882(119911) exp(119882(119911)) = 119911 It has been studied extensively inmathematics physics and other areas of science [29ndash31] andis implemented in the GNU Scientific Library (GSL) [32]Only recently the Lambert W function received attentionin the statistics literature [16 33ndash35] It has many usefulproperties (see Appendix A in the supplementary material

and Corless et al [29]) in particular 119882(119911) is bijective for119911 ge 0

Lemma 5 The inverse transformation of (6) is

119882120591 (119884) = 119882120575

(

119884 minus 120583119883

120590119883

)120590119883+ 120583

119883= 119880120590

119883+ 120583

119883= 119883 (8)

where

119882120575 (119911) = sgn (119911)(

119882(1205751199112)

120575

)

12

(9)

and sgn(119911) is the sign of 119911119882120575(119911) is bijective for all 120575 ge 0 and

all 119911 isin R

Lemma 5 gives for the first time an analytic bijectiveinverse of Tukeyrsquos ℎ transformation119867minus1

120575(119910) of Morgenthaler

and Tukey [26] is now analytically available as (8) Bijectivityimplies that for any data y and parameter 120591 the exact inputx120591= 119882

120591(y) sim 119865

119883(119909) can be obtained

In view of the importance and popularity of Normalitywe clearly want to back-transform heavy-tailed data todata from a Gaussian rather than yet another heavy-taileddistribution Tail behavior of random variables is typicallycompared by their kurtosis 120574

2(119883) = E(119883 minus 120583

119883)4120590

4

119883 which

equals 3 if 119883 is Gaussian Hence for the future when weldquonormalize yrdquo we cannot only center and scale but alsotransform it to x

120591with 120574

2(x120591) = 3 (see Figure 2(c)) While

1205742(119883) = 3 does not guarantee that 119883 is Gaussian it is a good

baseline for a Gaussian sample Furthermore it puts differentdata not only on the same scale but also on the same tail

This data-driven view of the Lambert W framework canalso be useful for kernel density estimation (KDE) wheremultivariate data is often prescaled to unit variance so thesame bandwidth can be used in each dimension [36 37]Thus

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article The Lambert Way to Gaussianize Heavy

6 The Scientific World Journal

ldquonormalizingrdquo the Lambert Way can also improve KDE forheavy-tailed data (see also [38 39])

Remark 6 (generalized transformation) Transformation (2)can be generalized to

119885 = 119880 exp(1205752

(1198802)

120572

) 120572 gt 0 (10)

The inner term 1198802 guarantees bijectivity for all 120572 gt 0 The

inverse is

119882120575120572 (

119911) = sgn (119911)(119882(120572120575 (119911

2)

120572

)

120572120575

)

12120572

(11)

For comparison with Tukeyrsquos ℎ I consider 120572 = 1 onlyFor 120572 = 12 transformation (10) is closely related to skewedLambert W times 119865

119883distributions

24 Distribution and Density For ease of notation let

119911 =

119910 minus 120583119883

120590119883

119906 = 119882120575 (119911) 119909 = 119882

120591(119910) = 119906120590

119883+ 120583

119883 (12)

Theorem 7 The cdf and pdf of a location-scale heavy tailLambert W times 119865

119883random variable 119884 equal

119866119884(119910 | 120573 120575) = 119865

119883(119882

120575(

119910 minus 120583119909

120590119909

)120590119883+ 120583

119883| 120573) (13)

119892119884(119910 | 120573 120575)

= 119891119883(119882

120575(

119910 minus 120583119883

120590119883

)120590119883+ 120583

119883| 120573)

sdot

119882120575((119910 minus 120583

119883) 120590

119883)

((119910 minus 120583119883) 120590

119883) [1 +119882(120575 ((119910 minus 120583

119883) 120590

119883)2)]

(14)

Clearly 119866119884(119910 | 120573 120575 = 0) = 119865

119883(119910 | 120573) and 119892

119884(119910 | 120573 120575 =

0) = 119891119883(119910 | 120573) since lim

120575rarr0119882120575(119911) = 119911 and lim

120575rarr0119882(120575119911

2) =

0 for all 119911 isin RFor scale family or noncentral nonscale input set 120583

119883= 0

or 120583119883= 0 120590

119883= 1

The explicit formula (14) allows a fast computation andtheoretical analysis of the likelihood which is essential forstatistical inference Detailed properties of (14) are given inSection 41

Figure 4 shows (13) and (14) for various 120575 ge 0 with fourdifferent input distributions for 120575 = ℎ = 0 the input equalsthe output (solid black) for larger 120575 the tails of 119866

119884(119910 | 120579) and

119892119884(119910 | 120579) get heavier (dashed colored)

25 Quantile Function Quantile fitting has been the standardtechnique to estimate 120583

119883120590

119883 and 120575 of Tukeyrsquos ℎ In particular

the medians of 119884 and 119883 are equal Thus for symmetriclocation-scale family input the sample median of y is a robust

estimate of 120583119883for any 120575 ge 0 (see also Section 5) General

quantiles can be computed via [17]

119910120572= 119906

120572exp(120575

2

1199062

120572)120590

119883+ 120583

119883 (15)

where 119906120572= 119882

120575(119911120572) are the 120572-quantiles of 119865

119880(119906)

3 Tukeyrsquos ℎ Distribution Gaussian Input

For Gaussian input Lambert W times 119865119883equals Tukeyrsquos ℎ which

has been studied extensively Dutta and Babbel [40] show that

E119885119899=

0 if 119899 is odd 119899 lt 1120575

119899 (1 minus 119899120575)minus(119899+1)2

21198992(1198992)

if 119899 is even 119899 lt 1120575

∄ if 119899 is odd 119899 gt 1120575

infin if 119899 is even 119899 gt 1120575

(16)

which in particular implies that [28]

E119885 = E1198853= 0 if 120575 lt 1 and 1

3

respectively (17)

E1198852=

1

(1 minus 2120575)32 if 120575 lt 1

2

E1198854= 3

1

(1 minus 4120575)52 if 120575 lt 1

4

(18)

Thus the kurtosis of 119884 equals (see Figure 5)

1205742 (120575) = 3

(1 minus 2120575)3

(1 minus 4120575)52

for 120575 lt 14 (19)

For 120575 = 0 (18) and (19) reduce to the familiarGaussian valuesExpanding (19) around 120575 = 0 yields

1205742 (120575) = 3 + 12120575 + 66120575

2+ O (120575

3) (20)

Dropping O(1205753) and solving (20) gives a rule of thumbestimate

120575Taylor =

1

66

[radic66 1205742(y) minus 162 minus 6]

+

(21)

where 1205742(y) is the sample kurtosis and [119886]

+= max(119886 0) that

is 120575Taylor gt 0 if 1205742(y) gt 3 otherwise set 120575Taylor = 0

Corollary 8 The cdf of Tukeyrsquos ℎ equals

119866119884(119910 | 120583

119883 120590119883 120575) = Φ(

119882120591(119910) minus 120583

119883

120590119883

) (22)

whereΦ(119906) is the cdf of a standard NormalThe pdf equals (for120575 gt 0)

119892119884(119910 | 120583

119883 120590119883 120575) =

1

radic2120587

exp(minus1 + 1205752

119882120575(

119910 minus 120583119883

120590119883

)

2

)

sdot

1

1 +119882(120575 ((119910 minus 120583119883) 120590

119883)2)

(23)

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 7

000 005 010 015 020 02510

15

20

25

Varia

nce

Studentrsquos t

120575 or 1

Heavy tail Lambert W times Gaussian

(a) Variance

000 005 010 0153

4

5

6

7

8

9

10

Kurt

osis

120575 or 1

Studentrsquos tHeavy tail Lambert W times Gaussian

(b) Kurtosis

Figure 5 Comparing moments of Lambert W times Gaussian and Studentrsquos 119905

Proof Take119883 simN(120583119883 120590

2

119883) in Theorem 7

Section 41 studies functional properties of (23) in moredetail

31 Tukeyrsquos ℎ versus Studentrsquos 119905 Studentrsquos 119905]-distribution with] degrees of freedom is often used to model heavy-tailed data[41 42] as its tail index equals ] Thus the 119899th moment of aStudentrsquos 119905 random variable 119879 exists if 119899 lt ] In particular

E119879 = E1198793= 0 if ] gt 1 or gt 3

E1198792=

]] minus 2

=

1

1 minus (2])if ] gt 2

(24)

and kurtosis

1205742 (]) = 3

] minus 2] minus 4

= 3

1 minus 2 (1])1 minus 4 (1])

if ] gt 4 (25)

Comparing (24) and (25) with (18) and (19) shows anatural association between 1] and 120575 and a close similaritybetween the first four moments of Studentrsquos 119905 and Tukeyrsquosℎ (Figure 5) By continuity and monotonicity the first fourmoments of a location-scale 119905-distribution can always beexactly matched by a corresponding location-scale LambertW times Gaussian Thus if Studentrsquos 119905 is used to model heavytails and not as the true distribution of a test statisticit might be worthwhile to also fit heavy tail Lambert Wtimes Gaussian distributions for an equally valuable ldquosecondopinionrdquo For example a parallel analysis on SampP 500 log-returns in Section 62 leads to divergent inference regardingthe existence of fourth moments

4 Parameter Estimation

Due to the lack of a closed form pdf of 119884 120579 = (120573 120575) hastypically been estimated by matching quantiles or a methodof moments estimator [4 26 28] These methods can now

be replaced by the fast and usually efficient maximumlikelihood estimator (MLE) Rayner and MacGillivray [43]introduce a numerical MLE procedure based on quantilefunctions but they conclude that ldquosample sizes significantlylarger than 100 should be used to obtain reliable estimatesrdquoSimulations in Section 5 show that the MLE using the closedform Lambert W times 119865

119883distribution converges quickly and is

accurate even for sample sizes as small as119873 = 10

41 Maximum Likelihood Estimation (MLE) For an iidsample (119910

1 119910

119873) = y sim 119892

119884(119910 | 120573 120575) the log-likelihood

function equals

ℓ (120579 y) =119873

sum

119894=1

log119892119884(119910

119894| 120573 120575) (26)

The MLE is that 120579 = (120573 120575) which maximizes (26) that is

120579MLE = (

120573 120575)MLE

= argmax120573120575ℓ (120573 120575 y) (27)

Since 119892119884(119910119894| 120573 120575) is a function of 119891

119883(119909119894| 120573) the MLE

depends on the specification of the input density Equation(26) can be decomposed as

ℓ (120573 120575 y) = ℓ (120573 x120591) +R (120591 y) (28)

where

ℓ (120573 x120591) =

119873

sum

119894=1

log119891119883(119882

120575(

119910119894minus 120583

119883

120590119883

)120590119883+ 120583

119883| 120573)

=

119873

sum

119894=1

log119891119883(119909

120591119894| 120573)

(29)

is the log-likelihood of the back-transformed data x120591=

(1199091205911 119909

120591119873) (via (8)) and

R (120591 y) =119899

sum

119894=1

log119877 (120583119883 120590119883 120575 119910

119894) (30)

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article The Lambert Way to Gaussianize Heavy

8 The Scientific World Journal

y

01

015 02 025 03 035 04

045 05

055

06 065 07 075 08 085 09 095

1

00 05 10 1500

05

10

15

20

25

30

120575

R(120575 y | 120583 = 0 120590 = 1)

(a) Penalty 119877(120583119883 120590119883 120575 119910119894)(31) as a function of120575 and 119910 (120583119883 = 0 and 120590119883 = 1)

0 200 400 600 800 1000

minus10

0

10

20

30

40

Index00 05 10 15

minus3000

minus2000

minus1000

0

Log-

likel

ihoo

d (a

nd p

enal

ty)

Penalty

Input loglikOutput loglik

120575

120575trueMLE

(b) Sample z of Lambert W times Gaussian with 120575 = 13 (left) log-likelihood ℓ(120579 y) (solid black)decomposes in input log-likelihood (dotted green) and penalty (dashed red)

Figure 6 Log-likelihood decomposition for Lambert W times 119865119883distributions

where

119877 (120583119883 120590119883 120575 119910

119894)

=

119882120575((119910

119894minus 120583

119883) 120590

119883)

((119910119894minus 120583

119883) 120590

119883) [1 + 120575 (119882

120575((119910

119894minus 120583

119883) 120590

119883))2]

(31)

Note that 119877(120583119883 120590119883 120575 119910

119894) only depends on 120583

119883(120573) and 120590

119883(120573)

(and 120575) but not necessarily on every coordinate of 120573Decomposition (28) shows the difference between the

exact MLE (120573 120575) based on y and the approximate MLE 120573x120591based on x

120591alone if we knew 120591 = (120583

119883 120590119883 120575) beforehand

then we could back-transform y to x120591and estimate 120573x120591 from

x120591(maximize (29) with respect to 120573) In practice however

120591 must also be estimated and this enters the likelihood viathe additive term R(120591 y) A little calculation shows that forany 119910

119894isin R log119877(120583

119883 120590119883 120575 119910

119894) le 0 if 120575 ge 0 with equality

if and only if 120575 = 0 Thus R(120591 y) can be interpreted as apenalty for transforming the data Maximizing (28) faces atrade-off between transforming the data to follow 119891

119883(119909 | 120573)

(and increasing ℓ(120573 x120591)) and the penalty of a more extreme

transformation (and decreasingR(120591 y)) see Figure 6(b)Figure 6(a) shows a contour plot of 119877(120583

119883= 0 120590

119883=

1 120575 119910) as a function of 120575 and 119910 = 119911 it increases (in absolutevalue) either if 120575 gets larger (for fixed 119910) or for larger 119910 (forfixed 120575) In both cases increasing 120575 makes the transformeddata119882

120575(119911) get closer to 0 = 120583

119883 which in turn increases its

input likelihood For 120575 = 0 the penalty disappears since inputequals output for 119910 = 0 there is no penalty since119882

120575(0) = 0

for all 120575Figure 6(b) shows an iid sample (119873 = 1000) z sim

Lambert W times Gaussian with 120575 = 13 and the decompositionof the log-likelihood as in (28) Since 120573 = (0 1) is knownthe likelihood and penalty are only functions of 120575Theorem 9

shows that the convexity of the penalty (decreasing red)and concavity of the input likelihood (increasing green) asa function of 120575 holds true in general for any data z and theirsum (solid black) has a unique maximum here 120575MLE = 037(blue dashed vertical line) See Theorem 9 below for details

The maximization of (28) can be carried out numericallyHere I show existence and uniqueness of 120575MLE assuming that120583119883and 120590

119883are known Further theoretical results for 120579MLE

remain for futureworkGiven the ldquonicerdquo formof119892119884(119910) which

is continuous twice differentiable (under the assumption that119891119909(sdot) is twice differentiable) the MLE for 120579 = (120573 120575) shouldhave the usual optimality properties such as being consistentand efficient [44]

411 Properties of the MLE for the Heavy Tail Parameter 120575Without loss of generality let 120583

119883= 0 and 120590

119883= 1 In this case

ℓ (120575 z) prop minus

1

2

119873

sum

119894=1

[119882120575(119911119894)]2+

119873

sum

119894=1

log119882120575(119911119894)

119911119894

minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(32)

= minus

1 + 120575

2

119873

sum

119894=1

[119882120575(119911119894)]2minus

119873

sum

119894=1

log (1 + 120575 [119882120575(119911119894)]2)

(33)

Theorem 9 (see [14]) Let 119885 have a Lambert W times Gaussiandistribution where 120583

119883= 0 and 120590

119883= 1 are assumed to be

known and fixed Also consider only 120575 isin [0infin) (While forsome samples z the MLE also exists for 120575 lt 0 it cannot beguaranteed for all z If 120575 lt 0 (and 119911 = 0) then119882

120575(119911) is either

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 9

not unique in R (principal and nonprincipal branch) or doesnot have a real-valued solution in R if 1205751199112 lt 119890minus1)

(a) If

sum119899

119894=11199114

119894

sum119899

119894=11199112

119894

le 3 (34)

then 120575119872119871119864

= 0If (34) does not hold then

(b) 120575119872119871119864

gt 0 exists and is a positive solution to

119873

sum

119894=1

1199112

119894119882

1015840(120575119911

2

119894)(

1

2

119882120575(119911119894)2minus (

1

2

+

1

1 +119882(1205751199112

119894)

)) = 0 (35)

(c) There is only one such 120575 satisfying (35) that is 120575119872119871119864

isunique

Proof See Appendix B in the supplementary material

Condition (34) says that 120575MLE gt 0 only if the datais heavy-tailed enough Points (b) and (c) guarantee thatthere is no ambiguity in the heavy tail estimate This is anadvantage over Studentrsquos 119905-distribution for example whichhas numerical problems and local maxima for unknown (andsmall) ] [45 46] On the contrary 120575MLE is always a globalmaximum

Given the heavy tails in z one might expect convergenceissues for larger 120575 However 119882

120575(119885) sim N(0 1) for the true

120575 ge 0 and close to a standard Gaussian if 120575MLE asymp 120575 Since thelog-likelihood and its gradient depend on 120575 and z only via119882120575(z) the performance of theMLE should thus not get worse

for large 120575 as long as the initial estimate is close enough to thetruth Simulations in Section 5 support this conjecture evenfor 120579MLE

42 Iterative Generalized Method of Moments (IGMM) Adisadvantage of the MLE is the mandatory a priori specifi-cation of the input distribution Especially for heavy-taileddata the eye is a bad judge to choose a particular parametric119891119883(119909 | 120573) It would be useful to directly estimate 120591 without

the intermediate step of estimating 120579 firstGoerg [16] presented an estimator for 120591 based on iterative

generalized methods of moments (IGMM) The idea ofIGMM is to find a 120591 such that the back-transformed datax120591has desired properties for example being symmetric or

having kurtosis 3 An estimator for 120583119883 120590

119883 and 120575 can be

constructed entirely analogous to the IGMMestimator for theskewed LambertWtimes F case See the SupplementaryMaterialAppendix C for details

IGMM requires less specific knowledge about the inputdistribution and is usually also faster than the MLE Once120591IGMM has been obtained the back-transformed x

120591IGMMcan be

used to check if 119883 has characteristics of a known parametricdistribution 119865

119883(119909 | 120573) It must be noted though that testing

for a particular distribution 119865119883is too optimistic as x

120591has

ldquonicerrdquo properties regarding 119865119883than the true x would have

However estimating the transformation requires only threeparameters and for a large enough sample losing threedegrees of freedom should not matter in practice

5 Simulations

This section explores finite sample properties of estimatorsfor 120579 = (120583

119883 120590119883 120575) and (120583

119884 120590119884) under Gaussian input

119883 sim N(120583119883 120590

2

119883) In particular it compares Gaussian MLE

(estimation of 120583119884and 120590

119884only) IGMM and Lambert W times

Gaussian MLE and for a heavy tail competitor the median(For IGMM optimization was restricted to 120575 isin [0 10]) Allresults below are based on 119899 = 1 000 replications

51 Estimating 120575 Only Here I show finite sample propertiesof 120575MLE for119880 simN(0 1) where 120583

119883= 0 and 120590

119883= 1 are known

and fixed Theorem 9 shows that 120575MLE is unique either at theboundary 120575 = 0 or at the globally optimal solution to (35)Results in Table 1 were obtained by numerical optimizationrestricted to 120575 ge 0 (hArr log 120575 isin R) using the nlm function in119877

Table 1 suggests that the MLE is asymptotically unbiasedfor every 120575 and converges quickly (at about 119873 = 400) to itsasymptotic variance which is increasing with 120575 Assuming120583119883and 120590

119883to be known is unrealistic and thus these finite

sample properties are only an indication of the behavior of thejoint MLE 120579MLE Nevertheless they are very remarkable forextremely heavy-tailed data (120575 gt 1) where classic average-based methods typically break down One reason lies in theparticular form of the likelihood (32) and its gradient (35)(Theorem 9) although both depend on z they only do sothrough 119882

120575(z) = u sim N(0 1) Hence as long as 120575MLE is

sufficiently close to the true 120575 (32) and (35) are functions ofalmost Gaussian random variables and standard asymptoticresults should still apply

52 Estimating All Parameters Jointly Here we consider therealistic scenario where 120583

119883and 120590

119883are also unknown We

consider various sample sizes (119873 = 50 100 and 1000) anddifferent degrees of heavy tails 120575 isin 0 13 1 15 each onerepresenting a particularly interesting situation (i) Gaussiandata (does additional superfluous estimation of120575 affect otherestimates) (ii) fourthmoments do not exist (iii) nonexistingmean and (iv) extremely heavy-tailed data (can we get usefulestimates at all)

The convergence tolerance for IGMM was set to tol =122 sdot 10

minus4 Table 2 summarizes the simulationThe Gaussian MLE estimates 120590

119884directly while IGMM

and the Lambert W times Gaussian MLE estimate 120575 and120590119883 which implicitly give

119884through 120590

119884(120575 120590

119883) = 120590

119883sdot

(1radic(1 minus 2120575)32) if 120575 lt 12 (see (18)) For a fair comparison

each subtable also includes a column for 119884

= 119883sdot

(1radic(1 minus 2120575)32) Some of these entries contain ldquoinfinrdquo even for

120575 lt 12 this occurs if at least one 120575 ge 12 For any 120575 lt 1120583119883= 120583

119884 thus 120583

119883and 120583

119884can be compared directly For 120575 ge 1

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article The Lambert Way to Gaussianize Heavy

10 The Scientific World Journal

Table 1 Finite sample properties of 120575MLE For each119873 120575 was estimated 119899 = 1000 times from a random sample zsimTukeyrsquos ℎ The left columnfor each 120575 shows bias 120575MLE minus 120575 each right column shows the root mean square error (RMSE) timesradic119873

119873 120575 = 0 120575 = 110 120575 = 13 120575 = 12

10 0025 0191 minus0017 0394 minus0042 0915 minus0082 1167

50 0013 0187 minus0010 0492 minus0018 0931 minus0016 1156

100 0010 0200 minus0010 0513 minus0009 0914 minus0006 1225

400 0005 0186 minus0003 0528 0000 0927 minus0004 1211

1000 0003 0197 0000 0532 minus0001 0928 minus0001 1203

2000 0003 0217 minus0001 0523 0000 0935 minus0001 1130

119873 120575 = 1 120575 = 2 120575 = 5

10 minus0054 1987 minus0104 3384 minus0050 7601

50 minus0017 1948 minus0009 3529 0014 7942

100 minus0014 2024 minus0001 3294 0011 7798

400 0001 1919 minus0002 3433 0001 7855

1000 0001 1955 0001 3553 minus0001 7409

2000 0001 1896 0000 3508 minus0001 7578

the mean does not exist each subtable for these 120575 interprets120583119884as the median

Gaussian Data (120575 = 0) This setting checks if imposing theLambert W framework even though it is superfluous causesa quality loss in the estimation of 120583

119884= 120583

119883or 120590

119884= 120590

119883

Furthermore critical values for 1198670 120575 = 0 (Gaussian) can

be obtained As in the 120575-only case above Table 2(a) suggeststhat estimators are asymptotically unbiased and quickly tendto a large-sample variance Additional estimation of 120575 doesnot affect the efficiency of120583

119883compared to estimating solely120583

Estimating 120590119884directly by Gaussian MLE does not give better

results than the Lambert W times Gaussian MLE

No Fourth Moment (120575 = 13) Here 120590119884(120575 120590

119883= 1) =

228 but fourth moments do not exist This results in anincreasing empirical standard deviation of

119910as119873 grows In

contrast estimates for 120590119883are not drifting off In the presence

of these large heavy tails themedian ismuch less variable thanGaussianMLE and IGMM Yet Lambert W timesGaussianMLEfor 120583

119883even outperforms the median

Nonexisting Mean (120575 = 1) Both sample moments divergeand their standard errors are also growing quickly Themedian still provides a very good estimate for the locationbut is again inferior to both Lambert W estimators whichare closer to the true values and appear to converge to anasymptotic variance at rateradic119873

Extreme Heavy Tails (120575 = 15) As in Section 51 IGMMand Lambert WMLE continue to provide excellent estimateseven though the data is extremely heavy tailed MoreoverLambert W MLE also has the smallest empirical standarddeviation overall In particular the Lambert W MLE for 120583

119883

has an approximately 15 lower standard deviation than themedian

The last column shows that for some 119873 about 1 of the119899 = 1 000 simulations generated invalid likelihood values(NA and infin) Here the search for the optimal 120575 led into

regions with a numerical overflow in the evaluation of119882120575(119911)

For a comparable summary these few cases were omitted andnew simulations added until full 119899 = 1 000 finite estimateswere found Since this only happened in 1 of the cases andalso such heavy-tailed data is rarely encountered in practicethis numerical issue is not a limitation in statistical practice

53 Discussion of the Simulations IGMM performs wellindependent of the magnitude of 120575 As expected the LambertWMLE for 120579 has the best properties it can recover the truthfor all 120575 and for 120575 = 0 it performs as well as the classicsample mean and standard deviation For small 120575 it has thesame empirical standard deviation as the Gaussian MLE buta lower one than the median for large 120575

Hence the only advantage of estimating 120583119884and 120590

119884by

sample moments of y is speed otherwise the Lambert W times

Gaussian MLE is at least as good as the Gaussian MLE andclearly outperforms it in presence of heavy tails

6 Applications

Tukeyrsquos ℎ distribution has already proven useful to modelheavy-tailed data but parametric inference was limited toquantile fitting ormethods ofmoments estimation [4 27 28]Theorem 7 allows us to estimate 120579 by ML

This section applies the presented methodology on sim-ulated as well as real world data (i) Section 61 demonstratesGaussianizing on the Cauchy sample from the Introductionand (ii) Section 62 shows that heavy tail Lambert W times

Gaussian distributions provide an excellent fit to daily SampP500 log-return series

61 Estimating Location of a Cauchy with the Sample MeanIt is well known that the sample mean y is a poor estimateof the location parameter of a Cauchy distribution since thesampling distribution of y is again a Cauchy (see [47] for arecent overview) in particular its variance does not go to 0for119873 rarr infin

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 11

Table 2 In each subtable (first rows) average (middle rows) proportion of estimates below truth and (bottom rows) empirical standarddeviation timesradic119873 True 120583

119883= 0 and 120590

119883= 1

(a) Truly Gaussian data 120575 = 0

120575 = 0 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 098 000 097 002 099 000 096 002 098 0

100 000 000 099 000 098 001 100 000 097 001 099 0

1000 000 000 100 000 099 000 100 000 099 000 100 0

50 050 050 057 051 060 066 054 051 065 066 056 0

100 050 051 056 051 062 062 053 052 065 062 056 0

1000 050 049 052 049 062 056 052 049 063 056 052 0

50 124 101 072 101 076 021 073 102 078 026 072 0

100 125 102 070 102 076 023 070 103 078 026 070 0

1000 126 098 073 098 079 022 073 098 079 022 073 0

(b) No fourth moments 120575 = 13

120575 = 13 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 000 198 000 107 029 infin 000 101 033 infin 0

100 000 000 203 000 104 031 infin 000 100 033 infin 0

1000 000 000 218 000 100 033 234 000 100 033 234 0

50 050 051 078 050 038 063 060 050 052 054 054 0

100 050 051 078 051 042 061 060 050 051 054 054 0

1000 048 051 077 051 047 056 055 051 050 053 052 0

50 127 221 656 144 145 110 NA 123 135 114 NA 0

100 130 233 1128 143 142 112 NA 119 134 109 NA 0

1000 123 225 1676 139 145 120 1597 117 133 108 1230 0

(c) Non-existing mean 120575 = 1

120575 = 1 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 000 minus010 246 minus001 118 090 infin 000 101 099 infin 0

100 000 074 724 000 109 095 infin 000 101 099 infin 0

1000 000 384 3481 000 101 100 infin 000 100 100 infin 0

50 053 052 10 051 034 065 1 051 052 052 1 0

100 050 052 10 051 038 063 1 050 053 053 1 0

1000 049 052 10 051 048 053 1 049 051 051 1 0

50 127 6585 4243 210 250 232 NA 119 170 216 NA 0

100 130 41075 40502 201 228 259 NA 117 174 225 NA 0

1000 126 330758 1040527 193 221 281 NA 111 164 218 NA 0

(d) Extreme heavy tails 120575 = 15

120575 = 15 Median Gaussian MLE IGMM Lambert WMLE NA119873 120583

119884120590119884

120583119883

120590119883

120575 120590119884

120583119883

120590119883

120575 120590119884

Ratio50 minus002 684 309 minus002 123 137 infin minus001 100 149 infin 001

100 000 minus5116 3080 minus001 112 144 infin 000 101 150 infin 000

1000 000 17613 14251 000 101 149 infin 000 100 150 infin 000

50 053 048 1 051 034 064 1 053 053 054 1 001

100 051 053 1 054 037 061 1 052 051 051 1 000

1000 050 050 1 050 047 054 1 049 053 052 1 000

50 132 134771 9261 257 320 312 NA 115 186 276 NA 001

100 133 4215628 418435 239 287 344 NA 112 178 284 NA 000

1000 126 12446282 3903629 218 266 367 NA 111 180 285 NA 000

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article The Lambert Way to Gaussianize Heavy

12 The Scientific World Journal

Table 3 Summary statistics for observed (heavy-tailed) y and back-transformed (Gaussianized) data x120591MLE

lowastlowast stands for lt10minus16and lowast forlt22 sdot 10

minus16

y sim C(0 1)

(Section 61)y = SampP 500(Section 62)

y x120591

x

y x120591

Min minus16159 minus316 0 minus711 minus242Max 95295 381 3318 499 223Mean 230 003 1498 005 005Median 004 004 1496 004 004Standard Deviation 46980 106 120 095 071Skewness 1743 012 390 minus030 minus004Kurtosis 34334 321 16175 770 293Shapiro-Wilk lowast 071 lowastlowast lowast 024Anderson-Darling lowastlowast 051 lowastlowast lowast 018

Heavy-tailed Lambert W times Gaussian distributions havesimilar properties to a Cauchy for 120575 asymp 1 The mean of119883 (120583

119883)

equals the location of119884 (119888) due to symmetry around 120583119883and 119888

respectively Thus we can estimate 120591 from the Cauchy sampley transform y to x

120591 get 120583

119883from x

120591= 119882

120591(y) and thus obtain

an estimate for 119888 by 119888 = 120583119883

The random sample y sim C(0 1) with pdf 119891(119910) = 1120587(1+1199102) in Figure 2(a) has heavy tails with two extreme (positive)

observations A Cauchy ML fit gives 119888 = 003(0055) and119904 = 086(0053) (standard errors in parenthesis) A LambertWtimesGaussianMLEgives120583

119883= 003(0055)120590

119883= 105(0072)

and 120575 = 086(0082) Thus both fits correctly fail to reject120583119883= 119888 = 0 Table 3(a) shows summary statistics on both

samples Since the Cauchy distribution does not have a well-defined mean y = 2304(2101) is not meaningful Howeverx120591MLE

is approximately Gaussian and we can use the sampleaverage for inference x

120591MLE= 0033(00472) correctly fails to

reject a zero location for y The transformed x120591MLE

featuresadditional Gaussian characteristics (symmetric no excesskurtosis) and even the null hypothesis of Normality cannotbe rejected (119875-valuege 05) Note however that Normality forthe transformed data is only an empirical approximation therandom variable119882

120591((119884 minus 120583

119883)120590

119883) where 119884 is Cauchy does

not have a Normal distributionFigure 2(d) shows the cumulative sample average for

the original sample and its Gaussianized version For afair comparison 120591(119899)MLE was reestimated cumulatively for each119899 = 5 500 and then used to compute (119909

1 119909

119899) The

transformation works extremely well data point 11991049is highly

influential for y but has no relevant effect on x120591(119899)

MLE Even for

small 119899 it is already clear that the location of the underlyingCauchy distribution is approximately zero

Although it is a simulated example it demonstrates thatremoving (strong) heavy tails from data works well andprovides ldquonicerdquo data that can then be analyzed with morerefined Gaussian methods

62 Heavy Tails in Finance SampP 500 Case Study A lot offinancial data displays negative skewness and excess kurto-sis Since financial data in general is not iid it is oftenmodeled with a (skew) Studentrsquos 119905-distribution underlyinga (generalized) autoregressive conditional heteroskedastic(GARCH) [2 48] or a stochastic volatility (SV) [49 50]model Using the Lambert W approach we can build uponthe knowledge and implications of Normality (and avoidderiving properties of a GARCH or SV model with heavy-tailed innovations) and simply ldquoGaussianizerdquo the returnsbefore fitting more complex GARCH or SV models

Remark 10 Time series models with Lambert W times Gaussianwhite noise are far beyond the scope of this work but canbe a direction of future research Here I only consider theunconditional distribution

Figure 7(a) shows the SampP 500 log-returns with a totalof 119873 = 2 780 daily observations (R package MASSdataset SP500) Table 3(b) confirms the heavy tails (samplekurtosis 770) but also indicates negative skewness (minus0296)As the sample skewness 120574

1(y) is very sensitive to outliers

we fit a distribution which allows skewness and test forsymmetry In case of the double-tail Lambert W times Gaussianthis means testing 119867

0 120575

ℓ= 120575

119903= 120575 versus 119867

1

120575ℓ

= 120575119903 Using the likelihood expression in (28) we can

use a likelihood ratio test with one degree of freedom (3versus 4 parameters) The log-likelihood of the double-tail fit(Table 4(a)) equals minus36060 = minus297227 + (minus63373) (inputlog-likelihood + penalty) while the symmetric 120575 fit givesminus360656 = minus297147 + (minus63509) Here the symmetric fitgives a transformed sample that is more Gaussian but it paysa greater penalty for transforming the data Comparing twicetheir difference to a1205942

1distribution gives a119875-value of 029 For

comparison a skew-119905 fit [51] with location 119888 scale 119904 shape120572 and ] degrees of freedom also yields (Function stmle

in the R package sn) a nonsignificant (Table 4(b)) Thusboth fits cannot reject symmetry

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 13: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 13

Time

y

0 500 1500 2500

minus6

minus2

2

0 5 2010 3000

04

08

Lag

ACF

y

minus5 0 500

03

06

KernelGaussian

minus3 minus2 minus1 0 1 2 3

minus6

minus2

2

Theoretical quantiles

Sam

ple q

uant

ilesMean 005

SD 095Skewminus03Kurt 771

(a) Observed heavy tail returns y

Time

x

0 500 1500 2500

minus2

012

0 5 10 20 3000

04

08

Lag

ACF

x

00

03

06

minus3 minus2 minus1 0 1 2 3minus3 minus2 minus1 0 1 2 3

minus2

012

Theoretical quantiles

KernelGaussian

Sam

ple q

uant

iles

Mean 005SD 07Skewminus004Kurt 293

(b) Gaussianized returns x120591MLE

Figure 7 Lambert W Gaussianization of SampP 500 log-returns 120591 = (005 070 017) In (a) and (b) data (top left) autocorrelation function(ACF) (top right) histogram Gaussian fit and KDE (bottom left) Normal QQ plot (bottom right)

Assume we have to make a decision if we should trade acertificate replicating the SampP 500 Since we can either buyor sell it is not important if the average return is positive ornegative as long as it is significantly different from zero

621 Gaussian Fit to Returns Estimating (120583119884 120590119884) by Gaus-

sianMLE and thus ignoring the heavy tails 120583119884= 0 cannot be

rejected on a 120572 = 1 level (Table 4(e)) Ignoring heavy tailswe would thus decide to not trade a replicating certificate atall

622 Heavy Tail Fit to Returns Both a heavy tail LambertWtimes Gaussian (Table 4(c)) and Student 119905-fit (Table 4(d)) rejectthe zero mean null (119875-values 10minus4 and 3 sdot 10minus5 resp)

Location and scale estimates are almost identical buttail estimates lead to different conclusions while for ] =

371 only moments up to order 3 exist in the Lambert W times

Gaussian case moments up to order 5 exist (10172 = 581)This is especially noteworthy as many theoretical results inthe (financial) time series literature rely on finite fourthmoments [52 53] consequently many empirical studies testthis assumption [7 54] Here Studentrsquos 119905 and a LambertW times Gaussian fit give different conclusions Since previousempirical studies often use Studentrsquos 119905 as a baseline [41] itmight be worthwhile to reexamine their findings in light ofheavy tail Lambert W times Gaussian distributions

623 ldquoGaussianizingrdquo Financial Returns The back-trans-formed x

120591MLEis indistinguishable from a Gaussian sample

(Figure 7(b)) and thus demonstrates that a Lambert W times

Gaussian distribution is indeed appropriate for x120591MLE

Noteven one test can reject Normality 119875-values are 018 018031 and 024 respectively (Anderson-Darling Cramer-von-Mises Shapiro-Francia Shapiro-Wilk see Thode [55])Table 3(b) confirms that Lambert W ldquoGaussianiziationrdquo was

successful 1205741(x120591) = minus0039 and 120574

2(x120591) = 293 are within the

typical variation for aGaussian sample of size119873 = 2780Thus

119884 = (119880119890(01722)119880

2

) 0705 + 0055 119880 =

119883 minus 0055

0705

119880 simN (0 1)

(36)

is an adequate (unconditional) LambertWtimesGaussianmodelfor the SampP 500 log-returns y For trading thismeans that theexpected return is significantly larger than zero (120583

119883= 0055 gt

0) and thus replicating certificates should be bought

624 Gaussian MLE for Gaussianized Data For 120575119897= 120575

119903equiv

120575 lt 1 also 120583119883equiv 120583

119884 We can therefore replace testing

120583119910= 0 versus 120583

119910= 0 for a non-Gaussian y with the

very well-understood hypothesis test 120583119909= 0 versus 120583

119909= 0

for the Gaussian x120591MLE

In particular standard errors basedon radic119873 and thus t and 119875-values should be closer to theldquotruthrdquo (Tables 4(c) and 4(d)) than a Gaussian MLE on thenon-Gaussian y (Table 4(e)) Table 4(f) shows that standarderrors for 120583x are even a bit too small compared to the heavy-tailed versions Since the ldquoGaussianizingrdquo transformation wasestimated treating x

120591MLEas if it was original data is too

optimistic regarding its Normality (recall the penalty (30) inthe total likelihood (28))

This example confirms that if a model and its theoret-ical properties are based on Normality but the observeddata is heavy-tailed then Gaussianizing the data first givesmore reliable inference than applying Gaussian methods tothe original heavy-tailed data (Figure 1) Clearly a jointestimation of the model parameters based on Lambert Wtimes Gaussian random variables (or any other heavy-taileddistribution) would be optimal However theoretical prop-erties and estimation techniques may not be available orwell understood The Lambert Way to Gaussianize data is

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 14: Research Article The Lambert Way to Gaussianize Heavy

14 The Scientific World Journal

Table 4 MLE fits to SampP 500 y Tables 4(a) 4(b) 4(c) 4(d) and 4(e)and the Gaussianized data x

120591MLETable 4(f)

(a) Double-tail Lambert W times Gaussian = Tukeyrsquos ℎℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 366 000120590119883

071 0016 4400 000120575ℓ

019 0021 899 000120575119903

016 0019 824 000

(b) Skew 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 010 0061 165 010119904 067 0017 3847 000120572 minus008 0101 minus077 044] 373 0297 1257 000

(c) Lambert W times Gaussian = Tukeyrsquos ℎ (SampP 500)

Est se 119905 Pr (gt|119905|)120583119883

006 0015 365 0000120590119883

071 0016 4395 0000120575 017 0016 1105 0000

(d) Studentrsquos 119905 (SampP 500)

Est se 119905 Pr (gt|119905|)119888 006 0015 365 000119904 067 0017 3951 000] 372 0295 1261 000

(e) Gaussian (SampP 500)

Est se 119905 Pr (gt|119905|)120583119884

005 0018 255 001120590119884

095 0013 7457 000

(f) Gaussian (x120591MLE )

Est se 119905 Pr (gt|119905|)120583119909120591

005 0013 381 000120590119909120591

071 0009 7457 000

thus a pragmatic method to improve statistical inferenceon heavy-tailed data while preserving the applicability andinterpretation of Gaussian models

7 Discussion and Outlook

In this work I use the Lambert W function to modeland remove heavy tails from continuous random variablesusing a data-transformation approach For Gaussian randomvariables this not only contributes to existing work onTukeyrsquos ℎ distribution but also gives convincing empiricalresults unimodal data with heavy tails can be transformedto Gaussian data Properties of a Gaussian model MN onthe back-transformed data mimic the features of the ldquotruerdquoheavy-tailed modelM

119866very closely

Since Normality is the single most typical and oftenrequired assumption in many areas of statistics machine

learning and signal processing future research can takemany directions From a theoretical perspective properties ofLambert W times 119865

119883distributions viewed as a generalization of

alreadywell-known distributions119865119883can be studiedThis area

will profit from existing literature on the Lambert W func-tion which has been discovered only recently by the statisticscommunity Empirical work can focus on transforming thedata and comparing approximate Gaussian with joint heavytail analyses The comparisons in this work showed thatapproximate inference for Gaussianized data is comparablewith the direct heavy tail modeling and so provides a simpletool to improve inference for heavy-tailed data in statisticalpractice

I also provide the R package Lambert W publicly avail-able at CRAN to facilitate the use of heavy tail Lambert Wtimes 119865

119883distributions in practice

Disclosure

The first version of this article entitled ldquoClosed-form cdfand pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert Wapproach and how to bijectively lsquoGaussianizersquo heavy-taileddatardquo has appeared online [56]This research was performedwhile the author was at Carnegie Mellon University Cur-rently Georg M Goerg works at Google Inc New York NY10011 USA

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

References

[1] P J Brockwell andR ADavisTime SeriesTheory andMethodsSpringer Series in Statistics 1998

[2] R F Engle ldquoAutoregressive conditional heteroscedasticity withestimates of the variance of United Kingdom inflationrdquo Econo-metrica vol 50 no 4 pp 987ndash1007 1982

[3] C W J Granger and R Joyeux ldquoAn introduction to long-memory time series models and fractional differencingrdquo Jour-nal of Time Series Analysis vol 1 pp 15ndash30 2001

[4] C A Field ldquoUsing the gh distribution to model extreme windspeedsrdquo Journal of Statistical Planning and Inference vol 122 no1-2 pp 15ndash22 2004

[5] A Vazquez J G Oliveira Z Dezso K-I Goh I Kondor andA-L Barabasi ldquoModeling bursts and heavy tails in humandynamicsrdquo Physical Review E vol 73 no 3 Article ID 0361272006

[6] M Gidlund and N Debernardi ldquoScheduling performance ofheavy-tailed data traffic in wireless high-speed shared chan-nelsrdquo in Proceedings of the IEEE Wireless Communications ampNetworking Conference (WCNC rsquo09) pp 1818ndash1823 IEEE PressBudapest Hungary April 2009

[7] R Cont ldquoEmpirical properties of asset returns stylized factsand statistical issuesrdquo Quantitative Finance vol 1 pp 223ndash2362001

[8] T-H Kim and H White ldquoOn more robust estim ation ofskewness and kurtosis simulation and application to the SP500indexrdquo Tech Rep Department of Economics UCSD 2003

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 15: Research Article The Lambert Way to Gaussianize Heavy

The Scientific World Journal 15

[9] T C Aysal and K E Barner ldquoSecond-order heavy-taileddistributions and tail analysisrdquo IEEE Transactions on SignalProcessing vol 54 no 7 pp 2827ndash2832 2006

[10] V K Smith ldquoLeast squares regression with Cauchy errorsrdquoOxford Bulletin of Economics and Statistics vol 35 no 3 pp223ndash231 1973

[11] J Ilow ldquoForecasting network traffic using FARIMAmodels withheavy tailed innovationsrdquo inProceedings of the IEEE InterntionalConference onAcoustics Speech and Signal Processing pp 3814ndash3817 June 2000

[12] W Palma and M Zevallos ldquoFitting non-Gaussian persistentdatardquo Applied Stochastic Models in Business and Industry vol27 no 1 pp 23ndash36 2011

[13] J Nowicka-Zagrajek and R Weron ldquoModeling electricity loadsin California ARMA models with hyperbolic noiserdquo SignalProcessing vol 82 no 12 pp 1903ndash1915 2002

[14] R J Adler R E Feldman and M S Taqqu Eds A PracticalGuide to Heavy Tails Statistical Techniques and ApplicationsBirkhauser Cambridge Mass USA 1998

[15] H Liu J Lafferty and L Wasserman ldquoThe nonparanormalsemiparametric estimation of high dimensional undirectedgraphsrdquo Journal ofMachine Learning Research vol 10 pp 2295ndash2328 2009

[16] G M Goerg ldquoLambert W random variablesmdasha new familyof generalized skewed distributions with applications to riskestimationrdquo The Annals of Applied Statistics vol 5 no 3 pp2197ndash2230 2011

[17] D C Hoaglin ldquoSummarizing shape numerically the g-and-hdistributionsrdquo in Exploring Data Tables Trends and Shapes DC Hoaglin F Mosteller and J W Tukey Eds chapter 11 pp461ndash513 John Wiley amp Sons Hoboken NJ USA 1985

[18] RDevelopment Core Team R A Language and Environment forStatistical Computing R Foundation for Statistical ComputingVienna Austria 2010

[19] C Field and M G Genton ldquoThe Multivariate g-and-h Distri-butionrdquo Technometrics vol 48 no 1 pp 104ndash111 2006

[20] RM Sakia ldquoThe Box-Cox transformation technique a reviewrdquoJournal of the Royal Statistical Society Series D The Statisticianvol 41 no 2 pp 169ndash178 1992

[21] J R Blaylock L E Salathe and R D Green ldquoA note on the Box-Cox transformation under heteroskedasticityrdquoWestern Journalof Agricultural Economics vol 5 no 2 pp 129ndash136 1980

[22] A J Lawrance ldquoA note on the variance of the Box-Cox regres-sion transformation estimaterdquo Journal of the Royal StatisticalSociety C vol 36 no 2 pp 221ndash223 1987

[23] G Tsiotas ldquoOn the use of the Box-Cox transformation onconditional variance modelsrdquo Finance Research Letters vol 4no 1 pp 28ndash32 2007

[24] S Goncalves andNMeddahi ldquoBox-Cox transforms for realizedvolatilityrdquo Journal of Econometrics vol 160 no 1 pp 129ndash1442011

[25] N H Bingham C M Goldie and J L Teugels RegularVariation Encyclopedia of Mathematics and its ApplicationsCambridge University Press Cambridge UK 1989

[26] S Morgenthaler and J W Tukey ldquoFitting quantiles doublingHRHQ andHHH distributionsrdquo Journal of Computational andGraphical Statistics vol 9 no 1 pp 180ndash195 2000

[27] M Fischer ldquoGeneralized Tukey-type distributions with appli-cation to financial and teletraffic datardquo Statistical Papers vol 51no 1 pp 41ndash56 2010

[28] T C Headrick R K Kowalchuk and Y Sheng ldquoParametricprobability densities and distribution functions for Tukey g-and-h transform ations and their use for fi tting datardquo AppliedMathematical Sciences vol 2 no 9 pp 449ndash462 2008

[29] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth ldquoOn the Lambert 119882 functionrdquo Advances inComputational Mathematics vol 5 no 4 pp 329ndash359 1996

[30] M Rosenlicht ldquoOn the explicit solvability of certain transcen-dental equationsrdquo Publications Mathematiques de lrsquoInstitut desHautes Etudes Scientifiques no 36 pp 15ndash22 1969

[31] S R Valluri D J Jeffrey and R M Corless ldquoSome applicationsof the Lambert W function to physicsrdquo Canadian Journal ofPhysics vol 78 no 9 pp 823ndash831 2000

[32] M Galassi J Davies J Theiler et al GNU Scientific LibraryReference Manual 3rd edition 2011 httpwwwgnuorgsoft-waregsl

[33] P Jodra ldquoA closed-form expression for the quantile functionof the Gompertz-Makeham distributionrdquo Mathematics andComputers in Simulation vol 79 no 10 pp 3069ndash3075 2009

[34] A G Pakes ldquoLambertrsquos W infinite divisibility and poissonmixturesrdquo Journal of Mathematical Analysis and Applicationsvol 378 no 2 pp 480ndash492 2011

[35] M Stehlık ldquoDistributions of exact tests in the exponentialfamilyrdquoMetrika vol 57 no 2 pp 145ndash164 2003

[36] J-N Hwang S-R Lay and A Lippman ldquoNonparametricmultivariate density estimation a comparative studyrdquo IEEETransactions on Signal Processing vol 42 no 10 pp 2795ndash28101994

[37] L Wasserman All of Nonparametric Statistics Springer Texts inStatistics 2007

[38] R E Maiboroda and N M Markovich ldquoEstimation of heavy-tailed probability density functionwith application toweb datardquoComputational Statistics vol 19 no 4 pp 569ndash592 2004

[39] N M Markovich ldquoAccuracy of transformed kernel densityestimates for a heavy-tailed distributionrdquo Automation andRemote Control vol 66 no 2 pp 217ndash232 2005

[40] K K Dutta and D Babbel ldquoOn measuring skewness andkurtosis in short rate distributions the case of the US dollarLondon inter bank offer ratesrdquo Tech Rep Wharton SchoolCenter for Financial Institutions University of Pennsylvania2002

[41] C S Wong W S Chan and P L Kam ldquoA Student t-mixtureautoregressivemodel with applications to heavy-tailed financialdatardquo Biometrika vol 96 no 3 pp 751ndash760 2009

[42] J Yan ldquoAsymmetry fat-tail and autoregressive conditionaldensity in financial return data with systems of frequencycurvesrdquo 2005 httpciteseerxistpsueduviewdocsummarydoi=1011762741

[43] G D Rayner and H L MacGillivray ldquoNumerical maximumlikelihood estimation for the g-and-k and generalized g-and-hdistributionsrdquo Statistics and Computing vol 12 no 1 pp 57ndash752002

[44] E L Lehmann and G Casella Theory of Point EstimationSpringer Texts in Statistics Springer New York NY USA 2ndedition 1998

[45] C Fernandez andM F Steel ldquoMultivariate Student-119905 regressionmodels pitfalls and inferencerdquo Biometrika vol 86 no 1 pp153ndash167 1999

[46] C Liu and D B Rubin ldquoML estimation of the t distributionusing EM and its extensions ECMand ECMErdquo Statistica Sinicavol 5 pp 19ndash39 1995

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 16: Research Article The Lambert Way to Gaussianize Heavy

16 The Scientific World Journal

[47] M P Cohen ldquoSample means of independent standard CAUchyrandom variables are standard cauchy a new approachrdquo TheAmerican Mathematical Monthly vol 119 no 3 pp 240ndash2442012

[48] T Bollerslev ldquoGeneralized autoregressive conditional het-eroskedasticityrdquo Journal of Econometrics vol 31 no 3 pp 307ndash327 1986

[49] R Deo C Hurvich and Y Lu ldquoForecasting realized volatilityusing a long-memory stochastic volatility model estimationprediction and seasonal adjustmentrdquo Journal of Econometricsvol 131 no 1-2 pp 29ndash58 2006

[50] AMelino and SM Turnbull ldquoPricing foreign currency optionswith stochastic volatilityrdquo Journal of Econometrics vol 45 no 1-2 pp 239ndash265 1990

[51] A Azzalini and A Capitanio ldquoDistributions generated byperturbation of symmetrywith emphasis on amultivariate skewt-distributionrdquo Journal of the Royal Statistical Society Series BStatistical Methodology vol 65 no 2 pp 367ndash389 2003

[52] R N Mantegna and H E Stanley ldquoModeling of financial datacomparison of the truncated Levy flight and the ARCH(1) andGARCH(11) processesrdquo Physica A Statistical Mechanics and itsApplications vol 254 no 1-2 pp 77ndash84 1998

[53] P A Zadrozny Necessary and Sufficient Restrictions for Exis-tence of a Unique Fourth Moment of a Univariate GARCH(pq)Process vol 20 of Advances in Econometrics chapter 13Emerald Group Publishing 2005 httpsideasrepecorgpcesceswps 1505html

[54] R Huisman K G Koedijk C J Kool and F Palm ldquoTail-indexestimates in small samplesrdquo Journal of Business amp EconomicStatistics vol 19 no 2 pp 208ndash216 2001

[55] H CThode JrTesting for Normality CRCPress NewYorkNYUSA 2002

[56] G M Goerg ldquoClosed-form cdf and pdf of Tukeyrsquos ℎ-distribution the heavy-tail Lambert W approach and howto bijectively ldquoGaussianizerdquo heavy-tailed datardquo Tech RepCarnegieMellonUniversity 2010 httparxiv-web3librarycor-nelleduabs10102265v1

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 17: Research Article The Lambert Way to Gaussianize Heavy

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of