Metrika
DOI 10.1007/s00184-011-0365-5

Exploring the relation between the r∗ approximation and the Edgeworth expansion

Jorge M. Arevalillo

Received: 16 December 2009
© Springer-Verlag 2011

Abstract In this paper we study the relation between the r∗ saddlepoint approximation and the Edgeworth expansion when quite general assumptions for the statistic under consideration are fulfilled. We will show that the two term Edgeworth expansion approximates the r∗ formula up to an $O(n^{-3/2})$ remainder; this provides a new way of looking at the order of the error of the r∗ approximation. This finding will be used to inspect the close connection between the r∗ formula and the Edgeworth B adjustment introduced in Phillips (Biometrika 65:91–98, 1978). We will show that, whenever an Edgeworth expansion exists, this adjustment approximates both the distribution function of the statistic and the r∗ formula to the same order as the Edgeworth expansion. Some numerical examples for the sample mean and U-statistics are given in order to shed light on the theoretical discussion.

Keywords Asymptotic statistics · Edgeworth expansion · r∗ saddlepoint approximation · Cumulants

1 Introduction

The theory of high order approximations comprises Edgeworth and saddlepoint ones. The former have a long history in statistics and date back to Francis Ysidro Edgeworth. The latter were introduced (Daniels 1954) to approximate the density function of the sample mean. Later on, new saddlepoint formulae were developed to approximate the distribution function (Lugannani and Rice 1980; Robinson 1982; Jensen 1992); they have proved to be competitive with the Edgeworth expansion and even to improve on it, especially for small sample sizes (Field and Ronchetti 1990; Kakizawa and Taniguchi 1994).

J. M. Arevalillo (B)
Department of Statistics, Operational Research and Numerical Analysis, University Nacional Educación a Distancia (UNED), Paseo Senda del Rey 9, 28040 Madrid, Spain
e-mail: [email protected]

The comparison of saddlepoint approximations with the Edgeworth expansion has been a subject of research since Daniels' pioneering work. The relation between them was studied for densities (Barndorff-Nielsen and Cox 1979; Monti 1993) and later on for distribution functions (Kakizawa and Taniguchi 1994), with the focus on the Lugannani-Rice and Esscher saddlepoint formulae. The recent manual on asymptotic statistics (DasGupta 2008) contains a nice account of the foundations of Edgeworth and saddlepoint approximations and also gives a complete list of key references in the literature.

This paper explores the connection between the r∗ formula, a simple saddlepoint approximation, and the Edgeworth expansion; the relation between them allows us to deal with an r∗ adjustment, which approximates the distribution function of the statistic up to third order, providing an easy to implement alternative. The work is organized as follows: Sect. 2 establishes some assumptions for the statistic under consideration and introduces both the Edgeworth and r∗ saddlepoint approximations; their relation is discussed in Sect. 3. In Sect. 4 we give several examples so that all the approximations are compared numerically. Finally, we establish some concluding remarks.

2 Edgeworth expansions and the r∗ approximation

Let us consider an asymptotically normal statistic $Z_n$, based on a random sample of variables $X_1, X_2, \ldots, X_n$ with common distribution $F$. Assume that $Z_n$ has absolutely continuous distribution $F_n$, finite moment generating function $M_n(t)$ within an interval $[-r, s]$, with $r \ge 0$, $s \ge 0$ and $r + s > 0$, containing the origin and cumulant generating function (c.g.f) $K_n(t) = \log M_n(t)$.

Assume that the cumulants of $Z_n$ are expandable as a convergent series in powers of $n^{-1/2}$:

$$k_1^{(n)} = \frac{k_{1,1}}{n^{1/2}} + \frac{k_{1,2}}{n} + \cdots, \qquad k_2^{(n)} = 1 + \frac{k_{2,1}}{n^{1/2}} + \frac{k_{2,2}}{n} + \cdots$$

$$k_j^{(n)} = \frac{k_{j,1}}{n^{(j-2)/2}} + \frac{k_{j,2}}{n^{(j-1)/2}} + \cdots, \qquad j \ge 3.$$

These assumptions on the cumulants are general enough to cover a wide range of situations such as the sample mean of independent random variables, smooth functions of the mean vector (Hall 1997), a wide range of U-statistics (Callaert et al. 1980) or a class of estimators of the parameter of the first-order autoregressive process (Ochi 1983).

Under these assumptions, an Edgeworth expansion for the distribution function $F_n$ of the statistic can be derived formally by applying the device used by Hall (1997), so we obtain that

$$F_n(x) = \Phi(x) - \phi(x)\left[n^{-1/2}p_1(x) + n^{-1}p_2(x) + \cdots + n^{-j/2}p_j(x)\right] + O\!\left(n^{-\frac{j+1}{2}}\right), \tag{1}$$

with $\Phi$ and $\phi$ the distribution and density functions of the $N(0, 1)$ law.

When certain regularity conditions are met, the validity of (1) can be proved theoretically even with an $o(n^{-j/2})$ remainder (Hall 1997; Lahiri 2003).

The first two polynomials in (1) give the two term Edgeworth expansion. They are given by

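$$p_1(x) = k_{1,1} + \frac{k_{2,1}x}{2} + \frac{k_{3,1}(x^2 - 1)}{6}$$

$$\begin{aligned}
p_2(x) = {} & k_{1,2} + \left(\frac{k_{1,1}^2 + k_{2,2}}{2}\right)x + \left(\frac{k_{3,2}}{6} + \frac{k_{1,1}k_{2,1}}{2}\right)(x^2 - 1) \\
& + \left(\frac{k_{4,1}}{24} + \frac{k_{2,1}^2}{8} + \frac{k_{1,1}k_{3,1}}{6}\right)(x^3 - 3x) \\
& + \frac{k_{2,1}k_{3,1}}{12}(x^4 - 6x^2 + 3) + \frac{k_{3,1}^2}{72}(x^5 - 10x^3 + 15x)
\end{aligned}$$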
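The two polynomials translate directly into code. The following is a minimal sketch, assuming the coefficients $k_{j,i}$ are supplied in a dictionary keyed by `(j, i)`; the function names and the dictionary layout are illustrative choices, not part of the paper. The sample coefficient set uses the values $k_{3,1} = 2$, $k_{4,1} = 6$ that Sect. 4.1 reports for the exponential mean.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def edgeworth_polys(x, k):
    """Return (p1(x), p2(x)); k maps (j, i) to k_{j,i}, missing entries count as 0."""
    def c(j, i):
        return k.get((j, i), 0.0)
    p1 = c(1, 1) + c(2, 1) * x / 2.0 + c(3, 1) * (x**2 - 1.0) / 6.0
    p2 = (c(1, 2)
          + (c(1, 1)**2 + c(2, 2)) / 2.0 * x
          + (c(3, 2) / 6.0 + c(1, 1) * c(2, 1) / 2.0) * (x**2 - 1.0)
          + (c(4, 1) / 24.0 + c(2, 1)**2 / 8.0 + c(1, 1) * c(3, 1) / 6.0) * (x**3 - 3.0 * x)
          + c(2, 1) * c(3, 1) / 12.0 * (x**4 - 6.0 * x**2 + 3.0)
          + c(3, 1)**2 / 72.0 * (x**5 - 10.0 * x**3 + 15.0 * x))
    return p1, p2

def edgeworth_two_term(x, n, k):
    """Two term Edgeworth expansion: expansion (1) truncated after p2."""
    p1, p2 = edgeworth_polys(x, k)
    return std_normal_cdf(x) - std_normal_pdf(x) * (p1 / math.sqrt(n) + p2 / n)

# Illustrative coefficient set: exponential mean of Sect. 4.1.
EXPONENTIAL_MEAN = {(3, 1): 2.0, (4, 1): 6.0}
```

A call such as `edgeworth_two_term(1.0, 10, EXPONENTIAL_MEAN)` evaluates the truncated expansion at $x = 1$ for $n = 10$; nothing in the sketch prevents the result from leaving $[0, 1]$ or from decreasing in $x$.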

Other high order approximations were obtained by the saddlepoint method or by exponential tilting (Lugannani and Rice 1980; Daniels 1987). A simple adjustment of them is the r∗ formula (Jensen 1992; Reid 1996), given by

$$F_n(x) \approx \Phi(r_n(\tau)) \tag{2}$$

where the quantity $r_n(\tau)$ is defined by

$$r_n(\tau) = \xi_n(\tau) + \frac{1}{\xi_n(\tau)}\log\frac{z_n(\tau)}{\xi_n(\tau)},$$

with $\xi_n(\tau) = \mathrm{sgn}(\tau)\left[2(\tau x - K_n(\tau))\right]^{1/2}$, $z_n(\tau) = \tau\sqrt{K_n''(\tau)}$ and $\tau$ the point solving the saddlepoint equation $K_n'(\tau) = x$.

Usually, the error in (2) is of order $O(n^{-3/2})$ provided that $x = O(1)$.
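Formula (2) can be evaluated for any statistic whose c.g.f. and first two derivatives are available. The sketch below is one possible implementation, assuming the caller supplies `K`, `K1` and `K2` as callables together with a bracket `[lo, hi]` on which $K_n'$ is increasing; the bisection solver and all function names are illustrative assumptions, not the author's code. It is exercised on the c.g.f. that Sect. 4.3 gives for the sample variance of normal data.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve_saddlepoint(K1, x, lo, hi, tol=1e-12, max_iter=200):
    """Bisection for K1(tau) = x; assumes K1 is increasing on [lo, hi] and brackets x."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_star_cdf(x, K, K1, K2, lo, hi):
    """Phi(r_n(tau)) from formula (2); not defined at tau = 0, where xi vanishes."""
    tau = solve_saddlepoint(K1, x, lo, hi)
    xi = math.copysign(math.sqrt(2.0 * (tau * x - K(tau))), tau)
    z = tau * math.sqrt(K2(tau))
    return std_normal_cdf(xi + math.log(z / xi) / xi)

def sample_variance_cgf(n):
    """c.g.f. of Sect. 4.3 (normal data) with its derivatives; valid for t < sqrt((n-1)/2)."""
    m = (n - 1) / 2.0
    rm = math.sqrt(m)
    K = lambda t: -t * rm - m * math.log(1.0 - t / rm)
    K1 = lambda t: -rm + rm / (1.0 - t / rm)
    K2 = lambda t: 1.0 / (1.0 - t / rm) ** 2
    return K, K1, K2, rm
```

For instance, `K, K1, K2, t_max = sample_variance_cgf(10)` followed by `r_star_cdf(1.0, K, K1, K2, 0.0, t_max * (1.0 - 1e-9))` evaluates (2) at $x = 1$. Bisection is chosen here only because it cannot step outside the domain of the c.g.f.; a Newton iteration would be a reasonable alternative.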

3 High order connections between r∗ formula and the Edgeworth expansion

Theorem 1 of this section establishes the connection between the r∗ formula and the two term Edgeworth expansion. Some lemmas providing expansions in powers of $n^{-1/2}$ of the saddlepoint $\tau$ and the quantities $\xi_n(\tau)$, $z_n(\tau)$ and $r_n(\tau)$ in (2) are needed in advance.

Lemma 1 For every $x$ such that $x = O(1)$ the saddlepoint $\tau$ verifies that $\tau = x + \frac{\tau_1}{\sqrt{n}} + \frac{\tau_2}{n} + O(n^{-3/2})$, where

$$\tau_1 = -\left(k_{1,1} + k_{2,1}x + \frac{k_{3,1}x^2}{2}\right)$$

and

$$\tau_2 = k_{1,1}k_{2,1} - k_{1,2} + \left(k_{2,1}^2 + k_{1,1}k_{3,1} - k_{2,2}\right)x + \left(\frac{3k_{2,1}k_{3,1}}{2} - \frac{k_{3,2}}{2}\right)x^2 + \left(\frac{k_{3,1}^2}{2} - \frac{k_{4,1}}{6}\right)x^3$$

Proof See Sect. 3 in Kakizawa and Taniguchi (1994).
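As a quick sanity check of Lemma 1, take the standardized mean of exponential variables with unit mean treated in Sect. 4.1, where $K_n(t) = -n\log(1 - t/\sqrt{n}) - \sqrt{n}\,t$, so that $k_{3,1} = 2$, $k_{4,1} = 6$ and every other coefficient vanishes. The closed-form saddlepoint below is worked out here for illustration and is not part of the original argument:

$$K_n'(\tau) = \frac{\tau}{1 - \tau/\sqrt{n}} = x \;\Longrightarrow\; \tau = \frac{x}{1 + x/\sqrt{n}} = x - \frac{x^2}{\sqrt{n}} + \frac{x^3}{n} + O(n^{-3/2}),$$

which agrees with $\tau_1 = -k_{3,1}x^2/2 = -x^2$ and $\tau_2 = \left(k_{3,1}^2/2 - k_{4,1}/6\right)x^3 = x^3$.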

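Lemma 2 For every $x$ such that $x = O(1)$ the function $z_n(\tau)$ has an expansion in powers of $n^{-1/2}$, which up to the $O(n^{-3/2})$ term, is given by $z_n = z_n(\tau) = x + \frac{z_1}{\sqrt{n}} + \frac{z_2}{n} + O(n^{-3/2})$, where $z_1 = -\left(k_{1,1} + \frac{k_{2,1}x}{2}\right)$ and

$$z_2 = \frac{k_{2,1}k_{1,1}}{2} - k_{1,2} + \left(\frac{3k_{2,1}^2}{8} - \frac{k_{2,2}}{2}\right)x + \left(\frac{k_{4,1}}{12} - \frac{k_{3,1}^2}{8}\right)x^3$$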

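Proof The expansion of $K_n''(\tau)$ yields

$$[z_n(\tau)]^2 = \tau^2 K_n''(\tau) = k_2^{(n)}\tau^2 + k_3^{(n)}\tau^3 + \frac{k_4^{(n)}\tau^4}{2} + \cdots$$

Replacing the saddlepoint $\tau$ in the expression above by its expansion in the previous lemma, we obtain

$$\begin{aligned}
[z_n(\tau)]^2 = {} & x^2 + \frac{1}{\sqrt{n}}\left(2x\tau_1 + k_{2,1}x^2 + k_{3,1}x^3\right) \\
& + \frac{1}{n}\left(2x\tau_2 + \tau_1^2 + 2k_{2,1}x\tau_1 + k_{2,2}x^2 + 3k_{3,1}x^2\tau_1 + k_{3,2}x^3 + \frac{k_{4,1}x^4}{2}\right) + O(n^{-3/2}) \\
= {} & x^2 - \frac{1}{\sqrt{n}}\left(2k_{1,1}x + k_{2,1}x^2\right) \\
& + \frac{1}{n}\left[k_{1,1}^2 + 2k_{2,1}k_{1,1}x - 2k_{1,2}x + \left(k_{2,1}^2 - k_{2,2}\right)x^2 + \left(\frac{k_{4,1}}{6} - \frac{k_{3,1}^2}{4}\right)x^4\right] + O(n^{-3/2}) \\
= {} & x^2 + \frac{2xz_1}{\sqrt{n}} + \frac{1}{n}\left(2xz_2 + z_1^2\right) + O(n^{-3/2}),
\end{aligned}$$

which implies, after taking the square root, the statement of the lemma.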

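Lemma 3 For every $x > 0$ such that $x = O(1)$ the function $\xi_n(\tau)$ is expandable in powers of $n^{-1/2}$, up to the $O(n^{-3/2})$ term, as $\xi_n = \xi_n(\tau) = x + \frac{\xi_1}{\sqrt{n}} + \frac{\xi_2}{n} + O(n^{-3/2})$, where $\xi_1 = z_1 - \frac{k_{3,1}x^2}{6}$ and

$$\xi_2 = z_2 + \frac{k_{1,1}k_{3,1}x}{3} + \frac{5k_{2,1}k_{3,1} - 2k_{3,2}}{12}x^2 + \frac{17k_{3,1}^2 - 9k_{4,1}}{72}x^3.$$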

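Proof From the expansions of $K_n(\tau)$, $K_n'(\tau)$ and $K_n''(\tau)$ it follows that

$$\begin{aligned}
[\xi_n(\tau)]^2 &= 2\left[\tau K_n'(\tau) - K_n(\tau)\right] - \tau^2 K_n''(\tau) + [z_n(\tau)]^2 \\
&= -\frac{k_3^{(n)}\tau^3}{3} - \frac{k_4^{(n)}\tau^4}{4} + \cdots + [z_n(\tau)]^2.
\end{aligned}$$

If we insert the expansions of $\tau$ and $z_n(\tau)$ from the previous lemmas in the expression above and gather together the terms of the same order, we get

$$[\xi_n(\tau)]^2 = x^2 + \frac{1}{\sqrt{n}}\left(2xz_1 - \frac{k_{3,1}x^3}{3}\right) + \frac{1}{n}\left(z_1^2 + 2xz_2 - k_{3,1}x^2\tau_1 - \frac{k_{3,2}x^3}{3} - \frac{k_{4,1}x^4}{4}\right) + O(n^{-3/2}).$$

Replacing $\tau_1$ by the expression in Lemma 1, we obtain that

$$\begin{aligned}
[\xi_n(\tau)]^2 = {} & x^2 + \frac{1}{\sqrt{n}}\left(2xz_1 - \frac{k_{3,1}x^3}{3}\right) \\
& + \frac{1}{n}\left[2xz_2 + \left(z_1 - \frac{k_{3,1}x^2}{6}\right)^2 + \frac{2k_{1,1}k_{3,1}x^2}{3} + \left(\frac{5k_{2,1}k_{3,1}}{6} - \frac{k_{3,2}}{3}\right)x^3 + \left(\frac{17k_{3,1}^2}{36} - \frac{k_{4,1}}{4}\right)x^4\right] + O(n^{-3/2}) \\
= {} & x^2 + \frac{2x\xi_1}{\sqrt{n}} + \frac{1}{n}\left(\xi_1^2 + 2x\xi_2\right) + O(n^{-3/2}),
\end{aligned}$$

from which the assertion stated by the lemma follows, after taking the square root.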
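Lemma 3 can be checked against a case where everything is available in closed form. The computation below is an illustration added here, using the exponential c.g.f. of Sect. 4.1, $K_n(t) = -n\log(1 - t/\sqrt{n}) - \sqrt{n}\,t$, for which $k_{3,1} = 2$, $k_{4,1} = 6$ and the saddlepoint equation $K_n'(\tau) = x$ has the solution $\tau = x/(1 + x/\sqrt{n})$. Writing $v = x/\sqrt{n}$,

$$[\xi_n(\tau)]^2 = 2\left(\tau x - K_n(\tau)\right) = 2n\left[v - \log(1 + v)\right] = x^2 - \frac{2x^3}{3\sqrt{n}} + \frac{x^4}{2n} + O(n^{-3/2}),$$

so that $\xi_1 = -x^2/3$ and $\xi_2 = 7x^3/36$. Since $z_1 = z_2 = 0$ in this example, these values coincide with $\xi_1 = -k_{3,1}x^2/6$ and $\xi_2 = \left(17k_{3,1}^2 - 9k_{4,1}\right)x^3/72 = (68 - 54)x^3/72$ from the lemma.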

Lemma 4 For every $x > 0$ such that $x = O(1)$ an expansion of $r_n(\tau)$ in powers of $n^{-1/2}$, up to the $O(n^{-3/2})$ error term, is given by

$$r_n(\tau) = x - \frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}\left(\frac{x}{2}p_1(x)^2 - p_2(x)\right) + O(n^{-3/2})$$

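Proof From the results provided by Lemmas 2 and 3, we obtain the following expansion for the quotient: $\frac{z_n(\tau)}{\xi_n(\tau)} = 1 + \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2})$, with $c_1 = \frac{z_1 - \xi_1}{x}$ and $c_2 = \frac{z_2 - \xi_2}{x} - \frac{z_1\xi_1}{x^2} + \frac{\xi_1^2}{x^2}$. Therefore,

$$\begin{aligned}
\log\frac{z_n(\tau)}{\xi_n(\tau)} &= \log\left(1 + \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2})\right) \\
&= \left(\frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2})\right) - \frac{1}{2}\left(\frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2})\right)^2 + \cdots \\
&= \frac{c_1}{\sqrt{n}} + \frac{1}{n}\left(c_2 - \frac{c_1^2}{2}\right) + O(n^{-3/2}).
\end{aligned}$$

Finally, the expansion of $\xi_n(\tau)$ in Lemma 3 together with the expression above lead to

$$\begin{aligned}
r_n(\tau) &= \xi_n(\tau) + \frac{1}{\xi_n(\tau)}\log\frac{z_n(\tau)}{\xi_n(\tau)} = \xi_n(\tau) + \frac{\dfrac{c_1}{\sqrt{n}} + \dfrac{1}{n}\left(c_2 - \dfrac{c_1^2}{2}\right) + O(n^{-3/2})}{x + \dfrac{\xi_1}{\sqrt{n}} + \dfrac{\xi_2}{n} + O(n^{-3/2})} \\
&= x + \frac{\xi_1}{\sqrt{n}} + \frac{\xi_2}{n} + \frac{c_1}{x\sqrt{n}} + \frac{1}{n}\left(\frac{c_2}{x} - \frac{c_1^2}{2x} - \frac{\xi_1 c_1}{x^2}\right) + O(n^{-3/2}).
\end{aligned}$$

A few simple computations show that $\xi_1 + \frac{c_1}{x} = -p_1(x)$ and

$$\xi_2 + \frac{c_2}{x} - \frac{c_1^2}{2x} - \frac{\xi_1 c_1}{x^2} = \frac{(z_1 - \xi_1)^2}{2x^3} + \frac{x\xi_1^2}{2} + \frac{\xi_1(z_1 - \xi_1)}{x} - p_2(x).$$

Taking into account the expressions for $z_1$ and $\xi_1$ obtained in Lemmas 2 and 3, we can easily obtain that

$$\frac{(z_1 - \xi_1)^2}{2x^3} + \frac{x\xi_1^2}{2} + \frac{\xi_1(z_1 - \xi_1)}{x} = \frac{x}{2}p_1(x)^2,$$

from which the statement of the lemma follows.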

Theorem 1 For every $x > 0$, such that $x = O(1)$, the r∗ approximation verifies that

$$\Phi(r_n(\tau)) = \Phi(x) - \phi(x)\left(\frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}p_2(x)\right) + O(n^{-3/2}) \tag{3}$$

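Proof From the expansion of $r_n(\tau)$ in Lemma 4, it suffices to expand $\Phi(r_n(\tau))$ in a Taylor series at $x$ to get

$$\begin{aligned}
\Phi(r_n(\tau)) &= \Phi(x) + \phi(x)(r_n(\tau) - x) - \frac{x\phi(x)}{2}(r_n(\tau) - x)^2 + O(n^{-3/2}) \\
&= \Phi(x) + \phi(x)\left[-\frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}\left(\frac{x}{2}p_1(x)^2 - p_2(x)\right) + O(n^{-3/2})\right] \\
&\quad - \frac{x\phi(x)}{2}\left[\frac{1}{n}p_1(x)^2 + O(n^{-3/2})\right] + O(n^{-3/2}) \\
&= \Phi(x) - \phi(x)\left(\frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}p_2(x)\right) + O(n^{-3/2}),
\end{aligned}$$

as was to be proved.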
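The rate in (3) can be probed numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the exponential mean of Sect. 4.1, for which the saddlepoint quantities reduce to the closed forms $\xi_n^2 = 2n[v - \log(1 + v)]$ with $v = x/\sqrt{n}$ and $z_n = x$ (these simplifications are derived here from the c.g.f. given in Sect. 4.1). If the gap between $\Phi(r_n(\tau))$ and the two term Edgeworth expansion is of order $n^{-3/2}$, multiplying $n$ by 4 should shrink it by a factor close to $4^{3/2} = 8$.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def r_n_exponential(x, n):
    """r_n(tau) for the standardized mean of n Exp(1) variables; needs x != 0, x > -sqrt(n)."""
    v = x / math.sqrt(n)
    xi = math.copysign(math.sqrt(2.0 * n * (v - math.log1p(v))), x)
    z = x  # tau * sqrt(K_n''(tau)) simplifies to x for this c.g.f.
    return xi + math.log(z / xi) / xi

def edgeworth_exponential(x, n):
    """Two term Edgeworth expansion with k_{3,1} = 2, k_{4,1} = 6."""
    p1 = (x**2 - 1.0) / 3.0
    p2 = (x**3 - 3.0 * x) / 4.0 + (x**5 - 10.0 * x**3 + 15.0 * x) / 18.0
    return std_normal_cdf(x) - std_normal_pdf(x) * (p1 / math.sqrt(n) + p2 / n)

def gap(x, n):
    """Difference between the r* formula and the two term Edgeworth expansion."""
    return std_normal_cdf(r_n_exponential(x, n)) - edgeworth_exponential(x, n)

def gap_ratio(x, n, factor=4):
    """gap at n divided by gap at factor * n; compare with factor ** 1.5."""
    return gap(x, n) / gap(x, factor * n)
```

Evaluating `gap_ratio(1.0, 100)` compares $n = 100$ with $n = 400$. The check is only heuristic: terms of order $n^{-2}$ still contribute at these sample sizes, so the ratio is not expected to equal 8 exactly.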

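Remark 1 Note that Theorem 1 implies that

$$\begin{aligned}
F_n(x) - \Phi(r_n(\tau)) = {} & F_n(x) - \Phi(x) + \phi(x)\left(\frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}p_2(x)\right) \\
& + \Phi(x) - \phi(x)\left(\frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}p_2(x)\right) - \Phi(r_n(\tau)) = O(n^{-3/2}).
\end{aligned}$$

Therefore, the order of the error in approximation (2) can be derived from the connection between the Edgeworth expansion and the r∗ formula.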

Remark 2 If we put

$$\hat{r}_n(\tau) = x - \frac{1}{\sqrt{n}}p_1(x) + \frac{1}{n}\left(\frac{x}{2}p_1(x)^2 - p_2(x)\right)$$

and take into account the order of the difference $r_n(\tau) - \hat{r}_n(\tau) = O(n^{-3/2})$, then we conclude, by means of a Taylor expansion, that

$$\Phi(r_n(\tau)) = \Phi(\hat{r}_n(\tau)) + \phi(\hat{r}_n(\tau))(r_n(\tau) - \hat{r}_n(\tau)) + \cdots = \Phi(\hat{r}_n(\tau)) + O(n^{-3/2}).$$

Hence, the distribution function $F_n$ may be approximated by the adjusted formula $\Phi(\hat{r}_n(\tau))$, known as the Edgeworth B expansion (Phillips 1978), which only requires knowing the first four cumulants of the underlying distribution. We can conclude that

$$F_n(x) = \Phi(\hat{r}_n(\tau)) + O(n^{-3/2}) \tag{4}$$

Unlike the Edgeworth expansion, approximation (4) always takes values in [0, 1]. Nevertheless, it may fail to be a nondecreasing function of $x$ and, like its Edgeworth counterpart, it may lead to poor approximations, especially for very small sample sizes. Note also that the $O(n^{-3/2})$ remainder appearing in all the approximations does not necessarily guarantee their accuracy, as some of our numerical examples will show. This is explained by the asymptotic nature of the approximations, in particular approximations (1) and (4), which may fail within the small sample framework.
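Approximation (4) needs only the two Edgeworth polynomials. A minimal sketch follows, assuming $p_1$ and $p_2$ are passed in as callables; the function names are illustrative, and the sample polynomials are those that Sect. 4.1 derives for the standardized mean of uniform variables in $[-1, 1]$.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def r_hat(x, n, p1, p2):
    """Truncated expansion of r_n(tau) from Remark 2; p1 and p2 are callables."""
    a = p1(x)
    return x - a / math.sqrt(n) + (0.5 * x * a * a - p2(x)) / n

def edgeworth_b_cdf(x, n, p1, p2):
    """Approximation (4): Phi evaluated at r_hat, hence always inside [0, 1]."""
    return std_normal_cdf(r_hat(x, n, p1, p2))

# Illustrative polynomials: uniform case of Sect. 4.1, p1 = 0, p2 = -(x^3 - 3x)/20.
def p1_uniform(x):
    return 0.0

def p2_uniform(x):
    return -(x**3 - 3.0 * x) / 20.0
```

With these polynomials `r_hat(x, n, p1_uniform, p2_uniform)` reduces to $x + (x^3 - 3x)/(20n)$. Because the result is a normal distribution function evaluated at a polynomial in $x$, it stays in $[0, 1]$ but inherits any non-monotonicity of that polynomial.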

4 Applications and numerical examples

4.1 The mean of independent random variables

Let us consider $X_1, X_2, \ldots, X_n$ independent and identically distributed (i.i.d) random variables with distribution function $F$. Assume that $\mu$ and $\sigma^2$ are the mean and variance of this distribution. Provided that the moments up to fourth order exist and Cramér's condition, $\limsup_{|t| \to \infty} |E\{\exp(itX_1)\}| < 1$, is satisfied, the validity of the two term expansion in (1) for the standardized sample mean can be proved theoretically (see details in Hall 1997; Sect. 2.2).

In this case, the cumulants of the statistic $Z_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ are given by

$$k_1^{(n)} = 0, \quad k_2^{(n)} = 1, \quad k_3^{(n)} = \frac{k_3}{n^{1/2}} \quad \text{and} \quad k_4^{(n)} = \frac{k_4}{n}$$

where $k_3$ and $k_4$ are the skewness and (excess) kurtosis of the distribution $F$.

In order to carry out some numerical comparisons, we will consider the cases of the exponential distribution with mean $\lambda$ and the uniform distribution in $[-1, 1]$, for which the exact values of $F_n(x)$ are known.

Exponential distribution

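The standardized sample mean of observations drawn from an exponential distribution with mean $\lambda$ is $Z_n = \frac{\sqrt{n}(\bar{X}_n - \lambda)}{\lambda}$. For $\lambda = 1$ the c.g.f of $Z_n$ is given by

$$K_n(t) = -n\log\left(1 - t/\sqrt{n}\right) - \sqrt{n}\,t, \qquad t < \sqrt{n}.$$

This c.g.f leads to the following cumulants:

$$k_1^{(n)} = 0, \quad k_2^{(n)} = 1, \quad k_3^{(n)} = \frac{2}{n^{1/2}} \quad \text{and} \quad k_4^{(n)} = \frac{6}{n}.$$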

The expressions above provide all the elements we need to implement the Edgeworth expansion and approximations (2) and (4). The exact values of the distribution function $F_n(x)$ are known in this example, so we will use them to compare the accuracy of all the approximations. Figure 1 shows plots of these approximations for sample sizes n = 1 and n = 10.

Note that the normal approximation is the least accurate one. On the other hand, when n = 1 approximations (2) and (4) improve on the two term Edgeworth expansion, which displays an undesirable decreasing shape in the range (1.9, 2.5). All the approximations exhibit similar results when n increases. Extra numerical work, not reported here, has shown that the r∗ approximation is the best one for all the sample sizes; however, it has the limitation of requiring knowledge of the c.g.f $K_n(t)$.
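The comparison can be reproduced with a few lines of code. The sketch below is an illustration under assumptions made here, not the author's implementation: the exact distribution is obtained from the fact that a sum of $n$ independent Exp(1) variables is Gamma($n$, 1), whose distribution function for integer $n$ is a finite Poisson-type sum, and the r∗ formula uses the closed forms $\xi_n^2 = 2n[v - \log(1 + v)]$, $v = x/\sqrt{n}$, and $z_n = x$ that follow from the c.g.f. above. Function names are hypothetical.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exact_cdf(x, n):
    """P(Z_n <= x) via the Gamma(n, 1) law of the sum (integer n)."""
    y = n + math.sqrt(n) * x
    if y <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= y / k          # y^k / k!
        total += term
    return 1.0 - math.exp(-y) * total

def polys(x):
    """p1 and p2 for k_{3,1} = 2, k_{4,1} = 6."""
    p1 = (x**2 - 1.0) / 3.0
    p2 = (x**3 - 3.0 * x) / 4.0 + (x**5 - 10.0 * x**3 + 15.0 * x) / 18.0
    return p1, p2

def edgeworth(x, n):
    p1, p2 = polys(x)
    return Phi(x) - phi(x) * (p1 / math.sqrt(n) + p2 / n)

def edgeworth_b(x, n):
    p1, p2 = polys(x)
    return Phi(x - p1 / math.sqrt(n) + (0.5 * x * p1**2 - p2) / n)

def r_star(x, n):
    """Formula (2); requires x != 0 and x > -sqrt(n)."""
    v = x / math.sqrt(n)
    xi = math.copysign(math.sqrt(2.0 * n * (v - math.log1p(v))), x)
    return Phi(xi + math.log(x / xi) / xi)
```

Tabulating `exact_cdf`, `Phi`, `edgeworth`, `edgeworth_b` and `r_star` on a grid of $x$ values in (0, 3] for n = 1 and n = 10 gives the kind of comparison described in the text.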

Fig. 1 Exact values of $F_n(x)$ (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 1 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of $F_n(x)$

Uniform distribution in [−1, 1]

We now consider $X_1, X_2, \ldots, X_n$ i.i.d uniform random variables in $[-1, 1]$. The distribution function of the sum $S_n = \sum_{i=1}^{n} X_i$ admits an analytical expression (Killmann and von Collani 2001) given by

$$H_n(x) = \begin{cases}
0 & \text{if } x < -n \\[4pt]
\dfrac{\sum_{\mathbf{j} \in B_{n-1}} (-1)^{|\mathbf{j}|}\, Q_n(x + n - 2|\mathbf{j}|)}{n!\,2^n} & \text{if } -n \le x < n \\[4pt]
1 & \text{if } x \ge n
\end{cases}$$

where $B_n = \{\mathbf{j} = (j_1, j_2, \ldots, j_n) : j_i \in \{0, 1\},\ i = 1, 2, \ldots, n\}$ denotes the n-dimensional binary space, with $B_0 = \{0\}$, $|\mathbf{j}|$ is the number of ones in the vector $\mathbf{j}$ and $Q_n$ is the function defined by

$$Q_n(y) = \begin{cases}
0 & \text{if } y < 0 \\
y^n & \text{if } 0 \le y < 2 \\
y^n - (y - 2)^n & \text{if } y \ge 2
\end{cases}$$
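The sum over the binary space has $2^{n-1}$ terms, but every term depends on $\mathbf{j}$ only through $|\mathbf{j}|$, so the vectors can be grouped by their number of ones. The sketch below relies on that grouping, which is a rearrangement made here rather than a formula quoted from the reference, and uses hypothetical function names.

```python
import math

def Q(y, n):
    """The piecewise function Q_n from the text."""
    if y < 0.0:
        return 0.0
    if y < 2.0:
        return y**n
    return y**n - (y - 2.0)**n

def H(x, n):
    """Distribution function of the sum of n Uniform[-1, 1] variables.

    There are comb(n - 1, m) vectors in B_{n-1} with |j| = m, so the sum over
    B_{n-1} collapses to a sum over m = 0, ..., n - 1.
    """
    if x < -n:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1)**m * math.comb(n - 1, m) * Q(x + n - 2 * m, n) for m in range(n))
    return total / (math.factorial(n) * 2**n)

def exact_uniform_cdf(x, n):
    """F_n(x) = H_n(sqrt(n/3) x) for the standardized mean Z_n = sqrt(3n) * mean."""
    return H(math.sqrt(n / 3.0) * x, n)
```

The alternating sum is evaluated in floating point, which is adequate for sample sizes such as n = 5 or n = 10 but can lose accuracy through cancellation when n is large; exact rational arithmetic would be a safer choice in that regime.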

Let us consider the normalized sample mean $Z_n = \sqrt{3n}\,\bar{X}_n = \sum_{i=1}^{n} \sqrt{\frac{3}{n}}\,X_i$, whose distribution function $F_n$ is given by $F_n(x) = H_n\left(\sqrt{\frac{n}{3}}\,x\right)$. In this case $Z_n$ has c.g.f defined by $K_n(t) = nK\left(\sqrt{\frac{3}{n}}\,t\right)$ with $K(t) = \log\left(\frac{\sinh t}{t}\right)$ the c.g.f of a uniform random variable in $[-1, 1]$.

It is well known that $K(t)$ can be expanded in a power series as $K(t) = \sum_{k=2}^{\infty} \frac{2^k B_k t^k}{k \cdot k!}$ provided that $|t| < \pi$, with $B_k$ the so-called Bernoulli numbers. From this expansion, the cumulants of $Z_n$, needed to compute the Edgeworth expansion, can be obtained in a straightforward manner; they are given by

$$k_{2j-1}^{(n)} = 0 \quad \text{and} \quad k_{2j}^{(n)} = \frac{12^j B_{2j}}{(2j)\,n^{j-1}}, \qquad j = 1, 2, \ldots$$

These expressions lead to the following particular cases: $k_1^{(n)} = k_3^{(n)} = 0$, $k_2^{(n)} = 1$ and $k_4^{(n)} = \frac{12^2 B_4}{4n} = -\frac{6}{5n}$, from which we get the polynomials of the two term Edgeworth expansion:

$$p_1(x) = 0 \quad \text{and} \quad p_2(x) = -\frac{1}{20}(x^3 - 3x).$$

These polynomials, along with $K_n(t)$ and its second order derivative, both evaluated at the saddlepoint, are the key quantities involved in the Edgeworth expansion, the r∗ approximation and the adjusted version in (4). We have plotted these approximations in Fig. 2, together with the exact $F_n(x)$, for sample sizes n = 5 and n = 10.

Note that all the approximations reach a high level of accuracy for small sample sizes, as the agreement of the curves in Fig. 2 shows. Further numerical work, not shown here, has revealed the high quality of all these approximations even for smaller sample sizes when the underlying distribution is uniform in $[-1, 1]$.

Fig. 2 Exact values of $F_n(x)$ (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 5 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of $F_n(x)$

4.2 The studentized mean of independent random variables

Let $X_1, X_2, \ldots, X_n$ be i.i.d random variables having distribution function $F$ with mean $\mu$ and variance $\sigma^2$. Assume that $\gamma$ and $\kappa$ are the skewness and kurtosis parameters of $F$.

Let us consider the studentized statistic $Z_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\hat{\sigma}}$, where $\hat{\sigma}^2$ is the usual sample variance. Provided that the sixth moment of $F$ exists and a Cramér type condition is met (see Hall 1987) we can obtain a two term Edgeworth expansion for the distribution of $Z_n$.

The expansion is derived in an easy way by writing

$$F_n(x) = P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\hat{\sigma}} \le x\right) = P\left(\frac{\sqrt{n}\,\bar{W}}{\hat{\beta}} \le x\right),$$

with $\bar{W}$ and $\hat{\beta}^2$ the sample mean and variance of the standardized variables $W_i = \frac{X_i - \mu}{\sigma}$. Using the device and notation in Hall (1997), the following expansions for the cumulants of $\sqrt{n}\,\bar{W}/\hat{\beta}$ can be obtained:

$$k_1^{(n)} = -\frac{1}{2}n^{-1/2}\gamma + O(n^{-3/2}), \qquad k_2^{(n)} = 1 + \frac{1}{4}n^{-1}(7\gamma^2 + 12) + O(n^{-2}),$$

$$k_3^{(n)} = -2n^{-1/2}\gamma + O(n^{-3/2}) \quad \text{and} \quad k_4^{(n)} = n^{-1}(12\gamma^2 - 2\kappa + 6) + O(n^{-2}).$$

Therefore, we get the quantities $k_{1,2} = k_{2,1} = k_{3,2} = 0$ and

$$k_{1,1} = -\frac{\gamma}{2}, \quad k_{2,2} = \frac{1}{4}(7\gamma^2 + 12), \quad k_{3,1} = -2\gamma, \quad k_{4,1} = 12\gamma^2 - 2\kappa + 6$$

which lead to the polynomials $p_1(x)$ and $p_2(x)$ of the Edgeworth expansion:

$$p_1(x) = -\frac{\gamma}{6}(2x^2 + 1),$$

$$p_2(x) = -x\left[\frac{\kappa}{12}(x^2 - 3) - \frac{\gamma^2}{18}(x^4 + 2x^2 - 3) - \frac{1}{4}(x^2 + 3)\right].$$

In order to carry out comparisons, we consider a standard normal underlying distribution for which we know that $\sqrt{\frac{n-1}{n}}\,Z_n$ has a t distribution with $n - 1$ degrees of freedom. In this case, we have $\gamma = \kappa = 0$ and therefore

$$p_1(x) = 0, \quad p_2(x) = \frac{x}{4}(x^2 + 3) \quad \text{and} \quad \hat{r}_n(\tau) = x - \frac{x}{4n}(x^2 + 3).$$

For this example we cannot implement (2) because the t distribution has no moment generating function within an interval containing the origin; so we will only calculate the normal approximation (which agrees with the one term Edgeworth expansion in this case), the two term Edgeworth expansion and the adjusted version (4).

Figure 3 displays the plot of the exact $t_{n-1}$ distribution function, along with plots of all the approximations for sample sizes n = 5 and n = 20.

Note that the adjusted approximation in (4) leads to poor results when the sample size is small. This was to be expected and is due to the nonmonotone behavior of $\hat{r}_n(\tau)$ and $\Phi(\hat{r}_n(\tau))$, explained by the influence of the cubic leading term in $\hat{r}_n(\tau) - x$, which has a negative coefficient. When the sample size is increased, such influence is masked by a large n and the approximation becomes more accurate; in this case, all the approximations are highly accurate and remain very close to the exact distribution, as highlighted by the overlap in the plots.
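The point at which monotonicity is lost can be located explicitly; the following short computation is added here as an illustration. Differentiating $\hat{r}_n(\tau) = x - \frac{x}{4n}(x^2 + 3)$ with respect to $x$ gives

$$\frac{d\hat{r}_n}{dx} = 1 - \frac{3(x^2 + 1)}{4n}, \qquad \frac{d\hat{r}_n}{dx} = 0 \iff x = \sqrt{\frac{4n - 3}{3}},$$

so $\hat{r}_n$, and with it $\Phi(\hat{r}_n)$, starts to decrease beyond $x = \sqrt{17/3} \approx 2.38$ when n = 5, but only beyond $x = \sqrt{77/3} \approx 5.07$ when n = 20, which lies outside the range $0 \le x \le 3$ covered by the horizontal axis of Fig. 3.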

Fig. 3 Exact values of the $t_{n-1}$ distribution function (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). Panels: n = 5 and n = 20; horizontal axis: x; vertical axis: exact and approximated values of $F_n(x)$

4.3 U-Statistics

Let $X_1, X_2, \ldots, X_n$ be i.i.d random variables with common distribution $F$. A U-statistic of degree two is defined by

$$U_n = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} h(X_i, X_j)$$

where $h$ is a permutation invariant function with zero expectation with respect to $F$, $E\{h(X_1, X_2)\} = 0$, and such that $g(X_1) = E\{h(X_1, X_2) \mid X_1\}$ has positive variance $\sigma_g^2$.

Under existence of certain moments of the functions $h(X_1, X_2)$, $g(X_1)$, $\psi(X_1, X_2) = h(X_1, X_2) - g(X_1) - g(X_2)$ and some of their cross products, along with the fulfillment of a Cramér type condition for $g(X_1)$, a two term Edgeworth expansion for the statistic

$$Z_n = \frac{U_n}{\sigma_n}, \quad \text{with} \quad \sigma_n^2 = \frac{4\sigma_g^2}{n} + \frac{1}{\binom{n}{2}} E\{\psi(X_1, X_2)^2\}$$

was derived by Callaert et al. (1980).

The polynomials in the expansion are given by

$$p_1(x) = \frac{k_3(x^2 - 1)}{6} \quad \text{and} \quad p_2(x) = \frac{k_4}{24}(x^3 - 3x) + \frac{k_3^2}{72}(x^5 - 10x^3 + 15x)$$

with the quantities $k_3$ and $k_4$:

$$k_3 = \frac{1}{\sigma_g^3}\left[E\{g(X_1)^3\} + 3E\{g(X_1)g(X_2)\psi(X_1, X_2)\}\right],$$

$$k_4 = \frac{1}{\sigma_g^4}\left[E\{g(X_1)^4\} + 12E\{g(X_1)^2 g(X_2)\psi(X_1, X_2)\} + 12E\{g(X_2)g(X_3)\psi(X_1, X_2)\psi(X_1, X_3)\}\right] - 3$$

The sample variance

Let us consider the unbiased sample variance $S^2$, which may be written in a degree two U-statistic form as

$$S^2 - 1 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 - 1 = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left[\frac{(X_i - X_j)^2}{2} - 1\right],$$

so that $h(X_1, X_2) = \frac{(X_1 - X_2)^2}{2} - 1$.

After some simple algebra (see for instance DasGupta 2008), we arrive at the following expressions for $k_3$ and $k_4$:

$$k_3 = \frac{\mu_6 - 3\mu_4 - 6\mu_3^2 + 2}{(\mu_4 - 1)^{3/2}}, \qquad k_4 = \frac{\mu_8 - 4\mu_6 - 3\mu_4^2 + 12\mu_4 + 96\mu_3^2 - 24\mu_3\mu_5 - 6}{(\mu_4 - 1)^2}.$$

When the underlying model is the standard normal law, we obtain $k_3 = 4/\sqrt{2}$ and $k_4 = 12$. In this case, the distribution of $Z_n$ is known and has c.g.f

$$K_n(t) = -t\sqrt{\frac{n - 1}{2}} - \frac{n - 1}{2}\log\left(1 - \sqrt{\frac{2}{n - 1}}\,t\right), \qquad t < \sqrt{\frac{n - 1}{2}}.$$
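The values $k_3 = 4/\sqrt{2}$ and $k_4 = 12$ can be cross-checked against this c.g.f.; the short derivation below is added here for illustration. Writing $m = (n - 1)/2$ and expanding the logarithm,

$$K_n(t) = -t\sqrt{m} + m\sum_{j \ge 1} \frac{1}{j}\left(\frac{t}{\sqrt{m}}\right)^j = \sum_{j \ge 2} \frac{t^j}{j\,m^{(j-2)/2}},$$

so the $j$-th cumulant of $Z_n$ is $(j - 1)!\,m^{-(j-2)/2}$. In particular

$$k_2^{(n)} = 1, \qquad k_3^{(n)} = \frac{2}{\sqrt{m}} = \frac{4/\sqrt{2}}{\sqrt{n - 1}}, \qquad k_4^{(n)} = \frac{6}{m} = \frac{12}{n - 1},$$

which agree with $k_3/\sqrt{n}$ and $k_4/n$ to leading order, since $n - 1$ and $n$ differ only in higher order terms.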

Now we can easily compare all the approximations (r∗, the adjusted version (4) and the two term Edgeworth expansion) with the exact $F_n(x)$. Figure 4 displays plots of all of them for very small sample sizes such as n = 2 and n = 10. Note that the approximations behave similarly to those obtained for the sample mean with an exponential distribution.

Fig. 4 Exact values of $F_n(x)$ (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 2 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of $F_n(x)$

The Gini mean difference

Now we consider the Gini mean difference $G_n$ of independent random variables $X_1, X_2, \ldots, X_n$ with underlying uniform distribution on $[0, 1]$ and the corresponding U-statistic, given by

$$U_n = G_n - \frac{1}{3} \quad \text{with} \quad G_n = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} |X_i - X_j|.$$

The permutation invariant function is $h(X_1, X_2) = |X_1 - X_2| - \frac{1}{3}$; therefore, $g(X_i) = E\{h(X_i, X_2) \mid X_i\} = X_i^2 - X_i + \frac{1}{6}$, $i = 1, 2, 3$. Careful computations lead to the following results:

$$\sigma_g^2 = \frac{1}{180}, \quad E\{\psi(X_1, X_2)^2\} = \frac{2}{45}, \quad E\{g(X_1)^3\} = \frac{1}{3780} \quad \text{and}$$

$$E\{g(X_1)g(X_2)\psi(X_1, X_2)\} = E\{g(X_1)g(X_2)h(X_1, X_2)\} = -\frac{1}{3780},$$

from which we obtain $k_3 = -\frac{4\sqrt{5}}{7}$.

On the other hand, some extra algebra yields the quantities

$$E\{g(X_1)^4\} = \frac{1}{15120}, \quad E\{g(X_1)^2 g(X_2)\psi(X_1, X_2)\} = -\frac{1}{113400}$$

$$\text{and} \quad E\{g(X_2)g(X_3)\psi(X_1, X_2)\psi(X_1, X_3)\} = \frac{1}{75600},$$

from which we get $k_4 = \frac{6}{7}$.
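For readers who want to retrace the arithmetic, substituting the moments above into the expressions for $k_3$ and $k_4$ of Sect. 4.3 gives, with $\sigma_g^{-3} = 180^{3/2} = 1080\sqrt{5}$ and $\sigma_g^{-4} = 180^2 = 32400$,

$$k_3 = 1080\sqrt{5}\left(\frac{1}{3780} - \frac{3}{3780}\right) = -\frac{2160\sqrt{5}}{3780} = -\frac{4\sqrt{5}}{7},$$

$$k_4 = 32400\left(\frac{1}{15120} - \frac{12}{113400} + \frac{12}{75600}\right) - 3 = \frac{15}{7} - \frac{24}{7} + \frac{36}{7} - 3 = \frac{6}{7}.$$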

Unlike the case of the sample variance, the exact distribution of $G_n$ is unknown, so we will obtain $F_n(x)$ by simulation, drawing $10^5$ samples of size n from a uniform variable on $[0, 1]$. The experimental study was carried out for n = 5 and n = 20; comparisons of the simulated values of $F_n$ with the normal approximation, the two term Edgeworth expansion and the adjustment in (4) are displayed in Fig. 5. Once again, we do not implement the r∗ approximation because the c.g.f of $G_n$ is not tractable.
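A simulation of this kind might be set up as follows. This is a sketch under assumptions made here: the random number generator, the seed and the function names are illustrative choices, and the Gini mean difference is computed from the order statistics, a standard rearrangement of the pairwise sum. The constants $\sigma_g^2 = 1/180$ and $E\{\psi(X_1, X_2)^2\} = 2/45$ are the values reported in the text.

```python
import math
import random

SIGMA_G2 = 1.0 / 180.0   # Var g(X1), value reported in the text
E_PSI2 = 2.0 / 45.0      # E{psi(X1, X2)^2}, value reported in the text

def gini_mean_difference(xs):
    """Average of |xi - xj| over pairs i < j, computed from the sorted sample."""
    n = len(xs)
    s = sorted(xs)
    # In sorted order, the i-th value (0-based) enters the pairwise sum with weight 2i - n + 1.
    pair_sum = sum((2 * i - n + 1) * v for i, v in enumerate(s))
    return pair_sum / (n * (n - 1) / 2.0)

def sigma_n(n):
    """Standard deviation of U_n from the formula of Sect. 4.3."""
    return math.sqrt(4.0 * SIGMA_G2 / n + E_PSI2 / (n * (n - 1) / 2.0))

def simulate_z(n, reps=10**5, seed=0):
    """Draw reps samples of size n from Uniform[0, 1] and return the values of Z_n."""
    rng = random.Random(seed)
    s = sigma_n(n)
    return [(gini_mean_difference([rng.random() for _ in range(n)]) - 1.0 / 3.0) / s
            for _ in range(reps)]

def empirical_cdf(zs, x):
    """Simulated value of F_n(x)."""
    return sum(z <= x for z in zs) / len(zs)
```

For example, `zs = simulate_z(5)` followed by `empirical_cdf(zs, 1.0)` gives a simulated value of $F_5(1)$; with $10^5$ replications the Monte Carlo standard error of such an estimate is at most about 0.0016.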

We note that when n = 5 the Edgeworth expansion takes values slightly greater than 1 for points x within the range (2.1, 3.5) and displays a slight decreasing shape when x ≥ 3.5; meanwhile, the adjusted version (4) is always within the interval [0, 1], giving a correction to the Edgeworth approximation. The previous behavior diminishes or disappears when the sample size is increased to n = 20. In this case both approximations overlap and agree with the exact distribution; they are also very close to the normal approximation.

Fig. 5 Exact values of $F_n(x)$ (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). Panels: n = 5 and n = 20; horizontal axis: x; vertical axis: exact and approximated values of $F_n(x)$
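The overshoot of the Edgeworth expansion above 1 can be checked directly from the polynomials of Sect. 4.3 with $k_3 = -4\sqrt{5}/7$ and $k_4 = 6/7$. The sketch below uses hypothetical function names and evaluates both the two term expansion and the adjusted version (4).

```python
import math

K3 = -4.0 * math.sqrt(5.0) / 7.0   # value reported in the text
K4 = 6.0 / 7.0                     # value reported in the text

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p1(x):
    return K3 * (x**2 - 1.0) / 6.0

def p2(x):
    return K4 / 24.0 * (x**3 - 3.0 * x) + K3**2 / 72.0 * (x**5 - 10.0 * x**3 + 15.0 * x)

def edgeworth_gini(x, n):
    """Two term Edgeworth expansion for the standardized Gini mean difference."""
    return Phi(x) - phi(x) * (p1(x) / math.sqrt(n) + p2(x) / n)

def edgeworth_b_gini(x, n):
    """Adjusted version (4) for the same statistic."""
    return Phi(x - p1(x) / math.sqrt(n) + (0.5 * x * p1(x)**2 - p2(x)) / n)
```

Evaluating `edgeworth_gini(3.0, 5)` and `edgeworth_b_gini(3.0, 5)` contrasts the two approximations at a point inside the range (2.1, 3.5) mentioned above.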

5 Summary and conclusion

We have explored the high order connections between the r∗ approximation and the Edgeworth expansion and have shown that the Edgeworth B approximation introduced by Phillips (1978) is just an asymptotic expansion of the r∗ formula up to the third order. We conclude that the r∗, two term Edgeworth and Edgeworth B approximations are asymptotically equivalent up to an $O(n^{-3/2})$ remainder. However, as the numerical applications have shown, the asymptotic validity of the approximations does not necessarily imply their accuracy for finite samples, especially for small sample sizes.

Acknowledgments The author wishes to thank his wife Marian for her encouragement and constant support. He is also indebted to an anonymous referee for useful comments that improved the paper.

References

Barndorff-Nielsen OE, Cox DR (1979) Edgeworth and saddlepoint approximations with statistical applications. J R Stat Soc B 41:279–312

Callaert H, Janssen P, Veraverbeke N (1980) An Edgeworth expansion for U-statistics. Ann Stat 8:299–312

Daniels HE (1954) Saddlepoint approximations in statistics. Ann Math Stat 25:631–650

Daniels HE (1987) Tail probability approximations. Int Stat Rev 55:37–48

DasGupta A (2008) Asymptotic theory of statistics and probability. Springer, Berlin, pp 185–234

Field C, Ronchetti E (1990) Small sample asymptotics. Inst Math Stat

Hall P (1987) Edgeworth expansions for Student's t-statistic under minimal moment conditions. Ann Probab 15:920–931

Hall P (1997) The bootstrap and Edgeworth expansion. Springer, Berlin, pp 39–81

Jensen JL (1992) The modified signed likelihood statistic and saddlepoint approximations. Biometrika 79:693–703

Kakizawa Y, Taniguchi M (1994) Higher order asymptotic relation between Edgeworth approximation and Saddlepoint approximation. J Jpn Stat Soc 24:109–119

Killmann F, von Collani E (2001) A note on the convolution of the uniform and related distributions and their use in quality control. Econ Qual Control 16:17–41

Lahiri SN (2003) Resampling methods for dependent data. Springer, Berlin, pp 145–173

Lugannani R, Rice S (1980) Saddlepoint approximations for the distribution of the sum of independent random variables. Adv Appl Probab 12:475–490

Monti AC (1993) A new look at the relationship between Edgeworth expansion and saddlepoint approximation. Stat Probab Lett 17:49–52

Ochi Y (1983) Asymptotic expansions for the distribution of an estimator in the first-order autoregressive process. J Time Ser Anal 4:57–67

Phillips PCB (1978) Edgeworth and saddlepoint approximations in the first-order noncircular autoregression. Biometrika 65:91–98

Reid N (1996) Likelihood and higher-order approximations to tail areas: a review and annotated bibliography. Can J Stat 24:141–166

Robinson J (1982) Saddlepoint approximations for permutation tests and confidence intervals. J R Stat Soc B 44:91–101