Metrika
DOI 10.1007/s00184-011-0365-5
Exploring the relation between the r∗ approximation and the Edgeworth expansion
Jorge M. Arevalillo
Received: 16 December 2009 / © Springer-Verlag 2011
Abstract In this paper we study the relation between the r∗ saddlepoint approximation and the Edgeworth expansion under quite general assumptions on the statistic under consideration. We show that the two term Edgeworth expansion approximates the r∗ formula up to an $O(n^{-3/2})$ remainder; this provides a new way of looking at the order of the error of the r∗ approximation. This finding is used to inspect the close connection between the r∗ formula and the Edgeworth B adjustment introduced in Phillips (Biometrika 65:91–98, 1978). We show that, whenever an Edgeworth expansion exists, this adjustment approximates both the distribution function of the statistic and the r∗ formula to the same order as the Edgeworth expansion. Some numerical examples for the sample mean and U-statistics are given in order to shed light on the theoretical discussion.
Keywords Asymptotic statistics · Edgeworth expansion · r∗ saddlepoint approximation · Cumulants
1 Introduction
The theory of high order approximations comprises Edgeworth and saddlepoint approximations. The former have a long history in statistics and date back to Francis Ysidro Edgeworth. The latter were introduced (Daniels 1954) to approximate the density function of the sample mean. Later on, new saddlepoint formulae were developed to approximate the distribution function (Lugannani and Rice 1980; Robinson 1982; Jensen 1992); they have proved to be competitive with the Edgeworth expansion and even to improve on it, especially for small sample sizes (Field and Ronchetti 1990; Kakizawa and Taniguchi 1994).
J. M. Arevalillo (✉), Department of Statistics, Operational Research and Numerical Analysis, University Nacional Educación a Distancia (UNED), Paseo Senda del Rey 9, 28040 Madrid, Spain. e-mail: [email protected]
The comparison of saddlepoint approximations with the Edgeworth expansion has been a subject of research since Daniels' pioneering work. The relation between them was studied for densities (Barndorff-Nielsen and Cox 1979; Monti 1993) and later on for distribution functions (Kakizawa and Taniguchi 1994), with the focus on the Lugannani-Rice and Esscher saddlepoint formulae. The recent manual on asymptotic statistics (DasGupta 2008) contains a nice account of the foundations of Edgeworth and saddlepoint approximations and also gives a complete list of key references in the literature.
This paper explores the connection between the r∗ formula, a simple saddlepoint approximation, and the Edgeworth expansion. The relation between them leads to an r∗ adjustment which approximates the distribution function of the statistic up to third order and provides an easy to implement alternative. The work is organized as follows: Sect. 2 establishes some assumptions for the statistic under consideration and introduces both the Edgeworth and r∗ saddlepoint approximations; their relation is discussed in Sect. 3. In Sect. 4 we give several examples in which all the approximations are compared numerically. Finally, we give some concluding remarks.
2 Edgeworth expansions and the r∗ approximation
Let us consider an asymptotically normal statistic $Z_n$, based on a random sample of variables $X_1, X_2, \ldots, X_n$ with common distribution $F$. Assume that $Z_n$ has absolutely continuous distribution $F_n$, finite moment generating function $M_n(t)$ within an interval $[-r, s]$, $r \ge 0$, $s \ge 0$, $r + s > 0$, containing the origin, and cumulant generating function (c.g.f.) $K_n(t) = \log M_n(t)$.
Assume that the cumulants of $Z_n$ are expandable as a convergent series in powers of $n^{-1/2}$:

\[
k_1^{(n)} = \frac{k_{1,1}}{n^{1/2}} + \frac{k_{1,2}}{n} + \cdots, \qquad
k_2^{(n)} = 1 + \frac{k_{2,1}}{n^{1/2}} + \frac{k_{2,2}}{n} + \cdots,
\]
\[
k_j^{(n)} = \frac{k_{j,1}}{n^{(j-2)/2}} + \frac{k_{j,2}}{n^{(j-1)/2}} + \cdots, \qquad j \ge 3.
\]

These assumptions on the cumulants are general enough to cover a wide range of situations such as the sample mean of independent random variables, smooth functions of the mean vector (Hall 1997), a wide range of U-statistics (Callaert et al. 1980) or a class of estimators of the parameter of the first-order autoregressive process (Ochi 1983).
Under these assumptions, an Edgeworth expansion for the distribution function $F_n$ of the statistic can be derived formally by applying the device used by Hall (1997), so we obtain that

\[
F_n(x) = \Phi(x) - \phi(x)\left[ n^{-1/2} p_1(x) + n^{-1} p_2(x) + \cdots + n^{-j/2} p_j(x) \right] + O\!\left(n^{-\frac{j+1}{2}}\right), \tag{1}
\]

with $\Phi$ and $\phi$ the distribution and density functions of the $N(0, 1)$ law.
When certain regularity conditions are met, the validity of (1) can be proved theoretically even with an $o(n^{-j/2})$ remainder (Hall 1997; Lahiri 2003).
The first two polynomials in (1) give the two term Edgeworth expansion. They are given by

\[
p_1(x) = k_{1,1} + \frac{k_{2,1}\,x}{2} + \frac{k_{3,1}(x^2 - 1)}{6},
\]
\[
\begin{aligned}
p_2(x) = {} & k_{1,2} + \left( \frac{k_{1,1}^2 + k_{2,2}}{2} \right) x
 + \left( \frac{k_{3,2}}{6} + \frac{k_{1,1}k_{2,1}}{2} \right)(x^2 - 1) \\
 & + \left( \frac{k_{4,1}}{24} + \frac{k_{2,1}^2}{8} + \frac{k_{1,1}k_{3,1}}{6} \right)(x^3 - 3x) \\
 & + \frac{k_{2,1}k_{3,1}}{12}(x^4 - 6x^2 + 3) + \frac{k_{3,1}^2}{72}(x^5 - 10x^3 + 15x).
\end{aligned}
\]
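The two polynomials above translate directly into code. The following is a minimal sketch, not part of the original paper: the function names, the dictionary keys (`k11` for $k_{1,1}$, and so on) and the helper `coefficients` are illustrative assumptions. The example values $k_{3,1} = 2$, $k_{4,1} = 6$ are those of the exponential example in Sect. 4.1.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coefficients(**given):
    """Cumulant coefficients k_{j,i}; any coefficient not given is taken as 0.
    The key naming (k11, k12, ...) is a convention chosen for this sketch."""
    k = dict.fromkeys(["k11", "k12", "k21", "k22", "k31", "k32", "k41"], 0.0)
    k.update(given)
    return k

def p1(x, k):
    return k["k11"] + k["k21"] * x / 2 + k["k31"] * (x**2 - 1) / 6

def p2(x, k):
    return (k["k12"]
            + (k["k11"]**2 + k["k22"]) / 2 * x
            + (k["k32"] / 6 + k["k11"] * k["k21"] / 2) * (x**2 - 1)
            + (k["k41"] / 24 + k["k21"]**2 / 8
               + k["k11"] * k["k31"] / 6) * (x**3 - 3 * x)
            + k["k21"] * k["k31"] / 12 * (x**4 - 6 * x**2 + 3)
            + k["k31"]**2 / 72 * (x**5 - 10 * x**3 + 15 * x))

def edgeworth2(x, n, k):
    """Two term Edgeworth expansion (1), truncated after p2."""
    return Phi(x) - phi(x) * (p1(x, k) / math.sqrt(n) + p2(x, k) / n)

# Standardized mean of exponential variables (Sect. 4.1): k31 = 2, k41 = 6.
k_exp = coefficients(k31=2.0, k41=6.0)
print(edgeworth2(0.0, 10, k_exp))
```

Passing the coefficients as a dictionary keeps the signature stable across the examples of Sect. 4, where most coefficients vanish.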
Other high order approximations were obtained by the saddlepoint method or by exponential tilting (Lugannani and Rice 1980; Daniels 1987). A simple adjustment of them is the r∗ formula (Jensen 1992; Reid 1996), given by

\[
F_n(x) \approx \Phi(r_n(\tau)), \tag{2}
\]

where the quantity $r_n(\tau)$ is defined by

\[
r_n(\tau) = \xi_n(\tau) + \frac{1}{\xi_n(\tau)} \log \frac{z_n(\tau)}{\xi_n(\tau)},
\]

with $\xi_n(\tau) = \operatorname{sgn}(\tau)\,[2(\tau x - K_n(\tau))]^{1/2}$, $z_n(\tau) = \tau \sqrt{K_n''(\tau)}$ and $\tau$ the point solving the saddlepoint equation $K_n'(\tau) = x$.
Usually, the error in (2) is of order $O(n^{-3/2})$ provided that $x = O(1)$.
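As an illustration of formula (2), the sketch below solves the saddlepoint equation by bisection and then evaluates $\Phi(r_n(\tau))$. It is a hedged example rather than the author's implementation: the function names, the bracketing interval and the use of bisection are assumptions made here, and the code does not treat the removable singularity at the point where $\tau = 0$. The c.g.f. used in the demonstration is the one given for the exponential example in Sect. 4.1.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve_saddlepoint(dK, x, lo, hi, iters=200):
    """Bisection for K'(tau) = x, assuming K' is increasing on (lo, hi)
    and that the root lies inside the bracket."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dK(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_star_cdf(x, K, dK, d2K, lo, hi):
    """Phi(r_n(tau)) as in (2); not valid when tau = 0 (xi = 0)."""
    tau = solve_saddlepoint(dK, x, lo, hi)
    xi = math.copysign(math.sqrt(max(0.0, 2.0 * (tau * x - K(tau)))), tau)
    z = tau * math.sqrt(d2K(tau))
    r = xi + math.log(z / xi) / xi
    return Phi(r)

# Demonstration with the c.g.f. of the exponential example (Sect. 4.1), n = 10.
n = 10
s = math.sqrt(n)
K = lambda t: -n * math.log(1.0 - t / s) - s * t
dK = lambda t: s / (1.0 - t / s) - s
d2K = lambda t: 1.0 / (1.0 - t / s) ** 2

value = r_star_cdf(1.0, K, dK, d2K, lo=-50.0, hi=s * (1.0 - 1e-12))
print(value)
```

Bisection is used only because it needs no derivative beyond $K_n'$ and cannot leave the domain of the c.g.f.; a Newton iteration would also work when a good starting point is available.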
3 High order connections between r∗ formula and the Edgeworth expansion
Theorem 1 of this section establishes the connection between the r∗ formula and the two term Edgeworth expansion. Some lemmas providing expansions in powers of $n^{-1/2}$ of the saddlepoint $\tau$ and the quantities $\xi_n(\tau)$, $z_n(\tau)$ and $r_n(\tau)$ in (2) are needed in advance.
Lemma 1 For every $x$ such that $x = O(1)$ the saddlepoint $\tau$ verifies that

\[
\tau = x + \frac{\tau_1}{\sqrt{n}} + \frac{\tau_2}{n} + O(n^{-3/2}),
\]

where

\[
\tau_1 = -\left( k_{1,1} + k_{2,1}x + \frac{k_{3,1}x^2}{2} \right)
\]

and

\[
\tau_2 = k_{1,1}k_{2,1} - k_{1,2} + \left(k_{2,1}^2 + k_{1,1}k_{3,1} - k_{2,2}\right)x
 + \left( \frac{3k_{2,1}k_{3,1}}{2} - \frac{k_{3,2}}{2} \right)x^2
 + \left( \frac{k_{3,1}^2}{2} - \frac{k_{4,1}}{6} \right)x^3.
\]
Proof See Sect. 3 in Kakizawa and Taniguchi (1994).
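Lemma 1 can be checked numerically in a case where the saddlepoint is available in closed form. The sketch below uses the exponential example of Sect. 4.1, for which solving $K_n'(\tau) = x$ by hand gives $\tau = x\sqrt{n}/(x + \sqrt{n})$; this closed form is our own simplification and is not stated in the paper. The function names are illustrative.

```python
import math

def tau_exact(x, n):
    """Saddlepoint for the exponential example: root of K_n'(t) = x,
    with K_n(t) = -n log(1 - t/sqrt(n)) - sqrt(n) t (derived by hand)."""
    s = math.sqrt(n)
    return x * s / (x + s)

def tau_expansion(x, n, k11=0.0, k12=0.0, k21=0.0, k22=0.0,
                  k31=2.0, k32=0.0, k41=6.0):
    """Two term expansion of the saddlepoint from Lemma 1.
    Defaults are the exponential coefficients k31 = 2, k41 = 6."""
    tau1 = -(k11 + k21 * x + k31 * x**2 / 2)
    tau2 = (k11 * k21 - k12
            + (k21**2 + k11 * k31 - k22) * x
            + (3 * k21 * k31 / 2 - k32 / 2) * x**2
            + (k31**2 / 2 - k41 / 6) * x**3)
    return x + tau1 / math.sqrt(n) + tau2 / n

x = 1.0
err_100 = abs(tau_exact(x, 100) - tau_expansion(x, 100))
err_400 = abs(tau_exact(x, 400) - tau_expansion(x, 400))
# An O(n^{-3/2}) remainder should shrink by roughly 4**1.5 = 8 when n is quadrupled.
print(err_100, err_400, err_100 / err_400)
```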
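Lemma 2 For every $x$ such that $x = O(1)$ the function $z_n(\tau)$ has an expansion in powers of $n^{-1/2}$ which, up to the $O(n^{-3/2})$ term, is given by

\[
z_n = z_n(\tau) = x + \frac{z_1}{\sqrt{n}} + \frac{z_2}{n} + O(n^{-3/2}),
\]

where $z_1 = -\left( k_{1,1} + \dfrac{k_{2,1}x}{2} \right)$ and

\[
z_2 = \frac{k_{2,1}k_{1,1}}{2} - k_{1,2} + \left( \frac{3k_{2,1}^2}{8} - \frac{k_{2,2}}{2} \right)x + \left( \frac{k_{4,1}}{12} - \frac{k_{3,1}^2}{8} \right)x^3.
\]

Proof The expansion of $K_n''(\tau)$ yields

\[
[z_n(\tau)]^2 = \tau^2 K_n''(\tau) = k_2^{(n)}\tau^2 + k_3^{(n)}\tau^3 + \frac{k_4^{(n)}\tau^4}{2} + \cdots
\]

Replacing the saddlepoint $\tau$ in the expression above by its expansion in the previous lemma, we obtain

\[
\begin{aligned}
[z_n(\tau)]^2 = {} & x^2 + \frac{1}{\sqrt{n}}\left( 2x\tau_1 + k_{2,1}x^2 + k_{3,1}x^3 \right) \\
 & + \frac{1}{n}\left( 2x\tau_2 + \tau_1^2 + 2k_{2,1}x\tau_1 + k_{2,2}x^2 + 3k_{3,1}x^2\tau_1 + k_{3,2}x^3 + \frac{k_{4,1}x^4}{2} \right) + O(n^{-3/2}) \\
 = {} & x^2 - \frac{1}{\sqrt{n}}\left( 2k_{1,1}x + k_{2,1}x^2 \right) \\
 & + \frac{1}{n}\left[ k_{1,1}^2 + 2k_{2,1}k_{1,1}x - 2k_{1,2}x + \left( k_{2,1}^2 - k_{2,2} \right)x^2 + \left( \frac{k_{4,1}}{6} - \frac{k_{3,1}^2}{4} \right)x^4 \right] + O(n^{-3/2}) \\
 = {} & x^2 + \frac{2xz_1}{\sqrt{n}} + \frac{1}{n}\left( 2xz_2 + z_1^2 \right) + O(n^{-3/2}),
\end{aligned}
\]

which implies, after taking the square root, the statement of the lemma.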
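Lemma 3 For every $x > 0$ such that $x = O(1)$ the function $\xi_n(\tau)$ is expandable in powers of $n^{-1/2}$, up to the $O(n^{-3/2})$ term, as

\[
\xi_n = \xi_n(\tau) = x + \frac{\xi_1}{\sqrt{n}} + \frac{\xi_2}{n} + O(n^{-3/2}),
\]

where $\xi_1 = z_1 - \dfrac{k_{3,1}x^2}{6}$ and

\[
\xi_2 = z_2 + \frac{k_{1,1}k_{3,1}x}{3} + \frac{5k_{2,1}k_{3,1} - 2k_{3,2}}{12}\,x^2 + \frac{17k_{3,1}^2 - 9k_{4,1}}{72}\,x^3.
\]

Proof From the expansions of $K_n(\tau)$, $K_n'(\tau)$ and $K_n''(\tau)$ it follows that

\[
\begin{aligned}
[\xi_n(\tau)]^2 &= 2[\tau K_n'(\tau) - K_n(\tau)] - \tau^2 K_n''(\tau) + [z_n(\tau)]^2 \\
 &= -\frac{k_3^{(n)}\tau^3}{3} - \frac{k_4^{(n)}\tau^4}{4} + \cdots + [z_n(\tau)]^2.
\end{aligned}
\]

If we insert the expansions of $\tau$ and $z_n(\tau)$ from the previous lemmas in the expression above and gather together the terms of the same order, we get

\[
[\xi_n(\tau)]^2 = x^2 + \frac{1}{\sqrt{n}}\left( 2xz_1 - \frac{k_{3,1}x^3}{3} \right)
 + \frac{1}{n}\left( z_1^2 + 2xz_2 - k_{3,1}x^2\tau_1 - \frac{k_{3,2}x^3}{3} - \frac{k_{4,1}x^4}{4} \right) + O(n^{-3/2}).
\]

Replacing $\tau_1$ by the expression in Lemma 1, we obtain that

\[
\begin{aligned}
[\xi_n(\tau)]^2 = {} & x^2 + \frac{1}{\sqrt{n}}\left( 2xz_1 - \frac{k_{3,1}x^3}{3} \right)
 + \frac{1}{n}\left[ 2xz_2 + \left( z_1 - \frac{k_{3,1}x^2}{6} \right)^2 + \frac{2k_{1,1}k_{3,1}x^2}{3} \right. \\
 & \left. + \left( \frac{5k_{2,1}k_{3,1}}{6} - \frac{k_{3,2}}{3} \right)x^3 + \left( \frac{17k_{3,1}^2}{36} - \frac{k_{4,1}}{4} \right)x^4 \right] + O(n^{-3/2}) \\
 = {} & x^2 + \frac{2x\xi_1}{\sqrt{n}} + \frac{1}{n}\left( \xi_1^2 + 2x\xi_2 \right) + O(n^{-3/2}),
\end{aligned}
\]

from which the assertion stated by the lemma follows, after taking the square root.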
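Lemma 4 For every $x > 0$ such that $x = O(1)$ an expansion of $r_n(\tau)$ in powers of $n^{-1/2}$, up to the $O(n^{-3/2})$ error term, is given by

\[
r_n(\tau) = x - \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\left( \frac{x}{2}\,p_1(x)^2 - p_2(x) \right) + O(n^{-3/2}).
\]

Proof From the results provided by Lemmas 2 and 3, we obtain the following expansion for the quotient:

\[
\frac{z_n(\tau)}{\xi_n(\tau)} = 1 + \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2}),
\]

with $c_1 = \dfrac{z_1 - \xi_1}{x}$ and $c_2 = \dfrac{z_2 - \xi_2}{x} - \dfrac{z_1\xi_1}{x^2} + \dfrac{\xi_1^2}{x^2}$. Therefore,

\[
\begin{aligned}
\log\frac{z_n(\tau)}{\xi_n(\tau)} &= \log\left( 1 + \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2}) \right) \\
 &= \left( \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2}) \right) - \frac{1}{2}\left( \frac{c_1}{\sqrt{n}} + \frac{c_2}{n} + O(n^{-3/2}) \right)^2 + \cdots \\
 &= \frac{c_1}{\sqrt{n}} + \frac{1}{n}\left( c_2 - \frac{c_1^2}{2} \right) + O(n^{-3/2}).
\end{aligned}
\]

Finally, the expansion of $\xi_n(\tau)$ in Lemma 3 together with the expression above lead to

\[
\begin{aligned}
r_n(\tau) &= \xi_n(\tau) + \frac{1}{\xi_n(\tau)}\log\frac{z_n(\tau)}{\xi_n(\tau)}
 = \xi_n(\tau) + \frac{\dfrac{c_1}{\sqrt{n}} + \dfrac{1}{n}\left( c_2 - \dfrac{c_1^2}{2} \right) + O(n^{-3/2})}{x + \dfrac{\xi_1}{\sqrt{n}} + \dfrac{\xi_2}{n} + O(n^{-3/2})} \\
 &= x + \frac{\xi_1}{\sqrt{n}} + \frac{\xi_2}{n} + \frac{c_1}{x\sqrt{n}} + \frac{1}{n}\left( \frac{c_2}{x} - \frac{c_1^2}{2x} - \frac{\xi_1 c_1}{x^2} \right) + O(n^{-3/2}).
\end{aligned}
\]

A few simple computations show that $\xi_1 + \dfrac{c_1}{x} = -p_1(x)$ and

\[
\xi_2 + \frac{c_2}{x} - \frac{c_1^2}{2x} - \frac{\xi_1 c_1}{x^2} = \frac{(z_1 - \xi_1)^2}{2x^3} + \frac{x\xi_1^2}{2} + \frac{\xi_1(z_1 - \xi_1)}{x} - p_2(x).
\]

Taking into account the expressions for $z_1$ and $\xi_1$ obtained in Lemmas 2 and 3, we can easily obtain that

\[
\frac{(z_1 - \xi_1)^2}{2x^3} + \frac{x\xi_1^2}{2} + \frac{\xi_1(z_1 - \xi_1)}{x} = \frac{x}{2}\,p_1(x)^2,
\]

from which the statement of the lemma follows.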
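Theorem 1 For every $x > 0$ such that $x = O(1)$, the r∗ approximation verifies that

\[
\Phi(r_n(\tau)) = \Phi(x) - \phi(x)\left( \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\,p_2(x) \right) + O(n^{-3/2}). \tag{3}
\]

Proof From the expansion of $r_n(\tau)$ in Lemma 4, it suffices to expand $\Phi(r_n(\tau))$ in a Taylor series at $x$ to get

\[
\begin{aligned}
\Phi(r_n(\tau)) &= \Phi(x) + \phi(x)(r_n(\tau) - x) - \frac{x\phi(x)}{2}(r_n(\tau) - x)^2 + O(n^{-3/2}) \\
 &= \Phi(x) + \phi(x)\left[ -\frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\left( \frac{x}{2}\,p_1(x)^2 - p_2(x) \right) + O(n^{-3/2}) \right] \\
 &\quad - \frac{x\phi(x)}{2}\left[ \frac{1}{n}\,p_1(x)^2 + O(n^{-3/2}) \right] + O(n^{-3/2}) \\
 &= \Phi(x) - \phi(x)\left( \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\,p_2(x) \right) + O(n^{-3/2}),
\end{aligned}
\]

as was to be proved.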
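The order stated in Theorem 1 can be probed numerically. The sketch below does so for the exponential example of Sect. 4.1, under two simplifications that we derived by hand from the c.g.f. given there and that are not stated in the paper: $z_n(\tau) = x$ and $[\xi_n(\tau)]^2 = 2[x\sqrt{n} - n\log(1 + x/\sqrt{n})]$. If (3) holds, the difference between the two sides multiplied by $n^{3/2}$ should stay bounded as $n$ grows.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def r_star_exponential(x, n):
    """Phi(r_n(tau)) for the standardized exponential mean, x > 0.
    Uses closed forms derived by hand for this example (an assumption here)."""
    s = math.sqrt(n)
    xi = math.sqrt(2.0 * (x * s - n * math.log1p(x / s)))
    z = x  # tau * sqrt(K_n''(tau)) simplifies to x in this example
    return Phi(xi + math.log(z / xi) / xi)

def edgeworth2_exponential(x, n):
    """Right-hand side of (3) without the remainder, with k31 = 2, k41 = 6."""
    p1 = (x**2 - 1) / 3
    p2 = (x**3 - 3 * x) / 4 + (x**5 - 10 * x**3 + 15 * x) / 18
    return Phi(x) - phi(x) * (p1 / math.sqrt(n) + p2 / n)

x = 1.0
sizes = [10, 40, 160, 640]
scaled = [n**1.5 * abs(r_star_exponential(x, n) - edgeworth2_exponential(x, n))
          for n in sizes]
for n, v in zip(sizes, scaled):
    print(n, v)
```

A bounded sequence of scaled differences is consistent with the $O(n^{-3/2})$ remainder; it is of course only a check at one point $x$ and for one distribution.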
Remark 1 Note that Theorem 1 implies that

\[
\begin{aligned}
F_n(x) - \Phi(r_n(\tau)) = {} & F_n(x) - \Phi(x) + \phi(x)\left( \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\,p_2(x) \right) \\
 & + \Phi(x) - \phi(x)\left( \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\,p_2(x) \right) - \Phi(r_n(\tau)) = O(n^{-3/2}).
\end{aligned}
\]

Therefore, the order of the error in approximation (2) can be derived from the connection between the Edgeworth expansion and the r∗ formula.
Remark 2 If we put

\[
\hat r_n(\tau) = x - \frac{1}{\sqrt{n}}\,p_1(x) + \frac{1}{n}\left( \frac{x}{2}\,p_1(x)^2 - p_2(x) \right)
\]

and take into account the order of the difference $r_n(\tau) - \hat r_n(\tau) = O(n^{-3/2})$, then we conclude, by means of a Taylor expansion, that

\[
\Phi(r_n(\tau)) = \Phi(\hat r_n(\tau)) + \phi(\hat r_n(\tau))(r_n(\tau) - \hat r_n(\tau)) + \cdots = \Phi(\hat r_n(\tau)) + O(n^{-3/2}).
\]

Hence, the distribution function $F_n$ may be approximated by the adjusted formula $\Phi(\hat r_n(\tau))$, known as the Edgeworth B expansion (Phillips 1978), which only requires knowing the first four cumulants of the underlying distribution. We can conclude that

\[
F_n(x) = \Phi(\hat r_n(\tau)) + O(n^{-3/2}). \tag{4}
\]

Unlike the Edgeworth expansion, approximation (4) always takes values in [0, 1]. Nevertheless, it may fail to be a nondecreasing function of $x$ and may lead to poor approximations, like its Edgeworth counterpart, especially for very small sample sizes. Note also that the $O(n^{-3/2})$ remainder appearing in all the approximations does not necessarily guarantee their accuracy, as some of our numerical examples will show. This is explained by the asymptotic nature of the approximations, in particular approximations (1) and (4), which may fail within the small sample framework.
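A minimal sketch of the adjusted formula (4) follows, written so that the polynomials $p_1$ and $p_2$ are passed in as functions. The names `edgeworth_b`, `p1_exp` and `p2_exp` are assumptions of this sketch; the polynomials used in the demonstration are those obtained from (1) with the exponential coefficients $k_{3,1} = 2$, $k_{4,1} = 6$ of Sect. 4.1. The example only illustrates the range property discussed in Remark 2 and makes no claim about accuracy.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth2(x, n, p1, p2):
    """Two term Edgeworth expansion; may leave [0, 1] or decrease in x."""
    return Phi(x) - phi(x) * (p1(x) / math.sqrt(n) + p2(x) / n)

def edgeworth_b(x, n, p1, p2):
    """Adjusted formula (4): Phi evaluated at r_hat, hence always in [0, 1]."""
    r_hat = x - p1(x) / math.sqrt(n) + (x * p1(x)**2 / 2 - p2(x)) / n
    return Phi(r_hat)

# Polynomials for the standardized exponential mean (k31 = 2, k41 = 6).
def p1_exp(x):
    return (x * x - 1) / 3

def p2_exp(x):
    return (x**3 - 3 * x) / 4 + (x**5 - 10 * x**3 + 15 * x) / 18

grid = [i / 10 for i in range(0, 31)]
b_values = [edgeworth_b(x, 1, p1_exp, p2_exp) for x in grid]
e_values = [edgeworth2(x, 1, p1_exp, p2_exp) for x in grid]
print(min(b_values), max(b_values))
print(e_values[20], e_values[24])  # two term Edgeworth at x = 2.0 and x = 2.4
```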
4 Applications and numerical examples
4.1 The mean of independent random variables
Let us consider $X_1, X_2, \ldots, X_n$ independent and identically distributed (i.i.d.) random variables with distribution function $F$. Assume that $\mu$ and $\sigma^2$ are the mean and variance of this distribution. Provided that the moments up to fourth order exist and Cramér's condition, $\limsup_{|t| \to \infty} |E\{\exp(itX_1)\}| < 1$, is satisfied, the validity of the two term expansion in (1) for the standardized sample mean can be proved theoretically (see details in Hall 1997, Sect. 2.2).
In this case, the cumulants of the statistic $Z_n = \dfrac{\sqrt{n}(\bar X_n - \mu)}{\sigma}$ are given by

\[
k_1^{(n)} = 0, \quad k_2^{(n)} = 1, \quad k_3^{(n)} = \frac{k_3}{n^{1/2}} \quad \text{and} \quad k_4^{(n)} = \frac{k_4}{n},
\]

where $k_3$ and $k_4$ are the skewness and kurtosis of the distribution $F$.
In order to carry out some numerical comparisons, we will consider the cases of the exponential distribution with mean $\lambda$ and the uniform distribution in [−1, 1], for which the exact values of $F_n(x)$ are known.
Exponential distribution
The standardized sample mean of observations drawn from an exponential distribution with mean $\lambda$ is $Z_n = \dfrac{\sqrt{n}(\bar X_n - \lambda)}{\lambda}$. For $\lambda = 1$ the c.g.f. of $Z_n$ is given by

\[
K_n(t) = -n\log\left(1 - t/\sqrt{n}\right) - \sqrt{n}\,t, \qquad t < \sqrt{n}.
\]

This c.g.f. leads to the following cumulants:

\[
k_1^{(n)} = 0, \quad k_2^{(n)} = 1, \quad k_3^{(n)} = \frac{2}{n^{1/2}} \quad \text{and} \quad k_4^{(n)} = \frac{6}{n}.
\]

The expressions above provide all the elements we need to implement the Edgeworth expansion and approximations (2) and (4). The exact values of the distribution function $F_n(x)$ are known in this example, so we will use them to compare the accuracy of all the approximations. Figure 1 shows plots of these approximations for sample sizes n = 1 and n = 10.
Note that the normal approximation is the least accurate one. On the other hand, when n = 1 approximations (2) and (4) improve on the two term Edgeworth expansion, which displays an undesirable decreasing shape in the range (1.9, 2.5). All the approximations exhibit similar results when n increases. Extra numerical work, not reported here, has shown that the r∗ approximation is the best one for all the sample sizes; however, it has the limitation that the c.g.f. $K_n(t)$ must be known.
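The comparison described above can be reproduced pointwise with a few lines of code. The sketch below is our own illustration, not the code behind Fig. 1. It assumes $\lambda = 1$, uses the fact that the sum of $n$ unit exponentials has an Erlang (gamma with integer shape) distribution to obtain the exact $F_n$, and uses hand-derived closed forms for the r∗ quantities in this example ($z_n(\tau) = x$, $[\xi_n(\tau)]^2 = 2[x\sqrt{n} - n\log(1 + x/\sqrt{n})]$), valid for $x > 0$.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exact_cdf(x, n):
    """P(Z_n <= x): the sum of n unit exponentials is Erlang(n, 1)."""
    y = n + math.sqrt(n) * x
    if y <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= y / k          # y**k / k!
        total += term
    return 1.0 - math.exp(-y) * total

def p1(x):
    return (x * x - 1) / 3                      # k31 = 2

def p2(x):
    return (x**3 - 3 * x) / 4 + (x**5 - 10 * x**3 + 15 * x) / 18  # k41 = 6

def edgeworth2(x, n):
    return Phi(x) - phi(x) * (p1(x) / math.sqrt(n) + p2(x) / n)

def edgeworth_b(x, n):
    return Phi(x - p1(x) / math.sqrt(n) + (x * p1(x)**2 / 2 - p2(x)) / n)

def r_star(x, n):
    """Formula (2) for x > 0, using closed forms derived for this example."""
    s = math.sqrt(n)
    xi = math.sqrt(2.0 * (x * s - n * math.log1p(x / s)))
    return Phi(xi + math.log(x / xi) / xi)

x, n = 1.0, 10
exact = exact_cdf(x, n)
errors = {
    "normal": abs(Phi(x) - exact),
    "edgeworth": abs(edgeworth2(x, n) - exact),
    "edgeworth_b": abs(edgeworth_b(x, n) - exact),
    "r_star": abs(r_star(x, n) - exact),
}
for name, err in errors.items():
    print(f"{name:12s} {err:.2e}")
```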
Uniform distribution in [−1, 1]
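We now consider $X_1, X_2, \ldots, X_n$ i.i.d. uniform random variables in [−1, 1]. The distribution function of the sum $S_n = \sum_{i=1}^{n} X_i$ admits an analytical expression (Killmann and von Collani 2001) given by

\[
H_n(x) =
\begin{cases}
0 & \text{if } x < -n \\[4pt]
\dfrac{\sum_{\mathbf{j} \in B_{n-1}} (-1)^{|\mathbf{j}|}\, Q_n(x + n - 2|\mathbf{j}|)}{n!\,2^n} & \text{if } -n \le x < n \\[4pt]
1 & \text{if } x \ge n
\end{cases}
\]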
Fig. 1 Exact values of Fn(x) (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 1 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of Fn(x)
where $B_n = \{\mathbf{j} = (j_1, j_2, \ldots, j_n) : j_i \in \{0, 1\},\ i = 1, 2, \ldots, n\}$ denotes the n-dimensional binary space, with $B_0 = \{0\}$, $|\mathbf{j}|$ is the number of ones in the vector $\mathbf{j}$ and $Q_n$ is the function defined by

\[
Q_n(y) =
\begin{cases}
0 & \text{if } y < 0 \\
y^n & \text{if } 0 \le y < 2 \\
y^n - (y - 2)^n & \text{if } y \ge 2
\end{cases}
\]
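The expression for $H_n$ is straightforward to evaluate once the sum over $B_{n-1}$ is grouped by the number of ones $m = |\mathbf{j}|$, since there are $\binom{n-1}{m}$ binary vectors with $m$ ones. The sketch below uses this grouping, which is our own rearrangement of the formula; the function names are illustrative. The alternating sum is evaluated in floating point, so it is intended for small $n$ only.

```python
from math import comb, factorial, sqrt

def Q(n, y):
    """The function Q_n(y) defined piecewise above."""
    if y < 0:
        return 0.0
    if y < 2:
        return float(y) ** n
    return float(y) ** n - float(y - 2) ** n

def H(n, x):
    """Distribution function of the sum of n i.i.d. uniform[-1, 1] variables."""
    if x < -n:
        return 0.0
    if x >= n:
        return 1.0
    # Group the vectors of B_{n-1} by their number of ones m (B_0 gives m = 0 only).
    total = sum((-1) ** m * comb(n - 1, m) * Q(n, x + n - 2 * m)
                for m in range(n))
    return total / (factorial(n) * 2 ** n)

def F(x, n):
    """Distribution function of Z_n = sqrt(3n) * mean, i.e. H_n(sqrt(n/3) x)."""
    return H(n, sqrt(n / 3) * x)

print(H(1, 0.5), H(2, 1.0), F(0.0, 5))
```

For instance, `H(2, 1.0)` is the probability that the sum of two uniform variables on [−1, 1] does not exceed 1, which can be checked against the triangular density of that sum.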
Let us consider the normalized sample mean $Z_n = \sqrt{3n}\,\bar X_n = \sum_{i=1}^{n} \sqrt{3/n}\,X_i$, whose distribution function $F_n$ is given by $F_n(x) = H_n\left(\sqrt{n/3}\,x\right)$. In this case $Z_n$ has c.g.f. defined by $K_n(t) = nK\left(\sqrt{3/n}\,t\right)$, with $K(t) = \log\left(\dfrac{\sinh t}{t}\right)$ the c.g.f. of a uniform random variable in [−1, 1].
It is well known that $K(t)$ can be expanded in a power series as

\[
K(t) = \sum_{k=2}^{\infty} \frac{2^k B_k t^k}{k \cdot k!}
\]

provided that $|t| < \pi$, with $B_k$ the so-called Bernoulli numbers. From this expansion, the cumulants of $Z_n$, needed to compute the Edgeworth expansion, can be obtained in a straightforward manner; they are given by

\[
k_{2j-1}^{(n)} = 0 \quad \text{and} \quad k_{2j}^{(n)} = \frac{12^j B_{2j}}{(2j)\,n^{j-1}}, \qquad j = 1, 2, \ldots
\]

These expressions lead to the following particular cases: $k_1^{(n)} = k_3^{(n)} = 0$, $k_2^{(n)} = 1$ and $k_4^{(n)} = \dfrac{12^2 B_4}{4n} = -\dfrac{6}{5n}$, from which we get the polynomials of the two term Edgeworth expansion:

\[
p_1(x) = 0 \quad \text{and} \quad p_2(x) = -\frac{1}{20}(x^3 - 3x).
\]

Fig. 2 Exact values of Fn(x) (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 5 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of Fn(x)
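The cumulant formula for the uniform case can be checked with exact rational arithmetic. The sketch below is illustrative only: it computes Bernoulli numbers with the standard recurrence $\sum_{k=0}^{j} \binom{j+1}{k} B_k = 0$ (convention $B_1 = -1/2$) and then evaluates $k_{2j}^{(n)} = 12^j B_{2j} / ((2j)\,n^{j-1})$. The function names are our own.

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """Bernoulli numbers B_0, ..., B_m (with B_1 = -1/2) as exact fractions."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B

def cumulant_uniform_mean(order, n):
    """Cumulant of the given order of Z_n = sqrt(3n) * mean of uniform[-1, 1]."""
    if order % 2 == 1:
        return Fraction(0)
    j = order // 2
    return Fraction(12) ** j * bernoulli(order)[order] / (2 * j) / Fraction(n) ** (j - 1)

print(cumulant_uniform_mean(2, 10))       # variance of Z_n
print(cumulant_uniform_mean(4, 10))       # fourth cumulant, -6/(5n)
print(cumulant_uniform_mean(4, 1) / 24)   # coefficient of (x^3 - 3x) in p2
```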
These polynomials, along with $K_n(t)$ and its second order derivative, both evaluated at the saddlepoint, are the key quantities involved in the Edgeworth expansion, the r∗ approximation and the adjusted version in (4). We have plotted them in Fig. 2, together with the exact $F_n(x)$, for sample sizes n = 5 and n = 10.
Note that all the approximations reach a high level of accuracy for small sample sizes, as shown by the agreement of the curves in Fig. 2. Further numerical work, not shown here, has revealed the high quality of all these approximations even for smaller sample sizes when the underlying distribution is uniform in [−1, 1].
4.2 The studentized mean of independent random variables
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables having distribution function $F$ with mean $\mu$ and variance $\sigma^2$. Assume that $\gamma$ and $\kappa$ are the skewness and kurtosis parameters of $F$.
Let us consider the studentized statistic $Z_n = \dfrac{\sqrt{n}(\bar X_n - \mu)}{\hat\sigma}$, where $\hat\sigma^2$ is the usual sample variance. Provided that the sixth moment of $F$ exists and a Cramér type condition is met (see Hall 1987), we can obtain a two term Edgeworth expansion for the distribution of $Z_n$.
The expansion is derived in an easy way by writing

\[
F_n(x) = P\left( \frac{\sqrt{n}(\bar X_n - \mu)}{\hat\sigma} \le x \right) = P\left( \frac{\sqrt{n}\,\bar W}{\hat\beta} \le x \right),
\]

with $\bar W$ and $\hat\beta^2$ the sample mean and variance of the standardized variables $W_i = \dfrac{X_i - \mu}{\sigma}$. Using the device and notation in Hall (1997), the following expansions for the cumulants of $\sqrt{n}\,\bar W / \hat\beta$ can be obtained:

\[
k_1^{(n)} = -\frac{1}{2}\,n^{-1/2}\gamma + O(n^{-3/2}), \qquad k_2^{(n)} = 1 + \frac{1}{4}\,n^{-1}(7\gamma^2 + 12) + O(n^{-2}),
\]
\[
k_3^{(n)} = -2n^{-1/2}\gamma + O(n^{-3/2}) \quad \text{and} \quad k_4^{(n)} = n^{-1}(12\gamma^2 - 2\kappa + 6) + O(n^{-2}).
\]

Therefore, we get the quantities $k_{1,2} = k_{2,1} = k_{3,2} = 0$ and

\[
k_{1,1} = -\frac{\gamma}{2}, \quad k_{2,2} = \frac{1}{4}(7\gamma^2 + 12), \quad k_{3,1} = -2\gamma, \quad k_{4,1} = 12\gamma^2 - 2\kappa + 6,
\]

which lead to the polynomials $p_1(x)$ and $p_2(x)$ of the Edgeworth expansion:

\[
p_1(x) = -\frac{\gamma}{6}(2x^2 + 1),
\]
\[
p_2(x) = -x\left[ \frac{\kappa}{12}(x^2 - 3) - \frac{\gamma^2}{18}(x^4 + 2x^2 - 3) - \frac{1}{4}(x^2 + 3) \right].
\]

In order to carry out comparisons, we consider a standard normal underlying distribution, for which we know that $\sqrt{\dfrac{n-1}{n}}\,Z_n$ has a t distribution with n − 1 degrees of freedom. In this case, we have $\gamma = \kappa = 0$ and therefore

\[
p_1(x) = 0, \quad p_2(x) = \frac{x}{4}(x^2 + 3) \quad \text{and} \quad \hat r_n(\tau) = x - \frac{x}{4n}(x^2 + 3).
\]
For this example we cannot implement (2) because the t distribution has no moment generating function within an interval containing the origin; so we will only calculate the normal approximation (which agrees with the one term Edgeworth expansion in this case), the two term Edgeworth expansion and the adjusted version (4).
Figure 3 displays the plot of the exact $t_{n-1}$ distribution function, along with plots of all the approximations for sample sizes n = 5 and n = 20.
Note that the adjusted approximation in (4) leads to poor results when the sample size is small. This was to be expected and is related to the nonmonotone behavior of $\hat r_n(\tau)$ and $\Phi(\hat r_n(\tau))$, explained by the influence of the leading cubic term in $\hat r_n(\tau) - x$, which has a negative coefficient. When the sample size is increased, this influence is damped by the large n and the approximation becomes more accurate; in this case, all the approximations are highly accurate and remain very close to the exact distribution function, as highlighted by the overlap in the plots.
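The nonmonotone behavior mentioned above can be located explicitly. For the normal case, $\hat r_n(\tau) = x - x(x^2 + 3)/(4n)$ has derivative $1 - 3(x^2 + 1)/(4n)$, which vanishes at $x = \sqrt{(4n - 3)/3}$; this turning point is our own elementary computation, not a formula from the paper. The sketch below evaluates it for the two sample sizes of Fig. 3.

```python
import math

def r_hat(x, n):
    """Adjusted argument in (4) for the studentized mean, normal case."""
    return x - x * (x**2 + 3) / (4 * n)

def turning_point(n):
    """Point beyond which r_hat (hence Phi(r_hat)) starts to decrease in x."""
    return math.sqrt((4 * n - 3) / 3)

for n in (5, 20):
    print(n, turning_point(n), r_hat(2.4, n), r_hat(3.0, n))
```

For n = 5 the turning point falls inside the plotted range of x, whereas for n = 20 it lies beyond it, which is in line with the behavior described for the two sample sizes.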
Fig. 3 Exact values of the $t_{n-1}$ distribution function (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). Panels: n = 5 and n = 20; horizontal axis: x; vertical axis: exact and approximated values of Fn(x)
4.3 U-Statistics
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with common distribution $F$. A U-statistic of degree two is defined by

\[
U_n = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} h(X_i, X_j),
\]

where $h$ is a permutation invariant function with zero expectation under $F$, $E\{h(X_1, X_2)\} = 0$, and such that $g(X_1) = E\{h(X_1, X_2) \mid X_1\}$ has positive variance $\sigma_g^2$.
Under existence of certain moments of the functions $h(X_1, X_2)$, $g(X_1)$, $\psi(X_1, X_2) = h(X_1, X_2) - g(X_1) - g(X_2)$ and some of their cross products, along with the fulfillment of a Cramér type condition for $g(X_1)$, a two term Edgeworth expansion for the statistic

\[
Z_n = \frac{U_n}{\sigma_n}, \quad \text{with} \quad \sigma_n^2 = \frac{4\sigma_g^2}{n} + \frac{1}{\binom{n}{2}}\,E\{\psi(X_1, X_2)^2\},
\]

was derived by Callaert et al. (1980).
The polynomials in the expansion are given by

\[
p_1(x) = \frac{k_3(x^2 - 1)}{6} \quad \text{and} \quad p_2(x) = \frac{k_4}{24}(x^3 - 3x) + \frac{k_3^2}{72}(x^5 - 10x^3 + 15x),
\]

with the quantities $k_3$ and $k_4$:

\[
k_3 = \frac{1}{\sigma_g^3}\left[ E\{g(X_1)^3\} + 3E\{g(X_1)g(X_2)\psi(X_1, X_2)\} \right],
\]
\[
k_4 = \frac{1}{\sigma_g^4}\left[ E\{g(X_1)^4\} + 12E\{g(X_1)^2 g(X_2)\psi(X_1, X_2)\} + 12E\{g(X_2)g(X_3)\psi(X_1, X_2)\psi(X_1, X_3)\} \right] - 3.
\]
The sample variance
Let us consider the unbiased sample variance $S^2$, which may be written in a degree two U-statistic form as

\[
S^2 - 1 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar X)^2 - 1 = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left[ \frac{(X_i - X_j)^2}{2} - 1 \right],
\]

so that $h(X_1, X_2) = \dfrac{(X_1 - X_2)^2}{2} - 1$.
After some simple algebra (see for instance DasGupta 2008), we arrive at the following expressions for $k_3$ and $k_4$:

\[
k_3 = \frac{\mu_6 - 3\mu_4 - 6\mu_3^2 + 2}{(\mu_4 - 1)^{3/2}}, \qquad
k_4 = \frac{\mu_8 - 4\mu_6 - 3\mu_4^2 + 12\mu_4 + 96\mu_3^2 - 24\mu_3\mu_5 - 6}{(\mu_4 - 1)^2}.
\]

When the underlying model is the standard normal law, we obtain $k_3 = 4/\sqrt{2}$ and $k_4 = 12$. In this case, the distribution of $Z_n$ is known and has c.g.f.

\[
K_n(t) = -t\sqrt{\frac{n - 1}{2}} - \frac{n - 1}{2}\log\left( 1 - \sqrt{\frac{2}{n - 1}}\,t \right), \qquad t < \sqrt{\frac{n - 1}{2}}.
\]

Now we can easily compare all the approximations (r∗, the adjusted formula (4) and the two term Edgeworth expansion) with the exact $F_n(x)$. Figure 4 displays plots of all of them for very small sample sizes such as n = 2 and n = 10. Note that the behavior of all the approximations is similar to that observed for the sample mean with an exponential distribution.
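The values $k_3 = 4/\sqrt{2}$ and $k_4 = 12$ for the normal model follow from plugging the standard normal central moments into the two expressions above. The short sketch below carries out this substitution; it relies on the well-known fact that the even central moments of $N(0, 1)$ are $\mu_{2m} = (2m - 1)!!$ and the odd ones vanish. Function names are illustrative.

```python
import math

def normal_central_moment(r):
    """Central moment of order r of N(0, 1): 0 for odd r, (r - 1)!! for even r."""
    if r % 2:
        return 0
    out = 1
    for m in range(1, r, 2):
        out *= m
    return out

def k3_k4(mu):
    """k3 and k4 of the sample variance U-statistic from central moments mu[r]
    of a distribution standardized to unit variance."""
    k3 = (mu[6] - 3 * mu[4] - 6 * mu[3]**2 + 2) / (mu[4] - 1) ** 1.5
    k4 = (mu[8] - 4 * mu[6] - 3 * mu[4]**2 + 12 * mu[4]
          + 96 * mu[3]**2 - 24 * mu[3] * mu[5] - 6) / (mu[4] - 1) ** 2
    return k3, k4

mu = {r: normal_central_moment(r) for r in range(1, 9)}
k3, k4 = k3_k4(mu)
print(k3, k4)
```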
Fig. 4 Exact values of Fn(x) (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). r∗ approximation (dotted and dashed curve). Panels: n = 2 and n = 10; horizontal axis: x; vertical axis: exact and approximated values of Fn(x)
The Gini mean difference
Now we consider the Gini mean difference $G_n$ of independent random variables $X_1, X_2, \ldots, X_n$ with underlying uniform distribution on [0, 1] and the corresponding U-statistic, given by

\[
U_n = G_n - \frac{1}{3} \quad \text{with} \quad G_n = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} |X_i - X_j|.
\]

The permutation invariant function is $h(X_1, X_2) = |X_1 - X_2| - \frac{1}{3}$; therefore, $g(X_i) = E\{h(X_i, X_2) \mid X_i\} = X_i^2 - X_i + \frac{1}{6}$, $i = 1, 2, 3$. Careful computations lead to the following results:

\[
\sigma_g^2 = \frac{1}{180}, \quad E\{\psi(X_1, X_2)^2\} = \frac{2}{45}, \quad E\{g(X_1)^3\} = \frac{1}{3780} \quad \text{and}
\]
\[
E\{g(X_1)g(X_2)\psi(X_1, X_2)\} = E\{g(X_1)g(X_2)h(X_1, X_2)\} = -\frac{1}{3780},
\]

from which we obtain $k_3 = -\dfrac{4\sqrt{5}}{7}$.
On the other hand, some extra algebra yields the quantities

\[
E\{g(X_1)^4\} = \frac{1}{15120}, \quad E\{g(X_1)^2 g(X_2)\psi(X_1, X_2)\} = -\frac{1}{113400}
\]
\[
\text{and} \quad E\{g(X_2)g(X_3)\psi(X_1, X_2)\psi(X_1, X_3)\} = \frac{1}{75600},
\]

from which we get $k_4 = \dfrac{6}{7}$.
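The arithmetic leading to $k_3$ and $k_4$ for the Gini mean difference can be retraced with exact fractions. In the sketch below the moments of $g(X) = X^2 - X + 1/6$ under the uniform law on [0, 1] are obtained by integrating polynomial powers term by term, while the three mixed expectations involving $\psi$ are taken as given in the text rather than recomputed. The helper names are assumptions of this sketch.

```python
from fractions import Fraction as Fr
import math

g = [Fr(1, 6), Fr(-1), Fr(1)]  # coefficients of 1, x, x^2 in g(x)

def poly_mul(a, b):
    """Product of two polynomials given by coefficient lists."""
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def moment_of_g(power):
    """E{g(X)^power} for X uniform on [0, 1], by exact integration."""
    p = [Fr(1)]
    for _ in range(power):
        p = poly_mul(p, g)
    return sum(c / (i + 1) for i, c in enumerate(p))

sigma2_g = moment_of_g(2)
Eg3 = moment_of_g(3)
Eg4 = moment_of_g(4)

# Mixed expectations quoted from the text (not recomputed here).
E_gg_psi = Fr(-1, 3780)
E_g2g_psi = Fr(-1, 113400)
E_gg_psipsi = Fr(1, 75600)

k3 = float(Eg3 + 3 * E_gg_psi) / float(sigma2_g) ** 1.5
k4 = (Eg4 + 12 * E_g2g_psi + 12 * E_gg_psipsi) / sigma2_g ** 2 - 3
print(sigma2_g, Eg3, Eg4, k3, k4)
```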
Unlike the case of the sample variance, the exact distribution of $G_n$ is unknown, so we will obtain $F_n(x)$ by simulation, drawing $10^5$ samples of size n from a uniform variable on [0, 1]. The experimental study was carried out for n = 5 and n = 20; comparisons of the simulated values of $F_n$ with the normal approximation, the two term Edgeworth expansion and the adjustment in (4) are displayed in Fig. 5. Once again, we do not implement the r∗ approximation because the c.g.f. of $G_n$ is not tractable.
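A simulation of this kind could be organized as in the following sketch. It is not the code used for Fig. 5: the function names, the seed and the default number of replications are assumptions, and the standardization uses $\sigma_n^2 = 4\sigma_g^2/n + E\{\psi^2\}/\binom{n}{2}$ with the values $\sigma_g^2 = 1/180$ and $E\{\psi^2\} = 2/45$ quoted above. The Gini mean difference is computed from the sorted sample through the identity $\sum_{i<j}(x_{(j)} - x_{(i)}) = \sum_{k=0}^{n-1}(2k - n + 1)\,x_{(k)}$.

```python
import math
import random

def gini_mean_difference(xs):
    """Average of |x_i - x_j| over all pairs i < j, via the sorted sample."""
    n = len(xs)
    s = sorted(xs)
    total = sum((2 * k - n + 1) * v for k, v in enumerate(s))
    return total / (n * (n - 1) / 2)

def simulate_cdf(x, n, reps=100_000, seed=0):
    """Monte Carlo estimate of F_n(x) = P(Z_n <= x) for uniform[0, 1] data."""
    rng = random.Random(seed)
    pairs = n * (n - 1) / 2
    sigma_n = math.sqrt(4 * (1 / 180) / n + (2 / 45) / pairs)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        z = (gini_mean_difference(sample) - 1 / 3) / sigma_n
        hits += z <= x
    return hits / reps

# A reduced number of replications keeps the example fast.
print(simulate_cdf(0.0, 5, reps=5000), simulate_cdf(3.0, 5, reps=5000))
```

The sorted-sample identity avoids the double loop over pairs; with the $10^5$ replications mentioned in the text the same function can be called with `reps=100_000`.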
We note that when n = 5 the Edgeworth expansion takes values slightly greater than 1 for points x within the range (2.1, 3.5) and displays a slightly decreasing shape when x ≥ 3.5; meanwhile, the adjusted version (4) is always within the interval [0, 1], giving a correction to the Edgeworth approximation. This behavior diminishes or disappears when the sample size is increased to n = 20. In this case both approximations overlap and agree with the exact distribution function; they are also very close to the normal approximation.
Fig. 5 Exact values of Fn(x) (solid curve). Normal approximation (dotted curve). Two term Edgeworth expansion (dashed curve). Adjusted Edgeworth expansion (long dashed curve). Panels: n = 5 and n = 20; horizontal axis: x; vertical axis: exact and approximated values of Fn(x)
5 Summary and conclusion
We have explored the high order connections between the r∗ approximation and the Edgeworth expansion and have shown that the Edgeworth B approximation introduced by Phillips (1978) is just an asymptotic expansion of the r∗ formula up to the third order. We conclude that the r∗, two term Edgeworth expansion and Edgeworth B approximations are asymptotically equivalent with an $O(n^{-3/2})$ remainder. However, as the numerical applications have shown, the asymptotic validity of the approximations does not necessarily imply their accuracy for finite samples, especially for small sample sizes.
Acknowledgments The author wishes to thank his wife Marian for her encouragement and constant support. He is also indebted to an anonymous referee for useful comments that improved the paper.
References
Barndorff-Nielsen OE, Cox DR (1979) Edgeworth and saddlepoint approximations with statistical applications. J R Stat Soc B 41:279–312
Callaert H, Janssen P, Veraverbeke N (1980) An Edgeworth expansion for U-statistics. Ann Stat 8:299–312
Daniels HE (1954) Saddlepoint approximations in statistics. Ann Math Stat 25:631–650
Daniels HE (1987) Tail probability approximations. Int Stat Rev 55:37–48
DasGupta A (2008) Asymptotic theory of statistics and probability. Springer, Berlin, pp 185–234
Field C, Ronchetti E (1990) Small sample asymptotics. Inst Math Stat
Hall P (1987) Edgeworth expansions for Student's t-statistic under minimal moment conditions. Ann Probab 15:920–931
Hall P (1997) The bootstrap and Edgeworth expansion. Springer, Berlin, pp 39–81
Jensen JL (1992) The modified signed likelihood statistic and saddlepoint approximations. Biometrika 79:693–703
Kakizawa Y, Taniguchi M (1994) Higher order asymptotic relation between Edgeworth approximation and Saddlepoint approximation. J Jpn Stat Soc 24:109–119
Killmann F, von Collani E (2001) A note on the convolution of the uniform and related distributions and their use in quality control. Econ Qual Control 16:17–41
Lahiri SN (2003) Resampling methods for dependent data. Springer, Berlin, pp 145–173
Lugannani R, Rice S (1980) Saddlepoint approximations for the distribution of the sum of independent random variables. Adv Appl Probab 12:475–490
Monti AC (1993) A new look at the relationship between Edgeworth expansion and saddlepoint approximation. Stat Probab Lett 17:49–52
Ochi Y (1983) Asymptotic expansions for the distribution of an estimator in the first-order autoregressive process. J Time Ser Anal 4:57–67
Phillips PCB (1978) Edgeworth and saddlepoint approximations in the first-order noncircular autoregression. Biometrika 65:91–98
Reid N (1996) Likelihood and higher-order approximations to tail areas: a review and annotated bibliography. Can J Stat 24:141–166
Robinson J (1982) Saddlepoint approximations for permutation tests and confidence intervals. J R Stat Soc B 44:91–101