


First order covariance inequalities via Stein’s method

Marie Ernst∗, Gesine Reinert† and Yvik Swan∗

Abstract

We propose probabilistic representations for inverse Stein operators (i.e. solutions to Stein equations) under general conditions; in particular we deduce new simple expressions for the Stein kernel. These representations allow us to deduce uniform and non-uniform Stein factors (i.e. bounds on solutions to Stein equations) and lead to new covariance identities expressing the covariance between arbitrary functionals of an arbitrary univariate target in terms of a weighted covariance of the derivatives of the functionals. Our weights are explicit, easily computable in most cases, and expressed in terms of objects familiar within the context of Stein's method. Applications of the Cauchy-Schwarz inequality to these weighted covariance identities lead to sharp upper and lower covariance bounds and, in particular, weighted Poincaré inequalities. Many examples are given and, in particular, classical variance bounds due to Klaassen, Brascamp and Lieb, or Otto and Menz are corollaries. Connections with more recent literature are also detailed.

1 Introduction

Much attention has been given in the literature to the problem of providing sharp tractable estimates on the variance of functions of random variables. Such estimates are directly related to fundamental considerations of pure mathematics (e.g., isoperimetric, logarithmic Sobolev and Poincaré inequalities), as well as essential issues from statistics (e.g., Cramér-Rao bounds, efficiency and asymptotic relative efficiency computations, maximum correlation coefficients, and concentration inequalities).

One of the starting points of this line of research is Chernoff's famous result from [27] which states that, if N ∼ N(0, 1), then

E[g′(N)]² ≤ Var[g(N)] ≤ E[g′(N)²]    (1.1)

for all sufficiently regular functions g : IR → IR. Chernoff obtained the upper bound by exploitingorthogonality properties of the family of Hermite polynomials. The upper bound in (1.1) is, in fact,already available in [56] and is also a special case of the central inequality in [11], see below. Ca-coullos [12] extends Chernoff’s bound to a wide class of univariate distributions (including discretedistributions) by proving that if X ∼ p has a density function p with respect to the Lebesgue measurethen

E[τp(X) g′(X)]2

Var[X]≤ Var[g(X)] ≤ E

[τp(X) g′(X)2

](1.2)

with τ_p(x) = p(x)⁻¹ ∫_x^∞ (t − E[X])p(t)dt. It is easy to see that, if p is the standard normal density, then τ_p(x) = 1, so that (1.2) contains (1.1). Cacoullos also obtains a similar bound as (1.2) for discrete distributions on the positive integers, where the derivative is replaced by the forward difference and the weight becomes τ_p(x) = p(x)⁻¹ ∑_{t=x+1}^∞ t p(t).

Variance inequalities such as (1.2) are closely related to the celebrated Brascamp-Lieb inequality from [11] which, in dimension 1, states that if X ∼ p and p is strictly log-concave then

Var[g(X)] ≤ E[ (g′(X))² / (−log p)″(X) ]    (1.3)
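As a quick numerical illustration of (1.1), here is a minimal sketch (assuming only numpy and scipy; the test function g(x) = x³ is an arbitrary choice for concreteness) which evaluates the three quantities by quadrature:

```python
# Minimal numerical check of the Chernoff bounds (1.1) for N ~ N(0,1),
# with the (arbitrary) test function g(x) = x^3. Illustration only.
import numpy as np
from scipy import integrate, stats

g = lambda x: x**3         # test function
gp = lambda x: 3 * x**2    # its derivative g'

E = lambda f: integrate.quad(lambda x: f(x) * stats.norm.pdf(x),
                             -np.inf, np.inf)[0]

var_g = E(lambda x: g(x)**2) - E(g)**2   # Var[g(N)] = 15
lower = E(gp)**2                         # E[g'(N)]^2 = 9
upper = E(lambda x: gp(x)**2)            # E[g'(N)^2] = 27

assert lower <= var_g <= upper           # 9 <= 15 <= 27
```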

∗ Université de Liège; corresponding author Yvik Swan: [email protected]. † University of Oxford.

arXiv:1906.08372v1 [math.PR] 19 Jun 2019

for all sufficiently regular functions g. In fact, the upper bound from (1.1) is an immediate consequence of (1.3) because, if p is the standard Gaussian density, then (−log p)″(x) ≡ 1. The Brascamp-Lieb inequality is proved in [55] to be a consequence of Hoeffding's classical covariance identity from [41], which states that if (X, Y) is a continuous bivariate random vector with cumulative distribution H(x, y) and marginal cdfs F and G, then

Cov[f(X), g(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f′(x) ( H(x, y) − F(x)G(y) ) g′(y) dx dy    (1.4)

under weak assumptions on f, g (see e.g. [29]). The freedom of choice in the test functions f, g in (1.4) is exploited by [55] to prove that, if X has an absolutely continuous density p such that −log p is C² and strictly convex, then the asymmetric Brascamp-Lieb inequality holds:

|Cov[f(X), g(X)]| ≤ sup_x { |f′(x)| / (−log p)″(x) } E[ |g′(X)| ].    (1.5)

Identity (1.4) and inequalities (1.3) and (1.5) are extended to the multivariate setting in [19], which also gives connections with logarithmic Sobolev inequalities for spin systems and related inequalities for log-concave densities. This material is revisited and extended in [66, 64, 65], providing applications in the context of isoperimetric inequalities and weighted Poincaré inequalities. In [29] the identity (1.4) is proved in all generality and used to provide expansions for the covariance in terms of canonical correlations and variables.

Further generalizations of Chernoff's bounds are provided in [24, 14, 15], and [44] (e.g., Karlin [44] deals with the entire class of log-concave distributions). See also [10, 16, 46, 58, 18] for the connection with probabilistic characterizations and other properties. Similar inequalities were obtained – often by exploiting properties of suitable families of orthogonal polynomials – for univariate functionals of some specific multivariate distributions, e.g., in [17, 13, 20, 48, 3, 49]. A historical overview as well as a description of the connection between such bounds, the so-called Stein identities from Stein's method (see below) and Sturm-Liouville theory (see Section 4) can be found in [30]. To the best of our knowledge, the most general version of (1.1) and (1.2) is due to [45], where the following result is proved.

Theorem 1.1 (Klaassen bounds). Let µ be some σ-finite measure. Let ρ(x, y) be a measurable function such that ρ(x, ·) does not change sign for µ-almost all x ∈ IR. Suppose that g is a measurable function such that G(x) = ∫ ρ(x, y)g(y)µ(dy) + c is well defined for some c ∈ IR. Let X be a real random variable with density p with respect to µ.

• (Klaassen upper variance bound) For all nonnegative measurable functions h : IR → IR such that µ({x ∈ IR | g(x) ≠ 0, p(x)h(x) = 0}) = 0 we have

Var[G(X)] ≤ E[ (g(X)²/h(X)) · (1/p(X)) ∫ ρ(z, X)H(z)p(z)µ(dz) ]    (1.6)

with H : IR → IR supposed well-defined by H(x) = ∫ ρ(x, y)h(y)µ(dy).

• (Cramér-Rao lower variance bound) For all measurable functions k : IR → IR such that 0 < E[k²(X)] < ∞ and E[k(X)] = 0 we have

Var[G(X)] ≥ E[g(X)K(X)]² / Var[k(X)]    (1.7)

where K(x) = (1/p(x)) ∫ ρ(z, x)k(z)p(z)µ(dz). Equality in (1.7) holds if and only if G is linear in k, p-almost everywhere.


Klaassen's proof of Theorem 1.1 relies on little more than the Cauchy-Schwarz inequality and Fubini's theorem; it has a slightly magical aura as little or no heuristic or context is provided as to the best choices of test functions h, k and kernel ρ, or even as to the nature of the weights appearing in (1.6) and (1.7). To the best of our knowledge, all available first order variance bounds from the literature can be obtained from either (1.6) or (1.7) by choosing the appropriate test functions h or k and the appropriate kernel ρ. For instance, the weights appearing in the upper bound (1.6) generalize the Stein kernel from Cacoullos' bound (1.2), both in the discrete and the continuous case. Indeed, taking H(x) = x when the distribution p is continuous, we see that then h(x) = 1 and the weight becomes p(x)⁻¹ ∫ ρ(z, x) z p(z) µ(dz), which is none other than τ_p(x). A similar argument holds in the discrete case. In the same way, taking k(x) = x leads to K(x) = τ_p(x) in (1.7), and thus the lower bound in (1.2) follows as well. The freedom of choice in the function h allows for much flexibility in the quality of the weights; this fact seems somewhat underexploited in the literature. This is perhaps due to the rather obscure nature of Klaassen's weights, a topic which shall be one of the central learnings of this paper. Indeed, we shall provide a natural theoretical home for Klaassen's result in the framework of Stein's method.

Several variations on Klaassen's theorem have already been obtained via techniques related to Stein's method. We defer a proper introduction of these techniques to Section 2. The gist of the approach can nevertheless be understood very simply in case the underlying distribution is standard normal. Stein's classical identity states that if N ∼ N(0, 1) then

E[N g(N)] = E[g′(N)]  for all g such that E[|g′(N)|] < ∞.    (1.8)

By the Cauchy-Schwarz inequality we immediately deduce that, for all appropriate g,

E[g′(N)]² = E[N g(N)]² = E[N (g(N) − E[g(N)])]² ≤ E[N²] Var[g(N)],    (1.9)

which, since E[N] = 0 and E[N²] = 1, gives the lower bound in (1.1). For the upper bound, still by the Cauchy-Schwarz inequality,

Var[g(N)] ≤ E[ ( ∫_0^N g′(x)dx )² ] ≤ E[ N ∫_0^N (g′(x))² dx ] = E[ (g′(N))² ]    (1.10)

where the last identity is a direct consequence of Stein's identity (1.8) applied to the function x ↦ ∫_0^x (g′(u))² du. This is the upper bound in (1.1). The idea behind this proof is due to Chen [23]. As is now well known (again, we refer the reader to Section 2 for references and details), Stein's identity (1.8) for the normal distribution can be extended to basically any univariate (and even multivariate) distribution via a family of objects called "Stein operators". This leads to a wide variety of Stein-type integration by parts identities, and it is natural to wonder whether Chen's approach can be used to obtain generalizations of Klaassen's theorem. First steps in this direction are detailed in [51, 52]; in particular it is seen that general lower variance bounds are easy to obtain from generalized Stein identities in the same way as in (1.9). Nevertheless, the method of proof in (1.10) for the upper bound cannot be generalized to arbitrary targets and, even in cases where the method does apply, the assumptions under which the bounds hold are quite stringent. To the best of our knowledge, the first upper variance bounds via properties of Stein operators are due to Saumard [64], who combines generalized Stein identities – expressed in terms of the Stein kernel τ_p(x) – with Hoeffding's identity (1.4). The scope of Saumard's weighted Poincaré inequalities is, nevertheless, limited, and a general result such as Klaassen's is, to this date, not available in the literature.

The main contributions of this paper can be categorized in two types:

• Covariance identities and inequalities. The first main contribution of this paper is a generalization of Klaassen's variance bounds from Theorem 1.1 to covariance inequalities for arbitrary functionals of arbitrary univariate targets under minimal assumptions (see Proposition 3.1 and Theorem 3.5). Our results therefore also contain basically the entire literature on the topic. Moreover, the weights that appear in our bounds bear a clear and natural interpretation in terms of Stein operators, which allows for easy computation for a wide variety of targets, as illustrated in the different examples we tackle as well as in Tables 1, 2 and 3, in which we provide explicit variance bounds for univariate target distributions belonging to the classical integrated Pearson and Ord families (see Example 3.8 for a definition). In particular, Klaassen's bounds now arise naturally in this setting.

• Stein operators and their properties. The second main contribution of the paper lies in our method of proof, which contributes to the theory of Stein operators themselves. Specifically, we obtain several new probabilistic representations of inverse Stein operators (a.k.a. solutions to Stein equations) which open the way to a wealth of new manipulations which were hitherto unavailable. These representations also lead to new interpretations and ultimately new handles on several quantities which are crucial to the theory surrounding Stein's method (such as Stein kernels, Stein equations, Stein factors, and Stein bounds). Finally, the various objects we identify provide natural connections with other topics of interest, including the well-known connection with Sturm-Liouville theory already identified in [30].

The paper is organised as follows. Section 2 contains the theoretical foundations of the paper. In Section 2.1 we recall the theory of canonical and standardized Stein operators introduced in [53] and introduce a (new) notion of inverse Stein operator (Definition 2.4). We also identify minimal conditions under which Stein-type probabilistic integration by parts formulas hold (see Lemmas 2.3 and 2.16). In Section 2.2 we provide the representation formulas for the inverse Stein operator (Lemmas 2.18 and 2.19). In Section 2.3 we clarify the conditions on the test functions under which the different identities hold, and provide bridges with the classical assumptions in the literature. Section 2.4 contains bounds on the solutions to the Stein equations. Section 3 contains the covariance identities and inequalities. After re-interpreting Hoeffding's identity (1.4) we obtain general and flexible lower and upper covariance bounds (Proposition 3.1 and Theorem 3.5). We then deduce Klaassen's bounds (Corollary 3.7) and provide examples for several concrete distributions, with more examples deferred to the three tables mentioned above. Finally, a discussion is provided in Section 4, wherein several examples are treated and connections with other theories are established, for instance the Brascamp-Lieb inequality (Corollary 4.1) and Menz and Otto's asymmetric Brascamp-Lieb inequality (Corollary 4.2), as well as the link with an eigenfunction problem which can be seen as an extended Sturm-Liouville problem. The proofs from Section 2.3 are technical and postponed to Appendix A.

2 Stein differentiation

Stein's method consists in a collection of techniques for distributional approximation that was originally developed for normal approximation in [69] and for Poisson approximation in [25]; for expositions see the books [70, 7, 8, 26, 57] and the review papers [61, 63, 21]. Outside the Gaussian and Poisson frameworks, there exist several non-equivalent general theories allowing one to set up Stein's method for large swaths of probability distributions, of which we single out the papers [22, 31, 71] for univariate distributions under analytical assumptions, [4, 5] for infinitely divisible distributions, [6] for discrete multivariate distributions, and [54, 38, 39] as well as [34] for multivariate densities under diffusive assumptions.

The backbone of the present paper consists in the approach from [50, 53, 62]. Before introducing these results, we fix the notations. Let 𝒳 ∈ B(IR) and equip it with some σ-algebra A and σ-finite measure µ. Let X be a random variable on 𝒳, with induced probability measure P_X which is absolutely continuous with respect to µ; we denote by p the corresponding probability density, and its support by S(p) = {x ∈ 𝒳 : p(x) > 0}. As usual, L¹(p) is the collection of all real-valued functions f such that E|f(X)| < ∞. We sometimes call the expectation under p the p-mean. Although we could in principle keep the discussion to come very general, in order to make the paper more concrete and readable we shall restrict our attention to distributions satisfying the following Assumption.


Assumption A. The measure µ is either the counting measure on 𝒳 = Z or the Lebesgue measure on 𝒳 = IR. If µ is the counting measure then there exist a < b ∈ Z ∪ {−∞, ∞} such that S(p) = [a, b] ∩ Z. If µ is the Lebesgue measure then there exist a, b ∈ IR ∪ {−∞, ∞} such that S(p)° = ]a, b[ and S(p) = [a, b]. Moreover, the measure µ is not a point mass.

Here not allowing point masses much simplifies the presentation. Stein's method for point masses is available in [60].

Let ℓ ∈ {−1, 0, 1}. In the sequel we shall restrict our attention to the following three derivative-type operators:

∆_ℓ f(x) = f′(x) if ℓ = 0,  and  ∆_ℓ f(x) = (1/ℓ)(f(x + ℓ) − f(x)) if ℓ ∈ {−1, +1},

with f′(x) the weak derivative defined Lebesgue-almost everywhere, ∆_{+1} (≡ ∆_+) the classical forward difference and ∆_{−1} (≡ ∆_−) the classical backward difference. Whenever ℓ = 0 we take µ to be the Lebesgue measure and speak of the continuous case; whenever ℓ ∈ {−1, 1} we take µ to be the counting measure and speak of the discrete case. There are two choices of derivative in the discrete case, and only one in the continuous case. We let dom(∆_ℓ) denote the collection of functions f : IR → IR such that ∆_ℓ f(x) exists and is finite µ-almost surely. In the case ℓ = 0, this corresponds to all absolutely continuous functions; in the case ℓ = ±1 the domain is the collection of all functions on Z. For ease of reference we note that, if f ∈ dom(∆_ℓ) is such that ∆_ℓ f I[a, b] ∈ L¹(µ) then, for all c, d such that a ≤ c ≤ d ≤ b, we have

∫_c^d ∆_ℓ f(x)µ(dx) = ∫_c^d f′(x)dx = f(d) − f(c) if ℓ = 0;
∫_c^d ∆_ℓ f(x)µ(dx) = ∑_{j=c}^d ∆_− f(j) = f(d) − f(c − 1) if ℓ = −1;
∫_c^d ∆_ℓ f(x)µ(dx) = ∑_{j=c}^d ∆_+ f(j) = f(d + 1) − f(c) if ℓ = +1;

which we summarize as

∫_c^d ∆_ℓ f(x)µ(dx) = f(d + a_ℓ) − f(c − b_ℓ)    (2.1)

where

a_ℓ = I[ℓ = 1] and b_ℓ = I[ℓ = −1].    (2.2)

We stress the fact that the values at c, d are understood as limits if either is infinite.
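The discrete cases of (2.1) are easy to confirm directly; the following minimal Python sketch (the test function and window are arbitrary illustrations) telescopes the forward and backward differences:

```python
# Sketch: check the summary identity (2.1) in the discrete cases l = -1, +1,
# using an arbitrary test function on a finite window [c, d].
f = lambda x: x**2 - 3 * x

def delta(f, x, l):
    """Discrete derivative Delta_l f(x) = (f(x + l) - f(x)) / l."""
    return (f(x + l) - f(x)) / l

c, d = 2, 7
for l, (a_l, b_l) in [(-1, (0, 1)), (1, (1, 0))]:   # a_l = I[l=1], b_l = I[l=-1]
    total = sum(delta(f, j, l) for j in range(c, d + 1))
    assert total == f(d + a_l) - f(c - b_l)         # identity (2.1)
```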

2.1 Stein operators and Stein equations

Our first definitions come from [53]. We first define dom(p, ∆_ℓ) as the collection of f : IR → IR such that fp ∈ dom(∆_ℓ).

Definition 2.1 (Canonical Stein operators). Let f ∈ dom(p, ∆_ℓ) and consider the linear operator f ↦ T^ℓ_p f defined as

T^ℓ_p f(x) = ∆_ℓ(f(x)p(x)) / p(x)

for all x ∈ S(p), and T^ℓ_p f(x) = 0 for x ∉ S(p). The operator T^ℓ_p is called the canonical (ℓ-)Stein operator of p. The cases ℓ = 1 and ℓ = −1 provide the forward and backward Stein operators, denoted by T^+_p and T^−_p, respectively; the case ℓ = 0 provides the differential Stein operator, denoted by T_p.

To describe the domain and the range of T^ℓ_p we introduce the following sets of functions:

F^(0)(p) = { f ∈ L¹(p) : E[f(X)] = 0 };

F^(1)_ℓ(p) = { f ∈ dom(p, ∆_ℓ) : ∆_ℓ(fp) I[S(p)] ∈ L¹(µ) and ∫_{S(p)} ∆_ℓ(fp)(x)µ(dx) = E[T^ℓ_p f(X)] = 0 }.


We draw the reader's attention to the fact that the second condition in the definition of F^(1)_ℓ(p) can be rewritten as

f(b + a_ℓ)p(b + a_ℓ) = f(a − b_ℓ)p(a − b_ℓ).

The next lemma, which follows immediately from the definition of T^ℓ_p f and of the different sets of functions, shows why F^(1)_ℓ(p) is called the canonical Stein class.

Lemma 2.2 (Canonical Stein class). For f ∈ F^(1)_ℓ(p), T^ℓ_p f ∈ F^(0)(p).

Crucially for the results in this paper, for all f ∈ dom(∆_ℓ), g ∈ dom(∆_{−ℓ}) such that f(·)g(· − ℓ) ∈ dom(∆_ℓ), the operators ∆_ℓ satisfy the product rule

∆_ℓ( f(x) g(x − ℓ) ) = (∆_ℓ f(x)) g(x) + f(x) ∆_{−ℓ} g(x)    (2.3)

for all ℓ ∈ {−1, 0, 1}. This product rule leads to an integration by parts (IBP) formula (a.k.a. Abel-type summation formula) as follows.

Lemma 2.3 (Stein IBP formula – version 1). For all f ∈ dom(p, ∆_ℓ), g ∈ dom(∆_{−ℓ}) such that (i) f(·)g(· − ℓ) ∈ F^(1)_ℓ(p) and (ii) f(·)∆_{−ℓ}g(·) ∈ L¹(p), we have

E[(T^ℓ_p f(X)) g(X)] = −E[f(X) ∆_{−ℓ} g(X)].    (2.4)

Proof. Under the stated assumptions, we can apply (2.3) to get

T^ℓ_p( f(x) g(x − ℓ) ) = (T^ℓ_p f(x)) g(x) + f(x)(∆_{−ℓ} g(x))    (2.5)

for all x ∈ S(p). Condition (i) in the statement guarantees that the left hand side (l.h.s.) of (2.5) has mean 0, while condition (ii) guarantees that we can separate the expectation of the sum on the right hand side (r.h.s.) into the sum of the individual expectations.

A natural interpretation of (2.4) is that the operator T^ℓ_p is, in some sense to be made precise, the skew-adjoint operator to ∆_{−ℓ} with respect to the scalar product ⟨f, g⟩ = E[f(X)g(X)]; this provides a supplementary justification for the use of the terminology "canonical" for the operator T^ℓ_p. We discuss a consequence of this interpretation in Section 4. The conditions under which Lemma 2.3 holds are all but transparent. We clarify these assumptions in Section 2.3. For more details on Stein classes and operators, we refer to [53] for the construction in an abstract setting, [50] for the construction in the continuous setting (i.e. ℓ = 0) and [33] for the construction in the discrete setting (i.e. ℓ ∈ {−1, 1}). Multivariate extensions are developed in [62].

The fundamental stepping stone for our theory is an inverse of the canonical operator T^ℓ_p, provided in the next definition.

Definition 2.4 (Canonical pseudo-inverse Stein operator). Let ℓ ∈ {−1, 0, 1} and recall the notations a_ℓ, b_ℓ from (2.2). The canonical pseudo-inverse Stein operator L^ℓ_p for the operator T^ℓ_p is defined, for h ∈ L¹(p), as

L^ℓ_p h(x) = (1/p(x)) ∫_a^{x−a_ℓ} (h(u) − E[h(X)]) p(u) µ(du) = (1/p(x)) ∫_{x+b_ℓ}^b (E[h(X)] − h(u)) p(u) µ(du)    (2.6)

for all x ∈ S(p), and L^ℓ_p h(x) = 0 for all x ∉ S(p).

Equality between the second and third expressions in (2.6) is justified because h ∈ L¹(p), so that the integral of h(·) − E[h(X)] over the whole support cancels out. For ease of reference we detail L^ℓ_p in the three cases that interest us:

L⁰_p h(x) = (1/p(x)) ∫_a^x (h(u) − E[h(X)])p(u)du = (1/p(x)) ∫_x^b (E[h(X)] − h(u))p(u)du    (ℓ = 0)

L⁻_p h(x) = (1/p(x)) ∑_{j=a}^{x} (h(j) − E[h(X)])p(j) = (1/p(x)) ∑_{j=x+1}^{b} (E[h(X)] − h(j))p(j)    (ℓ = −1)

L⁺_p h(x) = (1/p(x)) ∑_{j=a}^{x−1} (h(j) − E[h(X)])p(j) = (1/p(x)) ∑_{j=x}^{b} (E[h(X)] − h(j))p(j)    (ℓ = 1).

Note that L⁻_p h(b) = 0 but L⁻_p h(a) = h(a) − E[h(X)] and, conversely, L⁺_p h(a) = 0 but L⁺_p h(b) = E[h(X)] − h(b). The denomination pseudo-inverse Stein operator for L^ℓ_p is justified by the following lemma, whose proof is immediate.

Lemma 2.5. For any h ∈ L¹(p), L^ℓ_p h ∈ F^(1)_ℓ(p). Moreover, (i) for all h ∈ L¹(p) we have T^ℓ_p L^ℓ_p h(x) = h(x) − E[h(X)] at all x ∈ S(p), and (ii) for all f ∈ F^(1)_ℓ(p) we have

L^ℓ_p T^ℓ_p f(x) = f(x) − f(a⁺ − b_ℓ)p(a⁺ − b_ℓ)/p(x) = f(x) − f(b⁻ + a_ℓ)p(b⁻ + a_ℓ)/p(x)

at all x ∈ S(p). The operator L^ℓ_p is invertible (with inverse T^ℓ_p) on the subclass of functions in F^(0)(p) ∩ F^(1)_ℓ(p) which, moreover, satisfy f(b⁻ + a_ℓ)p(b⁻ + a_ℓ) = f(a⁺ − b_ℓ)p(a⁺ − b_ℓ) = 0.
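Lemma 2.5 (i) is easy to confirm numerically on a finite support. The following is a minimal sketch (assuming numpy/scipy; the Bin(n, θ) target and the function h are arbitrary illustrative choices) which implements T^ℓ_p and L^ℓ_p on the support grid:

```python
# Sketch: T_p^l and L_p^l for X ~ Bin(n, theta) on S(p) = {0, ..., n},
# and a check of Lemma 2.5 (i): T_p^l L_p^l h = h - E[h(X)] on S(p).
import numpy as np
from scipy import stats

n, theta = 12, 0.3
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)

def T(f, l):
    """Canonical Stein operator (Definition 2.1): Delta_l(f p) / p."""
    fp = np.concatenate(([0.0], f * p, [0.0]))    # f p vanishes off S(p)
    if l == 1:
        return (fp[2:] - fp[1:-1]) / p            # forward difference
    return (fp[1:-1] - fp[:-2]) / p               # backward difference (l = -1)

def L(h, l):
    """Pseudo-inverse Stein operator (2.6), first expression."""
    cum = np.cumsum((h - np.sum(h * p)) * p)      # sum_{j=0}^{x} (h(j) - E[h]) p(j)
    if l == -1:
        return cum / p                            # sum up to x
    return np.concatenate(([0.0], cum[:-1])) / p  # sum up to x - 1 (l = +1)

h = np.sin(x)                                     # an arbitrary h in L^1(p)
for l in (-1, 1):
    assert np.allclose(T(L(h, l), l), h - np.sum(h * p))
```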

Starting from (2.5) we postulate the next definition.

Definition 2.6 (Standardizations of the canonical operator). Fix ℓ ∈ {−1, 0, 1} and η ∈ L¹(p). The η-standardized Stein operator is

A^{ℓ,η}_p g(x) = T^ℓ_p( L^ℓ_p η(·) g(· − ℓ) )(x) = (η(x) − E[η(X)]) g(x) + L^ℓ_p η(x) (∆_{−ℓ} g(x))    (2.7)

acting on the collection F(A^{ℓ,η}_p) of test functions g such that L^ℓ_p η(·)g(· − ℓ) ∈ F^(1)_ℓ(p) and (L^ℓ_p η)∆_{−ℓ}g ∈ L¹(p).

Remark 2.7. The conditions appearing in the definition of F(A^{ℓ,η}_p) are tailored to ensure that all identities and manipulations follow immediately. For instance, the requirement that L^ℓ_p η(·)g(· − ℓ) ∈ F^(1)_ℓ(p) in the definition of F(A^{ℓ,η}_p) guarantees that the resulting functions A^{ℓ,η}_p g(x) have p-mean 0, and the condition (L^ℓ_p η)∆_{−ℓ}g ∈ L¹(p) guarantees that the expectations of the individual summands on the r.h.s. of (2.7) exist. Again, our assumptions are not transparent; we discuss them in detail in Section 2.3.

The final ingredient for Stein differentiation is the Stein equation:

Definition 2.8 (Stein equation). Fix ℓ ∈ {−1, 0, 1} and η ∈ L¹(p). For h ∈ L¹(p), the A^{ℓ,η}_p-Stein equation for h is the functional equation A^{ℓ,η}_p g(x) = h(x) − E[h(X)], x ∈ S(p), i.e.

(η(x) − E[η(X)]) g(x) + L^ℓ_p η(x)(∆_{−ℓ} g(x)) = h(x) − E[h(X)],  x ∈ S(p).    (2.8)

A solution to the Stein equation is any function g ∈ F(A^{ℓ,η}_p) which satisfies (2.8) for all x ∈ S(p).

Our notations lead immediately to the next result.

Lemma 2.9 (Solution to the Stein equation). Fix η ∈ L¹(p). The Stein equation (2.8) for h ∈ L¹(p) is solved by

g^{p,ℓ,η}_h(x) = L^ℓ_p h(x + ℓ) / L^ℓ_p η(x + ℓ)    (2.9)

with the convention that g^{p,ℓ,η}_h(x) = 0 for all x + ℓ outside of S(p).


Proof. With g = g^{p,ℓ,η}_h,

A^{ℓ,η}_p g(x) = T^ℓ_p( L^ℓ_p η(·) g(· − ℓ) )(x) = T^ℓ_p( L^ℓ_p η(·) · L^ℓ_p h(·)/L^ℓ_p η(·) )(x) = T^ℓ_p( L^ℓ_p h(·) )(x) = h(x) − E[h(X)]

using Lemma 2.5 for the last step. Hence (2.8) is satisfied for all x ∈ S(p). Since, by construction, g ∈ F(A^{ℓ,η}_p), the claim follows.

When the context is clear we drop the superscripts and the subscript in g of (2.9). Before proceeding we provide two examples. The notation Id refers to the identity function x ↦ Id(x) = x.

Example 2.10 (Binomial distribution). Let p(x) = (n choose x) θ^x (1 − θ)^{n−x} be the binomial density with parameters (n, θ) and S(p) = [0, n] ∩ IN; assume that 0 < θ < 1. Stein's method for the binomial distribution was first developed in [32] using ∆_−; see also [68, 43].

Picking ℓ = 1, the class F^(1)_+(p) consists of functions f : Z → IR which are bounded on S(p) and such that f(0) = 0. Fixing η(x) = x − nθ gives L⁺_{bin(n,θ)}η(x) = −(1 − θ)x, leading to

A^{+,Id}_{bin(n,θ)} g(x) = (x − nθ)g(x) − (1 − θ)x ∆_− g(x)    (2.10)

with corresponding class F(A^{+,Id}_{bin(n,θ)}) which contains all functions g : Z → IR. The solution to the A^{+,Id}_{bin(n,θ)}-Stein equation (see (2.8)) is

g⁺(x) = −(1/((1 − θ)(x + 1)p(x + 1))) ∑_{j=0}^{x} (h(j) − E[h(X)])p(j)  for all 0 ≤ x ≤ n − 1

and g⁺(n) = 0.

Picking ℓ = −1, the class F^(1)_−(p) consists of functions f : Z → IR which are bounded on S(p) and such that f(n) = 0. Again fixing η(x) = x − nθ gives L⁻_{bin(n,θ)}η(x) = −θ(n − x), leading to

A^{−,Id}_{bin(n,θ)} g(x) = (x − nθ)g(x) − θ(n − x)∆_+ g(x)

acting on the same class as (2.10). The solution to the A^{−,Id}_{bin(n,θ)}-Stein equation is

g⁻(x) = −(1/(θ(n − (x − 1))p(x − 1))) ∑_{j=0}^{x−1} (h(j) − E[h(X)])p(j)  for all 1 ≤ x ≤ n

and g⁻(0) = 0. The function −g⁻ is studied in [32], where bounds on ‖∆_− g⁻‖ are provided (see equation (10) in that paper); see also Section 2.4 where bounds on ‖g⁻‖ are provided.
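The closed form for g⁺ can be verified directly against (2.10); here is a minimal numerical sketch (assuming numpy/scipy; parameter values and the test function h are arbitrary):

```python
# Sketch: g+ from Example 2.10 solves the forward Stein equation
# (x - n*theta) g(x) - (1 - theta) x Delta_- g(x) = h(x) - E[h(X)] on {0,...,n}.
import numpy as np
from scipy import stats

n, theta = 10, 0.4
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)
h = np.log1p(x)                            # an arbitrary test function
Eh = np.sum(h * p)

g = np.zeros(n + 1)                        # convention g+(n) = 0
cum = np.cumsum((h - Eh) * p)              # sum_{j <= x} (h(j) - Eh) p(j)
g[:n] = -cum[:n] / ((1 - theta) * (x[:n] + 1) * p[x[:n] + 1])

backward = g - np.concatenate(([0.0], g[:-1]))     # Delta_- g (x = 0 term killed by x)
lhs = (x - n * theta) * g - (1 - theta) * x * backward
assert np.allclose(lhs, h - Eh)
```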

Example 2.11 (Beta distribution). Let p(x) = x^{α−1}(1 − x)^{β−1}/B(α, β) be the beta density with parameters (α, β) and S(p) = (0, 1). Stein's method for the beta distribution was developed in [37, 31] using the Stein operator Af(x) = x(1 − x)f′(x) + (α(1 − x) − βx)f(x). In our notations, we have ℓ = 0 and F^(1)_0(p) consists of functions f : IR → IR such that f(0⁺)p(0⁺) = f(1⁻)p(1⁻) and |(fp)′| is Lebesgue integrable on [0, 1]. Fixing η(x) = x − α/(α + β) gives L_{beta(α,β)}η(x) = −x(1 − x)/(α + β), leading to the operator

A^{Id}_{Beta(α,β)} g(x) = ( x − α/(α + β) ) g(x) − ( x(1 − x)/(α + β) ) g′(x)

with domain F(A^{Id}_{Beta(α,β)}) the set of differentiable functions g : IR → IR such that x(1 − x)g(x) ∈ F^(1)_0(p) and x(1 − x)g′(x) ∈ L¹(p). The solution to the A^{Id}_{Beta(α,β)}-Stein equation is

g(x) = −((α + β)/(x(1 − x)p(x))) ∫_0^x (h(u) − E[h(X)])p(u)du,  x ∈ (0, 1).

The operator A^{Id}_{Beta(α,β)} f is, up to multiplication by α + β, the classical Stein operator Af for the beta density; see [37, 31] for details and bounds on solutions and their derivatives. See also Section 2.4 where bounds on ‖g‖ are provided.

In order to propose a more general example, we recall the concept of a Stein kernel, here extended to continuous and discrete distributions alike.

Definition 2.12 (The Stein kernel). Let X ∼ p have finite mean. The (ℓ-)Stein kernel of X (or of p) is the function

τ^ℓ_p(x) = −L^ℓ_p(Id)(x).

Metonymously, we refer to the random variable τ^ℓ_p(X) as the (ℓ-)Stein kernel of X.

Remark 2.13. The function τ^ℓ_p(·) is studied in detail for ℓ = 0 in [70, Lecture VI]. This function is particularly useful for Pearson (and discrete Pearson, a.k.a. Ord) distributions, which are characterized by the fact that their Stein kernel τ^ℓ_p is a second degree polynomial; see Example 3.8. For more on this topic, we also refer to the forthcoming [33] as well as [28, 36, 35], wherein important contributions to the theory of Stein kernels are provided in a multivariate setting.

The next example gives some (ℓ-)Stein kernels, exploiting the fact that if the mean of X is ν, then L^ℓ_p(Id)(x) = L^ℓ_p(Id − ν)(x).

Example 2.14. If X ∼ Bin(n, θ) then, using η(x) = x − nθ, Example 2.10 gives τ⁺_{bin(n,θ)}(x) = (1 − θ)x and τ⁻_{bin(n,θ)}(x) = θ(n − x). If X ∼ Beta(α, β) then Example 2.11 with η(x) = x − α/(α + β) gives τ⁰_{Beta(α,β)}(x) = x(1 − x)/(α + β).
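These closed forms are easily cross-checked against Definition 2.12; a minimal sketch for the binomial case (assuming numpy/scipy, arbitrary parameter values):

```python
# Sketch: numerical check of the binomial Stein kernels of Example 2.14,
# tau+(x) = (1 - theta) x and tau-(x) = theta (n - x).
import numpy as np
from scipy import stats

n, theta = 15, 0.6
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)

cum = np.cumsum((x - n * theta) * p)                # partial sums of (Id - nu) p
tau_minus = -cum / p                                # -L^-(Id): sum over j <= x
tau_plus = -np.concatenate(([0.0], cum[:-1])) / p   # -L^+(Id): sum over j <= x-1

assert np.allclose(tau_plus, (1 - theta) * x)
assert np.allclose(tau_minus, theta * (n - x))
```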

Example 2.15 (A general example). Let p satisfy Assumption A and suppose that it has finite mean ν. Fixing η(x) = x − ν, operator (2.7) becomes

A^{τ^ℓ_p}_p g(x) = (x − ν) g(x) − τ^ℓ_p(x) ∆_{−ℓ} g(x)

with corresponding class F(A^{τ^ℓ_p}_p) which contains all functions g : IR → IR such that τ^ℓ_p(·)g(· − ℓ) ∈ F^(1)_ℓ(p) and τ^ℓ_p ∆_{−ℓ}g ∈ L¹(p). Again, we stress that such conditions are clarified in Section 2.3. Using Lemma 2.9, the solution to the A^{τ^ℓ_p}_p-Stein equation is

g^{p,ℓ,Id}_h(x) = −L^ℓ_p h(x + ℓ) / τ^ℓ_p(x + ℓ).

Bounds on ‖g‖ are provided in Section 2.4. Stein's method based on A^{τ^ℓ_p}_p is already available in several important subcases, e.g. in [67, 47, 31] for continuous distributions.

The construction is tailored to ensure that all operators have mean 0 over the entire classes of functions on which they are defined. We immediately deduce the following family of Stein integration by parts formulas:


Lemma 2.16 (Stein IBP formula – version 2). Let X ∼ p. Then

E[ −{L^ℓ_p f(X)} ∆_{−ℓ} g(X) ] = E[ (f(X) − E[f(X)]) g(X) ]    (2.11)

for all f ∈ L¹(p) and all g ∈ dom(∆_{−ℓ}) such that L^ℓ_p f(·)g(· − ℓ) ∈ F^(1)_ℓ(p) and L^ℓ_p f(·)∆_{−ℓ}g(·) ∈ L¹(p).

Proof. Identity (2.11) follows directly from the Stein product rule in [53, Theorem 3.24], or by using the fact that expectations of the operators in (2.7) are equal to 0.

We stress the fact that in the formulation of Lemma 2.16 the test functions f and g do not play a symmetric role. If g ∈ L¹(p) then the right hand side of (2.11) is the covariance Cov(f(X), g(X)). We shall use this heavily in our future developments. Similarly as for Lemma 2.3, the conditions under which Lemma 2.16 applies are not transparent in their present form. In Section 2.3 various explicit sets of conditions are provided under which the IBP (2.11) is applicable.

2.2 Representations of the inverse Stein operator

This section contains the first main results of the paper, namely probabilistic representations for this operator. Such representations are extremely useful for manipulations of the operators. We start with a simple rewriting of L^ℓ_p. Given ℓ ∈ {−1, 0, 1}, recall the notation a_ℓ = I[ℓ = 1] and define

χ_ℓ(x, y) = I[x ≤ y − a_ℓ].    (2.12)

Such generalized indicator functions particularize, in the three cases that interest us, to χ₀(x, y) = I[x ≤ y] (ℓ = 0), χ_−(x, y) = I[x ≤ y] (ℓ = −1) and χ_+(x, y) = I[x < y] (ℓ = 1). Their properties lead to some form of "calculus" which shall be useful in the sequel.

Lemma 2.17 (Chi calculation rules). The function χ_ℓ(x, y) is non-increasing in x and non-decreasing in y. For all x, y we have

χ_ℓ(x, y) + χ_{−ℓ}(y, x) = 1 + I[ℓ = 0]I[x = y].    (2.13)

Moreover,

χ_ℓ(u, y)χ_ℓ(v, y) = χ_ℓ(max(u, v), y)  and  χ_ℓ(x, u)χ_ℓ(x, v) = χ_ℓ(x, min(u, v)).    (2.14)

Let p with support S(p) satisfy Assumption A. Then for any f ∈ L¹(p) it is easy to check from the definition (2.6) that

L^ℓ_p f(x) = (1/p(x)) E[ χ_ℓ(X, x)(f(X) − E[f(X)]) ]    (2.15)
          = (1/p(x)) E[ (χ_ℓ(X, x) − E[χ_ℓ(X, x)])(f(X) − E[f(X)]) ].

Next, define

Φ^ℓ_p(u, x, v) = χ_ℓ(u, x)χ_{−ℓ}(x, v) / p(x)    (2.16)

for all x ∈ S(p), and 0 elsewhere. This function is used in the following representation formula for the inverse Stein operator:

Lemma 2.18 (Representation formula I). Let X, X₁, X₂ be independent copies of X ∼ p with support S(p). Then, for all f ∈ L¹(p) we have

−L^ℓ_p f(x) = E[ (f(X₂) − f(X₁)) Φ^ℓ_p(X₁, x, X₂) ].    (2.17)


Proof. The L¹(p) condition on f suffices for the expectation on the r.h.s. of (2.17) to be finite for all x ∈ S(p). Suppose without loss of generality that E[f(X)] = 0. Using that X₁, X₂ are i.i.d., we reap

E[(f(X₂) − f(X₁))χ_ℓ(X₁, x)χ_{−ℓ}(x, X₂)]
= E[χ_ℓ(X₁, x)] E[f(X₂)χ_{−ℓ}(x, X₂)] − E[f(X₁)χ_ℓ(X₁, x)] E[χ_{−ℓ}(x, X₂)]
= E[χ_ℓ(X₁, x)] E[f(X₂)(1 − χ_ℓ(X₂, x))] − E[f(X₁)χ_ℓ(X₁, x)] E[χ_{−ℓ}(x, X₂)]
= −E[f(X)χ_ℓ(X, x)] ( E[χ_ℓ(X, x)] + E[χ_{−ℓ}(x, X)] ),

where in the third line we used the fact that E[f(X)I[ℓ = 0]I[X = x]] = 0 under the stated assumptions. For the same reasons, we have E[χ_ℓ(X, x) + χ_{−ℓ}(x, X)] = 1 for all x ∈ 𝒳 and all ℓ ∈ {−1, 0, 1}. The conclusion follows by recalling (2.15).

The function defined in (2.16) allows one to perform "probabilistic integration" as follows: if f ∈ dom(∆_{−ℓ}) is such that ∆_{−ℓ}f is integrable on [x₁, x₂] ∩ S(p) then

f(x₂) − f(x₁) = E[ Φ^ℓ_p(x₁, X, x₂) ∆_{−ℓ} f(X) ] = ∫_{x₁}^{x₂} f′(u)du (ℓ = 0); = ∑_{j=x₁}^{x₂−1} ∆_+ f(j) (ℓ = −1); = ∑_{j=x₁+1}^{x₂} ∆_− f(j) (ℓ = 1)    (2.18)

for all x₁ < x₂ ∈ S(p). If, furthermore, f ∈ L¹(p) then (by a conditioning argument)

E[(f(X₂) − f(X₁)) I[X₁ < X₂]] = E[ Φ^ℓ_p(X₁, X, X₂) ∆_{−ℓ} f(X) ].

Equation (2.18) leads to the next representation formula for the inverse Stein operator.

Lemma 2.19 (Representation formula II). Let X ∼ p. Define the kernel K^ℓ_p on S(p) × S(p) by

K^ℓ_p(x, x′) = E[χ_ℓ(X, x)χ_ℓ(X, x′)] − E[χ_ℓ(X, x)] E[χ_ℓ(X, x′)].    (2.19)

Then K^ℓ_p(x, x′) is symmetric and positive. Moreover, for all f ∈ dom(∆_{−ℓ}) such that f ∈ L¹(p), we have

−L^ℓ_p f(x) = E[ ( K^ℓ_p(X, x)/(p(X)p(x)) ) ∆_{−ℓ} f(X) ].    (2.20)

Proof. Symmetry of K^ℓ_p is immediate. To see that it is positive, applying first (2.14) and then (2.13),

K^ℓ_p(x, x′) = E[χ_ℓ(X, min(x, x′))] (1 − E[χ_ℓ(X, max(x, x′))]) = E[χ_ℓ(X, min(x, x′))] E[χ_{−ℓ}(max(x, x′), X)]    (2.21)

which is necessarily positive. To prove (2.20), we insert (2.18) into (2.17), to obtain

−L^ℓ_p f(x) = E[ ∆_{−ℓ}f(X′) Φ^ℓ_p(X₁, X′, X₂) Φ^ℓ_p(X₁, x, X₂) ] = E[ ∆_{−ℓ}f(X′) E[ Φ^ℓ_p(X₁, X′, X₂) Φ^ℓ_p(X₁, x, X₂) | X′ ] ].

For all x, x′ ∈ S(p), by (2.14),

E[ Φ^ℓ_p(X₁, x, X₂) Φ^ℓ_p(X₁, x′, X₂) ] = (1/(p(x)p(x′))) E[χ_ℓ(X₁, x)χ_ℓ(X₁, x′)] E[χ_{−ℓ}(x, X₂)χ_{−ℓ}(x′, X₂)]
= (1/(p(x)p(x′))) E[χ_ℓ(X, min(x, x′))] E[χ_{−ℓ}(max(x, x′), X)].

Using (2.21), we recognize the kernel K^ℓ_p(x, x′) in the numerator, and identity (2.20) follows.


Example 2.20. Representations (2.17) and (2.20) can easily be applied to obtain representations for the Stein kernel τ^ℓ_p(x):

τ^ℓ_p(x) = −L^ℓ_p(Id)(x) = E[ (X₂ − X₁) Φ^ℓ_p(X₁, x, X₂) ] = E[ K^ℓ_p(X, x)/(p(X)p(x)) ].

In particular the Stein kernel is positive on S(p).
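The kernel representation of τ^ℓ_p is also easy to confirm on a finite support. The following sketch (assuming numpy/scipy; the Bin(n, θ) target with ℓ = −1 is an arbitrary illustrative choice) builds K⁻_p from (2.21) and recovers the kernel of Example 2.14:

```python
# Sketch: Example 2.20 for X ~ Bin(n, theta), l = -1, where by (2.21)
# K(u, x) = F(min(u,x)) (1 - F(max(u,x))) with F the cdf, and
# tau(x) = E[K(X, x)/(p(X) p(x))] = sum_u K(u, x) / p(x).
import numpy as np
from scipy import stats

n, theta = 8, 0.35
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)
F = np.cumsum(p)                                  # P(X <= x) = E[chi_-(X, x)]

K = np.minimum.outer(F, F) * (1 - np.maximum.outer(F, F))
tau_minus = K.sum(axis=0) / p                     # the p(X) in (2.20) cancels

assert np.allclose(tau_minus, theta * (n - x))    # matches Example 2.14
```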

Identity (2.17) seems to be new, although it is present in non-explicit form in [22, Equation (4.16)]. Representation (2.20) is, in the continuous case ℓ = 0, already available in [64]. The kernel K^ℓ_p(x, x′) is a classical object in the theory of covariance representations and inequalities; an early appearance is attributed by [59] to [41] (see [42, pp. 57–109] for an English translation). The perhaps not very surprising extension to the discrete case is, to the best of our knowledge, new.

As a first result from our set-up, (2.20) applied to the function f(x) = T^ℓ_p 1(x) immediately gives the following:

Proposition 2.21 (Menz-Otto formula). Suppose that the constant function 1 belongs to F^(1)_ℓ(p), that −∆_{−ℓ}T^ℓ_p 1(x) > 0 for almost all x ∈ S(p), and that ∆_{−ℓ}(T^ℓ_p 1) ∈ L¹(µ). Then, for every x′ ∈ S(p), the function

p*_{x′}(x) = ( K^ℓ_p(x, x′)/p(x′) ) ( −∆_{−ℓ}T^ℓ_p 1(x) )    (2.22)

is a density on S(p) with respect to µ.

Proof. From (2.20) with f(x) = T^ℓ_p 1(x) = ∆_ℓ(p(x))/p(x), for which L^ℓ_p f(x) = 1,

1 = −E[ ( K^ℓ_p(X, x′)/(p(X)p(x′)) ) ∆_{−ℓ}T^ℓ_p 1(X) ] = ∫_a^b ( K^ℓ_p(u, x′)/(p(u)p(x′)) ) ( −∆_{−ℓ}T^ℓ_p 1(u) ) p(u) µ(du)
= ∫_a^b ( K^ℓ_p(u, x′)/p(x′) ) ( −∆_{−ℓ}T^ℓ_p 1(u) ) µ(du) = ∫_a^b p*_{x′}(u) µ(du).

Since 0 ≤ K^ℓ_p(x, x′) ≤ 1 and ∆_{−ℓ}(T^ℓ_p 1) ∈ L¹(µ), the integral exists and, by assumption, −∆_{−ℓ}T^ℓ_p 1(x) > 0, so that p*_{x′} is nonnegative. Hence the assertion follows.
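For the standard normal (ℓ = 0, T⁰_p 1(x) = −x, so −∆_{−ℓ}T⁰_p 1 ≡ 1) the proposition asserts that x ↦ K⁰_p(x, x′)/p(x′) is itself a density; a quick quadrature sketch (assuming numpy/scipy, with arbitrary values of x′):

```python
# Sketch: Proposition 2.21 for p the standard normal, where
# p*_{x'}(x) = K(x, x')/p(x') with K(x, x') = Phi(min) (1 - Phi(max)).
import numpy as np
from scipy import integrate, stats

def p_star(x, xp):
    K = stats.norm.cdf(np.minimum(x, xp)) * stats.norm.sf(np.maximum(x, xp))
    return K / stats.norm.pdf(xp)

for xp in (-2.0, 0.0, 1.0):
    total, _ = integrate.quad(p_star, -np.inf, np.inf, args=(xp,))
    print(xp, total)              # close to 1.0 in each case
```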

Remark 2.22. If K^ℓ_p(x, x)/p(x) is bounded, then the assumptions in Proposition 2.21 are satisfied as soon as −∆_{−ℓ}T^ℓ_p 1 ∈ L¹(µ). The proposition thus applies when ℓ = 0 and p(x) = e^{−H(x)} with H a strictly convex function such that lim_{x→±∞} H′(x)p(x) = 0. This puts us in the context studied by [55], and formula (2.22) is equivalent to their [55, Equation (14)]; we return to this in Section 3.

The next proposition gives some properties of K^ℓ_p(x, x′).

Proposition 2.23. (i) For all x, x′ ∈ S(p), it holds that K^ℓ_p(x, x′) ≤ K^ℓ_p(min(x, x′), min(x, x′)). (ii) If E[χ_ℓ(X, x)]/p(x) is non-decreasing, then the function x ↦ K^ℓ_p(x, x′)/p(x) is non-decreasing for x < x′. (iii) If E[χ_{−ℓ}(x, X)]/p(x) is non-increasing, then the function x ↦ K^ℓ_p(x, x′)/p(x) is non-increasing for x > x′.

Proof. To see (i), we start from (2.21), K^ℓ_p(x, x′) = E[χ_ℓ(X, min(x, x′))] E[χ_{−ℓ}(max(x, x′), X)], and note that, by Lemma 2.17, χ_{−ℓ}(u, x) is non-increasing in u, so that χ_{−ℓ}(max(x, x′), X) ≤ χ_{−ℓ}(min(x, x′), X). We deduce that

K^ℓ_p(x, x′) ≤ E[χ_ℓ(X, min(x, x′))] E[χ_{−ℓ}(min(x, x′), X)].

Assertion (i) follows by reverting the argument, because (χ_ℓ(X, min(x, x′)))² = χ_ℓ(X, min(x, x′)). To see (ii), assume that E[χ_ℓ(X, x)]/p(x) is non-decreasing. Then, with (2.21), for x < x′,

(1/p(x)) K^ℓ_p(x, x′) = (1/p(x)) E[χ_ℓ(X, min(x, x′))] E[χ_{−ℓ}(max(x, x′), X)] = ( E[χ_ℓ(X, x)]/p(x) ) E[χ_{−ℓ}(x′, X)];

the second factor is a constant (as a function of x), and the first factor is assumed to be non-decreasing. Hence the assertion follows. For (iii), assume that E[χ_{−ℓ}(x, X)]/p(x) is non-increasing; then, similarly as above, for x > x′,

(1/p(x)) K^ℓ_p(x, x′) = E[χ_ℓ(X, x′)] ( E[χ_{−ℓ}(x, X)]/p(x) );

the first factor is constant, and the second factor is non-increasing. Hence the assertion follows.

Figures 1 and 2 display the functions x ↦ K^ℓ_p(x, x′)/p(x) (for various values of x′) and x ↦ K^ℓ_p(x, x)/p(x) for the standard normal and several choices of the parameters in the beta, gamma, binomial, Poisson and hypergeometric distributions.

Example 2.24. The following facts are easy to prove (a numerical sketch follows the list):

1. If p is the standard normal density then K⁰_p(x, x)/p(x) behaves as 1/|x| for large |x|, see Figure 2a.

2. If p is a gamma density then K⁰_p(x, x)/p(x) behaves as a constant for large x, see Figure 2c.

3. The function x ↦ K⁰_p(x, x)/p(x) is not in L¹(p) for p a Cauchy distribution.

4. If p is strictly log-concave then K^ℓ_p(x, x)/p(x) is bounded.
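A minimal sketch of items 1 and 2 (assuming numpy/scipy; in the case ℓ = 0 one has K⁰_p(x, x)/p(x) = F(x)(1 − F(x))/p(x) with F the cdf, and the gamma parameters below are arbitrary):

```python
# Sketch: asymptotics of K(x,x)/p(x) = F(x)(1 - F(x))/p(x) in the l = 0 case.
import numpy as np
from scipy import stats

def ratio(dist, xs):
    return dist.cdf(xs) * dist.sf(xs) / dist.pdf(xs)

xs = np.array([2.0, 4.0, 6.0, 8.0])
print(xs * ratio(stats.norm(), xs))                  # -> 1: ratio ~ 1/|x| (item 1)
print(ratio(stats.gamma(a=1.3, scale=2.4), 5 * xs))  # -> roughly constant (item 2)
```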

2.3 Sufficient conditions and integrability

As anticipated, we now study the conditions under which the IBP Lemmas 2.3 and 2.16 hold. All proofs are technical manipulations of basic calculus and are relegated to Appendix A.

We start with the decryption of the conditions for Lemma 2.3. Recall the notations a_ℓ and b_ℓ from (2.2). Furthermore, if ℓ = 0 we write f(a⁺) = lim_{x→a, x>a} f(x) and f(b⁻) = lim_{x→b, x<b} f(x). In the case that a = −∞ or b = ∞, for ℓ ∈ {−1, 0, 1}, we write f(−∞⁺) = lim_{x→−∞} f(x) and f(∞⁻) = lim_{x→∞} f(x). To simplify notation, if ℓ ∈ {−1, 1} and a ≠ −∞, we write f(a⁺) = f(a), and similarly, if b ≠ ∞, f(b⁻) = f(b).

Proposition 2.25 (Sufficient conditions for IBP – version 1). Let f ∈ dom(p, ∆_ℓ) and g ∈ dom(∆_{−ℓ}). In order for (2.4) to hold it suffices that they jointly satisfy the following conditions:

(T^ℓ_p f)g and f(∆_{−ℓ}g) ∈ L¹(p),    (2.23)

f(b⁻ + a_ℓ)g(b⁻ + a_ℓ − ℓ)p(b⁻ + a_ℓ) = f(a⁺ − b_ℓ)g(a⁺ − b_ℓ − ℓ)p(a⁺ − b_ℓ).    (2.24)

For ease of future reference, we spell out (2.24) in the three cases that interest us:

f(b⁻)g(b⁻)p(b⁻) = f(a⁺)g(a⁺)p(a⁺) if ℓ = 0;
f(b⁻)g(b⁻ + 1)p(b⁻) = 0 if ℓ = −1;
f(a⁺)g(a⁺ − 1)p(a⁺) = 0 if ℓ = 1.

We now derive a set of (almost) necessary and sufficient conditions under which (2.11) holds.


[Figure 1 (six panels, not reproduced here). Figure 1: The functions x ↦ K^ℓ_p(x, x′)/p(x) for different (fixed) values of x′ and p the standard normal distribution (Figure 1a); the beta distribution with parameters 1.3 and 2.4 (Figure 1b); the gamma distribution with parameters 1.3 and 2.4 (Figure 1c); the binomial distribution with parameters (50, 0.2) (Figure 1d); the Poisson distribution with parameter 20 (Figure 1e); the hypergeometric distribution with parameters 100, 50 and 500 (Figure 1f).]


[Figure 2 (six panels, not reproduced here). Figure 2: The functions x ↦ K^ℓ_p(x, x)/p(x) for different parameter values and p the standard normal distribution (Figure 2a); beta distribution (Figure 2b); gamma distribution (Figure 2c); binomial distribution (Figure 2d); Poisson distribution (Figure 2e); hypergeometric distribution (Figure 2f).]


Proposition 2.26 (Sufficient conditions for IBP – version 2). Let f ∈ L¹(p) and g ∈ dom(∆_{−ℓ}). In order for (2.11) to hold, it is necessary and sufficient that they jointly satisfy the three following conditions:

f, g and fg ∈ L¹(p),    (2.25)

L^ℓ_p f (∆_{−ℓ}g) ∈ L¹(p),    (2.26)

( L^ℓ_p f(b⁻ + a_ℓ) ) g(b⁻ + a_ℓ − ℓ) p(b⁻ + a_ℓ) = ( L^ℓ_p f(a⁺ − b_ℓ) ) g(a⁺ − b_ℓ − ℓ) p(a⁺ − b_ℓ).    (2.27)

Requirement (2.25) is natural, and condition (2.27) is mild as it is satisfied as soon as g and/or f are well behaved at the edges of the support. Condition (2.26) (which is already stated in the original statement of Lemma 2.16) is harder to fathom. In order to make it more readable, and to facilitate the connection with the literature, we specialise the conditions further in our next result.

Proposition 2.27. Let f, g and fg ∈ L¹(p). If g ∈ dom(∆_{−ℓ}) is of bounded variation and satisfies the following two conditions:

1. g(a⁺ − b_ℓ − ℓ) P(X ≤ a⁺ − a_ℓ − b_ℓ) = 0 and g(b⁻ + a_ℓ − ℓ) P(X ≥ b⁻ + a_ℓ + b_ℓ) = 0;

2. g(a⁺ − b_ℓ − ℓ) E[|f(X)| χ_ℓ(X, a⁺ − b_ℓ)] = 0 and g(b⁻ + a_ℓ − ℓ) E[|f(X)| χ_{−ℓ}(b⁻ + a_ℓ, X)] = 0;

then (2.27) holds. In particular, if f is bounded or in L²(p), then condition 2 above is implied by condition 1.

Remark 2.28. This assumption is closer to what is to be found in the literature, see e.g. [64] in the case ℓ = 0. The main difference between the classical assumptions and ours is that we only impose conditions on one of the functions. We stress that there is a certain degree of redundancy in items 1 and 2 together with the assumption that g ∈ L¹(p) and is of bounded variation; the statement could be shortened at the loss of readability.

In the sequel, to preserve as much generality as possible and not overburden the statements, we will simply require that "the assumptions of Lemma 2.16 are satisfied."

2.4 The inverse Stein operator

We conclude this section by exploring easy consequences of the representations from Section 2.2. These results are also of independent interest to practitioners of Stein's method.

Lemma 2.29. If f, L^ℓ_p f(X) ∈ L¹(p) then

E[−L^ℓ_p f(X)] = E[ (X₂ − X₁)₊ (f(X₂) − f(X₁)) ] = (1/2) E[ (X₂ − X₁)(f(X₂) − f(X₁)) ]    (2.28)

where (·)₊ denotes the positive part of (·). In particular, if the conditions of Lemma 2.16 are satisfied with f = g = Id, then E[τ^ℓ_p(X)] = Var(X).

Proof. Representation (2.17) gives E[−L^ℓ_p f(X)] = E[(f(X₂) − f(X₁)) Φ^ℓ_p(X₁, X, X₂)]. Using (2.18) with f(x) = x, we have E[Φ^ℓ_p(x₁, X, x₂)] = (x₂ − x₁)₊. Hence, after conditioning with respect to X₁, X₂, the first equality in (2.28) follows. The second equality follows by symmetry. The second claim is immediate under the stated assumptions.

Remark 2.30. Once again, our assumptions are minimal but not transparent. It is easy to spell out these conditions explicitly for any specific target. For instance, if X has bounded support or support IR then finite variance suffices.

Proposition 2.31. Suppose that all test functions satisfy the conditions in Lemma 2.16. Let ‖f‖_{S(p),∞} = sup_{x∈S(p)} |f(x)|.


1. If f is monotone then L^ℓ_p f(x) does not change sign.

2. (Uniform Stein bounds) Consider, for h and η in L¹(p), the function

g^{p,ℓ,η}_h(x) = L^ℓ_p h(x + ℓ) / L^ℓ_p η(x + ℓ)

defined in (2.9), which solves the η-Stein equation (2.8) for h. If η is monotone and |h(x) − h(y)| ≤ k|η(x) − η(y)| for all x, y ∈ S(p), then

‖g^{p,ℓ,η}_h‖_{S(p),∞} ≤ k.

In particular, if h ∈ L¹(p) is Lipschitz continuous with Lipschitz constant 1 then the above applies with η(x) = x, and ‖g^{p,0,Id}_h‖_{S(p),∞} ≤ 1.

3. (Non-uniform Stein bounds)

|L^ℓ_p f(x)| ≤ 2‖f‖_{S(p),∞} E[χ_ℓ(X₁, x)] E[χ_{−ℓ}(x, X₂)] / p(x)    (2.29)

for all x ∈ S(p).

Proof. Recall representation (2.17), which states that

−L^ℓ_p f(x) = (1/p(x)) E[ (f(X₂) − f(X₁)) χ_ℓ(X₁, x) χ_{−ℓ}(x, X₂) ].

1. If f is monotone then f(X₂) − f(X₁) is of constant sign conditionally on χ_ℓ(X₁, x)χ_{−ℓ}(x, X₂) = I[X₁ + a_ℓ ≤ x ≤ X₂ − b_ℓ] = 1, because on this event X₁ ≤ X₂ − I[ℓ ≠ 0]. Hence the first assertion follows.

2. Suppose that the function η is strictly decreasing (the increasing case is similar). By definition of g we have, under the stated conditions,

|g(x)| = | −E[(h(X₂) − h(X₁)) χ_ℓ(X₁, x+ℓ) χ_{−ℓ}(x+ℓ, X₂)] | / E[(η(X₁) − η(X₂)) χ_ℓ(X₁, x+ℓ) χ_{−ℓ}(x+ℓ, X₂)]
≤ E[|h(X₂) − h(X₁)| χ_ℓ(X₁, x+ℓ) χ_{−ℓ}(x+ℓ, X₂)] / E[|η(X₂) − η(X₁)| χ_ℓ(X₁, x+ℓ) χ_{−ℓ}(x+ℓ, X₂)] ≤ k

for x ∈ S(p).

3. By (2.17),

|L^ℓ_p f(x)| p(x) = | E[(f(X₂) − f(X₁)) χ_ℓ(X₁, x) χ_{−ℓ}(x, X₂)] | ≤ 2‖f‖_{S(p),∞} E[χ_ℓ(X₁, x) χ_{−ℓ}(x, X₂)] = 2‖f‖_{S(p),∞} E[χ_ℓ(X₁, x)] E[χ_{−ℓ}(x, X₂)],

where the last step uses the independence of X₁ and X₂; this leads to the conclusion.


Example 2.32. If p = φ is the standard Gaussian density with cdf Φ, then ℓ = 0 and the third bound in Proposition 2.31 reduces to 2‖f‖_∞ Φ(x)(1 − Φ(x))/φ(x). The ratio Φ(x)(1 − Φ(x))/φ(x) is closely related to Mills' ratio of the standard normal law. The study of such a function is classical and much is known. For instance, we can apply [9, Theorem 2.3] to get

1/(√(x² + 4) + x) ≤ Φ(x)(1 − Φ(x))/φ(x) ≤ 4/(√(x² + 8) + 3x)    (2.30)

for all x ≥ 0. Moreover, Φ(x)(1 − Φ(x))/φ(x) ≤ Φ(0)(1 − Φ(0))/φ(0) = (1/2)√(π/2) ≈ 0.626. In particular, Proposition 2.31 recovers the well-known bound ‖L⁰_p f‖_∞ ≤ √(π/2) ‖f‖_∞, see e.g. [57, Theorem 3.3.1].
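The two-sided bound (2.30) and the maximum at x = 0 are easily checked on a grid; a minimal sketch (assuming numpy/scipy):

```python
# Sketch: numerical check of (2.30) and of the maximum of Phi(x)(1-Phi(x))/phi(x).
import numpy as np
from scipy import stats

x = np.linspace(0.0, 10.0, 201)
r = stats.norm.cdf(x) * stats.norm.sf(x) / stats.norm.pdf(x)

lower = 1.0 / (np.sqrt(x**2 + 4) + x)
upper = 4.0 / (np.sqrt(x**2 + 8) + 3 * x)

assert np.all(lower <= r) and np.all(r <= upper)    # the bound (2.30)
assert abs(r[0] - np.sqrt(np.pi / 2) / 2) < 1e-12   # value 0.626... at x = 0
```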

3 Covariance identities and inequalities

We start with an easy lower bound inequality, which follows immediately from Lemma 2.3.

Proposition 3.1 (Cramér-Rao type bound). Let g ∈ L²(p). For any f ∈ F^(1)_ℓ(p) such that T^ℓ_p f ∈ L²(p) and such that the assumptions of Lemma 2.3 are satisfied, we have

Var[g(X)] ≥ E[f(X)(∆_{−ℓ}g(X))]² / E[(T^ℓ_p f(X))²]    (3.1)

with equality if and only if there exist real numbers α, β such that g(x) = α T^ℓ_p f(x) + β for all x ∈ S(p).

Proof. The lower bound (3.1) follows from the fact that T^ℓ_p f(X) ∈ F^(0)(p) for all f ∈ F^(1)_ℓ(p), by Lemma 2.2. Therefore, from (2.4), we have

{E[f(X)(∆_{−ℓ}g(X))]}² = {E[(T^ℓ_p f(X)) g(X)]}² = {E[(T^ℓ_p f(X))(g(X) − E[g(X)])]}² ≤ E[(T^ℓ_p f(X))²] Var[g(X)]

by the Cauchy-Schwarz inequality.

Upper bounds require some more work. We start with an easy consequence of our framework.

Corollary 3.2 (First order covariance identities). Let X′ denote an independent copy of X. For all f, g that jointly satisfy the assumptions of Lemma 2.16, we have

Cov[f(X), g(X)] = E[ ∆_{−ℓ}f(X) ( K^ℓ_p(X, X′)/(p(X)p(X′)) ) ∆_{−ℓ}g(X′) ].    (3.2)

Moreover, if the choice f = Id is allowed, then

Cov[X, g(X)] = E[ τ^ℓ_p(X) ∆_{−ℓ}g(X) ].    (3.3)

Remark 3.3. Identity (3.2) is provided in [55] (see their equation (11)) in the case ℓ = 0 for a log-concave density. Some of the history of this identity, including the connection with a classical identity of Hoeffding [41], is provided in [66, Section 2]. The earliest version of the same identity (still for ℓ = 0) we have found is in [29], along with applications to measures of correlation as well as further references. A similar identity is provided in [55], without explicit conditions; a clear statement is given in [66, Corollary 2.2], where the identity is proved for absolutely continuous f ∈ L^r and g ∈ L^s with conjugate exponents. Our approach shows that it suffices to impose regularity on one of the functions for the identity to hold.


Proof. Let f̄(x) = f(x) − E[f(X)]. Note that ∆_ℓ f̄ = ∆_ℓ f. To obtain (3.2) we start from (2.11) and note that if f, g satisfy the assumptions of Lemma 2.16, then

Cov[f(X), g(X)] = E[ −{L^ℓ_p f̄(X)} ∆_{−ℓ}g(X) ].    (3.4)

From this equation, (3.3) follows immediately. Applying (2.20) we obtain

Cov[f(X), g(X)] = E[ E[ ( K^ℓ_p(X′, X)/(p(X′)p(X)) ) ∆_{−ℓ}f(X′) | X ] ∆_{−ℓ}g(X) ],

which gives the claim after removing the conditioning.

Example 3.4. Example 2.14 and identity (3.3) give the following covariance identities.

• Binomial distribution: For all functions g : Z → IR that are bounded on [0, n],

Cov[X, g(X)] = E[ (1 − θ)X ∆_− g(X) ] = θ E[ (n − X) ∆_+ g(X) ].

Combining the two identities we also arrive at

Cov[X, g(X)] = Var[X] E[∇_{bin(n,θ)} g(X)]

with ∇_{bin(n,θ)} the "natural" binomial gradient ∇_{bin(n,θ)} g(x) = (x/n) ∆_− g(x) + ((n − x)/n) ∆_+ g(x) from [40].

• Beta distribution: For all absolutely continuous g such that E[|X(1 − X)g′(X)|] < ∞,

Cov[X, g(X)] = (1/(α + β)) E[ X(1 − X) g′(X) ].
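Both binomial identities can be checked exactly on the support; a minimal sketch (assuming numpy/scipy; parameters and the test function are arbitrary):

```python
# Sketch: exact check of the binomial covariance identities of Example 3.4.
import numpy as np
from scipy import stats

n, theta = 9, 0.55
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)
g = np.sqrt(1.0 + x)                            # an arbitrary (bounded) g on [0, n]

cov = np.sum(x * g * p) - np.sum(x * p) * np.sum(g * p)
dminus = g - np.concatenate(([0.0], g[:-1]))    # Delta_- g (x = 0 term killed by x)
dplus = np.concatenate((g[1:], [0.0])) - g      # Delta_+ g (x = n term killed by n-x)

assert np.allclose(np.sum((1 - theta) * x * dminus * p), cov)
assert np.allclose(theta * np.sum((n - x) * dplus * p), cov)
```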

It is of interest to work as in [45] to obtain a corresponding upper bound, which would provide a "weighted Poincaré inequality" such as those described in [64]. The representation formula (3.2) turns out to simplify the work considerably.

Theorem 3.5. Fix a decreasing function h ∈ L¹(p). For all f, g which satisfy the assumptions of Lemma 2.16 we have

|Cov[f(X), g(X)]| ≤ √( E[ (∆_{−ℓ}f(X))² (−L^ℓ_p h(X))/(∆_{−ℓ}h(X)) ] ) √( E[ (∆_{−ℓ}g(X))² (−L^ℓ_p h(X))/(∆_{−ℓ}h(X)) ] )    (3.5)

with equality if and only if there exist real numbers α_i, i = 1, ..., 4, such that f(x) = α₁h(x) + α₂ and g(x) = α₃h(x) + α₄ for all x ∈ S(p).

Proof. We simply apply (3.2) and the Cauchy-Schwarz inequality. Splitting the integrand of (3.2) into two factors, we obtain

|Cov[f(X), g(X)]| = | E[ ∆_{−ℓ}f(X) ( K^ℓ_p(X, X′)/(p(X)p(X′)) ) ∆_{−ℓ}g(X′) ] |
= | E[ ( ∆_{−ℓ}f(X)/√(−∆_{−ℓ}h(X)) ) √( −K^ℓ_p(X, X′)∆_{−ℓ}h(X′)/(p(X)p(X′)) ) · ( ∆_{−ℓ}g(X′)/√(−∆_{−ℓ}h(X′)) ) √( −K^ℓ_p(X, X′)∆_{−ℓ}h(X)/(p(X)p(X′)) ) ] |
≤ √( E[ ( (∆_{−ℓ}f(X))²/∆_{−ℓ}h(X) ) ( K^ℓ_p(X, X′)∆_{−ℓ}h(X′)/(p(X)p(X′)) ) ] ) √( E[ ( (∆_{−ℓ}g(X′))²/∆_{−ℓ}h(X′) ) ( K^ℓ_p(X, X′)∆_{−ℓ}h(X)/(p(X)p(X′)) ) ] );

using (2.20) (conditionally on X, resp. X′) leads to the inequality.

The only part of the claim that remains to be proved concerns the saturation condition in the inequality. This follows from the Cauchy-Schwarz inequality, which is an equality if and only if ∆_{−ℓ}f(x)/∆_{−ℓ}h(x) ∝ ∆_{−ℓ}g(x′)/∆_{−ℓ}h(x′) is constant throughout S(p). This is only possible under the stated condition.


Remark 3.6. Theorem 3.5 can be refined using the exact expression for the remainder in the Cauchy-Schwarz inequality, given by the Lagrange-type identity

(E[f(X₁, X₂)g(X₁, X₂)])² = E[f²(X₁, X₂)] E[g²(X₁, X₂)] − (1/2) E[ (f(X₁, X₂)g(X₃, X₄) − f(X₃, X₄)g(X₁, X₂))² ]

with X₁, X₂, X₃, X₄ independent copies with density p and f, g ∈ L²(p). Fix h ∈ L¹(p) a decreasing function such that ‖h‖_{S(p),∞} < ∞. For all f, g which satisfy the assumptions of Lemma 2.16 we have

(Cov[f(X), g(X)])² = E[ (∆_{−ℓ}f(X))² (−L^ℓ_p h(X))/(∆_{−ℓ}h(X)) ] E[ (∆_{−ℓ}g(X))² (−L^ℓ_p h(X))/(∆_{−ℓ}h(X)) ] − (1/2) R(f, g, h)

with

R(f, g, h) = E[ ( ∆_{−ℓ}f(X₁)∆_{−ℓ}g(X₄) − ∆_{−ℓ}f(X₃)∆_{−ℓ}g(X₂) (∆_{−ℓ}h(X₁)∆_{−ℓ}h(X₄))/(∆_{−ℓ}h(X₂)∆_{−ℓ}h(X₃)) )² × (∆_{−ℓ}h(X₂)∆_{−ℓ}h(X₃))/(∆_{−ℓ}h(X₁)∆_{−ℓ}h(X₄)) × K^ℓ_p(X₁, X₂)K^ℓ_p(X₃, X₄)/(p(X₁)p(X₂)p(X₃)p(X₄)) ].

In particular, when h(x) = x the remainder term simplifies to

R(f, g, h) = E[ ( ∆_{−ℓ}f(X₁)∆_{−ℓ}g(X₄) − ∆_{−ℓ}f(X₃)∆_{−ℓ}g(X₂) )² K^ℓ_p(X₁, X₂)K^ℓ_p(X₃, X₄)/(p(X₁)p(X₂)p(X₃)p(X₄)) ].

Combining Proposition 3.1 and Theorem 3.5 (applied with f = g) we arrive at the following result (stated for a smaller class of functions h) which, as we shall argue below, shares a similar flavour with the upper and lower bounds from Theorem 1.1.

Corollary 3.7 (Klaassen bounds, revisited). For any decreasing function h ∈ L²(p) and all g such that Lemma 2.16 applies (with f = g), we have

E[−L^ℓ_p h(X)(∆_{−ℓ}g(X))]² / Var(h(X)) ≤ Var[g(X)] ≤ E[ (∆_{−ℓ}g(X))² (−L^ℓ_p h(X))/(∆_{−ℓ}h(X)) ].    (3.6)

Equality in the upper bound holds if and only if there exist constants α, β such that g(x) = αh(x) + β.

Proof. For the lower bound, we apply Proposition 3.1 with f(x) = −L^ℓ_p h(x), so that T^ℓ_p f(x) = E[h(X)] − h(x) and E[(T^ℓ_p f(X))²] = Var[h(X)]. For the upper bound we use Theorem 3.5 with f = g.
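For a concrete discrete illustration, take X ∼ Bin(n, θ), ℓ = −1 and the decreasing choice h(x) = −x, for which the weight −L⁻_p h/∆_+h equals τ⁻_p(x) = θ(n − x); a minimal sketch of (3.6) (assuming numpy/scipy; parameters and g are arbitrary):

```python
# Sketch: the bounds (3.6) for X ~ Bin(n, theta), l = -1, h(x) = -x, in which
# case -L_p h = -tau^- and -L_p h / Delta_+ h = tau^-(x) = theta (n - x).
import numpy as np
from scipy import stats

n, theta = 20, 0.3
x = np.arange(n + 1)
p = stats.binom.pmf(x, n, theta)
tau = theta * (n - x)                            # Stein kernel (Example 2.14)

g = np.exp(0.15 * x)                             # an arbitrary test function
dg = np.concatenate((g[1:] - g[:-1], [0.0]))     # Delta_+ g (x = n term has weight 0)

var_g = np.sum(g**2 * p) - np.sum(g * p)**2
lower = np.sum(tau * dg * p)**2 / (n * theta * (1 - theta))   # Var[h(X)] = Var[X]
upper = np.sum(dg**2 * tau * p)

assert lower <= var_g <= upper
```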

Example 3.8 (Pearson and Ord families). Tables 1, 2, and 3 present the results for random variables whose distribution belongs to the Pearson and Ord families of distributions. A random variable X ∼ p belongs to the integrated Pearson family if X is absolutely continuous and there exist δ, β, γ ∈ IR, not all equal to 0, such that τ^ℓ_p(x) (:= −L^ℓ_p(Id)(x)) = δx² + βx + γ for all x ∈ S(p). Similarly, X ∼ p belongs to the cumulative Ord family if X is discrete and there exist δ, β, γ ∈ IR, not all equal to 0, such that τ^ℓ_p(x) = δx² + βx + γ for all x ∈ S(p). The bounds for these distributions generalize the results e.g. from [2].
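As a sanity check on the tables, the Poisson row of Table 1 (a special case of Corollary 3.7) can be verified numerically by truncated summation; the script below is illustrative only, with an arbitrary bounded test function g.

```python
# Check of the Poisson variance bounds of Table 1 by truncated summation.
import numpy as np
from scipy.stats import poisson

lam = 2.5
x = np.arange(60)                       # truncation; remaining mass is negligible
w = poisson.pmf(x, lam)
g = np.sin(x / 3.0)                     # arbitrary test function
fwd = np.append(g[1:] - g[:-1], np.sin(60 / 3.0) - g[-1])   # Delta^+ g
bwd = np.append(0.0, g[1:] - g[:-1])    # Delta^- g; the factor X kills x = 0

var_g = np.sum(w * g**2) - np.sum(w * g) ** 2
print(lam * np.sum(w * fwd) ** 2, "<=", var_g, "<=", lam * np.sum(w * fwd**2))
print(np.sum(w * x * bwd) ** 2 / lam, "<=", var_g, "<=", np.sum(w * x * bwd**2))
```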

Remark 3.9 (About the connection with Klaassen's bounds). The bounds in Corollary 3.7 and those from Theorem 1.1 are obviously of a similar flavour. Upon closer inspection, however, the connection is not transparent. In order to clarify this point, we follow [45] and restrict our attention to kernels of the form

    ρ⁺_ζ(x, y) = I[ζ < y ≤ x] − I[x < y ≤ ζ]  and  ρ⁻_ζ(x, y) = I[ζ ≤ y < x] − I[x ≤ y < ζ]


for some ζ ∈ IR. In our notations, these become

    ρ^ℓ_ζ(x, y) = χ^ℓ(ζ, y)χ^{-ℓ}(y, x) − χ^ℓ(x, y)χ^{-ℓ}(y, ζ)

for ℓ ∈ {−1, 0, 1}.

We first tackle the relation between the main arguments of the bounds, namely G(x) and g(x). Given a measurable function g, we mimic the statement of Theorem 1.1 and introduce the generalized primitive G(x) = G^ℓ_ζ(x) := ∫ρ^ℓ_ζ(x, y)g(y)µ(dy) + c, with c arbitrary and fixed w.l.o.g. to 0. Again in our notations, this becomes

    G^ℓ_ζ(x) = ∫_{ζ+a_ℓ}^{x−b_ℓ} g(y)µ(dy) I[ζ < x] − ∫_{x+a_ℓ}^{ζ−b_ℓ} g(y)µ(dy) I[x < ζ].

By construction, ∆^{-ℓ}G^ℓ_ζ(x) = g(x) for all ζ and all ℓ, as expected. Nevertheless, in order for G^ℓ_ζ(x) to be well defined, strong (joint) assumptions on g and ζ are required; for instance, if g(x) = 1 then ζ must be finite and G^ℓ_ζ(x) = x − ζ, while if g has p-mean 0 then the values ζ = ±∞ are allowed.

Next, we examine the connection between the lower bound (1.7) and the lower bound of (3.6). Let k ∈ L²(p) have p-mean 0. Then

    E[k(X)ρ^ℓ_ζ(X, x)] = E[k(X)χ^{-ℓ}(x, X)]χ^ℓ(ζ, x) − E[k(X)χ^ℓ(X, x)]χ^{-ℓ}(x, ζ)
      = E[k(X)]χ^ℓ(ζ, x) − E[k(X)χ^ℓ(X, x)](χ^ℓ(ζ, x) + χ^{-ℓ}(x, ζ))
      = −E[k(X)χ^ℓ(X, x)]

so that

    K(x) = (1/p(x)) ∫ρ^ℓ_ζ(z, x)k(z)p(z)µ(dz) = −L^ℓ_p k(x),

and thus (1.7) follows from the lower bound of (3.6).

Finally, we consider the upper bounds (1.6) and (3.6). Let H(x) = H^ℓ_ζ(x) be a generalized primitive of some nonnegative function h. The same manipulations as above lead to

    (1/p(x)) ∫ρ^ℓ_ζ(z, x)H(z)p(z)µ(dz) = −L^ℓ_p h(x) − (E[H(X)]/p(x))(P(x − a_ℓ) − χ^ℓ(ζ, x)).

If, following [45], we choose ζ in such a way that E[H(X)] = 0 (this is equivalent to requiring ∫_{ζ+a_ℓ}^{b} h(y)(1 − P(y + b_ℓ))µ(dy) = ∫_{a}^{ζ−b_ℓ} h(y)P(y − a_ℓ)µ(dy)), then we see that the upper bound in (3.6) is equivalent to (1.6).

Of course there is some gain in generality in allowing for a general kernel ρ as in Theorem 1.1, though this comes at the expense of readability: given a positive function h, understanding the form of the function H is actually nontrivial, and our result illuminates Klaassen's discovery by providing the connection with Stein characterizations.
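In the discrete case the relation ∆^{-ℓ}G^ℓ_ζ = g can be checked mechanically. The toy script below (ours) does this on the integers, with the plain convention G(x) = Σ_{y=ζ+1}^{x} g(y) for x > ζ and G(x) = −Σ_{y=x+1}^{ζ} g(y) for x < ζ; the exact offsets a_ℓ, b_ℓ depend on conventions fixed earlier in the paper and are not reproduced here.

```python
# Discrete sanity check that the generalized primitive differentiates back to g:
# with the convention below, the backward difference G(x) - G(x-1) equals g(x)
# for every integer x and every choice of zeta.
def G(x, zeta, g):
    if x > zeta:
        return sum(g(y) for y in range(zeta + 1, x + 1))
    return -sum(g(y) for y in range(x + 1, zeta + 1))

g = lambda y: y**2 - 3 * y + 1
for zeta in (0, 3, 7):
    assert all(G(x, zeta, g) - G(x - 1, zeta, g) == g(x) for x in range(1, 10))
print("Delta^- G = g for every tested zeta")
```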

4 About the weights

The freedom of choice in the test functions h appearing in the bounds invites a study of the impact of the choice of h on the validity and quality of the resulting inequalities.

4.1 Score function and the Brascamp-Lieb inequality

The form of the lower bound in Proposition 3.1 encourages the choice f(x) = 1. This is only permitted if the constant function 1 ∈ F^{(1)}_ℓ(p) and E[(T^ℓ_p 1(X))²] < ∞; these are two strong assumptions which


exclude some natural targets such as, e.g., the exponential or beta distributions. If this choice is permitted, then we reap the lower bound

    Var[g(X)] ≥ E[∆^{-ℓ}g(X)]² / I_ℓ(p),  with I_ℓ(p) = E[(T^ℓ_p 1(X))²].

The function T^ℓ_p 1(x) = ∆^ℓ p(x)/p(x) is some form of generalized score function and I_ℓ(p) a generalized Fisher information. Indeed, if ℓ = 0 and X ∼ p is absolutely continuous, then T^0_p 1(x) = (log p(x))′ is exactly the (location) score function of p and I_0(p) is none other than the (location) Fisher information of p. More generally we note that if 1 ∈ F^{(1)}_ℓ(p) then E[T^ℓ_p 1(X)] = 0 and, by Lemma 2.3, it satisfies

    E[T^ℓ_p 1(X)g(X)] = −E[∆^{-ℓ}g(X)]

for all appropriate g; this further reinforces the analogy.
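In the continuous case ℓ = 0 with X ~ N(0, 1), this identity is the classical Gaussian integration by parts E[Xg(X)] = E[g′(X)], which the following quadrature sketch (illustrative only) confirms.

```python
# Check of E[T_p 1(X) g(X)] = -E[g'(X)] for X ~ N(0,1), where T_p 1(x) = -x.
import numpy as np
from scipy import integrate

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
E = lambda f: integrate.quad(lambda x: f(x) * phi(x), -np.inf, np.inf)[0]

g, dg = np.tanh, lambda x: 1.0 / np.cosh(x) ** 2
print(E(lambda x: -x * g(x)), "=", -E(dg))   # the two sides agree numerically
```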

The corresponding upper bound from (3.5) is obtained for h(x) = T^ℓ_p 1(x) in (3.6). Suppose that p(b⁻ + a_ℓ) = p(a⁺ − b_ℓ) = 0. By construction, L^ℓ_p h(x) = I_{S(p)}(x). If we can further suppose that T^ℓ_p 1(x) is a decreasing function, then

    |Cov[f(X), g(X)]| ≤ √E[(∆^{-ℓ}f(X))²/(−∆^{-ℓ}T^ℓ_p 1(X))] · √E[(∆^{-ℓ}g(X))²/(−∆^{-ℓ}T^ℓ_p 1(X))].

Taking g = f we deduce the following result, whose continuous version (i.e. the case ℓ = 0) dates back to [11].

Corollary 4.1 (Brascamp-Lieb inequality). Under the same conditions as Proposition 2.21 we have

    E[∆^{-ℓ}g(X)]² / E[(T^ℓ_p 1(X))²] ≤ Var[g(X)] ≤ E[(∆^{-ℓ}g(X))²/(−∆^{-ℓ}T^ℓ_p 1(X))]    (4.1)

for all g such that T^ℓ_p 1 and g satisfy together the assumptions of Lemma 2.16.
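For a numerical illustration (ours, with an arbitrarily chosen target), take ℓ = 0 and the log-concave density p(x) ∝ exp(−x⁴/4), so that T_p 1(x) = −x³ is decreasing and −∆^{-ℓ}T_p 1(x) = 3x² = (−log p)″(x); both sides of (4.1) are then computable by quadrature.

```python
# Quadrature check of the Brascamp-Lieb bounds (4.1) for p(x) ∝ exp(-x^4/4).
import numpy as np
from scipy import integrate

q = lambda x: np.exp(-x**4 / 4)                    # unnormalized density
Z = integrate.quad(q, -np.inf, np.inf)[0]
E = lambda f: integrate.quad(lambda x: f(x) * q(x) / Z, -np.inf, np.inf)[0]

# test function g(x) = x^5, so g'(x) = 5x^4 and g'(x)^2/(3x^2) = (25/3)x^6
var_g = E(lambda x: x**10) - E(lambda x: x**5) ** 2
lower = E(lambda x: 5 * x**4) ** 2 / E(lambda x: x**6)   # E[g']^2 / E[(T_p 1)^2]
upper = E(lambda x: 25 * x**6 / 3)                       # E[g'^2 / (-log p)'']
print(lower, "<=", var_g, "<=", upper)
```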

We conclude with a generalized version of the elegant inequality due to [55, Lemma 2.11], in the form stated in [19, Equation (1.5)].

Corollary 4.2 (Asymmetric Brascamp-Lieb inequality). Under the same conditions as above, if −∆^{-ℓ}T^ℓ_p 1 ∈ L¹(µ) then

    |Cov[f(X), g(X)]| ≤ sup_x |∆^{-ℓ}f(x)/∆^{-ℓ}T^ℓ_p 1(x)| · E[|∆^{-ℓ}g(X)|]    (4.2)

for all f, g in L²(p).

Proof. Under the stated assumptions, we may apply (3.2) to get, after some notational reshuffling,

    |Cov[f(X), g(X)]| ≤ E[ (|∆^{-ℓ}f(X)|/(−∆^{-ℓ}T^ℓ_p 1(X))) · |∆^{-ℓ}g(X′)| · (−∆^{-ℓ}T^ℓ_p 1(X)) · K^ℓ_p(X, X′)/(p(X)p(X′)) ]

    ≤ sup_x |∆^{-ℓ}f(x)/∆^{-ℓ}T^ℓ_p 1(x)| · E[ |∆^{-ℓ}g(X′)| · (K^ℓ_p(X, X′)/(p(X)p(X′))) · (−∆^{-ℓ}T^ℓ_p 1(X)) ]

    = sup_x |∆^{-ℓ}f(x)/∆^{-ℓ}T^ℓ_p 1(x)| · E[|∆^{-ℓ}g(X′)|],

where the last line follows by conditioning on X′ and applying Proposition 2.21.
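The asymmetric bound can also be probed numerically; for the same quartic density as above (an illustrative choice of ours), f(x) = x³ makes the ratio ∆^{-ℓ}f/∆^{-ℓ}T_p 1 identically equal to −1, and with g(x) = x the bound (4.2) is attained, while a generic f gives a strict inequality.

```python
# Check of the asymmetric Brascamp-Lieb bound (4.2) for p(x) ∝ exp(-x^4/4),
# with g(x) = x (so E|g'| = 1) and (T_p 1)'(x) = -3x^2.
import numpy as np
from scipy import integrate

q = lambda x: np.exp(-x**4 / 4)
Z = integrate.quad(q, -np.inf, np.inf)[0]
E = lambda f: integrate.quad(lambda x: f(x) * q(x) / Z, -np.inf, np.inf)[0]

# for both test functions below, sup_x |f'(x)/(T_p 1)'(x)| equals 1
for f, label in [
    (lambda x: x**3, "f = x^3 (bound attained)"),
    (lambda x: 3 * (x - np.arctan(x)), "f = 3(x - arctan x) (strict)"),
]:
    cov = E(lambda x: f(x) * x) - E(f) * E(lambda x: x)
    print(label, ":", abs(cov), "<= 1.0")
```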


4.2 Stein kernel and Cacoullos’ bound

It is natural to consider the test function h = −Id in Theorem 3.5. Since ∆^{-ℓ}h(x) = −1 and L^ℓ_p h(x) = τ^ℓ_p(x), the corresponding weight is −L^ℓ_p h(x)/∆^{-ℓ}h(x) = τ^ℓ_p(x) and we obtain

    |Cov[f(X), g(X)]| ≤ √E[τ^ℓ_p(X)(∆^{-ℓ}f(X))²] · √E[τ^ℓ_p(X)(∆^{-ℓ}g(X))²].

In particular, if g = f then

    Var[g(X)] ≤ E[τ^ℓ_p(X)(∆^{-ℓ}g(X))²],

in which one recognizes the upper bounds from [12] and also, when ℓ = 0, from [64]. The corresponding lower bound in (3.1) is obtained for f(x) = τ^ℓ_p(x), for which T^ℓ_p f(x) = x I_{S(p)}(x), and the overall bound becomes

    E[τ^ℓ_p(X)∆^{-ℓ}g(X)]² / Var[X] ≤ Var[g(X)] ≤ E[(∆^{-ℓ}g(X))²τ^ℓ_p(X)].    (4.3)
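For instance (an illustration of ours), for a Gamma target with shape α and scale β the Stein kernel is τ_p(x) = βx (see Table 3), and (4.3) can be checked by quadrature:

```python
# Quadrature check of (4.3) for X ~ Gamma(alpha, beta) in the shape-scale
# parametrization, where tau_p(x) = beta * x.
import numpy as np
from scipy import integrate
from scipy.stats import gamma

alpha, beta_ = 3.0, 2.0
dist = gamma(a=alpha, scale=beta_)
E = lambda f: integrate.quad(lambda x: f(x) * dist.pdf(x), 0, np.inf)[0]

g, dg = np.log1p, lambda x: 1.0 / (1.0 + x)    # arbitrary smooth test function
var_g = E(lambda x: g(x) ** 2) - E(g) ** 2
lower = E(lambda x: x * dg(x)) ** 2 / alpha    # E[tau g']^2 / Var[X]
upper = beta_ * E(lambda x: x * dg(x) ** 2)    # E[tau (g')^2]
print(lower, "<=", var_g, "<=", upper)
```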

Example 4.3. In our examples, (4.3) gives the following variance bounds.

• Binomial distribution: Let X ∼ Bin(n, θ) as in Example 2.10. From Example 2.14 we obtain the upper and lower bounds

    ((1 − θ)/(nθ)) E[X∆⁻g(X)]² ≤ Var[g(X)] ≤ (1 − θ)E[X(∆⁻g(X))²];
    (θ/(n(1 − θ))) E[(n − X)∆⁺g(X)]² ≤ Var[g(X)] ≤ θE[(n − X)(∆⁺g(X))²].

• Beta distribution: From Example 3.4, for the Beta(α, β) distribution with variance αβ/((α + β)²(α + β + 1)),

    ((α + β + 1)/(αβ)) E[X(1 − X)g′(X)]² ≤ Var[g(X)] ≤ (1/(α + β)) E[X(1 − X)(g′(X))²].
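Since the binomial support is finite, both binomial displays can be confirmed exactly by enumeration; a short illustrative script:

```python
# Exact check of the binomial bounds of Example 4.3 by full enumeration.
import numpy as np
from scipy.stats import binom

n, theta = 12, 0.3
x = np.arange(n + 1)
w = binom.pmf(x, n, theta)
g = np.cos(x / 2.0)
fwd = np.append(g[1:] - g[:-1], 0.0)   # Delta^+ g; the factor (n - X) kills x = n
bwd = np.append(0.0, g[1:] - g[:-1])   # Delta^- g; the factor X kills x = 0

var_g = np.sum(w * g**2) - np.sum(w * g) ** 2
print((1 - theta) / (n * theta) * np.sum(w * x * bwd) ** 2, "<=", var_g,
      "<=", (1 - theta) * np.sum(w * x * bwd**2))
print(theta / (n * (1 - theta)) * np.sum(w * (n - x) * fwd) ** 2, "<=", var_g,
      "<=", theta * np.sum(w * (n - x) * fwd**2))
```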

The particular case of other Pearson/Ord distributions is detailed in Tables 1, 2 and 3. The tablesinclude the Binomial distribution and the Beta distribution for easy reference. The Stein operatorswhich are given are those from Example 2.15.

4.3 Eigenfunctions of the adjoint Stein operator

A final interesting choice is to take h in Theorem 3.5 such that the corresponding weight −L^ℓ_p h(x)/∆^{-ℓ}h(x) is constant, i.e. any function h such that there exists λ ∈ IR for which

    −L^ℓ_p h(x)/∆^{-ℓ}h(x) = λ  for all x ∈ S(p).

By construction, such functions are solutions to the eigenfunction problem

    h(x) = −λ T^ℓ_p(∆^{-ℓ}h)(x)  for all x ∈ S(p),

where the operator R^ℓ_p h := T^ℓ_p(∆^{-ℓ}h) is self-adjoint in the sense that

    E[(R^ℓ_p f(X))g(X)] = E[f(X)(R^ℓ_p g(X))]

for all f, g such that Lemmas 2.3 and 2.16 apply.
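For the standard normal target and ℓ = 0 one has T_p f = f′ − Id f, so that R_p h = h″ − Id h′ is the Ornstein-Uhlenbeck generator; its eigenfunctions are the (probabilists') Hermite polynomials He_k, with R_p He_k = −k He_k and hence λ = 1/k. A small numerical confirmation (illustrative):

```python
# For N(0,1) and l = 0: R_p h = h'' - x h', and He_k'' - x He_k' = -k He_k.
import numpy as np
from numpy.polynomial import hermite_e as H

x = np.linspace(-3.0, 3.0, 7)
for k in (1, 2, 3, 4):
    c = np.zeros(k + 1); c[k] = 1.0                 # coefficients of He_k
    h = H.hermeval(x, c)
    dh = H.hermeval(x, H.hermeder(c, 1))
    d2h = H.hermeval(x, H.hermeder(c, 2))
    assert np.allclose(d2h - x * dh, -k * h)
print("R_p He_k = -k He_k for k = 1,...,4")
```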


Acknowledgements

The research of YS was partially supported by the Fonds de la Recherche Scientifique – FNRS under Grant no F.4539.16. ME acknowledges partial funding via a Welcome Grant of the Université de Liège and via the Interuniversity Attraction Pole StUDyS (IAP/P7/06). YS also thanks Lihu Xu for organizing the "Workshop on Stein's method and related topics" at the University of Macau in December 2018, where a preliminary version of this contribution was first presented. GR and YS thank Emilie Clette for fruitful discussions on a preliminary version of this work. YS thanks Céline Esser for many fruitful discussions. We also thank Benjamin Arras for several pointers to relevant literature, as well as corrections on the first draft of the paper.

References

[1] G. Afendras and N. Papadatos. On matrix variance inequalities. Journal of Statistical Planningand Inference, 141(11):3628–3631, 2011.

[2] G. Afendras, N. Papadatos, and V. Papathanasiou. The discrete Mohr and Noll inequality with applications to variance bounds. Sankhyā, 69(2):162–189, 2007.

[3] G. Afendras and V. Papathanasiou. A note on a variance bound for the multinomial and thenegative multinomial distribution. Naval Research Logistics (NRL), 61(3):179–183, 2014.

[4] B. Arras and C. Houdré. On Stein's method for infinitely divisible laws with finite first moment. arXiv preprint arXiv:1712.10051, 2017.

[5] B. Arras and C. Houdré. On Stein's method for multivariate self-decomposable laws with finite first moment. arXiv preprint arXiv:1809.02050, 2018.

[6] A. Barbour, M. J. Luczak, A. Xia, et al. Multivariate approximation in total variation, II: Discrete normal approximation. The Annals of Probability, 46(3):1405–1440, 2018.

[7] A. D. Barbour and L. H. Y. Chen. An introduction to Stein’s method, volume 4 of Lect. NotesSer. Inst. Math. Sci. Natl. Univ. Singap. Singapore University Press, Singapore, 2005.

[8] A. D. Barbour and L. H. Y. Chen. Stein’s method and applications, volume 5 of Lect. Notes Ser.Inst. Math. Sci. Natl. Univ. Singap. Singapore University Press, Singapore, 2005.

[9] A. Baricz. Mills’ ratio: monotonicity patterns and functional inequalities. Journal of MathematicalAnalysis and Applications, 340(2):1362–1370, 2008.

[10] A. Borovkov and S. Utev. On an inequality and a related characterization of the normal distri-bution. Theory of Probability & Its Applications, 28(2):219–228, 1984.

[11] H. J. Brascamp and E. H. Lieb. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366–389, 1976.

[12] T. Cacoullos. On upper and lower bounds for the variance of a function of a random variable.The Annals of Probability, 10(3):799–809, 1982.

[13] T. Cacoullos, N. Papadatos, and V. Papathanasiou. Variance inequalities for covariance kernelsand applications to central limit theorems. Theory of Probability & Its Applications, 42(1):149–155, 1998.

[14] T. Cacoullos and V. Papathanasiou. On upper and lower bounds for the variance of functions ofa random variable. Statistics & Probability Letters, 3:175–184, 1985.


Table 1 lists, for each distribution: the p.m.f. p(x), the Stein kernels τ⁻(x) and τ⁺(x), the parameters and support, the cumulative Ord relation (δ, β, γ), the Stein operators, and the variance bounds.

Poisson(λ). Parameters: λ > 0; support: x = 0, 1, . . .
    p.m.f.: p(x) = e^{−λ}λ^x/x!
    Stein kernels: τ⁻(x) = λ; τ⁺(x) = x
    Cum. Ord relation: (δ, β, γ) = (0, 0, λ)
    Stein operators:
        A⁺_{Poi(λ)}g(x) = (x − λ)g(x) − x∆⁻g(x)
        A⁻_{Poi(λ)}g(x) = (x − λ)g(x) − λ∆⁺g(x)
        A_{Poi(λ)}g(x) = xg(x) − λg(x + 1)
    Variance bounds:
        λE[∆⁺g(X)]² ≤ Var[g(X)] ≤ λE[(∆⁺g(X))²]
        λ⁻¹E[X∆⁻g(X)]² ≤ Var[g(X)] ≤ E[X(∆⁻g(X))²]

Binomial(n, θ). Parameters: 0 < θ < 1, n = 1, 2, . . .; support: x = 0, 1, . . . , n
    p.m.f.: p(x) = C(n, x)θ^x(1 − θ)^{n−x}
    Stein kernels: τ⁻(x) = θ(n − x); τ⁺(x) = (1 − θ)x
    Cum. Ord relation: (δ, β, γ) = (0, −θ, nθ)
    Stein operators:
        A⁺_{bin(n,θ)}g(x) = (x − nθ)g(x) − (1 − θ)x∆⁻g(x)
        A⁻_{bin(n,θ)}g(x) = (x − nθ)g(x) − θ(n − x)∆⁺g(x)
        A_{bin(n,θ)}g(x) = xg(x) − (θ/(1 − θ))(n − x)g(x + 1)
    Variance bounds:
        (θ/(n(1 − θ)))E[(n − X)∆⁺g(X)]² ≤ Var[g(X)] ≤ θE[(n − X)(∆⁺g(X))²]
        ((1 − θ)/(nθ))E[X∆⁻g(X)]² ≤ Var[g(X)] ≤ (1 − θ)E[X(∆⁻g(X))²]

Negative Binomial(r, p). Parameters: 0 < p < 1, r > 0; support: x = 0, 1, . . .
    p.m.f.: p(x) = C(r + x − 1, x)p^r(1 − p)^x
    Stein kernels: τ⁻(x) = ((1 − p)/p)(r + x); τ⁺(x) = x/p
    Cum. Ord relation: (δ, β, γ) = (0, (1 − p)/p, r(1 − p)/p)
    Stein operators:
        A⁺_{NB(r,p)}g(x) = (x − r(1 − p)/p)g(x) − (x/p)∆⁻g(x)
        A⁻_{NB(r,p)}g(x) = (x − r(1 − p)/p)g(x) − ((1 − p)/p)(r + x)∆⁺g(x)
        A_{NB(r,p)}g(x) = xg(x) − (1 − p)(r + x)g(x + 1)
    Variance bounds:
        ((1 − p)/r)E[(X + r)∆⁺g(X)]² ≤ Var[g(X)] ≤ ((1 − p)/p)E[(X + r)(∆⁺g(X))²]
        (1/(r(1 − p)))E[X∆⁻g(X)]² ≤ Var[g(X)] ≤ (1/p)E[X(∆⁻g(X))²]

Table 1: Specific forms for some discrete distributions from the cumulative Ord family. This table is a completed version of Table 1 of [2].


Table 2 continues Table 1, with the same layout.

Hypergeometric(n, K, N). Parameters: 1 ≤ K ≤ N, n = 1, 2, . . . , N; support: x = 0, 1, . . . , min{K, n}
    p.m.f.: p(x) = C(K, x)C(N − K, n − x)/C(N, n)
    Stein kernels: τ⁻(x) = (1/N)(K − x)(n − x); τ⁺(x) = (1/N)x(x + N − K − n)
    Cum. Ord relation: (δ, β, γ) = (1/N, −(n + K)/N, nK/N)
    Stein operators:
        A⁺_{H(n,K,N)}g(x) = (x − nK/N)g(x) − (1/N)x(N − K − n + x)∆⁻g(x)
        A⁻_{H(n,K,N)}g(x) = (x − nK/N)g(x) − (1/N)(K − x)(n − x)∆⁺g(x)
        A_{H(n,K,N)}g(x) = xg(x) + (1/(N − K − n))x²g(x) − (1/(N − K − n))(K − x)(n − x)g(x + 1)
    Variance bounds:
        ((N − 1)/(nK(N − K)(N − n)))E[(K − X)(n − X)∆⁺g(X)]² ≤ Var[g(X)] ≤ (1/N)E[(K − X)(n − X)(∆⁺g(X))²]
        ((N − 1)/(nK(N − K)(N − n)))E[X(N − K − n + X)∆⁻g(X)]² ≤ Var[g(X)] ≤ (1/N)E[X(N − K − n + X)(∆⁻g(X))²]

Negative Hypergeometric(N, K, r). Parameters: 0 ≤ K ≤ N; support: x = 0, 1, . . . , K
    p.m.f.: p(x) = C(x + r − 1, x)C(N − r − x, K − x)/C(N, K)
    Stein kernels: τ⁻(x) = (1/(N − K + 1))(K − x)(r + x); τ⁺(x) = (1/(N − K + 1))x(N − r + 1 − x)
    Cum. Ord relation: (δ, β, γ) = (−1/(N − K + 1), (K − r)/(N − K + 1), rK/(N − K + 1))
    Stein operators:
        A⁺_{NH(N,K,r)}g(x) = (x − rK/(N − K + 1))g(x) − (x(N + 1 − r − x)/(N − K + 1))∆⁻g(x)
        A⁻_{NH(N,K,r)}g(x) = (x − rK/(N − K + 1))g(x) − ((K − x)(r + x)/(N − K + 1))∆⁺g(x)
        A_{NH(N,K,r)}g(x) = xg(x) − (1/(N − r + 1))x²g(x) − (1/(N − r + 1))(K − x)(r + x)g(x + 1)
    Variance bounds:
        ((N − K + 2)/(r(N + 1)K(N − K − r + 1)))E[(K − X)(r + X)∆⁺g(X)]² ≤ Var[g(X)] ≤ (1/(N − K + 1))E[(K − X)(r + X)(∆⁺g(X))²]
        ((N − K + 2)/(r(N + 1)K(N − K − r + 1)))E[X(N + 1 − r − X)∆⁻g(X)]² ≤ Var[g(X)] ≤ (1/(N − K + 1))E[X(N + 1 − r − X)(∆⁻g(X))²]

Table 2: Specific forms for some discrete distributions from the cumulative Ord family (second part).


Table 3 lists, for each distribution: the p.d.f. p(x), the Stein kernel τ(x), the parameters and support, the Pearson relation (δ, β, γ), the Stein operator, and the variance bounds.

Normal(µ, σ²). Parameters: µ ∈ IR, σ² > 0; support: x ∈ IR
    p.d.f.: p(x) = (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²))
    Stein kernel: τ(x) = σ²
    Pearson relation: (δ, β, γ) = (0, 0, σ²)
    Stein operator: A_{N(µ,σ²)}g(x) = (x − µ)g(x) − σ²g′(x)
    Variance bounds: σ²E[g′(X)]² ≤ Var[g(X)] ≤ σ²E[g′(X)²]

Beta(α, β). Parameters: α > 0, β > 0; support: x ∈ (0, 1)
    p.d.f.: p(x) = x^{α−1}(1 − x)^{β−1}/B(α, β)
    Stein kernel: τ(x) = x(1 − x)/(α + β)
    Pearson relation: (δ, β, γ) = (−1/(α + β), 1/(α + β), 0)
    Stein operator: A_{Beta(α,β)}g(x) = (x − α/(α + β))g(x) − (x(1 − x)/(α + β))g′(x)
    Variance bounds: ((α + β + 1)/(αβ))E[X(1 − X)g′(X)]² ≤ Var[g(X)] ≤ (1/(α + β))E[X(1 − X)(g′(X))²]

Gamma(α, β). Parameters: α > 0, β > 0; support: x ∈ (0, ∞) (α < 1), x ∈ [0, ∞) (α ≥ 1)
    p.d.f.: p(x) = x^{α−1}β^{−α}e^{−x/β}/Γ(α)
    Stein kernel: τ(x) = βx
    Pearson relation: (δ, β, γ) = (0, β, 0)
    Stein operator: A_{Gamma(α,β)}g(x) = (x − αβ)g(x) − βxg′(x)
    Variance bounds: (1/α)E[Xg′(X)]² ≤ Var[g(X)] ≤ βE[Xg′(X)²]

Student t(ν). Parameter: ν > 0; support: x ∈ IR
    p.d.f.: p(x) = ν^{−1/2}B(ν/2, 1/2)^{−1}(ν/(ν + x²))^{(1+ν)/2}
    Stein kernel: τ(x) = (x² + ν)/(ν − 1) for ν > 1
    Pearson relation: (δ, β, γ) = (1/(ν − 1), 0, ν/(ν − 1))
    Stein operator: A_{t(ν)}g(x) = xg(x) − ((x² + ν)/(ν − 1))g′(x)
    Variance bounds (ν > 2): ((ν − 2)/(ν(ν − 1)²))E[(X² + ν)g′(X)]² ≤ Var[g(X)] ≤ (1/(ν − 1))E[(X² + ν)g′(X)²]

F distribution(d1, d2). Parameters: d1 > 0, d2 > 0; support: x ∈ (0, ∞)
    p.d.f.: p(x) = (d1/d2)^{d1/2}x^{d1/2−1}(1 + (d1/d2)x)^{−(d1+d2)/2}/B(d1/2, d2/2)
    Stein kernel: τ(x) = 2x(d1x + d2)/(d1(d2 − 2)) for d2 > 2
    Pearson relation: (δ, β, γ) = (2/(d2 − 2), 2d2/(d1(d2 − 2)), 0)
    Stein operator: A_{F(d1,d2)}g(x) = (x − d2/(d2 − 2))g(x) − (2x(d2 + d1x)/(d1(d2 − 2)))g′(x)
    Variance bounds (d2 > 4): (2(d2 − 4)/(d1d2²(d1 + d2 − 2)))E[X(d2 + d1X)g′(X)]² ≤ Var[g(X)] ≤ (2/(d1(d2 − 2)))E[X(d2 + d1X)g′(X)²]

Table 3: Specific forms for some continuous distributions. If the distribution belongs to the Pearson family (examples from [1]), the coefficients (δ, β, γ) are given. This table is an adapted version of Table 1 of [2].


[15] T. Cacoullos and V. Papathanasiou. Bounds for the variance of functions of random variables byorthogonal polynomials and Bhattacharyya bounds. Statistics & Probability Letters, 4(1):21–23,1986.

[16] T. Cacoullos and V. Papathanasiou. Characterizations of distributions by variance bounds. Statis-tics & Probability Letters, 7(5):351–356, 1989.

[17] T. Cacoullos and V. Papathanasiou. Lower variance bounds and a new proof of the central limittheorem. Journal of Multivariate Analysis, 43(2):173–184, 1992.

[18] T. Cacoullos and V. Papathanasiou. A generalization of covariance identity and related charac-terizations. Mathematical Methods of Statistics, 4(1):106–113, 1995.

[19] E. A. Carlen, D. Cordero-Erausquin, and E. H. Lieb. Asymmetric covariance estimates of Brascamp–Lieb type and related inequalities for log-concave measures. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 49:1–12, 2013.

[20] W.-Y. Chang and D. S. P. Richards. Variance inequalities for functions of multivariate randomvariables. Advances in Stochastic Inequalities: AMS Special Session on Stochastic Inequalitiesand Their Applications, October 17-19, 1997, Georgia Institute of Technology, 234:43, 1999.

[21] S. Chatterjee. A short survey of Stein’s method. Preprint arXiv:1404.1392, 2014.

[22] S. Chatterjee and Q.-M. Shao. Nonnormal approximation by Stein’s method of exchangeable pairswith application to the Curie-Weiss model. The Annals of Applied Probability, 21(2):464–483,2011.

[23] L. H. Chen. An inequality for the multivariate normal distribution. Journal of MultivariateAnalysis, 12(2):306–315, 1982.

[24] L. H. Chen. Poincaré-type inequalities via stochastic integrals. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69(2):251–277, 1985.

[25] L. H. Y. Chen. Poisson approximation for dependent trials. The Annals of Probability, 3(3):534–545, 1975.

[26] L. H. Y. Chen, L. Goldstein, and Q.-M. Shao. Normal approximation by Stein’s method. Proba-bility and its Applications (New York). Springer, Heidelberg, 2011.

[27] H. Chernoff. The identification of an element of a large population in the presence of noise. TheAnnals of Statistics, 8(6):1179–1197, 1980.

[28] T. A. Courtade, M. Fathi, and A. Pananjady. Existence of Stein kernels under a spectral gap,and discrepancy bound. arXiv preprint arXiv:1703.07707, 2017.

[29] C. M. Cuadras. On the covariance between functions. Journal of Multivariate Analysis, 81(1):19–27, 2002.

[30] P. Diaconis and S. Zabell. Closed form summation for classical distributions: variations on atheme of de Moivre. Statistical Science, 6(3):284–302, 1991.

[31] C. Döbler. Stein's method of exchangeable pairs for the Beta distribution and generalizations. Electronic Journal of Probability, 20(109):1–34, 2015.

[32] W. Ehm. Binomial approximation to the Poisson binomial distribution. Statistics & Probability Letters, 11(1):7–16, 1991.


[33] M. Ernst, G. Reinert, and Y. Swan. Papathanasiou and Olkin-Shepp–type expansions for uni-variate target distributions. 2019.

[34] X. Fang, Q.-M. Shao, and L. Xu. Multivariate approximations in Wasserstein distance by Stein's method and Bismut's formula. Probability Theory and Related Fields, pages 1–35, 2018.

[35] M. Fathi. Higher-Order Stein kernels for Gaussian approximation. arXiv preprintarXiv:1812.02703, 2018.

[36] M. Fathi. Stein kernels and moment maps. arXiv preprint arXiv:1804.04699, 2018.

[37] L. Goldstein and G. Reinert. Stein's method for the Beta distribution and the Pólya-Eggenberger urn. Journal of Applied Probability, 50(4):1187–1205, 2013.

[38] J. Gorham, A. B. Duncan, S. J. Vollmer, and L. Mackey. Measuring sample quality with diffusions.The Annals of Applied Probability (to appear), 2019.

[39] J. Gorham and L. Mackey. Measuring sample quality with kernels. In Proceedings of the 34thInternational Conference on Machine Learning-Volume 70, pages 1292–1301. JMLR. org, 2017.

[40] E. Hillion, O. Johnson, and Y. Yu. A natural derivative on [0, n] and a binomial Poincaré inequality. ESAIM: Probability and Statistics, 18:703–712, 2014.

[41] W. Hoeffding. Maßstabinvariante Korrelationstheorie. Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 5:181–233, 1940.

[42] W. Hoeffding. The collected works of Wassily Hoeffding. Springer Science & Business Media, 2012.

[43] S. Holmes. Stein’s method for birth and death chains. In Stein’s method: expository lectures andapplications, volume 46 of IMS Lecture Notes Monogr. Ser., pages 45–67. Inst. Math. Statist.,Beachwood, OH, 2004.

[44] S. Karlin. A general class of variance inequalities. Multivariate Analysis: Future Directions,Elsevier Science Publishers, New York, pages 279–294, 1993.

[45] C. A. J. Klaassen. On an inequality of Chernoff. The Annals of Probability, 13(3):966–974, 1985.

[46] R. Korwar. On characterizations of distributions by mean absolute deviation and variance bounds.Annals of the Institute of Statistical Mathematics, 43(2):287–295, 1991.

[47] S. Kusuoka and C. A. Tudor. Stein’s method for invariant measures of diffusions via Malliavincalculus. Stochastic Processes and their Applications, 122(4):1627–1651, 2012.

[48] Z. Landsman, S. Vanduffel, and J. Yao. A note on Stein’s lemma for multivariate ellipticaldistributions. Journal of Statistical Planning and Inference, 143(11):2016–2022, 2013.

[49] Z. Landsman, S. Vanduffel, and J. Yao. Some Stein-type inequalities for multivariate ellipticaldistributions and applications. Statistics & Probability Letters, 97:54–62, 2015.

[50] C. Ley, G. Reinert, and Y. Swan. Distances between nested densities and a measure of the impactof the prior in Bayesian statistics. Annals of Applied Probability, 27(1):216–241, 2016.

[51] C. Ley and Y. Swan. Stein’s density approach and information inequalities. Electronic Commu-nications in Probability, 18(7):1–14, 2013.

[52] C. Ley and Y. Swan. Parametric Stein operators and variance bounds. Brazilian Journal ofProbability and Statistics, 30:171–195, 2016.


[53] C. Ley, Y. Swan, and G. Reinert. Stein’s method for comparison of univariate distributions.Probability Surveys, 14:1–52, 2017.

[54] L. Mackey and J. Gorham. Multivariate Stein factors for a class of strongly log-concave distribu-tions. Electronic Communications in Probability, 21, 2016.

[55] G. Menz and F. Otto. Uniform logarithmic Sobolev inequalities for conservative spin systems with super-quadratic single-site potential. The Annals of Probability, 41(3B):2182–2224, 2013.

[56] J. Nash. Continuity of solutions of parabolic and elliptic equations. The American Journal ofMathematics, 80:931–954, 1958.

[57] I. Nourdin and G. Peccati. Normal approximations with Malliavin calculus : from Stein’s methodto universality. Cambridge Tracts in Mathematics. Cambridge University Press, 2012.

[58] V. Papathanasiou. A characterization of the Pearson system of distributions and the associatedorthogonal polynomials. Annals of the Institute of Statistical Mathematics, 47(1):171–176, 1995.

[59] B. P. Rao. Matrix variance inequalities for multivariate distributions. Statistical Methodology,3(4):416–430, 2006.

[60] G. Reinert. A weak law of large numbers for empirical measures via Stein's method. The Annals of Probability, pages 334–354, 1995.

[61] G. Reinert. Three general approaches to Stein’s method. In An introduction to Stein’s method,volume 4. Lecture Notes Series, Institute for Mathematical Sciences, National University of Sin-gapore, 2004.

[62] G. Reinert, G. Mijoule, and Y. Swan. Stein gradients and divergences for multivariate continuousdistributions. arXiv:1806.03478, 2018.

[63] N. Ross. Fundamentals of Stein’s method. Probability Surveys, 8:210–293, 2011.

[64] A. Saumard. Weighted Poincaré inequalities, concentration inequalities and tail bounds related to the behavior of the Stein kernel in dimension one. arXiv preprint arXiv:1804.03926, 2018.

[65] A. Saumard and J. A. Wellner. On the isoperimetric constant, covariance inequalities and Lp-Poincaré inequalities in dimension one. arXiv preprint arXiv:1711.00668, 2017.

[66] A. Saumard and J. A. Wellner. Efron's monotonicity property for measures on R². Journal of Multivariate Analysis, 166:212–224, 2018.

[67] W. Schoutens. Orthogonal polynomials in Stein’s method. Journal of Mathematical Analysis andApplications, 253(2):515–531, 2001.

[68] S. Y. Soon. Binomial approximation for dependent indicators. Statistica Sinica, 6(3):703–714,1996.

[69] C. Stein. A bound for the error in the normal approximation to the distribution of a sum ofdependent random variables. In Proceedings of the Sixth Berkeley Symposium on MathematicalStatistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probabilitytheory, pages 583–602, Berkeley, Calif., 1972. Univ. California Press.

[70] C. Stein. Approximate computation of expectations. Institute of Mathematical Statistics LectureNotes—Monograph Series, 7. Institute of Mathematical Statistics, Hayward, CA, 1986.

[71] N. Upadhye, V. Čekanavičius, and P. Vellaisamy. On Stein operators for discrete approximations. Bernoulli, 23(4A):2828–2859, 2017.


A Proofs from Section 2.3

Proof of Proposition 2.25. In order for (2.4) to hold we need (i) f(·)g(· − ℓ) ∈ F^{(1)}_ℓ(p) and (ii) f(·)∆^{-ℓ}g(·) ∈ L¹(p). Condition (ii) is satisfied under (2.23). By definition of F^{(1)}_ℓ(p), condition (i) holds if the following three conditions apply: (iA) f(·)g(· − ℓ) ∈ dom(p, ∆^ℓ), (iB) ∆^ℓ(p(·)f(·)g(· − ℓ))I[S(p)] ∈ L¹(µ) and (iC) E[T^ℓ_p f(X)g(X − ℓ)] = 0. The proof hinges on the product formula (2.3), which yields

    ∆^ℓ(p(x)f(x)g(x − ℓ)) = ∆^ℓ(p(x)f(x))g(x) + p(x)f(x)∆^{-ℓ}g(x)

for all x ∈ S(p). In light of this, condition (iA) is implied by the requirement that f ∈ dom(p, ∆^ℓ) and g ∈ dom(∆^{-ℓ}). Similarly, because condition (iB) is equivalent to ∆^ℓ(p(·)f(·)g(· − ℓ))/p(·) ∈ L¹(p), we see that it is guaranteed by (2.23). Finally, applying (2.1), we see that (iC) follows from (2.24). Hence condition (i) holds under the stated assumptions.
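The discrete product rule (2.3) that drives this proof is easy to confirm numerically; the following sketch (an illustration of ours, for ℓ = +1 where ∆^ℓ is the forward and ∆^{-ℓ} the backward difference) checks it on random sequences.

```python
# Numeric check of Delta^+[F(x)g(x-1)] = (Delta^+ F)(x) g(x) + F(x) (Delta^- g)(x).
import numpy as np

rng = np.random.default_rng(0)
F, g = rng.normal(size=12), rng.normal(size=12)
x = np.arange(1, 10)
lhs = F[x + 1] * g[x] - F[x] * g[x - 1]                   # Delta^+ of F(x)g(x-1)
rhs = (F[x + 1] - F[x]) * g[x] + F[x] * (g[x] - g[x - 1])
assert np.allclose(lhs, rhs)
print("product rule (2.3) verified on random sequences")
```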

Proof of Proposition 2.26. In order for (2.11) to hold, it is necessary and sufficient that (i) f ∈ L¹(p) and g ∈ dom(∆^{-ℓ}), (ii) (L^ℓ_p f(·))g(· − ℓ) ∈ F^{(1)}_ℓ(p), and (iii) L^ℓ_p f(∆^{-ℓ}g) ∈ L¹(p). Conditions (i) and (iii) are stated explicitly and all that remains is to check that (ii) is equivalent to the stated assumptions. As before, we recall that (ii) is equivalent to (iiA) (L^ℓ_p f(·))g(· − ℓ) ∈ dom(p, ∆^ℓ); (iiB) ∆^ℓ(p(·)(L^ℓ_p f(·))g(· − ℓ))/p(·) ∈ L¹(p); (iiC) E[T^ℓ_p((L^ℓ_p f(·))g(· − ℓ))(X)] = 0. As in the proof of Proposition 2.25, the result hinges on the product rule (2.3), which now reads

    ∆^ℓ(p(x)(L^ℓ_p f(x))g(x − ℓ)) = (∆^ℓ(p(x)L^ℓ_p f(x)))g(x) + p(x)L^ℓ_p f(x)∆^{-ℓ}g(x)
      = (f(x) − E[f(X)])g(x) + p(x)L^ℓ_p f(x)∆^{-ℓ}g(x).

Hence condition (iiA) holds solely under the assumption that g ∈ dom(∆^{-ℓ}), and condition (iiB) holds under (2.25) and (2.26). Finally, (2.27) guarantees that (iiC) is satisfied.

Proof of Proposition 2.27. We want to apply Proposition 2.26; hence we check each condition in Proposition 2.26 separately. By assumption, (2.25) is satisfied and g ∈ dom(∆^{-ℓ}).

• For Assumption (2.26): First suppose that g is monotone increasing. We have to show that L^ℓ_p f(∆^{-ℓ}g) ∈ L¹(p). As f ∈ L¹(p) is assumed, we can use (2.17) to get

    E[|L^ℓ_p f(X)||∆^{-ℓ}g(X)|] = E[|L^ℓ_p f(X)|∆^{-ℓ}g(X)]
      ≤ E[|f(X2) − f(X1)|Φ^ℓ_p(X1, X, X2)∆^{-ℓ}g(X)]
      ≤ E[|f(X2) − f(X1)| E[Φ^ℓ_p(X1, X, X2)∆^{-ℓ}g(X) | X1, X2]]
      = E[|f(X2) − f(X1)|(g(X2) − g(X1)) I[X1 < X2]],

where we used the first identity in (2.18) in the last line. This last expression is necessarily finite because f, g and fg are in L¹(p). The general conclusion follows from the fact that any function of bounded variation is the difference of two monotone functions, the triangle inequality thus yielding the claim.

• For Assumption (2.27): Since f ∈ L¹(p), we can apply (2.17) and the definition of Φ^ℓ_p to obtain

    −p(x)L^ℓ_p f(x) = E[f(X)χ^{-ℓ}(x, X)]E[χ^ℓ(X, x)] − E[f(X)χ^ℓ(X, x)]E[χ^{-ℓ}(x, X)].


Then

    lim_{x→a, x>a} |(L^ℓ_p f(x − b_ℓ))g(x − b_ℓ − ℓ)p(x − b_ℓ)|
      ≤ lim_{x→a, x>a} ( |g(x − b_ℓ − ℓ)| E[|f(X)|χ^{-ℓ}(x − b_ℓ, X)]E[χ^ℓ(X, x − b_ℓ)]
          + |g(x − b_ℓ − ℓ)| E[|f(X)|χ^ℓ(X, x − b_ℓ)]E[χ^{-ℓ}(x − b_ℓ, X)] )
      ≤ lim_{x→a, x>a} |g(x − b_ℓ − ℓ)| E[|f(X)|χ^{-ℓ}(x − b_ℓ, X)]P(X ≤ x − a_ℓ − b_ℓ)  (= L1)
          + lim_{x→a, x>a} |g(x − b_ℓ − ℓ)| E[|f(X)|χ^ℓ(X, x − b_ℓ)]P(X ≥ x)  (= L2)

and

    lim_{x→b, x<b} |(L^ℓ_p f(x + a_ℓ))g(x + a_ℓ − ℓ)p(x + a_ℓ)|
      ≤ lim_{x→b, x<b} ( |g(x + a_ℓ − ℓ)| E[|f(X)|χ^{-ℓ}(x + a_ℓ, X)]E[χ^ℓ(X, x + a_ℓ)]
          + |g(x + a_ℓ − ℓ)| E[|f(X)|χ^ℓ(X, x + a_ℓ)]E[χ^{-ℓ}(x + a_ℓ, X)] )
      ≤ lim_{x→b, x<b} |g(x + a_ℓ − ℓ)| E[|f(X)|χ^{-ℓ}(x + a_ℓ, X)]P(X ≤ x)  (= L3)
          + lim_{x→b, x<b} |g(x + a_ℓ − ℓ)| E[|f(X)|χ^ℓ(X, x + a_ℓ)]P(X ≥ x + a_ℓ + b_ℓ)  (= L4).

Condition 1 guarantees that L1 = L4 = 0; condition 2 guarantees that L2 = L3 = 0. If, furthermore, f is bounded then the sufficiency of 1 is immediate; if f ∈ L²(p) then it follows from the Cauchy-Schwarz inequality.
