arXiv:1710.06638v1 [math.ST] 18 Oct 2017

EMPIRICAL REGRESSION QUANTILE PROCESS WITH

POSSIBLE APPLICATION TO RISK ANALYSIS

JANA JUREČKOVÁ, MARTIN SCHINDLER, AND JAN PICEK

Abstract. The processes of the averaged regression quantiles and of their modifications provide useful tools in regression models when the covariates are not fully under our control. As an application we mention probabilistic risk assessment in the situation when the return depends on some exogenous variables. The processes make it possible to evaluate the expected α-shortfall (0 ≤ α ≤ 1) and other measures of risk recently accepted in the financial literature, and they also help to measure the risk in environmental analysis and elsewhere.

1991 Mathematics Subject Classification. Primary 62J02, 62G30; Secondary 90C05, 65K05, 49M29, 91B30.

Key words and phrases. Averaged regression quantile, one-step regression quantile, R-estimator, risk measurement.

The authors gratefully acknowledge the support of the Grant GAČR 15-00243S.

1. Introduction

In everyday life and practice we encounter various risks, depending on various contributors. The risk contributors may be partially under our control, and information on them is important, because it helps to make good decisions about system design. This problem appears not only in the financial market, insurance and social statistics, but also in environmental analysis dealing with exposures to toxic chemicals (coming from power plants, road vehicles, agriculture), and elsewhere; see [30] for an excellent review of such problems. Our aim is to analyze the risks with the aid of probabilistic risk assessment. Various coherent risk measures, some satisfying suitable axioms, have recently been defined in the literature. We refer to [5], [6], [31], [41], [43], [36], [1], [42], [9], [37], [38], and to other papers cited therein, for discussions and some projects. For possible applications in insurance we refer to [10].

A generally accepted measure of risk is the expected shortfall, based on the quantiles of a portfolio return. Its properties have recently been intensively studied. Acerbi and Tasche in [1] speak of the "expected loss in the 100α% worst cases", or shortly of the "expected α-shortfall", 0 < α < 1, which is defined as

(1.1)    − IE{ Y | Y ≤ F^{-1}(α) } = −(1/α) ∫_0^α F^{-1}(u) du,

where F is the distribution function of the asset Y. The quantity can be estimated by means of approximations of the quantile function F^{-1}(u) by the sample quantiles.
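To make (1.1) concrete, the expected α-shortfall of a sample can be approximated by replacing F^{-1}(u) with the sample quantile function. The following R sketch is only an illustration on simulated returns (the data, seed, and grid size are our own choices, not taken from the paper):

```r
## Plug-in estimate of the expected alpha-shortfall (1.1),
## replacing F^{-1}(u) by the sample quantile function.
set.seed(1)
Y     <- rnorm(1000, mean = 0.02, sd = 0.10)   # illustrative returns
alpha <- 0.05

## -(1/alpha) * integral_0^alpha F^{-1}(u) du, approximated on an equispaced grid
u       <- seq(1 / 2000, alpha, by = 1 / 2000)
es_grid <- -mean(quantile(Y, probs = u, type = 1))

## direct empirical version: minus the average loss in the 100*alpha % worst cases
es_tail <- -mean(Y[Y <= quantile(Y, probs = alpha, type = 1)])

c(grid = es_grid, tail = es_tail)              # the two estimates are close
```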

Quantile regression is an important method for investigating the risk of an asset in the situation when it depends on some exogenous variables. An averaged regression quantile, introduced in [22], or some of its modifications, serves as a convenient tool for global risk measurement in such a situation, when the amount of covariates is not under our control.


The typical model for the relation of the loss to the covariates is the regression model

(1.2)    Y_ni = β_0 + x_ni⊤ β + e_ni,   i = 1,...,n,

where Y_n1,...,Y_nn are observed responses and e_n1,...,e_nn are independent model errors, possibly non-identically distributed with unknown distribution functions F_i, i = 1,...,n. The covariates x_ni = (x_i1,...,x_ip)⊤, i = 1,...,n, are random or nonrandom, and β* = (β_0, β⊤)⊤ = (β_0, β_1,...,β_p)⊤ ∈ IR^{p+1} is an unknown parameter. For the sake of brevity, we also use the notation x*_ni = (1, x_i1,...,x_ip)⊤, i = 1,...,n.

An important tool in the risk analysis is the regression α-quantile

β̂*_n(α) = (β̂_n0(α), (β̂_n(α))⊤)⊤ = (β̂_n0(α), β̂_n1(α),...,β̂_np(α))⊤.

It is a (p + 1)-dimensional vector defined as a minimizer

(1.3)    β̂*_n(α) = arg min_{b ∈ IR^{p+1}} { Σ_{i=1}^n [ α (Y_i − x*_i⊤ b)^+ + (1 − α)(Y_i − x*_i⊤ b)^− ] },

where z^+ = max(z, 0) and z^− = max(−z, 0), z ∈ IR^1.

The solution β̂*_n(α) = (β̂_0(α), β̂(α))⊤ minimizes the (α, 1 − α) convex combination of the residuals (Y_i − x*_i⊤ b) over b ∈ IR^{p+1}, where the choice of α depends on the balance between underestimating and overestimating the respective losses Y_i. Increasing α ↗ 1 reflects a greater concern about underestimating the losses Y, compared to overestimating them.
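For a fixed α (or for several values of α at once), the minimization (1.3) can be carried out with the rq() function of the R package quantreg, which is also the tool used in Section 4; the simulated data and parameter values below are purely illustrative:

```r
## Regression alpha-quantiles (1.3) computed by quantreg::rq()
## on simulated data (illustrative design, not the paper's).
library(quantreg)

set.seed(2)
n <- 100
X <- cbind(x1 = runif(n, 0, 4), x2 = runif(n, -4, 2))
Y <- 5 - 3 * X[, 1] + 2 * X[, 2] + rnorm(n)

## each column is the (p+1)-vector (beta0(alpha), beta1(alpha), beta2(alpha))
coef(rq(Y ~ X, tau = c(0.1, 0.5, 0.9)))
```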

The methodology is based on the averaged regression α-quantile, which is the following weighted mean of the components of β̂*_n(α), 0 ≤ α ≤ 1:

(1.4)    B_n(α) = x̄*_n⊤ β̂*_n(α) = β̂_n0(α) + (1/n) Σ_{i=1}^n Σ_{j=1}^p x_ij β̂_j(α),    x̄*_n = (1/n) Σ_{i=1}^n x*_i.

In [22] it was shown that B_n(α) − β_0 − x̄_n⊤β is asymptotically equivalent to the [nα]-quantile e_{n:[nα]} of the model errors, if they are identically distributed. Hence, B_n(·) can help to make an inference on the expected α-shortfall (1.1) even under the nuisance regression.
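A direct way to evaluate B_n(α) in (1.4) is to average the fitted α-quantile hyperplane over the design points, i.e. to take x̄*_n⊤ β̂*_n(α). A minimal R sketch (again on simulated data, with hypothetical names) follows; evaluating it on a grid of α also illustrates that B_n(α) is nondecreasing in α:

```r
## Averaged regression quantile B_n(alpha) = xbar*' betahat*(alpha), cf. (1.4).
library(quantreg)

set.seed(2)
n <- 100
X <- cbind(x1 = runif(n, 0, 4), x2 = runif(n, -4, 2))
Y <- 5 - 3 * X[, 1] + 2 * X[, 2] + rnorm(n)

B_n <- function(alpha) {
  fit  <- rq(Y ~ X, tau = alpha)   # regression alpha-quantile (1.3)
  xbar <- c(1, colMeans(X))        # mean regressor x*_bar = (1, xbar')'
  sum(xbar * coef(fit))            # betahat_0(alpha) + xbar' betahat(alpha)
}

round(sapply(seq(0.1, 0.9, by = 0.1), B_n), 3)   # nondecreasing in alpha
```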

Besides B_n(α), its various modifications can also be used, some of them easier to interpret. The methods are nonparametric, and thus applicable also to heavy-tailed and skewed distributions; notice that [8] report a considerable improvement over normality when trying different distributions. An extension to autoregressive models is possible and will be a subject of further study; there the main tool will be the autoregression quantiles, introduced in [29], and their averaged versions. The autoregression quantile will reflect the value-at-risk based on the past assets, while its averaged version will try to mask the past history.

The behavior of B_n(α) with 0 < α < 1 has been illustrated in [4] and [27], and summarized in [25]; there it is shown that B_n(α) is a nondecreasing step function of α ∈ (0, 1). The extreme B_n(1) with α = 1 was studied in [19]. Notice that the upper bound of the number J_n of breakpoints of β̂*_n(·), and also of B_n(·), is C(n, p+1) = O(n^{p+1}). However, Portnoy in [34] showed that, under some conditions on the design matrix X_n, the number J_n of breakpoints is of order O_p(n log n) as n → ∞, and thus much smaller.


An alternative to the regression quantile is the two-step regression α-quantile, introduced in [21]. Here the slope components β are estimated by a specific rank estimate β̃_nR, which is invariant to a shift in location. The intercept component is then estimated by the α-quantile of the residuals of the Y_i's from β̃_nR. The averaged two-step regression quantile B̃_n(α) is asymptotically equivalent to B_n(α) under a wide choice of the R-estimators of the slopes. However, the finite-sample behavior of B̃_n(α) generally differs from that of B_n(α); it is affected by the choice of the R-estimator, but the main difference is that the number of breakpoints of B̃_n(α) equals exactly n.

Being aware of various important applications of the problem, we shall study this situation in more detail. The averaged regression quantile B_n(α) is monotone in α, while the two-step averaged regression quantile B̃_n(α) can be made monotone by a suitable choice of the R-estimate β̃_nR. Hence, we can consider their inversions, which in turn estimate the parent distribution F of the model errors. As such they both provide a tool for inference. The behavior of these processes and of their approximations is analyzed and numerically illustrated.

2. Behavior of B_n(α) over α ∈ (0, 1)

Let us first describe one possible form of the averaged regression quantile B_n(α) as a weighted mean of the basic components of the vector Y. Consider again the minimization (1.3), with α ∈ [0, 1] fixed. This was treated in [26] as a special linear programming problem, and later on various modifications of this algorithm were developed. Its dual program is a parametric linear program, which can be written simply as

(2.1)    maximize Y_n⊤ a(α)  under  X*_n⊤ a(α) = (1 − α) X*_n⊤ 1_n,   a(α) ∈ [0, 1]^n,  0 ≤ α ≤ 1,

where

(2.2)    X*_n = (x*_n1,...,x*_nn)⊤  is of order n × (p + 1).

The components of the optimal solution â(α) = (â_n1(α),...,â_nn(α))⊤ of (2.1), called regression rank scores, were studied in [12], where it was shown that â_ni(α) is a continuous, piecewise linear function of α ∈ [0, 1] with â_ni(0) = 1 and â_ni(1) = 0, i = 1,...,n. Moreover, â(α) is invariant in the sense that it does not change if Y is replaced with Y + X*_n b*, ∀ b* ∈ IR^{p+1} (see [12] for details).
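In R, the regression rank scores â_ni(α) can be obtained as the dual solution of the full regression quantile process. The sketch below assumes, as documented for quantreg, that rq() with tau = -1 returns the breakpoints in sol[1, ] and the dual solutions in dsol; treat these component names as an assumption about the package rather than as part of the paper:

```r
## Regression rank scores: dual solution of the linear program (2.1),
## extracted from the whole regression quantile process (tau = -1).
library(quantreg)

set.seed(3)
n <- 50
X <- cbind(runif(n, 0, 4), runif(n, -4, 2))
Y <- 5 - 3 * X[, 1] + 2 * X[, 2] + rnorm(n)

z    <- rq(Y ~ X, tau = -1)   # full process (Barrodale-Roberts algorithm)
taus <- z$sol[1, ]            # breakpoints alpha_1 < ... < alpha_J (assumed layout)
ahat <- z$dsol                # n x J matrix of regression rank scores at the breakpoints

dim(ahat)                     # one row per observation, one column per breakpoint
## each a_{ni}(.) is continuous and piecewise linear, decreasing from 1 at alpha = 0
## to 0 at alpha = 1; plot one trajectory:
plot(taus, ahat[1, ], type = "l", xlab = "alpha", ylab = "a_1(alpha)")
```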

Let {x*_{i_1},...,x*_{i_{p+1}}} be the optimal base in (2.1) and let {Y_{i_1},...,Y_{i_{p+1}}} be the corresponding responses in model (1.2). Then B_n(α) equals a weighted mean of {Y_{i_1},...,Y_{i_{p+1}}}, with the weights based on the regressors. Indeed, we have the following theorem.

Theorem 1. Assume that the regression matrix (2.2) has full rank p + 1 and that the distribution functions F_1,...,F_n of the model errors are continuous and increasing on (−∞, ∞). Then with probability 1

(2.3)    B_n(α) = Σ_{k=1}^{p+1} w_{k,α} Y_{i_k},    Σ_{k=1}^{p+1} w_{k,α} = 1,

and

(2.4)    B_n(α) ≤ B_n(1) < max_{i≤n} Y_i,

where the vector Y_n(1) = (Y_{i_1},...,Y_{i_{p+1}})⊤ corresponds to the optimal base of the linear program (2.1). The vector w_α = (w_{1,α},...,w_{p+1,α})⊤ of coefficients equals

(2.5)    w_α = [ n^{-1} 1_n⊤ X*_n (X*_{n1})^{-1} ]⊤,    while Σ_{k=1}^{p+1} w_{k,α} = 1,

where X*_{n1} is the submatrix of X*_n with the rows x*_{i_1}⊤,...,x*_{i_{p+1}}⊤.

Proof. The regression quantile β̂*_n(α) is a step function of α ∈ (0, 1). If α is a continuity point of the regression quantile trajectory, then we have the following identity, proven in [22]:

(2.6)    B_n(α) = (1/n) Σ_{i=1}^n x*_i⊤ β̂*_n(α) = −(1/n) Σ_{i=1}^n Y_i â′_ni(α),

where â′_ni(α) = (d/dα) â_ni(α). Moreover, (2.1) implies

(2.7)    Σ_{i=1}^n â′_ni(α) = −n,    Σ_{i=1}^n x_ij â′_ni(α) = − Σ_{i=1}^n x_ij,   1 ≤ j ≤ p.

Notice that â′_ni(α) ≠ 0 iff α is a point of continuity of β̂*_n(·) and Y_i = x*_i⊤ β̂*_n(α). To every fixed continuity point α correspond exactly p + 1 such components, such that the corresponding x*_i belongs to the optimal base of program (2.1). Hence there exist coefficients w_{k,α}, k = 1,...,p + 1, such that

B_n(α) = −(1/n) Σ_{i=1}^n Y_i â′_ni(α) = Σ_{k=1}^{p+1} w_{k,α} Y_{i_k}.

The equalities Y_i = x*_i⊤ β̂*_n(α) hold just for the p + 1 components of the optimal base x*_{i_1},...,x*_{i_{p+1}}. Let X*_{n1} be the submatrix of X*_n with the rows x*_{i_1}⊤,...,x*_{i_{p+1}}⊤ and let (â′_1(α))⊤ = (â′_{i_1}(α),...,â′_{i_{p+1}}(α)). Then X*_{n1} is regular with probability 1 and

w_α⊤ = −(1/n)(â′_1(α))⊤ = (1/n) 1_n⊤ X*_n (X*_{n1})^{-1},

since (â′_1(α))⊤ = −1_n⊤ X*_n (X*_{n1})^{-1}; moreover Σ_{k=1}^{p+1} w_{k,α} = 1.

This and (2.6) imply (2.3) and (2.5). The inequality (2.4) was proven in [19]. □


Let us now consider B_n(α) as a process in α ∈ (0, 1). Assume that all model errors e_ni, i = 1,...,n, are independent and identically distributed according to a common continuous increasing distribution function F. We are interested in the averaged regression quantile process

B_n(α) = { n^{1/2} x̄*_n⊤ ( β̂*_n(α) − β(α) );  0 < α < 1 },

where β(α) = (F^{-1}(α) + β_0, β_1,...,β_p)⊤ is the population counterpart of the regression quantile. As proven in [20], the process B_n converges to a Gaussian process in the Skorokhod topology as n → ∞, under mild conditions on F and X_n. More precisely,

(2.8)    B_n →_D (f(F^{-1}))^{-1} W*   as n → ∞,

where W* is the Brownian bridge on (0, 1).

However, we are rather interested in the behavior of the process B_n under a finite number of observations. The trajectories of B_n are step functions, nondecreasing in α ∈ (0, 1), and they have a finite number of discontinuities for each n. As shown in [7], if B_n(α_1) = B_n(α_2) for 0 < α_1 < α_2 < 1, then α_2 − α_1 ≤ (p + 1)/n with probability 1; hence the length of any interval on which B_n(α) is constant tends to 0 as n → ∞ for fixed p. Let 0 < α_1 < ... < α_{J_n} < 1 be the breakpoints of B_n(α), 0 < α < 1, and −∞ < Z_1 < ... < Z_{J_n+1} < ∞ be the corresponding values of B_n(α) between the breakpoints. Then we can consider the inversion F̂_n(z) of B_n(α), namely

F̂_n(z) = inf{ α : B_n(α) ≥ z },   −∞ < z < ∞.

It is a bounded nondecreasing step function and, given Y_1,...,Y_n satisfying (1.2), F̂_n is a distribution function of a random variable attaining the values Z_1,...,Z_{J_n+1} with probabilities equal to the spacings of 0, α_1,...,α_{J_n}, 1. The tightness of the empirical process F̂_n and its convergence to F was studied in [33] under some specific conditions; F̂_n is recommended as an estimate of F, which would enable e.g. goodness-of-fit testing about F in the presence of a nuisance regression.
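A simple way to visualize the inversion F̂_n(z) = inf{α : B_n(α) ≥ z} is to evaluate B_n(α) on a fine grid of α and invert the resulting nondecreasing step function. The R sketch below is a grid approximation (not the exact breakpoint computation) on simulated data:

```r
## Inversion \hat F_n(z) = inf{alpha : B_n(alpha) >= z},
## approximated on a grid of alpha values.
library(quantreg)

set.seed(4)
n <- 100
X <- cbind(runif(n, 0, 4), runif(n, -4, 2))
Y <- 5 - 3 * X[, 1] + 2 * X[, 2] + rnorm(n)

alphas <- seq(0.01, 0.99, by = 0.01)
Bvals  <- sapply(alphas, function(a)
  sum(c(1, colMeans(X)) * coef(rq(Y ~ X, tau = a))))  # B_n(alpha)

Fhat <- function(z) {              # smallest grid alpha with B_n(alpha) >= z
  j <- which(Bvals >= z)
  if (length(j) == 0) 1 else alphas[min(j)]
}

## \hat F_n(z) estimates F(z - beta0 - xbar'beta); here beta0 + xbar'beta is about -3
sapply(c(-4, -3, -2), Fhat)
```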

3. Properties of the averaged two-step regression quantile B̃_n(α)

While the advantage of B_n(α) is in its monotonicity, the inference based on the process B̃_n(α) can be easier to interpret. Hence, we can consider the empirical process B̃_n(α), based on the two-step regression quantiles β̃_n(α), as an alternative to β̂_n(α). Both processes are asymptotically equivalent as n → ∞.

The two-step regression α-quantile treats the slope components β and the intercept β_0 separately. The slope component part is an R-estimate β̃_nR of β. Its advantage is that it is invariant to a shift in location, hence independent of β_0. It starts with the selection of a nondecreasing function ϕ(u), u ∈ (0, 1), square-integrable on (0, 1). Then we can consider two types of rank scores generated by ϕ:

(1) Exact scores:

(3.1)    A_n(i) = IE{ ϕ(U_{n:i}) },   i = 1,...,n,

where U_{n:1} ≤ ... ≤ U_{n:n} is the ordered random sample of size n from the uniform (0, 1) distribution.


(2) Approximate scores:

(3.2)    either (i) A*_n(i) = n ∫_{(i−1)/n}^{i/n} ϕ(u) du,   or (ii) A*_n(i) = ϕ(i/(n + 1)),   i = 1,...,n.
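To make the two definitions of the scores concrete, the following R sketch computes, for one particular choice of ϕ (a Wilcoxon-type ϕ(u) = u − 1/2, used here only as an example), the exact scores (3.1) via the Beta(i, n − i + 1) distribution of U_{n:i}, together with both approximate versions in (3.2):

```r
## Exact scores A_n(i) = E phi(U_{n:i}) and approximate scores (3.2)
## for an example score function phi(u) = u - 1/2.
phi <- function(u) u - 0.5
n   <- 10

## (3.1): U_{n:i} has the Beta(i, n - i + 1) distribution
exact <- sapply(1:n, function(i)
  integrate(function(u) phi(u) * dbeta(u, i, n - i + 1), 0, 1)$value)

## (3.2)(i): n times the integral of phi over ((i - 1)/n, i/n)
approx1 <- sapply(1:n, function(i) n * integrate(phi, (i - 1) / n, i / n)$value)

## (3.2)(ii): phi evaluated at i/(n + 1)
approx2 <- phi((1:n) / (n + 1))

round(cbind(exact, approx1, approx2), 4)   # the three versions nearly coincide
```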

The test criteria and estimates based on either of these scores are asymptotically equivalent as n → ∞, but the rank tests based on the exact scores are locally most powerful against pertinent alternatives for finite n. The R-estimator β̃_nR of the slopes is a minimizer of Jaeckel's [14] measure of rank dispersion D_n(b):

(3.3)    β̃_nR = arg min_{b ∈ IR^p} D_n(b),   where D_n(b) = Σ_{i=1}^n (Y_i − x_i⊤ b) A_n( R_ni(Y_i − x_i⊤ b) ),   b ∈ IR^p,

and where the exact scores A_n(·) can be replaced with the approximate scores A*_n(·). Here R_ni(Y_i − x_i⊤ b) is the rank of the i-th residual, i = 1,...,n. The intercept component β̃_n0(α) pertaining to the two-step regression α-quantile is defined as the [nα]-order statistic of the residuals Y_i − x_i⊤ β̃_nR, i = 1,...,n. The two-step regression α-quantile is then the vector

(3.4)    β̃*_n(α) = ( β̃_n0(α), β̃_nR⊤ )⊤ ∈ IR^{p+1}.

The typical choice of ϕ is the following:

(3.5)    ϕ_λ(u) = λ − I[u < λ],   0 < u < 1,   0 < λ < 1,

combined with the approximate scores (ii) in (3.2). These scores originated in [15], where Hájek used the following scores (now known as Hájek's rank scores):

(3.6)    a_i(λ, b) =  0                   if R_ni(Y_i) < nλ,
                      R_ni(Y_i) − nλ      if nλ ≤ R_ni(Y_i) < nλ + 1,
                      1                   if nλ + 1 ≤ R_ni(Y_i).

The solutions of (3.3) are generally not uniquely determined. We can, e.g., take the center of gravity of the set of all solutions; however, the asymptotic representations and distributions apply to any solution.
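A minimal, self-contained way to obtain an R-estimate of the slopes is to minimize Jaeckel's dispersion (3.3) numerically. The sketch below uses the score function ϕ_λ of (3.5) with the approximate scores (ii) of (3.2) and base R's optim(); it is a simplified stand-in for the Rfit/quantreg routes described in Section 4, run on simulated data:

```r
## R-estimate of the slopes: numerically minimize Jaeckel's dispersion D_n(b), cf. (3.3),
## with scores generated by phi_lambda(u) = lambda - I[u < lambda] of (3.5).
set.seed(5)
n <- 100
X <- cbind(runif(n, 0, 4), runif(n, -4, 2))
Y <- 5 - 3 * X[, 1] + 2 * X[, 2] + rnorm(n)

lambda <- 0.5
a_n    <- function(i) lambda - ((i / (n + 1)) < lambda)   # approximate scores (3.2)(ii)

Dn <- function(b) {                 # Jaeckel's rank dispersion of the residuals
  res <- Y - X %*% b
  sum(res * a_n(rank(res)))
}

betaR <- optim(c(0, 0), Dn)$par     # Nelder-Mead minimization (default)
betaR                               # R-estimate of (beta1, beta2); the intercept is
                                    # estimated separately, as in (3.4)
```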

Define the averaged two-step regression α-quantile B̃_n(α) as

(3.7)    B̃_n(α) = x̄*_n⊤ β̃*_n(α).

By (3.3),

(3.8)    B̃_n(α) = ( Y_i − (x_i − x̄_n)⊤ β̃_nR )_{n:[nα]},

hence it equals the [nα]-th order statistic of the residuals Y_i − (x_i − x̄_n)⊤ β̃_nR, i = 1,...,n. Then B̃_n(α) is obviously scale equivariant and regression equivariant. [21] originally considered the two-step regression α-quantile with λ = α in (3.5), for each α ∈ (0, 1), in the R-estimator of the slopes. The averaged two-step version corresponding to this choice is very close to B_n(α), but for finite n it is generally not monotone in α. However, it suffices to consider λ ∈ (0, 1) fixed, independent of α; this makes B̃_n(α) monotone in α, thus invertible and simpler. B̃_n(α) is asymptotically equivalent to B_n(α) under general conditions, hence also asymptotically equivalent to e_{n:[nα]} + β_0 + x̄_n⊤β. Hence B̃_n(α) is a convenient tool for inference under a nuisance regression.


The asymptotic equivalence of B̃_n(α) and B_n(α) will be proven under the following mild conditions on F and on X_n = [x_n1,...,x_nn]⊤:

(A1): Smoothness of F: The errors e_ni, i = 1,...,n, are independent and identically distributed. Their distribution function F has an absolutely continuous density f and positive and finite Fisher information:

0 < I(f) = ∫ ( f′(z)/f(z) )² dF(z) < ∞.

(A2): Noether's condition on the regressors:

lim_{n→∞} Q_n = Q,   where Q_n = n^{-1} Σ_{i=1}^n (x_i − x̄_n)(x_i − x̄_n)⊤

and Q is a positive definite p × p matrix; moreover,

lim_{n→∞} max_{1≤i≤n} n^{-1} (x_i − x̄_n)⊤ Q_n^{-1} (x_i − x̄_n) = 0.

(A3): Rate of regressors:

max_{1≤i≤n} ‖x_ni − x̄_n‖ = o(n^{1/4})   as n → ∞.

We shall prove the asymptotic equivalence for R-estimators based on the score function ϕ_λ, 0 < λ < 1, because of its simplicity. However, an analogous proof applies to an R-estimator generated by any nondecreasing and square-integrable function ϕ.

Theorem 2. Let B̃_n(α) = ( Y_i − (x_i − x̄_n)⊤ β̃_nR )_{n:[nα]} be the two-step averaged regression α-quantile (TARQ) in the model (1.2), with the R-estimator β̃_nR generated by ϕ_λ in (3.5), λ ∈ (0, 1) fixed. Then, under the conditions (A1)–(A3),

(3.9)     (i)  n^{1/2} [ ( B̃_n(α) − β_0 − x̄_n⊤β ) − e_{n:[nα]} ] = o_p(1),

(3.10)    (ii) n^{1/2} | B̃_n(α) − B_n(α) | = o_p(1)

as n → ∞, uniformly over α ∈ (ε, 1 − ε) ⊂ (0, 1), ∀ ε ∈ (0, 1/2).

Proof. Let us write

(3.11)    Y_i − (x_i − x̄_n)⊤ β̃_nR = e_ni + β_0 + x̄_n⊤β − (x_i − x̄_n)⊤( β̃_nR − β ),   i = 1,...,n.

We shall study the [nα]-quantile of the variables

r_ni = e_ni − (x_i − x̄_n)⊤( β̃_nR − β ) = Y_i − (x_i − x̄_n)⊤ β̃_nR − β_0 − x̄_n⊤β,   i = 1,...,n.

Recall the Bahadur representation of the sample α-quantile e_{n:[nα]} of e_n1,...,e_nn:

(3.12)    n^{1/2} [ e_{n:[nα]} − F^{-1}(α) ] = n^{-1/2} [f(F^{-1}(α))]^{-1} Σ_{i=1}^n { α − I[e_ni < F^{-1}(α)] } + o(1)

a.s. as n → ∞.


Under conditions (A1)–(A3), the R-estimator β̃_nR admits the following asymptotic representation:

(3.13)    n^{1/2}( β̃_nR − β ) = n^{-1/2} [f(F^{-1}(λ))]^{-1} Q_n^{-1} Σ_{i=1}^n (x_i − x̄_n) ( λ − I[e_ni < F^{-1}(λ)] ) + o_p(n^{-1/4}),

hence ‖ n^{1/2}( β̃_nR − β ) ‖ = O_p(1). The details for (3.12) and (3.13) can be found in [23].

The [nα]-quantile a_n(α) of r_n1,...,r_nn is a solution of the minimization

a_n(α) = arg min_{a ∈ IR^1} Σ_{i=1}^n ρ_α(r_ni − a),

where ρ_α(z) = |z| { α I[z > 0] + (1 − α) I[z < 0] }, z ∈ IR^1. Denote by ψ_α the right-hand derivative of ρ_α, i.e. ψ_α(z) = α − I[z < 0], z ∈ IR. Using Lemma A.2 in [39], we can show that

(3.14)    n^{-1/2} Σ_{i=1}^n ψ_α( r_ni − a_n(α) ) → 0,   i.e.   n^{-1/2} Σ_{i=1}^n ( α − I[ e_ni − (x_i − x̄_n)⊤( β̃_nR − β ) < a_n(α) ] ) → 0

almost surely as n → ∞. Notice that Σ_{i=1}^n (x_i − x̄_n) = 0; hence we conclude from Lemma 5.5 in [23] that

(3.15)    sup_{‖b‖ ≤ C} { n^{-1/2} | Σ_{i=1}^n ( I[ e_ni − n^{-1/2}(x_i − x̄_n)⊤ b < a_n(α) ] − I[ e_ni < a_n(α) ] ) | } = o_p(1)   as n → ∞,

for every C, 0 < C < ∞. Inserting b ↦ n^{1/2}( β̃_nR − β ) = O_p(1) into (3.15), we obtain

(3.16)    n^{-1/2} | Σ_{i=1}^n ( I[ e_ni − (x_i − x̄_n)⊤( β̃_nR − β ) < a_n(α) ] − I[ e_ni < a_n(α) ] ) | = o_p(1)   as n → ∞.

Combining (3.12) and (3.14)–(3.16), we conclude that n^{1/2}( a_n(α) − e_{n:[nα]} ) = o_p(1) as n → ∞, hence

(3.17)    n^{1/2} [ B̃_n(α) − β_0 − x̄_n⊤β − e_{n:[nα]} ] = o_p(1)   as n → ∞,

which gives (3.9). This together with Theorem 2 in [22] implies (3.10). □

Remark 1. β̃_nR can be replaced by any √n-consistent R-estimator of β. However, a score function of type ϕ_λ is more convenient for computation and hence for applications. Various choices of R-estimators are numerically compared in Section 4.


4. Computation and numerical illustrations

The simulation study describes the methods for computation of the proposed estimates and illustrates their properties. For the computation of the averaged regression quantile B_n(α), the R package quantreg and its function rq() is used; it makes use of a variant of the simplex algorithm.

Concerning the two-step averaged regression quantile B̃_n(α), the most difficult part is the first step, the computation of the R-estimator β̃_nR of the slopes. In the numerical illustration below, the score function ϕ_λ(u) = λ − I[u < λ] from (3.5) and the approximate scores (i) from (3.2) are applied. In this case the function rfit() from the R package Rfit could be directly used. For the rfit() function, the score function corresponding to (3.5) and (i) of (3.2) has to be defined, i.e. attaining at the point ⌈λn⌉/(n + 1) the value ⌈λn⌉ − 1 − λ(n − 1), which is the corresponding approximate score A*_n(i). The function rfit() uses the minimization routine optim(), which is a quasi-Newton optimizer. This method works well for the simple linear regression model but is less precise in the case of multiple regression. So it is better to use the fact that, when employing the score function (3.5) with λ = α and the approximate scores (i) from (3.2), the slope components of the regression α-quantile and of the two-step regression α-quantile coincide, β̂_n(α) = β̃_nR for every fixed α ∈ (0, 1); see [20]. The rq() function from the quantreg package is then used to find the exact solution β̃_nR.

The averaged regression quantile B_n(α) and the two-step averaged regression quantile B̃_n(α) (and their inversions) can be used as estimates of the quantile function (and of the distribution function, respectively) of the model errors. The behavior of the proposed estimates is illustrated in the following simulation study.

The regression model

(4.1)    Y_ni = β_0 + x_ni⊤ β + e_ni,   i = 1,...,n,

is simulated with the following parameters:

• sample size n = 25,
• β_0 = 5,
• β = (β_1, β_2) = (−3, 2).

The columns of the regression matrix, (x_11,...,x_n1)⊤ and (x_12,...,x_n2)⊤, are generated as two independent samples from the uniform distributions U(0, 4) and U(−4, 2), respectively, and are standardized so that Σ_{i=1}^n x_ij = 0, j = 1, 2. The errors e_ni are generated from the standard normal, the standard Cauchy, or the generalized extreme value (GEV) distribution with shape parameter k = −0.5. For each case 10 000 replications of the model were simulated, and B_n(α) and B̃_n(α) and their inversions were computed. For the two-step version B̃_n(α), the score-generating function (3.5) with fixed λ = 0.5 or 0.9 was used. For comparison, the empirical quantile function of the errors e_ni and its inversion were calculated as well. Empirical quantile estimates based on B_n and on B̃_n were then calculated and plotted. Since the figures showing the estimates of the true quantile functions and of the true distribution functions look very similar, up to the inversion, only the figures for the distribution functions are presented. The statistical software R was used for all calculations.
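The simulation design just described can be reproduced along the following lines; this condensed R sketch runs a single replication with normal errors only (the full study uses 10 000 replications and also Cauchy and GEV errors), and it is not the authors' code:

```r
## One replication of model (4.1): n = 25, beta0 = 5, beta = (-3, 2),
## centered uniform covariates, standard normal errors.
library(quantreg)

set.seed(7)
n     <- 25
beta0 <- 5
beta  <- c(-3, 2)

X <- scale(cbind(runif(n, 0, 4), runif(n, -4, 2)),
           center = TRUE, scale = FALSE)       # columns standardized: sum_i x_ij = 0
e <- rnorm(n)                                  # alternatively rcauchy(n) or a GEV sample
Y <- drop(beta0 + X %*% beta + e)

alphas <- (1:(n - 1)) / n
Bvals  <- sapply(alphas, function(a)
  sum(c(1, colMeans(X)) * coef(rq(Y ~ X, tau = a))))   # B_n(alpha)

## Inverting B_n on the grid estimates the error distribution function
## (shifted by beta0, since the covariates are centered); compare with pnorm:
plot(Bvals, alphas, type = "s", xlab = "z", ylab = "alpha")
curve(pnorm(x - beta0), add = TRUE, lty = 2)
```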

Figures 1-3 show the empirical quantile estimates of the normal, Cauchy, and GEV distribution functions. The approximation of the distribution functions appears to be very good.


[Figure 1: four panels showing estimates of the distribution function based on B̃ (λ = 0.5), B̃ (λ = 0.9), B, and the empirical quantile function (EQF) of the errors; each panel displays the 0.5, 0.1/0.9, 0.01/0.99 and 0.001/0.999 empirical quantile curves together with the true error distribution function.]

Figure 1. Empirical quantile estimates of the normal distribution function based on B, B̃(0.5), B̃(0.9) and the empirical quantile function (EQF) of the errors.

[Figure 2: the same four panels for Cauchy errors.]

Figure 2. Empirical quantile estimates of the Cauchy distribution function based on B, B̃(0.5), B̃(0.9) and the empirical quantile function (EQF) of the errors.

[Figure 3: the same four panels for GEV (k = −0.5) errors.]

Figure 3. Empirical quantile estimates of the GEV (k = −0.5) distribution function based on B, B̃(0.5), B̃(0.9) and the empirical quantile function (EQF) of the errors.

We notice that in the case of the two-step regression quantile B̃_n(α) with λ fixed, the quality of the estimate is sensitive to the choice of λ, especially for skewed or heavy-tailed distributions. The choice around λ = 0.5 is generally recommended.

5. Conclusion

The averaged regression quantile B_n(α) and its two-step modification B̃_n(α), 0 < α < 1, appear to be very convenient tools for analyzing various functionals of the risk in situations where the risk depends on some exogenous variables, with an intensity that is not fully under our control. The choice of α ∈ (0, 1) provides a balance between the concerns about underestimating and overestimating the losses. Increasing α ↗ 1 reflects a greater concern about underestimating the loss, compared to overestimating it. Both B_n(α) and B̃_n(α) can be advantageously used in estimating the expected shortfall (1.1) and other modern measures of risk, as well as in estimating and testing other characteristics of the market or of everyday practice, based on functionals of the quantiles.


References

[1] Acerbi, C. and Tasche, D.: Expected shortfall: A natural coherent alternative to value at risk, Economic Notes, 31, (2002), 379–388.
[2] Angrist, J., Chernozhukov, V. and Fernandez-Val, I.: Quantile regression under misspecification, with an application to the U.S. wage structure, Econometrica, 74, (2006), 539–563.
[3] Arias, O., Hallock, K. F. and Sosa-Escudero, W.: Individual heterogeneity in the returns to schooling: Instrumental variables quantile regression using twins data, Empirical Economics, 26, (2001), 7–40.
[4] Bassett, G. W. Jr.: A property of the observations fit by the extreme regression quantiles, Comp. Stat. Data Anal., 6, (1988), 353–359.
[5] Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D.: Thinking coherently, RISK, 10/11, (1997).
[6] Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D.: Coherent measures of risk, Math. Fin., 9, (1999), 203–228.
[7] Bassett, G. W. and Koenker, R. W.: An empirical quantile function for linear models with iid errors, J. Amer. Statist. Assoc., 77, (1982), 405–415.
[8] Braione, M. and Scholtes, N. K.: Forecasting value-at-risk under different distributional assumptions, Econometrics, 4, (2016), 2–27.


[9] Cai, Z. and Wang, X.: Nonparametric estimation of conditional VaR and expected shortfall, Journal of Econometrics, 147, (2008), 120–130.
[10] Chan, J. S. K.: Predicting loss reserves using quantile regression, Journal of Data Science, 13, (2015), 127–156.
[11] Carroll, R. J., Maca, J. D. and Ruppert, D.: Nonparametric regression in the presence of measurement error, Biometrika, 86, (1999), 541–554.
[12] Gutenbrunner, C. and Jurečková, J.: Regression rank scores and regression quantiles, Ann. Statist., 20, (1992), 305–330.
[13] Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S.: Tests of linear hypotheses based on regression rank scores, Nonpar. Statist., 2, (1993), 307–331.
[14] Jaeckel, L. A.: Estimating regression coefficients by minimizing the dispersion of the residuals, Ann. Math. Statist., 43, (1972), 1449–1459.
[15] Hájek, J.: Extension of the Kolmogorov-Smirnov test to regression alternatives, In: Proceedings of the International Research Seminar, LeCam, L. (Ed.), Univ. of California Press, Berkeley, (1965), 45–60.
[16] Jurečková, J.: Asymptotic relation of M-estimates and R-estimates in the linear regression model, Ann. Statist., 5, (1977), 464–472.
[17] Jurečková, J.: Remark on extreme regression quantile, Sankhyā, 69, (2007), 87–100.
[18] Jurečková, J.: Remark on extreme regression quantile II, In: Proceedings of the 56th Session, Section CPM026, (2007).


[19] Jurečková, J.: Finite sample behavior of averaged extreme regression quantile, Extremes, 19, (2016), 41–49.
[20] Jurečková, J.: Regression quantile and averaged regression quantile processes, In: Analytical Methods in Statistics, Antoch, J. et al. (Eds.), Springer Proceedings in Mathematics and Statistics, (2017), 53–62.
[21] Jurečková, J. and Picek, J.: Two-step regression quantiles, Sankhyā, 67, (2005), 227–252.
[22] Jurečková, J. and Picek, J.: Averaged regression quantiles, In: Contemporary Developments in Statistical Theory, Lahiri, S. et al. (Eds.), Springer Proceedings in Mathematics and Statistics, 68, (2014), 203–216.
[23] Jurečková, J., Sen, P. K. and Picek, J.: Methodology in Robust and Nonparametric Statistics, (2013), CRC Press.
[24] Knight, K.: Limiting distributions for linear programming estimators, Extremes, 4, (2001), 87–103.
[25] Koenker, R.: Quantile Regression, (2005), Cambridge University Press, Cambridge, UK.
[26] Koenker, R. and Bassett, G.: Regression quantiles, Econometrica, 46, (1978), 33–50.
[27] Koenker, R. and Bassett, G.: An empirical quantile function for linear models with iid errors, J. of the American Statist. Assoc., 77, (1982), 407–415.
[28] Koul, H. L.: Weighted Empirical Processes in Dynamic Nonlinear Models, Lecture Notes in Statistics, 166, (2000), Springer-Verlag.
[29] Koul, H. L. and Saleh, A. K. Md. E.: Autoregression quantiles and related rank-scores processes, Ann. Statist., 23, (1995), 670–689.
[30] Molak, V. (Ed.): Fundamentals of Risk Analysis and Risk Management, (1997), CRC Press.
[31] Pflug, G.: Some remarks on the value-at-risk and the conditional value-at-risk, In: Probabilistic Constrained Optimization: Methodology and Applications, Uryasev, S. (Ed.), (2000), Kluwer Academic Publishers.
[32] Pollard, D.: Asymptotics for least absolute deviation regression estimators, Econometric Theory, 7, (1991), 186–199.
[33] Portnoy, S.: Tightness of the sequence of empiric c.d.f. processes defined from regression fractiles, In: Robust and Nonlinear Time Series Analysis, Franke, J. et al. (Eds.), Lecture Notes in Statistics, 26, (1984), 231–245.
[34] Portnoy, S.: Asymptotic behavior of the number of regression quantile breakpoints, SIAM J. Sci. Statistical Computing, 12, (1991), 867–883.
[35] Portnoy, S. and Jurečková, J.: On extreme regression quantiles, Extremes, 2, (1999), 227–243.
[36] Rockafellar, R. T. and Uryasev, S.: Conditional Value-at-Risk for general loss distributions, Research report 2001-5, ISE Depart., University of Florida, (2001).
[37] Rockafellar, R. T. and Uryasev, S.: The fundamental risk quadrangle in risk management, optimization and statistical estimation, Surveys in Operations Research and Management Science, 18, (2013), 33–53.
[38] Rockafellar, R. T., Royset, J. O. and Miranda, S. I.: Superquantile regression with applications to buffered reliability, uncertainty quantification, and conditional value-at-risk, European Journal of Operational Research, 234, (2014), 140–154.
[39] Ruppert, D. and Carroll, R. J.: Trimmed least squares estimation in the linear model, J. Amer. Statist. Assoc., 75, (1980), 828–838.
[40] Smith, R.: Nonregular regression, Biometrika, 81, (1994), 173–183.
[41] Tasche, D.: Conditional expectation as quantile derivative, Working paper, TU München, (2000).
[42] Trindade, A. A., Uryasev, S., Shapiro, A. and Zrazhevsky, G.: Financial prediction with constrained tail risk, Journal of Banking & Finance, 31, (2007), 3524–3538.
[43] Uryasev, S.: Conditional Value-at-Risk: Optimization algorithms and applications, Financial Engineering News, 2/3, (2000).


(J. Jurečková) Department of Probability and Statistics, Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic
E-mail address: [email protected]
URL: http://www.karlin.mff.cuni.cz/~jurecko

(M. Schindler) Department of Applied Mathematics, Technical University Liberec, Czech Republic
E-mail address: [email protected]

(J. Picek) Department of Applied Mathematics, Technical University Liberec, Czech Republic
E-mail address: [email protected]