
Parametric Families of Distributions and Their Interaction with the Workshop Title

Chris Jones

The Open University, U.K.

How the talk will pan out …

• it will start as a talk in distribution theory
  – concentrating on generating one family of distributions
• then it will continue as a talk in distribution theory
  – concentrating on generating a different family of distributions
• but in this second part, the talk will metamorphose through links with kernels and quantiles …
• … and finally get on to a more serious application to smooth (nonparametric) quantile regression (QR)
• the parts of the talk involving QR are joint with Keming Yu

Set-up

Starting point: simple symmetric g

How might we introduce (at most two) shape parameters a and b which will account for skewness and/or “kurtosis”/tailweight (while retaining unimodality)?

Modelling data with such families of distributions will, inter alia, afford robust estimation of location (and maybe scale).


FAMILY 1

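Actual density of order statistic ($i, n$ integer):

$$ f_i(x) = \frac{1}{B(i,\,n-i+1)}\, g(x)\, G(x)^{i-1}\,\{1-G(x)\}^{n-i} $$

Generalised density of order statistic ($a, b > 0$ real):

$$ f(x) = \frac{1}{B(a,b)}\, g(x)\, G(x)^{a-1}\,\{1-G(x)\}^{b-1}, \qquad \text{i.e. } X = G^{-1}(B),\ B \sim \mathrm{Beta}(a,b) $$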
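The representation $X = G^{-1}(B)$ with $B \sim \mathrm{Beta}(a,b)$ translates directly into a simulation recipe. Below is a minimal Python sketch using only the standard library; the logistic choice of $G$ and the function names are illustrative assumptions, not part of the talk.

```python
import math
import random
from typing import Callable, List


def logistic_quantile(u: float) -> float:
    """G^{-1}(u) for the standard logistic G(x) = 1 / (1 + exp(-x))."""
    return math.log(u / (1.0 - u))


def sample_family1(a: float, b: float, quantile: Callable[[float], float],
                   n: int, seed: int = 0) -> List[float]:
    """Draw n values of X = G^{-1}(B), where B ~ Beta(a, b)."""
    rng = random.Random(seed)
    draws: List[float] = []
    while len(draws) < n:
        u = rng.betavariate(a, b)
        if 0.0 < u < 1.0:  # skip the (rare) boundary values where G^{-1} is infinite
            draws.append(quantile(u))
    return draws


# With logistic G, a = 4 and b = 2, X has mean digamma(4) - digamma(2) = 5/6.
xs = sample_family1(4.0, 2.0, logistic_quantile, 20000, seed=1)
print(sum(xs) / len(xs))  # should be close to 0.833
```

Setting integer $a = i$ and $b = n - i + 1$ reproduces the distribution of the $i$-th order statistic of $n$ draws from $G$.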

Roles of a and b

• a=b=1: f = g

• a=b: family of symmetric distributions

• a≠b: skew distributions

• a controls left-hand tail weight, b controls right

• the smaller a or b, the heavier the corresponding tail

Properties of (Generalised) Order Statistic Distributions

• Distribution function: $F(x) = I_{G(x)}(a,b)$, the incomplete beta function ratio
• Tail behaviour. For large $x > 0$:
  – power tails: $g(x) \sim x^{-(\nu+1)} \Rightarrow f(x) \sim x^{-(b\nu+1)}$
  – exponential tails: $g(x) \sim e^{-x} \Rightarrow f(x) \sim e^{-bx}$
• Limiting distributions:
  – $a$ and $b$ large: normal distribution
  – one of $a$ or $b$ large: appropriate extreme value distribution

Other properties such as moments and modality need to be examined on a case-by-case basis.

For more, see Jones (2004, Test)
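For integer $a$ and $b$ (the genuine order statistic case $a = i$, $b = n - i + 1$), the incomplete beta ratio reduces to a binomial tail, $I_u(a,b) = P\{\mathrm{Bin}(a+b-1, u) \ge a\}$: the chance that at least $a$ of $a+b-1$ draws from $G$ fall at or below $x$. A minimal standard-library sketch using that identity, with a logistic $G$ chosen purely for illustration; non-integer $a, b$ would need a general incomplete beta routine, which is not attempted here.

```python
import math


def logistic_cdf(x: float) -> float:
    """Standard logistic distribution function, computed without overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def order_statistic_cdf(x: float, a: int, b: int, cdf=logistic_cdf) -> float:
    """F(x) = I_{G(x)}(a, b) for positive integers a and b.

    Uses I_u(a, b) = P{Bin(a + b - 1, u) >= a}: the chance that at least a of
    a + b - 1 independent draws from G fall at or below x.
    """
    if not (isinstance(a, int) and isinstance(b, int) and a >= 1 and b >= 1):
        raise ValueError("this shortcut needs positive integer a and b")
    m = a + b - 1
    u = cdf(x)
    return sum(math.comb(m, j) * u ** j * (1.0 - u) ** (m - j) for j in range(a, m + 1))


print(order_statistic_cdf(0.0, 2, 3))  # → 0.6875
```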

Tractable Example 1

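$$ f(x) = \frac{1}{B(a,b)\,2^{a+b-1}\,(a+b)^{1/2}} \left\{1 + \frac{x}{(a+b+x^2)^{1/2}}\right\}^{a+1/2} \left\{1 - \frac{x}{(a+b+x^2)^{1/2}}\right\}^{b+1/2} $$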

Jones & Faddy’s (2003, JRSSB) skew t density

When a=b, Student t density on 2a d.f.
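The $a = b$ claim is easy to check numerically. A sketch in Python (standard library only) coding the Jones & Faddy density on the log scale, $f(x) = \{B(a,b)\,2^{a+b-1}(a+b)^{1/2}\}^{-1}(1+r)^{a+1/2}(1-r)^{b+1/2}$ with $r = x/(a+b+x^2)^{1/2}$, and comparing it with the usual Student $t$ density; the function names are illustrative.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def skew_t_pdf(x: float, a: float, b: float) -> float:
    """Jones & Faddy skew t density with shape parameters a, b > 0."""
    r = x / math.sqrt(a + b + x * x)  # always lies in (-1, 1)
    log_norm = log_beta(a, b) + (a + b - 1.0) * math.log(2.0) + 0.5 * math.log(a + b)
    return math.exp((a + 0.5) * math.log1p(r) + (b + 0.5) * math.log1p(-r) - log_norm)


def student_t_pdf(x: float, nu: float) -> float:
    """Student t density on nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


for x in (-3.0, 0.0, 1.7):
    print(x, skew_t_pdf(x, 2.5, 2.5), student_t_pdf(x, 5.0))  # last two columns agree
```

The agreement at $a = b$ is Legendre's duplication formula for the gamma function in disguise.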

Some skew t densities

[Figure: skew t densities $t(2,2) = t_4$, $t(4,2)$, $t(8,2)$ and $t(128,2)$; then, with $a$ and $b$ swopped, $t(2,2) = t_4$, $t(2,4)$, $t(2,8)$ and $t(2,128)$.]

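f = skew t density arises from g = ???

Yes, the t distribution on 2 d.f.!

$$ g(x) = \frac{1}{(2+x^2)^{3/2}}, \qquad G(x) = \frac{1}{2}\left\{1 + \frac{x}{(2+x^2)^{1/2}}\right\}, \qquad Q(u) = G^{-1}(u) = \frac{2u-1}{\{2u(1-u)\}^{1/2}} $$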
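A sketch of the connection. Take $B \sim \mathrm{Beta}(a,b)$ and the $t_2$ quantile function $Q(u) = (2u-1)/\{2u(1-u)\}^{1/2}$. Then the variable $\{(a+b)/2\}^{1/2}\,Q(B)$ has exactly the Jones & Faddy skew $t$ density, as the substitution $x/(a+b+x^2)^{1/2} = 2B - 1$ shows. So the skew $t$ is the $t_2$-generated family up to that scale factor. The Python below (standard library only, illustrative names) checks that $Q$ inverts $G$ and uses it to simulate skew $t$ values.

```python
import math
import random
from typing import List


def g_t2(x: float) -> float:
    """Density of the t distribution on 2 d.f."""
    return (2.0 + x * x) ** -1.5


def G_t2(x: float) -> float:
    """Distribution function of t_2."""
    return 0.5 * (1.0 + x / math.sqrt(2.0 + x * x))


def Q_t2(u: float) -> float:
    """Quantile function of t_2, the inverse of G_t2 on (0, 1)."""
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))


def sample_skew_t(a: float, b: float, n: int, seed: int = 0) -> List[float]:
    """Skew t draws as sqrt((a + b) / 2) * Q_t2(B), with B ~ Beta(a, b)."""
    rng = random.Random(seed)
    scale = math.sqrt((a + b) / 2.0)
    out: List[float] = []
    while len(out) < n:
        u = rng.betavariate(a, b)
        if 0.0 < u < 1.0:  # Q_t2 is infinite at the endpoints
            out.append(scale * Q_t2(u))
    return out


print(G_t2(Q_t2(0.9)))  # recovers 0.9 up to rounding
```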

Tractable Example 2

Q: The (order statistics of the) logistic distribution generate the ???

A: the log F distribution – this has exponential tails

These examples, seen before, are therefore log F distributions

The log F distribution

$$ f(x) = \frac{1}{B(a,b)}\,\frac{e^{ax}}{(1+e^{x})^{a+b}} $$

with $f(x) \sim e^{ax}$ as $x \to -\infty$ and $f(x) \sim e^{-bx}$ as $x \to \infty$.

The simple exponential tail property is shared by:

• the log F distribution

• the asymmetric Laplace distribution

• the hyperbolic distribution

Is there a general form for such distributions?

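Asymmetric Laplace:

$$ f(x) = \frac{ab}{a+b}\,\exp\{a x\, I(x<0) - b x\, I(x \ge 0)\} $$

Hyperbolic:

$$ f(x) \propto \exp\left\{\frac{a-b}{2}\,x - \frac{a+b}{2}\,(1+x^2)^{1/2}\right\} $$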

FAMILY 2: distributions with simple exponential tails

Starting point: simple symmetric g with distribution function G and

$$ G^{[2]}(x) = \int_{-\infty}^{x} G(t)\,dt. $$

General form for density is:

$$ f(x) \propto \exp\{a x - (a+b)\,G^{[2]}(x)\}. $$

Special Cases

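• G is a point mass at zero, $G^{[2]}(x) = x\,I(x>0)$ ⇒ f is asymmetric Laplace
• G is logistic, $G^{[2]}(x) = \log(1+e^{x})$ ⇒ f is log F
• G is (scaled) $t_2$, $G^{[2]}(x) = \tfrac{1}{2}\{x + (1+x^2)^{1/2}\}$ ⇒ f is hyperbolic
• G is normal, $G^{[2]}(x) = x\Phi(x) + \phi(x)$
• G is uniform on (−1, 1), $G^{[2]}(x) = \tfrac{1}{4}(1+x)^2\,I(-1<x<1) + x\,I(x \ge 1)$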
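To see the general form at work, here is a Python sketch (standard library only). It evaluates $\exp\{ax - (a+b)G^{[2]}(x)\}$ for four of the $G^{[2]}$ functions listed, taking the scaled $t_2$ to have $G^{[2]}(x) = \tfrac12\{x + (1+x^2)^{1/2}\}$, and normalises numerically. Assumptions: the integral is truncated to $[-60, 60]$ and approximated by the trapezoidal rule, which is ample for moderate $a$ and $b$ because of the exponential tails. The names are illustrative. For the asymmetric Laplace and log F cases the result can be compared with the known constants $(a+b)/(ab)$ and $B(a,b)$.

```python
import math


def G2_point_mass(x: float) -> float:   # G = point mass at 0: x I(x > 0)
    return max(x, 0.0)


def G2_logistic(x: float) -> float:     # log(1 + e^x), computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def G2_scaled_t2(x: float) -> float:    # (x + sqrt(1 + x^2)) / 2
    return 0.5 * (x + math.sqrt(1.0 + x * x))


def G2_normal(x: float) -> float:       # x Phi(x) + phi(x)
    Phi = 0.5 * math.erfc(-x / math.sqrt(2.0))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return x * Phi + phi


def unnormalised(x: float, a: float, b: float, G2) -> float:
    """exp{a x - (a + b) G^[2](x)}."""
    return math.exp(a * x - (a + b) * G2(x))


def normaliser(a: float, b: float, G2, lo: float = -60.0, hi: float = 60.0,
               m: int = 60000) -> float:
    """Trapezoidal approximation of the integral of `unnormalised` over [lo, hi]."""
    h = (hi - lo) / m
    inner = sum(unnormalised(lo + k * h, a, b, G2) for k in range(1, m))
    ends = 0.5 * (unnormalised(lo, a, b, G2) + unnormalised(hi, a, b, G2))
    return h * (inner + ends)


a, b = 2.0, 3.0
for name, G2 in [("asymmetric Laplace", G2_point_mass), ("log F", G2_logistic),
                 ("hyperbolic", G2_scaled_t2), ("normal-based", G2_normal)]:
    print(name, normaliser(a, b, G2))
```

Dividing `unnormalised` by `normaliser` gives the density $f$ for any of the four cases.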

[Figure: densities from Family 2. Solid line: log F; dashed line: hyperbolic; dotted line: normal-based.]

Practical Point 1

• the asymmetric Laplace is a three-parameter distribution; other members of the family have four;

• fourth parameter is redundant in practice: (asymptotic) correlations between ML estimates of σ and either of a or b are very near 1;

• reason: σ, a and b are all scale parameters, yet you only need two such parameters to describe main scale-related aspects of distribution [either (i) a left-scale and a right-scale or (ii) an overall scale and a left-right comparer]

Practical Point 2

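Parametrise by location μ, scale σ, a = 1 − p and b = p, so that $f(x) \propto \sigma^{-1}\exp[(1-p)(x-\mu)/\sigma - G^{[2]}\{(x-\mu)/\sigma\}]$. Then the score equation for μ reads:

$$ \frac{1}{n}\sum_{i=1}^{n} G\!\left(\frac{\mu - X_i}{\sigma}\right) = p. $$

This is kernel quantile estimation, with kernel G and bandwidth σ.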
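A minimal sketch of this estimator for fixed $\sigma$, assuming a logistic $G$ (any continuous, strictly increasing symmetric $G$ would do). The equation is $n^{-1}\sum_i G\{(\mu - X_i)/\sigma\} = p$, and its left-hand side increases from 0 to 1 in $\mu$, so bisection finds the unique root. As $\sigma \to 0$ the estimate approaches an ordinary sample $p$-quantile. Names and data are illustrative.

```python
import math
from typing import Callable, Sequence


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function, computed without overflow."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kernel_quantile(xs: Sequence[float], p: float, sigma: float,
                    G: Callable[[float], float] = logistic_cdf, iters: int = 200) -> float:
    """Solve (1/n) sum_i G((mu - X_i) / sigma) = p for mu by bisection."""
    n = len(xs)

    def excess(mu: float) -> float:
        return sum(G((mu - x) / sigma) for x in xs) / n - p

    # 50 bandwidths beyond the data, the smoothed distribution function is ~0 or ~1
    lo, hi = min(xs) - 50.0 * sigma, max(xs) + 50.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [2.1, -0.3, 0.8, 1.5, -1.2, 0.4, 3.3, 0.0, 1.1, -0.7]
print(kernel_quantile(data, 0.5, 0.3), kernel_quantile(data, 0.9, 0.3))
```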

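The approach also includes bandwidth selection, by choosing σ to solve the second score equation:

$$ \sigma = \frac{1}{n}\sum_{i=1}^{n} (\mu - X_i)\left\{ G\!\left(\frac{\mu - X_i}{\sigma}\right) - p \right\}. $$

But its simulation performance is variable.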
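A sketch of fitting both parameters by maximum likelihood, assuming a logistic $G$. It alternates between the $\mu$ equation $n^{-1}\sum_i G\{(\mu - X_i)/\sigma\} = p$ and the $\sigma$ equation $\sigma = n^{-1}\sum_i (\mu - X_i)[G\{(\mu - X_i)/\sigma\} - p]$, each solved by bisection. The $\sigma$ equation has a unique root because its right-hand side is non-increasing in $\sigma$. Alternating exact one-parameter solutions is coordinate ascent on the likelihood. That is our choice of algorithm, not the talk's, and convergence is not guaranteed in general.

```python
import math
from typing import Callable, Sequence, Tuple


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function, computed without overflow."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bisect_increasing(f: Callable[[float], float], lo: float, hi: float,
                      iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_mu_sigma(xs: Sequence[float], p: float,
                 G: Callable[[float], float] = logistic_cdf,
                 sweeps: int = 500, tol: float = 1e-12) -> Tuple[float, float]:
    """Alternate the mu and sigma score equations (coordinate-wise maximum likelihood)."""
    n = len(xs)
    mu = sorted(xs)[n // 2]                      # arbitrary starting values
    sigma = (max(xs) - min(xs)) / 4.0 or 1.0
    for _ in range(sweeps):
        mu_old, sigma_old = mu, sigma
        # mu-step: (1/n) sum_i G((mu - X_i) / sigma) = p
        mu = bisect_increasing(
            lambda m: sum(G((m - x) / sigma) for x in xs) / n - p,
            min(xs) - 50.0 * sigma, max(xs) + 50.0 * sigma)
        # sigma-step: s - (1/n) sum_i (mu - X_i){G((mu - X_i) / s) - p} = 0.
        # The sum falls as s grows, from the mean check loss at s -> 0,
        # so the left side is increasing and the bracket below is valid.
        check_loss = sum((mu - x) * ((1.0 if mu > x else 0.0) - p) for x in xs) / n
        sigma = bisect_increasing(
            lambda s: s - sum((mu - x) * (G((mu - x) / s) - p) for x in xs) / n,
            1e-12, check_loss + 1.0)
        if abs(mu - mu_old) + abs(sigma - sigma_old) < tol:
            break
    return mu, sigma


data = [2.1, -0.3, 0.8, 1.5, -1.2, 0.4, 3.3, 0.0, 1.1, -0.7]
print(fit_mu_sigma(data, 0.5), fit_mu_sigma(data, 0.3))
```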

And so to Quantile Regression:

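The usual (regression) log-likelihood, up to an additive constant,

$$ -\log\sigma + \frac{1}{n}\sum_{i=1}^{n}\left\{(1-p)\,\frac{Y_i - \alpha - \beta X_i}{\sigma} - G^{[2]}\!\left(\frac{Y_i - \alpha - \beta X_i}{\sigma}\right)\right\}, $$

is kernel localised to point x by

$$ \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)\left\{(1-p)\,\frac{Y_i - \alpha - \beta (X_i - x)}{\sigma} - G^{[2]}\!\left(\frac{Y_i - \alpha - \beta (X_i - x)}{\sigma}\right) - \log\sigma\right\}. $$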

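Writing

$$ x_i^{(k)}(x) = \frac{1}{nh}\,K\!\left(\frac{x - X_i}{h}\right)(X_i - x)^k \quad\text{and}\quad S_k(x) = \sum_{i=1}^{n} x_i^{(k)}(x), $$

this (version of) DOUBLE KERNEL LOCAL LINEAR QUANTILE REGRESSION (DKLLQR) satisfies

$$ p\,S_k(x) = \sum_{i=1}^{n} x_i^{(k)}(x)\,G\!\left(\frac{\alpha + \beta(X_i - x) - Y_i}{\sigma}\right), \qquad k = 0, 1, $$

with the local intercept α estimating the p-th conditional quantile at x.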
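A minimal sketch of solving these two estimating equations at a single point $x$. It assumes a logistic $G$, a Gaussian kernel $K$, fixed $h$ and $\sigma$, and simulated data, all illustrative choices. With $x_i^{(k)}(x) = (nh)^{-1}K\{(x-X_i)/h\}(X_i-x)^k$, the equations $\sum_i x_i^{(k)}(x)[G\{(\alpha + \beta(X_i - x) - Y_i)/\sigma\} - p] = 0$, $k = 0, 1$, are the score equations of the kernel-localised log-likelihood. That likelihood is concave in $(\alpha, \beta)$ because $G^{[2]}$ is convex, so Newton's method with step-halving is a natural solver. This is our choice of algorithm, not one given in the talk.

```python
import math
import random
from typing import Sequence, Tuple


def G(z: float) -> float:
    """Logistic distribution function (assumed choice of G)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def g(z: float) -> float:
    """Logistic density G'(z)."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2


def G2(z: float) -> float:
    """G^[2](z) = log(1 + e^z) for the logistic G, without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def K(u: float) -> float:
    """Gaussian kernel (assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def dkllqr_new(x: float, xs: Sequence[float], ys: Sequence[float], p: float,
               h: float, sigma: float, iters: int = 100,
               tol: float = 1e-12) -> Tuple[float, float]:
    n = len(xs)
    w = [K((x - xi) / h) / (n * h) for xi in xs]   # x_i^(0)(x)
    d = [xi - x for xi in xs]                      # X_i - x

    def loglik(al: float, be: float) -> float:
        # kernel-localised log-likelihood in (alpha, beta); sigma terms are constant here
        total = 0.0
        for wi, yi, di in zip(w, ys, d):
            r = (yi - al - be * di) / sigma
            total += wi * ((1.0 - p) * r - G2(r))
        return total

    al, be = sorted(ys)[n // 2], 0.0               # crude starting values
    for _ in range(iters):
        e0 = e1 = j00 = j01 = j11 = 0.0
        for wi, yi, di in zip(w, ys, d):
            s = (al + be * di - yi) / sigma
            e0 += wi * (G(s) - p)                  # k = 0 estimating equation
            e1 += wi * di * (G(s) - p)             # k = 1 estimating equation
            c = wi * g(s) / sigma                  # Jacobian contributions
            j00 += c
            j01 += c * di
            j11 += c * di * di
        det = j00 * j11 - j01 * j01
        da = -(j11 * e0 - j01 * e1) / det          # Newton step: solve J (da, db) = -(e0, e1)
        db = -(j00 * e1 - j01 * e0) / det
        base, t = loglik(al, be), 1.0
        while t > 1e-10 and loglik(al + t * da, be + t * db) < base:
            t *= 0.5                               # step-halving keeps the fit from worsening
        al, be = al + t * da, be + t * db
        if abs(t * da) + abs(t * db) < tol:
            break
    return al, be


rng = random.Random(0)
xs = [i / 50.0 for i in range(101)]
ys = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in xs]
alpha, beta = dkllqr_new(1.0, xs, ys, p=0.5, h=0.2, sigma=0.1)
print(alpha, beta)   # alpha estimates the conditional median at x = 1 (true value 3)
```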

Contrast this with Yu & Jones (1998, JASA) version of DKLLQR:

$$ p\,\{S_0(x)S_2(x) - S_1^2(x)\} = \sum_{i=1}^{n} w_i(x)\,G\!\left(\frac{\alpha - Y_i}{\sigma}\right), $$

where $w_i(x) = x_i^{(0)}(x)\,S_2(x) - x_i^{(1)}(x)\,S_1(x)$.
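For comparison, a sketch of the Yu & Jones (1998) equation under the same illustrative assumptions: logistic $G$, Gaussian $K$, fixed $h$ and $\sigma$, simulated data. The weights $w_i(x) = x_i^{(0)}(x)S_2(x) - x_i^{(1)}(x)S_1(x)$ can be negative, so the weighted sum need not be monotone in the unknown. Still, after dividing by $\sum_i w_i = S_0S_2 - S_1^2 > 0$, the equation's excess runs from about $-p$ to $1-p$, so bisection is guaranteed to return a root.

```python
import math
import random
from typing import Sequence


def G(z: float) -> float:
    """Logistic distribution function (assumed choice of G)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def K(u: float) -> float:
    """Gaussian kernel (assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def dkllqr_yu_jones(x: float, xs: Sequence[float], ys: Sequence[float], p: float,
                    h: float, sigma: float, iters: int = 200) -> float:
    """Solve p (S_0 S_2 - S_1^2) = sum_i w_i(x) G((q - Y_i) / sigma) for q by bisection."""
    n = len(xs)
    k0 = [K((x - xi) / h) / (n * h) for xi in xs]              # x_i^(0)(x)
    d = [xi - x for xi in xs]                                  # X_i - x
    s1 = sum(ki * di for ki, di in zip(k0, d))                 # S_1(x)
    s2 = sum(ki * di * di for ki, di in zip(k0, d))            # S_2(x)
    w = [ki * s2 - ki * di * s1 for ki, di in zip(k0, d)]      # w_i(x), may be negative
    total = sum(w)                                             # S_0 S_2 - S_1^2 > 0

    def excess(q: float) -> float:
        return sum(wi * G((q - yi) / sigma) for wi, yi in zip(w, ys)) / total - p

    lo, hi = min(ys) - 50.0 * sigma, max(ys) + 50.0 * sigma    # excess(lo) < 0 < excess(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
xs = [i / 50.0 for i in range(101)]
ys = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in xs]
print(dkllqr_yu_jones(1.0, xs, ys, p=0.5, h=0.2, sigma=0.1))  # conditional median at x = 1
```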

The ‘vertical’ bandwidth σ=σ(x) can also be estimated by ML: solve

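$$ \sigma\,S_0(x) = \sum_{i=1}^{n} x_i^{(0)}(x)\,\{\alpha + \beta(X_i - x) - Y_i\}\left[G\!\left(\frac{\alpha + \beta(X_i - x) - Y_i}{\sigma}\right) - p\right]. $$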
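A sketch of the $\sigma(x)$ step with $\alpha$ and $\beta$ held fixed, again assuming a logistic $G$, a Gaussian $K$ and simulated data. Write $u_i = \alpha + \beta(X_i - x) - Y_i$. The right-hand side $\sum_i x_i^{(0)}(x)\,u_i\{G(u_i/\sigma) - p\}$ is non-increasing in $\sigma$, so $\sigma S_0(x)$ minus it is increasing and bisection finds the unique root. In practice one might alternate this step with the $(\alpha, \beta)$ step; that scheme is our suggestion, not the talk's.

```python
import math
import random
from typing import Sequence


def G(z: float) -> float:
    """Logistic distribution function (assumed choice of G)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def K(u: float) -> float:
    """Gaussian kernel (assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_sigma_ml(x: float, xs: Sequence[float], ys: Sequence[float], p: float,
                   h: float, alpha: float, beta: float, iters: int = 200) -> float:
    """Solve sigma S_0(x) = sum_i x_i^(0)(x) u_i {G(u_i / sigma) - p} for sigma."""
    n = len(xs)
    k0 = [K((x - xi) / h) / (n * h) for xi in xs]                 # x_i^(0)(x)
    u = [alpha + beta * (xi - x) - yi for xi, yi in zip(xs, ys)]  # u_i
    s0 = sum(k0)                                                  # S_0(x)
    # The right-hand side is largest in the limit sigma -> 0, where it equals rho.
    rho = sum(ki * ui * ((1.0 if ui > 0.0 else 0.0) - p) for ki, ui in zip(k0, u))

    def score(s: float) -> float:
        return s * s0 - sum(ki * ui * (G(ui / s) - p) for ki, ui in zip(k0, u))

    lo, hi = 1e-12, rho / s0 + 1.0   # score(lo) < 0 < score(hi) for non-degenerate data
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
xs = [i / 50.0 for i in range(101)]
ys = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in xs]
print(local_sigma_ml(1.0, xs, ys, p=0.5, h=0.2, alpha=3.0, beta=2.0))
```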

Compare 3 versions of DKLLQR:

• $\tilde q_p$: Yu & Jones (1998), including rule-of-thumb (r-o-t) σ and h;
• $\hat q_p^{0}$: new version, including r-o-t σ and h;
• $\hat q_p^{1}$: new version, including the ML σ above and r-o-t h.

Based on this limited evidence:

• Clear recommendation:
  – replace the Yu & Jones (1998) DKLLQR method by the (gently but consistently improved) new version
• Unclear non-recommendation:
  – use new bandwidth selection?

References
