Parametric Families of Distributions and Their Interaction
with the Workshop Title
Chris Jones
The Open University, U.K.
How the talk will pan out …
• it will start as a talk in distribution theory
– concentrating on generating one family of distributions
• then will continue as a talk in distribution theory
– concentrating on generating a different family of distributions
• but in this second part, the talk will metamorphose through links with kernels and quantiles …
• … and finally get on to a more serious application to smooth (nonparametric) QR
• the parts of the talk involving QR are joint with Keming Yu
Set μ = 0, σ = 1.
Starting point: simple symmetric g
How might we introduce (at most two) shape parameters a and b which will account for skewness and/or “kurtosis”/tailweight (while retaining unimodality)?
Modelling data with such families of distributions will, inter alia, afford robust estimation of location (and maybe scale).
FAMILY 1
g
Actual density of order statistic (i, n integer):
$f(x) = \frac{g(x)\,G(x)^{i-1}\,(1-G(x))^{n-i}}{B(i,\,n-i+1)}$
Generalised density of order statistic (a, b > 0 real):
$f(x) = \frac{g(x)\,G(x)^{a-1}\,(1-G(x))^{b-1}}{B(a,\,b)}$
$X \sim G^{-1}(\mathrm{Beta}(a,b))$
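The generalised order statistic construction can be sketched in a few lines. This is a minimal illustration, assuming the logistic distribution as the symmetric base g (any symmetric g with known G and quantile function would do); the function names `family1_pdf` and `family1_sample` are hypothetical, not from the talk.

```python
import math
import random

def beta_fn(a, b):
    """Beta function B(a, b), computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def logistic_cdf(x):
    """Logistic distribution function G(x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logistic_pdf(x):
    """Logistic density g(x) = G(x) (1 - G(x))."""
    Gx = logistic_cdf(x)
    return Gx * (1.0 - Gx)

def logistic_quantile(u):
    """Logistic quantile function G^{-1}(u) for 0 < u < 1."""
    return math.log(u / (1.0 - u))

def family1_pdf(x, a, b, g=logistic_pdf, G=logistic_cdf):
    """f(x) = g(x) G(x)^(a-1) (1 - G(x))^(b-1) / B(a, b), with a, b > 0."""
    Gx = G(x)
    return g(x) * Gx ** (a - 1) * (1.0 - Gx) ** (b - 1) / beta_fn(a, b)

def family1_sample(a, b, Q=logistic_quantile, rng=random):
    """Draw X = G^{-1}(U) with U ~ Beta(a, b)."""
    return Q(rng.betavariate(a, b))
```

With a = b = 1 the function returns g itself, and with integer a = i, b = n - i + 1 it returns the density of the i-th order statistic in a sample of size n.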
Roles of a and b
• a=b=1: f = g
• a=b: family of symmetric distributions
• a≠b: skew distributions
• a controls left-hand tail weight, b controls right
• the smaller a or b, the heavier the corresponding tail
Properties of (Generalised) Order Statistic Distributions
• Distribution function: $I_{G(x)}(a,b)$ (the incomplete beta function ratio)
• Tail behaviour. For large x>0:
– power tails: $g \sim x^{-(\alpha+1)} \Rightarrow f \sim x^{-(b\alpha+1)}$
– exponential tails: $g \sim e^{-\beta x} \Rightarrow f \sim e^{-b\beta x}$
• Limiting distributions:
– a and b large: normal distribution
– one of a or b large: appropriate extreme value distribution
Other properties such as moments and modality need to be examined on a case-by-case basis
For more, see Jones (2004, Test)
Tractable Example 1
$f(x) = \frac{1}{B(a,b)\,(a+b)^{1/2}\,2^{a+b-1}} \left(1+\frac{x}{(a+b+x^2)^{1/2}}\right)^{a+1/2} \left(1-\frac{x}{(a+b+x^2)^{1/2}}\right)^{b+1/2}$
Jones & Faddy’s (2003, JRSSB) skew t density
When a=b, Student t density on 2a d.f.
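As a quick numerical check of the statement that a = b gives the Student t density on 2a d.f., here is a minimal sketch of the skew t density. The function names are hypothetical; the Student t density is the standard textbook formula.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b), computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def skew_t_pdf(x, a, b):
    """Jones & Faddy skew t density with shape parameters a, b > 0."""
    s = x / math.sqrt(a + b + x * x)
    const = beta_fn(a, b) * math.sqrt(a + b) * 2.0 ** (a + b - 1)
    return (1.0 + s) ** (a + 0.5) * (1.0 - s) ** (b + 0.5) / const

def student_t_pdf(x, nu):
    """Standard Student t density on nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + x * x / nu) ** (-(nu + 1) / 2)
```

Swapping a and b reflects the density about zero, so `skew_t_pdf(x, a, b)` agrees with `skew_t_pdf(-x, b, a)`.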
Some skew t densities
[Figure: skew t densities t(2,2) = t_4, t(4,2), t(8,2) and t(128,2)]
… and with a and b swopped
[Figure: skew t densities t(2,2) = t_4, t(2,4), t(2,8) and t(2,128)]
f = skew t density arises from g = ???
Yes, the t distribution on 2 d.f.!
$g(x) = \frac{1}{(2+x^2)^{3/2}}$
$G(x) = \frac{1}{2}\left(1+\frac{x}{\sqrt{2+x^2}}\right)$
$Q(u) = G^{-1}(u) = \frac{2u-1}{\sqrt{2u(1-u)}}$
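The link between the t distribution on 2 d.f. and the skew t can be checked numerically. The sketch below assumes one detail not stated on the slide: applying the Family 1 construction to the t_2 density and then rescaling by sqrt((a+b)/2) reproduces the Jones & Faddy form exactly. That scale factor follows from direct algebra on the two formulas; all function names are hypothetical.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b), computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def t2_pdf(x):
    """Density of the t distribution on 2 d.f."""
    return (2.0 + x * x) ** -1.5

def t2_cdf(x):
    """Distribution function of the t distribution on 2 d.f."""
    return 0.5 * (1.0 + x / math.sqrt(2.0 + x * x))

def t2_quantile(u):
    """Quantile function of the t distribution on 2 d.f., 0 < u < 1."""
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))

def family1_pdf_t2(y, a, b):
    """Generalised order statistic density built on g = t_2."""
    Gy = t2_cdf(y)
    return t2_pdf(y) * Gy ** (a - 1) * (1.0 - Gy) ** (b - 1) / beta_fn(a, b)

def skew_t_pdf_via_t2(x, a, b):
    """Rescale the t_2-based density by c = sqrt((a+b)/2) (derived, not from the slide)."""
    c = math.sqrt((a + b) / 2.0)
    return family1_pdf_t2(x / c, a, b) / c

def skew_t_pdf(x, a, b):
    """Jones & Faddy skew t density, written directly."""
    s = x / math.sqrt(a + b + x * x)
    const = beta_fn(a, b) * math.sqrt(a + b) * 2.0 ** (a + b - 1)
    return (1.0 + s) ** (a + 0.5) * (1.0 - s) ** (b + 0.5) / const
```

The closed-form quantile function is what makes this example tractable: a skew t variate can be drawn as sqrt((a+b)/2) times `t2_quantile(U)` with U ~ Beta(a, b).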
Tractable Example 2
Q: The (order statistics of the) logistic distribution generate the ???
A: Log F distribution
– This has exponential tails
These examples, seen before, are therefore log F distributions
The log F distribution
$f(x) = \frac{1}{B(a,b)}\,\frac{e^{ax}}{(1+e^{x})^{a+b}}$
$x \to -\infty:\ f(x) \sim e^{ax}$
$x \to \infty:\ f(x) \sim e^{-bx}$
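The exponential tails of the log F density are easy to see on the log scale, where the log-density becomes linear with slope a on the left and slope -b on the right. A minimal sketch, with hypothetical function names and a numerically stable log(1 + e^x):

```python
import math

def softplus(x):
    """log(1 + e^x), evaluated without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def logF_logpdf(x, a, b):
    """Log of the log F density: a*x - (a+b)*log(1+e^x) - log B(a, b)."""
    return a * x - (a + b) * softplus(x) - log_beta(a, b)

def logF_pdf(x, a, b):
    """log F density with shape parameters a, b > 0."""
    return math.exp(logF_logpdf(x, a, b))

def tail_slopes(a, b, far=40.0, step=10.0):
    """Finite-difference slopes of the log-density in the far left and right tails."""
    left = (logF_logpdf(-far + step, a, b) - logF_logpdf(-far, a, b)) / step
    right = (logF_logpdf(far, a, b) - logF_logpdf(far - step, a, b)) / step
    return left, right
```

For example, `tail_slopes(1.5, 0.7)` returns values very close to (1.5, -0.7), matching the two tail statements above.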
The simple exponential tail property is shared by:
• the log F distribution
• the asymmetric Laplace distribution: $f(x) = \frac{ab}{a+b}\,\exp\{a x\,I(x<0) - b x\,I(x>0)\}$
• the hyperbolic distribution: $f(x) \propto \exp\left\{\frac{a-b}{2}\,x - \frac{a+b}{2}\,\sqrt{1+x^2}\right\}$
Is there a general form for such distributions?
FAMILY 2: distributions with simple exponential tails
Starting point: simple symmetric g with distribution function G and
$G^{[2]}(x) = \int_{-\infty}^{x} G(t)\,dt.$
General form for density is:
$f(x) \propto \exp\{a x - (a+b)\,G^{[2]}(x)\}$
Special Cases
• G is point mass at zero, G^[2] = x I(x>0) ⇒ f is asymmetric Laplace
• G is logistic, G^[2] = log(1+exp(x)) ⇒ f is log F
• G is t_2, G^[2] = ½(x+√(1+x^2)) ⇒ f is hyperbolic
• G is normal, G^[2] = xΦ(x)+φ(x)
• G is uniform on (−1,1), G^[2] = ¼(1+x)^2 I(−1<x<1) + x I(x≥1)
[Figure: three Family 2 densities. Solid line: log F; dashed line: hyperbolic; dotted line: normal-based]
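The general Family 2 form can be sketched by plugging different choices of G^[2] into one function. The code below works with the unnormalised density exp{ax − (a+b)G^[2](x)} only (the normalising constant is case-specific and is not computed here); the function names are hypothetical.

```python
import math

def G2_point_mass(x):
    """G^[2] for G a point mass at zero: x I(x > 0)."""
    return x if x > 0 else 0.0

def G2_logistic(x):
    """G^[2] for logistic G: log(1 + e^x), evaluated stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def G2_t2(x):
    """G^[2] for the t_2 case as given above: (x + sqrt(1 + x^2)) / 2."""
    return 0.5 * (x + math.sqrt(1.0 + x * x))

def G2_normal(x):
    """G^[2] for standard normal G: x Phi(x) + phi(x)."""
    Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return x * Phi + phi

def family2_unnormalised(x, a, b, G2):
    """exp{a x - (a + b) G^[2](x)}: behaves like e^{ax} on the left, e^{-bx} on the right."""
    return math.exp(a * x - (a + b) * G2(x))
```

With `G2_point_mass` the result is exactly e^{ax} for x < 0 and e^{-bx} for x > 0 (asymmetric Laplace), and with `G2_logistic` it is e^{ax}/(1+e^x)^{a+b} (log F, up to the factor 1/B(a,b)).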
Practical Point 1
• the asymmetric Laplace is a three parameter distribution; other members of family have four;
• fourth parameter is redundant in practice: (asymptotic) correlations between ML estimates of σ and either of a or b are very near 1;
• reason: σ, a and b are all scale parameters, yet you only need two such parameters to describe main scale-related aspects of distribution [either (i) a left-scale and a right-scale or (ii) an overall scale and a left-right comparer]
Practical Point 2
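Parametrise by μ, σ, a=1-p, b=p. Then, score equation for μ reads:
$p = \frac{1}{n} \sum_{i=1}^{n} G\!\left(\frac{\mu - X_i}{\sigma}\right)$
This is kernel quantile estimation, with kernel G and bandwidth σ
Includes bandwidth selection by choosing σ to solve the second score equation:
$\sigma = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)\left\{p - G\!\left(\frac{\mu - X_i}{\sigma}\right)\right\}$
But its simulation performance is variable.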
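The first score equation says that the kernel-smoothed distribution function, evaluated at μ, equals p. A minimal sketch of solving it by bisection follows, assuming a logistic G and a fixed, user-supplied bandwidth σ (the ML choice of σ from the second score equation is not implemented here). The names `kernel_quantile` and `smoothed_cdf` are hypothetical.

```python
import math

def logistic_cdf(z):
    """Logistic distribution function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smoothed_cdf(mu, xs, sigma, G=logistic_cdf):
    """Kernel distribution function estimate: (1/n) sum_i G((mu - X_i) / sigma)."""
    return sum(G((mu - x) / sigma) for x in xs) / len(xs)

def kernel_quantile(xs, p, sigma, G=logistic_cdf, tol=1e-10):
    """Solve smoothed_cdf(mu) = p for mu by bisection (the left side is increasing in mu)."""
    lo = min(xs) - 50.0 * sigma
    hi = max(xs) + 50.0 * sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(mid, xs, sigma, G) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used because the smoothed distribution function is monotone in μ, so the root is unique; as σ shrinks, the estimate approaches an ordinary sample quantile.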
And so to Quantile Regression:
The usual (regression) log-likelihood,
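$-n\log\sigma + \sum_{i=1}^{n}\left\{(1-p)\,\frac{Y_i-\alpha-\beta X_i}{\sigma} - G^{[2]}\!\left(\frac{Y_i-\alpha-\beta X_i}{\sigma}\right)\right\},$
(with α + βX_i the linear model for the pth quantile)
is kernel localised to point x by
$\sum_{i=1}^{n} K(h^{-1}(x-X_i))\left\{-\log\sigma + (1-p)\,\frac{Y_i-\alpha-\beta(X_i-x)}{\sigma} - G^{[2]}\!\left(\frac{Y_i-\alpha-\beta(X_i-x)}{\sigma}\right)\right\}$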
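this (version of) DOUBLE KERNEL LOCAL LINEAR QUANTILE REGRESSION satisfies,
writing $\omega_i^{(k)}(x) = (X_i-x)^k\,K(h^{-1}(x-X_i))$ and $S_k(x) = \sum_{i=1}^{n} \omega_i^{(k)}(x)$,
$p\,S_k(x) = \sum_{i=1}^{n} \omega_i^{(k)}(x)\,G\!\left(\sigma^{-1}(\alpha+\beta(X_i-x)-Y_i)\right), \quad k=0,1,$
with α the fitted quantile at x and β its local slope.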
Contrast this with Yu & Jones (1998, JASA) version of DKLLQR:
$p\,\{S_0(x)S_2(x) - S_1(x)^2\} = \sum_{i=1}^{n} w_i(x)\,G\!\left(\sigma^{-1}(\alpha - Y_i)\right),$
where $w_i(x) = \omega_i^{(0)}(x)\,S_2(x) - \omega_i^{(1)}(x)\,S_1(x).$
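The Yu & Jones (1998) version amounts to inverting a local linear, kernel-smoothed estimate of the conditional distribution function. The sketch below is an illustration under stated assumptions, not the authors' code: it takes ω_i^(k)(x) = (X_i − x)^k K((x − X_i)/h), a Gaussian K and a normal G, fixed user-supplied h and σ, and solves for the quantile by bisection. The function name `yj_double_kernel_quantile` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function (the 'vertical' kernel G)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_kernel(u):
    """Standard normal density (the 'horizontal' kernel K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def yj_double_kernel_quantile(x, xs, ys, p, h, sigma,
                              K=gaussian_kernel, G=normal_cdf, tol=1e-10):
    """Solve p * sum_i w_i(x) = sum_i w_i(x) G((q - Y_i) / sigma) for q."""
    om0 = [K((x - xi) / h) for xi in xs]                    # omega_i^(0)(x)
    om1 = [(xi - x) * k for xi, k in zip(xs, om0)]          # omega_i^(1)(x)
    S1 = sum(om1)
    S2 = sum((xi - x) ** 2 * k for xi, k in zip(xs, om0))
    # Local linear weights w_i(x) = omega_i^(0) S_2 - omega_i^(1) S_1.
    w = [o0 * S2 - o1 * S1 for o0, o1 in zip(om0, om1)]
    total = sum(w)                                          # equals S_0 S_2 - S_1^2

    def F_tilde(q):
        return sum(wi * G((q - yi) / sigma) for wi, yi in zip(w, ys)) / total

    # Bisection assumes F_tilde crosses p once on this bracket; negative
    # weights near the boundary of the design can break monotonicity.
    lo = min(ys) - 10.0 * sigma
    hi = max(ys) + 10.0 * sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F_tilde(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On noise-free data lying on a straight line, with a design symmetric about x, the p = 0.5 estimate recovers the line's value at x, which makes a convenient sanity check.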
The ‘vertical’ bandwidth σ=σ(x) can also be estimated by ML: solve
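$\sigma\,S_0(x) = \sum_{i=1}^{n} \omega_i^{(0)}(x)\,(Y_i-\alpha-\beta(X_i-x))\left[\,p - G\!\left(\sigma^{-1}(\alpha+\beta(X_i-x)-Y_i)\right)\right].$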
Compare 3 versions of DKLLQR:
• $\tilde{q}_p$: Yu & Jones (1998) including rule-of-thumb (r-o-t) σ and h;
• $\hat{q}_p^{0}$: new version including r-o-t σ and h;
• $\hat{q}_p^{1}$: new version including above σ and r-o-t h.
Based on this limited evidence:
• Clear recommendation:
– replace Yu & Jones (1998) DKLLQR method by (gently but consistently improved) new version
• Unclear non-recommendation:
– use new bandwidth selection?
References