Beyond Gaussian Processes: On the Distributions of Infinite Networks
Ricky Der, Department of Mathematics, rickyder@math.upenn.edu
Daniel Lee, Department of Electrical Engineering, ddlee@seas.upenn.edu
Beyond Gaussian Processes: On the Distributions of Infinite Networks – p.1/18
Prior on neural networks
Extension of Radford Neal's Ph.D. thesis:
$$f_n(x) = \frac{1}{s_n} \sum_{j=1}^{n} v_j\, h(x; u_j) \equiv \frac{1}{s_n} \sum_{j=1}^{n} v_j\, h_j(x).$$
This can be viewed as a multi-layer perceptron with input $x$, hidden functions $h$, hidden weights $u_j$, output weights $v_j$, and a sequence of normalizing constants $s_n$.
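The definition of $f_n$ can be sketched in a few lines of Python. This is only an illustration: the tanh hidden unit, the Gaussian priors, the choice $s_n = \sqrt{n}$, and the names `sample_network` and `tanh_unit` are assumptions of the sketch, not choices made in the slides.

```python
import math
import random

def sample_network(n, s_n, rng, sample_v, sample_u, h):
    """Draw one random function f_n(x) = (1/s_n) * sum_j v_j * h(x; u_j)."""
    v = [sample_v(rng) for _ in range(n)]   # output weights v_j
    u = [sample_u(rng) for _ in range(n)]   # hidden weights u_j
    def f(x):
        return sum(vj * h(x, uj) for vj, uj in zip(v, u)) / s_n
    return f

def tanh_unit(x, u):
    """Hypothetical bounded hidden unit h(x; u) = tanh(a + w x), with u = (a, w)."""
    a, w = u
    return math.tanh(a + w * x)

rng = random.Random(0)
n = 500
f = sample_network(
    n, math.sqrt(n), rng,                   # s_n = sqrt(n): finite-variance scaling
    sample_v=lambda r: r.gauss(0.0, 1.0),
    sample_u=lambda r: (r.gauss(0.0, 1.0), r.gauss(0.0, 1.0)),
    h=tanh_unit,
)
values = [f(x) for x in (-1.0, 0.0, 1.0)]   # one prior draw evaluated at three inputs
```

Swapping `sample_v` for a heavy-tailed sampler and `s_n` for $n^{1/\alpha}$ gives the infinite-variance setting studied in the rest of the talk.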
Gaussian process limit
When the $v_j$ are i.i.d. with finite variance, Neal has shown that the limiting distribution ($n \to \infty$) is a Gaussian process.
The authors investigate:
1. the case when $v_j$ has infinite variance
2. the case when the $v_j$ are not i.i.d.
3. the possibility of doing regression with stable processes
Assumptions
• $h_j(x) \equiv h(x, u_j)$ are uniformly bounded in $x$, e.g. $h$ is a fixed nonlinearity
• $\{u_j\}$ is an i.i.d. sequence
• this entails that the $h_j(x)$ are i.i.d. for fixed $x$ and independent of $\{v_j\}$
The choice of output priors $v_j$ will dictate the large-network behavior.
Stable distributions
Let $X_1, X_2$ be independent copies of a Gaussian variable $X$. Then for any $a, b \in \mathbb{R}$,
$$aX_1 + bX_2 \overset{d}{=} cX + d$$
for some $c, d \in \mathbb{R}$.
This stability property is satisfied by all stable distributions.
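For the Gaussian case the constants can be written down explicitly: if $X \sim N(\mu, \sigma^2)$, matching variances gives $c = \sqrt{a^2 + b^2}$ and matching means gives $d = (a + b - c)\mu$. The sketch below computes them and compares sample moments of both sides; the helper name `gaussian_stability_constants` and the parameter values are illustrative assumptions.

```python
import math
import random
import statistics

def gaussian_stability_constants(a, b, mu):
    """Return (c, d) such that a*X1 + b*X2 has the same law as c*X + d
    for independent copies X1, X2 of X ~ N(mu, sigma^2)."""
    c = math.hypot(a, b)      # variances add: c^2 = a^2 + b^2
    d = (a + b - c) * mu      # shift that matches the means
    return c, d

rng = random.Random(0)
a, b, mu, sigma = 3.0, 4.0, 1.5, 1.0
c, d = gaussian_stability_constants(a, b, mu)

n = 20000
lhs = [a * rng.gauss(mu, sigma) + b * rng.gauss(mu, sigma) for _ in range(n)]
rhs = [c * rng.gauss(mu, sigma) + d for _ in range(n)]

# Both sides should share mean (a + b) * mu and variance (a^2 + b^2) * sigma^2.
lhs_mean, rhs_mean = statistics.fmean(lhs), statistics.fmean(rhs)
lhs_var, rhs_var = statistics.pvariance(lhs), statistics.pvariance(rhs)
```

Matching two moments is enough here only because both sides are Gaussian; for other stable laws the comparison has to be made on the full distribution.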
Definition of stable variable
A symmetric stable distribution has the following characteristic function:
$$\Phi(t) = e^{-\sigma^{\alpha} |t|^{\alpha}},$$
where $\sigma > 0$ is the spread parameter and $0 < \alpha \le 2$ is the stability index.
We write $X \sim S_\alpha(\sigma)$ for a symmetric $\alpha$-stable variable of spread $\sigma > 0$.
$\alpha = 2$ is the Gaussian case, and for $\alpha < 2$, $E[X^2] = \infty$.
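One way to draw from $S_\alpha(\sigma)$ is the Chambers-Mallows-Stuck representation, which is not mentioned in the slides and is used here only as an illustration. The sketch compares the empirical characteristic function with $e^{-\sigma^\alpha |t|^\alpha}$; the function names and parameter values are assumptions.

```python
import math
import random

def sample_symmetric_stable(alpha, sigma, rng):
    """Draw X ~ S_alpha(sigma), i.e. E[exp(itX)] = exp(-sigma^alpha |t|^alpha)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = math.pi * (rng.random() - 0.5)   # uniform angle in [-pi/2, pi/2)
    w = rng.expovariate(1.0)             # unit-mean exponential
    if alpha == 1.0:
        return sigma * math.tan(u)       # Cauchy special case
    return (sigma * math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def empirical_cf(samples, t):
    """Real part of the empirical characteristic function (the law is symmetric)."""
    return sum(math.cos(t * x) for x in samples) / len(samples)

rng = random.Random(0)
alpha, sigma, t = 1.5, 1.0, 1.0
xs = [sample_symmetric_stable(alpha, sigma, rng) for _ in range(20000)]
estimate = empirical_cf(xs, t)                   # Monte Carlo estimate of Phi(t)
target = math.exp(-(sigma * abs(t)) ** alpha)    # Phi(t) from the definition
```

In this parameterization $\alpha = 2$ gives $\Phi(t) = e^{-\sigma^2 t^2}$, which is a Gaussian with variance $2\sigma^2$ rather than $\sigma^2$.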
Domain of attraction
Let $Y_1, \ldots, Y_n$ be independent copies of a random variable $Y$. We say that $Y$ belongs to the domain of attraction of the variable $X$ if
$$a_n + \frac{1}{s_n} \sum_{j=1}^{n} Y_j \xrightarrow{d} X$$
for appropriate sequences $a_n, s_n \in \mathbb{R}$. In that case $X$ must be stable.
Multivariate stable distributions
Let $X_1, X_2$ be independent copies of a random vector $X$. Then $X$ is stable if for every $a, b \in \mathbb{R}$ there exists $c \in \mathbb{R}$ such that
$$aX_1 + bX_2 \overset{d}{=} cX.$$
A process is said to be stable if all its finite-dimensional distributions are multivariate stable.
Characteristic function
Theorem. $X$ is a symmetric $\alpha$-stable vector if and only if it has characteristic function
$$\Phi(t) = \exp\left\{ -\int_{S^{d-1}} |\langle t, s \rangle|^{\alpha} \, d\Gamma(s) \right\},$$
where $\Gamma$ is a finite measure on the unit $(d-1)$-sphere $S^{d-1}$, and $0 < \alpha \le 2$.
Preliminary result
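Lemma. Let $v \sim S_\alpha(\sigma)$ and let $h$ be independent of $v$ with $E|h|^{\alpha} < \infty$. If $y = hv$, and $\{y_i\}$ are independent copies of $y$, then
$$\frac{1}{n^{1/\alpha}} \sum_{i=1}^{n} y_i \xrightarrow{d} X,$$
where $X$ is $\alpha$-stable with characteristic function
$$\Phi(t) = \exp\left\{ -|\sigma t|^{\alpha}\, E|h|^{\alpha} \right\}.$$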
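The lemma can be checked numerically in the special case $\alpha = 1$, where $S_1(\sigma)$ is a Cauchy variable and can be sampled with a tangent transform. In this sketch $h$ is taken uniform on $(-1, 1)$, so $E|h| = 1/2$; that choice, the sample sizes, and the function name are assumptions made only for illustration.

```python
import math
import random

def normalized_sum(n, sigma, rng):
    """n^{-1/alpha} * sum_i h_i v_i for alpha = 1, with v_i ~ S_1(sigma) (Cauchy)
    and h_i ~ Uniform(-1, 1) drawn independently of v_i."""
    total = 0.0
    for _ in range(n):
        v = sigma * math.tan(math.pi * (rng.random() - 0.5))
        h = rng.uniform(-1.0, 1.0)
        total += h * v
    return total / n          # n^{1/alpha} = n when alpha = 1

rng = random.Random(0)
sigma, n, reps, t = 1.0, 50, 4000, 1.0
sums = [normalized_sum(n, sigma, rng) for _ in range(reps)]

# Empirical characteristic function of the normalized sum at t.
estimate = sum(math.cos(t * s) for s in sums) / reps

mean_abs_h = 0.5              # E|h| for h ~ Uniform(-1, 1)
target = math.exp(-abs(sigma * t) * mean_abs_h)   # limit predicted by the lemma
```

The spread of the limit is $\sigma E|h|$ here, so the hidden-unit distribution rescales the stable law without changing its index.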
Stable prior
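Proposition. Let the output weights of the neural network be i.i.d. $v_j \sim S_\alpha(\sigma)$. Then
$$f_n(x) = \frac{1}{n^{1/\alpha}} \sum_{j=1}^{n} v_j h_j(x) \xrightarrow{d} f(x),$$
where $f(x)$ is a symmetric $\alpha$-stable process.
The finite-dimensional distribution of $(f(x_1), \ldots, f(x_d))$ has characteristic function
$$\Psi(t) = \exp\left( -\sigma^{\alpha}\, E_h |\langle t, h \rangle|^{\alpha} \right),$$
where $h = (h(x_1; u), \ldots, h(x_d; u))$ is the vector of hidden-unit outputs at the $d$ inputs.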
Normal domain of attraction
The normal domain of attraction of index $\alpha$ encompasses distributions whose tails are asymptotically equivalent to $|x|^{-(\alpha+1)}$, for $0 < \alpha < 2$.
The previous proposition holds if the output weights $v_j$ are i.i.d. random variables in the normal domain of attraction of index $\alpha$.
Local Brownian motion
Let $h(x) = \operatorname{sgn}(a + ux)$, the step function, where $a$ and $u$ are independent zero-mean Gaussians.
$$\lim_{|x| \to \infty} f(x) = \text{constant}$$
Neal has shown that, in the "central region", this gives rise to a local Brownian motion.
Symmetric α-stable Lévy motion
Symmetric $\alpha$-stable priors give rise to a symmetric $\alpha$-stable Lévy motion, i.e. a process $\{w_t;\ t \in \mathbb{R}\}$ satisfying:
• $w_0 = 0$ almost surely
• independent increments $w_t - w_s$, with $s < t$
• $w_t - w_s \sim S_\alpha(|t - s|^{1/\alpha})$
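The three properties translate directly into a simulation on a time grid: start at zero and add independent increments of spread $dt^{1/\alpha}$. To stay short, the sketch below covers only the two closed-form cases $\alpha = 2$ (Gaussian) and $\alpha = 1$ (Cauchy); the function names and grid are assumptions.

```python
import math
import random

def stable_increment(alpha, dt, rng):
    """Draw w_{t+dt} - w_t ~ S_alpha(dt^{1/alpha}) for alpha in {1, 2} only."""
    scale = dt ** (1.0 / alpha)
    if alpha == 2:
        # S_2(s) has characteristic function exp(-s^2 t^2), i.e. N(0, 2 s^2).
        return rng.gauss(0.0, math.sqrt(2.0) * scale)
    if alpha == 1:
        # S_1(s) is a Cauchy variable with scale s.
        return scale * math.tan(math.pi * (rng.random() - 0.5))
    raise ValueError("this sketch only covers alpha = 1 and alpha = 2")

def levy_motion_path(alpha, n_steps, dt, rng):
    """Path on a regular grid: w_0 = 0 plus cumulative independent increments."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + stable_increment(alpha, dt, rng))
    return path

rng = random.Random(0)
brownian = levy_motion_path(2, 1000, 1.0, rng)   # continuous-looking path
cauchy = levy_motion_path(1, 1000, 1.0, rng)     # path dominated by a few large jumps
```

Plotting `brownian` and `cauchy` against the step index gives a qualitative contrast between a Gaussian path and a heavy-tailed one.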
Brownian vs Lévy processes
[Figure: four panels of sample paths comparing Brownian and Lévy processes, each plotted over the horizontal range 0 to 1000.]
Limits with non-i.i.d. priors
What conditions on independent, not necessarily identically distributed, priors $v_j$ ensure convergence to a Gaussian process?
An easy condition to verify is given by the following corollary, which is based on the Lindeberg-Feller theorem.
Corollary. If the output weights $\{v_j\}$ are a uniformly bounded sequence of independent variables, and $\lim_{n \to \infty} s_n = \infty$, then $f_n(x)$ converges to a Gaussian process.
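A small simulation can illustrate the corollary at a single input $x$. The slides do not spell out $s_n$ here; the sketch assumes $s_n^2$ is the sum of the weight variances, as in the usual Lindeberg-Feller normalization, and uses uniform weights with alternating bounds and sign hidden units. All of these choices are assumptions.

```python
import math
import random

def f_n_at_x(x, bounds, rng):
    """One draw of f_n(x) = (1/s_n) * sum_j v_j * sgn(a_j + u_j x) with
    independent v_j ~ Uniform(-bounds[j], bounds[j])."""
    # Assumption of this sketch: s_n^2 is the sum of the weight variances c^2 / 3.
    s_n = math.sqrt(sum(c * c / 3.0 for c in bounds))
    total = 0.0
    for c in bounds:
        v = rng.uniform(-c, c)
        # Sign unit with independent standard Gaussian a_j and u_j.
        h = 1.0 if rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0) * x >= 0 else -1.0
        total += v * h
    return total / s_n

rng = random.Random(0)
bounds = [1.0 if j % 2 else 0.5 for j in range(100)]   # not identically distributed
draws = [f_n_at_x(0.3, bounds, rng) for _ in range(3000)]

# Under a Gaussian limit the variance should be near 1 and the kurtosis near 3.
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
kurtosis = sum((d - mean) ** 4 for d in draws) / len(draws) / var ** 2
```

Because the variances are bounded away from zero, $s_n$ grows like $\sqrt{n}$ in this example, so the condition $s_n \to \infty$ holds.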
Learning with stable processes
Instead of using a neural network, perform Gaussian process regression.
Limitation: Gaussian processes are not as rich as finite neural networks.
Regression problem: $y(x) = u(x) + \varepsilon$; estimate $u(x)$ from the observations $y(x_i)$, with $\varepsilon \perp u(x)$.
Generalization of Gaussian processes: place an $\alpha$-stable process prior on $y(x)$ and let $\varepsilon$ be i.i.d. $\alpha$-stable noise.
Classes of stable processes
Symmetric $\alpha$-stable processes with discrete spectral measure:
• $\{v_j\}_j$: an i.i.d. $\alpha$-stable process
• $\mu(x)$: a mean function
• $h(x, v)$: a bivariate filter function that introduces dependency
Sub-Gaussian processes are $\alpha$-stable processes of the form $u(x) = A^{1/2} G(x)$, where:
• $A$ is a totally right-skewed $\alpha/2$-stable variable
• $G$ is a Gaussian process of mean zero and covariance $K$
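The sub-Gaussian construction can be sketched at a finite set of inputs. The slides do not fix the scale of $A$; this sketch assumes $A$ is normalized so that $E[e^{-sA}] = e^{-s^{\alpha/2}}$ and samples it with Kanter's representation, which gives $E[e^{i\langle t, u\rangle}] = \exp\{-(t^\top K t / 2)^{\alpha/2}\}$. The covariance matrix and all function names are illustrative assumptions.

```python
import math
import random

def positive_stable(a, rng):
    """Kanter's representation of A >= 0 with E[exp(-s A)] = exp(-s^a), 0 < a < 1."""
    u = math.pi * (1.0 - rng.random())      # uniform angle in (0, pi]
    w = rng.expovariate(1.0)                # unit-mean exponential
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))

def cholesky(K):
    """Lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    d = len(K)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_sub_gaussian(alpha, K, rng):
    """One draw of (u(x_1), ..., u(x_d)) = A^{1/2} (G(x_1), ..., G(x_d))."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in K]
    g = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(K))]
    root_a = math.sqrt(positive_stable(alpha / 2.0, rng))   # shared across inputs
    return [root_a * gi for gi in g]

rng = random.Random(0)
K = [[2.0, 1.0], [1.0, 2.0]]                # hypothetical covariance at two inputs
draws = [sample_sub_gaussian(1.5, K, rng) for _ in range(20000)]

# With K[0][0] = 2 the first coordinate should be S_alpha(1): Phi(1) = exp(-1).
estimate = sum(math.cos(d[0]) for d in draws) / len(draws)
target = math.exp(-1.0)
```

The single random scale $A$ is shared by all inputs, so the coordinates stay dependent even when $K$ is diagonal.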