beyond gaussian processes: on the distributions of …lisa/seminaires/07-03-2006-2.pdf · lim jxj!1...

Beyond Gaussian Processes: On the Distributions of Infinite Networks

Ricky Der, Department of Mathematics, rickyder@math.upenn.edu
Daniel Lee, Department of Electrical Engineering, ddlee@seas.upenn.edu


Prior on neural networks

An extension of Radford Neal's Ph.D. thesis:

$$f_n(x) = \frac{1}{s_n}\sum_{j=1}^{n} v_j\, h(x; u_j) \equiv \frac{1}{s_n}\sum_{j=1}^{n} v_j\, h_j(x),$$

This can be viewed as a multi-layer perceptron with input $x$, hidden functions $h$, hidden-unit weights $u_j$, output weights $v_j$, and $s_n$, a sequence of normalizing constants.
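To fix notation, here is a minimal NumPy sketch of a single draw of $f_n$. The hidden unit $h(x; u) = \tanh(u_0 + u_1 x)$, the standard Gaussian priors on $u_j$ and $v_j$, and the choice $s_n = \sqrt{n}$ are assumptions made for illustration, not choices taken from the slides.

```python
import numpy as np

def sample_network(x, n, rng):
    """One random network f_n evaluated at the inputs x.

    Hypothetical choices (not from the slides): h(x; u) = tanh(u0 + u1 * x)
    with standard Gaussian hidden weights u_j = (u_j0, u_j1), standard
    Gaussian output weights v_j, and s_n = sqrt(n).
    """
    x = np.asarray(x, dtype=float)
    u = rng.standard_normal((n, 2))                   # hidden weights u_j
    h = np.tanh(u[:, :1] + u[:, 1:] * x[None, :])     # h_j(x), shape (n, len(x))
    v = rng.standard_normal(n)                        # output weights v_j
    return v @ h / np.sqrt(n)                         # (1/s_n) sum_j v_j h_j(x)

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 7)
print(sample_network(xs, n=1000, rng=rng).shape)  # (7,)
```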

Gaussian process limit

When the $v_j$ are i.i.d. with finite variance (with the scaling $s_n = \sqrt{n}$), Neal showed that the limiting distribution as $n \to \infty$ is a Gaussian process.

The authors investigate:
1. the case when $v_j$ has infinite variance;
2. the case when the $v_j$ are not i.i.d.;
3. the possibility of doing regression with stable processes.
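The finite-variance case can be probed by simulation. The sketch below (hypothetical tanh units with standard Gaussian hidden and output weights, $s_n = \sqrt{n}$) draws many independent networks at one input $x$, then compares the variance of $f_n(x)$ with $E[v^2]\,E[h(x)^2]$ and its kurtosis with the Gaussian value 3.

```python
import numpy as np

def network_outputs_at(x, n, num_networks, rng):
    """f_n(x) for many independent networks at a single input x.

    Assumed model: h(x; u) = tanh(u0 + u1 * x), u_j standard Gaussian,
    v_j ~ N(0, 1) i.i.d., s_n = sqrt(n).
    """
    u = rng.standard_normal((num_networks, n, 2))
    h = np.tanh(u[..., 0] + u[..., 1] * x)          # shape (num_networks, n)
    v = rng.standard_normal((num_networks, n))
    return (v * h).sum(axis=1) / np.sqrt(n)

rng = np.random.default_rng(0)
x = 0.5
samples = network_outputs_at(x, n=200, num_networks=5000, rng=rng)

# Predicted variance E[v^2] E[h(x)^2], with E[h(x)^2] estimated by Monte Carlo.
u = rng.standard_normal((200_000, 2))
predicted_var = np.mean(np.tanh(u[:, 0] + u[:, 1] * x) ** 2)

centered = samples - samples.mean()
kurtosis = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
print(samples.var(), predicted_var, kurtosis)   # kurtosis should be near 3 for a Gaussian
```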


Assumptions

• $h_j(x) \equiv h(x; u_j)$ are uniformly bounded in $x$, e.g. $h$ is a fixed bounded nonlinearity
• $\{u_j\}$ is an i.i.d. sequence, independent of $\{v_j\}$
• this entails that the $h_j(x)$ are i.i.d. for fixed $x$ and independent of $\{v_j\}$

The choice of prior on the output weights $v_j$ dictates the large-network behavior.


Stable distributions

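Let $X_1, X_2$ be independent copies of a Gaussian variable $X$. Then for any $a, b \in \mathbb{R}$,

$$aX_1 + bX_2 \overset{d}{=} cX + d$$

for some $c, d \in \mathbb{R}$.

This stability property is satisfied by all stable distributions; in fact, the stable distributions are exactly the laws with this property (for $a, b > 0$).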
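For symmetric α-stable variables (defined on the next slide) the constants are explicit: $d = 0$ and $c = (|a|^\alpha + |b|^\alpha)^{1/\alpha}$, which follows by multiplying the characteristic functions of $aX_1$ and $bX_2$. The Gaussian case $\alpha = 2$ gives $c = \sqrt{a^2 + b^2}$. A small sketch, with an empirical check on standard Cauchy variables ($\alpha = 1$, $\sigma = 1$); the particular $a$, $b$ are arbitrary.

```python
import numpy as np

def stability_constant(a, b, alpha):
    """c such that a*X1 + b*X2 has the same law as c*X for symmetric alpha-stable X.

    From the characteristic function exp(-sigma^alpha |t|^alpha): the sum has
    exponent sigma^alpha * (|a|^alpha + |b|^alpha) * |t|^alpha.
    """
    return (abs(a) ** alpha + abs(b) ** alpha) ** (1.0 / alpha)

print(stability_constant(3.0, 4.0, 2.0))   # Gaussian case: sqrt(9 + 16) = 5.0
print(stability_constant(2.0, -3.0, 1.0))  # Cauchy case: |2| + |-3| = 5.0

# Standard Cauchy variables have quartiles -1 and +1, so c*X has upper quartile c.
rng = np.random.default_rng(0)
x1, x2 = rng.standard_cauchy(200_000), rng.standard_cauchy(200_000)
combo = 2.0 * x1 - 3.0 * x2
print(np.quantile(combo, 0.75))  # close to stability_constant(2, -3, 1) = 5
```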


Definition of stable variable

A symmetric stable distribution has the characteristic function

$$\Phi(t) = e^{-\sigma^\alpha |t|^\alpha},$$

where $\sigma > 0$ is the spread parameter and $0 < \alpha \le 2$ is the stability index.

We write $X \sim S_\alpha(\sigma)$ for a symmetric α-stable variable of spread $\sigma > 0$.

$\alpha = 2$ is the Gaussian case, and for $\alpha < 2$, $E[X^2] = \infty$.
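Symmetric α-stable variables can be simulated even though their densities generally have no closed form. The sketch below uses the Chambers-Mallows-Stuck method (a standard sampler, not something taken from the slides) to draw $X \sim S_\alpha(\sigma)$ and compares the empirical characteristic function with $e^{-\sigma^\alpha |t|^\alpha}$. The values $\alpha = 1.5$, $\sigma = 2$ are arbitrary.

```python
import numpy as np

def sample_sas(alpha, sigma, size, rng):
    """Symmetric alpha-stable draws with characteristic function
    exp(-sigma**alpha * |t|**alpha), via Chambers-Mallows-Stuck (beta = 0)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit-mean exponential
    if alpha == 1.0:
        x = np.tan(v)                              # Cauchy special case
    else:
        x = (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x

def empirical_cf(samples, t):
    """Real part of the empirical characteristic function (enough for symmetric laws)."""
    return np.mean(np.cos(t * samples))

rng = np.random.default_rng(0)
alpha, sigma = 1.5, 2.0
x = sample_sas(alpha, sigma, 200_000, rng)
for t in (0.1, 0.3, 0.6):
    print(t, empirical_cf(x, t), np.exp(-(sigma * abs(t)) ** alpha))
```

For $\alpha = 2$ the sampler returns $N(0, 2\sigma^2)$ draws, consistent with $\Phi(t) = e^{-\sigma^2 t^2}$.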


Domain of attraction

Let $Y_1, \dots, Y_n$ be independent copies of a random variable $Y$. We say that $Y$ belongs to the domain of attraction of the variable $X$ if

$$a_n + \frac{1}{s_n}\sum_{j=1}^{n} Y_j \xrightarrow{d} X$$

for appropriate sequences $a_n, s_n \in \mathbb{R}$. Any such limit $X$ must be stable.
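A hypothetical example of the definition: take $Y$ symmetric with Pareto tails $P(|Y| > y) = y^{-\alpha}$ for $y \ge 1$ (infinite variance when $\alpha < 2$), with $a_n = 0$ and $s_n = n^{1/\alpha}$. For this tail the normalized sum converges to a symmetric α-stable law with characteristic function $\exp(-\Gamma(1-\alpha)\cos(\pi\alpha/2)\,|t|^\alpha)$ ($\alpha \ne 1$). The sketch compares empirical and limiting characteristic functions; agreement is only approximate at finite $n$.

```python
import math
import numpy as np

def symmetric_pareto(alpha, size, rng):
    """Symmetric heavy-tailed draws with P(|Y| > y) = y**(-alpha) for y >= 1."""
    u = 1.0 - rng.random(size)                  # uniform on (0, 1], avoids division by zero
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * u ** (-1.0 / alpha)

def limit_cf(t, alpha):
    """Characteristic function of the stable limit of n**(-1/alpha) * sum(Y_j).

    Uses n * (1 - E cos(t Y / n**(1/alpha))) -> Gamma(1 - alpha) cos(pi alpha / 2) |t|^alpha,
    valid for 0 < alpha < 2 with alpha != 1.
    """
    c = math.gamma(1.0 - alpha) * math.cos(math.pi * alpha / 2.0)
    return math.exp(-c * abs(t) ** alpha)

rng = np.random.default_rng(0)
alpha, n, reps, chunk = 1.5, 1000, 10_000, 500
# Sum in chunks to keep memory small; a_n = 0 by symmetry, s_n = n^(1/alpha).
s = np.concatenate([
    symmetric_pareto(alpha, (chunk, n), rng).sum(axis=1)
    for _ in range(reps // chunk)
]) / n ** (1.0 / alpha)

for t in (0.25, 0.5, 1.0):
    print(t, np.mean(np.cos(t * s)), limit_cf(t, alpha))
```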

Multivariate stable distributions

Let $X_1, X_2$ be independent copies of a random vector $X$. Then $X$ is stable if for every $a, b \in \mathbb{R}$ there exists $c \in \mathbb{R}$ such that

$$aX_1 + bX_2 \overset{d}{=} cX.$$

A process is said to be stable if all its finite-dimensional distributions are multivariate stable.


Characteristic function

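Theorem. $X$ is a symmetric α-stable vector in $\mathbb{R}^d$ if and only if it has characteristic function

$$\Phi(t) = \exp\left\{-\int_{S^{d-1}} |\langle t, s\rangle|^\alpha \, d\Gamma(s)\right\},$$

where $\Gamma$ is a finite measure on the unit $(d-1)$-sphere $S^{d-1}$, and $0 < \alpha \le 2$.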
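When $\Gamma$ is discrete, $\Gamma = \sum_k \gamma_k \delta_{s_k}$, the integral becomes a finite sum, and $X = \sum_k \gamma_k^{1/\alpha} Z_k s_k$ with $Z_k$ i.i.d. $S_\alpha(1)$ has exactly this characteristic function. The sketch checks this for $\alpha = 1$ (standard Cauchy $Z_k$, drawn directly with NumPy) with hypothetical atoms and masses on the unit circle ($d = 2$).

```python
import numpy as np

# Hypothetical discrete spectral measure on S^1 (d = 2): atoms s_k with masses gamma_k.
angles = np.array([0.0, np.pi / 3, 3 * np.pi / 4])
atoms = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # shape (3, 2), unit vectors
masses = np.array([0.5, 1.0, 0.25])
alpha = 1.0  # Cauchy case, so Z_k ~ S_1(1) is numpy's standard Cauchy

def cf_from_spectral_measure(t):
    """exp(-sum_k gamma_k |<t, s_k>|^alpha): the theorem's formula for discrete Gamma."""
    return np.exp(-np.sum(masses * np.abs(atoms @ t) ** alpha))

rng = np.random.default_rng(0)
z = rng.standard_cauchy((200_000, len(masses)))              # i.i.d. Z_k
x = (z * masses ** (1.0 / alpha)) @ atoms                    # X = sum_k gamma_k^(1/alpha) Z_k s_k

for t in (np.array([1.0, 0.0]), np.array([0.5, -0.8])):
    empirical = np.mean(np.cos(x @ t))                       # X symmetric: imaginary part ~ 0
    print(t, empirical, cf_from_spectral_measure(t))
```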

Preliminary result

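Lemma. Let $v \sim S_\alpha(\sigma)$ and let $h$ be independent of $v$ with $E|h|^\alpha < \infty$. If $y = hv$ and $\{y_i\}$ are independent copies of $y$, then

$$\frac{1}{n^{1/\alpha}}\sum_{i=1}^{n} y_i \xrightarrow{d} X,$$

where $X$ is α-stable with characteristic function

$$\Phi(t) = \exp\left\{-|\sigma t|^\alpha\, E|h|^\alpha\right\}.$$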
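The lemma can be checked numerically in the Cauchy case $\alpha = 1$, $\sigma = 1$, with a hypothetical $h$ uniform on $[-1, 1]$, for which $E|h|^\alpha = 1/(\alpha + 1) = 1/2$ and the predicted limit is $\Phi(t) = e^{-|t|/2}$. Sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma, n, reps = 1.0, 1.0, 200, 10_000

v = sigma * rng.standard_cauchy((reps, n))     # v ~ S_1(sigma): scaled standard Cauchy
h = rng.uniform(-1.0, 1.0, (reps, n))          # hypothetical h, independent of v
y = h * v
sums = y.sum(axis=1) / n ** (1.0 / alpha)      # n^(-1/alpha) * sum_i y_i, one per replicate

mean_abs_h_alpha = 1.0 / (alpha + 1.0)         # E|h|^alpha for h ~ Uniform[-1, 1]
for t in (0.5, 1.0, 2.0):
    predicted = np.exp(-abs(sigma * t) ** alpha * mean_abs_h_alpha)
    print(t, np.mean(np.cos(t * sums)), predicted)
```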

Stable prior

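Proposition. Let the output weights of the neural network be i.i.d. $v_j \sim S_\alpha(\sigma)$. Then

$$f_n(x) = \frac{1}{n^{1/\alpha}}\sum_{j=1}^{n} v_j h_j(x) \xrightarrow{d} f(x),$$

where $f(x)$ is a symmetric α-stable process. The finite-dimensional distribution of $(f(x_1), \dots, f(x_d))$ has characteristic function

$$\Psi(t) = \exp\left(-\sigma^\alpha E_h |\langle t, h\rangle|^\alpha\right),$$

where $t \in \mathbb{R}^d$ and $h = (h(x_1; u), \dots, h(x_d; u))$.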
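For step units $h(x) = \operatorname{sgn}(a + ux)$ with $a, u$ i.i.d. standard Gaussian (unit variances assumed), $\Psi$ has a closed form at two inputs: $h(x_1)$ and $h(x_2)$ are both $\pm 1$ and disagree with probability $\theta/\pi$, where $\theta$ is the angle between $(1, x_1)$ and $(1, x_2)$, so $E_h|\langle t, h\rangle|^\alpha = (1 - \theta/\pi)|t_1 + t_2|^\alpha + (\theta/\pi)|t_1 - t_2|^\alpha$. The sketch compares this with simulated networks whose output weights are Cauchy ($\alpha = 1$); the inputs and sizes are arbitrary.

```python
import numpy as np

def psi_step_units(t, x1, x2, alpha, sigma):
    """Psi(t) = exp(-sigma^alpha E|<t, h>|^alpha) for h = (sgn(a+u x1), sgn(a+u x2)),
    with a, u i.i.d. standard Gaussian (assumed)."""
    w1, w2 = np.array([1.0, x1]), np.array([1.0, x2])
    cos_theta = w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))
    p_diff = np.arccos(np.clip(cos_theta, -1.0, 1.0)) / np.pi   # P(h(x1) != h(x2))
    expect = (1 - p_diff) * abs(t[0] + t[1]) ** alpha + p_diff * abs(t[0] - t[1]) ** alpha
    return np.exp(-sigma ** alpha * expect)

rng = np.random.default_rng(0)
alpha, sigma, n, reps = 1.0, 1.0, 200, 10_000
x1, x2 = -0.5, 1.0

a = rng.standard_normal((reps, n))
u = rng.standard_normal((reps, n))
v = sigma * rng.standard_cauchy((reps, n))                   # v_j ~ S_1(sigma)
f1 = (v * np.sign(a + u * x1)).sum(axis=1) / n ** (1 / alpha)
f2 = (v * np.sign(a + u * x2)).sum(axis=1) / n ** (1 / alpha)

for t in (np.array([1.0, 0.5]), np.array([0.7, -0.7])):
    empirical = np.mean(np.cos(t[0] * f1 + t[1] * f2))
    print(t, empirical, psi_step_units(t, x1, x2, alpha, sigma))
```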

Normal domain of attraction

The normal domain of attraction of index α contains the distributions whose densities decay like $|x|^{-(\alpha+1)}$ as $|x| \to \infty$, for $0 < \alpha < 2$.

The previous proposition still holds if the output weights $v_j$ are i.i.d. random variables in the normal domain of attraction of index α.


Local Brownian motion

Let $h(x) = \operatorname{sgn}(a + ux)$, the step function, where $a$ and $u$ are independent zero-mean Gaussians. Then each sample path is flat far from the origin:

$$\lim_{x \to +\infty} f(x) = -\lim_{x \to -\infty} f(x) = \text{constant}.$$

In the central region, Neal showed that this prior gives rise to a local Brownian motion.
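The flat tails are easy to see in a finite network. The sketch assumes standard Gaussian $a_j$, $u_j$ and $v_j$ with $s_n = \sqrt{n}$ (hypothetical choices). Each unit $\operatorname{sgn}(a_j + u_j x)$ switches sign only at $x = -a_j/u_j$, so beyond the outermost switch point $f_n(x)$ equals $c = n^{-1/2}\sum_j v_j \operatorname{sgn}(u_j)$ on the right and $-c$ on the left.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a, u = rng.standard_normal(n), rng.standard_normal(n)   # hidden weights (unit variance assumed)
v = rng.standard_normal(n)                              # Gaussian output weights (Brownian case)

def f(x):
    """f_n(x) = n^(-1/2) sum_j v_j sgn(a_j + u_j x) at an array of inputs x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return v @ np.sign(a[:, None] + u[:, None] * x[None, :]) / np.sqrt(n)

# Beyond every switch point -a_j/u_j, sgn(a_j + u_j x) = sgn(u_j) * sgn(x).
c = v @ np.sign(u) / np.sqrt(n)
x_far = 2 * np.max(np.abs(a / u)) + 1.0
print(f([x_far, 10 * x_far]), c)
print(f([-x_far, -10 * x_far]), -c)

grid = np.linspace(-2.0, 2.0, 401)   # central region: many jumps, Brownian-like path
path = f(grid)
```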

Symmetric α-stable Lévy motion

Symmetric α-stable priors give rise to a symmetric α-stable Lévy motion, i.e. a process $\{w_t;\ t \in \mathbb{R}\}$ satisfying:
• $w_0 = 0$ almost surely;
• independent increments $w_t - w_s$, with $s < t$;
• $w_t - w_s \sim S_\alpha(|t - s|^{1/\alpha})$.
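A sketch of simulating symmetric α-stable Lévy motion on a grid: by stability, a sum of $k$ independent $S_\alpha(\Delta t^{1/\alpha})$ increments is $S_\alpha((k\Delta t)^{1/\alpha})$, matching $w_t - w_s \sim S_\alpha(|t-s|^{1/\alpha})$ on grid points. The Chambers-Mallows-Stuck sampler (a standard method, not from the slides) is repeated so the block stands alone; the step count and values of α are illustrative assumptions.

```python
import numpy as np

def sample_sas(alpha, sigma, size, rng):
    """Symmetric alpha-stable draws (Chambers-Mallows-Stuck, beta = 0),
    characteristic function exp(-sigma**alpha * |t|**alpha)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return sigma * np.tan(v)
    return sigma * (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
                    * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

def levy_motion(alpha, num_steps, dt, rng):
    """Path w_0, w_dt, ..., w_{num_steps*dt} of symmetric alpha-stable Levy motion."""
    increments = sample_sas(alpha, dt ** (1.0 / alpha), num_steps, rng)  # S_alpha(dt^(1/alpha))
    return np.concatenate([[0.0], np.cumsum(increments)])               # w_0 = 0

rng = np.random.default_rng(0)
brownian_like = levy_motion(2.0, 1000, 1.0, rng)  # alpha = 2: Brownian motion, variance 2 per unit time
levy = levy_motion(1.2, 1000, 1.0, rng)           # alpha < 2: paths dominated by occasional large jumps
```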

Brownian vs Lévy processes

[Figure: four sample paths plotted over 1000 time steps, comparing Brownian motion with symmetric α-stable Lévy motion.]

Limits with non-i.i.d. priors

What conditions on independent, but not necessarily identically distributed, priors $v_j$ guarantee convergence to a Gaussian process?

An easy condition to verify is given by the following corollary, which is based on the Lindeberg-Feller theorem.

Corollary. If the output weights $\{v_j\}$ are a uniformly bounded sequence of independent variables and $\lim_{n\to\infty} s_n = \infty$, then $f_n(x)$ converges to a Gaussian process.
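A numerical illustration of the corollary, under hypothetical assumptions not taken from the slides: the $v_j$ are uniform on $[-c_j, c_j]$ with varying $c_j \le 2$ (independent, uniformly bounded, not identically distributed), hidden units are $\operatorname{sgn}(a + ux)$ with standard Gaussian $a, u$, and $s_n^2 = \sum_j \operatorname{Var}(v_j)$, which grows without bound. Since $h^2 = 1$, the variance of $f_n(x)$ is exactly 1, and a Gaussian limit also predicts kurtosis near 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, x = 200, 10_000, 0.7

# Hypothetical bounded, non-identical weights: v_j ~ Uniform[-c_j, c_j], c_j in 0.5..2.0.
c = np.linspace(0.5, 2.0, 5)[np.arange(n) % 5]
s_n = np.sqrt(np.sum(c ** 2) / 3.0)          # assumed: s_n^2 = sum_j Var(v_j), grows like n

v = rng.uniform(-c, c, size=(reps, n))       # independent, not identically distributed
a = rng.standard_normal((reps, n))           # hidden weights (assumed standard Gaussian)
u = rng.standard_normal((reps, n))
f = (v * np.sign(a + u * x)).sum(axis=1) / s_n   # f_n(x), one value per network

centered = f - f.mean()
variance = np.mean(centered ** 2)
kurtosis = np.mean(centered ** 4) / variance ** 2
print(variance, kurtosis)   # a Gaussian limit predicts about 1 and 3 here
```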


Learning with stable processes

Instead of using a neural network, perform Gaussian process regression.

Limitation: Gaussian processes are not as rich as finite neural networks.

Regression problem: $y(x) = u(x) + \varepsilon$; estimate $u(x)$ from the observations $y(x_i)$, where $\varepsilon$ is independent of $u(x)$.

Generalization of Gaussian process regression: place an α-stable process prior on $y(x)$ and take $\varepsilon$ to be i.i.d. α-stable noise.
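A sketch of the Gaussian baseline ($\alpha = 2$) only. It assumes output weights with $\operatorname{Var}(v_j) = 1$, $s_n = \sqrt{n}$, and step units $\operatorname{sgn}(a + ux)$ with $a, u$ i.i.d. standard Gaussian; the limiting network covariance is then $K(x, x') = \frac{2}{\pi}\arcsin\frac{1 + xx'}{\sqrt{(1+x^2)(1+x'^2)}}$, and the posterior mean at test inputs is the usual $K_*^\top (K + \sigma_\varepsilon^2 I)^{-1} y$. The training data and noise level are made up.

```python
import numpy as np

def step_network_kernel(x1, x2):
    """Covariance of the GP limit of n^(-1/2) sum_j v_j sgn(a_j + u_j x), assuming
    Var(v_j) = 1 and a_j, u_j i.i.d. standard Gaussian:
    K(x, x') = (2/pi) * arcsin(cos of the angle between (1, x) and (1, x'))."""
    x1, x2 = np.asarray(x1, float)[:, None], np.asarray(x2, float)[None, :]
    cos_angle = (1 + x1 * x2) / np.sqrt((1 + x1 ** 2) * (1 + x2 ** 2))
    return (2 / np.pi) * np.arcsin(np.clip(cos_angle, -1.0, 1.0))

def gp_posterior_mean(x_train, y_train, x_test, noise_var):
    """Standard GP regression mean: K_*^T (K + noise_var I)^(-1) y."""
    k = step_network_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = step_network_kernel(x_train, x_test)
    return k_star.T @ np.linalg.solve(k, y_train)

x_train = np.array([-2.0, -1.0, 0.0, 0.5, 1.5])
y_train = np.array([0.3, -0.2, 0.1, 0.8, 0.4])       # made-up observations y(x_i)
x_test = np.linspace(-3.0, 3.0, 13)
print(gp_posterior_mean(x_train, y_train, x_test, noise_var=0.01))
```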


Classes of stable processes

Symmetric α-stable processes with discrete spectral measure, built from:
• $\{v_j\}_j$: an i.i.d. α-stable sequence
• $\mu(x)$: a mean function
• $h(x, v)$: a bivariate filter function that introduces dependency

Sub-Gaussian processes are α-stable processes of the form $u(x) = A^{1/2} G(x)$, where:
• $A$ is a totally right-skewed α/2-stable variable
• $G$ is a Gaussian process, independent of $A$, with mean zero and covariance $K$
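A sketch of sampling a sub-Gaussian process. $A$ is drawn with Kanter's representation of a positive (α/2)-stable law with Laplace transform $E[e^{-sA}] = e^{-s^{\alpha/2}}$ (one convenient scale, assumed here), independently of $G$. Then $E[e^{itu(x)}] = E[e^{-A t^2 K(x,x)/2}] = e^{-(t^2 K(x,x)/2)^{\alpha/2}}$, so with a hypothetical squared-exponential kernel scaled to $K(x,x) = 2$ every marginal is $S_\alpha(1)$.

```python
import numpy as np

def positive_stable(beta, size, rng):
    """Totally right-skewed beta-stable draws (0 < beta < 1) with Laplace transform
    E[exp(-s A)] = exp(-s**beta), via Kanter's representation."""
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(beta * u) / np.sin(u) ** (1.0 / beta)
            * (np.sin((1.0 - beta) * u) / e) ** ((1.0 - beta) / beta))

def sub_gaussian_process(x, alpha, kernel, num_paths, rng):
    """u(x) = A^(1/2) G(x): G ~ GP(0, K), A independent and (alpha/2)-stable."""
    cov = kernel(x[:, None], x[None, :])
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(x)))     # small jitter for stability
    g = rng.standard_normal((num_paths, len(x))) @ chol.T       # rows are draws of G
    a = positive_stable(alpha / 2.0, num_paths, rng)[:, None]   # one A per path
    return np.sqrt(a) * g

def kernel(x1, x2):
    """Hypothetical squared-exponential covariance with K(x, x) = 2."""
    return 2.0 * np.exp(-0.5 * (x1 - x2) ** 2)

rng = np.random.default_rng(0)
alpha = 1.5
xs = np.linspace(0.0, 3.0, 4)
paths = sub_gaussian_process(xs, alpha, kernel, 100_000, rng)
# Each marginal u(x) should have characteristic function exp(-|t|^alpha).
for t in (0.3, 0.7, 1.5):
    print(t, np.mean(np.cos(t * paths[:, 0])), np.exp(-abs(t) ** alpha))
```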

