Gaussian Process Vine Copulas for Multivariate Dependence


TRANSCRIPT

Page 1:

Gaussian Process Vine Copulas for Multivariate Dependence

José Miguel Hernández-Lobato¹,²

joint work with David López-Paz²,³ and Zoubin Ghahramani¹

¹Department of Engineering, Cambridge University, Cambridge, UK
³Max Planck Institute for Intelligent Systems, Tübingen, Germany

April 29, 2013

²Both authors are equal contributors.

Page 2:

What is a Copula? Informal Definition

A copula is a function that links univariate marginal distributions into a joint multivariate one.

The copula specifies the dependencies among the random variables.

Page 3:

What is a Copula? Formal Definition

A copula is a distribution function with marginals uniform in $[0, 1]$.

Let $U_1, \dots, U_d$ be random variables uniformly distributed in $[0, 1]$ with copula $C$; then

$$C(u_1, \dots, u_d) = p(U_1 \le u_1, \dots, U_d \le u_d).$$

Sklar's theorem (connection between joints, marginals and copulas): any joint cdf $F(x_1, \dots, x_d)$ with marginal cdfs $F_1(x_1), \dots, F_d(x_d)$ satisfies

$$F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)),$$

where $C$ is the copula of $F$.

It is easy to show that the joint pdf $f$ can be written as

$$f(x_1, \dots, x_d) = c(F_1(x_1), \dots, F_d(x_d)) \prod_{i=1}^{d} f_i(x_i),$$

where $c(u_1, \dots, u_d)$ and $f_1(x_1), \dots, f_d(x_d)$ are the copula and marginal densities.

Page 4:

Why are Copulas Useful in Machine Learning?

The converse of Sklar's theorem is also true: given a copula $C : [0, 1]^d \to [0, 1]$ and margins $F_1(x_1), \dots, F_d(x_d)$, $C(F_1(x_1), \dots, F_d(x_d))$ represents a valid joint cdf.

Copulas are a powerful tool for the modeling of multivariate data: we can easily extend univariate models to the multivariate regime, and copulas simplify the estimation process for multivariate models (see the sketch after this list):

1 - Estimate the marginal distributions.
2 - Map the data to $[0, 1]^d$ using the estimated marginals.
3 - Estimate a copula function given the mapped data.

Learning the marginals: easily done using standard univariate methods.

Learning the copula: difficult; it requires copula models that i) can represent a broad range of dependencies and ii) are robust to overfitting.
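A minimal Python sketch of the three-step pipeline above (not the paper's code: it pairs the empirical-cdf rank transform with a simple Gaussian copula, and the toy data are invented for illustration):

```python
import numpy as np
from scipy import stats

def ecdf_transform(X):
    """Step 2: map each column of X into (0, 1) with its empirical cdf (rank transform)."""
    n = X.shape[0]
    # dividing ranks by n + 1 keeps values strictly inside (0, 1)
    return np.column_stack([stats.rankdata(X[:, j]) / (n + 1) for j in range(X.shape[1])])

def fit_gaussian_copula(U):
    """Step 3, for one simple parametric family: fit a Gaussian copula to data on (0, 1)^d."""
    Z = stats.norm.ppf(U)                  # transform to standard-normal scores
    return np.corrcoef(Z, rowvar=False)    # correlation matrix parameterizes the copula

# Toy data: exponential marginals linked by Gaussian dependence.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X = stats.expon.ppf(stats.norm.cdf(Z))
U = ecdf_transform(X)                      # step 2
print(fit_gaussian_copula(U))              # step 3
```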


Page 5:

Parametric Copula Models

There are many parametric 2D copulas; several examples are shown as density plots on the slide.

These usually depend on a single scalar parameter $\theta$ which is in a one-to-one relationship with Kendall's tau rank correlation coefficient (illustrated in the sketch below), defined as

$$\tau = p[(U_1 - U_1')(U_2 - U_2') > 0] - p[(U_1 - U_1')(U_2 - U_2') < 0] = p[\text{concordance}] - p[\text{discordance}],$$

where $(U_1, U_2)$ and $(U_1', U_2')$ are independent samples from the copula.
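For illustration, a short sketch of the $\theta \leftrightarrow \tau$ link (the closed-form relations for the Gaussian, Clayton and Gumbel families are standard results; the helper name is ours):

```python
import numpy as np
from scipy import stats

# Well-known closed-form links between the family parameter and Kendall's tau:
#   Gaussian: tau = (2 / pi) * arcsin(rho)  =>  rho   = sin(pi * tau / 2)
#   Clayton:  tau = theta / (theta + 2)     =>  theta = 2 * tau / (1 - tau)
#   Gumbel:   tau = 1 - 1 / theta           =>  theta = 1 / (1 - tau)

def gaussian_rho_from_tau(tau):
    """Invert tau = (2 / pi) * arcsin(rho) for the Gaussian copula."""
    return np.sin(np.pi * tau / 2.0)

# Empirical check: sample a Gaussian copula matched to tau = 0.5.
rho = gaussian_rho_from_tau(0.5)
rng = np.random.default_rng(1)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=2000)
U = stats.norm.cdf(Z)
tau_hat, _ = stats.kendalltau(U[:, 0], U[:, 1])
print(f"target tau = 0.5, empirical tau = {tau_hat:.3f}")
```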

However, in higher dimensions, the number and expressiveness of parametric copulas is more limited.

Page 6:

Vine Copulas

Vines are hierarchical graphical models that factorize $c(u_1, \dots, u_d)$ into a product of $d(d-1)/2$ bivariate conditional copula densities.

We can factorize $c(u_1, u_2, u_3)$ using the product rule of probability as

$$c(u_1, u_2, u_3) = f_{3|12}(u_3 \mid u_1, u_2)\, f_{2|1}(u_2 \mid u_1)$$

(the remaining factor $f_1(u_1)$ equals 1 because the marginals are uniform), and we can express each factor in terms of bivariate copula functions.
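The expansion itself appeared as a figure on the following slide; what follows is a standard reconstruction (not transcribed from the slides), using the fact that the marginals are uniform:

$$f_{2|1}(u_2 \mid u_1) = c_{12}(u_1, u_2), \qquad f_{3|12}(u_3 \mid u_1, u_2) = c_{13|2}\big(F_{1|2}(u_1 \mid u_2),\, F_{3|2}(u_3 \mid u_2)\big)\, c_{23}(u_2, u_3),$$

so that

$$c(u_1, u_2, u_3) = c_{12}(u_1, u_2)\, c_{23}(u_2, u_3)\, c_{13|2}\big(F_{1|2}(u_1 \mid u_2),\, F_{3|2}(u_3 \mid u_2)\big).$$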


Page 7:

Page 8:

Regular Vines

A regular vine specifies a factorization of $c(u_1, \dots, u_d)$.

It is formed by $d - 1$ trees $T_1, \dots, T_{d-1}$ with node and edge sets $V_i$ and $E_i$.

Each edge $e$ in any tree has three associated sets of variables $C(e), D(e), N(e) \subseteq \{1, \dots, d\}$, called the conditioned, conditioning and constraint sets.

$V_1 = \{1, \dots, d\}$ and $E_1$ forms a spanning tree over a complete graph $G_1$ over $V_1$. For any $e \in E_1$, $C(e) = N(e) = e$ and $D(e) = \emptyset$.

For $i > 1$, $V_i = E_{i-1}$ and $E_i$ forms a spanning tree over a graph $G_i$ with nodes $V_i$ and edges $e = \{e_1, e_2\}$ such that $e_1, e_2 \in E_{i-1}$ and $e_1 \cap e_2 \neq \emptyset$ (the two edges must share a node).

For any $e = \{e_1, e_2\} \in E_i$, $i > 1$, we have that $C(e) = N(e_1) \,\Delta\, N(e_2)$, $D(e) = N(e_1) \cap N(e_2)$ and $N(e) = N(e_1) \cup N(e_2)$, where $\Delta$ denotes the symmetric difference.

The copula density then factorizes as

$$c(u_1, \dots, u_d) = \prod_{i=1}^{d-1} \prod_{e \in E_i} c_{C(e)|D(e)}.$$


Page 9:

Example of a Regular Vine
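The example figure is not reproduced in the transcript; as a small substitute consistent with the definitions on the previous page, consider a three-variable vine (a D-vine):

$T_1$: $V_1 = \{1, 2, 3\}$ with $E_1 = \{\{1,2\}, \{2,3\}\}$, so both edges $e \in E_1$ have $C(e) = N(e) = e$ and $D(e) = \emptyset$.

$T_2$: $V_2 = E_1$ with the single edge $e = \{\{1,2\}, \{2,3\}\}$, giving $C(e) = \{1,2\} \,\Delta\, \{2,3\} = \{1,3\}$, $D(e) = \{2\}$ and $N(e) = \{1,2,3\}$.

The factorization is then $c(u_1, u_2, u_3) = c_{12}\, c_{23}\, c_{13|2}$, matching the three-variable example from the earlier slide.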


Page 10:

Using Regular Vines in Practice

Selecting a particular factorization:

There are many possible factorizations, each determined by the specific choice of spanning trees $T_1, \dots, T_{d-1}$.

In practice, each tree $T_i$ is chosen by assigning a weight to each edge in $G_i$ and then selecting the corresponding maximum spanning tree.

The weight for the edge $e$ is usually related to the level of dependence between the variables in $C(e)$ (often measured in terms of Kendall's tau), as in the sketch below.

It is common to prune the vine and consider only the first few trees.
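A rough Python sketch of this selection step for the first tree $T_1$ (our own illustration, not the paper's code, using absolute empirical Kendall's tau as the edge weight; the function name is made up):

```python
import numpy as np
from scipy import stats
from scipy.sparse.csgraph import minimum_spanning_tree

def first_vine_tree(U):
    """Pick T1: the maximum spanning tree over all variable pairs,
    weighted by the absolute empirical Kendall's tau."""
    d = U.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(U[:, i], U[:, j])
            W[i, j] = abs(tau)
    # SciPy offers minimum spanning trees, so negate the weights
    # (exactly-zero weights would be treated as missing edges; ignored here).
    mst = minimum_spanning_tree(-W)
    return [tuple(edge) for edge in zip(*mst.nonzero())]

rng = np.random.default_rng(2)
U = stats.norm.cdf(rng.multivariate_normal(np.zeros(4), np.eye(4) * 0.5 + 0.5, size=300))
print(first_vine_tree(U))  # three edges spanning the four variables, up to tie-breaking
```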

Dealing with conditional bivariate copulas:

Use the simplifying assumption: $c_{C(e)|D(e)}$ does not depend on $D(e)$.

Our main contribution: avoid making use of the simplifying assumption.


Page 11:

A Semi-parametric Model for Conditional Copulas

We describe $c_{C(e)|D(e)}$ using a parametric model specified in terms of Kendall's tau $\tau \in [-1, 1]$.

Let $\mathbf{z}$ be a vector with the values of the variables in $D(e)$.

Then we assume $\tau = \sigma[f(\mathbf{z})]$, where $f$ is an arbitrary non-linear function and $\sigma(x) = 2\Phi(x) - 1$ is a sigmoid function.


Page 12:

Bayesian Inference on $f$

We are given a sample $D_{UV} = \{U_i, V_i\}_{i=1}^{n}$ from $C_{C(e)|D(e)}$, with corresponding values for the variables in $D(e)$ given by $D_{\mathbf{z}} = \{\mathbf{z}_i\}_{i=1}^{n}$.

We want to identify the value of $f$ that was used to generate the data.

We assume that $f$ follows a priori a Gaussian process; the sketch below shows what such a prior implies for $\tau(\mathbf{z})$.
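A brief illustration of this prior (our own sketch, with a unit-scale squared-exponential covariance standing in for the kernel specified later):

```python
import numpy as np
from scipy.stats import norm

# Draw latent functions f from a GP prior over a grid of conditioning values z,
# then push them through the link tau = sigma(f) = 2 * Phi(f) - 1 from the
# previous slide; each draw is one hypothesis for how tau varies with z.
z = np.linspace(-6.0, 6.0, 200)
K = np.exp(-0.5 * (z[:, None] - z[None, :]) ** 2)   # squared-exponential covariance
L = np.linalg.cholesky(K + 1e-8 * np.eye(z.size))   # jitter for numerical stability
rng = np.random.default_rng(3)
f_draws = L @ rng.standard_normal((z.size, 3))      # three independent prior draws of f
tau_draws = 2.0 * norm.cdf(f_draws) - 1.0           # corresponding tau(z) curves in (-1, 1)
```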


Page 13:

Posterior and Predictive Distributions

The posterior distribution for $\mathbf{f} = (f_1, \dots, f_n)^\text{T}$, where $f_i = f(\mathbf{z}_i)$, is

$$p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}}) = \frac{\left[\prod_{i=1}^{n} c(U_i, V_i \mid \tau = \sigma[f_i])\right] p(\mathbf{f} \mid D_{\mathbf{z}})}{p(D_{UV} \mid D_{\mathbf{z}})},$$

where $p(\mathbf{f} \mid D_{\mathbf{z}}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}_0, \mathbf{K})$ is the Gaussian process prior on $\mathbf{f}$.

Given $\mathbf{z}_{n+1}$, the predictive distribution for $U_{n+1}$ and $V_{n+1}$ is

$$p(u_{n+1}, v_{n+1} \mid \mathbf{z}_{n+1}, D_{UV}, D_{\mathbf{z}}) = \int c(u_{n+1}, v_{n+1} \mid \tau = \sigma[f_{n+1}])\, p(f_{n+1} \mid \mathbf{f}, \mathbf{z}_{n+1}, D_{\mathbf{z}})\, p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}})\, df_{n+1}\, d\mathbf{f}.$$

For efficient approximate inference, we use Expectation Propagation.


Page 14:

Expectation Propagation

EP approximates $p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}})$ by $Q(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}, \mathbf{V})$, in which each exact likelihood factor $q_i(f_i) = c(U_i, V_i \mid \tau = \sigma[f_i])$ is replaced by an approximate Gaussian factor $\tilde{q}_i(f_i)$ with parameters $\tilde{m}_i$ and $\tilde{v}_i$.

EP tunes $\tilde{m}_i$ and $\tilde{v}_i$ by minimizing $\text{KL}\big[\, q_i(f_i)\, Q(\mathbf{f})\, [\tilde{q}_i(f_i)]^{-1} \,\big\|\, Q(\mathbf{f}) \,\big]$. We use numerical integration methods for this task.

Kernel parameters are fixed by maximizing the EP approximation of $p(D_{UV} \mid D_{\mathbf{z}})$.

The total cost is $\mathcal{O}(n^3)$.


Page 15:

Implementation Details

We choose the following covariance function for the GP prior:

$$\text{Cov}[f(\mathbf{z}_i), f(\mathbf{z}_j)] = \sigma \exp\left[-(\mathbf{z}_i - \mathbf{z}_j)^\text{T} \text{diag}(\boldsymbol{\lambda}) (\mathbf{z}_i - \mathbf{z}_j)\right] + \sigma_0.$$

The mean of the GP prior is constant and equal to $\Phi^{-1}((\tau_{\text{MLE}} + 1)/2)$, where $\tau_{\text{MLE}}$ is the MLE of $\tau$ for an unconditional Gaussian copula.

We use the FITC approximation (sketched in code below):

$\mathbf{K}$ is approximated by $\tilde{\mathbf{K}} = \mathbf{Q} + \text{diag}(\mathbf{K} - \mathbf{Q})$, where $\mathbf{Q} = \mathbf{K}_{n n_0} \mathbf{K}_{n_0 n_0}^{-1} \mathbf{K}_{n n_0}^\text{T}$.

$\mathbf{K}_{n_0 n_0}$ is the $n_0 \times n_0$ covariance matrix for $n_0 \ll n$ pseudo-inputs.

$\mathbf{K}_{n n_0}$ contains the covariances between training points and pseudo-inputs.

The cost of EP is now $\mathcal{O}(n n_0^2)$. We choose $n_0 = 20$.
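A compact sketch of this covariance function and the FITC approximation as stated above (our own code, not the authors'; hyperparameter values are illustrative and jitter is added for numerical stability):

```python
import numpy as np

def cov(Z1, Z2, sigma=1.0, lam=None, sigma0=0.1):
    """Covariance from this slide: sigma * exp(-(zi - zj)^T diag(lam) (zi - zj)) + sigma0."""
    lam = np.ones(Z1.shape[1]) if lam is None else lam
    d2 = (((Z1[:, None, :] - Z2[None, :, :]) ** 2) * lam).sum(axis=-1)
    return sigma * np.exp(-d2) + sigma0

def fitc_cov(Z, Z0, **kw):
    """FITC: K_tilde = Q + diag(K - Q), with Q = K_nn0 K_n0n0^{-1} K_nn0^T."""
    Knn0 = cov(Z, Z0, **kw)
    Kn0n0 = cov(Z0, Z0, **kw) + 1e-8 * np.eye(len(Z0))  # jitter
    Q = Knn0 @ np.linalg.solve(Kn0n0, Knn0.T)
    K = cov(Z, Z, **kw)
    return Q + np.diag(np.diag(K - Q))

rng = np.random.default_rng(4)
Z = rng.uniform(-6, 6, size=(200, 1))                # n = 200 training inputs
Z0 = Z[rng.choice(len(Z), size=20, replace=False)]   # n0 = 20 pseudo-inputs
K_tilde = fitc_cov(Z, Z0)
```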

The predictive distribution is approximated using sampling.


Page 16:

Experiments I

We compare the proposed method GPVINE with two baselines:

1 - SVINE, based on the simplifying assumption.
2 - MLLVINE, based on the maximization of the local likelihood. It can only capture dependencies on a single random variable, and is limited to regular vines with at most two trees.

All the data are mapped to $[0, 1]^d$ using the ecdfs.

Synthetic data: $Z$ uniform in $[-6, 6]$ and $(U, V)$ Gaussian with correlation $\frac{3}{4}\sin(Z)$. Data set of size 50.
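A generator for this synthetic setup might look as follows (a sketch; the seed and the conditional-Gaussian sampling scheme are our choices):

```python
import numpy as np
from scipy.stats import norm

# Z uniform on [-6, 6]; (U, V) on the copula scale with Gaussian dependence
# whose correlation is (3/4) * sin(Z), as described above.
rng = np.random.default_rng(5)
n = 50
Zc = rng.uniform(-6.0, 6.0, size=n)            # conditioning variable
rho = 0.75 * np.sin(Zc)                        # correlation varies with Z
e1 = rng.standard_normal(n)
e2 = rho * e1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
U, V = norm.cdf(e1), norm.cdf(e2)              # map Gaussian samples to [0, 1]
```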


Page 17:

Experiments II

Real-world data: UCI datasets, meteorological data, mineral concentrations and financial data.

The data are split into training and test sets (50 random splits), each containing half of the data.

Average test log-likelihood when limited to two trees in the vine: [results table not reproduced in the transcript]


Page 18:

Results for More than Two Trees

[figure not reproduced in the transcript]


Page 19:

Conditional Dependencies in Weather Data

Conditional Kendall's tau for atmospheric pressure and cloud cover percentage, conditioned on latitude and longitude near Barcelona on 11/19/2012 at 8pm.


Page 20:

Summary and Conclusions

Vine copulas are flexible models for multivariate dependencies which specify a factorization of the copula density into a product of conditional bivariate copulas.

In practical implementations of vines, some of the conditional dependencies in the bivariate copulas are usually ignored.

To avoid this, we have proposed a method for the estimation of fully conditional vines using Gaussian processes (GPVINE).

GPVINE outperforms a baseline that ignores conditional dependencies (SVINE) and other alternatives based on maximum local-likelihood methods (MLLVINE).


Page 21:

References

López-Paz, D., Hernández-Lobato, J. M., and Ghahramani, Z. Gaussian process vine copulas for multivariate dependence. International Conference on Machine Learning (ICML 2013).

Acar, E. F., Craiu, R. V., and Yao, F. Dependence calibration in conditional copulas: A nonparametric approach. Biometrics, 67(2):445-453, 2011.

Bedford, T. and Cooke, R. M. Vines - a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031-1068, 2002.

Minka, T. P. Expectation propagation for approximate Bayesian inference. Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362-369, 2001.

Naish-Guzman, A. and Holden, S. B. The generalized FITC approximation. Advances in Neural Information Processing Systems 20, 2007.

Patton, A. J. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527-556, 2006.


Page 22:

Thank you for your attention!
