Gaussian Process Vine Copulas for Multivariate Dependence


TRANSCRIPT

Page 1:

Gaussian Process Vine Copulas for Multivariate Dependence

José Miguel Hernández-Lobato¹,²

joint work with David López-Paz²,³ and Zoubin Ghahramani¹

¹Department of Engineering, Cambridge University, Cambridge, UK
³Max Planck Institute for Intelligent Systems, Tübingen, Germany

April 29, 2013

²Both authors are equal contributors.

Page 2:

What is a Copula? Informal Definition

A copula is a function that links univariate marginal distributions into a joint multivariate one.

The copula specifies the dependencies among the random variables.

Page 3:

What is a Copula? Formal Definition

A copula is a distribution function with marginals uniform in $[0, 1]$.

Let $U_1, \dots, U_d$ be random variables uniformly distributed in $[0, 1]$ with copula $C$; then

$$C(u_1, \dots, u_d) = p(U_1 \le u_1, \dots, U_d \le u_d).$$

Sklar's theorem (connection between joints, marginals and copulas): any joint cdf $F(x_1, \dots, x_d)$ with marginal cdfs $F_1(x_1), \dots, F_d(x_d)$ satisfies

$$F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)),$$

where $C$ is the copula of $F$.

It is easy to show that the joint pdf $f$ can be written as

$$f(x_1, \dots, x_d) = c(F_1(x_1), \dots, F_d(x_d)) \prod_{i=1}^{d} f_i(x_i),$$

where $c(u_1, \dots, u_d)$ and $f_1(x_1), \dots, f_d(x_d)$ are the copula and marginal densities.

Page 4:

Why are Copulas Useful in Machine Learning?

The converse of Sklar's theorem is also true: given a copula $C : [0, 1]^d \to [0, 1]$ and margins $F_1(x_1), \dots, F_d(x_d)$, $C(F_1(x_1), \dots, F_d(x_d))$ represents a valid joint cdf.

Copulas are a powerful tool for the modeling of multivariate data: we can easily extend univariate models to the multivariate regime, and copulas simplify the estimation process for multivariate models (see the sketch after this list):

1 - Estimate the marginal distributions.
2 - Map the data to $[0, 1]^d$ using the estimated marginals.
3 - Estimate a copula function given the mapped data.

Learning the marginals: easily done using standard univariate methods.

Learning the copula: difficult; it requires copula models that i) can represent a broad range of dependencies and ii) are robust to overfitting.
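A minimal Python sketch of the three-step pipeline above (not the paper's code: it pairs the empirical-cdf rank transform with a simple Gaussian copula, and the toy data are invented for illustration):

```python
import numpy as np
from scipy import stats

def ecdf_transform(X):
    """Step 2: map each column of X into (0, 1) with its empirical cdf (rank transform)."""
    n = X.shape[0]
    # dividing ranks by n + 1 keeps values strictly inside (0, 1)
    return np.column_stack([stats.rankdata(X[:, j]) / (n + 1) for j in range(X.shape[1])])

def fit_gaussian_copula(U):
    """Step 3, for one simple parametric family: fit a Gaussian copula to data on (0, 1)^d."""
    Z = stats.norm.ppf(U)                  # transform to standard-normal scores
    return np.corrcoef(Z, rowvar=False)    # correlation matrix parameterizes the copula

# Toy data: exponential marginals linked by Gaussian dependence.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X = stats.expon.ppf(stats.norm.cdf(Z))
U = ecdf_transform(X)                      # step 2
print(fit_gaussian_copula(U))              # step 3
```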


Page 5:

Parametric Copula Models

There are many parametric 2D copulas; several examples are shown as density plots on the slide.

These usually depend on a single scalar parameter $\theta$ which is in a one-to-one relationship with Kendall's tau rank correlation coefficient (illustrated in the sketch below), defined as

$$\tau = p[(U_1 - U_1')(U_2 - U_2') > 0] - p[(U_1 - U_1')(U_2 - U_2') < 0] = p[\text{concordance}] - p[\text{discordance}],$$

where $(U_1, U_2)$ and $(U_1', U_2')$ are independent samples from the copula.
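For illustration, a short sketch of the $\theta \leftrightarrow \tau$ link (the closed-form relations for the Gaussian, Clayton and Gumbel families are standard results; the helper name is ours):

```python
import numpy as np
from scipy import stats

# Well-known closed-form links between the family parameter and Kendall's tau:
#   Gaussian: tau = (2 / pi) * arcsin(rho)  =>  rho   = sin(pi * tau / 2)
#   Clayton:  tau = theta / (theta + 2)     =>  theta = 2 * tau / (1 - tau)
#   Gumbel:   tau = 1 - 1 / theta           =>  theta = 1 / (1 - tau)

def gaussian_rho_from_tau(tau):
    """Invert tau = (2 / pi) * arcsin(rho) for the Gaussian copula."""
    return np.sin(np.pi * tau / 2.0)

# Empirical check: sample a Gaussian copula matched to tau = 0.5.
rho = gaussian_rho_from_tau(0.5)
rng = np.random.default_rng(1)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=2000)
U = stats.norm.cdf(Z)
tau_hat, _ = stats.kendalltau(U[:, 0], U[:, 1])
print(f"target tau = 0.5, empirical tau = {tau_hat:.3f}")
```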

However, in higher dimensions, the number and expressiveness of parametric copulas is more limited.

Page 6:

Vine Copulas

Vines are hierarchical graphical models that factorize $c(u_1, \dots, u_d)$ into a product of $d(d-1)/2$ bivariate conditional copula densities.

We can factorize $c(u_1, u_2, u_3)$ using the product rule of probability as

$$c(u_1, u_2, u_3) = f_{3|12}(u_3 \mid u_1, u_2)\, f_{2|1}(u_2 \mid u_1)$$

(the remaining factor $f_1(u_1)$ equals 1 because the marginals are uniform), and we can express each factor in terms of bivariate copula functions.
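The expansion itself appeared as a figure on the following slide; what follows is a standard reconstruction (not transcribed from the slides), using the fact that the marginals are uniform:

$$f_{2|1}(u_2 \mid u_1) = c_{12}(u_1, u_2), \qquad f_{3|12}(u_3 \mid u_1, u_2) = c_{13|2}\big(F_{1|2}(u_1 \mid u_2),\, F_{3|2}(u_3 \mid u_2)\big)\, c_{23}(u_2, u_3),$$

so that

$$c(u_1, u_2, u_3) = c_{12}(u_1, u_2)\, c_{23}(u_2, u_3)\, c_{13|2}\big(F_{1|2}(u_1 \mid u_2),\, F_{3|2}(u_3 \mid u_2)\big).$$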


Page 7:

Page 8:

Regular Vines

A regular vine specifies a factorization of $c(u_1, \dots, u_d)$.

It is formed by $d - 1$ trees $T_1, \dots, T_{d-1}$ with node and edge sets $V_i$ and $E_i$.

Each edge $e$ in any tree has three associated sets of variables $C(e), D(e), N(e) \subseteq \{1, \dots, d\}$, called the conditioned, conditioning and constraint sets.

$V_1 = \{1, \dots, d\}$ and $E_1$ forms a spanning tree over a complete graph $G_1$ over $V_1$. For any $e \in E_1$, $C(e) = N(e) = e$ and $D(e) = \emptyset$.

For $i > 1$, $V_i = E_{i-1}$ and $E_i$ forms a spanning tree over a graph $G_i$ with nodes $V_i$ and edges $e = \{e_1, e_2\}$ such that $e_1, e_2 \in E_{i-1}$ and $e_1 \cap e_2 \neq \emptyset$ (the two edges must share a node).

For any $e = \{e_1, e_2\} \in E_i$, $i > 1$, we have that $C(e) = N(e_1) \,\Delta\, N(e_2)$, $D(e) = N(e_1) \cap N(e_2)$ and $N(e) = N(e_1) \cup N(e_2)$, where $\Delta$ denotes the symmetric difference.

The copula density then factorizes as

$$c(u_1, \dots, u_d) = \prod_{i=1}^{d-1} \prod_{e \in E_i} c_{C(e)|D(e)}.$$


Page 9:

Example of a Regular Vine
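The example figure is not reproduced in the transcript; as a small substitute consistent with the definitions on the previous page, consider a three-variable vine (a D-vine):

$T_1$: $V_1 = \{1, 2, 3\}$ with $E_1 = \{\{1,2\}, \{2,3\}\}$, so both edges $e \in E_1$ have $C(e) = N(e) = e$ and $D(e) = \emptyset$.

$T_2$: $V_2 = E_1$ with the single edge $e = \{\{1,2\}, \{2,3\}\}$, giving $C(e) = \{1,2\} \,\Delta\, \{2,3\} = \{1,3\}$, $D(e) = \{2\}$ and $N(e) = \{1,2,3\}$.

The factorization is then $c(u_1, u_2, u_3) = c_{12}\, c_{23}\, c_{13|2}$, matching the three-variable example from the earlier slide.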


Page 10:

Using Regular Vines in Practice

Selecting a particular factorization:

There are many possible factorizations, each determined by the specific choice of spanning trees $T_1, \dots, T_{d-1}$.

In practice, each tree $T_i$ is chosen by assigning a weight to each edge in $G_i$ and then selecting the corresponding maximum spanning tree.

The weight for the edge $e$ is usually related to the level of dependence between the variables in $C(e)$ (often measured in terms of Kendall's tau), as in the sketch below.

It is common to prune the vine and consider only the first few trees.
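A rough Python sketch of this selection step for the first tree $T_1$ (our own illustration, not the paper's code, using absolute empirical Kendall's tau as the edge weight; the function name is made up):

```python
import numpy as np
from scipy import stats
from scipy.sparse.csgraph import minimum_spanning_tree

def first_vine_tree(U):
    """Pick T1: the maximum spanning tree over all variable pairs,
    weighted by the absolute empirical Kendall's tau."""
    d = U.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(U[:, i], U[:, j])
            W[i, j] = abs(tau)
    # SciPy offers minimum spanning trees, so negate the weights
    # (exactly-zero weights would be treated as missing edges; ignored here).
    mst = minimum_spanning_tree(-W)
    return [tuple(edge) for edge in zip(*mst.nonzero())]

rng = np.random.default_rng(2)
U = stats.norm.cdf(rng.multivariate_normal(np.zeros(4), np.eye(4) * 0.5 + 0.5, size=300))
print(first_vine_tree(U))  # three edges spanning the four variables, up to tie-breaking
```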

Dealing with conditional bivariate copulas:

Use the simplifying assumption: $c_{C(e)|D(e)}$ does not depend on $D(e)$.

Our main contribution: avoid making use of the simplifying assumption.


Page 11:

A Semi-parametric Model for Conditional Copulas

We describe $c_{C(e)|D(e)}$ using a parametric model specified in terms of Kendall's tau $\tau \in [-1, 1]$.

Let $\mathbf{z}$ be a vector with the values of the variables in $D(e)$.

Then we assume $\tau = \sigma[f(\mathbf{z})]$, where $f$ is an arbitrary non-linear function and $\sigma(x) = 2\Phi(x) - 1$ is a sigmoid function.


Page 12:

Bayesian Inference on $f$

We are given a sample $D_{UV} = \{U_i, V_i\}_{i=1}^{n}$ from $C_{C(e)|D(e)}$, with corresponding values for the variables in $D(e)$ given by $D_{\mathbf{z}} = \{\mathbf{z}_i\}_{i=1}^{n}$.

We want to identify the value of $f$ that was used to generate the data.

We assume that $f$ follows a priori a Gaussian process; the sketch below shows what such a prior implies for $\tau(\mathbf{z})$.
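A brief illustration of this prior (our own sketch, with a unit-scale squared-exponential covariance standing in for the kernel specified later):

```python
import numpy as np
from scipy.stats import norm

# Draw latent functions f from a GP prior over a grid of conditioning values z,
# then push them through the link tau = sigma(f) = 2 * Phi(f) - 1 from the
# previous slide; each draw is one hypothesis for how tau varies with z.
z = np.linspace(-6.0, 6.0, 200)
K = np.exp(-0.5 * (z[:, None] - z[None, :]) ** 2)   # squared-exponential covariance
L = np.linalg.cholesky(K + 1e-8 * np.eye(z.size))   # jitter for numerical stability
rng = np.random.default_rng(3)
f_draws = L @ rng.standard_normal((z.size, 3))      # three independent prior draws of f
tau_draws = 2.0 * norm.cdf(f_draws) - 1.0           # corresponding tau(z) curves in (-1, 1)
```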


Page 13:

Posterior and Predictive Distributions

The posterior distribution for $\mathbf{f} = (f_1, \dots, f_n)^\text{T}$, where $f_i = f(\mathbf{z}_i)$, is

$$p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}}) = \frac{\left[\prod_{i=1}^{n} c(U_i, V_i \mid \tau = \sigma[f_i])\right] p(\mathbf{f} \mid D_{\mathbf{z}})}{p(D_{UV} \mid D_{\mathbf{z}})},$$

where $p(\mathbf{f} \mid D_{\mathbf{z}}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}_0, \mathbf{K})$ is the Gaussian process prior on $\mathbf{f}$.

Given $\mathbf{z}_{n+1}$, the predictive distribution for $U_{n+1}$ and $V_{n+1}$ is

$$p(u_{n+1}, v_{n+1} \mid \mathbf{z}_{n+1}, D_{UV}, D_{\mathbf{z}}) = \int c(u_{n+1}, v_{n+1} \mid \tau = \sigma[f_{n+1}])\, p(f_{n+1} \mid \mathbf{f}, \mathbf{z}_{n+1}, D_{\mathbf{z}})\, p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}})\, df_{n+1}\, d\mathbf{f}.$$

For efficient approximate inference, we use Expectation Propagation.


Page 14:

Expectation Propagation

EP approximates $p(\mathbf{f} \mid D_{UV}, D_{\mathbf{z}})$ by $Q(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}, \mathbf{V})$, in which each exact likelihood factor $q_i(f_i) = c(U_i, V_i \mid \tau = \sigma[f_i])$ is replaced by an approximate Gaussian factor $\tilde{q}_i(f_i)$ with parameters $\tilde{m}_i$ and $\tilde{v}_i$.

EP tunes $\tilde{m}_i$ and $\tilde{v}_i$ by minimizing $\text{KL}\big[\, q_i(f_i)\, Q(\mathbf{f})\, [\tilde{q}_i(f_i)]^{-1} \,\big\|\, Q(\mathbf{f}) \,\big]$. We use numerical integration methods for this task.

Kernel parameters are fixed by maximizing the EP approximation of $p(D_{UV} \mid D_{\mathbf{z}})$.

The total cost is $\mathcal{O}(n^3)$.


Page 15:

Implementation Details

We choose the following covariance function for the GP prior:

$$\text{Cov}[f(\mathbf{z}_i), f(\mathbf{z}_j)] = \sigma \exp\left[-(\mathbf{z}_i - \mathbf{z}_j)^\text{T} \text{diag}(\boldsymbol{\lambda}) (\mathbf{z}_i - \mathbf{z}_j)\right] + \sigma_0.$$

The mean of the GP prior is constant and equal to $\Phi^{-1}((\tau_{\text{MLE}} + 1)/2)$, where $\tau_{\text{MLE}}$ is the MLE of $\tau$ for an unconditional Gaussian copula.

We use the FITC approximation (sketched in code below):

$\mathbf{K}$ is approximated by $\tilde{\mathbf{K}} = \mathbf{Q} + \text{diag}(\mathbf{K} - \mathbf{Q})$, where $\mathbf{Q} = \mathbf{K}_{n n_0} \mathbf{K}_{n_0 n_0}^{-1} \mathbf{K}_{n n_0}^\text{T}$.

$\mathbf{K}_{n_0 n_0}$ is the $n_0 \times n_0$ covariance matrix for $n_0 \ll n$ pseudo-inputs.

$\mathbf{K}_{n n_0}$ contains the covariances between training points and pseudo-inputs.

The cost of EP is now $\mathcal{O}(n n_0^2)$. We choose $n_0 = 20$.
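A compact sketch of this covariance function and the FITC approximation as stated above (our own code, not the authors'; hyperparameter values are illustrative and jitter is added for numerical stability):

```python
import numpy as np

def cov(Z1, Z2, sigma=1.0, lam=None, sigma0=0.1):
    """Covariance from this slide: sigma * exp(-(zi - zj)^T diag(lam) (zi - zj)) + sigma0."""
    lam = np.ones(Z1.shape[1]) if lam is None else lam
    d2 = (((Z1[:, None, :] - Z2[None, :, :]) ** 2) * lam).sum(axis=-1)
    return sigma * np.exp(-d2) + sigma0

def fitc_cov(Z, Z0, **kw):
    """FITC: K_tilde = Q + diag(K - Q), with Q = K_nn0 K_n0n0^{-1} K_nn0^T."""
    Knn0 = cov(Z, Z0, **kw)
    Kn0n0 = cov(Z0, Z0, **kw) + 1e-8 * np.eye(len(Z0))  # jitter
    Q = Knn0 @ np.linalg.solve(Kn0n0, Knn0.T)
    K = cov(Z, Z, **kw)
    return Q + np.diag(np.diag(K - Q))

rng = np.random.default_rng(4)
Z = rng.uniform(-6, 6, size=(200, 1))                # n = 200 training inputs
Z0 = Z[rng.choice(len(Z), size=20, replace=False)]   # n0 = 20 pseudo-inputs
K_tilde = fitc_cov(Z, Z0)
```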

The predictive distribution is approximated using sampling.


Page 16:

Experiments I

We compare the proposed method GPVINE with two baselines:

1 - SVINE, based on the simplifying assumption.
2 - MLLVINE, based on the maximization of the local likelihood. It can only capture dependencies on a single random variable, and is limited to regular vines with at most two trees.

All the data are mapped to $[0, 1]^d$ using the ecdfs.

Synthetic data: $Z$ uniform in $[-6, 6]$ and $(U, V)$ Gaussian with correlation $\frac{3}{4}\sin(Z)$. Data set of size 50.
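A generator for this synthetic setup might look as follows (a sketch; the seed and the conditional-Gaussian sampling scheme are our choices):

```python
import numpy as np
from scipy.stats import norm

# Z uniform on [-6, 6]; (U, V) on the copula scale with Gaussian dependence
# whose correlation is (3/4) * sin(Z), as described above.
rng = np.random.default_rng(5)
n = 50
Zc = rng.uniform(-6.0, 6.0, size=n)            # conditioning variable
rho = 0.75 * np.sin(Zc)                        # correlation varies with Z
e1 = rng.standard_normal(n)
e2 = rho * e1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
U, V = norm.cdf(e1), norm.cdf(e2)              # map Gaussian samples to [0, 1]
```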


Page 17:

Experiments II

Real-world data: UCI datasets, meteorological data, mineral concentrations and financial data.

The data are split into training and test sets (50 random splits), each containing half of the data.

Average test log-likelihood when limited to two trees in the vine: [results table not reproduced in the transcript]


Page 18:

Results for More than Two Trees

[figure not reproduced in the transcript]


Page 19:

Conditional Dependencies in Weather Data

Conditional Kendall's tau for atmospheric pressure and cloud cover percentage, conditioned on latitude and longitude near Barcelona on 11/19/2012 at 8pm.


Page 20:

Summary and Conclusions

Vine copulas are flexible models for multivariate dependencies which specify a factorization of the copula density into a product of conditional bivariate copulas.

In practical implementations of vines, some of the conditional dependencies in the bivariate copulas are usually ignored.

To avoid this, we have proposed a method for the estimation of fully conditional vines using Gaussian processes (GPVINE).

GPVINE outperforms a baseline that ignores conditional dependencies (SVINE) and other alternatives based on maximum local-likelihood methods (MLLVINE).


Page 21:

References

López-Paz, D., Hernández-Lobato, J. M., and Ghahramani, Z. Gaussian process vine copulas for multivariate dependence. International Conference on Machine Learning (ICML 2013).

Acar, E. F., Craiu, R. V., and Yao, F. Dependence calibration in conditional copulas: A nonparametric approach. Biometrics, 67(2):445-453, 2011.

Bedford, T. and Cooke, R. M. Vines - a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031-1068, 2002.

Minka, T. P. Expectation propagation for approximate Bayesian inference. Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362-369, 2001.

Naish-Guzman, A. and Holden, S. B. The generalized FITC approximation. Advances in Neural Information Processing Systems 20, 2007.

Patton, A. J. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527-556, 2006.


Page 22:

Thank you for your attention!
