Exploring and measuring non-linear correlations


Upload: hellebore-capital-limited

Post on 16-Jan-2017


G. Marti†⋆, S. Andler†‡, F. Nielsen⋆, P. Donnat† (presented by M. Binkowski†∗)

†Hellebore Capital Ltd, ⋆Ecole Polytechnique, ‡ENS de Lyon, ∗Imperial College London

Motivations

• Interpretability of pairwise dependence
• Summary of associations between many variables
• Find abnormal dependence patterns
• Design robust and custom dependence coefficients
• Query the dataset for specific associations
• Realistic simulations of market variables

Copulas

Sklar’s Theorem

Let $X = (X_i, X_j)$ be a random vector with a joint cumulative distribution function $F$, and having continuous marginal cumulative distribution functions $F_i, F_j$ respectively. Then, there exists a unique distribution $C$ such that

$$F(X_i, X_j) = C(F_i(X_i), F_j(X_j)).$$

$C$, the copula of $X$, is the bivariate distribution of the uniform marginals $(U_i, U_j) := (F_i(X_i), F_j(X_j))$.
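In practice the marginals $F_i, F_j$ are unknown, so the uniform marginals are usually approximated from data. The following is a minimal sketch, assuming normalized ranks as a stand-in for $F_i(X_i)$ and a fixed-size histogram as the discretized copula; the function names and the bin count are illustrative choices, not taken from the poster.

```python
import numpy as np

def to_uniform_margins(x):
    """Map a 1-D sample to pseudo-observations in (0, 1) via normalized ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))      # ranks 0..n-1 (ties broken by position)
    return (ranks + 0.5) / len(x)          # centered so no value sits on a bin edge

def empirical_copula_density(x_i, x_j, n_bins=10):
    """Histogram of (U_i, U_j) on an n_bins x n_bins grid, normalized to sum to 1."""
    u_i = to_uniform_margins(x_i)
    u_j = to_uniform_margins(x_j)
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()

# Synthetic example: a monotone but non-linear relation between two variables.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.exp(x) + 0.1 * rng.normal(size=1000)
density = empirical_copula_density(x, y)
```

Because ranks are invariant under increasing transformations of each variable, the resulting histogram depends only on the dependence structure, not on the marginal distributions.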

Fréchet-Hoeffding copula bounds

[Figure: six heatmaps on the unit square with axes $u_i$ and $u_j$; panels $w(u_i, u_j)$ and $W(u_i, u_j)$ (first row), $\pi(u_i, u_j)$ and $\Pi(u_i, u_j)$ (second row), $m(u_i, u_j)$ and $M(u_i, u_j)$ (third row)]

Figure 1: Copula measures (left column) and cumulative distribution functions (right column) shown as heatmaps for negative dependence (first row), independence (second row), i.e. the uniform distribution over $[0, 1]^2$, and positive dependence (third row).
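The three cumulative distribution functions of Figure 1 have standard closed forms: $W(u_i, u_j) = \max(u_i + u_j - 1, 0)$, $\Pi(u_i, u_j) = u_i u_j$ and $M(u_i, u_j) = \min(u_i, u_j)$, with $W \le C \le M$ for every bivariate copula $C$. A small sketch that evaluates them on a grid and checks the ordering for the independence copula (the function names and the grid size are illustrative):

```python
import numpy as np

def lower_bound_W(u, v):
    """Frechet-Hoeffding lower bound (perfect negative dependence)."""
    return np.maximum(u + v - 1.0, 0.0)

def independence_Pi(u, v):
    """Independence copula."""
    return u * v

def upper_bound_M(u, v):
    """Frechet-Hoeffding upper bound (perfect positive dependence)."""
    return np.minimum(u, v)

# Evaluate the three copulas on a regular grid of the unit square.
grid = np.linspace(0.0, 1.0, 51)
u, v = np.meshgrid(grid, grid, indexing="ij")
W, Pi, M = lower_bound_W(u, v), independence_Pi(u, v), upper_bound_M(u, v)

# The independence copula must lie between the two bounds everywhere.
bounds_hold = bool(np.all(W <= Pi + 1e-12) and np.all(Pi <= M + 1e-12))
```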

The methodology - Clustering of copulas & custom dependence coefficients

The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a custom dependence coefficient.

Target/Forget Dependence Coefficient: Let $\{C_l^-\}_l$ be the set of forget-dependence copulas, and $\{C_k^+\}_k$ be the set of target-dependence copulas. Let $C$ be the copula of $(X_i, X_j)$.

$$\mathrm{TFDC}\left(X_i, X_j; \{C_k^+\}_k, \{C_l^-\}_l\right) := \frac{\min_l d_M(C_l^-, C)}{\min_l d_M(C_l^-, C) + \min_k d_M(C, C_k^+)} \in [0, 1].$$
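The coefficient is 0 when $C$ coincides with a forget-dependence copula and 1 when it coincides with a target-dependence copula. A minimal sketch of the formula, assuming copulas are discretized as arrays and the distance $d_M$ is supplied as a function; the Frobenius norm used in the example is only a hypothetical stand-in for the optimal-transport distance of the poster.

```python
import numpy as np

def tfdc(copula, target_copulas, forget_copulas, distance):
    """Target/Forget Dependence Coefficient for one discretized copula."""
    d_forget = min(distance(c, copula) for c in forget_copulas)
    d_target = min(distance(copula, c) for c in target_copulas)
    total = d_forget + d_target
    if total == 0.0:
        # Happens only if the copula matches both a target and a forget copula.
        raise ValueError("copula is at distance 0 from both sets")
    return d_forget / total

def frobenius(p, q):
    """Stand-in distance (assumption): NOT the transport distance d_M."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Toy 2x2 discretized copulas.
independence = np.array([[0.25, 0.25], [0.25, 0.25]])   # forget
positive = np.array([[0.5, 0.0], [0.0, 0.5]])           # target
halfway = 0.5 * (independence + positive)

score_forget = tfdc(independence, [positive], [independence], frobenius)
score_target = tfdc(positive, [positive], [independence], frobenius)
score_halfway = tfdc(halfway, [positive], [independence], frobenius)
```

In this toy setting the three scores are 0, 1 and one half respectively, since the last copula is equidistant from the target and the forget copula.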

Which geometry for copulas?

In [1], we detail the benefits of optimal transport over information divergences for clustering copulas.

Figure 2: Copulas $C_1, C_2, C_3$ encoding a correlation of 0.5, 0.99, 0.9999 respectively. Which pair of copulas is the nearest? For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences: $D(C_1, C_2) \le D(C_2, C_3)$, whereas $W_2(C_2, C_3) \le W_2(C_1, C_2)$.

We use results from [2], [3] to speed up the computation of the distances and barycenters needed for the clustering.

[Figure: two heatmaps on the unit square; panels "Bregman barycenter copula" (left) and "Wasserstein barycenter copula" (right)]

Figure 3: Barycenter for: (left) Bregman geometry (which includes, for example, squared Euclidean and Kullback-Leibler distances); (right) Wasserstein geometry.
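To give an idea of how a transport cost between two histograms can be computed quickly, here is a rough sketch of entropy-regularized optimal transport solved by Sinkhorn iterations, in the spirit of [2]. It is not the authors' implementation: the 1-D histograms, the regularization strength `reg`, the iteration count and all function names are assumptions chosen for brevity, and the same scheme would apply to flattened 2-D copula histograms with a suitable cost matrix.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, reg=0.05, n_iter=500):
    """Approximate transport cost between histograms a and b (each sums to 1)."""
    K = np.exp(-cost / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                   # rescale to match column marginals
        u = a / (K @ v)                     # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]      # regularized transport plan
    return float(np.sum(plan * cost)), plan

def bump(center, grid, width=0.05):
    """Normalized Gaussian-shaped histogram on the grid."""
    w = np.exp(-((grid - center) ** 2) / (2.0 * width ** 2))
    return w / w.sum()

grid = np.linspace(0.0, 1.0, 50)
cost = (grid[:, None] - grid[None, :]) ** 2   # squared Euclidean ground cost

a = bump(0.2, grid)
b_near = bump(0.3, grid)
b_far = bump(0.8, grid)

cost_near, plan_near = sinkhorn_cost(a, b_near, cost)
cost_far, plan_far = sinkhorn_cost(a, b_far, cost)
```

The transport cost grows with how far the mass has to move, so the histogram centered at 0.3 comes out closer to `a` than the one centered at 0.8, which matches the geometric intuition illustrated in Figure 2.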

Copulas of financial time series

We apply clustering to the $\binom{N}{2}$ bivariate copulas of a financial time series dataset consisting of daily returns of stocks, credit default swaps and FX rates.
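For $N$ variables there are $\binom{N}{2} = N(N-1)/2$ unordered pairs, each with its own bivariate copula. A short sketch of how such a collection could be built from a matrix of returns, assuming rank-based pseudo-observations and a histogram discretization; the synthetic data, the bin count and the function name are illustrative only.

```python
from itertools import combinations
from math import comb

import numpy as np

def pairwise_copulas(returns, n_bins=10):
    """Discretized empirical copula for every unordered pair of columns."""
    n_obs, n_vars = returns.shape
    # Rank-transform each column to pseudo-observations in (0, 1).
    ranks = np.argsort(np.argsort(returns, axis=0), axis=0)
    u = (ranks + 0.5) / n_obs
    copulas = {}
    for i, j in combinations(range(n_vars), 2):
        hist, _, _ = np.histogram2d(
            u[:, i], u[:, j], bins=n_bins, range=[[0, 1], [0, 1]]
        )
        copulas[(i, j)] = hist / n_obs      # normalize to a probability mass
    return copulas

# Synthetic stand-in for a returns matrix: 500 days, 6 variables.
rng = np.random.default_rng(1)
returns = rng.normal(size=(500, 6))
copulas = pairwise_copulas(returns)
n_pairs = comb(returns.shape[1], 2)
```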

Figure 4: Credit default swaps: more mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against the default of companies tends to soar in distressed markets.

Queries about dependence

[Figure: four panels labeled (A), (B), (C), (D)]

Figure 5: Target copulas (simulated or handcrafted) and their respective nearest copulas, which answer questions A, B, C, D:

• (A) most Gaussian with ρ = 0.7?
• (B) both positively and negatively correlated?
• (C) extreme returns for one, small for the other?
• (D) uncorrelated but correlated for small returns?
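Each query above amounts to a nearest-neighbor search: build a target copula, then rank the dataset's copulas by their distance to it. A minimal sketch, assuming discretized copulas stored in a dictionary and a user-supplied distance; the Frobenius norm, the pair labels and the toy 2x2 copulas are hypothetical placeholders, not the transport distance or data of the poster.

```python
import numpy as np

def nearest_copulas(target, candidates, distance, top_k=1):
    """Return the labels of the top_k candidate copulas closest to the target."""
    ranked = sorted(candidates.items(), key=lambda item: distance(target, item[1]))
    return [label for label, _ in ranked[:top_k]]

def frobenius(p, q):
    """Stand-in distance (assumption): NOT the transport distance of the poster."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Toy dataset of 2x2 discretized copulas, keyed by hypothetical pair labels.
candidates = {
    "pair_1": np.array([[0.25, 0.25], [0.25, 0.25]]),   # independence
    "pair_2": np.array([[0.45, 0.05], [0.05, 0.45]]),   # strong positive
    "pair_3": np.array([[0.05, 0.45], [0.45, 0.05]]),   # strong negative
}

# Handcrafted target: perfect positive dependence.
target = np.array([[0.5, 0.0], [0.0, 0.5]])
answer = nearest_copulas(target, candidates, frobenius, top_k=2)
```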

References

[1] G. Marti, S. Andler, F. Nielsen, P. Donnat, IEEE Statistical Signal Processing Workshop (2016), 1-5.

[2] M. Cuturi, Advances in Neural Information Processing Systems (2013), 2292-2300.

[3] M. Cuturi, A. Doucet, Proceedings of the 31st International Conference on Machine Learning (2014), 685-693.
