
Page 1: Poincaré Embeddings for Learning Hierarchical Representations

Poincaré Embeddings forLearning Hierarchical Representations

Maximilian Nickel, Douwe KielaFacebook AI Research

Presented by Ke (Becky) Bai

Nov. 30th, 2018

1 / 15

Page 2:

Introduction

• Symbolic data often exhibits a latent hierarchy (tree-like structure, power-law-distributed data).
• Goal: simultaneously capture similarity and hierarchy in the embedding space via unsupervised learning.
• The paper introduces a novel approach for learning hierarchical representations by embedding entities into hyperbolic space.

2 / 15

Page 3:

3 / 15

Page 4:

Motivations

The distance between symbols in the embedding space should reflect their semantic similarity.
• The number of nodes in a tree with branching factor b > 1 grows exponentially with its depth.
• The area of a hyperbolic disc and the length of a hyperbolic circle grow exponentially with the radius, so hyperbolic space has room for exponentially growing trees.

4 / 15

Page 5:

Embedding Space

Poincaré Ball
The Poincaré ball model of hyperbolic space is the Riemannian manifold (B^d, g_x), where

B^d = {x ∈ R^d | ‖x‖ < 1} (1)

is the d-dimensional open unit ball (‖·‖ denotes the Euclidean norm), and

g_x = ( 2 / (1 − ‖x‖²) )² g^E, (2)

where x ∈ B^d and g^E denotes the Euclidean metric tensor. Let k_x = ( 2 / (1 − ‖x‖²) )² denote this conformal factor.

Distance
The distance between points θ, x ∈ B^d is

d(θ, x) = arcosh( 1 + 2 ‖θ − x‖² / ((1 − ‖θ‖²)(1 − ‖x‖²)) ). (3)

5 / 15
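Eq. (3) is straightforward to implement; a minimal NumPy sketch (the function name and the eps guard are my own additions):

```python
import numpy as np

def poincare_distance(theta, x, eps=1e-9):
    """Geodesic distance on the Poincaré ball, Eq. (3)."""
    sq_dist = np.sum((theta - x) ** 2)
    denom = (1.0 - np.sum(theta ** 2)) * (1.0 - np.sum(x ** 2))
    # eps guards against division by zero for points at the boundary
    return np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps))
```

For the origin and a point x this reduces to 2 artanh(‖x‖), so distances blow up near the boundary, which is exactly the exponential capacity the previous slide motivates.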

Page 6:

Optimization

Θ′ ← argmin_Θ L(Θ)  s.t. ∀ θ_i ∈ Θ : ‖θ_i‖ < 1. (4)

θ_{t+1} = R_{θ_t}( −η_t ∇_R L(θ_t) ) (5)

R_{θ_t} denotes the retraction onto B^d at θ_t and η_t denotes the learning rate at time t.
The Riemannian gradient ∇_R can be derived from the Euclidean gradient ∇_E by rescaling with the inverse of the Poincaré ball metric tensor, i.e., ∇_R = k_θ^{-1} ∇_E, where

∇_E = ( ∂L(θ) / ∂d(θ, x) ) ( ∂d(θ, x) / ∂θ ). (6)

Putting these together, the update is

θ_{t+1} ← proj( θ_t − η_t ( (1 − ‖θ_t‖²)² / 4 ) ∇_E ). (7)

6 / 15
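A minimal sketch of the update in Eq. (7), assuming the Euclidean gradient is supplied by the caller; the rescale-to-just-inside-the-boundary projection is a common numerically safe variant, not the slide's exact proj formula:

```python
import numpy as np

def rsgd_step(theta, euclid_grad, lr=0.1, eps=1e-5):
    """One Riemannian SGD step, Eq. (7): rescale the Euclidean gradient
    by the inverse metric (1 - ||theta||^2)^2 / 4, step, then project
    back into the open unit ball."""
    scale = (1.0 - np.dot(theta, theta)) ** 2 / 4.0
    theta = theta - lr * scale * euclid_grad
    norm = np.linalg.norm(theta)
    if norm >= 1.0:
        # pull the point just inside the boundary (safe variant of proj)
        theta = theta / norm * (1.0 - eps)
    return theta
```

Note how the factor (1 − ‖θ‖²)²/4 shrinks steps near the boundary, where the metric blows up.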

Page 7:

Optimization in More Detail

∂d(θ, x)/∂θ = ( 4 / (β √(γ² − 1)) ) ( ( (‖x‖² − 2⟨θ, x⟩ + 1) / α² ) θ − x/α ), (8)

where α = 1 − ‖θ‖², β = 1 − ‖x‖², and γ = 1 + (2/(αβ)) ‖θ − x‖².

proj(θ) = { θ/‖θ‖ − ε  if ‖θ‖ ≥ 1;  θ  otherwise }, (9)

where ε is a small constant that keeps the embeddings strictly inside the unit ball.

7 / 15
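Eq. (8) in NumPy (the helper name is my own; note the constant in γ is 2/(αβ), which matters for correctness):

```python
import numpy as np

def dist_grad_theta(theta, x):
    """Euclidean gradient of the Poincaré distance w.r.t. theta, Eq. (8)."""
    alpha = 1.0 - np.dot(theta, theta)   # 1 - ||theta||^2
    beta = 1.0 - np.dot(x, x)            # 1 - ||x||^2
    gamma = 1.0 + (2.0 / (alpha * beta)) * np.sum((theta - x) ** 2)
    coeff = 4.0 / (beta * np.sqrt(gamma ** 2 - 1.0))
    return coeff * ((np.dot(x, x) - 2.0 * np.dot(theta, x) + 1.0)
                    / alpha ** 2 * theta - x / alpha)
```

A finite-difference check against Eq. (3) is a quick way to confirm the formula was transcribed correctly.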

Page 8:

Comparisons

Here u, v are embedding vectors, analogous to θ, x above.

Euclidean Distance

d(u, v) = ‖u − v‖²

Translational Distance

d(u, v) = ‖u − v + r‖²

where r is a learned global translation vector designed for asymmetric data.

8 / 15

Page 9:

Application 1: Embedding Taxonomies

Let D = {(u, v)} be the set of observed hypernymy relations between noun pairs from WordNet. The (negative log-likelihood) loss function is

L(Θ) = − Σ_{(u,v)∈D} log( e^{−d(u,v)} / Σ_{v′∈N(u)} e^{−d(u,v′)} ),

where N(u) = {v′ | (u, v′) ∉ D} ∪ {v} is the set of negative examples for u (together with v itself).

9 / 15
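A toy sketch of this loss; the dictionary layout and the per-pair candidate sets are my own interface, and real implementations sample a fixed number of negatives per pair rather than enumerating them:

```python
import numpy as np

def poincare_dist(a, b):
    """Poincaré distance, Eq. (3)."""
    num = 2.0 * np.sum((a - b) ** 2)
    den = (1.0 - np.sum(a ** 2)) * (1.0 - np.sum(b ** 2))
    return np.arccosh(1.0 + num / den)

def taxonomy_loss(emb, pairs, candidates):
    """Negative log-likelihood of each true pair (u, v) against its
    candidate set N(u), which must contain v itself."""
    loss = 0.0
    for u, v in pairs:
        scores = {w: np.exp(-poincare_dist(emb[u], emb[w]))
                  for w in candidates[(u, v)]}
        loss += -np.log(scores[v] / sum(scores.values()))
    return loss
```

Minimizing this pulls true hypernym pairs together while pushing sampled negatives apart.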

Page 10:

Application 1: Results

10 / 15

Page 11:

Application 2: Network Embeddings

Let D = {(u, v)} be the set of edges in a co-authorship network, i.e., (u, v) ∈ D if two people have co-authored a paper. The probability of an edge in this social network is modeled as

P((u, v) = 1) = 1 / ( e^{(d(u,v)−r)/t} + 1 ),

where r and t are hyperparameters. The loss is the cross-entropy based on this probability.

11 / 15
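The edge probability is a Fermi–Dirac distribution; a one-line sketch (the default r and t here are placeholders, not the paper's tuned settings):

```python
import numpy as np

def edge_prob(dist, r=1.0, t=0.1):
    """P((u, v) = 1): Fermi-Dirac probability of an edge given the
    embedding distance dist, the radius r, and the temperature t."""
    return 1.0 / (np.exp((dist - r) / t) + 1.0)
```

Intuition: pairs closer than r get probability above 1/2, and t controls how sharply the probability falls off around that radius.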

Page 12:

Application 2: Results

Table 1: Mean average precision for Reconstruction and Link Prediction on network data.

                                         Reconstruction              Link Prediction
Dimensionality                      10     20     50    100     10     20     50    100
AstroPh                Euclidean  0.376  0.788  0.969  0.989  0.508  0.815  0.946  0.960
(N=18,772; E=198,110)  Poincaré   0.703  0.897  0.982  0.990  0.671  0.860  0.977  0.988
CondMat                Euclidean  0.356  0.860  0.991  0.998  0.308  0.617  0.725  0.736
(N=23,133; E=93,497)   Poincaré   0.799  0.963  0.996  0.998  0.539  0.718  0.756  0.758
GrQc                   Euclidean  0.522  0.931  0.994  0.998  0.438  0.584  0.673  0.683
(N=5,242; E=14,496)    Poincaré   0.990  0.999  0.999  0.999  0.660  0.691  0.695  0.697
HepPh                  Euclidean  0.434  0.742  0.937  0.966  0.642  0.749  0.779  0.783
(N=12,008; E=118,521)  Poincaré   0.811  0.960  0.994  0.997  0.683  0.743  0.770  0.774

12 / 15

Page 13:

Application 3: Lexical Entailment

HyperLex quantifies to what degree X is a type of Y via ratings on a scale of [0, 10], evaluating how well semantic models capture graded lexical entailment.

score(is-a(u, v)) = −(1 + α(‖v‖ − ‖u‖)) d(u, v)

where α is a hyperparameter controlling the severity of the penalty term ‖v‖ − ‖u‖ (a true hypernym v should lie closer to the origin, i.e., have smaller norm than u).

Training procedure
− Train the embeddings on WordNet as in Application 1.
− Use the score above to rank all noun pairs in HYPERLEX.
− Compute Spearman's rank correlation with the ground-truth ranking.

13 / 15
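The scoring rule above in NumPy (α = 1000 is an arbitrary illustrative value, not the paper's tuned one):

```python
import numpy as np

def entailment_score(u, v, alpha=1000.0):
    """score(is-a(u, v)) = -(1 + alpha * (||v|| - ||u||)) * d(u, v).
    The norm difference penalizes candidate hypernyms v that sit
    farther from the origin (lower in the hierarchy) than u."""
    d = np.arccosh(1.0 + 2.0 * np.sum((u - v) ** 2)
                   / ((1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))))
    return -(1.0 + alpha * (np.linalg.norm(v) - np.linalg.norm(u))) * d
```

Because the norm tracks depth in the hierarchy, the score is asymmetric even though d(u, v) itself is symmetric.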

Page 14:

Application 3: Results

Table 2: Spearman's ρ for Lexical Entailment on HyperLex.

      FR     SLQS-Sim  WN-Basic  WN-WuP  WN-LCh  Vis-ID  Euclidean  Poincaré
ρ   0.283    0.229     0.240     0.214   0.214   0.253   0.389      0.512

14 / 15

Page 15:

Thanks

15 / 15