
Page 1: Poincaré Embeddings for Learning Hierarchical Representations

Poincaré Embeddings forLearning Hierarchical Representations

Maximilian Nickel, Douwe KielaFacebook AI Research

Presented by Ke (Becky) Bai

Nov. 30th, 2018

1 / 15

Page 2:

Introduction

• Symbolic data often exhibits a latent hierarchy (tree-like structure, power-law-distributed data).
• Goal: simultaneously capture similarity and hierarchy in the embedding space via unsupervised learning.
• The paper introduces a novel approach for learning hierarchical representations by embedding entities into hyperbolic space.

2 / 15

Page 3:

3 / 15

Page 4:

Motivations

The distance between symbols in the embedding space should reflect their semantic similarity.
• The number of nodes in a tree with branching factor b > 1 grows exponentially with its depth.
• The area of a hyperbolic disc and the length of a hyperbolic circle grow exponentially with the radius, so hyperbolic space has room for exponentially growing trees.

4 / 15

Page 5:

Embedding Space

Poincaré Ball
The Poincaré ball model of hyperbolic space is the Riemannian manifold (B^d, g_x), where

B^d = {x ∈ R^d | ‖x‖ < 1} (1)

is the d-dimensional open unit ball (‖·‖ denotes the Euclidean norm), and

g_x = ( 2 / (1 − ‖x‖²) )² g^E, (2)

where x ∈ B^d and g^E denotes the Euclidean metric tensor. Let k_x = ( 2 / (1 − ‖x‖²) )² denote this conformal factor.

Distance
The distance between points θ, x ∈ B^d is

d(θ, x) = arcosh( 1 + 2 ‖θ − x‖² / ((1 − ‖θ‖²)(1 − ‖x‖²)) ). (3)

5 / 15
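Eq. (3) is straightforward to implement; a minimal NumPy sketch (the function name and the eps guard are my own additions):

```python
import numpy as np

def poincare_distance(theta, x, eps=1e-9):
    """Geodesic distance on the Poincaré ball, Eq. (3)."""
    sq_dist = np.sum((theta - x) ** 2)
    denom = (1.0 - np.sum(theta ** 2)) * (1.0 - np.sum(x ** 2))
    # eps guards against division by zero for points at the boundary
    return np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps))
```

For the origin and a point x this reduces to 2 artanh(‖x‖), so distances blow up near the boundary, which is exactly the exponential capacity the previous slide motivates.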

Page 6:

Optimization

Θ′ ← argmin_Θ L(Θ)  s.t. ∀ θ_i ∈ Θ : ‖θ_i‖ < 1. (4)

θ_{t+1} = R_{θ_t}( −η_t ∇_R L(θ_t) ) (5)

R_{θ_t} denotes the retraction onto B^d at θ_t and η_t denotes the learning rate at time t.
The Riemannian gradient ∇_R can be derived from the Euclidean gradient ∇_E by rescaling with the inverse of the Poincaré ball metric tensor, i.e., ∇_R = k_θ^{-1} ∇_E, where

∇_E = ( ∂L(θ) / ∂d(θ, x) ) ( ∂d(θ, x) / ∂θ ). (6)

Putting these together, the update is

θ_{t+1} ← proj( θ_t − η_t ( (1 − ‖θ_t‖²)² / 4 ) ∇_E ). (7)

6 / 15
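A minimal sketch of the update in Eq. (7), assuming the Euclidean gradient is supplied by the caller; the rescale-to-just-inside-the-boundary projection is a common numerically safe variant, not the slide's exact proj formula:

```python
import numpy as np

def rsgd_step(theta, euclid_grad, lr=0.1, eps=1e-5):
    """One Riemannian SGD step, Eq. (7): rescale the Euclidean gradient
    by the inverse metric (1 - ||theta||^2)^2 / 4, step, then project
    back into the open unit ball."""
    scale = (1.0 - np.dot(theta, theta)) ** 2 / 4.0
    theta = theta - lr * scale * euclid_grad
    norm = np.linalg.norm(theta)
    if norm >= 1.0:
        # pull the point just inside the boundary (safe variant of proj)
        theta = theta / norm * (1.0 - eps)
    return theta
```

Note how the factor (1 − ‖θ‖²)²/4 shrinks steps near the boundary, where the metric blows up.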

Page 7:

Optimization in More Detail

∂d(θ, x)/∂θ = ( 4 / (β √(γ² − 1)) ) ( ( (‖x‖² − 2⟨θ, x⟩ + 1) / α² ) θ − x/α ), (8)

where α = 1 − ‖θ‖², β = 1 − ‖x‖², and γ = 1 + (2/(αβ)) ‖θ − x‖².

proj(θ) = { θ/‖θ‖ − ε  if ‖θ‖ ≥ 1;  θ  otherwise }, (9)

where ε is a small constant that keeps the embeddings strictly inside the unit ball.

7 / 15
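Eq. (8) in NumPy (the helper name is my own; note the constant in γ is 2/(αβ), which matters for correctness):

```python
import numpy as np

def dist_grad_theta(theta, x):
    """Euclidean gradient of the Poincaré distance w.r.t. theta, Eq. (8)."""
    alpha = 1.0 - np.dot(theta, theta)   # 1 - ||theta||^2
    beta = 1.0 - np.dot(x, x)            # 1 - ||x||^2
    gamma = 1.0 + (2.0 / (alpha * beta)) * np.sum((theta - x) ** 2)
    coeff = 4.0 / (beta * np.sqrt(gamma ** 2 - 1.0))
    return coeff * ((np.dot(x, x) - 2.0 * np.dot(theta, x) + 1.0)
                    / alpha ** 2 * theta - x / alpha)
```

A finite-difference check against Eq. (3) is a quick way to confirm the formula was transcribed correctly.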

Page 8:

Comparisons

Here u, v are embedding vectors, analogous to θ, x above.

Euclidean Distance

d(u, v) = ‖u − v‖²

Translational Distance

d(u, v) = ‖u − v + r‖²

where r is a learned global translation vector designed for asymmetric data.

8 / 15

Page 9:

Application 1: Embedding Taxonomies

Let D = {(u, v)} be the set of observed hypernymy relations between noun pairs from WordNet. The (negative log-likelihood) loss function is

L(Θ) = − Σ_{(u,v)∈D} log( e^{−d(u,v)} / Σ_{v′∈N(u)} e^{−d(u,v′)} ),

where N(u) = {v′ | (u, v′) ∉ D} ∪ {v} is the set of negative examples for u (together with v itself).

9 / 15
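A toy sketch of this loss; the dictionary layout and the per-pair candidate sets are my own interface, and real implementations sample a fixed number of negatives per pair rather than enumerating them:

```python
import numpy as np

def poincare_dist(a, b):
    """Poincaré distance, Eq. (3)."""
    num = 2.0 * np.sum((a - b) ** 2)
    den = (1.0 - np.sum(a ** 2)) * (1.0 - np.sum(b ** 2))
    return np.arccosh(1.0 + num / den)

def taxonomy_loss(emb, pairs, candidates):
    """Negative log-likelihood of each true pair (u, v) against its
    candidate set N(u), which must contain v itself."""
    loss = 0.0
    for u, v in pairs:
        scores = {w: np.exp(-poincare_dist(emb[u], emb[w]))
                  for w in candidates[(u, v)]}
        loss += -np.log(scores[v] / sum(scores.values()))
    return loss
```

Minimizing this pulls true hypernym pairs together while pushing sampled negatives apart.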

Page 10:

Application 1: Results

10 / 15

Page 11:

Application 2: Network Embeddings

Let D = {(u, v)} be the set of edges in a co-authorship network, i.e., (u, v) ∈ D if two people have co-authored a paper. The probability of an edge in this social network is modeled as

P((u, v) = 1) = 1 / ( e^{(d(u,v)−r)/t} + 1 ),

where r and t are hyperparameters. The loss is the cross-entropy based on this probability.

11 / 15
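The edge probability is a Fermi–Dirac distribution; a one-line sketch (the default r and t here are placeholders, not the paper's tuned settings):

```python
import numpy as np

def edge_prob(dist, r=1.0, t=0.1):
    """P((u, v) = 1): Fermi-Dirac probability of an edge given the
    embedding distance dist, the radius r, and the temperature t."""
    return 1.0 / (np.exp((dist - r) / t) + 1.0)
```

Intuition: pairs closer than r get probability above 1/2, and t controls how sharply the probability falls off around that radius.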

Page 12:

Application 2: Results

Table 1: Mean average precision for Reconstruction and Link Prediction on network data.

                                         Reconstruction              Link Prediction
Dimensionality                      10     20     50    100     10     20     50    100
AstroPh                Euclidean  0.376  0.788  0.969  0.989  0.508  0.815  0.946  0.960
(N=18,772; E=198,110)  Poincaré   0.703  0.897  0.982  0.990  0.671  0.860  0.977  0.988
CondMat                Euclidean  0.356  0.860  0.991  0.998  0.308  0.617  0.725  0.736
(N=23,133; E=93,497)   Poincaré   0.799  0.963  0.996  0.998  0.539  0.718  0.756  0.758
GrQc                   Euclidean  0.522  0.931  0.994  0.998  0.438  0.584  0.673  0.683
(N=5,242; E=14,496)    Poincaré   0.990  0.999  0.999  0.999  0.660  0.691  0.695  0.697
HepPh                  Euclidean  0.434  0.742  0.937  0.966  0.642  0.749  0.779  0.783
(N=12,008; E=118,521)  Poincaré   0.811  0.960  0.994  0.997  0.683  0.743  0.770  0.774

12 / 15

Page 13:

Application 3: Lexical Entailment

HyperLex quantifies to what degree X is a type of Y via ratings on a scale of [0, 10], evaluating how well semantic models capture graded lexical entailment.

score(is-a(u, v)) = −(1 + α(‖v‖ − ‖u‖)) d(u, v)

where α is a hyperparameter controlling the severity of the penalty term ‖v‖ − ‖u‖ (a true hypernym v should lie closer to the origin, i.e., have smaller norm than u).

Training procedure
− Train the embeddings on WordNet as in Application 1.
− Use the score above to rank all noun pairs in HYPERLEX.
− Compute Spearman's rank correlation with the ground-truth ranking.

13 / 15
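The scoring rule above in NumPy (α = 1000 is an arbitrary illustrative value, not the paper's tuned one):

```python
import numpy as np

def entailment_score(u, v, alpha=1000.0):
    """score(is-a(u, v)) = -(1 + alpha * (||v|| - ||u||)) * d(u, v).
    The norm difference penalizes candidate hypernyms v that sit
    farther from the origin (lower in the hierarchy) than u."""
    d = np.arccosh(1.0 + 2.0 * np.sum((u - v) ** 2)
                   / ((1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))))
    return -(1.0 + alpha * (np.linalg.norm(v) - np.linalg.norm(u))) * d
```

Because the norm tracks depth in the hierarchy, the score is asymmetric even though d(u, v) itself is symmetric.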

Page 14:

Application 3: Results

Table 2: Spearman's ρ for Lexical Entailment on HyperLex.

      FR     SLQS-Sim  WN-Basic  WN-WuP  WN-LCh  Vis-ID  Euclidean  Poincaré
ρ   0.283    0.229     0.240     0.214   0.214   0.253   0.389      0.512

14 / 15

Page 15:

Thanks

15 / 15