hyperlink classification via structured graph...

21
Hyperlink Classification via Structured Graph Embedding 1 , 2 , and ∗1 1 (), 2 , *corresponding author The 42 nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019 1

Upload: others

Post on 20-May-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Hyperlink Classification via Structured Graph Embedding

𝐺𝑒𝑜𝑛 𝐿𝑒𝑒1, 𝑆𝑒𝑜𝑛𝑔𝑔𝑜𝑜 𝐾𝑎𝑛𝑔2, and 𝐽𝑜𝑦𝑐𝑒 𝐽𝑖𝑦𝑜𝑢𝑛𝑔𝑊ℎ𝑎𝑛𝑔∗1

1𝑆𝑢𝑛𝑔𝑘𝑦𝑢𝑛𝑘𝑤𝑎𝑛 𝑈𝑛𝑖𝑣𝑒𝑟𝑠𝑖𝑡𝑦 (𝑆𝐾𝐾𝑈), 2𝑁𝑎𝑣𝑒𝑟 𝐶𝑜𝑟𝑝𝑜𝑟𝑎𝑡𝑖𝑜𝑛, *corresponding author

The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

1

Page 2: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Contents

1. Outline

2. Datasets and Challenges

3. Knowledge Graph Embedding

4. Hyperlink Classification Model

5. Result

6. Conclusion and Future Work

2

Page 3: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Outline

Three types of hyperlinks:

Navigation Link

Links related to navigation within a site, such as site path or site recommendation.

Suggestion Link

Links recommended directly by the user using the site.

Action Link

Links that require other user actions.

3

Page 4: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Outline

Navigation Link

4

Page 5: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Outline

Suggestion Link

5

Page 6: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Outline

Action Link

6

Page 7: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Conduct a biased random walk from the seed page. (By NAVER)

7

Datasets and Challenges

Seed Page

Page 8: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Datasets

8

Datasets and Challenges

1284 (89%)

93 (6%)

65 (5%)

Navigation Suggestion Action

9892 (99%)

85 (1%)23 (0%)

Navigation Suggestion Action

268 (61%)

112 (26%)

57 (13%)

Navigation Suggestion Action

graph_437 graph_1442 graph_10000

graph_437 graph_1442 Graph_10000

No. of Hyperlinks 437 1442 10000

No. of Pages 404 332 2202

No. of Domain 18 100 120

No. of Host - 22 25

Page 9: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Knowledge Graph Embedding

Can the knowledge graph embedding method be applied to hyperlink embedding?

9

NAVI

NAVI

NAVI

NAVI

SUGG

ACTION

Knowledge Graph Web Graph

Knowledge graph image from https://medium.com/comet-ml/using-fasttext-and-comet-ml-to-classify-relationships-in-knowledge-graphs-e73d27b40d67

Page 10: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Knowledge Graph Embedding

Can the knowledge graph embedding method be applied to hyperlink embedding?

10

(Bob, is_interested_in, The_Mona_Lisa)

(Bob, _is_a_friend_of, Alice)

(Barack Obama, _place_of_birth, Hawaii)

(Albert Einstein, _follows_diet, Veganism)

(San Francisco, _contains, Telegraph Hill)

(www.naver.com, NAVI, www.news.naver.com)

(www.google.com/larry_page, SUGG, www.wikipedia.org/Larry_Page)

(www.wikipedia.org/Larry_Page, ACTION, www.wikipedia.org/Larry_Page/edit)

Page 11: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Knowledge Graph Embedding

TransE : Translating Embeddings for Modeling Multi-relational Data (NIPS 2013)

TransH : Knowledge Graph Embedding by Translating on Hyperplanes (AAAI 2014)

TransR : Learning Entity and Relation Embeddings for Knowledge Graph Completion (AAAI 2015)

11Image from “Knowledge graph embedding: A survey of approaches and applications.”, TKDE 2017.

Page 12: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Knowledge Graph Embedding

TransE, TransH, and TransR methods minimize the following loss function:

L =

(h,r,t)∈S

(h′,r,t’)∈S’

𝑓 h, r, t + 𝛾 − 𝑓(h′, r, t’) +

where x + ≡ max(0, x) and 𝛾 is the margin.

S is a set of golden triplets and S’ is a set of corrupted triplets.

The method for calculating 𝑓(h, r, t) depends on the model.

TransE : 𝑓 h, r, t = h + r − t 22

TransH : 𝑓 h, r, t = h⊥ + r − t⊥ 22

TransR : 𝑓(h, r, t) = hr + r − tr 22

e⊥ and er represent projected entity e on relation-specific hyperplane and space, respectively.

12

Page 13: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Hyperlink Classification Model

Given two web pages 𝑃𝑢 and 𝑃𝑣, and set of hyperlink types R, estimate the type of hyperlink

between 𝑃𝑢 and 𝑃𝑣 as follow.

𝑟∗ = 𝒂𝒓𝒈𝐦𝐢𝐧𝑟∈R

𝑓(𝑷𝒖, 𝑟, 𝑷𝒗)

where 𝑓(ℎ, 𝑟, 𝑡) is the loss of triplet (ℎ, 𝑟, 𝑡).

r∗ is the predicted relation.

13

Page 14: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Hyperlink Classification Model

Negative Sampling

TransE / TransH / TransR → only changes entities when making negative samples

(ℎ, 𝑟, 𝑡) → (ℎ′, 𝑟, 𝑡) or (ℎ, 𝑟, 𝑡’)

Introduce parameter 𝛼 (0 ≤ 𝛼 ≤ 1).

• Replace the entity with the probability 𝛼. (ℎ, 𝑟, 𝑡) → (ℎ′, 𝑟, 𝑡) or (ℎ, 𝑟, 𝑡’)

• Replace the relation with the probability 1 − 𝛼. (ℎ, 𝑟, 𝑡) → (ℎ, 𝑟′, 𝑡)

14

Page 15: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Hyperlink Classification Model

Negative Sampling

False Negative

When we corrupt entities, there is a chance that it is not a corrupted one but just an unobserved

one in the training set.

15

(𝑝1, 𝑛, 𝑝2)

Corrupt entity (𝑝1, 𝑛, 𝑝3) : may exist in valid or test set → risky!

(𝑝1, 𝑟, 𝑝2) : guaranteed that it should not hold → safe!Corrupt relation

Page 16: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Result

The average F1 scores(%) of our model with different 𝜶 and the original TransE, TransH, and TransR.

16

Page 17: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Result

F1 Scores(%) of each relation and the average F1 scores.

17

Page 18: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Result

The average F1, precision, recall on the three web graphs.

• Random-predict indicates the performance of random prediction while preserving the number of

hyperlinks in each class.

• We use the result of TransH with 𝛼 = 0.3, 𝛼 = 0.5, and 𝛼 = 0.3 for web_437, web_1442, and web_10000,

respectively.

18

Page 19: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Result

Performance on the original web graphs and randomly shuffled graphs where the relation labels are

randomly assigned while preserving the number of each relation type.

The real-world web graphs have characterized structures, which enables the knowledge graph embedding

techniques to reasonably work well on predicting the relation labels.

19

Page 20: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Conclusion and Future Work

Conclusion

• The knowledge graph embedding techniques can be efficiently used for hyperlink embedding.

• Performance improvement can be observed by modifying the negative sampling method.

Future Work

• Utilize information from web pages or hyperlinks (such as anchor text) in hyperlink embeddings.

20

Page 21: Hyperlink Classification via Structured Graph Embeddingbigdata.cs.skku.edu/down/SIGIR2019_slides.pdf · Knowledge Graph Embedding TransE: Translating Embeddings for Modeling Multi-relational

Thank You

21