![Page 1: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/1.jpg)
Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks
Reynold Cheng, Zhipeng Huang, Yudian Zheng, Jing Yan, Ka Yu Wong, Eddie Ng
Database Group:
![Page 2: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/2.jpg)
Information is Everywhere!
o Social Networking Websites
https://makeawebsitehub.com/social-media-sites/
![Page 3: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/3.jpg)
Information is Everywhere!
o Biological Network
http://serious-science.org/controlling-noisy-dynamics-in-biological-networks-to-fight-cancer-5376
![Page 4: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/4.jpg)
Information is Everywhere!
o Research Collaboration Network
https://scholarlykitchen.sspnet.org/2017/04/07/updated-figures-scale-nature-researchers-use-scholarly-collaboration-networks/
![Page 5: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/5.jpg)
Information is Everywhere!
o Product Recommendation Network
Byunghak Leem, Heuiju Chun. An impact of online recommendation network on demand
http://www.sciencedirect.com/science/article/pii/S0957417413006921
![Page 6: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/6.jpg)
The Real World
o Heterogeneous Information Networks (HINs)
o Networks: Nodes & Links
– Nodes: Various Types
– Links: Various Types
Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs
![Page 7: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/7.jpg)
Example HINs
o DBLP Bibliographic Network
o Networks: Nodes & Links
– Node (Type):
• KDD (Venue)
• Jiawei Han (Author)
– Link (Type):
• Write (Author -> Paper)
• Publish (Paper -> Venue)
Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
![Page 8: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/8.jpg)
Example HINs
o The IMDB Movie Network
o Networks: Nodes & Links
– Node (Type):
• Forrest Gump (Movie)
• Tom Cruise (Actor)
– Link (Type):
• Make (Producer -> Movie)
• Act (Actor -> Movie)
Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
![Page 9: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/9.jpg)
Example HINs
o The Facebook Network
o Networks: Nodes & Links
– Node (Type):
• Jimmy (User)
• Coca Cola (Product)
– Link (Type):
• Like (User -> Product)
• Follow (User -> User)
Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
![Page 10: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/10.jpg)
HINs are Ubiquitous!
o Healthcare
– Doctor, Patient, Disease
o Source Code Repository
– Project, Developer, Repository
o E-Commerce
– Seller, Buyer, Product
o News
– Author, Organization
Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
![Page 11: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/11.jpg)
Knowledge Graph (KG)
o Turn Web Knowledge into KG
Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
![Page 12: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/12.jpg)
Knowledge Graph (KG)
o Example KGs
Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
![Page 13: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/13.jpg)
Knowledge Graph (KG)
o Statistics in Existing KGs
Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
![Page 14: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/14.jpg)
Problems in HIN
oLink Prediction
oEntity Profiling
oData Integration
Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs
Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. COSNET: Connecting Heterogeneous Social
Networks with Local and Global Consistency, KDD 2015.
![Page 15: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/15.jpg)
Overview of the Tutorial
o Relevance Search
Find similar/relevant objects in networks
o Examples
DBLP (http://dblp.uni-trier.de/)
▪ Who is most similar to Jiawei Han?
▪ Whose recent publications are relevant to Jiawei Han’s research?
![Page 16: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/16.jpg)
Overview of the Tutorial
o Where do relations (meta paths) come from?
– Provided by experts [Sun VLDB’11]
• Not easy for a complex schema!
Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. “Discovering Meta-Paths in Large Heterogeneous Information Networks”, in WWW 2015.
![Page 17: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/17.jpg)
Overview of the Tutorial
o Query Recommendation: to suggest alternate relevant queries to a search engine user
o How will HINs benefit query recommendation?
Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for Long-Tail Queries. CIKM 2016.
![Page 18: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/18.jpg)
Overview of the Tutorial
o How can we express relations using a more complex structure?
o A meta structure is more expressive (i.e., contains more information) than a meta path.
Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. Meta Structure: Computing Relevance in Large Heterogeneous Information Networks. KDD 2016.
![Page 19: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/19.jpg)
Outline
◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work
![Page 20: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/20.jpg)
Definition of Meta-Path
o Definition [Sun et al. VLDB 2011]
o Example
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. VLDB 2011.
![Page 21: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/21.jpg)
![Page 22: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/22.jpg)
Relevance Search
o Motivation
Find similar/relevant objects in networks
o Examples
DBLP (http://dblp.uni-trier.de/)
▪ Who is most similar to Jiawei Han?
▪ Whose recent publications are relevant to Jiawei Han’s research?
IMDb (http://www.imdb.com/)
▪ Who is most similar to Tom Cruise?
▪ Which movie is most relevant to Tom Cruise?
![Page 23: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/23.jpg)
Relevance Search
o Target
To answer these questions systematically
o Solutions
How to measure the similarity?
▪ Define an effective similarity function, such as cosine similarity, Euclidean distance, or the Jaccard coefficient.
Structural similarity or semantic similarity?
▪ Structural similarity: based on the similarity of sub-network structures (e.g., SimRank and PPR).
▪ Semantic similarity: also influenced by similar network structures, but the semantics come from the edge relations. This matters more for HINs!
![Page 24: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/24.jpg)
SimRank
o Model
Idea: two objects are similar if they are referenced by similar objects.
o Definition
▪ S(a,b) is the average similarity between the in-neighbors I(a) of object a and the in-neighbors I(b) of object b, scaled by a constant c. Its value lies in [0, 1].
▪ $S(a,b) = 1$ if $a = b$; otherwise
$S(a,b) = \frac{c}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} S(I_i(a), I_j(b))$,
where c is a constant with $0 < c < 1$.
[Jeh, Widom KDD’02] Glen Jeh and Jennifer Widom. "SimRank: a measure of structural-context similarity." KDD 2002.
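The recursive definition can be evaluated by fixed-point iteration: start with S(a,a) = 1 and S(a,b) = 0, then repeatedly apply the formula. The following is a minimal sketch, assuming a toy graph in which a single node x points to both a and b; the function name `simrank`, the value c = 0.8, and the rule that a node without in-neighbors gets similarity 0 are choices made for this illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Iterative SimRank; in_neighbors maps each node to a list of its in-neighbors."""
    nodes = list(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # Assumption of this sketch: no in-neighbors means similarity 0.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: x -> a and x -> b.
graph = {"x": [], "a": ["x"], "b": ["x"]}
scores = simrank(graph, c=0.8)
print(scores[("a", "b")])  # equals c, because the only in-neighbor pair is (x, x)
```

Each iteration costs on the order of the number of node pairs times the product of their in-degrees, so the sketch is only meant for small graphs.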
![Page 25: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/25.jpg)
SimRank
o Example
[Figure: graph with nodes x, a and b; a and b each have one in-neighbor]
$S(a,a) = 1$
$S(a,b) = \frac{c}{1 \times 1} \times 1 = c$
▪ Ideally, S(a,b) should be 1.
▪ In reality, the graph does not describe everything about a and b, so the constant c keeps S(a,b) < 1. It expresses limited confidence, or decay with distance.
[Jeh, Widom KDD’02] Glen Jeh and Jennifer Widom. "SimRank: a measure of structural-context similarity." KDD 2002.
![Page 26: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/26.jpg)
Personalized PageRank (PPR)
o Model
Idea: based on PageRank, originally defined by Google as a measure of importance for web pages.
o Definition
▪ Given a graph G, a source node s, a target node t, and a teleport probability α, perform a random walk from s. At each step, stop with probability α; otherwise continue the walk.
▪ The Personalized PageRank from s to t is the probability that the walk stops at t:
$\mathrm{PPR}_{s \sim t} = P(s \to t)$
[Jeh, Widom WWW’03] Glen Jeh and Jennifer Widom. "Scaling personalized web search." WWW 2003.
![Page 27: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/27.jpg)
Personalized PageRank (PPR)
o Example
Start from A with α = 0.2 and compute the PPR for each target A, B, C, D.
o Calculation
Iterative computation (power method);
Monte Carlo simulation (approximation);
Bookmark Coloring Algorithm, etc.
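The iterative computation can be sketched as follows: push the probability mass of the still-running walk forward one step at a time, and at each node credit a fraction α to the stopping probability. The graph below is hypothetical (the edges of the slide's example are not recoverable from the text), and the handling of nodes without out-links is an assumption of this sketch.

```python
def personalized_pagerank(out_neighbors, source, alpha=0.2, iterations=100):
    """Probability that a walk from `source` stops at each node.

    At every step the walk stops with probability alpha, otherwise it moves
    to a uniformly chosen out-neighbor.
    """
    nodes = list(out_neighbors)
    walk = {v: 0.0 for v in nodes}  # mass of walks that are at v and still running
    walk[source] = 1.0
    ppr = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for v, mass in walk.items():
            if mass == 0.0:
                continue
            ppr[v] += alpha * mass  # the walk stops here
            targets = out_neighbors[v]
            if not targets:
                # Assumption: a walk that cannot continue also stops here.
                ppr[v] += (1 - alpha) * mass
                continue
            share = (1 - alpha) * mass / len(targets)
            for w in targets:
                nxt[w] += share
        walk = nxt
    return ppr


# Hypothetical 4-node graph, not the one drawn on the slide.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
ppr = personalized_pagerank(graph, "A", alpha=0.2)
for node in sorted(ppr):
    print(node, round(ppr[node], 4))
```

After k iterations the unassigned mass is at most (1 − α)^k, so 100 iterations with α = 0.2 leave a negligible remainder.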
![Page 28: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/28.jpg)
Path Constrained Random Walk
o Model
Random walk on given paths.
o Definition
▪ Perform random walks along a given meta path Π, with a fixed starting point s and target point t.
▪ PCRW: the probability that a random walk from s following Π reaches t.
▪ Its value lies in [0, 1].
$\mathrm{PCRW}(s, t \mid \Pi) = P(s \to t \mid \Pi)$
[Cohen ECML’11] W. Cohen, N. Lao. “Relational Retrieval Using a Combination of Path-Constrained Random Walks”
![Page 29: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/29.jpg)
Path Constrained Random Walk
o Example
m1 = P1 -> P2 -> P3, i.e. Person -hasChild-> Person -hasChild^-1-> Person
1. Pro(B.Obama | P1) = 1
2. Pro(M.A.Obama | P2) = Pro(B.Obama | P1) / 2 = 0.5
Pro(N.Obama | P2) = Pro(B.Obama | P1) / 2 = 0.5
3. Pro(M.Obama | P3) = Pro(M.A.Obama | P2) / 2 + Pro(N.Obama | P2) / 2 = 0.5
Pro(B.Obama | P3) = Pro(M.A.Obama | P2) / 2 + Pro(N.Obama | P2) / 2 = 0.5
[Cohen ECML’11] W. Cohen, N. Lao. “Relational Retrieval Using a Combination of Path-Constrained Random Walks”
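The three steps of the example can be reproduced by propagating a probability distribution along the meta path, splitting the mass uniformly at each hop. This is a minimal sketch, assuming the graph contains only the four people named in the example; the helper names `invert` and `pcrw` are illustrative and not taken from the cited paper.

```python
def invert(relation):
    """Return the inverse relation (hasChild^-1 from hasChild)."""
    inverse = {}
    for head, tails in relation.items():
        for tail in tails:
            inverse.setdefault(tail, []).append(head)
    return inverse


def pcrw(start, relations):
    """Distribution over end nodes of a random walk that follows `relations` in order."""
    dist = {start: 1.0}
    for relation in relations:
        nxt = {}
        for node, prob in dist.items():
            targets = relation.get(node, [])
            for target in targets:
                # Uniform choice among the neighbors reachable via this relation.
                nxt[target] = nxt.get(target, 0.0) + prob / len(targets)
        dist = nxt
    return dist


# Assumed toy data matching the example: both parents have the same two children.
has_child = {
    "B.Obama": ["M.A.Obama", "N.Obama"],
    "M.Obama": ["M.A.Obama", "N.Obama"],
}
m1 = [has_child, invert(has_child)]  # Person -hasChild-> Person -hasChild^-1-> Person

result = pcrw("B.Obama", m1)
print(result)  # B.Obama and M.Obama each receive probability 0.5
```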
![Page 30: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/30.jpg)
PathSim
o Model
Path count (PC): the number of paths following a given meta path.
o Definition
▪ Can only be applied to symmetric meta paths (considering both node types and link types).
▪ A normalized version of PC. Its value lies in [0, 1].
▪ $\mathrm{PathSim}(s, t \mid m) = \frac{2 \times PC(s, t \mid m)}{PC(s, s \mid m) + PC(t, t \mid m)}$
[Sun, Han VLDB’11] Y. Sun, J. Han, et al. “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks”
![Page 31: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/31.jpg)
PathSim
o Example
m1 = Person -hasChild-> Person -hasChild^-1-> Person
PC(B.Obama, M.Obama) = 2
PC(B.Obama, B.Obama) = 2
PC(M.Obama, M.Obama) = 2
PS(B.Obama, M.Obama) = 2 × 2 / (2 + 2) = 1
[Sun, Han VLDB’11] Y. Sun, J. Han, et al. “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks”
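The path counts in this example can be checked by counting walks along the meta path instead of propagating probabilities. The sketch below assumes the same four-person family graph as the example; `path_counts` and `pathsim` are illustrative helper names.

```python
def invert(relation):
    """Return the inverse relation (hasChild^-1 from hasChild)."""
    inverse = {}
    for head, tails in relation.items():
        for tail in tails:
            inverse.setdefault(tail, []).append(head)
    return inverse


def path_counts(start, relations):
    """Number of path instances from `start` to each end node along the meta path."""
    counts = {start: 1}
    for relation in relations:
        nxt = {}
        for node, n in counts.items():
            for target in relation.get(node, []):
                nxt[target] = nxt.get(target, 0) + n
        counts = nxt
    return counts


def pathsim(s, t, relations):
    """PathSim(s, t | m) = 2 * PC(s, t) / (PC(s, s) + PC(t, t)) for a symmetric meta path m."""
    pc_st = path_counts(s, relations).get(t, 0)
    pc_ss = path_counts(s, relations).get(s, 0)
    pc_tt = path_counts(t, relations).get(t, 0)
    denominator = pc_ss + pc_tt
    return 2 * pc_st / denominator if denominator else 0.0


# Assumed toy data matching the example.
has_child = {
    "B.Obama": ["M.A.Obama", "N.Obama"],
    "M.Obama": ["M.A.Obama", "N.Obama"],
}
m1 = [has_child, invert(has_child)]

print(path_counts("B.Obama", m1))          # two path instances to each parent
print(pathsim("B.Obama", "M.Obama", m1))   # 2 * 2 / (2 + 2) = 1.0
```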
![Page 32: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/32.jpg)
HeteSim
o Model
An improvement of SimRank for heterogeneous information networks.
o Definition
▪ Can be applied to arbitrary meta paths.
▪ Given relations
[Shi, Kong, Huang TKDE’2014] HeteSim: A general framework for relevance measure in heterogeneous networks.
![Page 33: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/33.jpg)
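HeteSim
o Example
m1 = P1 -> P2 -> P3, i.e. Person -hasChild-> Person -hasChild^-1-> Person
$\mathrm{HeteSim}(\text{B.Obama}, \text{M.Obama} \mid m_1) = \frac{1}{|O(\text{B.Obama})|\,|I(\text{M.Obama})|} \big( \mathrm{HeteSim}(\text{M.A.Obama}, \text{M.A.Obama}) + \mathrm{HeteSim}(\text{N.Obama}, \text{N.Obama}) \big) = \frac{1}{2 \times 2} (1 + 1) = 0.5$
where O(B.Obama) is the set of out-neighbors of B.Obama along the first relation and I(M.Obama) is the set of in-neighbors of M.Obama along the last relation.
[Shi, Kong, Huang TKDE’2014] HeteSim: A general framework for relevance measure in heterogeneous networks.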
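For an even-length meta path, the pairwise recursion can be unrolled into a meet-in-the-middle computation: walk forward from the source over the first half, walk backward from the target over the second half, and sum the products of the two probabilities at each middle node. The sketch below assumes the same family graph as the example and computes the unnormalized value used in it; `reach` and `hetesim_even` are illustrative names, and odd-length paths are not handled.

```python
def reach(start, relations):
    """Probability of reaching each node from `start` by a uniform walk along `relations`."""
    dist = {start: 1.0}
    for relation in relations:
        nxt = {}
        for node, prob in dist.items():
            targets = relation.get(node, [])
            for target in targets:
                nxt[target] = nxt.get(target, 0.0) + prob / len(targets)
        dist = nxt
    return dist


def hetesim_even(s, t, first_half, second_half_reversed):
    """Unnormalized HeteSim for a meta path split into two halves of equal length.

    `second_half_reversed` lists the relations of the second half in reverse
    order and reverse direction, i.e. as seen when walking back from t.
    """
    left = reach(s, first_half)
    right = reach(t, second_half_reversed)
    return sum(prob * right.get(node, 0.0) for node, prob in left.items())


# Assumed toy data matching the example.
has_child = {
    "B.Obama": ["M.A.Obama", "N.Obama"],
    "M.Obama": ["M.A.Obama", "N.Obama"],
}

# m1 = hasChild followed by hasChild^-1; walking back from the target over
# hasChild^-1 is the same as walking forward over hasChild.
score = hetesim_even("B.Obama", "M.Obama", [has_child], [has_child])
print(score)  # 0.5 * 0.5 + 0.5 * 0.5 = 0.5
```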
![Page 34: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/34.jpg)
Comparison
o PathSim, HeteSim and PCRW give different values even on the same example.
o These metrics are designed for different applications or measurement scenarios.
o No single similarity measure dominates so far.
![Page 35: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/35.jpg)
Other Measurements
o AvgSim (APWeb’14)
Measures the similarity between nodes by random walks on the given meta-path and on the reverse meta-path, respectively.
o KnowSim (ICDM’16)
Measures the similarity of documents by modeling them as heterogeneous information networks.
o RelSim (SDM’16)
Measures the similarity between relations in heterogeneous information networks.
…
![Page 36: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/36.jpg)
Summary
| Measure | Structure-based | Semantic-based | Symmetric? |
|---|---|---|---|
| SimRank | √ | | Yes |
| PPR | √ | | Yes |
| PCRW | | √ | No |
| PathSim | | √ | Yes |
| HeteSim | | √ | Yes |
| … | | | |
![Page 37: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/37.jpg)
![Page 38: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/38.jpg)
Questions
o Where do meta paths come from?
– Provided by experts [Sun VLDB’11]
• Not easy for a complex schema!
– Enumeration of meta paths within a given length [Cohen ECML’11]
• No clue about the length!
– How do I know the weights?
![Page 39: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/39.jpg)
Our Contributions (WWW’15)
o Design a solution that:
– (1) Discovers the best meta paths
– (2) Learns the weights, without a maximum path length being specified.
[Meng WWW’15] Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. “Discovering Meta-Paths in Large Heterogeneous Information Networks”, in WWW 2015.
![Page 40: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/40.jpg)
Meta-Path Framework
o Framework
[Figure: a knowledge graph G (Yago) and example node pairs, e.g. (B. Obama, M. Obama) and (B. Clinton, H. Clinton), are fed into Meta Path Generation, which outputs meta-paths and a relevance function F (a linear function).]
Challenge: Each node and edge can have many class labels. The number of candidate meta paths grows exponentially with path length.
![Page 41: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/41.jpg)
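Generating Meta-Paths
o In Two Phases
[Figure: the same pipeline (knowledge graph G and example node pairs in, meta-paths and relevance function F out), with the generation step split into a link type phase and a node type phase.]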
![Page 42: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/42.jpg)
Phase 1: Link-Only Path Generation
o Forward Stage-wise Path Generation (FSPG)
– Iteratively generate the most related meta-paths and update the model.
– Based on the Least-Angle Regression (LARS) model [Efron, Ann.Stat’04].
[Flowchart: example pairs -> GreedyTree gets the one most related meta path m -> model training on m (to train the weights on meta paths) -> updated model -> converged? If no (N), repeat; if yes (Y), FINISH.]
![Page 43: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/43.jpg)
Phase 1: Link-Only Path Generation
o GreedyTree
– A tree that greedily expands the node with the largest priority score.
– Priority score: related to the correlation between m and r.
• m is the vector representation of a meta path; r is the residual vector, which measures the gap between the ground truth and the current model.
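The idea of picking the meta path whose vector m correlates most with the residual r can be illustrated with plain incremental forward stagewise regression. This is not the GreedyTree or FSPG algorithm of the cited paper, only a sketch of the selection principle; the feature vectors, the names `f1` and `f2`, and the step size are made up for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def forward_stagewise(features, target, step=0.1, rounds=200, tol=1e-9):
    """Incremental forward stagewise regression.

    features: dict mapping a meta-path name to its vector m (one entry per example pair).
    target:   ground-truth vector (1 for a positive pair, 0 otherwise).
    Returns the learned weights and the final residual r.
    """
    weights = {name: 0.0 for name in features}
    residual = list(target)
    for _ in range(rounds):
        # Select the meta path most correlated with the current residual.
        best = max(features, key=lambda name: abs(dot(features[name], residual)))
        correlation = dot(features[best], residual)
        if abs(correlation) < tol:
            break  # nothing left to explain
        delta = step if correlation > 0 else -step
        weights[best] += delta
        residual = [r - delta * x for r, x in zip(residual, features[best])]
    return weights, residual


# Hypothetical meta-path vectors over four example pairs.
features = {
    "f1": [1.0, 0.0, 1.0, 0.0],
    "f2": [0.0, 1.0, 0.0, 0.0],
}
target = [1.0, 0.0, 1.0, 0.0]
weights, residual = forward_stagewise(features, target)
print(weights)
```

Here only `f1` is correlated with the target, so it receives all of the weight while `f2` stays at zero.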
![Page 44: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/44.jpg)
Phase 1: Link-Only Path Generation
[Figure: GreedyTree]
![Page 45: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/45.jpg)
Phase 2: Node Class Generation
o Why node classes?
– A link-only meta path may introduce some unrelated result pairs.
– It is less specific.
– Solution: Lowest Common Ancestor (LCA)
• Record the LCA in each node of the GreedyTree.
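The LCA step can be pictured as finding the most specific class shared by all entities that occupy the same position of a link-only meta path. Below is a minimal sketch, assuming a hypothetical class hierarchy in which every class has a single parent; a real knowledge graph taxonomy may allow several parents, which this sketch does not handle.

```python
def ancestors(cls, parent):
    """Chain from `cls` up to the root, starting with `cls` itself."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def lowest_common_ancestor(classes, parent):
    """Most specific class that is an ancestor of (or equal to) every class given."""
    common = None
    for cls in classes:
        chain = ancestors(cls, parent)
        if common is None:
            common = chain
        else:
            allowed = set(chain)
            # Keep the order of the first chain: most specific class first.
            common = [c for c in common if c in allowed]
    return common[0] if common else None


# Hypothetical class hierarchy (child -> parent), not taken from Yago.
parent = {
    "Politician": "Person",
    "Lawyer": "Person",
    "Person": "Entity",
    "City": "Location",
    "Location": "Entity",
}

print(lowest_common_ancestor(["Politician", "Lawyer"], parent))  # Person
print(lowest_common_ancestor(["Politician", "City"], parent))    # Entity
```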
![Page 46: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/46.jpg)
Experiments
o Datasets
– DBLP (4 areas: DB, DM, AI, IR)
• 14K papers, 14K authors, 9K topics, 20 venues.
– Yago
• A KG derived from Wikipedia, WordNet and GeoNames.
• CORE Facts: 2.1 million nodes, 8 million edges, 125 edge types, 0.36 million node types
o Link-prediction evaluation
– Select n pairs with a certain relationship as example pairs.
– Randomly select another m pairs on which to predict the links.
![Page 47: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/47.jpg)
Experiment 1: Effectiveness
o Baseline: enumerate all meta paths within a given max length L = 1, 2, 3, 4
– L is small -> low recall.
– L is large -> low precision.
[Figure: ROC for link prediction]
![Page 48: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/48.jpg)
Experiment 2
o Case study: Yago citizenOf
– Better than direct link (PCRW 1)
– Better than best PCRW 2
– Better than PCRW 3, 4
[Figure: 5 most relevant meta paths for “citizenOf”]
![Page 49: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/49.jpg)
Experiment 3: Efficiency
o Findings:
– In Yago, FSPG is 2 orders of magnitude faster than PCRW with path lengths greater than 2.
– In DBLP, the running time is comparable to PCRW 5, but the accuracy is much better.
[Figure: Running time of FSPG]
![Page 50: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/50.jpg)
![Page 51: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/51.jpg)
One Application
oQuery Recommendation: to suggest alternate relevant
queries to a search engine user
– 1) As you type;
– 2) Related queries
Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for
Long-Tail Queries. CIKM 2016.
![Page 52: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/52.jpg)
Long Tail Distribution
oLong-tail queries: queries that are not commonly
requested by users
– “akira kurosawa influence george lucas”
![Page 53: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/53.jpg)
Motivation
o Ubiquitous:
– 84% of 10M queries appear no more than 3 times.
o Necessary:
– Existing works that rely on the query log alone can no longer handle these queries well.
![Page 54: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/54.jpg)
Query Log
o A set of user log records <q, u, t, C>
– q: the query
– u: user id
– t: time stamp
– C: the clicked URLs
o Session: a time window, a mission.
o Existing methods rely on query logs to analyze the flow among queries.
Boldi, Paolo, et al. "The query-flow graph: model and applications." Proceedings of the 17th ACM conference
on Information and knowledge management. ACM, 2008.
Bonchi, Francesco, et al. "Efficient query recommendations in the long tail via center-piece subgraphs."
Proceedings of the 35th international ACM SIGIR conference on Research and development in information
retrieval. ACM, 2012.
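The log record <q, u, t, C> and the notion of a session as a time window can be sketched as a small data structure plus a splitting rule. The 30-minute gap used here is an assumption chosen for illustration, not a value given in the slides, and `LogRecord` and `split_sessions` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    q: str                                   # the query
    u: str                                   # user id
    t: float                                 # time stamp in seconds
    C: list = field(default_factory=list)    # the clicked URLs


def split_sessions(records, max_gap=1800.0):
    """Group each user's records into sessions separated by gaps longer than `max_gap`."""
    sessions = []
    last_seen = {}  # user id -> (last time stamp, current session)
    for record in sorted(records, key=lambda r: (r.u, r.t)):
        previous = last_seen.get(record.u)
        if previous is None or record.t - previous[0] > max_gap:
            session = [record]
            sessions.append(session)
        else:
            session = previous[1]
            session.append(record)
        last_seen[record.u] = (record.t, session)
    return sessions


# Made-up records for two users.
log = [
    LogRecord("weather Los Angeles", "u1", 0.0),
    LogRecord("weather San Diego", "u1", 60.0),
    LogRecord("akira kurosawa influence george lucas", "u1", 4000.0),
    LogRecord("weather Las Vegas", "u2", 10.0),
]
sessions = split_sessions(log)
print([[r.q for r in s] for s in sessions])
```

Consecutive queries inside one session are the raw material from which the flow among queries is estimated.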
![Page 55: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/55.jpg)
Knowledge Graph
Hoffart, Johannes, et al. "Yago2: a spatially and temporally enhanced Knowledge Graph from wikipedia."
(2012).
![Page 56: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/56.jpg)
Relationship in the KG
o Meta path representation:
– P: city -nextTo-> city
o Q: “weather Los Angeles”
– Rec:
• “weather Las Vegas”
• “weather San Diego”
[Sun, Han VLDB’11] Y. Sun, J. Han, et al. “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks”
![Page 57: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/57.jpg)
57
System Overview
oG = (Gqf, K, tEQ, P)
– Gqf is a query-flow graph
– K is a Knowledge Graph
– tEQ is a set of entity-query links
– P is a set of meta paths to be extracted from
the query log
![Page 58: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/58.jpg)
Offline
58
oGqf is built as described in [1].
otEQ is built by entity linking, followed by
normalizing the link weights.
oP:
– Get the set of entity pairs within the same
session: {(ei, ej) | ei, ej ∈ sk}
– Get the meta path between ei and ej (we
use the shortest path for simplicity)
– Store the meta paths, indexed by the type of ei
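The offline extraction of P can be sketched as below. This is a minimal illustration under stated assumptions: the toy knowledge graph `KG`, the `TYPE` map and both helper functions are hypothetical, entity pairs are taken as ordered pairs, and keeping a count per meta path is an added convenience that the slide does not mention.

```python
from collections import defaultdict, deque
from itertools import permutations

# Hypothetical toy knowledge graph: entity -> list of (relation, neighbour).
KG = {
    "Los_Angeles": [("nextTo", "San_Diego"), ("nextTo", "Las_Vegas")],
    "San_Diego": [("nextTo", "Los_Angeles")],
    "Las_Vegas": [("nextTo", "Los_Angeles")],
}
TYPE = {"Los_Angeles": "city", "San_Diego": "city", "Las_Vegas": "city"}


def shortest_meta_path(kg, types, src, dst):
    """BFS from src to dst; returns (type, relation, type, ...) or None."""
    queue = deque([(src, (types[src],))])
    seen = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for rel, nxt in kg.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (rel, types[nxt])))
    return None


def extract_meta_paths(sessions, kg, types):
    """sessions: list of entity lists. Returns {type of ei: {meta path: count}}."""
    store = defaultdict(lambda: defaultdict(int))
    for entities in sessions:
        for ei, ej in permutations(set(entities), 2):
            meta_path = shortest_meta_path(kg, types, ei, ej)
            if meta_path is not None:
                store[types[ei]][meta_path] += 1  # stored by the type of ei
    return store
```

For example, `extract_meta_paths([["Los_Angeles", "San_Diego"]], KG, TYPE)` files the meta path `("city", "nextTo", "city")` under the key `"city"`.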
![Page 59: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/59.jpg)
Online
[Figure: online pipeline for the input query q = “LA weather”.
Step 1, Entity Linking: an entity linking tool (e.g., Dexter2) maps q to e1 = <Los_Angeles> and e2 = <weather>.
Step 2, Entity Expansion: meta path P1 (city nextTo city) expands e1 to <San_Diego> and <Las_Vegas> (e1,1 and e1,2, each shown with weight 0.25); meta path P2 (property isA property) is applied to e2 = <weather> (shown with weight 0.5).
Step 3, Query Searching: the expanded entities lead to the recommendations q1 = “San Diego weather” and q2 = “Las Vegas weather”.]
o Three Steps:
– Entity Linking (use existing tool)
– Entity Expansion (use P)
– Query Searching (PPR)
59
![Page 60: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/60.jpg)
Step 1: Entity Linking
oGiven
– q = “weather Los Angeles”
oReturn:
– e1 = Los_Angeles
60
Ceccarelli, Diego, et al. "Dexter: an open source framework for entity linking." Proceedings of the sixth
international workshop on Exploiting semantic annotations in information retrieval. ACM, 2013.
![Page 61: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/61.jpg)
Step 2. Entity Expansion
oGiven
– e1 = Los_Angeles
oUsing P:
– city nextTo city
oReturn
– e2 = Las_Vegas
– e3 = San_Diego
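Entity expansion along a meta path can be sketched as a typed traversal. The toy graph, the `TYPE` map and the `expand` helper below are hypothetical, and the meta path is encoded as an alternating tuple of types and relations, which is an assumed representation.

```python
# Hypothetical toy knowledge graph: entity -> list of (relation, neighbour).
KG = {
    "Los_Angeles": [("nextTo", "San_Diego"), ("nextTo", "Las_Vegas"), ("locatedIn", "California")],
    "San_Diego": [("nextTo", "Los_Angeles")],
    "Las_Vegas": [("nextTo", "Los_Angeles")],
}
TYPE = {
    "Los_Angeles": "city",
    "San_Diego": "city",
    "Las_Vegas": "city",
    "California": "state",
}


def expand(kg, types, entity, meta_path):
    """Follow meta_path = (type, relation, type, ...) from entity.

    Returns the set of entities reached at the end of the meta path.
    """
    if types[entity] != meta_path[0]:
        return set()
    frontier = {entity}
    for i in range(1, len(meta_path), 2):
        relation, target_type = meta_path[i], meta_path[i + 1]
        frontier = {
            nxt
            for node in frontier
            for rel, nxt in kg.get(node, [])
            if rel == relation and types[nxt] == target_type
        }
    return frontier


related = expand(KG, TYPE, "Los_Angeles", ("city", "nextTo", "city"))
```

Here `related` holds `Las_Vegas` and `San_Diego`, matching the slide's example; `California` is excluded because the `locatedIn` edge does not match the meta path.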
61
![Page 62: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/62.jpg)
Step 3. Query Searching
oGiven:
– e2 = Las_Vegas
– e3 = San_Diego
oReturn:
– q1 = “weather las vegas”
– q2 = “weather san diego”
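The overview slide names PPR (personalized PageRank) for this step. A minimal power-iteration sketch is given below; the graph encoding, the restart probability `alpha`, the iteration count and the handling of dangling nodes are all assumptions, not details taken from the slides.

```python
def personalized_pagerank(graph, restart, alpha=0.15, iters=50):
    """Power iteration for personalized PageRank.

    graph:   node -> {neighbour: weight}
    restart: node -> probability (should sum to 1), e.g. the expanded entities
    alpha:   restart probability (assumed value)
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs} | set(restart)
    rank = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its mass back to the restart distribution.
                for n, p in restart.items():
                    nxt[n] += (1 - alpha) * rank[u] * p
                continue
            for v, w in out.items():
                nxt[v] += (1 - alpha) * rank[u] * w / total
        rank = nxt
    return rank


# Hypothetical entity-query links for the running example.
links = {
    "San_Diego": {"weather san diego": 1.0},
    "Las_Vegas": {"weather las vegas": 1.0},
}
scores = personalized_pagerank(links, {"San_Diego": 0.5, "Las_Vegas": 0.5})
```

Ranking the query nodes by `scores` then yields the recommended queries; in this symmetric toy example both queries receive the same score.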
62
![Page 63: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/63.jpg)
Experiments
oDataset: AOL. 20M query instances from 9M distinct
queries.
oUse 10%, 50%, 90% for building the query log, and
10% for testing.
oTesting sets: We use 3, 5, 10 as the threshold for
long-tail queries. We name them L’3, L’5 and L’10.
oMeasures:
– Coverage
– Precision@5
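The two measures can be sketched as follows. The slides do not define them, so the definitions below are assumptions: Precision@5 as the fraction of the top five recommendations judged relevant, and coverage as the fraction of test queries that receive at least one recommendation.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommendations that are relevant (assumed definition)."""
    top_k = recommended[:k]
    return sum(1 for q in top_k if q in relevant) / k


def coverage(recommendations):
    """Fraction of test queries with at least one recommendation (assumed definition).

    recommendations: test query -> list of recommended queries
    """
    if not recommendations:
        return 0.0
    covered = sum(1 for recs in recommendations.values() if recs)
    return covered / len(recommendations)
```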
63
![Page 64: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/64.jpg)
Experimental Results
64
![Page 65: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/65.jpg)
Efficiency
oTime for offline:
oTime for entity linking:
– 60 ms with Dexter2; this can be reduced to 0.4 ms with the FEL
method.
65
![Page 66: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/66.jpg)
Outline
◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work
66
![Page 67: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/67.jpg)
67
Limitations of Meta Paths
oMeta paths fail to capture common nodes shared by
different meta paths!
– E.g., a researcher wants to search for
authors who have published papers in the
same venue and on the same topic as his own
papers.
[Figure: example bibliographic HIN. Object types: author (a1 to a3), paper (p1,1 to p3,2, labelled KDD’07, KDD’15, ICDM’12, VLDB’06, VLDB’15 and AAAI’15), venue (v1 to v4, labelled KDD, VLDB, AAAI and ICDM) and topic (t1 to t4, labelled “mining”, “efficient”, “privacy” and “social”). Edge types: write, publish, mention.]
![Page 68: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/68.jpg)
![Page 69: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/69.jpg)
![Page 70: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/70.jpg)
Meta Structure
o A meta structure is a directed acyclic graph (DAG) with a single source node and a single sink (target) node
o More expressive (i.e., contains more information) than a meta path.
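The definition can be checked mechanically. The sketch below is an illustration only: the node ids, the example structure (author, paper, then venue and topic in parallel, paper, author) and the `is_meta_structure` helper are assumptions chosen to mirror the "same venue and same topic" example.

```python
def is_meta_structure(nodes, edges):
    """Check that (nodes, edges) is a DAG with exactly one source and one sink.

    nodes: node id -> object type
    edges: iterable of (u, v) pairs
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        indegree[v] += 1
        successors[u].append(v)
    sources = [n for n in nodes if indegree[n] == 0]
    sinks = [n for n in nodes if not successors[n]]
    if len(sources) != 1 or len(sinks) != 1:
        return False
    # Kahn's algorithm: every node is visited if and only if the graph is acyclic.
    remaining = dict(indegree)
    stack = list(sources)
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in successors[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                stack.append(v)
    return visited == len(nodes)


NODES = {"A1": "author", "P1": "paper", "V": "venue", "T": "topic", "P2": "paper", "A2": "author"}
EDGES = [("A1", "P1"), ("P1", "V"), ("P1", "T"), ("V", "P2"), ("T", "P2"), ("P2", "A2")]
```

A meta path is the special case in which the DAG is a single chain, which is why a meta structure can express strictly more.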
70
[Huang KDD’16] Z. Huang et al. “Meta Structure: Computing Relevance in Large Heterogeneous Information
Networks,” KDD 2016.
![Page 71: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/71.jpg)
Outline
◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work
71
![Page 72: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/72.jpg)
Relevance Measure 1: StructCount
oStructCount: extension of PathCount
oStructCount is biased towards popular
objects with a large number of links.
StructCount(x0, y0 | S) = GraphIns(x0, y0 | S)
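A minimal sketch of StructCount for one specific meta structure follows: author, paper, then venue and topic in parallel, paper, author. The toy data and both helper functions are hypothetical; the code counts instances for this structure only and is not a general subgraph matcher.

```python
from itertools import product

# Hypothetical toy HIN.
WRITES = {"a1": ["p1"], "a2": ["p2", "p3"]}
VENUE = {"p1": ["KDD"], "p2": ["KDD"], "p3": ["VLDB"]}
TOPIC = {"p1": ["mining"], "p2": ["mining"], "p3": ["mining"]}


def struct_count(x0, y0):
    """Count instances where a paper of x0 and a paper of y0 share BOTH a venue and a topic."""
    count = 0
    for p, q in product(WRITES.get(x0, []), WRITES.get(y0, [])):
        shared_venues = set(VENUE.get(p, [])) & set(VENUE.get(q, []))
        shared_topics = set(TOPIC.get(p, [])) & set(TOPIC.get(q, []))
        count += len(shared_venues) * len(shared_topics)
    return count


def path_count(x0, y0, attribute):
    """PathCount for author-paper-X-paper-author, where X is given by `attribute`."""
    count = 0
    for p, q in product(WRITES.get(x0, []), WRITES.get(y0, [])):
        count += len(set(attribute.get(p, [])) & set(attribute.get(q, [])))
    return count
```

On this toy data the topic meta path alone links `a1` and `a2` through two paper pairs, but only one pair also shares a venue, so the meta structure yields a smaller count than the meta path.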
72
[Huang KDD’16] Z. Huang et al. “Meta Structure: Computing Relevance in Large Heterogeneous Information
Networks,” KDD 2016.
![Page 73: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/73.jpg)
Layers of Meta Structure
oThe layers of a meta structure follow a topological ordering
of its DAG
[Figure: a meta structure with its layers numbered 1 to 5]
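One way to compute such layers is to peel the DAG level by level, as in Kahn's algorithm. The `layers` helper and the example structure below are assumptions for illustration; with this rule each node is placed at its longest distance from the source.

```python
def layers(nodes, edges):
    """Split a DAG into layers: a node enters a layer once all its predecessors are placed."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        indegree[v] += 1
        successors[u].append(v)
    current = sorted(n for n in nodes if indegree[n] == 0)
    result = []
    while current:
        result.append(current)
        following = []
        for u in current:
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    following.append(v)
        current = sorted(following)
    return result


# Hypothetical meta structure: author, paper, (venue, topic), paper, author.
NODES = ["A1", "P1", "V", "T", "P2", "A2"]
EDGES = [("A1", "P1"), ("P1", "V"), ("P1", "T"), ("V", "P2"), ("T", "P2"), ("P2", "A2")]
LAYERS = layers(NODES, EDGES)
```

For this structure there are five layers, with venue and topic sharing the third one.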
73
![Page 74: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/74.jpg)
Relevance Measure 2: SCSE
oStructure Constrained Subgraph Expansion (SCSE):
extension of PCRW.
[Figure: the example bibliographic HIN (object types: author, paper, venue, topic; edge types: write, publish, mention), annotated with the probabilities 1.0, 0.5, 0.25 and 0.5.]
74
![Page 75: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/75.jpg)
Relevance Measure 2: SCSE
[Figure: the example bibliographic HIN (object types: author, paper, venue, topic; edge types: write, publish, mention).]
75
![Page 76: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/76.jpg)
Relevance Measure 3: BSCSE
oBiased Structure Constrained Subgraph Expansion (BSCSE):
extension of BPCRW.
– A combination of SC (StructCount) and SCSE
– A bias parameter in [0, 1] moves between the two: 0 gives SC, 1 gives SCSE
77
[Huang KDD’16] Z. Huang et al. “Meta Structure: Computing Relevance in Large Heterogeneous Information
Networks,” KDD 2016.
![Page 77: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/77.jpg)
Relevance Measures: Summary
| Meta Path | Meta Structure | Meaning |
| --- | --- | --- |
| PathCount | StructCount | # of meta-path/structure instances |
| PCRW | SCSE | Random walk probability on meta-path/structure |
| BPCRW | BSCSE | Combination of count and probability |
78
![Page 78: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/78.jpg)
i-LTable
o Index the probability distribution starting from the i-th
layer of a meta structure.
[Figure: the example bibliographic HIN (object types: author, paper, venue, topic; edge types: write, publish, mention).]

| Key (layer 3) | Value |
| --- | --- |
| <ICDM, social> | <Pei, 1.0> |
| <KDD, mining> | <Pei, 0.5>, <Han, 0.5> |
| <VLDB, efficient> | <Han, 1.0> |
| <VLDB, privacy> | <Yang, 1.0> |
| <AAAI, efficient> | <Yang, 1.0> |
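The table above can be held in a dictionary keyed by the layer-3 node combination. The sketch below is an assumption about how such an index might be used: given a probability distribution over layer-3 keys, the stored distributions are mixed to obtain the final distribution over authors, so the layers after layer 3 need not be traversed again. The name `finish_from_layer3` is hypothetical.

```python
from collections import defaultdict

# Entries copied from the slide's table: (venue, topic) -> {author: probability}.
I_LTABLE = {
    ("ICDM", "social"): {"Pei": 1.0},
    ("KDD", "mining"): {"Pei": 0.5, "Han": 0.5},
    ("VLDB", "efficient"): {"Han": 1.0},
    ("VLDB", "privacy"): {"Yang": 1.0},
    ("AAAI", "efficient"): {"Yang": 1.0},
}


def finish_from_layer3(layer3_distribution):
    """Mix the indexed distributions according to the probabilities of the layer-3 keys."""
    result = defaultdict(float)
    for key, probability in layer3_distribution.items():
        for author, p in I_LTABLE.get(key, {}).items():
            result[author] += probability * p
    return dict(result)


final = finish_from_layer3({("KDD", "mining"): 0.5, ("VLDB", "efficient"): 0.5})
```

In this example `final` assigns 0.25 to Pei and 0.75 to Han.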
79
![Page 79: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/79.jpg)
Experiment: Entity Resolution
oYAGO contains duplicate
entities, e.g., Barack_Obama and
Presidency_Of_Barack_Obama
oWe retrieve the top-k node pairs; a high
relevance score indicates
that the two nodes are duplicates
oMeasure: area under the precision-recall curve (AUC)
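The area under the PR curve can be approximated from a ranked list of node pairs. The sketch below uses the step-wise area (average precision), which is one common approximation and an assumption here; the slides do not say how the area was computed.

```python
def pr_points(labels):
    """labels: 1/0 ground truth for node pairs, ordered by decreasing relevance.

    Returns the (recall, precision) point after each rank.
    """
    total = sum(labels)
    if total == 0:
        return []
    points, hits = [], 0
    for rank, label in enumerate(labels, start=1):
        hits += label
        points.append((hits / total, hits / rank))
    return points


def area_under_pr(labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    area, previous_recall = 0.0, 0.0
    for recall, precision in pr_points(labels):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area
```

For a ranking in which the first and third pairs are true duplicates, `area_under_pr([1, 0, 1])` gives 5/6.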
80
![Page 80: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/80.jpg)
Experiment: Entity Resolution
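| Meta path / structure | Measure | AUC |
| --- | --- | --- |
| P1 | PathCount | 0.1324 |
| P1 | PCRW | 0.0120 |
| P1 | PathSim | 0.0097 |
| P2 | PathCount | 0.0003 |
| P2 | PCRW | 0.0014 |
| P2 | PathSim | 0.0002 |
| Linear Combination (optimal) | PathCount | 0.2898 |
| Linear Combination (optimal) | PCRW | 0.2606 |
| Linear Combination (optimal) | PathSim | 0.2920 |
| Meta Structure S | SC | 0.5556 |
| Meta Structure S | SCSE | 0.5640 |
| Meta Structure S | BSCSE* | 0.5640 |
81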
![Page 81: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/81.jpg)
Outline
◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work
82
![Page 82: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/82.jpg)
Meta-Paths Demo
Welcome to Meta-Paths
Interactive toolbox for querying
Heterogeneous Information Networks
by Examples
Start Learn More
83
![Page 83: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/83.jpg)
New Query
84
![Page 84: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/84.jpg)
FSPG Execution
The FSPG algorithm will be triggered on the server, returning the results upon completion.
85
![Page 85: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/85.jpg)
Generated Meta-Paths
86
![Page 86: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/86.jpg)
Node Pair Generation
Results of the Meta-Paths algorithm are shown. Upon clicking “Proceed”, the node pair generation service will be triggered on the server.
87
![Page 87: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/87.jpg)
Suggested Node Pairs
The user can remove some of the suggested node pairs, and use the remaining pairs to refine the Meta-Paths in an iterative manner.
88
![Page 88: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/88.jpg)
Fine-tuning Node Pairs
Click “Proceed” to start the next iteration, or “Finish” to view the final query results.
89
![Page 89: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/89.jpg)
Next Iteration
90
![Page 90: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/90.jpg)
Final Results
Click “Save” to keep a copy of the query results. Alternatively, click “New” to start a new query.
91
![Page 91: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/91.jpg)
Final Results
92
![Page 92: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/92.jpg)
Outline
◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work
93
![Page 93: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/93.jpg)
94
Conclusions
oHeterogeneous Information Networks
are more powerful than Homogeneous
Information Networks
oMeta paths capture the relevance
(similarity) between two nodes
oMeta structures capture more complex
relationships between nodes
![Page 94: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/94.jpg)
95
Future Work
Dynamic Similarity Search on
Meta-Paths
oSometimes direct relevance search
cannot reveal the true relationship
among entities.
oSolution: Dynamic Network Search
oProblems: 1. No efficient top-k query
algorithms. 2. No predicates or posterior
knowledge of the network
oML methods could help!
![Page 95: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/95.jpg)
96
Future Work
Mining HINs with Meta Structures
oUse Meta Structure to perform various
data mining tasks on HINs, e.g.,
recommendation, classification and
clustering.
oDesign effective and efficient
techniques to discover meta structure
to express the relationship between
entities.
![Page 96: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/96.jpg)
97
Future Work
Knowledge Graph exploration
oQ1: Given an entity of interest in a KG,
use different meta paths and meta
structures to find related entities, and
sort them according to relevance.
oQ2: Given some entity pairs, find some
meta structures to account for their
relationships (meta path version has
been solved).
![Page 97: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/97.jpg)
98
Future Work
Personalized Knowledge Graph
oPersonalized recommendation is
popular and useful.
oRich information from query logs.
oQuestions: How to build a Personalized
KG for each user?
oStorage and efficiency
oPrivacy issues
![Page 98: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/98.jpg)
99
Future Work
Knowledge Graph maintenance
oQ1: Build a domain-specific KG from
some given entity samples and a
document corpus.
oQ2: Expand a KG by crawling information from
the Internet.
oQ3: Error detection within a KG using
meta paths and meta structures.
oQ4: Automatic error correction.
![Page 99: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/99.jpg)
100
Future Work
Knowledge Graph cleaning
oRelations / nodes in a KG are inherently
“dirty” (many are curated by
automatic tools / scripts, which leads to
duplicate or erroneous data)
oHow can we clean the Knowledge Graph by
removing dirty relations / nodes?
![Page 100: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/100.jpg)
101
Future Work
Machine Learning
oMachine learning / deep learning is very
popular nowadays!
oHow can we leverage techniques from
machine learning / deep learning to
enhance heterogeneous
information networks (or knowledge
graphs)?
![Page 101: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/101.jpg)
102
Future Work
Bioinformatics
oNetworks are also very common in
biology. Meta paths and meta structures can help interpret
these networks more accurately.
oMultidisciplinary research is very popular now.
oCan we find some typical examples in
biological information networks and
use meta-path or meta-structure to
analyze them?
![Page 102: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K](https://reader030.vdocument.in/reader030/viewer/2022040617/5f21d1f2d92bbf02be393064/html5/thumbnails/102.jpg)
Thanks ! Q & A
Reynold Cheng
Zhipeng Huang
Yudian Zheng
Jing Yan
Ka Yu Wong
Eddie Ng
Database Group: