Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks
Baoxu Shi, Tim Weninger
University of Notre Dame
Homogeneous Network
MoDAT
Heterogeneous Network
Node labels: Association, People, University, City, Country, Conference, Workshop
Edge labels: Belongs to, Speaks at, locates at, the capital of, affiliate, Professor of
MoDAT
Heterogeneous Network
Edge labels: Belongs to, Speaks at, locates at, the capital of, affiliate, Professor at
Group labels: People, Association, Meeting, Education, Geography
Path and Meta-Path
People, Meeting, Education, Geography, Association
How are things uniquely connected or separated?
Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND
Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY
An interesting meta-path is the meta-path that best describes how two objects are uniquely related in a complex HIN.
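To make the path versus meta-path distinction concrete, here is a minimal sketch that lifts a path to its meta-path by looking up each node's meta-type. The `NODE_TYPE` table and the `meta_path` helper are illustrative assumptions built from the labels on the slide, not the authors' implementation.

```python
# Hypothetical node -> meta-type table, taken from the labels on the slide.
NODE_TYPE = {
    "NORTHEASTERN": "EDUCATION",
    "BARABÁSI": "PEOPLE",
    "NOTRE DAME": "EDUCATION",
    "SOUTH BEND": "GEOGRAPHY",
}


def meta_path(path, node_type):
    """Replace every node on the path with its meta-type."""
    return [node_type[node] for node in path]


path = ["NORTHEASTERN", "BARABÁSI", "NOTRE DAME", "SOUTH BEND"]
print(" -> ".join(meta_path(path, NODE_TYPE)))
```

With a single type per node there is exactly one meta-path per path. When nodes carry many hierarchical types, as the later slides assume, each path maps to many candidate meta-paths, which is why a ranking step is needed.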
Education → Professor → University → Geography
Education → Network Scientist → Catholic University → Geography
Education → Network Scientist who was born in Transylvania, 1967 → Catholic University at South Bend, IN → Geography
Limitations of State-of-the-Art Meta-Path Research
• Types of meta-labels are limited
• Meta-types do not have a complex hierarchy
• Meta-paths are pre-defined manually
• No large-scale experiments
Term Venue
Paper
Author
In contrast, this work offers:
• A framework that can handle millions of meta-types
• Meta-types with a complex hierarchy
• Automatically generated meta-paths
• Experiments on Wikipedia (10 million nodes, 740 million edges)
How to find interesting paths?
• Generate paths
• Rank top k interesting paths using meta-data
• Extract meta-path for searching
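Path Generation
• Generate path set for given points
$\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\} \in X$, $\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle$
$a_u = a_1$, $a_v = a_{|\vec{x}_i|}$, $1 \le i \le k$
• Generate sibling path set
$\mathrm{sib}(a_i, a_j) \iff a_i \in t \wedge a_j \in t$
$\forall a_{u'} \in A'_u,\ \mathrm{sib}(a_{u'}, a_u)$; $\forall a_{v'} \in A'_v,\ \mathrm{sib}(a_{v'}, a_v)$
$\{\vec{y}_1, \vec{y}_2, \ldots\} \in Y$, $\vec{y}_i = \langle a_1, a_2, \ldots, a_{|\vec{y}_i|} \rangle$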
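The two generation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency dictionary, each node has a set of types, siblings are nodes that share at least one type (the node itself is excluded here), and "paths" means simple paths bounded by a hop limit. The slides do not say how paths are enumerated, so `short_paths`, `max_hops`, and the toy data are hypothetical.

```python
from itertools import product


def siblings(node, types):
    """Nodes that share at least one type t with `node` (the node itself is excluded)."""
    return {other for other, ts in types.items()
            if other != node and ts & types[node]}


def short_paths(graph, src, dst, max_hops):
    """All simple paths from src to dst with at most max_hops edges (iterative DFS)."""
    found, stack = [], [[src]]
    while stack:
        path = stack.pop()
        if path[-1] == dst:
            found.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # keep the path simple
                stack.append(path + [nxt])
    return found


def generate(graph, types, a_u, a_v, max_hops=3):
    """Return (X, Y): paths between a_u and a_v, and paths between their siblings."""
    X = short_paths(graph, a_u, a_v, max_hops)
    Y = []
    for s, t in product(siblings(a_u, types), siblings(a_v, types)):
        if s != t:
            Y.extend(short_paths(graph, s, t, max_hops))
    return X, Y


# Toy data: two parallel chains whose endpoints share types.
graph = {"u": ["m"], "m": ["u", "v"], "v": ["m"],
         "u2": ["m2"], "m2": ["u2", "v2"], "v2": ["m2"]}
types = {"u": {"A"}, "u2": {"A"}, "v": {"B"}, "v2": {"B"},
         "m": {"C"}, "m2": {"C"}}
X, Y = generate(graph, types, "u", "v")
print(X, Y)
```

On a graph the size of Wikipedia, exhaustive enumeration like this would not scale; the hop bound stands in for whatever pruning the real system uses.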
$a_1 \in A'_u$, $a_{|\vec{y}_i|} \in A'_v$, $1 \le i \le k$
$a_u, a_v$
Example: Path generation
[Figure: path sets X and Y for the nodes ROCKNE and WAND. Type labels: AMERICAN COMPUTER SCIENTISTS; PROGRAMMING LANGUAGE RESEARCHERS; UNIVERSITY OF NOTRE DAME FACULTY; COLLEGE FOOTBALL HALL OF FAME INDUCTEES. Other node labels: BARBARA LISKOV, ANDERS_HEJLSBERG, JOHN HEISMAN, BARRY SANDERS, JULIUS NIEUWLAND, BARABÁSI, HAL ABELSON, VASANT HONAVAR. Remaining-member counts shown: 983 Others, 79 Others, 21 Others, 204 Others, 1075 Others. Additional label: Short Paths.]
Example: Path generation
[Figure: node labels: ROCKNE, NORTHEASTERN, NOTRE DAME, WAND, BARABÁSI, ROSE BOWL, HARVARD, CY YOUNG, CARL HUBBELL, CARNEGIE MELLON UNIVERSITY, TD GARDEN, LA COLISEUM]
Which is the most interesting path?
Path Ranking
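• Unordered Ranking
$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle$, $\vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$
$T_{\vec{x}_i} = \bigcup_{n=1}^{|\vec{x}_i|} T_{a_n}$
$T_{\vec{y}_i} = T_{a'_u} \cup \bigcup_{n=2}^{|\vec{y}_i|-1} \{T_{a_n}\} \cup T_{a'_v}$
$T_Y = \bigcup_{i=1}^{|Y|} \{T_{\vec{y}_i}\}$
$r(\vec{x}_i) = \frac{|T_{\vec{x}_i} \cap T_Y|}{|T_Y|}$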
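The unordered score can be sketched in a few lines. This is an illustration, not the authors' code: it assumes each node maps to a set of type labels and treats $T_Y$ as the flat union of all types seen on the sibling paths. The names `path_types`, `unordered_rank`, and the toy data are hypothetical.

```python
def path_types(path, types):
    """T_x: union of the type sets of every node on the path."""
    out = set()
    for node in path:
        out |= types[node]
    return out


def unordered_rank(x, Y, types):
    """r(x) = |T_x ∩ T_Y| / |T_Y|, with T_Y pooled over all sibling paths in Y."""
    T_Y = set()
    for y in Y:
        T_Y |= path_types(y, types)
    if not T_Y:  # no sibling types to compare against
        return 0.0
    return len(path_types(x, types) & T_Y) / len(T_Y)


# Toy data: one candidate path and one sibling path.
types = {"a": {"t1", "t2"}, "b": {"t3"}, "c": {"t1"}, "d": {"t3", "t4"}}
x = ["a", "b"]
Y = [["c", "d"]]
print(unordered_rank(x, Y, types))
```

Here $T_{\vec{x}} = \{t1, t2, t3\}$ and $T_Y = \{t1, t3, t4\}$, so the score is 2/3. Node order plays no role, which is the point of the "unordered" variant.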
Path Ranking
• Ordered Ranking
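$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle$, $\vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$
$p(a_n, a'_n) = \frac{|T_{a_n} \cap T_{Y_n}|}{|T_{Y_n}|}$
$r(\vec{x}_i) = \operatorname{mean}_{n=1}^{|\vec{x}_i|} \left( p(a_n, a'_n) \right)$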
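A minimal sketch of the ordered score follows. It assumes that $T_{Y_n}$ is the union of the types of the nodes at position $n$ across the sibling paths, and that positions past the end of a shorter sibling path are skipped. Both are interpretations of the slide, and `ordered_rank` and the toy data are hypothetical.

```python
from statistics import mean


def ordered_rank(x, Y, types):
    """Mean over positions n of |T_{a_n} ∩ T_{Y_n}| / |T_{Y_n}| for a non-empty path x."""
    scores = []
    for n, node in enumerate(x):
        # Pool the types found at position n of every sibling path.
        T_Yn = set()
        for y in Y:
            if n < len(y):
                T_Yn |= types[y[n]]
        scores.append(len(types[node] & T_Yn) / len(T_Yn) if T_Yn else 0.0)
    return mean(scores)


# Toy data: one candidate path and one sibling path of the same length.
types = {"a": {"t1", "t2"}, "b": {"t3"}, "c": {"t1"}, "d": {"t3", "t4"}}
x = ["a", "b"]
Y = [["c", "d"]]
print(ordered_rank(x, Y, types))
```

Position 1 scores 1/1 and position 2 scores 1/2, giving a mean of 0.75. Unlike the unordered variant, a type shared at the wrong position contributes nothing.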
Result: Path Ranking
Qualitative analysis was done with Amazon Mechanical Turk workers.
[Figure: plot with five data points; x-axis ticks 0, 0.25, 0.5, 0.75, 1; y-axis ticks 0.48, 0.52, 0.56, 0.60]
Result: Path Ranking
The results show that users are more likely to pick the path with the lowest or the highest similarity.
People may pick the path with the highest score because they treat the best score as the correct answer.
DATA MINERS
JIAWEI HAN
DATA MINING SIGKDD JOHANNES GEHRKE
STATISTICIANS
MATHEMATICIANS
PEOPLE
SCHOLARS AND ACADEMICS
DATA MINING
SCIENCE
ACM SIGS
PEOPLE
More specific
More general
Nodes
Types
COMPUTATIONAL STATISTICS
MATHEMATICAL SCIENCES
STATISTICS
SOCIETY
ACM
PROFESSIONAL ORGANIZATIONS
SCIENTIFIC SOCIETIES
DATABASE RESEARCHERS
COMPUTER SCIENTISTS
SCHOLARS AND ACADEMICS
SCHOLARS
ORGANIZATIONS
Example: Extract Meta-Path
Result: Meta-Path Constraint RWR
0 0.24 0.41 0.48
Edgar F. Codd 40.5 18.1 9.0
Johannes Gehrke 28.4 29.4 8.4 2.8
Raghu Ramakrishnan 31.1 6.0 3.6
Anita Borg 5.1 0.6 0.2
Shafi Goldwasser 4.9 0.6
Osmar R. Zaiane 4.8 3.6 1.6
Vint Cerf 4.1 2.4 0.2
Allen Newell 2.0 0.6
ACM 5.1
IEEE 4.9
Yahoo! Research 4.8
Microsoft Research 4.4
Database researchers
Computer Scientist
Questions?