mining interesting meta-paths from complex heterogeneous information networks

Post on 13-Apr-2017


Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Baoxu Shi, Tim Weninger

University of Notre Dame


Homogeneous Network

MoDAT


Heterogeneous Network

[Figure: heterogeneous network. Node types: Association, People, University, City, Country, Conference, Workshop. Edge labels: "Belongs to", "Speaks at", "locates at", "locates at the capital of", "affiliate", "Professor of".]

Heterogeneous Network

[Figure: heterogeneous network with nodes grouped by meta-type. Node label: MoDAT. Meta-types: People, Association, Meeting, Education, Geography. Edge labels: "Belongs to", "Speaks at", "locates at", "locates at the capital of", "affiliate", "Professor at".]

Path and Meta-Path

[Figure: meta-types People, Meeting, Education, Geography, Association]

How are things uniquely connected or separated?

Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND

Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY

An interesting meta-path is the meta-path that best describes how two objects are uniquely related in complex HINs.
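A minimal Python sketch of how a path maps to a meta-path, assuming each node carries exactly one meta-type. The `NODE_TYPE` table and the `to_meta_path` helper are hypothetical names introduced for illustration; the type assignments follow the slide's example.

```python
# Hypothetical lookup table: one meta-type per node, taken from the slide's example.
NODE_TYPE = {
    "NORTHEASTERN": "EDUCATION",
    "BARABÁSI": "PEOPLE",
    "NOTRE DAME": "EDUCATION",
    "SOUTH BEND": "GEOGRAPHY",
}


def to_meta_path(path, node_type):
    """Replace every node on a path with its meta-type."""
    return [node_type[node] for node in path]


path = ["NORTHEASTERN", "BARABÁSI", "NOTRE DAME", "SOUTH BEND"]
meta_path = to_meta_path(path, NODE_TYPE)
print(" -> ".join(meta_path))
```

In a network with a type hierarchy, a node would map to several candidate types rather than one, which is why choosing the most descriptive meta-path becomes a ranking problem.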

Education → Professor → University → Geography

Education → Network Scientist → Catholic University → Geography

Education → Network Scientist who was born in Transylvania, 1967 → Catholic University at South Bend, IN → Geography

Limitations of State-of-the-Art Meta-Path Research

• Types of meta-labels are limited

• Meta-types do not have a complex hierarchy

• Meta-paths are pre-defined manually

• No large-scale experiments

[Figure: example schema with meta-types Term, Venue, Paper, Author]

Limitations of State-of-the-Art Meta-Path Research

• Types of meta-labels are limited
  In this work: a framework that can handle millions of meta-types

• Meta-types do not have a complex hierarchy
  In this work: meta-types with a complex hierarchy

• Meta-paths are pre-defined manually
  In this work: meta-paths are generated automatically

• No large-scale experiments
  In this work: experiments are done on Wikipedia (10 million nodes, 740 million edges)

How to find interesting paths?

• Generate paths

• Rank the top-k interesting paths using meta-data

• Extract the meta-path for searching

Path Generation

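• Generate path set for given points $a_u, a_v$

$\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\} \in X$, $\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}|} \rangle_i$

$a_u = a_1,\ a_v = a_{|\vec{x}|},\ 1 \le i \le k$

• Generate sibling path set

$\mathrm{sib}(a_i, a_j)$ iff $a_i \in t \wedge a_j \in t$

$\forall a_{u'} \in A'_u,\ \mathrm{sib}(a_{u'}, a_u)$ and $\forall a_{v'} \in A'_v,\ \mathrm{sib}(a_{v'}, a_v)$

$\{\vec{y}_1, \vec{y}_2, \ldots\} \in Y$, $\vec{y}_i = \langle a_1, a_2, \ldots, a_{|\vec{y}|} \rangle_i$

$a_1 \in A'_u,\ a_{|\vec{y}|} \in A'_v,\ 1 \le i \le k$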
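The two generation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: two nodes count as siblings when they share at least one type, a node is not its own sibling, and a single breadth-first shortest path stands in for whatever path enumeration the framework actually uses. The toy graph, the `types_of` table, and all function names are hypothetical.

```python
from collections import deque


def siblings(node, types_of):
    """Nodes that share at least one type with `node` (assumption: excluding itself)."""
    own = types_of[node]
    return {other for other, ts in types_of.items() if other != node and ts & own}


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in graph.get(current, ()):
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


def sibling_paths(graph, types_of, a_u, a_v):
    """One short path for each (sibling of a_u, sibling of a_v) pair."""
    paths = []
    for s in sorted(siblings(a_u, types_of)):
        for t in sorted(siblings(a_v, types_of)):
            if s == t:
                continue
            found = shortest_path(graph, s, t)
            if found is not None:
                paths.append(found)
    return paths


# Toy undirected graph as adjacency lists (hypothetical data).
graph = {
    "a_u": ["m1"],
    "m1": ["a_u", "a_v", "u2"],
    "a_v": ["m1", "m2"],
    "u2": ["m1", "m2"],
    "m2": ["u2", "v2", "a_v"],
    "v2": ["m2"],
}
types_of = {
    "a_u": {"T_u"}, "u2": {"T_u"},
    "a_v": {"T_v"}, "v2": {"T_v"},
    "m1": {"T_m"}, "m2": {"T_m"},
}

X = [shortest_path(graph, "a_u", "a_v")]
Y = sibling_paths(graph, types_of, "a_u", "a_v")
print(X, Y)
```

On a graph the size of Wikipedia, enumerating every sibling pair would be expensive, so a practical version would likely sample siblings or cap the path length.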

Example: Path generation

[Figure: the nodes ROCKNE and WAND with their type groups, the path sets X and Y, and the label "Short Paths". Type labels: AMERICAN COMPUTER SCIENTISTS, PROGRAMMING LANGUAGE RESEARCHERS, UNIVERSITY OF NOTRE DAME FACULTY, COLLEGE FOOTBALL HALL OF FAME INDUCTEES. Node labels: BARBARA LISKOV, ANDERS_HEJLSBERG, HAL ABELSON, VASANT HONAVAR, JOHN HEISMAN, BARRY SANDERS, JULIUS NIEUWLAND, BARABÁSI. Group sizes shown: 983 Others, 79 Others, 21 Others, 204 Others, 1075 Others.]

Example: Path generation

[Figure: node labels ROCKNE, NORTHEASTERN, NOTRE DAME, WAND, BARABÁSI, ROSE BOWL, HARVARD, CY YOUNG, CARL HUBBELL, CARNEGIE MELLON UNIVERSITY, TD GARDEN, LA COLISEUM]

Which is the most interesting path?

Path Ranking

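• Unordered Ranking

$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle$, $\vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$

$T_{\vec{x}_i} = \bigcup_{n=1}^{|\vec{x}_i|} T_{a_n}$

$T_{\vec{y}_i} = T_{a'_u} \cup \bigcup_{n=2}^{|\vec{y}_i|-1} \{T_{a_n}\} \cup T_{a'_v}$

$T_Y = \bigcup_{i=1}^{|Y|} \{T_{\vec{y}_i}\}$

$r(\vec{x}_i) = \dfrac{|T_{\vec{x}_i} \cap T_Y|}{|T_Y|}$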
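The unordered score is the fraction of the sibling paths' types that also occur somewhere on the candidate path. A minimal Python sketch, assuming the unions are plain set unions over node type sets and that an empty sibling type set scores 0; `path_types`, `unordered_score`, and the toy data are hypothetical names and values for illustration.

```python
def path_types(path, types_of):
    """Union of the type sets of all nodes on a path, ignoring position."""
    result = set()
    for node in path:
        result |= types_of[node]
    return result


def unordered_score(path, sibling_paths, types_of):
    """Share of sibling-path types that also appear on `path`."""
    t_x = path_types(path, types_of)
    t_y = set()
    for y in sibling_paths:
        t_y |= path_types(y, types_of)
    if not t_y:
        return 0.0  # assumption: no sibling types means no evidence
    return len(t_x & t_y) / len(t_y)


# Toy data (hypothetical): one candidate path and one sibling path.
types_of = {
    "a": {"t1", "t2"}, "b": {"t3"}, "c": {"t4"},
    "a2": {"t1"}, "b2": {"t3", "t5"}, "c2": {"t4"},
}
x = ["a", "b", "c"]
Y = [["a2", "b2", "c2"]]
score = unordered_score(x, Y, types_of)
print(score)
```

Because positions are ignored, a type shared by the first node of one path and the last node of another still counts as a match.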

Path Ranking

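• Ordered Ranking

$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle$, $\vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$

$p(a_n, a'_n) = \dfrac{|T_{a_n} \cap T_{Y_n}|}{|T_{Y_n}|}$

$r(\vec{x}_i) = \operatorname{mean}_{n=1}^{|\vec{x}_i|} \big( p(a_n, a'_n) \big)$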
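The ordered score compares types position by position and averages the per-position overlap. A minimal Python sketch, assuming the sibling type set at position n is the union of the types of the n-th node of every sibling path, and that only sibling paths with the same length as the candidate are aligned; both are assumptions of this sketch, and `ordered_score` and the toy data are hypothetical.

```python
from statistics import mean


def ordered_score(path, sibling_paths, types_of):
    """Mean over positions of the per-position type overlap with the sibling paths."""
    # Assumption: only sibling paths of equal length can be aligned by position.
    aligned = [y for y in sibling_paths if len(y) == len(path)]
    scores = []
    for n, node in enumerate(path):
        t_yn = set()
        for y in aligned:
            t_yn |= types_of[y[n]]
        if t_yn:
            scores.append(len(types_of[node] & t_yn) / len(t_yn))
        else:
            scores.append(0.0)  # no sibling types at this position
    return mean(scores) if scores else 0.0


# Toy data (hypothetical): one candidate path and one sibling path.
types_of = {
    "a": {"t1", "t2"}, "b": {"t3"}, "c": {"t4"},
    "a2": {"t1"}, "b2": {"t3", "t5"}, "c2": {"t4"},
}
x = ["a", "b", "c"]
Y = [["a2", "b2", "c2"]]
ordered = ordered_score(x, Y, types_of)
print(ordered)
```

With this toy data the per-position overlaps are 1, 1/2, and 1, so the score is their mean.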

Result: Path Ranking

Qualitative analysis was done with Mechanical Turk workers.

[Figure: plot with tick labels from 0.48 to 0.60 on one axis and from 0 to 1 on the other]

The result shows that users are more likely to pick the path with the lowest or the highest similarity.

People may pick the path with the highest score because they treat the best as correct.

Example: Extract Meta-Path

[Figure: the path JIAWEI HAN, DATA MINING, SIGKDD, JOHANNES GEHRKE (row "Nodes") shown with its candidate types (row "Types") along an axis running from more specific to more general. Type labels: DATA MINERS, STATISTICIANS, MATHEMATICIANS, SCHOLARS AND ACADEMICS, PEOPLE, DATA MINING, SCIENCE, COMPUTATIONAL STATISTICS, MATHEMATICAL SCIENCES, STATISTICS, ACM SIGS, ACM, SOCIETY, PROFESSIONAL ORGANIZATIONS, SCIENTIFIC SOCIETIES, ORGANIZATIONS, DATABASE RESEARCHERS, COMPUTER SCIENTISTS, SCHOLARS.]

Result: Meta-Path Constraint RWR

0 0.24 0.41 0.48

Edgar F. Codd 40.5 18.1 9.0

Johannes Gehrke 28.4 29.4 8.4 2.8

Raghu Ramakrishnan 31.1 6.0 3.6

Anita Borg 5.1 0.6 0.2

Shafi Goldwasser 4.9 0.6

Osmar R. Zaiane 4.8 3.6 1.6

Vint Cerf 4.1 2.4 0.2

Allen Newell 2.0 0.6

ACM 5.1

IEEE 4.9

Yahoo! Research 4.8

Microsoft Research 4.4

Database researchers

Computer Scientist

Questions?
