a pattern-aware graph mining system

Post on 16-May-2022

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

A Pattern-Aware Graph Mining SystemKasra Jamshidi Rakesh Mahadasa Keval Vora

Simon Fraser University

https://github.com/pdclab/peregrine

Why should you pay attention?

executes 700x fasterPeregrine

consumes 100x less memoryPeregrine

scales to 100x larger datasetsPeregrine

On 8x fewer machines

With a more expressive API

a

db

a

bd

Data Graph

e

c

gf

Graph Mining

a

db

7

a

db

a

bd

Graph Mining

e

c

gf

Data Graph

a

db

8

a

db

a

bd

Graph Mining

e

c

gf

Data Graph

a

db

Subgraph

9

a

db

a

bd

Graph Mining

e

c

gf

Data Graph Pattern

10

Graph Mining

a

db

a

bd

Edge-Induced

11

Graph Mining

Edge-Induced

a

db

a

bd

12

Graph Mining

a

db

a

bd

Edge-Induced

13

Graph Mining

a

db

a

bd

Vertex-Induced

14

Graph Mining

a dba b d

Vertex-Induced

15

Graph Mining

a

db

a

bd

Vertex-Induced

16

Graph Mining

a

db

e

c

gf

Data Graph

17

Graph Mining

a

db

e

c

gf

Data Graph

a

db

a

bd

Vertex-Induced

18

Graph Mining

a

db

e

c

gf

Data Graph

a

db

a

b d

a

db

a

b d

a

db

a

b d

a

db

a

b d

Edge-Induced

a

db

a

bd

Vertex-Induced

19

Frequent Patterns(Edge-Induced)

Graph Mining

a

db

e

c

gf

Data Graph

20

Unlabeled Pattern Distribution(Vertex-Induced)

2 1 0

1 02

Graph Mining

a

db

e

c

gf

Data Graph

21

Scalability Challenge

22

• 4-motif counting on Orkut graph (|V| = 13M, |E| = 117M)

Scalability Challenge

23

Scalability Challenge

• 4-motif counting on Orkut graph (|V| = 13M, |E| = 117M)

123,503,340,341,270 subgraphs

24

System Requirements

25

System Requirements

Uniqueness

x

zy

x

zy

x

zy

26

System Requirements

Uniqueness

x

zy xz

y

x

z

y

27

System Requirements

Uniqueness

x

zy xz

y

x

z

y

28

System Requirements

Uniqueness

Structurex

zy

29

System Requirements

Uniqueness

Structurex

zy

30

System Requirements

Uniqueness

Structure

Interestingness

31

System Requirements

Uniqueness

Structure

Interestingness

32

System Requirements

Uniqueness

Structure

Interestingness

33

System Requirements

Uniqueness

Structure

Interestingness

34

Existing Work

Uniqueness

Structure

Interestingness

Arabesque (SOSP ‘15)RStream (OSDI ‘18)Fractal (SIGMOD ‘19)AutoMine (SOSP ‘19)

35

Existing Work

Uniqueness

Structure

Interestingness

Arabesque (SOSP ‘15)RStream (OSDI ‘18)Fractal (SIGMOD ‘19)AutoMine (SOSP ‘19)

Overlook user requirements

36

Existing Work

Uniqueness

Structure

Interestingness

Arabesque (SOSP ‘15)RStream (OSDI ‘18)Fractal (SIGMOD ‘19)AutoMine (SOSP ‘19)

Overlook user requirementsPer-subgraph computations

37

38

Pattern Awareness

Pattern Selection

Pattern Matching

39

Pattern Awareness

Pattern Selection

Pattern Matching

40

Pattern Awareness

Pattern Selection

41

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

42

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

43

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

44

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

45

Pattern Programming

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);

auto counts = count(G, patterns);

46

Pattern Programming

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);patterns[0].set_labels({‘a’, ‘b’, ‘c’, ‘d’});

auto counts = count(G, patterns);

47

Pattern Programming

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);patterns[0].set_labels({‘a’, ‘b’, ‘c’, ‘d’});patterns[0].add_edge(1, 5);

auto counts = count(G, patterns);

48

Pattern Programming

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);patterns[0].set_labels({‘a’, ‘b’, ‘c’, ‘d’});patterns[0].add_edge(1, 5);patterns.emplace_back(“path/to/pattern.txt”);auto counts = count(G, patterns);

49

Pattern Programming

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto pattern = Pattern().add_edge(1, 2)

.add_edge(1, 3)

.add_edge(2, 3);auto counts = count(G, {pattern});

50

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

51

Pattern Programming

#include “Peregrine.hh”using namespace Peregrine;

void motifCounting(int size){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(size, VERTEX_INDUCED);auto counts = count(G, patterns);

for (auto &[pattern, n] : counts)std::cout << pattern << “ ” << n << std::endl;

}

52

Pattern Programming

void frequentSubgraphMining(){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(2, EDGE_INDUCED);auto mapDomain = [](auto &&match, auto &&aggregator)

{ aggregator.map(match.pattern, match.mapping); };

auto results = match<Pattern, Domain>(G, patterns, mapDomain);

for (auto &[pattern, frequency] : results)std::cout << pattern << “ ” << frequency << std::endl;

}

53

Pattern Programming

void frequentSubgraphMining(){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(2, EDGE_INDUCED);auto mapDomain = [](auto &&match, auto &&aggregator)

{ aggregator.map(match.pattern, match.mapping); };

auto results = match<Pattern, Domain>(G, patterns, mapDomain);

for (auto &[pattern, frequency] : results)std::cout << pattern << “ ” << frequency << std::endl;

}

54

Pattern Programming

void frequentSubgraphMining(){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(2, EDGE_INDUCED);auto mapDomain = [](auto &&match, auto &&aggregator)

{ aggregator.map(match.pattern, match.mapping); };

auto results = match<Pattern, Domain>(G, patterns, mapDomain);

for (auto &[pattern, frequency] : results)std::cout << pattern << “ ” << frequency << std::endl;

}

55

Pattern Programming

void frequentSubgraphMining(){

DataGraph G(“path/to/graph/”);auto patterns = PatternGenerator::all(2, EDGE_INDUCED);auto mapDomain = [](auto &&match, auto &&aggregator)

{ aggregator.map(match.pattern, match.mapping); };

auto results = match<Pattern, Domain>(G, patterns, mapDomain);

for (auto &[pattern, frequency] : results)std::cout << pattern << “ ” << frequency << std::endl;

}

56

Pattern Awareness

Pattern Selection

Pattern Matching

57

Pattern Awareness

Pattern Selection

u

v

Anti-Edge

58

Pattern Awareness

Pattern Selection

u

v

Anti-Vertex

59

Pattern Awareness

Pattern Selection

Pattern Matching

60

Pattern Awareness

Pattern Selection

Pattern Matching

61

Pattern Awareness

Pattern Matching

• Symmetry breaking (RECOMB ‘07)

• Core pattern reduction (SIGMOD ‘16)

62

Symmetry Breaking

63

Symmetry Breaking

u x

y v

64

Symmetry Breaking

u x

y v

65

Symmetry Breaking

x

y

66

v

u

Symmetry Breaking

u x

y v

67

Symmetry Breaking

u y

x v

68

Symmetry Breaking

u x

y v

Worker 1

a

db

e

c

gf

Data Graph

69

b x

y v

Symmetry Breaking

a

db

e

c

gf

Data Graph Worker 1

70

b x

y d

Symmetry Breaking

Worker 1

a

db

e

c

gf

Data Graph

71

Symmetry Breaking

b x

y d

Worker 1

u x

y v

Worker 2

a

db

e

c

gf

Data Graph

72

Symmetry Breaking

b x

y d

Worker 1

d x

y v

Worker 2

a

db

e

c

gf

Data Graph

73

Symmetry Breaking

b x

y d

Worker 1

d x

y b

Worker 2

a

db

e

c

gf

Data Graph

74

Symmetry Breaking

b a

y d

Worker 1

d a

y b

Worker 2

a

db

e

c

gf

Data Graph

75

Symmetry Breaking

b a

g d

Worker 1

d a

g b

Worker 2

a

db

e

c

gf

Data Graph

76

Symmetry Breaking

b a

g d

Worker 1

d a

g b

Worker 2

a

db

e

c

gf

Data Graph

77

Symmetry Breaking

u < v

u

v

x

y

78

Symmetry Breaking

x < y

u

v

x

y

79

Symmetry Breaking

u < vx < y

u

v

x

y

80

Core Pattern

u

v

x

y

81

Core Pattern

u

v

x

y

82

Core Pattern

u

v

x

y

83

Core Pattern

b

d

Pattern-Unaware

a

db

e

c

gf

Data Graph

84

Core Pattern

b a

d

Pattern-Unaware

a

db

e

c

gf

Data Graph

85

Core Pattern

b a

d

Pattern-Unaware

a

db

e

c

gf

Data Graph

86

Core Pattern

b a

g d

Pattern-Unaware

a

db

e

c

gf

Data Graph

87

Core Pattern

b a

g d

Pattern-Unaware

a

db

e

c

gf

Data Graph

88

Core Pattern

b x

y d

Pattern-Aware

a

db

e

c

gf

Data Graph

89

Core Pattern

b x

y d

Pattern-Aware

a

db

e

c

gf

Data Graph

90

Core Pattern

a

db

e

c

gf

Data Graph

b x

y d

Pattern-Aware

91

Core Pattern

b a

g d

Pattern-Aware

a

db

e

c

gf

Data Graph

92

Core Pattern

b a

g d

Pattern-Aware

a

db

e

c

gf

Data Graph

h

93

Core Pattern

b a

h d

Pattern-Aware

b a

g d

a

db

e

c

gf

Data Graph

h

94

Core Pattern

b a

h d

Pattern-Aware

b a

g d

a

db

e

c

gf

Data Graph

h

95

Pattern Awareness

Pattern Matching

• Symmetry breaking (RECOMB ‘07)

• Core pattern reduction (SIGMOD ‘16)

96

Pattern Awareness

Pattern Matching

• Early termination

97

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

98

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

99

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

100

Triangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

101

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

102

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

103

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

104

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

105

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

106

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

107

TripletTriangle

Early Termination

bool globalClusteringCoefficient(int bound){

DataGraph G(“path/to/graph/”);auto triplet = PatternGenerator::star(3);int numTriplets = count(G, {triplet});auto countAndCheck = [=](auto &&match, auto &&aggregator){int numTriangles = aggregator.readValue(match.pattern);if (3*numTriangles/numTriplets > bound) aggregator.stop();else aggregator.map(match.pattern, 1);

}auto triangle = PatternGenerator::clique(3);auto result = match<Pattern, int>(G, triangle, countAndCheck);return 3*result[triangle]/numTriplets > bound;

}

108

Comparison with Existing Work

• Peregrine with 16 logical cores and 32GB RAM

• Arabesque & Fractal with 8x16 logical cores and 8x32GB RAM

• RStream with 96 logical cores and 192GB RAM

109

Comparison with Existing Work

110

Comparison with Existing Work

111

Effects of Pattern Awareness

112

Scalability

113

Scalability

114

Scalability

115

Scalability

116

Scalability

117

Scalability

118

Scalability

119

Pattern-aware programming and processing models

120

Pattern-aware programming and processing models

• Shift abstraction from subgraph to pattern

121

Pattern-aware programming and processing models

• Shift abstraction from subgraph to pattern

•User program is transparent to the system

122

Pattern-aware programming and processing models

• Shift abstraction from subgraph to pattern

•User program is transparent to the system

•Up to 42x faster than pattern-unaware

•Up to 737x faster than state-of-the-art

123

Pattern-aware programming and processing models

• Shift abstraction from subgraph to pattern

•User program is transparent to the system

•Up to 42x faster than pattern-unaware

•Up to 737x faster than state-of-the-art

https://github.com/pdclab/peregrine

124

top related