Post on 25-Jul-2020

GPU-accelerated Large-scale Dense Subgraph Detection

Andy (Chang-Jun) Wu
Email: changjun.wu@xerox.com
Xerox Research Center, Webster, NY

What is dense subgraph detection?

Page 2

Definition:
•  Detect subsets of vertices such that the connections within the induced subgraphs are dense, and their connections to the rest of the graph are sparse.
•  Unsupervised learning

Applications:
•  Community detection
•  Recommender system
•  Graph visualization
•  Data exploration

Large-scale dense subgraph detection?

Page 3

Graph clustering heuristics

Page 4

Observation:
•  Nodes belonging to the same cluster have a high overlap of their neighbors (aka outlinks or adjacency lists).

Clustering heuristics:
•  If two nodes have a high overlap of their neighbors, then they most likely belong to the same cluster.
•  It is a necessary but not sufficient condition.

[Figure: an instance of a dense subgraph on the vertices u, v, x, y, z, p]

Dense subgraph detection

Γ(x): the adjacency list of vertex x
Γ(x) = {u, v, p, z, y}
Γ(y) = {u, v, p, z, x}

Set comparison

Page 5

A = Γ(x) = {u, v, p, z, y}        B = Γ(y) = {u, v, p, z, x}
{u, z}                            {u, z}

Min-wise permutation theory:

$\Pr\left[\min_k(\pi_i(A)) = \min_k(\pi_i(B))\right] = \frac{|A \cap B|}{|A \cup B|}$

π_i(): a permutation on a set of elements

Broder, 1997
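The identity above can be checked numerically. The following is a minimal sketch, not the presenter's code: it assumes the single-minimum case (k = 1), for which the identity holds exactly, and it simulates each permutation π_i by shuffling the union of the two sets. The function names and the trial count are illustrative choices.

```python
import random

def jaccard(a, b):
    """Exact overlap |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b)

def min_under_permutation(s, rank):
    """Element of s that comes first under the permutation described by rank."""
    return min(s, key=rank.__getitem__)

def estimate_minwise_agreement(a, b, trials=20000, seed=0):
    """Fraction of random permutations under which both sets share the same minimum."""
    rng = random.Random(seed)
    universe = sorted(a | b)          # sorted for a reproducible starting order
    hits = 0
    for _ in range(trials):
        order = universe[:]
        rng.shuffle(order)            # one random permutation pi_i
        rank = {elem: r for r, elem in enumerate(order)}
        if min_under_permutation(a, rank) == min_under_permutation(b, rank):
            hits += 1
    return hits / trials

# Adjacency lists from the slide.
A = {"u", "v", "p", "z", "y"}
B = {"u", "v", "p", "z", "x"}

exact = jaccard(A, B)                 # 4 shared neighbors out of 6 distinct ones
estimate = estimate_minwise_agreement(A, B)
print(f"Jaccard = {exact:.3f}, empirical agreement = {estimate:.3f}")
```

With enough trials the empirical agreement settles near the exact ratio, which is why comparing a few min-wise samples can stand in for comparing whole adjacency lists.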

Shingling on all adjacency lists

Page 6

[Figure: a permutation is applied to each adjacency list Γ(v1), Γ(v2), Γ(v3), Γ(v4), Γ(v5); each vertex v1, ..., v5 yields C shingles.]

Shingling example for a clique

Page 7

A clique example

Input graph (size: n×n):
v1: v1, v2, v3, ..., vn
v2: v1, v2, v3, ..., vn
...
vn: v1, v2, v3, ..., vn

1st level shingling

1st level shingles (one row per vertex v1, v2, ..., vn):
<v1, S1>, <v1, S2>, ..., <v1, Sc>
...
<vn, S1>, <vn, S2>, ..., <vn, Sc>

New input graph (size: c×n):
S1: v1, v2, v3, ..., vn
S2: v1, v2, v3, ..., vn
...
Sc: v1, v2, v3, ..., vn

After 2nd level shingling

2nd level shingles (size: c×c):
T1: S1, S2, S3, ..., Sc
T2: S1, S2, S3, ..., Sc
...
Tc: S1, S2, S3, ..., Sc

clique ---> dense subgraph
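The clique example can be reproduced with a short sketch of two shingling passes. This is an illustration under stated assumptions rather than the gpClust implementation: `shingle_pass`, `make_permutations`, the shingle size `s`, and the seeds are hypothetical names and parameters, each permutation is simulated by a random shuffle, and a shingle is taken to be the `s` smallest members of a list under one permutation.

```python
import random
from collections import defaultdict

def make_permutations(universe, c, seed):
    """c random permutations, each stored as an element -> rank mapping."""
    rng = random.Random(seed)
    perms = []
    for _ in range(c):
        order = list(universe)
        rng.shuffle(order)
        perms.append({elem: r for r, elem in enumerate(order)})
    return perms

def shingle_pass(lists, c, s, seed):
    """One shingling level: map each shingle to the owners that generated it."""
    universe = sorted({e for members in lists.values() for e in members})
    perms = make_permutations(universe, c, seed)
    generated_by = defaultdict(set)
    for owner, members in lists.items():
        if len(members) < s:
            continue  # too short to form an s-element shingle
        for i, rank in enumerate(perms):
            smallest = tuple(sorted(members, key=rank.__getitem__)[:s])
            generated_by[(i, smallest)].add(owner)  # shingle tagged with its permutation
    return generated_by

n, c, s = 6, 4, 2
vertices = [f"v{i}" for i in range(1, n + 1)]
clique = {v: list(vertices) for v in vertices}  # each list includes v itself, as on the slide

level1 = shingle_pass(clique, c, s, seed=1)     # S_j -> vertices (c x n)
level2 = shingle_pass({S: sorted(vs) for S, vs in level1.items()}, c, s, seed=2)  # T_k -> S_j (c x c)

# Vertices reachable from each 2nd level shingle form a dense subgraph candidate.
candidates = {frozenset().union(*(level1[S] for S in shingles))
              for shingles in level2.values()}
print(len(level1), len(level2), sorted(next(iter(candidates))))
```

Because every vertex of the clique has the same list, all n vertices generate the same c shingles, and the single candidate recovered at the end is the whole clique. On a general graph, overlapping candidates would still need to be merged (for example with union-find), which this sketch omits.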

Shingling on GPU

Page 8

GPU introduction

Page 9

[Figure: CUDA memory hierarchy. Threads are grouped into thread blocks, and thread blocks into a grid. Relative access cost: per-thread memory 1x, per-block shared memory 1x, global memory 100x. Host memory sits on the CPU side.]

CUDA application: host = CPU, device = GPU

A CPU-GPU computational framework

Page 10

[Figure: timeline of the framework. Steps: load input graph, aggregate graph, report dense subgraph. For each shingling pass (1st level shingling, 2nd level shingling), adjacency lists go in and shingles come out.]

Shingling

Page 11

[Figure: a permutation is applied to each adjacency list Γ(v1), Γ(v2), Γ(v3), Γ(v4), Γ(v5), followed by sorting; each vertex v1, ..., v5 yields C shingles.]

Shingling on GPU

Page 12

[Figure: adjacency-list segments (seg1, ..., seg7) are held in CPU memory and copied to GPU global memory a batch at a time (seg1 to seg5 shown). Thread block0 to thread block3 work on the segments in shared memory over iteration0, iteration1, ..., iterationC.]

Segmented sorting problem

Page 13

Parallel counting sort (one thread block over A[0] ... A[n]; thread i handles A[i]):

offset[i] = count(j < i where A[j] > A[i]) - count(j > i where A[j] < A[i])
A[i] ---> A[i - offset[i]]

gt: greater counter
lt: less counter
index = i - (gt - lt)

Segmented counting sort

[Figure: thread block0, thread block1, thread block2, thread block3, each of BLOCK_SIZE; steps 1 and 2.]

Other parallel sorts (shuffle): parallel odd-even sort, parallel merge sort, parallel radix sort - Satish et al. (2009)
Parallel counting sort: no data shuffling.
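The index rule above can be traced on the CPU with a small sketch. It is only an illustration: each loop iteration stands in for one GPU thread, and the mapping of one segment to one thread block, along with the names `counting_sort_segment`, `segmented_counting_sort`, and `seg_starts`, are assumptions made for this example.

```python
def counting_sort_segment(a):
    """Rank-based sort of one segment; each loop iteration mimics one GPU thread."""
    n = len(a)
    out = [None] * n
    for i in range(n):                                        # "thread i"
        gt = sum(1 for j in range(i) if a[j] > a[i])          # greater elements to the left
        lt = sum(1 for j in range(i + 1, n) if a[j] < a[i])   # smaller elements to the right
        out[i - (gt - lt)] = a[i]                             # index = i - (gt - lt)
    return out

def segmented_counting_sort(values, seg_starts):
    """Sort each segment [seg_starts[k], seg_starts[k + 1]) independently."""
    out = []
    for lo, hi in zip(seg_starts, seg_starts[1:]):            # one "thread block" per segment
        out.extend(counting_sort_segment(values[lo:hi]))
    return out

values = [5, 2, 9, 1, 7, 3, 3, 8, 0, 6]
seg_starts = [0, 4, 7, 10]        # three segments: [5, 2, 9, 1], [7, 3, 3], [8, 0, 6]
result = segmented_counting_sort(values, seg_starts)
print(result)
```

Every thread computes its own destination from read-only comparisons, so no element is moved until its final write. Equal keys keep their relative order because only strictly greater and strictly smaller elements are counted.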

Experimental studies

Page 14

Experimental platform

Page 15

GPU: NVIDIA Tesla Kepler K20c
CUDA capability: 3.5
CUDA driver/runtime: 5.0
Streaming multiprocessors: 13
CUDA cores: 192 (per streaming multiprocessor)
Shared memory: 48KB
Global memory: 5GB

CPU: Intel Xeon E5-2650
RAM: 32GB
OS: Red Hat Enterprise Linux 6.3

Host and device are connected through PCI Express 3.0.

Performance study

Page 16

Runtime of each component in gpClust (CPU, GPU, H<->D, I/O, total), compared with the serial runtime:

| Input graph | CPU | GPU | H<->D | I/O | Total runtime | Serial runtime | Speedup | GPU speedup |
|---|---|---|---|---|---|---|---|---|
| 20K | 52.70 | 7.57 | 6.08 | 0.40 | 66.75 | 392.32 | 5.88x | 44.86x |
| 2M | 2685.06 | 447.97 | 114.18 | 28.77 | 3275.89 | 23,537.80 | 7.18x | 373.71x |

Input graphs:

| # input seqs. | # singleton vertices | # vertices | # edges | Average degree | Largest CC size |
|---|---|---|---|---|---|
| 20,000 | 2,921 | 17,079 | 374,928 | 44 ± 69 | 10,707 |
| 2,004,241 | 441,257 | 1,562,984 | 56,919,738 | 73 ± 153 | 31,872 |

- Two arbitrary sets of predicted protein families from the Global Ocean Sampling (GOS) project.

α = 10%-20% serial computation time on the CPU side
Amdahl's law: max speedup = 1/α = 5x - 10x

Better performance can be achieved through streaming.
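The Amdahl bound can be worked through with the numbers in the runtime table. This sketch rests on an assumption that is not stated on the slide: that α is estimated as the CPU-side time divided by the serial runtime. The function name `amdahl_limit` and the `runs` dictionary are illustrative.

```python
def amdahl_limit(alpha):
    """Upper bound on speedup when a fraction alpha of the work stays serial."""
    return 1.0 / alpha

# (CPU-side time, serial runtime, reported speedup), copied from the runtime table.
runs = {"20K": (52.70, 392.32, 5.88), "2M": (2685.06, 23537.80, 7.18)}

estimates = {}
for name, (cpu_side, serial, reported) in runs.items():
    alpha = cpu_side / serial          # assumed estimate of the serial fraction
    estimates[name] = (alpha, amdahl_limit(alpha))
    print(f"{name}: alpha = {alpha:.3f}, limit = {amdahl_limit(alpha):.2f}x, "
          f"reported = {reported:.2f}x")
```

Under that reading both inputs land inside the 10%-20% range, and the reported speedups stay below the corresponding limits, which is consistent with the slide's 5x - 10x ceiling.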

Cluster size distribution

Page 17

[Figure: bar chart of the number of groups (y-axis, 0 to 3500) per group size (x-axis bins: 20-49, 50-99, 100-199, 200-499, 500-999, 1000-2000, >2000), comparing the gpClust approach with the GOS approach.]

Conclusion

•  A GPU-accelerated large-scale graph clustering algorithm.
   -  Scalable solution to large-scale input graphs
   -  A good speedup on sequence similarity graphs

Future work

•  Parallel counting sort ---> more efficient order statistics.
•  Push more workload (e.g. union-find) to the GPU side.

THANKS!

Questions?

BACKUP

Page 20

Metagenomics

•  Environmental microbial communities

Page 21

•  < 1% of microbes can be isolated and cultivated in a standard laboratory environment.

[Figure: annotation workflow. Assemble DNA & predict genes ---> translated ORF sequences (10^6 to 10^8 new sequences), together with already known protein seq. & families (~5x10^7 clusters, ~10^8 sequences) ---> protein family identification ---> community annotation. Output labels: known, family m, new, family 1.]

Overview

Page 22

Input sequences ---> Graph construction ---> Dense subgraph detection

Our parallel approach

pGraph: parallel graph construction
•  Distributed memory alg. - Wu et al. (2008, 2012)

pClust: Min-wise clustering
•  MapReduce version - Rytsareva et al. (2011)
•  Multi-core CPU version - Chapman et al. (2011)
•  GPGPU version

Global Ocean Sampling (GOS), Yooseph et al. (2007)

All-against-all BLAST
K-neighbor clustering

Shingling algorithm

Page 23

Input graph:
Γ(x) = {a, b, c, d, ..., r}
Γ(y) = {a, b, c, e, ..., r}
Γ(z) = {...}
...

Permutations: π1(), π2(), π3(), ..., πc()

Shingles:
<x, Sx_1>, <x, Sx_2>, <x, Sx_3>, ..., <x, Sx_c>
...
<z, Sz_1>, <z, Sz_2>, <z, Sz_3>, ..., <z, Sz_c>

Set comparison

Min-wise permutation theory:

$\Pr\left[\min_k(\pi_i(\Gamma(x))) = \min_k(\pi_i(\Gamma(y)))\right] = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$

Γ(x): the adjacency list of vertex x
π_i(): a permutation on a set of elements

Gibson et al., 2005

Strategy for qualitative comparison

Page 24

[Figure: flowchart. Input sequences feed two pipelines. GOS: all-against-all BLAST, then k-neighbor clustering, giving GOS clusters. gpClust: parallel homology detection, then shingling clustering, giving gpClust clusters. Each set of clusters goes through sequence-based profiling. Remaining labels: cluster expansion, profile-based profiling, Benchmark (protein families).]

Profile-based matching is more sensitive than sequence-based matching.

Qualitative study for 2M sequences

| Approach | # clusters | Density | # seqs included | Largest group size | Average group size |
|---|---|---|---|---|---|
| Benchmark | 813 | 0.09 ± 0.12 | 2,004,241 | 56,266 | 2,465 ± 4,372 |
| GOS | 6,152 | 0.40 ± 0.27 | 1,236,712 | 20,027 | 201 ± 650 |
| gpClust | 6,646 | 0.75 ± 0.28 | 1,414,952 | 19,066 | 213 ± 721 |

Page 25

density = (# edges in a cluster) / (# all possible edges in a cluster)
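The density measure can be computed directly from an edge list. A minimal sketch, assuming an undirected simple graph so that a cluster of n vertices has n(n-1)/2 possible edges; the function name `cluster_density` and the toy graph are hypothetical, not data from the study.

```python
from itertools import combinations

def cluster_density(cluster, edges):
    """Fraction of the possible undirected edges inside `cluster` that are present."""
    members = set(cluster)
    n = len(members)
    if n < 2:
        return 0.0  # no possible edges, so density is taken as 0 here
    present = sum(1 for u, v in combinations(sorted(members), 2)
                  if (u, v) in edges or (v, u) in edges)
    return present / (n * (n - 1) / 2)

# Toy undirected graph, each edge stored once.
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")}

density_abcd = cluster_density(["a", "b", "c", "d"], edges)   # 4 of 6 possible edges
density_abc = cluster_density(["a", "b", "c"], edges)         # a triangle
print(f"{density_abcd:.3f} {density_abc:.3f}")
```

A value of 1.0 means the cluster is a clique, so the table's averages can be read as how close each method's clusters come to cliques.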
