graphr: accelerating graph processing using reram - sites@duke · cei.pratt.duke.edu graph...

22
GraphR: Accelerating Graph Processing Using ReRAM Linghao Song*, Youwei Zhuo # , Xuehai Qian # , Hai Li*, Yiran Chen* *Duke University # University of Southern California AL CHEM alchem.usc.edu CEI cei.pratt.duke.edu

Upload: others

Post on 22-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

GraphR: Accelerating Graph Processing Using ReRAM

Linghao Song*, Youwei Zhuo#, Xuehai Qian#, Hai Li*, Yiran Chen*

*Duke University#University of Southern California

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 2: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Graph Processing• To understand relationships in a group of nodes

• A wide range of application domains—Bioinformatics, Social Networks, Cyber Security, Data Mining…

• Classic algorithms:—Sparse Matrix Vector Multiplication (SpMV)

—Single Source Shortest Path (SSSP)

—Page Rank

(en.wikiquote.org)

CEI cei.pratt.duke.edu

ALCHEM alchem.usc.edu

Page 3: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

The Need for Graph Processing Accelerators • Graph processing algorithms:

• Generate random access• Require high memory bandwidth

• Good target for hardware acceleration• Tesseract (ISCA’15): HMC+Inorder-Cores• Graphicionado (MICRO’16): dedicated memory accessing module• Energy Efficient Architecture for Graph (ISCA’16): asynchronous

execution• These accelerators are based on:

• Vertex-centric processing model• Conventional CMOS technology

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 4: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

ReRAM Based Acceleration• ReRAM Xbar for Matrix-Vector Multiplication

……

Bitline : Y

Wor

dlin

e : X

Xbar : W

Y=W*X

114

114

128

5123

3

112

112

512

=

WL_0

WL_1…BL_0BL_1BL_2

WL_2303

… ⇒

01212543

2304

BL_511

(2304=3*3*128*2)

12544(12544=112*112)

layer l+1layer l

(PipeLayer HPCA’17)

(PRIME ISCA’16,ISAAC ISCA’16,

Dot-Product Engine DAC’16,RENO DAC’15)

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 5: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Graph Processing in Action• Vertex-centric processing model

…………

random access

global random accessHigh memory bandwidth— little computation on the randomly fetched data

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Edges

Vertices

Page 6: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Graph Processing in Action• Edge-centric Processing Model…

Renew

Read & Process

sequential write

…Generate Updates

sequential read

Sequential edge access.Random vertex access.

(X-Stream SOSP’13)

ALCHEM alchem.usc.edu

Edges

VerticesCEI cei.pratt.duke.edu

Page 7: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

GraphR: Graph Processing with ReRAM Xbar

V4

V7

V1

V2

V15

Process Edge

Reduce/Apply

V4

V1 V2 V7

edges to V4value of all vertices

=

new value of V4

ReRAMCrossbar (CB)

perform SpMV in analog manner

But, WAIT!A Xbar with a size of V-by-V?The matrix is sparse.

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 8: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Storage and Computation EfficiencyStore:

Compressed FormatComputation:

ReRAM Xbar SpMV(row,col,val)(0,2,3)(0,3,8)(1,2,7)(2,0,1)(3,1,4)(3,3,2)

X

Y=W*X

StorageEfficiency

ComputationEfficiency

GraphR

ALCHEM alchem.usc.edu

Coordinate ListCOO

CEI cei.pratt.duke.edu

Page 9: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

GraphR Overview

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Graph Preprocessing Ordered edge list (COO) on

disk

GraphR

Memory ReRAM

Graph Engines (GEs)

Load Block(i)(sequential disk I/O)

Steaming-apply subgraphs (sequential access)

Controllerin GraphR

Software

Page 10: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Stream-Apply Execution

Block in Storage(COO)

Destination Vertices

Sour

ce V

ertic

es Block in Storage(COO)

Block in Storage(COO)

Block in Storage(COO)

Block in GraphR

RegI

-1

RegO

GE GE GE GE

RegI

-2Re

gI-3

RegI

-4

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 11: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Graph Engine (GE) Processing Patterns• Different algorithms achieve different parallelism when

mapped to Xbars

• Assuming an N×N Xbar

• Parallel Multiply-Accumulate (MAC)

• Performing N2 multiplications and N2 additions in parallel

• Parallel Add-op

• Performing N additions and N ops (can be defined) in parallel

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 12: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Parallel MAC• Performing N2 multiplications and N2 additions

in parallel

SrcDst j3j2j1j0

i0i1i2i3

i0:1/4

i1:1/4

i3:1/4

i2:1/4

1/3

1/3

1/3

j0 j1 j3j2

Src

Dst1/2 1/2

1

1/2

1/2

� ���� ���� ����

��� � � ���

� � ��� �

� ��� ��� �

���� ���� ��������

��

��

��

��

� � � �

Src

Dst

� ���� ���� ����

��� � � ���

� � ��� �

� ��� ��� �

���� ���� ����

1/4

9/60 13/60 25/60

1/41/4

1/41 ����

13/60

������

ALCHEM alchem.usc.edu

16 MULT , 16 ADD

CEI cei.pratt.duke.edu

Page 13: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Parallel Add-op• Performing N additions and N ops (can be defined) in parallel

SrcDst

i0i1i2i3

i0 4 i1:3 i3:2i2:11

5 31 1

j0:7 j1:6 j3:Mj2:M

Src

Dst (

j0:7 j1:5 j3:4j2:3Dst )

� � � �

� � � �

� � � �

� � � �

� � � �

min

7 6 M M

min min min

41

7 5 9 M

M 5 9 M3

12

Dst����

i0

i1

i2

i3Src(RegI)

� � � �

� � � �

� � � �

� � � �

� � � �

min

7 5 9 M

min min min

31

7 5 6 4

M M 6 41

2

� � � �

� � � �

� � � �

� � � �

� � � �

min

7 5 6 4

min min min

11

7 5 6 4

M M M M2

Dst����

� � � �

� � � �

� � � �

� � � �

� � � �

min

7 5 6 4

min min min

21

7 5 3 4

M M 3 M

(RegO)

ALCHEM alchem.usc.edu

4 ops

4 adds

CEI cei.pratt.duke.edu

Page 14: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Also in the paper …• Graph dataset preprocessing method

• Hardware components in GraphR

• Detailed comparison to other accelerators (Table 1)

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 15: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Evaluation• Evaluation Setup

- Data Sets

- Applications: PageRank, BFS, SSSP, SpMV- CPU: Intel Xeon E5-2630 V3- GPU: NVIDIA Tesla K40c

- GraphR: 8-8 Xbar, 32 Xbars/GE, 64 GEs ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 16: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

CPU Comparison: Performance

- Gmean: Performance 16.01x- SpMV, PageRank > BFS, SSSP

- Parallel MAC leads to higher speedup

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 17: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

CPU Comparison: Energy Efficiency

- Energy Efficiency 33.82x

ALCHEM alchem.usc.edu

- SpMV, PageRank > BFS, SSSP- Parallel MAC leads to higher energy efficiency

CEI cei.pratt.duke.edu

Page 18: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

GPU Comparison

• Speedup: 1.69× to 2.19× • Energy Efficiency: 4.77× to 8.91×

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 19: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Accelerator Comparison

ALCHEM alchem.usc.edu

• Speedup: 1.16× to 4.12× • Energy Efficiency: 3.67× to 10.96× CEI

cei.pratt.duke.edu

Page 20: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Sensitivity to Density

ALCHEM alchem.usc.edu

• Density↑ -> Speedup & Energy Efficiency↑• Achieving greater parallelism

CEI cei.pratt.duke.edu

Page 21: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

Conclusion• We propose GraphR:

• A graph processing accelerator based on ReRAM

• Key Insights/Results:

• ReRAM based SpMV for processing in graph engine

• Stream-apply execution

• Parallel MAC and Add-Op patterns

• 16.01x performance gain and 33.82 in energy efficiency

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu

Page 22: GraphR: Accelerating Graph Processing Using ReRAM - Sites@Duke · cei.pratt.duke.edu Graph Preprocessing Ordered edge list (COO) on disk GraphR Memory ReRAM Graph Engines (GEs) Load

GraphR: Accelerating Graph Processing Using ReRAM

Linghao Song*, Youwei Zhuo#, Xuehai Qian#, Hai Li*, Yiran Chen*

*Duke University#University of Southern California

ALCHEM alchem.usc.edu

CEI cei.pratt.duke.edu