graphr: accelerating graph processing using reram - sites@duke · cei.pratt.duke.edu graph...
Post on 22-Sep-2020
4 Views
Preview:
TRANSCRIPT
GraphR: Accelerating Graph Processing Using ReRAM
Linghao Song*, Youwei Zhuo#, Xuehai Qian#, Hai Li*, Yiran Chen*
*Duke University#University of Southern California
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Graph Processing• To understand relationships in a group of nodes
• A wide range of application domains—Bioinformatics, Social Networks, Cyber Security, Data Mining…
• Classic algorithms:—Sparse Matrix Vector Multiplication (SpMV)
—Single Source Shortest Path (SSSP)
—Page Rank
(en.wikiquote.org)
CEI cei.pratt.duke.edu
ALCHEM alchem.usc.edu
The Need for Graph Processing Accelerators • Graph processing algorithms:
• Generate random access• Require high memory bandwidth
• Good target for hardware acceleration• Tesseract (ISCA’15): HMC+Inorder-Cores• Graphicionado (MICRO’16): dedicated memory accessing module• Energy Efficient Architecture for Graph (ISCA’16): asynchronous
execution• These accelerators are based on:
• Vertex-centric processing model• Conventional CMOS technology
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
ReRAM Based Acceleration• ReRAM Xbar for Matrix-Vector Multiplication
……
Bitline : Y
Wor
dlin
e : X
Xbar : W
Y=W*X
114
114
128
5123
3
112
112
512
=
…
WL_0
WL_1…BL_0BL_1BL_2
WL_2303
… ⇒
01212543
2304
…
BL_511
(2304=3*3*128*2)
12544(12544=112*112)
layer l+1layer l
(PipeLayer HPCA’17)
(PRIME ISCA’16,ISAAC ISCA’16,
Dot-Product Engine DAC’16,RENO DAC’15)
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Graph Processing in Action• Vertex-centric processing model
…
…
…………
random access
global random accessHigh memory bandwidth— little computation on the randomly fetched data
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Edges
Vertices
Graph Processing in Action• Edge-centric Processing Model…
…
Renew
Read & Process
…
sequential write
…Generate Updates
sequential read
Sequential edge access.Random vertex access.
(X-Stream SOSP’13)
ALCHEM alchem.usc.edu
Edges
VerticesCEI cei.pratt.duke.edu
GraphR: Graph Processing with ReRAM Xbar
V4
V7
V1
V2
V15
Process Edge
Reduce/Apply
V4
V1 V2 V7
edges to V4value of all vertices
=
new value of V4
ReRAMCrossbar (CB)
perform SpMV in analog manner
But, WAIT!A Xbar with a size of V-by-V?The matrix is sparse.
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Storage and Computation EfficiencyStore:
Compressed FormatComputation:
ReRAM Xbar SpMV(row,col,val)(0,2,3)(0,3,8)(1,2,7)(2,0,1)(3,1,4)(3,3,2)
X
Y=W*X
StorageEfficiency
ComputationEfficiency
GraphR
ALCHEM alchem.usc.edu
Coordinate ListCOO
CEI cei.pratt.duke.edu
GraphR Overview
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Graph Preprocessing Ordered edge list (COO) on
disk
GraphR
Memory ReRAM
Graph Engines (GEs)
Load Block(i)(sequential disk I/O)
Steaming-apply subgraphs (sequential access)
Controllerin GraphR
Software
Stream-Apply Execution
Block in Storage(COO)
Destination Vertices
Sour
ce V
ertic
es Block in Storage(COO)
Block in Storage(COO)
Block in Storage(COO)
Block in GraphR
RegI
-1
RegO
GE GE GE GE
RegI
-2Re
gI-3
RegI
-4
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Graph Engine (GE) Processing Patterns• Different algorithms achieve different parallelism when
mapped to Xbars
• Assuming an N×N Xbar
• Parallel Multiply-Accumulate (MAC)
• Performing N2 multiplications and N2 additions in parallel
• Parallel Add-op
• Performing N additions and N ops (can be defined) in parallel
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Parallel MAC• Performing N2 multiplications and N2 additions
in parallel
SrcDst j3j2j1j0
i0i1i2i3
i0:1/4
i1:1/4
i3:1/4
i2:1/4
1/3
1/3
1/3
j0 j1 j3j2
Src
Dst1/2 1/2
1
1/2
1/2
� ���� ���� ����
��� � � ���
� � ��� �
� ��� ��� �
���� ���� ��������
��
��
��
��
�
� � � �
Src
Dst
� ���� ���� ����
��� � � ���
� � ��� �
� ��� ��� �
���� ���� ����
1/4
9/60 13/60 25/60
1/41/4
1/41 ����
13/60
������
ALCHEM alchem.usc.edu
16 MULT , 16 ADD
CEI cei.pratt.duke.edu
Parallel Add-op• Performing N additions and N ops (can be defined) in parallel
SrcDst
i0i1i2i3
i0 4 i1:3 i3:2i2:11
5 31 1
j0:7 j1:6 j3:Mj2:M
Src
Dst (
j0:7 j1:5 j3:4j2:3Dst )
� � � �
� � � �
� � � �
� � � �
� � � �
min
7 6 M M
min min min
41
7 5 9 M
M 5 9 M3
12
Dst����
i0
i1
i2
i3Src(RegI)
� � � �
� � � �
� � � �
� � � �
� � � �
min
7 5 9 M
min min min
31
7 5 6 4
M M 6 41
2
� � � �
� � � �
� � � �
� � � �
� � � �
min
7 5 6 4
min min min
11
7 5 6 4
M M M M2
Dst����
� � � �
� � � �
� � � �
� � � �
� � � �
min
7 5 6 4
min min min
21
7 5 3 4
M M 3 M
(RegO)
ALCHEM alchem.usc.edu
4 ops
4 adds
CEI cei.pratt.duke.edu
Also in the paper …• Graph dataset preprocessing method
• Hardware components in GraphR
• Detailed comparison to other accelerators (Table 1)
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Evaluation• Evaluation Setup
- Data Sets
- Applications: PageRank, BFS, SSSP, SpMV- CPU: Intel Xeon E5-2630 V3- GPU: NVIDIA Tesla K40c
- GraphR: 8-8 Xbar, 32 Xbars/GE, 64 GEs ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
CPU Comparison: Performance
- Gmean: Performance 16.01x- SpMV, PageRank > BFS, SSSP
- Parallel MAC leads to higher speedup
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
CPU Comparison: Energy Efficiency
- Energy Efficiency 33.82x
ALCHEM alchem.usc.edu
- SpMV, PageRank > BFS, SSSP- Parallel MAC leads to higher energy efficiency
CEI cei.pratt.duke.edu
GPU Comparison
• Speedup: 1.69× to 2.19× • Energy Efficiency: 4.77× to 8.91×
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
Accelerator Comparison
ALCHEM alchem.usc.edu
• Speedup: 1.16× to 4.12× • Energy Efficiency: 3.67× to 10.96× CEI
cei.pratt.duke.edu
Sensitivity to Density
ALCHEM alchem.usc.edu
• Density↑ -> Speedup & Energy Efficiency↑• Achieving greater parallelism
CEI cei.pratt.duke.edu
Conclusion• We propose GraphR:
• A graph processing accelerator based on ReRAM
• Key Insights/Results:
• ReRAM based SpMV for processing in graph engine
• Stream-apply execution
• Parallel MAC and Add-Op patterns
• 16.01x performance gain and 33.82 in energy efficiency
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
GraphR: Accelerating Graph Processing Using ReRAM
Linghao Song*, Youwei Zhuo#, Xuehai Qian#, Hai Li*, Yiran Chen*
*Duke University#University of Southern California
ALCHEM alchem.usc.edu
CEI cei.pratt.duke.edu
top related