Large-Scale Graph Processing on Emerging Storage Devices
Nima Elyasi¹, Changho Choi², Anand Sivasubramaniam¹
¹ Pennsylvania State University
² Samsung Semiconductor Inc.
Graph Processing is Commonplace
• Search Engines
• Social Media
• Recommendations and Ads
• Map and Navigation
Large-Scale Graph Processing Challenges
• Huge Datasets
• Irregular Accesses
• High cost of DRAM: DRAM ($$$$) vs. NVMe SSD ($)
• External Graph Processing is Desirable
• Fine-Grained and Random Accesses
Fine-Grained Access in External Graph Processing
• SSD Page Size and Vertex Accesses Don’t Match!
- SSD page: several kilobytes (4KB ~ 16KB)
- Vertex value: several bytes, e.g., 4 bytes
• Irregular Accesses
• Vertex updates are detrimental to performance and device endurance
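To make the mismatch concrete, the back-of-the-envelope sketch below computes how many bytes move per useful byte when a single vertex value forces a whole-page access. The 4KB and 16KB page sizes and the 4-byte vertex value come from the slide; the function name and the assumption of one page access per update (no caching or coalescing) are illustrative only.

```python
def amplification(page_bytes: int, value_bytes: int) -> float:
    """Bytes transferred per useful byte when one value forces a whole-page access."""
    return page_bytes / value_bytes


VERTEX_VALUE_BYTES = 4  # "several bytes, e.g., 4 bytes" from the slide

for page_kb in (4, 16):
    factor = amplification(page_kb * 1024, VERTEX_VALUE_BYTES)
    print(f"{page_kb} KB page, {VERTEX_VALUE_BYTES} B value: {factor:.0f}x")
```

Under these assumptions a 4KB page moves 1024 bytes for every useful byte, and a 16KB page moves 4096.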
Providing Perfect Sequentiality as a Remedy
• If vertex data could be stored in DRAM, fine-grained accesses would be less of an issue
• Instead, the prior external graph processing framework (GraFBoost, ISCA’18) maintains vertex data on SSD
• It achieves perfect sequentiality by coalescing fine-grained accesses
Programming Model
Vertex-centric Programming Model
- Iterative programming model
- Each vertex runs a user-defined program
- Sending updates to neighbors along outgoing edges
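As a rough illustration of the three points above, here is a minimal vertex-centric BFS sketch. The function name `bfs_levels`, the adjacency-dictionary input, and the choice to let updates become visible within the same iteration are all assumptions made for this example; the slides do not prescribe a particular API.

```python
import math


def bfs_levels(out_edges, source):
    """Compute BFS levels by repeatedly letting vertices push updates to neighbors.

    out_edges maps every vertex to the list of its out-neighbors
    (every neighbor must also appear as a key).
    """
    level = {v: math.inf for v in out_edges}
    level[source] = 0
    changed = True
    while changed:  # iterative model: repeat until no vertex changes
        changed = False
        for v, neighbors in out_edges.items():  # each vertex runs the same program
            if level[v] == math.inf:
                continue  # not reached yet, nothing to send
            for n in neighbors:  # send an update along every outgoing edge
                if level[v] + 1 < level[n]:
                    level[n] = level[v] + 1
                    changed = True
    return level


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_levels(graph, "A"))
```

The inner loop is where the fine-grained, irregular writes come from: each outgoing edge may touch a vertex value anywhere in the vertex list.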
Prior External Graph Processing -- GraFBoost
• Vertex Data, Index File, Edge File
• Out-edges of V0: V0 → Vx, V0 → Vy, V0 → Vz
• Keys: {Vx, Vy, Vz}, Value: {V0 value}
• Key-value pairs: <Vx, V0 value>, <Vy, V0 value>, <Vz, V0 value>
• GraFBoost sorts key-value pairs in memory, logs them in SSD, merges them, and updates the vertex list in SSD
Sang-Woo Jun, et al. GraFBoost: Using accelerated flash storage for external graph analytics, ISCA’18.
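The key-value flow described on this slide can be sketched in a few lines. This is a simplified in-memory illustration, not GraFBoost's implementation: the function names, the use of Python's built-in sort, and the additive reduce function are assumptions, and the logging to SSD and multi-pass merging are left out.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def generate_updates(out_edges, values):
    """Emit one <destination, source value> pair per outgoing edge."""
    return [(dst, values[src]) for src, dsts in out_edges.items() for dst in dsts]


def sort_and_merge(updates, reduce_fn):
    """Sort pairs by destination key, then merge pairs that share a key."""
    ordered = sorted(updates, key=itemgetter(0))
    merged = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        merged.append((key, reduce(reduce_fn, (value for _, value in group))))
    return merged


out_edges = {0: [7, 3, 5], 1: [3]}  # vertex 0 points to 7, 3, 5; vertex 1 points to 3
values = {0: 0.5, 1: 0.25}
print(sort_and_merge(generate_updates(out_edges, values), lambda a, b: a + b))
```

After sorting, the merged updates arrive in destination-ID order, so the vertex list can be rewritten sequentially. The price is the sort itself.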
Computation Overhead of Sort!
• Up to 60% sort overhead (web graph)
• Higher sort overhead for PageRank
- Processes all vertices in each iteration and generates more updates
Scalability Issue
Current External Graph Processing:
• Read from SSD: linear time, O(|E|)
• Sort in memory: O(|E| log |E|)
• Write to SSD: linear time, O(|E|)
Assuming DRAM is “k” times faster than SSD (e.g., k=30):
• When k < log(|E|) → sorting can become the bottleneck
Instead, we propose a vertex partitioning to eliminate the sorting
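The crossover point can be worked out numerically. The sketch below assumes a base-2 logarithm and ignores all constant factors, neither of which the slide specifies, so the resulting edge count is only an order-of-magnitude guide. The function name `sort_dominates` is made up for this example.

```python
import math


def sort_dominates(num_edges: int, k: float) -> bool:
    """True when the in-memory sort term exceeds the SSD streaming term.

    Unit-free cost model (an assumption): an SSD pass costs k * |E|,
    the in-memory sort costs |E| * log2(|E|).
    """
    return num_edges * math.log2(num_edges) > k * num_edges


k = 30  # example value from the slide
threshold = 2 ** k  # log2(|E|) equals k exactly at |E| = 2**k
print(threshold)
print(sort_dominates(threshold // 2, k), sort_dominates(threshold * 2, k))
```

With k=30 and these assumptions, sorting starts to dominate once the graph has more than about one billion edges (2^30 = 1,073,741,824).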
Partitioning Graph Data
Extensive Prior Efforts on Partitioning Graph Data:
- Not well suited for fully external graph processing
Prior systems: FlashGraph (FAST’15); GraphChi (OSDI’12); Mosaic (EuroSys’17); PowerGraph (OSDI’12); GridGraph (USENIX ATC’15); GraphP (HPCA’18)
Limitations:
- Require all vertices to be present in main memory
- Do not decouple vertices and edges
- Need each partition to be completely present in cache or memory
- Dramatically increase the number of partitions and incur high cross-partition communication
Instead, We Propose a Partitioning for Vertex Data
Reorganizing graph data so that vertices associated with each partition can fit in main memory
Partition 0 layout (figure):
- Source Vertex Data: Vertex ID & Value, plus an Index (Vertex A → Offset A, Vertex B → Offset B, Vertex C → Offset C), sorted based on Vertex ID
- Edge Data: out-edges of Vertex A, Vertex B, and Vertex C
- Source Vertices and Destination Vertices
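One way to picture the layout in the figure is the sketch below. It assigns destination vertices to partitions by contiguous ID range, which is an assumption made for simplicity and not necessarily the partitioning rule used in this work. The function name and the dictionary fields (`dest_range`, `index`, `edge_data`) are also hypothetical.

```python
def build_partitions(edges, num_vertices, dests_per_partition):
    """Group edges by destination range; keep sources sorted with an offset index."""
    num_parts = -(-num_vertices // dests_per_partition)  # ceiling division
    by_src = [dict() for _ in range(num_parts)]
    for src, dst in edges:
        by_src[dst // dests_per_partition].setdefault(src, []).append(dst)

    layout = []
    for p, src_to_dsts in enumerate(by_src):
        index, edge_data = [], []
        for src in sorted(src_to_dsts):  # sorted based on vertex ID
            index.append((src, len(edge_data)))  # (source vertex, offset into edge data)
            edge_data.extend(src_to_dsts[src])
        lo = p * dests_per_partition
        hi = min(lo + dests_per_partition, num_vertices)
        layout.append({"dest_range": (lo, hi), "index": index, "edge_data": edge_data})
    return layout


edges = [(0, 1), (0, 3), (2, 1), (1, 2), (3, 0)]
for part in build_partitions(edges, num_vertices=4, dests_per_partition=2):
    print(part)
```

In this sketch `dests_per_partition` plays the role of the memory budget: it bounds how many destination vertex values a partition needs in memory at once. Note that source vertex 0 appears in both partitions, which is why source vertices end up mirrored across partitions.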
Execution Flow
In each iteration (figure, shown for Partition 0):
- SSD holds the vertex data; memory holds the destination vertices for a partition
- Source vertex data (Vertex ID & Value, Index: Vertex A → Offset A, Vertex B → Offset B, Vertex C → Offset C) is streamed from SSD, one chunk of source vertices (32MB) at a time
- Reading neighboring information and updating destination vertices
- Metadata for the current partition is held in memory; mirror updates are generated for other partitions
- Writing all updated vertex data to SSD
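The per-iteration flow can be approximated by the following sketch. Plain Python lists stand in for the SSD, the update function is a simple sum of source values, and mirror updates are omitted. All three are simplifying assumptions, as are the function name and the tuple layout of `partitions`.

```python
def run_iteration(partitions, values, chunk_size):
    """Process every partition once.

    partitions: list of (dest_lo, dest_hi, sources), where sources is a list of
    (source_id, [destination_ids]) sorted by source_id.
    values: vertex values from the previous iteration.
    """
    new_values = list(values)
    for dest_lo, dest_hi, sources in partitions:
        dest = [0.0] * (dest_hi - dest_lo)  # destination vertices held in memory
        for start in range(0, len(sources), chunk_size):
            chunk = sources[start:start + chunk_size]  # stand-in for one streamed chunk
            for src, dsts in chunk:
                for d in dsts:  # read neighbor information, update destinations
                    dest[d - dest_lo] += values[src]
        new_values[dest_lo:dest_hi] = dest  # stand-in for the sequential write-back
    return new_values


partitions = [
    (0, 2, [(0, [1]), (2, [1]), (3, [0])]),
    (2, 4, [(0, [3]), (1, [2])]),
]
print(run_iteration(partitions, [1.0, 2.0, 3.0, 4.0], chunk_size=2))
```

Random updates only ever hit the in-memory `dest` array, while everything that touches storage is read or written sequentially, so no sort is needed.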
How to Update Vertex List in Main Memory?
Updating Vertices in Memory
• Multiple threads are updating elements of the same vertex list
- High synchronization cost
• Figure: the vertex list and a buffer (e.g., 1MB)
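The slide does not spell out how the buffer is used. One plausible reading, sketched below purely as an assumption, is that each thread collects its updates in a private buffer and applies them to the shared vertex list in batches, so the lock is taken once per buffer and not once per update. The function name and the `buffer_capacity` parameter (counted in updates, not bytes) are hypothetical.

```python
import threading


def apply_updates(vertex_list, updates_per_thread, buffer_capacity):
    """Apply (index, delta) updates from several threads using private buffers."""
    lock = threading.Lock()

    def flush(buffer):
        with lock:  # one lock acquisition per buffer, not per update
            for idx, delta in buffer:
                vertex_list[idx] += delta
        buffer.clear()

    def worker(updates):
        buffer = []  # thread-private, so appends need no synchronization
        for update in updates:
            buffer.append(update)
            if len(buffer) >= buffer_capacity:
                flush(buffer)
        if buffer:
            flush(buffer)  # apply whatever is left at the end

    threads = [threading.Thread(target=worker, args=(u,)) for u in updates_per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return vertex_list


work = [[(0, 1), (1, 1), (0, 1)], [(0, 1), (3, 2)]]
print(apply_updates([0, 0, 0, 0], work, buffer_capacity=2))
```

A larger buffer means fewer lock acquisitions but more memory per thread, which is the trade-off a buffer size such as 1MB would be tuning.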
Updating Vertex Mirrors on Different Partitions
• O(|V|) running time for updating mirrors
Required Meta-Data for Mirror Updates
- Source Vertex Table: Vertex 0 → Part ID(s), Vertex 1 → Part ID(s), Vertex 2 → Part ID(s)
- Metadata for each partition (Partition 0 ... Partition i): Start Index, End Index
- For each vertex: Start Index, End Index
- Figure labels: Vertex Value, Partition i, Mirrors for Partition 0
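A minimal sketch of mirror propagation using a vertex-to-partition table follows. It models only the "Part ID(s)" column of the Source Vertex Table; the Start Index and End Index metadata are not modeled, and the function name and dictionary-based structures are assumptions for illustration.

```python
def propagate_mirrors(updated_values, source_vertex_table, mirrors):
    """Copy each updated vertex value into every partition that holds a mirror of it.

    updated_values: vertex -> new value
    source_vertex_table: vertex -> list of partition IDs that mirror the vertex
    mirrors: partition ID -> {vertex: mirrored value}
    """
    for vertex, value in updated_values.items():
        for part_id in source_vertex_table.get(vertex, ()):
            mirrors[part_id][vertex] = value
    return mirrors


source_vertex_table = {0: [1], 1: [0, 2], 2: []}
mirrors = {0: {}, 1: {}, 2: {}}
updated = {0: 0.5, 1: 0.25, 2: 0.75}
print(propagate_mirrors(updated, source_vertex_table, mirrors))
```

The sketch makes a single pass over the updated vertices, with work per vertex proportional to the number of partitions that mirror it.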
Experimental Setup
• Processor: Intel Xeon, 48 cores
• Memory: 256 GB DRAM
• SSD: Two Samsung NVMe SSDs, 3.2 TB capacity in total and 6.4 GB/s sequential read speed
• Graph Algorithms: PageRank and Breadth-First Search (BFS)
• Input Graphs: Web, Twitter, Synthetic (Kron)
Performance Evaluation
• More than 2X improvement compared to GraFSoft
• Higher benefits for larger graphs (Web, Kron32)
• Around 10% space overhead for partitioning
Execution Time Breakdown
• Mirror updates account for 8-12% of execution time
• I/O is no longer the main contributor to the total execution time
• Workload shown: PageRank
Concluding Remarks
• Large-scale graph processing suffers from random updates to vertices
• State-of-the-art provides perfect sequentiality by sorting all updates
- High computation overhead
• A partitioning for vertex data is proposed to eliminate the need for perfect sequentiality
• Future work: addressing graphs that evolve over time
• Thanks to the GraFBoost authors (Sang-Woo Jun)!
Thanks!