partitions processing using streaming x-stream: edge ...ey204/teaching/acs/r212... · results &...

X-Stream: Edge-centric Graph Processing using Streaming Partitions Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel

Upload: others

Post on 20-Apr-2020

4 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

X-Stream: Edge-centric Graph Processing using Streaming

Partitions

Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel

ContextApproach

ModelImplementation

Results & Conclusion

Pregel & Powergraph: scatter & gather→ A scatter-gather methodology:

1. scatter(vertex v):

send updates over outgoing edges of v

2. gather(vertex v):

apply updates from inbound edges of v

→ how to scale-up?

Page 4: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Trade-off: Sequential vs Random access

Page 5: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

GraphChi: a sequential approach→ avoids random access using shards

Problems:

1. need graph to be pre-sorted by source vertex2. vertex-centric3. requires re-sort of edges by destination vertex for gather step

Page 6: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

ContextApproach

ModelImplementation

Results & Conclusion

X-Stream’s Approach1. retain scatter-gather programming model2. use an edge-centric implementation3. stream unordered edge lists

Gains:

1. use sequential (not random) access2. do not need pre-processing step

Page 8: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

scatter-gather: an edge-centric implementationscatter(edge e):

send update over e

gather(update u):

apply update u to u.destination

Page 9: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Quick TerminologyFast Storage:

→ caches (in-memory)

→ main-memory (out-of-core)

Slow Storage:

→ main-memory (in-memory)

→ SSD/Disk (out-of-core)

Page 10: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

ContextApproach

ModelImplementation

Results & Conclusion

The basic model:

Apply Scatter Apply Gather

input: an unordered set of directed edges

API: implementations of scatter/gather for given edges

Page 12: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Problem: vertices may not fit in fast storage

Page 13: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Problem: vertices may not fit in fast storage→ Streaming partitions:

- vertex set, V: a subset of the vertices of the graph - edge list: source is ∈ V- update list: dest ∈ V

→ How do we use them?

1. scatter/gather iterate over streaming partitions2. updates need to be shuffled

Page 14: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

ContextApproach

ModelImplementation

Results & Conclusion

Stream buffer

Chunck

Index Array (K entries)

Chunck Array

Page 16: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Out-of-core In-memory→ Folds shuffle into scatter● run scatter, appending updates to an in-

memory buffer● when buffer full: run an in-memory

shuffle

→ 2 Stream Buffers

→ Number of partitions

N/K + 5SK <= M

→ Disk I/O

→ Parallel multi-stage shuffler & scatter/gather● stream independently for each

streaming partition ● work stealing● group partitions together into a tree for

the shuffler

→ 3 stream buffers

→ Number of partitions

= CPU_cache_size / footprint

Page 17: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Chaos: the extension of X-Stream→ Scale out to multiple machines in 1 cluster

2 gains:

1. access secondary storage in parallel improves performance2. increases size of graph that can be handled

Page 18: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Chaos: the extension of X-Stream→ Steps:

1. simple initial partitioning2. spread graph data uniformly over all 2nd storage devices3. work stealing

Assumptions:

1. network machine-to-machine bandwidth > bandwidth of storage device2. network switch bandwidth > aggregate bandwidth of all storage devices of

cluster

Page 19: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

ContextApproach

ModelImplementation

Results & Conclusion

Experiments:→ Tested on real-world graphs.

Page 21: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Scalability

Page 22: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Comparison

Page 23: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Comparison: Ligra

Page 24: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Comparison: Graphchi

Page 25: Partitions Processing using Streaming X-Stream: Edge ...ey204/teaching/ACS/R212... · Results & Conclusion. X-Stream’s Approach 1. retain scatter-gather programming model ... 1

Conclusion & TakeawayStrengths:

→ Sequential access

→ Scale up & scale out

Weaknesses

→ Limited number of problems it can handle

→ Limited types of graphs it can handle

→ How would you use in a real-world scenario

Unaudited Group results - senspdf.jse.co.za · Salient features EBITDA for the period decreased by 29% to R151 million (H1 2018: R212 million) Loss after tax of R447 million compared

Why Study A Stream’s Velocity?

Reasons She Goes to the Woods - Oneworld Publications€¦ · stream’s breath smells of bright weeds, frogspawn, lichened pebbles. The water is a dazzling drink. Circular, swirling

Algorithm runtime prediction: Methods & evaluationey204/teaching/ACS/R212... · Algorithmruntimeprediction:Methods&evaluation. Frank Hutter ... algorithm runtime, but are typically

CherryPick: Adaptively Unearthing the Best Cloud Configurations …ey204/teaching/ACS/R244_2018... · 2018-11-21 · CherryPick: Adaptively Unearthing the Best Cloud Configurations

BM61S40RFV-C Evaluation Board BM61S40RFV-EVK002 · R112, R113, R121, R122, R212, R213, R221, and R222 are implemented interim resisters for shipment check. Please replace each resister

Stream, Iridescent Light. · Stream, Iridescent Light. A stunning design from Christian Lava, Stream is an impressive presence in any room. Stream’s design exploits the physical

Six Faces of Data Centric Networking - University of …ey204/teaching/ACS/R202/slides/R202_DCN.pdf · Six Faces of Data Centric Networking ... Application level multicast ... DakNet

Service Stream Limited · 2018-06-05 · Service Stream’s Fixed Communications team specialises in providing design, construction, upgrade and maintenance services across Australia’s

Scalable Mesh Networks and The Address Space Balancing Problem Andrea ...ey204/pubs/ACS/andrea_master_thesis_2010.pdf · Andrea Lo Pumo Girton College A dissertation submitted to

15 - North West LLSnorthwest.lls.nsw.gov.au/.../word_doc/0005/...Day.docx · Web viewSite may have rock or concrete lining. BANK EROSION AND STABILITY (5) ... of your stream’s

Ernest: Efﬁcient Performance Prediction for Large-Scale …ey204/teaching/ACS/R244_2019... · 2017-09-18 · Ernest: Efﬁcient Performance Prediction for Large-Scale Advanced Analytics

MAIB Report 17/2013 - Seagate / Timor Stream - Serious ... · SERIOUS MARINE CASUALTY REPORT NO 17 ... Timor Stream’s master’s Standing Orders – Bridge ... Seagate and the refrigerated-cargo

Privacy Preservation in the Context of Big Data …ey204/pubs/talks/BIGDATA...11 Distributed Infrastructure 21 HDFS, GFS, Dynamo HBase, BigTable, Cassandra MapReduce (Hadoop, Google

ML-SOR: Message routing using multi-layer social networks ...ey204/pubs/2015_COMCOM.pdfML-SOR: Message routing using multi-layer social networks in opportunistic communications A

Performance Tuning in Computer Systems with Machine …ey204/pubs/talks/2019_12_11_RAIS.pdfDeep Learning, Machine Learning, and AI… e.g. CNN, LSTM e.g. Logistic regression, Neural

REPLY TO PLAINTIFFS STEERING COMMITTEE’S …Trial against Gulf Stream Coach, Inc. * MAG: CHASEZ ***** REPLY TO PLAINTIFFS STEERING COMMITTEE’S OBJECTIONS TO GULF STREAM’S SECOND

ASX & MEDIA RELEASE 2 November 2016 For …Service Stream’s Energy & Water team specialise in providing design, installation, maint enance and customer management services to Australia’s

Machine Learning in the Cloud with Spark - cl.cam.ac.uk · Machine Learning in the Cloud with Spark R212 Data Centric Systems and Networking Open Source Project Study by Haikal Pribadi

Gulf Streams induced sea level rise and variability …tezer/PAPERS/2013_JGR_SLRvsGS.pdfGulf Stream’s induced sea level rise and variability along the U.S. mid-Atlantic coast Tal

Chapter 4 Biological Monitoring - DNR...Chapter 4 Biological Monitoring - Introduction to Volunteer Water Quality Monitoring Training Notebook - The quality of a stream’s health

Drinking From The Fire Hose: Scalable Stream Processing ...ey204/teaching/ACS/R212_2014_2015/slides/Peter... · •Database Management System (DBMS): • Data relatively static but

RESILIENT DISTRIBUTED DATASETS: A FAULT ...ey204/teaching/ACS/R244_2018...RESILIENT DISTRIBUTED DATASETS: A FAULT-TOLERANT ABSTRACTION FOR IN-MEMORY CLUSTER COMPUTING MATEIZAHARIA,

Report from Dagstuhl Seminar 14462 Systems and Algorithms ...ey204/pubs/Dagstuhl_14462_Report.pdf · Report from Dagstuhl Seminar 14462 Systems and Algorithms for Large ... by a joint

User Guide Vodafone Mobile Wi-Fi R212 - ComfortSurf · User Guide Vodafone Mobile Wi-Fi R212 Designed by Vodafone. Welcome to the world of mobile communications ... Language selection:

R212 - Power Linepowerlineitalia.com/pdf/KENDRILL/CC_2014-15-KENDRILL-R... · 2015-11-13 · R212.6D Broca de metal duro integral, serie larga 12xD, con refrigeración interna Solid

Palmer Lab - Palmer Lab - STREAM TEMPERATURE ......ges to the stream’s hydraulic geometry. Krause et al. (2004) modiﬁed the Stream Network Temperature Model to assess changes in

USING REINFORCEMENT LEARNING FOR AUTONOMIC RESOURCE ...ey204/teaching/ACS/R212_2015_2016/presentation… · USING REINFORCEMENT LEARNING FOR AUTONOMIC RESOURCE ALLOCATION IN CLOUDS:

MadLINQ: Large-Scale Distributed Matrix Computation for ...ey204/teaching/ACS/R212_2014_2015/papers/qi… · The computation core of many data-intensive applications can be best expressed

Designing Hybrid Data Processing Systems for Heterogeneous …ey204/teaching/ACS/R244_2017... · 2017-11-09 · Defining Stream Query Semantics • Windows convert data streams to

jet stream dental labcorporate profile · DHL is Jet Stream’s primary ... Makes a Difference Dr.William “Bill”Bloink,a clinical mentor and hands-on trainer for Heartland Dental

LIPSIN: Line Speed Publish/Subscribe Inter …ey204/teaching/ACS/R202_2012_2013/papers/S3_CCN...LIPSIN: Line Speed Publish/Subscribe Inter-Networking Petri Jokela1, András Zahemszky1,

Drinking From The Fire Hose: Scalable Stream Processing ...ey204/teaching/ACS/R212_2014_2015/sli… · • Different scheduling disciplines possible: 1. Round-robin 2. Minimise queue

Cooperative Extension Service Food and Environment College ... · Buttonbush/Tom Barnes Silky dogwood/Tom Barnes 6 six siMple solutions to Help you preserve or iMprove your streaM’s

Towards Latency: An Online Learning Mechanism for Caching ...ey204/pubs/MPHIL/2015_MICHAEL.pdfrelies on a stochastic method to estimate optimal expiration times for dynam-ically changing