Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining Wei Jiang and Gagan Agrawal


Page 1: Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining Wei Jiang and Gagan Agrawal

Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining

Wei Jiang and Gagan Agrawal

Page 2: Outline

April 19, 2023

Background
System Design of Ex-MATE
Parallel Graph Mining with Ex-MATE
Experiments
Related Work
Conclusion


Page 4: Background (I)

Map-Reduce
    Simple API: map and reduce
        Easy to write parallel programs
        Fault-tolerant for large-scale data centers
    Performance? Always a concern for the HPC community
Generalized Reduction
    First proposed in FREERIDE, which was developed at Ohio State in 2001-2003
    Shares a similar processing structure with Map-Reduce
    The key difference lies in a programmer-managed reduction object
    Better performance?

Page 5: Map-Reduce Execution

Page 6: Comparing Processing Structures

• The reduction object represents the intermediate state of the execution
• The reduce function is commutative and associative
• Sorting, grouping, and similar overheads are eliminated with the reduction function/object
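The difference between the two processing structures can be illustrated with a small counting task. This is only a minimal sketch under stated assumptions: the function names and the in-memory "shuffle" are hypothetical stand-ins, not the MATE or Phoenix API. The Map-Reduce version materializes intermediate (key, value) pairs and groups them, while the generalized-reduction version updates a reduction object directly and merges per-worker objects at the end.

```python
from collections import defaultdict


def mapreduce_count(chunks):
    """Map-Reduce style: emit pairs, sort/group them, then reduce each group."""
    # Map phase: one intermediate (key, 1) pair per input element.
    pairs = [(x, 1) for chunk in chunks for x in chunk]
    # Shuffle phase: sorting and grouping of the intermediate pairs.
    pairs.sort(key=lambda kv: kv[0])
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce phase: fold each group into a single value.
    return {key: sum(values) for key, values in grouped.items()}


def generalized_reduction_count(chunks):
    """Generalized reduction: each worker updates its own reduction object."""
    local_objects = []
    for chunk in chunks:  # one chunk per (simulated) worker
        robj = defaultdict(int)  # programmer-managed reduction object
        for x in chunk:
            robj[x] += 1  # commutative, associative update; no pairs stored
        local_objects.append(robj)
    # Combination phase: merge the per-worker reduction objects.
    final = defaultdict(int)
    for robj in local_objects:
        for key, value in robj.items():
            final[key] += value
    return dict(final)
```

Both functions return the same counts; the second one never builds or sorts the list of intermediate pairs, which is the overhead the slide refers to.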

Page 7: Our Previous Work

A comparative study between FREERIDE and Hadoop:
    FREERIDE outperformed Hadoop by factors of 5 to 10
    Possible reasons: Java vs. C++? HDFS overheads? Inefficiency of Hadoop? API difference?
Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford
    Adopted Generalized Reduction
    Focused on API differences
    MATE improved on Phoenix by an average of 50%
    Avoids a large set of intermediate pairs between Map and Reduce
    Reduces memory requirements

Page 8: Extending MATE

Main issues of the original MATE:
    Only works on a single multi-core machine
    Datasets must reside in memory
    Assumes the reduction object MUST fit in memory
This paper extends MATE to address these limitations
    Focus on graph mining: an emerging class of applications
        Requires large reduction objects as well as large-scale datasets
        E.g., PageRank could have an 8GB reduction object!
    Supports managing arbitrary-sized reduction objects
    Also supports reading disk-resident input data
Evaluated Ex-MATE against PEGASUS
    PEGASUS: a Hadoop-based graph mining system


Page 10: System Design and Implementation

System design of Ex-MATE
    Execution overview
    Support of distributed environments
System APIs in Ex-MATE
    One set provided by the runtime: operations on reduction objects
    Another set defined or customized by the users: reduction, combination, etc.
Runtime in Ex-MATE
    Data partitioning
    Task scheduling
    Other low-level details

Page 11: Ex-MATE Runtime Overview

Basic one-stage execution

Page 12: Implementation Considerations

Support for processing very large datasets
    Partitioning function: partition the data and distribute it to a number of nodes
    Splitting function: use the multi-core CPU on each node
Management of a large reduction object (R.O.): reduce disk I/O!
    Outputs (R.O.) are updated in a demand-driven way
    Partition the reduction object into splits
    Inputs are re-organized based on data access patterns
    Reuse an R.O. split as much as possible in memory
    Example: Matrix-Vector Multiplication

Page 13: A MV-Multiplication Example

[Figure: Output Vector, Input Vector, and Input Matrix, with matrix blocks labeled (1, 1), (2, 1), and (1, 2)]
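The idea of splitting the reduction object and re-organizing the input around it can be sketched for matrix-vector multiplication as follows. This is an illustrative sketch only, not the Ex-MATE runtime: `to_blocks`, `blocked_matvec`, and the `split_loads` counter are hypothetical names, the matrix is assumed square, blocks are indexed from 0 (the slide labels them from 1), and "loading a split" is simulated with an in-memory list. The output vector plays the role of the reduction object; matrix blocks are visited grouped by the output split they update, so each split is brought in once and reused for all of its blocks.

```python
def to_blocks(dense, block_size):
    """Group the non-zero entries of a dense matrix by (row_block, col_block)."""
    blocks = {}
    for i, row in enumerate(dense):
        for j, m in enumerate(row):
            if m != 0:
                key = (i // block_size, j // block_size)
                blocks.setdefault(key, []).append((i, j, m))
    return blocks


def blocked_matvec(blocks, v, block_size):
    """Multiply a blocked square matrix by v, one output split at a time."""
    n = len(v)
    out = [0.0] * n
    split_loads = 0
    # Re-organize the input: collect matrix blocks by the output split they touch.
    by_row_block = {}
    for (row_block, _col_block), entries in blocks.items():
        by_row_block.setdefault(row_block, []).append(entries)
    for row_block in sorted(by_row_block):
        lo = row_block * block_size
        hi = min(lo + block_size, n)
        split = [0.0] * (hi - lo)  # bring one reduction-object split into memory
        split_loads += 1
        for entries in by_row_block[row_block]:
            for i, j, m in entries:
                split[i - lo] += m * v[j]  # reuse the split for every block in this row
        out[lo:hi] = split  # write the finished split back
    return out, split_loads
```

With a 4x4 matrix and `block_size=2`, the output vector has two splits and each is loaded exactly once, regardless of how many matrix blocks contribute to it.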


Page 15: GIM-V for Graph Mining (I)

Generalized Iterative Matrix-Vector Multiplication (GIM-V)
    First proposed at CMU
    Similar to the common MV multiplication
MV Mul.: the three steps are Multiplication, Sum, and Assignment
Three operations in GIM-V:
    combine2 m(i, j) and v(j): does not have to be a multiplication
    combineAll the n partial results for element i: does not have to be a sum
    assign v(new) to v(i): the previous value of v(i) is updated by a new value
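The three operations can be read as parameters of a generic routine. The following is a minimal sequential sketch, assuming the matrix is given as a list of non-zero `(i, j, m)` triples; the function name `gimv` and its signature are hypothetical and are not taken from PEGASUS or Ex-MATE.

```python
def gimv(entries, v, combine2, combine_all, assign):
    """One GIM-V step over a sparse matrix given as (i, j, m) triples."""
    n = len(v)
    partial = [[] for _ in range(n)]
    for i, j, m in entries:
        # combine2: pair a matrix element with the matching vector element.
        partial[i].append(combine2(m, v[j]))
    # combineAll: fold the partial results of row i; assign: update v(i).
    return [assign(v[i], combine_all(partial[i])) for i in range(n)]


# Plugging in Multiplication, Sum, and Assignment gives the ordinary
# matrix-vector product.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
v = [10.0, 20.0]
product = gimv(
    entries,
    v,
    combine2=lambda m, vj: m * vj,
    combine_all=sum,
    assign=lambda old, new: new,
)
```

Swapping the three callables changes the algorithm without touching the traversal of the matrix, which is what lets several graph mining algorithms share one parallelization.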

Page 16: GIM-V for Graph Mining (II)

A set of graph mining applications can fit into GIM-V
    PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc.
Parallelization of GIM-V:
    Using Map-Reduce in PEGASUS
        A two-stage algorithm: two consecutive map-reduce jobs
    Using Generalized Reduction in Ex-MATE
        A one-stage algorithm: simpler code

Page 17: GIM-V Example: PageRank

PageRank is used by Google to calculate the relative importance of web pages:
    Direct implementation of GIM-V: v(j) is the ranking value
    The three customized operations are: Multiplication, Sum, and Assignment
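As a rough illustration of PageRank in this form, the sketch below uses the standard damped formulation: combine2 multiplies by a damping factor `c`, combineAll sums the partial results and adds the teleport term `(1 - c) / n`, and assign overwrites the old value. The exact formulas on the original slide are not preserved in this transcript, so the damping factor, the column-normalized matrix, and the function names here are assumptions.

```python
def pagerank_step(entries, v, c=0.85):
    """One PageRank iteration expressed with the three GIM-V operations."""
    n = len(v)
    sums = [0.0] * n
    for i, j, m in entries:
        sums[i] += c * m * v[j]  # combine2 (multiplication) folded into a running sum
    # combineAll adds the teleport term to the sum; assign overwrites v(i).
    return [(1.0 - c) / n + s for s in sums]


def pagerank(edges, n, c=0.85, iterations=50):
    """Iterate pagerank_step on a graph given as (source, destination) edges."""
    out_degree = [0] * n
    for src, _dst in edges:
        out_degree[src] += 1
    # Column-normalized matrix: m(dst, src) = 1 / out_degree(src).
    entries = [(dst, src, 1.0 / out_degree[src]) for src, dst in edges]
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = pagerank_step(entries, v, c)
    return v
```

For example, `pagerank([(0, 2), (1, 2), (2, 0), (2, 1)], 3)` ranks node 2 highest, since both other nodes link to it. The sketch assumes every node has at least one outgoing edge; dangling nodes would need extra handling.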

Page 18: GIM-V: Other Algorithms

Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph
    The three customized operations are: Multiplication, Bitwise-or, and Bitwise-or
Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs
    The three customized operations are: Multiplication, Minimum, and Minimum
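The connected-components case can be sketched in a few lines. This is a simplified sequential illustration with hypothetical function names, assuming an undirected graph stored as a 0/1 adjacency matrix: combine2 multiplies m(i, j) = 1 by the neighbor's component id, combineAll takes the minimum over the neighbors, and assign keeps the minimum of the old and new ids. Iterating until nothing changes labels every node with the smallest node id in its component.

```python
def hcc_step(entries, v):
    """One HCC iteration: propagate the smallest component id to neighbors."""
    n = len(v)
    smallest = [None] * n
    for i, j in entries:
        x = 1 * v[j]  # combine2: m(i, j) is 1 for every stored edge
        if smallest[i] is None or x < smallest[i]:
            smallest[i] = x  # combineAll: running minimum for node i
    # assign: keep the minimum of the old id and the new candidate.
    return [
        v[i] if smallest[i] is None else min(v[i], smallest[i])
        for i in range(n)
    ]


def connected_components(edges, n):
    """Label each node with the smallest node id in its component."""
    # Store both directions so the adjacency matrix is symmetric.
    entries = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    v = list(range(n))  # every node starts in its own component
    while True:
        new_v = hcc_step(entries, v)
        if new_v == v:
            return v
        v = new_v
```

HADI follows the same pattern with bitwise-or in place of the minimum, applied to per-node bitstrings; that variant is not shown here.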

Page 19: Parallelization of GIM-V (I)

Using Map-Reduce: Stage I
    Map: map M(i, j) and V(j) to reducer j

Page 20: Parallelization of GIM-V (II)

Using Map-Reduce: Stage I (cont.)
    Reduce: map "combine2(M(i, j), V(j))" to reducer i

Page 21: Parallelization of GIM-V (III)

Using Map-Reduce: Stage II
    Map:

Page 22: Parallelization of GIM-V (IV)

Using Map-Reduce: Stage II (cont.)
    Reduce:
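The two-stage Map-Reduce flow can be simulated sequentially as below. The pseudocode shown on the original slides is not preserved in this transcript, so this is a reconstruction under assumptions: Stage I keys matrix and vector elements by column j and emits combine2 results keyed by row i, Stage II uses an identity map and a reduce that applies combineAll and assign. The helper `group_by_key` and the tuple tags are hypothetical.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for the shuffle: collect values under their keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def gimv_two_stage(entries, v, combine2, combine_all, assign):
    """Simulate one GIM-V step as two consecutive map-reduce jobs."""
    # Stage I map: send M(i, j) and V(j) to reducer j.
    stage1_pairs = [(j, ("m", i, m)) for i, j, m in entries]
    stage1_pairs += [(j, ("v", vj)) for j, vj in enumerate(v)]
    # Stage I reduce: emit combine2(M(i, j), V(j)) keyed by row i,
    # and pass the old vector value through for the assign step.
    stage2_pairs = []
    for j, values in group_by_key(stage1_pairs).items():
        vj = next(val[1] for val in values if val[0] == "v")
        stage2_pairs.append((j, ("self", vj)))
        for val in values:
            if val[0] == "m":
                _tag, i, m = val
                stage2_pairs.append((i, ("partial", combine2(m, vj))))
    # Stage II map is the identity; Stage II reduce applies combineAll and assign.
    new_v = list(v)
    for i, values in group_by_key(stage2_pairs).items():
        old = next(val[1] for val in values if val[0] == "self")
        partials = [val[1] for val in values if val[0] == "partial"]
        new_v[i] = assign(old, combine_all(partials))
    return new_v
```

Note that every combine2 result is materialized as an intermediate pair and shuffled a second time before it can be combined, which is the cost the one-stage formulation avoids.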

Page 23: Parallelization of GIM-V (V)

Using Generalized Reduction in Ex-MATE:
    Reduction:

Page 24: Parallelization of GIM-V (VI)

Using Generalized Reduction in Ex-MATE:
    Finalize:
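A one-stage version with a reduction object might look like the following sketch. The Reduction and Finalize pseudocode from the slides is not preserved here, so the details are assumptions: combineAll is expressed as a binary, commutative and associative `accumulate` with an `identity` value, workers are simulated by striding over the input, and none of the names correspond to the actual Ex-MATE API.

```python
def gimv_one_stage(entries, v, combine2, accumulate, identity, assign,
                   num_workers=2):
    """Simulate one GIM-V step with generalized reduction."""
    n = len(v)
    # Reduction: each worker folds its share of matrix elements directly
    # into a private reduction object, one slot per output element.
    local_objects = []
    for worker in range(num_workers):
        robj = [identity] * n
        for i, j, m in entries[worker::num_workers]:
            robj[i] = accumulate(robj[i], combine2(m, v[j]))
        local_objects.append(robj)
    # Combination: merge the per-worker reduction objects.
    merged = [identity] * n
    for robj in local_objects:
        for i in range(n):
            merged[i] = accumulate(merged[i], robj[i])
    # Finalize: apply assign to produce the new vector.
    return [assign(v[i], merged[i]) for i in range(n)]
```

Compared with the two-stage flow, no intermediate pairs are emitted or shuffled: each combine2 result is absorbed into the reduction object as soon as it is computed, and a single pass over the matrix suffices.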


Page 26: Experiments Design

Applications: three graph mining algorithms
    PageRank, Diameter Estimation, and Finding Connected Components
Evaluation:
    Performance comparison with PEGASUS
        PEGASUS provides a naïve version and an optimized version
    Speedups with an increasing number of nodes
    Scalability: speedups with increasing dataset sizes
Experimental platform:
    A cluster of multi-core CPU machines
    Used up to 128 cores (16 nodes)

Page 27: Results: Graph Mining (I)

PageRank: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "10.0 speedup"]

Page 28: Results: Graph Mining (II)

HADI: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "11.0 speedup"]

Page 29: Results: Graph Mining (III)

HCC: 16GB dataset; a graph of 256 million nodes and 1 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "9.0 speedup"]

Page 30: Scalability: Graph Mining (IV)

HCC: 8GB dataset; a graph of 256 million nodes and 0.5 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "1.7 speedup" and "1.9 speedup"]

Page 31: Scalability: Graph Mining (V)

HCC: 32GB dataset; a graph of 256 million nodes and 2 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "1.9 speedup" and "2.7 speedup"]

Page 32: Scalability: Graph Mining (VI)

HCC: 64GB dataset; a graph of 256 million nodes and 4 billion edges

[Figure: Avg. Time Per Iteration (min) vs. # of nodes; annotated "1.9 speedup" and "2.8 speedup"]

Page 33: Observations

Performance trends are similar for all three applications
    Consistent with the fact that all three applications are implemented using the GIM-V method
Ex-MATE outperforms PEGASUS significantly for all three graph mining algorithms
Reasonable speedups for different datasets
Better scalability for larger datasets with an increasing number of nodes


Page 35: Related Work: Academia

Evaluation of Map-Reduce-like models in various parallel programming environments:
    Phoenix-rebirth for large-scale multi-core machines
    Mars for a single GPU
    MITHRA for GPGPUs in heterogeneous platforms
    Recent IDAV for GPU clusters
Improvement of Map-Reduce API:
    Integrating pre-fetch and pre-shuffling into Hadoop
    Supporting online queries
    Enforcing a less restrictive synchronization semantics between Map and Reduce

Page 36: Related Work: Industry

Google’s Pregel System:
    Map-Reduce may not be so suitable for graph operations
    Proposed to target graph processing
    Open source version: HAMA project in Apache
Variants of Map-Reduce:
    Dryad/DryadLINQ from Microsoft
    Sawzall from Google
    Pig/Map-Reduce-Merge from Yahoo!
    Hive from Facebook


Page 38: Conclusion

Ex-MATE supports the management of reduction objects of arbitrary sizes
    Deals with disk-resident reduction objects
Outperforms PEGASUS for both the naïve and optimized implementations for all three graph mining applications
    Has simpler code
Offers a promising alternative for developing efficient data-intensive applications
    Uses GIM-V for parallelizing graph mining

Page 39: Thank You, and Acknowledgments

Questions and comments
    Wei Jiang - [email protected]
    Gagan Agrawal - [email protected]

This project was supported by: