Design Pattern for Scientific Applications in DryadLINQ CTP

DataCloud-SC11. Hui Li, Yang Ruan, Yuduo Zhou, Judy Qiu, Geoffrey Fox


Page 1: Design Pattern for Scientific Applications in DryadLINQ CTP

SALSA

Design Pattern for Scientific Applications in DryadLINQ CTP

DataCloud-SC11

Hui Li, Yang Ruan, Yuduo Zhou, Judy Qiu, Geoffrey Fox

Page 2: Design Pattern for Scientific Applications in DryadLINQ CTP

Motivation

• One of the latest releases of DryadLINQ was published in December 2010

• Investigate the usability and performance of using the DryadLINQ language to develop data-intensive applications

• Generalize programming patterns for scientific applications in DryadLINQ

• The programming model is important for eScience infrastructure; one reason the cloud is favored over the grid is that the cloud has a practical programming model

Page 3: Design Pattern for Scientific Applications in DryadLINQ CTP


Big Data Challenge

Mega 10^6

Giga 10^9

Tera 10^12

Peta 10^15

Page 4: Design Pattern for Scientific Applications in DryadLINQ CTP

MapReduce Processing Model

[Figure: MapReduce processing model. Labels: Scheduling, Fault tolerance, Workload balance, Communication]

Page 5: Design Pattern for Scientific Applications in DryadLINQ CTP

Disadvantages in MapReduce

• The rigid, flat data processing paradigm makes it difficult to express the semantics of relational operations such as Join.
• Pig or Hive can solve the above issue to some extent but have several limitations:
  – Relational operations are converted into a set of MapReduce tasks for execution
  – For some equi-joins, the entire cross product must be materialized to disk
• Sample: MapReduce PageRank

Page 6: Design Pattern for Scientific Applications in DryadLINQ CTP

Microsoft Dryad Processing Model

[Figure: a Dryad job as a Directed Acyclic Graph (DAG). Labels: Inputs, Processing vertices, Channels (file, pipe, shared memory), Outputs]
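To make the DAG model concrete, here is a minimal sketch in plain Python (not Dryad code). It treats each processing vertex as a function and each channel as an edge, and runs a vertex only after all of its upstream vertices have finished. The names `run_dag`, `vertices`, `edges`, and `inputs` are illustrative assumptions, and everything runs in one process rather than across a cluster.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(vertices, edges, inputs):
    """Run every vertex after all of its upstream vertices.

    vertices: name -> function taking a list of upstream outputs
    edges:    (src, dst) pairs, standing in for channels
    inputs:   name -> initial data for vertices with no upstream vertex
    """
    upstream = {name: [] for name in vertices}
    for src, dst in edges:
        upstream[dst].append(src)

    results = {}
    # static_order() yields a vertex only after all of its predecessors.
    for name in TopologicalSorter(upstream).static_order():
        if upstream[name]:
            args = [results[src] for src in upstream[name]]
        else:
            args = [inputs[name]]
        results[name] = vertices[name](args)
    return results


# Hypothetical four-vertex job: two readers, one merge, one total.
vertices = {
    "read_a": lambda args: args[0],
    "read_b": lambda args: args[0],
    "merge": lambda args: sorted(args[0] + args[1]),
    "total": lambda args: sum(args[0]),
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "total")]
results = run_dag(vertices, edges, {"read_a": [3, 1], "read_b": [2]})
```

Because the graph is acyclic, a topological order always exists, which is what lets a scheduler decide when each vertex is ready to run.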

Page 7: Design Pattern for Scientific Applications in DryadLINQ CTP

Microsoft DryadLINQ Programming Model

• Higher-level programming interface for Dryad
• Based on the LINQ model
• Unified data model
• Integrated into .NET languages
• Strongly typed .NET objects

Common LINQ providers

Provider          Base class
LINQ-to-objects   IEnumerable<T>
PLINQ             ParallelQuery<T>
DryadLINQ         DistributedQuery<T>

<T>: strongly typed .NET objects

Page 8: Design Pattern for Scientific Applications in DryadLINQ CTP

Pleasingly Parallel Programming Patterns

[Figure: pleasingly parallel pattern. Labels: Application Program, Legacy Code, DryadLINQ, Query, Subquery, User defined function]

Sample applications:
1. SAT problem
2. Parameter sweep
3. Blast, SW-G bio

Issue: resources are scheduled at the granularity of a node rather than a core
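The pattern can be sketched in plain Python (not DryadLINQ): the input is split into partitions and the same user-defined function is applied to each partition independently, with no communication between partitions. The helper names below are hypothetical, and a thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, num_partitions):
    """Deal items round-robin into num_partitions lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def run_pleasingly_parallel(partitions, user_function, max_workers=4):
    """Apply user_function to every partition; no partition depends on another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input partitions.
        return list(pool.map(user_function, partitions))


def square_all(partition):
    """Toy user-defined function standing in for a parameter-sweep task."""
    return [x * x for x in partition]


partitions = split_into_partitions(list(range(6)), 3)
outputs = run_pleasingly_parallel(partitions, square_all)
```

Since the partitions never exchange data, the only tuning knob in this sketch is how many partitions to create, which is the task granularity question the later SW-G slides return to.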

Page 9: Design Pattern for Scientific Applications in DryadLINQ CTP

Hybrid Parallel Programming Pattern

[Figure: hybrid parallel pattern. Labels: Query, DryadLINQ, subquery, PLINQ, User defined function, TPL]

Sample applications:
1. Matrix Multiplication
2. GTM and MDS

Addresses the node-granularity scheduling issue of the pleasingly parallel pattern by using PLINQ, TPL, and Thread Pool technologies

Page 10: Design Pattern for Scientific Applications in DryadLINQ CTP

Distributed Grouped Aggregation Pattern

Three aggregation approaches:
1. Naïve aggregation
2. Hierarchical aggregation
3. Aggregation tree

Sample applications:
1. Word count
2. PageRank
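As a rough illustration of how the three approaches differ, the sketch below runs word count over in-memory partitions in plain Python. It is one interpretation of the approach names, not DryadLINQ's implementation: the naïve version ships every record before summing, the hierarchical version sums inside each partition first, and the tree version merges partial sums pairwise. The "shipped" counts are a hypothetical proxy for network traffic.

```python
from collections import Counter


def naive_aggregation(partitions):
    """Ship every (word, 1) record to one place, then group and sum."""
    records = [(word, 1) for part in partitions for word in part]
    totals = Counter()
    for word, one in records:
        totals[word] += one
    return dict(totals), len(records)


def hierarchical_aggregation(partitions):
    """Sum inside each partition first, then merge the partial sums."""
    partials = [Counter(part) for part in partitions]
    shipped = sum(len(p) for p in partials)  # one record per distinct word
    totals = Counter()
    for p in partials:
        totals.update(p)
    return dict(totals), shipped


def tree_aggregation(partitions):
    """Merge partial sums pairwise, level by level, until one remains."""
    level = [Counter(part) for part in partitions]
    while len(level) > 1:
        level = [sum(level[i:i + 2], Counter()) for i in range(0, len(level), 2)]
    return dict(level[0]) if level else {}


partitions = [["a", "b", "a"], ["b", "b", "c"], ["a"]]
naive_totals, naive_shipped = naive_aggregation(partitions)
hier_totals, hier_shipped = hierarchical_aggregation(partitions)
tree_totals = tree_aggregation(partitions)
```

All three return the same totals; in this toy input the naïve version moves 7 records while the hierarchical version moves 5, and the gap grows as keys repeat more often within a partition.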

Page 11: Design Pattern for Scientific Applications in DryadLINQ CTP

Implementation and Performance

Hardware configuration

TEMPEST       TEMPEST       TEMPEST-CNXX
CPU           Intel E7450   Intel E7450
Cores         24            24
Memory        24.0 GB       50.0 GB
Memory/Core   1 GB          2 GB

STORM         STORM-CN01, CN02, CN03   STORM-CN04, CN05   STORM-CN06, CN07
CPU           AMD 2356                 AMD 8356           Intel E7450
Cores         8                        16                 24
Memory        16 GB                    16 GB              48 GB
Memory/Core   2 GB                     1 GB               2 GB

Software configuration:
1. DryadLINQ CTP, the version released in December 2010
2. Windows HPC Server 2008 R2 SP2
3. .NET 4.0, Visual Studio 2010

Page 12: Design Pattern for Scientific Applications in DryadLINQ CTP

Pairwise sequence comparison using Smith Waterman Gotoh

// Split the input gene file into blocks
var SWG_Blocks = create_SWG_Blocks(AluGeneFile, numOfBlocks, BlockSize);
// Distribute the blocks across the cluster
var inputs = SWG_Blocks.AsDistributedFromPartitions();
// Run the SW-G program on each distributed block
var outputs = inputs.Select(distributedObject => SWG_Program(distributedObject));

1. Pleasingly parallel application
2. Easy to program with DryadLINQ
3. Easy to tune task granularity with the DryadLINQ API
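The slide does not show what `create_SWG_Blocks` does, so the following plain-Python sketch is only a guess at the general idea: cut the N x N pairwise comparison matrix into square blocks so that each block can become one independent task. The function names are hypothetical, and `shared_prefix_length` is a toy stand-in for an alignment score, not Smith-Waterman-Gotoh.

```python
def create_blocks(num_sequences, block_size):
    """Split the N x N comparison matrix into blocks of index ranges."""
    starts = range(0, num_sequences, block_size)
    ranges = [(s, min(s + block_size, num_sequences)) for s in starts]
    return [(rows, cols) for rows in ranges for cols in ranges]


def compare_block(block, sequences, score):
    """Score every pair inside one block; one call stands in for one task."""
    (r0, r1), (c0, c1) = block
    return {(i, j): score(sequences[i], sequences[j])
            for i in range(r0, r1) for j in range(c0, c1)}


def shared_prefix_length(a, b):
    """Toy score: length of the common prefix (NOT an SW-G alignment)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


sequences = ["ACGT", "ACGA", "TTGA"]
blocks = create_blocks(len(sequences), block_size=2)
scores = {}
for block in blocks:  # each iteration is independent of the others
    scores.update(compare_block(block, sequences, shared_prefix_length))
```

In this sketch `block_size` plays the role of the granularity knob: smaller blocks give more, shorter tasks.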

Page 13: Design Pattern for Scientific Applications in DryadLINQ CTP

Workload balance in SW-G

[Figure: execution time (seconds) versus number of partitions (31, 62, 124, 186, 248). Series: Std. Dev. = 50, Std. Dev. = 100, Std. Dev. = 250]

1. SWG tasks are heterogeneous in CPU time.

2. Coarse task granularity causes workload imbalance.

3. Finer task granularity incurs more scheduling overhead.

Simple solution: tune the task granularity with the DryadLINQ API.

Related API:
AsDistributedFromPartitions()
RangePartition<Type t>()
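The trade-off between points 2 and 3 can be illustrated with a small simulation in plain Python. It is a toy model under stated assumptions, not a measurement: tasks are grouped into contiguous partitions, each idle worker takes the next partition, and every partition costs a fixed scheduling overhead. The task times and the overhead value are made up for illustration.

```python
import heapq


def makespan(task_times, num_partitions, num_workers, overhead_per_partition):
    """Simulated wall-clock time when tasks are grouped into partitions.

    Tasks are cut into contiguous partitions of equal size (the last one
    may be shorter). Each idle worker takes the next partition in order
    and pays a fixed scheduling overhead for it.
    """
    size = -(-len(task_times) // num_partitions)  # ceiling division
    partitions = [task_times[i:i + size]
                  for i in range(0, len(task_times), size)]
    workers = [0.0] * num_workers  # time at which each worker becomes free
    heapq.heapify(workers)
    for part in partitions:
        free_at = heapq.heappop(workers)
        heapq.heappush(workers, free_at + overhead_per_partition + sum(part))
    return max(workers)


# Hypothetical heterogeneous task times: two long tasks, six short ones.
task_times = [9, 8, 1, 1, 1, 1, 1, 2]
times = {parts: makespan(task_times, parts, num_workers=2,
                         overhead_per_partition=2)
         for parts in (2, 4, 8)}
```

With these numbers the simulated time is 21 for 2 partitions, 19 for 4, and 21 for 8: the coarsest split suffers from imbalance and the finest from overhead, so an intermediate granularity wins.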

Page 14: Design Pattern for Scientific Applications in DryadLINQ CTP

Square dense matrix multiplication

Parallel MM algorithms:
1. Row partition
2. Row/Column partition
3. Fox algorithm

Multi-core parallel technologies:
1. PLINQ
2. TPL
3. Thread Pool

Pseudo code of Fox algorithm:
Partition matrices A and B into blocks
For each iteration i:
  1) broadcast matrix A block (k, j) to row k
  2) compute matrix C blocks, and add the partial results to the previous result of matrix C block
  3) roll up matrix B blocks by column
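The pseudo code above can be sketched as a sequential plain-Python simulation on a q x q grid of blocks. This is the textbook form of the Fox algorithm, written as an illustration under assumptions (square matrices whose size is divisible by q, hypothetical function names); it is not the authors' DryadLINQ implementation, and the "broadcast" and "roll up" steps are simulated with list indexing.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_blocks(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    rows = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            rows.append([x for blk in block_row for x in blk[r]])
    return rows


def fox_multiply(a, b, q):
    """Multiply a and b with the Fox algorithm on a q x q block grid."""
    s = len(a) // q
    A, B = split_blocks(a, q), split_blocks(b, q)
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for stage in range(q):
        for i in range(q):
            k = (i + stage) % q          # 1) A block (i, k) is "broadcast" to row i
            for j in range(q):
                partial = matmul(A[i][k], B[i][j])
                for r in range(s):       # 2) accumulate into C block (i, j)
                    for c in range(s):
                        C[i][j][r][c] += partial[r][c]
        B = B[1:] + B[:1]                # 3) roll B blocks up by one row
    return join_blocks(C)


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
product = fox_multiply(a, b, q=2)
```

After `stage` rolls, grid position (i, j) holds the original B block ((i + stage) mod q, j), so each stage pairs A block (i, k) with the matching B block (k, j), and over q stages every k is covered exactly once.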

Page 15: Design Pattern for Scientific Applications in DryadLINQ CTP

Matrix multiplication performance results
1 core on 16 nodes vs. 24 cores on 16 nodes

[Figures: four plots against matrix size (2400 to 19200). Two show relative parallel efficiency and two show Mflops, each with series Fox (Fox-Hey), RowPartition, and RowColumnPartition]

Page 16: Design Pattern for Scientific Applications in DryadLINQ CTP

PageRank

Three distributed grouped aggregation approaches:
1. Naïve aggregation
2. Hierarchical aggregation
3. Aggregation tree

PageRank with hierarchical aggregation approach

Foreach iteration {
  1. join edges with ranks
  2. distribute ranks on edges
  3. groupBy edge destination
  4. aggregate into ranks
}
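The four steps of one iteration can be sketched in plain Python as follows. This is a single-machine illustration, not the DryadLINQ query from the talk: the damping factor of 0.85 and the uniform teleport term are assumptions taken from the usual PageRank formulation, the function name is hypothetical, and pages without out-links are not given any special handling.

```python
from collections import defaultdict


def pagerank_iteration(edges, ranks, damping=0.85):
    """One iteration: join, distribute, group by destination, aggregate.

    edges: source page -> list of destination pages
    ranks: page -> current rank (every page must have an entry)
    """
    # 1. join edges with ranks on the source page
    joined = [(src, dests, ranks[src]) for src, dests in edges.items()]
    # 2. distribute each source's rank evenly over its out-edges
    contributions = [(dst, rank / len(dests))
                     for src, dests, rank in joined for dst in dests]
    # 3. group contributions by edge destination
    grouped = defaultdict(list)
    for dst, share in contributions:
        grouped[dst].append(share)
    # 4. aggregate each group into the new rank (damping is an assumption)
    n = len(ranks)
    return {page: (1 - damping) / n + damping * sum(grouped[page])
            for page in ranks}


edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1 / 3 for page in edges}
new_ranks = pagerank_iteration(edges, ranks)
```

Steps 3 and 4 together form the grouped aggregation, which is where the choice among the three aggregation approaches applies.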

[1] PageRank algorithm, http://en.wikipedia.org/wiki/PageRank
[2] ClueWeb09 data set, http://boston.lti.cs.cmu.edu/Data/clueweb09/

Page 17: Design Pattern for Scientific Applications in DryadLINQ CTP

PageRank performance results
Naïve aggregation vs. Aggregation Tree

[Figure: seconds per iteration versus number of AM files (320 to 1280). Series: Aggregation Tree, Hash Partition, Hierarchical Aggregation]

[Figure: seconds per iteration versus number of output tuples (100000 to 1000000). Series: Hash Partition, Aggregation Tree]

Page 18: Design Pattern for Scientific Applications in DryadLINQ CTP

Conclusion

• We investigated the usability and performance of using DryadLINQ to develop three applications: SW-G, Matrix Multiply, and PageRank. We abstracted three design patterns: the pleasingly parallel programming pattern, the hybrid parallel programming pattern, and the distributed aggregation pattern.
• Usability
  – It is easy to tune task granularity with the DryadLINQ data model and interfaces.
  – The unified data model and interface enable developers to easily utilize the parallelism of single-core, multi-core, and multi-node environments.
  – DryadLINQ provides optimized distributed aggregation approaches, and the choice of approach has a material effect on performance.
• Performance
  – The parallel MM algorithms scale up for large matrix sizes.
  – Distributed aggregation optimization can speed up the program by 30% compared with the naïve approach.

Page 19: Design Pattern for Scientific Applications in DryadLINQ CTP


Question?

• Thank you!

Page 20: Design Pattern for Scientific Applications in DryadLINQ CTP

Outline

• Introduction
  – Big Data Challenge
  – MapReduce
  – DryadLINQ CTP
• Programming Model Using DryadLINQ
  – Pleasingly parallel
  – Hybrid
  – Distributed Grouped Aggregation
• Implementation
• Discussion and Conclusion

Page 21: Design Pattern for Scientific Applications in DryadLINQ CTP

Workflow of Dryad job

[Figure: workflow of a Dryad job on a Windows HPC Server 2008 R2 cluster. Labels: Workstation computer, DryadLINQ Provider, DSC Client Service, HPC Client Utilities, Head node, HPC Job Scheduler Service, DSC Service, DSC, Dryad graph manager, Compute node (Vertex 1, Vertex 2, ..., Vertex n, each with Data)]