the dryadlinq approach to distributed data-parallel computing yuan yu microsoft research silicon...

The DryadLINQ Approach to Distributed Data-Parallel Computing

Yuan Yu

Microsoft Research Silicon Valley

Distributed Data-Parallel Computing

• Dryad talk: the execution layer– How to reliably and efficiently execute distributed

data-parallel programs on a compute cluster?

• This talk: the programming model– How to write distributed data-parallel programs for

a compute cluster?

The Programming Model• Sequential, single machine programming abstraction• Same program runs on single-core, multi-core, or cluster• Preserve the existing programming environments

– Modern programming languages (C# and Java) are very good• Expressive language and data model• Strong static typing, GC, generics, …

– Modern IDEs (Visual Studio and Eclipse) are very good• Great debugging and library support

• Legacy code could be easily reused

Dryad and DryadLINQ

DryadLINQ provides automatic query plan generationDryad provides automatic distributed execution

Outline

• Programming model• DryadLINQ• Applications• Discussions and conclusion

LINQ• Microsoft’s Language INtegrated Query

– Available in .NET3.5 and Visual Studio 2008• A set of operators to manipulate datasets in .NET

– Support traditional relational operators• Select, Join, GroupBy, Aggregate, etc.

– Integrated into .NET programming languages• Programs can invoke operators• Operators can invoke arbitrary .NET functions

• Data model– Data elements are strongly typed .NET objects– Much more expressive than relational tables

• For example, nested data structures

LINQ Framework

PLINQ

Local machine

.Netprogram(C#, VB, F#, etc)

Execution engines

Query

Objects

LINQ-to-SQL

DryadLINQ

LINQ-to-Obj

LIN

Q p

rovi

der i

nter

face

Scalability

Single-core

Multi-core

Cluster

A Simple LINQ Query

IEnumerable<BabyInfo> babies = ...;

var results = from baby in babies where baby.Name == queryName && baby.State == queryState && baby.Year >= yearStart && baby.Year <= yearEnd orderby baby.Year ascending select baby;

A Simple PLINQ Query

IEnumerable<BabyInfo> babies = ...;

var results = from baby in babies.AsParallel() where baby.Name == queryName && baby.State == queryState && baby.Year >= yearStart && baby.Year <= yearEnd orderby baby.Year ascending select baby;

A Simple DryadLINQ Query

PartitionedTable<BabyInfo> babies = PartitionedTable.Get<BabyInfo>(“BabyInfo.pt”);

var results = from baby in babies where baby.Name == queryName && baby.State == queryState && baby.Year >= yearStart && baby.Year <= yearEnd orderby baby.Year ascending select baby;

DryadLINQ Data Model

Partition

Partitioned Table

.Net objects

Partitioned table exposes metadata information– type, partition, compression scheme, serialization, etc.

Demo

• It is just programming– The same familiar programming languages,

development tools, libraries, etc.

K-means Execution Graph

C0 ac

P1

P2

P3

acac

cc C1

acac

ac

cc C2

acac

ac

cc C3

K-means in DryadLINQ

public static Vector NearestCenter(Vector v, IEnumerable<Vector> centers) { return centers.Aggregate((r, c) => (r - v).Norm2() < (c - v).Norm2() ? r : c);}

public static IQueryable<Vector> Step(IQueryable<Vector> vectors, IQueryable<Vector> centers) { return vectors.GroupBy(v => NearestCenter(v, centers)) .Select(group => group.Aggregate((x,y) => x + y) / group.Count());}

var vectors = PartitionedTable.Get<Vector>("dfs://vectors.pt");var centers = vectors.Take(100);for (int i = 0; i < 10; i++) { centers = Step(vectors, centers);}centers.ToPartitionedTable<Vector>(“dfs://centers.pt”);

public class Vector { public double[] entries;

[Associative] public static Vector operator +(Vector v1, Vector v2) { … }

public static Vector operator -(Vector v1, Vector v2) { … }

public double Norm2() { …}

}

PageRank Execution Graph

N0

1ae

E1

E2

E3

aeae

cc

N0

2

N0

3

cc

cc

DD

D

N1

1 N1

2 N1

3

aeae

ae

cc

cc

cc

DD

D

N2

1 N2

2 N2

3

aeae

ae

cc

cc

cc

DD

D

N3

1 N3

2 N3

3

PageRank in DryadLINQ

var pages = PartitionedTable.Get<Page>(“dfs://pages.pt”); var ranks = pages.Select(page => new Rank(page.name, 1.0)); // repeat the iterative computation several times for (int iter = 0; iter < n; iter++) { ranks = Step(pages, ranks); }

ranks.ToPartitionedTable<Rank>(“dfs://ranks.pt”);

public struct Page { public UInt64 name; public Int64 degree; public UInt64[] links;

public Page(UInt64 n, Int64 d, UInt64[] l) { name = n; degree = d; links = l; }

public Rank[] Disperse(Rank rank) { Rank[] ranks = new Rank[links.Length]; double score = rank.rank / this.degree; for (int i = 0; i < ranks.Length; i++) { ranks[i] = new Rank(this.links[i], score); } return ranks; } }

public struct Rank { public UInt64 name; public double rank;

public Rank(UInt64 n, double r) { name = n; rank = r; } }

public static IQueryable<Rank> Step(IQueryable<Page> pages, IQueryable<Rank> ranks) { // join pages with ranks, and disperse updates var updates = from page in pages join rank in ranks on page.name equals rank.name select page.Disperse(rank);

// re-accumulate. return from list in updates from rank in list group rank.rank by rank.name into g select new Rank(g.Key, g.Sum());}

MapReduce in DryadLINQ

MapReduce(source, // sequence of Ts mapper, // T -> Ms keySelector, // M -> K reducer) // (K, Ms) -> Rs{ var map = source.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.SelectMany(reducer); return result; // sequence of Rs}// Ex: Count the frequencies of words in a bookMapReduce(book, line => line.Split(' '), w => w.ToLower(), g => g.Count())

Dryad

DryadLINQ System Architecture

DryadLINQ

Client machine

(11)

Distributedquery plan

.NET program

Query Expr

Cluster

Output TablesResults

Input TablesInvoke

Query plan

Output Table

Dryad Execution

.Net Objects

LINQ query

foreach

Vertexcode

DryadLINQ

• Distributed execution plan generation– Static optimizations: pipelining, eager aggregation, etc.– Dynamic optimizations: data-dependent partitioning,

dynamic aggregation, etc.

• Vertex runtime– Single machine (multi-core) implementation of LINQ– Vertex code that runs on vertices– Data serialization code– Callback code for runtime dynamic optimizations– Automatically distributed to cluster machines

A Simple Example

• Count the frequencies of words in a book

var map = book.SelectMany(line => line.Split(' '));var group = map.GroupBy(w => w.ToLower());var result = group.Select(g => g.Count());

Naïve Execution Plan

M

D

MG

G

R

X

M

D

MG

G

R

X

M

D

Map

Distribute

Merge

GroupBy

Reduce

Consumer

map

redu

ce

MG

G

R

X

.....

…g.Count()

line.Split(' ')

Execution Plan Using Partial AggregationM

G1

D

M

G1

D

M

G1

D

MG

G2

C

MG

G2

C

GroupBy

InitialReduce

Merge

GroupBy

Combine

Merge

GroupBy

FinalReduce

Consumer

map

aggr

egati

on tr

eere

duce

IR IRIR

MG

G3

F

X

MG

G3

F

X

Map

Distribute

g.Count()

g.Sum()

g.Sum()

Challenge: Program Analysis Support

• The main sources of difficulty– Complicated data model– User-defined functions all over the places

• Requires sophisticated static program analysis at byte-code level– Mainly some form of flow analysis

• Possible with modern programming languages and runtimes, such as C#/CLR

Inferring Dataset Property

• Useful for query optimizations

Select(x => new { A=x.a+x.b, B=f(x.c) }

Hash(y => y.a+y.b)

Hash(z => z.A)

GroupBy(x => x.A)

No data partitioning at GroupBy

Inferring Dataset Property

• Useful for query optimizations

Select(x => Foo(x) }

Hash(y => y.a+y.b)

Hash(x => x.A)?

GroupBy(x => x.A)

Need IL-level data-flow analysis of Foo

Caching Query Results

• A cluster-wide caching service to support– Reuse of common subqueries– Incremental computations

• Cache: [Key Value]– Key is <q, d>, Value is the result of q(d)– Key requires a pretty good reachability analysis

• For correctness, Key must include “everything” reachable from q

• For performance, Key should only contain things q depends on

More Static Analysis • Purity checking

– All the functions called in DryadLINQ queries must be side-effect free

– DryadLINQ (and PLINQ) doesn’t enforce it• Metadata validity checking

– Partitioned table’s metadata contains the record type, partition scheme, serialization functions, …

– Need to determine if the metadata is valid– DryadLINQ doesn’t fully enforce it

• Static enforcement of program properties for security and privacy mechanisms

Examples of DryadLINQ Applications• Data mining

– Analysis of service logs for network security– Analysis of Windows Watson/SQM data– Cluster monitoring and performance analysis

• Graph analysis– Accelerated Page-Rank computation– Road network shortest-path preprocessing

• Image processing– Image indexing– Decision tree training– Epitome computation

• Simulation– light flow simulations for next-generation display research– Monte-Carlo simulations for mobile data

• eScience– Machine learning platform for health solutions– Astrophysics simulation

Decision Tree TrainingMihai Budiu, Jamie Shotton et al

Machinelearning

1M images x 10,000 pixels x 2,000 features x 221 tree nodes

Complexity >1020 objects

Decision Tree

label image

Learn a decision tree to classify pixels in a large set of images

30

Read, preprocessRedistribute

HistogramRegroup histograms on nodeCompute new tree layer

Compute new tree

Broadcast new tree

Initial empty tree

Final tree

Image inputs, partitioned

Sample Execution plan

Application Details

• Workflow = 37 DryadLINQ jobs• 12 hours running time on 235 machines• More than 100,000 processes• More than 100 days of CPU time• Recovers from several failures daily• 34,000 lines of .NET code

Client

Front EndCluster S

QM

Ser

vice

Datapoints collected on

client and Uploaded.

IIS Servers Check validity &

concatenate

File movers move incoming

data into inexpensive compressed

storage

Distributed Excution Engine

queries data based on user initiated ad-hoc queries or well defined queries for

reporting

Rich reports created with

customization capability

Dat

aFl

ow

DataMarts

Compressed Storage

SQM Client

Front EndReporting Portal

Extract data for storage in

custom schema with different

retention policy

V V V

Distributed Execution

Engine

DataMarts stores data for

reporting purposes

Custom Schema/Storage

Data validation and analysis

toolsets which query raw data

directly

Windows SQM Data Analysis Michal Strehovsky, Sivarudrappa Mahesh et al

The Language Integration Approach

• Single unified programming environment– Unified data model and programming language– Direct access to IDE and libraries– Different from SQL, HIVE, Pig Latin

• Multiple layers of languages and data models

• Works out very well, but requires good programming language supports– LINQ extensibility: custom operators/providers– .NET reflection, dynamic code generation, …

34

Combining with PLINQ

Query

DryadLINQ

PLINQ

subquery

The combination of PLINQ and DryadLINQ delivers computation to every core in the cluster

Acyclic Dataflow Graph

• Acyclic dataflow graph provides a very powerful computation model– Easy target for higher-level programming abstractions

such as DryadLINQ– Easy expression of many data-parallel optimizations

• We designed Dryad to be general and flexible– Programmability is less of a concern– Used primarily to support higher-level programming

abstractions– No major changes made to Dryad in order to support

DryadLINQ

Expectation Maximization (Gaussians)

36

• Generated by DryadLINQ• 3 iterations shown

Decoupling of Dryad and DryadLINQ• Separation of concerns

– Dryad layer concerns scheduling and fault-tolerance – DryadLINQ layer concerns the programming model

and the parallelization of programs– Result: powerful and expressive execution engine and

programming model

• Different from the MapReduce/Hadoop approach– A single abstraction for both programming model and

execution engine– Result: very simple, but very restricted execution

engine and language

38

ImageProcessing

Cosmos DFSSQL Servers

Software Stack

Windows Server

Cluster Services (Azure, HPC, or Cosmos)

Azure DFS

Dryad

DryadLINQ

Windows Server

Windows Server

Windows Server

CIFS/NTFS

MachineLearning

Graph Analysis

DataMining

Applications

…… eScience

Conclusion

• Single unified programming environment– Unified data model and programming language– Direct access to IDE and libraries

• An open and extensible system– Many LINQ providers out there

• Existing ones: LINQ-to-XML, LINQ-to-SQL, PLINQ, …• Very easy to write one for your app domain

– Dryad/DryadLINQ scales out all of them!

Availability

• Freely available for academic use– http://connect.microsoft.com/DryadLINQ– DryadLINQ source, Dryad binaries, documentation,

samples, blog, discussion group, etc.

• Will be available soon for commercial use– Free, but no product support

the dryadlinq approach to distributed data-parallel computing yuan yu microsoft research silicon...

Documents