dryadlinq - distributed computation › ~srsarangi › courses › 2020 › col_819_2020 ›...

35
Motivation DryadLINQ Design Evaluation DryadLINQ Distributed Computation Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Dryad LINQ 1/33

Upload: others

Post on 07-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

DryadLINQDistributed Computation

Smruti R. Sarangi

Department of Computer ScienceIndian Institute of Technology

New Delhi, India

Smruti R. Sarangi Dryad LINQ 1/33

Page 2: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 2/33

Page 3: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 3/33

Page 4: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Dryad-LINQ

Current programming models for large scale distributed pro-gramming

Map-ReduceMPIMicrosoft Dryad

A DryadLINQ program contains LINQ expressions that:Use LINQ expressions to specify side effect free transforma-tions to dataThe Dryad system parallelizes portions of the program, andruns the program on thousands of machines

A sort of a terabyte level data set takes 319 seconds, on a240 node system.

Smruti R. Sarangi Dryad LINQ 4/33

Page 5: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Basic Idea

LINQ ConstructsLINQ (Language INtegrated Query) is a set of .NET con-structsIt provides support for imperative and declarative program-mingProgramming languages supported: C#, F#, VB

Imperative programming: variables, loops, iterators, condi-tionalsDeclarative programming: functors, type inferencing

Smruti R. Sarangi Dryad LINQ 5/33

Page 6: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 6/33

Page 7: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Related Work

Parallel DatabasesImplement only declarative variants of SQL QueriesThe query oriented nature of SQL makes it hard to specifytypical programming constructs

Map ReduceNot very flexible.Hard to perform operations such as sorting or databasejoinsLack of type support

Smruti R. Sarangi Dryad LINQ 7/33

Page 8: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

Basic IdeaRelated Work

Related Work - II

Domain specific languages on top of MapReduce – Sawzall,Pig (Yahoo), Hive (Facebook)They are a combination of declarative constructs, and iter-ative constructsHowever, they are not very flexible since their pattern is in-herently based on SQLHow is DryadLINQ different ?

The computation is not dependent on the nature of underly-ing resources.Uses virtual execution plansUnderlying computational resources can change dynamically(faults, outages, . . .)

Smruti R. Sarangi Dryad LINQ 8/33

Page 9: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Overview of DryadLINQ

Structure of a Dryad jobIt is a directed acyclic graph (DAG)Each vertex is a programEach edge is a data channel that transmits a finite se-quence of records at runtime

Smruti R. Sarangi Dryad LINQ 9/33

Page 10: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 10/33

Page 11: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Dryad System Architecture

Dryad System ArchitectureIt contains a centralized job manager whose role is:

Instantiating a job’s dataflow graph.Scheduling processesFault toleranceJob monitoring and managementTransforming the job graph at runtime according to theuser’s instructions

Smruti R. Sarangi Dryad LINQ 11/33

Page 12: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

DryadLINQ Execution Overview

1 The user runs a .NET application. It creates a DryadLINQexpression object that has deferred evaluation.

2 The application calls the method ToDryadTable. This methodhands over the expression object to DryadLINQ.

3 DryadLINQ compiles the expression, and makes an execu-tion plan

1 Decomposition into sub-expressions2 Generation of code and data for Dryad nodes3 Generation of serialization and synchronization code.

4 Dryad invokes a custom job manager .

Smruti R. Sarangi Dryad LINQ 12/33

Page 13: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

DryadLINQ Execution Overview - II

1 The job manager creates a job graph. It schedules andspawns the jobs.

2 Each node in the graph executes the program assigned toit.

3 When the program is done, it writes the data to the outputtable.

4 After the job manager terminates, DryadLINQ collates allthe outputs and creates the DryadTable object.

5 Control returns to the user application.1 Dryad passes an iterator object to the table object.2 This can be passed to subsequent statements.

Smruti R. Sarangi Dryad LINQ 13/33

Page 14: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 14/33

Page 15: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

LINQ

The base type is an interface IEnumerable<T> – An iteratorfor a set of objects with type T

The programmer is not aware of the data type associatedwith an instance of IEnumerable

IQueryable<T> is a subtype of IEnumerable<T>This is an unevaluated expressionIt undergoes deferred evaluationDryadLINQ creates a concrete class to implement the IQueryableexpression at runtime

Smruti R. Sarangi Dryad LINQ 15/33

Page 16: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

LINQ SQL Syntax Example

// Join two tables: scoreTriples and staticRankvar adjustedScoreTripes =from d in scoreTriplesjoin r in staticRank on d.docID equals r.keyselect new QueryScoreDocIDTriple(d, r);

var rankedQueries =from s in adjustedScoreTriplesgroup s by s.query into gselect TakeTopQueryResults(g);

Smruti R. Sarangi Dryad LINQ 16/33

Page 17: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

LINQ OOP Syntax Example

var adjustedScoreTriples =scoreTriples.Join(staticRank,d => d.docID, r => r.key,(d, r) => new QueryScoreDocIDTriple(d, r));

var groupedQueries =adjustedScoreTriples.GroupBy(s => s.query);

var rankedQueries = groupedQueries.Select(g => TakeTopQueryResults(g));

Smruti R. Sarangi Dryad LINQ 17/33

Page 18: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

DryadLINQ Constructs

A DryadLINQ collection (defined by IEnumerable) is a dis-tributed dataset. Partitioning strategies.

Hash PartitioningRange PartitioningRound-robin Partitioning

The results of a DryadLINQ computation are representedby the object – DryadTable<T>

Subtypes determine the actual storage interface.Can include additional details such as metadata and schemas.

Smruti R. Sarangi Dryad LINQ 18/33

Page 19: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

DryadLINQ Methods

All the methods need to be side effect freeShared objects can be distributed in any wayThe functions to access a DryadTable are serializable

GetTable<T>ToDryadTable<T>

Custom partitioning operatorsHashPartition<T,K>RangePartition<T,K>

Functional Operatorsapply(f,dataset) Applies function f to all the elements in adatasetfork(f,dataset) Similar to Apply , but can produce multipleoutput datasets.

Dryad annotations – parallelization, storage policiesSmruti R. Sarangi Dryad LINQ 19/33

Page 20: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 20/33

Page 21: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

System Implementation

Execution Plan Graph (EPG)DryadLINQ converts the raw LINQ expressions to the nodesof the EPGThe EPG is a DAGA part of the EPG can also be generated at runtime basedon the values of iterative and conditional expressionsDryadLINQ also needs to respect the metadata (node re-quirements, and parallelization directives) while generatingthe EPGNeeds to support the deferred evaluation of functions

Smruti R. Sarangi Dryad LINQ 21/33

Page 22: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Static Optimizations

Pipelining : One process executes multiple operations in apipelined fashionRedundancy Removal : Remove dead code, and unneces-sary partitioningEager Aggregation : Intelligently reduce data movement byoptimizing aggregation and repartitioningI/O Reduction : Use TCP pipes, and in-memory channelsto reduce persistence to files

Smruti R. Sarangi Dryad LINQ 22/33

Page 23: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Dynamic Optimizations

Optimally Implementing OrderByDeterministically sample the values.Plot a histogram, and compute the appropriate keys forrange partitioningA set of vertices now perform the range partitioning.A node now fetches the inputs, and then sorts them. Thesetwo actions can be pipelined.

Smruti R. Sarangi Dryad LINQ 23/33

Page 24: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Execution Plan of OrderBy

Deterministic Sampling

Deterministic Sampling

Computing the historgram & ranges

Data Fetch Data Fetch

Sort Sort

Smruti R. Sarangi Dryad LINQ 24/33

Page 25: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Code Generation

The EPG is a virtual execution planDryadLINQ dynamically generates code for each EPG node

DryadLINQ generates a .NET assembly snippet that corre-sponds to each LINQ subexpression.It contains the serialization and I/O code for ferrying data.The EPG node code is generated at the computer of theclient ( job submitter ), because it may depend on the lo-cal context. Values in the local context are embedded inthe function/expression. The expression undergoes partialevaluation later.Uses .NET reflection to find the transitive closure of all .NETlibraries. The EPG code, and all the associated libraries areshipped to the cluster computer for remote execution.

Smruti R. Sarangi Dryad LINQ 25/33

Page 26: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 26/33

Page 27: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Interacting with other Frameworks

PLINQRuns a subexpression in a cluster node in parallel usingmulticore processors.Uses user supplied annotations (mostly transparent to theuser).Uses parallel iterators (similar to OpenMP)

SQLDryadLINQ nodes can directly access SQL databases.They can save internal datasets in SQL tables.Can ship some subexpressions to run directly as SQL pro-cedures.

Smruti R. Sarangi Dryad LINQ 27/33

Page 28: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Debugging

Debugging massively parallel applications is very difficult

Debugging

Visual Studio .NET interface to debug the DryadLINQ pro-gram on a single computerDryadLINQ has a deterministic replay model

It is possibly to replay the entire execution – event by eventSecondly, it is possible to replay any subexpression on alocal machine and view the outputsPerformance Debugging

Collect detailed profiling information.

Smruti R. Sarangi Dryad LINQ 28/33

Page 29: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

System ArchitectureLINQExecution Plan GraphMiscellaneous

Debugging

Debugging massively parallel applications is very difficult

Debugging

Visual Studio .NET interface to debug the DryadLINQ pro-gram on a single computerDryadLINQ has a deterministic replay model

It is possibly to replay the entire execution – event by eventSecondly, it is possible to replay any subexpression on alocal machine and view the outputsPerformance Debugging

Collect detailed profiling information.

Smruti R. Sarangi Dryad LINQ 28/33

Page 30: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

Setup

240 computer clusterEach node contains two AMD Opteron nodes16GB of main memoryExperiments

TerasortSort a terabyte size dataset.3.87 GB saved per node.

Smruti R. Sarangi Dryad LINQ 29/33

Page 31: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 30/33

Page 32: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

Results: Terasort

The number of nodes was varied from 1 to 240Each node stored 3.87 GB of data.The execution time was 120s for 1 machine, and quicklyjumped to 250 s.Then it grew very slowly (sub-linearly) to 320s.

Source [1]

Smruti R. Sarangi Dryad LINQ 31/33

Page 33: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

Outline

1 MotivationBasic IdeaRelated Work

2 DryadLINQ DesignSystem ArchitectureLINQExecution Plan GraphMiscellaneous

3 EvaluationTerasortSkyServer

Smruti R. Sarangi Dryad LINQ 32/33

Page 34: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

SkyServer benchmark

Three-way join for two tables containing astronomical data.Size: 11.8 GB and 41.8 GBThe number of machines was varied from 1 to 40The speedup increased from 1 to 19 sub-linearly for DryadLINQThe speedup increased from 1 to 24 sub-linearly for DryadTwo-pass

Source [1]

Smruti R. Sarangi Dryad LINQ 33/33

Page 35: DryadLINQ - Distributed Computation › ~srsarangi › courses › 2020 › col_819_2020 › doc… · Terasort SkyServer Results: Terasort The number ofnodeswas varied from 1 to

MotivationDryadLINQ Design

Evaluation

TerasortSkyServer

DryadLINQ: A System for General-Purpose DistributedData-Parallel Computing Using a High-Level Language byYuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, UlfarErlingsson, Pradeep Kumar Gunda, and Jon Currey, OSDI2008

Smruti R. Sarangi Dryad LINQ 33/33