Post on 31-Dec-2015

1

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly of Microsoft Research, Silicon Valley

Presented by: Thomas Hummel

2

Agenda

Introduction
System Overview
Dryad Graph
Program Development
Program Execution
Experimental Results
Future Work

Introduction

Problem
  How to write efficient distributed programs easily?
Environment
  Parallel Processors
  High Speed Links
  Administered Domain
Ignore Low Level Issues

3

Introduction

Parallel Execution
  Faster Execution
  Automatic Specification
  Manual Specification
    GPU Shader
    Distributed Databases
    MapReduce

4

Introduction

5

Graph Model
  Vertices Are Programs
  Edges Are Communication Links
Forced Parallelism Mindset
  Necessary Abstraction

Introduction

6

GPU Shader
  Low Level
  Hardware Specific
MapReduce
  Simplicity Paramount
  Performance Sacrificed
Database
  Implicit Communication
  Algebra Optimized

Introduction

7

Dryad
  Fine Communication Control
  Multiple Input/Output Sets
  Must Consider Resources
Execution Engine
  Executes DAG Of Programs
  Outputs Directed To Inputs
  No Recursion
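The requirement that a job be a DAG can be checked mechanically. The following is a minimal sketch, not Dryad's own code: `JobGraph` and `is_acyclic` are hypothetical names, vertices are assumed to be numbered from 0, and every edge is assumed to reference a valid vertex. It uses Kahn's algorithm, which repeatedly removes vertices that have no unfinished inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical job description: vertices are numbered 0..vertex_count-1 and
// each edge (from, to) directs the output of one vertex to the input of another.
struct JobGraph {
    std::size_t vertex_count = 0;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

// Returns true if the graph has no cycle (Kahn's algorithm).
bool is_acyclic(const JobGraph& g) {
    std::vector<std::vector<std::size_t>> successors(g.vertex_count);
    std::vector<std::size_t> pending_inputs(g.vertex_count, 0);
    for (const auto& e : g.edges) {
        successors[e.first].push_back(e.second);
        ++pending_inputs[e.second];
    }

    // Start with every vertex that has no inputs at all.
    std::queue<std::size_t> ready;
    for (std::size_t v = 0; v < g.vertex_count; ++v)
        if (pending_inputs[v] == 0) ready.push(v);

    // Remove ready vertices one by one; removing a vertex may free its successors.
    std::size_t removed = 0;
    while (!ready.empty()) {
        std::size_t v = ready.front();
        ready.pop();
        ++removed;
        for (std::size_t next : successors[v])
            if (--pending_inputs[next] == 0) ready.push(next);
    }

    // Vertices on a cycle never become ready, so they are never removed.
    return removed == g.vertex_count;
}
```

A graph that passes this check always has at least one valid execution order, which is why a DAG-only engine needs no special handling for circular waits.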

System Overview

8

Dryad Job
  DAG
  Data Passed On Edges
  Vertex Is a Program
Message Structure
  User Defined
  Shared Memory
  TCP
  Files

System Overview

10

System Organization
  Job Manager
  Name Server
  Daemon (Work Nodes)

Dryad Graph

11

Graph Description Language
  “Embedded” in C++
  Combine Sub-Graphs
C++ Class Inherited By Vertex Program
  Program Name
  Program Factory

Dryad Graph

12

Vertex Creation
  C++ Class Inherited By Vertex Program
  Program Name
  Program Factory
  One Vertex Is a Graph
Factory Called
  Program Specific Arguments Applied
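The name-plus-factory pattern described here can be sketched as follows. This is an illustration only: `VertexProgram`, `FactoryRegistry`, and `SortVertex` are hypothetical names, and the slides do not show Dryad's actual base classes or their signatures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class that every vertex program inherits from.
class VertexProgram {
public:
    virtual ~VertexProgram() = default;
    virtual std::string name() const = 0;
};

using Args = std::vector<std::string>;
using Factory = std::function<std::unique_ptr<VertexProgram>(const Args&)>;

// Maps a program name to the factory that builds instances of that program.
class FactoryRegistry {
public:
    void add(const std::string& program_name, Factory factory) {
        factories_[program_name] = std::move(factory);
    }

    // Calls the factory with program-specific arguments.
    // Returns nullptr when no factory is registered under that name.
    std::unique_ptr<VertexProgram> create(const std::string& program_name,
                                          const Args& args) const {
        auto it = factories_.find(program_name);
        if (it == factories_.end()) return nullptr;
        return it->second(args);
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example vertex program: remembers which column it was asked to sort on.
class SortVertex : public VertexProgram {
public:
    explicit SortVertex(const Args& args)
        : column_(args.empty() ? "0" : args[0]) {}
    std::string name() const override { return "sort(" + column_ + ")"; }

private:
    std::string column_;
};
```

Registering by name means a graph description only has to carry the program name and its arguments; the instance itself can be constructed later, on whichever machine runs the vertex.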

Dryad Graph

13

Edge Creation
  Composition (Combine) Operation
  Two Graphs
  Varying Assignment Methods
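The slide does not list the assignment methods. The Dryad paper describes two: a pointwise composition and a complete bipartite composition. The sketch below illustrates both under simplifying assumptions: graphs are reduced to a count of outputs and a count of inputs, the function names `connect_pointwise` and `connect_bipartite` are hypothetical, and the round-robin handling of unequal sizes is a simplification rather than the paper's exact rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// An edge pairs an output index of graph A with an input index of graph B.
using Edge = std::pair<std::size_t, std::size_t>;

// Pointwise composition: output i feeds input i. When the two sides differ
// in size, the shorter side is reused round-robin (a simplifying assumption).
std::vector<Edge> connect_pointwise(std::size_t outputs, std::size_t inputs) {
    std::vector<Edge> edges;
    if (outputs == 0 || inputs == 0) return edges;
    const std::size_t n = std::max(outputs, inputs);
    for (std::size_t i = 0; i < n; ++i)
        edges.emplace_back(i % outputs, i % inputs);
    return edges;
}

// Complete bipartite composition: every output feeds every input.
std::vector<Edge> connect_bipartite(std::size_t outputs, std::size_t inputs) {
    std::vector<Edge> edges;
    for (std::size_t o = 0; o < outputs; ++o)
        for (std::size_t i = 0; i < inputs; ++i)
            edges.emplace_back(o, i);
    return edges;
}
```

Pointwise composition creates max(outputs, inputs) edges, while the bipartite form creates outputs × inputs edges, so the choice directly determines how many channels the job must maintain.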

Dryad Graph

14

Dryad Graph

15

Communication Channel
  File I/O By Default
  TCP
  Shared Memory
    Pitfall: Connected Vertices Must Be On Same Process
Deadlock Avoidance
  DAG Architecture
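The placement pitfall can be expressed as a simple validity check. This sketch assumes the constraint applies to shared-memory channels only, which is how the Dryad paper describes it; `Channel`, `Placement`, and `channel_allowed` are hypothetical names, not Dryad's API.

```cpp
#include <cassert>

// The three channel transports named on the slide.
enum class Channel { File, Tcp, SharedMemory };

// Hypothetical description of where a vertex has been scheduled.
struct Placement {
    int machine;
    int process;
};

// Returns true if vertices placed at `a` and `b` may be connected by `c`.
// Assumption: only shared-memory channels require a common process.
bool channel_allowed(Channel c, Placement a, Placement b) {
    if (c == Channel::SharedMemory)
        return a.machine == b.machine && a.process == b.process;
    return true;
}
```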

Program Development

16

Vertex Program Development
  C++ Base Classes
  Status And Errors Reported to Job Manager
  Standard “Main” Method
  Channel Readers/Writers Supplied Via Argument List
Legacy Programs
  C++ Wrapper

Program Development

17

Pipelined Execution
  Assuming Sequential Code
  Event Based Programming
  Channels Are Asynchronous
  Thread Pool Optimized For Vertices

Program Execution

18

Job Manager
  Job Ends If JM Machine Fails
  Different Schemes Possible To Avoid This
  Versioning System For Execution Instances
Vertex Execution
  Starts When All Input Channels Ready
  User Can Specify Execution Machine
  Can Be Re-Run On Failures
  Job Ends After All Vertices Have Run

Program Execution

19

Fault Tolerance
  Re-Run Vertex If Failed
  Channel Re-Creation (File Recreation)
  TCP/Shared Memory Failures Cause Failures On All Connected Vertices
  Staged Execution Allows Intermediate Error Checking
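The execution and re-run rules can be illustrated with a single-machine simulation. This is a sketch under stated assumptions, not the job manager's actual logic: `Vertex`, `run_job`, and `max_attempts` are hypothetical, vertices run one at a time, and a failure is modeled as a callback returning false. A vertex starts only once all of its inputs have completed, and a failed vertex is re-run up to a fixed number of attempts.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical vertex: the indices of its upstream vertices plus the work to do.
struct Vertex {
    std::vector<std::size_t> inputs;  // vertices whose output this one reads
    std::function<bool()> run;        // returns false to signal a failure
};

// Runs each vertex once all of its inputs have completed, re-running a failed
// vertex up to max_attempts times. Returns true if every vertex completed.
bool run_job(const std::vector<Vertex>& vertices, int max_attempts) {
    const std::size_t n = vertices.size();
    std::vector<std::size_t> pending(n, 0);
    std::vector<std::vector<std::size_t>> downstream(n);
    for (std::size_t v = 0; v < n; ++v) {
        pending[v] = vertices[v].inputs.size();
        for (std::size_t up : vertices[v].inputs) downstream[up].push_back(v);
    }

    // Vertices with no inputs can start immediately.
    std::queue<std::size_t> ready;
    for (std::size_t v = 0; v < n; ++v)
        if (pending[v] == 0) ready.push(v);

    std::size_t completed = 0;
    while (!ready.empty()) {
        const std::size_t v = ready.front();
        ready.pop();

        bool ok = false;
        for (int attempt = 0; attempt < max_attempts && !ok; ++attempt)
            ok = vertices[v].run();
        if (!ok) return false;  // the vertex failed on every attempt

        ++completed;
        for (std::size_t d : downstream[v])
            if (--pending[d] == 0) ready.push(d);  // all inputs now ready
    }
    return completed == n;  // the job ends after all vertices have run
}
```

Re-running a single vertex is only safe here because each vertex reads completed upstream output, which corresponds to file channels. With TCP or shared-memory channels the connected vertices would have to be re-run together, as the slide notes.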

Experimental Results

20

SQL Operation
  10 Computer Cluster
  Gigabit Connections
Data Mining Operation
  1800 Computer Cluster
  10 TB Data Set
  11 Minute Execution Time

Future Work

21

Scripting Language
  Nebula
  Additional Abstraction
SSIS Integration
  SQL Server Integration
Distributed SQL Queries
  Query Optimizer
