programming clusters with dryadlinq

46
Programming clusters with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley HPC & GPU Supercomputing Group of Silicon Valley Dec 5, 2011

Upload: early

Post on 23-Feb-2016

45 views

Category:

Documents


0 download

DESCRIPTION

Programming clusters with DryadLINQ . Mihai Budiu Microsoft Research, Silicon Valley HPC & GPU Supercomputing Group of Silicon Valley Dec 5 , 2011. Design Space. Grid. Internet. Data- parallel. X. Search. Shared memory. D ata center. Transaction. HPC. Latency (interactive). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Programming clusters with DryadLINQ

Programming clusters with DryadLINQ

Mihai Budiu Microsoft Research, Silicon Valley

HPC & GPU Supercomputing Group of Silicon Valley Dec 5, 2011

Page 2: Programming clusters with DryadLINQ

2

Design Space

Throughput(batch)

Latency(interactive)

Internet

Datacenter

Data-parallel

Sharedmemory

XSearch

HPC

Grid

Transaction

Page 3: Programming clusters with DryadLINQ

3

Software Stack

Windows Server

Cluster (Windows HPC Cluster)

Storage (Distributed Storage Catalog)

Execution (Dryad)

Language (DryadLINQ)

Windows Server

Windows Server

Windows Server

Applications

Page 4: Programming clusters with DryadLINQ

Execution

Application

Data-Parallel Computation

4

Storage

Language

ParallelDatabases

Map-Reduce

GFSBigTable DSC

Dryad

DryadLINQScope

Sawzall,FlumeJava

Hadoop

HDFS

Pig, HiveSQL ≈SQL LINQ, SQLSawzall, Java

Page 5: Programming clusters with DryadLINQ

5

Dryad• Execution layer of Bing analytics• Continuously deployed since 2006• Running on >> 104 machines• Sifting through > 10Pb data daily• Runs on clusters > 3000 machines• Handles jobs with > 105 processes each• Platform for rich software ecosystem

The Dryad by Evelyn De Morgan.

Page 6: Programming clusters with DryadLINQ

6

Dryad = Execution Layer

Job (application)

Dryad

Cluster

Pipeline

Shell

Machine≈

Page 7: Programming clusters with DryadLINQ

7

2-D Piping• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D grep1000 | sed500 | sort1000 | awk500 | perl50

Page 8: Programming clusters with DryadLINQ

8

Virtualized 2-D Pipelines

Page 9: Programming clusters with DryadLINQ

9

Virtualized 2-D Pipelines

Page 10: Programming clusters with DryadLINQ

10

Virtualized 2-D Pipelines

Page 11: Programming clusters with DryadLINQ

11

Virtualized 2-D Pipelines

Page 12: Programming clusters with DryadLINQ

12

Virtualized 2-D Pipelines• 2D DAG• multi-machine• virtualized

Page 13: Programming clusters with DryadLINQ

13

Dryad Job Structure

grep

sed

sortawk

perlgrep

grepsed

sort

sort

awk

Inputfiles

Vertices (processes)

Outputfiles

ChannelsStage

Page 14: Programming clusters with DryadLINQ

14

Dryad System Architecture

Files, TCP, FIFO, Networkjob schedule

data plane

control plane

NS,Sched RE RERE

V V V

Graph manager cluster

Page 15: Programming clusters with DryadLINQ

Fault Tolerance

Page 16: Programming clusters with DryadLINQ

16

LINQ

Dryad

=> DryadLINQ

Page 17: Programming clusters with DryadLINQ

17

LINQ = .Net+ Queries

Collection<T> collection;bool IsLegal(Key);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

Page 18: Programming clusters with DryadLINQ

18

Collections

class Collection<T> : IEnumerable<T>;

Elements of type TIterator(current element)

Page 19: Programming clusters with DryadLINQ

19

DryadLINQ Data Model

Partition

Collection

.Net objects

Page 20: Programming clusters with DryadLINQ

20

Collection<T> collection;bool IsLegal(Key k);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

DryadLINQ = LINQ + Dryad

C#

collection

results

C# C# C#

Vertexcode

Queryplan(Dryad job)Data

Page 21: Programming clusters with DryadLINQ

21

Demo

Page 22: Programming clusters with DryadLINQ

Example: counting lines

var table = PartitionedTable.Get<LineRecord>(file);int count = table.Count();

Parse, Count

Sum

Page 23: Programming clusters with DryadLINQ

Example: counting words

var table = PartitionedTable.Get<LineRecord>(file);int count = table

.SelectMany(l => l.line.Split(‘ ‘))

.Count();

Parse, SelectMany, Count

Sum

Page 24: Programming clusters with DryadLINQ

Example: counting unique wordsvar table = PartitionedTable.Get<LineRecord>(file);int count = table

.SelectMany(l => l.line.Split(‘ ‘))

.GroupBy(w => w)

.Count();

GroupBy;Count

HashPartition

Page 25: Programming clusters with DryadLINQ

Example: word histogramvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' ')) .GroupBy(w => w) .Select(g => new { word = g.Key, count = g.Count() });

GroupBy;Count

GroupByCountHashPartition

Page 26: Programming clusters with DryadLINQ

Example: high-frequency wordsvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' '))

.GroupBy(w => w)

.Select(g => new { word = g.Key, count = g.Count() })

.OrderByDescending(t => t.count)

.Take(100);

Sort; Take Mergesort;

Take

Page 27: Programming clusters with DryadLINQ

Example: words by frequencyvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' '))

.GroupBy(w => w)

.Select(g => new { word = g.Key, count = g.Count() })

.OrderByDescending(t => t.count);

Sample

Histogram

Broadcast

Range-partition

Sort

Page 28: Programming clusters with DryadLINQ

Example: Map-Reducepublic static IQueryable<S> MapReduce<T,M,K,S>( IQueryable<T> input,

Func<T, IEnumerable<M>> mapper,Func<M,K> keySelector,Func<IGrouping<K,M>,S> reducer)

{ var map = input.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.Select(reducer); return result;}

Page 29: Programming clusters with DryadLINQ

Expectation Maximization

29

• 160 lines • 3 iterations shown

Page 30: Programming clusters with DryadLINQ

30

Probabilistic Index MapsImages

features

Page 31: Programming clusters with DryadLINQ

31

Language Summary

WhereSelectGroupByOrderByAggregateJoin

Page 32: Programming clusters with DryadLINQ

32

Using PLINQQuery

LINQ to HPC

PLINQ

Local query

Page 33: Programming clusters with DryadLINQ

33

So, What Good is it For?

Page 34: Programming clusters with DryadLINQ

34

Application: Kinect Training

Page 35: Programming clusters with DryadLINQ

35

Input device

Page 36: Programming clusters with DryadLINQ

36

Depth computation

Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html

Page 37: Programming clusters with DryadLINQ

37

Vision Input: Depth Map + RGB

Page 38: Programming clusters with DryadLINQ

Why is it hard?

Page 39: Programming clusters with DryadLINQ

Background segmentationPlayer separation

Body Part Classifier

Skeletal Tracking Pipeline

39

Depth mapSensor

Body Part Identification

Skeleton

Page 40: Programming clusters with DryadLINQ

40

The Classifier

InputDepth map

OutputBody parts

Classifier

Runs on GPU @ 320x240

Page 41: Programming clusters with DryadLINQ

41

Motion Capture

Page 42: Programming clusters with DryadLINQ

42

Ground Truth Data

3D avatar model

Page 43: Programming clusters with DryadLINQ

43

Learn from Data

Classifier

Training examplesMachine learning

Page 44: Programming clusters with DryadLINQ

Kinect Training

Dryad

DryadLINQ

Machine learning

Decisionforest

Millions of examples needed

Cluster

Page 45: Programming clusters with DryadLINQ

45

Highly efficient parallellization

time

mac

hine

Page 46: Programming clusters with DryadLINQ

Conclusions

46

ClusterLINQDryad

46

=