concurrent collections (cnc)

60
1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel

Upload: fawzi

Post on 23-Feb-2016

50 views

Category:

Documents


0 download

DESCRIPTION

Concurrent Collections (CnC). Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel . Cholesky Performance. IPDPS’10 (best paper) Aparna Chandramowlishwaran. Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel MKL 10.2. Eigensolver Performance. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Concurrent Collections (CnC)

1Software and Services Group 1

Concurrent Collections (CnC)

Kath KnobeTechnology Pathfinding and Innovation (TPI)

Software and Services Group (SSG)Intel

Page 2: Concurrent Collections (CnC)

2Software and Services Group 2

Cholesky Performance

Intel 2-socket x 4-core Nehalem @ 2.8 GHz +

Intel MKL 10.2

IPDPS’10 (best paper)Aparna Chandramowlishwaran

Page 3: Concurrent Collections (CnC)

3Software and Services Group 3

Eigensolver Performance

Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel® Math Kernel Libraries

10.2

Multithreaded Intel® MKL

Baseline

Concurrent Collections

Higher is better

Page 4: Concurrent Collections (CnC)

4Software and Services Group 4

The Problem

• Most serial languages over-constrain orderings• Require arbitrary serialization• Allow for overwriting of data• The decision of if and when to execute are bound together

• Most parallel programming languages are embedded within serial languages

• Inherit problems of serial languages• They are too specific wrt type of parallelism in the

application and wrt the type target architecture • For the tuning expert, they don’t provide quite enough

control

Page 5: Concurrent Collections (CnC)

5Software and Services Group 5

explicitly parallel languages(over-constrained)

Concurrent Collections(only semantically required constraints)

explicitly serial languages(over-constrained)

Raise the level of the programming model just enough to avoid over-constraints

The Solution

Page 6: Concurrent Collections (CnC)

6Software and Services Group 66

• Introduction• The Big Idea• Simple C++ Example• Execution

Agenda

Page 7: Concurrent Collections (CnC)

7Software and Services Group 77

So What’s the Big Idea?•Don’t specify what operations run in parallel

−Difficult and depends on target

•Specify the semantic ordering constraints only−Easier and depends only on application

Semantic ordering constraints: The meaning of the program require some computations to execute before others.

Page 8: Concurrent Collections (CnC)

8Software and Services Group 88

Exactly Two Sources of Semantic Ordering Requirements

•Producer / Consumer (Data Dependence)Producer must execute before consumer

•Controller / Controllee (Control Dependence) Controller must execute before controllee

Page 9: Concurrent Collections (CnC)

9Software and Services Group 99

-The domain expert does not need to know about parallelism

Separation of Concerns Between Domain Expert and Tuning Expert

- The tuning expert does not need to know about the domain

- CnC maximizes flexibility for good performance

Goal: separation of concerns

The work of the domain expert• Semantic correctness• Constraints required by the application

The work of the tuning expert• Architecture• Actual parallelism • Locality• Overhead • Load balancing• Distribution among processors• Scheduling within a processor

Page 10: Concurrent Collections (CnC)

10Software and Services Group 1010

Concurrent Collections Spec

-The domain expert does not need to know about parallelism

Separation of Concerns Between Domain Expert and Tuning Expert

- The tuning expert does not need to know about the domain

- CnC maximizes flexibility for good performance

Goal: separation of concerns

The work of the domain expert• Semantic correctness• Constraints required by the application

The work of the tuning expert• Architecture• Actual parallelism • Locality• Overhead • Load balancing• Distribution among processors• Scheduling within a processor

Page 11: Concurrent Collections (CnC)

11Software and Services Group 1111

Concurrent Collections Spec

The application problem

-The domain expert does not need to know about parallelism

Separation of Concerns Between Domain Expert and Tuning Expert

- The tuning expert does not need to know about the domain

- CnC maximizes flexibility for good performance

Goal: separation of concerns

The work of the domain expert• Semantic correctness• Constraints required by the application

The work of the tuning expert• Architecture• Actual parallelism • Locality• Overhead • Load balancing• Distribution among processors• Scheduling within a processor

Page 12: Concurrent Collections (CnC)

12Software and Services Group 1212

Mapping to target platform

Concurrent Collections Spec

The application problem

-The domain expert does not need to know about parallelism

Separation of Concerns Between Domain Expert and Tuning Expert

- The tuning expert does not need to know about the domain

- CnC maximizes flexibility for good performance

Goal: separation of concerns

The work of the domain expert• Semantic correctness• Constraints required by the application

The work of the tuning expert• Architecture• Actual parallelism • Locality• Overhead • Load balancing• Distribution among processors• Scheduling within a processor

Page 13: Concurrent Collections (CnC)

13Software and Services Group 1313

Notation

(foo)

<T>

[x]

Computation Step

Data Item

Control Tag

(foo)

<T>

[x]

TextualGraphical

Page 14: Concurrent Collections (CnC)

14Software and Services Group 1414

Producer - consumer

(step1) (step2)[item]

Exactly Two Sources of Ordering Requirements

•Producer / Consumer (Data Dependence)Producer must execute before consumer

•Controller / Controllee (Control Dependence) Controller must execute before controllee

Page 15: Concurrent Collections (CnC)

15Software and Services Group 1515

Producer - consumer Controller - controllee

(step1) (step2)[item] (step1) (step2)

<t2>

Exactly Two Sources of Ordering Requirements

•Producer / Consumer (Data Dependence)Producer must execute before consumer

•Controller / Controllee (Control Dependence) Controller must execute before controllee

Page 16: Concurrent Collections (CnC)

16Software and Services Group 16

loop k = …

loop j = … loop i = … Z(i, j, k) = = A(k, j) end endend

Tag collections: new concept(step1) (step2)

<t2>

• Loop iteration space is a sequence <1,1,1>, <1,1,2>, …

• Restricted to integer indices. Body accesses values.

• IF = WHEN

Page 17: Concurrent Collections (CnC)

17Software and Services Group 17

Tag collections: new concept(step1) (step2)

<t2>

(Step2)

components of tag <t2> are k, j and i

loop k = …

loop j = … loop i = … Z(i, j, k) = … … = A(k, j) end endend A tag collection is a generalization of an

iteration space• An unordered set, not an ordered sequence

• Not just integer indices. Also graph nodes, tree nodes, set elements, … Body accesses values.

• IF ≠ WHEN

Page 18: Concurrent Collections (CnC)

18Software and Services Group 18

Tag collections: new concept(step1) (step2)

<t2>

(Step2)

components of tag <t2> are k, j and i

loop k = …

loop j = … loop i = … Z(i, j, k) = Z(i-1, j, k) + … … = A(k, j) end endend

[z]

Page 19: Concurrent Collections (CnC)

19Software and Services Group 19

Tag collections: new concept(step1) (step2)

<t2>

(Step2)

Single component of tag <t2> is k

loop k = …

loop j = … loop i = … Z(i, j, k) = = A(k, j) end endend

Page 20: Concurrent Collections (CnC)

20Software and Services Group 2020

• Introduction• The Big Idea• Simple C++ Example• Execution

Agenda

Page 21: Concurrent Collections (CnC)

21Software and Services Group 21

Cholesky factorization

Page 22: Concurrent Collections (CnC)

22Software and Services Group 22

Cholesky

Cholesky factorization

Page 23: Concurrent Collections (CnC)

23Software and Services Group 23

Cholesky

Trisolve

Cholesky factorization

Page 24: Concurrent Collections (CnC)

24Software and Services Group 24

Cholesky

Trisolve Update

Cholesky factorization

Page 25: Concurrent Collections (CnC)

25Software and Services Group 25

Cholesky

Cholesky factorization

Page 26: Concurrent Collections (CnC)

26Software and Services Group 2626

Example: The White Board Drawing What are the high level operations?

What are the chunks of data?

What are the producer/consumer relationships?

What are the inputs and outputs?

(Cholesky)

[array]

(Trisolve)

(Update)

Page 27: Concurrent Collections (CnC)

27Software and Services Group 2727

(Cholesky: iter)

[array: iter, row, col]

(Trisolve: row, iter)

(Update: col, row, iter)

Make it precise enough to executeDistinguish among the instances?

What are the distinct control tag collections?

What steps produce them?

<TrisolveTag: row, iter>

<CholeskyTag: iter>

<UpdateTag: col, row, iter>

Page 28: Concurrent Collections (CnC)

28Software and Services Group 28

Essentially making it ready for parallel execution But without explicit thinking about parallelism Focus on domain/application knowledge

Result is: • Parallel• Deterministic (wrt results)• Race-free

Page 29: Concurrent Collections (CnC)

29Software and Services Group 29

<Cell tags Initial: K, cell> (Correction filter: K, cell)

(Cell tracker: K) (Arbitrator initial: K)

(Arbitrator final: K)

[Input Image: K]

[Histograms: K]

[Motion corrections: K , cell]

[Labeled cells initial: K]

[Predicted states: K, cell]

[Cell candidates: K ]

[State measurements: K]

[Labeled cells final: K]

[Final states: K]

<K>

(Prediction filter: K, cell)

(Cell detector: K)

Cell tracker

<Cell tags Final: K, cell>

Page 30: Concurrent Collections (CnC)

30Software and Services Group 30

Another Application: Face Tracker• Look for face in all possible windows of an image Example: in a 3 x 3 image there are 14 windows• Sequence of classifiers. Any can determine: not face

30

Four 2 X 2

windows

Nine 1 x 1

windows

One 3 x 3

window

Page 31: Concurrent Collections (CnC)

31Software and Services Group 3131

Example: The White Board Drawing

What are the high level operations?

What are the chunks of data?

What are the producer/consumer relationships?

What are the inputs and outputs?

31

[image]

(classifier1)

(classifier2)

(classifier3)

Page 32: Concurrent Collections (CnC)

32Software and Services Group 3232

Make it precise enough to execute

<T2>

<face>

<T3>

(classifier1)

(classifier2)

(classifier3)

<T1>

Distinguish among the instances?

What are the distinct control tag collections?

What steps produce them?

[image]

Page 33: Concurrent Collections (CnC)

33Software and Services Group 33

Objects – summary Step Collections

- Are tagged. A step has access to its tag value- Performs gets and puts- Functional. Only side-effects are put objects

Item Collections – Means of communication

among step instances (data dependence)– Dynamic single assignment

Each instance is associated with exactly one contents.

– Are tagged

Tag Collections – Means of communication

among step instances (control dependence)– A tag collection may control

multiple step collections– Determines what step

instances will execute

(foo)

<T>[x]

Page 34: Concurrent Collections (CnC)

34Software and Services Group 34

Relationships – summary Prescription

>Every step collection is prescribed.>The relationship is always the identity function.>A tag collection may prescribe multiple step collections.

Consumer - Corresponds to gets in steps- A step may consume multiple distinct

item collections [x], [y] -> (foo)- A step may consume multiple

instances of items from a given collection

[x: neighbors(i)] -> (foo: i)

Producer - Corresponds to puts in steps- A step may produce to multiple distinct

collections (foo) -> [x], [y]

- A step may produce multiple instances to a given collection(foo: i) -> [x: neighbors(i)]

(step1)

<t>

(step2)

(step2)[item] (step1) [item]

(step1) <t2>

Page 35: Concurrent Collections (CnC)

35Software and Services Group 3535

.cnc file

// delcarations[BlockedMatrix<double>* array: int, int, int];

// Control relations<TrisolveTag: row, iter> :: (Trisolve: row, iter);

// Producer and consumer relations[array: row, iter, iter], [array: iter, iter, iter+1] -> (Trisolve: row, iter);

(Trisolve: row, iter) -> [array: row, iter, iter+1];

[array: iter, row, col]

(Trisolve)

<TrisolveTag: row, iter>

Page 36: Concurrent Collections (CnC)

36Software and Services Group 36

Coding Hints: Trisolve

StepReturnValue_t Trisolve( cholesky_graph_t& graph, const Tag_t& TS_tag) { // For each input item for this step retrieve the item using the proper tag // User code to create item tag here

BlockedMatrix<double>* ... = graph.array.Get(Tag_t(...)); string span = // Step implementation logic goes here ... // For each output item for this step, put the new item using the proper tag // User code to create item tag here

graph.array.Put(Tag_t(...), ...);span); } return CNC_Success; }

[array: iter, row, col]

(Trisolve)

<TrisolveTag: row, iter>

Page 37: Concurrent Collections (CnC)

37Software and Services Group 37

A model not a language: 1 Variety of ways to access the model

> Textual representation> Class library> Graphical user interface

Page 38: Concurrent Collections (CnC)

38Software and Services Group 38

A model not a language: 2

There are variants within the model – Distinct trade-offs

> Efficiency> Ease-of-use> Generality> Guarantees > …

− Possible different functionality> Continuous (or not)> Tag functions are analyzable (or not)> Real-time (or not)> …

Page 39: Concurrent Collections (CnC)

39Software and Services Group 39

Program execution: Attributes are monotonically acquired

Item instance

Page 40: Concurrent Collections (CnC)

40Software and Services Group 40

Program execution: Attributes are monotonically acquired

available

Item instance

Page 41: Concurrent Collections (CnC)

41Software and Services Group 41

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instance

Page 42: Concurrent Collections (CnC)

42Software and Services Group 42

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Page 43: Concurrent Collections (CnC)

43Software and Services Group 43

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 44: Concurrent Collections (CnC)

44Software and Services Group 44

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 45: Concurrent Collections (CnC)

45Software and Services Group 45

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 46: Concurrent Collections (CnC)

46Software and Services Group 46

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 47: Concurrent Collections (CnC)

47Software and Services Group 47

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 48: Concurrent Collections (CnC)

48Software and Services Group 48

Program execution: Attributes are monotonically acquired

available

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 49: Concurrent Collections (CnC)

49Software and Services Group 49

Program execution: Attributes are monotonically acquired

dead

Item instance

Tag instanceavailable

Step instance

Inputs-available

prescribed

enabled executed

Page 50: Concurrent Collections (CnC)

50Software and Services Group 50

Program execution: Attributes are monotonically acquired

dead

Item instance

Tag instancedead

Step instance

Inputs-available

prescribed

enabled executed

Page 51: Concurrent Collections (CnC)

51Software and Services Group 5151

• Introduction and history• The Big Idea• Simple C++ Examples• Execution

Agenda

Page 52: Concurrent Collections (CnC)

52Software and Services Group 52

Plays well with a variety of runtime approaches

grain distribution schedule

HP TStreams

Intel CnC

Georgia Tech CnC

HP TStreams

Rice CnC

static static static

static static dynamic

static dynamic dynamic1

static dynamic dynamic

dynamic2 dynamic dynamic

1We want to allow you to plug your own schedulerMandviwala et. al.

LCPC ‘07[ ]2

Page 53: Concurrent Collections (CnC)

53Software and Services Group 53

Plays well with a variety of tuning experts

• The domain expert / later time• Different person / different expertise• Static analysis• Dynamic Runtime

(This is the option currently available on our website.)• …

Page 54: Concurrent Collections (CnC)

54Software and Services Group 54

Plays well with various types of parallelism

Possible parallelism• Data / loop parallelism• Task parallelism• Pipeline parallelism• Tree based parallelism• Fork/join

Rescheduling serial executions• Memory hierarchy opt• Power

Could have different runtime schedulers for distinct subgraphs of the application

Page 55: Concurrent Collections (CnC)

55Software and Services Group 55

CnC Plays Nicely With Serial Languages

• Intel: C++ / TBB• Rice: Java / Habanaro • Intel: Haskell (preliminary)• Rice: .NET (preliminary)• …

55

Page 56: Concurrent Collections (CnC)

56Software and Services Group 56

CnC Plays Nicely With target architectures

• Shared memory / distributed memory• Homogeneous / heterogeneous• Flat / hierarchical• …

56

Page 57: Concurrent Collections (CnC)

57Software and Services Group 57

•TPIKath KnobeGeoff LowneyMark HamptonRyan NewtonFrank Schlimbach

•ICLChih-Ping ChenMelanie BlowerShin LeeSteve RoseLeo TreggiariGanesh RaoNikolay KurtovJeff Arnold (soon)Mario Deilmann

The academic community• Rice University

Vivek SarkarZoran BudimlicSagnak TasirlarDavid PeixottoMike Burke (+ IBM)

• Georgia Tech Rich Vuduc Aparna Chandramowlishwaran

• Novosibirsk Nikolay Kurtov

• UCLA Jens Palsberg

The HP communityCarl OffnerAlex Nelson

The Intel community

Page 58: Concurrent Collections (CnC)

58Software and Services Group 58

CnC’10 WorkshopCo-located with

Languages and Compilers for parallel Computers (LCPC)

at Rice University in October 2010

[email protected]

Page 59: Concurrent Collections (CnC)

59Software and Services Group 59

Intel (C++): http://whatif.intel.com

Rice (Java): http://habanero.rice.edu/cnc.html

Page 60: Concurrent Collections (CnC)

60Software and Services Group 60