concurrent collections (cnc)

Post on 23-Feb-2016

DESCRIPTION

Concurrent Collections (CnC). Kath Knobe, Technology Pathfinding and Innovation (TPI), Software and Services Group (SSG), Intel. Cholesky Performance. IPDPS’10 (best paper), Aparna Chandramowlishwaran. Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel MKL 10.2. Eigensolver Performance.

TRANSCRIPT

1Software and Services Group 1

Concurrent Collections (CnC)

Kath Knobe
Technology Pathfinding and Innovation (TPI)
Software and Services Group (SSG)
Intel

Cholesky Performance

[Figure: Cholesky performance chart. Platform: Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel MKL 10.2. IPDPS’10 (best paper), Aparna Chandramowlishwaran.]

Eigensolver Performance

[Figure: eigensolver performance chart, higher is better. Platform: Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel® Math Kernel Libraries 10.2. Series: Baseline, Multithreaded Intel® MKL, Concurrent Collections.]

The Problem

• Most serial languages over-constrain orderings
  - Require arbitrary serialization
  - Allow for overwriting of data
  - The decisions of if and when to execute are bound together
• Most parallel programming languages are embedded within serial languages
  - Inherit problems of serial languages
  - They are too specific with respect to the type of parallelism in the application and the type of target architecture
  - For the tuning expert, they don’t provide quite enough control

The Solution

Raise the level of the programming model just enough to avoid over-constraints

• Explicitly serial languages (over-constrained)
• Concurrent Collections (only semantically required constraints)
• Explicitly parallel languages (over-constrained)

Agenda

• Introduction
• The Big Idea
• Simple C++ Example
• Execution

So What’s the Big Idea?

• Don’t specify what operations run in parallel
  - Difficult and depends on target
• Specify the semantic ordering constraints only
  - Easier and depends only on application

Semantic ordering constraints: The meaning of the program requires some computations to execute before others.

Exactly Two Sources of Semantic Ordering Requirements

• Producer / Consumer (Data Dependence): producer must execute before consumer
• Controller / Controllee (Control Dependence): controller must execute before controllee

Separation of Concerns Between Domain Expert and Tuning Expert

Goal: separation of concerns
- The domain expert does not need to know about parallelism
- The tuning expert does not need to know about the domain
- CnC maximizes flexibility for good performance

The work of the domain expert
• Semantic correctness
• Constraints required by the application

The work of the tuning expert
• Architecture
• Actual parallelism
• Locality
• Overhead
• Load balancing
• Distribution among processors
• Scheduling within a processor

Separation of Concerns Between Domain Expert and Tuning Expert

[Diagram labels: The application problem; Concurrent Collections Spec; Mapping to target platform]

Notation

                    Graphical   Textual
Computation Step    (foo)       (foo)
Data Item           [x]         [x]
Control Tag         <T>         <T>

Exactly Two Sources of Ordering Requirements

• Producer / Consumer (Data Dependence): producer must execute before consumer
  (step1) -> [item] -> (step2)
• Controller / Controllee (Control Dependence): controller must execute before controllee
  (step1) -> <t2> :: (step2)
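The two orderings can be sketched in a few lines of C++. This is a minimal illustration and not the CnC runtime: `MiniGraph`, `step1` and `try_step2` are hypothetical names, and the item value `t * 10` is arbitrary.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical miniature graph:
//   (step1) -> [item] -> (step2)    data dependence
//   (step1) -> <t2> :: (step2)      control dependence
struct MiniGraph {
    std::map<int, int> item;       // [item]: tag -> value, written once per tag
    std::set<int> t2;              // <t2>: control tags that prescribe (step2)
    std::vector<std::string> log;  // order in which step instances ran
};

// (step1: t) produces one item instance and one control tag.
void step1(MiniGraph& g, int t) {
    assert(g.item.count(t) == 0);  // single assignment: no overwriting
    g.item.emplace(t, t * 10);     // producer side of the data dependence
    g.t2.insert(t);                // controller side of the control dependence
    g.log.push_back("step1:" + std::to_string(t));
}

// (step2: t) runs only if <t2> prescribes it and its input item exists.
bool try_step2(MiniGraph& g, int t, int& out) {
    if (g.t2.count(t) == 0) return false;  // not prescribed yet
    const auto it = g.item.find(t);
    if (it == g.item.end()) return false;  // input not available yet
    out = it->second + 1;
    g.log.push_back("step2:" + std::to_string(t));
    return true;
}
```

Calling `try_step2` before `step1` simply returns `false`; nothing in the step code says which thread or in which order the instances are attempted.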

Tag collections: new concept

(step1) -> <t2> :: (step2)

loop k = …
  loop j = …
    loop i = …
      Z(i, j, k) = …
      … = A(k, j)
    end
  end
end

• Loop iteration space is a sequence <1,1,1>, <1,1,2>, …
• Restricted to integer indices. Body accesses values.
• IF = WHEN

Tag collections: new concept

(step1) -> <t2> :: (step2)

(Step2): components of tag <t2> are k, j and i

loop k = …
  loop j = …
    loop i = …
      Z(i, j, k) = …
      … = A(k, j)
    end
  end
end

A tag collection is a generalization of an iteration space
• An unordered set, not an ordered sequence
• Not just integer indices. Also graph nodes, tree nodes, set elements, … Body accesses values.
• IF ≠ WHEN
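A small C++ sketch may make the "unordered set" point concrete. It assumes a hypothetical step body that is a pure function of its tag; `Tag`, `step2_body` and `run_prescribed` are illustrative names, not part of any CnC library.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <tuple>
#include <vector>

using Tag = std::tuple<int, int, int>;  // components k, j, i

// Hypothetical step body: a pure function of its tag.
int step2_body(const Tag& t) {
    return 100 * std::get<0>(t) + 10 * std::get<1>(t) + std::get<2>(t);
}

// Run one step instance per distinct tag, visiting tags in the order given.
// Results are keyed by tag, so the visiting order cannot change them.
std::map<Tag, int> run_prescribed(const std::vector<Tag>& order) {
    std::set<Tag> tags(order.begin(), order.end());  // a set: duplicates collapse
    std::map<Tag, int> results;
    for (const Tag& t : order) {
        if (tags.erase(t) == 1) {  // each prescribed instance runs exactly once
            results[t] = step2_body(t);
        }
    }
    assert(tags.empty());
    return results;
}
```

Membership in the tag set decides IF an instance runs; the order passed in (standing in for a scheduler) decides WHEN, and any order gives the same result map.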

Tag collections: new concept

(step1) -> <t2> :: (step2)

(Step2): components of tag <t2> are k, j and i

loop k = …
  loop j = …
    loop i = …
      Z(i, j, k) = Z(i-1, j, k) + …
      … = A(k, j)
    end
  end
end

[z]

Tag collections: new concept

(step1) -> <t2> :: (step2)

(Step2): single component of tag <t2> is k

loop k = …
  loop j = …
    loop i = …
      Z(i, j, k) = …
      … = A(k, j)
    end
  end
end

Cholesky factorization

[Figure sequence: matrix blocks labeled in turn Cholesky, then Trisolve, then Update, then Cholesky again]
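For readers who have not seen the three operations before, the sketch below runs a lower-triangular Cholesky factorization with 1 x 1 "tiles", so that each loop body corresponds to one named operation. The loop bounds and the scalar tiles are assumptions made for illustration; the talk's implementation works on blocks and is not shown in the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// In-place lower-triangular Cholesky factorization with 1x1 tiles.
// The three loop bodies are named after the operations on the slides.
void cholesky_in_place(Matrix& a) {
    const std::size_t n = a.size();
    for (std::size_t iter = 0; iter < n; ++iter) {
        assert(a[iter][iter] > 0.0);               // input must be positive definite
        a[iter][iter] = std::sqrt(a[iter][iter]);  // Cholesky(iter)
        for (std::size_t row = iter + 1; row < n; ++row) {
            a[row][iter] /= a[iter][iter];         // Trisolve(row, iter)
        }
        for (std::size_t row = iter + 1; row < n; ++row) {
            for (std::size_t col = iter + 1; col <= row; ++col) {
                a[row][col] -= a[row][iter] * a[col][iter];  // Update(col, row, iter)
            }
        }
    }
    for (std::size_t r = 0; r < n; ++r) {          // clear the unused upper triangle
        for (std::size_t c = r + 1; c < n; ++c) a[r][c] = 0.0;
    }
}
```

The serial loops fix one particular order. The only orderings the arithmetic needs are that each Trisolve follows the Cholesky of its iteration and each Update follows the Trisolves whose results it reads.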

Example: The White Board Drawing

• What are the high level operations?
• What are the chunks of data?
• What are the producer/consumer relationships?
• What are the inputs and outputs?

Steps: (Cholesky), (Trisolve), (Update)
Items: [array]

Make it precise enough to execute

• Distinguish among the instances?
• What are the distinct control tag collections?
• What steps produce them?

Steps: (Cholesky: iter), (Trisolve: row, iter), (Update: col, row, iter)
Items: [array: iter, row, col]
Control tags: <CholeskyTag: iter>, <TrisolveTag: row, iter>, <UpdateTag: col, row, iter>

• Essentially making it ready for parallel execution
• But without explicit thinking about parallelism
• Focus on domain/application knowledge

Result is:
• Parallel
• Deterministic (with respect to results)
• Race-free

Cell tracker

Steps:
• (Cell detector: K)
• (Cell tracker: K)
• (Arbitrator initial: K)
• (Correction filter: K, cell)
• (Arbitrator final: K)
• (Prediction filter: K, cell)

Items:
• [Input Image: K]
• [Histograms: K]
• [Labeled cells initial: K]
• [Cell candidates: K]
• [State measurements: K]
• [Motion corrections: K, cell]
• [Labeled cells final: K]
• [Final states: K]
• [Predicted states: K, cell]

Control tags:
• <K>
• <Cell tags Initial: K, cell>
• <Cell tags Final: K, cell>

Another Application: Face Tracker

• Look for face in all possible windows of an image
  Example: in a 3 x 3 image there are 14 windows
  - Nine 1 x 1 windows
  - Four 2 x 2 windows
  - One 3 x 3 window
• Sequence of classifiers. Any can determine: not face
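The window count can be checked with a short function. It assumes, as the 3 x 3 example suggests, that windows are square and may start at any pixel offset, so a k x k window has (n - k + 1) positions along each axis. The name `count_square_windows` is illustrative.

```cpp
#include <cassert>

// Number of square k x k windows (k = 1..n) in an n x n image.
long count_square_windows(int n) {
    assert(n >= 0);
    long total = 0;
    for (int k = 1; k <= n; ++k) {
        const long positions = n - k + 1;  // offsets per axis for a k x k window
        total += positions * positions;
    }
    return total;
}
```

For n = 3 this gives 9 + 4 + 1 = 14, matching the slide; each window is one candidate tag for the first classifier.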

Example: The White Board Drawing

• What are the high level operations?
• What are the chunks of data?
• What are the producer/consumer relationships?
• What are the inputs and outputs?

Steps: (classifier1), (classifier2), (classifier3)
Items: [image]

Make it precise enough to execute

• Distinguish among the instances?
• What are the distinct control tag collections?
• What steps produce them?

Steps: (classifier1), (classifier2), (classifier3)
Items: [image]
Control tags: <T1>, <T2>, <T3>, <face>

Objects – summary

Step Collections (foo)
- Are tagged. A step has access to its tag value
- Perform gets and puts
- Functional. Only side-effects are put objects

Item Collections [x]
- Means of communication among step instances (data dependence)
- Dynamic single assignment: each instance is associated with exactly one value
- Are tagged

Tag Collections <T>
- Means of communication among step instances (control dependence)
- A tag collection may control multiple step collections
- Determines what step instances will execute
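Dynamic single assignment can be sketched as a tagged container that rejects a second put for the same tag. The class below is a hypothetical stand-in written for illustration; it is not the Intel CnC class library, and the choice to throw on a repeated put is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical stand-in for an item collection (not the Intel CnC API).
template <typename Tag, typename Value>
class ItemCollection {
public:
    // Associates a tag with its one and only value.
    void put(const Tag& tag, const Value& value) {
        const bool inserted = items_.emplace(tag, value).second;
        if (!inserted) {
            throw std::logic_error("item already assigned for this tag");
        }
    }

    // Copies the value into out and returns true if the item has been produced.
    bool get(const Tag& tag, Value& out) const {
        const auto it = items_.find(tag);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<Tag, Value> items_;
};
```

Because a tag never changes its value once put, a consumer that finds the item sees the same contents no matter when it runs.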

Relationships – summary

Prescription (diagram: <t> prescribes (step1) and (step2))
> Every step collection is prescribed.
> The relationship is always the identity function.
> A tag collection may prescribe multiple step collections.

Consumer (diagram: [item] -> (step2))
- Corresponds to gets in steps
- A step may consume multiple distinct item collections: [x], [y] -> (foo)
- A step may consume multiple instances of items from a given collection: [x: neighbors(i)] -> (foo: i)

Producer (diagram: (step1) -> [item], (step1) -> <t2>)
- Corresponds to puts in steps
- A step may produce to multiple distinct collections: (foo) -> [x], [y]
- A step may produce multiple instances to a given collection: (foo: i) -> [x: neighbors(i)]
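The relation [x: neighbors(i)] -> (foo: i) can be illustrated with a step that gets several instances from one collection. The slides do not define `neighbors`; the 1-D definition below, the bound `n`, and the summing step body are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical tag function: the 1-D neighbors of i inside [0, n).
std::vector<int> neighbors(int i, int n) {
    std::vector<int> result;
    if (i - 1 >= 0) result.push_back(i - 1);
    if (i + 1 < n) result.push_back(i + 1);
    return result;
}

// (foo: i) gets every [x: neighbors(i)] and sums them.
// Returns false if any consumed instance has not been produced yet.
bool foo(const std::map<int, double>& x, int i, int n, double& sum) {
    sum = 0.0;
    for (int nb : neighbors(i, n)) {
        const auto it = x.find(nb);
        if (it == x.end()) return false;
        sum += it->second;
    }
    return true;
}
```

The tag `i` alone determines which item instances the step touches, which is what allows a runtime or a static analysis to reason about its inputs.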

.cnc file

// declarations
[BlockedMatrix<double>* array: int, int, int];

// Control relations
<TrisolveTag: row, iter> :: (Trisolve: row, iter);

// Producer and consumer relations
[array: row, iter, iter], [array: iter, iter, iter+1] -> (Trisolve: row, iter);
(Trisolve: row, iter) -> [array: row, iter, iter+1];

Graph fragment: <TrisolveTag: row, iter>, (Trisolve), [array: iter, row, col]
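Read as tag functions, the Trisolve relations say which item instances a given step instance gets and puts. The sketch below only transcribes the index expressions from the .cnc relations; `ItemTag`, `trisolve_inputs` and `trisolve_output` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <vector>

using ItemTag = std::array<int, 3>;  // three int components, as declared for [array]

// Item instances consumed by (Trisolve: row, iter).
std::vector<ItemTag> trisolve_inputs(int row, int iter) {
    return { ItemTag{{row, iter, iter}}, ItemTag{{iter, iter, iter + 1}} };
}

// Item instance produced by (Trisolve: row, iter).
ItemTag trisolve_output(int row, int iter) {
    return ItemTag{{row, iter, iter + 1}};
}
```

The output tag differs from the first input tag only in its last component (`iter + 1` instead of `iter`), so the step puts a new instance rather than overwriting the one it read.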

Coding Hints: Trisolve

StepReturnValue_t Trisolve(cholesky_graph_t& graph, const Tag_t& TS_tag)
{
    // For each input item for this step, retrieve the item using the proper tag
    // User code to create item tag here
    BlockedMatrix<double>* ... = graph.array.Get(Tag_t(...));

    // Step implementation logic goes here
    ...

    // For each output item for this step, put the new item using the proper tag
    // User code to create item tag here
    graph.array.Put(Tag_t(...), ...);

    return CNC_Success;
}

Graph fragment: <TrisolveTag: row, iter>, (Trisolve), [array: iter, row, col]

A model not a language: 1

Variety of ways to access the model
> Textual representation
> Class library
> Graphical user interface

A model not a language: 2

There are variants within the model
- Distinct trade-offs
  > Efficiency
  > Ease-of-use
  > Generality
  > Guarantees
  > …
- Possible different functionality
  > Continuous (or not)
  > Tag functions are analyzable (or not)
  > Real-time (or not)
  > …

Program execution: Attributes are monotonically acquired

Item instance: available
Tag instance: available
Step instance: inputs-available, prescribed, enabled, executed

Program execution: Attributes are monotonically acquired

Item instance: dead
Tag instance: dead
Step instance: inputs-available, prescribed, enabled, executed
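The step-instance attributes can be sketched as flags that only ever change from false to true. The struct below is hypothetical and assumes that "enabled" means both prescribed and inputs-available; the slides list the attributes but do not spell out that rule.

```cpp
#include <cassert>

// Hypothetical record of the attributes one step instance has acquired.
struct StepInstance {
    bool prescribed = false;        // its control tag is available
    bool inputs_available = false;  // every item it gets is available
    bool executed = false;

    // Assumption: enabled = prescribed and inputs-available.
    bool enabled() const { return prescribed && inputs_available; }

    // Attributes are only ever set, never cleared (monotonic).
    void on_tag_available() { prescribed = true; }
    void on_inputs_available() { inputs_available = true; }

    // Runs the step if it is enabled and has not run yet; reports whether it ran.
    bool try_execute() {
        if (!enabled() || executed) return false;
        executed = true;
        return true;
    }
};
```

Because no attribute is ever lost, the tag and the inputs may arrive in either order and the instance still ends up enabled.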

Agenda

• Introduction and history
• The Big Idea
• Simple C++ Examples
• Execution

Plays well with a variety of runtime approaches

Runtimes listed: HP TStreams, Intel CnC, Georgia Tech CnC, HP TStreams, Rice CnC

grain         distribution   schedule
static        static         static
static        static         dynamic
static        dynamic        dynamic (1)
static        dynamic        dynamic
dynamic (2)   dynamic        dynamic

(1) We want to allow you to plug in your own scheduler
(2) Mandviwala et al., LCPC ’07

Plays well with a variety of tuning experts

• The domain expert / later time
• Different person / different expertise
• Static analysis
• Dynamic Runtime (This is the option currently available on our website.)
• …

Plays well with various types of parallelism

Possible parallelism
• Data / loop parallelism
• Task parallelism
• Pipeline parallelism
• Tree based parallelism
• Fork/join

Rescheduling serial executions
• Memory hierarchy optimization
• Power

Could have different runtime schedulers for distinct subgraphs of the application

CnC Plays Nicely With Serial Languages

• Intel: C++ / TBB
• Rice: Java / Habanero
• Intel: Haskell (preliminary)
• Rice: .NET (preliminary)
• …

CnC Plays Nicely With Target Architectures

• Shared memory / distributed memory
• Homogeneous / heterogeneous
• Flat / hierarchical
• …

The Intel community
• TPI: Kath Knobe, Geoff Lowney, Mark Hampton, Ryan Newton, Frank Schlimbach
• ICL: Chih-Ping Chen, Melanie Blower, Shin Lee, Steve Rose, Leo Treggiari, Ganesh Rao, Nikolay Kurtov, Jeff Arnold (soon), Mario Deilmann

The academic community
• Rice University: Vivek Sarkar, Zoran Budimlic, Sagnak Tasirlar, David Peixotto, Mike Burke (+ IBM)
• Georgia Tech: Rich Vuduc, Aparna Chandramowlishwaran
• Novosibirsk: Nikolay Kurtov
• UCLA: Jens Palsberg

The HP community
• Carl Offner, Alex Nelson

CnC’10 Workshop

Co-located with Languages and Compilers for Parallel Computing (LCPC) at Rice University in October 2010

kath.knobe@intel.com

Intel (C++): http://whatif.intel.com

Rice (Java): http://habanero.rice.edu/cnc.html
