X10 Workshop, San Jose - June 4, 2011

Parallel Programming: Design of an Overview Class

Christoph von Praun
University of Applied Sciences Nuremberg, Germany

This work was supported by an IBM Innovation Award grant.


Summary

• Design of a 3rd-year introductory parallel programming class in the Bachelor curriculum: ‘Orientation’ class
• Key characteristics of the class
– Organization of topics follows the Tiers of parallelism
– Uses programming language X10
– Strong focus on lab sessions
• Teaching materials are available online


Computer Science Master + Bachelor Curriculum

[Figure: curriculum timeline by semester, with a thesis at the end of each program. Labeled items: Parallel Programming, The Art of Multiprocessor Programming, Graphics Programming with CUDA, Scientific Computing, ‘Elective classes’, ‘Orientation’ class.]


Influences on parallel programming classes

[Figure: fields that influence parallel programming classes]
• Scientific computing
• OS
• High-performance computing
• Software architecture
• Computer architecture
• Programming languages


Outline

• Tiers of parallelism
• Course structure and contents
• Role of X10
• Student feedback and experience


Tiers of parallelism

• Original idea due to Michael L. Scott:
– “Don’t start with Dekker’s algorithm ...” [1]
– “Making the simple case simple” [2]
• Development of parallel software (parallel programming) can be based on techniques at different abstraction layers
– progressively less complexity at higher abstraction layers


Parallelization techniques, ordered from high-level (simpler) to low-level (more complex):

(1) automatic or implicit: parallelizing compiler
(2) deterministic: fully independent computations or serialization
(3) explicitly synchronized (data race free): critical sections, transactions
(4) low-level (with race conditions): implementation of threads, synchronization mechanisms, non-blocking data structures


Goal of the class:

Students should be conscious of ‘their’ tier when developing a parallel program.

Encourage students to move programming activity to higher tiers in the abstraction hierarchy.


Tier-1: automatic or implicit parallelism

• Auto-parallelization through compilers
• Parallel kernels: parallelism encapsulated in libraries
– LAPACK, etc.
• Parallel frameworks: framework organizes parallelism, synchronization and communication; programmer supplies sequential kernels
– Map-reduce
– Web application frameworks, e.g. WebSphere

Sequential semantics.


Tier-2: deterministic parallelism

• Independent computations:
– parallel array languages (FORALL loops)
– parallel containers (e.g., STAPL, Intel Concurrent Collections, Hierarchically Tiled Arrays)
• Concurrent computations with dependencies that follow deterministic idioms:
– reduction, scan

Semantics through serialization + sequential reasoning.
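The two deterministic idioms named above, reduction and scan, can be sketched with standard Java library calls. This is only an illustration in Java rather than X10; the class name `ReduceScan` and the sample data are assumptions made for the example. With integer addition the result matches the sequential loop regardless of how the work is split.

```java
import java.util.Arrays;

final class ReduceScan {
    /** Parallel reduction: sum of all elements. */
    static long sum(long[] a) {
        return Arrays.stream(a).parallel().sum();
    }

    /** Parallel inclusive scan (prefix sum); returns a new array, the input is unchanged. */
    static long[] prefixSum(long[] a) {
        long[] out = a.clone();
        Arrays.parallelPrefix(out, Long::sum);
        return out;
    }

    public static void main(String[] args) {
        long[] a = {3, 1, 4, 1, 5, 9, 2, 6}; // hypothetical sample data
        System.out.println(sum(a));
        System.out.println(Arrays.toString(prefixSum(a)));
    }
}
```

Because the combining operator is associative, the program can be understood by reasoning about the equivalent sequential loop, which is the point of this tier.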


Tier-3: explicitly synchronized, data-race-free

Three principal programming models
• Event-based
• Thread-parallel with shared memory
– critical sections
– condition variables
• Message-based
– send/receive
– collective communication

Semantics through interleaving of program blocks


Tier-4: low-level, with race conditions

• Programming with shared memory
– atomic load and store
– atomic compare and swap
• Platform-specific (Java, X86, ...)
• Sequential consistency is often a simplifying assumption
– e.g. teaching Dekker’s algorithm

Semantics through interleaving of statements, possibly not sequentially consistent


Roadmap of the class (15 weeks)

1. Motivation (1 week)
2. Principles (1 week)
3. Tier-4 (1 week)
4. Tier-1 (1 week)
5. Tier-2 (7 weeks)
6. Tier-3 (2 weeks)
7. Tier-4 (2 weeks)

Topics not addressed in this course


Motivation (1 week)

• Hardware trend:
– Moore’s Law continues
– frequency scaling limited by power density: multicores
• Performance: Software needs to be parallel
– challenges (Amdahl’s Law)
– opportunities (Gustafson’s Law)
• Energy: Throughput-oriented computing can save energy

Lab session
• Pencil and paper
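As a small worked example of the two laws named above, the sketch below evaluates both speedup formulas. The class name `SpeedupLaws` and the parallel fraction of 0.9 are assumptions chosen for illustration, not values from the course.

```java
/** Speedup laws; p is the parallelizable fraction of the work, n the number of processors. */
final class SpeedupLaws {
    /** Amdahl's Law (fixed problem size): speedup = 1 / ((1 - p) + p / n). */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    /** Gustafson's Law (problem size grows with n): scaled speedup = (1 - p) + p * n. */
    static double gustafson(double p, int n) {
        return (1.0 - p) + p * n;
    }

    public static void main(String[] args) {
        double p = 0.9; // hypothetical: 90% of the work can run in parallel
        for (int n : new int[] {1, 4, 16, 64}) {
            System.out.printf("n=%2d  Amdahl=%.2f  Gustafson=%.2f%n",
                    n, amdahl(p, n), gustafson(p, n));
        }
    }
}
```

With p = 0.9, the Amdahl speedup stays below 1 / (1 - p) = 10 for any n (the challenge), while the Gustafson scaled speedup keeps growing with n (the opportunity).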


Principles (1 week)

• Simple model for concurrent computations
– partial orders of operations
– synchronization vs ordinary operations
– happens-before relation
• Explain semantics of X10 language constructs
– async
– finish, for-async
– atomic

Lab session
• Parallel prime number testing
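The lab topic can be sketched in Java as a rough analogue of the X10 constructs listed above. This is not X10 code and not the course's solution: starting a thread stands in for `async`, joining all threads stands in for the enclosing `finish`, and an `AtomicInteger` update stands in for an `atomic` block. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelPrimes {
    /** Sequential kernel: primality test by trial division. */
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Counts the primes in [lo, hi) using the given number of threads. */
    static int countPrimes(int lo, int hi, int threads) {
        AtomicInteger count = new AtomicInteger();      // stands in for an 'atomic' update
        List<Thread> spawned = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int first = lo + t;
            Thread worker = new Thread(() -> {          // analogue of 'async'
                for (int n = first; n < hi; n += threads) {
                    if (isPrime(n)) count.incrementAndGet();
                }
            });
            spawned.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : spawned) worker.join(); // analogue of the enclosing 'finish'
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(0, 100, 4));
    }
}
```

Each `join` establishes a happens-before edge from the worker's operations to the code after the loop, so reading the count afterwards is well ordered.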


Tier-4 (1 week) [low-level with race conditions]

• Race conditions
• Non-determinacy
– associative non-determinism (floating point)
– atomicity violation: lost-update problem
• “Interleaving” semantics

Lab sessions
• Numeric integration
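Both sources of non-determinacy listed above can be observed with a few lines of Java. The sketch below is an assumption-laden illustration (class name `RaceDemo`, two threads, a hypothetical iteration count), not the course's lab code. The unsynchronized counter may lose updates, and its printed value can differ from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RaceDemo {
    static int racyCounter;                              // updated without synchronization (data race)
    static final AtomicInteger safeCounter = new AtomicInteger();

    /** Runs two threads that each perform 'perThread' increments on both counters. */
    static void run(int perThread) {
        racyCounter = 0;
        safeCounter.set(0);
        Runnable body = () -> {
            for (int i = 0; i < perThread; i++) {
                racyCounter++;                           // non-atomic read-modify-write: updates can be lost
                safeCounter.incrementAndGet();           // atomic read-modify-write
            }
        };
        Thread a = new Thread(body);
        Thread b = new Thread(body);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run(1_000_000);
        System.out.println("racy = " + racyCounter + ", atomic = " + safeCounter.get());
        // Floating-point addition is not associative, so the order of a parallel sum matters.
        System.out.println(((0.1 + 0.2) + 0.3) == (0.1 + (0.2 + 0.3)));
    }
}
```

The same effect shows up in numeric integration: if partial sums are combined in a schedule-dependent order, the floating-point result can vary slightly between runs.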


Tier-1 (1 week) [automatic or implicit parallelism]

• Challenges of loop parallelization
– intro to data dependencies
– difficulties and limitations of dependence analysis on some loop scenarios
• Parallel frameworks
– Map-Reduce, Web applications

Lab sessions
• Development of Map-Reduce applications (framework provided)


Tier-2 (7 weeks) [deterministic parallelism]

Patterns for algorithmic problem decomposition, according to T. Mattson, B. Sanders, B. Massingill, “Patterns for parallel programming”, Addison-Wesley 2005.

• Data parallelism
– geometric decomposition, recursive data
– data locality issues
• Task parallelism
– task parallel, divide and conquer
– task scheduling / load balancing issues


Lab sessions
• Data parallel
– heat-transfer
– matrix multiply
– algorithms for reduction and prefix-sum
• Task parallel
– map-reduce framework implementation
– merge-sort
– traveling salesman
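The divide-and-conquer pattern behind the merge-sort lab can be sketched with Java's fork/join framework. This is a minimal illustration in Java, not the X10 skeleton handed out in the course; the class name `ParallelMergeSort` and the `CUTOFF` value are assumptions.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class ParallelMergeSort extends RecursiveAction {
    private static final int CUTOFF = 1 << 12;  // hypothetical threshold for sequential sorting
    private final int[] a;
    private final int[] tmp;
    private final int lo;
    private final int hi;                        // this task sorts a[lo, hi)

    private ParallelMergeSort(int[] a, int[] tmp, int lo, int hi) {
        this.a = a;
        this.tmp = tmp;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= CUTOFF) {                 // small range: sort sequentially
            Arrays.sort(a, lo, hi);
            return;
        }
        int mid = (lo + hi) >>> 1;
        // The two halves are disjoint, so the subtasks are fully independent.
        invokeAll(new ParallelMergeSort(a, tmp, lo, mid),
                  new ParallelMergeSort(a, tmp, mid, hi));
        // Merge the two sorted halves through the scratch array.
        System.arraycopy(a, lo, tmp, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
        while (i < mid) a[k++] = tmp[i++];
        while (j < hi) a[k++] = tmp[j++];
    }

    /** Sorts the array in place using the common fork/join pool. */
    static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(a, new int[a.length], 0, a.length));
    }
}
```

The result does not depend on the task schedule, which is what places this style of program at tier 2. The cutoff trades task-scheduling overhead against available parallelism.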


Tier-3 (2 weeks) [explicitly synchronized]

• Pattern: Pipeline parallelism
• Producer-consumer communication through concurrent queues
– critical sections
– conditional synchronization

Lab sessions
• Array-based concurrent queue with explicit synchronization (atomic blocks)
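An array-based queue with critical sections and conditional synchronization might look as follows in Java, where `synchronized` methods play the role of the X10 atomic blocks and `wait`/`notifyAll` provide the conditional part. This is a sketch under those assumptions; the class name `BoundedQueue`, the capacity of 4, and the `transfer` helper are hypothetical.

```java
final class BoundedQueue {
    private final long[] items;
    private int head;   // index of the next element to take
    private int tail;   // index of the next free slot
    private int size;   // number of stored elements

    BoundedQueue(int capacity) {
        items = new long[capacity];
    }

    /** Critical section; blocks while the queue is full. */
    synchronized void put(long value) throws InterruptedException {
        while (size == items.length) wait();   // conditional synchronization
        items[tail] = value;
        tail = (tail + 1) % items.length;
        size++;
        notifyAll();                           // wake consumers waiting for data
    }

    /** Critical section; blocks while the queue is empty. */
    synchronized long take() throws InterruptedException {
        while (size == 0) wait();              // conditional synchronization
        long value = items[head];
        head = (head + 1) % items.length;
        size--;
        notifyAll();                           // wake producers waiting for space
        return value;
    }

    /** Two-stage pipeline: a producer sends 0..n-1; returns the sum seen by the consumer. */
    static long transfer(int n) {
        BoundedQueue q = new BoundedQueue(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += q.take();
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

All shared state is accessed only inside the monitor, so the program is data-race-free and can be understood as an interleaving of whole `put` and `take` blocks.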


Tier-4 (2 weeks) [low-level with race conditions]

• Programming with race conditions
• Memory models (SC, TSO)

Lab sessions
• Lamport’s concurrent non-blocking queue (1 producer / 1 consumer non-blocking queue)
• Observe non-SC behavior of Java
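A single-producer / single-consumer ring buffer in the spirit of Lamport’s queue can be sketched in Java as below. This is an illustration, not the lab’s reference solution: the class name `SpscQueue` and the `transfer` helper are hypothetical, and the indices are declared `volatile` so that the Java memory model orders the buffer accesses. Without `volatile`, the code would rely on sequential consistency, which Java does not guarantee for racy programs.

```java
final class SpscQueue {
    private final long[] buffer;
    private volatile int head;  // next slot to read; written only by the consumer
    private volatile int tail;  // next slot to write; written only by the producer

    SpscQueue(int capacity) {
        buffer = new long[capacity + 1];         // one slot stays empty to tell full from empty
    }

    /** Producer only. Returns false instead of blocking when the queue is full. */
    boolean offer(long value) {
        int t = tail;
        int next = (t + 1) % buffer.length;
        if (next == head) return false;          // full
        buffer[t] = value;                       // plain write ...
        tail = next;                             // ... published by the volatile write
        return true;
    }

    /** Consumer only. Returns null instead of blocking when the queue is empty. */
    Long poll() {
        int h = head;
        if (h == tail) return null;              // empty
        long value = buffer[h];                  // ordered after the volatile read of tail
        head = (h + 1) % buffer.length;          // hands the slot back to the producer
        return value;
    }

    /** One producer sends 0..n-1; returns the sum received by the consumer. */
    static long transfer(int n) {
        SpscQueue q = new SpscQueue(8);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                while (!q.offer(i)) Thread.yield();   // retry without locking
            }
        });
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < n) {
            Long v = q.poll();
            if (v == null) { Thread.yield(); continue; }
            sum += v;
            received++;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

No locks are taken: each index has exactly one writer, and correctness rests entirely on the ordering of individual loads and stores, which is what makes this a tier-4 program.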


Topics not addressed in the course

• Patterns for ...
– ... locality / reducing data access latency
– ... load balancing / distribution of work
– ... enhancing parallelism
– ... distribution of data
• Performance debugging


X10

• Pragmatic choice:
– Syntax familiar to students: “extension of sequential Java”
– Simple things can be expressed with succinct syntax
– X10 can express programs at tiers (1)-(3); memory model not specified -> use Java at tier (4)
• Class was not X10 ‘only’
– students could choose their own language for projects
– X10 language tutorial provided separately


Student feedback on X10

“The language should not be used in future classes, since parallel programming is simplified significantly, and for that reason, one does not run into issues and problems that occur when conventional programming languages are used for parallel programming.”

“Takes a while to be familiar with the type system / type inference.”

“Usability of X10 IDE needs to be improved” [March-June 2010]


Student feedback (1/3)

Feedback collected from 16/21 participants.

Has the course been well-structured and did the structure support your learning?
Answer scale: no, poor structure / fair structure / yes, very clear structure

[Chart: university evaluation form for "Parallel Programming" (Christoph von Praun), translated from German. Sections: preparation and structuring of the course material; presentation of the course material. The form’s cover note states that 32 evaluations were submitted; each item below shows 16 responses. Items are rated from 1 to 5; values are the mean and standard deviation s.]
• Amount of material, too little (1) to too much (5): 3.19, s = 0.63
• Common thread, not recognizable (1) to clearly visible (5): 3.88, s = 0.93
• Written materials, superfluous (1) to helpful (5): 4.19, s = 0.95
• Level of the course, too low (1) to too high (5): 3.38, s = 0.48
• Blackboard work / slides, confusing (1) to clear (5): 4.06, s = 0.75
• Lecture style, boring (1) to motivating (5): 4.56, s = 0.5
• Willingness to discuss, poor (1) to pronounced (5): 4.62, s = 0.7
• Lecturer’s subject competence, not competent (1) to very competent (5): 4.81, s = 0.53

Student feedback (2/3)

The number of topics and the volume of material presented in class was ...
Answer scale: too few / perfect / too much

Student feedback (3/3)

Did the lab sessions help you to learn and understand the materials presented in class?
Answer scale: never / sometimes / always

[Chart: university evaluation form, page 2 (Christoph von Praun), translated from German. Section: general. Values are the mean and standard deviation s.]
• Practical relevance of the material, absent (1) to pronounced (5): 4.38, s = 0.78
• Exercises (amount), too few (1) to too many (5): 3.75, s = 0.66
• Exercises (content), unsuitable (1) to deepening understanding (5): 4.19, s = 0.88
• Overall impression of the course, as a school grade from 1 (very good) to 6 (insufficient): 1.56, s = 0.5


Criticism

• Focus of discussion on correctness, not performance
• Focus on ‘higher layers’ in the abstraction hierarchy
– Less complex than lower tiers
– Assumption: People educated in our school are more likely to do parallel programming at higher rather than lower tiers
• Language X10 not widely used in practice


Conclusions

• “Tiers of parallelism” is a fruitful concept
– course structure
– orientation for students
• Focus on lab sessions important
– provided skeletons and solutions for every exercise
– few students chose their own language (typically much more complex than X10)
• X10 turned out to be a very good choice
– succinct expression of programs at different tiers
– steep learning curve


Sources

[1] Michael L. Scott: “Don’t start with Dekker’s algorithm - top-down introduction to concurrency”, Multicore Programming Education Workshop, 2009.

[2] Michael L. Scott: “Making the simple case simple”, Position paper, Workshop on Curricula for Concurrency, in conjunction with OOPSLA, 2009.


Thank you for your attention.

Teaching materials are available at http://www.in.ohm-hochschule.de/professors/praun/pp
