
Parallel and Concurrent Programming: Introduction and Foundation

Marwan Burelle

[email protected]
http://wiki-prog.infoprepa.epita.fr


Outline

1 Introduction
    Parallelism in Computer Science
    Nature of Parallelism
    Global Lecture Overview

2 Being Parallel
    Gain?
    Models of Hardware Parallelism
    Decomposition

3 Foundations
    Task Systems
    Program Determinism
    Maximal Parallelism


Outline

4 Interacting with CPU Cache
    False Sharing
    Memory Fence

5 Mutual Exclusion
    Classic Problem: Shared Counter
    Critical Section and Mutual Exclusion
    Solutions with no locks

6 Definitions


Introduction


Evolutions

The next evolutions in processors tend more and more towards growing core counts.
GPUs and similar extensions follow the same path and introduce extra parallelism opportunities.
Network evolutions and the widespread reach of the Internet strengthen clustering techniques and grid computing.
Concurrency and parallelism are no recent concern, but they are emphasized by the current direction of the market.


Nature of Parallel Programming

[Diagram: Parallel Programming at the intersection of Determinism, Synchronization and Algorithms]


Parallelism in Computer Science


Tools and Techniques

Threads APIs:
    POSIX Threads
    Win32 Threads
    Java Threads
    C11 new thread API
    ...

Parallelism at Language Level:
    Ada
    Erlang
    F#
    Go
    ...

Higher-Level Libraries and related:
    OpenMP
    Boost's Threads
    Intel's TBB
    CUDA/OpenCL/DirectX Compute Shaders
    SDL threads, Qt threads
    ...


Models and theories

Execution Models:
    Actor Model
    Message Passing
    Futures
    CSP (Communicating Sequential Processes)
    CCS (Calculus of Communicating Systems)

Theoretical Models and proof tools:
    π-calculus
    Ambient calculus (and Boxed Ambients)
    Petri Nets
    Bisimulation
    Trace Theory

Critical Section:
    Locks
    Conditions
    Semaphores
    Monitors

...


A Bit of History

late 1950's: first discussions about parallel computing.
1962: D825 by Burroughs Corporation (four processors).
1967: Amdahl and Slotnick publish a debate about the feasibility of parallel computing, which introduces Amdahl's law about the limit of speed-up due to parallel computing.
1969: Honeywell's Multics introduces the first Symmetric Multiprocessor (SMP) system, capable of running up to eight processors in parallel.
1976: the first Cray-1 is installed at Los Alamos National Laboratory. The major breakthrough of the Cray-1 is its vector instruction set, capable of performing an operation on each element of a vector in parallel.
1983: the CM-1 Connection Machine by Thinking Machines offers 65536 1-bit processors working on a SIMD (Single Instruction, Multiple Data) model.


A Bit of History (2)

1991: Thinking Machines introduces the CM-5, using a MIMD architecture based on a fat-tree network of SPARC RISC processors.
1990's: modern microprocessors often become capable of running in an SMP (Symmetric MultiProcessing) configuration, beginning with processors such as Intel's 486DX, Sun's UltraSPARC, DEC's Alpha, IBM's POWER ... Early SMP architectures were based on motherboards providing two or more processor sockets.
2002: Intel introduces the first processor with Hyper-Threading technology (running two threads on one physical processor), derived from previous work at DEC.
2006: the first multi-core processors appear (several processors in one chip) ...


Nature of Parallelism


Hardware Parallelism

Symmetric MultiProcessor (SMP): generic designation of identical processors running in parallel under the same OS supervision.
Multi-Core: embedding multiple processor units on the same die; this decreases cost, permits sharing of common elements, and permits a shared on-die cache.
Hyper-Threading: using the same unit (core or processor) to run two (or more) logical units: each logical unit runs slower, but parallelism can offer gains and opportunities for more efficient interleaving (using independent elements of the hardware).


Hardware Parallelism

Vectorized Instructions: common extensions providing parallel execution of a single operation on all elements of a vector.
Non-Uniform Memory Access (NUMA): architecture where each processor has its own memory with dedicated access (to limit false sharing, for example).
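As an illustration of vectorized instructions (not from the original slides), here is a minimal C++ sketch using the x86 SSE intrinsics; the function name add4 is hypothetical:

    #include <xmmintrin.h>  // x86 SSE intrinsics

    // A single vector instruction adds four floats at once,
    // instead of four separate scalar additions.
    void add4(const float* a, const float* b, float* out) {
        __m128 va = _mm_loadu_ps(a);             // load 4 floats
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));  // 4 additions in parallel
    }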


Software and Parallelism

In order to take advantage of hardware parallelism, the operating system must support SMP.
Operating systems can offer parallelism to applications even when the hardware does not, by using multi-programming.
The system has two ways to provide parallelism: threads and processes.
When only processes are available, no memory conflicts can arise, but processes have to use various communication protocols to exchange data and synchronize themselves.
When memory is shared, programs have to manage memory accesses to avoid conflicts, but data exchange between threads is far simpler.


Programming Parallel Applications

Operating systems supporting parallelism and threads normally provide an API for parallel programming.
Support for parallelism can either be primitive in the programming language or added by means of APIs and libraries.
The most common paradigm is the explicit use of threads backed by the kernel.
Some languages try to offer implicit parallelism, or at least parallel blocks, but producing smart parallel code automatically is tedious.
Some hardware parallelism features (vector instructions or multi-operations) may need special support from the language.
Modern frameworks for parallelism find a convenient middle way by abstracting system thread management and offering simpler thread manipulation (OpenMP, Intel's TBB ...).


Global Lecture Overview


Global Lecture Overview

1 Introduction to parallelism (this course)

2 Synchronization and POSIX Threads
  How to enforce safe data sharing using various synchronization techniques, illustrated using POSIX Threads.

3 Algorithms and Data Structures
  How to adapt or write algorithms and data structures in a parallel world (shared queues, task scheduling, lock-free structures ...)

4 TBB and other higher-level tools
  Programming using Intel's TBB ...


Being Parallel


Gain?


Amdahl’s law

Amdahl's law
If P is the part of a computation that can be made parallel, then the maximum speed-up (with respect to the sequential version) of running this program on an N-processor machine is:

    1 / ((1 - P) + P/N)


Amdahl’s law

Suppose half of a program can be made parallel and we run it on four processors; then we have a maximal speed-up of 1 / ((1 - 0.5) + 0.5/4) = 1.6, which means that the program will run 60% faster.
For the same program, running on 32 processors gives a speed-up of 1.94.
We can observe that when N tends toward infinity, the speed-up tends to 2! We cannot do better than two times faster, even with an arbitrarily high number of processors!


Gustafson’s law

Gustafson's law
Let P be the number of processors and α the sequential fraction of the parallel execution time; then we define the scaled speed-up, noted S(P), as:

    S(P) = P - α × (P - 1)

Amdahl's law considers a fixed amount of work and focuses on minimal execution time.
Gustafson's law considers a fixed execution time and describes the increased size of the problem.
The main consequence of Gustafson's law is that we can always increase the size of the solved problem (for a fixed amount of time) by increasing the number of processors.
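A short illustrative sketch contrasting the two views (not from the slides):

    // Scaled speed-up predicted by Gustafson's law for p processors
    // and a sequential fraction a of the parallel execution time.
    double gustafson(int p, double a) {
        return p - a * (p - 1);
    }
    // gustafson(4, 0.5) == 2.5 and gustafson(32, 0.5) == 16.5: unlike
    // Amdahl's fixed-size view, the scaled speed-up keeps growing as
    // processors are added, because the problem size grows with them.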


Models of Hardware Parallelism


Flynn’s Taxonomy

                 Single Instruction   Multiple Instruction
Single Data      SISD                 MISD
Multiple Data    SIMD                 MIMD

SISD: usual non-parallel systems.
SIMD: performing the same operation on various data (like vector computing).
MISD: uncommon model where several operations are performed on the same data; usually implies that all operations must agree on the result (fault-tolerant code such as in space-shuttle controllers).
MIMD: the most common model today.


Decomposition


Going Parallel

To move from a sequential program to a parallel program, we have to break our activities apart in order to do things in parallel.
When decomposing, you have to take various aspects into account:

    activity independence (how much synchronization we need),
    load balancing (using all available threads),
    decomposition overhead.

Choosing the right decomposition is the most critical choice you have to make when designing a parallel program.


Decomposition Strategies

Task Driven: the problem is split into (almost) independent tasks run in parallel.

Data Driven: all running tasks perform the same operations on a partition of the original data set.

Data Flow Driven: the whole activity is decomposed into a chain of dependent tasks, a pipeline, where each task depends on the output of the previous one.


Task Driven Decomposition

In order to perform task driven decomposition, you need to:

    list the activities in your program,
    establish groups of dependent activities that will form standalone tasks,
    identify interactions and possible data conflicts between tasks.

Task driven decomposition is technically simple to implement and can be particularly efficient when activities are well segmented.
On the other hand, this strategy is highly constrained by the nature of your activities, their relationships and dependencies.
Load balancing (keeping every thread busy) is often hard to achieve due to the fixed nature of the decomposition.


Data Driven Decomposition

In order to perform data driven decomposition, you need to:

    define the common task performed on each subset of the data,
    find a coherent data division strategy that respects load balancing, data conflicts and other memory issues (such as cache false sharing),
    often prepare a recollection phase to compute the final result (probably a fully sequential computation); see the sketch below.

Data driven decomposition scales pretty well: as the data set grows, partitioning becomes more efficient compared to sequential or task based approaches.
Care must be taken when choosing the data partitioning in order to obtain maximal performance.
Too much partitioning, or too small a data set, will probably induce higher overhead.
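A minimal sketch of this strategy (illustrative, assuming C++11 std::thread; parallel_sum and its partitioning scheme are not from the slides): identical tasks on partitions, then a sequential recollection phase.

    #include <numeric>
    #include <thread>
    #include <vector>

    long parallel_sum(const std::vector<int>& data, int nthreads) {
        std::vector<long> partial(nthreads, 0);  // one slot per task
        std::vector<std::thread> pool;
        std::size_t chunk = data.size() / nthreads;
        for (int t = 0; t < nthreads; ++t) {
            auto first = data.begin() + t * chunk;
            auto last  = (t == nthreads - 1) ? data.end() : first + chunk;
            // every task performs the same operation on its partition
            pool.emplace_back([first, last, t, &partial] {
                partial[t] = std::accumulate(first, last, 0L);
            });
        }
        for (auto& th : pool) th.join();
        // recollection phase: fully sequential
        // (note: adjacent partial[] slots may false-share a cache
        // line; see the CPU cache section later in this lecture)
        return std::accumulate(partial.begin(), partial.end(), 0L);
    }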


Data Flow Driven Decomposition

In order to perform data flow driven decomposition, you need to:

    split the activities along the flow of execution in order to identify tasks,
    model the data exchanges between tasks,
    choose a data partitioning (the flow's grain) strategy.

With respect to Ford's concept of the production line, the execution time for a chunk of data corresponds to the execution time of the longest task; the global time is thus this execution time multiplied by the number of chunks, plus two times the cost of the whole line (a reading of this estimate appears below).
This approach needs careful design and conception, data channels and efficient data partitioning; it is probably the most complex (but the most realistic) approach.
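Reading that timing claim as a formula gives the following sketch (an interpretation, not a formula from the original slides):

    // With k chunks, s the execution time of the longest stage, and
    // L the cost of traversing the whole line (paid once to fill the
    // pipeline and once to drain it), the estimated global time is:
    double pipeline_time(int k, double s, double L) {
        return k * s + 2.0 * L;
    }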


The Choice of The Pragmatic Programmer

The data driven approach yields pretty good results when performing single operations on huge sets of data.
The task driven approach is better suited to concurrency issues (performing various activities in parallel to mask waiting time).
The data flow driven approach often offers a skeleton, together with specific data or task driven decompositions for particular tasks in the global flow.
Data flow driven decomposition requires complex infrastructure but provides a more adaptable solution for a complete chain of activities.
Of course, in realistic situations, we'll often choose a middle-ground decomposition mixing the various approaches.


Strategies and Patterns

There are various design patterns (we'll see some later) dedicated to parallel programming; here are some examples:

    Divide and Conquer: data decomposition (divide) and recollection (conquer)
    Pipeline: data flow decomposition
    Wave Front: parallel topological traversal of a graph of tasks
    Geometric Decomposition: divide the data set into rectangles


Foundations


Task Systems


Tasks

We will describe parallel programs using a notion of task.
A task T is an instruction in our program. For the sake of clarity, we will limit our study to tasks of the form:

    T : VAR = EXPR

where VAR is a memory location (it can be seen as a variable) and EXPR is a usual expression built from variables, constants and basic operators, but no function calls.
A task T can be represented by two sets of memory locations (or variables): IN(T), the set of memory locations used as input, and OUT(T), the set of memory locations affected by T.
IN(T) and OUT(T) can, by themselves, be seen as elementary tasks (reading and writing values). Thus our finest grain description of a program execution will be a sequence of IN() and OUT() tasks.


Tasks (examples)

Example:

Let P1 be a simple sequential program; we present it here using tasks and their memory location sets:

    T1 : x = 1            IN(T1) = ∅         OUT(T1) = {x}
    T2 : y = 5            IN(T2) = ∅         OUT(T2) = {y}
    T3 : z = x + y        IN(T3) = {x, y}    OUT(T3) = {z}
    T4 : w = |x - y|      IN(T4) = {x, y}    OUT(T4) = {w}
    T5 : r = (z + w)/2    IN(T5) = {z, w}    OUT(T5) = {r}


Execution and Scheduling

Given two sequential programs (lists of tasks), a parallel execution is a list of tasks resulting from the composition of the two programs.
Since we do not control the scheduler, the only constraint on an execution is the preservation of the order between tasks of the same program.
The scheduler does not understand our notion of task; it works at the assembly instruction level, and thus we can assume that a task T can be interleaved with another task between the realisation of the subtask IN(T) and the realisation of the subtask OUT(T).
As for tasks, the only preserved order is that IN(T) always appears before OUT(T).
Finally, an execution can be modeled by an ordered sequence of input and output sets of memory locations.


Execution (example)

Example:

Given the two programs P1 and P2:

    P1:  T11 : x = 1         P2:  T21 : y = 1
         T12 : y = x + 1          T22 : x = y - 1

The following sequences are valid parallel executions of P1//P2:

    E1 = IN(T11); OUT(T11); IN(T12); OUT(T12); IN(T21); OUT(T21); IN(T22); OUT(T22)
    E2 = IN(T21); OUT(T21); IN(T22); OUT(T22); IN(T11); OUT(T11); IN(T12); OUT(T12)
    E3 = IN(T11); IN(T21); OUT(T11); OUT(T21); IN(T12); IN(T22); OUT(T22); OUT(T12)

At the end of each execution we can observe the values in both memory locations x and y:

    E1: x = 0 and y = 1
    E2: x = 1 and y = 2
    E3: x = 0 and y = 2


The issue!

In the previous example, it is obvious that two different executions of the same parallel program may give different results.
In sequential programming, given fixed inputs, a program's executions always give the same result.
Normally, programs and algorithms are supposed to be deterministic; with parallelism, this is obviously not always the case!


Program Determinism


Tasks’ Dependencies

In order to completely describe parallel programs and parallel executions of programs, we introduce a notion of dependencies between tasks.
Let E be a set of tasks and (<) a well-founded dependency order on E.
A pair of tasks T1 and T2 verifies T1 < T2 if the sub-task OUT(T1) must occur before the sub-task IN(T2).
A Task System (E, <) is the definition of a set E of tasks and a dependency order (<) on E. It describes a combination of several sequential programs into a parallel program (or a fully sequential program if (<) is total). Tasks of the same sequential program have a natural ordering, but we can also define orderings between tasks of different programs, or between programs.


Task Language

Let E = {T1, ..., Tn} be a set of tasks, A = {IN(T1), ..., OUT(Tn)} a vocabulary based on the sub-tasks of E, and (<) an ordering relation on E.
The language associated with a task system S = (E, <), noted L(S), is the set of words ω over the vocabulary A such that for every Ti in E there is exactly one occurrence of IN(Ti) and one occurrence of OUT(Ti), with the former appearing before the latter. If Ti < Tj, then OUT(Ti) must appear before IN(Tj).

We can define the product of systems S1 and S2, noted S1 × S2, such that L(S1 × S2) = L(S1).L(S2) (_._ is language concatenation).
We can also define the parallel combination of task systems: S1//S2 = (E1 ∪ E2, <1 ∪ <2) (where E1 ∩ E2 = ∅).


Tasks’ Dependencies and Graph

The relation (<) can sometimes be represented using a directed graph (and thus graph visualization methods).
In order to avoid overall complexity, we use the smallest relation (<min) with the same transitive closure as (<), rather than (<) directly.
Such a graph is of course directed and without cycles. Vertices are tasks, and an edge from T1 to T2 implies that T1 < T2.
This graph is often called a Precedence Graph.


Precedence Graph (example)

    {
      T1;
      {//
        { T2; T3; T4 };
        { T5; T6; T7 };
      //};
      T8;
    }

[Precedence graph: T1 → T2 → T3 → T4 → T8 and T1 → T5 → T6 → T7 → T8]

If we define S1 = {T1}, S2 = {T2 T3 T4}, S3 = {T5 T6 T7} and S4 = {T8}, then the resulting system (described by the graph above) is:

    S = S1 × (S2//S3) × S4


Notion of Determinism

Deterministic System

A deterministic task system S = (E, <) is such that for every pair of words ω and ω′ of L(S), and for every memory location X, the sequences of values affected to X are the same for ω and ω′.

A deterministic system is a task system in which the possible executions cannot be distinguished by only observing the evolution of the values in memory locations (an observational equivalence, a kind of bisimulation).


Determinism

The previous definition may seem too restrictive to be useful.
In fact, one can exclude local memory locations (i.e. memory locations not shared with other programs) from the observational property.
In short, the deterministic behavior can be limited to a restricted set of meaningful memory locations, excluding temporary locations used for inner computations.
The real issue here is the provability of the deterministic behavior: one cannot possibly test every execution path of a given system.
We need a finite property independent of the scheduling (i.e. a property relying only on the system).


Non-Interference

Non-Interference (NI) is a general property used in many contexts (especially language-level security).
Two tasks are non-interfering if and only if the values taken by memory locations do not depend on the order of execution of the two tasks.

Non-Interference
Let S = (E, <) be a task system and T1 and T2 two tasks of E; then T1 and T2 are non-interfering if and only if they verify one of the two following properties:

    T1 < T2 or T2 < T1 (the system forces a particular order);
    IN(T1) ∩ OUT(T2) = IN(T2) ∩ OUT(T1) = OUT(T1) ∩ OUT(T2) = ∅.
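These conditions are mechanical enough to check in code; a small illustrative sketch (the names Locs and non_interfering are hypothetical):

    #include <set>
    #include <string>

    using Locs = std::set<std::string>;  // a set of memory locations

    static bool disjoint(const Locs& a, const Locs& b) {
        for (const auto& x : a)
            if (b.count(x)) return false;
        return true;
    }

    // Second NI property: with no ordering between T1 and T2, the
    // tasks are non-interfering iff all three intersections are empty.
    bool non_interfering(const Locs& in1, const Locs& out1,
                         const Locs& in2, const Locs& out2) {
        return disjoint(in1, out2) && disjoint(in2, out1) &&
               disjoint(out1, out2);
    }

On the earlier example, non_interfering({"x","y"}, {"z"}, {"x","y"}, {"w"}) returns true: T3 and T4 may safely run in parallel.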


NI and determinism

The NI definition is based on the contraposition of Bernstein's conditions (which define when two tasks are dependent).
Obviously, two non-interfering tasks do not introduce non-deterministic behavior in a system (they are either already ordered, or the order of their execution is not relevant).

Theorem
Let S = (E, <) be a task system. S is a deterministic system if every pair of tasks in E is non-interfering.


Equivalent Systems

We now extend our use of observational equivalence to compare systems.
The idea is that we cannot distinguish two systems that have the same behavior (that affect the same sequences of values to a particular set of memory locations).

Equivalent Systems

Let S1 = (E1, <1) and S2 = (E2, <2) be two task systems. S1 and S2 are equivalent if and only if:

    E1 = E2;
    S1 and S2 are deterministic;
    for all words ω1 ∈ L(S1) and ω2 ∈ L(S2), and for every (meaningful) memory location X, ω1 and ω2 affect the same sequence of values to X.


Maximal Parallelism


Maximal Parallelism

Now that we can define and verify the determinism of task systems, we need to be able to ensure a kind of maximal parallelism.
Maximal parallelism describes the minimal sequentiality and ordering needed to stay deterministic.
A system with maximal parallelism cannot be made more parallel without introducing non-deterministic behavior (and thus inconsistency).
Being able to build maximally parallel systems (or to transform systems into them) guarantees that a parallel-friendly computer is used at its maximum capacity for our given solution.


Maximal Parallelism

Maximal Parallelism
A task system with maximal parallelism is a system in which one cannot remove the dependency between two tasks T1 and T2 without introducing interference between T1 and T2.

Theorem
For every deterministic system S = (E, <) there exists an equivalent system with maximal parallelism Smax = (E, <max), with (<max) defined as:

    T1 <max T2  if  T1 <C T2
                    ∧ OUT(T1) ≠ ∅ ∧ OUT(T2) ≠ ∅
                    ∧ (  IN(T1) ∩ OUT(T2) ≠ ∅
                       ∨ IN(T2) ∩ OUT(T1) ≠ ∅
                       ∨ OUT(T1) ∩ OUT(T2) ≠ ∅ )


Usage of Maximal Parallelism

Given a graph representing a system, one can reason about parallelism and performance.
Given (hypothetical) unbounded hardware parallelism, the complexity of a parallel system is the length of the longest path in the graph from initial tasks (tasks with no predecessor) to final tasks (tasks with no successor); a small sketch of this computation follows below.
Classical analysis of dependency graphs can be used to spot critical tasks (tasks that cannot be late without slowing the whole process) or to find good planned executions for non-parallel hardware.
Task systems and maximal parallelism can be used to prove models of parallel implementations of sequential programs.
Maximal parallelism can also be used to effectively measure the real gain of a parallel implementation.
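As an illustration (not from the slides), the longest-path depth is a simple computation once tasks are numbered in topological order:

    #include <algorithm>
    #include <vector>

    // adj[i] lists the successors of task i; tasks are assumed to be
    // numbered in topological order (every edge goes to a higher
    // index). Returns the number of tasks on the longest path: with
    // unbounded processors, this is the parallel execution depth.
    int critical_path_length(const std::vector<std::vector<int>>& adj) {
        if (adj.empty()) return 0;
        int n = static_cast<int>(adj.size());
        std::vector<int> depth(n, 1);
        for (int u = n - 1; u >= 0; --u)
            for (int v : adj[u])
                depth[u] = std::max(depth[u], 1 + depth[v]);
        return *std::max_element(depth.begin(), depth.end());
    }

On the precedence graph of the earlier example (T1, then T2 T3 T4 in parallel with T5 T6 T7, then T8), this returns 5.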


Interacting with CPU Cache


Cache: Hidden Parallelism Nightmare

Modern CPUs rely on memory caches to prevent a memory access bottleneck.
In an SMP architecture, caches are mandatory: there is only one memory bus! (NUMA architectures try to solve this.)
Access to shared data induces memory locking and cache updates (and thus waiting time for your core).
Even when data is not explicitly shared, cache management can become your worst enemy!


Cache Management and SMP

This part is Intel/x86 oriented; technical details may not correspond to other processors.

Each cache line has a special state: Invalid (I), Shared (S), Exclusive (E), Modified (M).
Cache lines in state S are shared among cores and only used for reading.
Cache lines in state E or M are owned by only one core.
When a core tries to write to some memory location in a cache line, it forces any other core to lose the cache line (putting it in state I, for example).
The cache mechanism works as a read/write lock that tracks modifications and consistency between the values in the cache line and the values in memory.
The pipeline (and somehow the compiler) tries to anticipate write needs so cache lines are directly acquired in the E state (fetch for write).


False Sharing



Be nice with cache

There is no standard way to control cache interaction, even using assembly code, and the issues described previously may or may not happen in various contexts. Rather than providing a single solution, we must rely on guidelines to prevent cache false sharing:

    avoid shared data as much as possible;
    prefer threads' local storage (local variables, locally allocated buffers ...);
    when returning a set of values, allocate one container per working thread;
    copy shared data before using it;
    when possible, use a thread-oriented allocator (a modern allocator will work with separate pools of memory per unit rather than one big pool).

A padding-based variant of these guidelines is sketched below.
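Per-thread containers can also be kept apart by alignment; a sketch, assuming 64-byte cache lines (alignas is C++11, and the worker run is illustrative):

    // Each slot is forced onto its own (assumed 64-byte) cache line,
    // so concurrent writes to results[0] and results[1] cannot
    // false-share.
    struct alignas(64) Slot {
        long value;
    };

    Slot results[2];

    void* run(void* arg) {             // hypothetical worker
        Slot* s = static_cast<Slot*>(arg);
        for (int i = 0; i < 1000000; ++i)
            s->value++;                // stays within this slot's line
        return nullptr;
    }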


Be nice with cache

Example:

    int main() {
      // Bad style: res[0] and res[1] sit in the same cache line
      int res[2];
      pthread_t t[2];
      pthread_create(t, NULL, run, res);
      pthread_create(t + 1, NULL, run, res + 1);
      pthread_join(t[0], NULL);
      pthread_join(t[1], NULL);
      // use the result
      return 0;
    }

Example:

    int main() {
      // Better style: one separate allocation per thread
      int *r0, *r1;
      pthread_t t[2];
      r0 = malloc(sizeof (int));
      r1 = malloc(sizeof (int));
      pthread_create(t, NULL, run, r0);
      pthread_create(t + 1, NULL, run, r1);
      pthread_join(t[0], NULL);
      pthread_join(t[1], NULL);
      // use the result
      return 0;
    }


Be nice with cache

Example:

    int main() {
      // Even better: provide no containers at all
      void *res[2];
      pthread_t t[2];
      pthread_create(t, NULL, run, NULL);
      pthread_create(t + 1, NULL, run, NULL);
      // let each thread allocate its result
      // and collect it with join
      pthread_join(t[0], res);
      pthread_join(t[1], res + 1);
      return 0;
    }


Some stats ...

[Plot: running time in seconds (0 to 800) against input size (1024 to 131072) for the Bad and Good versions]


Some stats ...

Input (n)    Bad Practice    Good Practice    Difference
1024         0.0446          0.0028           0.0418
2048         0.1878          0.0107           0.1771
4096         0.7140          0.0423           0.6717
8192         2.7296          0.1688           2.5608
16384        7.4191          0.6817           6.7374
65536        122.0350        10.7911          111.2439
131072       682.3750        43.1707          639.2043

The bad code takes more than 90% longer, due to false sharing.

Both codes do heavy computation based on n; the bad version shares an array between threads for reading and writing, while the other copies the input value and allocates its own container. Time results are in seconds, measured using tbb::tick_count. The main program runs 4 threads on 2 dual-core processors (no hyperthreading) under FreeBSD.


Discussion

False sharing can induce an important penalty (as shown by the previous example).
In the presence of more than 2 CPUs (on the same die or not), multiple levels of cache can amplify the penalty (this is our case here).
Most dual-cores share their L2 cache, but on a quad-core only pairs of cores share it.
With multiple processors (as in the experiment), there is probably no shared cache at all: no hope of avoiding a full memory access!
The same issue can be even worse when sharing arrays of locks!


Memory Fence


Don’t Trust The Evidence

Modern processors are able to somewhat modify the execution order.
On a multi-processor platform, this means that the apparent ordering may not be respected at execution level (in fact, your compiler is doing the same).

From Intel's TBB Documentation
"Another mistake is to assume that conditionally executed code cannot happen before the condition is tested. However, the compiler or hardware may speculatively hoist the conditional code above the condition."
"Similarly, it is a mistake to assume that a processor cannot read the target of a pointer before reading the pointer. A modern processor does not read individual values from main memory. It reads cache lines ..."


Trap

Example:

    bool Ready;
    std::string Message;

    // Thread 1 action
    void Send( const std::string& src ) {
      Message = src;   // C++ hidden memcpy
      Ready = true;
    }

    // Thread 2 action
    bool Receive( std::string& dst ) {
      bool result = Ready;
      if( result ) dst = Message;   // C++ hidden memcpy
      return result;
    }


Trap (2)

In the previous example, there is no guarantee that the string has effectively been copied when the second thread sees the flag become true.
Marking the flag volatile won't change anything (it doesn't induce a memory fence).
Compilers and processors try to optimize memory operations and often use prefetching or similar techniques.
Prefer locks, atomic types or condition variables.
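One way to repair the Send/Receive example is sketched below with C++11 atomics (the slides predate this API, so take it as one possible fix, not the lecture's own):

    #include <atomic>
    #include <string>

    std::atomic<bool> Ready(false);
    std::string Message;

    void Send( const std::string& src ) {
      Message = src;                                 // plain write ...
      Ready.store(true, std::memory_order_release);  // ... published by the release store
    }

    bool Receive( std::string& dst ) {
      // if the acquire load sees true, the write to Message is
      // guaranteed to be visible as well
      bool result = Ready.load(std::memory_order_acquire);
      if( result ) dst = Message;
      return result;
    }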


Mutual Exclusion


Classic Problem: Shared Counter


Sharing a counter

Sharing a counter between two threads (or processes) is a good seminal example for understanding the complexity of synchronisation.
The problem is quite simple: two threads monitor external events; when an event occurs, they increase a global counter.
Increasing a counter X is a simple task of the form:

T1 : X = X + 1

With associated sets:

IN(T1) = {X}
OUT(T1) = {X}

The two threads execute the same task T1 (along with their monitoring activity), and thus they are interfering.


Pseudo-Code for Counter Sharing

Example:

global int X = 0;

guardian(int id):
    for (;;)
        wait_event(id);  // wait for an event
        X = X + 1;       // T1

main:
    {//  parallel execution
        guardian(0);
        guardian(1);
    //}
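To make the interference observable, here is a minimal C++ sketch of my own (the event loop is replaced by a plain counted loop, which the slides do not do): two threads run T1 repeatedly on a shared X and updates get lost.

#include <iostream>
#include <thread>

int X = 0;  // shared counter, deliberately unsynchronised

void guardian(int iterations) {
    for (int i = 0; i < iterations; ++i)
        X = X + 1;  // T1: a non-atomic read-modify-write (a data race)
}

int main() {
    std::thread t0(guardian, 1000000);
    std::thread t1(guardian, 1000000);
    t0.join();
    t1.join();
    // Expected 2000000; lost updates (and, formally, undefined
    // behaviour from the data race) typically print less.
    std::cout << X << std::endl;
    return 0;
}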

Critical Section and Mutual Exclusion


Critical Section

In the previous example, while we can easily see the interference issue, no task ordering can solve it.
We need another refinement to enforce the consistency of our program.
The critical task (T1) is called a critical section.

Critical Section
A section of code is said to be a critical section if execution of this section cannot be interrupted by another process manipulating the same shared data without loss of consistency or determinism.

The overlapping portion of each process, where the shared variables are being accessed.


Critical Section

Example:

guardian(int id):
    // Remainder Section
    for (;;)
        wait_event(id);
        // Entering Section (empty here)
        X = X + 1;  // Critical Section
        // Leaving Section (empty here)

Remainder Section: section outside of the critical part
Critical Section (CS): section manipulating shared data
Entering Section: code used to enter the CS
Leaving Section: code used to leave the CS
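For comparison, a sketch of mine (not from the slides) of how the four sections map onto a lock-based protocol, assuming C++11 std::mutex; the entering and leaving sections become lock() and unlock():

#include <mutex>

int X = 0;
std::mutex m;  // protects X

void guardian(int id) {
    for (;;) {
        // Remainder section: wait_event(id) would go here.
        m.lock();    // Entering section
        X = X + 1;   // Critical section
        m.unlock();  // Leaving section
    }
}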


Expected Properties of CS

Mutual Exclusion: two processes cannot be in a critical section (concerning the same shared data) at the same time.

Progress: a process operating outside of its critical section cannot prevent other processes from entering theirs.

Bounded Waiting: a process attempting to enter its critical region will be able to do so after a finite time.


Classical Issues with CS

Deadlock: two processes try to enter the CS at the same time and block each other (an error in the entering section.)

Race condition: two processes make an assumption that is invalidated once the execution of the other process finishes (no mutual exclusion.)

Starvation: a process waits for the CS indefinitely.

Priority Inversion: a complex double-blocking situation between processes of different priorities. In short, a high-priority process consumes execution time waiting for the CS (spin waiting), blocking a low-priority process already in the CS.

Solutions with no locks


Bad Solution 1

Example:

global int X = 0;
global int turn = 0;

guardian(int id):
    int other = (id+1)%2;
    for (;;)
        wait_event(id);
        while (turn != id);
        X = X + 1;
        turn = other;


Bad Solution 1

This solution enforces mutual exclusion: turn cannot have two different values at the same time.
This solution enforces bounded waiting: the other thread can pass at most once while you are waiting.
This solution does not respect progress:

You have to wait until the other thread has passed before entering the CS (even if it arrived after you.)
If the other thread sees no event, it will never go through the CS and will never give you your turn!


Bad Solution 2

Example:

global int X = 0;
global int ASK[2] = {0, 0};

guardian(int id):
    int other = (id+1)%2;
    for (;;)
        wait_event(id);
        ASK[id] = 1;
        while (ASK[other]);
        X = X + 1;
        ASK[id] = 0;


Bad Solution 2

This solution enforces mutual exclusion: each thread raises its flag before testing the other's, so the two threads can never be in the CS at the same time.

This solution respects progress.

This solution presents a deadlock:
When asking for the CS, each thread sets its flag and then waits if necessary.
Both threads can set their flags simultaneously.
Thus, both threads wait for each other with no escape possibility.


Bad Solution 3

Example:

global int X = 0;
global int ASK[2] = {0, 0};

guardian(int id):
    int other = (id+1)%2;
    for (;;)
        wait_event(id);
        while (ASK[other]);
        ASK[id] = 1;
        X = X + 1;
        ASK[id] = 0;


Bad Solution 3

This tiny modification removes the deadlock of the previous solution.

But this solution presents a race condition; mutual exclusion is violated:

When entering the CS, a thread first waits and then sets its flag.
Both threads can enter the waiting loop before the other one has set its flag, and then just pass.
Both threads can thus enter the CS: lost game!


Peterson's Algorithm

Example:

global int X = 0;
global int turn = 0;
global int ASK[2] = {0, 0};

guardian(int id):
    int other = (id+1)%2;
    for (;;)
        wait_event(id);
        ASK[id] = 1;
        turn = other;
        while (turn != id && ASK[other]);
        X = X + 1;
        ASK[id] = 0;


Peterson's Algorithm

The previous algorithm satisfies mutual exclusion, progress and bounded waiting.

The solution is limited to two processes but can be generalized to any number of processes.

This solution is hardware/system independent.

The main issue is the spin wait: a process waiting for the CS consumes time resources, opening a risk of priority inversion.
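A C++ translation of mine (not from the slides), written with std::atomic because, as the memory-fence section warned, a naive int-based version can be broken by compiler and hardware reordering; the default sequentially consistent operations are kept deliberately, since Peterson's algorithm is not correct under weaker release/acquire ordering:

#include <atomic>

std::atomic<int> turn{0};
std::atomic<bool> ask[2] = {{false}, {false}};
int X = 0;  // protected by the Peterson lock below

void lock(int id) {
    int other = 1 - id;
    ask[id].store(true);  // announce intent (seq_cst by default)
    turn.store(other);    // politely give the other thread priority
    // Spin while the other thread is asking and it is its turn.
    while (ask[other].load() && turn.load() == other)
        ;
}

void unlock(int id) {
    ask[id].store(false);
}

void guardian(int id) {
    for (;;) {
        // wait_event(id) would go here.
        lock(id);
        X = X + 1;  // critical section
        unlock(id);
    }
}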


QUESTIONS ?

Definitions


Tasks’ Dependencies

Dependency Ordering Relation
A dependency ordering relation is a partial order which verifies:

anti-symmetry (T1 < T2 and T2 < T1 cannot both be true)
anti-reflexivity (we cannot have T < T)
transitivity (if T1 < T2 and T2 < T3 then T1 < T3).

Task Language

Let E = {T1, . . . , Tn} be a set of tasks, A = {IN(T1), . . . , OUT(Tn)} a vocabulary built from the sub-tasks of E, and (<) an ordering relation on E.
The language associated with a task system S = (E, <), noted L(S), is the set of words ω over the vocabulary A such that for every Ti in E there is exactly one occurrence of IN(Ti) and one occurrence of OUT(Ti), with the former appearing before the latter. If Ti < Tj then OUT(Ti) must appear before IN(Tj).
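A small worked example of my own, not on the slides: take E = {T1, T2} with T1 < T2. Then IN(T1) OUT(T1) IN(T2) OUT(T2) is a word of L(S), while IN(T1) IN(T2) OUT(T1) OUT(T2) is not, since OUT(T1) must precede IN(T2). If T1 and T2 were unordered, both words would belong to L(S): the language describes every legal interleaving of the tasks.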


Tasks’ Dependencies and Graph

Transitive Closure
The transitive closure of a relation (<) is the relation (<C) defined by:

x <C y if and only if x < y, or there exists z such that x <C z and z <C y.

This is the largest relation that can be obtained from (<) by adding pairs by transitivity only.


Tasks’ Dependencies and Graph

Equivalent Relation
Let (<) be a well-founded partial order. A relation (<eq) is said to be equivalent to (<) if and only if (<eq) has the same transitive closure as (<).

Kernel (<min)
The kernel (<min) of a relation (<) is the smallest relation equivalent to (<): if we suppress any pair x <min y from it, the relation is no longer equivalent.


Notion of Determinism

Deterministic System
A deterministic task system S = (E, <) is such that for every pair of words ω and ω′ of L(S) and for every memory location X, the sequences of values assigned to X are the same for ω and ω′.

A deterministic system is a task system whose possible executions cannot be distinguished by only observing the evolution of values in memory locations (an observational equivalence, a kind of bisimulation.)
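Applying this to the shared counter (a worked example of mine, not on the slides): let Ta and Tb both perform X = X + 1 with IN = OUT = {X} and no order between them, starting from X = 0. The word IN(Ta) OUT(Ta) IN(Tb) OUT(Tb) assigns the sequence 1, 2 to X, whereas IN(Ta) IN(Tb) OUT(Ta) OUT(Tb) assigns 1, 1 (both tasks read 0). Two words of L(S) give different value sequences for X, so the system is not deterministic.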


Non-Interference

Non-Interference
Let S = (E, <) be a task system and T1, T2 two tasks of E. T1 and T2 are non-interfering if and only if they verify one of the two following properties:

T1 < T2 or T2 < T1 (the system forces a particular order.)
IN(T1) ∩ OUT(T2) = IN(T2) ∩ OUT(T1) = OUT(T1) ∩ OUT(T2) = ∅
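Checking these conditions on the counter tasks (my application, not on the slides): Ta and Tb are unordered, and OUT(Ta) ∩ OUT(Tb) = {X} ≠ ∅ (the IN/OUT intersections are {X} as well), so the second property fails and the tasks interfere. This is exactly why the shared counter needs a critical section. The set conditions in the second property are commonly known as Bernstein's conditions.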

Equivalent Systems

Let S1 = (E1, <1) and S2 = (E2, <2) be two task systems. S1 and S2 are equivalent if and only if:

E1 = E2
S1 and S2 are deterministic
For all words ω1 ∈ L(S1) and ω2 ∈ L(S2), and for every (meaningful) memory location X, ω1 and ω2 assign the same sequence of values to X.


Maximal Parallelism

Maximal Parallelism
A task system has maximal parallelism when one cannot remove a dependency between two tasks T1 and T2 without introducing interference between T1 and T2.

Theorem
For every deterministic system S = (E, <) there exists an equivalent system with maximal parallelism Smax = (E, <max), with (<max) defined as:

T1 <max T2  if
    T1 <C T2
    ∧ OUT(T1) ≠ ∅ ∧ OUT(T2) ≠ ∅
    ∧ (   IN(T1) ∩ OUT(T2) ≠ ∅
        ∨ IN(T2) ∩ OUT(T1) ≠ ∅
        ∨ OUT(T1) ∩ OUT(T2) ≠ ∅ )
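A small worked application (my example, not on the slides): take T1 : X = 1 and T2 : Y = 2 with the unnecessarily strict order T1 < T2. Here IN(T1) = IN(T2) = ∅, OUT(T1) = {X} and OUT(T2) = {Y}, so all three intersections are empty and T1 <max T2 does not hold: in the maximally parallel equivalent system the two tasks are unordered and may safely run in parallel.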