C++ programming in a parallel world

CPP Europe

Bucharest 2020

J. Daniel Garcia

ARCOS Group

University Carlos III of Madrid

Spain

February 25th, 2020

CC BY-NC-ND – J. Daniel Garcia – ARCOS@UC3M ([email protected]) – Twitter: @jdgarciauc3m


Warning

This work is under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

You are free to Share: copy and redistribute the material in any medium or format.

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

You may not use the material for commercial purposes.

If you remix, transform, or build upon the material, you may not distribute the modified material.


Who am I?

A C++ programmer.
  Started writing C++ code in 1989.

A university professor in Computer Architecture.
  University Carlos III of Madrid (since 2001).

An ISO C++ language standards committee member.
  AENOR: Spanish Standards National Body.

My goal: Improve applications programming.
  Performance → faster applications.
  Energy efficiency → better performance per Watt.
  Maintainability → easier to modify.
  Reliability → safer components.


ARCOS@uc3m

UC3M: A young, international, research-oriented university.
ARCOS: An applied research group.
  Lines: High Performance Computing, Big Data, Cyber-physical systems, Programming Models for Applications Improvement.

Improving applications:
  REPARA: Reengineering and Enabling Performance and poweR of Applications. FP7-ICT (2013–2016).
  RePhrase: REfactoring Parallel Heterogeneous Resource Aware Applications. H2020-ICT (2015–2018).
  ASPIDE: exAScale ProgrammIng models for extreme Data procEssing. H2020-FET-HPC (2018–2020).

Standardization:
  ISO/IEC JTC1/SC22/WG21. ISO C++ Committee.


Times have changed

1 Times have changed

2 What do you do with multicore?

3 Parallelism in C++17

4 After C++20: Executors

5 What else can I do?

6 Summary


First microprocessor

Intel 4004 (1971).
Application domain: Calculators.
Technology: 10,000 nm.
Data:
  2300 transistors.
  13 mm².
  108 kHz.
  12 Volts.
Features:
  4-bit data.
  Data-path in one cycle.

Intel 4004 photo by Rostislav Lisovy.

Unicom 141P Calculator 3 photo by Michael Holley.


My first computer

Sinclair ZX-Spectrum

Zilog Z80 (1976).
Application domain: Home computers, video consoles.
Technology: 4,000 nm.
Data:
  8500 transistors.
  2.5 MHz.
  5 Volts.
Features:
  8-bit data.

Zilog Z80 photo by Konstantin Lanzet.

ZX Spectrum photo by Bill Bertram.


Last single core

Intel Pentium 4 (2003).
Application domain: Desktop / Servers.
Technology: 90 nm (1/100x).
Data:
  55M transistors (20,000x).
  101 mm² (10x).
  3.4 GHz (10,000x).
  1.2 Volts (1/10x).
Features:
  32/64-bit data (16x).
  Data path with 22 pipeline stages (later 31).
  3-4 instructions per cycle (superscalar).
  Two level cache on chip.
  Data parallel instructions (SIMD).
  Hyper-threading.

Die of Intel Pentium 4 (Northwood). Source: http://gecko54000.free.fr


A typical multicore

Intel Core i7 (2009).
Application: Desktop / Server.
Technology: 45 nm (1/2x).
Data:
  774M transistors (12x).
  296 mm² (3x).
  3.2 GHz – 3.6 GHz (≈1x).
  0.7 – 1.4 Volts (≈1x).
Features:
  128-bit data (2x).
  Datapath with 14-stage pipeline (0.5x).
  4 instructions per cycle (≈1x).
  Three level cache on chip.
  Data parallel instructions (SIMD).
  4 cores (4x) + Hyper-threading.

Die of Intel Core i7 (Nehalem). Source: www.legitreviews.com


What happened?

Source: The free lunch is over. Herb Sutter. http://www.gotw.ca/publications/concurrency-ddj.htm


What do you do with multicore?


Impact of multicores

Increase throughput:
  More transactions per second.
  Mostly concurrent programming.

Increase performance:
  Faster execution of a task.
  Mostly parallel programming.


C++11/14

Focused on providing the concurrency building blocks.
Main features:
  Clear definition of the memory model.
  Support for TLS (thread_local).
  Portable concurrency abstractions:
    std::thread.
    std::mutex, std::timed_mutex, ...
    std::condition_variable, std::condition_variable_any.
    std::unique_lock, ...
    std::promise, std::future, std::packaged_task.
  Low-level portable lock-free abstractions:
    std::atomic.
    std::memory_order.
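As a rough illustration of how these building blocks combine, here is a minimal sketch. The helper names sum_in_worker and count_concurrently are hypothetical and exist only for this example. The first runs a std::packaged_task in a std::thread and reads the result through a std::future. The second lets several threads update one counter under a std::mutex and another through a std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: computes a sum in a worker thread and
// hands the result back through a std::future.
long sum_in_worker(const std::vector<int>& values) {
  std::packaged_task<long()> task([&values] {
    long total = 0;
    for (int x : values) total += x;
    return total;
  });
  std::future<long> result = task.get_future();
  std::thread worker(std::move(task));
  worker.join();
  return result.get();
}

// Hypothetical helper: several threads increment two shared counters,
// one protected by a mutex and one lock-free atomic.
std::pair<long, long> count_concurrently(int n_threads, int increments) {
  assert(n_threads > 0 && increments >= 0);
  long guarded = 0;
  std::mutex m;
  std::atomic<long> lock_free{0};

  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < increments; ++i) {
        {
          std::lock_guard<std::mutex> lock{m};  // serializes access to guarded
          ++guarded;
        }
        lock_free.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) w.join();  // join makes all updates visible here
  return {guarded, lock_free.load()};
}
```

Both counters end up with n_threads * increments; the difference lies in how the update is synchronized, not in the result.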


Where do I get this?


Parallelism in C++17


Introduction

3 Parallelism in C++17
  Introduction
  Execution policies
  Updating global state
  Transformations
  Reductions
  Map/Reduce
  Scans
  More algorithms


Parallel algorithms

Many algorithms in the STL now have a parallel version.
They take a new first argument to specify the execution policy.

// Traditional way -> sequential
std::for_each(v.begin(), v.end(), [](auto & x) { f(x); });

// New parallel
std::for_each(std::execution::par,
              v.begin(), v.end(), [](auto & x) { f(x); });

// New sequential
std::for_each(std::execution::seq,
              v.begin(), v.end(), [](auto & x) { f(x); });


Processing images

#include <algorithm>
#include <execution>
#include <vector>

#include "image.h"

int main() {
  std::vector<image> v = load_images("file.dat");

  // Convert every image to grayscale, possibly in several threads
  std::for_each(std::execution::par,
                v.begin(), v.end(), [](auto & img) { img.to_gray(); });

  store_images("newfile.dat", v);
}


Sorting

Sorting requires many applications of the comparator.
Especially interesting when the comparator is not trivial.

std::vector<customer> v = get_customers();

std::sort(std::execution::par, v.begin(), v.end(),
  [](const auto & e1, const auto & e2) {
    if (e1.name == e2.name) return e1.last < e2.last;
    else return e1.name < e2.name;
  });
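To make the sorting fragment self-contained, the following sketch assumes a hypothetical customer type with two string fields, name and last (the field names follow the comparator above; the real type is not shown in the slides). It expresses the same ordering with std::tie, which compares the fields lexicographically.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical customer type; only the two fields used by the comparator.
struct customer {
  std::string name;
  std::string last;
};

// Orders by name and breaks ties by last, like the lambda on the slide.
bool by_name_then_last(const customer& e1, const customer& e2) {
  return std::tie(e1.name, e1.last) < std::tie(e2.name, e2.last);
}

// Sorts the customers with the parallel execution policy.
void sort_customers(std::vector<customer>& v) {
  std::sort(std::execution::par, v.begin(), v.end(), by_name_then_last);
  assert(std::is_sorted(v.begin(), v.end(), by_name_then_last));
}
```

The comparator only reads its arguments, so it can safely be called from several threads at once.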


Execution policies


Overview of execution policies

std::execution::seq.
  Class std::execution::sequenced_policy.
  Algorithm executes sequentially (single thread).
  May differ from the traditional (policy-free) algorithm.

std::execution::par.
  Class std::execution::parallel_policy.
  Algorithm may execute in multiple threads.
  No vectorization!

std::execution::par_unseq.
  Class std::execution::parallel_unsequenced_policy.
  Algorithm may execute in multiple threads.
  Vectorization allowed!

std::execution::unseq (C++20).
  Class std::execution::unsequenced_policy.
  Algorithm executes in a single thread.
  Vectorization allowed!
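As a small sketch of how the policies are passed around, the hypothetical helpers below (doubled and doubled_checked are names invented for this example) run the same element function under the three C++17 policies and check that the results agree. std::execution::unseq is left out because it requires C++20. The element function takes no locks and touches only its own element, which is what makes it acceptable under par_unseq as well. With GCC's libstdc++, the parallel policies are typically backed by Intel TBB, so linking with -ltbb may be needed.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper: doubles every element of a copy of v
// using the execution policy it is given.
template <class ExecutionPolicy>
std::vector<int> doubled(ExecutionPolicy&& policy, std::vector<int> v) {
  std::for_each(std::forward<ExecutionPolicy>(policy), v.begin(), v.end(),
                [](int& x) { x *= 2; });  // no shared state, no locks
  return v;
}

// Runs the same work under seq, par and par_unseq and checks that
// the three results are identical.
std::vector<int> doubled_checked(const std::vector<int>& v) {
  const auto s = doubled(std::execution::seq, v);
  const auto p = doubled(std::execution::par, v);
  const auto pu = doubled(std::execution::par_unseq, v);
  assert(s == p && p == pu);
  return s;
}
```

The policy changes how the work may be scheduled, not what the algorithm computes, as long as the element function is free of data races.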


Constraints on iterators

Some algorithms in the STL require ranges expressed as input iterators.

template< class InputIt, class T >
typename iterator_traits<InputIt>::difference_type
count( InputIt first, InputIt last, const T &value );

The execution-policy-based overloads require the iterators to be forward iterators.

template< class ExecutionPolicy, class ForwardIt, class T >
typename iterator_traits<ForwardIt>::difference_type
count( ExecutionPolicy&& policy,
       ForwardIt first, ForwardIt last, const T &value );
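One way to see the consequence of this constraint is with std::istream_iterator, which is only an input iterator. The sketch below uses two hypothetical helpers, count_from_stream and count_from_stream_par. The classic std::count reads the stream directly, while the policy-based overload is fed from a std::vector that was filled from the stream first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Counts occurrences of value among whitespace-separated integers.
// std::istream_iterator is single pass (input iterator), which is
// enough for the classic overload.
std::ptrdiff_t count_from_stream(const std::string& text, int value) {
  std::istringstream in{text};
  return std::count(std::istream_iterator<int>{in},
                    std::istream_iterator<int>{}, value);
}

// The policy-based overload needs forward iterators, so the data is
// first materialized in a std::vector.
std::ptrdiff_t count_from_stream_par(const std::string& text, int value) {
  std::istringstream in{text};
  const std::vector<int> data(std::istream_iterator<int>{in},
                              std::istream_iterator<int>{});
  assert(in.eof());  // the whole input was parsed as integers
  return std::count(std::execution::par, data.begin(), data.end(), value);
}
```

Copying into a container costs memory and a sequential pass, so this only pays off when the parallel work per element is large enough.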


Changes in algorithms interface

Some algorithms have changed their return types.

Without execution policy.
  Returns the function object.

template <class InputIt, class UnaryFunction>
constexpr UnaryFunction
for_each( InputIt first, InputIt last, UnaryFunction f );

With execution policy.
  Does not return any value.

template <class ExecutionPolicy, class ForwardIt, class UnaryFunction2>
void
for_each( ExecutionPolicy&& policy,
          ForwardIt first, ForwardIt last, UnaryFunction2 f );
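The difference matters for code that keeps state inside the function object. The sketch below, built around a hypothetical summer functor, reads the accumulated state from the value returned by the classic overload. For the policy-based overload it moves the state outside the callable, here into a std::atomic, which is one possible choice among several.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical function object that accumulates a sum in its own state.
struct summer {
  long total = 0;
  void operator()(int x) { total += x; }
};

// Classic overload: the function object is returned, so its state survives.
long sum_with_returned_functor(const std::vector<int>& v) {
  const summer result = std::for_each(v.begin(), v.end(), summer{});
  return result.total;
}

// Policy overload: returns void, so the result must live outside the
// callable. The atomic makes concurrent updates safe.
long sum_with_external_state(const std::vector<int>& v) {
  std::atomic<long> total{0};
  std::for_each(std::execution::par, v.begin(), v.end(),
                [&total](int x) { total.fetch_add(x); });
  return total.load();
}

// Computes the sum both ways and checks that the results agree.
long sum_both_ways(const std::vector<int>& v) {
  const long a = sum_with_returned_functor(v);
  const long b = sum_with_external_state(v);
  assert(a == b);
  return a;
}
```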


What about exceptions?

In algorithms without an execution policy, exceptions can be thrown.

std::for_each(v.begin(), v.end(),
  [](auto & x) {
    if (valid(x)) f(x);
    else throw invalid_value{x}; // Throws exception
  });

In algorithms with an execution policy, an exception that escapes the element function results in a call to std::terminate.

std::for_each(std::execution::seq, v.begin(), v.end(),
  [](auto & x) {
    if (valid(x)) f(x);
    else throw invalid_value{x}; // Invokes std::terminate
  });
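One possible way to keep a policy-based algorithm from terminating the program is to handle the exception inside the element function and report the failure afterwards. The sketch below is only an illustration under that assumption; checked_sqrt and sqrt_all are hypothetical names, and the failure is recorded in a std::atomic<bool>.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <execution>
#include <stdexcept>
#include <vector>

// Hypothetical element operation that reports errors by throwing.
double checked_sqrt(double x) {
  if (x < 0.0) throw std::domain_error{"negative value"};
  return std::sqrt(x);
}

// Applies checked_sqrt to every element in parallel.
// Returns true on success, false if any element was rejected.
bool sqrt_all(std::vector<double>& v) {
  std::atomic<bool> failed{false};
  std::for_each(std::execution::par, v.begin(), v.end(),
                [&failed](double& x) {
                  try {
                    x = checked_sqrt(x);
                  } catch (const std::domain_error&) {
                    // Handled here: the exception never reaches the algorithm.
                    failed.store(true);
                  }
                });
  // On success no negative value can remain in the range.
  assert(failed.load() ||
         std::none_of(v.begin(), v.end(), [](double x) { return x < 0.0; }));
  return !failed.load();
}
```

Rejected elements are left unchanged, and the caller decides what to do once the algorithm has finished, for example throwing from the calling thread.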


When to avoid execution policies

Using input or output iterators.

Avoiding calls to std::terminate on exceptions.

Avoiding side effects on use of elements.

Making use of return values (e.g. std::for_each()).


Updating global state


Counting valid elements

long count = 0;
std::vector<double> v = get_values();

std::for_each(std::execution::par,
  v.begin(), v.end(),
  [&](double x) {
    if (x>0) count++; // Data race: unsynchronized update from several threads
  }
);

std::cout << "Count= " << count << "\n";


Solving the data race: mutexes

long count = 0;
std::mutex m;
std::vector<double> v = get_values();

std::for_each(std::execution::par,
  v.begin(), v.end(),
  [&](double x) {
    if (x>0) {
      std::lock_guard<std::mutex> l{m};
      count++;
    }
  }
);

std::cout << "Count= " << count << "\n";


Solving the data race: atomics

std::atomic<long> count = 0;
std::vector<double> v = get_values();

std::for_each(std::execution::par,
  v.begin(), v.end(),
  [&](double x) {
    if (x>0) count++;
  }
);

std::cout << "Count= " << count << "\n";


Or even better

std::vector<double> v = get_values();

long count = std::count_if(std::execution::par,
  v.begin(), v.end(),
  [](double x) {
    return x>0;
  }
);

std::cout << "Count= " << count << "\n";


Remember

Accessing global state from algorithms may result in data races.
  Using mutexes may be a heavyweight solution.
  Atomics have limited applicability.

There might be a better algorithm.
  A std::for_each() call may be a code smell.


Transformations

3 Parallelism in C++17IntroductionExecution policiesUpdating global stateTransformationsReductionsMap/ReduceScansMore algorithms

cbed – J. Daniel Garcia – ARCOS@UC3M ([email protected]) – Twitter: @jdgarciauc3m 32/89

Page 49: C++ prgramming in a parallel world€¦ · C++ prgramming in a parallel world C++ prgramming in a parallel world CPP Europe Bucharest 2020 J. Daniel Garcia ARCOS Group University

C++ prgramming in a parallel world

Parallelism in C++17

Transformations

The map-pattern

A well-known pattern in functional programming.
Apply an operation to every element in a data set to generate a new data set.


Squaring values

std::vector<double> square(const std::vector<double> & v)
{
  std::vector<double> r(v.size());

  std::transform(std::execution::par,
    v.begin(), v.end(), r.begin(),
    [](double x) { return x*x; }
  );

  return r;
}


Adding vectors

std::vector<double> add(const std::vector<double> & v,
                        const std::vector<double> & w)
{
  std::vector<double> r(v.size());

  // w must hold at least v.size() elements.
  std::transform(std::execution::par,
    v.begin(), v.end(), w.begin(), r.begin(),
    [](double x, double y) { return x+y; }
  );

  return r;
}


Heterogeneous transformations

std::vector<std::complex<double>> create_cplx(const std::vector<double> & re,
                                              const std::vector<double> & im)
{
  auto sz = std::min(re.size(), im.size());
  std::vector<std::complex<double>> res(sz);

  // Process only the first sz elements so that res is never overrun.
  std::transform(std::execution::par,
    re.begin(), re.begin() + sz, im.begin(), res.begin(),
    [](double r, double i) -> std::complex<double> {
      return {r, i};
    }
  );

  return res;
}


Reductions

Reduction pattern

A reduction combines all elements in a data set into a single value, for example their sum.

Note: std::reduce looks quite similar to std::accumulate on the surface.

The result is not deterministic unless the reduction operation is both associative and commutative.
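Why grouping matters can be seen with three doubles. The sketch below assumes IEEE 754 double arithmetic; the helper names sum_left, sum_right and sum_ints are hypothetical. It evaluates the same three-term sum with two groupings, and then uses std::reduce on integers, where addition is associative and commutative (absent overflow) and any grouping is safe.

```cpp
#include <numeric>
#include <vector>

// Groups the three terms as (a + b) + c.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// Groups the three terms as a + (b + c).
double sum_right(double a, double b, double c) { return a + (b + c); }

// Integer addition does not depend on grouping or order (absent overflow),
// so std::reduce is free to regroup the elements.
long sum_ints(const std::vector<long>& v) {
    return std::reduce(v.begin(), v.end(), 0L);
}
```

With a = 1e16, b = -1e16 and c = 1.0 the two groupings disagree, because 1.0 is lost to rounding when it is first added to -1e16. A parallel reduction may pick either grouping, which is where the non-determinism for floating point comes from.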


Add all elements in a vector

void print_add(const std::vector<double> & v)
{
  double r = std::reduce(std::execution::par, v.begin(), v.end());

  std::cout << "sum= " << r << "\n";
}

Initial value is value_type{}.
Binary operation is std::plus<>().


Providing initial value

void print_add(const std::vector<double> & v)
{
  double r = std::reduce(std::execution::par, v.begin(), v.end(), 100.0);

  std::cout << "sum= " << r << "\n";
}

The reduction operation is still std::plus<>().


Providing reduction operator

void print_add(const std::vector<double> & v)
{
  double r = std::reduce(std::execution::par, v.begin(), v.end(), 0.0,
    [](double x, double y) { return x+y; }
  );

  std::cout << "sum= " << r << "\n";
}


Map/Reduce

Map/reduce pattern

A map-reduce pattern combines a map pattern with a reduce pattern over the results of that map.

In C++ it is spelled std::transform_reduce.
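A compact instance of the pattern is the inner product of two vectors. The sketch below is sequential and uses the two-range overload of std::transform_reduce, whose default operations are multiplication for the map step and addition for the reduce step; the function name dot is a hypothetical choice.

```cpp
#include <numeric>
#include <vector>

// Inner product: the map step multiplies element pairs, the reduce step
// adds the products.
// Precondition (assumed): b holds at least a.size() elements.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0.0);
}
```

Passing an execution policy such as std::execution::par as an extra first argument would request the parallel version of the same call.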


Computing the norm of a vector

void print_norm(const std::vector<double> & v)
{
  double s = std::transform_reduce(std::execution::par,
    v.begin(), v.end(), 0.0,
    [](double x, double y) { return x + y; },
    [](double x) { return x * x; }
  );

  std::cout << "Norm: " << std::sqrt(s) << "\n";
}


Computing aggregate area

double area(const std::vector<shape> & shapes)
{
  return std::transform_reduce(std::execution::par,
    shapes.begin(), shapes.end(), 0.0,
    [](double x, double y) { return x+y; },
    [](const shape & s) { return s.area(); }
  );
}
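The slide does not define shape. To try the aggregate-area idea in isolation, the sketch below assumes a hypothetical shape that is an axis-aligned rectangle and runs the map/reduce sequentially; total_area is likewise a made-up name.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for the undefined `shape` type:
// an axis-aligned rectangle.
struct shape {
    double width;
    double height;
    double area() const { return width * height; }
};

// Sequential map/reduce: map each shape to its area, then add the areas.
double total_area(const std::vector<shape>& shapes) {
    return std::transform_reduce(
        shapes.begin(), shapes.end(), 0.0,
        [](double x, double y) { return x + y; },   // reduce: add areas
        [](const shape& s) { return s.area(); });   // map: shape -> area
}
```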


Canonical example

Word frequencies from a sequence of words.
Associative container with <word,freq>.

auto word_freq(const std::vector<std::string> & words)
{
  using dictionary = std::map<std::string,long>;
  return std::transform_reduce(std::execution::par,
    words.begin(), words.end(), dictionary{},
    // Reduce: merge two partial dictionaries.
    // lhs is taken by value so that temporaries can be passed in.
    [](dictionary lhs, const dictionary & rhs) -> dictionary {
      for (auto & [key,value] : rhs) { lhs[key] += value; }
      return lhs;
    },
    // Map: one word becomes a single-entry dictionary.
    [](const std::string & w) -> dictionary { return {{w,1}}; }
  );
}
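The merge logic can be exercised without a parallel runtime. The following is a sequential sketch under a few assumptions: the name word_freq_seq is hypothetical, the left operand of the reduction is taken by value so that the lambda accepts temporaries, and the map step builds a one-entry dictionary with nested braces.

```cpp
#include <map>
#include <numeric>
#include <string>
#include <vector>

using dictionary = std::map<std::string, long>;

// Sequential sketch of the word-frequency map/reduce.
dictionary word_freq_seq(const std::vector<std::string>& words) {
    return std::transform_reduce(
        words.begin(), words.end(), dictionary{},
        // reduce: merge two partial dictionaries by adding their counts
        [](dictionary lhs, const dictionary& rhs) {
            for (const auto& [key, value] : rhs) { lhs[key] += value; }
            return lhs;
        },
        // map: a single word becomes a one-entry dictionary
        [](const std::string& w) { return dictionary{{w, 1}}; });
}
```

Merging dictionaries by adding counts does not depend on grouping or order, which is what makes the same operation usable in a parallel reduction.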


Scans

Scan pattern

A scan pattern computes a sequence of partial reductions on a data set.

A scan on $x_0, x_1, x_2, \ldots$ results in the sequence:
  $x_0$
  $x_0 + x_1$
  $x_0 + x_1 + x_2$
  $\ldots$

Two alternatives:
  std::exclusive_scan()
  std::inclusive_scan()
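The difference between the two alternatives is whether element i takes part in output i. A minimal sequential sketch, with hypothetical helper names running_totals and offsets:

```cpp
#include <numeric>
#include <vector>

// Inclusive scan: output i is the sum of inputs 0..i.
std::vector<int> running_totals(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: output i is the initial value plus the sum of
// inputs 0..i-1, so the first output is the initial value itself.
std::vector<int> offsets(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
}
```

For the input 1, 2, 3, 4 the inclusive scan gives 1, 3, 6, 10 and the exclusive scan with initial value 0 gives 0, 1, 3, 6.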


Computing CDF

auto compute_cdf(const std::vector<int> & histogram)
{
  std::vector<int> cdf(histogram.size());

  // Without an explicit operation, inclusive_scan adds with std::plus<>.
  std::inclusive_scan(std::execution::par,
    histogram.begin(), histogram.end(), cdf.begin()
  );

  return cdf;
}


Combining transform and scan

auto compute_cdf(const std::vector<int> & histogram)
{
  std::vector<int> cdf(histogram.size());

  // Argument order: binary operation, unary transformation, initial value.
  std::transform_inclusive_scan(std::execution::par,
    histogram.begin(), histogram.end(), cdf.begin(),
    [](auto x, auto y) { return x+y; },
    [](auto x) {
      if (x<0) return 0;
      if (x>255) return 255;
      return x;
    },
    0
  );

  return cdf;
}


More algorithms

What algorithms are parallel

Most algorithms have a version that takes an execution policy.
A few exceptions:
  Numerics replaced by new versions: accumulate, inner_product, partial_sum.
  Backwards algorithms: copy_backward, move_backward.
  Searching: some versions of search.
  Sampling and permuting: sample, shuffle, *_permutation.
  Partitioning: partition_point.
  Bounds search: *_bound, equal_range.
  Heap based: *_heap.


After C++20: Executors

DISCLAIMER

This section contains a tentative design that is currently under discussion.


Context

A possible future:
  Composition of networked, asynchronous parallel computations.
  Accelerated by diverse hardware.

But the present:
  Low-level concurrency primitives (std::thread, std::atomic, ...).
  Components with known problems (std::async, std::future, ...).
  Parallel algorithms neither flexible nor composable.

Solution with two components:
  executors
  senders and receivers.


Executors

Executors:
  A work execution interface.
  Any executor type.

using namespace std::execution;
std::static_thread_pool p(16);
executor auto ex = p.executor();
execute(ex, []{ do_the_work(); });


Senders and receivers

Senders and receivers:
  A representation of work and interrelationships.
  sender types.
  receiver types.

sender auto begin = schedule(ex);
sender auto next = then(begin, [] { f(); return 42; });
sender auto job = then(next, [](int x) { g(x); return 99; });

receiver auto doit = as_receiver([](int x) { store(x); });
submit(job, doit);


What is an executor

A lightweight handle to an execution context.
  A thread pool.
  SIMD units.
  GPUs.
  Current thread.
  ...


The simplest executor

An inline executor executes the work immediately.

struct inline_executor {
  template<class F>
  void execute(F&& f) const noexcept {
    std::invoke(std::forward<F>(f));
  }

  auto operator<=>(const inline_executor&) const = default;
};
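What "immediately" means can be observed by logging the order in which tasks run. The sketch below is an illustration only: it repeats the inline executor without the C++20 comparison operator so that it also builds as C++17, and run_in_order is a hypothetical helper, not part of any executors proposal.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Same shape as the inline executor, minus the C++20 comparison operator
// so that this sketch also compiles as C++17.
struct inline_executor {
    template <class F>
    void execute(F&& f) const noexcept {
        std::invoke(std::forward<F>(f));
    }
};

// Hypothetical helper: submits n tasks and records the order in which
// they actually run.
std::vector<int> run_in_order(const inline_executor& ex, int n) {
    std::vector<int> log;
    for (int i = 0; i < n; ++i) {
        ex.execute([&log, i] { log.push_back(i); });
    }
    return log;  // every task had already run when its execute() returned
}
```

Because each task runs inside execute() on the calling thread, the log is in submission order and needs no synchronization; an executor backed by a thread pool would give neither guarantee.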


Bulk execution

Another example of a control structure provided by an executor.

Creates a group of function calls in a single operation.

struct simd_executor : inline_executor {
  template<class F>
  simd_sender bulk_execute(F f, size_t n) const {
    #pragma simd
    for (size_t i = 0; i != n; ++i) {
      std::invoke(f, i);
    }
    return {};
  }
};


An executor based for-each

template<class Executor, class F, class Range>
void my_for_each(const Executor& ex, F f, Range rng) {
  // request bulk execution, receive a sender
  sender auto s = execution::bulk_execute(ex,
    [=](size_t i) {
      f(rng[i]);
    });

  // initiate execution and wait for it to complete
  execution::sync_wait(s);
}


A future asynchronous STL?

sender auto s =
  just(3) |                            // produce '3' immediately
  via(scheduler1) |                    // transition context
  then([](int a){ return a+1; }) |     // chain continuation
  then([](int a){ return a*2; }) |     // chain another continuation
  via(scheduler2) |                    // transition context
  handle_error([](auto e){
    return just(3);                    // with default value on errors
  });

int r = sync_wait(s);                  // wait for the result


What else can I do?

GrPPI

https://github.com/arcosuc3m/grppi

Generic reusable Parallel Pattern Interface.
  A header-only library.
  A set of execution policies.
  A set of type-safe generic algorithms.
  Requires C++14.
  Apache 2.0 License.


Controlling execution

Execution types

Execution model is encapsulated in execution types.
  Always provided as first argument to patterns.

Current concrete execution types:
  Sequential: sequential_execution.
  ISO C++ Threads: parallel_execution_native.
  OpenMP: parallel_execution_omp.
  Intel TBB: parallel_execution_tbb.
  FastFlow: parallel_execution_ff.

Run-time polymorphic wrapper through type erasure:
  dynamic_execution.


Execution model properties

Some execution types allow finer configuration.
  Example: Concurrency degree.

Interface:

ex.set_concurrency_degree(4);
int n = ex.concurrency_degree();

Default values:
  Sequential ⇒ 1.
  Native ⇒ std::thread::hardware_concurrency().
  OpenMP ⇒ omp_get_num_threads().
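One caveat about the native default: std::thread::hardware_concurrency() is only a hint and is allowed to return 0 when the value cannot be determined. A caller that computes its own degree may want a guard; the helper below is a hypothetical sketch and says nothing about how GrPPI itself handles that case.

```cpp
#include <thread>

// Hypothetical helper: a usable default concurrency degree.
// hardware_concurrency() may return 0 when the value is not computable,
// so fall back to a single thread in that case.
unsigned default_concurrency_degree() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```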


Pipelines

Pipeline pattern

A pipeline pattern allows processing a data stream where the computation may be divided into multiple stages.

Each stage processes the data item generated in the previous stage and passes the produced result to the next stage.


Standalone pipeline

A standalone pipeline is a top-level pipeline.
  Invoking the pipeline translates into its execution.

Given:
  A generator $g : \emptyset \mapsto T_1 \cup \emptyset$
  A sequence of transformers $t_i : T_i \mapsto T_{i+1}$

For every non-empty value generated by $g$, it evaluates:
  $t_n(t_{n-1}(\ldots t_1(g())))$
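The semantics above can be written down as a plain sequential loop. This is a sketch of the meaning only, not the GrPPI API: apply_stages and run_pipeline are hypothetical names, the generator is assumed to return a std::optional-like value, and no stage runs concurrently with another.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Base case: no stages left, the value is the final result.
template <class T>
T apply_stages(T value) { return value; }

// Applies the first stage, then the remaining ones: tn(...t1(value)).
template <class T, class Stage, class... Rest>
auto apply_stages(T value, Stage&& first, Rest&&... rest) {
    return apply_stages(first(std::move(value)), rest...);
}

// Hypothetical sequential driver: pulls items from the generator until it
// returns an empty value, pushes each item through all stages in order,
// and collects the results.
template <class Generator, class... Stages>
auto run_pipeline(Generator gen, Stages... stages) {
    using result_t = decltype(apply_stages(*gen(), stages...));
    std::vector<result_t> results;
    while (auto item = gen()) {
        results.push_back(apply_stages(*item, stages...));
    }
    return results;
}
```

A parallel pipeline computes the same values but lets different stages work on different items at the same time.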


Generators

A generator g is any callable C++ entity that:
  Takes no argument.
  Returns a value of type T that may or may not hold a value.
  A null value signals end of stream.

The return value must be of any type that:
  Is copy-constructible or move-constructible.

    T x = g();

  Is contextually convertible to bool.

    if (x) { /* ... */ }
    if (!x) { /* ... */ }

  Can be dereferenced.

    auto val = *x;

The standard library offers an excellent candidate: std::optional<T>.
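
As a minimal sketch of these requirements, the following generator returns `std::optional<int>`, and a helper consumes it using only the three operations listed above. The names `make_counter` and `drain` are hypothetical and are not part of GrPPI.

```cpp
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical generator factory: yields 0, 1, ..., n-1 and then an empty
// optional, which plays the role of the null value that ends the stream.
inline auto make_counter(int n) {
  return [i = 0, n]() mutable -> std::optional<int> {
    if (i < n) return i++;
    return std::nullopt;
  };
}

// Collects every value produced by a generator, relying only on the
// operations required from the generator's return type.
template <typename Generator>
auto drain(Generator gen) {
  using value_type = std::decay_t<decltype(*gen())>;
  std::vector<value_type> values;
  for (;;) {
    auto x = gen();        // T x = g();  (copy or move construction)
    if (!x) break;         // contextual conversion to bool
    values.push_back(*x);  // dereference
  }
  return values;
}
```

Other types, such as `std::unique_ptr<T>`, also satisfy the three requirements.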

Simple pipeline: x -> x*x -> 1/x -> print

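    // Assumes the GrPPI headers are included and names from std are
    // visible unqualified.
    template <typename Execution>
    void run_pipe(const Execution & ex, int n)
    {
      grppi::pipeline(ex,
        // Generator: yields 0, 1, ..., n-1, then an empty optional.
        [i=0,max=n] () mutable -> optional<int> {
          if (i<max) return i++;
          else return {};
        },
        [](int x) -> double { return x*x; },
        // Note: the first item is 0, so this stage divides by zero.
        [](double x) { return 1/x; },
        [](double x) { cout << x << "\n"; }
      );
    }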

Nested pipelines

Pipelines may be nested.

An inner pipeline:
  Does not take an execution policy.
  All stages are transformers (no generator).
  The last stage must also produce values.

The inner pipeline uses the same execution policy as the outer pipeline.
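
One way to read these rules is that an inner pipeline behaves like a single transformer built from several. The sketch below fuses transformers by plain function composition; `compose` is a hypothetical helper used only for illustration and says nothing about how GrPPI schedules the inner stages.

```cpp
#include <utility>

// Fuses two transformers: the result applies f first and then g.
template <typename F, typename G>
auto compose(F f, G g) {
  return [f = std::move(f), g = std::move(g)](auto && x) {
    return g(f(std::forward<decltype(x)>(x)));
  };
}

// Fuses three or more transformers, left to right.
template <typename F, typename G, typename... Rest>
auto compose(F f, G g, Rest... rest) {
  return compose(compose(std::move(f), std::move(g)), std::move(rest)...);
}
```

Because the fused object takes a value and returns a value, it can sit in the middle of an outer pipeline, which is why the last inner stage must produce a result.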

Nested pipelines: Image processing

    void process(std::istream & in_file, std::ostream & out_file) {
      grppi::parallel_execution_native ex;
      grppi::pipeline(ex,
        // Generator: reads frames until the input stream fails.
        [&in_file]() -> optional<frame> {
          frame f = read_frame(in_file);
          if (!in_file) return {};
          return f;
        },
        // Inner pipeline: no execution policy, transformers only.
        grppi::pipeline(
          [](const frame & f) { return filter(f); },
          [](const frame & f) { return gray_scale(f); }
        ),
        // Consumer: writes each processed frame.
        [&out_file](const frame & f) { write_frame(out_file, f); }
      );
    }

Piecewise pipelines: Image processing

    void process(std::istream & in_file, std::ostream & out_file) {
      // Each piece is built separately and composed at the end.
      auto reader = [&in_file]() -> optional<frame> {
        frame f = read_frame(in_file);
        if (!in_file) return {};
        return f;
      };
      auto transformer = grppi::pipeline(
        [](const frame & f) { return filter(f); },
        [](const frame & f) { return gray_scale(f); }
      );
      auto writer = [&out_file](const frame & f) { write_frame(out_file, f); };

      grppi::parallel_execution_native ex;
      grppi::pipeline(ex, reader, transformer, writer);
    }

Farm of tasks

Farm pattern

A farm is a streaming pattern applicable to a stage in a pipeline, providing multiple tasks to process data items from a data stream.

A farm has an associated cardinality which is the number of parallel tasks used to serve the stage.
Each task in a farm runs a transformer for each data item it receives.
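
The idea of cardinality can be sketched with plain `std::thread`. The helper `farm_apply` below is hypothetical and works on a finite vector instead of a stream; it is not GrPPI's farm, only an illustration of several tasks running the same transformer on different items.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Applies `transformer` to every item using `cardinality` worker threads.
// Assumptions: the transformer is safe to call concurrently, and its result
// type is default-constructible and is not bool (std::vector<bool> packs
// bits, so concurrent writes to distinct elements would race).
template <typename In, typename Transformer>
auto farm_apply(std::size_t cardinality, const std::vector<In> & items,
                Transformer transformer) {
  assert(cardinality > 0);
  using out_type = std::decay_t<decltype(transformer(items.front()))>;
  std::vector<out_type> results(items.size());
  std::atomic<std::size_t> next{0};

  // Each task repeatedly claims the index of the next unprocessed item.
  auto worker = [&] {
    for (std::size_t i = next.fetch_add(1); i < items.size();
         i = next.fetch_add(1)) {
      results[i] = transformer(items[i]);
    }
  };

  std::vector<std::thread> tasks;
  for (std::size_t t = 0; t < cardinality; ++t) tasks.emplace_back(worker);
  for (auto & task : tasks) task.join();
  return results;  // stored by index, so input order is preserved
}
```

Items are claimed dynamically, so a slow item does not hold back the other tasks.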

Farms in pipelines: Improving a video

    template <typename Execution>
    void run_pipe(const Execution & ex,
                  std::ifstream & filein, std::ofstream & fileout)
    {
      grppi::pipeline(ex,
        [&filein]() -> optional<frame> {
          frame f = read_frame(filein);
          if (!filein) return {};
          return f;
        },
        // Farm stage: 4 parallel tasks run the transformer.
        grppi::farm(4, [](const frame & f) { return improve(f); }),
        [&fileout](const frame & f) { write_frame(fileout, f); }
      );
    }

Piecewise farms: Improving a video

    template <typename Execution>
    void run_pipe(const Execution & ex,
                  std::ifstream & filein, std::ofstream & fileout)
    {
      // The farm is built separately and used as a pipeline stage.
      auto improver = grppi::farm(4,
        [](const frame & f) { return improve(f); });

      grppi::pipeline(ex,
        [&filein]() -> optional<frame> {
          frame f = read_frame(filein);
          if (!filein) return {};
          return f;
        },
        improver,
        [&fileout](const frame & f) { write_frame(fileout, f); }
      );
    }

Controlling the buffering

Ordering

Signals if pipeline items must be consumed in the same order they were produced.

Do they need to be time-stamped?

Default is ordered.

API
    ex.enable_ordering()
    ex.disable_ordering()
    bool o = ex.is_ordered()
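
Why ordering relates to stamping items can be sketched with a small reorder buffer. The class `reorder_buffer` is hypothetical and is not part of GrPPI: each item carries the sequence number it received when generated, and the buffer releases items to the consumer strictly in that order, holding back any that arrive early.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical reorder buffer (not thread-safe; a sketch of the idea only).
template <typename T>
class reorder_buffer {
public:
  explicit reorder_buffer(std::function<void(const T &)> consume)
    : consume_{std::move(consume)} {}

  // Accepts an item stamped with its sequence number, in any arrival order.
  void push(std::size_t seq, T item) {
    pending_.emplace(seq, std::move(item));
    // Release every item that is now contiguous with the last one delivered.
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      consume_(it->second);
      pending_.erase(it);
      ++next_;
    }
  }

private:
  std::function<void(const T &)> consume_;
  std::map<std::size_t, T> pending_;  // items waiting for their turn
  std::size_t next_ = 0;              // next sequence number to deliver
};
```

With ordering disabled, a consumer could skip this buffering and take items as they arrive.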

Queueing properties

Some policies (native and omp) use queues to communicate between pipeline stages.

Properties:
  Queue size: Buffer size of the queue.
  Mode: blocking versus lock-free.

API
    ex.set_queue_attributes(100, mode::blocking)
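
To make the two properties concrete, here is a minimal sketch of a bounded queue in blocking mode, written with a mutex and condition variables. The class `bounded_queue` is hypothetical and is not the queue GrPPI uses; a lock-free mode would replace the mutex with atomic operations.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class bounded_queue {
public:
  // `capacity` corresponds to the queue size property.
  explicit bounded_queue(std::size_t capacity) : capacity_{capacity} {
    assert(capacity > 0);
  }

  // Blocks the producing stage while the queue is full.
  void push(T item) {
    std::unique_lock<std::mutex> lock{mutex_};
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push(std::move(item));
    not_empty_.notify_one();
  }

  // Blocks the consuming stage while the queue is empty.
  T pop() {
    std::unique_lock<std::mutex> lock{mutex_};
    not_empty_.wait(lock, [this] { return !items_.empty(); });
    T item = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();
    return item;
  }

private:
  std::size_t capacity_;
  std::queue<T> items_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};
```

A small queue limits memory use and throttles a fast producer; a larger one absorbs bursts between stages of uneven speed.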

Filter

Filter pattern

A filter pattern discards (or keeps) the data items from a data stream based on the outcome of a predicate.
This pattern can be used only as a stage of a pipeline.

Alternatives:
  Keep: Only data items satisfying the predicate are sent to the next stage.
  Discard: Only data items not satisfying the predicate are sent to the next stage.
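
The two alternatives can be sketched sequentially over a finite vector standing in for the stream. The helpers `keep_items` and `discard_items` are hypothetical names chosen for illustration and are not GrPPI functions.

```cpp
#include <vector>

// Keep: only items satisfying the predicate reach the output.
template <typename T, typename Predicate>
std::vector<T> keep_items(const std::vector<T> & stream, Predicate pred) {
  std::vector<T> out;
  for (const auto & item : stream) {
    if (pred(item)) out.push_back(item);
  }
  return out;
}

// Discard: only items NOT satisfying the predicate reach the output,
// which is the same as keeping by the negated predicate.
template <typename T, typename Predicate>
std::vector<T> discard_items(const std::vector<T> & stream, Predicate pred) {
  return keep_items(stream, [&pred](const T & item) { return !pred(item); });
}
```

Either form changes the number of items flowing to the next stage, which is why the pattern only makes sense inside a pipeline.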

Filtering in: Print primes

    bool is_prime(int n);

    template <typename Execution>
    void print_primes(const Execution & ex, int n)
    {
      grppi::pipeline(ex,
        // Generator: yields 0, 1, ..., n, then an empty optional.
        [i=0,max=n]() mutable -> optional<int> {
          if (i<=max) return i++;
          else return {};
        },
        // Only primes are sent to the printing stage.
        grppi::keep(is_prime),
        [](int x) { cout << x << "\n"; }
      );
    }
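
The slide only declares `is_prime`. One possible definition, given here as an assumption and not taken from the talk, is plain trial division:

```cpp
// Trial division: n is prime if it is at least 2 and no d with d*d <= n
// divides it. The test `d <= n / d` avoids overflow in d*d.
bool is_prime(int n) {
  if (n < 2) return false;
  for (int d = 2; d <= n / d; ++d) {
    if (n % d == 0) return false;
  }
  return true;
}
```

Any predicate taking an `int` and returning `bool` could be used in its place.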

Filtering out: Discard words

    template <typename Execution>
    void print_long_words(const Execution & ex, std::istream & is)
    {
      grppi::pipeline(ex,
        // Generator: reads words until the input stream fails.
        [&is]() -> optional<string> {
          string word;
          is >> word;
          if (!is) { return {}; }
          else { return word; }
        },
        // Words shorter than 4 characters are dropped.
        grppi::discard([](std::string w) { return w.length() < 4; }),
        [](std::string w) { cout << w << "\n"; }
      );
    }

Summary

1 Times have changed

2 What do you do with multicore?

3 Parallelism in C++17

4 After C++20: Executors

5 What else can I do?

6 Summary

Summary

We live in a parallel world!

Many portable concurrency primitives since C++11.
  Low level and good for solving the throughput challenge.

C++17 brings easy parallelism to the STL.
  Mostly data parallelism.

C++23 (hopefully) might bring executors.

Stream parallelism still to be solved.

C++ programming in a parallel world
CPP Europe

Bucharest 2020

J. Daniel Garcia

ARCOS Group
University Carlos III of Madrid

Spain

February 25th, 2020
