parallel and distributed computingparallelcomp.github.io/lecture4_2(betterversion).pdf · parallel...
TRANSCRIPT
![Page 1: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/1.jpg)
Foster’s Methodology: Application Examples
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI)Instituto Superior Tecnico
October 19, 2011
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 1 / 25
![Page 2: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/2.jpg)
Outline
Foster’s design methodology
partitioningcommunicationagglomerationmapping
Application Examples
Boundary value problemFinding the maximumn-body problem
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 2 / 25
![Page 3: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/3.jpg)
Distributed Memory Systems
Parallel programming of distributed-memory systems is significantlydifferent from shared-memory systems essentially due to the very largeoverhead in terms of:
communication
task initialization / termination
⇒ efficient parallelization requires that these effects be taken into accountfrom the start!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 3 / 25
![Page 4: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/4.jpg)
Foster’s Design Methodology
Problem
Primitive Tasks
Partitioning
Communication
Agglomeration
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25
![Page 5: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/5.jpg)
Foster’s Design Methodology
Problem
Primitive Tasks
Partitioning
Communication
Agglomeration
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25
![Page 6: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/6.jpg)
Foster’s Design Methodology
Problem
Primitive Tasks
Partitioning
Communication
Agglomeration
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25
![Page 7: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/7.jpg)
Foster’s Design Methodology
Problem
Primitive Tasks
Partitioning
Communication
Agglomeration
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25
![Page 8: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/8.jpg)
Foster’s Design Methodology
Problem
Primitive Tasks
Partitioning
Communication
Agglomeration
Mapping
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 4 / 25
![Page 9: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/9.jpg)
Boundary Value Problem
Problem
Determine the evolution of the temperature of a rod in the followingconditions:
rod of length 1
both ends of the rod at 0oC
uniform material
insulated except at the ends
initial temperature of a point at distance x is 100 sin(π x)
A partial differential equation (PDE) governs the temperature of the rodin time. Typically too complicated to solve analytically.
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 5 / 25
![Page 10: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/10.jpg)
Boundary Value Problem
Problem
Determine the evolution of the temperature of a rod in the followingconditions:
rod of length 1
both ends of the rod at 0oC
uniform material
insulated except at the ends
initial temperature of a point at distance x is 100 sin(π x)
A partial differential equation (PDE) governs the temperature of the rodin time. Typically too complicated to solve analytically.
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 5 / 25
![Page 11: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/11.jpg)
Boundary Value Problem
Finite difference methods can be used to obtain an approximate solutionto these complex problems⇒ discretize space and time:
unit-length rod divided into n sections of length h
time from 0 to P divided into m periods of length k
We obtain a grid n ×m of temperatures T (x , t).
Temperature over time is given by:
T (x , t) = r · T (x − 1, t − 1) + (1− 2r) · T (x , t − 1) + r · T (x + 1, t − 1)
where r = k/h2.
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 6 / 25
![Page 12: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/12.jpg)
Boundary Value Problem
Finite difference methods can be used to obtain an approximate solutionto these complex problems⇒ discretize space and time:
unit-length rod divided into n sections of length h
time from 0 to P divided into m periods of length k
We obtain a grid n ×m of temperatures T (x , t).
Temperature over time is given by:
T (x , t) = r · T (x − 1, t − 1) + (1− 2r) · T (x , t − 1) + r · T (x + 1, t − 1)
where r = k/h2.
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 6 / 25
![Page 13: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/13.jpg)
Boundary Value Problem
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 7 / 25
![Page 14: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/14.jpg)
Boundary Value Problem
Partitioning:
Make each T (x , t) computation a primitive task.⇒ 2-dimensional domain decomposition
x
t
n
m
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 8 / 25
![Page 15: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/15.jpg)
Boundary Value Problem
Partitioning:
Make each T (x , t) computation a primitive task.⇒ 2-dimensional domain decomposition
x
t
n
m
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 8 / 25
![Page 16: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/16.jpg)
Boundary Value Problem
Communication:
x
t
n
m
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 9 / 25
![Page 17: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/17.jpg)
Boundary Value Problem
Communication:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 9 / 25
![Page 18: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/18.jpg)
Boundary Value Problem
Agglomeration:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 10 / 25
![Page 19: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/19.jpg)
Boundary Value Problem
Agglomeration:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 10 / 25
![Page 20: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/20.jpg)
Boundary Value Problem
Agglomeration:
Mapping:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25
![Page 21: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/21.jpg)
Boundary Value Problem
Agglomeration:
Mapping:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25
![Page 22: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/22.jpg)
Boundary Value Problem
Agglomeration:
Mapping:
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 11 / 25
![Page 23: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/23.jpg)
Boundary Value Problem
Analysis of execution time:
C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)
D: time to send/receive one T (x , t) from another processor
p: number of processors
Sequential algorithm:
mnC
Parallel algorithm:
computation per time instant:⌈
np
⌉C
communication per time instant: 2D(receiving is synchronous, sending asynchronous)
Total: m(⌈
np
⌉C + 2D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25
![Page 24: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/24.jpg)
Boundary Value Problem
Analysis of execution time:
C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)
D: time to send/receive one T (x , t) from another processor
p: number of processors
Sequential algorithm: mnC
Parallel algorithm:
computation per time instant:
⌈np
⌉C
communication per time instant: 2D(receiving is synchronous, sending asynchronous)
Total: m(⌈
np
⌉C + 2D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25
![Page 25: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/25.jpg)
Boundary Value Problem
Analysis of execution time:
C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)
D: time to send/receive one T (x , t) from another processor
p: number of processors
Sequential algorithm: mnC
Parallel algorithm:
computation per time instant:⌈
np
⌉C
communication per time instant:
2D(receiving is synchronous, sending asynchronous)
Total: m(⌈
np
⌉C + 2D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25
![Page 26: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/26.jpg)
Boundary Value Problem
Analysis of execution time:
C : time to computeT (x , t) = rT (x − 1, t − 1) + (1− 2r)T (x , t − 1) + rT (x + 1, t − 1)
D: time to send/receive one T (x , t) from another processor
p: number of processors
Sequential algorithm: mnC
Parallel algorithm:
computation per time instant:⌈
np
⌉C
communication per time instant: 2D(receiving is synchronous, sending asynchronous)
Total: m(⌈
np
⌉C + 2D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 12 / 25
![Page 27: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/27.jpg)
Finding the Maximum
Problem
Determine the maximum over a set of n values.
This is a particular case of a reduction:
a0 ⊕ a1 ⊕ a2 ⊕ · · · ⊕ an−1
where ⊕ can be any associative binary operator.
⇒ a reduction always takes Θ(n) time on a sequential computer
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 13 / 25
![Page 28: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/28.jpg)
Finding the Maximum
Partitioning:
Make each value a primitive task.⇒ 1-dimensional domain decomposition
One task will compute final solution: root task
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 14 / 25
![Page 29: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/29.jpg)
Finding the Maximum
Partitioning:
Make each value a primitive task.⇒ 1-dimensional domain decomposition
One task will compute final solution: root task
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 14 / 25
![Page 30: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/30.jpg)
Finding the Maximum
Communication:
n/4-1 tasks n/4-1 tasks
n-1 tasks n/2-1 tasks
Continue recursively: Binomial Tree
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25
![Page 31: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/31.jpg)
Finding the Maximum
Communication:
n/4-1 tasks n/4-1 tasks
n-1 tasks n/2-1 tasks
Continue recursively: Binomial Tree
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25
![Page 32: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/32.jpg)
Finding the Maximum
Communication:
n/4-1 tasks n/4-1 tasks
n/2-1 tasks n/2-1 tasks
Continue recursively: Binomial Tree
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25
![Page 33: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/33.jpg)
Finding the Maximum
Communication:
n/4-1 tasks n/4-1 tasks
n/4-1 tasks n/4-1 tasks
Continue recursively: Binomial Tree
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25
![Page 34: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/34.jpg)
Finding the Maximum
Communication:
n/4-1 tasks n/4-1 tasks
n/4-1 tasks n/4-1 tasks
Continue recursively: Binomial Tree
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 15 / 25
![Page 35: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/35.jpg)
Binomial Trees
Recursive definition of Binomial Tree, with n = 2k nodes:
B
B
B
B0 k
k-1
k-1
B0 B1 B2 B3 B4
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 16 / 25
![Page 36: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/36.jpg)
Binomial Trees
Recursive definition of Binomial Tree, with n = 2k nodes:
B
B
B
B0 k
k-1
k-1
B0 B1 B2 B3 B4
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 16 / 25
![Page 37: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/37.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 38: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/38.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 39: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/39.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
5
-4
-4
1
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 40: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/40.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
5
-4
-4
1
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 41: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/41.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
7
-4
-4
1
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 42: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/42.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
7
-4
-4
1
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 43: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/43.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
7
-4
-4
7
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 44: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/44.jpg)
Finding the Maximum
Illustrative example:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
7
7 3
7
-4
-4
7
-3
3
3
7
7 0
0 8
8
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 17 / 25
![Page 45: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/45.jpg)
Finding the Maximum
Agglomeration:
Group n leafs of the tree:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
Mapping:
The same (actually, in the agglomeration phase, use n such that you endup with p tasks).
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25
![Page 46: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/46.jpg)
Finding the Maximum
Agglomeration:
Group n leafs of the tree:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
Mapping:
The same (actually, in the agglomeration phase, use n such that you endup with p tasks).
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25
![Page 47: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/47.jpg)
Finding the Maximum
Agglomeration:
Group n leafs of the tree:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
Mapping:
The same (actually, in the agglomeration phase, use n such that you endup with p tasks).
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25
![Page 48: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/48.jpg)
Finding the Maximum
Agglomeration:
Group n leafs of the tree:
7
7 3
5
-4
-9
1
-3
3
2
5
7 0
-1 8
1
Mapping:
The same (actually, in the agglomeration phase, use n such that you endup with p tasks).
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 18 / 25
![Page 49: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/49.jpg)
Finding the Maximum
Analysis of execution time:
C : time to perform the binary operation (maximum)
D: time to send/receive one value from another processor
p: number of processors
Sequential algorithm:
(n − 1)C
Parallel algorithm:
computation in the leafs: (⌈
np
⌉− 1)C
computation up the tree: dlog pe (C + D)
Total: (⌈
np
⌉− 1)C + dlog pe (C + D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25
![Page 50: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/50.jpg)
Finding the Maximum
Analysis of execution time:
C : time to perform the binary operation (maximum)
D: time to send/receive one value from another processor
p: number of processors
Sequential algorithm: (n − 1)C
Parallel algorithm:
computation in the leafs:
(⌈
np
⌉− 1)C
computation up the tree: dlog pe (C + D)
Total: (⌈
np
⌉− 1)C + dlog pe (C + D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25
![Page 51: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/51.jpg)
Finding the Maximum
Analysis of execution time:
C : time to perform the binary operation (maximum)
D: time to send/receive one value from another processor
p: number of processors
Sequential algorithm: (n − 1)C
Parallel algorithm:
computation in the leafs: (⌈
np
⌉− 1)C
computation up the tree:
dlog pe (C + D)
Total: (⌈
np
⌉− 1)C + dlog pe (C + D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25
![Page 52: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/52.jpg)
Finding the Maximum
Analysis of execution time:
C : time to perform the binary operation (maximum)
D: time to send/receive one value from another processor
p: number of processors
Sequential algorithm: (n − 1)C
Parallel algorithm:
computation in the leafs: (⌈
np
⌉− 1)C
computation up the tree: dlog pe (C + D)
Total: (⌈
np
⌉− 1)C + dlog pe (C + D)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 19 / 25
![Page 53: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/53.jpg)
n-Body Problem
Problem
Simulate the motion of n particles of varying masses in two dimensions.
compute new positions and velocities
consider gravitational interactions only
Straightforward sequential algorithms solve this problem in Θ(n2) time(however, better time complexity algorithms exist)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 20 / 25
![Page 54: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/54.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 55: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/55.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 56: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/56.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 57: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/57.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 58: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/58.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 59: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/59.jpg)
n-Body Problem
Partitioning:
Make each particle a primitive task.⇒ 1-dimensional domain decomposition
Communication:
All tasks need to communicate with all tasks!
Gather operation: one task receive a dataset from all tasks.
All-gather operation: all tasks receive a dataset from all tasks.
0 1
2 3
0 1
2 3
Implement communication using a hypercube topology!
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 21 / 25
![Page 60: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/60.jpg)
n-Body Problem
Agglomeration / Mapping:
All primitive tasks have the same computation cost and communicationpattern
⇒ no particular strategy for agglomeration required
Agglomerate n/p primitive tasks, so that we have one task per processor.
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 22 / 25
![Page 61: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/61.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time:
C np (n − 1)
Communication time:one message with k data units: D + k
Bdepth of tree: log plength of message at depth i : 2i−1 n
p
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 62: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/62.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:
one message with k data units: D + kB
depth of tree: log plength of message at depth i : 2i−1 n
p
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 63: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/63.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:one message with k data units:
D + kB
depth of tree: log plength of message at depth i : 2i−1 n
p
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 64: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/64.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:one message with k data units: D + k
Bdepth of tree:
log plength of message at depth i : 2i−1 n
p
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 65: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/65.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:one message with k data units: D + k
Bdepth of tree: log plength of message at depth i :
2i−1 np
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 66: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/66.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:one message with k data units: D + k
Bdepth of tree: log plength of message at depth i : 2i−1 n
p
total communication time:
∑log pi=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 67: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/67.jpg)
n-Body Problem
Analysis of execution time (per time instant):
C : time to compute new particle position and velocity
D: time to initiate message
B: (bandwidth) number of data units that can be sent in one unit of time
p: number of processors
Sequential algorithm: Cn(n − 1)Parallel algorithm:
Computation time: C np (n − 1)
Communication time:one message with k data units: D + k
Bdepth of tree: log plength of message at depth i : 2i−1 n
p
total communication time:∑log p
i=1
(D + 2i−1n
pB
)= D log p + n(p−1)
pB
Total: D log p + np
(p−1B + C (n − 1)
)CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 23 / 25
![Page 68: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/68.jpg)
Review
Foster’s design methodology
partitioningcommunicationagglomerationmapping
Application Examples
Boundary value problemFinding the maximumn-body problem
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 24 / 25
![Page 69: Parallel and Distributed Computingparallelcomp.github.io/Lecture4_2(betterversion).pdf · Parallel algorithm: computation per time instant: l n p m C communication per time instant:](https://reader033.vdocument.in/reader033/viewer/2022050605/5facee6ce3d2970cb946266b/html5/thumbnails/69.jpg)
Next Class
MPI
CPD (DEI / IST) Parallel and Distributed Computing – 11 2011-10-19 25 / 25