Introduction to Parallel Algorithm Analysis
Jeff M. Phillips
October 2, 2011
Petri Nets
C. A. Petri [1962] introduced an analysis model for concurrent systems.
- Flow chart.
- Described data flow and dependencies.
- Very low level (we want something more high-level).
- Reachability is EXPSPACE-hard, but decidable.
Critical Regions Problem
Edsger Dijkstra [1965]
- Mutex: “mutual exclusion” on a variable.
- Semaphores: lock/unlock access to (multiple) data items.
- A semaphore is more general, keeping a count; a mutex is binary.
Important, but lower-level details.
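These primitives map directly onto, e.g., Python’s threading module (the language choice and example are mine, not the slides’): a minimal sketch of a mutex guarding a critical region. A `Semaphore(k)` would generalize the lock by admitting up to k holders at once.

```python
import threading

# A mutex (Lock) admits one holder at a time; Semaphore(k) would admit up to k.
counter = 0
mutex = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with mutex:          # critical region: exclusive access to counter
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: without the lock, increments could be lost
```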
Amdahl’s and Gustafson’s Laws
Amdahl’s Law: Gene Amdahl [1967]
- A small portion (fraction α) is non-parallelizable.
- Limits the maximum speed-up to S = 1/α.
Gustafson’s Law: Gustafson and Barsis [1988]
- A small portion (fraction α) is non-parallelizable.
- P processors.
- Limits the maximum speed-up to S(P) = P − α(P − 1).
Speed-up is measured as S = Tseq / Tpar; with P processors, S(P) = Tseq / Tpar(P).
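The two laws can be compared with a short calculation (the function names are mine; the formulas are the ones above, with Amdahl’s speedup written for finite P):

```python
def amdahl_speedup(alpha, p):
    """Amdahl: serial fraction alpha, parallel fraction (1-alpha) split over p."""
    return 1.0 / (alpha + (1.0 - alpha) / p)

def gustafson_speedup(alpha, p):
    """Gustafson's scaled speedup: S(P) = P - alpha*(P - 1)."""
    return p - alpha * (p - 1)

# With a 5% serial fraction, Amdahl caps speedup below 1/alpha = 20
# no matter how many processors are added, while Gustafson's scaled
# speedup keeps growing with P.
print(round(amdahl_speedup(0.05, 1024), 2))     # 19.64
print(round(gustafson_speedup(0.05, 1024), 2))  # 972.85
```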
Logical Clocks
Leslie Lamport [1978]
- Posed parallel problems as finite state machines.
- Preserved (only) a partial order: the “happens before” relation.
Highlights nuances and difficulties in clock synchronization.
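A minimal sketch of a Lamport logical clock (the class name and API are illustrative, not from the paper): each local event ticks the clock, and a receive advances it past the sender’s timestamp, preserving the “happens before” order.

```python
class LamportClock:
    """Minimal Lamport logical clock sketch."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Merge rule: jump past both the local clock and the message timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()              # a: 1 (local event)
t = a.send()          # a: 2; message carries timestamp 2
b.receive(t)          # b: max(0, 2) + 1 = 3
print(a.time, b.time)  # 2 3
```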
DAG Model
Directed Acyclic Graph:
- Each node represents a chunk of computation to be done on a single processor.
- A directed edge indicates that the “from” node must be completed before the “to” node.
- The longest path in the DAG represents the total parallel time of the algorithm.
- The width of the DAG indicates the number of processors that can be used at once.
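The longest-path measure can be computed directly (a sketch; the helper name is mine), here on the slide’s example DAG that sums A1..A4 with a binary tree:

```python
from collections import defaultdict

def critical_path_length(edges, nodes):
    """Longest path (counted in nodes) of a DAG: the parallel-time bound."""
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    depth = {v: 1 for v in nodes}          # depth of a source node is 1
    ready = [v for v in nodes if indeg[v] == 0]
    while ready:                           # Kahn-style topological sweep
        u = ready.pop()
        for v in succ[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(depth.values())

# Summing A1..A4 as a tree: A1+A2 and A3+A4 in parallel, then the total.
nodes = ["A1", "A2", "A3", "A4", "A1+A2", "A3+A4", "A1+A2+A3+A4"]
edges = [("A1", "A1+A2"), ("A2", "A1+A2"),
         ("A3", "A3+A4"), ("A4", "A3+A4"),
         ("A1+A2", "A1+A2+A3+A4"), ("A3+A4", "A1+A2+A3+A4")]
print(critical_path_length(edges, nodes))  # 3 levels; width 4 at the leaves
```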
(Figure: summing A1, A2, A3, A4 as a binary tree: A1+A2 and A3+A4 computed in parallel, then combined into A1+A2+A3+A4.)
PRAM Model
(Figure: CPU1, CPU2, …, CPUp connected to one shared RAM.)
Steve Fortune and James Wyllie [1978]: the “shared memory model”.
- P processors operate on shared data.
- For each processor, a read, write, or operation (e.g. +, −, ×) takes constant time.
- CREW: concurrent read, exclusive write
- CRCW: concurrent read, concurrent write
- EREW: exclusive read, exclusive write
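A sequential simulation of an EREW tree sum (a sketch, not PRAM syntax): in each of O(log n) rounds, every active “processor” reads and writes distinct cells, so all reads and writes are exclusive.

```python
def erew_parallel_sum(values):
    """Simulate an EREW PRAM tree sum: O(log n) rounds of pairwise adds."""
    a = list(values)
    n = len(a)
    stride = 1
    while stride < n:
        # One PRAM round: processor at cell i combines a[i] and a[i+stride].
        # No cell is touched by two processors, so reads/writes are exclusive.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]

print(erew_parallel_sum(range(1, 9)))  # 36, in log2(8) = 3 rounds
```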
Message Passing Model
Emphasizes locality.
- send(X, i): sends X to P_i
- receive(Y, j): receives Y from P_j
- Fixed topology: a processor can only send to / receive from its neighbors.
Common topologies:
- Array/Ring topology (Ω(p) rounds)
- Mesh topology (Ω(√p) rounds)
- Hypercube topology (Ω(log p) rounds)
(Figure: a 4-dimensional hypercube with nodes labeled 0000 through 1111.)
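A rough way to see the round counts for these topologies: simulate flooding a message from node 0 and count the rounds until every processor is informed. This is a simplified model (every informed node tells all neighbors each round; names are mine):

```python
def broadcast_rounds(p, neighbors):
    """Rounds until a message from node 0 reaches all p nodes, when each
    round every informed node tells all of its neighbors."""
    informed = {0}
    rounds = 0
    while len(informed) < p:
        informed = informed | {w for v in informed for w in neighbors(v)}
        rounds += 1
    return rounds

p = 16
ring = lambda v: [(v - 1) % p, (v + 1) % p]              # 1-D ring
hypercube = lambda v: [v ^ (1 << b) for b in range(4)]   # flip each of log2(16)=4 bits

print(broadcast_rounds(p, ring))       # 8 rounds: Theta(p)
print(broadcast_rounds(p, hypercube))  # 4 rounds: log2(p)
```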
Programming in MPI
Open MPI:
- Open-source high-performance computing.
- http://www.open-mpi.org/
When to use MPI?
- When exploiting locality is critical (e.g. scientific simulations).
- Complication: processors can only talk to their neighbors.
Bulk Synchronous Parallel
Les Valiant [1989]: BSP creates “barriers” in a parallel algorithm.
1. Each processor computes on its data.
2. Processors send/receive data.
3. Barrier: all processors wait for communication to end globally.
Allows for easy synchronization, and algorithms are easier to analyze, since emulating this model handles many messy synchronization details.
(Figure: several machines, each a CPU with its own RAM, synchronizing at a barrier.)
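A minimal sketch of one BSP superstep using Python threads (the toy communication pattern and variable names are illustrative): each thread computes, writes into a neighbor’s inbox, then waits at the barrier before reading.

```python
import threading

P = 4
barrier = threading.Barrier(P)   # the BSP global barrier
data = [1, 2, 3, 4]
inbox = [0] * P                  # one "received message" slot per processor
results = [0] * P

def superstep(i):
    local = data[i] * data[i]    # 1. local computation
    inbox[(i + 1) % P] = local   # 2. communication: send to next processor
    barrier.wait()               # 3. barrier: wait for all sends globally
    results[i] = inbox[i]        # safe to read only after the barrier

threads = [threading.Thread(target=superstep, args=(i,)) for i in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [16, 1, 4, 9]: each slot holds the predecessor's square
```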
Bridging Model for Multi-Core Computing
Les Valiant [2010]: Multi-BSP. Many parameters:
- P: number of processors
- M: memory/cache size
- B: block size/cost
- L: synchronization cost
Argues: any portable and efficient parallel algorithm must take all of these parameters into account.
Advantages:
- Analyzes all levels of the architecture together.
- Like cache-oblivious models, but not oblivious.
At depth d it uses parameters ⋃_i (p_i, g_i, L_i, m_i):
- p_i: number of subcomponents (processors at the leaves)
- g_i: communication bandwidth (e.g. I/O cost)
- L_i: synchronization cost
- m_i: memory/cache size
Worked examples: matrix multiplication, fast Fourier transform, sorting.
Two types of programmers
1. Wants to optimize the heck out of everything, tuning all parameters.
2. Wants to get something working, and is not willing to work too hard.
MapReduce
Each processor has a full hard drive; data items are ⟨key, value⟩ pairs. Parallelism proceeds in rounds:
- Map: assigns items to processors by key.
- Reduce: processes all items using their values; usually combines many items with the same key.
Repeat Map + Reduce a constant number of times, often only one round.
- Optional post-processing step.
(Figure: items flow through a Map step to processors, each with its own CPU and RAM, then through a Reduce step.)
Pro: robust (via duplication) and simple; can harness locality. Con: somewhat restrictive model.
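A toy, single-machine sketch of the two phases (word count; no distribution or fault tolerance, and the framework-style helpers are mine): Map emits ⟨key, value⟩ pairs, the grouping by key is what the shuffle does, and Reduce combines all values sharing a key.

```python
from collections import defaultdict

def map_phase(items, mapper):
    """Map: emit (key, value) pairs; group them by key (the 'shuffle')."""
    groups = defaultdict(list)
    for item in items:
        for k, v in mapper(item):
            groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    """Reduce: combine all values that share a key."""
    return {k: reducer(k, vs) for k, vs in groups.items()}

lines = ["the quick fox", "the lazy dog", "the fox"]
mapper = lambda line: [(word, 1) for word in line.split()]
reducer = lambda key, values: sum(values)

counts = reduce_phase(map_phase(lines, mapper), reducer)
print(counts["the"], counts["fox"])  # 3 2
```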
General Purpose GPU
Massive parallelism on your desktop, using the Graphics Processing Unit, which is designed for efficient video rasterizing. Each processor corresponds to a pixel p:
- depth buffer: D(p) = min_i ||x − w_i||
- color buffer: C(p) = Σ_i α_i χ_i
- ...
(Figure: a viewpoint x, primitives w_i, and a pixel p.)
Pro: fine-grained, massive parallelism; cheap. Con: somewhat restrictive model; small memory.
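A sketch of the depth-buffer rule above, per pixel: keep the fragment with the minimum depth, the way a GPU resolves overlapping primitives (function name and data are illustrative):

```python
def depth_and_color(fragments):
    """Per-pixel depth test: keep the closest fragment, D(p) = min_i depth_i."""
    best_depth = float("inf")
    best_color = None
    for depth, color in fragments:
        if depth < best_depth:   # closer fragment wins the pixel
            best_depth, best_color = depth, color
    return best_depth, best_color

# Three overlapping fragments at one pixel: the closest one (blue) is kept.
print(depth_and_color([(3.0, "red"), (1.5, "blue"), (2.2, "green")]))
```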
... and Beyond
Google Sawzall?
- Computes statistics on massive distributed data.
- Separates local computation from aggregation.
Processing in memory?
- GPUs pay a large cost to transfer data.
- Can the same work be done directly in memory?
Massive, unorganized, distributed computing:
- BitTorrent (distributed hash tables)
- SETI@home
- sensor networks