A Concurrent Matrix Transpose Algorithm, The Implementation
Presented by Pourya Jafari
Review: Algorithm Steps
Pre-process inside each thread
  Shift rows
Intra-process/thread communication
  Shift columns
Post-process inside each thread
  Shift rows again

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33
Review: Shift values?
Set shifts based on row index: range 0 to N-1
Now arrange the rows so that the column shifts get us to i
  Pre-process shifting: i' = i - L
  After the intra-process shift, the column index should equal the original row index i
  i' + j = i
  i - L + j = i
  L = j
So we shift each column j cells up
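The derivation above can be checked with a short Python sketch. It is only an illustration, not the presented Java/JCSP implementation: the names `preshift_columns` and `shift_rows` are made up here, the matrix is a plain list of lists, and the row shift is assumed to move row r right by r cells cyclically, which matches the 4 x 4 example in these slides.

```python
def preshift_columns(a):
    """Shift column j up by j cells (cyclically): new[r][j] = old[(r + j) % n][j]."""
    n = len(a)
    return [[a[(r + j) % n][j] for j in range(n)] for r in range(n)]


def shift_rows(a):
    """Shift row r right by r cells (cyclically): new[r][(c + r) % n] = old[r][c]."""
    n = len(a)
    out = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            out[r][(c + r) % n] = a[r][c]
    return out


n = 4
# Each cell stores its own original (row, column) index pair (i, j).
original = [[(i, j) for j in range(n)] for i in range(n)]
after = shift_rows(preshift_columns(original))

# After both shifts, the item (i, j) sits in column i and row (i - j) mod n.
ok = all(after[r][c] == (c, (c - r) % n) for r in range(n) for c in range(n))
```

Every item ends up in the column equal to its original row index, which is what choosing L = j was meant to achieve.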
Review: Last step?
1 → 2: Column shift j up
2 → 3: Row shift based on row indices
3 → 4: ?
Change of indices so far
  (i - j, j) → (i - j, i - j + j)
  (i - j, i) = (m, n)
One operation to change row index to j
  n - m = (i - (i - j)) = j
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

00 11 22 33
10 21 32 03
20 31 02 13
30 01 12 23

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

00 11 22 33
03 10 21 32
02 13 20 31
01 12 23 30

00 10 20 30
01 11 21 31
02 12 22 32
03 13 23 33

(1) (2-a) (2-b) (3) (4)
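The last step can be tried on the 4 x 4 example with a few lines of Python. This is a hedged sketch: `last_step` is a hypothetical name, and the rule "the item in row m of column n moves to row (n - m) mod N" is simply the slide's n - m = j applied cyclically. Within one column this is a local permutation of the rows, so it needs no communication.

```python
def last_step(a):
    """In column n, move the item in row m to row (n - m) mod N."""
    size = len(a)
    out = [[None] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            out[(n - m) % size][n] = a[m][n]
    return out


# State of the example matrix after the column shift and the row shift.
stage3 = [row.split() for row in [
    "00 11 22 33",
    "03 10 21 32",
    "02 13 20 31",
    "01 12 23 30",
]]

stage4 = last_step(stage3)
for row in stage4:
    print(" ".join(row))
```

For this input the result is the transpose of the original matrix: the item labelled ij ends in row j, column i.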
Review: Radix
Using radix representation, we can group row shifts
We use radix 2 for simplicity
  Digits are the bit representation
  Shift all rows whose indices have their k-th bit on
[Figure: shift for each row (rows 0 to 3) = shift for k=0 + shift for k=1]
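A small Python sketch of the radix-2 grouping, under stated assumptions: `radix2_row_shifts` is a hypothetical helper, rows are rotated to the right, and phase k moves every row whose index has bit k set by 2^k cells. The sum of the per-phase shifts of a row is then its row index.

```python
def radix2_row_shifts(a):
    """Rotate row r right by r cells, one bit of r per phase."""
    n = len(a)
    rows = [list(r) for r in a]
    phases = []
    k = 0
    while (1 << k) < n:
        step = 1 << k
        moved = [r for r in range(n) if r & step]  # rows with bit k on
        for r in moved:
            # Rotate this row right by `step` cells.
            rows[r] = rows[r][-step:] + rows[r][:-step]
        phases.append((k, step, moved))
        k += 1
    return rows, phases


# The example matrix after the column pre-shift.
stage2 = [row.split() for row in [
    "00 11 22 33",
    "10 21 32 03",
    "20 31 02 13",
    "30 01 12 23",
]]
shifted, phases = radix2_row_shifts(stage2)
```

In this sketch the number of phases grows with the number of bits of N - 1 rather than with N, because rows that share a bit move together.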
The concurrency picture
Each thread can do pre/post-processing independently
Processes must synchronize
  after each phase
  after each step of the intra-process step
  during intra-process communications
Communication package (1)
We need a means of communication that
  Facilitates synchronized communication
  Provides unbuffered communication to save memory
JCSP: based on the algebra of Communicating Sequential Processes (CSP)
  Has a strong theoretical background
  Object oriented
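To make "synchronized, unbuffered" concrete, here is a minimal Python stand-in. It is not JCSP and does not use its API: `RendezvousChannel`, `write` and `read` are hypothetical names, and the sketch assumes exactly one writer and one reader. The writer is released only after the reader has taken the value, so no queue of pending items builds up.

```python
import queue
import threading


class RendezvousChannel:
    """Hypothetical unbuffered channel for one writer and one reader."""

    def __init__(self):
        self._item = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, value):
        self._item.put(value)
        self._ack.get()          # block until the reader has taken the value

    def read(self):
        value = self._item.get()
        self._ack.put(None)      # release the blocked writer
        return value


ch = RendezvousChannel()
finished = threading.Event()


def writer():
    ch.write("item")
    finished.set()


threading.Thread(target=writer, daemon=True).start()
blocked = not finished.wait(0.2)   # the writer cannot finish before a read
value = ch.read()
released = finished.wait(2)        # after the read, the writer completes
```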
Communication package (2)
JCSP provides
  One2OneChannel
    A single sender can send and a single receiver can receive
  One2AnyChannel
    A single sender and many receivers can communicate, but only one at a time
  Any2OneChannel
    Multiple senders and one receiver
Classes (1)
CProcess: Column process
  Has a PID; knows N; has an array to save its items
  One2OneChannel to each other process for the intra-process shift operation
  One2AnyChannel to MProcess to receive start/resume calls
  Any2OneChannel to MProcess to signal that this CProcess has finished the current step
Classes (2)
MProcess: Master Process
  One2AnyChannel and Any2OneChannel to any CProcess
  Synchronizes the phases and the intra-process communication by waiting for all CProcesses to finish the current phase and then resuming them for the next phase
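The master/worker handshake described above might look like the following Python sketch. It is an assumption-laden stand-in rather than the presented code: `queue.Queue` objects play the role of the JCSP channels, `run_phases`, `cprocess` and `mprocess` are hypothetical names, and one resume queue per worker is used because these queues are buffered (with a single shared buffered queue a fast worker could take two resume tokens).

```python
import queue
import threading


def run_phases(n_workers, n_phases):
    done = queue.Queue()                                  # Any2OneChannel role
    resume = [queue.Queue() for _ in range(n_workers)]    # One2AnyChannel role,
    log, lock = [], threading.Lock()                      # one queue per worker

    def cprocess(pid):
        for _ in range(n_phases):
            phase = resume[pid].get()     # wait for the start/resume call
            with lock:
                log.append((phase, pid))  # placeholder for this phase's work
            done.put(pid)                 # signal: current step finished

    def mprocess():
        for phase in range(n_phases):
            for q in resume:
                q.put(phase)              # start/resume every CProcess
            for _ in range(n_workers):
                done.get()                # wait for all of them to finish

    threads = [threading.Thread(target=cprocess, args=(p,))
               for p in range(n_workers)]
    threads.append(threading.Thread(target=mprocess))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_phases(n_workers=4, n_phases=3)
```

The log never shows work of a later phase before all work of an earlier one, which is the property the master is there to enforce.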
Classes (3)
Launcher: Threads driver
  Creates the channels
  Creates one MProcess and the CProcesses
  Runs them in parallel
Intra-process communication in CProcess
Might send/receive multiple items
  Determines the indices that need to be shifted
  Packs them in the form of a message
  Sends the message to the next CProcess and receives from the previous process in the shift chain
  Unpacks the received message
  Assigns the items inside to the same indices determined in the first step
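The steps above can be sketched sequentially in Python, with no real channels. Assumptions of this sketch: each CProcess holds one column as a list indexed by row, phase k uses a shift distance of 2^k, and the indices to shift are the rows whose k-th bit is on. `shift_phase` is a hypothetical name, and all messages are packed before any is delivered, which stands in for the synchronized exchange.

```python
def shift_phase(columns, k):
    """One radix-2 phase: move the selected rows 2**k columns along the chain."""
    n = len(columns)
    h = 1 << k
    rows = [r for r in range(n) if r & h]                 # 1. indices to shift
    outbox = [[col[r] for r in rows] for col in columns]  # 2. pack the message
    for pid in range(n):
        message = outbox[(pid - h) % n]                   # 3. from the previous process
        for r, item in zip(rows, message):                # 4. unpack
            columns[pid][r] = item                        # 5. same indices as in step 1
    return columns


# The example matrix after the column pre-shift, stored column by column.
stage2 = [row.split() for row in [
    "00 11 22 33",
    "10 21 32 03",
    "20 31 02 13",
    "30 01 12 23",
]]
columns = [[stage2[r][c] for r in range(4)] for c in range(4)]
for k in (0, 1):
    shift_phase(columns, k)
rows_after = [[columns[c][r] for c in range(4)] for r in range(4)]
```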
UML Diagram
CProcess
  -PID : int
  -N : int
  -ch0 : One2OneChannel
  -ch1 : Any2OneChannel
  -ch2 : One2AnyChannel
CSProcess
  +run()
One2OneChannel
  +In()
  +Out()
One2AnyChannel
  +In()
  +Out()
Any2OneChannel
  +In()
  +Out()
MProcess
  -N : int
  -ch1 : Any2OneChannel
  -ch2 : One2AnyChannel
Launcher
  -N : int
  -ch0 : One2OneChannel
  -ch1 : Any2OneChannel
  -ch2 : One2AnyChannel
  -CPs : CSProcess
The Intraprocess Shift
Synchronized send and then receive
A cycle might form
  All CProcesses go to the send state and wait for the next CProcess to receive
  None of the CSProcesses receive -> Deadlock
[Figure: processes numbered 0 to 8]
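The deadlock can be reproduced without threads by a small Python model of unbuffered channels. This is only a sketch: `simulate_ring` is a hypothetical name, process p is assumed to send to (p + h) mod N, and a transfer happens only when the sender is at its send step while the receiver is at its receive step. The optional `receive_first` set marks processes that receive before sending; it is empty in the scenario above.

```python
def simulate_ring(n, h, receive_first=()):
    """Return True if every process completes one send and one receive."""
    plans = [["recv", "send"] if p in receive_first else ["send", "recv"]
             for p in range(n)]
    pos = [0] * n                      # index of each process's current step
    progressed = True
    while progressed:
        progressed = False
        for p in range(n):
            q = (p + h) % n            # the process p sends to
            if (pos[p] < 2 and pos[q] < 2
                    and plans[p][pos[p]] == "send"
                    and plans[q][pos[q]] == "recv"):
                pos[p] += 1            # the rendezvous completes both steps
                pos[q] += 1
                progressed = True
    return all(x == 2 for x in pos)


# Nine processes, everyone sends first: nobody is ever ready to receive.
all_send_first = simulate_ring(9, 1)
```

With every process in the send state no transfer is ever possible, so `all_send_first` is False.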
The Shift Cycle (1)
One CProcess in the cycle should receive first to break the cycle
But it would lose the value it still has to send
  Receives and buffers the incoming value
  Sends, and then assigns the buffered value to the relevant array cell
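A threaded Python sketch of the buffered receive, under the following assumptions: the hypothetical `Channel` class below stands in for an unbuffered one-to-one channel (it is not the JCSP API), every process holds a single cell, the chain is a single ring with shift distance 1, and process 0 is the one chosen to receive first.

```python
import queue
import threading


class Channel:
    """Hypothetical unbuffered channel: write() returns only after a read()."""

    def __init__(self):
        self._item = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, value):
        self._item.put(value)
        self._ack.get()

    def read(self):
        value = self._item.get()
        self._ack.put(None)
        return value


def shift_ring(cells):
    """Move every cell one place along the ring; needs at least 2 cells."""
    n = len(cells)
    chans = [Channel() for _ in range(n)]   # chans[p] goes from p to p + 1

    def cprocess(pid):
        out_ch, in_ch = chans[pid], chans[(pid - 1) % n]
        if pid == 0:
            buffered = in_ch.read()      # receive first, into a buffer
            out_ch.write(cells[pid])     # own value is still intact
            cells[pid] = buffered        # only now overwrite the cell
        else:
            out_ch.write(cells[pid])     # send, then receive
            cells[pid] = in_ch.read()

    threads = [threading.Thread(target=cprocess, args=(p,), daemon=True)
               for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return cells


result = shift_ring([0, 1, 2, 3, 4])
```

Writing the received value straight into the cell in process 0 would destroy the value that process still has to send, which is why the buffer is needed.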
The Shift Cycle (2)
Separate cycles form when the interleaving value h divides N
We do a buffered read for all process numbers less than h
[Figure: processes numbered 0 to 8]
The Shift Cycle (3)
Even after this, the program runs into deadlock again
Separate cycles form whenever gcd(h, N) is greater than 1
Must do a buffered read for process numbers less than or equal to gcd(h, N)
[Figure: processes numbered 0 to 8]
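The role of gcd(h, N) can be seen by listing the cycles of the map p -> (p + h) mod N. The Python sketch below assumes process p sends to (p + h) mod N; `shift_cycles` is a hypothetical helper, and N = 9 with h = 6 is chosen as a case where h does not divide N but gcd(h, N) is still greater than 1.

```python
from math import gcd


def shift_cycles(n, h):
    """Group the PIDs 0..n-1 into the cycles of the map p -> (p + h) % n."""
    seen, cycles = set(), []
    for start in range(n):
        if start in seen:
            continue
        cycle, p = [], start
        while p not in seen:
            seen.add(p)
            cycle.append(p)
            p = (p + h) % n
        cycles.append(cycle)
    return cycles


n, h = 9, 6
cycles = shift_cycles(n, h)
g = gcd(h, n)
# How many PIDs below gcd(h, N) fall into each cycle.
low_pids_per_cycle = [sum(1 for p in c if p < g) for c in cycles]
```

The number of cycles equals gcd(h, N), and the PIDs 0 to gcd(h, N) - 1 fall one into each cycle, so a buffered read in those processes gives every cycle a process that receives first.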
Results