Blue Gene Simulator
DESCRIPTION
Blue Gene Simulator. Gengbin Zheng and Gunavardhan Kakulapati, Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, http://charm.cs.uiuc.edu. Overview: Blue Gene Emulator, Blue Gene Simulator.
![Page 1: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/1.jpg)
Blue Gene Simulator
Gengbin Zheng
Gunavardhan Kakulapati
Parallel Programming Laboratory, Department of Computer Science
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu
![Page 2: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/2.jpg)
Overview
Blue Gene Emulator
Blue Gene Simulator
Timing correction schemes
Performance and results
![Page 3: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/3.jpg)
Emulation on a Parallel Machine
Simulating (Host) Processor
BG/C Nodes
Hardware thread
![Page 4: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/4.jpg)
Blue Gene Emulator: functional view
Diagram of one Blue Gene/C node. Labels: communication threads, worker threads, inBuffer, affinity message queues, non-affinity message queues, CorrectionQ.
![Page 5: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/5.jpg)
Blue Gene Emulator: functional view
Diagram labels: Converse scheduler, Converse Q, and for each emulated node: communication threads, worker threads, inBuff, affinity message queues, non-affinity message queues, CorrectionQ.
![Page 6: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/6.jpg)
What it is capable of …
Blue Gene API support
Blue Gene Charm++
– Structured Dagger
Trace Projections
![Page 7: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/7.jpg)
Emulator to Simulator
Emulator:
– Study the programming model and application development
Simulator:
– Adds performance prediction capability
– Models communication latency based on a network model
– Does not model on-chip memory access or network contention
![Page 8: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/8.jpg)
Simulator
Parallel performance is hard to model
– Communication subsystem: out-of-order messages, communication/computation overlap
– Event dependencies
Parallel Discrete Event Simulation
– The emulation program executes in parallel with event time stamp correction.
– Exploit the inherent determinacy of the application
![Page 9: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/9.jpg)
How to simulate?
Time stamping events
– Per-thread timer (sharing one physical timer)
– Time stamp messages: calculate communication latency based on the network model
Parallel event simulation
– When a message is sent out, calculate the predicted arrival time for the destination bluegene-processor
– When a message is received, update the current time: currTime = max(currTime, recvTime)
– Time stamp correction
![Page 10: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/10.jpg)
Time Stamping messages and threads
Thread Timer: curT
Message sent: RecvT(msg) = curT + Latency
Message scheduled: curT = max(curT, RecvT(msg))
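The two rules above can be sketched in a few lines of Python. This is only an illustration, not the emulator's code: the names `ThreadTimer`, `Message`, `compute`, `send` and `schedule` are hypothetical, and the latency is passed in directly instead of coming from a network model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    recv_time: float  # RecvT(msg): predicted arrival time at the destination


class ThreadTimer:
    """Hypothetical per-thread virtual clock (curT on the slide)."""

    def __init__(self) -> None:
        self.cur_t = 0.0

    def compute(self, duration: float) -> None:
        # Local computation advances the thread's own clock.
        self.cur_t += duration

    def send(self, latency: float) -> Message:
        # Message sent: RecvT(msg) = curT + Latency
        return Message(recv_time=self.cur_t + latency)

    def schedule(self, msg: Message) -> None:
        # Message scheduled: curT = max(curT, RecvT(msg))
        self.cur_t = max(self.cur_t, msg.recv_time)


sender, receiver = ThreadTimer(), ThreadTimer()
sender.compute(3.0)
msg = sender.send(latency=2.0)
receiver.compute(1.0)
receiver.schedule(msg)
print(msg.recv_time, receiver.cur_t)
```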
![Page 11: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/11.jpg)
Need for timestamp correction
Time stamp correction is needed for out-of-order messages
Out-of-order delivery can occur:
– A message arrives late, after some other message has already advanced the thread time into the future
– So the late message executes in the context of the future, although its predicted time is earlier
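A small worked example of the problem, assuming the uncorrected rule curT = max(curT, RecvT(msg)) is applied in delivery order. The function name and the message tuples are made up for illustration.

```python
def run_in_given_order(messages):
    """Apply the uncorrected timer rule to messages in the order given.

    messages: list of (name, recv_time, duration) tuples.
    Returns a dict mapping each message name to its start time.
    """
    cur_t = 0.0
    starts = {}
    for name, recv_time, duration in messages:
        cur_t = max(cur_t, recv_time)  # thread clock jumps forward if needed
        starts[name] = cur_t
        cur_t += duration              # executing the message takes time
    return starts


# "A" is predicted to arrive at 10.0 and "B" at 4.0, but the host machine
# happens to deliver "A" first.
delivered = [("A", 10.0, 2.0), ("B", 4.0, 1.0)]
uncorrected = run_in_given_order(delivered)

# Processing in predicted receive-time order gives the intended timeline.
intended = run_in_given_order(sorted(delivered, key=lambda m: m[1]))
print(uncorrected, intended)
```

In the uncorrected run, "B" starts at 12.0 even though it was predicted to arrive at 4.0; this is the error that timestamp correction has to repair.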
![Page 12: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/12.jpg)
Parallel correction algorithm
Sort message execution by receive time
Adjust time stamps when needed
Use correction messages to inform the change in event startTime
Send out correction messages following the path the message was sent
The events already in the timeline may have to move
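A minimal sketch of the sorting and adjusting steps on a single timeline, under simplifying assumptions: the `Event` class and `correct_timeline` are hypothetical names, and the returned list only stands in for the correction messages that the real scheme would forward along the path of the messages each moved event had sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    name: str
    recv_time: float
    duration: float
    start_time: Optional[float] = None


def correct_timeline(timeline):
    """Re-sort one timeline by recvTime and recompute start times.

    Returns (name, old_start, new_start) for every event whose startTime
    changed; each entry would trigger correction messages downstream.
    """
    timeline.sort(key=lambda e: e.recv_time)
    corrections = []
    end_of_previous = 0.0
    for ev in timeline:
        new_start = max(ev.recv_time, end_of_previous)
        if ev.start_time != new_start:
            corrections.append((ev.name, ev.start_time, new_start))
            ev.start_time = new_start
        end_of_previous = new_start + ev.duration
    return corrections


# M3 was delivered late and executed at time 8.0, after M2.
timeline = [
    Event("M1", recv_time=0.0, duration=2.0, start_time=0.0),
    Event("M2", recv_time=6.0, duration=2.0, start_time=6.0),
    Event("M3", recv_time=3.0, duration=1.0, start_time=8.0),
]
corrections = correct_timeline(timeline)
print([e.name for e in timeline], corrections)
```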
![Page 13: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/13.jpg)
Timestamps Correction
Figure labels: messages M1 to M7 and M8, RecvTime, ExecutionTimeLine.
![Page 14: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/14.jpg)
Timestamps Correction
Figure labels: messages M1 to M8, RecvTime, ExecutionTimeLine.
![Page 15: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/15.jpg)
Timestamps Correction
Figure labels: two execution timelines with messages M1 to M8, RecvTime, Correction Message.
![Page 16: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/16.jpg)
Timestamps Correction
Figure labels: execution timelines with messages M1 to M7, RecvTime, Correction Message (M4), Correction Message.
![Page 17: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/17.jpg)
Linear-order correction
Works only when
– Programs have no alternate orders of execution possible
– Messages are processed in the same order for multiple executions
– E.g., MPI programs with no wildcard recvs, structured-dagger code with no “overlap” or “forall”.
![Page 18: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/18.jpg)
Reasons:
The correction algorithm breaks dependency logic
– It is based only on receive time
– Cases: an event depends on several messages (the last message triggers the computation); a message is buffered until some condition holds
Example where the correction scheme is invalid: Jacobi-1D
![Page 19: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/19.jpg)
![Page 20: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/20.jpg)
Solution
Use structured dagger to retrieve dependence information
As the program runs, form a chain of bluegene logs that preserves the dependency information.
Bluegene logs are kept for entry functions and structured dagger functions
![Page 21: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/21.jpg)
Timestamp correction scheme
Every event has a list of backward and forward dependents.
An event cannot start till its backward dependents have finished.
Define effRecvTime = max(recvTime, endOfBackDeps)
An event can start only after its effRecvTime.
startTime = max(effRecvTime, timeline.last.endTime)
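The two formulas can be checked with a short calculation. The function names below are illustrative only, and endOfBackDeps is taken to be the latest end time among the backward dependents.

```python
def eff_recv_time(recv_time, back_dep_end_times):
    """effRecvTime = max(recvTime, endOfBackDeps)."""
    return max([recv_time, *back_dep_end_times])


def start_time(eff_recv, last_end_time):
    """startTime = max(effRecvTime, timeline.last.endTime)."""
    return max(eff_recv, last_end_time)


# Message predicted at 5.0; its two backward dependents end at 7.0 and 6.0;
# the last event currently on the timeline ends at 4.0.
eff = eff_recv_time(5.0, [7.0, 6.0])
start = start_time(eff, last_end_time=4.0)
print(eff, start)
```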
![Page 22: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/22.jpg)
Timestamp correction scheme
The timeline is not sorted on the recvTime of the event, as it was in the previous case.
The timeline is sorted based on the effRecvTime.
Steps to process a correction message
– Find the earliest event updated by the message
– Cut the timeline from that event
– Calculate new effRecvTimes from that point on.
– Reinsert into the timeline in the order of effRecvTime
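The steps above can be sketched for a single timeline as follows. This is an assumption-laden illustration, not the simulator's implementation: `Event`, `reinsert` and `apply_correction` are hypothetical names, all backward dependents are assumed to live on the same timeline, and the cut point is chosen conservatively as the earlier of the event's old position and its possible new position.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # events are compared by identity, not by field values
class Event:
    name: str
    recv_time: float
    duration: float
    back_deps: list = field(default_factory=list)  # Events that must finish first
    eff_recv_time: float = 0.0
    start_time: float = 0.0

    @property
    def end_time(self) -> float:
        return self.start_time + self.duration


def reinsert(placed, pending):
    """Append pending events to placed in order of effRecvTime."""
    placed, pending = list(placed), list(pending)
    while pending:
        # Only events whose backward dependents are already placed can go next.
        ready = [e for e in pending if all(d in placed for d in e.back_deps)]
        for e in ready:
            e.eff_recv_time = max([e.recv_time] + [d.end_time for d in e.back_deps])
        nxt = min(ready, key=lambda e: e.eff_recv_time)
        last_end = placed[-1].end_time if placed else 0.0
        nxt.start_time = max(nxt.eff_recv_time, last_end)
        placed.append(nxt)
        pending.remove(nxt)
    return placed


def apply_correction(timeline, target, new_recv_time):
    """Process a correction message that changes target's recvTime."""
    old_index = timeline.index(target)
    new_index = next(
        (i for i, e in enumerate(timeline) if e.eff_recv_time > new_recv_time),
        len(timeline),
    )
    cut = min(old_index, new_index)       # earliest event that may be affected
    target.recv_time = new_recv_time
    return reinsert(timeline[:cut], timeline[cut:])  # cut, recompute, reinsert


a = Event("A", 0.0, 2.0)
b = Event("B", 8.0, 2.0)
c = Event("C", 9.0, 1.0)
d = Event("D", 9.0, 1.0, back_deps=[b])
timeline = reinsert([], [a, b, c, d])
before = [(e.name, e.start_time) for e in timeline]
timeline = apply_correction(timeline, c, new_recv_time=4.0)
after = [(e.name, e.start_time) for e in timeline]
print(before, after)
```

In this example the correction moves C ahead of B, and D, which depends on B, can then start one time unit earlier.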
![Page 23: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/23.jpg)
Non-linear order correction scheme
The new scheme:
– Takes into account the event dependencies
– Works even when messages can be received in different orders in different runs.
– Requires all the dependencies to be captured using structured dagger.
But the timing correction is very slow.
Several optimizations are possible.
![Page 24: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/24.jpg)
Optimizations to online correction scheme
Overwrite old corrections:
– An event can get multiple correction messages.
– Reduce the number of corrections
– Same scheme if a correction message arrives earlier than the message itself
Use multisend
– Messages destined to the same real processor but different events can be sent collectively.
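Both optimizations can be illustrated with plain dictionaries. The function `coalesce_corrections` and the tuple layout are assumptions made for this sketch; they do not reflect the actual message format.

```python
from collections import defaultdict


def coalesce_corrections(corrections):
    """Overwrite old corrections, then group the survivors per real processor.

    corrections: iterable of (real_pe, event_id, new_recv_time) in arrival order.
    Returns {real_pe: [(event_id, new_recv_time), ...]}; each list could be
    sent as one collective message (multisend).
    """
    latest = {}
    for real_pe, event_id, new_recv_time in corrections:
        # A newer correction for the same event replaces the older one.
        latest[(real_pe, event_id)] = new_recv_time

    batches = defaultdict(list)
    for (real_pe, event_id), new_recv_time in latest.items():
        batches[real_pe].append((event_id, new_recv_time))
    return dict(batches)


pending = [
    (0, "e1", 5.0),
    (1, "e2", 7.0),
    (0, "e1", 4.0),  # overwrites the first correction for e1
    (0, "e3", 9.0),
]
batches = coalesce_corrections(pending)
print(batches)
```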
![Page 25: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/25.jpg)
More optimizations
Prioritize messages based on their predicted recvTime.
Lazy processing
– Process correction messages periodically.
– Allows corrections to be overwritten.
Batch processing
– Process many correction messages at a time
– Many events will be affected
– Choose the earliest and reinsert in the order of effRecvTime.
Ability to start corrections in the middle
– Can ignore the startup events for timing correction
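Prioritizing by predicted recvTime amounts to a priority queue keyed on that time. The sketch below uses Python's standard `heapq` module; the class name `RecvTimeQueue` is hypothetical and the message payloads are placeholders.

```python
import heapq
import itertools


class RecvTimeQueue:
    """Hypothetical message queue that pops the smallest predicted recvTime first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, recv_time: float, msg) -> None:
        heapq.heappush(self._heap, (recv_time, next(self._counter), msg))

    def pop(self):
        recv_time, _, msg = heapq.heappop(self._heap)
        return recv_time, msg

    def __len__(self) -> int:
        return len(self._heap)


queue = RecvTimeQueue()
queue.push(7.0, "m1")
queue.push(3.0, "m2")
queue.push(5.0, "m3")
order = [queue.pop()[1] for _ in range(len(queue))]
print(order)
```

Executing messages closer to their predicted order should leave fewer events to correct later, which is the motivation for this optimization.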
![Page 26: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/26.jpg)
Timing correction still very slow.
Observations:
– Don’t let the execution go far ahead of the correction wave.
– A large difference means many wrong events to be corrected.
– Closely following the execution wave also may not help.
A new scheme
– Similar to the one used for gvt (Global virtual time)
![Page 27: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/27.jpg)
GVT-like scheme
Use heartbeat
– Periodically broadcast asking for gvt
Gvt
– Is the time after which the events are invalid due to pending corrections
– Compute the gvt as the minimum of the predicted recvTimes of all correction messages and the startTimes of all affected events.
Use a parameter “leash”. Execution of the program cannot go beyond “gvt + leash”
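A minimal sketch of the gvt computation and the leash check, assuming each processor reports a local minimum and the global value is the minimum over those reports. The function names and the sample numbers are illustrative only.

```python
def local_gvt(correction_recv_times, affected_start_times):
    """Minimum over pending correction recvTimes and affected event startTimes."""
    candidates = list(correction_recv_times) + list(affected_start_times)
    # With nothing pending, this processor does not constrain the gvt.
    return min(candidates) if candidates else float("inf")


def may_execute(event_start_time, gvt, leash):
    """Execution cannot go beyond gvt + leash."""
    return event_start_time <= gvt + leash


# One heartbeat: every processor reports its local minimum.
reports = [
    local_gvt([12.0, 9.0], [10.0]),  # processor 0
    local_gvt([], [15.0]),           # processor 1
]
gvt = min(reports)
leash = 5.0
print(gvt, may_execute(13.0, gvt, leash), may_execute(14.5, gvt, leash))
```

A larger leash lets execution run further ahead of the correction wave, and a smaller one keeps it closer; the observations on the previous slide suggest neither extreme is ideal.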
![Page 28: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/28.jpg)
Projections before correction
![Page 29: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/29.jpg)
Projections after correction
![Page 30: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/30.jpg)
Correctness of the scheme (using Jacobi1D)
![Page 31: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/31.jpg)
Predicted time vs latency factor
![Page 32: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/32.jpg)
Predicted speedup
![Page 33: Blue Gene Simulator](https://reader035.vdocument.in/reader035/viewer/2022070405/56813e60550346895da86638/html5/thumbnails/33.jpg)
More work
Ongoing work
– Make sure the gvt scheme is correct
Future work
– The presented scheme is on-line correction
– Explore the off-line (post-mortem) correction scheme using generated traces.