D. Becker, M. Geimer, R. Rabenseifner, and F. Wolf Laboratory for Parallel Programming | September 21 2010
Synchronizing the timestamps of concurrent events in traces of hybrid MPI/OpenMP applications
Daniel Becker
Cluster systems
• Cluster systems represent the majority of today’s supercomputers
  – Availability of inexpensive commodity components
• Vast diversity
  – Architecture
  – Interconnect technology
  – Software environment
• Message-passing and shared-memory programming models for communication and synchronization
Event tracing
• Application areas
  – Performance analysis
    • Time-line visualization
    • Wait-state analysis
  – Performance modeling
  – Performance prediction
  – Debugging
• Events recorded at runtime to enable post-mortem analysis of dynamic program behavior
• Event includes at least timestamp, location, and event type
[Figure: two process timelines with Send, Recv, and Barrier operations; their events (E, X, S, R, MX) are recorded per process, written to local traces, and optionally merged.]
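The minimal event record named on this slide (timestamp, location, event type) and the optional merge step can be sketched as follows. This is only an illustration: the class and function names are hypothetical, and the reading of the one-letter labels (E as enter, X as exit, S as send, R as receive) is an assumption, not something the slides spell out.

```python
import heapq
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    # Assumed meaning of the one-letter labels used in the figures.
    ENTER = "E"
    EXIT = "X"
    SEND = "S"
    RECV = "R"


@dataclass(frozen=True)
class Event:
    timestamp: float  # reading of the local (unsynchronized) clock
    location: int     # process or thread that recorded the event
    kind: EventKind


def merge_traces(local_traces):
    """Merge per-location traces (each sorted by time) into one global trace."""
    return list(
        heapq.merge(*local_traces, key=lambda e: (e.timestamp, e.location))
    )
```

Because the merge orders events by their recorded timestamps, any clock offset between locations shows up directly as a wrong global order, which is the problem the remaining slides address.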
Problem: Non-synchronized clocks
Outline
Clock synchronization
• Query time from reference clocks synchronized at regular intervals (Mills)
• Measure offset values and determine interpolation function (Dunigan, Maillet, Tron, Doleschal)
• Determine medial smoothing function based on send/receive differences (Duda, Hofman, Hilgers)
• Restore and preserve logical correctness (Lamport, Mattern, Fidge, Rabenseifner)
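As an illustration of the "measure offset values and determine interpolation function" approach, the sketch below assumes two offset measurements against a reference clock, one at the start and one at the end of the run, and interpolates linearly between them. The function name and parameters are hypothetical; the cited authors' actual methods may differ in detail.

```python
def make_linear_correction(t_start, offset_start, t_end, offset_end):
    """Build a function mapping local timestamps to reference-clock time.

    offset_* is (reference time - local time), measured at local times
    t_start and t_end. A constant clock drift in between is assumed.
    """
    drift = (offset_end - offset_start) / (t_end - t_start)

    def to_reference(t_local):
        return t_local + offset_start + drift * (t_local - t_start)

    return to_reference
```

For example, `make_linear_correction(0.0, 1.0, 100.0, 3.0)` yields a function that shifts a local timestamp of 50.0 by the interpolated offset of 2.0. Such interpolation reduces but does not eliminate errors, so logical violations can remain.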
Controlled logical clock
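[Figure: timelines with E, X, S, and R events; µmin labels the gap between the send (S) and its matching receive (R).]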
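A simplified sketch of the forward pass of a controlled logical clock for a single location: each receive is moved to at least the corrected send time plus µmin, and later events are pushed forward so that local intervals are kept. This is an illustration under stated assumptions, not the published algorithm; the function name, the `recv_constraints` mapping, and the `gamma` factor are assumptions of this sketch.

```python
def forward_correct(timestamps, recv_constraints, mu_min, gamma=1.0):
    """Correct the timestamps of one location in a single forward pass.

    timestamps:       original local timestamps in ascending order
    recv_constraints: {event index: corrected timestamp of matching send}
    mu_min:           minimum message latency
    gamma:            share of each original local interval that is kept
                      after a correction (assumed parameter)
    """
    corrected = []
    for i, t in enumerate(timestamps):
        if i == 0:
            new_t = t
        else:
            # Never move an event backward; keep the distance to the
            # (possibly advanced) predecessor.
            kept = gamma * (t - timestamps[i - 1])
            new_t = max(t, corrected[-1] + kept)
        if i in recv_constraints:
            # Clock condition: receive >= send + minimum latency.
            new_t = max(new_t, recv_constraints[i] + mu_min)
        corrected.append(new_t)
    return corrected
```

With `timestamps=[1.0, 2.0, 3.0]`, a receive at index 1 whose matching send was corrected to 2.5, and `mu_min=0.25`, the receive moves to 2.75 and the following event to 3.75.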
MPI semantics
[Figure: MPI communication patterns drawn as E and MX events on process timelines.]
Limitations of the controlled logical clock (CLC) algorithm
• Neither restores nor preserves the clock condition in OpenMP event semantics
• May introduce violations in locations that were previously intact
[Figure: timelines with a message (S, R) and omp_barrier regions.]
Collective communication
[Figure: omp_barrier regions on two thread timelines, delimited by E and OX events.]
Consider OpenMP constructs as composed of multiple logical messages
Define logical send/receive pairs for each flavor
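One way to picture "logical send/receive pairs for each flavor" is a function that enumerates the logical messages of a construct. The flavor names used here (`"n-to-n"`, `"1-to-n"`, `"n-to-1"`) and the examples attached to them in the comments are assumptions of this sketch; the slides do not list the flavors.

```python
from itertools import product


def logical_messages(flavor, members, root=None):
    """Return (sender, receiver) pairs of logical messages for one construct.

    flavor:  hypothetical flavor name, see comments below
    members: locations (threads or processes) taking part
    root:    distinguished location for rooted flavors
    """
    if flavor == "n-to-n":   # assumed example: a barrier
        return [(s, r) for s, r in product(members, members) if s != r]
    if flavor == "1-to-n":   # assumed example: a fork
        return [(root, r) for r in members if r != root]
    if flavor == "n-to-1":   # assumed example: a join
        return [(s, root) for s in members if s != root]
    raise ValueError(f"unknown flavor: {flavor}")
```

Once a construct is expressed as such pairs, the clock condition for point-to-point messages can be applied to each pair.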
OpenMP semantics
[Figure: OpenMP constructs on thread timelines, labeled with E, OX, F, J, L, and U events, plus a panel titled Tasking.]
Happened-before relation
• Operation may have multiple logical receive and send events
• Multiple receives used to synchronize multiple clocks
• Latest send event is the relevant send event
[Figure: timelines with E, MX, and OX events.]
Parallelization
• Correct local traces in parallel
  – Keep whole trace in memory
  – Exploit distributed memory & processing capabilities
• Replay communication
  – Traverse trace in parallel
  – Exchange data at synchronization points
  – Use operation of same type
    • MPI functions
    • OpenMP constructs
Forward replay
[Figure: three timelines (1, 2, 3) meeting at an omp_barrier.]
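The idea of replaying a recorded barrier with an operation of the same type can be sketched with Python threads standing in for OpenMP threads. Each thread publishes its barrier-enter timestamp, waits at a real barrier, and then corrects its exit timestamp against the latest enter, which acts as the relevant logical send. The function name, the argument layout, and the use of `threading.Barrier` are assumptions of this sketch.

```python
import threading


def replay_barrier(enter_times, exit_times, mu):
    """Replay one recorded barrier and return corrected exit timestamps.

    enter_times[i], exit_times[i]: timestamps of thread i at the barrier
    mu: minimum latency between the latest enter and any exit
    """
    n = len(enter_times)
    shared = [None] * n       # enter timestamps visible to all threads
    corrected = [None] * n
    barrier = threading.Barrier(n)

    def worker(i):
        shared[i] = enter_times[i]
        barrier.wait()        # same operation type as the recorded event
        latest_enter = max(shared)
        corrected[i] = max(exit_times[i], latest_enter + mu)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return corrected
```

No thread can read `shared` before every thread has written its entry, because the reads happen only after the barrier releases.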
Backward amortization
• Avoid new violations
• Do not advance send farther than matching receive
[Figure: timelines with send (S) and receive (R) events.]
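A simplified sketch of backward amortization on one location: the jump applied to a receive is spread linearly over the events in a window before it, and the whole ramp is scaled down so that no send is advanced past its limit (corrected matching receive minus µ). This single scaled ramp is an assumption chosen for brevity; it is not the piecewise interpolation shown on a later slide, and all names are hypothetical.

```python
def backward_amortize(times, recv_index, jump, window, send_limits):
    """Smooth a jump at times[recv_index] over the preceding window.

    times:       timestamps of one location before the jump is applied
    recv_index:  index of the receive event that was advanced by `jump`
    window:      length of the amortization interval before the receive
    send_limits: {index: latest allowed timestamp} for send events
    """
    start = times[recv_index] - window

    def ramp(t):
        # Shift grows linearly from 0 at window start to `jump` at the receive.
        return jump * max(0.0, (t - start) / window)

    scale = 1.0
    for i, limit in send_limits.items():
        shift = ramp(times[i])
        if i < recv_index and shift > 0.0:
            allowed = max(0.0, limit - times[i])
            scale = min(scale, allowed / shift)

    out = []
    for i, t in enumerate(times):
        if i < recv_index:
            out.append(t + scale * ramp(t))
        else:
            out.append(t + jump)  # receive and later events keep the jump
    return out
```

Scaling the whole ramp keeps the event order intact but can leave a residual jump at the receive when a send limit is tight.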
Backward replay
• Data on sender side needed
• Communication direction
  – Communication proceeds in backward direction
  – Roles of sender and receiver are inverted
• Traversal direction
  – Start at end of trace
  – Avoid deadlocks
[Figure: timelines with send (S) and receive (R) events, shown with the roles of sender and receiver swapped for the backward pass.]
Piece-wise correction
[Figure: differences to $LC_i^b$ plotted over the amortization interval, with S and R events and ∆t marked. Legend:]
• $LC_i^b$: Controlled logical clock without jump discontinuities
• $LC_i' - LC_i^b$: Controlled logical clock with jump discontinuities
• $LC_i^{A'} - LC_i^b$: Linear interpolation for backward amortization
• $LC_i^A - LC_i^b$: Piecewise linear interpolation for backward amortization
• $\min(LC_k'(\text{corr. receive event}) - \mu - LC_i^b)$
Experimental evaluation
Nicole cluster
• JSC@FZJ
• 32 compute nodes
• 2 quad-core Opteron running at 2.4 GHz
• Infiniband
Applications
• PEPC (4 threads per process)
• Jacobi solver (2 threads per process)
Evaluation focused on frequency of clock violations, accuracy, and scalability of the correction
A significant percentage of messages violated the clock condition (up to 5%)
After correction all traces were free of clock condition violations
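The frequency of clock condition violations can be measured by counting messages whose receive timestamp is earlier than the send timestamp plus a minimum latency. The sketch below assumes messages are available as (send, receive) timestamp pairs; the function name and the `mu` parameter are hypothetical and not taken from the evaluation itself.

```python
def violation_rate(messages, mu=0.0):
    """Return the percentage of messages that violate the clock condition.

    messages: iterable of (send_timestamp, recv_timestamp) pairs
    mu:       minimum message latency (0.0 checks plain ordering only)
    """
    messages = list(messages)
    if not messages:
        return 0.0
    violated = sum(1 for send, recv in messages if recv < send + mu)
    return 100.0 * violated / len(messages)
```

Running the same check on a corrected trace should return 0.0 if the correction restored the clock condition for every message.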
Accuracy of the algorithm
• Event position
  – Absolute deviations correspond to the size of the clock condition violations
  – Relative deviations are negligible
• Event distance
  – Larger relative deviations possible
  – Impact on analysis results negligible
Correction changed the length of local intervals only marginally
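The two accuracy measures can be illustrated with a short sketch that compares an original and a corrected trace of one location. The exact definitions used in the evaluation are not given on the slide, so the formulas below (absolute shift per event, relative change per interval between neighboring events) are assumptions.

```python
def position_deviations(original, corrected):
    """Absolute shift of each event caused by the correction."""
    return [abs(c - o) for o, c in zip(original, corrected)]


def distance_deviations(original, corrected):
    """Relative change of each interval between neighboring events."""
    deviations = []
    for i in range(1, len(original)):
        before = original[i] - original[i - 1]
        after = corrected[i] - corrected[i - 1]
        if before > 0:  # skip zero-length intervals to avoid division by zero
            deviations.append(abs(after - before) / before)
    return deviations
```

Short intervals make the relative measure sensitive: the same absolute shift produces a larger relative deviation when the original distance between two events is small.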
Synchronizing hybrid codes
• Original trace violated only MPI semantics
• Roughly half of the corrections correspond to OpenMP semantics
Algorithm preserved OpenMP semantics
[Figure: timelines with a message (S, R) and omp_barrier regions.]
Scalability
Summary
Outlook
• Exploit knowledge of MPI-internal messaging inside collective operations using PERUSE
• Leverage periodic offset measurements at global synchronization points
Thanks!