March 12, 2001, Kperfmon-MP
Multiprocessor Kernel Performance Profiling
Alex [email protected]
Computer Sciences Department
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706-1685
USA
Kperfmon: Overview
• Specify a resource
  – Almost any function or basic block in the kernel
• Apply a metric to the resource:
  – Number of entries to a function or basic block
  – Wall clock time, CPU time (virtual time)
  – All SPARC hardware counters: cache misses, branch mispredictions, instructions per cycle, ...
• Visualize the metric data in real time
Kperfmon-MP: Goals
Modify uniprocessor Kperfmon to provide:
• Safe operation on SMP machines
  – Thread safety
  – Migration safety
• New feature: Per-CPU performance data
  – More detailed performance data
  – Reduce cache coherence traffic caused by the tool
Kperfmon: Technology
• No need for kernel recompilation
  – Works with stock SPARC Solaris 7 kernels
  – Supports both 32-bit and 64-bit kernels
• No need for rebooting
  – Important for 24 x 7 systems
• Use the KernInst framework to:
  – Insert measurement code in the kernel at run time
  – Sample accumulated metric values from user space periodically
Kperfmon System
[Figure: system architecture. Labeled components: Visis, Kperfmon, Kerninstd, /dev/kerninst (accessed through ioctl()), and, in Kernel Space, the Patch Heap and the Data Heap. Labeled flows: instrumentation request, sampling request.]
Kperfmon instrumentation
• Counter primitive
  – Number of entries to a function or a basic block
• Wall clock timer primitive
  – Real time spent in a function
• CPU timer primitive
  – Excludes time while the thread was switched out
  – Can count more than just timer ticks
• All HW-counter metrics use this mechanism
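Non-MP Counter primitive
• Atomic, thread-safe update
• Lightweight
• No register save/restore required
Code Patch Area, reached from the entry of tcp_lookup():
    sethi  hi(&cnt), r0        ! r0 = upper bits of the address of cnt
    or     r0, lo(&cnt), r0    ! r0 = full address of cnt
    ldx    [r0], r1            ! r1 = current value of cnt
retry:
    add    r1, 1, r2           ! r2 = proposed new value
    casx   [r0], r1, r2        ! if cnt == r1, store r2; r2 = old cnt
    cmp    r1, r2              ! did the swap succeed?
    bne    retry               ! no: try again
    mov    r2, r1              ! (delay slot) r1 = value just observed
    nop                        ! slot for the relocated instruction
    ba,a   tcp_lookup+4        ! return to the instrumented function
Data Area: cnt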
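The retry loop above can also be read as a C11 sketch. This is only an illustration of the same compare-and-swap pattern, not the code Kperfmon emits; the names `cnt` and `counter_hit` are hypothetical stand-ins for the data-area word and the patch code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the `cnt` word in the data area. */
_Atomic uint64_t cnt;

/* Increment cnt with a compare-and-swap retry loop, mirroring the
 * ldx / add / casx / cmp / bne sequence on the slide. */
void counter_hit(void)
{
    uint64_t seen = atomic_load(&cnt);          /* ldx [r0], r1 */

    /* If another thread changed cnt since it was read, the exchange
     * fails, `seen` is refreshed with the current value (the role of
     * `mov r2, r1` in the delay slot), and the loop retries. */
    while (!atomic_compare_exchange_weak(&cnt, &seen, seen + 1)) {
        /* retry */
    }
}
```

Because the update needs only the counter's address and two scratch values, the sequence stays short, which matches the "lightweight" claim on the slide.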
Non-MP Wall clock timer primitive
• Inclusive (includes time in callees)
• Keeps accumulating if switched-out
[Figure: tcp_lookup() is patched at its entry (start timer) and exit (stop timer); the instrumentation code sits in the Code Patch Area and the timer in the Data Area.]
Non-MP CPU timer primitive
• Exclude the time spent while switched out
  – Instrument context switch routines
• HW counter metrics are based on this mechanism
[Figure: tcp_lookup() is patched at its entry (start timer) and exit (stop timer). The context switch code is patched at switch-out (pause timer) and switch-in (restart timer). A "List of paused timers" with a head pointer and free slots holds the timers paused at switch-out.]
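A minimal C sketch of the four hooks on this slide, assuming timestamps are passed in explicitly so the behavior is easy to trace. The struct layout and function names are hypothetical, and a single `paused` flag stands in for the list of paused timers shown in the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU timer state; the slides do not show the real layout. */
struct cpu_timer {
    uint64_t total;       /* accumulated on-CPU time */
    uint64_t started_at;  /* timestamp of the last start or restart */
    bool running;
    bool paused;          /* stands in for membership in the paused list */
};

/* Entry of the measured function. */
void timer_start(struct cpu_timer *t, uint64_t now)
{
    t->started_at = now;
    t->running = true;
}

/* Exit of the measured function. */
void timer_stop(struct cpu_timer *t, uint64_t now)
{
    if (t->running) {
        t->total += now - t->started_at;
        t->running = false;
    }
}

/* Switch-out hook: bank the time so far and remember to restart later. */
void timer_pause(struct cpu_timer *t, uint64_t now)
{
    if (t->running) {
        t->total += now - t->started_at;
        t->running = false;
        t->paused = true;
    }
}

/* Switch-in hook: resume only timers that were paused at switch-out. */
void timer_restart(struct cpu_timer *t, uint64_t now)
{
    if (t->paused) {
        t->started_at = now;
        t->running = true;
        t->paused = false;
    }
}
```

With start at 100, pause at 130, restart at 200, and stop at 250, the timer accumulates 30 + 50 = 80 units; the 70 units spent switched out are excluded.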
Thread Safety
Non-MP timer allocation routine:
    ld   [head], R1
    add  R1, 4, R1
    st   R1, [head]
• Used on switch-out to save the paused timers
• Context switch is serial on uniprocessors
  – No thread safety problems there
• Context switches may be concurrent on SMPs!
  – Multiple threads are being scheduled simultaneously
  – The allocation code is no longer safe
[Figure: list of paused timers with a head pointer, one saved timer (tmr), and free slots.]
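Thread Safety
MP timer allocation routine:
alloc:
    ld   [head], R1        ! R1 = current head
    add  R1, 4, R2         ! R2 = head advanced by one slot
    cas  [head], R1, R2    ! if head == R1, store R2; R2 = old head
    cmp  R1, R2            ! did another CPU move head first?
    bne  alloc             ! yes: retry
    nop                    ! branch delay slot
• Context switches may be concurrent on SMPs
• Use the atomic cas instruction to ensure safety
[Figure: list of paused timers with a head pointer, one saved timer (tmr), and free slots.]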
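The same allocation step can be sketched with C11 atomics. This is an illustration under assumptions: `head` is modeled as a byte offset, `SLOT_SIZE` follows the `add R1, 4` on the slide, and `alloc_slot` is a hypothetical name. Like the slide, the sketch omits bounds checking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOT_SIZE 4u   /* the slide advances head by 4 */

/* Hypothetical offset of the next free slot in the paused-timer list. */
_Atomic uint32_t head;

/* Claim one slot and return its offset. If two CPUs race, only one
 * compare-and-swap succeeds; the loser sees the updated head in `old`
 * and retries, so no slot is handed out twice. */
uint32_t alloc_slot(void)
{
    uint32_t old = atomic_load(&head);           /* ld [head], R1 */

    while (!atomic_compare_exchange_weak(&head, &old, old + SLOT_SIZE)) {
        /* retry with the refreshed value of head */
    }
    return old;
}
```

The non-MP routine's plain load, add, store sequence would let two concurrent context switches read the same `head` and save their timers into the same slot; the retry loop closes that window.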
Per-CPU performance data
• Instrumentation code is shared by all CPUs
• Per-CPU copies of the primitive’s data
  – Two copies are never placed in the same cache line
Code Patch Area, reached from the entry of tcp_lookup() (pseudocode):
    rd    cpu#, r0
    ldx   cnt[r0], r1
    add   r1, 1, r2
    casx  r2, cnt[r0]
    …
Data Area: cnt-cpu0, cnt-cpu1, …, cnt-cpu31
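A C11 sketch of the per-CPU layout, for illustration only. The 64-byte `CACHE_LINE` is an assumed value (the slides do not state a line size), the CPU index is passed in as a parameter instead of being read from a register, and all names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS      32   /* the slide shows cnt-cpu0 .. cnt-cpu31 */
#define CACHE_LINE 64   /* assumed line size, not given in the slides */

/* One counter per CPU; the alignment pads each copy to a full cache
 * line so that no two copies share a line. */
struct percpu_cnt {
    alignas(CACHE_LINE) _Atomic uint64_t value;
};

struct percpu_cnt cnt[NCPUS];

/* Shared instrumentation code: pick this CPU's copy, then update it
 * with the same compare-and-swap retry loop as the non-MP counter. */
void counter_hit(unsigned cpu)
{
    uint64_t seen = atomic_load(&cnt[cpu].value);

    while (!atomic_compare_exchange_weak(&cnt[cpu].value, &seen, seen + 1)) {
        /* retry */
    }
}

/* Sampling side: the tool can report per-CPU values or their sum. */
uint64_t counter_total(void)
{
    uint64_t sum = 0;

    for (unsigned cpu = 0; cpu < NCPUS; cpu++)
        sum += atomic_load(&cnt[cpu].value);
    return sum;
}
```

Keeping each copy in its own line means an update on one CPU does not invalidate the line holding another CPU's copy, which is the coherence-traffic reduction the goals slide asks for.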
Migration Between Primitives
• Wall timer started on CPU0, stopped on CPU1
• Counters and CPU timers are not affected
[Figure: the Data Area holds timer-cpu0 and timer-cpu1. A thread running on CPU0 enters tcp_lookup() (start timer), goes through a context switch (switch-out, then switch-in on CPU1), and reaches the exit of tcp_lookup() (stop timer).]
Solution: virtualization
• Implement wall timers on top of CPU timers!
  – (entry) start timer
  – (switch-out) pause timer, record current time
  – (switch-in) add time switched-out, restart timer
  – (exit) stop timer
[Figure: the thread enters tcp_lookup() on CPU0 and, after the context switch, exits on CPU1; the Data Area holds timer-cpu0 and timer-cpu1.]
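The four steps can be traced with a small C sketch. It is a simplified model, not the tool's implementation: timestamps are passed in explicitly, all names are hypothetical, and it assumes the switched-out interval is charged to the CPU on which the thread resumes, a detail the slides do not specify.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 2   /* enough for the CPU0 -> CPU1 example on the slide */

/* Hypothetical virtualized wall timer with per-CPU accumulators. */
struct wall_timer {
    uint64_t total[NCPUS];     /* timer-cpu0, timer-cpu1 */
    uint64_t resumed_at;       /* last start or restart timestamp */
    uint64_t switched_out_at;  /* timestamp recorded at switch-out */
};

/* (entry) start timer */
void wall_start(struct wall_timer *w, uint64_t now)
{
    w->resumed_at = now;
}

/* (switch-out) pause timer, record current time */
void wall_switch_out(struct wall_timer *w, unsigned cpu, uint64_t now)
{
    w->total[cpu] += now - w->resumed_at;
    w->switched_out_at = now;
}

/* (switch-in) add time switched-out, restart timer.
 * Assumption: the gap is charged to the CPU that resumes the thread. */
void wall_switch_in(struct wall_timer *w, unsigned cpu, uint64_t now)
{
    w->total[cpu] += now - w->switched_out_at;
    w->resumed_at = now;
}

/* (exit) stop timer */
void wall_stop(struct wall_timer *w, unsigned cpu, uint64_t now)
{
    w->total[cpu] += now - w->resumed_at;
}

/* Wall time is the sum over all per-CPU copies. */
uint64_t wall_total(const struct wall_timer *w)
{
    uint64_t sum = 0;

    for (unsigned cpu = 0; cpu < NCPUS; cpu++)
        sum += w->total[cpu];
    return sum;
}
```

For a thread that starts at 100 on CPU0, is switched out at 130, resumes on CPU1 at 200, and exits at 250, the copies hold 30 and 120, and their sum of 150 equals the elapsed wall time. Each hook touches only the copy of the CPU it runs on, so the start and stop no longer need to agree on a CPU.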
Conclusion
• Techniques for correct MP profiling:
  – Atomic memory updates to ensure thread safety
  – Virtualized timers to handle thread migration
• Per-CPU data collection is important
  – Provides detailed performance information
  – Introduces fewer coherence cache misses
Future Work
• New metrics
  – Locality of CPU assignments
  – Per-thread performance data
• Formal verification of instrumentation code for migration/preemption problems
• Ports to other architectures and OSes
The Big Picture
http://www.cs.wisc.edu/paradyn
• Demo: Wednesday, Room 6372
• Available for download on request
  – mailto: [email protected]
  – Public release in April