![Page 1: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/1.jpg)
K T A U: Kernel Tuning and Analysis Utilities
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon
![Page 2: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/2.jpg)
University of Oregon Performance Research Lab
Agenda
• Motivations
• KTAU Overview
• ZeptoOS - KTAU - TAU on BG/L
• KTAU - TAU on Linux Cluster
![Page 3: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/3.jpg)
What is a process doing inside the kernel?
Solution:
Context-of-Execution Based profile/trace
We can analyze the execution path of a process and store the performance data locally, per process.
![Page 4: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/4.jpg)
What about other processes on the system?
Solution:
System-wide performance analysis
By aggregating the performance data of each process in the system (all processes or a selected subset), we can capture interactions among processes.
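A minimal sketch of this aggregation, assuming each process exposes a table of per-routine call counts and times. The function name `aggregate_profiles`, the data layout, and the sample numbers are hypothetical illustrations, not KTAU's actual interface or measured data.

```python
from collections import defaultdict

def aggregate_profiles(per_process, pids=None):
    """Sum per-routine (calls, time) over all processes, or only over `pids`."""
    totals = defaultdict(lambda: [0, 0.0])
    for pid, routines in per_process.items():
        if pids is not None and pid not in pids:
            continue  # selective aggregation: skip processes we did not ask for
        for routine, (calls, usec) in routines.items():
            totals[routine][0] += calls
            totals[routine][1] += usec
    return {name: (calls, usec) for name, (calls, usec) in totals.items()}

# Hypothetical per-process data: pid -> routine -> (call count, microseconds)
per_process = {
    101: {"sys_read": (4, 120.0), "schedule": (2, 30.0)},
    102: {"sys_read": (1, 40.0), "sys_write": (3, 90.0)},
}
system_wide = aggregate_profiles(per_process)          # all processes
only_101 = aggregate_profiles(per_process, pids={101})  # selected subset
```

Passing `pids` restricts the view to selected processes; leaving it out gives the system-wide view.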
![Page 5: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/5.jpg)
Profiling or Tracing?
Answer:
Why not do both?
• Profile
– A summarized view of performance data, with the advantage of compact data size.
• Trace
– A detailed view of the process execution timeline, with the disadvantage of large data size.
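The relationship between the two views can be sketched as follows: a trace keeps every timestamped event, while a profile keeps one summary record per routine. The event format and the function name `profile_from_trace` are assumptions made for illustration, not a KTAU or TAU format.

```python
def profile_from_trace(events):
    """Collapse (time, kind, routine) events into {routine: (calls, inclusive_time)}."""
    stack = []    # currently open routine invocations
    profile = {}
    for time, kind, routine in events:
        if kind == "enter":
            stack.append((routine, time))
        else:
            name, start = stack.pop()
            if name != routine:
                raise ValueError("unbalanced enter/exit in trace")
            calls, total = profile.get(routine, (0, 0))
            profile[routine] = (calls + 1, total + (time - start))
    return profile

# Hypothetical trace: six events
trace = [
    (0, "enter", "sys_write"),
    (2, "enter", "schedule"),
    (5, "exit", "schedule"),
    (9, "exit", "sys_write"),
    (10, "enter", "sys_write"),
    (14, "exit", "sys_write"),
]
profile = profile_from_trace(trace)  # two summary records
```

The trace grows with every event, whereas the profile grows only with the number of distinct routines, which is why the profile stays compact.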
![Page 6: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/6.jpg)
Why do we need another kernel profiling/tracing tool?
Answer:
Why not?
• LTT
• OProfile
• KernInst
![Page 7: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/7.jpg)
KTAU Design Goals
• Fine-grained kernel-level performance measurement
– Parallel applications
– Support both profiling and tracing
• Both process-centric and system-wide view
• Merge user-space performance with kernel-space
• Detailed program-OS interaction data
• Analysis and visualization compatible with existing tools
![Page 8: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/8.jpg)
KTAU Method
• Instruments Linux kernel source with KTAU profiling API
• Maintains performance data for each kernel routine (per process)
• Performance data accessible via /proc filesystem
• Instrumented application maintains data in user-space
• Post-execution performance data analysis
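A user-space model of the bookkeeping described above, as a sketch only: entry/exit hooks update a per-process record for each instrumented routine. The class `ProcessProfile`, its methods, and the explicit timestamps are hypothetical; the real KTAU instrumentation lives in kernel C code and its API is not shown in these slides. Recursive calls are not handled in this sketch.

```python
from collections import defaultdict

class ProcessProfile:
    """Per-process store: one record per instrumented routine."""

    def __init__(self):
        # Each frame is [routine, start_time, time_spent_in_children]
        self.stack = []
        self.records = defaultdict(lambda: {"calls": 0, "incl": 0, "excl": 0})

    def entry(self, routine, now):
        self.stack.append([routine, now, 0])

    def exit(self, routine, now):
        name, start, child_time = self.stack.pop()
        if name != routine:
            raise ValueError("unbalanced entry/exit")
        elapsed = now - start
        rec = self.records[routine]
        rec["calls"] += 1
        rec["incl"] += elapsed               # time including callees
        rec["excl"] += elapsed - child_time  # time excluding callees
        if self.stack:
            self.stack[-1][2] += elapsed     # charge our time to the caller

profiles = defaultdict(ProcessProfile)  # keyed by pid: data is kept per process
p = profiles[42]
p.entry("sys_write", 0)
p.entry("socket_send", 3)
p.exit("socket_send", 7)
p.exit("sys_write", 10)
```

Keeping one `ProcessProfile` per pid mirrors the idea that performance data is maintained for each kernel routine on a per-process basis.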
![Page 9: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/9.jpg)
KTAU Framework
![Page 10: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/10.jpg)
KTAU Architecture
5 modules
- KTAU Instrumentation
- KTAU Profiling/Tracing Infrastructure
- KTAU Proc Interface
- KTAU User-API Library
- KTAU-D
![Page 11: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/11.jpg)
Kernel Profiling Issues on BG/L
• I/O node kernel
  • Linux kernel approach
• Compute node kernel
  • No daemon processes
  • Single address space
    – single performance database
    – single callstack across user/kernel
  • Keeps track of one process only (optimization)
  • Instrumented compute node kernel
![Page 12: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/12.jpg)
KTAU on BG/L I/O Node
[Figure: 32 BG/L compute nodes (CN 2, CN 3, …, CN 31, CN 32 shown) attached to one BG/L IO-Node. Each compute node runs the IBM Compute-N Kernel with a compute job with TAU in user space. The IO-Node runs the ZeptoOS IO-N Kernel with KTAU; its user space plus ZeptoOS RamDisk hosts IBM’s CIOD and KTAU-D.]
![Page 13: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/13.jpg)
KTAU on BG/L
• Current status
– IO Node ZeptoOS kernel profiling/tracing
– KTAU integrated into ZeptoOS build system
– Detailed IO Node kernel observation now possible
– KTAU-Daemon (KTAU-D) on IO Node
  • monitors the whole system and individual processes
  • more than what strace allows
– Visualization of trace/profile of ZeptoOS and CIOD
  • Vampir/Jumpshot (trace) and ParaProf (profile)
![Page 14: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/14.jpg)
KTAU Usage Models for BG/L IO-Node
• Daemon-based monitoring (KTAU-D)
– Use KTAU-D to monitor (profile/trace) a single process (e.g., CIOD) or the entire IO-Node kernel
– No access to the source code of the user-space program is required
– CIOD kernel activity is available even though CIOD source is N/A
• ‘Self’ monitoring
– A user-space program can be instrumented (e.g., with TAU) to access its OWN kernel-level trace/profile data
– ZIOD (ZeptoOS IO-D) source (when available) can be instrumented
– Can produce MERGED user-kernel trace/profile
![Page 15: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/15.jpg)
More on KTAU-D
• A daemon running on BG/L IO-node that periodically accesses kernel profile/trace data and outputs to filesystem
• Configuration done through ZeptoOS configuration tool
• KTAU-D, configuration file, and necessary scripts are integrated into the ZeptoOS runtime environment.
![Page 16: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/16.jpg)
KTAU-D Configuration in ZeptoOS-1.2
![Page 17: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/17.jpg)
KTAU-D Profile Data
• KTAU-D can be used to access profile data (system-wide and individual process) of the BG/L IO-Node
• Data is obtained at the start and stop of KTAU-D, and then the resulting profile is generated
• Currently, flat profiles with inclusive/exclusive times and function call counts are produced
– (Future work: call-graph profiles)
• Profile data is viewed using the ParaProf visualization tool
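One plausible way to turn start and stop readings into a flat profile is to subtract the cumulative counters routine by routine. This is an assumption: the slides only state that data is obtained at the start and stop of KTAU-D, not how the result is computed. The function `profile_delta` and the sample values are hypothetical.

```python
def profile_delta(start, stop):
    """Subtract the start snapshot from the stop snapshot, routine by routine.

    Each snapshot maps routine -> (calls, inclusive_time, exclusive_time),
    with all three values assumed to be cumulative counters.
    """
    delta = {}
    for routine, (calls, incl, excl) in stop.items():
        c0, i0, e0 = start.get(routine, (0, 0, 0))
        if calls - c0 > 0:  # keep only routines that ran during the interval
            delta[routine] = (calls - c0, incl - i0, excl - e0)
    return delta

# Hypothetical cumulative snapshots
start = {"sys_read": (10, 500, 400), "schedule": (7, 90, 90)}
stop = {
    "sys_read": (25, 1300, 1000),
    "schedule": (7, 90, 90),
    "sys_write": (4, 200, 150),
}
interval = profile_delta(start, stop)
```

Routines with no new calls between the two snapshots, such as `schedule` here, drop out of the interval profile.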
![Page 18: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/18.jpg)
Example of Operating System Profile on I/O Nodes
Running Flash3 on 32 compute nodes
CIOD Kernel Profile
![Page 19: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/19.jpg)
KTAU-D Trace
• KTAU-D can be used to access system-wide and individual process trace data of BGL IO-Node
• Trace from KTAU-D is converted into TAU trace format, which can then be converted into other formats
– Vampir, Jumpshot
• Trace from KTAU-D can be used together (merged) with trace from TAU to monitor both user-space and kernel-space activities
– (Work in progress)
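Merging a user-space trace with a kernel trace can be sketched as a timestamp-ordered merge of two sorted event streams. This assumes both traces share a common clock and are already sorted; the event tuples and the function `merge_traces` are hypothetical and do not reflect the TAU trace format.

```python
import heapq

def merge_traces(user_trace, kernel_trace):
    """Merge two timestamp-sorted traces into one, tagging each event's origin."""
    user = ((t, "user", name) for t, name in user_trace)
    kernel = ((t, "kernel", name) for t, name in kernel_trace)
    # heapq.merge lazily interleaves already-sorted inputs in ascending order
    return list(heapq.merge(user, kernel))

# Hypothetical events: (timestamp, description)
user_trace = [(1.0, "write() enter"), (9.0, "write() exit")]
kernel_trace = [(2.0, "sys_write enter"), (8.0, "sys_write exit")]
merged = merge_traces(user_trace, kernel_trace)
```

In the merged timeline the kernel-level `sys_write` events fall between the user-level `write()` enter and exit, which is the kind of user/kernel correlation a merged trace is meant to expose.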
![Page 20: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/20.jpg)
Exp 1: Observe activities on the IO node
Set up:
– KTAU:
  • Enable all instrumentation points
  • Number of kernel trace entries per process = 10K
– KTAU-D:
  • System-wide tracing
  • Access the trace every 1 second and dump the trace output to a file in the user’s home directory through NFS
– IOTEST:
  • An MPI-based benchmark (open/write/read/close)
  • Running with default parameters (block-size = 16MB) on NFS
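The interplay between a fixed-size per-process trace buffer and a daemon that reads it periodically can be sketched with a bounded ring buffer. The class `TraceRing` is hypothetical, and the overwrite-oldest behavior is an assumption about how a ring buffer of this kind typically works, not a statement about KTAU's implementation. A capacity of 4 stands in for the 10K entries used in the experiment.

```python
from collections import deque

class TraceRing:
    """Fixed-capacity trace buffer: when full, the oldest entry is overwritten."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, event):
        if len(self.events) == self.events.maxlen:
            self.dropped += 1  # the oldest entry is about to be overwritten
        self.events.append(event)

    def drain(self):
        """Return buffered events and empty the buffer (one periodic read)."""
        out = list(self.events)
        self.events.clear()
        return out

ring = TraceRing(capacity=4)
for i in range(6):                 # six events arrive before the first read
    ring.record(("sys_write", i))
first = ring.drain()               # only the four most recent events survive
ring.record(("sys_read", 6))
second = ring.drain()
```

If events arrive faster than the buffer is read, the oldest ones are lost, so the buffer size and the read interval have to be chosen together.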
![Page 21: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/21.jpg)
IOTEST with TAU instrumentation
[Figure labels: Main, Read Time, Write Time, Read Seek Time, Write Seek Time]
![Page 22: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/22.jpg)
sys_write() / sys_read()
KTAU Trace of CIOD running 2, 4, 8, 16, 32 nodes
As the number of compute nodes increases, CIOD has to handle a larger number of forwarded system calls.
![Page 23: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/23.jpg)
Zoomed View of CIOD Trace (8 compute nodes)
![Page 24: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/24.jpg)
Can We Correlate CIOD Activity with RPC-IOD?
• The BG/L IO-node system switches from “CIOD” to “rpciod” during a “sys_write” system call
• rpciod performs “socket_send” and interrupt handling before switching back
rpciod
ciod
![Page 25: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/25.jpg)
Exp 2: Correlating multiple traces from Compute-node and IO-node
• Set up:
– Running IOTEST with TAU instrumentation on 64 compute nodes
– Running ZeptoOS-1.2 with KTAU on 2 io-nodes
– Reduced set of kernel instrumentation
  • No TCP stack and schedule()
– 10K entries of ring-trace buffer
– Using PVFS2
(Note: Trace of 64 compute nodes and 2 io-nodes)
![Page 26: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/26.jpg)
read() @ 12:678 sec
write() @ 3:283 sec
TAU Trace
![Page 27: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/27.jpg)
sys_open() @ 53:1, sys_write() @ 56:6, sys_read() @ 1:05:545
sys_open() @ 53:2, sys_write() @ 56:85, sys_read() @ 1:05:778
ciod on ionode23
ciod on ionode47
pvfs2-client on ionode23
pvfs2-client on ionode47
![Page 28: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/28.jpg)
Exp 3: Analyze system-wide performance
• Set up:
– 2 runs of IOTEST with TAU instrumentation on 32 compute nodes
  • NFS
  • PVFS
– Running ZeptoOS-1.2 with KTAU on 1 io-node
– Analyzing both profile and trace data
![Page 29: K T A U Kernel Tuning and Analysis Utilities](https://reader036.vdocument.in/reader036/viewer/2022062408/568134a6550346895d9baf83/html5/thumbnails/29.jpg)
write() @ 39:00 read() @ 47.804
write() @ 42:99 read() @ 54:61
pvfs2-client
ciod
rpciod
ciod