Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh (University of Houston). Heterogeneous Computing Workshop, April 15, 2002.
Rice01, slide 1
Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok
Shreenivasa Venkataramaiah
Amitoj Singh
University of Houston
Heterogeneous Computing Workshop, April 15, 2002
Rice01, slide 2
Mapping/Adapting Distributed Applications on Networks
[Figure: application components (Data, Sim 1, Sim 2, Vis, Stream, Model, Pre) and a network, with "?" marking the application-to-network mapping]
Rice01, slide 3
Automatic node selection
[Figure: network of compute nodes m-1 to m-8 and routers; the legend marks busy nodes, a congested route, and the selected nodes]
Select 4 nodes for execution: choice is easy
Rice01, slide 4
Automatic node selection
[Figure: network of compute nodes m-1 to m-8 and routers; the legend marks busy nodes, a congested route, and the selected nodes]
Select 5 nodes: choice depends on application
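The two node-selection slides can be illustrated with a minimal sketch. It is not the authors' algorithm: the cost function, the `comm_weight` parameter, and the six-node example (nodes "a" to "f") are assumptions made up for illustration, and the topology is not the one drawn on the slides. The idea is only that a compute-bound application prefers to avoid busy nodes, while a communication-heavy one prefers to avoid congested routes.

```python
from itertools import combinations


def select_nodes(cpu_load, link_cost, k, comm_weight):
    """Return the k-node tuple with the lowest combined cost (brute force).

    cpu_load:    {node: load in [0, 1]}, higher means busier (hypothetical scale)
    link_cost:   {frozenset({a, b}): cost}, higher means a more congested route
    comm_weight: how strongly the application weighs communication cost
                 relative to computation cost (0.0 = ignore the network)
    """
    def cost(nodes):
        compute = sum(cpu_load[n] for n in nodes)
        comm = sum(link_cost.get(frozenset(pair), 0.0)
                   for pair in combinations(nodes, 2))
        return compute + comm_weight * comm

    # Exhaustive search is fine for a handful of nodes; it does not scale.
    return min(combinations(sorted(cpu_load), k), key=cost)


# Hypothetical example: node "e" is busy, node "f" sits behind a congested route.
cpu_load = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "e": 0.8, "f": 0.0}
link_cost = {frozenset((n, "f")): 1.0 for n in "abcde"}

print(select_nodes(cpu_load, link_cost, 4, comm_weight=1.0))  # ('a', 'b', 'c', 'd')
print(select_nodes(cpu_load, link_cost, 5, comm_weight=0.0))  # ('a', 'b', 'c', 'd', 'f')
print(select_nodes(cpu_load, link_cost, 5, comm_weight=1.0))  # ('a', 'b', 'c', 'd', 'e')
```

With four nodes, the idle, well-connected nodes win outright. With five, the answer flips between the busy node and the poorly connected one depending on how much the application communicates.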
Rice01, slide 5
Mapping/Adapting Distributed Applications on Networks
[Figure: application components (Data, Sim 1, Sim 2, Vis, Stream, Model, Pre) and a network, with "?" marking the application-to-network mapping]
1) Discover application characteristics and model performance in a shared heterogeneous environment
2) Discover network structure and available resources (e.g., NWS, REMOS)
3) Algorithms to map/remap applications to networks
Rice01, slide 6
Methodology for Building Application Performance Signature
Performance signature = model to predict application execution time under given network conditions
1. Execute the application on a controlled testbed
2. Measure system-level activity during execution, such as CPU, communication, and memory usage
3. Analyze and discover program-level activity (message sizes, sequences, synchronization waits)
4. Develop a performance signature
• No access to source code/libraries assumed
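To make the idea of a performance signature concrete, here is a deliberately simplistic sketch. It assumes an additive model in which compute time scales inversely with the available CPU share and communication time scales inversely with the available bandwidth. The slides do not give the authors' model; the names `DedicatedProfile` and `predict_time` and the example numbers are hypothetical, and the model ignores computation/communication overlap and synchronization waits.

```python
from dataclasses import dataclass


@dataclass
class DedicatedProfile:
    """Measurements from a run on the dedicated testbed (hypothetical fields)."""
    compute_seconds: float  # time spent computing with a full CPU
    comm_megabits: float    # total data exchanged over the network


def predict_time(profile, cpu_share, bandwidth_mbps):
    """Estimate execution time (seconds) under given conditions.

    cpu_share:      fraction of a CPU available to the application, in (0, 1]
    bandwidth_mbps: available network bandwidth in Mbps, > 0
    Assumes compute and communication do not overlap (a simplification).
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in (0, 1]")
    if bandwidth_mbps <= 0.0:
        raise ValueError("bandwidth_mbps must be positive")
    compute = profile.compute_seconds / cpu_share
    communication = profile.comm_megabits / bandwidth_mbps
    return compute + communication


# Illustrative numbers only, not measurements from the slides.
profile = DedicatedProfile(compute_seconds=60.0, comm_megabits=400.0)
print(predict_time(profile, cpu_share=1.0, bandwidth_mbps=100.0))  # 64.0
print(predict_time(profile, cpu_share=1.0, bandwidth_mbps=10.0))   # 100.0
print(predict_time(profile, cpu_share=0.5, bandwidth_mbps=100.0))  # 124.0
```

A real signature would also have to account for the message sizes, sequences, and synchronization waits named in step 3, which this additive form leaves out.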
Rice01, slide 7
Discovering application characteristics
[Figure: executable application code is benchmarked and analyzed on a controlled testbed (500 MHz Pentium Duos, crossbar Ethernet switch, 100 Mbps links), and the result is modeled as a performance signature]
• capture patterns of CPU loads and traffic during execution
Rice01, slide 8
Results in this paper
[Figure: executable application code is benchmarked on a controlled testbed, then performance is measured with resource sharing]
Demonstrate that measured resource usage on a testbed is a good predictor of performance on a shared network for NAS benchmarks
Testbed: 500 MHz Pentium Duos, crossbar Ethernet switch, 100 Mbps links
• capture patterns of CPU loads and traffic during execution
Rice01, slide 9
Experiment Procedure
• Resource utilization of NAS benchmarks measured on a dedicated testbed
– CPU probes based on the “top” and “vmstat” utilities
– Bandwidth using “iptraf”, “tcpdump”, SNMP queries
• Performance of NAS benchmarks measured with competing loads and limited bandwidth
– Employ dummynet and NISTnet to limit bandwidth
• All measurements presented are on 500MHz Pentium Duos, 100 Mbps network, TCP/IP, FreeBSD
• All results on Class A, MPI, NAS Benchmarks
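As a rough illustration of a "vmstat"-based CPU probe, the sketch below turns captured `vmstat` text into a list of CPU busy percentages (100 minus the idle column). The slides do not show the authors' probe code. The function name `cpu_busy_percent` and the sample text are made up for illustration, and the parser assumes only that the column header row contains an `id` (idle) column, as in common `vmstat` layouts.

```python
def cpu_busy_percent(vmstat_text):
    """Return the CPU busy percentage (100 - idle) for each sample line."""
    rows = [line.split() for line in vmstat_text.splitlines() if line.strip()]
    # The column header row is the first one that names the idle column.
    header_pos = next(i for i, cols in enumerate(rows) if "id" in cols)
    header = rows[header_pos]
    # Address the idle column from the right-hand end of the row.
    id_offset = header.index("id") - len(header)
    busy = []
    for cols in rows[header_pos + 1:]:
        # Skip repeated headers and any row that does not match the layout.
        if len(cols) == len(header) and cols[id_offset].isdigit():
            busy.append(100 - int(cols[id_offset]))
    return busy


# Made-up sample in a FreeBSD-like layout; not output captured from the testbed.
SAMPLE = """\
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0 cd0   in   sy   cs us sy id
 1 0 0  123456 654321   10   0   0   0    12   0   0   0  230  500  300  5  1 94
 2 0 0  123500 654000   15   0   0   0    14   0   0   0  410  900  620 55  5 40
"""

print(cpu_busy_percent(SAMPLE))  # [6, 60]
```

Sampling at a fixed interval during a benchmark run and storing these values gives the kind of CPU load pattern the slides describe capturing.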
Rice01, slide 10
Discovered Communication Structure of NAS Benchmarks
[Figure: discovered communication graphs among nodes 0, 1, 2, and 3 for the BT, CG, IS, EP, LU, MG, and SP benchmarks]
Rice01, slide 11
Performance with competing computation loads
[Figure: bar chart of percentage increase in execution time (y-axis, 0 to 140) for EP, BT, CG, IS, LU, MG, and SP under three conditions: all nodes loaded, most busy node loaded, least busy node loaded]
• Increase beyond 50% due to lack of coordinated (gang) scheduling and synchronization
• Correlation between low CPU utilization and smaller increase in execution time (e.g. MG shows only ~60% CPU utilization)
• Execution time is lower if the least busy node, rather than the most busy node, has the competing load (20% difference in the busyness level for CG)
Rice01, slide 12
Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on one link
[Figure: chart for CG, IS, MG, SP, BT, LU, and EP showing percentage increase in execution time (axis 0 to 140) alongside link network traffic in Mbps (axis 0 to 16)]
Close correlation between link utilization and performance with a shared or slow link
Rice01, slide 13
Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on all links
[Figure: chart for IS, CG, SP, MG, BT, LU, and EP showing percentage increase in execution time (axis 0 to 500) alongside total network traffic in Mbps (axis 0 to 80)]
Close correlation between total network traffic and performance with all shared or slow links
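One common way to quantify a "close correlation" like this is the Pearson correlation coefficient between per-benchmark traffic and per-benchmark slowdown. The transcript does not include the underlying numbers or say which statistic the authors used, so the sketch below is only an assumption about how such a check could be done. The toy inputs are arbitrary and are not values from the charts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values to show the call, not data from the slides.
traffic_mbps = [1.0, 2.0, 3.0]
slowdown_pct = [2.0, 4.0, 6.0]
print(pearson(traffic_mbps, slowdown_pct))  # 1.0
```

A value near 1 would support the claim that benchmarks generating more traffic slow down more when links are shared or slow.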
Rice01, slide 14
Results and Conclusions (not the last slide)
• Computation and communication patterns can be captured by passive, nearly non-intrusive monitoring
• Benchmarked resource usage pattern is a strong indicator of performance with sharing
– strong correlation between application traffic and performance with low-bandwidth links
– CPU utilization during normal execution is a good indicator of performance with node sharing
Synchronization and timing effects were not dominant for NAS Benchmarks
Rice01, slide 15
Discussion and Ongoing Work (the last slide)
• Capture application-level data exchange pattern from network probes (e.g., MPI message sequence, sizes)
– slowdown differs for different message sizes
• Infer the main synchronization/waiting patterns
– impact of unbalanced execution and lack of gang scheduling
• Capture impact of CPU scheduling policy for accurate prediction with sharing
– policies try to compensate for waits
Goal is to build a quantitative “performance signature” to estimate execution time under any given network conditions, and use it in a resource management prototype system