measuring and characterizing hpc applications and...
TRANSCRIPT
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Measuring and Characterizing HPC applicationsand CPU/GPU simulation
Georges Da CostaIRIT, Toulouse University
GreenDays@Lyon, 2012
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Plan
1 Context
2 Passive gathering of information
3 Experiments
4 Integrated behavioral dvfs method
5 HydrasimHydrasim simulatorExample
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Context
To optimize a computing center:
Gather insight on running applications
Choose how to act
depends on applicationMore precise : phase of application
Act (change frequency, switch on/off parts of nodes,...)
Optimize : reduce energy consumption at the same performance
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Application modification
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Remark on monitoring: Choose the right tool
90
100
110
120
130
140
150
160
170
180
190
0 200 400 600 800 1000 1200 1400 1600
NR
G (
Watt)
Time (seconds)
NRG@wn14
racadmipmitool
Dell cluster installed in 2009
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Ignorance is bliss, really?
*-AAS (PAAS, IAAS,...) leads to ignorance
Ignorance leads to errors
Errors lead to inefficiency
Focus :
How to optimize a computing center while knowing nothing?
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Know your enemy
What we know
HPC applicationsGoal: Save energyNo impact on performance (SLA,...)
Name your weapon (constraints)
Minimum impact of monitoringClosed application, no sourceEven full OS freedom (Grid’5000, VMs)
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Plan
1 Context
2 Passive gathering of information
3 Experiments
4 Integrated behavioral dvfs method
5 HydrasimHydrasim simulatorExample
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Who’s Who
Which application is running?Ask the developer, but
Depends on libraryCan cheat (if accounting is related to it)Computers work for us, not the opposite
Application is not important, its behavior is!
Two different apps can have the same impactSystem changes can have the same impact
We need Run-Time Behavioral Detection
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
BigBrother is watching
Run-time detection
Behavioral pattern
Extract information
Fine grained : performance counters, system valuesCoarse grained : network, disk, power consumption
Classical remark: impact of the monitoring infrastructure
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Finding patterns, NPB example, CG
0
2e+07
4e+07
6e+07
8e+07
1e+08
1.2e+08
0 10 20 30 40 50 60 70 80
Num
ber
of byte
s s
ent per
second
time (s)Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Finding patterns, NPB example, FT
0
2e+07
4e+07
6e+07
8e+07
1e+08
1.2e+08
0 20 40 60 80 100 120 140 160 180
Num
ber
of byte
s s
ent per
second
time (s)Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Finding patterns, NPB example, SP
0
2e+07
4e+07
6e+07
8e+07
1e+08
1.2e+08
0 50 100 150 200 250 300 350
Num
ber
of byte
s s
ent per
second
time (s)Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Model creation
First create a model
Run and monitor reference applications
Cluster subset of characteristic
Choose the most best subset
The most discriminatingThe one with less impactThe one with best stability
Using a clear metric reduces bias
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Use your creation
Simple to use the model once it is created:
Step 1: Measure some characteristics
Step 2: Compare to reference
Step 3: Categorize application (or phase)
Low impact method:
Low computing cost
Network and power characteristics have low monitoring costs
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Plan
1 Context
2 Passive gathering of information
3 Experiments
4 Integrated behavioral dvfs method
5 HydrasimHydrasim simulatorExample
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Configuration
Usecase : Nas Parallel Benchmark
Different type of workload
Representative of HPC applications
Seven benchmarks
Embarrassingly parallelCommunication intensive
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Monitoring infrastructure
Performance Counters
Standard Linux perfcounters (>2.6.31)1 measure/second/core/perfcounter
Network IO
Inbound and outbound packets and bytes1 measure/second/host
Disk IO
Read and write1 measure/second/host
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Post-measure processing
Instantaneous measures fluctuate widely
Need for low cost post-processing
Small window processing
To react to application phaseLow memory/processing cost of processing (ie no FFT)
Simple statistical processing:
Mean, median, standard deviation, decile
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Characteristic choice
Characteristics relevance depends heavily on applications
For HPC application
Disk is of no use
Disk IO only at beginning and endCan change on low memory condition (swaping)
Perfcounters are expected to be relevant
Network and power also
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Perfcounters: unexpectedly bad
One of the best: HW BRANCH MISS (with power on x-axis)
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Network and power
Best combination: Median number of packets and Mean Power
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Plan
1 Context
2 Passive gathering of information
3 Experiments
4 Integrated behavioral dvfs method
5 HydrasimHydrasim simulatorExample
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Energy efficient dvfs
Using HPC application detection
Categorize application (or phase)
Apply a rule-based algorithm
Change processor speed to min or max
Can take into account several objectives depending on rules
Energy only
Energy with taking into account performance
...
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
With more control comes more efficiency
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Conclusion on application profiling
Application characterization is possible with no impact!
It leads to optimize resources usages and reduce energy (J)
Still much to do
Improve statistical post-processing
Improve reactivity (reduce window)
Create a kernel governor
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Plan
1 Context
2 Passive gathering of information
3 Experiments
4 Integrated behavioral dvfs method
5 HydrasimHydrasim simulatorExample
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
GPU are efficient for some algorithms
Fast
Energy-efficient
The system uses 7,168 NVIDIA TeslaM2050 GPUs and 14,336 CPUs; it wouldrequire more than 50,000 CPUs and twice
as much floor space to deliver the sameperformance using CPUs alone (Nvidia)
With only CPU : 12 megawattsHybrid version : 4.04 megawatts
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Runtime vs Placement
Where to run a task ? Two possibilities:
Runtime
StarPU+ Low-impact, reactive- Can be far from optimal
Placement
- Need a-priori information- Cannot adapt+ Can be optimal
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Hydrasim simulator
Hydrasim
Hydrasim: Tool to evaluate placement algorithm
Realistic hardware
Number of processors and GPUEnergy and performance of hardwareSeveral policies (more later)
Output
MakespanEnergy (for CPU, GPU and both)
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Hydrasim simulator
Task Model
DAG of tasks
Tagged dot file
Tasks: time on CPU and GPU
Communications: time of
CPU→CPUCPU→GPUGPU→CPUGPU→GPU
6/7
6/1
0/5/5/0
9/2
0/1/1/0
8/6
0/3/3/0 4/7
0/2/2/0
9/5
0/4/4/0
0/3/3/0 0/5/5/0
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Hydrasim simulator
Allocation Model
Colored dot file
Blue processor
Red GPU
6/7
6/1
0/5/5/0
9/2
0/1/1/0
8/6
0/3/3/0 4/7
0/2/2/0
9/5
0/4/4/0
0/3/3/0 0/5/5/0
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Hydrasim simulator
Complete performance evaluation environment
Starting point depends on the problem
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Hydrasim simulator
Simulator zoom
Simulator needs
Number of CPU/GPU
Characteristics of CPU/GPU
Runtime policy to choose jobs
Fifo, Random, FastestMax connections
Runtime policy for the bus
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Comparison of schedulers
Let’s take several schedulers
All CPU
All GPU
Fastest
EE (most energy efficient)
Rand
Spgti
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Environment
Applications
Generated using a random strategy10 to 100 tasks
Hardware
CPU Intel Xeon E5540 and GPU Nvidia Tesla C1060.1 to 16 CPU and 1 to 16 GPU
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Direct results
0.994
0.996
0.998
1
1.002
1.004
1.006
1.008
10 20 30 50 40 60 70 90 80 100
Ene
rgie
Number of tasks
smallestfirstrandom
maxconnecfirstfifo
0.994
0.996
0.998
1
1.002
1.004
1.006
1.008
1.01
10 20 30 50 40 60 70 90 80 100
Mak
espa
nNumber of tasks
smallestfirstrandom
maxconnecfirstfifo
Energy and Makespan are not impacted by runtime
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Direct results (cont)
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
10 20 30 50 40 60 70 90 80 100
Mak
espa
n
Number of tasks
all_gpurandspgti
all_cpufastest
ee
0.9
0.95
1
1.05
1.1
1.15
1.2
1.25
10 20 30 50 40 60 70 90 80 100
Ene
rgie
Number of tasks
all_gpurandspgti
all_cpufastest
ee
They are by scheduler
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Optimize resource number
0 2
4 6
8 10
12 14
16
0 2
4 6
8 10
12 14
16
200 250 300 350 400 450 500 550 600 650 700
Makespan (s)
CPU
GPU
Makespan (s)
0 2
4 6
8 10
12 14
16
0 2
4 6
8 10
12 14
16
50000 100000 150000 200000 250000 300000 350000 400000 450000
Energy (J)
CPU
GPU
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation
Context Passive gathering of information Experiments Integrated behavioral dvfs method Hydrasim
Example
Conclusion
Using Hydrasim you can
Test several hardware configuration
For free !
Obtain the optimal number of resources
Compare algorithms
We are interested in
Use-cases
Feedback on the simulator
http://hydrasim.sourceforge.net/
C++/Discrete event simulator/GPL
Georges Da Costa IRIT, Toulouse University
Measuring and Characterizing HPC applications and CPU/GPU simulation