
Measuring and Characterizing HPC applications and CPU/GPU simulation

Georges Da Costa, IRIT, Toulouse University

GreenDays@Lyon, 2012


Plan

1 Context

2 Passive gathering of information

3 Experiments

4 Integrated behavioral dvfs method

5 Hydrasim

Hydrasim simulator

Example


Context

To optimize a computing center:

Gather insight on running applications

Choose how to act

Depends on the application

More precisely: on the phase of the application

Act (change frequency, switch on/off parts of nodes, ...)

Optimize: reduce energy consumption at the same performance


Application modification


Remark on monitoring: Choose the right tool

[Figure: power (W, 90 to 190) over time (s, 0 to 1600) on node wn14 (NRG@wn14), as reported by racadm and by ipmitool]

Dell cluster installed in 2009


Ignorance is bliss, really?

*-aaS (PaaS, IaaS, ...) leads to ignorance

Ignorance leads to errors

Errors lead to inefficiency

Focus :

How to optimize a computing center while knowing nothing?


Know your enemy

What we know

HPC applications

Goal: save energy

No impact on performance (SLA, ...)

Name your weapon (constraints)

Minimum impact of monitoring

Closed application, no source

Even full OS freedom (Grid’5000, VMs)


Is it so important?


Who’s Who

Which application is running?

Ask the developer, but

Depends on library

Can cheat (if accounting is related to it)

Computers work for us, not the other way around

Application is not important, its behavior is!

Two different apps can have the same impact

System changes can have the same impact

We need Run-Time Behavioral Detection


BigBrother is watching

Run-time detection

Behavioral pattern

Extract information

Fine grained: performance counters, system values

Coarse grained: network, disk, power consumption

Classical remark: impact of the monitoring infrastructure


Finding patterns, NPB example, CG

[Figure: number of bytes sent per second (0 to 1.2e+08) over time (s, 0 to 80) for NPB CG]

Finding patterns, NPB example, FT

[Figure: number of bytes sent per second (0 to 1.2e+08) over time (s, 0 to 180) for NPB FT]

Finding patterns, NPB example, SP

[Figure: number of bytes sent per second (0 to 1.2e+08) over time (s, 0 to 350) for NPB SP]

Model creation

First create a model

Run and monitor reference applications

Cluster subsets of characteristics

Choose the best subset

The most discriminating

The one with the least impact

The one with the best stability

Using a clear metric reduces bias
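One possible way to make "most discriminating" measurable is sketched below. The score (spread between applications divided by spread within an application), the function names and the sample values are all assumptions made for illustration; the slides do not say which metric was actually used.

```python
from itertools import combinations
from statistics import mean, pvariance

def separation_score(samples_by_app, feature):
    """Variance between application means divided by mean variance within applications."""
    per_app = [[run[feature] for run in runs] for runs in samples_by_app.values()]
    centers = [mean(values) for values in per_app]
    between = pvariance(centers)
    within = mean(pvariance(values) for values in per_app)
    return between / within if within > 0 else float("inf")

def rank_subsets(samples_by_app, features, size=2):
    """Rank feature subsets by the sum of their per-feature scores, best first."""
    scored = []
    for subset in combinations(features, size):
        score = sum(separation_score(samples_by_app, f) for f in subset)
        scored.append((score, subset))
    return sorted(scored, reverse=True)

# Hypothetical monitoring runs of two reference applications.
SAMPLES = {
    "cg": [{"packets": 900, "power": 150, "branch_miss": 5},
           {"packets": 950, "power": 152, "branch_miss": 9}],
    "ep": [{"packets": 10, "power": 180, "branch_miss": 6},
           {"packets": 12, "power": 178, "branch_miss": 8}],
}

best_score, best_subset = rank_subsets(SAMPLES, ["packets", "power", "branch_miss"])[0]
```

Summing per-feature scores is a crude choice; impact and stability would need their own terms in a fuller metric.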


Use your creation

Simple to use the model once it is created:

Step 1: Measure some characteristics

Step 2: Compare to reference

Step 3: Categorize application (or phase)

Low impact method:

Low computing cost

Network and power characteristics have low monitoring costs
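The three steps can be sketched as a nearest-reference lookup. This is a minimal illustration, assuming the model is a set of reference points and that the comparison is a Euclidean distance; the labels, feature names and values are hypothetical and not taken from the talk.

```python
import math

def categorize(measure, references):
    """Return the label of the reference point closest to the measured point."""
    def distance(label):
        ref = references[label]
        return math.sqrt(sum((measure[k] - ref[k]) ** 2 for k in ref))
    return min(references, key=distance)

# Hypothetical reference points, with both characteristics rescaled to [0, 1].
REFERENCES = {
    "communication-intensive": {"packets": 0.9, "power": 0.4},
    "compute-intensive": {"packets": 0.1, "power": 0.9},
}

# Step 1: measure; step 2: compare to the references; step 3: categorize.
category = categorize({"packets": 0.8, "power": 0.5}, REFERENCES)
```

The cost per decision is one distance computation per reference point, which is consistent with the low computing cost claimed above.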


Configuration

Use case: NAS Parallel Benchmarks (NPB)

Different types of workload

Representative of HPC applications

Seven benchmarks

Embarrassingly parallel

Communication intensive


Monitoring infrastructure

Performance Counters

Standard Linux perfcounters (kernel > 2.6.31)

1 measure/second/core/perfcounter

Network IO

Inbound and outbound packets and bytes

1 measure/second/host

Disk IO

Read and write

1 measure/second/host
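As an illustration of the network part, the sketch below derives packet and byte rates from two snapshots of `/proc/net/dev`. Using that file is an assumption: the slides only state what is measured and how often, not how it is collected. The function names are hypothetical.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface counters."""
    counters = {}
    for line in text.splitlines()[2:]:      # the first two lines are column headers
        name, _, values = line.partition(":")
        fields = values.split()
        if len(fields) < 10:
            continue                        # skip malformed or empty lines
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]), "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]), "tx_packets": int(fields[9]),
        }
    return counters

def rates(previous, current, interval_s=1.0):
    """Per-second rates between two snapshots of the same interface."""
    return {key: (current[key] - previous[key]) / interval_s for key in current}
```

On a Linux host, two calls to `parse_net_dev(open("/proc/net/dev").read())` taken one second apart, followed by `rates(...)` on the chosen interface, give one measure per second per host.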


Post-measure processing

Instantaneous measures fluctuate widely

Need for low cost post-processing

Small window processing

To react to application phases

Low memory and processing cost (i.e. no FFT)

Simple statistical processing:

Mean, median, standard deviation, decile
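A small-window version of these statistics could look like the sketch below. The class name and the window size are assumptions; only the four statistics come from the slide.

```python
from collections import deque
from statistics import mean, median, pstdev, quantiles

class WindowStats:
    """Keep the last `size` measures and summarize them on demand."""

    def __init__(self, size=10):
        self.values = deque(maxlen=size)    # older measures are dropped automatically

    def add(self, value):
        self.values.append(value)

    def summary(self):
        data = list(self.values)            # must contain at least one measure
        # quantiles() needs at least two points; it returns the 9 decile cut points.
        deciles = quantiles(data, n=10) if len(data) >= 2 else []
        return {
            "mean": mean(data),
            "median": median(data),
            "stdev": pstdev(data),
            "deciles": deciles,
        }

window = WindowStats(size=3)
for measure in (1, 2, 3, 4):
    window.add(measure)
stats = window.summary()                    # computed over the last three measures: 2, 3, 4
```

Memory use is bounded by the window size, and each summary is linear or near-linear in that size, which keeps the post-processing cheap.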


Characteristic choice

The relevance of a characteristic depends heavily on the application

For HPC applications

Disk is of no use

Disk IO only at beginning and end

Can change under low-memory conditions (swapping)

Perfcounters are expected to be relevant

Network and power also


Perfcounters: unexpectedly bad

One of the best: HW BRANCH MISS (with power on x-axis)


Network and power

Best combination: Median number of packets and Mean Power


Energy efficient dvfs

Using HPC application detection

Categorize application (or phase)

Apply a rule-based algorithm

Change processor speed to min or max

Can take into account several objectives depending on rules

Energy only

Energy while taking performance into account

...
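A rule-based choice between minimum and maximum speed can be sketched as a lookup table. The category names, the two rule sets and the frequency list are hypothetical; the slides only state that rules map a detected category (or phase) to the minimum or maximum processor speed, depending on the objective.

```python
# Hypothetical rules: objective -> detected category -> "min" or "max" speed.
RULES = {
    "energy_only": {
        "communication-intensive": "min",
        "compute-intensive": "min",
    },
    "energy_with_performance": {
        "communication-intensive": "min",
        "compute-intensive": "max",
    },
}

def choose_frequency(category, objective, available_khz):
    """Return the frequency (kHz) selected by the rule for this category."""
    decision = RULES[objective].get(category, "max")    # unknown phase: do not slow down
    return min(available_khz) if decision == "min" else max(available_khz)

FREQUENCIES_KHZ = [1600000, 2000000, 2530000]           # illustrative values
slow = choose_frequency("communication-intensive", "energy_with_performance", FREQUENCIES_KHZ)
fast = choose_frequency("compute-intensive", "energy_with_performance", FREQUENCIES_KHZ)
```

On Linux the selected value could then be applied through the cpufreq sysfs interface (for example `scaling_setspeed` under the `userspace` governor); that step is left out of the sketch.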


With more control comes more efficiency


Conclusion on application profiling

Application characterization is possible with no impact!

It helps optimize resource usage and reduce energy (J)

Still much to do

Improve statistical post-processing

Improve reactivity (reduce window)

Create a kernel governor


GPUs are efficient for some algorithms

Fast

Energy-efficient

"The system uses 7,168 NVIDIA Tesla M2050 GPUs and 14,336 CPUs; it would require more than 50,000 CPUs and twice as much floor space to deliver the same performance using CPUs alone" (Nvidia)

With CPUs only: 12 megawatts

Hybrid version: 4.04 megawatts


Runtime vs Placement

Where to run a task? Two possibilities:

Runtime

StarPU

+ Low-impact, reactive

- Can be far from optimal

Placement

- Needs a priori information

- Cannot adapt

+ Can be optimal


Hydrasim simulator

Hydrasim

Hydrasim: a tool to evaluate placement algorithms

Realistic hardware

Number of processors and GPUs

Energy and performance of hardware

Several policies (more later)

Output

Makespan

Energy (for CPU, GPU and both)


Task Model

DAG of tasks

Tagged dot file

Tasks: time on CPU and GPU

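Communications: time of

CPU→CPU, CPU→GPU, GPU→CPU, GPU→GPU

[Figure: example task DAG; nodes carry labels such as 6/7, 6/1, 9/2, 8/6, 4/7 and 9/5 (time on CPU/time on GPU), and edges carry labels such as 0/5/5/0, 0/1/1/0 and 0/3/3/0 (the four communication times)]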
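To show how such a task model can be used, the sketch below stores a small DAG in plain dictionaries and computes the longest path for a given allocation, which is a lower bound on the makespan when every device is always free. It is not Hydrasim code and does not read dot files; the three-task graph is hypothetical and only reuses label values of the kind shown on the slide.

```python
from graphlib import TopologicalSorter

# Position of each communication time in an edge label: CPU→CPU, CPU→GPU, GPU→CPU, GPU→GPU.
COMM_INDEX = {("cpu", "cpu"): 0, ("cpu", "gpu"): 1, ("gpu", "cpu"): 2, ("gpu", "gpu"): 3}

def critical_path(tasks, edges, allocation):
    """Longest path through the DAG, ignoring contention for devices and bus."""
    predecessors = {name: set() for name in tasks}
    for (src, dst) in edges:
        predecessors[dst].add(src)
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(predecessors).static_order():
        device = allocation[name]
        duration = tasks[name][0 if device == "cpu" else 1]
        ready = 0
        for src in predecessors[name]:
            comm = edges[(src, name)][COMM_INDEX[(allocation[src], device)]]
            ready = max(ready, finish[src] + comm)
        finish[name] = ready + duration
    return max(finish.values())

# Hypothetical graph: task -> (time on CPU, time on GPU); edge -> four communication times.
TASKS = {"a": (6, 7), "b": (6, 1), "c": (9, 2)}
EDGES = {("a", "b"): (0, 5, 5, 0), ("a", "c"): (0, 1, 1, 0)}

all_cpu = critical_path(TASKS, EDGES, {"a": "cpu", "b": "cpu", "c": "cpu"})
mixed = critical_path(TASKS, EDGES, {"a": "cpu", "b": "gpu", "c": "gpu"})
all_gpu = critical_path(TASKS, EDGES, {"a": "gpu", "b": "gpu", "c": "gpu"})
```

A simulator has to go further than this bound, since a finite number of CPUs and GPUs and a shared bus delay tasks that are otherwise ready.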


Allocation Model

Colored dot file

Blue: processor

Red: GPU

[Figure: the example task DAG (nodes 6/7, 6/1, 9/2, 8/6, 4/7, 9/5) with each task colored by its allocation]


Complete performance evaluation environment

Starting point depends on the problem


Simulator zoom

Simulator needs

Number of CPU/GPU

Characteristics of CPU/GPU

Runtime policy to choose jobs

FIFO, Random, Fastest

Max connections

Runtime policy for the bus


Example

Comparison of schedulers

Let’s take several schedulers

All CPU

All GPU

Fastest

EE (most energy efficient)

Rand

Spgti
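Most of these schedulers can be read as per-task placement rules, sketched below. This is an interpretation, not Hydrasim's implementation: the power figures are hypothetical, energy is approximated as run time multiplied by a constant device power, and Spgti is left out because the slides do not describe it.

```python
import random

# Hypothetical average power draw in watts; not measurements from the talk.
POWER_W = {"cpu": 80.0, "gpu": 190.0}

def energy(task, device):
    """Energy in joules for one task: run time (s) multiplied by device power (W)."""
    return task[device] * POWER_W[device]

# Each policy maps one task to a device, using only that task's own run times.
POLICIES = {
    "all_cpu": lambda task, rng: "cpu",
    "all_gpu": lambda task, rng: "gpu",
    "fastest": lambda task, rng: min(("cpu", "gpu"), key=lambda d: task[d]),
    "ee": lambda task, rng: min(("cpu", "gpu"), key=lambda d: energy(task, d)),
    "rand": lambda task, rng: rng.choice(("cpu", "gpu")),
}

def place(tasks, policy, seed=0):
    """Return a task -> device allocation for the chosen policy."""
    rng = random.Random(seed)
    return {name: POLICIES[policy](task, rng) for name, task in tasks.items()}

# Hypothetical tasks with their run time in seconds on each device.
TASKS = {
    "t1": {"cpu": 6, "gpu": 7},
    "t2": {"cpu": 6, "gpu": 1},
    "t3": {"cpu": 4, "gpu": 3},
}

fastest = place(TASKS, "fastest")
most_efficient = place(TASKS, "ee")
```

With these numbers the two policies disagree on `t3`: the GPU is slightly faster, but the CPU uses less energy.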


Environment

Applications

Generated using a random strategy

10 to 100 tasks

Hardware

CPU: Intel Xeon E5540; GPU: Nvidia Tesla C1060

1 to 16 CPUs and 1 to 16 GPUs


Direct results

[Figure: two plots, energy and makespan versus number of tasks (10 to 100), for the runtime policies smallestfirst, random, maxconnecfirst and fifo; the y-axis spans roughly 0.994 to 1.01 in both plots]

Energy and makespan are not impacted by the runtime policy


Direct results (cont)

[Figure: two plots, makespan and energy versus number of tasks (10 to 100), for the schedulers all_gpu, rand, spgti, all_cpu, fastest and ee; the y-axis spans 0.9 to 1.25 in both plots]

Energy and makespan are impacted by the scheduler


Optimize resource number

[Figure: two surface plots over the number of CPUs (0 to 16) and the number of GPUs (0 to 16): makespan (s), scale 200 to 700, and energy (J), scale 50000 to 450000]


Conclusion

Using Hydrasim you can

Test several hardware configurations

For free!

Obtain the optimal number of resources

Compare algorithms

We are interested in

Use-cases

Feedback on the simulator

http://hydrasim.sourceforge.net/

C++/Discrete event simulator/GPL
