Post on 11-Jan-2020

Page 1

Accelerating High-Throughput Computing through OpenCL

Andrei Dafinoiu, Joshua Higgins, Violeta Holmes

High-Performance Computing Research Group

University of Huddersfield

Huddersfield, United Kingdom

Page 2

Overview

Introduction

Resources

Motivation

Experiment Design

Results and Performance

Conclusions

Page 3

OpenCL

• OpenCL: a programming framework for heterogeneous compute platforms

• Supports CPU, GPU, DSP, and other accelerators

• Programming API based on C/C++

Page 4

Introduction

• High-Throughput Computing

• HTCondor

• QGGCondor

Page 5

Aim of project

• To expand the capabilities of QGGCondor.

• To evaluate the efficiency and flexibility of OpenCL for use within a heterogeneous HTC environment.

• To increase the visibility of GPGPU computing

Page 6

HTCondor Setup

ClassAds: what are they?

[Figure labels: Old ClassAd, HTCondor, Proposed New ClassAd]

Page 7

Environment Setup

Reasoning:

- Unreliable environment for benchmarking purposes.

- Desire to execute benchmarks over the live system.

Method:

- Extracted hostnames of 1000 condor machines.

- Script-based generation of jobs.
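
The script-based job generation could look roughly like the Python sketch below. It is an illustration only: the hostnames, the `fft_bench` executable name, and the output file names are hypothetical and do not come from the slides. The sketch pins one job to each machine through a `requirements` expression on the `Machine` attribute, so that every listed host runs the benchmark itself.

```python
def make_submit_description(hostname, executable="fft_bench"):
    """Return an HTCondor submit description pinned to one machine.

    `executable` is a hypothetical benchmark binary name.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Pin the job to a single host via the machine's ClassAd attribute.
        f'requirements = (Machine == "{hostname}")',
        f"output = {hostname}.out",
        f"error = {hostname}.err",
        f"log = {hostname}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def generate_jobs(hostnames):
    """Map each hostname to the text of its submit description."""
    return {host: make_submit_description(host) for host in hostnames}


# Hypothetical hostnames standing in for the extracted Condor machines.
jobs = generate_jobs(["lab-pc-001.example.org", "lab-pc-002.example.org"])
print(jobs["lab-pc-001.example.org"])
```

Each generated text could be written to a file and passed to `condor_submit`. Generating one description per host keeps a failure on one machine from affecting the others.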

Page 8

Resource Discovery

GPU Distribution

- AMD 5600: 1%
- AMD 6400: 19%
- AMD 6500: 14%
- K600: 4%
- GTX 610: 4%
- GTX 670: 7%
- GTX 750 Ti: 8%
- GTX 970: 13%
- Not Detected: 30%
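
As a quick check on the GPU distribution chart, the shares can be tallied per vendor. The sketch below uses the percentages from the chart. Grouping the labels into AMD and NVIDIA (with K600 and the GTX models counted as NVIDIA) is an assumption made for illustration.

```python
# Shares (percent) as read from the GPU distribution chart.
gpu_share = {
    "AMD 5600": 1, "AMD 6400": 19, "AMD 6500": 14,
    "K600": 4, "GTX 610": 4, "GTX 670": 7,
    "GTX 750 Ti": 8, "GTX 970": 13,
    "Not Detected": 30,
}


def vendor_of(model):
    """Classify a chart label; the AMD/NVIDIA grouping is an assumption."""
    if model == "Not Detected":
        return "none"
    return "AMD" if model.startswith("AMD") else "NVIDIA"


totals = {}
for model, pct in gpu_share.items():
    vendor = vendor_of(model)
    totals[vendor] = totals.get(vendor, 0) + pct

detected = 100 - gpu_share["Not Detected"]
print(totals)    # {'AMD': 34, 'NVIDIA': 36, 'none': 30}
print(detected)  # 70
```

With about 1000 machines sampled, a 70% detection share is consistent with the 701 machines that reported GPUs.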

Page 9

Experiment Design

Fast Fourier Transform

1000 iterations -> to ensure precision

17 FFT sizes -> from 2^8 to 2^24

701 machines -> that reported GPUs

Resulting in:

- Approx. 12 million FFT calculations

- Approx. 28 thousand CPU hours
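
The figure of roughly 12 million FFT calculations follows from multiplying the three quantities on this slide. A minimal check, assuming every machine ran every size for the full iteration count:

```python
iterations = 1000
fft_sizes = [2**k for k in range(8, 25)]  # 2^8 ... 2^24 inclusive
machines = 701

total_ffts = iterations * len(fft_sizes) * machines
print(len(fft_sizes))  # 17
print(total_ffts)      # 11917000
```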

Page 10

Results

[Figure: 1D FFT on CPU (Intel i5). x-axis: FFT size, ticks 2^8 to 2^23; y-axis: GFLOPS, ticks 0 to 2.5]

[Figure: 1D FFT on GPU (AMD 6500). x-axis: FFT size, ticks 2^8 to 2^23; y-axis: GFLOPS, ticks 0 to 50]
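
The slides do not state how the GFLOPS values were derived. A common convention in FFT benchmarking is to credit a complex 1D FFT of size N with 5 N log2(N) floating-point operations, regardless of the operations actually executed. The sketch below applies that convention. Both the convention's use here and the timing value are assumptions, not figures from the presentation.

```python
import math


def fft_gflops(n, seconds_per_fft):
    """Nominal GFLOPS of one complex 1D FFT of size n.

    Uses the conventional 5 * n * log2(n) operation count; whether the
    authors used this convention is an assumption.
    """
    flops = 5.0 * n * math.log2(n)
    return flops / seconds_per_fft / 1e9


# Hypothetical timing: a 2^20-point FFT taking 2.5 ms (about 41.9 GFLOPS).
print(fft_gflops(2**20, 2.5e-3))
```

Because the operation count is nominal, the resulting numbers are best read as a normalized speed for comparing devices at the same FFT size rather than as a measure of hardware utilization.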

Page 11

[Figure: QGGCondor GPU FFT Performance. x-axis: FFT size, ticks 2^8 to 2^23; y-axis: GFLOPS, ticks 5 to 160 (doubling). Series: AMD 6500, AMD 6400, GTX 970, GTX 750 Ti, GTX 670, AMD 5600. Annotations: GTX 970, AMD 5600; years 2009, 2010, 2012, 2014]

Page 12

Comparison with GPU cluster

[Figure: Average Performance. x-axis: FFT size, ticks 2^8 to 2^23; y-axis: GFLOPS, ticks 0 to 160. Series: Condor, C2050, AMD, NVIDIA]

• On average, a single compute GPU is negligibly faster.

• Newer-gen GPGPUs outperform older compute counterparts.

P.S.: NVIDIA C2050 was released in 2011

Page 13

Conclusion

• HTCondor GPU integration was successful with a single-entity ClassAd definition.

• OpenCL-based implementations over HTCondor are straightforward (without specific optimizations); however, machines need dedicated graphics drivers installed.

• QGGCondor GPU performance varies greatly; however, the average is only marginally slower than that of a dedicated compute GPU.

Page 14

Thank you for your attention

Questions?