
The Centre for Modelling & Simulation (CFMS)

Product Technology Evaluation Report

IBM Power System S824L

Table of Contents

Executive Summary

CFMS Product Technology Evaluation Programme

IBM Power System S824L Product Technology Evaluation Objectives

IBM Power System S824L System

Experimental Testing Plan

Benchmarking & Testing Outcome - Technical Testing Results

About CFMS

CFMS, Bristol and Bath Science Park, Dirac Crescent, Emersons Green, Bristol, BS16 7FR www.cfms.org.uk [email protected] 0117 906 1100


Executive Summary

There is an expectation that an affordable route to exascale computing will involve some disruption of the existing CPU+RAM+InfiniBand architecture. Although accelerators like NVIDIA® Tesla® and Intel Xeon Phi hold good positions in the Top500 List, actual industrial uptake is limited.

The upward trajectory of architecting everything in the same way and scaling in size to achieve extra performance doesn’t fully address all the challenges, and is limiting in terms of a realistic solution for the future. Take power for example, one of the most critical and debated points in the industry.

The largest supercomputer in the world is a tenth of the way to exascale with over a million cores, yet is too expensive to run due to the power it requires. Building a machine 10 times bigger and absorbing the power bill does not address the heart of the problem. There may be some advantages in looking at the next generation of x86 technology, and in five years' time we might manage to double the performance within the same power budget, but that is not the step change we are looking for. It has to come from a unified re-architecting of the interaction between software and hardware.

It will be interesting to see the impact of IBM Power systems as a catalyst for disruption, especially the introduction of NVLink and a matured CAPI, a basis for new technologies in the future.

With CAPI currently available, and NVLink in the launch pipeline, both potentially offer a step change. The product testing undertaken with the IBM Power System S824L is early-level testing. Before contemplating the introduction of new and different technologies such as NVLink and CAPI, it is a prerequisite to get an application running on the platform and achieving a reasonable level of performance. Running Linux on x86 and Little Endian Linux on IBM POWER8, and comparing and benchmarking these platforms, will provide valuable insight. When NVLink is commercially available, the groundwork will already have been completed, reducing the time and cost of getting up and running.

The results from the technology evaluation of the IBM Power System S824L against the defined experimental plan were positive, interesting and insightful. To summarise, the S824L:

• Meets system performance levels for technical computing

• Is appealing in terms of flexibility and simplicity

• Is on the path to maturity, and gaps identified are not insurmountable

As part of the Product Technology Evaluation function, The Centre for Modelling & Simulation (CFMS) also invites third parties to collaborate on testing. In addition to CFMS, a leading engine manufacturer in the aerospace industry was involved in testing the S824L. Their outcome also reflects our conclusions from the programme.

The S824L tested was 4U with a significant number of PCI slots for IO expansion, which results in a relatively low compute density. We look forward to a more HPC focussed offering. Overall, testing the S824L demonstrates the potential, and is the start of an interesting journey which will be welcomed within industry.


CFMS Product Technology Evaluation

Supporting the development and delivery of high value design methods and tools, CFMS provides access to and advises on best in class technologies that accelerate advanced modelling and simulation and HPC.

Providing insight from an actual user perspective, Product Technology Evaluation ranges from evaluating and validating product performance, to establish suitability for customer projects, through to working with manufacturers and inputting into product design and development to confirm the proposed approach. Through our trusted, independent Technology Lab, we can replicate and set up technical environments and run scenarios for testing and evaluation. We offer a choice of project outputs, including written reports, presentations or collaborative arrangements, working with all parties involved.

From reducing risk in the development process and helping vendors understand how their technology will be used, through to taking hardware, software or an end-to-end solution and validating its suitability for customer and research projects, we can help. Supported by our team of Modelling and Simulation, HPC and IT systems specialists, we work with everyone from industry end users to manufacturers, providing analysis of product performance and greater insight and feedback into product development and technology research projects.


IBM Power System S824L Product Technology Evaluation Objectives

The IBM POWER8 System is the first generation of chip architecture developed for the OpenPOWER Foundation and, together with the NVIDIA® Tesla® GPU, forms the basis of the next generation DOE HPC infrastructure. It is also the first Power processor from IBM to be offered with support for the Little Endian byte ordering used by x86 processors from Intel and AMD, which reduces the code changes required to migrate existing x86 code.

Technical computing applications typically stretch development tool chains because of the complexity of the software, and exercise the operating system because they use many features, such as affinity, to maximise application performance. These applications are an ideal test of system readiness, while also providing performance results on real engineering workloads that can be compared directly with other systems.

The combination of the IBM POWER8 and the NVIDIA® Kepler GPU creates a challenge to find suitable industrially relevant application software that can exercise all components in the system. zCFD from Zenotech was chosen as it is able to exercise both types of processors and it implements industry standard algorithms.

A range of tests were undertaken to assess the following system properties:

• System performance for technical computing and engineering workloads

• Quality of the tool chain (compilers, environment, etc.)

• Integrated energy efficiency and scale

• Total Cost of Ownership (TCO) and management

• Assessment of system for readiness
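The byte-ordering point made in this section can be illustrated with a minimal Python sketch. It is not taken from the evaluation; the value `0x0A0B0C0D` and the helper name `read_as_x86_would` are arbitrary, illustrative choices. It shows why binary data written by x86 code can be read unchanged on a Little Endian POWER8 system, whereas a Big Endian layout would need byte swapping.

```python
import struct

# Arbitrary 32-bit value chosen only for illustration.
value = 0x0A0B0C0D

# "<I" packs an unsigned 32-bit integer in little-endian order, the layout
# used by x86 and by Little Endian Linux on POWER8.
little = struct.pack("<I", value)

# ">I" packs the same integer in big-endian order, the layout used by
# traditional Big Endian Power systems.
big = struct.pack(">I", value)


def read_as_x86_would(raw: bytes) -> int:
    """Decode four bytes assuming little-endian order (hypothetical helper)."""
    return struct.unpack("<I", raw)[0]


print(little.hex())  # least significant byte first
print(big.hex())     # most significant byte first
print(read_as_x86_would(little) == value)  # same layout: no conversion needed
print(read_as_x86_would(big) == value)     # different layout: bytes need swapping
```

Code that reads or writes raw binary files, or that casts between pointer types, is where such differences usually surface during a port.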


IBM Power System S824L

The IBM Power System S824L server is a Linux-based two-socket server supporting 20 or 24 POWER8 cores in a dense, 4U rack-optimised form factor.

IBM Power System S824L product highlights:

• The first server to leverage OpenPOWER Foundation technology to dramatically accelerate Java, big data and technical computing applications. Running multiple concurrent queries that take advantage of industry-leading memory and I/O bandwidths, this leads to highly supported utilisation rates

• Delivers faster query acceleration for Java applications with NVIDIA GPUs

• Boosts workload performance by offloading highly parallel operations to GPU accelerator(s)

• Has twice the bandwidth of prior servers and lower hardware and power requirements, allowing superior scale-out efficiencies with Open technologies like Linux and OpenStack that economically enable these capabilities

• Will enable future integrated hardware solutions that dramatically accelerate compute- and data- intensive tasks due to its open standards based platform

The summary specification of the POWER8 server used for testing is shown below:

Microprocessors: Two 10-core 3.42 gigahertz (GHz) POWER8 processor cards

Level 2 (L2) Cache: 512 kilobyte (KB) L2 cache per core

Level 3 (L3) Cache: 8 megabyte (MB) L3 cache per core

Level 4 (L4) Cache: 16 MB per dual inline memory module (DIMM)

Memory Min/Max: 512 gigabyte (GB) RAM

Processor-to-memory bandwidth: 192 gigabytes per second (GBps) per socket


Experimental Testing Plan

A number of tests were selected to exercise different elements of the system: High Performance Linpack (HPL, currently the industry standard), zCFD, Solar and OpenFOAM. CFMS has access to the source code of all these applications, which (with the exception of HPL) are used for solving industrial scale problems.

Between them, these tests assessed the performance of single-threaded, pure-MPI and hybrid MPI/OpenMP workloads.

Metric, Test, Requirements/Risks

Installation

• Access, weight, power

System performance for technical computing

• Single thread

• MPI

• MPI/OpenMP hybrid

• For each test, compare runtime with single node Intel Ivybridge performance

• Compile and run HPL

• Compile and run zCFD

• Compiler support

• Third party library support

NVIDIA GPU performance

• CUDA software stack

• Compile and run zCFD

• Compare with K20 on Intel Ivybridge

• Compiler support

• Third party library support

Integrated Energy Efficiency/Scale

• Measure power draw under load

Install and Configuration

Storage:

• RAID1 for persistent data (2 disks)

• RAID0 for application scratch (6 disks)

Network:

• Single GbE connection to site network

OS/Software:

• Ubuntu 14.10

• IBM XL C/C++ and Fortran compilers

• IBM Engineering and Scientific Subroutine Library (ESSL)

• GNU 4.9 C/C++ and Fortran compilers

• CUDA Toolkit 7.0-rc

• OpenMPI 6.5

Where possible, software binaries were installed from the official Canonical repositories via APT.


Product Evaluation Outcome - Technical Results

Flexibility

Most hardware multithreading solutions are enabled or disabled in UEFI/BIOS, and require a reboot to change. This leads most HPC systems to leave multithreading either on or off, depending on the performance impact that it will have on the applications in use.

By contrast, SMT8 can be reconfigured while the system is running. This allows for the SMT configuration to be optimised for each simulation run, or even at each simulation job step.
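As a hedged illustration of per-job-step SMT tuning, the sketch below generates the shell lines for a job script. It assumes the `ppc64_cpu` utility from the powerpc-utils package, whose `--smt=N` option changes the SMT mode on a running system and typically needs root privileges. The helper names and the application commands are hypothetical, and the report does not say which mechanism CFMS actually used.

```python
import shlex
from typing import List, Sequence, Tuple

# SMT modes offered by an SMT8-capable POWER8 core.
VALID_SMT_MODES = (1, 2, 4, 8)


def smt_command(mode: int) -> List[str]:
    """Build the argv that switches the running system to the given SMT mode."""
    if mode not in VALID_SMT_MODES:
        raise ValueError(f"unsupported SMT mode: {mode}")
    return ["ppc64_cpu", f"--smt={mode}"]


def job_step_script(steps: Sequence[Tuple[int, str]]) -> str:
    """Emit one SMT switch followed by the application command for each step.

    Each step is (smt_mode, application_command_line); the command lines are
    passed through unchanged and are placeholders in this sketch.
    """
    lines = []
    for mode, app in steps:
        lines.append(shlex.join(smt_command(mode)))
        lines.append(app)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-step job: different SMT modes for each step.
    print(job_step_script([(4, "./step_a"), (8, "./step_b")]))
```

The script only builds text, so it can be inspected on any machine before being run on the POWER8 host.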

POWER8 provides greater Non-Uniform Memory Access (NUMA) control, allowing the optimisation of the process layout on the hardware topology.

Simplicity

As HPC application workloads become more specific and optimised, we have observed an ongoing trend within the HPC landscape towards heterogeneous clusters, rather than attempting a ‘best-fit’ homogeneous configuration. These heterogeneous clusters may combine standard CPU-only compute nodes with either ‘high-memory’ nodes, or with compute nodes equipped with GPGPU or other accelerators.

Tools like xCAT (eXtreme Cloud Administration Toolkit) and IBM Spectrum Scale (formerly IBM GPFS) can be used to deploy and run mixed x86 and Power systems (equipped with NVIDIA GPGPUs) to accelerate specific workloads, while providing a consistent experience for end users.

Porting existing CUDA software to run on the S824L’s K40 GPUs was trivial, primarily due to the common interfaces provided by the CUDA toolkits on x86 and POWER8.

Performance

System performance assessment focused on compute rather than I/O, as typical HPC workloads exercise the processors, with storage implemented as a shared parallel file system. Performance was measured by running zCFD on a standard aerospace test case from NASA. The latest CUDA 7.0rc Toolkit from NVIDIA was used, together with gcc 4.9 providing OpenMP thread-based parallelism and OpenMPI 6.5 providing MPMD parallelism. The IBM XL C/C++ for Linux Compiler was also used, but the version supporting OpenMP was not available during the test period, so its performance was extrapolated.

Getting the best performance from the POWER8 processor requires a combination of thread-level and process-level parallelism. The benchmark runs were undertaken with one MPI process per NUMA node (i.e. 2 MPI processes per POWER8 processor socket) with processor and memory affinity set, and the number of OpenMP threads was varied according to the SMT setting. The NVIDIA K40 benchmark was run with one MPI process per GPU.
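The process layout described above can be sketched as a small calculation. This is an illustration rather than the scripts used in the evaluation: it takes the two-socket, 10-cores-per-socket configuration and two NUMA nodes per socket from this report, and assumes one OpenMP thread per hardware thread, which the report does not state explicitly. The names `NodeLayout` and `hybrid_layout` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    mpi_ranks: int             # one MPI process per NUMA node
    cores_per_rank: int        # physical cores owned by each rank
    omp_threads_per_rank: int  # OpenMP threads started by each rank


def hybrid_layout(sockets: int = 2,
                  cores_per_socket: int = 10,
                  numa_nodes_per_socket: int = 2,
                  smt: int = 8) -> NodeLayout:
    """Derive the MPI/OpenMP layout for one node at a given SMT setting.

    Assumption: every hardware thread of a rank's cores runs one OpenMP thread.
    """
    if cores_per_socket % numa_nodes_per_socket != 0:
        raise ValueError("cores must divide evenly across NUMA nodes")
    ranks = sockets * numa_nodes_per_socket
    cores_per_rank = cores_per_socket // numa_nodes_per_socket
    return NodeLayout(ranks, cores_per_rank, cores_per_rank * smt)


if __name__ == "__main__":
    for smt in (1, 2, 4, 8):
        print(smt, hybrid_layout(smt=smt))
```

Under these assumptions the rank count stays fixed at four while the OpenMP thread count per rank scales with the SMT mode, so only the thread count needs to change between runs.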

The results were compared to a dual socket Intel(R) Xeon(R) CPU E5-2648L v2 @ 1.9GHz system from IBM with hyperthreading switched off, and the software compiled using Intel 15.0 compilers and OpenMPI 6.5.

The dual-socket POWER8 system is 1.2x faster than the Intel Ivybridge system when using the gcc 4.9 compiler. The extrapolated results from the limited runs using the IBM XL C/C++ for Linux Compiler show a potential speed-up of over 2x, but this needs to be validated when the compilers are released by IBM.


Maturity

One of the challenges with porting test workloads to the S824L was the discovery of problems with packages available in the official Ubuntu repositories. When bug reports were filed, the response from some application maintainers was that, although POWER8 support was included in the latest releases, opportunities for full testing had been minimal. When this point was raised with IBM, the stated intention was to make more POWER8-based test and development servers available to the software development community, which should mitigate these issues in future.

Although testing achieved good performance from the POWER8 processors, the system delivered fewer FLOPS/watt than the equivalent x86 system, partly due to its higher clock frequency.
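The relationship between runtime, power draw and FLOPS/watt can be made concrete with a short sketch. The function names are hypothetical, and the numbers in the example are placeholders chosen only to show the arithmetic; they are not measurements from this evaluation.

```python
def energy_to_solution_joules(mean_power_watts: float, runtime_seconds: float) -> float:
    """Energy used by a run, assuming a constant mean power draw under load."""
    return mean_power_watts * runtime_seconds


def flops_per_watt(total_flop: float, mean_power_watts: float, runtime_seconds: float) -> float:
    """Sustained FLOPS divided by mean power, i.e. operations per joule."""
    return total_flop / energy_to_solution_joules(mean_power_watts, runtime_seconds)


if __name__ == "__main__":
    # PLACEHOLDER values, not measured data: the same workload on a faster
    # system drawing more power and a slower system drawing less.
    work = 1.0e12  # floating-point operations in the workload
    faster = flops_per_watt(work, mean_power_watts=1000.0, runtime_seconds=100.0)
    slower = flops_per_watt(work, mean_power_watts=700.0, runtime_seconds=120.0)
    print(faster, slower, faster < slower)
```

With these placeholder inputs the faster system finishes first yet delivers fewer FLOPS/watt, which is the kind of trade-off the paragraph above describes.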

It has been widely reported that the POWER8 processor card costs less than the equivalent Intel x86 processor. However, the remainder of the server still carries a price premium, and the price performance for the S824L has yet to mature. This is to be expected with a new product to market and may be achievable with greater availability via OpenPOWER.


About CFMS

The Centre for Modelling & Simulation (CFMS) is proud to be a growing, independent and not-for-profit organisation that specialises in high value design capability. We promote advanced modelling and simulation, underpinned by HPC, pushing the boundaries of technology.

Through our exceptional, collaborative, virtual and physical facility, we enable the adoption and acceleration of new technologies for advanced modelling and simulation, while improving learning and developing awareness of the state of the art.

Engaging with organisations large and small, we help to provide access to the right tools, resources, skills and technologies, resulting in increased productivity and faster, more informed decision-making.

As a trusted and neutral provider, our vision is to drive a practical revolution in engineering capability and design, working with organisations to reduce risk in the design phase, product development costs and time to market.

For further information about how CFMS can help your business call 0117 906 1100 or email [email protected]