PARALLEL PROCESSING

The NAS Parallel Benchmarks

Daniel Gross, Chen Haiout

NASA (NAS Division)

NASA (NAS Division) Aims

• NASA Advanced Supercomputing Division

• Develop, demonstrate, and deliver innovative computing capabilities to enable NASA projects and missions

• Demonstrate by the next millennium an operational computing system capable of simulating, in one to several hours, an entire aerospace vehicle system throughout its mission and life cycle.

NPB Introduction

• The NAS Parallel Benchmarks (NPB) suite has been widely used to evaluate modern parallel systems

• Its purpose is to measure objectively the performance of highly parallel computers and to compare it with that of conventional supercomputers

• It consists of eight benchmark problems derived from important classes of aerophysics applications

• The NPB 2 implementations are based on Fortran 77 and the MPI message-passing standard

Benchmark Problems

• EP: Embarrassingly Parallel

• IS: Integer Sort

• CG: Conjugate Gradient

• MG: Multigrid method for the Poisson equation

• FT: Spectral method (FFT) for the 3-D diffusion (heat) equation

• BT: ADI (alternating direction implicit) with block-tridiagonal systems

• SP: ADI (alternating direction implicit) with scalar pentadiagonal systems

• LU: Lower-Upper symmetric Gauss-Seidel

• The Embarrassingly Parallel Benchmark (EP)

In this benchmark, two-dimensional statistics are accumulated from a large number of Gaussian pseudo-random numbers. Because the problem requires almost no communication, the benchmark provides, in some sense, an estimate of the upper achievable limit of floating-point performance on a particular system.
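The EP computation described above can be sketched in Python. This is a simplified illustration and not the NPB reference code. It assumes a 46-bit linear congruential generator with multiplier 5^13, in the style of the NPB generator, and a hypothetical seed. Uniform pairs are turned into Gaussian pairs with the Marsaglia polar method and tallied into square annuli by max(|X|, |Y|). The names `uniforms`, `ep_kernel`, `ep_chunk` and `ep_parallel` are illustrative. Because the generator can jump ahead with one modular exponentiation, each processor can compute its own block of pairs without communicating. Only the final tallies are combined.

```python
import math

MOD = 2 ** 46        # modulus of the assumed linear congruential generator
A = 5 ** 13          # multiplier (assumed, in the style of the NPB generator)
SEED = 271828183     # hypothetical odd seed; odd * odd keeps every state nonzero


def uniforms(seed, count):
    """Yield `count` uniform deviates in (0, 1) from x_{k+1} = A * x_k mod 2^46."""
    x = seed
    for _ in range(count):
        x = (A * x) % MOD
        yield x / MOD


def ep_kernel(num_pairs, seed=SEED, annuli=10):
    """Return (counts per square annulus, sum of X, sum of Y) for num_pairs pairs."""
    counts = [0] * annuli
    sum_x = sum_y = 0.0
    stream = uniforms(seed, 2 * num_pairs)
    for _ in range(num_pairs):
        x = 2.0 * next(stream) - 1.0
        y = 2.0 * next(stream) - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:                     # keep pairs inside the unit disk
            scale = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = x * scale, y * scale      # Marsaglia polar method: two Gaussians
            sum_x += gx
            sum_y += gy
            ring = int(max(abs(gx), abs(gy)))  # l such that l <= max(|X|,|Y|) < l+1
            if ring < annuli:
                counts[ring] += 1
    return counts, sum_x, sum_y


def ep_chunk(first_pair, num_pairs, seed=SEED):
    """Process pairs first_pair .. first_pair + num_pairs - 1 on their own.

    Skipping ahead 2 * first_pair draws costs one modular exponentiation,
    so a worker never needs another worker's random numbers.
    """
    start = (pow(A, 2 * first_pair, MOD) * seed) % MOD
    return ep_kernel(num_pairs, start)


def ep_parallel(num_pairs, workers):
    """Split the pairs into equal chunks (num_pairs must divide evenly) and
    combine the partial results; the only 'communication' is this final sum."""
    size = num_pairs // workers
    total, sum_x, sum_y = [0] * 10, 0.0, 0.0
    for w in range(workers):          # each iteration could run on its own processor
        counts, cx, cy = ep_chunk(w * size, size)
        total = [a + b for a, b in zip(total, counts)]
        sum_x += cx
        sum_y += cy
    return total, sum_x, sum_y
```

For example, `ep_parallel(8000, 4)` returns the same annulus counts as `ep_kernel(8000)`. The Gaussian sums agree only up to floating-point rounding, because the partial sums are added in a different order.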

• Scalar Pentadiagonal (SP) Benchmark

In this benchmark, multiple independent systems of non-diagonally dominant, scalar pentadiagonal equations are solved. A complete solution of SP requires 400 iterations.
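The basic operation, solving one scalar pentadiagonal system, can be sketched as banded Gaussian elimination. This is an illustrative sketch and not the NPB SP code, which performs many such solves inside an ADI iteration. It assumes each row is stored as five coefficients for columns i-2 to i+2. It does not pivot, so a zero or tiny pivot would make it fail. The names `solve_pentadiagonal` and `pentadiagonal_matvec` are hypothetical.

```python
def solve_pentadiagonal(rows, rhs):
    """Solve A x = rhs for a pentadiagonal matrix A, without pivoting.

    rows[i][k] holds A[i][i + k - 2] for k = 0..4 (offsets -2..+2); entries
    that would fall outside the matrix are ignored. Inputs are not modified.
    """
    n = len(rhs)
    a = [list(r) for r in rows]
    b = list(rhs)
    # Forward elimination: zero the two sub-diagonal entries below each pivot.
    for i in range(n):
        pivot = a[i][2]
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot in row {i}; pivoting would be needed")
        for r in range(i + 1, min(i + 3, n)):
            factor = a[r][i - r + 2] / pivot
            for j in range(i, min(i + 3, n)):
                a[r][j - r + 2] -= factor * a[i][j - i + 2]
            b[r] -= factor * b[i]
    # Back substitution: row i now couples x[i] only to x[i+1] and x[i+2].
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, min(i + 3, n)):
            s -= a[i][j - i + 2] * x[j]
        x[i] = s / a[i][2]
    return x


def pentadiagonal_matvec(rows, x):
    """Compute A x with the same row storage (handy for checking a solution)."""
    n = len(x)
    return [sum(rows[i][k] * x[i + k - 2] for k in range(5) if 0 <= i + k - 2 < n)
            for i in range(n)]


# Rows (1, -3, 5, -3, 1): not diagonally dominant, since 5 < 3 + 3 + 1 + 1.
rows = [[1.0, -3.0, 5.0, -3.0, 1.0] for _ in range(10)]
x_true = [float(i + 1) for i in range(10)]
x = solve_pentadiagonal(rows, pentadiagonal_matvec(rows, x_true))
```

The example matrix is not diagonally dominant, but it is symmetric positive definite. Elimination without pivoting therefore still succeeds on it.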

• MultiGrid (MG) Benchmark

MG uses a multigrid method to compute the solution of the three-dimensional scalar Poisson equation.

This code is a good test of both short- and long-distance, highly structured communication.
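A one-dimensional analogue shows the structure of a multigrid V-cycle. This sketch is a simplification and not the NPB MG code. It solves -u'' = f on [0, 1] with zero boundary values instead of the 3-D Poisson problem. It uses weighted Jacobi smoothing, full-weighting restriction and linear interpolation, and all function names are hypothetical. In a distributed version, the smoother only needs nearest-neighbour values (short-distance communication). Moving residuals and corrections between grid levels couples points that are far apart on the fine grid (longer-distance communication).

```python
def residual(u, f, h):
    """r = f - A u, where (A u)_i = (2 u_i - u_{i-1} - u_{i+1}) / h^2."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps=3, weight=2.0 / 3.0):
    """Weighted Jacobi sweeps, updating u in place (nearest neighbours only)."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - weight) * old[i] + weight * jacobi


def restrict(r):
    """Full weighting from a fine grid (2m+1 points) to a coarse grid (m+1 points)."""
    m = (len(r) - 1) // 2
    coarse = [0.0] * (m + 1)
    for j in range(1, m):
        coarse[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return coarse


def prolong(e):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    m = len(e) - 1
    fine = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        fine[2 * j] = e[j]
    for j in range(m):
        fine[2 * j + 1] = 0.5 * (e[j] + e[j + 1])
    return fine


def v_cycle(u, f, h):
    """One V-cycle for -u'' = f, u = 0 at both ends; len(u) - 1 must be a power of 2."""
    if len(u) == 3:                         # one interior point: solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return
    smooth(u, f, h)                         # pre-smoothing
    coarse_r = restrict(residual(u, f, h))
    coarse_e = [0.0] * len(coarse_r)
    v_cycle(coarse_e, coarse_r, 2.0 * h)    # coarse-grid correction, recursively
    for i, e in enumerate(prolong(coarse_e)):
        u[i] += e
    smooth(u, f, h)                         # post-smoothing


n = 64                      # number of intervals (a power of 2)
h = 1.0 / n
f = [1.0] * (n + 1)         # right-hand side f(x) = 1
u = [0.0] * (n + 1)         # initial guess; boundary values stay 0
for _ in range(15):
    v_cycle(u, f, h)
```

With f = 1, the discrete solution coincides with x(1 - x)/2 at the grid points. This makes convergence easy to check.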

• 3-D FFT PDE (FT) Benchmark

FT contains the computational kernel of a three-dimensional, FFT-based spectral method.
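The spectral idea behind FT can be shown in one dimension. The sketch assumes, as the NPB FT kernel does, that the PDE being advanced is the diffusion equation ∂u/∂t = α∇²u on a periodic domain. The method is to transform, multiply each Fourier mode by exp(-4π²αk²t), and transform back. The radix-2 `fft` and `evolve_heat` functions are hypothetical helpers. The 3-D benchmark applies 1-D FFTs along each of the three dimensions in turn. A distributed implementation typically needs an all-to-all transpose between those passes.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 FFT (len(a) must be a power of 2); unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def evolve_heat(u0, alpha, t):
    """Advance du/dt = alpha * u'' on the periodic unit interval to time t.

    Each Fourier mode k is damped by exp(-4 pi^2 alpha k^2 t), so the
    solution at time t is obtained in one step, with no time stepping.
    """
    n = len(u0)
    modes = fft([complex(v) for v in u0])
    for k in range(n):
        wave = k if k <= n // 2 else k - n       # signed wavenumber
        modes[k] *= math.exp(-4.0 * math.pi ** 2 * alpha * wave * wave * t)
    return [v.real / n for v in fft(modes, inverse=True)]   # normalized inverse FFT


n = 32
u0 = [math.sin(2.0 * math.pi * j / n) for j in range(n)]
u = evolve_heat(u0, alpha=0.1, t=0.5)
```

A single sine mode keeps its shape and decays by exactly exp(-4π²αt), which makes a convenient correctness check.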

• Block Tridiagonal (BT) Simulated CFD Benchmark

BT solves systems of equations resulting from an approximately factored finite-difference discretization of the Navier-Stokes equations.
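In BT, the approximate factorization leaves systems whose unknowns are small blocks (5×5 in the benchmark) coupled to their neighbours along a grid line. These are block-tridiagonal systems. Below is a minimal sketch of the block Thomas algorithm that solves them. It assumes dense blocks stored as nested lists, uses 2×2 blocks in the example, and all helper names are hypothetical. It omits the rest of BT, such as building the blocks from the flow equations and sweeping in each direction.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def solve_dense(m, rhs):
    """Return X with m X = rhs (Gauss-Jordan elimination with partial pivoting)."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [[x / aug[i][i] for x in aug[i][n:]] for i in range(n)]


def block_thomas(lower, diag, upper, rhs):
    """Solve lower[i] x[i-1] + diag[i] x[i] + upper[i] x[i+1] = rhs[i].

    lower[0] and upper[-1] are ignored; vectors are column matrices (k x 1).
    """
    n = len(diag)
    c_prime, d_prime = [None] * n, [None] * n
    for i in range(n):                       # forward sweep along the grid line
        m, d = diag[i], rhs[i]
        if i > 0:
            m = mat_sub(m, mat_mul(lower[i], c_prime[i - 1]))
            d = mat_sub(d, mat_mul(lower[i], d_prime[i - 1]))
        if i < n - 1:
            c_prime[i] = solve_dense(m, upper[i])
        d_prime[i] = solve_dense(m, d)
    x = [None] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        x[i] = mat_sub(d_prime[i], mat_mul(c_prime[i], x[i + 1]))
    return x


def block_matvec(lower, diag, upper, x):
    """Multiply the block-tridiagonal matrix by the block vector x."""
    n = len(diag)
    out = []
    for i in range(n):
        acc = mat_mul(diag[i], x[i])
        if i > 0:
            acc = mat_add(acc, mat_mul(lower[i], x[i - 1]))
        if i < n - 1:
            acc = mat_add(acc, mat_mul(upper[i], x[i + 1]))
        out.append(acc)
    return out


D = [[4.0, 1.0], [1.0, 5.0]]    # diagonal block
L = [[1.0, 0.0], [0.5, 1.0]]    # coupling to the previous grid point
U = [[0.5, 0.2], [0.0, 1.0]]    # coupling to the next grid point
n = 4
x_true = [[[2.0 * i + 1.0], [2.0 * i + 2.0]] for i in range(n)]
rhs = block_matvec([L] * n, [D] * n, [U] * n, x_true)
x = block_thomas([L] * n, [D] * n, [U] * n, rhs)
```

The structure mirrors the scalar Thomas algorithm. Divisions by a scalar pivot become small dense solves with the modified diagonal block.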

Class Benchmarks

• Since the NPB 1.0 specification of 1991, computer speeds and memory sizes have grown, and representative problem sizes have grown correspondingly.

• NPB 1.0 specifies two problem sizes for each benchmark: class “A” and a larger class “B”. The class A benchmarks can now be run on a moderately powerful workstation, and the class B benchmarks on high-end workstations or small parallel systems.

• To retain the focus on high-end supercomputing, a class “C” has been added for all of the NAS benchmarks.

Weak Points

• Implementations of the NAS Benchmarks are usually highly tuned by computer vendors

• The largest problems (class B) no longer reflect the largest problems being run on present-day supercomputers

Why 8 Different Benchmarks?

Comparing Clusters Around the World

• Loki and Hyglac

In September 1996, two medium-scale parallel systems called “Loki” and “Hyglac” were installed.

Each consisted of sixteen Pentium Pro (200 MHz) PCs with 16 Mbytes of memory per node; each node had 3.2 Gbytes (Loki) or 2.5 Gbytes (Hyglac) of disk. Each system was integrated using two Fast Ethernet NICs in each node.

Both sites ran a complex N-body gravitational simulation of 2 million particles using an advanced tree-code algorithm. The two systems sustained 1.19 Gflops and 1.26 Gflops, respectively. When the systems were connected together, the same code was run again and sustained over 2 Gflops without further optimization for the new configuration.
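The reported rates allow a rough check of how well the combined run scaled. The sketch takes the ideal combined rate to be the sum of the two separate rates. That is an assumption, and it ignores any differences between the two machines and their workloads. Under it, the reported lower bound of 2 Gflops corresponds to roughly 82% of ideal or better.

```python
loki_gflops = 1.19      # sustained rate reported for Loki alone
hyglac_gflops = 1.26    # sustained rate reported for Hyglac alone
combined_gflops = 2.0   # lower bound: the combined run sustained "over 2 Gflops"

ideal_gflops = loki_gflops + hyglac_gflops   # assumes perfect aggregation
efficiency = combined_gflops / ideal_gflops  # therefore also a lower bound
print(f"ideal {ideal_gflops:.2f} Gflops, efficiency at least {efficiency:.1%}")
```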

• Berkeley NOW

The Berkeley NOW (Network of Workstations) system comprises 105 Sun Ultra 170 workstations connected by a Myrinet network from Myricom. Each node includes a 167 MHz UltraSPARC I microprocessor with 512 KB of cache, 128 MB of RAM, and two 2.3 GB disks.

• Cray T3E

The Cray T3E-1200 is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. It provides a shared physical address space across up to 2048 processors connected by a 3-D torus interconnect. Each node contains one Alpha 21164 processor with a peak rate of 1200 Mflops. The system logic runs at 75 MHz, and the processor runs at a multiple of this clock (600 MHz in the Cray T3E-1200). The torus links provide a raw bandwidth of 650 MB/s in each direction, keeping the network in balance with the faster processors and memory.

NPB Graph Results

The Dwarves: Hardware

• Old PII 300 MHz processors (to be removed soon)

• 8 PIII 450 MHz processors

• 4 PIII 733 MHz processors

• The new machines:

– Dual AMD Athlon MP 2000+ at 1,666 MHz, 1 GB memory

In the Next 2 Weeks

• Install NPB 2.2 on the Dwarves cluster

• Run the benchmark tests on the Dwarves cluster

• Run tests on several different configurations (different numbers of dwarves)

• Estimate network bandwidth and latency

• Compare the performance of the Dwarves cluster with that of similar clusters around the world
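One common way to estimate network latency and bandwidth is a ping-pong test; the slides do not say which method will be used. Two nodes bounce messages of several sizes n back and forth, for example with MPI_Send and MPI_Recv. Half of each measured round trip is taken as the one-way time. The one-way times are then fitted to T(n) = α + n/β, where α is the latency and β the bandwidth. The fit is ordinary least squares, sketched below. The function name is hypothetical, and the timings are synthetic placeholders, not Dwarves measurements.

```python
def fit_latency_bandwidth(sizes_bytes, one_way_seconds):
    """Least-squares fit of T(n) = latency + n / bandwidth.

    Returns (latency in seconds, bandwidth in bytes per second).
    """
    k = len(sizes_bytes)
    mean_n = sum(sizes_bytes) / k
    mean_t = sum(one_way_seconds) / k
    cov = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes_bytes, one_way_seconds))
    var = sum((n - mean_n) ** 2 for n in sizes_bytes)
    slope = cov / var                   # seconds per byte = 1 / bandwidth
    latency = mean_t - slope * mean_n   # intercept = fixed per-message cost
    return latency, 1.0 / slope


sizes = [0, 1024, 8192, 65536, 524288]          # message sizes in bytes
times = [50e-6 + n / 10e6 for n in sizes]       # synthetic: 50 us latency, 10 MB/s
latency, bandwidth = fit_latency_bandwidth(sizes, times)
print(f"latency ~ {latency * 1e6:.1f} us, bandwidth ~ {bandwidth / 1e6:.2f} MB/s")
```

With real measurements, the small messages mostly determine α and the large ones determine β. Timing each size several times and keeping the minimum or median reduces noise.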

Questions will not be answered!!!

GOOD NIGHT