HPCC Awards Class 1: Performance - ICL UTK
Source: icl.cs.utk.edu/graphics/posters/files/sc13-hpcc.pdf

HPCC AWARDS CLASS 1: PERFORMANCE

[Figure: four log-scale charts covering 2005-2012 that show the 1st-, 2nd-, and 3rd-place systems in each award category: G-HPL (Tflop/s), G-FFT (Tflop/s), G-RandomAccess (GUPS), and EP-STREAM-Triad (TB/s). Winning systems include the Cray XT3 (Sandia, Red Storm), Cray XT5 quad- and hex-core (Oak Ridge, Jaguar), IBM Blue Gene/L (Livermore), IBM Blue Gene/P (Argonne, Intrepid; Livermore, Dawn), NEC SX-9 (JAMSTEC, Earth Simulator), Fujitsu SPARC64 VIIIfx (RIKEN AICS, K computer), and DARPA PERCS.]

FIND OUT MORE AT http://www.hpcchallenge.org
SPONSORED BY the National Science Foundation

HPCC BENCHMARKS

HPL
The widely used implementation of the Linpack TPP benchmark. It measures the sustained floating-point rate of execution for solving a linear system of equations.

STREAM
A simple benchmark that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for four vector kernels.

RandomAccess
Measures the rate of integer updates to random locations in a large global memory array.

PTRANS
Implements a parallel matrix transpose that exercises a large-volume communication pattern in which pairs of processes communicate with each other simultaneously.

FFT
Calculates a Discrete Fourier Transform (DFT) of a very large one-dimensional complex data vector.

b_eff
The effective bandwidth benchmark: a set of MPI tests that measure the latency and bandwidth of a number of simultaneous communication patterns.

DGEMM
Measures the floating-point rate of execution of double-precision real matrix-matrix multiplication.



PROJECT GOALS
• Provide performance bounds in locality space using real-world computational kernels
• Allow scaling of input data size and time to run according to the system's capability
• Verify the results using standard error analysis
• Allow vendors and users to provide optimized code for superior performance
• Make the benchmark information continuously available to the public in order to disseminate performance-tuning knowledge and record technological progress over time
• Ensure reproducibility of the results by detailed reporting of all aspects of benchmark runs

FEATURE HIGHLIGHTS OF HPCC 1.4.3
• Increased the size of the scratch vector for local FFT tests that was missed in the previous version (reported by SGI)
• Added a Makefile for Blue Gene/P contributed by Vasil Tsanov
• Released in August 2013

SUMMARY OF HPCC AWARDS
CLASS 1: Best Performance
• Best in G-HPL, EP-STREAM-Triad per system, G-RandomAccess, and G-FFT
• There will be 4 winners (one in each category)

CLASS 2: Most Productivity
• One or more winners
• Judged by a panel at the SC12 BOF
• Stresses elegance and performance
• Implementations in various (existing and new) languages are encouraged
• Submissions may include up to two kernels not present in HPCC
• A submission consists of: code, its description, performance numbers, and a presentation at the BOF

LOCALITY SPACE OF MEMORY ACCESS IN APPLICATIONS

[Figure: HPCC kernels and representative applications plotted by spatial locality (low to high) versus temporal locality (low to high). Kernels: STREAM (EP & SP), PTRANS (G), HPL Linpack (G), Matrix Mult (EP & SP), RandomAccess (G, EP, & SP), FFT (G, EP, & SP). Applications: digital signal processing, radar cross section, computational fluid dynamics, traveling salesperson.]

KIVIAT CHART WITH RESULTS FOR THREE DIFFERENT CLUSTERS

[Figure: Kiviat chart with eight axes normalized to 1.0 (rings at 0.2 intervals): SN-DGEMM, RandomRing Bandwidth, RandomRing Latency, PP-HPL, PP-PTRANS, PP-RandomAccess, PP-FFTE, and SN-STREAM Triad, comparing:]
• Cray XD1: AMD Opteron, 64 procs at 2.2 GHz, 1 thread/MPI process (64), RapidArray Interconnect System, 11-22-2004
• Dalco Opteron/QsNet Linux Cluster: AMD Opteron, 64 procs at 2.2 GHz, 1 thread/MPI process (64), QsNetII, 11-04-2004
• Sun Fire V20z Cluster: AMD Opteron, 64 procs at 2.2 GHz, 1 thread/MPI process (64), Gigabit Ethernet, Cisco 6509 switch, 03-06-2005

HPCC RESULTS PAGE

[Figure: the results page at http://www.hpcchallenge.org]