TRANSCRIPT
TOP500 and Accidental Benchmarking
New Directions in Numerical Linear Algebra and High Performance Computing:
Celebrating the 70th Birthday of Jack Dongarra
July 8, 2021
Erich Strohmaier
Confessions of an Accidental Benchmarker
• Appendix B of the Linpack Users' Guide
• Designed to help users extrapolate execution time for the Linpack software package
• First benchmark report from 1977
• Cray 1 to DEC PDP-10
http://bit.ly/hpcg-benchmark
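The Appendix B idea, extrapolating run time from the operation count, can be sketched as follows. The 2n³/3 + 2n² flop count is the classic LINPACK convention (HPL's bookkeeping differs slightly), and the function names here are illustrative:

```python
# Sketch of the Appendix B extrapolation: given a machine's sustained rate,
# predict the time to solve an n x n dense system, and vice versa.
# Assumes the classic LINPACK operation count 2/3*n^3 + 2*n^2.

def linpack_flops(n: int) -> float:
    """Floating-point operations for an n x n dense solve (LU + triangular solves)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def achieved_gflops(n: int, seconds: float) -> float:
    """Rate in Gflop/s achieved when the size-n problem took `seconds`."""
    return linpack_flops(n) / seconds / 1e9

def predicted_seconds(n: int, gflops: float) -> float:
    """Extrapolated run time at size n on a machine sustaining `gflops`."""
    return linpack_flops(n) / (gflops * 1e9)

# Example: at the Cray-1-era rate of 14 Mflop/s, the n=100 LINPACK case
# takes roughly 0.05 seconds.
t100 = predicted_seconds(100, 0.014)
```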
"APPENDIX B" Started 36 Years Ago: a Factor of 10^9, From 14 Mflop/s to 34 Pflop/s
• In the late 70s the fastest computer ran LINPACK at 14 Mflop/s
• Today with HPL we are at 34 Pflop/s
• Nine orders of magnitude, doubling every 14 months
• About six orders of magnitude increase in the number of processors
• Plus algorithmic improvements
Began in the late 70s, a time when floating-point operations were expensive compared to other operations and data movement.
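The doubling-time claim is easy to verify, assuming smooth exponential growth over the 36 years from 14 Mflop/s to 34 Pflop/s:

```python
import math

# Growth from 14 Mflop/s (late 1970s) to 34 Pflop/s over 36 years.
start = 14e6     # flop/s
end = 34e15      # flop/s
years = 36

growth = end / start                          # ~2.4e9: nine orders of magnitude
doublings = math.log2(growth)                 # ~31 doublings
months_per_doubling = years * 12 / doublings  # ~14 months, as the slide says
```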
TOP500 – WHY HPL?
• Adaptive definition of 'Supercomputer' for collecting market statistics
• Simple metric and procedure (few rules)
• Based on measured performance (the system has to function)
• Floating-point benchmark ('scientific computing' in the early 90s)
• High performing (optimizable) to encourage adoption
• Broad system coverage
Ø HPL (High Performance Linpack) had the widest coverage, by a factor of 2–3x at least
Ø In 1993, and still!
PERFORMANCE DEVELOPMENT
[Chart: TOP500 performance, 1994–2020, log scale from 100 Mflop/s to 10 Eflop/s; three series: SUM (1.17 TFlop/s to 2.8 EFlop/s), N=1 (59.7 GFlop/s to 442 PFlop/s), and N=500 (422 MFlop/s to 1.52 PFlop/s).]
WHY DID LINPACK WORK SO WELL?
Many reasons; here are 2 essentials for the TOP500:
1) Easy and continuously scalable problem size – simplicity
2) Asymptotically best performance – for both system size and problem size – brings out correct long-term trends
[Chart: HPL performance (rate) as a function of problem size.]
CRITERIA FOR ADDITIONAL BENCHMARKS
• All the good things from HPL, plus:
– Arithmetic intensity: Flop/Bytes ~ O(1)
– Main features all scale with O(n)
• Does not correlate overly well with established benchmarks
• Changes relative rankings compared to TOP500
HPCG - BYTES/FLOPS
[Chart: HPCG as a fraction of Rpeak (0–12%) vs. the Byte/Flop ratio of each system (0–1), with the SX-ACE labeled. SIAM PP 2018.]
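The O(n) vs. O(1) arithmetic-intensity contrast between HPL and HPCG can be made concrete with a back-of-the-envelope model; the byte counts below (8-byte values, 4-byte indices, every operand streamed once) are simplifying assumptions, not measured figures:

```python
# Rough arithmetic intensity (flop per byte) of the two benchmarks' kernels.

def hpl_intensity(n: int) -> float:
    """Dense LU: ~2/3 n^3 flops over an 8-byte n x n matrix, read once."""
    flops = (2.0 / 3.0) * n**3
    bytes_moved = 8.0 * n * n
    return flops / bytes_moved        # = n/12: grows with problem size, O(n)

def spmv_intensity(nrows: int, nnz_per_row: int = 27) -> float:
    """Sparse mat-vec (HPCG-style 27-point stencil): 2 flops per nonzero,
    CSR storage (8-byte value + 4-byte column index), plus the x and y vectors."""
    nnz = nrows * nnz_per_row
    flops = 2.0 * nnz
    bytes_moved = 12.0 * nnz + 8.0 * 2 * nrows
    return flops / bytes_moved        # stays ~0.16 flop/byte regardless of size
```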
HPL-AI Benchmark Top 5 List, June 2021

Rank | Site | Computer | Cores | HPL Rmax (Eflop/s) | TOP500 Rank | HPL-AI (Eflop/s) | Speedup
1 | RIKEN Center for Computational Science, Japan | Fugaku, Fujitsu A64FX, Tofu D | 7,630,848 | 0.442 | 1 | 2.0 | 4.5x
2 | DOE/SC/ORNL, USA | Summit, AC922 IBM POWER9, IB Dual-rail FDR, NVIDIA Volta V100 | 2,414,592 | 0.149 | 2 | 1.15 | 7.7x
3 | NVIDIA, USA | Selene, DGX SuperPOD, AMD EPYC 7742 64C 2.25 GHz, Mellanox HDR, NVIDIA A100 | 555,520 | 0.063 | 6 | 0.63 | 9.9x
4 | DOE/SC/LBNL/NERSC, USA | Perlmutter, HPE Cray EX235n, AMD EPYC 7763 64C 2.45 GHz, Slingshot-10, NVIDIA A100 | 761,856 | 0.065 | 5 | 0.59 | 9.1x
5 | Forschungszentrum Juelich (FZJ), Germany | JUWELS Booster Module, Bull Sequana XH2000, AMD EPYC 7402 24C 2.8 GHz, Mellanox HDR InfiniBand, NVIDIA Ampere A100, Atos | 449,280 | 0.044 | 8 | 0.47 | 10x
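The speedup column reflects HPL-AI's rule: the solver may run in lower precision as long as full double-precision accuracy is recovered, which mixed-precision iterative refinement achieves. A minimal NumPy sketch, with float32 standing in for the half precision used on real systems (and plain refinement in place of HPL-AI's GMRES-based variant):

```python
import numpy as np

def solve_refined(A, b, iters=5):
    """Solve Ax = b: factor/solve in float32, refine the residual in float64."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                  # residual computed in double precision
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx                        # correction from the low-precision solve
    # A production code would factor A32 once and reuse the LU factors.
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = solve_refined(A, b)
```

With a few refinement sweeps the residual reaches double-precision levels even though every factorization ran in single precision, which is why the fastest systems report multi-Eflop/s HPL-AI numbers well above their HPL Rmax.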
PERFORMANCE DEVELOPMENT
[Chart: TOP500 performance development (SUM, N=1, N=500), 1994–2020, annotated at June 2008 and June 2013.]
AVERAGE SYSTEM AGE
[Chart: average age of TOP500 systems in months, 1995–2020, on a scale of 0–30 months, with a 7.6-month level marked.]
Fewer incentives to upgrade systems lead to older systems.
RANK AT WHICH HALF OF TOTAL PERFORMANCE IS ACCUMULATED
[Chart: the rank (between 0 and 100) at which half of the list's total performance is accumulated, 1994–2020.]
Less frequent purchases allow system sizes to increase and lead to a more top-heavy TOP500.
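The metric behind this chart is simple to state in code; a hypothetical helper, taking the list's Rmax values sorted in descending order:

```python
def half_performance_rank(rmax_desc):
    """Smallest rank k such that the top-k systems hold at least half of
    the list's total performance. `rmax_desc` must be sorted descending."""
    total = sum(rmax_desc)
    running = 0.0
    for k, r in enumerate(rmax_desc, start=1):
        running += r
        if running >= total / 2:
            return k
    return len(rmax_desc)

# A top-heavy list concentrates half the performance in very few systems:
top_heavy = half_performance_rank([100, 50, 25, 15, 10])   # rank 1
flat = half_performance_rank([10] * 10)                    # rank 5
```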
ANNUAL PERFORMANCE INCREASE OF THE TOP500
[Chart: annual performance growth factor of the TOP500 (between 1x and 2.6x), 1994–2020, against Moore's Law; TOP500 averages: 1000x in 11 years, then in 15 years, then in 20 years.]
Less frequent purchases on a steady budget lead to increases in system sizes, which temporarily keeps the overall performance increase up.
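The "1000x in 11, 15, 20 years" averages translate directly into annual growth factors, assuming exponential growth:

```python
# Annual growth factor implied by a 1000x gain over a given span of years.
def annual_factor(total_gain: float, years: float) -> float:
    return total_gain ** (1.0 / years)

early = annual_factor(1000, 11)   # ~1.87x per year: the historic TOP500 pace
mid   = annual_factor(1000, 15)   # ~1.58x per year
late  = annual_factor(1000, 20)   # ~1.41x per year, i.e. doubling roughly
                                  # every two years, the Moore's-law pace
```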
RESEARCH / COMMERCIAL MARKETS
• Markets for scientific computing and for commercial data processing are very different.
• Extract proper sub-samples for these markets from the full TOP500 list– TOP100 Research and Academic installations– TOP100 Industry (and Vendor) installations
• Could try to separate out Industry installations but difficult to do
– Ignore "Government, Classified, Others" for now– 100 works reasonably well, more is difficult
ENTRY LEVEL SYSTEM SIZE
[Chart: entry-level system performance, 2009–2021, log scale from 10 Tflop/s to 10 Pflop/s, for the full TOP500 and the Research and Commercial sub-lists.]