Performance Analysis of IBM POWER8 Processors




Jakub Jelen

Fakulta informačních technologií VUT v Brně (Faculty of Information Technology, Brno University of Technology)


What is IBM POWER8 and what is so special about it?

The Power8 processor is available under the OpenPOWER license. Every core can execute instructions from 8 different software threads using true 8-way Simultaneous Multi-Threading (SMT8). Every socket can be connected to 8 memory modules over fast memory lanes, providing sustainable bandwidth of over 190 GB/s per socket. The Power8 can operate in both little-endian and big-endian mode.

The competing Intel server/HPC architecture with similar parameters is Intel Haswell-EP.

Fig. 1: IBM Power8 CPU die, labeled [1]

Can you compare them?

                       IBM Power8       Intel Xeon E5-2680 v3
                       (8247-21L)       (Haswell-EP)
Lithography            22 nm            22 nm
Launch date            Q2’14            Q3’14
Cores                  10               12
Threads                80               24
L1 data cache          64 kB            32 kB
L2 cache (per core)    512 kB           256 kB
L3 cache (total)       80 MB            30 MB
L4 cache               128 MB           -
Bandwidth [GB/s]       192              68
Vector unit [bits]     2 × 128 (VSX)    256 (AVX2)
Frequency [GHz]        3.425            2.5
Performance [GFLOPS]   548              480
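The performance row can be cross-checked against the core count and frequency rows. The following is a minimal sketch, assuming that peak performance is cores × frequency × FLOPs per cycle per core. The factor of 16 FLOPs per cycle is inferred from the table's own numbers and is not stated on the poster, and the function name is made up for illustration.

```python
def peak_gflops(cores, freq_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * freq_ghz * flops_per_cycle


# Assumption: 16 FLOPs per cycle per core, chosen because it reproduces
# both entries of the "Performance [GFLOPS]" row.
FLOPS_PER_CYCLE = 16

power8 = peak_gflops(cores=10, freq_ghz=3.425, flops_per_cycle=FLOPS_PER_CYCLE)
haswell = peak_gflops(cores=12, freq_ghz=2.5, flops_per_cycle=FLOPS_PER_CYCLE)

print(f"Power8:     {power8:.0f} GFLOPS")
print(f"Haswell-EP: {haswell:.0f} GFLOPS")
```

With these inputs the higher clock of the Power8 more than offsets its two fewer cores.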

Is it better than Intel Haswell-EP processors?

It depends on what you measure. Memory performance is significantly better. A single Power8 socket can provide higher sustainable bandwidth than two Intel Xeon sockets.

[Plot: bandwidth in GB/s (axis from 20 to 140) against number of threads (ticks at 1, 10, 100). Series: Triad, Add, Scale and Copy for Power8 and for Intel. Markers: #LSU, #Cores, #HWT, #Intel Cores, #Memory modules.]

Fig. 3: STREAM: Memory bandwidth performance
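For orientation, the four STREAM kernels named in Fig. 3 and the way a bandwidth figure is derived from them can be sketched as follows. This is a pure-Python illustration of the accounting only. It assumes 8-byte elements and the usual STREAM convention of counting one transfer per array touched. It is not the benchmark code used for the poster, its timings say nothing about hardware bandwidth, and all helper names are hypothetical.

```python
import time

BYTES_PER_ELEMENT = 8  # assumption: double-precision elements

# Arrays touched per element (reads plus writes) in each STREAM kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def copy(a, b, c, q):
    c[:] = a                                  # c = a


def scale(a, b, c, q):
    b[:] = [q * x for x in c]                 # b = q * c


def add(a, b, c, q):
    c[:] = [x + y for x, y in zip(a, b)]      # c = a + b


def triad(a, b, c, q):
    a[:] = [y + q * z for y, z in zip(b, c)]  # a = b + q * c


KERNELS = {"Copy": copy, "Scale": scale, "Add": add, "Triad": triad}


def bandwidth_gb_s(kernel, n, seconds):
    """Bytes moved by one pass of `kernel` over n elements, expressed in GB/s."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n / seconds / 1e9


def run_stream(n=100_000, q=3.0):
    """Run each kernel once and report the bandwidth implied by its run time."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    results = {}
    for name, kernel in KERNELS.items():
        start = time.perf_counter()
        kernel(a, b, c, q)
        # Guard against a zero reading on coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[name] = bandwidth_gb_s(name, n, elapsed)
    return results


if __name__ == "__main__":
    for name, gb_s in run_stream().items():
        print(f"{name:5s} {gb_s:8.3f} GB/s")
```

Copy and Scale move two arrays per pass while Add and Triad move three, which is why the same run time corresponds to different bandwidth figures across the four kernels.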

So how fast is it? For which workloads?

[Plot: GFLOPS (ticks at 1, 10, 100) against number of threads (ticks at 1, 10, 100). Series: IBM Power8 and Haswell-EP, each labeled (256), (512), (1024), (2048), (4096).]

As mentioned above, memory bandwidth is exceptional. Another test was matrix multiplication using different algorithms. Power8 showed better performance than Haswell-EP for single-threaded tasks. Even ATLAS performed better than MKL.

Similar results were achieved with steady-state heat distribution on a 2D plane, but results for smaller matrices are comparable with Intel.

In more complex algorithms, such as N-body simulation, the Intel Compiler is able to exploit more of the vector potential and provides significantly better performance.

Certainly, this is not a processor you would buy for your laptop, or for your home server. It targets supercomputing, cloud and big data workloads. The memory characteristics were confirmed by the tests, which means this processor is suitable for memory-bound problems.
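The difference between memory-bound and compute-bound workloads can be made concrete with a roofline-style estimate. The roofline model is not used on the poster and is brought in here only as an illustration. The sketch takes peak GFLOPS and bandwidth as listed in the comparison table, assumes 8-byte elements, and assumes the ideal case in which every array is moved to or from memory exactly once. All function names are hypothetical.

```python
BYTES_PER_ELEMENT = 8  # assumption: double-precision elements

# Peak GFLOPS and memory bandwidth (GB/s) as listed in the comparison table.
MACHINES = {
    "Power8": {"peak_gflops": 548.0, "bandwidth_gb_s": 192.0},
    "Haswell-EP": {"peak_gflops": 480.0, "bandwidth_gb_s": 68.0},
}


def attainable_gflops(machine, flops_per_byte):
    """Roofline bound: the lower of compute peak and bandwidth x intensity."""
    m = MACHINES[machine]
    return min(m["peak_gflops"], m["bandwidth_gb_s"] * flops_per_byte)


def triad_intensity():
    """STREAM Triad: 2 FLOPs per element, 3 arrays touched."""
    return 2 / (3 * BYTES_PER_ELEMENT)


def matmul_intensity(n):
    """n x n matrix multiply: 2*n^3 FLOPs over 3 matrices of n^2 elements."""
    return (2 * n**3) / (3 * n**2 * BYTES_PER_ELEMENT)


for name in MACHINES:
    triad = attainable_gflops(name, triad_intensity())
    matmul = attainable_gflops(name, matmul_intensity(256))
    print(f"{name}: Triad bound {triad:.1f} GFLOPS, "
          f"matmul(256) bound {matmul:.1f} GFLOPS")
```

Under these assumptions a streaming kernel such as Triad is capped by memory traffic on both machines, with the cap roughly 2.8 times higher on Power8 (192 against 68 GB/s), while a matrix multiplication with good data reuse is capped by the compute peak, where the two processors are much closer.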

So what? Intel has Skylake! And Phi!

Yes. Current results show that the performance is comparable in some cases, which is not world-changing. A long time has passed since the introduction of Power8, and Intel did a great job with 14 nm in Skylake. But the Skylake architecture still does not offer suitable processors for the server segment.

The Intel Xeon Phi is the exact opposite of the Power8 processor. It relies on massive parallelism, but with significantly less memory and cache. Therefore it is more suitable for compute-bound workloads.

The next-generation Power9 processor is just around the corner, and during the next year it will power the new supercomputers Summit and Sierra, which aim high in the TOP500 list of supercomputers [3].

Sources:
[1] http://www.extremetech.com/computing/181102-ibm-power8-openpower-x86-server-monopoly
[2] http://fit-rhlab.rhcloud.com/powerlinux-openpower-development-hosting/
[3] http://wccftech.com/nvidia-volta-gpus-ibm-power9-cpus-deliver-300-petaflops-performance-2017-summit-sierra-supercomputers/

Can I try that out?

Yes! There is the OpenPower HUB [2], where Red Hat, together with the IBM Linux Technology Center, provides access to Power8 hardware.