
Post on 29-Aug-2020


Page 1: Xeon Phi: Architecture General informationhpac.rwth-aachen.de/teaching/sem-hpsc-14/presentations...Xeon Phi: Architecture General information Philipp Bartels Thomas Lange 2 TIANHE-2

1

Xeon Phi: Architecture
General information

Philipp Bartels
Thomas Lange


2

TIANHE-2

32,000 CPUs: Xeon E5-2692 v2

48,000 accelerators: Xeon Phi 31S1P

Theoretical peak: 54,902.4 TFlop/s (double precision)

Linpack performance: 33,862.7 TFlop/s
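The two performance figures on this slide give the Linpack efficiency directly (measured performance divided by theoretical peak). A minimal sketch of that arithmetic in C; the function names are ours and do not come from the slides:

```c
#include <stdio.h>

/* Fraction of the theoretical peak that a benchmark run achieves. */
double efficiency(double measured_tflops, double peak_tflops)
{
    return measured_tflops / peak_tflops;
}

/* Prints the ratio for the two Tianhe-2 figures quoted on the slide. */
void print_tianhe2_efficiency(void)
{
    double e = efficiency(33862.7, 54902.4);
    printf("Linpack efficiency: %.1f %%\n", 100.0 * e);
}
```

With the slide's numbers the ratio comes out at a little under 62 %.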


5

- Accelerator / coprocessor

- General-purpose cores (57-61)

- Embedded Linux


6

More characteristics

Can get an IP address

x86-64 instruction set

Extension: Initial Many Core Instructions (IMCI)

Four hardware threads per core (4-way Hyper-Threading)

512-bit vector registers


7

Xeon Phi 7120D

RCP (recommended customer price): $4,235 (Amazon.com: $3,507.82)

61 cores (1.238 GHz each)

30.5 MB L2 cache in total

Main memory: 16 GB GDDR5

TDP: 300 W
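The figures above can be combined into two derived numbers: the L2 share per core and a theoretical double-precision peak. The sketch below is hedged: the slide does not state how many floating-point operations a core completes per cycle, so that value is a parameter. A plausible assumption is 16, since a 512-bit register holds 8 doubles and a fused multiply-add can be counted as 2 operations. The function names are ours.

```c
/* L2 share per core in KB: the total L2 (in MB) split evenly across cores. */
double l2_per_core_kb(double total_l2_mb, int cores)
{
    return total_l2_mb * 1024.0 / cores;
}

/* Theoretical peak in GFlop/s = cores * clock (GHz) * flops per cycle per core.
 * flops_per_cycle is an ASSUMPTION supplied by the caller; it is not on the slide. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    return cores * clock_ghz * flops_per_cycle;
}
```

For the 7120D, `l2_per_core_kb(30.5, 61)` gives 512 KB per core, and `peak_gflops(61, 1.238, 16)` gives roughly 1208 GFlop/s under the stated assumption.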


10

Parallel programming models

OpenMP

OpenACC

Intel Cilk Plus

Intel TBB

OpenCL


11

Pragma Example

#pragma offload target(mic) in(...) inout(...)
{
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = 2 * a[i] + b[i];
    }
}
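A fuller version of the pragma example, as a minimal sketch. The function name `scaled_add` and the concrete `in`/`out` clauses with `length(n)` are assumptions, because the slide leaves the clauses elided. `#pragma offload` is specific to Intel's compiler; other compilers are expected to ignore the unknown pragma (possibly with a warning), in which case the loop simply runs on the host.

```c
/* Computes c[i] = 2 * a[i] + b[i] for n elements.
 * ASSUMPTION: the in/out clauses below are one possible way to fill in the
 * "in(...) inout(...)" that the slide elides. */
void scaled_add(double *a, double *b, double *c, int n)
{
    int i;

    /* Copy a and b to the coprocessor, run the block there, copy c back. */
    #pragma offload target(mic) in(a, b : length(n)) out(c : length(n))
    {
        /* Distribute the iterations over the coprocessor's threads. */
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            c[i] = 2 * a[i] + b[i];
        }
    }
}
```

Because both pragmas are only hints to the compiler, the function computes the same result whether or not offloading and OpenMP are enabled.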


12

Intel Cilk Plus

3 simple keywords: cilk_for, cilk_spawn, cilk_sync

Array notation

SIMD-enabled functions

#pragma simd


13

Example

cilk_for (int i = 0; i < 8; ++i)
{
    do_work(i);
}

int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}
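One useful property of the three keywords is that removing them leaves a valid serial program with the same result (often called the serial elision). The sketch below illustrates this for compilers without Cilk Plus support. The guard macro `HAVE_CILK_PLUS` and the helper `squares` are hypothetical names introduced for illustration only.

```c
/* HAVE_CILK_PLUS is a hypothetical macro: define it yourself when building
 * with a compiler that supports Cilk Plus. Otherwise the keywords are
 * defined away and the code runs serially. */
#ifdef HAVE_CILK_PLUS
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

/* Fibonacci as on the slide: fib(n-1) may run in parallel with fib(n-2). */
int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n - 1);
    int y = fib(n - 2);
    cilk_sync;              /* wait for the spawned call before using x */
    return x + y;
}

/* Hypothetical helper: each iteration writes its own element, so the
 * iterations are independent and safe to run in parallel. */
void squares(int *out, int n)
{
    cilk_for (int i = 0; i < n; ++i)
        out[i] = i * i;
}
```

Calling `fib(10)` returns 55 in both the serial and the parallel build.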


14

Vectorization


15

Vectorization

Perform the same operation on multiple data elements in a single instruction

#pragma omp simd
for (i = 0; i < 1024; i++)
    C[i] = A[i] * B[i];

// array notation in Intel Cilk Plus: array[start:length]
for (i = 0; i < 1024; i += 4)
    C[i:4] = A[i:4] * B[i:4];
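A self-contained version of the OpenMP loop, as a minimal sketch. The function names and the choice of `float` are assumptions. `#pragma omp simd` asks the compiler to vectorize the loop; a compiler without OpenMP support ignores the pragma and the loop still produces the same result. The second function shows the arithmetic behind the speedup: with 512-bit registers, 16 floats fit into one register, so 1024 elements need at least 64 vector multiplications.

```c
/* Element-wise product C[i] = A[i] * B[i] for n elements.
 * The caller must ensure that C does not overlap A or B. */
void multiply(const float *A, const float *B, float *C, int n)
{
    int i;
    #pragma omp simd
    for (i = 0; i < n; i++)
        C[i] = A[i] * B[i];
}

/* Number of vector instructions needed to cover n elements when one
 * register holds `lanes` elements (rounded up for a partial last vector). */
int vector_ops(int n, int lanes)
{
    return (n + lanes - 1) / lanes;
}
```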


16

Vectorization of a loop

Autovectorization

Execute more than one iteration of the loop at the same time

Requirements:

- straight-line code (no branching)
- number of iterations must be known when the loop starts
- no backward loop-carried dependencies
- no special operators
- must be the inner loop


17

Example

Can be vectorized by compiler

for (i=1; i<MAX; i++) {
    a[i] = b[i] + c[i];
    d[i] = e[i] - a[i-1];
}

Cannot be vectorized by compiler

for (i=1; i<MAX; i++) {
    d[i] = e[i] - a[i-1];
    a[i] = b[i] + c[i];
}
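The difference between the two loops can be checked with a small simulation, sketched here under stated assumptions. Vector execution is mimicked by running each statement for a whole block of `vl` iterations before moving to the next statement; `vl`, the function names, and the input values are ours. In the first loop `a[i-1]` has already been written when it is read, so blocked execution matches the serial loop. In the second loop, as written, blocked execution reads stale values of `a`.

```c
enum { MAX = 16 };

/* Runs iterations 1..MAX-1 in blocks of vl iterations, one statement at a
 * time per block. vl = 1 is the ordinary serial loop. write_a_first = 1
 * selects the first loop from the slide, 0 selects the second. */
void run(int write_a_first, int vl, float *a, float *d,
         const float *b, const float *c, const float *e)
{
    for (int i = 1; i < MAX; i += vl) {
        int end = (i + vl < MAX) ? i + vl : MAX;
        if (write_a_first) {
            for (int j = i; j < end; j++) a[j] = b[j] + c[j];
            for (int j = i; j < end; j++) d[j] = e[j] - a[j - 1];
        } else {
            for (int j = i; j < end; j++) d[j] = e[j] - a[j - 1];
            for (int j = i; j < end; j++) a[j] = b[j] + c[j];
        }
    }
}

/* Returns 1 if blocked execution with block size vl yields the same d as
 * serial execution, otherwise 0. */
int vector_matches_serial(int write_a_first, int vl)
{
    float b[MAX], c[MAX], e[MAX];
    float a1[MAX], a2[MAX], d1[MAX], d2[MAX];

    for (int i = 0; i < MAX; i++) {
        b[i] = (float)i;
        c[i] = 1.0f;
        e[i] = 100.0f;
        a1[i] = a2[i] = 0.0f;   /* old contents of a differ from b + c */
        d1[i] = d2[i] = 0.0f;
    }
    run(write_a_first, 1, a1, d1, b, c, e);    /* serial reference */
    run(write_a_first, vl, a2, d2, b, c, e);   /* vector-style execution */

    for (int i = 0; i < MAX; i++)
        if (d1[i] != d2[i])
            return 0;
    return 1;
}
```

With a block size of 4, `vector_matches_serial(1, 4)` returns 1 and `vector_matches_serial(0, 4)` returns 0, which mirrors the classification on the slide.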


18

Comparison with Tesla K40

[Bar chart: Xeon Phi 7120A vs. Tesla K40. Categories: price ($), number of cores, base core clock (MHz), single-precision GFlops, double-precision GFlops, amount of main memory, memory bandwidth, TDP. Value axis from 0 to 5000.]


19

Who did what

Thomas Lange: slides 9 to 17

Philipp Bartels: slides 18 and 1 to 8
