Intel® Xeon Phi™ Coprocessor Case Study
![Page 1: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/1.jpg)
INTEL CONFIDENTIAL
INTEL® XEON PHI™ COPROCESSOR Case Study
Software Ecosystem Snapshot and End User Momentum
![Page 2: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/2.jpg)
ENERGY
Rishi Khan
Vice President of Research and Development,
ET International, November, 2012
HOW TO EVALUATE YOUR APPLICATION
![Page 3: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/3.jpg)
EVALUATING YOUR APPLICATIONS
Click to see animation video on Exploiting Parallelism on Intel Xeon Phi Coprocessors
[Flowchart: three YES/NO questions]
• Can your workload scale to over 100 threads?
• Can your workload benefit from large vectors?
• Can your workload benefit from more memory bandwidth?
Representative example workload for illustration purposes only
Use Intel® Xeon Phi™ coprocessors for applications that scale with:
• Threads • Vectors • Memory Bandwidth
![Page 4: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/4.jpg)
BREAKTHROUGH PROGRAMMING EFFICIENCY
For Accelerating Highly Parallel Applications
Maintain a single code base
• Software that runs on Intel® Xeon® processors also runs on Intel® Xeon Phi™ coprocessors
Use familiar tools
• No need to learn new tools, languages, or development models
Optimize code just once
• Optimizations for Intel® Xeon Phi™ coprocessors also boost performance for Intel® Xeon® processors
Preserve your investment
• Don't reinvent the wheel: accelerate performance for existing applications
[Figure: a single source code base, built with common compilers, libraries, and parallel models, targets multi-core CPUs, Intel® MIC Architecture (many-core), multi-core clusters, and multi-core and many-core clusters]
![Page 5: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/5.jpg)
FLEXIBLE USAGE MODELS
[Figure: one source code base, built with common compilers, libraries, and parallel models, runs under four execution models. The models are ordered by the percentage of code that is highly parallel, from serial and moderately parallel code to highly parallel code.]
• Multicore Only (90% of applications): main() runs on the Xeon(s)
• Multicore Hosted with Manycore Offload: main() runs on the Xeon(s), with work offloaded to the Xeon Phi(s)
• Symmetric: main() runs on both the Xeon(s) and the Xeon Phi(s)
• Manycore Only: main() runs on the Xeon Phi(s)
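The "Multicore Hosted with Manycore Offload" model can be illustrated with a minimal sketch. The slides do not show offload code, so everything here is an assumption: the function name `dot_offload` is hypothetical, and the standard OpenMP `target` construct stands in for whatever offload mechanism the original applications used. When no device is available, or OpenMP is disabled, the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical highly parallel kernel: dot product of two vectors.
 * In the offload model main() stays on the host; only this loop is
 * marked for execution on the coprocessor. */
double dot_offload(const double *a, const double *b, int n)
{
    double sum = 0.0;
    /* Copy a and b to the device, run the loop there, bring sum back. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The serial and moderately parallel parts of the program call `dot_offload` like any other function, which is what lets a single code base serve both the multicore-only and the offload model.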
![Page 6: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/6.jpg)
Examples of Highly Parallel Market Segments & Applications
Mix of ISV and End User Development
Sub-Segments: Applications/Workloads
• Public Sector: HPL, HPCC, NPB, LAMMPS, QCD, BQCD, GROMACS
• Energy (including Oil & Gas): RTM (Reverse Time Migration), WEM (Wave Equation Migration)
• Climate Modeling and Weather Simulation: WRF, HOMME
• Financial Analysis: Monte Carlo, Black-Scholes, Binomial model, Heston model
• Life Sciences (Molecular Dynamics, Gene Sequencing, Bio-Chemistry): LAMMPS, NAMD, AMBER, HMMER, BLAST, QCD, CHARMM, BQCD, GROMACS
• Manufacturing (CAD/CAM/CAE/CFD/EDA): Implicit, Explicit Solvers
• Digital Content Creation: Ray Tracing, Animation, Effects
• Software Ecosystem: Tools, Middleware
![Page 7: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/7.jpg)
A GROWING ECOSYSTEM: Developing today on Intel® Xeon Phi™ coprocessors
Approved for Public Presentation
Shown at SC'12, November 2012
Other brands and names are the property of their respective owners.
![Page 8: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/8.jpg)
Intel® Xeon Phi™ Coprocessor: Performance Proof-points (SC12)
Energy:
• Acceleware – 8th order Isotropic RTM
• Sinopec iCluster PSDM (updated)
• CNPC BGP Geoeast PSTM
• CGG: WEM
• Intel: 3DFD TTI Proxy (new)
Academic/Government Labs:
• ASKAP – tHogbomClean (updated)
• Sandia – miniFE (finite element solver)
• ZIB Ising 3D
• Jefferson Labs Lattice QCD (updated)
• LRZ/TUM SG++
• CAS IPE MD Simulation
• BQCD (updated)
• Gromacs
• CAS – CNIC Wigeon
• CAS – ICT FDTD
• WRF (updated)
• AWE Cloverleaf
• TACC LB3d
• MPI-HMMER (updated)
Financial Services:
• Monte Carlo (updated)
• Black-Scholes (updated)
• Aneo HPClib
Manufacturing, Digital Content Creation:
• Altair - Radioss (finite element solver)
• Embree 2.0
• NEC – Realtime Super Program
![Page 9: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/9.jpg)
Intel® Xeon Phi™ Coprocessor: Performance Proof-points (ISC13)
Energy:
• Acceleware – 8th order Isotropic RTM
• Sinopec iCluster PSDM (updated)
• CNPC BGP Geoeast PSTM
• CGG: WEM
• Intel: 3DFD TTI Proxy (new)
Academic/Government Labs:
• ASKAP – tHogbomClean (updated)
• Sandia – miniFE (finite element solver)
• ZIB Ising 3D
• Jefferson Labs Lattice QCD (updated)
• LRZ/TUM SG++
• CAS IPE MD Simulation
• BQCD (updated)
• Gromacs
• CAS – CNIC Wigeon
• CAS – ICT FDTD
• WRF (updated)
• AWE Cloverleaf
• TACC LB3d
• MPI-HMMER (updated)
Financial Services:
• Monte Carlo (updated)
• Black-Scholes (updated)
• Aneo HPClib
Manufacturing, Digital Content Creation:
• Altair - Radioss (finite element solver)
• Embree 2.0
• NEC – Realtime Super Resolution Program
• OpenFOAM (Symmetric)
• STAR-CD (Symmetric)
• STAR-CCM+ (Symmetric)
• ANSYS-Fluent (Symmetric)
• Overflow (NASA, Symmetric)
• Numeca (native)
• Phoenics (native)
![Page 10: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/10.jpg)
CFD Application
• ANSYS Fluent status:
  • ANSYS has been trying to produce a "beta" release of Fluent for Xeon Phi.
• OpenFOAM:
  • Currently, OpenFOAM is about 2x slower on KNC than on SNB-EP 2S, 1600 MHz. This is a single-node result, so no InfiniBand is involved.
  • The open source development team did not want to customize anything for Phi.
• Overflow (NASA): NASA Overflow has performed the best of the known structured solvers.
• STAR-CD | STAR-CCM+: Ongoing.
![Page 11: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/11.jpg)
Case Study: TACC -- LBM
![Page 45: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/45.jpg)
FDTD Case Study -- ICT
![Page 46: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/46.jpg)
Finite Difference Time Domain
Application background
• Widely used in many electromagnetics domains:
  - Microwave
  - Electromagnetic protection
  - Navigation
  - Electromagnetic detection
  - Radar
  - Antenna
![Page 47: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/47.jpg)
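Finite Difference Time Domain
Algorithm introduction
• An algorithm used for electromagnetic simulation by solving Maxwell's equations.

$$\nabla \times \mathbf{E} = -\mu \frac{\partial \mathbf{H}}{\partial t}, \qquad \nabla \times \mathbf{H} = \varepsilon \frac{\partial \mathbf{E}}{\partial t} + \sigma \mathbf{E}$$

$$\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} = -\mu \frac{\partial H_x}{\partial t}, \qquad \frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x} = -\mu \frac{\partial H_y}{\partial t}, \qquad \frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} = -\mu \frac{\partial H_z}{\partial t}$$

$$\frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} = \varepsilon \frac{\partial E_x}{\partial t} + \sigma E_x, \qquad \frac{\partial H_x}{\partial z} - \frac{\partial H_z}{\partial x} = \varepsilon \frac{\partial E_y}{\partial t} + \sigma E_y, \qquad \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} = \varepsilon \frac{\partial E_z}{\partial t} + \sigma E_z$$

[Figure: Yee Grid]

For example, Ex is obtained by:

$$E_x^{n+1}(i+1/2, j, k) = \frac{2\varepsilon(m) - \sigma(m)\Delta t}{2\varepsilon(m) + \sigma(m)\Delta t}\, E_x^{n}(i+1/2, j, k) + \frac{2\Delta t}{2\varepsilon(m) + \sigma(m)\Delta t} \left[ \frac{H_z^{n+1/2}(i+1/2, j+1/2, k) - H_z^{n+1/2}(i+1/2, j-1/2, k)}{\Delta y} - \frac{H_y^{n+1/2}(i+1/2, j, k+1/2) - H_y^{n+1/2}(i+1/2, j, k-1/2)}{\Delta z} \right]$$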
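For a small spatial model with on the order of 10^9 cells, simulating a waveform that lasts on the order of milliseconds takes at least one month of computation, even on 1000 processors.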
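The Ex update can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the ICT code: the flat array layout, the `IDX` macro, the function name `update_ex`, and the uniform material parameters `eps` and `sigma` (one material instead of a per-cell material index) are all hypothetical choices made for illustration. Half-integer grid positions are mapped onto integer array indices.

```c
#include <stddef.h>

/* Hypothetical flat layout: element (i, j, k) of an nx * ny * nz array. */
#define IDX(i, j, k, ny, nz) (((size_t)(i) * (ny) + (j)) * (nz) + (k))

/* One time step of the Ex update for a lossy medium.
 * ca and cb are the two coefficients of the update equation. */
void update_ex(float *ex, const float *hy, const float *hz,
               int nx, int ny, int nz,
               float eps, float sigma, float dt, float dy, float dz)
{
    const float ca = (2.0f * eps - sigma * dt) / (2.0f * eps + sigma * dt);
    const float cb = (2.0f * dt) / (2.0f * eps + sigma * dt);

    for (int i = 0; i < nx; i++)
        for (int j = 1; j < ny; j++)
            for (int k = 1; k < nz; k++) {
                size_t c = IDX(i, j, k, ny, nz);
                /* Finite differences of H around the Ex sample point. */
                float dhz_dy = (hz[c] - hz[IDX(i, j - 1, k, ny, nz)]) / dy;
                float dhy_dz = (hy[c] - hy[IDX(i, j, k - 1, ny, nz)]) / dz;
                ex[c] = ca * ex[c] + cb * (dhz_dy - dhy_dz);
            }
}
```

The innermost loop runs over `k`, the fastest-varying index in this layout, so consecutive iterations touch consecutive memory.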
![Page 48: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/48.jpg)
Hotspot Analysis
• Define the candidate functions for optimization (share of CPU time):
  - update_E_PML (21.5%), update_H_PML (24.7%)
  - update_e (18.8%), update_h (19.1%)
  - SAR_EH_CPX (10.4%)
[Flowchart: Begin; analyze input and initial conditions; grid initialization; loop over the steps Update H, Update H PML, Update E, Update E PML until the iteration conditions are met (NO: repeat, YES: output); End]
![Page 49: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/49.jpg)
Scalability Optimization
• Scalability improved substantially over the baseline code after two optimizations:
  - Loop fusion
  - Merging the nested loops

Merging the nested loops, before:

    #pragma omp for nowait
    for(i=0; i <= nx; i++)
        for(j=1; j <= ny; j++)
            for(k=1; k <= nz; k++) {
                ……
            }

Merging the nested loops, after:

    int j_end = ny+1;
    int ij_end = (nx+1)*(ny+1);
    int ij_start = j_end + 1;
    #pragma omp for nowait
    for(int ij=ij_start; ij < ij_end; ij++) {
        i = ij/j_end;
        j = ij%j_end;
        if(!j) continue;
        for(k=1; k <= nz; k++) {
            ……
        }
    }

Loop fusion, before:

    #pragma omp for nowait
    for(i=0; i <= nx; i++)
        for(j=1; j <= ny; j++)
            for(k=1; k <= nz; k++) {
                Ex= ……
            }
    #pragma omp for nowait
    for(i=0; i <= nx; i++)
        for(j=1; j <= ny; j++)
            for(k=1; k <= nz; k++) {
                Ey= ……
            }
    #pragma omp for nowait
    for(i=0; i <= nx; i++)
        for(j=1; j <= ny; j++)
            for(k=1; k <= nz; k++) {
                Ez= ……
            }

Loop fusion, after:

    #pragma omp for nowait
    for(i=0; i <= nx; i++)
        for(j=1; j <= ny; j++)
            for(k=1; k <= nz; k++) {
                Ex= ……
                Ey= ……
                Ez= ……
            }
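The nested-loop merge can be checked with a small, self-contained sketch. The function names and the summed expression are hypothetical and chosen only to make the two iteration spaces comparable. One deliberate difference from the slide is that this sketch starts the merged index at 1, so that the i = 0 row of the original loop is also covered. Merging matters on a coprocessor because the outer loop alone may offer fewer iterations than there are hardware threads.

```c
/* Reference: sum over i in [0, nx] and j in [1, ny] with nested loops.
 * Only the nx + 1 outer iterations are shared among threads. */
long sum_nested(int nx, int ny)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i <= nx; i++)
        for (int j = 1; j <= ny; j++)
            total += (long)i * 1000 + j;
    return total;
}

/* Same iteration space flattened into one loop, so up to
 * (nx + 1) * (ny + 1) iterations are shared among threads. */
long sum_merged(int nx, int ny)
{
    const int j_end = ny + 1;
    const int ij_end = (nx + 1) * (ny + 1);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int ij = 1; ij < ij_end; ij++) {
        int i = ij / j_end;      /* recover the outer index */
        int j = ij % j_end;      /* recover the inner index */
        if (j == 0)
            continue;            /* j = 0 is outside the original range */
        total += (long)i * 1000 + j;
    }
    return total;
}
```

OpenMP's `collapse(2)` clause on the original nested loops is a standard alternative that achieves a similar flattening without manual index arithmetic.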
![Page 50: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/50.jpg)
Vectorization
• The compiler can auto-vectorize the innermost loop with help from:
  - Removing code dependencies
  - #pragma simd
  - Adding the compiler option "-ansi-alias"
• Check the speedup of vectorized code over scalar code
[Chart: Vectorization on SNB, SNB-scalar vs. SNB-vectorized, axis from 0 to 2]
[Chart: Vectorization on KNC, KNC-scalar vs. KNC-vectorized, axis from 0 to 5]
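A minimal sketch of a loop written so that a compiler can vectorize it is given below. The slides name the Intel-specific `#pragma simd` and the `-ansi-alias` option; this sketch substitutes the portable C99 `restrict` qualifier and the standard OpenMP `#pragma omp simd` as stand-ins, and the function name `scaled_add` is hypothetical.

```c
/* restrict tells the compiler that out, a, and b never overlap, which
 * removes the assumed dependency between iterations. The pragma asks for
 * SIMD code generation and is ignored if OpenMP is disabled. */
void scaled_add(float *restrict out, const float *restrict a,
                const float *restrict b, float s, int n)
{
    #pragma omp simd
    for (int k = 0; k < n; k++)
        out[k] = a[k] + s * b[k];
}
```

Each iteration reads and writes only element `k`, with unit stride, which is the access pattern that wide vector units handle best.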
![Page 51: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/51.jpg)
Implement on Intel® Xeon Phi™ coprocessor
• Written in C and parallelized with OpenMP.
• Includes three dataset modes (small, medium, large); performance differs between them.
• Migrated to Intel® MIC in native mode with the medium dataset (240*300*300).
• The optimized performance is shown below.
[Chart: FDTD on KNC, speedup over baseline code on E5-2670, higher is better]
  2*Xeon Baseline     1
  2*Xeon Optimized    2.5
  1*KNC               3.2 (1.28X over 2*Xeon Optimized)
Configuration:
  SNB: Dual E5-2670, 2.6GHz; Compiler XE 13 0.079
  KNC: 61 cores, 1.09 GHz, memory size 8GB; Compiler XE 13 0.079
![Page 52: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/52.jpg)
Case Study: Deep Learning
![Page 53: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/53.jpg)
Case Study on IPDC
• Case: Deep Learning
  - Deep Learning is a new domain of machine learning that uses multiple layers of nonlinear neural networks instead of a traditional static model
  - Deep Learning is having a disruptive influence on human-machine interaction technologies such as voice, image, and handwriting
  - Deep Learning greatly improves accuracy and processing time
  - Deep Learning is also time consuming in the model training phase
![Page 54: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/54.jpg)
Hotspots Analysis
• 2S SNB baseline: no optimization, single thread, time: 77.33s
• Hardware: 2-socket Intel® Xeon® E5-2670 (8 cores, 2.6GHz)
• The most time-consuming function is ContrastiveDivergence: self time 69.8s, 90%
• CPI = 5.6, too high
SNB: 2S Intel® Xeon® E5-2670 (8 cores, 2.6GHz)
MIC: Intel® Xeon Phi™ (B0, 61 cores, 1.1GHz, 8GB @ 5.5GT/s)
![Page 55: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/55.jpg)
Hotspots Analysis
• L1 cache hit ratio: 2640 / 24978 = 10.6%, too low
![Page 56: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/56.jpg)
Optimization on Intel Xeon
• Original code: non-consecutive access in the innermost loop, which blocks vectorization and cache hits
![Page 57: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/57.jpg)
Optimization on Intel Xeon
• Optimization V1: transpose the data so that the innermost loop makes consecutive accesses, enabling SIMD optimization
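The effect of the V1 transpose can be sketched with a small matrix-vector product. The slides show the real code only as a screenshot, so this is an illustrative assumption: the function names and the row-major layout are hypothetical. The strided version walks down a column of `w` in its innermost loop, and the contiguous version walks along a row of the transposed copy.

```c
#include <stddef.h>

/* Before: y[j] = sum_i x[i] * w[i][j]. The inner loop jumps by cols
 * elements on every step, so accesses to w are not consecutive. */
void matvec_strided(const float *w, const float *x, float *y,
                    int rows, int cols)
{
    for (int j = 0; j < cols; j++) {
        float acc = 0.0f;
        for (int i = 0; i < rows; i++)
            acc += x[i] * w[(size_t)i * cols + j];
        y[j] = acc;
    }
}

/* One-time transpose: wt is cols x rows, with wt[j][i] = w[i][j]. */
void transpose(const float *w, float *wt, int rows, int cols)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            wt[(size_t)j * rows + i] = w[(size_t)i * cols + j];
}

/* After: the inner loop reads wt with unit stride, which is friendly
 * to both the cache and the vector units. */
void matvec_contiguous(const float *wt, const float *x, float *y,
                       int rows, int cols)
{
    for (int j = 0; j < cols; j++) {
        float acc = 0.0f;
        for (int i = 0; i < rows; i++)
            acc += x[i] * wt[(size_t)j * rows + i];
        y[j] = acc;
    }
}
```

The transpose costs one extra pass over the data, so it pays off when the transposed matrix is reused across many products.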
![Page 59: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/59.jpg)
Optimization on Intel Xeon
• Optimization V2: restructure the code to write directly to the result array and interchange the outer loops, giving consecutive access and enabling SIMD optimization
![Page 60: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/60.jpg)
Optimization on Intel Xeon
• Optimization V3: restructure the code to split the value assignment into a separate loop. The remaining code can use MKL sgemm, which is highly optimized on both Intel Xeon and Intel Xeon Phi
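The idea behind V3 can be sketched as two separate steps. The original code is not shown in the slides, so the function names, the row-major layout, and the thresholding used as the "value assignment" are hypothetical. The first step is written as a plain reference loop so the sketch stays self-contained; the comment shows the corresponding MKL CBLAS call that could replace it.

```c
#include <stddef.h>

/* Step 1: pure matrix product C = A * B (row-major, A is m x k,
 * B is k x n). With MKL, this loop nest could be replaced by:
 *   cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);        */
void matmul_ref(const float *a, const float *b, float *c,
                int m, int n, int k)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] = acc;
        }
}

/* Step 2: the value assignment, moved into its own loop. As a
 * hypothetical example it maps each product entry to 0 or 1. */
void assign_values(const float *c, float *out, float threshold, int count)
{
    for (int t = 0; t < count; t++)
        out[t] = (c[t] > threshold) ? 1.0f : 0.0f;
}
```

Keeping the assignment out of the product loop is what leaves a clean matrix multiply that a tuned library routine can take over.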
![Page 61: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/61.jpg)
Optimization on Intel Xeon
• Optimization V4:
  - Use OpenMP multithreading to scale the code to multi/many cores
  - Use schedule(dynamic) to balance load between threads
  - Use KMP_AFFINITY=balanced,granularity=thread to bind threads to multi/many cores and avoid thread migration
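A minimal sketch of the V4 threading pattern follows. The workload is hypothetical: `work` is an invented function whose cost grows with its argument, chosen so that a static split would leave some threads with far more work than others.

```c
/* Hypothetical task whose cost grows with i. */
static double work(int i)
{
    double acc = 0.0;
    for (int k = 0; k <= i; k++)
        acc += (double)k;
    return acc;
}

/* schedule(dynamic) hands out iterations one at a time as threads
 * become free, which evens out the uneven per-iteration cost. */
double sum_dynamic(int n)
{
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += work(i);
    return total;
}
```

The affinity setting from the slide is an environment variable read by the Intel OpenMP runtime, so it is set before launching the program (for example `export KMP_AFFINITY=balanced,granularity=thread`) and does not appear in the source code.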
![Page 62: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/62.jpg)
Optimization on Intel Xeon
• Optimization V5, original code: glibc rand() keeps hidden global state shared by all threads, so calling it inside the parallel region serializes the threads and blocks parallelism
![Page 64: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/64.jpg)
Optimization on Intel Xeon
• Optimization V5: pre-calculate the rand() values to increase parallelism
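The pre-calculation in V5 can be sketched as a serial fill followed by a parallel loop that only reads the stored values. The function names and the sampling rule are hypothetical; the slides show the real code only as a screenshot.

```c
#include <stdlib.h>

/* Serial phase: draw every random number up front, so rand() is
 * never called from inside the parallel region. Values lie in [0, 1]. */
void fill_random(float *r, int n, unsigned seed)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        r[i] = (float)(rand() / ((double)RAND_MAX + 1.0));
}

/* Parallel phase: each iteration reads its own pre-computed value,
 * so threads no longer contend for the generator's shared state. */
void sample_states(const float *prob, const float *r, int *state, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        state[i] = (r[i] < prob[i]) ? 1 : 0;
}
```

The trade-off is extra memory for the buffer and a serial fill step, which is the part that V6 then addresses by switching generators.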
![Page 65: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/65.jpg)
Optimization on Intel Xeon
• Optimization V6: use the MKL random number generator instead of glibc rand(), further increasing parallelism
![Page 66: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/66.jpg)
Optimization on Intel Xeon
• Optimization results on Intel Xeon:
  - CPI reduced from 5.6 to 1.09
  - L1 cache hit ratio increased from 10.6% to 98.16%
  - The optimizations made on Intel Xeon are all useful for Intel Xeon Phi
![Page 67: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/67.jpg)
Performance on Intel Xeon & Intel Xeon Phi
  Configuration                                       Time elapsed
  Xeon® E5 processor, 1 thread baseline               77.335 s
  Xeon® E5 processor, 32 threads optimization         0.2162 s
  Xeon Phi™ coprocessor, 244 tasks optimization       0.085 s
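The elapsed times translate into speedups as follows. These ratios are not stated on the slide; they are derived here from the three reported times and rounded.

```latex
\begin{align*}
\text{Xeon E5, 32 threads optimized vs. 1-thread baseline:} \quad
  & \frac{77.335\ \text{s}}{0.2162\ \text{s}} \approx 358 \\
\text{Xeon Phi, 244 tasks optimized vs. 1-thread baseline:} \quad
  & \frac{77.335\ \text{s}}{0.085\ \text{s}} \approx 910 \\
\text{Xeon Phi optimized vs. Xeon E5 optimized:} \quad
  & \frac{0.2162\ \text{s}}{0.085\ \text{s}} \approx 2.5
\end{align*}
```

Because the baseline is unoptimized and single-threaded, the first two ratios combine the effect of the code optimizations with the effect of threading. Only the last ratio compares the two optimized runs directly.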
![Page 68: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/68.jpg)
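New course from Intel® Software College in 2013: Programming and Optimization for the Intel® Many Integrated Core Architecture
Format: classroom lectures plus hands-on labs
Course length: 2-4 days (can be customized to customer needs)
Target audience:
This course is suited to software engineers, project leads, and solution architects who design high-performance, scalable applications for clusters and parallel systems, as well as teachers in computing and high-performance application disciplines.
Course description: The full course runs 4 days and combines classroom lectures with hands-on practice. It covers an introduction to the Intel Many Integrated Core architecture, setting up a development environment for the architecture, many-core programming, debugging, optimization, and case studies of real applications. The course is organized into beginner, intermediate, and advanced parts to suit different users, so students can learn and master the material step by step. The focus is on how to port real applications to the Many Integrated Core architecture, use Intel® tools to analyze and extract application characteristics, and gain performance through parallel optimization methods such as multithreading, multitasking, and vectorization. Case studies show how to combine the methods and tools taught in class to improve the performance of real applications.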
![Page 69: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/69.jpg)
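Training models and contact information
Intel® Software College: http://software.intel.com/zh-cn/college
Contact email: [email protected]
[Diagram labels: Intel® Software College; VIP customers / MIC training centers; Intel® Software College; MIC training centers; customers]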
![Page 70: INTEL XEON PHI COPROCESSOR Case Study...INTEL CONFIDENTIAL Intel® Xeon Phi™ Coprocessor: Performance Proof-points Click on links for more information (SC12) Energy Academic/ Government](https://reader030.vdocument.in/reader030/viewer/2022040102/5e2ae7b64fed1149364d8895/html5/thumbnails/70.jpg)
Backup