Source: mug.mvapich.cse.ohio-state.edu/static/media/mug/... (posted 15-Jul-2020)
Designing HPC Solutions
Onur Celebioglu
Dell Inc
Agenda
• HPC Focus Areas
• Performance analysis of HPC Components
  – Compute
  – Interconnect
  – Accelerators
  – And many more
• Best Practices
• Designing better HPC solutions
  – Domain-specific Appliances
HPC at Dell
• Evaluate new HPC technologies and selectively adopt them for integration
• Share our findings with the broader HPC community
• Analyze decision points to obtain the optimal solution to the problem at hand
• Decision points include, but are not limited to:
  – Compute Performance
  – Memory Performance
  – Interconnect
  – Accelerators
  – Storage
  – Power/Energy Efficiency
  – Software Stack
  – Middleware
• Focus Areas
  – Define best practices by analyzing each component of an HPC cluster
  – Use these best practices to develop plug-and-play solutions targeted at specific HPC verticals such as life sciences, fluid dynamics, high-frequency trading, etc.
Compute, Memory & Energy Efficiency
12G: Optimal BIOS Settings
[Figure: Energy efficiency gains with Turbo disabled (relative to Turbo enabled), y-axis 0.80 to 1.20, for the DAPC and Perf profiles on HPL, Fluent truck_poly_14m, Fluent truck_111m, WRF conus_12k, NAMD stmv, MILC (Intel input file) and NPB LU class D. Bar annotations range from the same performance to -15% performance, with 10% to 26% power saving.]
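Each bar on the chart is annotated with a performance change and a power saving. One plausible way to turn such a pair into an efficiency ratio is sketched below, assuming energy efficiency means performance per watt and "power saving" means the fractional drop in average node power versus Turbo enabled. The slide does not say how the plotted ratio was computed, so this formula may not reproduce the bars exactly; `efficiency_ratio` is a hypothetical helper.

```python
def efficiency_ratio(perf_change: float, power_saving: float) -> float:
    """Performance per watt with Turbo disabled, relative to Turbo enabled.

    perf_change:  fractional performance change, e.g. -0.05 for "-5% Perf".
    power_saving: fractional drop in average power, e.g. 0.19 for "19% Power Saving".
    Assumption: efficiency = performance / power, so the ratio is
    (1 + perf_change) / (1 - power_saving).
    """
    if not 0.0 <= power_saving < 1.0:
        raise ValueError("power_saving must be in [0, 1)")
    return (1.0 + perf_change) / (1.0 - power_saving)


if __name__ == "__main__":
    # Three annotation pairs from the chart; which bar each belongs to is not recoverable.
    for perf, saving in [(-0.05, 0.19), (0.0, 0.10), (-0.15, 0.21)]:
        ratio = efficiency_ratio(perf, saving)
        print(f"perf {perf:+.0%}, power saving {saving:.0%} -> {ratio:.3f}")
```

Under this reading, disabling Turbo improves efficiency whenever the power saving exceeds the performance loss, which is why even a 15% slowdown can still pay off.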
Setting             Balanced                Performance focused    Energy Efficient  Latency sensitive
                    configuration                                  configuration
System Profile      Performance Per Watt    Performance Optimized  Custom            Custom
                    Optimized (DAPC)
CPU Power Mgmt      System DBPM             Max Performance        System DBPM       Max Performance
Turbo Boost         Enabled                 Enabled                Disabled          Disabled
C States & C1E      Enabled                 Disabled               Enabled           Disabled
Monitor/Mwait       Enabled                 Enabled                Enabled           Disabled
Logical Processor   Disabled                Disabled               Disabled          Disabled
Node Interleaving   Disabled                Disabled               Disabled          Disabled
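Because the four profiles differ in only a few settings, it can help to encode the table as data and diff two profiles before rolling a change out across a cluster. A minimal sketch: the profile names and dictionary keys are hypothetical labels for the table's columns and rows, not Dell BIOS attribute names or tool identifiers.

```python
# One dictionary per column of the BIOS table (labels are descriptive only).
PROFILES: dict[str, dict[str, str]] = {
    "balanced": {
        "System Profile": "Performance Per Watt Optimized (DAPC)",
        "CPU Power Mgmt": "System DBPM",
        "Turbo Boost": "Enabled",
        "C States & C1E": "Enabled",
        "Monitor/Mwait": "Enabled",
        "Logical Processor": "Disabled",
        "Node Interleaving": "Disabled",
    },
    "performance": {
        "System Profile": "Performance Optimized",
        "CPU Power Mgmt": "Max Performance",
        "Turbo Boost": "Enabled",
        "C States & C1E": "Disabled",
        "Monitor/Mwait": "Enabled",
        "Logical Processor": "Disabled",
        "Node Interleaving": "Disabled",
    },
    "energy_efficient": {
        "System Profile": "Custom",
        "CPU Power Mgmt": "System DBPM",
        "Turbo Boost": "Disabled",
        "C States & C1E": "Enabled",
        "Monitor/Mwait": "Enabled",
        "Logical Processor": "Disabled",
        "Node Interleaving": "Disabled",
    },
    "latency_sensitive": {
        "System Profile": "Custom",
        "CPU Power Mgmt": "Max Performance",
        "Turbo Boost": "Disabled",
        "C States & C1E": "Disabled",
        "Monitor/Mwait": "Disabled",
        "Logical Processor": "Disabled",
        "Node Interleaving": "Disabled",
    },
}


def differences(a: str, b: str) -> dict[str, tuple[str, str]]:
    """Settings whose values differ between profiles a and b, as (a_value, b_value)."""
    pa, pb = PROFILES[a], PROFILES[b]
    return {key: (pa[key], pb[key]) for key in pa if pa[key] != pb[key]}


if __name__ == "__main__":
    for key, (old, new) in differences("balanced", "latency_sensitive").items():
        print(f"{key}: {old} -> {new}")
```

Diffing "balanced" against "performance", for example, shows that only the system profile, CPU power management and C-state settings change.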
Ivy Bridge vs Sandy Bridge Single Node
• E5-2670 8C 2.6 GHz (SB) vs E5-2697 v2 12C 2.7 GHz (IVB)
[Figure: Performance gain (%) with Ivy Bridge (12 core) over Sandy Bridge (8 core): HPL 46, ANSYS Fluent 25, LS-DYNA 16, Simulia Abaqus 6.12 S4B 3, Simulia Abaqus 6.12 E6 12, LAMMPS 26, MUMPS 37]
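Part of the HPL gain can be explained by theoretical peak alone. The sketch below assumes dual-socket nodes (as in the 2 x E5-2697 v2 configurations discussed next), 8 double-precision FLOPs per cycle per core (256-bit AVX on both Sandy Bridge and Ivy Bridge), and base clocks with Turbo ignored; `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(sockets: int, cores_per_socket: int, ghz: float,
                flops_per_cycle: int = 8) -> float:
    """Theoretical double-precision peak of one node, in GFLOPS.

    flops_per_cycle=8 assumes 256-bit AVX (one 4-wide add and one 4-wide multiply
    per cycle), which applies to both Sandy Bridge and Ivy Bridge; Turbo is ignored.
    """
    return sockets * cores_per_socket * ghz * flops_per_cycle


sb = peak_gflops(2, 8, 2.6)     # assumed 2 x E5-2670 per node
ivb = peak_gflops(2, 12, 2.7)   # assumed 2 x E5-2697 v2 per node
print(f"SB {sb:.1f} GFLOPS, IVB {ivb:.1f} GFLOPS, peak gain {ivb / sb - 1:.1%}")
```

That is about 56% more theoretical peak per node. The measured HPL gain is below that, and the other applications gain less still, plausibly because they depend more on memory bandwidth, communication and I/O than on peak FLOPS.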
Decision: Processor selection. Criteria: Performance
• 2 x E5-2697 v2 (2.7 GHz, 12c, 130W) does best in most cases
• All tests were run on 4 fully subscribed servers with an FDR interconnect
Performance across four nodes using multiple IVB processors
Decision: Processor selection. Criteria: Power
• 2 x E5-2697 v2 (2.7 GHz, 12c, 130W) does best in most cases
• All tests were run on 4 fully subscribed servers with an FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision: Memory selection. Criteria: Performance
• Dual-rank memory modules give the best performance
• All tests were run on 4 fully subscribed servers with an FDR interconnect
Performance drop when using single-rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40GigE RoCE)
• How do benchmarks, synthetic kernels and micro-benchmarks behave at scale?
• Can micro-benchmark performance explain an application's performance at a larger scale?
[Figure: MPI OSU latency (us): FDR 1.37, 40GigE RoCE 1.63. MPI OSU bandwidth (MB/s): FDR 6262.73, 40GigE RoCE 4902.3]
MVAPICH2-2.0b and OMB v4.2
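A first-order way to connect these two micro-benchmark numbers to message costs is the latency-bandwidth (Hockney) model, t(m) = latency + m / bandwidth. The sketch below plugs in the OSU results above; it assumes OSU's MB means 10^6 bytes and ignores protocol switches, contention and collectives, so it is a rough estimate rather than a prediction. `transfer_time_us` and `FABRICS` are hypothetical names.

```python
# Measured OSU values from this slide: (latency in us, bandwidth in MB/s).
FABRICS = {
    "FDR": (1.37, 6262.73),
    "40GigE RoCE": (1.63, 4902.3),
}


def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_mb_per_s: float) -> float:
    """Hockney (latency-bandwidth) estimate of one-way message time in microseconds.

    Assumes MB = 10**6 bytes, so size_bytes / bandwidth_mb_per_s is already in microseconds.
    """
    return latency_us + size_bytes / bandwidth_mb_per_s


if __name__ == "__main__":
    for size in (8, 4096, 1 << 20):
        t_fdr = transfer_time_us(size, *FABRICS["FDR"])
        t_roce = transfer_time_us(size, *FABRICS["40GigE RoCE"])
        print(f"{size:>8} B: FDR {t_fdr:8.2f} us, RoCE {t_roce:8.2f} us, RoCE/FDR {t_roce / t_fdr:.2f}")
```

In this model RoCE is about 19% slower for tiny, latency-bound messages and about 28% slower for large, bandwidth-bound ones. Because it leaves out contention, collectives and overlap of communication with computation, it can only partially answer the second question above.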
RoCE vs IB vs TCP
[Figure: HPL performance relative to FDR (GFlops) vs nodes-cores (1-20, 2-40, 4-80, 8-160, 16-320, 32-640) for FDR, 40GigE-RoCE and 40GigE-TCP]
MVAPICH2-2.0b
[Figure: NPB LU Class D performance relative to FDR (rating) vs nodes-cores (1-20, 2-40, 4-80, 8-160, 16-320, 32-640) for FDR, 40GigE-RoCE and 40GigE-TCP]
RoCE vs IB vs TCP
[Figure: WRF Conus 12km performance relative to FDR (rating) vs nodes-cores (1-20, 2-40, 4-80, 8-160, 16-320, 32-640) for FDR, 40GigE-RoCE and 40GigE-TCP]
[Figure: MILC (Intel data set) performance relative to FDR (rating) vs nodes-cores (1-20, 2-40, 4-80, 8-160, 16-320, 32-640) for FDR, 40GigE-RoCE and 40GigE-TCP]
MVAPICH2-2.0b, WRF 3.5, MILC 7.6.2
Interconnect Summary
• InfiniBand still performs better than the other network fabrics in this study for HPC workloads
• For some workloads, RoCE performs similarly to InfiniBand and may be a viable alternative
  – We haven't seen wide adoption of RoCE in production yet
  – Mileage will vary based on each application's communication characteristics
  – RoCE needs switches with DCB (Data Center Bridging) support for optimal lossless performance
• Ethernet with TCP/IP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance: K20 vs K40
HPL performance on a single node
Power & energy efficiency of an eight-node cluster
MV2 performance with GPUDirect: OMB Device-to-Device Latency
Intel Sandy Bridge (E5-2670), NVIDIA Tesla K20m GPU, Mellanox ConnectX-3 FDR, CUDA 6.0, OFED 2.2-1.0.0 with GPUDirect RDMA Beta
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter                                    Results and Analysis
Time taken for analyzing 30 samples          19.5 hours
Energy consumption for analyzing 30 samples  222.77 kWh
kWh/genome                                   7.42 kWh/genome
Genomes/day                                  37
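The per-genome rows can be recomputed from the measured ones. The quick check below takes the measurements as 19.5 hours and 222.77 kWh for 30 samples, and assumes one genome per sample and back-to-back processing over a full 24-hour day.

```python
SAMPLES = 30
WALL_HOURS = 19.5    # time to analyze 30 samples
ENERGY_KWH = 222.77  # energy consumed for 30 samples

# Assumes one genome per sample.
kwh_per_genome = ENERGY_KWH / SAMPLES
# Assumes the platform runs back to back for a full 24-hour day.
genomes_per_day = SAMPLES / WALL_HOURS * 24

print(f"{kwh_per_genome:.1f} kWh/genome, {genomes_per_day:.1f} genomes/day")
```

This gives about 7.4 kWh/genome and 36.9 genomes/day, consistent with the reported 7.42 and 37 to within rounding.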
Advantages
• Metrics relevant to the domain instead of GFLOPS
• Energy Efficient
• Plug and Play
• Scalability
• What used to take 2 weeks now takes less than 4 hours
• More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
• Deployment tools
• Use of virtualization and cloud (OpenStack) in HPC
  – Linux Containers and Docker
• Hadoop
• Lustre FS
• Accelerators
Storage Blogs
– HTSS + DX Object Storage
  › http://dell.to/zJqiTK
– Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
  › http://dell.to/GYWU5x
– Dell support for XFS greater than 100 TB
  › http://dell.to/GUjXRq
– NSS-HA 12G Performance Intro
  › http://dell.to/NFUafG
– NSS4.5-HA Solution Configurations
  › http://dell.to/10xLxJV
– Dell Fluid Cache for DAS performance with NFS
  › http://dell.to/15KnsDc
– Achieving over 100,000 IOPS with NFS Async
  › http://dell.to/16yE3bP
– Dell | Terascala HPC Storage Solution Part I
  › http://www.delltechcenter.com/page/Dell+|+Terascala+HPC+Storage+Solution+Part+I
– Dell | Terascala HPC Storage Solution Part 2
  › http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2336.aspx
– DT-HSS3 Performance and Scalability
  › http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2300.aspx
Storage Blogs (Continued)
– Dell | Terascala HPC Storage Solution - HSS5
  › http://dell.to/1gpVVyN
– NSS overview
  › http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2338.aspx
– NSS-HA overview
  › http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2298.aspx
– NSS-HA XL configuration
  › http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2299.aspx
– Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
  › http://dell.to/1eZU0xL
Coprocessor Acceleration Blogs
– GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
  › http://dell.to/ApnLz5
– Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
  › http://dell.to/JsWqWT
– Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
  › http://dell.to/JT79KF
– Accelerating High Performance Linpack (HPL) with GPUs
  › http://dell.to/MrYw8q
– Faster Molecular Dynamics with GPUs
  › http://dell.to/PEaFaF
– Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
  › http://dell.to/14GtFRv
Best Practices Blogs
– 12G HPC Solution with ROCKS+ from StackIQ
  › http://dell.to/xGmSHO
– HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
  › http://dell.to/MMGG4s
– Optimal BIOS settings for HPC workloads
  › http://dell.to/PkkMG1
– CFD Primer
  › http://dell.to/UwJQum
– OpenFOAM
  › http://dell.to/Rga3hS
– PowerEdge M420 with single Force10 MXL Switch
  › http://dell.to/Zjnhjz
– Active Infrastructure for HPC Life Sciences
  › http://dell.to/18eaDSJ
– Dell HPC Solution Refresh: Intel Xeon Ivy Bridge-EP, 1866 DDR3 memory and RHEL 6.4
  › http://dell.to/18U3Aki
Performance Blogs
– HPC I/O performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
  › http://dell.to/wzdV0x
– HPC performance on the 12th Generation (12G) PowerEdge Servers
  › http://dell.to/zozohn
– Unbalanced Memory Configuration Performance
  › http://dell.to/UQ1kQu
– Performance analysis of HPC workloads
  › http://dell.to/STbE8q
Questions
Agenda
bull HPC Focus Areas
bull Performance analysis of HPC Components
ndash Compute
ndash Interconnect
ndash Accelerators
ndash And many more
bull Best Practices
bull Designing better HPC solutions
ndash Domain specific Appliances
HPC at Dell
bull Evaluate new HPC technologies and selectively adopt for Integration
bull Share our findings with the broader HPC community
bull Analyze decision points to obtain the optimal solution to the problem at hand
bull Decision Points include but not limited to
ndash Compute Performance
ndash Memory Performance
ndash Interconnect
ndash Accelerators
ndash Storage
ndash Power Energy Efficiency
ndash Software Stack
ndash Middleware
bull Focus Areas
ndash Define best practices by analyzing each and every component of an HPC cluster
ndash Use these best practices to develop plug and play solutions targeted at specific HPC verticals such as Life sciences Fluid Dynamics High frequency trading etc
Compute Memory amp Energy Efficiency
12G ndash Optimal BIOS Settings
-5
Pe
rf+
19
P
ow
er
Sa
vin
g
-8
Pe
rf+
22
P
ow
er
Sa
vin
g
-6
Pe
rf+
25
P
ow
er
Sa
vin
g
-12
P
erf
+2
4
Po
we
r S
av
ing
-11
P
erf
+1
6
Po
we
r S
av
ing
Sa
me
Pe
rf+
10
Po
we
r S
av
ing
-15
P
erf
+2
1
Po
we
r S
av
ing
-5
Pe
rf+
18
P
ow
er
Sa
vin
g
-7
Pe
rf+
26
P
ow
er
Sa
vin
g
-6
Pe
rf+
26
P
ow
er
Sa
vin
g
-9
Pe
rf+
23
P
ow
er
Sa
vin
g
-13
P
erf
+2
6
Po
we
r S
av
ing
-4
Pe
rf+
20
P
ow
er
Sa
vin
g
-10
P
erf
+2
5
Po
we
r S
av
ing
080
085
090
095
100
105
110
115
120
HPL Fluenttruck_poly_14m
Fluenttruck_111m
WRFconus_12k
NAMDstmv
MILCIntel input file
LUclass D
En
erg
y e
ffic
ien
cy
ga
ins
wit
h T
urb
o d
isa
ble
d(r
ela
tiv
e t
o T
urb
o e
na
ble
d)
DAPC Perf
Balanced configuration Performance focused Energy Efficient
configuration
Latency sensitive
System Profile Performance Per Watt
Optimized (DAPC)
Performance
Optimized
Custom Custom
CPU Power Mgmt System DBPM Max Performance System DBPM Max Performance
Turbo Boost Enabled Enabled Disabled Disabled
C States amp C1E Enabled Disabled Enabled Disabled
Monitor Mwait Enabled Enabled Enabled Disabled
Logical Processor Disabled Disabled Disabled Disabled
Node Interleaving Disabled Disabled Disabled Disabled
Ivy Bridge vs Sandy Bridge Single Node
bull E5-2670 8C 26 Ghz (SB) vs E5-2697 V2 12C 27 GHz (IVB)
46
25
16
3
12
26
37
0
5
10
15
20
25
30
35
40
45
50
HPL ANSYS Fluent LS-DYNA Simulia Abaqus612- S4B
Simulia Abaqus612- E6
LAMMPS MUMPS
Performance Gain with Ivy Bridge (12 core) over Sandy Bridge (8 core)
Decision Processor selection Criteria Performance
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance across four nodes using multiple IVB processors
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogsndash 12G HPC Solution with ROCKS+ from StackIQ
rsaquo httpdelltoxGmSHO
ndash HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
rsaquo httpdelltoMMGG4s
ndash Optimal BIOS settings for HPC workloads
rsaquo httpdelltoPkkMG1
ndash CFD Primer
rsaquo httpdelltoUwJQum
ndash OpenFOAM
rsaquo httpdelltoRga3hS
ndash PowerEdge M420 with single Force10 MXL Switch
rsaquo httpdelltoZjnhjz
ndash Active Infrastructure for HPC Life Sciences
rsaquo httpdellto18eaDSJ
ndash Dell HPC Solution Refresh Intel Xeon Ivy Bridge-EP 1866 DDR3 memory and RHEL 64
rsaquo httpdellto18U3Aki
26
Performance Blogsndash HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltowzdV0x
ndash HPC performance on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltozozohn
ndash Unbalanced Memory Configuration Performance
rsaquo httpdelltoUQ1kQu
ndash Performance analysis of HPC workloads
rsaquo httpdelltoSTbE8q
27
Dell - Restricted - Confidential
Questions
HPC at Dell
bull Evaluate new HPC technologies and selectively adopt for Integration
bull Share our findings with the broader HPC community
bull Analyze decision points to obtain the optimal solution to the problem at hand
bull Decision Points include but not limited to
ndash Compute Performance
ndash Memory Performance
ndash Interconnect
ndash Accelerators
ndash Storage
ndash Power Energy Efficiency
ndash Software Stack
ndash Middleware
bull Focus Areas
ndash Define best practices by analyzing each and every component of an HPC cluster
ndash Use these best practices to develop plug and play solutions targeted at specific HPC verticals such as Life sciences Fluid Dynamics High frequency trading etc
Compute Memory amp Energy Efficiency
12G ndash Optimal BIOS Settings
-5
Pe
rf+
19
P
ow
er
Sa
vin
g
-8
Pe
rf+
22
P
ow
er
Sa
vin
g
-6
Pe
rf+
25
P
ow
er
Sa
vin
g
-12
P
erf
+2
4
Po
we
r S
av
ing
-11
P
erf
+1
6
Po
we
r S
av
ing
Sa
me
Pe
rf+
10
Po
we
r S
av
ing
-15
P
erf
+2
1
Po
we
r S
av
ing
-5
Pe
rf+
18
P
ow
er
Sa
vin
g
-7
Pe
rf+
26
P
ow
er
Sa
vin
g
-6
Pe
rf+
26
P
ow
er
Sa
vin
g
-9
Pe
rf+
23
P
ow
er
Sa
vin
g
-13
P
erf
+2
6
Po
we
r S
av
ing
-4
Pe
rf+
20
P
ow
er
Sa
vin
g
-10
P
erf
+2
5
Po
we
r S
av
ing
080
085
090
095
100
105
110
115
120
HPL Fluenttruck_poly_14m
Fluenttruck_111m
WRFconus_12k
NAMDstmv
MILCIntel input file
LUclass D
En
erg
y e
ffic
ien
cy
ga
ins
wit
h T
urb
o d
isa
ble
d(r
ela
tiv
e t
o T
urb
o e
na
ble
d)
DAPC Perf
Balanced configuration Performance focused Energy Efficient
configuration
Latency sensitive
System Profile Performance Per Watt
Optimized (DAPC)
Performance
Optimized
Custom Custom
CPU Power Mgmt System DBPM Max Performance System DBPM Max Performance
Turbo Boost Enabled Enabled Disabled Disabled
C States amp C1E Enabled Disabled Enabled Disabled
Monitor Mwait Enabled Enabled Enabled Disabled
Logical Processor Disabled Disabled Disabled Disabled
Node Interleaving Disabled Disabled Disabled Disabled
Ivy Bridge vs Sandy Bridge Single Node
bull E5-2670 8C 26 Ghz (SB) vs E5-2697 V2 12C 27 GHz (IVB)
46
25
16
3
12
26
37
0
5
10
15
20
25
30
35
40
45
50
HPL ANSYS Fluent LS-DYNA Simulia Abaqus612- S4B
Simulia Abaqus612- E6
LAMMPS MUMPS
Performance Gain with Ivy Bridge (12 core) over Sandy Bridge (8 core)
Decision Processor selection Criteria Performance
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance across four nodes using multiple IVB processors
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogsndash 12G HPC Solution with ROCKS+ from StackIQ
rsaquo httpdelltoxGmSHO
ndash HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
rsaquo httpdelltoMMGG4s
ndash Optimal BIOS settings for HPC workloads
rsaquo httpdelltoPkkMG1
ndash CFD Primer
rsaquo httpdelltoUwJQum
ndash OpenFOAM
rsaquo httpdelltoRga3hS
ndash PowerEdge M420 with single Force10 MXL Switch
rsaquo httpdelltoZjnhjz
ndash Active Infrastructure for HPC Life Sciences
rsaquo httpdellto18eaDSJ
ndash Dell HPC Solution Refresh Intel Xeon Ivy Bridge-EP 1866 DDR3 memory and RHEL 64
rsaquo httpdellto18U3Aki
26
Performance Blogsndash HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltowzdV0x
ndash HPC performance on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltozozohn
ndash Unbalanced Memory Configuration Performance
rsaquo httpdelltoUQ1kQu
ndash Performance analysis of HPC workloads
rsaquo httpdelltoSTbE8q
27
Dell - Restricted - Confidential
Questions
Compute Memory amp Energy Efficiency
12G ndash Optimal BIOS Settings
-5
Pe
rf+
19
P
ow
er
Sa
vin
g
-8
Pe
rf+
22
P
ow
er
Sa
vin
g
-6
Pe
rf+
25
P
ow
er
Sa
vin
g
-12
P
erf
+2
4
Po
we
r S
av
ing
-11
P
erf
+1
6
Po
we
r S
av
ing
Sa
me
Pe
rf+
10
Po
we
r S
av
ing
-15
P
erf
+2
1
Po
we
r S
av
ing
-5
Pe
rf+
18
P
ow
er
Sa
vin
g
-7
Pe
rf+
26
P
ow
er
Sa
vin
g
-6
Pe
rf+
26
P
ow
er
Sa
vin
g
-9
Pe
rf+
23
P
ow
er
Sa
vin
g
-13
P
erf
+2
6
Po
we
r S
av
ing
-4
Pe
rf+
20
P
ow
er
Sa
vin
g
-10
P
erf
+2
5
Po
we
r S
av
ing
080
085
090
095
100
105
110
115
120
HPL Fluenttruck_poly_14m
Fluenttruck_111m
WRFconus_12k
NAMDstmv
MILCIntel input file
LUclass D
En
erg
y e
ffic
ien
cy
ga
ins
wit
h T
urb
o d
isa
ble
d(r
ela
tiv
e t
o T
urb
o e
na
ble
d)
DAPC Perf
Balanced configuration Performance focused Energy Efficient
configuration
Latency sensitive
System Profile Performance Per Watt
Optimized (DAPC)
Performance
Optimized
Custom Custom
CPU Power Mgmt System DBPM Max Performance System DBPM Max Performance
Turbo Boost Enabled Enabled Disabled Disabled
C States amp C1E Enabled Disabled Enabled Disabled
Monitor Mwait Enabled Enabled Enabled Disabled
Logical Processor Disabled Disabled Disabled Disabled
Node Interleaving Disabled Disabled Disabled Disabled
Ivy Bridge vs Sandy Bridge Single Node
bull E5-2670 8C 26 Ghz (SB) vs E5-2697 V2 12C 27 GHz (IVB)
46
25
16
3
12
26
37
0
5
10
15
20
25
30
35
40
45
50
HPL ANSYS Fluent LS-DYNA Simulia Abaqus612- S4B
Simulia Abaqus612- E6
LAMMPS MUMPS
Performance Gain with Ivy Bridge (12 core) over Sandy Bridge (8 core)
Decision Processor selection Criteria Performance
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance across four nodes using multiple IVB processors
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
• InfiniBand still outperforms the other network fabrics in this study for HPC workloads
• For some workloads RoCE performs similarly to InfiniBand and may be a viable alternative
– We haven't seen wide adoption of RoCE in production yet
– Mileage will vary based on the application's communication characteristics
– Needs switches with DCB support for optimal lossless performance
• Ethernet with TCP/IP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance: K20 vs K40
HPL performance on a single node
Power & energy efficiency of an eight-node cluster
MV2 performance with GPUDirect: OMB device-to-device latency
Intel Sandy Bridge (E5-2670), NVIDIA Tesla K20m GPU, Mellanox ConnectX-3 FDR, CUDA 6.0, OFED 2.2-1.0.0 with GPUDirect RDMA Beta
Chart data labels: 6.2 and 2.6x
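The charts on this slide did not survive extraction, so no measured values are available here. As a hedged illustration of the metrics usually behind a power and energy-efficiency comparison, the sketch below computes performance per watt and energy to solution for two runs. The `Run` class and all input values are hypothetical and are not results from this study.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One benchmark run: sustained performance and measured average power."""
    gflops: float        # sustained performance, e.g. HPL Rmax
    avg_power_w: float   # average wall power over the run, in watts
    runtime_s: float     # wall-clock duration of the run, in seconds

    def gflops_per_watt(self) -> float:
        return self.gflops / self.avg_power_w

    def energy_kwh(self) -> float:
        # watts * seconds = joules; 1 kWh = 3.6e6 J
        return self.avg_power_w * self.runtime_s / 3.6e6

# Hypothetical numbers for illustration only, not measurements.
baseline = Run(gflops=1000.0, avg_power_w=500.0, runtime_s=3600.0)
upgrade = Run(gflops=1200.0, avg_power_w=550.0, runtime_s=3000.0)

for name, run in (("baseline", baseline), ("upgrade", upgrade)):
    print(f"{name}: {run.gflops_per_watt():.2f} GFLOPS/W, {run.energy_kwh():.3f} kWh")
```

Performance per watt and energy to solution can rank configurations differently. A faster run at higher power can still use less energy overall, as the hypothetical `upgrade` run does here.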
Domain-specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter                                      Results and Analysis
Time taken for analyzing 30 samples            19.5 hours
Energy consumption for analyzing 30 samples    222.77 kWh
kWh/genome                                     7.42 kWh/genome
Genomes/day                                    37
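The derived rows of the table follow from the two measured totals. The check below uses only the figures in the table, and the helper names are hypothetical.

```python
# Measured totals from the results table.
SAMPLES = 30
WALL_HOURS = 19.5
ENERGY_KWH = 222.77

def genomes_per_day(samples: int, hours: float) -> float:
    """Throughput if the observed rate were sustained for 24 hours."""
    return samples / hours * 24

def kwh_per_genome(energy_kwh: float, samples: int) -> float:
    """Average energy spent per analyzed sample."""
    return energy_kwh / samples

print(f"Genomes/day:  {genomes_per_day(SAMPLES, WALL_HOURS):.1f}")
print(f"kWh/genome:   {kwh_per_genome(ENERGY_KWH, SAMPLES):.3f}")
print(f"Average draw: {ENERGY_KWH / WALL_HOURS:.1f} kW")
```

The computed energy per genome is about 7.426 kWh. The table's 7.42 presumably reflects truncation. The same totals imply an average draw of roughly 11.4 kW during the run.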
Advantages
• Metrics relevant to the domain instead of GFLOPS
• Energy efficient
• Plug and play
• Scalability
• What used to take 2 weeks now takes less than 4 hours
• More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
• Deployment tools
• Use of virtualization and cloud (OpenStack) in HPC
– Linux Containers and Docker
• Hadoop
• Lustre FS
• Accelerators
Storage Blogs
– HTSS + DX Object Storage
› http://dell.to/zJqiTK
– Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
› http://dell.to/GYWU5x
– Dell support for XFS greater than 100 TB
› http://dell.to/GUjXRq
– NSS-HA 12G Performance Intro
› http://dell.to/NFUafG
– NSS4.5-HA Solution Configurations
› http://dell.to/10xLxJV
– Dell Fluid Cache for DAS performance with NFS
› http://dell.to/15KnsDc
– Achieving over 100,000 IOPs with NFS Async
› http://dell.to/16yE3bP
– Dell | Terascala HPC Storage Solution Part I
› http://www.delltechcenter.com/page/Dell+|+Terascala+HPC+Storage+Solution+Part+I
– Dell | Terascala HPC Storage Solution Part 2
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2336.aspx
– DT-HSS3 Performance and Scalability
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2300.aspx
Storage Blogs (Continued)
– Dell | Terascala HPC Storage Solution - HSS5
› http://dell.to/1gpVVyN
– NSS overview
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2338.aspx
– NSS-HA overview
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2298.aspx
– NSS-HA XL configuration
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2299.aspx
– Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
› http://dell.to/1eZU0xL
Coprocessor Acceleration Blogs
– GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
› http://dell.to/ApnLz5
– Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
› http://dell.to/JsWqWT
– Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
› http://dell.to/JT79KF
– Accelerating High Performance Linpack (HPL) with GPUs
› http://dell.to/MrYw8q
– Faster Molecular Dynamics with GPUs
› http://dell.to/PEaFaF
– Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
› http://dell.to/14GtFRv
Best Practices Blogs
– 12G HPC Solution with ROCKS+ from StackIQ
› http://dell.to/xGmSHO
– HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
› http://dell.to/MMGG4s
– Optimal BIOS settings for HPC workloads
› http://dell.to/PkkMG1
– CFD Primer
› http://dell.to/UwJQum
– OpenFOAM
› http://dell.to/Rga3hS
– PowerEdge M420 with single Force10 MXL Switch
› http://dell.to/Zjnhjz
– Active Infrastructure for HPC Life Sciences
› http://dell.to/18eaDSJ
– Dell HPC Solution Refresh: Intel Xeon Ivy Bridge-EP, 1866 DDR3 memory and RHEL 6.4
› http://dell.to/18U3Aki
Performance Blogs
– HPC I/O performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/wzdV0x
– HPC performance on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/zozohn
– Unbalanced Memory Configuration Performance
› http://dell.to/UQ1kQu
– Performance analysis of HPC workloads
› http://dell.to/STbE8q
Questions
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogsndash 12G HPC Solution with ROCKS+ from StackIQ
rsaquo httpdelltoxGmSHO
ndash HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
rsaquo httpdelltoMMGG4s
ndash Optimal BIOS settings for HPC workloads
rsaquo httpdelltoPkkMG1
ndash CFD Primer
rsaquo httpdelltoUwJQum
ndash OpenFOAM
rsaquo httpdelltoRga3hS
ndash PowerEdge M420 with single Force10 MXL Switch
rsaquo httpdelltoZjnhjz
ndash Active Infrastructure for HPC Life Sciences
rsaquo httpdellto18eaDSJ
ndash Dell HPC Solution Refresh Intel Xeon Ivy Bridge-EP 1866 DDR3 memory and RHEL 64
rsaquo httpdellto18U3Aki
26
Performance Blogsndash HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltowzdV0x
ndash HPC performance on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltozozohn
ndash Unbalanced Memory Configuration Performance
rsaquo httpdelltoUQ1kQu
ndash Performance analysis of HPC workloads
rsaquo httpdelltoSTbE8q
27
Dell - Restricted - Confidential
Questions
Ivy Bridge vs Sandy Bridge Single Node
bull E5-2670 8C 26 Ghz (SB) vs E5-2697 V2 12C 27 GHz (IVB)
46
25
16
3
12
26
37
0
5
10
15
20
25
30
35
40
45
50
HPL ANSYS Fluent LS-DYNA Simulia Abaqus612- S4B
Simulia Abaqus612- E6
LAMMPS MUMPS
Performance Gain with Ivy Bridge (12 core) over Sandy Bridge (8 core)
Decision Processor selection Criteria Performance
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance across four nodes using multiple IVB processors
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogsndash 12G HPC Solution with ROCKS+ from StackIQ
rsaquo httpdelltoxGmSHO
ndash HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
rsaquo httpdelltoMMGG4s
ndash Optimal BIOS settings for HPC workloads
rsaquo httpdelltoPkkMG1
ndash CFD Primer
rsaquo httpdelltoUwJQum
ndash OpenFOAM
rsaquo httpdelltoRga3hS
ndash PowerEdge M420 with single Force10 MXL Switch
rsaquo httpdelltoZjnhjz
ndash Active Infrastructure for HPC Life Sciences
rsaquo httpdellto18eaDSJ
ndash Dell HPC Solution Refresh Intel Xeon Ivy Bridge-EP 1866 DDR3 memory and RHEL 64
rsaquo httpdellto18U3Aki
26
Performance Blogsndash HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltowzdV0x
ndash HPC performance on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltozozohn
ndash Unbalanced Memory Configuration Performance
rsaquo httpdelltoUQ1kQu
ndash Performance analysis of HPC workloads
rsaquo httpdelltoSTbE8q
27
Dell - Restricted - Confidential
Questions
Decision Processor selection Criteria Performance
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance across four nodes using multiple IVB processors
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogsndash 12G HPC Solution with ROCKS+ from StackIQ
rsaquo httpdelltoxGmSHO
ndash HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
rsaquo httpdelltoMMGG4s
ndash Optimal BIOS settings for HPC workloads
rsaquo httpdelltoPkkMG1
ndash CFD Primer
rsaquo httpdelltoUwJQum
ndash OpenFOAM
rsaquo httpdelltoRga3hS
ndash PowerEdge M420 with single Force10 MXL Switch
rsaquo httpdelltoZjnhjz
ndash Active Infrastructure for HPC Life Sciences
rsaquo httpdellto18eaDSJ
ndash Dell HPC Solution Refresh Intel Xeon Ivy Bridge-EP 1866 DDR3 memory and RHEL 64
rsaquo httpdellto18U3Aki
26
Performance Blogsndash HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltowzdV0x
ndash HPC performance on the 12th Generation (12G) PowerEdge Servers
rsaquo httpdelltozozohn
ndash Unbalanced Memory Configuration Performance
rsaquo httpdelltoUQ1kQu
ndash Performance analysis of HPC workloads
rsaquo httpdelltoSTbE8q
27
Dell - Restricted - Confidential
Questions
Decision Processor selection Criteria Power
bull 2 x E5-2697-v2 27 GHz 12c 130W does the best in most cases
bull All tests done on fully subscribed 4 servers with FDR interconnect
Energy efficiency across four nodes using multiple IVB processors
Decision Memory selection Criteria Performance
bull Dual Rank memory modules give best performance
bull All tests done on fully subscribed 4 servers with FDR interconnect
Performance drop when using single rank memory modules on 4 nodes
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
bull How do benchmarks synthetic kernels and micro benchmarks behave at scale
bull Can micro benchmark performance explain applicationrsquos performance at a larger scale
137
163
12
125
13
135
14
145
15
155
16
165
FDR 40GigE RoCE
Late
ncy
(u
s)
MPI OSU latency
626273
49023
0
1000
2000
3000
4000
5000
6000
7000
FDR 40GigE RoCE
Ban
dw
idth
(M
Bs
)
MPI OSU Bandwidth
MVAPICH2-20b and OMB v42
RoCE vs IB vs TCP
1 1 1 1 1 1
10
0
10
1
10
2
10
2
10
0
10
1
10
0
10
1
10
1
09
5
09
4
09
3
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
GFl
op
s)
Nodes - Cores
HPL
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b
10
0
10
0
10
0
10
0
10
0
10
0
09
9
10
0
10
1
09
9
10
0
09
8
09
9
09
7
08
8
07
5
06
3
02
9
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
NPB LU Class D
FDR 40GigE-ROCE 40GigE-TCP
RoCE vs IB vs TCP
1 1 1 1 1 1
09
9
10
0
10
2
09
4
08
4
08
3
10
0
10
0
09
4
07
2
05
1
03
8
0
02
04
06
08
1
12
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
WRF Conus 12Km
FDR 40GigE-ROCE 40GigE-TCP
10
0
10
0
10
0
10
0
10
0
10
0
10
1
09
8
08
7
05
9
07
5
05
7
10
1
09
8
04
9
02
9
03
0 04
0
000
020
040
060
080
100
120
1 - 20 2 - 40 4 - 80 8 - 160 16 - 320 32 - 640
Per
form
ance
Rel
ativ
e to
FD
R (
Rat
ing)
Nodes - Cores
MILC Intel Data set
FDR 40GigE-ROCE 40GigE-TCP
MVAPICH2-20b WRF 35 MILC 762
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogs
– GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
› http://dell.to/ApnLz5
– Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
› http://dell.to/JsWqWT
– Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
› http://dell.to/JT79KF
– Accelerating High Performance Linpack (HPL) with GPUs
› http://dell.to/MrYw8q
– Faster Molecular Dynamics with GPUs
› http://dell.to/PEaFaF
– Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
› http://dell.to/14GtFRv
Best Practices Blogs
– 12G HPC Solution with ROCKS+ from StackIQ
› http://dell.to/xGmSHO
– HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
› http://dell.to/MMGG4s
– Optimal BIOS settings for HPC workloads
› http://dell.to/PkkMG1
– CFD Primer
› http://dell.to/UwJQum
– OpenFOAM
› http://dell.to/Rga3hS
– PowerEdge M420 with single Force10 MXL Switch
› http://dell.to/Zjnhjz
– Active Infrastructure for HPC Life Sciences
› http://dell.to/18eaDSJ
– Dell HPC Solution Refresh: Intel Xeon Ivy Bridge-EP, 1866 DDR3 memory and RHEL 6.4
› http://dell.to/18U3Aki
Performance Blogs
– HPC IO performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/wzdV0x
– HPC performance on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/zozohn
– Unbalanced Memory Configuration Performance
› http://dell.to/UQ1kQu
– Performance analysis of HPC workloads
› http://dell.to/STbE8q
Dell - Restricted - Confidential
Questions
Decision: Memory Selection Criteria (Performance)
• Dual-rank memory modules give the best performance
• All tests were run on 4 fully subscribed servers with an FDR interconnect
Figure: Performance drop when using single-rank memory modules on 4 nodes
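Since the rank of the installed DIMMs matters, it helps to check it before benchmarking. A minimal sketch, assuming the text output of `dmidecode -t memory` (run as root) whose "Memory Device" records carry a `Rank:` field when the firmware reports it. The `SAMPLE` text below is a hypothetical, abbreviated record set, not output captured from the systems tested here, and `dimm_ranks` is an illustrative helper name.

```python
import re
from collections import Counter

# Hypothetical, abbreviated `dmidecode -t memory` output used for illustration only.
SAMPLE = """Handle 0x1100, DMI type 17, 34 bytes
Memory Device
\tSize: 8192 MB
\tLocator: A1
\tRank: 2

Handle 0x1101, DMI type 17, 34 bytes
Memory Device
\tSize: No Module Installed
\tLocator: A2
\tRank: Unknown

Handle 0x1102, DMI type 17, 34 bytes
Memory Device
\tSize: 8192 MB
\tLocator: B1
\tRank: 1
"""


def dimm_ranks(dmidecode_text: str) -> Counter:
    """Count populated DIMM slots by reported rank (e.g. '1' = single, '2' = dual)."""
    ranks = Counter()
    # dmidecode separates records with a blank line.
    for record in dmidecode_text.split("\n\n"):
        if "Memory Device" not in record:
            continue
        size = re.search(r"^\s*Size:\s*(.+)$", record, re.MULTILINE)
        rank = re.search(r"^\s*Rank:\s*(\S+)$", record, re.MULTILINE)
        # Skip empty slots and records whose firmware omits the Rank field.
        if size is None or "No Module" in size.group(1) or rank is None:
            continue
        ranks[rank.group(1)] += 1
    return ranks


if __name__ == "__main__":
    counts = dimm_ranks(SAMPLE)
    if counts.get("1"):
        print(f"{counts['1']} single-rank DIMM(s) installed")
```

On a live node, the same function could be fed the stdout of `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True)`; a non-zero single-rank count flags a configuration the slide suggests may underperform.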
Interconnect Performance
OSU Latency and Bandwidth (FDR vs 40 GigE RoCE)
• How do benchmarks, synthetic kernels and micro-benchmarks behave at scale?
• Can micro-benchmark performance explain an application's performance at a larger scale?
Figure: MPI OSU latency (us) and MPI OSU bandwidth (MB/s), FDR vs 40GigE RoCE
– Latency: FDR 1.37 us; 40GigE RoCE 1.63 us
– Bandwidth: FDR 6262.73 MB/s; 40GigE RoCE 4902.3 MB/s
(MVAPICH2-2.0b and OMB v4.2)
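One way to relate the two micro-benchmark numbers to application behavior is a simple latency-bandwidth (Hockney-style) model, t(n) = latency + n / bandwidth. The sketch below is an assumption-laden approximation: it treats the measured OSU latency and peak bandwidth as the model's constants, assumes OMB's MB = 10^6 bytes (so MB/s is numerically bytes per microsecond), and ignores protocol switches such as eager vs. rendezvous. The function names are illustrative.

```python
def transfer_time_us(msg_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Estimated one-way time for a message: t(n) = alpha + n / beta.

    With MB = 10**6 bytes, a bandwidth in MB/s equals bytes per microsecond,
    so msg_bytes / bandwidth_mb_s is already in microseconds.
    """
    return latency_us + msg_bytes / bandwidth_mb_s


def half_bandwidth_size(latency_us: float, bandwidth_mb_s: float) -> float:
    """Message size (bytes) where the latency and bandwidth terms are equal.

    At this size the model delivers half of its peak bandwidth.
    """
    return latency_us * bandwidth_mb_s


# (latency in us, bandwidth in MB/s), read from the OSU latency/bandwidth chart labels.
FABRICS = {
    "FDR": (1.37, 6262.73),
    "40GigE RoCE": (1.63, 4902.3),
}

if __name__ == "__main__":
    for name, (lat, bw) in FABRICS.items():
        n_half = half_bandwidth_size(lat, bw)
        t_1mib = transfer_time_us(1 << 20, lat, bw)
        print(f"{name:12s} n_1/2 ~ {n_half:6.0f} B, 1 MiB ~ {t_1mib:6.1f} us")
```

Under this model both fabrics are latency-bound below roughly 8 KB (about 8580 B for FDR, 7991 B for RoCE), so the small latency gap dominates small-message exchanges while the bandwidth gap dominates large ones.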
RoCE vs IB vs TCP
Figure: HPL, performance relative to FDR (GFlops)
Nodes - Cores  1 - 20  2 - 40  4 - 80  8 - 160  16 - 320  32 - 640
FDR            1.00    1.00    1.00    1.00     1.00      1.00
40GigE-RoCE    1.00    1.01    1.02    1.02     1.00      1.01
40GigE-TCP     1.00    1.01    1.01    0.95     0.94      0.93
(MVAPICH2-2.0b)
Figure: NPB LU Class D, performance relative to FDR (Rating)
Nodes - Cores  1 - 20  2 - 40  4 - 80  8 - 160  16 - 320  32 - 640
FDR            1.00    1.00    1.00    1.00     1.00      1.00
40GigE-RoCE    0.99    1.00    1.01    0.99     1.00      0.98
40GigE-TCP     0.99    0.97    0.88    0.75     0.63      0.29
RoCE vs IB vs TCP
Figure: WRF Conus 12Km, performance relative to FDR (Rating)
Nodes - Cores  1 - 20  2 - 40  4 - 80  8 - 160  16 - 320  32 - 640
FDR            1.00    1.00    1.00    1.00     1.00      1.00
40GigE-RoCE    0.99    1.00    1.02    0.94     0.84      0.83
40GigE-TCP     1.00    1.00    0.94    0.72     0.51      0.38
Figure: MILC Intel data set, performance relative to FDR (Rating)
Nodes - Cores  1 - 20  2 - 40  4 - 80  8 - 160  16 - 320  32 - 640
FDR            1.00    1.00    1.00    1.00     1.00      1.00
40GigE-RoCE    1.01    0.98    0.87    0.59     0.75      0.57
40GigE-TCP     1.01    0.98    0.49    0.29     0.30      0.40
(MVAPICH2-2.0b, WRF 3.5, MILC 7.6.2)
Interconnect Summary
• InfiniBand still outperforms the other network fabrics in this study for HPC workloads
• For some workloads, RoCE performs similarly to InfiniBand and may be a viable alternative
– Haven't seen wide adoption of RoCE in production yet
– Mileage will vary based on the application's communication characteristics
– Needs switches with DCB (Data Center Bridging) support for optimal lossless performance
• Ethernet with TCP/IP stops scaling after 4-8 nodes
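The "stops scaling after 4-8 nodes" observation can be checked mechanically against the chart data. A minimal sketch, assuming the 40GigE-TCP values transcribed from the NPB LU and WRF chart data labels, and using a 90%-of-FDR cutoff that is an arbitrary choice for illustration, not a threshold from the study.

```python
# Node counts on the x-axis of the interconnect charts (20 cores per node).
NODES = [1, 2, 4, 8, 16, 32]

# 40GigE-TCP performance relative to FDR, transcribed from the chart data labels.
TCP_RELATIVE = {
    "NPB LU Class D": [0.99, 0.97, 0.88, 0.75, 0.63, 0.29],
    "WRF Conus 12Km": [1.00, 1.00, 0.94, 0.72, 0.51, 0.38],
}


def first_node_count_below(relative, threshold=0.9, nodes=NODES):
    """Return the smallest node count whose relative performance is below threshold.

    Returns None if the series never drops below the threshold.
    """
    for n, r in zip(nodes, relative):
        if r < threshold:
            return n
    return None


if __name__ == "__main__":
    for app, series in TCP_RELATIVE.items():
        print(f"{app}: below 90% of FDR from {first_node_count_below(series)} nodes")
```

With this cutoff, LU falls below 90% of FDR at 4 nodes and WRF at 8 nodes, consistent with the 4-8 node range in the summary; a different threshold would shift the crossover point.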
Accelerator Performance
Power and Performance: K20 vs K40
Figure: HPL performance on a single node
Figure: Power & energy efficiency of an eight-node cluster
MV2 performance with GPUDirect: OMB device-to-device latency
Intel Sandy Bridge (E5-2670), NVIDIA Tesla K20m GPU, Mellanox ConnectX-3 FDR, CUDA 6.0, OFED 2.2-1.0.0 with GPUDirect RDMA Beta
62
26x
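Power and energy-efficiency comparisons such as the K20 vs K40 cluster panel usually rest on two derived metrics: sustained performance per watt and energy to solution. A minimal sketch of both, assuming average power is measured at the wall over the whole run; the function names and the numbers in the usage note are hypothetical, not measurements from these slides.

```python
def gflops_per_watt(gflops: float, avg_power_w: float) -> float:
    """Energy efficiency: sustained GFLOPS divided by average power draw (W)."""
    return gflops / avg_power_w


def energy_to_solution_kwh(avg_power_w: float, runtime_s: float) -> float:
    """Energy used by a run: average power (W) x runtime (s), converted to kWh."""
    return avg_power_w * runtime_s / 3.6e6


def relative_efficiency(gflops_a: float, power_a_w: float,
                        gflops_b: float, power_b_w: float) -> float:
    """GFLOPS/W of configuration A divided by that of B (> 1 means A is more efficient)."""
    return gflops_per_watt(gflops_a, power_a_w) / gflops_per_watt(gflops_b, power_b_w)
```

For example, a hypothetical configuration sustaining 1200 GFLOPS at 500 W is 1.2x as efficient as one sustaining 1000 GFLOPS at the same power. Note that a faster accelerator can raise GFLOPS/W even when its peak power is higher, because energy to solution depends on runtime as well as power.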
Domain-specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter                                      Results and Analysis
Time taken for analyzing 30 samples            19.5 hours
Energy consumption for analyzing 30 samples    222.77 kWh
kWh/genome                                     7.42 kWh/genome
Genomes/day                                    37
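The last two rows of the table are derived from the first two. A minimal sketch of that arithmetic, taking the run time as 19.5 hours and the energy as 222.77 kWh for 30 samples, and assuming the throughput is simply extrapolated linearly to a 24-hour day.

```python
SAMPLES = 30          # genomes analyzed in the run
RUNTIME_H = 19.5      # wall-clock time for all samples, hours
ENERGY_KWH = 222.77   # energy consumed for all samples, kWh


def genomes_per_day(samples: int, runtime_h: float) -> float:
    """Throughput: genomes completed per 24 hours at the measured rate."""
    return samples / runtime_h * 24


def kwh_per_genome(energy_kwh: float, samples: int) -> float:
    """Average energy cost per analyzed genome."""
    return energy_kwh / samples


if __name__ == "__main__":
    print(f"genomes/day ~ {genomes_per_day(SAMPLES, RUNTIME_H):.1f}")
    print(f"kWh/genome  ~ {kwh_per_genome(ENERGY_KWH, SAMPLES):.3f}")
```

This gives about 36.9 genomes/day, which rounds to the reported 37, and about 7.426 kWh/genome, matching the reported 7.42 to within rounding.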
Advantages
• Metrics relevant to the domain instead of GFLOPS
• Energy efficient
• Plug and play
• Scalability
• What used to take 2 weeks now takes less than 4 hours
• More to follow
Interconnect Summary
bull InfiniBand is still performs higher than other network fabrics in this study for HPC workloads
bull For some workloads RoCE performs similar to InfiniBand and may be a viable alternative
ndash Havenrsquot seen wide adoption of RoCE in production yet
ndash Mileage will vary based on applicationrsquos communication characteristics
ndash Needs switches with DCB support for optimal lossless performance
bull Ethernet with TCPIP stops scaling after 4-8 nodes
Accelerator Performance
Power and Performance K20 vs K40
HPL performance on single-node
Power amp energy efficiency of an eight node cluster
MV2 performance with GPU Direct OMBDevice to Device Latency
Intel SandyBridge (E5-2670) NVIDIA Telsa K20m GPU Mellanox ConnectX-3 FDR CUDA 60OFED 22-100 with GPU Direct RDMA Beta
62
26x
Domain specific solutions
Dell Genomic Analysis Platform
Dell Genomic Analysis Platform (Continued)
Parameter Results and Analysis
Time taken for analyzing 30 samples 195 Hours
Energy Consumption for analyzing 30
samples
22277 kWh
kWhGenome 742 kWh Genome
Genomesday 37
Advantages
bull Metrics relevant to the domain instead of GFLOPs
bull Energy Efficient
bull Plug and Play
bull Scalability
bull What used to take 2 weeks now takes less than 4 hours
bull More to follow
Dell - Restricted - Confidential
Collateral
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogs
– HTSS + DX Object Storage
› http://dell.to/zJqiTK
– Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
› http://dell.to/GYWU5x
– Dell support for XFS greater than 100 TB
› http://dell.to/GUjXRq
– NSS-HA 12G Performance Intro
› http://dell.to/NFUafG
– NSS4.5-HA Solution Configurations
› http://dell.to/10xLxJV
– Dell Fluid Cache for DAS performance with NFS
› http://dell.to/15KnsDc
– Achieving over 100,000 IOPs with NFS Async
› http://dell.to/16yE3bP
– Dell | Terascala HPC Storage Solution Part I
› http://www.delltechcenter.com/page/Dell+|+Terascala+HPC+Storage+Solution+Part+I
– Dell | Terascala HPC Storage Solution Part 2
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2336.aspx
– DT-HSS3 Performance and Scalability
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2300.aspx
Storage Blogs Continued
– Dell | Terascala HPC Storage Solution - HSS5
› http://dell.to/1gpVVyN
– NSS overview
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2338.aspx
– NSS-HA overview
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2298.aspx
– NSS-HA XL configuration
› http://en.community.dell.com/techcenter/high-performance-computing/w/wiki/2299.aspx
– Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
› http://dell.to/1eZU0xL
Coprocessor Acceleration Blogs
– GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
› http://dell.to/ApnLz5
– Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
› http://dell.to/JsWqWT
– Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
› http://dell.to/JT79KF
– Accelerating High Performance Linpack (HPL) with GPUs
› http://dell.to/MrYw8q
– Faster Molecular Dynamics with GPUs
› http://dell.to/PEaFaF
– Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
› http://dell.to/14GtFRv
Best Practices Blogs
– 12G HPC Solution with ROCKS+ from StackIQ
› http://dell.to/xGmSHO
– HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
› http://dell.to/MMGG4s
– Optimal BIOS settings for HPC workloads
› http://dell.to/PkkMG1
– CFD Primer
› http://dell.to/UwJQum
– OpenFOAM
› http://dell.to/Rga3hS
– PowerEdge M420 with single Force10 MXL Switch
› http://dell.to/Zjnhjz
– Active Infrastructure for HPC Life Sciences
› http://dell.to/18eaDSJ
Future Work and Potential Areas of Research
bull Deployment tools
bull Use of virtualization and cloud (Openstack) in HPC
ndash Linux Containers and Docker
bull Hadoop
bull Lustre FS
bull Accelerators
Storage Blogsndash HTSS + DX Object Storage
rsaquo httpdelltozJqiTK
ndash Dell HPC NFS Storage Solution with High Availability -- Large Capacity Configuration
rsaquo httpdelltoGYWU5x
ndash Dell support for XFS greater than 100 TB
rsaquo httpdelltoGUjXRq
ndash NSS-HA 12G Performance Intro
rsaquo httpdelltoNFUafG
ndash NSS45-HA Solution Configurations
rsaquo httpdellto10xLxJV
ndash Dell Fluid Cache for DAS performance with NFS
rsaquo httpdellto15KnsDc
ndash Achieving over 100000 IOPs with NFS Async
rsaquo httpdellto16yE3bP
ndash Dell | Terascala HPC Storage Solution Part I
rsaquo httpwwwdelltechcentercompageDell+|+Terascala+HPC+Storage+Solution+Part+I
ndash Dell | Terascala HPC Storage Solution Part 2
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2336aspx
ndash DT-HSS3 Performance and Scalability
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2300aspx
23
Storage Blogs Continuedndash Dell | Terascala HPC Storage Solution - HSS5
rsaquo httpdellto1gpVVyN
ndash NSS overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2338aspx
ndash NSS-HA overview
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2298aspx
ndash NSS-HA XL configuration
rsaquo httpencommunitydellcomtechcenterhigh-performance-computingwwiki2299aspx
ndash Dell HPC NFS Storage Solution - High Availability Solution NSS5-HA configurations
rsaquo httpdellto1eZU0xL
24
Coprocessor Acceleration Blogsndash GPUDirect Improves Communication Bandwidth Between GPUs on the C410X
rsaquo httpdelltoApnLz5
ndash Comparing GPU-Direct Enabled Communication Patterns for Oil and Gas Simulations
rsaquo httpdelltoJsWqWT
ndash Accelerating ANSYS Mechanical Simulations with M2090 GPU on the R720
rsaquo httpdelltoJT79KF
ndash Accelerating High Performance Linpack (HPL) with GPUs
rsaquo httpdelltoMrYw8q
ndash Faster Molecular Dynamics with GPUs
rsaquo httpdelltoPEaFaF
ndash Deploying and Configuring Intel Xeon Phi Coprocessor with HPC Solution
rsaquo httpdellto14GtFRv
25
Best Practices Blogs
– 12G HPC Solution with ROCKS+ from StackIQ
› http://dell.to/xGmSHO
– HPC mode on Dell PowerEdge R815 with AMD 6200 Processors
› http://dell.to/MMGG4s
– Optimal BIOS settings for HPC workloads
› http://dell.to/PkkMG1
– CFD Primer
› http://dell.to/UwJQum
– OpenFOAM
› http://dell.to/Rga3hS
– PowerEdge M420 with single Force10 MXL Switch
› http://dell.to/Zjnhjz
– Active Infrastructure for HPC Life Sciences
› http://dell.to/18eaDSJ
– Dell HPC Solution Refresh: Intel Xeon Ivy Bridge-EP, 1866 DDR3 memory and RHEL 6.4
› http://dell.to/18U3Aki
Performance Blogs
– HPC I/O performance using PCI-E Gen3 slots on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/wzdV0x
– HPC performance on the 12th Generation (12G) PowerEdge Servers
› http://dell.to/zozohn
– Unbalanced Memory Configuration Performance
› http://dell.to/UQ1kQu
– Performance analysis of HPC workloads
› http://dell.to/STbE8q
Dell - Restricted - Confidential
Questions