Attention Developers:
The Top Six Advantages of CUDA-Ready Clusters and Clouds
Ian Lumb
Bright Evangelist
CUDA-Ready Clusters and Clouds
1. You focus on coding, not infrastructure
   • You view infrastructure as your API
2. entire CUDA environment
3. You cross-develop with confidence and ease
   • You make use of different versions of the CUDA Toolkit
4. You choose CUDA or OpenCL or OpenACC
5. or Big Data
   • You access Hadoop services alongside HPC
6. You make use of public and private clouds
   • You extend into AWS and deploy OpenStack
CUDA-ready clusters and clouds are GPU developer-ready
Cluster Health Management
Provide a problem-free environment for running jobs
Four elements
1. Cluster management automation
2. Regular health checks
3. Pre-job health checks
4. Hardware stability & performance tests
All elements above are configurable and extensible
Bright Cluster Manager CUDA Environment
[Architecture diagram; labels recovered from the slide]
Interfaces: Cluster Management Shell, Cluster Management GUI, User Portal
SSL / SOAP / X509 / IPtables
Cluster Management Daemon
Hardware: Disk, Ethernet, Interconnect, IPMI / iLO, PDU, CPU, GPUs, Memory
Workload managers: Slurm, PBS Pro, Torque/Maui, Torque/MOAB, Grid Engine, LSF
Monitoring, Automation, Health Checks, Management
Compilers, Libraries, Debuggers, Profilers
Provisioning
SLES / RHEL / CentOS / SL
Innovation characterizes the entire history and evolution of GPU programmability through CUDA
• People
  – Proactively maintaining business and technical relationships
• Process
`Hands-
  – Preliminary to fully productized implementations
• Product
  – Bright Cluster Manager released once per year
  – Updates flow continuously
Available Versions of the CUDA Toolkit
Using CUDA 6.0
Programming GPUs
CUDA
OpenCL
OpenACC
MPI
Tools
• CUDA-GDB
• nvidia-smi
• CUDA Utility Library
• Examples
• 3rd Party
  – Allinea
  – Rogue Wave
Case Study: TUAT (1)
The Customer
• Engages in materials-science research
  – Compares computational models with physical experiments
• High-resolution, 3D phase field modeling at large scales using GPUs
The Challenge
• Make available the latest innovations in GPU technology without distracting focus from research
Case Study: TUAT (2)
The Solution
• Laboratory GPU cluster designed and implemented by HPCTech Corp.
• Bright Cluster Manager deployed by HPCTech
  – Use Bright to fully manage the entire CUDA environment, including regular updates
  – Use modules environment via Bright to manage multiple CUDA environments
• Prototype simulations using laboratory HPC cluster
  – Includes debugging and tuning code
• Execute large-scale simulations using TSUBAME
Large-Scale Grain Growth Simulation
• Number of computational grids: 1024 x 1024 x 1024
• 3 hours with 128 GPUs
Simulation conditions
• Number of grains: 32768
• Size of domain: 512 mm³
• Time: 8182 s (16000 steps)
[Figure: simulation result; color scale labeled "Grain Number"]
Yamanaka Lab: http://www.tuat.ac.jp/~yamanaka/
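To put the simulation figures above in perspective, the short Python sketch below works out how many grid points each GPU would handle and the simulated time per step. It is only back-of-the-envelope arithmetic: the function names are hypothetical, and it assumes the grid is divided evenly across the 128 GPUs and that all 16000 steps are the same size, neither of which the slide states.

```python
def points_per_gpu(nx: int, ny: int, nz: int, gpus: int) -> int:
    """Grid points per GPU, assuming an even split (an assumption, not from the slide)."""
    total = nx * ny * nz
    if total % gpus != 0:
        raise ValueError("grid does not divide evenly across the GPUs")
    return total // gpus


def simulated_seconds_per_step(total_seconds: float, steps: int) -> float:
    """Simulated time covered by one step, assuming uniform steps."""
    return total_seconds / steps


total_points = 1024 ** 3
per_gpu = points_per_gpu(1024, 1024, 1024, 128)
dt = simulated_seconds_per_step(8182, 16000)

print(total_points)  # 1073741824
print(per_gpu)       # 8388608
print(dt)            # 0.511375
```

Roughly a billion grid points in total, or about 8.4 million per GPU under the even-split assumption.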
Case Study: TUAT (3)
“We scientists are time-constrained,” said Dr. Yamanaka. “Our priority is our research, not managing our clusters. Bright is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts, or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits.”
Booth # 34
Additional Slides
NVIDIA GPU Boost via Bright Cluster Manager
Cluster Health Management
Goal: provide a problem-free environment for running jobs
Four elements
1. Cluster management automation
2. Regular health checks
   • Actions that return PASS, FAIL or UNKNOWN
   • Can be associated with a settable severity and a message
   • Can launch an action based on any response value
3. Pre-job health checks
   • Let the workload manager hold the job very briefly
   • Check the health of each reserved node
   • If unhealthy, take the node offline, inform the system administrator
   • Let the workload manager reschedule the job to a different set of nodes
4. Hardware stability & performance tests
   • Very wide range of tests
   • May include disk overwrites and reboot(s)
All elements above are configurable and extensible
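The health-check flow described above (checks that return PASS, FAIL or UNKNOWN, carry a severity and a message, and gate a job on the health of each reserved node) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bright Cluster Manager's API: `Result`, `HealthCheck`, `prejob_check` and the `gpu_visible` probe are hypothetical names, the severity scale is invented, and treating UNKNOWN as unhealthy is a choice made only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class HealthCheck:
    name: str
    probe: Callable[[str], Result]  # takes a node name, returns a Result
    severity: int = 0               # hypothetical severity scale
    message: str = ""               # text reported when the check does not pass


def prejob_check(
    nodes: List[str], checks: List[HealthCheck]
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split reserved nodes into healthy nodes and nodes to take offline.

    Assumption for this sketch: any result other than PASS marks a node unhealthy.
    """
    healthy, offline = [], []
    for node in nodes:
        failed = [c for c in checks if c.probe(node) is not Result.PASS]
        if failed:
            # A real system would also notify the administrator here.
            offline.append((node, [c.message or c.name for c in failed]))
        else:
            healthy.append(node)
    return healthy, offline


def gpu_visible(node: str) -> Result:
    # Stand-in probe: pretend node002 has lost its GPU.
    return Result.FAIL if node == "node002" else Result.PASS


checks = [HealthCheck("gpu_visible", gpu_visible, severity=10, message="GPU not visible")]
healthy, offline = prejob_check(["node001", "node002", "node003"], checks)
print(healthy)  # ['node001', 'node003']
print(offline)  # [('node002', ['GPU not visible'])]
```

In this sketch the workload manager would reschedule the job onto the `healthy` list, or onto other nodes, while the `offline` list goes to the administrator.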
Bright API
HPC and Hadoop
Use GPUs for HPC and Big Data Analytics
Introduce GPUs into Hadoop clusters
Make use of Hadoop services
GPUs in the Cloud? The Top Four Reasons
1. You can realize possibilities using the cloud
   • You can scale up and scale out
2. You still realize the promise of GPU programmability
3. Your use of the cloud is transparent
   • Constraints apply for MPI apps
4. Your go-to apps still work in the cloud
Cloud Utilization: Scenario I
[Diagram: head node, node001, node002, node003]
Cloud Utilization: Scenario II
[Diagram: head node, node001, node002, node003, node004, node005, node006, node007]
Case Study: Oil and Gas Exploration (1)
The Customer
• Acquires and processes significant volumes of seismic data for multinational clients
• Refactoring existing algorithms to make use of GPUs
  – Want to take advantage of the latest innovations
  – Decrease time to results through increased performance
The Challenge
• Introduce GPU-based enhancements without disrupting
Case Study: Oil and Gas Exploration (2)
The Solution
• Wholeheartedly adopting GPU technology
  – Latest GPUs in a variety of hardware configurations, including ultradense GPU units
  – Embracing latest innovations in the CUDA toolkit
• Deployed Bright Cluster Manager
  – Use Bright to fully manage the entire CUDA environment, from NVIDIA Tesla K40 GPU accelerators to the CUDA toolkit
  – Use modules environment via Bright to manage multiple CUDA environments for R&D and production processing
change – Includes in-house seismic processing applications (e.g., RTM)
The Results
• Realizing > 10X performance gains in certain cases
• GPU technology transforming data-processing business