TRANSCRIPT
Singapore Students to Compete in SC14
by Team NUS
7 Oct 2014
Contents
• SC14 Student Cluster Competition
• Team Formation
• Training
• Exciting Hardware
• Advanced Software
• Challenges
• Knowledge and Experience
SC14 Student Cluster Competition
• 17 Nov – 19 Nov 2014, New Orleans, Louisiana, USA
• 48 hours
• 12 university teams
• 3120 W power limit
• 4 parallel applications
• Fastest system wins
The Students
6 undergraduate students from National University of Singapore
• Chen Liang (Yr 4 Comp Engineering)
• David Heryanto (Yr 4 Comp Science)
• Ho Wei Xiong (Yr 5 Math + Comp Science double degree)
• Liu Jin Frank (Yr 4 Comp Science)
• Li Yin (Yr 4 Comp Science)
• Yu Fangzhou (Yr 4 Comp Science)
The Mentors
• Prof. Deng Yuefan (NUS Visiting Professor)
• Kevin Siswandi (A*CRC)
• Jonathan Low (A*CRC)
• Special Thanks:
• Dr. Marek Michalewicz (A*CRC)
• Dr. Tan Tin Wee (A*CRC - NUS)
The Sponsors
• A*CRC (hardware + logistics)
• Intel (Servers and CPUs)
• IBM (Power8)
• NVIDIA (GPUs)
• Samsung (Memory)
Training Intensity Timeline
[Chart: weekly training hours from April to September, y-axis 0–15 hours per week]
Training Scope
• Hardware Architecture Design
  • Component spec analysis
  • Power efficiency analysis
• Application Learning and Testing
  • Understanding the scientific theory behind the competition applications
  • Application fine-tuning
  • Run-time and speed-up comparison
Training Scope
• System Setup
  • Operating system
  • Internet sharing
  • Remote access
  • Password-free SSH
  • File system sharing
  • Networking through TCP and InfiniBand
  • MPI (see the sanity-check sketch after this list)
  • CPU frequency tuning
  • Job scheduling
  • Hardware management system
  • System monitoring tools
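As one concrete example of the setup checks above: once password-free SSH and MPI are in place, a minimal "hello world" run across nodes confirms the whole chain works. This is a sketch assuming mpi4py is installed; it is not the team's actual test script.

# Minimal MPI sanity check. Run across nodes, e.g.:
#   mpirun -np 4 -hostfile hosts python hello_mpi.py
# A clean run confirms password-free SSH, the interconnect, and MPI itself.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's ID
size = comm.Get_size()           # total number of processes
name = MPI.Get_processor_name()  # hostname, to verify node placement

print(f"Rank {rank}/{size} running on {name}")

# A tiny collective: sum the ranks across all processes.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print(f"Sum of ranks = {total} (expected {size * (size - 1) // 2})")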
A glimpse of supercomputer speed
[Figure 1: 4-core notebook running a parallel program. Figure 2: 28-core HPC server running a parallel program.]
A glimpse of supercomputer speed
[Figure 1: ADCIRC sample input time (s) against cores, tested by Team NUS on an SGI server. Figure 2: NAMD test performed by Team NUS on an SGI server.]
Working with today’s best technology
Figure 1: IBM Power S822L server (available since June 2014)
• Processors
  • Two 10-core POWER8 processors (3.42 GHz)
  • Each core supports up to 8 hardware threads
  • Processor-to-memory bandwidth: 192 GB/s per socket
  • 512 KB L2 cache per core
  • 8 MB L3 cache per core
  • 16 MB L4 cache per socket
• Memory
  • Up to 1 TB
• OS
  • Linux (RHEL 7)
Working with today's best technology
Figure 1: IBM Power S822L server (available since June 2014)
• Supported compilers
  • XL compilers (optimal)
  • GNU compilers
• Supported math libraries
  • ESSL (optimal)
  • OpenBLAS
• MPI
  • Open MPI
Working with today's best technology
Figure 1: Intel S2600WTT server (available since Q4 2014)
• Processors
  • Two 14-core Xeon E5-2697 v3 CPUs (2.6 GHz)
  • 35 MB cache per CPU
• Memory
  • DDR4
  • 24 DIMM slots, up to 3 TB (using 128 GB DIMMs)
• OS
  • Linux (CentOS 7)
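For context on the LINPACK numbers that follow, this node's theoretical double-precision peak can be derived from core count, clock, and vector width. A back-of-the-envelope check (each Haswell core has two AVX2 FMA units, each handling 4 doubles with 2 FLOPs per FMA, ignoring AVX clock throttling):

% DP peak of the dual-socket Haswell node:
% 2 FMA units x 4 doubles x 2 FLOPs/FMA = 16 FLOPs/cycle per core
\[
  2~\text{sockets} \times 14~\text{cores} \times 2.6~\mathrm{GHz} \times 16~\frac{\mathrm{FLOP}}{\mathrm{cycle}} \approx 1164.8~\mathrm{GFlop/s}
\]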
Working with today’s best technology
Figure 1: Intel S2600WTT server (available since Q4 2014)
• Supported compilers
  • Intel compilers (optimal)
  • GNU compilers
• Supported math libraries
  • MKL (optimal)
  • OpenBLAS
• MPI
  • Intel MPI (optimal)
  • Open MPI
Working with today’s best technology
[Figure 1: ADCIRC sample input time (s) against cores, tested by Team NUS on the IBM Power S822L and an Intel server with E5-2697 v3 (Haswell) CPUs.]
[Figure 2: LINPACK benchmark (N = 10000, single node, GFlop/s), performed by Team NUS on the IBM Power S822L and the Intel server.]
Working with today’s best technology
Figure 1: NVIDIA Tesla K40 GPU (passive cooling)
• Performance
  • Peak double-precision floating-point performance: 1.43 Tflop/s
  • Peak single-precision floating-point performance: 4.29 Tflop/s
• Memory
  • 12 GB GDDR5, bandwidth 288 GB/s
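As a sanity check, these peaks follow from the K40's published specifications: 2880 CUDA cores at a 745 MHz base clock, one fused multiply-add (2 FLOPs) per core per cycle, with double precision running at one third the single-precision rate:

\[
  2880 \times 0.745~\mathrm{GHz} \times 2~\frac{\mathrm{FLOP}}{\mathrm{cycle}} \approx 4.29~\mathrm{TFlop/s}~\text{(SP)}, \qquad 4.29 / 3 \approx 1.43~\mathrm{TFlop/s}~\text{(DP)}
\]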
Working with today’s best technology
[Figure 1: NAMD test performed by Team NUS with and without GPU acceleration.]
[Figure 2: LINPACK benchmark (GFlop/s) on 2 nodes, performed by Team NUS with and without GPU acceleration.]
Working with today’s best technology
Figure 1: Mellanox InfiniBand cards, cables and switch
Working with today’s best technology
[Figure 2: Graph 500 run time (seconds) on 2 nodes over Ethernet vs. InfiniBand (Scale = 15, Edgefactor = 20, NBFS = 64), performed by Team NUS.]
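To make the interconnect comparison concrete, a simple ping-pong test is the usual way to measure what each fabric delivers. Below is a sketch using mpi4py; the Open MPI fabric-selection flag shown is illustrative of one way to force TCP, not the team's actual methodology.

# Rough point-to-point bandwidth probe. Run once per fabric, e.g.:
#   mpirun -np 2 --mca btl tcp,self python pingpong.py   # force TCP/Ethernet
#   mpirun -np 2 python pingpong.py                      # default (InfiniBand)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nbytes = 8 * 1024 * 1024            # 8 MiB message
buf = np.zeros(nbytes, dtype=np.uint8)
reps = 50

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = MPI.Wtime() - t0

if rank == 0:
    # Each repetition moves the message there and back.
    gb = 2 * reps * nbytes / 1e9
    print(f"~{gb / elapsed:.2f} GB/s effective bandwidth")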
Advanced scientific applications
• ADCIRC (ADvanced CIRCulation model)
• NAMD (Nanoscale Molecular Dynamics program)
• MATLAB seismic data analysis application
ADCIRC
• Simulates changes in water elevation over time.
[Figure 1: Gulf of Mexico 2D mesh grid. Figure 2: Gulf of Mexico water elevation during Hurricane Isabel.]
NAMD
• Simulates particle movement and energy changes over time.
[Figure 1: Ubiquitin in a water box and in a water sphere; hydrogen atoms are colored black for contrast.]
MATLAB Seismic Data Analysis Application
• Uses seismic wave signals to map subsurface topology.
Challenges
• Hardware
  • Unfamiliar with HPC hardware
  • Lots of terms and jargon to figure out
  • Lack of online documentation for the latest hardware
  • Need to troubleshoot hardware issues
  • Steep learning curve in setting up the cluster and installing/compiling the correct software and middleware
• Competition Applications
  • Need to understand the input/output data formats
  • Need to understand the workflow of each application
  • Need to understand configuration parameters when compiling and running applications
  • Need to debug compilation and runtime errors
  • Lots of manual testing needs to be automated (see the sketch below)
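A sketch of the kind of automation that replaces manual testing: sweep process counts for a benchmark and log the timings. The binary name and core counts here are hypothetical placeholders, not the team's actual setup.

# Sweep MPI process counts for a benchmark and record run times to CSV.
import csv
import subprocess
import time

CORES = [1, 2, 4, 8, 16, 28]     # illustrative sweep
CMD = "./adcirc_sample"          # hypothetical benchmark binary

with open("sweep_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cores", "seconds", "returncode"])
    for n in CORES:
        start = time.perf_counter()
        proc = subprocess.run(
            ["mpirun", "-np", str(n), CMD],
            capture_output=True, text=True,
        )
        elapsed = time.perf_counter() - start
        writer.writerow([n, f"{elapsed:.2f}", proc.returncode])
        print(f"{n:>3} cores: {elapsed:8.2f} s (rc={proc.returncode})")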
Knowledge and Experience
• Parallel computing theory
• Scientific application models
• Hardware specs and performance measurement
• System setup, backup, configuration, and communication
• Advanced Linux usage
Thanks
Q&A
References
• Power S822L specs: http://www-03.ibm.com/systems/sg/power/hardware/s812l-s822l/
• Intel S2600WTT specs: http://ark.intel.com/products/82156/Intel-Server-Board-S2600WTT
• Tesla K40 specs: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-PCIe-Passive-Board-Spec-BD-06902-001_v05.pdf
• Images:
  • http://www.112it.pl/_categoryPhoto/24614.jpg
  • http://exxactcorp.com/uploads/product/77b1531b36b4b77b613daa85292092b7.jpg
  • http://www.storagereview.com/images/StorageReview-Mellanox-InfiniBand.jpg
  • http://adcirc.org/home/documentation/example-problems/hurricane-isabel-example/
  • http://www.ks.uiuc.edu/Training/Tutorials/namd/namd-tutorial-win-html/node8.html
Student Cluster Competition: A*CRC Story
Summary and Take-Aways
Acknowledgements
• People: Dr Marek Michalewicz, Prof. Yuefan Deng, Prof. Tan Tin Wee, Stephen Wong, Lim Ching Kwang, Dr Jonathan Low, Paul Hiew Ngee Heng, Dr Dominic Chien, Dr Michael Sullivan, Dr Liou Sing Wu, Dr Gabriel Noaje, Nebojsa Novakovic, A*CRC Comp. System Grp, A*CRC Operations Grp
• Companies: SGI, NVIDIA, Intel, Mellanox, HP, IBM, TechSource, 3M, Micron