DeepSense Computing Platform

Post on 24-May-2020



Agenda

• System Overview

• File Systems

• Data Transfer

• IBM Spectrum LSF

• Conductor with Spark

• Technical Support

System Overview

• Compute Nodes
  o 20 large-memory nodes
    - 20 cores, 512 GB memory
  o 4 huge-memory nodes
    - 20 cores, 1 TB memory
  o 10 GPU nodes
    - 2 × P100, 20 cores, 512 GB memory

• Operating System
  o Red Hat Enterprise Linux 7.5
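
As a quick sanity check, the aggregate capacity implied by the node counts above can be worked out with shell arithmetic. This is only a sketch: the variable names are illustrative, and it assumes 1 TB is counted as 1024 GB.

```shell
#!/bin/bash
# Node counts and per-node figures copied from the overview above.
# Assumption: 1 TB is counted as 1024 GB.
large_nodes=20
huge_nodes=4
gpu_nodes=10
cores_per_node=20

total_nodes=$(( large_nodes + huge_nodes + gpu_nodes ))
total_cores=$(( total_nodes * cores_per_node ))
total_mem_gb=$(( large_nodes * 512 + huge_nodes * 1024 + gpu_nodes * 512 ))
total_gpus=$(( gpu_nodes * 2 ))   # two P100 cards per GPU node

echo "nodes=$total_nodes cores=$total_cores memory_gb=$total_mem_gb gpus=$total_gpus"
```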

Heterogeneous Computing with GPU

NVIDIA P100 GPU

Cores                     3584
Memory                    16 GB HBM2
Memory bandwidth          720 GB/s
FLOPS (single precision)  9.3 TFLOPS
FLOPS (double precision)  4.7 TFLOPS
FLOPS (half precision)    18.7 TFLOPS
Power consumption         250 W
CUDA compute capability   6.0

IBM POWER8 with NVIDIA P100

• CPU-GPU systems connected via PCIe
• NVLink enables fast unified memory access between CPU and GPU memory

Applications

• Domain
  - Ocean data products, shipbuilding, fisheries and aquaculture, seaport and logistics, security and defense, marine risk, …

• Data Source
  - Sensor logs, text, image, video, web traffic, geospatial, AIS, …

• Analytics
  - Image processing, time series, predictive analytics, machine learning, deep learning, distributed computing, …

File System

File System  Directory                 Purpose      Quota                         Backed up?  Purged?
Home         /dshome/subdir/username   development  1 TB, 500k files per user     yes         no
Data         /data/projectname         development  2 TB, 500k files per project  yes         no
Scratch      /scratch/username         computation  2 TB, 1M files per user       no          yes
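
To see how close a directory is to the limits in the quota table above, a small helper along these lines may be useful. It is a sketch rather than an official tool: `usage_report` is a hypothetical name, it relies only on the standard `du`, `find` and `wc` utilities, and it counts regular files only, since the slides do not say whether directories count toward the file quota.

```shell
#!/bin/bash
# Print the disk usage and regular-file count of a directory tree so the
# numbers can be compared by eye with the quota table.
usage_report() {
    local dir="$1"
    local size files
    size=$(du -sh "$dir" | cut -f1)          # human-readable total size
    files=$(find "$dir" -type f | wc -l)     # number of regular files
    printf '%s: %s used, %d files\n' "$dir" "$size" "$files"
}
```

For example, `usage_report /scratch/username` (with your own user name substituted) prints one summary line for the scratch area.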

Data Transfer

• Two protocol nodes
  o protocol1.deepsense.ca
  o protocol2.deepsense.ca

• Connect using Samba:
  - smb://protocol1.deepsense.ca
  - use your DeepSense account

Data Backups

• DeepSense is a platform for data analytics. It is not meant for long-term storage.
  o Users should ensure their original data is backed up at their own site.

• We do have daily backups
  o /dshome, /data, and /software are backed up every evening
  o The backup keeps 7 versions of each file
  o Once a file is deleted, its backup is kept for 30 days; after that, it can no longer be restored

• If you need to restore a file, please let us know

IBM Spectrum LSF

• Workload management platform

• Maximize utilization for distributed high-performance computing

• GPU Support

• Execute batch/interactive jobs

• Containerized workloads

LSF Access and Login

• User account: your DeepSense account

• Login nodes:
  o login1.deepsense.ca
  o login2.deepsense.ca

• Example connection:
  o ssh <username>@login1.deepsense.ca
  o On a Mac or Linux client, use a terminal
  o On a Windows client, use PuTTY or MobaXterm

• If you are off campus, you need a Dalhousie VPN connection

Submitting Jobs to LSF

• Development/test jobs
  o For testing and development, use the login nodes
  o The login nodes are shared with all users

• Batch jobs
  o Command: bsub
  o Use bsub options to specify input/output files, GPU options, CPU/memory limits, etc.

• Interactive jobs
  o Command: bsub -I
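
A batch job is usually described in a small script whose `#BSUB` lines carry the bsub options. The sketch below writes such a script to a file; the helper name `write_job_script`, the job name, the slot count, the time limit and the file names are all illustrative assumptions, not DeepSense defaults. GPU and memory options (`-gpu`, `-M`) also exist, but their exact syntax depends on the LSF version and site configuration, so check `man bsub` on the login node before using them.

```shell
#!/bin/bash
# Write a minimal LSF job script to the given path.
# Lines starting with "#BSUB" are read by bsub as options; to plain bash
# they are ordinary comments, so the script also runs outside LSF.
write_job_script() {
    local path="$1"
    cat > "$path" <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#BSUB -J demo_job
# Number of job slots (cores) requested
#BSUB -n 4
# Run time limit as hours:minutes
#BSUB -W 01:00
# Standard output and error files; %J expands to the job ID
#BSUB -o demo_job.%J.out
#BSUB -e demo_job.%J.err

echo "Running on $(hostname)"
EOF
}
```

After `write_job_script demo_job.lsf`, the job would be submitted from a login node with `bsub < demo_job.lsf`. For an interactive run, put the command after the option instead, for example `bsub -I hostname`.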

LSF Monitoring/Cancelling jobs

• Check running jobs
  o bjobs -l
  o bjobs -l <jobid> // for job details

• Control job execution
  o Job suspend: bstop <jobid>
  o Job resume: bresume <jobid>
  o Job kill: bkill <jobid>

• Check available hosts
  o bhosts
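
The control commands above all need a job ID. One way to capture it at submission time is to parse the confirmation line that bsub prints, which typically looks like `Job <1234> is submitted to default queue <normal>.` The helper below is a sketch under that assumption; `parse_job_id` is a hypothetical name and `demo_job.lsf` a hypothetical job script.

```shell
#!/bin/bash
# Read bsub's confirmation message on stdin and print only the numeric job ID.
# Prints nothing if the input does not match the expected message format.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Typical use on a login node:
#   jobid=$(bsub < demo_job.lsf | parse_job_id)
#   bjobs -l "$jobid"     # job details
#   bstop "$jobid"        # suspend
#   bresume "$jobid"      # resume
#   bkill "$jobid"        # kill
```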

IBM Spectrum Conductor with Spark (CWS)

• Spark integration and lifecycle management platform

• Support for multiple Spark versions

• Integrated application platform

• Notebooks, Deep Learning packages

• Simplified administration

Accessing CWS

• Management Console
  o Go to URL: https://ds-mgm-02.deepsense.cs.dal.ca:8443
  o Log in using your DS account

• Command Line Option
  o From a login node, ssh to the CWS host:
    ssh ds-cmhm-02.deepsense.cs.dal.ca
  o Source the environment
  o Log in to CWS using your DS account

CWS - Spark Instance Group

• From the dashboard go to:
  o Workload -> Spark -> Spark Instance Group
  o Specify name, directory and user

• Choose Spark version
  o Spark 2.3.1, Spark 2.2.0, Spark 2.1.1, Spark 1.6.1

• Optional: choose Notebook

Technical Support

• Documentation
  o DeepSense computing platform wiki page
    https://docs.deepsense.ca
  o IBM Knowledge Center
    https://www.ibm.com/support/knowledgecenter/

• Troubleshooting/technical questions
  o Send email to [email protected]

Questions?