Rhea Analysis & Post-processing Cluster Robert D. French NCCS User Assistance

Post on 14-Dec-2015


Page 1

Rhea
Analysis & Post-processing Cluster

Robert D. French
NCCS User Assistance

Page 2

Rhea Quick Overview

• 200 Dell PowerEdge C6220 nodes
  – 196 compute / 4 login
  – RHEL 6.4
  – 2 x 8-core Intel Xeon CPUs @ 2.0 GHz
    • Hyperthreading is enabled, so “top” shows 32 CPUs
  – 64 GB of RAM
  – New 56 Gb/s IB fabric
• Mounts Atlas
  – Does not mount Widow
• Replaces Lens
• No preemptive queue
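The 32 CPUs that "top" reports follow from the node specification above. The short sketch below works through that arithmetic. It assumes two hardware threads per core when hyperthreading is on (the slide does not state this figure), and the function name is made up for illustration.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of CPUs the operating system sees on one node."""
    return sockets * cores_per_socket * threads_per_core


# Per the slide: 2 sockets, 8 cores each.
physical = logical_cpus(2, 8, 1)  # hyperthreading ignored
# Assumption: hyperthreading exposes 2 hardware threads per core.
logical = logical_cpus(2, 8, 2)

print(f"physical cores per node: {physical}")
print(f"logical CPUs per node:   {logical}")
```

When sizing a job, the physical core count is usually the safer number to plan around, since hyperthreads share a core's execution resources.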

Page 3

Allocation & Billing

• Rhea is prioritized as an extra resource for INCITE and ALCC users through the end of the year.
  – DD projects may request access
• 1 node-hour charged per node per hour
  – Ex: 10 nodes for 2 hours = 20 node-hours
• Each project will be awarded 1,000 hours per month
  – Separate from Titan / Eos usage
  – Request more if you run low
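The charging rule above is a simple product, which makes it easy to estimate how far a monthly award will go. The sketch below is illustrative only: it assumes the 1,000 monthly hours are node-hours and that jobs are charged for the hours listed, and the function and constant names are hypothetical.

```python
# Assumption: the monthly award is counted in node-hours.
MONTHLY_AWARD_NODE_HOURS = 1000


def node_hours(nodes: int, hours: float) -> float:
    """Charge for one job: 1 node-hour per node per hour."""
    return nodes * hours


def remaining_award(award: float, jobs: list[tuple[int, float]]) -> float:
    """Award left after a list of (nodes, hours) jobs has been charged."""
    return award - sum(node_hours(n, h) for n, h in jobs)


# The slide's example: 10 nodes for 2 hours.
print(node_hours(10, 2))

# Two jobs in one month: the example above plus 16 nodes for 12 hours.
print(remaining_award(MONTHLY_AWARD_NODE_HOURS, [(10, 2), (16, 12)]))
```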

Page 4

Rhea Queue Policy

Job Size        Job Length      Job Limits                       Restricted by
1 – 16 Nodes    0 – 12 Hours    3 eligible / unlimited running   User
                12 – 36 Hours   2 active                         System
                36 – 96 Hours   1 active                         System
17 – 32 Nodes   0 – 12 Hours    2 active                         System
                12 – 36 Hours   1 active                         System
33 – 128 Nodes  0 – 3 Hours     1 active                         System

• Should minimize large jobs swamping the system
• Small runs should complete quickly
• Request a Reservation for more nodes / longer wall-times
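The queue policy can be read as a lookup from (node count, wall-time) to a limit. The sketch below encodes the table that way. It is not a scheduler interface: the names are hypothetical, and it assumes each wall-time bin includes its upper bound (so exactly 12 hours falls in the 0 – 12 bin), which the slide does not specify.

```python
from typing import NamedTuple, Optional


class QueueBin(NamedTuple):
    min_nodes: int
    max_nodes: int
    max_hours: float  # upper wall-time bound; assumed inclusive
    limit: str
    restricted_by: str


# Rows transcribed from the policy table, shortest wall-time first per size.
POLICY = [
    QueueBin(1, 16, 12, "3 eligible / unlimited running", "User"),
    QueueBin(1, 16, 36, "2 active", "System"),
    QueueBin(1, 16, 96, "1 active", "System"),
    QueueBin(17, 32, 12, "2 active", "System"),
    QueueBin(17, 32, 36, "1 active", "System"),
    QueueBin(33, 128, 3, "1 active", "System"),
]


def lookup(nodes: int, hours: float) -> Optional[QueueBin]:
    """Return the first bin that fits the request, or None if no bin does."""
    if nodes < 1 or hours <= 0:
        return None
    for queue_bin in POLICY:
        if queue_bin.min_nodes <= nodes <= queue_bin.max_nodes and hours <= queue_bin.max_hours:
            return queue_bin
    return None  # outside the table: a reservation would be needed


for request in [(8, 6), (8, 24), (20, 30), (64, 3), (64, 4)]:
    match = lookup(*request)
    print(request, "->", match.limit if match else "no bin; request a reservation")
```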

Page 5

Software Stack

• Most Lens software will already be installed
• Here are some highlights:
  – Visualization: ParaView, VisIt, VMD
  – Compilers: GCC, Intel, and PGI
  – Scientific Languages: MATLAB, Octave, R, SciPy
  – Data Management: Globus, BBCP, NetCDF, HDF5, Adios
  – Debugging: DDT, Vampir, Valgrind
• Full list of installed software available on our website
• If you can’t find what you need, just ask!

Page 6

Transitioning to Rhea

[Diagram: Titan, Lens; Widow]

• Now:
• Titan and Lens mount Widow

Page 7

Transitioning to Rhea

[Diagram: Titan, Rhea, Lens; Widow, Atlas]

• Soon (mid-to-late November):
• Titan will mount both Atlas and Widow
• Move data to Atlas and take advantage of Rhea

Page 8

Transitioning to Rhea

[Diagram: Titan, Rhea; Atlas; Widow]

• Near Future:
• Lens will be decommissioned
• Rhea will be the center’s viz & analysis cluster

Page 9

Questions?

Page 10

Spider II Directory Layout Changes

Chris Fuson

Page 11

OLCF Center-wide File Systems

• Spider
  – Center-wide scratch space
  – Temporary; not backed up
  – Available from compute nodes
  – Fast access to job-related temporary files and for staging large files to and from archival storage
  – Contains multiple Lustre file systems

Page 12

Spider I vs. Spider II

Spider I: Widow [1-3]
• 240 GB/s
• 10 PB
• 3 MDS
• 192 OSS
• 1,344 OST
• Current center-wide scratch
• Decommissioned early January 2014

Spider II: Atlas [1-2]
• 1 TB/s
• 30 PB
• 2 MDS
• 288 OSS
• 2,016 OST
• Available on additional OLCF systems soon
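For a sense of scale, the figures above can be compared directly. The sketch below computes a few ratios from the slide's numbers only. It assumes 1 TB/s = 1,000 GB/s, and the dictionary keys are names chosen for illustration.

```python
# Figures transcribed from the slide; key names are illustrative.
spider_1 = {"bandwidth_gb_s": 240, "capacity_pb": 10, "oss": 192, "ost": 1344}
# Assumption: 1 TB/s is taken as 1,000 GB/s.
spider_2 = {"bandwidth_gb_s": 1000, "capacity_pb": 30, "oss": 288, "ost": 2016}

bandwidth_ratio = spider_2["bandwidth_gb_s"] / spider_1["bandwidth_gb_s"]
capacity_ratio = spider_2["capacity_pb"] / spider_1["capacity_pb"]
ost_per_oss_1 = spider_1["ost"] / spider_1["oss"]
ost_per_oss_2 = spider_2["ost"] / spider_2["oss"]

print(f"bandwidth: {bandwidth_ratio:.2f}x")
print(f"capacity:  {capacity_ratio:.1f}x")
print(f"OSTs per OSS: Spider I {ost_per_oss_1:.0f}, Spider II {ost_per_oss_2:.0f}")
```

Both systems work out to the same number of OSTs per OSS, so the growth comes from a larger number of servers and targets rather than a different server-to-target ratio.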

Page 13

Spider II Change Overview

Before using Spider II, please note the following:

1. New directory structure
   – Organized by project
   – Each project given a directory on one of the atlas filesystems
   – WORKDIR now within project areas
     » You may have multiple WORKDIRs
     » * Requires change
2. Quota increases
   – Increased file system size allows for increased quotas
3. All areas purged
   – To help ensure space available for all projects

Page 14

Spider II Directory Structure

[Diagram: ProjectID; Member Work, World Work, Project Work]

Member Work
• Purpose: Batch job I/O
• Path:
  - $MEMBERWORK/<projid>
• 10 TB quota
• 14 day purge
• Permissions:
  - User allowed to change permissions to share within project
  - No automatic permission changes

Page 15

Spider II Directory Structure

[Diagram: ProjectID; Member Work, World Work, Project Work]

Project Work
• Purpose: Data sharing within project
• Path:
  - $PROJWORK/<projid>
• 100 TB quota
• 90 day purge
• Permissions:
  - Read, Write, Execute access for project members

Page 16

Spider II Directory Structure

[Diagram: ProjectID; Member Work, World Work, Project Work]

World Work
• Purpose: Data sharing with users who are not members of project
• Path:
  - $WORLDWORK/<projid>
• 10 TB quota
• 14 day purge
• Permissions:
  - Read, Execute for world
  - Read, Write, Execute for project
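Because all three areas are purged, it can help to check which files are close to the purge window before they disappear. The sketch below summarizes the quotas and purge windows from the three slides and lists files older than a given window. It is a rough illustration, not an OLCF tool: the slides do not say which timestamp the purge uses, so modification time is an assumption, and the names `AREAS` and `purge_candidates` are hypothetical.

```python
import os
import time

# Quotas and purge windows as stated on the slides.
AREAS = {
    "MEMBERWORK": {"quota_tb": 10, "purge_days": 14},
    "PROJWORK": {"quota_tb": 100, "purge_days": 90},
    "WORLDWORK": {"quota_tb": 10, "purge_days": 14},
}


def purge_candidates(root, purge_days, now=None):
    """Return files under root whose mtime is older than purge_days.

    Assumption: age is judged by modification time; the actual purge
    policy may use a different timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - purge_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)
```

A typical call would pass the expanded area path, for example `purge_candidates(os.path.join(os.environ["MEMBERWORK"], "abc123"), AREAS["MEMBERWORK"]["purge_days"])`, where `abc123` stands in for a real project ID.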

Page 17

Spider II Directory Structure

• New directory structure
  – Organized by project

Page 18

Before Using Atlas

• Modify scripts to point to new directory structure
  • /tmp/work/$USER, $WORKDIR → $MEMBERWORK/<projid>
  • /tmp/proj/<projid> → $PROJWORK/<projid>
• Migrate data
  • You will need to transfer needed data onto Spider II (atlas)
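The script changes amount to a small set of path substitutions, which can be applied mechanically and then reviewed by hand. The sketch below shows one way to do that on the text of a job script. The function name is hypothetical, the mapping of old paths to new ones is inferred from the slide, and `abc123` is a placeholder project ID.

```python
import re


def migrate_paths(script_text: str, projid: str) -> str:
    """Rewrite old Widow-era paths to the Spider II (atlas) layout."""
    member = f"$MEMBERWORK/{projid}"
    proj = f"$PROJWORK/{projid}"

    # Old project area: /tmp/proj/<projid> becomes $PROJWORK/<projid>.
    text = script_text.replace(f"/tmp/proj/{projid}", proj)
    # Old per-user scratch, written as $USER or ${USER}.
    text = re.sub(r"/tmp/work/\$(?:USER\b|\{USER\})", lambda m: member, text)
    # $WORKDIR or ${WORKDIR} now lives inside the project area.
    text = re.sub(r"\$(?:WORKDIR\b|\{WORKDIR\})", lambda m: member, text)
    return text


old_script = (
    "cd /tmp/work/$USER/run1\n"
    "cp out.dat $WORKDIR/\n"
    "cp out.dat /tmp/proj/abc123/shared/\n"
)
print(migrate_paths(old_script, "abc123"))
```

Since a user may now have one work directory per project, the project ID has to be chosen per script; a blanket substitution like this is only a starting point.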

Page 19

Questions?

• More information:
  – www.olcf.ornl.gov/kb_articles/atlas-transition/
• Email:
  – [email protected]

Page 20

Other Items

• Dec 17th - Titan to return to 100%
• 2013 User Survey
  – Available on olcf.ornl.gov

Page 21

Thanks for your time.