14th October 2014, Graduate Lectures
Oxford University Particle Physics Unix Overview
Sean Brisbane, Particle Physics Systems Administrator
Room 661, Tel: 73389
Strategy
Local Cluster Overview
Connecting to it
Grid Cluster
Computer Rooms
How to get help
Particle Physics Strategy: The Server / Desktop Divide
[Diagram: desktops (Win 7 PCs, Ubuntu PCs and Linux desktops) on one side; servers on the other (General Purpose Unix Server, Group DAQ Systems, Linux Worker nodes, Web Server, Linux File Servers, Virtual Machine Host, NIS Server, torque Server). Approx. 200 desktop PCs use Exceed, PuTTY or ssh/X windows to access the PP Linux systems.]
Physics fileservers and clients
Storage system:       Windows server             Central Linux file-server     PP file-server
Client:               Windows                    Central Ubuntu                PP Linux
Recommended storage:  H:\ drive                  /home folder                  /home and /data folders

Access paths (Windows / Linux):
Windows storage:   "H:\" drive or "Y:\home"  /  /physics/home
PP Storage:        Y:/LinuxUsers/pplinux/data/home  /  /data/home, /data/experiment
Central Linux:     Y:/LinuxUsers/home/particle  /  /network/home/particle
Particle Physics Linux
Unix Team (Room 661):
Pete Gronbech – Senior Systems Manager and GridPP Project Manager
Ewan MacMahon – Grid Systems Administrator
Kashif Mohammad – Grid and Local Support
Sean Brisbane – Local Server and User Support
General purpose interactive Linux based systems for code development, short tests and access to Linux based office applications. These are accessed remotely.
Batch queues are provided for longer and intensive jobs. Provisioned to meet peak demand and give a fast turnaround for final analysis.
Systems run Scientific Linux (SL), which is a free Red Hat Enterprise Linux based distribution.
The Grid & CERN have migrated to SL6. The majority of the local cluster is also on SL6, but some legacy SL5 systems are provided for those that need them.
We will be able to offer you the most help running your code on the newer SL6. Some experimental software frameworks still require SL5.
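To check which version of Scientific Linux a particular machine is running, you can look at the release file. A minimal check (the example output is illustrative):

cat /etc/redhat-release    # e.g. "Scientific Linux release 6.5 (Carbon)"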
Current Clusters
Particle Physics Local Batch cluster
Oxford's Tier 2 Grid cluster
PP Linux Batch Farm: Scientific Linux 6
[Diagram: interactive login nodes pplxint8 and pplxint9; worker nodes pplxwn15 to pplxwn60 with 8 Intel 5420, 12 Intel 5650 or 16 Intel 2650 cores each; JAI/LWFA nodes jailxwn01 and jailxwn02 with 64 AMD cores each.]
Users log in to the interactive nodes pplxint8 and pplxint9. The home directories and all the data disks (the /home area and the /data/group areas) are shared across the cluster and are visible on the interactive machines and on all the batch system worker nodes.
Approximately 300 cores (430 including JAI/LWFA), each with 4GB of RAM.
The /home area is where you should keep your important text files such as source code, papers and your thesis.
The /data area is where you should put your big, reproducible input and output data.
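Longer or more intensive work should go through the batch queues, which are managed by the torque server. As a rough sketch of what a submission might look like from pplxint8 (the job name, resource requests and executable are illustrative, not the cluster's actual defaults):

#!/bin/bash
# myjob.sh: illustrative torque/PBS batch script
#PBS -N myanalysis              # job name
#PBS -l nodes=1:ppn=1           # one core on one node
#PBS -l walltime=02:00:00       # requested run time
cd $PBS_O_WORKDIR               # start in the directory the job was submitted from
./run_analysis input.root       # hypothetical executable reading data from /data

Submit and monitor it with:

qsub myjob.sh       # submit to the batch system
qstat -u $USER      # list the status of your jobs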
PP Linux Batch Farm: Scientific Linux 5
[Diagram: interactive login nodes pplxint5 and pplxint6; worker nodes pplxwn23 to pplxwn30, each with 16 AMD 6128 cores.]
Legacy SL5 jobs are supported by a smaller selection of worker nodes.
Currently eight servers, each with 16 cores and 4GB of RAM per core.
All of your files are available from both SL5 and SL6, but the software environment is different, so code compiled for one operating system may not run on the other.
PP Linux Batch Farm: Data Storage
[Diagram: NFS file servers (pplxfsNN) hosting the home areas and data areas, with 19TB to 40TB per server.]
NFS is used to export data to the smaller experimental groups, where the partition size is less than the total size of a server.
The data areas are too big to be backed up. The servers have dual redundant PSUs and RAID 6, and run on uninterruptible power supplies. This safeguards against hardware failures, but it does not help if you delete files.
The home areas are backed up nightly by two different systems: the Oxford ITS HFS service and a local backup system. If you delete a file, tell us as soon as you can, including when you deleted it and its full name. The latest nightly backup of any lost or deleted files from your home directory is available at the read-only location /data/homebackup/{username}.
The home areas have quotas, but if you require more space, ask us.
Store your thesis on /home NOT /data.
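To see how much space you are using before asking for more, the standard tools work on these areas; a minimal sketch (the directory name is an example, and quota reporting depends on the server exporting it):

quota -s                  # your /home quota and usage in human-readable units
df -h /home               # free space on the home area
du -sh ~/my_analysis      # size of a particular directory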
[Diagram: a further 30TB data area and the Particle Physics Computing Lustre storage, built from object storage servers such as Lustre OSS04 (44TB).]
df -h /data/atlas
Filesystem             Size  Used  Avail  Use%  Mounted on
/lustre/atlas25/atlas  366T  199T  150T   58%   /data/atlas

df -h /data/lhcb
Filesystem  Size  Used  Avail  Use%  Mounted on
/lhcb25     118T  79T   34T    71%   /data/lhcb25
The Lustre file system is used to group multiple file servers together to provide extremely large continuous file spaces. This is used for the Atlas and LHCb groups.
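If the Lustre client tools are available on the interactive nodes, you can also see how that space is spread over the underlying object storage targets; a quick check:

lfs df -h /data/atlas     # per-OST breakdown of the Lustre space behind /data/atlas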
Strong Passwords etc
Use a strong password, not one open to dictionary attack! For example, fred123 is no good; Uaspnotda!09 is much better.
Once set up, it is more convenient* to use ssh with a passphrased key stored on your desktop.
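For Linux or Mac desktops, the equivalent of the PuTTY/Pageant setup in the backup slides is a short sequence of standard OpenSSH commands; a minimal sketch (usernames and hostnames are placeholders):

ssh-keygen -t rsa -b 4096            # choose a strong passphrase when prompted
ssh-copy-id <username>@pplxint8      # appends your public key to ~/.ssh/authorized_keys on the cluster
ssh <username>@pplxint8              # now authenticates with the key; ssh-agent can cache the passphrase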
Connecting with PuTTY
Question: how many of you are using Windows, and how many Linux, on the desktop?
Demo
1. Plain ssh terminal connection
   1. From 'outside of physics'
   2. From the office (no password)
2. ssh with X windows tunnelled to passive Exceed
3. ssh, X windows tunnel, passive Exceed, KDE session
4. Password-less access from 'outside physics' (see backup slides)
http://www2.physics.ox.ac.uk/it-services/ppunix/ppunix-cluster
http://www.howtoforge.com/ssh_key_based_logins_putty
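For those not using PuTTY/Exceed, the same connections can be made from a Linux or Mac terminal. A hedged sketch (the exact hostnames, and whether a gateway hop is needed from outside the department, should be checked against the pages above; the gateway name here is a placeholder):

ssh -X <username>@pplxint8                                # login with X11 forwarding from inside physics
ssh -X -t <username>@<physics-gateway> ssh -X pplxint8    # two-hop login from outside physics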
SouthGrid Member Institutions
Oxford, RAL PPD, Cambridge, Birmingham, Bristol, Sussex and JET at Culham.
Current capacity
Compute servers: twin and twin-squared nodes, 1770 CPU cores.
Storage: a total of ~1300TB. The servers have between 12 and 36 disks; the more recent ones are 4TB each. These use hardware RAID and UPS to provide resilience.
Get a Grid Certificate
You must remember to use the same PC to request and retrieve the Grid Certificate.
The new UKCA page, http://www.ngs.ac.uk/ukca, uses a Java-based certificate wizard.
You will then need to contact central Oxford IT. They will need to see you, with your university card, to approve your request:
Dear Stuart Robeson and Jackie Hewitt,
Please let me know a good time to come over to the Banbury Road IT office for you to approve my grid certificate request.
Thanks.
When you have your grid certificate…
Save it to a file in your home directory on the Linux systems, e.g.:
Y:\Linuxusers\particle\home\{username}\mycert.p12
Log in to pplxint9 and run
mkdir .globus                 # grid tools expect your certificate in ~/.globus
chmod 700 .globus             # keep the directory private
cd .globus
openssl pkcs12 -in ../mycert.p12 -clcerts -nokeys -out usercert.pem    # extract the public certificate
openssl pkcs12 -in ../mycert.p12 -nocerts -out userkey.pem             # extract the private key
chmod 400 userkey.pem         # the private key must be readable only by you
chmod 444 usercert.pem        # the certificate may be world-readable
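To confirm that the conversion worked, openssl can print the subject and validity dates of the extracted certificate; a quick check:

openssl x509 -in usercert.pem -noout -subject -dates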
Now Join a VO
This is the Virtual Organisation, such as "Atlas". Joining one means you are allowed to submit jobs using the infrastructure of the experiment and to access data for the experiment.
Speak to your colleagues on the experiment about this. It is a different process for every experiment!
Joining a VO
Your grid certificate identifies you to the grid as an individual user, but it's not enough on its own to allow you to run jobs; you also need to join a Virtual Organisation (VO).
These are essentially just user groups, typically one per experiment, and individual grid sites can choose to support (or not) work by users of a particular VO.
Most sites support the four LHC VOs; fewer support the smaller experiments.
The sign-up procedures vary from VO to VO: UK ones typically require a manual approval step, and LHC ones require an active CERN account.
For anyone who is interested in using the grid but is not working on an experiment with an existing VO, we have a local VO we can use to get you started.
When that’s done
Test your grid certificate:
> voms-proxy-init --voms lhcb.cern.ch
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Oxford/L=OeSC/CN=j bloggs
Creating temporary proxy ..................................... Done

Consult the documentation provided by your experiment for 'their' way to submit and manage grid jobs.
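Once the proxy has been created, you can inspect its identity, VO attributes and remaining lifetime before submitting anything; a quick check:

voms-proxy-info --all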
Two computer rooms provide excellent infrastructure for the future.
The new computer room, built at Begbroke Science Park jointly for the Oxford Supercomputer and the Physics department, provides space for 55 (11kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre. This £1.5M project was funded by SRIF and a contribution of ~£200K from Oxford Physics.
The room was ready in December 2007. The Oxford Tier 2 Grid cluster was moved there during spring 2008. All new Physics high-performance clusters will be installed here.
Local Oxford DWB Physics Infrastructure Computer Room
Completely separate from the Begbroke Science Park, a computer room with 100kW of cooling and >200kW of power has been built with ~£150K of Oxford Physics money.
This local Physics department infrastructure computer room was completed in September 2007.
It allowed local computer rooms to be refurbished as offices again, and racks that were in unsuitable locations to be re-housed.
Cold aisle containment
Other resources (for free)
Oxford Advanced Research Computing (ARC):
A shared cluster of CPU nodes, "just" like the local cluster here, plus GPU nodes
– faster for 'fitting', toy studies and MC generation
– *IFF* the code is written in a way that supports them
Moderate disk space allowance per experiment (<5TB)
http://www.arc.ox.ac.uk/content/getting-started
Emerald: a huge farm of GPUs
http://www.cfi.ses.ac.uk/emerald/
Both need a separate account and project. Come and talk to us in Room 661.
The end of the overview. Next: more details on using the clusters.
Help pages:
http://www.physics.ox.ac.uk/it/unix/default.htm
http://www2.physics.ox.ac.uk/research/particle-physics/particle-physics-computer-support
ARC:
http://www.arc.ox.ac.uk/content/getting-started
Email [email protected]
BACKUP
Use PuTTYgen to create an ssh key on Windows (previous slide, point #4).
- Generate the key pair in PuTTYgen and enter a strong passphrase.
- Save the private part of the key to a subdirectory of your local drive.
- Paste the public key into ~/.ssh/authorized_keys on pplxint.
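Whichever way the public key is added, the server will ignore it unless the permissions on your cluster account are strict; a quick check to run on pplxint:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys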
Pageant
Run Pageant once after login. Right-click on the Pageant icon in the system tray and choose "Add Key" to load your private (Windows) ssh key.