Prague Site Report
Jiří Chudoba, Institute of Physics, Prague
23.4.2012, HEPiX meeting, Prague
Local Organization
• Institute of Physics:
  o 2 locations in Prague, 1 in Olomouc
  o 786 employees (281 researchers + 78 doctoral students)
• Department of Networking and Computing Techniques (SAVT)
  o networking up to offices, mail and web servers, central services
• Computing centre (CC)
  o large scale calculations
  o part of SAVT (except leader: Jiri Chudoba)
• Division of Elementary Particle Physics
  o Section Department of detector development and data processing
    • head Milos Lokajicek
    • started large scale calculations, later transferred to CC
    • the biggest hw contributor (LHC computing)
    • participates in the CC operation
Server room I
• Server room I (Na Slovance)
  o 62 m2, ~20 racks, 350 kVA motor generator, 200 + 2 x 100 kVA UPS, 108 kW air cooling, 176 kW water cooling
  o continuous changes
  o hosts computing servers and central services
Other server rooms
• New server room for SAVT
  o located next to server room I
  o independent UPS (24 kW now, max 64 kW n+1), motor generator (96 kW), cooling 25 kW (n+1)
  o dedicated for central services
  o 16 m2, now 4 racks (room for 6)
  o very high reliability required
  o first servers moved in last week
• Server room Cukrovarnicka
  o another building in Prague
  o 14 m2, 3 racks (max 5), 20 kW central UPS, 2x8 kW cooling
  o backup servers and services
• Server room UTIA
  o 3 racks, 7 kW cooling, 3 + 5x1.5 kW UPS
  o dedicated to the Department of Condensed Matter Theory
Clusters in CC - Dorje
• Dorje: Altix ICE8200, 1.5 rack
  o 512 cores on 64 diskless WN, IB, 2 disk arrays (6+14 TB)
  o only local users, solid state physics, condensed matter theory
  o 1 admin for administration and user support
  o relatively small number of jobs, MPI jobs up to 256 processes
  o Torque + Maui, SLES10 SP2, SGI Tempo, MKL, OpenMPI, ifort
    • users run mostly: Wien2k, vasp, fireball, apls
Cluster LUNA
• 2 servers SunFire X4600
  o 8 CPUs, 32 cores, 256 GB RAM
• 4 servers SunFire V20z, V40z
• Operated by CESNET Metacentrum, the distributed computing activity of the NGI_CZ
• Metacentrum
  o 9 locations
  o 3500 cores
  o 300 TB
Cluster Thsun, Small group servers
• Thsun
  o “private” cluster
    • small number of users
    • power users with root privileges
  o 12 servers of variable hw
• servers for groups
  o managed by groups in collaboration with CC
Cluster Golias
• Upgraded every year: several subclusters, each of identical hw
• 3812 cores, 30700 HS06
• almost 2 PB disk space
• the newest (March 2012) subcluster rubus:
  o 23 nodes SGI Rackable C1001-G13
  o 2x Opteron 6274 (16 cores each), 64 GB RAM, 2x SAS 300 GB
  o 374 W (full load)
  o 232 HS06 per node, 5343 HS06 total
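As a rough cross-check of the rubus figures above, the per-node numbers can be scaled up to the whole subcluster. This is illustrative arithmetic only, not part of the original slides: the variable names are invented here, and the 374 W full-load figure is assumed to be per node.

```python
# Figures quoted on the slide for the rubus subcluster (March 2012).
NODES = 23
CPUS_PER_NODE = 2
CORES_PER_CPU = 16        # Opteron 6274
HS06_PER_NODE = 232       # rounded value as quoted
WATTS_PER_NODE = 374      # assumption: the full-load power is per node

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU
total_cores = NODES * cores_per_node
total_hs06 = NODES * HS06_PER_NODE
hs06_per_core = HS06_PER_NODE / cores_per_node
watts_per_hs06 = WATTS_PER_NODE / HS06_PER_NODE

print(f"cores in subcluster : {total_cores}")
print(f"HS06 in subcluster  : {total_hs06}")
print(f"HS06 per core       : {hs06_per_core:.2f}")
print(f"W per HS06 (loaded) : {watts_per_hs06:.2f}")
```

The product 23 x 232 comes out slightly below the quoted 5343 HS06 total, which suggests the per-node value on the slide was rounded.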
[Figure: two pie charts with legend d0, alice, atlas, auger, solid. Extracted values: 47%, 26%, 22%, 4%, 1% and 37%, 30%, 28%, 1%, 4%]
Golias shares

2011          HS06    share (%)
Alice+Star    7551    30
Atlas         7087    28
D0            9165    37
Solid          914     4
Calice          30     0
Auger          205     1
Total        24951   100

2012          HS06    share (%)
Alice+Star    7564    25
Atlas        11861    39
D0            9969    32
Solid          629     2
Calice          13     0
Auger          668     2
Total        30704   100
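The share column follows directly from the HS06 column. A minimal sketch of that calculation for the 2012 numbers, assuming the shares are plain percentages of the total rounded to whole numbers (the function name `shares` is chosen here for illustration):

```python
def shares(hs06):
    """Return each project's share of the total HS06 as a rounded percentage."""
    total = sum(hs06.values())
    return {name: round(100 * value / total) for name, value in hs06.items()}

# HS06 per project for 2012, as listed in the table.
hs06_2012 = {
    "Alice+Star": 7564,
    "Atlas": 11861,
    "D0": 9969,
    "Solid": 629,
    "Calice": 13,
    "Auger": 668,
}

shares_2012 = shares(hs06_2012)
for name, percent in shares_2012.items():
    print(f"{name:<12}{hs06_2012[name]:>7}{percent:>5}")
print(f"{'Total':<12}{sum(hs06_2012.values()):>7}{sum(shares_2012.values()):>5}")
```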
[Figure: Subclusters contribution to the total performance. Legend: Golias-p, Golias-c, Iberis, Ibis, Ib, Salix, Saltix, Dorje, Rubus. Extracted values: 3%, 4%, 15%, 22%, 15%, 8%, 5%, 12%, 17%]
Planned vs real usage (walltime)
WLCG Tier2
• cluster Golias@FZU + xrootd servers @ Rez
• 2012 pledges:
  o ATLAS 10000 HS06, 1030 TiB; 11861 HS06, 1300 TB available
  o ALICE 5000 HS06, 420 TiB; 7564 HS06, 540 TB available
• delivery of almost 600 TB delayed due to floods
• 66% efficiency is assumed for WLCG accounting
  o sometimes under 100% of pledges
• Low cputime/walltime ratio for ALICE
  o not only on our site
  o Tests with limits on number of concurrent jobs (last week)
  o “no limit” (about 900 jobs): 45%
  o limit 600 jobs: 54%
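The pledge and job-limit numbers above mix units (TiB pledged, TB available) and ratios, so a small calculation can help when reading them. The sketch below is an illustration, not the site's accounting procedure. The helper names are invented here, "about 900 jobs" is taken as exactly 900, and treating jobs x cputime/walltime as a proxy for delivered CPU is an assumption.

```python
TIB_IN_TB = 2**40 / 10**12   # 1 TiB = 1.099511627776 TB

# (pledged HS06, pledged TiB, available HS06, available TB), from the slide
pledges = {
    "ATLAS": (10000, 1030, 11861, 1300),
    "ALICE": (5000, 420, 7564, 540),
}

def coverage(vo):
    """Return (CPU, disk) capacity as a fraction of the pledge for one VO."""
    hs06_pledge, tib_pledge, hs06_avail, tb_avail = pledges[vo]
    return hs06_avail / hs06_pledge, tb_avail / (tib_pledge * TIB_IN_TB)

def busy_slot_equivalent(jobs, cpu_over_wall):
    """Concurrent jobs scaled by their cputime/walltime ratio (assumed proxy)."""
    return jobs * cpu_over_wall

for vo in pledges:
    cpu, disk = coverage(vo)
    print(f"{vo}: CPU {cpu:.0%} of pledge, disk {disk:.0%} of pledge")

# ALICE test from last week: unlimited (about 900 jobs) vs. limit of 600 jobs
print(f"no limit : {busy_slot_equivalent(900, 0.45):.0f} busy-slot equivalents")
print(f"limit 600: {busy_slot_equivalent(600, 0.54):.0f} busy-slot equivalents")
```

Under these assumptions, the job limit raises the cputime/walltime ratio but lowers the product of job count and ratio, so the better ratio does not by itself mean more CPU time delivered.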
Utilization
• Very high average utilization
  o several different projects, different tools for production
  o D0: production submitted locally by 1 user
  o ATLAS: panda, ganga, local users; DPM
  o ALICE: VO box; xrootd
[Figure panels: D0, ALICE, ATLAS]
Networking
• CESNET upgraded our main CISCO router
  o 6506 -> 6509
  o supervisor SUP720 -> SUP2T
  o new 8x 10G X2 card
  o planned upgrade of power supplies 2x3 kW -> 2x6 kW
• (2 cards 48x1 Gbps, 1 card 4x10 Gbps, FW service module)
External connection
• Exclusive: 1 Gbps (to FZK) + 10 Gbps (CESNET)
• Shared: 10 Gbps (PASNET – GEANT)
• Not enough for ATLAS T2D limit (5 MB/s to/from T1s)
• Perfsonar installed
[Figure panels: FZK -> FZU, FZU -> FZK, PASNET link]
Miscellaneous items
• Torque server performance
  o W jobs, sometimes long response time
  o divide Golias in 2 clusters with 2 torque instances?
  o memory limits for ATLAS and ALICE queues
• CVMFS
  o used by ATLAS, works well
  o some older nodes have too small disks -> excluded for ATLAS
• Management
  o Cfengine v2 used for production
  o Puppet used for IPv6 testbed
• 2 new 64 core nodes
  o SGI Rackable H2106-G7, 128 GB RAM, 4x Opteron 6274 2.2 GHz, 446 HS06
  o frequent crashes when loaded with jobs
• Another 2 servers with Intel SB expected
  o small subclusters with different hw
Water cooling
• Active vs passive cooling doors
  o 1 new rack with cooling doors
  o 2 new cooling doors on APC racks
Water cooling
• good sealing crucial
[Figure labels: diskservers on / off (divider added); diskservers; worker nodes; rubus01]
Distributed Tier2, Tier3s
• Networking infrastructure (provided by CESNET) connects all Prague institutions involved
  o Academy of Sciences of the Czech Republic
    • Institute of Physics (FZU, Tier-2)
    • Nuclear Physics Institute
  o Charles University in Prague
    • Faculty of Mathematics and Physics
  o Czech Technical University in Prague
    • Faculty of Nuclear Sciences and Physical Engineering
    • Institute of Experimental and Applied Physics
• Now only NPI hosts resources visible in Grid
  o Many reasons why others do not: manpower, suitable rooms, lack of IPv4 addresses
• Data Storage group at CESNET
  o deployment for LHC projects discussed
• Thanks to my colleagues for help with preparation of these slides:
  o Marek Eliáš
  o Lukáš Fiala
  o Jiří Horký
  o Tomáš Hrubý
  o Tomáš Kouba
  o Jan Kundrát
  o Miloš Lokajíček
  o Petr Roupec
  o Jana Uhlířová
  o Ota Velínský