Tier1A Status
Martin Bly, 28 April 2003
CPU Farm
• Older hardware:
  – 108 dual-processor systems (450MHz, 600MHz and 1GHz)
  – 156 dual-processor 1400MHz PIII systems
• Recent delivery:
  – 80 dual 2.66GHz P4 Xeon (533MHz FSB, 2GB memory)
• Next delivery expected in the summer
Operating Systems
• Operating Systems:
  – Red Hat 6.2 service will close in May
  – Red Hat 7.2 service has been in production for BaBar for 6 months
  – New Red Hat 7.3 service now available for LHC and other experiments
• The increasing demand for security updates is becoming problematic.
Disk Farm (last year)
• Last year: 26 servers, each with 2 external RAID arrays, 1.7TB disk per server:
  – Excellent performance, well-balanced system
  – Problems with a bad batch of Maxtor drives: many failures and a high error rate; all 620 drives have now been replaced by Maxtor
  – Still outstanding problems with the Accusys controller failing to eject bad drives from the RAID set
Disk Farm (this year)
• Recent upgrade to the disk farm:
  – 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
  – 12 Maxtor 200GB DiamondMax Plus 9 drives per array
• Not yet in production, and there have been a few snags:
  – The originally tendered Maxtor Maxline Plus II drive was found not to exist
  – The Infortrend array has a 2TB limit per RAID set, so some space (10%) is wasted!
• Contact Nick White for more information
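To see how a per-RAID-set size cap turns into wasted space, here is a rough calculation. It assumes (our assumption; the slide does not give the layout) that each array's 12 drives form a single RAID 5 set with one drive's worth of parity and no hot spare, and that the 2TB limit means 2000GB in the drives' decimal units. The function names are illustrative only.

```python
def raid_capacity_gb(drives: int, drive_gb: float, parity_drives: int) -> float:
    """Usable capacity of a parity RAID set (e.g. RAID 5 uses one parity drive)."""
    return (drives - parity_drives) * drive_gb


def wasted_fraction(drives: int, drive_gb: float, parity_drives: int,
                    set_limit_gb: float) -> float:
    """Fraction of usable capacity lost when a RAID set is capped at set_limit_gb."""
    usable = raid_capacity_gb(drives, drive_gb, parity_drives)
    if usable <= set_limit_gb:
        return 0.0
    return (usable - set_limit_gb) / usable


if __name__ == "__main__":
    # Assumed layout: 12 x 200GB drives, RAID 5 (1 parity drive), 2TB cap taken as 2000GB.
    usable = raid_capacity_gb(12, 200, 1)
    lost = wasted_fraction(12, 200, 1, 2000)
    print(f"usable={usable:.0f}GB, wasted={lost:.1%}")  # usable=2200GB, wasted=9.1%
```

Under these assumptions about 9% of each array's usable space is lost, consistent with the slide's "(10%)"; a hot spare, a different RAID level, or a binary (2^41 byte) limit would change the figure.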
New Projects
• Basic fabric performance monitoring (Ganglia)
• CPU resource accounting (based on PBS accounting data and MySQL)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (May)
Ganglia Monitoring
• Live performance and utilisation monitoring was urgently needed:
  – RAL Ganglia Monitoring (live)
  – RAL Ganglia Monitoring (static)
• Scalable solution based on multicast
• Very rapidly deployable, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
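Because Ganglia's gmond daemons share metrics over multicast, any node can report the state of the whole cluster, and gmond makes that state available as an XML document. The sketch below summarises such a dump by computing the mean one-minute load per cluster. The element and attribute names (`GANGLIA_XML`, `CLUSTER`, `HOST`, `METRIC` with `NAME`/`VAL`, and the `load_one` metric) follow gmond's XML output as we understand it; the sample document, cluster and host names are hypothetical, and fetching the XML from a live node is left out.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical gmond-style XML dump (real dumps carry many more attributes and metrics).
SAMPLE = """<GANGLIA_XML>
  <CLUSTER NAME="tier1a-cpu">
    <HOST NAME="node001"><METRIC NAME="load_one" VAL="1.50" TYPE="float"/></HOST>
    <HOST NAME="node002"><METRIC NAME="load_one" VAL="0.50" TYPE="float"/></HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def mean_metric_by_cluster(xml_text: str, metric: str = "load_one") -> dict:
    """Return {cluster_name: mean value of `metric` over the hosts that report it}."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("CLUSTER"):
        values = [
            float(m.get("VAL"))
            for host in cluster.iter("HOST")
            for m in host.iter("METRIC")
            if m.get("NAME") == metric
        ]
        if values:  # skip clusters where no host reports this metric
            result[cluster.get("NAME")] = mean(values)
    return result


if __name__ == "__main__":
    print(mean_metric_by_cluster(SAMPLE))  # {'tier1a-cpu': 1.0}
```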
PBS Accounting Software
• Need to keep track of system CPU and disk usage.
• Home-grown PBS accounting package (Derek Ross):
  – Upload PBS and disk statistics into MySQL
  – Process with a Perl DBI script
  – Serve via Apache
• http://www.gridpp.rl.ac.uk/stats
• Contact Derek for more information.
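The package itself uses Perl DBI over MySQL. As an illustration of the same pipeline (parse PBS accounting records, load them into SQL, aggregate), here is a Python sketch that swaps in the standard-library `sqlite3` module as a stand-in for MySQL. It assumes the usual PBS accounting-log layout of `date time;record type;job id;key=value ...`, with job-end (`E`) records carrying `resources_used.cput` as HH:MM:SS; the sample lines, users, groups and table layout are hypothetical.

```python
import sqlite3

# Hypothetical PBS accounting lines (job-end "E" records; real ones carry more fields).
SAMPLE_LOG = [
    "04/28/2003 10:15:02;E;101.pbsserver;user=alice group=babar queue=prod "
    "resources_used.cput=01:30:00 resources_used.walltime=02:00:00",
    "04/28/2003 11:20:45;E;102.pbsserver;user=bob group=lhcb queue=prod "
    "resources_used.cput=00:45:00 resources_used.walltime=00:50:00",
    "04/28/2003 11:21:00;Q;103.pbsserver;queue=prod",  # queued record: ignored
]


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS duration (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def parse_end_record(line: str):
    """Return (job_id, user, group, cpu_seconds) for an 'E' record, else None."""
    fields = line.split(";", 3)
    if len(fields) != 4 or fields[1] != "E":
        return None
    attrs = dict(kv.split("=", 1) for kv in fields[3].split() if "=" in kv)
    cput = attrs.get("resources_used.cput", "00:00:00")
    return fields[2], attrs.get("user"), attrs.get("group"), hms_to_seconds(cput)


def cpu_hours_by_group(lines) -> dict:
    """Load E records into SQLite (standing in for MySQL) and sum CPU hours per group."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, user TEXT, grp TEXT, cpu_s INTEGER)")
    rows = [r for r in map(parse_end_record, lines) if r is not None]
    db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)
    query = "SELECT grp, SUM(cpu_s) / 3600.0 FROM jobs GROUP BY grp ORDER BY grp"
    result = dict(db.execute(query).fetchall())
    db.close()
    return result


if __name__ == "__main__":
    print(cpu_hours_by_group(SAMPLE_LOG))  # {'babar': 1.5, 'lhcb': 0.75}
```

Keeping the raw per-job rows in SQL and aggregating at query time means the same table can later serve other reports (per user, per queue, per day) without re-parsing the logs.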
MAUI/PBS
• The MAUI scheduler has been in production for the last 3 months.
• Allows extremely flexible scheduling with many features, but:
  – Not all of it works; we have done much work with the developers on fixes.
  – Major problem: MAUI schedules on wall-clock time, not CPU time. We had to bodge it!
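To illustrate why charging on wall-clock time rather than CPU time matters for fair share, the sketch below compares the two usage measures per group for a set of finished jobs. The job figures and group names are hypothetical, and this is not the workaround used at RAL, which the slide does not describe.

```python
from collections import defaultdict

# Hypothetical finished jobs: (group, cpu_seconds, walltime_seconds).
JOBS = [
    ("babar", 3600, 3700),  # CPU-bound: CPU time close to wall time
    ("lhcb", 900, 3600),    # I/O-bound: mostly waiting, little CPU used
]


def usage_by_group(jobs):
    """Return {group: (cpu_hours, wall_hours, cpu_efficiency)}."""
    cpu = defaultdict(int)
    wall = defaultdict(int)
    for group, cpu_s, wall_s in jobs:
        cpu[group] += cpu_s
        wall[group] += wall_s
    return {
        g: (cpu[g] / 3600, wall[g] / 3600, cpu[g] / wall[g] if wall[g] else 0.0)
        for g in wall
    }


if __name__ == "__main__":
    for group, (cpu_h, wall_h, eff) in sorted(usage_by_group(JOBS).items()):
        print(f"{group}: cpu={cpu_h:.2f}h wall={wall_h:.2f}h efficiency={eff:.0%}")
```

With these made-up numbers, a scheduler charging wall-clock time bills the I/O-bound group for 1.00h although it used only 0.25h of CPU, while the CPU-bound group's two measures are nearly equal.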
New Helpdesk Software
• The old helpdesk is mail-based and unfriendly.
• With additional staff, we urgently need to deploy a new solution.
• The new system is expected to be based on free software, probably Request Tracker.
• We hope the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
• Deployment expected by the end of May.
• http://requestracker.gridpp.rl.ac.uk/ (static)
Outstanding Issues/Worries
• We have to run many distinct services, for example Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG …
• Farm management is getting very complex. We need better tools and automation.
• Security is becoming a big concern again.