Tier0 Status
Tony Cass. LCG-LHCC Referees Meeting, 16th February 2009. Agenda: Resources; CASTOR status and performance; Progress with new data centre project.
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Tier0 Status - 1
Tier0 Status
Tony Cass
LCG-LHCC Referees Meeting, 16th February 2009
Tier0 Status - 2
Agenda
• Resources
• CASTOR status and performance
• Progress with new data centre project
Tier0 Status - 4
Status of 2009 procurements
November Status
• CPU
  – First batch
    • Ordered out in late August
    • Delivery before November
    • Production in December or early 2009
  – Second batch
    • Received the tender answers
    • Target FC approval in December
    • Delivery before March 2009
    • Production in March – April 2009
• Disk
  – First batch
    • FC approval last week
    • Delivery in December
    • Production January 2009
  – Second batch
    • Received the tender answers
    • Target FC approval in December
    • Delivery before March 2009
    • Production in March – April 2009
• Tape
  – Media availability not a problem but exact procurement schedule depends on progress with new repack service between now and beginning of 2009
2 of 3 batches already on site
FC approval not required, but delivery schedule unchanged (installation depends on readiness of racks)
January February
70 Sun T10KB drives ordered (1TB/cartridge)
T10KA drives to be phased out as repack advances.
No orders issued following December statement on likely schedule
Tier0 Status - 5
Procurements: 2009 Status & 2010 outlook
• CPU & Disk
  – ~60% of foreseen 2009 pledges available in April
  – (Additional ATLAS request not included)
  – Balance to be operational in October
    • Tight schedule, but agreed with Purchasing dept.
    • Exploring options to purchase iSCSI disk storage
      – Greater cost/TB, but avoids interruption to CASTOR service due to disk server failure (#1 cause of incidents; disk failures are handled transparently)
  – 2010 procurement planning underway
    • Tenders issued in June; adjudication in ~November.
• Tape
  – Expect ~20PB spare capacity by October.
  – Will purchase “high density” IBM robot in autumn
    • 14,000 slots, 14PB
  – Can convert an existing IBM robot to “high density” version in 2010 (with no service interruption) if additional capacity required.
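The robot capacity quoted on this slide follows from simple arithmetic. A minimal sketch, assuming one cartridge per slot, 1TB per cartridge (the value implied by 14,000 slots giving 14PB) and decimal units (1PB = 1000TB); the function name is illustrative only.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def robot_capacity_pb(slots: int, tb_per_cartridge: float) -> float:
    """Capacity in PB of a fully populated tape robot, one cartridge per slot."""
    return slots * tb_per_cartridge / TB_PER_PB


# 14,000 slots at an assumed 1 TB per cartridge
print(robot_capacity_pb(14_000, 1.0))  # 14.0
```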
Tier0 Status - 6
Resource Usage Efficiency
• CPU/Wall ratio has long been a concern:
• But utilisation of the public LXBATCH cluster is generally high:
• Still, we see many jobs waiting for tape recalls
  – New “backfill” option introduced to schedule short jobs when long waits for tape expected.
  – Nice improvement seen:
  – Need to review settings and publicise to improve impact.
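The backfill idea can be illustrated with a toy scheduler. This is a minimal sketch only, not the batch system's implementation or the settings used on LXBATCH: it assumes each queued job carries a run-time estimate and that the expected tape-recall wait is known. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_runtime_min: int  # hypothetical run-time estimate, in minutes


def pick_backfill_jobs(queue, expected_wait_min):
    """Greedily pick queued jobs, shortest first, that fit in the idle window."""
    chosen = []
    remaining = expected_wait_min
    for job in sorted(queue, key=lambda j: j.est_runtime_min):
        if job.est_runtime_min <= remaining:
            chosen.append(job)
            remaining -= job.est_runtime_min
    return chosen


queue = [
    Job("long-reco", 600),
    Job("short-a", 20),
    Job("short-b", 45),
    Job("medium", 120),
]
# A slot expected to wait 90 minutes for a tape recall can run the two short jobs.
print([j.name for j in pick_backfill_jobs(queue, expected_wait_min=90)])
```

The greedy shortest-first choice is one simple policy among many; it is used here only to show how short jobs can fill a slot that would otherwise sit idle.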
Tier0 Status - 7
SLC5 Migration
• Migration of batch resources underway
  – All new capacity introduced will be SLC5 based
  – Existing capacity migrated progressively.
• Migration of LXPLUS alias is an issue:
  – Principle is easy: switch when majority of batch capacity is SLC5. But measured where?
    • @ CERN: switch early
    • on grid: switch late.
  – No clear/obvious solution yet.
    • [Rapid migration of other grid sites would help. And is maybe sensible before September anyway?]
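The ambiguity in the "majority" rule can be made concrete. A minimal sketch with invented capacity figures (not from the presentation): the same rule gives different answers depending on whether the SLC5 share is measured at CERN or across the grid.

```python
def slc5_fraction(slc5_capacity: float, total_capacity: float) -> float:
    """Share of batch capacity already running SLC5."""
    return slc5_capacity / total_capacity


def should_switch_alias(slc5_capacity: float, total_capacity: float) -> bool:
    """Majority rule: switch the alias once more than half the capacity is SLC5."""
    return slc5_fraction(slc5_capacity, total_capacity) > 0.5


# Hypothetical figures, in arbitrary capacity units
cern = {"slc5": 60, "total": 100}
grid = {"slc5": 150, "total": 500}

print(should_switch_alias(cern["slc5"], cern["total"]))  # True: switch early
print(should_switch_alias(grid["slc5"], grid["total"]))  # False: switch late
```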
Tier0 Status - 9
Agenda
• Resources
• CASTOR status and performance
  – Upstream services (SRM, FTS)
  – CASTOR status & plans
  – Metrics
• Progress with new data centre project
Tier0 Status - 10
SRM & FTS
November Status
• SRM 2.7 release is delayed
  – Originally foreseen in June but has still not passed testing/certification
  – Continue with 1.3 until LHC shutdown
    • SLC3: hardware running out of warranty; retire/replace
    • Cannot be deployed in a fully redundant configuration
    • Built with an old CASTOR client, which constrains the stager deployment
• FTS 2.1 passed certification too close to LHC startup
  – Continue with 2.0 service (SLC3)
  – Setting up an independent 2.1 production service (SLC4) in parallel, allowing VOs to move when convenient
Pre-production clusters in service for all LHC VOs
Production deployment before end-2008
FTS 2.1 production service available
Still being “tested” by experiments but most production transfers already with this version
Tier0 Status - 11
CASTOR Status & Plans
• Status
  – Generally quiet/good...
  – ... except for tape repack
• BUT we are reasonably confident about our ability to support production; user analysis is the concern and there is no major load.
  – CASTOR 2.1.8, with integrated xrootd redirector, should deliver improvements for analysis
    • LSF bypass & reduced latency, but also improved scalability as xrootd daemon has smaller footprint than rfio (to be deprecated?)
    • Also delivers
      – end-to-end checksumming for rfio
      – User space accounting (required for later deployment of quotas)
      – operational improvements (notably automatic draining of disk servers)
      – fixes to problems identified by repack (main reason for deployment delays)
    • Schedule: end-Feb release, in production on c2cernt3 end-March, deployment for experiment instances in April.
Tier0 Status - 12
Performance metrics
November Status
• Metrics have been implemented and deployed on preproduction cluster
  – Data collected in lemon
  – RRD graphs not yet implemented
• Production deployment delayed for several reasons
  – New metrics imply several changes to exception/alarms and automated actions used in production
  – An unexpected technical dependency on the late SRM 2.7 version
    • Ongoing work to back-port the implementation
All still true
Much progress, but little visible; considering how best to group metrics for display
• e.g. group cache hits and garbage collection activity? However...
Tier0 Status - 17
New data centre project
• Reminder: the selected strategy is to do a single tender for an overall solution
• Four-phase process developed:
  1. Request (many) conceptual designs
  2. Commission 3-4 companies submitting conceptual designs to develop an outline design
  3. In-house, turn a selected outline design into plans and documents enabling
  4. Single tender for overall construction.
Tier0 Status - 18
Outline Design Phase
November Status
• Deadline: 28th November
  – Contacts with all 4 companies during design phase
  – All 4 companies say deadline will be met
• Meetings to review proposed designs scheduled in week of December 8th.
• Market Survey in preparation as first stage in selection of company for detailed design & construction.
• Discussions in Oslo on 28th November to further investigate possible remote server installation in 2011 (and beyond)
  – RAL also have power available in 2011, but not as much and for a shorter period.
Tier0 Status - 19
Current Status
• Four designs reviewed
  – No clear winner, but consensus on leading design.
• New Management supports project. Good, but…
  – New requirements: “Green” & Prévessin heat recovery option
  – New organisation brings new players to brief
• “Single Contract for construction” agreed
• Agreement to work with one company to deliver fully acceptable design with modifications for new requirements.
  – Will lead to ~6 month delay.
  – [Personal view] Plan to continue with only one company should be agreed by Directorate now to avoid potential hiccups later. Frédéric Hemmer discussing with Sergio Bertolucci.
• Will need to revisit option to install equipment at University of Oslo.
Tier0 Status - 20
Questions?
Comments?