GridPP3 project status
Sarah Pearce, 14 April 2010
GridPP24, RHUL
Since last GridPP meeting
• LHC has turned on
– First collisions in December
– “First physics” on 30 March
– Grid seems to be working as expected for all the experiments so far
• Tier-1 had some ‘settling in’ problems, mostly resolved
• R89 officially opened
• 2nd tranche of T2 hardware grants issued (?)
• EPSRC review
• EGI/ROSCOE/CUE/GridTalk proposals submitted and variously accepted or rejected
• GridPP4 proposal written and submitted
Tier-1
• Acceptance problems with one tranche of disk delivery. Now resolved – drives replaced with those of a different manufacturer
• Three triggers for disaster management from R89:
– Cooling failures in August – remedial work undertaken, and work to address Building Management System problems planned
– Water leak onto tape robot – audit of water sources and rectification carried out
– Impedance problems with the UPS supply – remedial work ongoing
• Tier-1 procurements have been carried out. Disk and CPU have been delivered and are undergoing acceptance testing
• R89 opened on 30 March – same day as LHC event
EPSRC review
Report explicitly mentions:
• GridPP’s expertise in large-scale distributed data management and analysis.
• Our work with start-up companies.
• The substantial secondary economic benefit arising from the ability to rapidly screen drugs “in-silico”. GridPP resources were used in this way to screen potential agents in the fight against bird-flu and malaria.
NGS and GridPP have been highly successful, providing many users with access to more computing power than they could otherwise easily obtain. Looking forward, we recommend that these efforts, including enhanced capacity and function of distributed storage, be sustained and expanded.
EGI-Inspire etc.
• Wide range of project bids submitted to the EC in November:
– EGI-Inspire. In negotiation, will go ahead with small changes.
• UK involved in security, helpdesk, monitoring, regional support (and training)
– GridTalk-II. In negotiation, will go ahead.
• QMUL and IC UK partners: GridBriefings, GridCasts, GridCafe, RTM…
• May change name?
– ROSCOE (SSC including HEP) – not successful.
– CUE (Training, outreach) – not successful
December performance
Tier-1
• Efficiency for all 4 LHC experiments over 90%
• Delivered a record amount of CPU to wLCG
Tier-2s
• ScotGrid 98% availability / 98% reliability
• NorthGrid 98% availability / 98% reliability
• SouthGrid 97% availability / 98% reliability
• LondonGrid 91% availability / 94% reliability
All above 90% threshold
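The threshold check above can be illustrated with a minimal sketch. The figures are those quoted on the slide; reading the first number of each pair as availability, and the names `TIER2_DECEMBER` and `below_threshold`, are assumptions made for illustration and are not part of any GridPP tooling.

```python
# Tier-2 (availability %, reliability %) for December, as quoted on the slide.
# Assumption: the first figure of each pair is availability.
TIER2_DECEMBER = {
    "ScotGrid": (98, 98),
    "NorthGrid": (98, 98),
    "SouthGrid": (97, 98),
    "LondonGrid": (91, 94),
}

THRESHOLD = 90  # percent, from "All above 90% threshold"


def below_threshold(figures, threshold=THRESHOLD):
    """Return names whose availability or reliability is not strictly above threshold."""
    return [
        name
        for name, (availability, reliability) in figures.items()
        if availability <= threshold or reliability <= threshold
    ]


if __name__ == "__main__":
    failing = below_threshold(TIER2_DECEMBER)
    if failing:
        print("Below threshold:", ", ".join(failing))
    else:
        print("All above threshold")
```

LondonGrid has the least headroom: its 91% availability is one point above the threshold.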
UKI CPU contribution
[Charts: CPU April 2010; panels “2009 up to GridPP23” and “Since GridPP23”]
UKI Tier-1 & Tier-2 contribution
[Charts: panels “GridPP22-GridPP23” and “Since GridPP23”]
CPU efficiency, September 09 – April 10
Storage
• From gstat (and previous talks…)
[Charts: storage at September 2008, March 2009, September 2009 and April 2010]
Since data taking started
Project Map GridPP3 Q4 08
[Figure: GridPP3 project map. GridPP3 Goal: to provide UK computing for the Large Hadron Collider. The map is a grid of numbered metric and milestone cells (1.1.1 to 6.4.17) grouped under six numbered areas. Labels on the map: ATLAS, LHCb, CMS, Other experiments, Grid services, Operations, Security, Middleware support, Data and storage, Network, Tier-1, Hardware procurement, Storage systems, Front end systems, Resource delivery, Tier-2 Management, LondonGrid, ScotGrid, SouthGrid, NorthGrid, Planning, Execution, External, EGEE, NGI, LCG, Outreach.]
Project Map Q4 09
Project map - statistics
[Charts: Metrics; Milestones]
Experiments - red metrics
ATLAS
• One red metric, due to problems with T1 disk acceptance
• Job success rate and T1 data availability up
LHCb
• Red metrics similar to last quarter. Low numbers of jobs account for some.
• RAL downtime from database issues reduced T1 LHCb SAM tests uptime
• T2 SAM tests low due to issues with EFDA and UCL
CMS
• ‘Good quarter’
Other experiments
• Efficiency up to 82% in December, with ALICE data
• Fractions used by other experiments down (7.6% of T1 CPU, compared with 14% in Q3)
• New users: T2K, Super-B, SuperNemo
Grid services
Operations
• 2.1.3 Fraction of KSI2k used (Target 80%, achieved 61%).
• 2.1.6 Job success rates – no longer use this metric, as SLL test results unreliable
Security
• One red milestone – site security review. Questionnaires sent out in December. Mingchao will report at this meeting
• Security incident at Oxford: well handled
Tier-1
• T1 operated well during data taking in Nov/Dec
• Work on UPS supply problems continuing. Other issues resolved (cooling, water leak)
• One tranche of disk capacity failed acceptance – now solved
• MoU commitment for tape not met, due to change in requirements because of LHC schedule
• Milestone 3.4.21: General ADS Service Ends. Major users have been migrated. Considering next steps.
• Milestone 3.3.31: R89 document available. A document detailing the transition to R89 was published in March and is available on request.
Tier-2s
% of promised disk and CPU available – green for all Tier-2s (metrics 1&2).
SAM availability and reliability tests green or orange (so above 90%) for most Tier-2s (metrics 3&4).
Metric 5 (SLL ATLAS test) now suspended
Other red metrics:
• Average SLL SE test performance (metric 6) London
• CPU utilisation (wall clock time & CPU time, metrics 7/8) LondonGrid, SouthGrid
• Number of management meetings NorthGrid (metric 11)
• Tier-2 meeting LCG MoU service levels (metric 14) LondonGrid – UCL-central: slow to install kernel update, so site taken offline
Management and external
Project execution – red metrics
• Quarterly reports target not met (too much focus on GridPP4!)
Rest of Map
• No red metrics
Finances - summary
Finances: Tier-1 hardware
• Tier-1 disk purchase FY09 increased in order to address potential shortfall: now allows for half a year buffer in disk
• £661k moved forward into FY09 from FY10, at the request of STFC
• Hardware requirements re-profiled, as a result of changes in LHC schedule
• Likely annual charge for networking costs, due to new arrangement between research councils and JISC
Finances: Tier-2s and staff
• Second tranche of Tier-2 hardware agreed
• 0.5 FTE CMS support post at RAL delayed due to recruitment restrictions. Increased to 1 FTE for FY10, to compensate for delay.
• Bridging funding to retain expertise of some EGEE-funded staff where posts are envisaged to continue in GridPP4.
In parallel
EGEE moves to EGI
And data comes…