GridPP3 project status Sarah Pearce 14 April 2010 GridPP24 RHUL




Since last GridPP meeting

• LHC has turned on
  – First collisions in December
  – “First physics” 30 March
  – Grid seems to be working as expected for all the experiments so far
• Tier-1 had some ‘settling in’ problems, mostly resolved
• R89 officially opened
• 2nd tranche of T2 hardware grants issued (?)
• EPSRC review
• EGI/ROSCOE/CUE/GridTalk proposals submitted and variously accepted or rejected
• GridPP4 proposal written and submitted


Tier-1

• Acceptance problems with one tranche of disk delivery. Now resolved - drives replaced with those of a different manufacturer
• Three triggers for disaster management from R89:
  – Cooling failures in August – remedial work undertaken, and work to address Building Management System problems planned
  – Water leak onto tape robot – audit of water sources and rectification carried out
  – Impedance problems with the UPS supply – remedial work ongoing
• Tier-1 procurements have been carried out. Disk and CPU have been delivered and are undergoing acceptance testing
• R89 opened on 30 March – same day as LHC event


EPSRC review

Report explicitly mentions:

• GridPP’s expertise in large-scale distributed data management and analysis.

• Our work with start-up companies.

• The substantial secondary economic benefit arising from the ability to rapidly screen drugs “in-silico”. GridPP resources were used in this way to screen potential agents in the fight against bird-flu and malaria.

NGS and GridPP have been highly successful, providing many users with access to more computing power than they could otherwise easily obtain. Looking forward, we recommend that these efforts, including enhanced capacity and function of distributed storage, be sustained and expanded.


EGI-Inspire etc.

• Wide range of project bids submitted to the EC in November:
  – EGI-Inspire. In negotiation, will go ahead with small changes.
    • UK involved in security, helpdesk, monitoring, regional support (and training)
  – GridTalk-II. In negotiation, will go ahead.
    • QMUL and IC UK partners: GridBriefings, GridCasts, GridCafe, RTM…
    • May change name?
  – ROSCOE (SSC including HEP) – not successful.
  – CUE (Training, outreach) – not successful


December performance

Tier-1
• Efficiency for all 4 LHC experiments over 90%
• Delivered a record amount of CPU to wLCG

Tier-2s
• ScotGrid 98% availability / 98% reliability
• NorthGrid 98% availability / 98% reliability
• SouthGrid 97% availability / 98% reliability
• LondonGrid 91% availability / 94% reliability

All above 90% threshold
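The threshold check on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `TIER2_DECEMBER`, `THRESHOLD` and `below_threshold` are hypothetical, the figures are those quoted above, and it assumes that the first number in each pair is availability and that "above" means strictly greater than 90%.

```python
# Availability / reliability percentages as quoted on the slide (December).
# Assumption: the first figure in each pair is availability.
TIER2_DECEMBER = {
    "ScotGrid": (98, 98),
    "NorthGrid": (98, 98),
    "SouthGrid": (97, 98),
    "LondonGrid": (91, 94),
}

THRESHOLD = 90  # percent, the threshold named on the slide


def below_threshold(figures, threshold=THRESHOLD):
    """Return the Tier-2s whose availability or reliability is not above the threshold."""
    return [
        name
        for name, (availability, reliability) in figures.items()
        if availability <= threshold or reliability <= threshold
    ]


print(below_threshold(TIER2_DECEMBER))  # → []
```

An empty list matches the slide's statement that all four Tier-2s are above the threshold.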


UKI CPU contribution

[Charts: CPU, April 2010. Panels: since GridPP23; 2009 up to GridPP23]


UKI Tier-1 & Tier-2 contribution

[Charts. Panels: since GridPP23; GridPP22-GridPP23]


CPU efficiency, September 09 – April 10


Storage

• From gstat (and previous talks…)

[Charts. Panels: September 2008; March 2009; September 2009; April 2010]


Since data taking started


Project Map GridPP3 Q4 08

[Figure: project map. GridPP3 Goal: To provide UK computing for the Large Hadron Collider. Box labels recoverable from the figure: ATLAS, LHCb, CMS, Other experiments, Grid services, Operations, Security, Middleware support, Data and storage, Network, Tier-1, Hardware procurement, Storage systems, Front end systems, Resource delivery, Tier-2 Management, LondonGrid, NorthGrid, ScotGrid, SouthGrid, Planning, Execution, Outreach, External, EGEE, NGI, LCG. The grid of numbered metrics and milestones (1.1.1 onwards) is not reproduced.]


Project Map Q4 09


Project map - statistics

[Charts. Panels: Metrics; Milestones]


Experiments - red metrics

ATLAS

• One red metric, due to problems with T1 disk acceptance

• Job success rate and T1 data availability up

LHCb

• Red metrics similar to last quarter. Low numbers of jobs account for some.

• RAL downtime from database issues reduced T1 LHCb SAM tests uptime

• T2 SAM tests low due to issues with EFDA and UCL

CMS

• ‘Good quarter’

Other experiments

• Efficiency up to 82% in December, with ALICE data

• Fractions used by other experiments down (7.6% of T1 CPU, compared with 14% in Q3)

• New users: T2K, Super-B, SuperNemo


Grid services

Operations

• 2.1.3 Fraction of KSI2k used (Target 80%, achieved 61%).

• 2.1.6 Job success rates – no longer use this metric, as SLL test results unreliable

Security

• One red milestone – site security review. Questionnaires sent out in December. Mingchao will report at this meeting

• Security incident at Oxford: well handled


Tier-1

• T1 operated well during data taking in Nov/Dec

• Work on UPS supply problems continuing. Other issues resolved (cooling, water leak)

• One tranche of disk capacity failed acceptance – now solved

• MoU commitment for tape not met, due to change in requirements because of LHC schedule

• Milestone 3.4.21: General ADS Service Ends. Major users have been migrated. Considering next steps.

• Milestone 3.3.31: R89 document available. A document detailing the transition to R89 was published in March and is available on request.


Tier-2s

% of promised disk and CPU available – green for all Tier-2s (metrics 1&2).

SAM availability and reliability tests green or orange (so above 90%) for most Tier-2s (metrics 3&4).

Metric 5 (SLL ATLAS test) now suspended

Other red metrics:

• Average SLL SE test performance (metric 6) London

• CPU utilisation (wall clock time & CPU time, metrics 7/8) LondonGrid, SouthGrid

• Number of management meetings NorthGrid (metric 11)

• Tier-2 meeting LCG MoU service levels (metric 14) LondonGrid – UCL-central: slow to install kernel update - site taken offline


Management and external

Project execution – red metrics
• Quarterly reports target not met (too much focus on GridPP4!)

Rest of Map
• No red metrics


Finances - summary


Finances: Tier-1 hardware

• Tier-1 disk purchase FY09 increased in order to address potential shortfall: now allows for half a year buffer in disk

• £661k moved forward into FY09 from FY10, at the request of STFC

• Hardware requirements re-profiled, as a result of changes in LHC schedule

• Likely annual charge for networking costs, due to new arrangement between research councils and JISC


Finances: Tier-2s and staff

• Second tranche of Tier-2 hardware agreed
• 0.5 FTE CMS support post at RAL delayed due to recruitment restrictions. Increased to 1 FTE for FY10, to compensate for delay.

• Bridging funding to retain expertise of some EGEE-funded staff where posts are envisaged to continue in GridPP4.


In parallel


EGEE moves to EGI


And data comes…
