RAL Tier A: Tim Adye, Rutherford Appleton Laboratory. BaBar Collaboration Meeting, Imperial College, London, 12th September 2002

RAL Tier A

Tim Adye

Rutherford Appleton Laboratory

BaBar Collaboration Meeting

Imperial College, London

12th September 2002

Hardware

• 104 “noma”-like machines allocated to BaBar
• 156 + old farm shared with other experiments
• 6 BaBar Suns (4-6 CPUs each)
• 20 TB disk for BaBar
• Also using ~10 TB of pool disk for data transfers
• All disk servers on Gigabit ethernet
• Pretty good server performance
• … as well as existing RAL facilities
• 622 Mbit/s network to SLAC and elsewhere
• RAL connection now 2.5 Gbit/s
• AFS server
• 100 TB tape robot (-> 330 TB -> 1 PB)
• Many years’ experience running BaBar software

Problems

• Disk problems tracked down to a bad batch of drives
• All drives are now being replaced by the manufacturer
• Our disks should be done in ~1 month
• By using spare servers, replacement shouldn’t interrupt service
• Some (inevitable) scaling problems due to the major expansion in the system
• Now that installation and (most) BaBar-requested features are set up, support staff can concentrate on reliability

Support

• Initially suffered from lack of support staff and out-of-hours support
• Two new system managers now in post
• Two more being recruited (one just for BaBar)
• Additional staff have been able to help with problems at weekends
• Discussing more formal arrangements

RAL Batch CPU Use

[Chart: CPU hours per week (normalised to P450), 0 to 100,000, by week beginning. Series: SP, UK Users, Non-UK Users]

Full usage at full efficiency of BaBar CPUs = 106,624 Hours/Week; 59,733 according to MOU
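
The two capacity figures quoted under the chart can be compared with a line of arithmetic. The following is only a minimal sketch: it assumes nothing beyond the two numbers on the slide, and the constant and function names are illustrative, not taken from the talk.

```python
# Capacity figures quoted on the slide, in P450-normalised CPU hours per week.
FULL_CAPACITY_HOURS = 106_624   # all BaBar CPUs at full efficiency
MOU_HOURS = 59_733              # level according to the MOU


def fraction_of(hours_used: float, reference_hours: float) -> float:
    """Return hours_used as a fraction of a reference capacity."""
    if reference_hours <= 0:
        raise ValueError("reference_hours must be positive")
    return hours_used / reference_hours


# Share of the full-efficiency capacity that the MOU figure represents.
mou_share = fraction_of(MOU_HOURS, FULL_CAPACITY_HOURS)
print(f"MOU level is {mou_share:.1%} of full capacity")
```

The same helper could be applied to any weekly total read off the chart to express it against either reference level.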

RAL Batch Users (running at least one non-trivial job each week)

[Chart: users per week, 0 to 30, by week beginning. Series: Non-UK Users, UK Users]

A total of 113 new BaBar users registered since December

Data at RAL

• All data in Kanga format is at RAL
• 19 TB currently on disk
• Series-8 + series-10 + reskimmed series-10
• AllEvents + streams
• Data + signal + generic MC
• New data copied from SLAC within 1-2 days
• RAL is now the primary Kanga analysis site
• See Nicole’s talk for details

Changes since July

• Two new RedHat 6 front-end machines
• Dedicated to BaBar use
• Login to babar.gridpp.rl.ac.uk
• Trial RedHat 7.2 service
• One front-end and (currently) 5 batch workers
• Once we are happy with the configuration, many/all of the rest of the batch workers will be rapidly upgraded
• ssh AFS token passing installed on front-ends
• So, your local (e.g. SLAC) token is available when you log in
• Trial Grid Gatekeeper available (EDG 1.2)
• Allows job submission from the Grid
• Improved new user registration procedures

Plans

• Upgrade full farm to RedHat 7.2
• Leave RedHat 6 front-end for use with older releases
• Upgrade Suns to Solaris 8 and integrate into PBS queues
• Install dedicated data import-export machines
• Fast (Gigabit) network connection
• Special firewall rules to allow scp, bbftp, bbcp, etc.
• AFS authentication improvements
• PBS token passing and renewal
• Integrated login (AFS token on login, like SLAC)

Plans

• Objectivity support
• Works now for private federations, but no data import
• Support Grid “generic accounts”, so special RAL user registration is no longer necessary
• Procure next batch of hardware
• Delivery probably early 2003

Summary

• Significant hardware available, and now being fully used
• Disk problems now understood and being fixed
• Improvements planned and underway to make using RAL as SLAC-like as possible (but faster, and maybe better!)
• Join us!
• See BaBar home page -> New Accounts
• Contact Emmanuel Olaiya (at SLAC) or me (at RAL) for help