gLite – An Outsider’s View

Stephen Burke, RAL

January 31st 2005 gLite overview

Introduction

• A personal view of the current situation
  – Asked to be provocative!
  – Some things may be wrong
• Accurate information can be hard to obtain
• History
• Current situation
• Future

What was supposed to happen

• The original idea was to harden/re-engineer the deployed LCG middleware
  – Short development cycles driven by user feedback
  – No “big bang” releases
  – No major new development
• Autumn 2003: the ARDA RTAG recommended a new architecture based on AliEn
  – Set up a prototype system quickly
  – Rapid development endorsed

What actually happened

• EGEE started well after EDG finished
  – Large gap: December 2003 -> April 2004
• EDG infrastructure (CVS, build system, bug tracking, developer guidelines, testbeds, …) all scrapped
  – New system may be better (?), but it took ~7 months to put it in place
• JRA1 “prototype” was essentially AliEn
  – Only two sites, of which one was hardly supported
  – ARDA project was set up and started using the prototype
    • The members (some with no LCG experience) got used to AliEn
• LCG forged ahead with middleware improvements
  – Middleware quality/stability much improved
  – But the experiments are still unhappy

AliEn -> gLite

• JRA1 wrote architecture and design documents for a major middleware development project
  – EDG experience suggests it will take years, not months
  – Not obviously driven by NA4 or SA1 requirements
  – AliEn pushed aside
    • RB will support pull model as well as push
  – Migration to web services – everyone seems to like this, but what is the real gain in the short term?
    • Web service code mostly not available yet anyway
• Big bang release is back!
  – Hardly any testing so far by SA1/LCG
  – Or most users
  – Information is limited

Testbeds

• EDG testbed(s) and ITeam were successful
  – Both effectively lost in EGEE
• “Prototype” testbed not very useful
  – Effectively just one site, few machines
  – Not really a prototype – misled people about what to expect
• JRA1 testing testbed and test team effectively co-opted for integration
  – Reduced resources for testing
  – Few sites, limited manpower
  – Already ~600 bugs in Savannah, growing rapidly
    • 260 closed, another 130 fixed and being tested
    • EDG had ~2500 bugs by the end
• SA1 PPS just starting at the start of 2005
  – Role still unclear

Workload management

• Development of the EDG/LCG RB
  – Seems to be largely backward-compatible
  – Only user docs so far are the EDG manuals
• Not clear what new features are available
  – Or whether LCG mods are included
• Not AliEn!
  – Support for pull model via new CEMon component
• Still uses BDII with GLUE schema
  – Should change to R-GMA (?)

Data Management

• Very complex design, largely new code
  – No real user documentation, just Javadoc
  – Mostly not delivered yet
  – Not clear how much will be in RC1
• Metadata activities also in ARDA and GridPP
• Still seem to be developing the architecture
  – Particularly the interaction with the WMS
  – WMS hedging its bets
    • Supports both (gLite and LCG) systems
• LCG has also been developing DM tools
  – New file catalogue on its way
  – How do they relate to gLite?

Data Storage

• gLite has no development of its own, relying on SRM projects
• EDG-SE was not stable enough for production
  – Still in development?
• dCache almost ready, but has taken ~18 months and still has many bugs
  – Support unclear
• New LCG Disk Pool Manager
  – Only an alpha version so far
• Is storage management really this hard?
  – Will the “classic SE” ever die?!

R-GMA

• Should be an information system
  – But both LCG and gLite still use BDII
• Some user documentation available
• gLite version is fairly backward compatible with LCG version
  – No web services yet
  – gLite version getting “fast track” into LCG
• Still few users
  – But needed for APEL accounting

Security

• “Security must be built in from the start”
  – So it gets a separate activity!
• EGEE security requirements document nearly identical to EDG D7.5 from May 2002
  – Which was mostly not implemented …
• Both LCG and gLite intend to use VOMS
  – But still not yet integrated with most middleware
  – No real strategy for how to use it?
  – Who “owns” VOMS?

Others

• Package management
  – gLite is developing a software package manager
    • and so is LCG!
  – May be useful, no experience yet
• GAS
  – Came with AliEn, not clear if anyone wants it

Operational issues

• System Design
  – Neither SA1 nor JRA1 has anyone designing how the complete system should work
• Configuration
  – The existing system has a very complex configuration which is the source of many problems
  – Being addressed in JRA1, but not clear if it will really make things better
• Stability and debugging
  – In a big Grid some things are always broken
  – Error messages and logging must allow problems to be traced
  – Services need to be fault-tolerant
  – Not clear if JRA1 is addressing this

What happens next?

• Code being delivered to SA1, will run on PPS
• All serious bugs supposed to be fixed by March
  – EDG experience is that it took >1 year to go from code delivery to production use – some things never made it!
• Migration strategy?
  – Hard if you don’t know what will work
  – LCG has its own developments, especially in data management
  – New R-GMA is largely backward-compatible
    • And not critical yet
  – New RB seems similar to current version
    • At least in push mode
• ALICE (and LHCb?) want AliEn
  – Data management is completely different
• Big bang releases
  – Code has now been branched
  – Will developers be keen to fix bugs in the “old” branch?
  – “Wait for the next version, it will all be fixed then”!