WP3 Information and Monitoring - Steve Fisher / RAL - 23/9/2003
TRANSCRIPT
Steve Fisher/RAL - 23/9/2003Info and Monitoring 1
WP3 - Plan of Talk
• EDG 2.0 Release
• EDG 2.1 Release
• Wrapping up EDG
• Transition to EGEE
I will assume that by now everyone knows what R-GMA is!
WP3 - EDG 2.0 Release
• Revised APIs
• Improved tools
• GLUE
• Service and ServiceStatus
• A lot of work on the internals to make it more reliable
WP3 - EDG 2.0: R-GMA APIs
• The C API code totally rewritten
  – Wraps the C++ API
    • Huge reduction in maintenance cost
    • Get decent error messages
• Removed deprecated methods throughout
  – Much simpler to understand
WP3 - EDG 2.0: R-GMA Tools
• R-GMA CLI
  – Supports single query and interactive modes
  – Can perform simple operations with Consumers, Producers and Archivers
• R-GMA Browser
  – Small changes making it much easier to use
• edg-rgma-check
  – This is being steadily enhanced
• edg-rgma-examples
  – Simple end-user test
WP3 - EDG 2: Schemas
• GLUE schema
• Service and ServiceStatus
  – Cron job publishes service information on what service should be running.
  – WP3 code interrogates the service and publishes its status
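The split between "what should be running" and "what is actually running" can be sketched as follows. This is only an illustration in Python, not WP3 code: the column names, the `probe` callback and the host names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Service:
    """Row saying that a service is expected to run (hypothetical columns)."""
    name: str
    endpoint: str


@dataclass
class ServiceStatus:
    """Row reporting what was observed when the service was probed."""
    name: str
    up: bool
    message: str


def publish_status(expected: List[Service],
                   probe: Callable[[Service], bool]) -> List[ServiceStatus]:
    """Probe every expected service and build one status row for each."""
    rows = []
    for svc in expected:
        try:
            up = probe(svc)
            message = "OK" if up else "not responding"
        except Exception as exc:  # a failing probe is itself a status
            up, message = False, f"probe failed: {exc}"
        rows.append(ServiceStatus(svc.name, up, message))
    return rows


# Stand-in for the cron job: the list of services that should be running.
expected = [Service("registry", "host-a:8080"), Service("schema", "host-b:8080")]

# Stand-in for the interrogation step: pretend only host-a answers.
statuses = publish_status(expected, lambda s: s.endpoint.startswith("host-a"))
for row in statuses:
    print(row.name, "up" if row.up else "down", row.message)
```

Keeping the two tables separate means a missing status row and a "down" status row are distinguishable, which matters when deciding whether the service or the monitoring is at fault.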
WP3 - EDG Release 2.1
• Robustness
• Performance
• Authentication
• GRM/PROVE
• Nagios & Ganglia Integration
WP3 - EDG 2.1: Performance
• Optimizeit revealed a number of problems
  – Slow Java I/O – now use NIO for streaming
  – SQL parsing – now use hand coded parser for the inserts
  – XML parsing too slow – DOM replaced by SAX
  – Too many threads – now a big reduction
• Publish data on performance of R-GMA
  – linked from WP3 web page
  – application, development and WP3 testbeds
  – Scripts try to interpret the measurements
    • Is the absence of information correct?
    • If it is lost – where is it lost?
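To illustrate why moving from DOM to SAX helps, here is a minimal sketch using Python's standard `xml.sax` module. R-GMA itself is Java, and the element and attribute names below are invented for the example. A SAX handler sees each element as it is read and keeps only what it needs, so no document tree is built in memory.

```python
import xml.sax


class TupleCounter(xml.sax.ContentHandler):
    """Count <tuple> elements per table without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams the input.
        if name == "tuple":
            table = attrs.get("table", "unknown")
            self.counts[table] = self.counts.get(table, 0) + 1


# Hypothetical message format, used only for this illustration.
doc = b"""<result>
  <tuple table="ServiceStatus"/>
  <tuple table="ServiceStatus"/>
  <tuple table="Service"/>
</result>"""

handler = TupleCounter()
xml.sax.parseString(doc, handler)
print(handler.counts)
```

The handler's state is a small dictionary regardless of how large the input is, whereas a DOM parser would hold every element of the document at once.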
WP3 - EDG 2.1: Authentication
• We now have authentication in place
  – i.e. you must have a proxy
  – Tomcat machines must have a certificate
  – Otherwise invisible to the user
WP3 - EDG 2.1: GRM/PROVE
• Primarily to understand the performance of parallel applications
• GRM now uses the Mercury monitor (from GridLab) to deal with large volumes of data – but no security.
WP3 - Nagios integration
• We are using Nagios to monitor the EDG ServiceStatus
• Cron configures Nagios periodically with the known Services
• Nagios then watches the ServiceStatus
• Nagios provides:
  – Pretty displays
  – Alarm mechanism
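The cron step that configures Nagios with the known services might look roughly like the sketch below. It writes standard Nagios `define service` objects, but everything specific is an assumption for illustration: the `check_edg_service_status` command is hypothetical, `generic-service` is assumed to exist as a template in the local configuration, and the host and service names are made up.

```python
from typing import Iterable, Tuple


def nagios_service_block(host: str, service: str) -> str:
    """Render one Nagios `define service` object for a known service."""
    return (
        "define service {\n"
        "    use                     generic-service\n"
        f"    host_name               {host}\n"
        f"    service_description     {service}\n"
        # Hypothetical check command that would read ServiceStatus.
        f"    check_command           check_edg_service_status!{service}\n"
        "}\n"
    )


def render_config(services: Iterable[Tuple[str, str]]) -> str:
    """Build the whole file; sorting keeps successive cron runs comparable."""
    return "\n".join(nagios_service_block(h, s) for h, s in sorted(services))


# Stand-in for the list of known Services obtained from R-GMA.
known = [("rgma.example.org", "schema"), ("rgma.example.org", "registry")]
config = render_config(known)
print(config)
```

Regenerating the file from scratch on each run means services that disappear from the known list also disappear from the Nagios configuration.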
WP3 - Ganglia Integration
• Ganglia makes information available as XML
• Have written a component (ranglia) to make that information available as R-GMA tables
• Uses CanonicalProducer – i.e. on-demand generation.
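The idea of turning Ganglia's XML into table rows only when a query arrives can be sketched as below. This is not the ranglia code: the sample document is a trimmed-down approximation of gmond-style output, and the row layout and the `answer_query` function are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Trimmed-down sample in the style of Ganglia's XML dump (illustrative only).
SAMPLE = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="testbed">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="51200" TYPE="uint32" UNITS="KB"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.73" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_rows(xml_text: str) -> List[Dict[str, str]]:
    """Flatten the cluster/host/metric hierarchy into one row per metric."""
    rows = []
    root = ET.fromstring(xml_text)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                rows.append({
                    "cluster": cluster.get("NAME"),
                    "host": host.get("NAME"),
                    "metric": metric.get("NAME"),
                    "value": metric.get("VAL"),
                    "units": metric.get("UNITS"),
                })
    return rows


def answer_query(fetch_xml: Callable[[], str],
                 metric_name: str) -> List[Dict[str, str]]:
    """On-demand generation: fetch and convert only when a query arrives."""
    return [r for r in metric_rows(fetch_xml()) if r["metric"] == metric_name]


# A real fetcher would read from the Ganglia daemon; here it returns SAMPLE.
rows = answer_query(lambda: SAMPLE, "load_one")
for r in rows:
    print(r["host"], r["metric"], r["value"])
```

Because nothing is stored between queries, the rows always reflect what Ganglia reports at the moment of the query.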
WP3 - Final months of EDG
• Effort will be needed in
  – Bug fixing
  – Support
  – Documentation
• And “demos” of
  – Multiple VO support
  – Registry replication
  – Schema replication
  – Authorisation
  – Grid Services
WP3 - Support
• Working with WP1 on eliminating GOUT
• Help experiments and other users to make best use of R-GMA
  – BOSS – GANGA
  – Network Monitoring
  – Logging and Bookkeeping
  – Evaluation by:
    • BaBar
    • UK – e-Science
WP3 - Multiple VO Support
• Would like to move to multiple info services
  – One logical registry per VO
  – Each record is published to a set of VOs
    • Note that a record is only published once but the producer is registered in multiple registries
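The "publish once, register many times" idea can be sketched with plain Python objects. The class names, the VO names and the record contents are all hypothetical, and the real registries are network services rather than in-memory dictionaries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class Producer:
    """Holds its records once, whichever VOs are allowed to see them."""

    def __init__(self, name: str, vos: Iterable[str]):
        self.name = name
        self.vos = set(vos)
        self.records: List[dict] = []

    def publish(self, record: dict) -> None:
        self.records.append(record)  # stored exactly once


class Registries:
    """One logical registry per VO, each listing the producers it knows."""

    def __init__(self):
        self.by_vo: Dict[str, Set[str]] = defaultdict(set)
        self.producers: Dict[str, Producer] = {}

    def register(self, producer: Producer) -> None:
        self.producers[producer.name] = producer
        for vo in producer.vos:  # one registration per VO
            self.by_vo[vo].add(producer.name)

    def query(self, vo: str) -> List[dict]:
        """Return the records of every producer registered for this VO."""
        names = sorted(self.by_vo.get(vo, ()))
        return [rec for n in names for rec in self.producers[n].records]


regs = Registries()
p = Producer("ce-monitor", ["atlas", "cms"])
regs.register(p)
p.publish({"ce": "ce01", "free_cpus": 12})

print(regs.query("atlas"))  # visible through the atlas registry
print(regs.query("cms"))    # same record, visible through the cms registry
print(regs.query("lhcb"))   # producer not registered here
```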
WP3 - Registry Replication
• Each logical registry has multiple physical “copies”
• Each entry in registry has 3 possible states
• Transmit new and deleted records and a checksum
• Self healing even supports new registry instances
• Consumer uses any instance
• Fail over mechanism
• Not quite ready for EDG 2.1 release
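One way to picture "transmit new and deleted records and a checksum" is the sketch below. It is a guess at the general shape, not the R-GMA protocol: the three entry states are not modelled, replication runs in one direction only, and the full-copy fallback on a checksum mismatch is an assumption about how self-healing could work.

```python
import hashlib
from typing import FrozenSet, Set, Tuple

Update = Tuple[FrozenSet[str], FrozenSet[str], str]


def checksum(entries: Set[str]) -> str:
    """Order-independent digest of a replica's full contents."""
    h = hashlib.sha256()
    for e in sorted(entries):
        h.update(e.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


class Replica:
    """One physical copy of a logical registry (illustrative only)."""

    def __init__(self):
        self.entries: Set[str] = set()
        self.added: Set[str] = set()
        self.deleted: Set[str] = set()

    def add(self, entry: str) -> None:
        self.entries.add(entry)
        self.added.add(entry)
        self.deleted.discard(entry)

    def delete(self, entry: str) -> None:
        self.entries.discard(entry)
        self.deleted.add(entry)
        self.added.discard(entry)

    def make_update(self) -> Update:
        """Package the changes since the last update plus a checksum."""
        update = (frozenset(self.added), frozenset(self.deleted),
                  checksum(self.entries))
        self.added.clear()
        self.deleted.clear()
        return update

    def apply_update(self, update: Update, source: "Replica") -> bool:
        """Apply a delta; fall back to a full copy if the checksums differ."""
        added, deleted, expected = update
        self.entries |= added
        self.entries -= deleted
        if checksum(self.entries) != expected:
            self.entries = set(source.entries)  # self-healing resync
            return False
        return True


a, b = Replica(), Replica()
a.add("producer-1")
a.add("producer-2")
first_ok = b.apply_update(a.make_update(), a)

late = Replica()  # new instance that missed the first update
a.delete("producer-1")
a.add("producer-3")
update = a.make_update()
b_ok = b.apply_update(update, a)
late_ok = late.apply_update(update, a)
print(first_ok, b_ok, late_ok, sorted(late.entries))
```

The checksum covers the whole registry rather than just the delta, which is what lets a replica that missed earlier updates (or has only just been started) notice that it is out of step and recover.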
WP3 - Schema replication
• Once the registry replication is stable this work will be started
• Will probably:
  – have a mechanism to vote for a master schema
  – synchronise all schemas with the master
  – might have masters for subsets of the schema
WP3 - Grid Services
• Already have schema and registry as OGSI Grid services
• We will continue the transition to Grid services
  – Will provide wrappers for compatibility with our current APIs.
WP3 - EGEE transition
• Starting now to get GMA part of the OGSA document
  – Will then see how R-GMA can be an implementation of OGSA GMA
  – Or perhaps the R- also needs to be in OGSA
• Beginning to think about appropriate re-engineering procedures and how to achieve “quality”.