INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org EGEE Operations EGEE/LCG II OPERATION WORKSHOP – 26 th May 2005 Operation WG Wrap up C. Vistoli

Operations issues covered

• Operation:
  – GOC DB and site registration
  – Site Management Workflow
  – Interaction with OSG

• VO management
  – Freedom of choice
  – GridAT
  – CIC web portal
  – Resource allocation

• Deployment procedure
• Metrics
• Accounting
• Monitoring (out of time)

GOCDB

• GOCDB 2: presentation by P. Strange
• A lot of discussion and decisions about the meaning and type of information to add or insert in the DB

New features of GOCDB2

• Main feature is a new role-based authentication system
  – Roles are granted to contacts to grant permissions
  – Roles system is expandable to contain new roles and permissions

• Extra functions now exposed for ROC staff and COD/CIC
  – Creating new sites
  – Managing status of existing sites (production status, monitoring status etc.)

• Other improvements
  – Many bugfixes!
  – SQL transaction support
  – UI improvements
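The role model above (roles granted to contacts, each role carrying permissions, with the set of roles open to extension) can be sketched as follows. This is only an illustration: the role names, the permission names and the `ROLE_PERMISSIONS` table are hypothetical and are not taken from GOCDB2.

```python
# Hypothetical role and permission names, for illustration only.
ROLE_PERMISSIONS = {
    "site_admin": {"edit_site"},
    "roc_staff": {"edit_site", "create_site", "set_production_status"},
    "cod": {"set_monitoring_status"},
}


def permissions_for(roles):
    """Return the union of permissions granted by a contact's roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def can(roles, permission):
    """True if any of the contact's roles grants the permission."""
    return permission in permissions_for(roles)


# The system is expandable: a new role is just a new table entry.
ROLE_PERMISSIONS["security_officer"] = {"suspend_site"}
```

Keeping permissions attached to roles rather than to individual contacts means a new function only needs a new table entry, not a change for every user.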

System architecture

• MySQL 4.x database with transaction support
  – GOCDB2 now fully supports transactions for enhanced data integrity
  – Contact GOC team if you want your tool to link directly to the MySQL database

• PHP 4 front end interface
• Apache 2.x web server
• Gridsite security layer
  – Grants HTTPS access only to EGEE recognised certificate holders
  – Does not map users to roles; this is a function provided by GOCDB
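To illustrate why transaction support matters for data integrity, here is a minimal sketch of an all-or-nothing insert. It uses Python's built-in `sqlite3` in place of the MySQL back end, and the table and column names are assumptions made for the example, not the GOCDB2 schema.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL back end; table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (name TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE contact (site TEXT, email TEXT NOT NULL)")
conn.commit()


def register_site(conn, name, email):
    """Insert a site and its contact atomically: both rows or neither."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO site VALUES (?, 'candidate')", (name,))
            conn.execute("INSERT INTO contact VALUES (?, ?)", (name, email))
        return True
    except sqlite3.IntegrityError:
        return False


ok = register_site(conn, "SITE-A", "admin@example.org")
failed = register_site(conn, "SITE-B", None)  # violates NOT NULL
site_names = [row[0] for row in conn.execute("SELECT name FROM site")]
```

The second call fails on the contact row, so the site row inserted just before it is rolled back and no half-registered site remains.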

GOC How-to

GOC requirements
- Grid-wide needs and features from a centralized source of info
- « How-to » to be settled and followed up
- GOC/COD to set up an identified process, validated by ROCs

Variable status: Organisation/Group/Production/Type status (« LCG2, PPS ») needs variable definitions that are « closer to reality », so that ROCs and sites fill in coherent information that does not lead to inconsistencies, e.g. in monitoring tools. To be followed up with ROC managers and COD, based on the following.

*****

Needs: how to state PPS? Laurence: how to set their IS? Look at another site. MS: refer to the Glue-schema names. MT – LCG2 nodes « group » monitored.
« Type » = accepted with regard to a given grid and with regard to a given ROC. Cf. JDL attribute: Glue schema CE status, not at the site level; confusing as it is.
« Type gluevalueperce » = variable dynamically enforced onto IS, similar to Glue CE site status – COD and CIC – open/closed/BDII notification to site admin on daily operations.
« Status » = ROC decision + COD decision: candidate, certified, uncertified, suspended, gLite pre-production; at the site level.
« Group Gridscope » = many-to-many. Site level.
« Organisation: project organisation or funding partners » = hosting organization. Legal body.

Operations issues

• Open issue about GOCDB and downtimes
• ROC: how they collect and certify sites
• Escalation procedure
• Core services management
• SA1 requirements escalation

Scheduled Downtimes and GOC DB

- No roll-out list mailing
- EGEE Broadcast to the affected VO managers and grid services users – coordination with user support?
- GOC DB filling

****

GOC DB requirement: need to separate CE and SE. It was silently assumed that scheduled downtimes meant « CE downtimes ». Detailed scheduled downtimes are needed for all nodes at the service level.

Production, inventory and follow-up of CIS from the GIIS tool.

If this proves not to be sufficient, then requirements on IS developers.
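The requirement to record scheduled downtimes per service (CE, SE) rather than per site could look like the sketch below. The `Downtime` record and the `in_downtime` helper are hypothetical names chosen for the illustration; they do not describe the actual GOC DB layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Downtime:
    site: str
    service: str      # e.g. "CE" or "SE"
    start: datetime
    end: datetime


def in_downtime(downtimes, site, service, when):
    """True if the given service at the site is in scheduled downtime."""
    return any(
        d.site == site and d.service == service and d.start <= when < d.end
        for d in downtimes
    )


schedule = [
    Downtime("SITE-A", "SE", datetime(2005, 6, 1, 8), datetime(2005, 6, 1, 12)),
]
when = datetime(2005, 6, 1, 9)
se_down = in_downtime(schedule, "SITE-A", "SE", when)
ce_down = in_downtime(schedule, "SITE-A", "CE", when)
```

With the service recorded explicitly, an SE downtime no longer implies that the CE of the same site is unavailable.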

Adding a site + site monitoring
- ROC management negotiation – status « candidate »
- Site is established – « uncertified »
- GOC DB – « Site form » complete
- SFT local instance by ROCs or the EGEE one OK for 1 week

*****
- ROCs are to exchange experience with their own local certification process and tests, to come up with suggestions to put in common.
- Several SFT test instances need to run concurrently. To be taken care of by SFT developers. Documentation to install a local instance to be available to willing ROCs.

****
Input: certified means « not harmful for the structure »
SEE: local instance of SFT: good practice. To become a suggestion.
TWN: same.
PN: site has to be registered in a local BDII – ROC responsibility. Then registered into the general BDII automatically; no rollout published.
CE: same for 3 days.
UK: same as CE
It: certification BDII for It sites. Specific set of tests and SFT.
NE:
D+CH: regular project SFT. Sites for the time being in « pre-production ». Suggests to set a regional SFT set of tests. Needs documentation for PN for this installation.
Cern: a.k.a. PN
France: TBD
Russia:
SWE:

******
OSG: will provide the GIP and collect info the same way as EGEE. E.g.: OSG could enforce the EGEE dteam registration
(OSG Egee: could run native tests and OSG would appear as a given entity in the monitoring system – something similar to a ROC)
(Egee OSG: need to pack libraries).

How-to: follow-up of tickets between Footprint/Remedy and GGUS can be done manually in the first phase.

Site Quarantine and Escalation

Deadline to be dependent on site size.
Prioritization of COD work to be dependent on site size.
Need for a proposal from the COD to be handed to the ROC managers and back to the COD: « >100 CPU site: deadline 1 day » to be changed to « top 10 sites: deadline 1 day ».
-----
How to take care of non-European sites – non-existing EGEE ROCs
GD Team to give feedback to ROC managers
-----
Escalation:
1st mail to site and ROC
2nd mail to ROC
Phone call to ROC
------
OSG: contact the registration manager to put pressure on the ROC manager
ROC to close the ticket. Need to update deadline for ROC manually through COD – OK
COD to put a specific site under observation – for 3 days, into quarantine, when there are recurrent problems.
OSG: to publicize the « bad reputation sites » on a web page
ROCs agreement on metrics to be agreed upon, URGENT.
-----
Large sites' organisation to deal with these constraints and ensure 24/7-like behaviour.
---
Deadline before CA upgrade becomes « a critical test » is 1 week and needs COD action
RGMA no deadline:

Need for accounting: important. However, registry failures cannot be blamed on sites: refinement of SFT tests needed by SFT developers. Contact with RGMA team needed. RGMA tutorial link to be sent to the ROC managers.
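The escalation sequence and the size-dependent deadline noted above can be written down as a small sketch. The three steps and the 1-day deadline for the top 10 sites come from the notes; the function names and the `default_days` value for smaller sites are assumptions, since the notes do not fix one.

```python
# Escalation steps in the order given in the notes.
ESCALATION_STEPS = [
    "1st mail to site and ROC",
    "2nd mail to ROC",
    "phone call to ROC",
]


def next_action(missed_deadlines):
    """Return the escalation step after a number of missed deadlines."""
    index = min(missed_deadlines, len(ESCALATION_STEPS) - 1)
    return ESCALATION_STEPS[index]


def deadline_days(site_rank, default_days=3):
    """Deadline by site size: top 10 sites get 1 day (from the notes).

    default_days for smaller sites is a hypothetical placeholder.
    """
    return 1 if site_rank <= 10 else default_days
```

Ranking sites by size, rather than using a fixed CPU threshold, keeps the short deadline limited to a known number of sites as the grid grows.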

Collecting requirements through SA1

• CL – almost no official requirements existing from SA1
• SZ – involving ROC managers to collect – how to “legitimate” requirements
• JT – getting experts to meet
-------------------------------------------------
STARTING NOW: issues that have arisen in this workshop
1/ M/W security policy – traceability and data management
2/ VO fair share and site implementation
3/ Requirement on environment variables on the batch systems
-------
Input:

The process of JRA1 enforcement is “frozen” or ineffective, and for EGEE2 it is unclear.
Send them to SZ to eradicate duplicates. To get feedback from a mailing list and take them to the PTF. SZ to rewrite them and to do the follow-up.
1/ depends on JSG – 2/ irrelevant – site level relevancy

VO management

1/CIC web

2/Freedom of Choice

Atlas is used in production.
The sites should be aware that they are blacklisted – to be implemented – medium sized sites
VO should be able to define their customized set of tests

3/ GridAT

Could have been run as dteam VO, and allow better development effort (e.g. adding history features) of monitoring and alarm tool

• Workflow

CIC portal: Support a new VO

[Workflow diagram. Actors: CIC Portal, Site, ROC, OAG, VO Manager. Steps: Request, Validation (Validated? yes/no), Broadcast; mail notifications (with Cc); contact and dialog initialization.]

• Workflow

CIC portal: Publish DC

[Workflow diagram. Actors: CIC Portal, Sites, Other VOs, OAG, VO Manager. Steps: Authentication, Infos on DC, Publication form, Validation, Publication, Broadcast; outputs: Calendar, NEWS; mail notifications.]

Freedom of choice - VO Page – 1/3

Freedom of choice - Final List - 2/3

Freedom of choice - CIC Page- 3/3

GridAT

GridAT (Grid Application Test) definitions:

• GridAT aims to simplify the addition of new tests for new or existing applications.

• GridAT can be used to validate a grid site from the VO software viewpoint, by submitting a test job and evaluating whether its output matches the expected results.

• GridAT is designed to certify installed grid applications on demand.

GridAT Web Interface

The web portal gives an overview of the Italian Grid from the VO viewpoint.

The summary table contains, for each site, the results of the latest tests grouped by Virtual Organisation.

More details can be obtained by clicking on the test date.

Resource Allocation Process

• Resource allocation policy
  – Overview of status and requirements from VO at the OAG: scratch WN space + MPI + licensed software + secure data access

****

OAG contact point to set up!!

How to check actual resource allocation
  – Inventory of actual services from GOC and GIIS
  – Workflow backbone

Resource negotiation: Problems

• Only general percentages by region available
• Interpretation of numbers not always the same
• No indication on availability of specific resources (MPI, licensed software)
• Allocation has to be done site by site
• Gap between OAG and sites, to be filled by ROCs
• ROCs don’t decide on scientific priorities
• Exact workflow description missing

Resource negotiation: Implementation

• Implementation via CIC portal, “OAG view” (and VO/RC)
  – Readable to everybody
  – Role-specific actions reserved to authorized people
     Sites: support yes/no, free cycles only / more detailed description
     OAG/ROCs: (re-)trigger requests to sites/ROCs (specific broadcast)
     VOs: contact point for discussions with specific sites
  – Shows site status by region: Solicited, Answered (+ answer details)

• Automate statistics, summaries, steps of the procedure

Deployment and Process

• LCG-2_4_0 first release using the new process (5 days late)
  – Release was picked up at a slow pace (~2.5 sites/day)
  – Differences between regions
     Repacking and adaptation takes time and is needed
  – Release was not sufficiently tested
     2 test sites for a deployment test are not sufficient

• Release Preparation:
• More, early involvement of sites required
  – Have to see the list of potential components very early
  – Regular progress reports to the ROC managers telephone conference

• Very early announcement of new releases needed
  – 3 weeks before: complete list of components and changes
     Problematic, because this means certification has to be finished
  – 2 weeks before a new release the release has to go to ROC-IT, ROC-SE, ROC-UK for:
     • Test deployment (1 week)
     • Testing on ROCs testbeds
     • Fixing bugs will take 1 week
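The proposed preparation schedule can be made concrete by counting back from a release date. The sketch below assumes the three-week and two-week marks are both measured from the release day and uses 1 July 2005 as the example date (the year is inferred from the workshop date); the function and milestone names are hypothetical.

```python
from datetime import date, timedelta


def release_milestones(release_day):
    """Work backwards from the release date to the preparation milestones."""
    week = timedelta(weeks=1)
    return {
        "component list announced": release_day - 3 * week,
        "handed to test ROCs": release_day - 2 * week,
        "test deployment ends, bug fixing starts": release_day - 1 * week,
        "release": release_day,
    }


milestones = release_milestones(date(2005, 7, 1))
```

For a 1 July release this puts the component list at 10 June, which shows why the slide calls the schedule problematic: certification would have to be finished by then.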

Deployment and Process

• Deployment of new releases:

• The ROCs will drive the deployment
  – Announcement of releases through the ROC managers
  – Sites that are (much) too late will be excluded by their ROCs

• Next Releases:
• By mid June an extra release is needed for the SC3
  – FTS, LFC service, VOMS (RFC compliant proxy extensions), bug fixes
  – Tier1s and Tier2s participating in SC3 have to upgrade quickly

• Regular release 1st July
  – Like mid June + updates
  – All sites
  – (may contain gLite WLM for voluntary parallel deployment)

• Transition to gLite will require changes to the process
  – More frequent releases
  – Step by step introduction of new components

Metric

• Two sets needed
• Complex, detailed set
  – Used for pinpointing problems
  – Used by ROCs, CICs and site admins (experts)

• Coarse Summary
  – Measure overall performance
  – Small, easy to understand set
  – Hierarchical (Grid, ROCs, CICs, RCs)
  – Targeted at users to show progress (or lack of)

Metric

• General agreement on the concept
  – Detailed discussions on:
     time windows
       • Sliding windows (week, month, 3 months)
     quantities to watch for (RCs, ROCs, CICs…)
       • ROCs based on RCs
       • CICs based on services

• Release quality has to be measured

• To make progress: workgroup to define quantities
  – Organized by: Ognjen Prnjat ([email protected])
  – Small (~5): Ognjen, Markus, Helene, Jeff T. and ???
  – Ognjen will collect input
  – ROCs, CICs and OMC have to agree on ONE set of quantities
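As an illustration of the sliding-window idea, the sketch below computes a test pass rate per resource centre over week, month and 3-month windows, and derives a ROC figure from its RCs. The pass rate itself is only a stand-in quantity, since the workgroup had yet to define the real ones; the window lengths in days and all function names are assumptions.

```python
from datetime import date, timedelta

# Window lengths in days; "month" and "3 month" are approximated.
WINDOWS = {"week": 7, "month": 30, "3 month": 90}


def pass_rate(results, today, days):
    """Fraction of passed tests in the last `days` days, or None if no data.

    `results` is a list of (date, passed) pairs.
    """
    start = today - timedelta(days=days)
    recent = [passed for day, passed in results if start < day <= today]
    if not recent:
        return None
    return sum(recent) / len(recent)


def rc_summary(results, today):
    """Pass rate of one resource centre (RC) for each sliding window."""
    return {name: pass_rate(results, today, d) for name, d in WINDOWS.items()}


def roc_summary(rc_results, today):
    """ROC figures based on its RCs: pool the results of all its sites."""
    pooled = [r for results in rc_results.values() for r in results]
    return rc_summary(pooled, today)
```

Pooling is only one way to base a ROC figure on its RCs; weighting by site size would be another, and the choice belongs to the agreed set of quantities.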

Metering: Gianduia

DGAS deployment

[Diagram: DGAS deployment. Labels: VO1, VO2, VO3; site1, site2, site3; HLR 1 to HLR 5; six CEs; APEL; aggregate site accounting.]

How APEL works

• PBS/LSF log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> PbsRecords table

• Gatekeeper log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> GkRecords table

• Message log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> MessageRecords table

• Site GIIS interrogated daily on site CE to obtain SpecInt and SpecFloat values for CE, acts as DBProducer -> SpecRecords table, one dated record per day

• These three tables are joined daily on MON to produce the LcgRecords table. As each record is produced, the program acts as a StreamProducer to send the entries to the LcgRecords table on the GOC site.

• Site now has table containing its own accounting data; GOC has aggregated table over whole of LCG.

• Interactive and regular reports produced by site or at GOC site as required.
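The daily join that produces LcgRecords can be sketched with an in-memory SQLite database. The table names follow the slide, but every column name and the join keys (`local_job_id`, `grid_job_id`) are assumptions made for the example; the real APEL schema may differ, and the R-GMA publishing step is left out.

```python
import sqlite3

# Table names follow the slide; all column names and the join keys
# are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE PbsRecords     (local_job_id TEXT, cpu_seconds INTEGER);
CREATE TABLE GkRecords      (local_job_id TEXT, grid_job_id TEXT);
CREATE TABLE MessageRecords (grid_job_id TEXT, user_dn TEXT);
CREATE TABLE LcgRecords     (grid_job_id TEXT, user_dn TEXT, cpu_seconds INTEGER);

INSERT INTO PbsRecords     VALUES ('101.ce', 3600), ('102.ce', 60);
INSERT INTO GkRecords      VALUES ('101.ce', 'job-A');
INSERT INTO MessageRecords VALUES ('job-A', '/O=Example/CN=Some User');
""")

# Daily join: only jobs seen in all three logs yield an accounting record.
db.execute("""
INSERT INTO LcgRecords
SELECT g.grid_job_id, m.user_dn, p.cpu_seconds
FROM PbsRecords p
JOIN GkRecords g ON g.local_job_id = p.local_job_id
JOIN MessageRecords m ON m.grid_job_id = g.grid_job_id
""")
lcg_records = db.execute("SELECT * FROM LcgRecords").fetchall()
```

In this sketch the batch job `102.ce` has no gatekeeper entry, so it produces no grid accounting record, which is one way local jobs would stay out of the aggregated table.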

APEL and gLite

– Is APEL integrated in gLite?
   Work currently in progress.
   We have ported the APEL code into the gLite CVS repository but need to understand functional differences, e.g. WMS and use of Condor
   3 components: Core + PBS plugin + LSF plugin
   Sent our requirements to Erwin Laure… waiting for information.

– What about its deployment plan?
   As soon as possible… but would also like to add some new features
   • Global Job ID to link with L&B
   • DN to VO mapping

GridICE Architecture

[Diagram: GridICE on LCG 2, logical components and roles. Labels: Resource, Site Publisher, GridICE Server, Consumer, Browser, application; Sensor, event provider, event collector, publisher(s), consumer(s); Lemon agt, Lemon srv, scripts, MDS GRIS, LDAP Client; LAN, WAN. Data delivery model: pull, periodic, unicast and push, periodic, unicast; HTTP: HTML/XML (pull, aperiodic, unicast); NS (push, aperiodic, unicast).]

GridICE and DGAS Common Metering for Grid jobs

• DGAS is an accounting system, and is therefore interested in the usage-related parameters of a job after its execution

• GridICE is a monitoring system, and is therefore interested in the job-related information from the moment the job is created in the queue
  – The information should be updated frequently and provided to users respecting the security concerns

[Diagram: job states queued, running, aborted, deleted, executed, with the parts of the job lifetime covered by GridICE and by DGAS.]

GridICE on gLite

[Diagram: GridICE on gLite, logical components and roles. Labels: Resource, Site Publisher, GridICE Server, Consumer, Browser, application; Sensor, event provider, event collector, publisher(s), consumer(s); Lemon agent, Lemon srv, scripts, CEMon, RGMA, MDS2; LAN, WAN. Data delivery model: pull, periodic, unicast and push, periodic, unicast; HTTP: HTML/XML (pull, aperiodic, unicast); NS (push, aperiodic, unicast).]

Summary

                             GridICE in LCG 2.x        GridICE in gLite
Schema                       GLUE 1.1++                GLUE 1.1++/GLUE 1.2++
Local Area Distribution      Lemon, UDP/TCP            Lemon, UDP/TCP
Site Publisher               MDS GRIS (LDAP),          CEMon/R-GMA,
                             no security               http+gsi+proxy+voms ext.
Discovery                    BDII (LDAP)               Service Discovery API
Wide Area Data Distribution  LDAP/Pull                 SOAP/pull (push)
Notification                 fixed number of events    content-based subscription

NPM Architecture

• JRA4/NPM provides uniform access to network performance information from a heterogeneous set of monitoring frameworks

We need your help

• We have some idea of requirements from networking experts within JRA4

• Draft requirements document available here:
  – https://edms.cern.ch/document/593620/1

• Draft use case document available here:
  – https://edms.cern.ch/document/591777/1

• We’re looking for more input from NOCs and GOCs
• If you have requirements, use cases or opinions on interfaces or needed metrics, please send them to us
• Even if you don’t have ideas at the moment, but would like to be involved in the process, please get in contact
• Contact details are at the end of the talk

Operations Summary

• CIC On Duty is now well established
  – COD is just 6 months old!!!!!
  – Tools have evolved at a dramatic pace
     Portal, SFT, …
       • Many rapid iterations
     Truly distributed effort
     Integration of new COD partner (Russia) went smoothly
  – Tuning of procedures is an ongoing process
     No dramatic changes (take resource size more into account)

Operations Summary

• Accounting
  – Last November still an area of concern
  – APEL now well established
     Support for batch systems is improving
     Several privacy related problems have been understood and solved
  – gLite Accounting: DGAS
     Some concerns about amount of information published
       • Can be handled by proper authorization?
     Collaboration with APEL on batch sensors (BBQS, Condor, ..)
       • DGAS agreed to provide them
     Will be introduced initially on a voluntary basis
       • Sites will give feedback (including privacy issues)

Operations Summary

Tools!!!!!!!!!!!
  – GOC-DB, monitoring, monitoring, testing, testing…
  – Many impressive tools
     Lots of overlap, we should focus and fuse some of them
     R-GMA based “monitoring bus” emerging

• Releases, Deployment
  – ROCs will drive the deployment
  – ROCs will contribute to the release preparation
     Testing
     Reviewing the proposed contents

• Performance Metric
  – Measure service quality (RC, ROCs, CICs, …)
  – Ognjen organizes small workgroup to define details

Operations Summary

OSG
• Similar problems, interesting tools
• Linking of operations between LCG (EGEE) and Grid3 (OSG)
  – Concept: OSG treated like a ROC, LCG like a SC
     Details will be worked out during interoperation tests

• Worries
• Resources
  – Activities:
  – Service Challenges, LCG production, gLite pre-production, gLite transition, scaling up for LHC scale…

• Duplication of effort
  – Especially pronounced in the area of tools

Conclusions

• GOCDB2 FEATURES AND LINK WITH OPERATIONS

• OPERATIONS PROCEDURE – CODS
   Monitoring tools useful: current development effort to be provided to the COD management.
• DEPLOYMENT
• PERFORMANCE MEASUREMENT
• ACCOUNTING
   Development coordinated with It and UK for accounting purposes. DGAS deployed as an optional component for the time being. Current operations: R-GMA and APEL. APEL specificities not covered. The next agenda items could not be covered.

• OSG interoperability: underlying the topics covered
  – Possibility to integrate site verification of OSG and site certification in EGEE – SFT developers
  – Interfacing needed somehow between the respective operations support tools – Footprint and Remedy