
Page 1: CIC Portal/COD Activities

Saturday 22 April 2023

CIC Portal/COD Activities

Hélène Cordier

IN2P3/CNRS Computing Centre, Lyon, France

Page 2: CIC Portal/COD Activities

Contents

- CIC Portal Usage: who/how
- Latest Release
- Portal Characteristics
- On-going developments
- CIC portal overview for COD
- Statistics and results
- Working groups
- Zoom on Failover

Page 3: CIC Portal/COD Activities

22/04/23 The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007)

Use tools

Each actor can use a set of operational tools (provided, integrated or interfaced).

Actors: REGIONAL CENTER, SITE, USER, OPERATOR, VO MANAGER

Tools (CIC Portal):

- Communicate
- Track, report, diagnose and follow up problems
- Manage static information about my VO
- Report on site activity, submit tests, configure

Page 4: CIC Portal/COD Activities

What do people connect to the CIC portal for?

Section   Distribution in 2005   Distribution in 2007
VO        11%                    6%
OAG       4%                     0%
COD       39%                    37%
ROC       14%                    5%
RC        11%                    23%
users     4%                     1%
home      17%                    28%

[Figure: Av. connections Dec 2004-Dec 2007. x-axis: month (Dec 2004 to Dec 2007); y-axis: number of connections (0 to 1000).]

Page 5: CIC Portal/COD Activities

Connections and process

[Figure: New registrations, June 2006 to January 2008 (y-axis: 0 to 20).]

[Figure: Total number of registered VOs, June 2006 to January 2008 (y-axis: 0 to 140; data labels shown: 133 and 60).]

[Figure: Number of sent broadcasts, March 2005 to January 2008 (y-axis: 0 to 250).]

[Figure: Distribution in 2007: VO 6%, OAG 0%, COD 37%, ROC 5%, RC 23%, users 1%, home 28%.]

Page 6: CIC Portal/COD Activities

Tasks handled by the CIC portal development team

Task breakdown per type

Type                               Oct 2006 to Feb 2007   Feb 2007 to Jan 2008
High level or political action     9%                     5%
Internal tools & synchronization   18%                    18%
Technical investigation            5%                     6%
Tests and verifications            7%                     12%
Development of new features        6%                     30%
Improvement of existing features   30%                    20%
Incidents and bug fixing           25%                    20%

Task breakdown per origin of the request

Origin       Oct 2006 to Feb 2007   Feb 2007 to Jan 2008
OCC          8%                     12%
Others       2%                     15%
Failover     7%                     4%
OAG + VOs    13%                    17%
ROCs         17%                    10%
COD          25%                    25%
internal     28%                    17%

Page 8: CIC Portal/COD Activities

Latest changes in 6 months

Last technical changes
– authentication is now based on full certificate DN instead of CN

Work on VO ID cards
– changes in Database schema for VO/VOMS information
– VO ID card interface improved
– Integration of the YAIM VO Configurator to the CIC portal
– Downloadable XML dump of VO ID card info

Scheduled downtimes procedure

Integration of the regional 1st line support dashboard
– prototype with CE
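The move from CN-based to full-DN-based authentication listed under "Last technical changes" can be illustrated with a minimal sketch. The DN strings, the slash-separated format and the `parse_dn` and `common_name` helpers are illustrative assumptions, not the portal's actual code.

```python
# Hypothetical sketch: two certificates can share a CN while having
# different full DNs. The DNs below are made up for illustration.

def parse_dn(dn):
    """Split a slash-separated DN such as '/O=Grid/CN=Jane Doe' into (key, value) pairs."""
    parts = [p for p in dn.split("/") if p]
    return [tuple(p.split("=", 1)) for p in parts]

def common_name(dn):
    """Return the last CN component found in the DN."""
    cns = [value for key, value in parse_dn(dn) if key == "CN"]
    return cns[-1]

dn_a = "/O=GRID-FR/C=FR/O=CNRS/OU=CC-IN2P3/CN=Jane Doe"
dn_b = "/DC=ch/DC=cern/OU=Users/CN=Jane Doe"

# Same CN, different issuing hierarchy: matching on CN alone would
# treat these two identities as one, matching on the full DN does not.
same_cn = common_name(dn_a) == common_name(dn_b)
same_dn = dn_a == dn_b
print(same_cn, same_dn)  # True False
```

The simplified parser assumes that no component value contains a slash, which is enough to show the difference between the two matching rules.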

Page 9: CIC Portal/COD Activities

On-going developments

Page 10: CIC Portal/COD Activities

What is left for the next release in March

2159 Adapt to new components released into production, cf. YAIM tool.

1559 Development of a new report version, taking several pieces of feedback into account.

1920 Follow the SAM migration to GridView on the CIC portal side (IDLE).

Internal tasks include quick fixes/bug fixes, documentation, background clean-up work, code optimization and prospective work for EGEE-III.

Page 11: CIC Portal/COD Activities

22/04/23 ARM Meeting, EGEE’07, Budapest

COD activity

Page 12: CIC Portal/COD Activities

A tool for Grid Operators: COD dashboard

Start of EGEE: MANY ENTRY POINTS
The operator works directly with each tool: ticketing system, sites info, monitoring tools #1 to #n, mail client.

Now: SINGLE ENTRY POINT
The operator works through the dashboard, which connects to the ticketing system, sites info, monitoring tools #1 to #n and the mail sender.

Page 13: CIC Portal/COD Activities

Interaction with EGEE services

OPERATIONS PORTAL (IN2P3-CC, Lyon, France) interacts with:

- GGUS (FZK, Karlsruhe, Germany), via SOAP: create ticket, update ticket, view ticket
- Gstat (ASGC, Taipei, Taiwan), via http: GIIS status per site
- SAM (CERN, Geneva, Switzerland), via an XSQL-based service: test results on nodes
- GOC-DB, via SQL queries: site info, scheduled downtimes

[Diagram: portal view listing Site1 to Site4, each with status indicators and either a ticket (#14, #32, #28) or no ticket.]
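The single-entry-point idea on this slide can be illustrated with a small sketch that merges per-site data from several sources into one row per site. The record layout, the field names, the sample values and the site-to-ticket assignment are assumptions for illustration only; the slide states which services the portal talks to, not how the merge is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SiteRow:
    """One dashboard line: a site, its open ticket (if any) and its statuses."""
    site: str
    ticket: Optional[str] = None
    statuses: Dict[str, str] = field(default_factory=dict)

def build_dashboard(sites, tickets, status_sources):
    """Merge ticket and monitoring data into one row per known site."""
    rows = {name: SiteRow(site=name, ticket=tickets.get(name)) for name in sites}
    for source_name, results in status_sources.items():
        for site, status in results.items():
            if site in rows:  # ignore sites missing from the site list
                rows[site].statuses[source_name] = status
    return rows

# Illustrative sample data only (not taken from the real services).
sites = ["Site1", "Site2", "Site3", "Site4"]
tickets = {"Site1": "#14", "Site2": "#32", "Site4": "#28"}
status_sources = {
    "SAM": {"Site1": "error", "Site2": "ok", "Site3": "ok", "Site4": "error"},
    "Gstat": {"Site1": "ok", "Site2": "warning", "Site3": "ok", "Site4": "ok"},
}

dashboard = build_dashboard(sites, tickets, status_sources)
for row in dashboard.values():
    print(row.site, row.ticket or "No ticket", row.statuses)
```

Keeping one row per site, keyed by site name, is what lets an operator see ticket state and monitoring state side by side instead of consulting each tool separately.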

Page 15: CIC Portal/COD Activities

Statistics

% of opened tickets   CE   SE   SRM   RGMA   sBDII
October               39   15   14    11     6
November              34   14   18    6      10
December              29   18   21    9      8

Solution time [hours]           Oct   Nov   Dec
COD tickets                     269   268   228
GGUS tickets assigned to ROCs   277   281   307
ALL SU                          364   427   709

[Figure: Proportion of COD tickets against GGUS tickets for all ROCs, 31 July to 31 December (y-axis: 0 to 800). Series: tickets opened by COD teams, tickets opened through GGUS, all GGUS tickets.]

Page 16: CIC Portal/COD Activities

Duties and Working groups

Page 17: CIC Portal/COD Activities

COD Duties

Rotations of 10 federations/teams: 1 week out of 5.

Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits.

10 federations over 18 months in EGEE-I. Working groups for over 18 months now.
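One way to read the rotation figures is that each of the 10 teams is on duty one week in five, which implies two teams on duty in any given week. The sketch below works through that arithmetic; the two-teams-per-week reading, the team labels and the `weekly_rota` helper are assumptions, not the actual COD schedule.

```python
def weekly_rota(teams, weeks, on_duty=2):
    """Return a list of (week number, teams on duty), cycling through the team list."""
    n = len(teams)
    rota = []
    for week in range(weeks):
        start = (week * on_duty) % n
        rota.append((week + 1, [teams[(start + i) % n] for i in range(on_duty)]))
    return rota

teams = [f"Team {i}" for i in range(1, 11)]  # hypothetical labels for the 10 teams
rota = weekly_rota(teams, weeks=10)

for week, pair in rota:
    print(week, pair)

# With 10 teams and 2 on duty per week, the cycle repeats every 5 weeks.
cycle_length = len(teams) // 2
print(cycle_length)  # 5
```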

Page 18: CIC Portal/COD Activities

There is more to it…

Straightforward mandate working groups:

- GSTAT: TW
- SAM: CERN
- SAMAP: CE
- topped by Tools for Improvement for COD, TIC: CE (EGEE’07)

Page 19: CIC Portal/COD Activities

Working groups mandate

- Integration of the existing tools, CIC (FR): integration platform for all COD tools to ease the daily operational job

- Improvement of BEST PRACTICES (DE-CH): identify, raise and analyse with COD how to achieve homogeneous operations

- Release of updated documentation, OPM (SE): documentation under constant evolution

- Set-up of failover mechanisms for GRID CORE SERVICES (SWE): what is done at the federation level, what is done at the project level (need help from JShiers group), what could be done (from an operational point of view) and what is needed at the ROC/site level (from a m/w point of view)

- Set-up of a High Availability strategy for the operational tools for CODs, FAILOVER (IT)

Page 20: CIC Portal/COD Activities

Failover working group

Zoom on Failover for Operational Tools

Page 21: CIC Portal/COD Activities

EGEE Failover: purpose

Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG Grid.

– The solution is based on DNS and consists of:
• mapping the service name to one or more destinations
• updating this mapping whenever a failure is detected

Geographical failover for the EGEE-WLCG Grid collaboration tools – CHEP 2007, Victoria BC, Canada (September 2007)
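The two steps of the DNS-based approach (map a service name to one or more destinations, update the mapping when a failure is detected) can be sketched as follows. This is a minimal in-memory model under stated assumptions: the `FailoverMap` class, the host names and the health check are hypothetical, and a real deployment would update DNS records rather than a Python dictionary.

```python
from typing import Callable, Dict, List, Optional

class FailoverMap:
    """Maps each service name to an ordered list of candidate destinations."""

    def __init__(self, destinations: Dict[str, List[str]]):
        self.destinations = destinations
        # Start by pointing every service at its first (preferred) destination.
        self.active: Dict[str, Optional[str]] = {
            name: (hosts[0] if hosts else None)
            for name, hosts in destinations.items()
        }

    def refresh(self, is_healthy: Callable[[str], bool]) -> None:
        """Point each service at its first healthy destination, or None if all fail."""
        for name, hosts in self.destinations.items():
            self.active[name] = next((h for h in hosts if is_healthy(h)), None)

    def resolve(self, name: str) -> Optional[str]:
        """Return the destination currently mapped to the service name."""
        return self.active[name]

# Hypothetical service name and hosts.
mapping = FailoverMap(
    {"portal.example.org": ["master.example.org", "replica.example.org"]}
)

down = {"master.example.org"}  # simulated failure of the preferred host
mapping.refresh(lambda host: host not in down)
print(mapping.resolve("portal.example.org"))  # replica.example.org
```

Because the list is ordered, the mapping returns to the preferred destination on the next refresh once it is healthy again.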

Page 22: CIC Portal/COD Activities

COD work aspects to keep in EGEE III

Dedication: working groups are recognized within federations as a source of expertise, and by federations as the way to bring their needs to central operations.

Collaboration: up to now, each federation has found a way to contribute actively to improving the COD work environment, when not proactively leading a working group.

Also, each person, tool developer or expert recognized as being of "global interest", even though outside the COD scope, has been happily integrated into this "closed community", e.g. SAMAP; TIC scope to monitor this aspect, with the Nagios prototype for example.

Flexibility: the groups are meant to evolve together with their mandate over time and as needs arise, e.g. core grid services HA, EGI.

Anticipation: e.g. strategy of the Operational Failover Working Group.

Experiment: e.g. regionalisation of tools and the future modular "NGI dashboards" to widen the CE 1st line support experience.

Page 23: CIC Portal/COD Activities

COD work aspects to evolve in EGEE III

- Mandate and assessment of the COD activity
- Integration of NDGF/NE as a COD team; other teams?
- Catch-all and global operations center: which core services are to be monitored centrally, how to monitor them and how to properly switch to backup; how to aggregate local data and which local data would be concerned
- Assess metrics in order to identify the most problematic m/w components and recurrently unreliable sites
- Operational tools reliability assessment; ENOC test as a starting base?
- Strengthen the requirement for HA/failover of operational tools and grid core services

Vision of the COD tools' long-term evolution:

- 1 set of tools per federation + aggregation?
- Which set of tools is to be regionalized? SAM, GOC DB, COD? What else?
- How are they going to interact? => need for a global schema, NOW.

Page 24: CIC Portal/COD Activities

COD work aspects to evolve in EGEE III

Leverage on "project labeled" tools so that:

- development strategy/priorities are coherent
-- data workflow: synch GOCDB/BDII/SAM/COD
-- development strategy: depends on the strategy for the COD tools' long-term evolution
-- priority decision workflow: who drives the priority of "project labeled" tool requests, and how
- operational use-cases do not remain "pending"
-- critical tests monitoring/accounting or ARC CE
-- CA update procedure
-- need for SAM failover…
- staffing is adequate for proper reactivity, not only for bugfix.

Interoperability/interoperations (item to be followed up)
– OSG: rather informal for the moment, BUT NOW users do have problems and sites are the relay of their users, cf. GGUS ticket 31037.
– NDGF: existing critical test monitoring? And what are the consequences for operational procedures?

Page 25: CIC Portal/COD Activities

Conclusions and References

Where, how and when do we address these topics?

Some can be addressed here or thought about at COD meetings; others are relevant to OCC/ROC first, and COD working groups can then make suggestions/recommendations.

References:

CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations, Grid 2007 (IEEE), Austin, TX, United States (September 2007)

Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria, BC, Canada (September 2007)
