CIC Portal/COD Activities
Hélène Cordier
IN2P3/CNRS Computing Centre, Lyon, France
Contents
- CIC Portal Usage: who/how
- Latest Release
- Portal Characteristics
- On-going developments
- CIC portal overview for COD
- Statistics and results
- Working groups
- Zoom on Failover
The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007)
Use tools
Each actor can use a set of operational tools (provided, integrated or interfaced).
Actors: Regional Center, Site, User, Operator, VO Manager
Tools (CIC Portal):
- Communicate
- Track, report, diagnose and follow up problems
- Manage static information about my VO
- Report on site activity, submit tests, configure
What do people connect to the CIC portal for?

Distribution   2005   2007
VO              11%     6%
OAG              4%     0%
COD             39%    37%
ROC             14%     5%
RC              11%    23%
users            4%     1%
home            17%    28%
[Chart: average number of connections per month, December 2004 to December 2007. x-axis: month; y-axis: number of connections (0 to 1000).]
Connections and process
[Chart: new registrations per month, June 2006 to January 2008. y-axis: 0 to 20.]
[Chart: total number of registered VOs per month, June 2006 to January 2008. y-axis: 0 to 140; data labels on the plot: 60 and 133.]
[Chart: number of broadcasts sent per month, March 2005 to January 2008. y-axis: 0 to 250.]
Tasks handled by the CIC portal development team

Task breakdown by type
Type                                 Oct 2006 - Feb 2007   Feb 2007 - Jan 2008
High-level or political action                        9%                    5%
Internal tools & synchronization                     18%                   18%
Technical investigation                               5%                    6%
Tests and verifications                               7%                   12%
Development of new features                           6%                   30%
Improvement of existing features                     30%                   20%
Incidents and bug fixing                             25%                   20%

Task breakdown by origin of the request
Origin                               Oct 2006 - Feb 2007   Feb 2007 - Jan 2008
internal                                             28%                   17%
COD                                                  25%                   25%
ROCs                                                 17%                   10%
OAG + VOs                                            13%                   17%
OCC                                                   8%                   12%
Failover                                              7%                    4%
Others                                                2%                   15%
Latest changes in 6 months
Last technical changes
– Authentication is now based on the full certificate DN instead of the CN
Work on VO ID cards
– Changes in the database schema for VO/VOMS information
– VO ID card interface improved
– Integration of the YAIM VO Configurator into the CIC portal
– Downloadable XML dump of VO ID card information
Scheduled downtimes procedure
Integration of the regional first-line support dashboard
– Prototype with CE
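The move from CN-based to full-DN-based authentication listed above can be illustrated with a minimal Python sketch. It is not the portal's code: the DN strings, the `REGISTERED_DNS` set and both helper functions are hypothetical, and the DN parsing is deliberately naive. It only shows why matching on the CN alone may accept a certificate from a different issuing hierarchy that happens to carry the same name.

```python
def common_name(dn: str) -> str:
    """Return the last CN component of a slash-separated DN string.

    Naive parsing for illustration: assumes no '/' inside attribute values.
    """
    cns = [part[3:] for part in dn.split("/") if part.startswith("CN=")]
    if not cns:
        raise ValueError(f"no CN component in DN: {dn!r}")
    return cns[-1]


# Hypothetical DNs: same CN, issued under two different hierarchies.
DN_REGISTERED = "/O=ExampleGrid/C=FR/O=ExampleLab/CN=Jane Doe"
DN_OTHER = "/DC=org/DC=example/OU=People/CN=Jane Doe"

# Hypothetical registry of users known to the portal.
REGISTERED_DNS = {DN_REGISTERED}


def authorized_by_cn(dn: str) -> bool:
    """CN-only check: any certificate carrying a registered CN is accepted."""
    registered_cns = {common_name(d) for d in REGISTERED_DNS}
    return common_name(dn) in registered_cns


def authorized_by_dn(dn: str) -> bool:
    """Full-DN check: the whole subject string must match a registered DN."""
    return dn in REGISTERED_DNS
```

With these made-up values, `authorized_by_cn(DN_OTHER)` accepts the second certificate while `authorized_by_dn(DN_OTHER)` rejects it.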
On-going developments
What is left for the next release in March
2159 Adapt to new components released into production, cf. the YAIM tool.
1559 Develop a new version of the report, taking the feedback received into account.
1920 Follow the SAM migration to GridView on the CIC portal side (IDLE).
Internal tasks include quick fixes and bug fixes, documentation, background clean-up work, and code optimization and forward planning for EGEE-III.
ARM Meeting, EGEE’07, Budapest
COD activity
A tool for Grid Operators: COD dashboard
Start of EGEE: MANY ENTRY POINTS
The operator works directly with each tool: ticketing system, sites info, monitoring tools #1 to #n, mail client.
Now: SINGLE ENTRY POINT
The operator works with the dashboard, which fronts the ticketing system, sites info, monitoring tools #1 to #n and a mail sender.
Interaction with EGEE services
The Operations Portal (IN2P3-CC, Lyon, France) exchanges data with:
- GGUS (FZK, Karlsruhe, Germany), via SOAP: create ticket, update ticket, view ticket
- Gstat (ASGC, Taipei, Taiwan), via http: GIIS status per site
- SAM (CERN, Geneva, Switzerland), via an XSQL-based service: test results on nodes
- GOC-DB, via SQL queries: site info, scheduled downtimes
[Diagram: for each site (Site1 to Site4), the portal shows the open ticket, if any (ticket #14, #28, #32 or no ticket), next to the status reported by each monitoring source.]
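The single-entry-point idea, one row per site combining the open ticket with the status from each monitoring source, can be sketched in Python as follows. This is an illustrative sketch only: `SiteRow`, `build_dashboard` and the stand-in data are hypothetical names and values, and the real SOAP, http, XSQL and SQL calls are replaced by plain dictionary lookups.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SiteRow:
    """One dashboard line: a site, its open ticket (if any), its statuses."""
    site: str
    ticket: Optional[int] = None
    status: Dict[str, str] = field(default_factory=dict)


def build_dashboard(
    sites: List[str],
    open_tickets: Dict[str, int],
    status_sources: Dict[str, Callable[[str], str]],
) -> List[SiteRow]:
    """Query every status source for every site and merge the answers."""
    rows = []
    for site in sites:
        row = SiteRow(site=site, ticket=open_tickets.get(site))
        for name, fetch in status_sources.items():
            try:
                row.status[name] = fetch(site)
            except Exception:
                # A source that fails or does not know the site must not
                # break the whole view; record the gap instead.
                row.status[name] = "unknown"
        rows.append(row)
    return rows


# Made-up stand-ins for the monitoring tools and the ticketing system.
GSTAT = {"Site1": "ok", "Site2": "error"}
SAM = {"Site1": "ok", "Site2": "ok"}
TICKETS = {"Site2": 32}

ROWS = build_dashboard(
    ["Site1", "Site2", "Site3"],
    TICKETS,
    {"gstat": GSTAT.__getitem__, "sam": SAM.__getitem__},
)
```

Here `Site3` is unknown to both stand-in sources, so its row carries `"unknown"` for each of them rather than failing the whole view.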
Statistics
Share of opened tickets [%]
Month       CE   SE   SRM   RGMA   sBDII
October     39   15    14     11       6
November    34   14    18      6      10
December    29   18    21      9       8

Solution time [hours]
Ticket type                      Oct   Nov   Dec
COD tickets                      269   268   228
GGUS tickets assigned to ROCs    277   281   307
All SUs                          364   427   709
[Chart: proportion of COD tickets against GGUS tickets for all ROCs, 31 July to 31 December. y-axis: 0 to 800; series: tickets opened by COD teams, tickets opened through GGUS, all GGUS tickets.]
COD Duties
Rotations of 10 federations/teams: each team is on duty 1 week in 5.
Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits.
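Ten teams each on duty one week in five implies two teams on duty in any given week. A minimal Python sketch of such a rotation is given below. The team names, the `teams_on_duty` helper and the simple round-robin order are assumptions made for illustration; the actual COD schedule is not described here.

```python
from typing import List


def teams_on_duty(teams: List[str], week: int, per_week: int = 2) -> List[str]:
    """Return the teams on duty for a given week number (0-based).

    Round-robin assumption: teams are taken in list order, `per_week`
    at a time, and the cycle restarts once every team has served.
    """
    if len(teams) % per_week != 0:
        raise ValueError("number of teams must be a multiple of per_week")
    cycle = len(teams) // per_week          # 10 teams / 2 per week = 5 weeks
    slot = week % cycle
    return teams[slot * per_week:(slot + 1) * per_week]


# Placeholder names for the ten federations/teams.
TEAMS = [f"team-{i:02d}" for i in range(1, 11)]
```

Over any five consecutive weeks each placeholder team appears exactly once, which matches the one-week-in-five rate stated above.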
10 federations over 18 months in EGEE-I. Working groups for over 18 months now.
There is more to it...
Straightforward mandate working groups:
- GSTAT -- TW
- SAM -- CERN
- SAMAP -- CE
- topped by Tools for Improvement for COD, TIC -- CE (EGEE’07)
Working groups mandate
- Integration of the existing tools, CIC -- FR: integration platform for all COD tools to ease the daily operational job.
- Improvement of BEST PRACTICES -- DE-CH: identify, raise and analyse with COD how to achieve homogeneous operations.
- Release of updated documentation, OPM -- SE: documentation under constant evolution.
- Set-up of failover mechanisms for GRID CORE SERVICES -- SWE: what is done at the federation level, what is done at the project level (needs help from the JShiers group), what could be done (from an operational point of view) and what is needed at the ROC/site level (from a m/w point of view).
- Set-up of a High Availability strategy for the operational tools for CODs, FAILOVER -- IT.
Failover working group
EGEE Failover: purpose
Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG Grid.
– The solution is based on DNS and consists of:
• mapping the service name to one or more destinations
• updating this mapping whenever a failure is detected
Geographical failover for the EGEE-WLCG Grid collaboration tools – CHEP 2007, Victoria BC, Canada (September 2007)
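The two steps of the DNS-based approach described on this slide (map a service name to one or more destinations, update the mapping when a failure is detected) can be sketched in Python with an in-memory dictionary standing in for the DNS zone. This is a sketch under stated assumptions, not the working group's implementation: `refresh_mapping`, the host names and the injected `is_healthy` check are all hypothetical, and no real DNS update is performed.

```python
from typing import Callable, Dict, List


def refresh_mapping(
    zone: Dict[str, str],
    candidates: Dict[str, List[str]],
    is_healthy: Callable[[str], bool],
) -> Dict[str, str]:
    """Point each service name at its first healthy destination.

    `zone` plays the role of the DNS zone (service name -> current target).
    If no destination is healthy, the existing entry is left untouched.
    Returns only the entries that were changed in this pass.
    """
    changes = {}
    for name, hosts in candidates.items():
        target = next((h for h in hosts if is_healthy(h)), None)
        if target is not None and zone.get(name) != target:
            zone[name] = target
            changes[name] = target
    return changes


# Hypothetical service name with a primary and a backup destination,
# listed in order of preference.
CANDIDATES = {
    "portal.example.org": ["primary.example.org", "backup.example.org"],
}
```

Because destinations are listed in order of preference, the mapping returns to the primary host on the first pass after it becomes healthy again. In a real deployment the record's TTL would bound how quickly clients notice the change.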
COD work aspects to keep in EGEE III
Dedication: working groups are recognized within federations as providing expertise, and by federations as the channel for bringing their needs to central operations.
Collaboration: up to now, each federation has found a way to contribute actively to improving its COD work environment, when not proactively leading a working group.
Also, each person, tool developer or expert recognized as being of "global interest", even though outside the COD scope, has been happily integrated into this "closed community", e.g. the SAMAP/TIC scope to monitor this aspect with a Nagios prototype.
Flexibility: the groups are meant to evolve together with their mandate over time and as needs arise, e.g. core grid services HA, EGI.
Anticipation: e.g. the strategy of the Operational Failover Working Group.
Experiment: e.g. regionalisation of tools and the future modular "NGI dashboards" to extend the CE first-line support experience.
COD work aspects to evolve in EGEE III
Mandate and assessment of the COD activity
- Integration of NDGF/NE as a COD team; other teams?
- Catch-all and global operations center: which core services are to be monitored centrally, how to monitor them and how to switch properly to backup
- How to aggregate local data, and which local data are concerned
- Define metrics to identify the most problematic m/w components and recurrently unreliable sites
- Operational tools reliability assessment: ENOC test as a starting base?
- Strengthen the need for HA/failover of operational tools and grid core services
Vision of the long-term evolution of the COD tools: one set of tools per federation plus aggregation?
- Which set of tools is to be regionalized? SAM, GOC DB, COD? What else?
- How are they going to interact? => A global schema is needed NOW.
COD work aspects to evolve in EGEE III (continued)
Leverage "project-labelled" tools so that operational use cases do not remain "pending". This requires that:
- development strategy and priorities are coherent:
-- data workflow: synchronization of GOCDB/BDII/SAM/COD
-- development strategy: depends on the strategy for the long-term evolution of the COD tools
-- priority decision workflow: who drives the priority of requests on "project-labelled" tools, and how, so that operational use cases do not remain "pending", e.g. critical tests monitoring/accounting or ARC CE, CA update procedure, need for SAM failover...
- staffing is adequate for proper reactivity, not only for bug fixes.
Interoperability/interoperations (item to be followed up)
– OSG: rather informal for the moment, BUT users now do have problems, and sites relay their users' issues, cf. GGUS ticket 31037.
– NDGF: is there existing critical test monitoring, and what are the consequences for operational procedures?
Conclusions and References
Where, how and when do we address these topics?
Some can be addressed here or considered at COD meetings; others are relevant to OCC/ROC first, and the COD working groups can then make suggestions/recommendations.
References:
CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations, Grid 2007 (IEEE), Austin, TX, United States (September 2007)
Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria, BC, Canada (September 2007)