Operation in AB-CO: 2005 & Beyond


Upload: asher-mcgee

Post on 31-Dec-2015


Operation in AB-CO: 2005 & Beyond

Scope

How to ensure support to operation with the right quality of service.

Domains are:
- PS Complex: Linac2, Linac3, PSB, PS, AD, Isolde (+ REX), LEIR
- SPS & transfer lines
- Experimental area
- CTF3
- LHC hardware commissioning
  - Cryogenic systems
  - Beam interlock & powering interlock systems
  - QPS
  - Vacuum, PO
- LHC

Objectives

- Homogenize principles across the different domains
- Include the new requirements
  - Hardware commissioning
  - LHC commissioning & operation
- Identify and agree with partners on responsibility limits
- Issue recommendations on organization, tools, procedures

AB CO working group

Sections: AP, DM, FC, HT, IN, IS

Members: Eugenia Hatziangeli, Ronny Billen, Nicolas de Metz-Noblat, Jean-Claude Bau, Alastair Bland, Philippe Gayet, Franck Di Maio, Pierre Charrue, Frank Locci

Planning

- 15 October: first meeting
- End of December: proposals for 2005
- End of April: proposals for 2006-2010, bearing in mind:
  - 2006-2007: hardware commissioning
  - 2006: LEIR run
  - 2007-2008: LHC commissioning
  - 2008-2009: first phase of LHC operation

Recommendations for 2005

- LINAC, BOOSTER, ISOLDE, LINAC3
  - As it is now, with CO internal adjustments
- LEIR
  - During commissioning, the PL will organize support
  - After acceptance, same as above, with reinforced support for new technology
- SPS
  - No piquet support, only infrastructure support during working time
- LHC hardware commissioning
  - Each PL organizes the support for his project (PIC, QPS, CRYO, …)
  - Infrastructure support for servers, FIP, PVSS, FESA, CMW, LASER, logging

[Diagram: control system components. Labels: CO Equipments; CO Software (app, components); FESA; CMW; UNICOS (Cryo); UNICOS (PIC, QPS, CRYO); PIC; BIC; QPS; Cryo Ring; LASER, Logging, …; CO DIAG; CM, JAPC, …; Timing]

Tools for Hardware Installation & Operation

- Naming convention
- Layout DB: two layers of description
  - System (PLC, VME, gateway, FIP segment, server, …)
  - Functional component (slot) of systems (board, power supply, CPU, …)
  - Connection to functional slots (timing, PIC, power, Ethernet)
- ABCAM
  - Asset management tools describe all physical equipment associated with a functional slot
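The two description layers and the link to physical assets can be pictured with a small data model. This is only a sketch: the class names, field names and example values are hypothetical and do not reflect the actual Layout DB or ABCAM schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Asset:
    """Physical equipment as an asset tool might record it (hypothetical fields)."""
    serial: str
    model: str


@dataclass
class Slot:
    """Functional component (slot) of a system, e.g. a board or a CPU."""
    name: str
    connections: List[str] = field(default_factory=list)  # e.g. timing, power, ethernet
    asset: Optional[Asset] = None  # physical equipment currently installed, if known


@dataclass
class System:
    """Top layer: a PLC, VME crate, gateway, FIP segment, server, ..."""
    name: str
    kind: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add_slot(self, slot: Slot) -> None:
        self.slots[slot.name] = slot

    def slots_connected_to(self, connection: str) -> List[str]:
        """Names of the slots that declare the given connection."""
        return [s.name for s in self.slots.values() if connection in s.connections]


# Example values are invented for illustration.
crate = System(name="example-vme-01", kind="VME")
crate.add_slot(Slot("cpu", ["power", "ethernet"], Asset("SN-0001", "cpu-board")))
crate.add_slot(Slot("timing-rx", ["power", "timing"]))
print(crate.slots_connected_to("timing"))
```

Keeping the asset separate from the slot means a board can be swapped without changing the functional description of the system.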

VME-VXI

Failure types
- Power/network failure
- Rack top: power supply, timing fan-out, RF repeater (local diagnostic); intervention by trained team with procedure
- CPU (monitored by Xcluc): intervention by trained team with procedure
- CO board (not all CO boards contain a remote monitoring mechanism, and where one exists it is not homogeneous): intervention by trained team with procedure
- 1553 fieldbus and serial link (not always monitored): intervention by trained team with procedure
- Application in FE (seen in Xcluc): repair, reboot or do nothing, by operators
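The VME-VXI failure types above amount to a lookup from failure type to diagnostic means and responder. A minimal sketch of that lookup follows; the dictionary keys and function names are hypothetical identifiers chosen for this example, and entries are left empty where the slide names no diagnostic or responder.

```python
from typing import List, NamedTuple, Optional


class Handling(NamedTuple):
    diagnostic: Optional[str]  # how the failure is seen, if stated
    responder: Optional[str]   # who intervenes, if stated


# Transcribed from the VME-VXI failure list; keys are hypothetical.
VME_FAILURES = {
    "power_network": Handling(None, None),  # no diagnostic or responder stated
    "rack_top": Handling("local diagnostic", "trained team"),
    "cpu": Handling("Xcluc", "trained team"),
    "co_board": Handling(None, "trained team"),  # monitoring absent or not homogeneous
    "fieldbus_1553_serial": Handling(None, "trained team"),  # not always monitored
    "fe_application": Handling("Xcluc", "operators"),
}


def who_intervenes(failure: str) -> str:
    """Return the responder for a failure type, or 'not specified'."""
    return VME_FAILURES[failure].responder or "not specified"


def seen_in_xcluc() -> List[str]:
    """Failure types that are visible remotely through Xcluc."""
    return sorted(k for k, h in VME_FAILURES.items() if h.diagnostic == "Xcluc")


print(who_intervenes("fe_application"))
print(seen_in_xcluc())
```

Writing the table down this way makes the gaps explicit: only two of the six failure types have a remote diagnostic.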

VME-VXI

Problems to address
- 450 units, several backplane types
  - BDI and RF types cannot be maintained by CO
  - PS complex equipment to be transferred from the configuration DB to the hardware maintenance tools
- Different monitoring & remote action methods
  - Huge investment (money & manpower) needed to homogenize
  - Some equipment has no monitoring capabilities (racks)
- Cohabitation of CO and non-CO managed boards
  - Problem of differential diagnostics
  - Who does the intervention?

FIP

Failure types
- Power (power supplies disseminated along the network)
- Ethernet, only for gateways
- Gateway (150) component failures (diagnostic on Xcluc): gateway replacement by trained team with procedure, software reloading by operators
  - Motherboard, power supply, FIP board, timing cards
- Segment (585) component failures (diagnostic via FIP diagnostic tool): component replacement by trained team with procedure
  - Copper/fiber coupler, Cu/Cu repeater, FIP DIAG
- Agent failures (diagnostic via FIP diagnostic tool or supervision/expert application): equipment group responsibility
- Application in FE (seen in Xcluc): repair, reboot or do nothing, by operators

FIP

Problems to address
- CO to declare all components/architectures/layouts in the maintenance/operation tools
- Provide homogeneous tools for diagnostic & remote action
  - Remote reset, restart gateway
  - Distinguish between agent (equipment) and FIP (CO) problems
  - Agent diagnostics

PLC

Failure types
- Power/network
- Backplane power supply, PLC Ethernet board, CPU board (no remote differential diagnostic possible): intervention by trained team with procedure
- I/O board or fieldbus board failure (monitored by PLC console software): intervention by trained team with procedure
- Instrument or electronics failure (PIC) (monitored through PLC/PVSS): intervention by specialist
- Application failure (seen in supervision system): action via PLC console software by specialist

PLC

Problems to address
- PLCs owned by CO (Cryo (125), PIC/WIC (44), RR (??))
  - Different projects with different constraints and principles
  - For PIC, CO is also responsible for electronic equipment monitored via PLC/PVSS
- PLCs owned by equipment groups (BT, PO, VAC, RF (20)); some PLCs in between (30)
  - We have to determine the limits of CO responsibilities & services
- Centralize all PLC-related information in tools accepted by the community
  - ABCAM, Layout DB
- Common diagnostic principles to be established
  - Generalize and complete the IEPLC diagnostics methodology for all PLCs
  - Remote reset/action is not always a good strategy (disastrous for a Cryo PLC with an Ethernet problem)
  - Action possible only after a local diagnostic
- Intervention procedures need to be established by CO and followed by a team trained on PLCs
  - After a CPU replacement, an application reload is needed in some cases
  - The support team needs to know how to use the PLC console program
  - Identify who can perform these tasks and train them

TIMING

Failure types
- GMT distribution
  - Power failure
  - Failure of a timing component (coupler, repeater, timing board): trained team
  - Cable or fiber disconnection/cut: trained team
  - Timing board failure on client unit (VME, gateway): trained team
  - Timing distribution connections/repeaters: trained team
  - Event timing disabled by user: should be treated by operators
- MTG sequencer
  - Hardware failure: specialists
  - Error in programming operation timing: specialist
- Timing reception via Ethernet in workstations (video)

TIMING

Problems to address
- Introduce the GMT layout & timing distribution in the Layout DB
  - Backlog of the "PS complex"
- Difficult for the normal operation crew to separate software/user errors from hardware errors
- Several tools for timing diagnostics for different problems
  - CTRtest, TG8test: timing board reception check
  - Video: telegram reception (in FE and WS)
  - TestTGM: availability of services
- Need for real timing competence always available in OP
  - First diagnostic and resolution of software & user errors
  - Timing-related work is part of normal operator work, but it is not tracked as it should be by OP

Servers

Failure types
- Power/network (all systems grouped in a restricted area)
- Loss of a system resource (CPU, power supplies, disk)
  - Repair: operator
  - Hardware intervention: specialist
- Configuration loss: repair/reboot does not solve the problem
  - Restore from a backup (specialist)
- Application
  - Diagnostic in the application itself
  - Repair from Xcluc (operator)

Problems to address
- OS configuration homogenization
  - Still some PS/SL ways of working to migrate toward AB
- Procedure & training for operator intervention
  - What is the task of the operator?
  - How to do it properly

Power Dependence

Identify a power failure on all process control devices
- All systems must be entered in the Layout DB
  - Connection to power supply known
- All power units must be monitored
  - What does that mean?
  - Is the granularity achieved by TS-EL compatible with our needs?
- How to make the link between the TS-EL monitoring system and CO equipment
  - GTPM (data collection needs to be organized): another tool…
- Intervention should be done by OP/TS-EL
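Once each device's connection to its power supply is recorded, spotting a probable power failure reduces to grouping unreachable devices by supply. The sketch below illustrates the idea; the function name, device names and power unit names are hypothetical, and a real link to the TS-EL monitoring data is not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def suspect_power_units(feeds: Dict[str, str], down: Iterable[str]) -> List[str]:
    """Return the power units whose connected devices are all down.

    feeds: device name -> power unit name, as a layout database could provide
    down:  names of devices currently reported unreachable
    """
    down_set = set(down)
    by_unit = defaultdict(set)
    for device, unit in feeds.items():
        by_unit[unit].add(device)
    # A unit is suspect when every device it feeds is unreachable.
    return sorted(unit for unit, devices in by_unit.items() if devices <= down_set)


# Invented example data.
feeds = {
    "gateway-a": "psu-1",
    "vme-a": "psu-1",
    "plc-a": "psu-2",
    "server-a": "psu-2",
}
print(suspect_power_units(feeds, ["gateway-a", "vme-a", "plc-a"]))
```

Here `psu-1` is flagged because both devices it feeds are down, while `psu-2` is not because `server-a` is still reachable. The same grouping would apply to network dependence, with switches in place of power units.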

Network Dependence

Identify a network failure on all process control devices
- All systems must be entered in the Layout DB
  - Link to be established to Netops
- All network components are monitored
  - How to link the NETOPS/Spectrum information to the CO diagnostic tools

Java Applications: Situation

Legacy software
- Known by CO: one member can maintain them
- Orphan applications: ???
- Both cases: phasing out in the medium term

New application or new component (library)
- Developed by CO or a CO/OP team; this team develops according to common rules
- Diagnostic tools available in CCC to distinguish between application failure and external problem
- Software component list necessary for the application
- Hardware dependence list
- Technical contact list

Failure types
- Controlled process (application): process expert
- Control system (application, Xcluc, …): control specialist
  - Front-end communication, application server, CMW server, …
- Application (Xcluc): repair and, if not effective, application specialist
- Configuration error for data-driven application: process expert

No effective intervention on application software can be done by a non-expert

Java Applications

Problems to address
- For legacy software
  - Identify and plan the upgrade of all legacy and orphan applications
  - If there is no upgrade (not possible or not useful), or before the upgrade, identify an expert or a support team per application (the team can be a mix of OP/CO/… staff)
- For new software
  - Identify the expert team per application (OP/CO/…)
  - Include in the application documentation or online:
    - List of dependencies on other applications
    - List of hardware dependencies

DM Application

Failure types
- Oracle server: IT
- Applications server: see the Servers page
- Logging application: a monitoring tool for logging exists on a web-based access page; problems can be seen & corrected by the CCC operator
- Config DB: ???

Problems to address
- Ensure the guarantee of service 365/24 by IT for the Oracle server
- Prepare a procedure for CCC operators on reference server web-based intervention

PVSS Application

No automatic control actions performed in PVSS applications:
- Monitoring, operator command requests, interface to LASER/logging

All applications based on JUNICOS frameworks
- Same monitoring principles across all applications
- Failure types are not application dependent

Failure types
- Controlled process (via application & SMS): process expert
- Control system (via PVSS monitoring tool): PVSS specialist
  - Front-end communication, data server CPU disk usage, archive monitoring, logging exchange monitoring, …
- PVSS manager (auto repair in case of failure, Xcluc): PVSS specialist

Problems to address
- Backup/restore policy to be established
- Integration with existing tools

Operation Responsibilities

AP: Java applications framework; high-level applications for LEAR and LHC HC; LASER
HT: Timing/sequencing; remote reset; FE
FC: CMW; FE
IN: Servers; FE (via Xcluc); PIC/WIC
IS: PVSS; IEPLC; CRYO; FIP; test bench
DM: Logging; Configuration DB; ABCAM; Layout DB

All sections will have activities related to operation in 2006

Present piquet know-how

AP: Java applications framework; legacy applications; high-level applications for LEAR and LHC HC; LASER
HT: Timing/sequencing; remote reset; FE
FC: CMW; FE
IN: Servers; FE (via Xcluc); PIC/WIC
IS: PVSS; IEPLC; CRYO; FIP; test bench
DM: Logging; Configuration DB; ABCAM; Layout DB

Some remarks

- We have a large diversity of systems and only a small part is integrated today
- The present piquet team is not tailored to take over the entire operation duty of the CO group
  - 1 team leader, 4 experts, 2 newcomers
  - "New" technologies not mastered by the existing team
  - Geographical dispersion of equipment
- In 2006/2007, operation activity will have to coexist with installation/commissioning activities

First Proposals

- For hardware systems, systematically use the Layout DB and ABCAM tools
- Together with OP, clean up the power/network issues
- Transfer the timing software management to OP
- Clarify responsibilities with equipment groups in all grey areas
- Prepare & execute the legacy software upgrade
- Integrate all existing diagnostic tools
  - LASER (AP), GTPM (OP), XCLUC (IN), Spectrum (IT-CS), TIM (OP), PVSS UNICOS integrated diagnostics (IS/IN), application integrated diagnostics (AP), DiagCMW (FC), timing tools (HT), PLC console tools (IS), FIP diagnostic tool (IS), logging monitoring (DM)
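As a first step toward integrating the diagnostic tools, the list above can be held as a simple registry and indexed by owning section. The sketch below only transcribes the tool/owner pairs from the list; the `TOOLS` structure and the `tools_by_owner` function are hypothetical and say nothing about how the tools themselves would be connected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (tool, owning sections), transcribed from the list of existing diagnostic tools.
TOOLS: List[Tuple[str, List[str]]] = [
    ("LASER", ["AP"]),
    ("GTPM", ["OP"]),
    ("XCLUC", ["IN"]),
    ("Spectrum", ["IT-CS"]),
    ("TIM", ["OP"]),
    ("PVSS UNICOS integrated diagnostics", ["IS", "IN"]),
    ("Application integrated diagnostics", ["AP"]),
    ("DiagCMW", ["FC"]),
    ("Timing tools", ["HT"]),
    ("PLC console tools", ["IS"]),
    ("FIP diagnostic tool", ["IS"]),
    ("Logging monitoring", ["DM"]),
]


def tools_by_owner(tools: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Index tool names by each section that owns them."""
    index = defaultdict(list)
    for name, owners in tools:
        for owner in owners:
            index[owner].append(name)
    return dict(index)


index = tools_by_owner(TOOLS)
for owner in sorted(index):
    print(owner, "->", ", ".join(index[owner]))
```

The index shows at a glance that twelve tools are spread over eight owners, which is the integration problem the proposal points at.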

Tracks

- All sections must organize (alone, in synergy with others, via a reorganization, …) the operation support of the systems or applications they deploy
  - No systematic organization (piquet or list)
  - Intervention teams can be grouped, e.g.:
    - Hardware for VME, gateway, FIP, PLC
    - PVSS/PLC & PVSS/FEC applications support
- Create an operation coordination (a person or a team)
  - Acts as the interface toward OP
  - Coordinates the control system integration
    - Requesting procedures/documentation from system teams
    - Coordinating the diagnostic tools development
    - Requesting from the different teams the functionalities necessary for operation
- Create a real operation-oriented policy within the entire group

Possible Operation Team Duties/Limits for 2006

Limits
- No installation
- No configuration
- No application modifications
- No application bug fixing
- No timing user error fixing
- No intervention on commissioning systems
- No intervention on power/network problems

For systems in operation
- Hardware
  - Remote diagnostic
  - Local diagnostic
  - Reboot, or reinitialize communication
  - Hardware intervention (with limitations)
  - Application reloading (with limitations)
  - Call equipment specialists
- Software
  - Refine diagnostic
  - Reboot application (operators)
  - Call specialists
- Management
  - Track problems
  - Request & obtain improvements