ALICE Grid Status - David Evans, The University of Birmingham - GridPP 16th Collaboration Meeting, QMUL, 27-29 June 2006



ALICE Grid Status

David Evans

The University of Birmingham

GridPP 16th Collaboration Meeting, QMUL, 27-29 June 2006


Outline of Talk

- The ALICE Experiment
- ALICE computing requirements
- ALICE Grid - AliEn
- Analysis using AliEn
- Status of ALICE Data Challenge 2006
- Summary and Outlook


The ALICE Experiment

- ALICE is one of the four main LHC experiments at CERN.
- The only one dedicated to heavy-ion physics.
  - Study of QCD under extreme conditions
- ~1000 collaborators
- ~100 institutions
- Birmingham is the only UK institute involved


ALICE Requirements

- Data taking (each year)
  - 1 month of Pb-Pb data ~ 1 PByte
  - Also p-p for rest of the year ~ 1 PByte
- Large scale simulation effort
  - 1 Pb-Pb event: ~ 8 hrs (3 GHz)
- Data reconstruction
- Data analysis
- Smaller collaboration than ATLAS or CMS but similar computing requirements.
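To get a feel for the simulation figure above, here is a rough sketch of the arithmetic in Python. Only the ~8 hours per Pb-Pb event on a 3 GHz CPU comes from the slide; the event count and the number of CPUs are hypothetical values chosen purely for illustration, and perfect parallelism is assumed.

```python
HOURS_PER_PBPB_EVENT = 8  # from the slide: ~8 hrs per Pb-Pb event on a 3 GHz CPU


def simulation_wall_days(n_events: int, n_cpus: int) -> float:
    """Wall-clock days to simulate n_events, assuming perfect parallelism."""
    cpu_hours = n_events * HOURS_PER_PBPB_EVENT
    return cpu_hours / n_cpus / 24


# Hypothetical workload: 100,000 events on 1,000 CPUs (both numbers are assumptions).
days = simulation_wall_days(100_000, 1_000)
print(f"{days:.1f} days")
```

Even with these made-up numbers the point of the slide is visible: a modest simulation sample keeps a thousand CPUs busy for about a month.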


Profile of CPU requirements

[Figure: CPU requirements over time, with curves for Total, CERN T0, CERN T1, Ext Tier 1 and Ext Tier 2; a level of 35 MSI2K is marked, and the time axis is labelled Jan 07, Sept 08, Nov 09.]


Tier Hierarchy

- MONARC Model
- ‘Cloud Model’ (Tier free) used in ALICE data challenges (native AliEn sites - for LCG sites we comply with Tier model)

- Tier 0
  - RAW data master copy
  - Data reconstruction (1st pass)
  - Prompt analysis
- Tier 1
  - Copy of RAW
  - Reconstruction
  - Scheduled analysis
- Tier 2
  - MC production
  - Partial copy of ESD
  - Data analysis


ALICE Grid - AliEn

- AliEn (ALICE Environment) - Grid framework developed by ALICE - used in production for ~5 years.
- Based on Web services and standard protocols.
- Built around open source code
  - Less than 5% is native AliEn code (mainly Perl).
- To date, > 500,000 ALICE jobs have been run under AliEn control worldwide.


AliEn ‘Pull’ Protocol

- One of the major differences between the AliEn and LCG grids is that AliEn uses the ‘pull’ rather than the ‘push’ protocol.

[Diagram: the EDG/Globus model and the AliEn model side by side, each showing user, server and Resource Broker, with arrows labelled "job" and "list".]
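The difference can be illustrated with a toy sketch. It is not AliEn code: the class and method names (`TaskQueue`, `submit`, `pull`), the matching rule and the software labels are all hypothetical. In a push model a broker decides where each job goes; in the pull model sketched here, jobs wait in a central queue and each site asks for work it is able to run.

```python
from collections import deque
from typing import Optional


class TaskQueue:
    """Toy central queue: jobs wait here until a site asks for one."""

    def __init__(self) -> None:
        self._jobs: deque = deque()

    def submit(self, job_id: str, required_software: str) -> None:
        """Called by a user: the job is queued, not sent to any site."""
        self._jobs.append((job_id, required_software))

    def pull(self, site_software: set) -> Optional[str]:
        """Called by a site: hand out the first waiting job the site can run."""
        for job in self._jobs:
            job_id, required = job
            if required in site_software:
                self._jobs.remove(job)
                return job_id
        return None  # nothing suitable; the site simply asks again later


queue = TaskQueue()
queue.submit("job-1", "AliRoot")
queue.submit("job-2", "ROOT")
print(queue.pull({"ROOT"}))  # a site offering only ROOT skips job-1
```

One consequence of this design is that a site that is down or misconfigured never receives work, because it never asks for any.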


LCG / gLite

- ALICE is committed to using as many common grid applications as possible.
- Changes have been made to make AliEn work with LCG
  - E.g. changes to File Catalogue (FC): LFC (Local File Catalogue or LCG File Catalogue)
  - VO box at each Tier 1 and Tier 2
  - Globus/GSI compatible authentication
- AliEn-gLite interface in development


Analysis

- Core of ALICE computing model is AliRoot
  - Uses ROOT framework
- Couple AliEn with ROOT for Grid-based analysis.
  - Use PROOF - Parallel ROOT Facility
  - To the user it’s like using ROOT
- 4-tier architecture:
  - ROOT client session, API server (AliEn + PROOF), site PROOF master servers, PROOF slave servers.
- Data from DC2006 only accessible via Grid


PROOF

- Each node has a PROOF slave
- Each site has a PROOF master server
- Uses ‘pull’ protocol, i.e. the slaves ask the master for work packets. Slower slaves get smaller work packets etc.

[Diagram labels: Client API, API Server, AliEn FC …., list of sites with data.]
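The work-packet idea can be sketched as follows. This is a toy model, not PROOF's actual packetizer: the function names, the sizing rule (packet size proportional to a worker's measured rate) and all numbers are assumptions made for illustration.

```python
def next_packet_size(events_left: int, worker_rate: float,
                     target_seconds: float = 10.0) -> int:
    """Events to hand to a worker that has just asked for more work.

    worker_rate is the worker's measured events/second, so slower workers
    receive smaller packets and every packet takes roughly target_seconds.
    """
    wanted = max(1, int(worker_rate * target_seconds))
    return min(wanted, events_left)


def run(total_events: int, rates: dict) -> dict:
    """Simulate workers pulling packets until no events remain."""
    done = {name: 0 for name in rates}
    busy_until = {name: 0.0 for name in rates}  # time each worker becomes free
    left = total_events
    while left > 0:
        # The worker that becomes free first is the next one to ask for work.
        name = min(busy_until, key=busy_until.get)
        size = next_packet_size(left, rates[name])
        left -= size
        done[name] += size
        busy_until[name] += size / rates[name]
    return done


# Hypothetical workers: "fast" processes 4x as many events per second as "slow".
print(run(1000, {"fast": 20.0, "slow": 5.0}))
```

Because each worker only asks when it is free, the faster worker ends up processing most of the events without the master having to plan the split in advance.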


ALICE Data Challenge 2006 (PDC’06)

- Last ‘challenge’ before the start of data taking
- Test of all Grid components
  - AliEn as an ALICE interface to the Grid and much, much more
  - LCG/gLite baseline services (WMS, DMS)
- Test of computing centres infrastructure
- Major test of stability of all of the above


Grid software deployment and running

- LCG sites are operated through the VO-box framework
  - All ALICE sites need one
  - Relatively extended deployment cycle, a lot of configuration and version update issues had to be solved
  - Situation is quite routine now
- Data management
  - This year: xrootd as disk pool manager on all sites
  - The installation/configuration procedures have just been released
  - xrootd integrated in other storage management solutions (CASTOR, DPM, dCache) - under development
- Data replication (FTS)
  - We use it for scheduled replication of data between the computing centres (RAW from T0->T1, MC production T2->T1, etc…)
  - Fully incorporated in the AliEn FTD, to be extensively tested in July
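The scheduled replication pattern in the last bullets can be written down as a small routing table. The sketch below is illustrative only: it does not call FTS or the AliEn FTD, and the `Transfer` type, the `ROUTES` table, the `schedule` function and the site names are hypothetical. Only the two routes (RAW from T0 to T1, MC production from T2 to T1) come from the slide; sending each data type to every destination site is a simplifying assumption.

```python
from dataclasses import dataclass

# Routes named on the slide: data type -> (source tier, destination tier).
ROUTES = {
    "RAW": ("T0", "T1"),
    "MC": ("T2", "T1"),
}


@dataclass(frozen=True)
class Transfer:
    data_type: str
    source_site: str
    dest_site: str


def schedule(data_type: str, sites_by_tier: dict) -> list:
    """List one transfer per (source, destination) pair for this data type."""
    src_tier, dst_tier = ROUTES[data_type]
    return [
        Transfer(data_type, src, dst)
        for src in sites_by_tier[src_tier]
        for dst in sites_by_tier[dst_tier]
    ]


# Site names are placeholders, not a statement of the real topology.
sites = {"T0": ["CERN"], "T1": ["T1-A", "T1-B"], "T2": ["T2-A"]}
for transfer in schedule("RAW", sites):
    print(transfer)
```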


VO box support and operation

- In addition to the standard LCG components, the VO-box runs ALICE-specific software components
  - VO-boxes now at RAL Tier 1 and Birmingham Tier 2
  - Birmingham ALICE students are testing AliEn for analysis purposes through Birmingham Tier 2.
- The installation and maintenance of these is entirely our responsibility:
  - Support for UK VO-box supplied by CERN (no UK manpower available)
- Site related problems are handled by the site admins
- LCG services problems are reported to GGUS


Operation status

- Running in a continuous mode since 24/05
- VO-boxes:
  - Monthly releases of AliEn (currently v.2-10), LCG 2.7.0 and soon tests of gLite 3.0
- Central ALICE services:
  - AliEn machinery and API Service is developed/deployed and maintained by the AliEn team
- Site services:
  - Stability testing of both AliEn and LCG components
  - The interfaces AliEn-LCG/gLite are still in development
  - A gLite VO-box has already been provided at CERN and first tests performed.


Running status - one month


Site contributions in the past 2 months

- 60% T1, 40% T2 (almost half from 2 T2 sites!)
- RAL: 0.7%


Running status - site averages

- Pledged resources - 4000 CPUs
- Our average is at the 12% level
  - Due to central and site services malfunctions
  - Mostly due to sites providing fewer CPUs than pledged
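Assuming the 12% figure is measured against the pledged 4000 CPUs (the slide does not spell this out), the arithmetic is simple enough to check in a few lines of Python. The function name is hypothetical.

```python
PLEDGED_CPUS = 4000      # from the slide
AVERAGE_PERCENT = 12     # from the slide


def cpus_at_percent(pledged: int, percent: int) -> int:
    """CPUs corresponding to a whole-number percentage of the pledge."""
    return pledged * percent // 100


in_use = cpus_at_percent(PLEDGED_CPUS, AVERAGE_PERCENT)
print(in_use, "CPUs in use on average,", PLEDGED_CPUS - in_use, "short of the pledge")
```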


Stability improvements

- This is a data challenge, so there is always room for improvement:
  - AliEn is undergoing gradual fixes and new features are added
  - The LCG software will undergo a quantum leap - move from LCG to gLite
  - Site infrastructure (VO-box, etc…) also needs solidification, especially at the T2s
  - Monitoring and control - continuously adding new features


Outlook

- PDC’06 has started as planned
  - This is the last exercise before the beam!
  - It is a test of all Grid tools/services we will use in 2007
    - If not in PDC’06, there is a good chance that they will not be ready
  - It is also a large-scale test of the computing infrastructure - computing, storage and network performance


Outlook (2)

- We have all pieces needed to run production on the Grid (some untested).
- The exercise started 2 months ago and will continue until the end of the year
- At the moment, we are optimising the use of resources - attempting to get from the sites the promised resources
- Next phase of the plan is a test of the file transfer utilities of LCG (FTS) and integration with AliEn FTD
- In parallel to that we will run event production as usual


Summary

- AliEn is a Grid framework developed by ALICE using 95% open source code (e.g. SOAP) and 5% AliEn-specific (Perl) code.
- AliEn is evolving to take into account the EGEE/gLite framework and to work with LCG.
  - New user interfaces developed
  - PROOF for analysis developed
  - Better authentication/authorisation developed
- Data Challenge 2006 - since April - going well
- VO boxes at RAL T1 and B’ham T2
- Lack of computing resources a worry.