Overview & Status, Al-Ain, UAE, November 2007

Ian Bird, CERN, LCG Deployment Manager


Page 1: Overview & Status Al-Ain, UAE November 2007

Ian Bird, CERN, LCG Deployment Manager

Overview & Status

Al-Ain, UAE, November 2007

Page 2: Overview & Status Al-Ain, UAE November 2007

Outline

Introduction – The computing challenge – why grid computing?

Overview of the LCG Project

Project Status

Challenges & Outlook

25-Nov-07 [email protected]

Page 3: Overview & Status Al-Ain, UAE November 2007

The LHC Computing Challenge


Signal/Noise: 10^-9

Data volume
  High rate * large number of channels * 4 experiments
  15 PetaBytes of new data each year

Compute power
  Event complexity * number of events * thousands of users
  100 k of (today's) fastest CPUs

Worldwide analysis & funding
  Computing funding locally in major regions & countries
  Efficient analysis everywhere
  GRID technology
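To give a feel for the 15 PetaBytes of new data each year quoted on this slide, the following minimal sketch converts a yearly volume into an average sustained rate. It assumes decimal units (1 PB = 10^15 bytes) and data arriving evenly over a 365-day year. Neither assumption comes from the slides, and since an accelerator does not run continuously, real peak rates would be higher than this average.

```python
# Back-of-envelope conversion of a yearly data volume into an average rate.
# Assumptions (not from the slides): 1 PB = 1e15 bytes, 365-day year,
# data written evenly throughout the year.

PB = 1e15                           # bytes per petabyte (decimal)
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in MB/s for a yearly volume given in PB."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / 1e6


new_data_rate = average_rate_mb_per_s(15)  # 15 PB/year, from the slide
print(f"{new_data_rate:.0f} MB/s averaged over a year")
```

Under these assumptions the average comes out a little under 0.5 GB/s, sustained around the clock.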

Page 4: Overview & Status Al-Ain, UAE November 2007

Timeline: LHC Computing


[Figure: timeline, 1994 to 2008]

Milestones: LHC approved; ATLAS & CMS approved; ALICE approved; LHCb approved; LHC start

ATLAS (or CMS) requirements for first year at design luminosity:
  ATLAS & CMS CTP: 10^7 MIPS, 100 TB disk
  “Hoffmann” Review: 7x10^7 MIPS, 1,900 TB disk
  Computing TDRs: 55x10^7 MIPS, 70,000 TB disk (140 MSi2K)
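The three requirement estimates on this timeline can be compared directly. The sketch below is only an illustration: it takes the MIPS and disk figures as quoted on the slide and expresses each as a multiple of the earliest estimate. Pairing each figure with the milestone it sits next to in the transcript is an assumption, and the variable names are hypothetical.

```python
# Growth of the estimated requirements across the three planning milestones.
# Figures are those quoted on the slide: (milestone, CPU in MIPS, disk in TB).
# Assumption: each figure belongs to the milestone it is printed next to.
estimates = [
    ("ATLAS & CMS CTP", 1e7, 100),
    ("Hoffmann Review", 7e7, 1_900),
    ("Computing TDRs", 55e7, 70_000),
]

base_name, base_cpu, base_disk = estimates[0]

# Express every estimate as a multiple of the first one.
growth = {
    name: (cpu / base_cpu, disk / base_disk)
    for name, cpu, disk in estimates
}

for name, (cpu_x, disk_x) in growth.items():
    print(f"{name}: CPU x{cpu_x:.0f}, disk x{disk_x:.0f} relative to {base_name}")
```

On these numbers the disk estimate grew far faster than the CPU estimate between the first proposal and the Computing TDRs.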

Page 5: Overview & Status Al-Ain, UAE November 2007

Evolution of CPU Capacity at CERN

[Figure: CPU capacity at CERN over time, labelled by accelerator: SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV)]

Costs (2007 Swiss Francs)

Includes infrastructure costs (comp. centre, power, cooling, ..) and physics tapes

Page 6: Overview & Status Al-Ain, UAE November 2007

Requirements Match

CPU & disk requirements: > 10 times CERN possibility

Page 7: Overview & Status Al-Ain, UAE November 2007

LHC Computing Multi-science Grid

1999 – MONARC project
  First LHC computing architecture – hierarchical distributed model

2000 – growing interest in grid technology
  HEP community main driver in launching the DataGrid project

2001-2004 – EU DataGrid project
  middleware & testbed for an operational grid

2002-2005 – LHC Computing Grid – LCG
  deploying the results of DataGrid to provide a production facility for LHC experiments

Page 8: Overview & Status Al-Ain, UAE November 2007

The Worldwide LHC Computing Grid

Purpose
  Develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments
  Ensure the computing service … and common application libraries and tools

Phase I – 2002-05 – Development & planning

Phase II – 2006-2008 – Deployment & commissioning of the initial services

Page 9: Overview & Status Al-Ain, UAE November 2007

WLCG Collaboration

The Collaboration
  4 LHC experiments
  ~250 computing centres
  12 large centres (Tier-0, Tier-1)
  38 federations of smaller “Tier-2” centres
  Growing to ~40 countries
  Grids: EGEE, OSG, Nordugrid

Technical Design Reports
  WLCG, 4 Experiments: June 2005

Memorandum of Understanding
  Agreed in October 2005

Resources
  5-year forward look

Page 10: Overview & Status Al-Ain, UAE November 2007

LCG Service Hierarchy

Tier-0 – the accelerator centre
  Data acquisition & initial processing
  Long-term data curation
  Distribution of data to Tier-1 centres

Tier-1 – “online” to the data acquisition process, high availability
  Managed Mass Storage – grid-enabled data service
  Data-heavy analysis
  National, regional support

Tier-1 centres:
  Canada – TRIUMF (Vancouver)
  France – IN2P3 (Lyon)
  Germany – Forschungszentrum Karlsruhe
  Italy – CNAF (Bologna)
  Netherlands – NIKHEF/SARA (Amsterdam)
  Nordic countries – distributed Tier-1
  Spain – PIC (Barcelona)
  Taiwan – Academia Sinica (Taipei)
  UK – CLRC (Oxford)
  US – Fermilab (Illinois), Brookhaven (NY)

Tier-2: ~130 centres in ~35 countries
  End-user (physicist, research group) analysis – where the discoveries are made
  Simulation
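The service hierarchy on this slide can be written down as a small data structure. The sketch below is one hypothetical way to model it in Python. The `Tier` class and its field names are illustrative only, while the roles and centre names are taken from the slide. As a consistency check it counts the Tier-0 and Tier-1 centres, which should match the "12 large centres (Tier-0, Tier-1)" quoted for the WLCG Collaboration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """Hypothetical container for one level of the LCG service hierarchy."""
    name: str
    roles: list[str]
    centres: list[str] = field(default_factory=list)


tier0 = Tier(
    "Tier-0",
    ["Data acquisition & initial processing",
     "Long-term data curation",
     "Distribution of data to Tier-1 centres"],
    ["CERN"],
)

tier1 = Tier(
    "Tier-1",
    ["Managed Mass Storage", "Data-heavy analysis", "National, regional support"],
    ["TRIUMF", "IN2P3", "Forschungszentrum Karlsruhe", "CNAF", "NIKHEF/SARA",
     "Nordic distributed Tier-1", "PIC", "Academia Sinica", "CLRC",
     "Fermilab", "Brookhaven"],
)

# The ~130 Tier-2 centres are not listed individually on the slide.
tier2 = Tier("Tier-2", ["End-user analysis", "Simulation"])

large_centres = len(tier0.centres) + len(tier1.centres)
print(f"Large centres (Tier-0 + Tier-1): {large_centres}")
```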

Page 11: Overview & Status Al-Ain, UAE November 2007

Distribution of Computing Services

Summary of Computing Resource Requirements
All experiments – 2008
From LCG TDR – June 2005

                        CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)       25            56            61     142
Disk (PetaBytes)           7            31            19      57
Tape (PetaBytes)          18            35             -      53

Shares (pie charts):

        CERN   All Tier-1s   All Tier-2s
CPU      18%           39%           43%
Disk     12%           55%           33%
Tape     34%           66%             -

about 100,000 CPU cores

New data will grow at about 15 PetaBytes per year – with two copies

Significant fraction of the resources distributed over more than 120 computing centres
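The percentage shares shown in the pie charts can be re-derived from the absolute figures in the requirements summary. The sketch below does that arithmetic. The dictionary layout and function name are illustrative, and the input numbers are the ones quoted from the LCG TDR. Because the table values are themselves rounded, the computed shares may differ from the pie-chart labels by up to about one percentage point.

```python
# Resource requirements for 2008, as quoted on the slide (LCG TDR, June 2005).
requirements = {
    "CPU (MSPECint2000s)": {"CERN": 25, "All Tier-1s": 56, "All Tier-2s": 61},
    "Disk (PetaBytes)": {"CERN": 7, "All Tier-1s": 31, "All Tier-2s": 19},
    "Tape (PetaBytes)": {"CERN": 18, "All Tier-1s": 35},  # no Tier-2 tape listed
}


def shares(row: dict[str, float]) -> dict[str, float]:
    """Return each site's share of the row total, in percent."""
    total = sum(row.values())
    return {site: 100 * amount / total for site, amount in row.items()}


resource_shares = {name: shares(row) for name, row in requirements.items()}

for name, row in resource_shares.items():
    parts = ", ".join(f"{site} {pct:.1f}%" for site, pct in row.items())
    print(f"{name}: {parts}")
```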

Page 12: Overview & Status Al-Ain, UAE November 2007

Grid Activity

Continuing increase in usage of the EGEE and OSG grids

All sites reporting accounting data (CERN, Tier-1, -2, -3)

Increase in past 17 months:
  5 X number of jobs
  3.5 X cpu usage

100K jobs/day
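The growth factors quoted here (5 X number of jobs and 3.5 X cpu usage over the past 17 months) can be turned into an equivalent month-on-month rate. The sketch below assumes steady compound growth over the whole period. That is a simplification for illustration, not something the slide states.

```python
# Convert a total growth factor over a period into an equivalent compound
# monthly growth rate. Assumption: growth was steady over the whole period.

def monthly_growth_rate(total_factor: float, months: int) -> float:
    """Return the per-month fractional growth that compounds to total_factor."""
    return total_factor ** (1 / months) - 1


jobs_monthly = monthly_growth_rate(5.0, 17)   # 5 X number of jobs
cpu_monthly = monthly_growth_rate(3.5, 17)    # 3.5 X cpu usage

print(f"jobs: {jobs_monthly:.1%} per month, cpu usage: {cpu_monthly:.1%} per month")
```

Under that assumption, job counts grew by roughly a tenth every month, and CPU usage somewhat more slowly.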

Page 13: Overview & Status Al-Ain, UAE November 2007

October 2007 – CPU Usage: CERN, Tier-1s, Tier-2s

> 85% of CPU Usage is external to CERN

* NDGF usage for September 2007

Page 14: Overview & Status Al-Ain, UAE November 2007

Tier-2 Sites – October 2007

30 sites deliver 75% of the cpu

30 sites deliver 1%

Page 15: Overview & Status Al-Ain, UAE November 2007

LHCOPN Architecture

Tier-2s and Tier-1s are inter-connected by the general purpose research networks

Any Tier-2 may access data at any Tier-1

[Figure: network diagram showing the Tier-1 centres IN2P3, TRIUMF, ASCC, FNAL, BNL, Nordic, CNAF, SARA, PIC, RAL and GridKa, with Tier-2 sites around them]

Page 16: Overview & Status Al-Ain, UAE November 2007

Data Transfer out of Tier-0


Page 17: Overview & Status Al-Ain, UAE November 2007

Middleware: Baseline Services

Storage Element
  Castor, dCache, DPM (with SRM 1.1)
  Storm added in 2007
  SRM 2.2 – long delays incurred – being deployed in production
Basic transfer tools – Gridftp, ..
File Transfer Service (FTS)
LCG File Catalog (LFC)
LCG data mgt tools – lcg-utils
Posix I/O – Grid File Access Library (GFAL)
Synchronised databases T0 – T1s
  3D project
Information System
Compute Elements
  Globus/Condor-C
  web services (CREAM)
gLite Workload Management
  in production at CERN
VO Management System (VOMS)
VO Boxes
Application software installation
Job Monitoring Tools

The Basic Baseline Services – from the TDR (2005)

... continuing evolution: reliability, performance, functionality, requirements

Page 18: Overview & Status Al-Ain, UAE November 2007

Site Reliability – CERN + Tier-1s

“Site Reliability” a function of
  grid services
  middleware
  site operations
  storage management systems
  networks
  ........

Targets – CERN + Tier-1s

                Before July   July 07   Dec 07   Avg. last 3 months
Each site               88%       91%      93%                  89%
8 best sites            88%       93%      95%                  93%

Page 19: Overview & Status Al-Ain, UAE November 2007

Tier-2 Site Reliability

Tier-2 Site Reliability, October 2007
  average, all sites: 81%
  top 50%: 96%
  top 20%: 99%

83 Tier-2 sites being monitored

Page 20: Overview & Status Al-Ain, UAE November 2007

Improving Reliability

Monitoring
Metrics
Workshops
Data challenges
Experience
Systematic problem analysis
Priority from software developers

Page 21: Overview & Status Al-Ain, UAE November 2007

LCG depends on two major science grid infrastructures ….

EGEE – Enabling Grids for E-Science
OSG – US Open Science Grid

Page 22: Overview & Status Al-Ain, UAE November 2007

LHC Computing Multi-science Grid

1999 – MONARC project
  First LHC computing architecture – hierarchical distributed model

2000 – growing interest in grid technology
  HEP community main driver in launching the DataGrid project

2001-2004 – EU DataGrid project
  middleware & testbed for an operational grid

2002-2005 – LHC Computing Grid – LCG
  deploying the results of DataGrid to provide a production facility for LHC experiments

2004-2006; 2006-2008 – EU EGEE project
  starts from the LCG grid
  shared production infrastructure
  expanding to other communities and sciences
  Now preparing 3rd phase

Page 23: Overview & Status Al-Ain, UAE November 2007

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

240 sites
45 countries
45,000 CPUs
12 PetaBytes
> 5000 users
> 100 VOs
> 100,000 jobs/day

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

Grid infrastructure project co-funded by the European Commission – now in 2nd phase with 91 partners in 32 countries

Page 24: Overview & Status Al-Ain, UAE November 2007


EGEE infrastructure use

LHCC Comprehensive Review; 19-20 November 2007

> 90k jobs/day LCG
> 143k jobs/day total

Data from EGEE accounting system

Page 25: Overview & Status Al-Ain, UAE November 2007


EGEE working with related infrastructure projects

Page 26: Overview & Status Al-Ain, UAE November 2007


Sustainability: Beyond EGEE-II

• Need to prepare permanent, common Grid infrastructure
• Ensure the long-term sustainability of the European e-infrastructure independent of short project funding cycles
• Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
• Operate the European level of the production Grid infrastructure for a wide range of scientific disciplines to link NGIs

Page 27: Overview & Status Al-Ain, UAE November 2007

EGI – European Grid Initiative

www.eu-egi.org

EGI Design Study proposal to the European Commission (started Sept 07)

Supported by 37 National Grid Initiatives (NGIs)

2 year project to prepare the setup and operation of a new organizational model for a sustainable pan-European grid infrastructure after the end of EGEE-3

Page 28: Overview & Status Al-Ain, UAE November 2007

Challenges

Short timescale
  Preparation for start-up:
    ○ Resource ramp-up across Tier 1 and 2 sites
    ○ Site and service reliability

Longer term
  Infrastructure – power and cooling
  Multi-core CPU – how will we make best use of them?
  Supporting large scale analysis activities – just starting now – what will be the new problems that arise?
  Migration from today’s grid to a model of national infrastructures – how to ensure that LHC gets what it needs

Page 29: Overview & Status Al-Ain, UAE November 2007

Combined Computing Readiness Challenge - CCRC

A combined challenge by all Experiments & Sites to validate the readiness of the WLCG computing infrastructure before start of data taking, at a scale comparable to that needed for data taking in 2008

Should be done well in advance of the start of data taking to identify flaws and bottlenecks and allow time to fix them

Wide battery of tests – simultaneously – all experiments
  Driven from DAQ with full Tier-0 processing
  Site-site data transfers, storage system to storage system
  Required functionality and performance
  Data access patterns similar to 2008 processing
  CPU and data loads simulated as required to reach 2008 scale

Coordination team in place

Two test periods – February, May

Page 30: Overview & Status Al-Ain, UAE November 2007

Ramp-up Needed for Startup

[Figure: three ramp-up charts with time axes Jul-07 / Sep-07 / Apr-08 and Sep-06 / Jul-07 / Apr-08 (twice); series: installed vs. pledge, usage vs. target usage; annotated ramp-up factors: 3.7 X, 2.3 X, 3.7 X]

Page 31: Overview & Status Al-Ain, UAE November 2007

Summary

We have an operational grid service for LHC

EGEE – The European Grid Infrastructure – is the world’s largest multi-disciplinary grid for science
  ~240 sites; > 100 application groups

Over the next months before LHC comes on-line:
  Ramp-up resources to the MoU levels
  Improve service reliability and availability
  Full program of “dress-rehearsals” to demonstrate the complete computing system

Page 32: Overview & Status Al-Ain, UAE November 2007

Tier-1 Centers: TRIUMF (Canada); GridKa (Germany); IN2P3 (France); CNAF (Italy); SARA/NIKHEF (NL); Nordic Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain)

The Grid is now in operation, working on: reliability, scaling up, sustainability

[email protected]