Take-home messages from Lecture 1
LHC Computing has been well sized to handle the production and analysis needs of the LHC (very high data rates and throughputs)
Based on the hierarchical MONARC model
It has been very successful: WLCG operates smoothly and reliably
Data is transferred and made available to everybody in a very short time; the Higgs boson discovery was announced within a week of the latest data update!
The network has worked well and now allows for computing model changes
Ian.Bird@cern.ch / August 2012
2
Grid computing enables the rapid delivery of physics results
Outlook to the Future
3
4
Computing Model Evolution
Evolution of computing models: from Hierarchy to Mesh
5
Evolution
During its development, the WLCG production grid has oscillated between structure and flexibility, driven by the capabilities of the infrastructure and the needs of the experiments
Examples: ALICE Remote Access, PD2P/Popularity, CMS Full Mesh
6
Structure
Data management in the WLCG has been moving to a less deterministic system as the software improved
It started with deterministic pre-placement of data on disk storage for all samples (ATLAS)
Then came subscriptions driven by physics groups (CMS)
Then dynamic placement of data based on access, replicating only the samples that were actually going to be looked at (ATLAS)
Once I/O is optimized and network links improve, we can send data over the wide area so jobs can run anywhere and access the data (ALICE, ATLAS, CMS)
• Good for opportunistic resources, load balancing, clouds, or any other case where a sample will be accessed only once
Data Management Evolution
Less Deterministic
7
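The dynamic-placement stage described above can be illustrated with a toy decision rule: replicate datasets that are being accessed, clean up copies that are not. This is a minimal sketch; the function name and thresholds are invented for illustration, not actual ATLAS or CMS placement logic.

```python
# Toy sketch of dynamic data placement: replicate only samples that are
# actually being looked at, and clean unused extra copies.
# All names and thresholds are illustrative, not WLCG APIs.

def placement_decision(accesses_last_week, replicas, max_replicas=5):
    """Return 'replicate', 'keep', or 'clean' for one dataset."""
    if accesses_last_week == 0 and replicas > 1:
        return "clean"          # unused extra copies are removed
    if accesses_last_week > 100 and replicas < max_replicas:
        return "replicate"      # popular data gets additional copies
    return "keep"

print(placement_decision(accesses_last_week=250, replicas=2))  # replicate
print(placement_decision(accesses_last_week=0, replicas=3))    # clean
```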
Structure
Scheduling evolution has had similar drivers
We started with a very deterministic system where jobs were sent directly to a specific site
This led to early binding of jobs to resources: requests sat idle in long queues, with no ability to reschedule
All four experiments evolved to use pilots, which make better scheduling decisions based on current information
The pilot system is now evolving further to allow submission to additional resources such as clouds
What began as a deterministic system has evolved toward flexibility in scheduling and resources
Scheduling Evolution
Less Deterministic
More dynamic data placement is needed:
fewer restrictions on where the data comes from,
but data is still pushed to sites
8
Data Access Frequency
Ian Fisk, FNAL/CD
(Figure: ATLAS data access between Tier-1 and Tier-2 sites)
Services like the Data Popularity Service track all file accesses and can show which data is accessed and for how long
Over a year, popular data stays popular for reasonably long periods of time
9
Popularity
CMS Data Popularity Service
ATLAS uses the central queue and popularity metrics to understand how heavily a dataset is used
Additional copies of the data are made, and jobs are re-brokered to use them
Unused copies are cleaned up
10
Dynamic Data Placement
(Figure: PanDA requests driving replication between Tier-1 and Tier-2)
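The popularity-driven placement described on these slides can be sketched as simple bookkeeping: count accesses per dataset per week, and flag the datasets above a threshold as candidates for extra replicas. Illustrative only: the class and method names are made up and are not the APIs of the CMS Data Popularity Service or ATLAS PD2P.

```python
# Hedged sketch of what a popularity service records: accesses per
# dataset per week, so a placement system can ask which datasets are hot.
# Names are illustrative, not real WLCG service APIs.

from collections import defaultdict

class PopularityTracker:
    def __init__(self):
        # dataset -> week -> access count
        self.accesses = defaultdict(lambda: defaultdict(int))

    def record(self, dataset, week):
        self.accesses[dataset][week] += 1

    def hot_datasets(self, week, threshold=3):
        """Datasets accessed at least `threshold` times in `week`."""
        return sorted(d for d, weeks in self.accesses.items()
                      if weeks[week] >= threshold)

pop = PopularityTracker()
for _ in range(5):
    pop.record("/Higgs/2012/AOD", week=30)
pop.record("/MinBias/2011/RAW", week=30)
print(pop.hot_datasets(week=30))  # ['/Higgs/2012/AOD']
```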
With optimized I/O, other methods of managing the data and the storage become available
Sending data directly to applications over the WAN allows users to open any file regardless of their location or the file's source
Sites deploy at least one xrootd server that acts as a proxy/door
12
Wide Area Access
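The proxy/door idea can be sketched as the fallback rule an application might use: prefer a local replica, otherwise open a federation URL and let the xrootd redirector locate a copy somewhere on the WAN. The redirector hostname and file paths below are illustrative assumptions, not guaranteed production endpoints.

```python
# Sketch of federated wide-area access: the application opens one
# redirector URL and the xrootd federation finds a replica.
# Hostname and paths are illustrative, not production values.

LOCAL_REPLICAS = {"/store/data/Run2012B/MuEG.root"}  # what this site has on disk

def access_url(lfn, redirector="cms-xrd-global.cern.ch"):
    """Prefer a local replica; otherwise fall back to the WAN federation."""
    if lfn in LOCAL_REPLICAS:
        return f"file://{lfn}"           # local disk, no WAN traffic
    return f"root://{redirector}/{lfn}"  # the proxy/door resolves a replica

print(access_url("/store/data/Run2012B/MuEG.root"))
print(access_url("/store/mc/Summer12/DYJets.root"))
```

The user-visible point from the slide: the same open call works regardless of where the job runs or where the file actually lives.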
Once we have a combination of dynamic placement, wide-area access to data, and reasonable networking, facilities can be treated as part of one coherent system
This also opens the door to new kinds of resources (opportunistic resources, commercial clouds, data centers, ...)
14
Transparent Access to Data
CERN is deploying a remote computing facility in Budapest
200 Gb/s of networking between the centres at 35 ms ping time
From the experiments' point of view, we cannot really tell where resources are installed
15
Example: Expanding the CERN Tier0
(Figure: CERN to Budapest, 2 × 100 Gb/s links)
Tier 0: Wigner Data Centre, Budapest
• New facility due to be ready at the end of 2012
• 1100 m² (725 m²) in an existing building, but new infrastructure
• 2 independent HV lines
• Full UPS and diesel coverage for all IT load (and cooling)
• Maximum 2.7 MW
These 100 Gb/s links are the first in production for WLCG; other sites will soon follow
We have reduced the differences in site functionality
Then reduced even the perception that two sites are separate
We can begin to think of the facility as one big centre rather than a cluster of centres. This concept can be expanded to many facilities
17
Networks
The WLCG service architecture has been reasonably stable for over a decade
This is beginning to change with new middleware for resource provisioning
A variety of sites are opening their resources to "cloud"-style provisioning
From a site's perspective this is often chosen for cluster-management and flexibility reasons:
everything is virtualized and services are put on top
18
Changing the Services
Grids offer primarily standard services with agreed protocols:
designed to be generic, but each executes a particular task
Clouds offer the ability to build custom services and functions:
more flexible, but also more work for users
19
Clouds vs Grids
CMS and ATLAS are trying to provision resources this way with the High Level Trigger farms
OpenStack is interfaced to the pilot systems. In CMS we got to 6000 running cores, and
the facility looks like just another destination, though no grid CE exists
It will be used for large-scale production running in a few weeks
Several sites have already requested similar connections to local resources
20
Trying this out
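The HLT-farm provisioning described above can be mocked as a cloud API that hands out VM slots until the farm's core count is exhausted, with a pilot booting inside each VM. The `CloudAPI` class is a stand-in for illustration, not the real OpenStack client; the 6000-core figure is the CMS number from the slide.

```python
# Mock of cloud-style provisioning on an HLT farm: the pilot system asks
# the cloud interface for VMs; the farm caps out at its core count.
# CloudAPI is a stand-in, not the real OpenStack (nova) API.

class CloudAPI:
    """Stand-in for a cloud provisioning endpoint."""
    def __init__(self, capacity_cores, cores_per_vm=8):
        self.capacity_cores = capacity_cores
        self.cores_per_vm = cores_per_vm
        self.running = 0  # VMs currently booted, each running a pilot

    def boot_vms(self, n):
        """Boot up to n VMs, limited by remaining core capacity."""
        used = self.running * self.cores_per_vm
        free = (self.capacity_cores - used) // self.cores_per_vm
        booted = min(n, free)
        self.running += booted
        return booted

hlt = CloudAPI(capacity_cores=6000)  # CMS reached ~6000 running cores
booted = hlt.boot_vms(1000)
print(booted, hlt.running * hlt.cores_per_vm)  # 750 6000
```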
We have a grid because we need to collaborate and share resources; thus we will always have a "grid"
Our network of trust is of enormous value to us and to (e-)science in general
We also need distributed data management that supports very high data rates and throughputs; we will continually work on these tools
We are now working on how to integrate cloud infrastructures into WLCG
21
WLCG will remain a Grid
Evolution of the Services and Tools
Computing infrastructure is a necessary piece of the ultimate core mission of HEP experiments, while development effort is steadily decreasing
Common solutions try to take advantage of the similarities in the experiments' activities, to optimize development effort and offer lower long-term maintenance and support costs
Together with the willingness of the experiments to work together
Successful examples exist in Distributed Data Management, Data Analysis, and Monitoring (HammerCloud, Dashboards, Data Popularity, the Common Analysis Framework, ...)
Taking advantage of Long Shutdown 1 (LS1)
Need for Common Solutions
Architecture of the Common Analysis Framework
Evolution of Capacity: CERN & WLCG
25
Modest growth until 2014
Anticipate x2 in 2015
Anticipate x5 after 2018
What we thought was needed at LHC start
What we actually used at LHC start!
Resource Utilization was highest in 2012 for both Tier-1 and Tier-2 sites
(Charts: CMS Tier-1 Pledge Usage and CMS Tier-2 Pledge Usage, monthly from Jan 2012 to Sep 2013, in percent of pledge)
CMS Resource Utilization
Growth curves for resources
(Charts: CMS Tier-1 CPU for Run 2 in kHS06, CMS Tier-1 Disk in PB, and Tier-1 Tape in PB, 2012-2017, comparing the Resource Request to Flat Growth)
Conclusions
28
In the first years of LHC data, WLCG has helped deliver physics rapidly
Data is available everywhere within 48 hours
This is just the start of decades of exploration of new physics, which demands sustainable solutions!
We are entering a phase of consolidation and, at the same time, evolution
LS1 is an opportunity for disruptive changes and for scale testing of new technologies:
wide-area access, dynamic data placement, new analysis tools, clouds
The challenges for computing (scale and complexity) will continue to increase
In the new resource-provisioning model, the pilot infrastructure communicates with the resource-provisioning tools directly,
requesting groups of machines for periods of time
29
Evolving the Infrastructure
(Diagram: Resource Provisioning. Pilots issue resource requests, which go either through a Cloud Interface to VMs with pilots, or through a CE and a batch queue to worker nodes with pilots.)
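The two provisioning paths in the diagram can be sketched as a simple router: a request from the pilot system goes either through a cloud interface that boots VMs with pilots, or through a grid CE and batch queue that start pilots on worker nodes. All names are illustrative, not real middleware APIs.

```python
# Sketch of the slide-29 provisioning diagram: one request, two paths.
# Names are illustrative; real paths go via OpenStack-like clouds or a
# grid CE plus batch system.

def provision(request):
    """Route a resource request to the right provisioning path."""
    n = request["slots"]
    if request["type"] == "cloud":
        # Cloud Interface path: boot VMs, each carrying a pilot
        return [f"VM-{i} (pilot)" for i in range(n)]
    # CE path: submit pilots through the batch queue to worker nodes
    return [f"WN-{i} (pilot)" for i in range(n)]

print(provision({"type": "cloud", "slots": 2}))  # ['VM-0 (pilot)', 'VM-1 (pilot)']
print(provision({"type": "grid", "slots": 2}))   # ['WN-0 (pilot)', 'WN-1 (pilot)']
```

Either way, what the experiment's scheduler sees at the end is the same thing: a pilot reporting for work.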