LHC data challenge & computing outlook
Nils Høimyr – CERN/IT
Accelerating Science and Innovation
Computing at CERN
CERN was founded in 1954 by 12 European states: “Science for Peace”
Today: 22 Member States
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Israel, Italy, the Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
Associate Members in Pre-Stage to Membership: Serbia, Cyprus
Associate Members: India, Pakistan, Turkey, Ukraine
Applicant States for Membership or Associate Membership: Brazil, Russia, Slovenia
Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO
~2300 staff
~1000 other paid personnel
>11,000 users
Budget (2015): ~1000 MCHF
7000 tons, 150 million sensors generating data 40 million times per second
The ATLAS experiment
A collision at the LHC
COLLISIONS
Collisions produce 1 PB/s
Tier 0 at CERN: Acquisition, First pass reconstruction, Storage & Distribution
1.25 GB/sec (ions)
2012: 400-600 MB/sec
2012: 4-6 GB/sec
Pick the interesting events
• 40 million per second: fast, simple information; hardware trigger in a few microseconds
• 100 thousand per second: fast algorithms in a local computer farm; software trigger in <1 second
• A few hundred per second: recorded for study
[Event display: muon tracks and energy deposits]
Pick the interesting events: data size
• 40 million per second: fast, simple information; hardware trigger in a few microseconds
• 100 thousand per second: fast algorithms in computers; software trigger
• A few hundred per second: recorded for study
~1 Petabyte per second?
• Cannot afford to store it: a year's worth of LHC data at 1 PB/s would cost a few hundred trillion dollars/euros
• Have to filter in real time to keep only “interesting” data
• We keep about 1 event in a million; yes, 99.9999% is thrown away
>>6 Gigabytes per second
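As a rough Python sanity check of how these numbers hang together (the rates and data volumes are the slide's round, order-of-magnitude figures, so the derived values won't match them exactly; the per-event sizes are inferred, not measured):

```python
# Back-of-the-envelope check of the trigger cascade described above.
RAW_RATE_B_S  = 1e15         # ~1 PB/s off the detector
CROSSING_HZ   = 40_000_000   # 40 million collisions per second
SW_TRIGGER_HZ = 400          # "a few hundred" events recorded per second
RECORDED_B_S  = 6e9          # >6 GB/s written to storage

print(f"Raw event size : {RAW_RATE_B_S / CROSSING_HZ / 1e6:.0f} MB")
print(f"Events kept    : 1 in {CROSSING_HZ // SW_TRIGGER_HZ:,}")
print(f"Data kept      : {RECORDED_B_S / RAW_RATE_B_S:.1e} of the raw stream")
```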
CERN Data Centre
• Built in the 70s on the CERN site (Meyrin, Geneva), 3.5 MW for equipment
• Extension located at Wigner (Budapest), 2.7 MW for equipment
• Connected to the Geneva data centre with 3x100 Gb/s links (24 ms RTT)
• Hardware generally based on commodity components
• 15,000 servers, providing 190,000 processor cores
• 80,000 disk drives providing 250 PB of disk space
• 104 tape drives, providing 140 PB
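For a feel of what those Geneva-Wigner links imply, a small sketch (link count, speed and RTT are the slide's figures; the TCP-window remark is a general networking point, not a CERN configuration detail):

```python
# Capacity and bandwidth-delay product of the Geneva <-> Budapest links.
LINKS    = 3
LINK_BPS = 100e9   # 100 Gb/s per link
RTT_S    = 0.024   # 24 ms round trip

aggregate_gb_s = LINKS * LINK_BPS / 8 / 1e9   # gigabytes per second
bdp_mb         = LINK_BPS * RTT_S / 8 / 1e6   # data in flight per link

print(f"Aggregate capacity     : {aggregate_gb_s:.1f} GB/s")
print(f"Bandwidth-delay product: {bdp_mb:.0f} MB per link (TCP window needed to fill it)")
```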
The LHC Data Challenge
• The accelerator will run for 20 years
• Experiments are producing about 35 Million Gigabytes of data each year (about 4 million DVDs – 1250 years of movies!)
• LHC data analysis requires a computing power equivalent to ~100,000 of today's fastest PC processors
• Requires many cooperating computer centres, as CERN can only provide ~15% of the capacity
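A quick back-of-the-envelope check of these figures (the DVD capacity is an assumed dual-layer 8.5 GB disc; the other numbers are the slide's):

```python
# Scale of the annual LHC data volume and CERN's share of the compute.
DATA_GB_PER_YEAR = 35e6    # ~35 million gigabytes per year
DVD_GB           = 8.5     # assumed dual-layer DVD capacity
CORES_NEEDED     = 100_000
CERN_SHARE       = 0.15

print(f"DVDs per year : {DATA_GB_PER_YEAR / DVD_GB / 1e6:.1f} million")
print(f"CERN provides : ~{CERN_SHARE * CORES_NEEDED:,.0f} of the ~{CORES_NEEDED:,} cores needed")
```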
Solution: the Grid
• Use the Grid to unite computing resources of particle physics institutes around the world
The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations
The Grid is an infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe
The Worldwide LHC Computing Grid
Tier-0 (CERN): data recording, reconstruction and distribution
Tier-1: permanent storage, re-processing, analysis
Tier-2: simulation, end-user analysis
> 2 million jobs/day
~600’000 cores
500 PB of storage
nearly 170 sites, 40 countries
10-100 Gb links
WLCG: an international collaboration to distribute and analyse LHC data
Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists
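One way to read the headline numbers above (>2 million jobs/day on ~600,000 cores), sketched in Python; it assumes fully loaded cores and one core per job, which is a simplification:

```python
# Average job length implied by the WLCG headline figures.
JOBS_PER_DAY = 2_000_000
CORES        = 600_000

avg_job_hours = CORES * 24 / JOBS_PER_DAY
print(f"Average job length at full load: ~{avg_job_hours:.1f} core-hours")
```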
WLCG: worldwide infrastructure
Facts about WLCG:
• A community of 10,000 physicists are the WLCG users
• On average around 250,000 jobs running concurrently
• 600,000 processing cores
• 15% of the WLCG computing resources are at CERN's data centre
• 500 petabytes of storage available worldwide
• 20-40 Gbit/s optical-fibre links connect CERN to each of the 13 Tier-1 institutes
Pledged Resources - 2016
[Bar charts: 2016 pledged CPU (thousands of kHS06), disk (PB) and tape (PB) at Tier 0, Tier 1 and Tier 2, broken down by experiment (ALICE, ATLAS, CMS, LHCb).
Pie charts of the shares across tiers:
• CPU: Tier 0 22%, Tier 1 34%, Tier 2 44%
• Disk: Tier 0 58 PB (19%), Tier 1 118 PB (38%), Tier 2 131 PB (43%)
• Tape: Tier 0 128 PB (33%), Tier 1 260 PB (67%)]
LHC data – continuing to break records:
• 10.7 PB recorded in July
• CERN archive: ~160 PB on tape, 500 M files
• June-Aug 2016: >500 TB/day (the Run 1 peak for heavy ions was 220 TB)
• 2016 to date: 35 PB (ALICE 6 PB, ATLAS 11.6 PB, CMS 11.9 PB, LHCb 5.4 PB)
Compute Growth Outlook
[Chart: projected compute needs for ALICE, ATLAS, CMS and LHCb versus grid capacity, from Run 1 to Run 4.]
Compute: growth needed is >x50; Moore's law gives only x16
What we can afford
… and 400 PB/year of data by 2023
Archeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences…
WLCG has been leveraged on both sides of the Atlantic, to the benefit of the wider scientific community:
Europe:
• Enabling Grids for E-sciencE (EGEE), 2004-2010
• European Grid Infrastructure (EGI), 2010-
• NorduGrid in the Nordic countries
USA:
• Open Science Grid (OSG), 2006-2012 (+ extension)
Many scientific applications
Broader Impact of the LHC Computing Grid
Grid vs Cloud
• “Cloud computing” is becoming the standard
  – Web-based solutions (http/https and REST)
  – Virtualisation: upload virtual machine images to remote sites
• The Grid has mainly a scientific user base
  – Complex applications running across multiple sites, but it works like a cluster batch system for the end user
  – Mainly suitable for parallel computing and massive data processing
• Technologies are converging
  – “Internal Cloud” at CERN: lxcloud, now OpenStack
  – CernVM: a virtual machine image that can run e.g. at Amazon
  – “Volunteer Cloud”: LHC@home 2.0
From grid to clouds
● Data centres providing infrastructure as a service
● Clouds complement and extend the Grid
● Decreased heterogeneity seen by the user (hardware virtualisation)
● VMs provide a uniform user interface to resources
  – Isolate software and operating system from the physical hardware
  – CernVM and CernVM-FS adopted by the LHC experiments
● New resources (commercial and research clouds)
● A grid of clouds has already been tried by the LHC experiments
  – ATLAS: ~450k production jobs run on Google resources over a few weeks
  – Tests on Amazon EC2: ~economically viable
Cloud Infrastructure
In production:
• >190K cores
• >7,000 hypervisors
• ~100,000 additional cores being installed in the next 6 months
OpenStack@CERN Status
90% of CERN's compute resources are now delivered on top of OpenStack
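As an illustration of what “delivered on top of OpenStack” means for a user, a minimal sketch with the openstacksdk Python client; the cloud name, image and flavor below are hypothetical placeholders, not CERN's actual configuration:

```python
import openstack

# Credentials come from a clouds.yaml entry or OS_* environment variables.
conn = openstack.connect(cloud="mycloud")           # hypothetical cloud name

image  = conn.compute.find_image("cc7-base")        # hypothetical image
flavor = conn.compute.find_flavor("m2.medium")      # hypothetical flavor

# Boot a VM; some clouds also require a networks=[...] argument.
server = conn.compute.create_server(
    name="batch-worker-001",
    image_id=image.id,
    flavor_id=flavor.id,
)
server = conn.compute.wait_for_server(server)       # block until ACTIVE
print(server.status)
```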
CERN Tool Chain
Onwards: Federated Clouds
• Public clouds such as Rackspace or IBM
• CERN Private Cloud: 160K cores
• ATLAS Trigger: 28K cores
• ALICE Trigger: 9K cores
• CMS Trigger: 13K cores
• INFN, Italy
• Brookhaven National Labs
• NecTAR, Australia
Many Others on Their Way
OpenStack Glance + Cinder
• Available in standard OpenStack since Kilo
• From ~200 TB total to ~450 TB of RBD + 50 TB of RGW
• Example: ~25 Puppet masters reading node configurations at up to 40 kHz of IOPS
• CephFS with Manila is now in a pilot phase for cluster filesystems
External Clouds
• Terraform is used to create hybrid clouds
• Tests for 2-3 months on various European public resources
High-throughput batch service
The batch service balances the fair-share across all competing applications according to CERN resource policies.
Users' interaction pattern: “submit a job, it sits in a queue, runs, and the result comes back”
● Currently migrating from a proprietary product (LSF) to open-source HTCondor
  – Integrated with the WLCG grid environment
● Also running HTCondor on external clouds
● HPC with SLURM, and HTCondor backfill
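A minimal sketch of that “submit a job” pattern with the HTCondor Python bindings; the executable and file names are placeholders, and CERN's actual setup layers grid integration and fair-share policy on top:

```python
import htcondor

# Describe one job in HTCondor's submit language.
sub = htcondor.Submit({
    "executable":   "run_analysis.sh",   # hypothetical payload script
    "arguments":    "input.root",
    "output":       "job.out",
    "error":        "job.err",
    "log":          "job.log",
    "request_cpus": "1",
})

schedd = htcondor.Schedd()               # the local submit node
result = schedd.submit(sub)              # queue the job
print("Submitted cluster", result.cluster())
```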
Volunteer grid - LHC@home
• LHC volunteer computing
  – Allows us to get additional computing resources for e.g. accelerator physics and theory simulations
• Based on BOINC
  – “Berkeley Open Infrastructure for Network Computing”
  – A software platform for distributed computing using volunteered computer resources
  – Uses a volunteer PC's unused CPU cycles to analyse scientific data
  – Virtualization support via CernVM
  – Other well-known projects:
    • SETI@Home
    • Climateprediction.net
    • Einstein@Home
You can help us!
• As a volunteer, you can help us by donating CPU when your computer is idle
• Connect with us on: – http://cern.ch/lhcathome
Compute Workloads
Where is the x3 improvement?
[Chart: WLCG CPU growth 2008-2017 by contribution (Tier 2, Tier 1, CERN), against a 20% yearly growth curve and a linear 2008-12 extrapolation, with the current state marked.]
[Chart: projected compute needs for ALICE, ATLAS, CMS and LHCb versus grid capacity, from Run 1 to Run 4.]
Compute: growth needed is >x50
What we think is affordable unless we do something differently
Challenges for computing infrastructure and physics software
● Optimization of the computing infrastructure
● And a new generation of physics software:
  – Simulation: GeantV for physics event simulation (vectorization, GPUs)
  – Analysis: next-generation ROOT
  – Experiment packages
  – HEP Software Foundation
Plenty of challenges for young talented developers!
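To illustrate why vectorization is worth the effort, a toy NumPy comparison: stepping many particles one at a time versus all at once. This is a deliberately simplified stand-in; real event simulation is far more involved than a position update.

```python
import time
import numpy as np

n   = 1_000_000
pos = np.random.rand(n, 3)   # particle positions
vel = np.random.rand(n, 3)   # particle velocities
dt  = 1e-3                   # time step

t0 = time.perf_counter()
out_loop = pos.copy()
for i in range(n):           # scalar loop: one particle at a time
    out_loop[i] += vel[i] * dt
t1 = time.perf_counter()

out_vec = pos + vel * dt     # vectorized: whole arrays at once
t2 = time.perf_counter()

assert np.allclose(out_loop, out_vec)
print(f"scalar loop: {t1 - t0:.2f} s   vectorized: {t2 - t1:.4f} s")
```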
The Balance between Academic Freedom, Operations & Computer Security
http://cern.ch/security
Open Data
http://opendata.cern.ch
http://zenodo.org
http://cds.cern.ch
Open Data – Open Knowledge
CERN & the LHC experiments have made the first steps towards Open Data (http://opendata.cern.ch/)
- Key drivers: educational outreach & reproducibility
- Increasingly required by funding agencies
- Paving the way for Open Knowledge as envisioned by DPHEP (http://dphep.org), the ICFA Study Group on Data Preservation and Long Term Analysis in High Energy Physics
CERN has released Zenodo, a platform for Open Data as a Service (http://zenodo.org)¹
• Building on experience with digital libraries & extreme-scale data management
• Targeted at the long tail of science
• Citable through DOIs, including for the associated software
• Has generated significant interest from open-data publishers such as Wiley, Ubiquity, F1000, eLife, PLOS
¹ Initially co-funded by the EC FP7 OpenAIRE series of projects
Training
CERN School of Computing
CERN openlab in a nutshell
• A science-industry partnership to drive R&D and innovation, with over a decade of success
• Evaluate state-of-the-art technologies in a challenging environment and improve them
• Test in a research environment today what will be used in many business sectors tomorrow
• Train the next generation of engineers/employees
• Disseminate results and reach out to new audiences
The Worldwide LHC Computing Grid
Innovations
Energy
30M EUR/year Electricity Bill
Up to 200MW at peak utilisation
Electrical Consumption - computing
[Chart: electrical consumption (kW) per computing service: Physics Databases, CERN Internet Exchange Point, User Support Infrastructure, Grid Computing Infrastructure, Data Management, Site Networking, Physics Computing Infrastructure, Database Services, Administrative Computing Services, Unix Home Directories, Windows Infrastructure.]
Assuming CERN is 20% → 25 MW worldwide, 24x7…
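The extrapolation above in code (the ~5 MW CERN figure is inferred from the slide's 25 MW result, i.e. an assumption, not a measured value):

```python
# Scale CERN's computing power draw to a worldwide WLCG estimate.
CERN_COMPUTING_MW = 5.0    # assumed; chosen so that a 20% share gives 25 MW
CERN_SHARE        = 0.20

worldwide_mw = CERN_COMPUTING_MW / CERN_SHARE
gwh_per_year = worldwide_mw * 24 * 365 / 1000   # running 24x7

print(f"Worldwide: ~{worldwide_mw:.0f} MW, ~{gwh_per_year:.0f} GWh/year")
```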
[Chart: yearly data-centre capacity (kW) over seven years: physics, critical and total load against total capacity (600-2400 kW).]
IT at CERN – more than the Grid
• Physics computing – the Grid (this talk!)
• Administrative information systems
  – Financial and administrative management systems, e-business...
• Desktop and office computing
  – Windows, Linux and Web infrastructure for day-to-day use
• Engineering applications and databases
  – CAD/CAM/CAE (AutoCAD, Catia, Cadence, Ansys etc.)
  – A number of technical information systems based on Oracle and MySQL
• Control systems
  – Process control of accelerators, experiments and infrastructure
• Networks and telecom
  – European IP hub, security, voice over IP...
More information: http://cern.ch/it
Thank You!