Accelerating Science with Puppet
DESCRIPTION
Review of CERN's objectives and how the computing infrastructure is evolving to address the challenges at scale using community-supported software such as Puppet and OpenStack.
TRANSCRIPT
![Page 1: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/1.jpg)
Accelerating Science with Puppet
@noggin143
PuppetConf San Francisco, 28th September 2012
PuppetConf 2012 Tim Bell, CERN
![Page 2: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/2.jpg)
What is CERN?
• Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics
• Between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954 with an international treaty
• Our business is fundamental physics: what is the universe made of and how does it work?
![Page 3: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/3.jpg)
Answering fundamental questions…
• How to explain particles have mass? We have theories and accumulating experimental evidence. Getting close…
• What is 96% of the universe made of? We can only see 4% of its estimated mass!
• Why isn’t there anti-matter in the universe? Nature should be symmetric…
• What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…
![Page 4: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/4.jpg)
Community collaboration on an international scale
![Page 5: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/5.jpg)
The Large Hadron Collider
![Page 6: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/6.jpg)
![Page 7: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/7.jpg)
LHC construction
![Page 8: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/8.jpg)
The Large Hadron Collider (LHC) tunnel
![Page 9: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/9.jpg)
![Page 10: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/10.jpg)
Superconducting magnets – October 2008
A faulty connection between two superconducting magnets led to the release of a large amount of helium into the LHC tunnel and forced the machine to shut down for repairs for one year
![Page 11: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/11.jpg)
Accumulating events in 2009-2011
![Page 12: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/12.jpg)
![Page 13: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/13.jpg)
Heavy Ion Collisions
![Page 14: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/14.jpg)
![Page 15: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/15.jpg)
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-2 (~200 centres):
• Simulation
• End-user analysis
• Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
• In a normal day, the grid provides 100,000 CPU days executing 1 million jobs
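The daily grid figures above imply an average job size. The following is a minimal sketch of that arithmetic, assuming the 100,000 CPU days are spread evenly across the 1 million jobs; the function and constant names are illustrative and not from the talk.

```python
# Figures quoted on the slide for a normal day on the grid.
CPU_DAYS_PER_DAY = 100_000
JOBS_PER_DAY = 1_000_000


def average_cpu_hours_per_job(cpu_days: float, jobs: int) -> float:
    """Mean CPU time per job in hours, assuming work is spread evenly."""
    return cpu_days * 24 / jobs


if __name__ == "__main__":
    hours = average_cpu_hours_per_job(CPU_DAYS_PER_DAY, JOBS_PER_DAY)
    print(f"Average CPU time per job: {hours:.1f} hours")
```

Real jobs will vary widely around this mean; the sketch only shows the order of magnitude implied by the two headline numbers.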
![Page 16: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/16.jpg)
• Data Centre by Numbers
– Hardware installation & retirement
• ~7,000 hardware movements/year; ~1,800 disk failures/year
Processor models (chart): Xeon 5150 2%, Xeon 5160 10%, Xeon E5335 7%, Xeon E5345 14%, Xeon E5405 6%, Xeon E5410 16%, Xeon L5420 8%, Xeon L5520 33%, Xeon 3GHz 4%
Disk manufacturers (chart): Fujitsu 3%, Hitachi 23%, HP 0%, Maxtor 0%, Seagate 15%, Western Digital 59%, Other 0%
High Speed Routers (640 Mbps → 2.4 Tbps) 24
Ethernet Switches 350
10 Gbps ports 2,000
Switching Capacity 4.8 Tbps
1 Gbps ports 16,939
10 Gbps ports 558
Racks 828
Servers 11,728
Processors 15,694
Cores 64,238
HEPSpec06 482,507
Disks 64,109
Raw disk capacity (TiB) 63,289
Memory modules 56,014
Memory capacity (TiB) 158
RAID controllers 3,749
Tape Drives 160
Tape Cartridges 45,000
Tape slots 56,000
Tape Capacity (TiB) 73,000
IT Power Consumption 2,456 kW
Total Power Consumption 3,890 kW
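A few ratios can be derived from the inventory above. This is a rough sketch using only the figures on the slide; treating total power divided by IT power as a power usage effectiveness (PUE) style ratio is an assumption here, since the talk does not name that metric, and the helper name `ratio` is illustrative.

```python
# Inventory figures copied from the slide.
SERVERS = 11_728
CORES = 64_238
DISKS = 64_109
RAW_DISK_TIB = 63_289
IT_POWER_KW = 2_456
TOTAL_POWER_KW = 3_890


def ratio(numerator: float, denominator: float) -> float:
    """Plain division, with a clear error for an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


if __name__ == "__main__":
    print(f"Cores per server:        {ratio(CORES, SERVERS):.2f}")
    print(f"Raw TiB per disk:        {ratio(RAW_DISK_TIB, DISKS):.2f}")
    # Assumption: total/IT power is read as a PUE-style overhead ratio.
    print(f"Total power / IT power:  {ratio(TOTAL_POWER_KW, IT_POWER_KW):.2f}")
```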
![Page 17: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/17.jpg)
Our Challenges - Data storage
• 25PB/year to record
• >20 years retention
• 6GB/s average
• 25GB/s peaks
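To put the recording volume in the same units as the throughput figures, the yearly total can be converted to a sustained rate. This is a minimal sketch assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a flat 25 PB every year; both are assumptions, and the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures from the slide.
RECORDED_PB_PER_YEAR = 25
RETENTION_YEARS = 20


def sustained_gb_per_second(pb_per_year: float) -> float:
    """Average write rate needed to record pb_per_year, in GB/s (decimal units)."""
    return pb_per_year * 10**15 / SECONDS_PER_YEAR / 10**9


def retained_pb(pb_per_year: float, years: int) -> float:
    """Volume held after `years`, assuming the yearly rate never changes."""
    return pb_per_year * years


if __name__ == "__main__":
    rate = sustained_gb_per_second(RECORDED_PB_PER_YEAR)
    print(f"Recording alone averages about {rate:.2f} GB/s")
    print(f"Retained after {RETENTION_YEARS} years: "
          f"{retained_pb(RECORDED_PB_PER_YEAR, RETENTION_YEARS):.0f} PB")
```

The result is well below the 6GB/s average quoted on the slide, so that figure evidently covers more than newly recorded data; the slides do not break it down further.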
![Page 18: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/18.jpg)
![Page 19: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/19.jpg)
45,000 tapes holding 73PB of physics data
![Page 20: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/20.jpg)
New data centre to expand capacity
• Data centre in Geneva reaches limit of electrical capacity at 3.5MW
• New centre chosen in Budapest, Hungary
• Additional 2.7MW of usable power
• Hands-off facility
• Deploying from 2013
![Page 21: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/21.jpg)
Time to change strategy
• Rationale
– Need to manage twice as many servers as today
– No increase in staff numbers
– Tools becoming increasingly brittle and will not scale as-is
• Approach
– We are no longer a special case for compute
– Adopt an open source tool chain model
– Strong engineering skills allow rapid adoption of new technologies
  • Evaluate solutions in the problem domain
  • Identify functional gaps and challenge them
– Contribute new function back to the community
![Page 22: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/22.jpg)
Building Blocks
Diagram components: Bamboo; Koji, Mock; AIMS/PXE; Foreman; Yum repo; Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP
![Page 23: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/23.jpg)
Training and Support
• Buy the book rather than guru mentoring
• Newcomers are rapidly productive (and often know more than us)
• Community and Enterprise support means we’re not on our own
![Page 24: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/24.jpg)
Staff Motivation
• Skills valuable outside of CERN when an engineer’s contract ends
![Page 25: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/25.jpg)
Prepare the move to the clouds
• Improve operational efficiency
– Machine reception and testing
– Hardware interventions with long running programs
– Multiple operating system demand
• Improve resource efficiency
– Exploit idle resources, especially waiting for tape I/O
– Highly variable load such as interactive or build machines
• Improve responsiveness
– Self-Service
– Coffee break response time
![Page 26: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/26.jpg)
Service Model
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand raised and cared for
• When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one
• Future application architectures tend towards Cattle but Pets with configuration management are also viable
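The pets and cattle distinction can be expressed as a remediation policy. The sketch below is purely illustrative: classifying a host by whether its name ends in a number (as in vm0042.cern.ch) is an assumed heuristic, not something the talk prescribes, and the function names and action strings are hypothetical.

```python
import re

# Assumed heuristic: cattle have a numbered short name such as "vm0042",
# pets have a hand-picked name such as "pussinboots".
CATTLE_PATTERN = re.compile(r"^[a-z]+\d+\.")


def classify(hostname: str) -> str:
    """Return "cattle" for numbered hostnames, otherwise "pet"."""
    return "cattle" if CATTLE_PATTERN.match(hostname.lower()) else "pet"


def remediation(hostname: str, healthy: bool) -> str:
    """Pick an action: repair a sick pet, replace a sick cattle node."""
    if healthy:
        return "none"
    return "replace" if classify(hostname) == "cattle" else "repair"


if __name__ == "__main__":
    for host in ("pussinboots.cern.ch", "vm0042.cern.ch"):
        print(host, classify(host), remediation(host, healthy=False))
```

In practice the service owner would declare which model a machine follows; inferring it from the hostname is only a convenient stand-in for this example.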
![Page 27: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/27.jpg)
OpenStack
• Open source cloud run by an independent foundation with over 6,000 members from 850 organisations
• Started in 2010 but maturing rapidly with public cloud services from Rackspace, HP and Ubuntu
Platinum Members
![Page 28: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/28.jpg)
Many OpenStack Components to Configure
Diagram components: NOVA (Compute, Scheduler, Network, Volume); GLANCE (Registry, Image); KEYSTONE; HORIZON
![Page 29: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/29.jpg)
When communities combine…
• OpenStack’s many components and options make configuration complex out of the box
• Puppet Forge module from PuppetLabs (Thanks, Dan Bode)
• The Foreman adds OpenStack provisioning for user kiosk
![Page 30: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/30.jpg)
Scaling up with Puppet and OpenStack
• Use LHC@Home based on BOINC for simulating the magnets guiding particles around the LHC
• Naturally, there is a Puppet module, puppet-boinc
• 1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack
![Page 31: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/31.jpg)
Next Steps
• Expand tool chain
– Mcollective
– Puppet-DB
• Deploy at scale in production
– Move towards 15,000 hypervisors over next two years
– Estimate 100,000-300,000 virtual machines
• Work with labs on common solutions for scientific computing
– Batch system configurations
– Grids
– Publishing to http://github.com/cernops
• Investigate desktop and device management
– Linux desktops
– Macs
– KVMs, PDUs
![Page 32: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/32.jpg)
Final Thoughts
• A small project to share documents at CERN in the ‘90s created the massive phenomenon that is today’s World Wide Web
– Open Source
– Vibrant community and eco-system
• Working with the Puppet and OpenStack communities has shown the power of collaboration
• We have built a toolchain in one year with part time resources
• Running 15,000 servers and up to 300,000 VMs is scary but achievable
• Looking forward to further contributions as we move to large scale deployment
![Page 33: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/33.jpg)
For more details, see Ben Jones’ talk at 15:50 today: Configuration Management at CERN – From Homegrown to Industry Standard
Tim Bell
![Page 34: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/34.jpg)
References
CERN http://public.web.cern.ch/public/
Scientific Linux http://www.scientificlinux.org/
Worldwide LHC Computing Grid http://lcg.web.cern.ch/lcg/ http://rtm.hep.ph.ic.ac.uk/
Jobs http://cern.ch/jobs
Detailed Report on Agile Infrastructure http://cern.ch/go/N8wp
![Page 35: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/35.jpg)
Backup Slides
![Page 36: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/36.jpg)
CERN’s tools
• The world’s most powerful accelerator: LHC
– A 27 km long tunnel filled with high-tech instruments
– Equipped with thousands of superconducting magnets
– Accelerates particles to energies never before obtained
– Produces particle collisions creating microscopic “big bangs”
• Very large sophisticated detectors
– Four experiments each the size of a cathedral
– Hundred million measurement channels each
– Data acquisition systems treating Petabytes per second
• Top level computing to distribute and analyse the data
– A Computing Grid linking ~200 computer centres around the globe
– Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis
![Page 37: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/37.jpg)
Our Infrastructure
• Hardware is generally based on commodity, white-box servers
– Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
– Compute nodes typically dual processor, 2GB per core
– Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
• Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
– Focus is on stability in view of the number of centres on the WLCG
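The hardware sizing rules above can be turned into quick per-box numbers. This is a minimal sketch under stated assumptions: disk sizes are taken as decimal terabytes, and the cores-per-processor value passed in the example (4) is hypothetical, since the slide gives only "dual processor, 2GB per core". Function names are illustrative.

```python
# Sizing rules quoted on the slide.
DISKS_PER_BOX = 24
DISK_SIZE_TB = 2
PROCESSORS_PER_NODE = 2
GB_PER_CORE = 2


def raw_capacity_tib(disks: int, disk_tb: float) -> float:
    """Raw capacity of one storage box in TiB, before any RAID overhead."""
    return disks * disk_tb * 10**12 / 2**40


def node_memory_gb(processors: int, cores_per_processor: int, gb_per_core: int) -> int:
    """Memory a compute node needs to meet the per-core rule."""
    return processors * cores_per_processor * gb_per_core


if __name__ == "__main__":
    print(f"Storage box raw capacity: "
          f"{raw_capacity_tib(DISKS_PER_BOX, DISK_SIZE_TB):.1f} TiB")
    # Hypothetical example: quad-core processors.
    print(f"Memory for a dual quad-core node: "
          f"{node_memory_gb(PROCESSORS_PER_NODE, 4, GB_PER_CORE)} GB")
```

Usable capacity will be lower than the raw figure once the RAID level is chosen, which the slide does not specify.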
![Page 38: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/38.jpg)
New architecture data flows
![Page 39: Accelerating science with Puppet](https://reader036.vdocument.in/reader036/viewer/2022062405/55638d5bd8b42adf7a8b511d/html5/thumbnails/39.jpg)
OpenStack
Gold Members