Clouds and Research Collide at CERN - Tim Bell (Infrastructure Manager, CERN, @noggin143)

TRANSCRIPT

Page 1:

Clouds and Research Collide at CERN

Tim Bell
Infrastructure Manager, CERN :: @noggin143

Page 2:

CLOUDS AND RESEARCH COLLIDE AT CERN

Page 3:

About CERN

CERN is the European Organization for Nuclear Research in Geneva

• Particle accelerators and other infrastructure for high energy physics (HEP) research

• Worldwide community
  ‒ 21 member states (+ 2 incoming members)
  ‒ Observers: Turkey, Russia, Japan, USA, India
  ‒ About 2300 staff
  ‒ >10'000 users (about 5'000 on-site)
  ‒ Budget (2014) ~1000 MCHF

Birthplace of the World Wide Web

Page 4:

Page 5:

Page 6:

COLLISIONS

Page 7:

The Worldwide LHC Computing Grid

TIER-0 (CERN): data recording, reconstruction and distribution

TIER-1: permanent storage, re-processing, analysis

TIER-2: simulation, end-user analysis

• >2 million jobs/day
• ~350'000 cores
• 500 PB of storage
• Nearly 170 sites, 40 countries
• 10-100 Gb links
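For a sense of scale, a rough back-of-envelope reading of these figures (my own arithmetic, treating the quoted numbers as daily averages, not precise accounting):

# Back-of-envelope reading of the WLCG figures quoted above;
# the inputs are the slide's round numbers.
JOBS_PER_DAY = 2_000_000   # "> 2 million jobs/day"
TOTAL_CORES = 350_000      # "~350'000 cores"
SITES = 170                # "nearly 170 sites"

core_hours_per_day = TOTAL_CORES * 24
print(f"~{core_hours_per_day / JOBS_PER_DAY:.1f} core-hours per job on average")  # ~4.2
print(f"~{TOTAL_CORES / SITES:,.0f} cores per site on average")                   # ~2,059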

Page 8:

LHC Data Growth

Expecting to record 400 PB/year by 2023

Compute needs expected to be around 50x current levels, if the budget allows

[Chart: PB recorded per year, 2010 / 2015 / 2018 / 2023]
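As a sanity check on that growth figure (my own arithmetic, not from the slides), 400 PB per year corresponds to an average rate of roughly 100 Gb/s, which is the scale of the grid links mentioned on the previous slide:

# Average data rate implied by "400 PB/year"; real transfers are bursty,
# so this is only an order-of-magnitude figure.
PB = 10**15                        # bytes per petabyte (decimal)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

avg_gbit_per_s = 400 * PB * 8 / SECONDS_PER_YEAR / 10**9
print(f"~{avg_gbit_per_s:.0f} Gb/s sustained average")   # ~101 Gb/s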

Page 9:

THE CERN MEYRIN DATA CENTRE

http://goo.gl/maps/K5SoG

Page 10:

Public Procurement Cycle

Step                               | Time (Days)                      | Elapsed (Days)
User expresses requirement         |                                  | 0
Market Survey prepared             | 15                               | 15
Market Survey for possible vendors | 30                               | 45
Specifications prepared            | 15                               | 60
Vendor responses                   | 30                               | 90
Test systems evaluated             | 30                               | 120
Offers adjudicated                 | 10                               | 130
Finance committee                  | 30                               | 160
Hardware delivered                 | 90                               | 250
Burn in and acceptance             | 30 days typical, 380 worst case  | 280
Total                              |                                  | 280+ days
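The "Elapsed (Days)" column is simply the running total of the step durations; a trivial sketch of that arithmetic (durations copied from the table, typical case):

# Running total of the procurement steps listed in the table above.
from itertools import accumulate

durations = [15, 30, 15, 30, 30, 10, 30, 90, 30]   # per-step days, typical case
elapsed = list(accumulate(durations))
print(elapsed)                          # [15, 45, 60, 90, 120, 130, 160, 250, 280]
print(f"Total: {elapsed[-1]}+ days")    # burn-in can stretch to 380 days in the worst case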

Page 11:

Page 12:

CERN Tool Chain

Page 13:

OpenStack Private Cloud Status

4 OpenStack clouds at CERN
• Largest is ~120,000 cores in ~5,000 servers
• 3 other instances with 45,000 cores total
• 1,100 active users

Running OpenStack Juno
• 3PB Ceph block storage for attached volumes

All non-CERN specific code is contributed back to the community
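To make the "Ceph block storage for attached volumes" point concrete, here is a minimal client-side sketch using the openstacksdk cloud layer; the cloud name, server name and volume size are placeholders, and this is illustrative rather than CERN's actual tooling:

# Create a block-storage volume (Ceph-backed on the cloud described above)
# and attach it to an existing instance. "mycloud" refers to an entry in
# clouds.yaml; names and sizes are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")

server = conn.get_server("analysis-worker-01")            # existing VM (placeholder name)
volume = conn.create_volume(size=100, name="scratch-01")  # size in GB

conn.attach_volume(server, volume, wait=True)             # guest sees a new block device
print(f"Attached {volume.name} ({volume.size} GB) to {server.name}")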

Page 14:

Source: eBay

Upstream OpenStack on its own does not give you a cloud. "Cloud is a service!"

[Diagram: the OpenStack APIs sit at the centre, surrounded by everything else a cloud service needs: packaging, integration, burn in, SLA, monitoring and alerting, metering and chargeback, autoscaling, remediation, scale out, capacity planning, upgrades, customer support, user experience, incident resolution, alerting, cloud monitoring, metrics, log processing, high availability, config management, infra onboarding, CI, builds, net/info sec, and network design.]

Page 15:

Marketplace Options

Build your own
• High level of programming skills needed
• Complete the service circles

Use a distribution
• Good Linux admin skills
• License costs

Use a public cloud or hosted private cloud
• Quick start
• Varying pricing models available, check for "OpenStack powered"

Page 16:

Hooke’s Law for Cultural Change

Under load, an organization can stretch in proportion to the external force

Too much stretching leads to permanent deformation
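For reference, the physical law being borrowed (a textbook statement, not from the slides): the restoring force is proportional to the extension only up to the elastic limit, beyond which deformation becomes permanent.

% Hooke's law: force proportional to extension, valid only below the elastic limit.
F = -k\,x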

Page 17:

THE AGILE EXPERIENCE

Page 18:

CULTURAL BARRIERS

Page 19:

CERN Openlab in a Nutshell

A science – industry partnership to drive R&D and innovation with over a decade of success

Evaluate state-of-the-art technologies in a challenging environment and improve them

Test in a research environment today what will be used in many business sectors tomorrow

Train next generation of engineers/employees

Disseminate results and outreach to new audiences

Page 20:

Phase V Members

PARTNERS

CONTRIBUTORS

ASSOCIATES

RESEARCH

Page 21:

Onwards the Federated Clouds

• CERN Private Cloud: 120K cores
• ATLAS Trigger: 28K cores
• ALICE Trigger: 12K cores
• CMS Trigger: 12K cores
• IN2P3 Lyon
• Brookhaven National Labs
• NecTAR Australia
• Public cloud such as Rackspace
• Many others on their way

Page 22:

Openlab Past Activities

Developed OpenStack identity federation
• Demoed between CERN and Rackspace at the Paris summit

Validated Rackspace private and public clouds for physics simulation workloads
• Performance and reliability comparable with private cloud
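For flavour, a minimal sketch of what the federated setup looks like from the client side: the same code can target the private and the public cloud as named entries in clouds.yaml (the cloud names here are placeholders, not CERN's actual configuration):

# List instances visible to one project in two clouds defined in clouds.yaml.
# With Keystone identity federation, the credentials behind these entries
# can map back to a single home identity.
import openstack

for cloud_name in ("cern-private", "rackspace-public"):   # placeholder names
    conn = openstack.connect(cloud=cloud_name)
    servers = list(conn.compute.servers())
    print(f"{cloud_name}: {len(servers)} instances")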

Page 23:

Openlab Ongoing Activities

Expand federation functionality
• Images (a client-side sketch follows below)
• Orchestration

Expand workloads
• Reconstruction (high I/O and network)
• Bare-metal testing
• Exploit federation
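A minimal illustration of the image side of federation: comparing which image names each cloud publishes, again with placeholder cloud names from clouds.yaml:

# Compare the image catalogues of two clouds as a trivial stand-in for
# "expand federation: images". Cloud names are placeholders.
import openstack

def image_names(cloud_name):
    conn = openstack.connect(cloud=cloud_name)
    return {image.name for image in conn.image.images()}

private = image_names("cern-private")       # placeholder name
public = image_names("rackspace-public")    # placeholder name
print("Only in the private cloud:", sorted(private - public))
print("Only in the public cloud:", sorted(public - private))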

Page 24:

Summary

OpenStack clouds are in production at scale for High Energy Physics

Cultural change to an Agile approach has required time and patience but is paying off

CERN’s computing challenges and the collaboration with Rackspace have fostered sustainable innovation

Many options for future computing models according to budget and needs

Page 25:

For Further Information

Technical details at http://openstack-in-production.blogspot.fr

CERN code is upstream or at http://github.com/cernops

Page 26:

Backup Slides

Page 27:


Page 28:

New Data Centre in Budapest


Page 29:

Cultural Transformations

Technology change needs cultural change
• Speed
  ‒ Are we going too fast?
• Budget
  ‒ Cloud quota allocation rather than CHF
• Skills inversion
  ‒ Legacy skills value is reduced
• Hardware ownership
  ‒ No longer a physical box to check

Page 30:

Good News, Bad News

• Additional data centre in Budapest now online
• Increasing use of facilities as data rates increase

But…
• Staff numbers are fixed, no more people
• Materials budget decreasing, no more money
• Legacy tools are high maintenance and brittle
• User expectations are for fast self-service

Page 31:

Innovation Dilemma

How can we avoid the sustainability trap?
• Define requirements
• No solution available that meets those requirements
• Develop our own new solution
• Accumulate technical debt

How can we learn from others and share?
• Find compatible open source communities
• Contribute back where there is missing functionality
• Stay mainstream

Are CERN computing needs really special?

Page 32:

O’Reilly Consideration


Page 33:

Job Trends Consideration


Page 34:

OpenStack Cloud Platform


Page 35:

OpenStack Governance


Page 36:


Page 37:

The LHC timeline

[Figure: planned evolution of luminosity and pile-up (L. Rossi):
L ~ 7x10^33, pile-up ~20-35;
L = 1.6x10^34, pile-up ~30-45;
L = 2-3x10^34, pile-up ~50-80;
L = 5x10^34, pile-up ~130-200]
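For context (my addition, using standard LHC parameters rather than anything from the slides), the luminosity and pile-up numbers above are tied together by a simple relation:

% Average pile-up mu from instantaneous luminosity L (cm^-2 s^-1),
% inelastic pp cross-section sigma_inel ~ 80 mb = 8e-26 cm^2,
% n_b ~ 2800 colliding bunches, revolution frequency f_rev ~ 11.2 kHz.
\mu = \frac{L\,\sigma_\mathrm{inel}}{n_b\, f_\mathrm{rev}}
\approx \frac{(2\times 10^{34})(8\times 10^{-26})}{2800 \times 1.12\times 10^{4}} \approx 50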

Page 38:


http://www.eucalyptus.com/blog/2013/04/02/cy13-q1-community-analysis-%E2%80%94-openstack-vs-opennebula-vs-eucalyptus-vs-cloudstack


Page 39:

Scaling Architecture Overview

• Load Balancer (Geneva, Switzerland)
• Top Cell - controllers (Geneva, Switzerland)
• Child Cell (Geneva, Switzerland): controllers, compute-nodes
• Child Cell (Budapest, Hungary): controllers, compute-nodes

Page 40:

Monitoring - Kibana


Page 41:

Architecture Components

Top Cell
• Controller: Keystone, Nova api, Nova consoleauth, Nova novncproxy, Nova cells, Horizon, Ceilometer api, Cinder api, Cinder volume, Cinder scheduler, Glance api, Glance registry, rabbitmq, MySQL, MongoDB, HDFS, Elastic Search, Kibana, Stacktach, Ceph, Flume

Children Cells
• Controller: Keystone, Nova api, Nova conductor, Nova scheduler, Nova network, Nova cells, Glance api, Ceilometer agent-central, Ceilometer collector, rabbitmq, Flume
• Compute node: Nova compute, Ceilometer agent-compute, Flume

Page 42:


[Diagram: CERN OpenStack integration. OpenStack components (Horizon, Keystone, Glance, Nova with Compute / Network / Scheduler, Cinder, Ceilometer) are integrated with CERN services: Microsoft Active Directory, Database Services, the CERN Network Database, the account management system, CERN Accounting, and block storage on Ceph & NetApp.]

Page 43:


Helix Nebula

[Diagram: the Helix Nebula hybrid cloud model. Commercial suppliers (Atos, CloudSigma, T-Systems, Interoute) and the EGI Fed Cloud are reached via broker(s) and front-ends over commercial/GEANT networks, serving publicly funded users (big science, small and medium scale science, other academic) and commercial market sectors (government, manufacturing, oil & gas, etc.).]