Upload: vuongtruc

Post on 07-May-2018

Sardina Systems Proprietary. Copyright (C) 2014, 2015, 2016, 2017. Sardina Systems. All rights reserved.

lessons learnt: large scale government OpenStack private cloud

1. the client

highly technical, critical operations

the client

government x

disparate user groups, varying and competing needs, long-lived and short-lived workloads

technically astute organization

background

challenges with energy consumption, system efficiency

need to squeeze more out of the IT investment

understands gross wastage within their traditional solution

background

need to meet organizational mission

don’t throw money at the challenge

money does not grow on trees

2. OpenStack

flexible, widely adopted

IaaS platform

what is OpenStack?

at the most basic level: management layer for virtualized compute, networking, storage resources throughout a datacenter

modern web ui dashboard: gives administrators control while empowering self-service by users

what does OpenStack offer?

flexible compute, storage, networking

lifecycle

deploy

operate

upgrade

how do we turn this into a solution for the client?

3. the solution

system for production, not-a-toy

route to solution

problem definition and solution design

system architecture

hardware specification

project management: what’s happening when, by whom

route to solution

plan for production from day 0

build for production from day 0

it’s not a toy: high availability is not optional

monkey-capable no-magic scaling up process

route to solution

your client != your guinea pig

4. deploy

showtime: get solution design to work

from design to deployment

the ingredients: what configuration is needed

ingredients ready: make sure the hardware is available (servers, storage, networking)

system configuration: single source of truth at all times

automate, automate, automate

system configuration = input for automation

deployment of entire system: fully automated

reduced project risk = increased operational confidence

deploy: 1, 2, 3 … showtime

1. input: configuration and inventory files

2. run deployer

3. 1 or 2 cups of coffee

… done
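
a minimal sketch of "configuration and inventory in, deployment steps out". the inventory layout, the role names and the `plan_deployment` helper are assumptions made for illustration; the slides do not describe the real deployer's file formats.

```python
# Hypothetical inventory: host name -> roles it carries (not the real format).
INVENTORY = {
    "ctrl-1": ["controller"],
    "ctrl-2": ["controller"],
    "ctrl-3": ["controller"],
    "comp-1": ["compute"],
    "comp-2": ["compute"],
}

# Hypothetical configuration: the order in which roles are rolled out.
CONFIG = {"role_order": ["controller", "compute"]}


def plan_deployment(inventory, config):
    """Derive an ordered list of (role, host) steps purely from the inputs."""
    steps = []
    for role in config["role_order"]:
        for host in sorted(inventory):
            if role in inventory[host]:
                steps.append((role, host))
    return steps


if __name__ == "__main__":
    for role, host in plan_deployment(INVENTORY, CONFIG):
        print(f"deploy {role} on {host}")
```

the plan is a pure function of the two inputs, so the same files always give the same plan: one way to read "single source of truth".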

deployer

pre-tested and proven off-site

ci/cd process

no knowledge of OpenStack necessary: enable factory deployment, integration and testing

5. upgrade

reliable, interruption-free upgrade

run at all times

n to n+1 upgrade every 6 months (releases eol in 12 months)

2n upgrades in n years: not an option

it’s not a toy: zero-downtime is not optional

system has to continue operation at all times

zero-downtime

get solution architecture right (from deploy phase) … else, find out about the problem 6 months later

monkey-capable fully automated upgrade process

6. reliability

dead is not an option: averting service death

new expectations

on demand

it should just work

new roles

operator — vs — consumer

operator

operation of underlying infrastructure

provides virtualized resources (compute, networking, storage) … and stops at this!

consumer

operation within the virtualized environment

not concerned about underlying infrastructure

consumer

doesn’t care about what it takes to provide the cloud environment

it should just work

operator

responsibility to keep cloud environment running

dead

in the event the service is dead: expectations are no longer met

dead is not an option

if it is dead, it is too late!

when could there be risk of death?

operate

system is deployed, now … keep services running

high-availability is not a luxury

risk of death

services encountering problems + impacting consumers

death during operate phase

intrinsic — vs — extrinsic

death risks

extrinsic death risk?

dealt with via highly available solution architecture

what is OpenStack?

a series of intercommunicating services

HTTP, MQ

types of services

1. with data + configuration

2. configuration only

safeguards: configuration

infrastructure-as-code operating model

must be able to re-deploy components by re-running deployer
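
a sketch of what "re-run the deployer to re-deploy" implies: the run compares desired state with actual state and only acts on the difference, so running it again is safe. the `reconcile` and `apply` helpers and the "component name to config version" model are hypothetical, not the deployer's actual design.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` to `desired`.

    Both arguments map a component name to its configuration version.
    """
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("deploy", name, version))
        elif actual[name] != version:
            actions.append(("redeploy", name, version))
    return actions


def apply(actions, actual):
    """Apply the actions to a copy of `actual` and return the new state."""
    new_state = dict(actual)
    for _action, name, version in actions:
        new_state[name] = version
    return new_state
```

a second run over the resulting state yields no actions; that property (idempotence) is what makes "just re-run it" a usable safeguard.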

safeguards: data

replication, replication, replication

2-node model or quorum/odd-node model
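
a short sketch of the arithmetic behind the quorum/odd-node model, assuming simple majority voting (the slides do not name a specific clustering technology). function names are illustrative.

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority still survives."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

under majority voting a 4-node cluster tolerates no more failures than a 3-node one, hence odd node counts; a 2-node pair tolerates none, so a 2-node model has to rely on a different mechanism than majority voting.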

safeguards: data

traditional high availability operations model

anything else?

what else?

http: it’s a web server! let’s treat it as such!

high availability for web servers

what else?

mq: put stuff in queue, take stuff from queue

replicate reader/writer

extrinsic death risk

keep extra copy available at all times

is that all?

yes

smarts?

it’s in the solution architecture

get it right, else it can come back to bite you!

intrinsic death risk

it’s intrinsic, not externally triggered

intrinsic death risk

traditional operation model: monitor, alert, action

monitor

look for dead or alive

if it looks like it is dead, check again x 3, to be sure
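
the traditional dead-or-alive check with "check again x 3" can be sketched as below. `probe` is a hypothetical callable returning True when the service answers; a real monitor would also wait between rechecks, which is left out here.

```python
def looks_dead(probe, rechecks=3):
    """Declare a service dead only if the first probe and every recheck fail."""
    if probe():
        return False
    for _ in range(rechecks):
        if probe():
            return False
    return True
```

a single passing recheck clears the alarm, which is why this model only ever reports services that are already firmly dead.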

monitor

… but it’s too late if it is dead!

intrinsic death risk?

it’s intrinsic!

highly available solution architecture will not suffice

smarts?

need to detect sick states

sick: alive, not healthy

smarts

correlation between x and y

not just causality (that’s easy!)

OpenStack architecture

causality? correlation?

challenge

non-trivial computational task

plus, “sick” in one environment may not be “sick” in another

data

2 types of data

metrics

logs

it needs to …

example: response time behavior of Web service + disk SMART errors

detect sick states that are otherwise below the radar
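
one simple way to put a number on "correlation between x and y" for the example above is a Pearson coefficient over two aligned series. this is a stand-in technique chosen for illustration, not necessarily what the product does; the sample values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Invented sample data: web response time (ms) and cumulative SMART errors.
response_ms = [100, 102, 101, 130, 160, 210]
smart_errors = [0, 0, 0, 1, 2, 4]

if __name__ == "__main__":
    print(round(pearson(response_ms, smart_errors), 3))
```

neither series alone has to cross an alert threshold; it is the two moving together that hints at a "sick" state.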

how?

log ingest from OpenStack infrastructure

time series store for machine generated data

watch log data for anomalies
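
a minimal sketch of watching a time series for anomalies, using a rolling z-score. the technique, window size and threshold are assumptions; the slides do not say which detection method is used.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the preceding window
    by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        history.append(value)
    return anomalies
```

the input could be any machine-generated series derived from logs or metrics, for example error lines per minute.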

analogy

take continuous x-ray pictures of patient

feed through decision engine

if certain dark spots are detected, the patient may be “sick”

analogy

if the decision engine gets it wrong, it needs to be re-trained

next time, dark spots will be classed as “sick”

health

picture covers a certain configurable/variable time window

constituents of the picture: configurable

operator to trigger re-training on wrong decisions
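
a toy sketch of operator-triggered re-training, using a 1-nearest-neighbour classifier over labelled "pictures". this is a swapped-in technique for illustration only; the actual decision engine is not described in the slides. here a picture is a tuple of numbers, e.g. (mean response ms, SMART error count) over the chosen time window.

```python
from math import dist


class DecisionEngine:
    """Classify a picture by the label of its nearest labelled example."""

    def __init__(self):
        self.examples = []  # list of (picture, is_sick) pairs

    def train(self, picture, is_sick):
        """Add one labelled example, e.g. an operator correction."""
        self.examples.append((tuple(picture), is_sick))

    def is_sick(self, picture):
        if not self.examples:
            raise ValueError("no training examples yet")
        nearest = min(self.examples, key=lambda ex: dist(ex[0], picture))
        return nearest[1]
```

usage: if the engine calls a picture healthy and the operator disagrees, `engine.train(picture, True)` records the correction, and the same or a similar picture is classed as “sick” next time.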

health

shortcuts

pre-condition: has to be alive to check for “sick”

causality relationships

not a toy

highly available

no-polling, operate on pushed data streams

not a toy

back-pressure sensitive: must not overload the data pipeline

horizontally scalable data store

load balancing across multiple backends
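
a small sketch of back pressure on a pushed data stream: a bounded buffer that refuses new items when full, so the producer can slow down or reduce what it sends. the `Pipeline` class and its capacity are hypothetical; only the standard-library `queue` module is used.

```python
import queue


class Pipeline:
    """Bounded buffer between data sources and the data store."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)

    def push(self, item):
        """Accept an item if there is room; False signals back pressure."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            return False

    def pull(self):
        """Return the next item, or None if the pipeline is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

the sender learns immediately that the pipeline is saturated, instead of the pipeline silently piling up data until it falls over.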

not a toy

high availability for an evaluation system?

massive volumes of data

how much: x00s of metrics per data source

volume: potentially saturating x00 GbE network

smarts?

smart data ingest reduction

imbalanced I/O pattern: large volume of small writes, small number of large reads

deal with risk of death and …

know imminent fault

deal with latent threat, before threat becomes patent

meet consumer expectations

avert death

7. efficiency

reduce, optimize away wastage, raise efficiency, increase utilization

efficiency

it’s about what you do with what you have

reservation

what you asked for

utilization

what is actually used

… what’s your utilization level?

traditional hpc sites: “90+% utilization according to scheduler” … but is that reservation or utilization?

the client: understands the wastage in their traditional environment

a large public cloud operator: “90% memory reservation whilst average cpu was 6% and average max cpu was 15%; this is not unusual”

leave lights on?

do you leave lights on at home for the whole week if you are home for a day?

… that’s 14% utilization!
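
the 14% figure is just "used" divided by "reserved". a tiny sketch, with an illustrative helper name:

```python
def utilization(used, reserved):
    """Fraction of the reserved resource that is actually used."""
    if reserved <= 0:
        raise ValueError("reserved must be positive")
    return used / reserved


if __name__ == "__main__":
    # Home for 1 day, lights on for all 7 days of the week.
    print(f"{utilization(1, 7):.0%}")  # prints "14%"
```

the same ratio applies to any reserved resource, e.g. cpu time used versus cpu time reserved.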

what to do?

need data, detailed data

what’s happening in each part of cpu operating unit, memory subsystem, i/o subsystem

form full picture of system utilization

have data, then?

large scale n-dimensional jigsaw puzzle problem

each parameter = 1 dimension

2 classes of problems

placement: where to place new VMs

rebalancing: how to optimally lay out VMs, optimal number of hosts

2 classes of problems

placement: where to place new piece

rebalancing: dimensions have changed, re-do jigsaw puzzle

solve jigsaw puzzle with x00s of dimensions
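
a minimal sketch of the placement problem as multi-dimensional first-fit: a VM goes to the first host with room in every dimension. first-fit is a deliberately simple stand-in, not the optimizer described in the slides; host names and the two dimensions are invented, and a real system would track far more.

```python
def fits(demand, free):
    """True if `free` has enough capacity in every demanded dimension."""
    return all(free.get(dim, 0) >= amount for dim, amount in demand.items())


def place(vm_demand, hosts):
    """First-fit placement.

    `hosts` maps host name -> remaining capacity per dimension. On success
    the capacity is reserved and the host name returned; otherwise None.
    """
    for name, free in hosts.items():
        if fits(vm_demand, free):
            for dim, amount in vm_demand.items():
                free[dim] -= amount
            return name
    return None


if __name__ == "__main__":
    hosts = {
        "h1": {"cpu": 4, "mem_gb": 8},
        "h2": {"cpu": 8, "mem_gb": 64},
    }
    print(place({"cpu": 2, "mem_gb": 16}, hosts))
```

judged on cpu alone the VM would fit on h1; only the memory dimension shows it has to go to h2.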

why do we need all these parameters?

answer: try packing parcels when you know only their height, not their width or depth

the impact

the client: raised utilization from 20% to 60% (leaving 40% head room)

3x utilization increase

66% energy reduction
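
a sketch of how the three figures can fit together, under the assumption (not stated in the slides) that the same work runs on proportionally fewer powered hosts and that energy scales with host count:

```python
def impact(before_pct, after_pct):
    """Return (utilization multiplier, fraction of hosts no longer needed)."""
    multiplier = after_pct / before_pct
    hosts_saved = 1 - before_pct / after_pct
    return multiplier, hosts_saved


if __name__ == "__main__":
    multiplier, hosts_saved = impact(20, 60)
    print(multiplier, round(hosts_saved, 3))
```

going from 20% to 60% is a 3x multiplier, and the same work then needs about one third of the hosts, i.e. roughly two thirds fewer, which is in line with the quoted 66%.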

efficient resource management

operating system-based model, throughout entire facility

dynamic, efficient: lower OpEx + CapEx

8. conditional

flexible, widely adopted

IaaS platform

conditional engine

fast, scalable, highly available

handle large volumes of conditions to evaluate

… topics for another day

9. sardina systems

this is Sardina Systems

full-lifecycle automated OpenStack: deploy, operate, upgrade

AI-driven smart, efficient, super-scalable automation technology

enable organizations to rapidly experience the value of OpenStack cloud and maximize utility of their resources

sectors: finance, government, aerospace, research, academia

in 2015, Sardina FishOS won the IDC HPC Innovation Award

top-10 trader on New York market

U of Edinburgh: top-5 UK academic and research site

150k VMs at classified government site

lessons learnt: large scale government OpenStack private cloud

dr kenneth tan [email protected]

+44 798 941 7838