10 years of cloud - moodle.msengineering.ch

10 YEARS OF CLOUD, PIERRE-YVES RITSCHARD (@PYR)

Post on 10-Apr-2022

TRANSCRIPT

Page 1: 10 YEARS OF CLOUD - moodle.msengineering.ch

10 YEARS OF CLOUD
PIERRE-YVES RITSCHARD (@PYR)

Page 2: 10 YEARS OF CLOUD - moodle.msengineering.ch

👋👋👋👋👋👋: Three-line Bio

CTO & Co-founder at Exoscale
Distributed systems and monitoring enthusiast
Open-Source developer
@pyr

Page 3: 10 YEARS OF CLOUD - moodle.msengineering.ch

10 YEARS OF CLOUD
Building better infrastructure with parentheses

Page 4: 10 YEARS OF CLOUD - moodle.msengineering.ch

EXOSCALE
Infrastructure as a service
Zones in Frankfurt, Vienna, Zürich, Geneva, Munich, Sofia

Page 5: 10 YEARS OF CLOUD - moodle.msengineering.ch

EXOSCALE

Page 6: 10 YEARS OF CLOUD - moodle.msengineering.ch

EXOSCALE

provider "exoscale" {
  api_key    = "${var.exoscale_api_key}"
  secret_key = "${var.exoscale_secret_key}"
}

resource "exoscale_instance" "web" {
  template  = "Ubuntu 21.04"
  disk_size = "50g"
  profile   = "medium"
  ssh_key   = "production"
}

Page 7: 10 YEARS OF CLOUD - moodle.msengineering.ch

WHAT'S IN A CLOUD PROVIDER
Datacenter operations
Software development

Page 8: 10 YEARS OF CLOUD - moodle.msengineering.ch

SOFTWARE AT EXOSCALE
API Gateway
Orchestrators (VM instance, Load-Balancer, Kubernetes)
Object storage controller
Network controller (SDN)
Customer management
Metering system
Billing
Web portal

Page 9: 10 YEARS OF CLOUD - moodle.msengineering.ch

ENGINEERING AT EXOSCALE

Page 10: 10 YEARS OF CLOUD - moodle.msengineering.ch

ISN'T ALL OF THIS BASH, PERL, AND YAML?

Page 11: 10 YEARS OF CLOUD - moodle.msengineering.ch

ENGINEERING AT EXOSCALE: A TIMELINE

Page 12: 10 YEARS OF CLOUD - moodle.msengineering.ch

2012: THE EARLY DAYS
We started with
  3 people
  A bit of time
  A product idea

Page 13: 10 YEARS OF CLOUD - moodle.msengineering.ch

A DIFFERENT CLOUD PROVIDER
Not yet another virtual datacenter product
Integration with automation tooling
Integration in language-specific libraries
Focus on horizontally-scalable applications
  Local storage
  Security groups

Page 14: 10 YEARS OF CLOUD - moodle.msengineering.ch

THINGS THAT DIDN'T EXIST IN 2012
Ansible
Terraform
Docker

Page 15: 10 YEARS OF CLOUD - moodle.msengineering.ch

THINGS THAT DIDN'T EXIST IN 2012
Television
Wifi

Page 16: 10 YEARS OF CLOUD - moodle.msengineering.ch

OUR MINIMAL STACK
Apache Cloudstack
Puppet
Good old MySQL
A third-party customer management tool
Python + AngularJS
Riemann

Page 17: 10 YEARS OF CLOUD - moodle.msengineering.ch

OUR MINIMAL STACK

Page 18: 10 YEARS OF CLOUD - moodle.msengineering.ch

RIEMANN
The common saying back then was "monitoring sucks"
The push-based model was a great fit for our use case
A great opportunity to contribute, as Riemann was in its early stages

Page 19: 10 YEARS OF CLOUD - moodle.msengineering.ch

2013: GOING LIVE

Page 20: 10 YEARS OF CLOUD - moodle.msengineering.ch

BACKEND DEVELOPERS DOING FRONTEND

Page 21: 10 YEARS OF CLOUD - moodle.msengineering.ch

THINGS OUR EARLY ADOPTERS ENJOYED
Vagrant support
Security groups instead of firewalling
A public IP per instance

Page 22: 10 YEARS OF CLOUD - moodle.msengineering.ch

IMPROVING RELEASE AUTOMATION

Page 23: 10 YEARS OF CLOUD - moodle.msengineering.ch

WARP

Page 24: 10 YEARS OF CLOUD - moodle.msengineering.ch

WARP

Page 25: 10 YEARS OF CLOUD - moodle.msengineering.ch

WARP
Open Source
TLS client certificate-based authentication
IRC support
Haskell Go agent
Prefigured our inclination for Clojure at the orchestration layer

Page 26: 10 YEARS OF CLOUD - moodle.msengineering.ch

TROUBLE KICKS IN
Late payments
Bitcoin mining on free credit

Page 27: 10 YEARS OF CLOUD - moodle.msengineering.ch

SOLVING ABUSE
Need to pull data from a bunch of places
Standard FSM type of problem

Page 28: 10 YEARS OF CLOUD - moodle.msengineering.ch

FUNCTIONAL PROGRAMMING TO THE RESCUE!

(match [state new-state unpaid-invoices?]
  [:ok       :warning  _    ] :warn!
  [:ok       :critical _    ] :suspend!
  [:warning  :critical _    ] :suspend!
  [:warning  :ok       _    ] :active!
  [:critical :ok       false] :active!
  [:critical :warning  false] :active!
  [_         _         _    ] nil)
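For readers less at home with parentheses, the same transition table can be sketched in Python. This is only an illustrative translation of the match expression on this slide: the function name `next_action` and the use of `None` as a wildcard are choices made for this sketch, not part of the talk.

```python
from typing import Optional

# Transition table mirroring the match expression on the slide.
# Each pattern is (current state, new state, unpaid invoices?);
# None plays the role of the "_" wildcard.
TRANSITIONS = [
    (("ok", "warning", None), "warn!"),
    (("ok", "critical", None), "suspend!"),
    (("warning", "critical", None), "suspend!"),
    (("warning", "ok", None), "active!"),
    (("critical", "ok", False), "active!"),
    (("critical", "warning", False), "active!"),
]


def next_action(state: str, new_state: str, unpaid_invoices: bool) -> Optional[str]:
    """Return the action of the first matching pattern, or None (the catch-all row)."""
    observed = (state, new_state, unpaid_invoices)
    for pattern, action in TRANSITIONS:
        if all(p is None or p == o for p, o in zip(pattern, observed)):
            return action
    return None
```

Note how a suspended account (`critical`) only becomes active again when there are no unpaid invoices; with unpaid invoices the call falls through to the catch-all and nothing happens.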

Page 29: 10 YEARS OF CLOUD - moodle.msengineering.ch

2014: THE YEAR OF STORAGE

Page 30: 10 YEARS OF CLOUD - moodle.msengineering.ch

OBJECT STORAGE
The obvious choice for our crowd
Architecturally simpler than distributed block storage
A good complement to our local-storage-backed instances

Page 31: 10 YEARS OF CLOUD - moodle.msengineering.ch

OBJECT STORAGE NEEDS
S3 is the sole player in that field: we need API compatibility
The only alternative at the time was bad HTTP extensions

Page 32: 10 YEARS OF CLOUD - moodle.msengineering.ch

OBJECT STORAGE IN THE WILD
Ceph
Riak-CS
Swift
Costly vendor-backed solutions

Page 33: 10 YEARS OF CLOUD - moodle.msengineering.ch

WRITING AN OBJECT STORE
We focused on how to store large objects
Tempted by a description of the (non-OpenSource) approach by Datastax on top of Cassandra
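The slides do not describe how large objects were actually laid out. A common way to store them in a wide-column store is to split each object into fixed-size chunks, one row per chunk. The sketch below illustrates that general idea only; the row shape `(object_id, chunk_index, data)` and both function names are assumptions made for this example, not Exoscale's or Datastax's design.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, int, bytes]  # (object_id, chunk_index, data): hypothetical row shape


def chunk_rows(object_id: str, payload: bytes, chunk_size: int) -> Iterator[Row]:
    """Split a payload into fixed-size chunks, one row per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        yield (object_id, index, payload[offset:offset + chunk_size])


def reassemble(rows: Iterable[Row]) -> bytes:
    """Rebuild the payload by concatenating chunks in index order."""
    return b"".join(data for _, _, data in sorted(rows, key=lambda row: row[1]))
```

Keeping the chunk index in the row key means a reader can stream chunks back in order without ever holding the whole object in memory.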

Page 34: 10 YEARS OF CLOUD - moodle.msengineering.ch

CHOOSING CASSANDRA
Great library support
Simple for us to operate
Very few moving parts
Our implementation could remain fully stateless

Page 35: 10 YEARS OF CLOUD - moodle.msengineering.ch

WE WERE (ALMOST) YOUNG AND (WAY TOO) NAIVE

How hard could it be?

Page 36: 10 YEARS OF CLOUD - moodle.msengineering.ch

WHAT WE DIDN'T ANTICIPATE
It's not all about actual data storage
  The S3 API is a beast
  The S3 API is underspecified
  The S3 API is not versioned
  The S3 API client landscape is a mess

Page 37: 10 YEARS OF CLOUD - moodle.msengineering.ch

A QUICK DIGRESSION: S3 REQUESTS
Operation: put object foo in bucket bar:

PUT /foo
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
<...>

Page 38: 10 YEARS OF CLOUD - moodle.msengineering.ch

A QUICK DIGRESSION: S3 REQUESTS
Operation: update acl for object foo in bucket bar:

PUT /foo?acl
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
X-Amz-ACL: bucket-owner-full-control

Page 39: 10 YEARS OF CLOUD - moodle.msengineering.ch

A QUICK DIGRESSION: S3 REQUESTS
Operation: copy object bim from bucket bam to object foo in bucket bar:

PUT /foo
Host: bar.sos-ch-dk-2.exo.io
Authorization: AWS ....
X-Amz-Copy-Source: /bam/bim
X-Amz-Copy-Source-If-Unmodified-Since: ARE YOU KIDDING ME?
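The point of the three requests is that the HTTP method and path alone do not identify the operation: a server has to inspect the query string and the headers as well. A minimal sketch of that dispatch, covering only the three cases shown in this digression, could look as follows. The function name and the operation labels it returns are made up for this example.

```python
from typing import Mapping


def classify_s3_put(query: str, headers: Mapping[str, str]) -> str:
    """Name the operation behind a PUT on an object path.

    Only distinguishes the three requests from the slides; a real S3
    implementation has many more sub-resources and headers to consider.
    """
    header_names = {name.lower() for name in headers}
    subresources = {part.split("=", 1)[0] for part in query.split("&") if part}
    if "acl" in subresources:
        return "put-object-acl"
    if "x-amz-copy-source" in header_names:
        return "copy-object"
    return "put-object"
```

The same `PUT /foo` therefore maps to three different code paths, which is part of what makes API compatibility harder than the storage itself.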

Page 40: 10 YEARS OF CLOUD - moodle.msengineering.ch

BY THE WAY
Storing terabytes of data on off-the-shelf hardware doesn't come easy either
Input and output payloads of arbitrary lengths aren't easy
The standard web stack doesn't cut it

Page 41: 10 YEARS OF CLOUD - moodle.msengineering.ch

2015: SCALING UP

Page 42: 10 YEARS OF CLOUD - moodle.msengineering.ch

BILLING ISSUES
The cron-based approach to billing is showing its limits
Hard to keep it at an hourly rate because each run takes too long

Page 43: 10 YEARS OF CLOUD - moodle.msengineering.ch

AT A CROSSROADS

Page 44: 10 YEARS OF CLOUD - moodle.msengineering.ch

AT A CROSSROADS

Page 45: 10 YEARS OF CLOUD - moodle.msengineering.ch

AT A CROSSROADS

Page 46: 10 YEARS OF CLOUD - moodle.msengineering.ch

INTRODUCING STREAM PROCESSING
You can't do everything with cron

Page 47: 10 YEARS OF CLOUD - moodle.msengineering.ch

OUR CANDIDATE: APACHE KAFKA
Partition-isolated consistency
Disaggregating memory
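One reading of "partition-isolated consistency" is that Kafka orders records within a partition only, and that records sharing a key land in the same partition. The toy model below illustrates that property. It is not Kafka client code: `zlib.crc32` is a stand-in hash picked for this sketch (Kafka's default partitioner uses a different one), and both function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(records: Iterable[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Append records to per-partition logs, preserving arrival order."""
    logs: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return dict(logs)
```

Keying records by, say, customer or instance means everything about one entity is seen in order by a single consumer, without any global ordering across the cluster.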

Page 48: 10 YEARS OF CLOUD - moodle.msengineering.ch

WHY KAFKA?

Page 49: 10 YEARS OF CLOUD - moodle.msengineering.ch

A FIRST CANDIDATE: BANDWIDTH METERING
Traffic accounting on hypervisors, with a small C agent
30-second aggregates sent over to Kafka
A Clojure Kafka consumer on the other end
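The 30-second aggregation step can be pictured as bucketing traffic samples by time window. The sketch below is in Python purely for illustration (the agent in the talk was written in C); the sample shape `(timestamp, instance_id, byte_count)` and the function name are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 30

Sample = Tuple[float, str, int]  # (unix timestamp, instance id, bytes seen): hypothetical shape


def aggregate(samples: Iterable[Sample]) -> Dict[Tuple[str, int], int]:
    """Sum byte counts per (instance, start of 30-second window)."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for timestamp, instance_id, byte_count in samples:
        window_start = int(timestamp) - int(timestamp) % WINDOW_SECONDS
        totals[(instance_id, window_start)] += byte_count
    return dict(totals)
```

Each resulting entry is small and self-describing, which makes it a natural unit to publish as one message per instance and window.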

Page 50: 10 YEARS OF CLOUD - moodle.msengineering.ch

KEY TAKEAWAYS
Incredible reliability
Soon expanded to the rest of our billing infrastructure
The system can weather temporary failures with no billing impact

Page 51: 10 YEARS OF CLOUD - moodle.msengineering.ch

2017: TOO MUCH DATA

Page 52: 10 YEARS OF CLOUD - moodle.msengineering.ch

SUDDEN S3 PICKUP IN USAGE
Our initial implementation limits the throughput
Tail latencies go through the roof
Cassandra is just not great at doing dense nodes
  We knew this going in
  We hit the wall hard

Page 53: 10 YEARS OF CLOUD - moodle.msengineering.ch

WE NEED A NUMBER OF NEW API CAPABILITIES
V4 signatures are becoming the norm for S3
Better ACL support is needed
The Docker registry exercises all the weird properties of the API
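To give a sense of what V4 signatures involve on the server side, here is the signing-key derivation step of AWS Signature Version 4: a chain of HMAC-SHA256 operations over the date, region, and service. This covers only the key derivation, not the canonical request or the string to sign, and the function names are chosen for this sketch.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a text message under a binary key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key.

    date is the request date formatted as YYYYMMDD; the key is scoped to
    that day, region, and service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

A compatible object store has to repeat this derivation for every request in order to verify the signature the client sent.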

Page 54: 10 YEARS OF CLOUD - moodle.msengineering.ch

WE FIND A GOOD PAPER
Ambry attacks the same problem space
The paper lays out a great strategy

Page 55: 10 YEARS OF CLOUD - moodle.msengineering.ch

LET'S WRITE A DISTRIBUTED SYSTEM FROM SCRATCH

What could go wrong?

Page 56: 10 YEARS OF CLOUD - moodle.msengineering.ch

KEY DECISIONS
A storage agent in C
Orchestration in Clojure
Zookeeper for agent discovery
Cassandra for metadata storage

Page 57: 10 YEARS OF CLOUD - moodle.msengineering.ch

UI

Page 58: 10 YEARS OF CLOUD - moodle.msengineering.ch

2018: SECURITY, API, AND KUBERNETES

Page 59: 10 YEARS OF CLOUD - moodle.msengineering.ch

SPECTRE AND MELTDOWN
Established crisis communication channels with other key providers
Large-scale automation of Linux kernel roll-outs

Page 60: 10 YEARS OF CLOUD - moodle.msengineering.ch

BUILDING ON KUBERNETES
We previously bet on Mesos
Traction seemed to be stronger around Kubernetes
Other players were still relevant (notably Swarm)
Need to gain knowledge before being ready to sell

Page 61: 10 YEARS OF CLOUD - moodle.msengineering.ch

LOOKING BACK

Page 62: 10 YEARS OF CLOUD - moodle.msengineering.ch

OPEN-SOURCE AT EXOSCALE
Apache Cloudstack was a great way to bootstrap our service
High dependence on Open Source databases
Some key low-level components (Qemu, Libvirt)

Page 63: 10 YEARS OF CLOUD - moodle.msengineering.ch

OPEN-SOURCE INVOLVEMENT
Several Apache developers
Key contributions to major projects

Page 64: 10 YEARS OF CLOUD - moodle.msengineering.ch

THANKS
Questions?