openstack reality check (dach tag edition)

Upload: isotopp

Post on 28-Jul-2015

TRANSCRIPT

Page 1: Openstack Reality Check (DACH Tag Edition)
Page 2: Openstack Reality Check (DACH Tag Edition)

45 Minutes of OpenStack Hate: A Reality Check

Kristian Köhntopp, Cloud Architect Old Fart

SysEleven

https://www.flickr.com/photos/pinksherbet/188842453 Pink Sherbet Photography (CC BY 2.0)

Page 3: Openstack Reality Check (DACH Tag Edition)

3

Page 4: Openstack Reality Check (DACH Tag Edition)

What is it that we want to do?

4

Page 5: Openstack Reality Check (DACH Tag Edition)

–Operating POV

„Any VM, anywhere.“

5

Page 6: Openstack Reality Check (DACH Tag Edition)

–Developer POV

„Infrastructure as Code.“

6

Page 7: Openstack Reality Check (DACH Tag Edition)

Why would I need that?

• 24 Cores (48 Cores HT), 256G RAM, 2* 10GBit and 12* 3TB HDD including a solid BBU

• 10k EUR

• Applications that actually fill that box are rare. So we are cutting it up to sell off the parts.

7

Page 8: Openstack Reality Check (DACH Tag Edition)

"Any VM, anywhere"

8

CPU, RAM

Storage, Network

Overlay, Underlay

Page 9: Openstack Reality Check (DACH Tag Edition)

Openstack Overview (simplified)

9

Page 10: Openstack Reality Check (DACH Tag Edition)

10

Page 11: Openstack Reality Check (DACH Tag Edition)

https://uksysadmin.files.wordpress.com/2011/03/openstackwallpaper1.png

Page 12: Openstack Reality Check (DACH Tag Edition)

Widely supported by many vendors' marketing departments

https://www.flickr.com/photos/ahockley/8662640096 Aaron Hockley (cc-by-sa 2.0)

Page 13: Openstack Reality Check (DACH Tag Edition)

13

http://www.zdnet.com/article/customers-reporting-interest-in-cloud-containers-linux-and-openstack-for-2015/

http://www.businesscloudnews.com/2015/04/09/red-hat-dell-redouble-openstack-private-cloud-efforts/

http://blogs.wsj.com/cio/2015/02/25/the-morning-download-h-p-deal-with-deutsche-bank-is-step-forward-for-openstack/

Page 14: Openstack Reality Check (DACH Tag Edition)

What Openstack Vendors promise…

https://www.flickr.com/photos/debarshiray/8237431709/sizes/l debarshiray (CC-BY-SA)

Page 15: Openstack Reality Check (DACH Tag Edition)

How Openstack Admins imagine their workplace…

(C) Kristian Köhntopp, 2006

Page 16: Openstack Reality Check (DACH Tag Edition)

What is being delivered…

(C) Hendrik Scholz, 2006

Page 17: Openstack Reality Check (DACH Tag Edition)

What the admin job actually looks like…

(C) Kristian Köhntopp, 2015

Page 18: Openstack Reality Check (DACH Tag Edition)

Storage

"Where the Internet lives", http://www.google.com/about/datacenters/gallery/#/tech/12

Page 19: Openstack Reality Check (DACH Tag Edition)

Storage requirements

• VM with Ephemeral Storage
  • Not a problem, right? Because storage can be an un-RAID-ed local disk.
  • But then you got no migration.
• Maintenance sucks w/o Migration. For you and for your customers.

19

Page 20: Openstack Reality Check (DACH Tag Edition)

Lesson #1: Nope, local storage is not a good default.

20

Page 21: Openstack Reality Check (DACH Tag Edition)

Storage

• VM with Volume: All writes are always remote.
  • And redundant.
• Relevant metrics:
  • Bandwidth, IOPS and Latency
  • MB/sec, multithreaded-fsync()/sec and sequential-fsync()/sec
• Where do the limits come from?

21

Page 22: Openstack Reality Check (DACH Tag Edition)

Storage vs. requirements

• Scales easily:
  • Bandwidth, MT-IOPS
• Hard to scale:
  • Sequential-IOPS (b/c latency, target: 10k)
  • Default-Benchmark: "MySQL Slave on a volume"
  • "16 KB single-thread random-write on a datafile."

22

Page 23: Openstack Reality Check (DACH Tag Edition)

Storage vs. requirements

• shared storage = at least one network write
• one write = 1200 MB/s
• two writes = half latency
• latency is at least 1/20000s
• at most 10k-20k commit/s
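The arithmetic behind the commit limit can be written down as a short sketch. It assumes one reading of the slide: a commit has to wait for each of its network writes in turn, and a single write takes at least 1/20000 s (50 µs). The function name and the `writes_per_commit` parameter are hypothetical.

```python
def max_sequential_commits_per_sec(write_latency_us, writes_per_commit=1):
    """Upper bound on commits/s when every commit waits for its writes in turn."""
    return 1_000_000 / (write_latency_us * writes_per_commit)


# 1/20000 s per network write equals 50 microseconds.
one_write = max_sequential_commits_per_sec(50)
two_writes = max_sequential_commits_per_sec(50, writes_per_commit=2)
print(one_write, two_writes)
```

Under these assumptions, one network write per commit gives the 20k bound and two dependent writes give the 10k bound. No amount of added bandwidth or parallel disks changes that for a single thread.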

23

Page 24: Openstack Reality Check (DACH Tag Edition)

Lesson #2: A Single-Threaded Random-Write Benchmark is a fine first evaluation.

24

Page 25: Openstack Reality Check (DACH Tag Edition)

Storage vs. actual Openstack delivery

• OpenStack Default is iSCSI and tgtd
• »The problem is left as an exercise to the user.«
• Storage Vendors are totally in love with that approach.
• https://wiki.openstack.org/wiki/CinderSupportMatrix

25

Page 26: Openstack Reality Check (DACH Tag Edition)

Storage: Ceph - the good

• Excellent bandwidth
• Good Multithreaded-IOPS
• Very robust, if you got spare memory and time for MTTR

26

Page 27: Openstack Reality Check (DACH Tag Edition)

Storage: Ceph - the bad

• CRUSH
  • Layout follows topology
  • Change topology, move data needlessly
• You do need a background network for storage to facilitate Rebalancing/Recovery/Cluster-Reorg

27

Page 28: Openstack Reality Check (DACH Tag Edition)

Storage: Ceph - the ugly

• Logging
• IO is being serialised.
• Sequential IOPS fatally slow (200 IOPS).
• MySQL slave on our Ceph-Volumes @ 200 Commit/s
• Windows 8.1 boots from a Ceph-Volume in 15 Minutes

28

Page 29: Openstack Reality Check (DACH Tag Edition)

Lesson #3: Pure Play OpenStack will not work for production.

Corollary: OpenStack is not a functioning Open Source Project.

29

Page 30: Openstack Reality Check (DACH Tag Edition)

Storage requirements vs. Multitenancy

• IOPS-Requirements need SSD to implement

30

"Magento Indexer"

Single-Threaded Database Updates

Page 31: Openstack Reality Check (DACH Tag Edition)

Storage requirements vs. Multitenancy

• IOPS-Requirements need SSD to implement

• IOPS Quotas require SSD to implement

30

You can't put quota on what isn't there.

Small IOPS = large incremental steps, large variance.

Page 32: Openstack Reality Check (DACH Tag Edition)

Storage requirements vs. Multitenancy

• IOPS-Requirements need SSD to implement

• IOPS Quotas require SSD to implement

• SSD complicated - Price vs. guaranteed Performance

30

Target price 1 EUR/GB atm.

Low performance variation more important than huge peak perf.

Page 33: Openstack Reality Check (DACH Tag Edition)

Storage requirements vs. Multitenancy

• IOPS-Requirements need SSD to implement

• IOPS Quotas require SSD to implement

• SSD complicated - Price vs. guaranteed Performance

• Caches complicated

30

Working Set > Cache means you are working the raw iron.
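Why a working set larger than the cache lands on the raw iron can be illustrated with the usual weighted-average latency model. This is a sketch under loud assumptions: accesses are uniformly random over the working set, so the hit ratio is simply cache size divided by working set size, and all sizes and latencies below are made-up illustration values, not measurements from the talk.

```python
def hit_ratio(cache_gb, working_set_gb):
    """Hit ratio under the (assumed) uniform random access pattern."""
    return min(1.0, cache_gb / working_set_gb)


def effective_latency_us(cache_gb, working_set_gb,
                         cache_latency_us, backend_latency_us):
    """Average latency as a hit-ratio weighted mix of cache and backend."""
    h = hit_ratio(cache_gb, working_set_gb)
    return h * cache_latency_us + (1.0 - h) * backend_latency_us


# Hypothetical numbers: 100 GB cache at 100 us, backend at 5000 us.
for working_set_gb in (50, 100, 400):
    latency = effective_latency_us(100, working_set_gb, 100, 5000)
    print(working_set_gb, latency)
```

With these numbers, a working set four times the cache size already averages 3775 µs, close to the uncached backend. The averages also hide the variance between hits and misses.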

Page 34: Openstack Reality Check (DACH Tag Edition)

Lesson #4: Caching is as much part of the problem as it is part of the solution.

31

Page 35: Openstack Reality Check (DACH Tag Edition)

Network

Mercury Redstone Connector MR-1 (1960) https://www.flickr.com/photos/jurvetson/5691350527 Steve Jurvetson (CC-BY)

Page 36: Openstack Reality Check (DACH Tag Edition)

Network requirements

• Hosting setup:

• Topology: Wait-Free, Oversubscription-Free

• Redundancy: SPOF-Free

• Multi-Tenancy: Isolation and Quotas

• Also:

• Capable of carrying the storage traffic

33

Page 37: Openstack Reality Check (DACH Tag Edition)

–Joe Random Alphauser

„We just rolled out a few dozen VMs with Hadoop and played Terasort.“

34

Page 38: Openstack Reality Check (DACH Tag Edition)

Terasort to watch the world burn

35

http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort by http://www.slideshare.net/pramodbiligiri/presentations

Page 39: Openstack Reality Check (DACH Tag Edition)

Oversubscription free networking

• Leaf and Spine Architecture
• More ports, more switches
• Price per port:
  • 200-300 EUR/10GBit
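"More ports, more switches" can be made concrete with a small sizing sketch. The assumptions are mine, not from the slides: a two-tier fabric built from identical switches, every leaf using half its ports for servers and half for uplinks (one uplink to each spine), and a hypothetical 48-port switch. Only the 200-300 EUR per port range comes from the slide.

```python
import math


def leaf_spine_fabric(server_ports, ports_per_switch):
    """Size a two-tier leaf-spine fabric without oversubscription.

    Hypothetical helper: half of each leaf's ports face servers, the
    other half are uplinks, one to each spine switch.
    """
    if ports_per_switch % 2:
        raise ValueError("this sketch assumes an even port count per switch")
    down = ports_per_switch // 2          # server-facing ports per leaf
    spines = ports_per_switch - down      # one uplink per spine
    leaves = math.ceil(server_ports / down)
    if leaves > ports_per_switch:
        raise ValueError("too many leaves for a two-tier fabric of this size")
    switch_ports = (leaves + spines) * ports_per_switch
    return {"leaves": leaves, "spines": spines, "switch_ports": switch_ports}


fabric = leaf_spine_fabric(server_ports=1152, ports_per_switch=48)
low = fabric["switch_ports"] * 200   # EUR, lower price from the slide
high = fabric["switch_ports"] * 300  # EUR, upper price from the slide
print(fabric, low, high)
```

In the fully built-out case this comes to three switch ports per server-facing port, so under these assumptions the 200-300 EUR port price turns into roughly 600-900 EUR per connected server port. Partially filled fabrics are worse, because the spine ports are bought up front.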

36

http://bradhedlund.com/2012/01/25/construct-a-leaf-spine-design-with-40g-or-10g-an-observation-in-scaling-the-fabric/

Page 40: Openstack Reality Check (DACH Tag Edition)

Notwork: Openstack default offering

• Open vSwitch, GRE-Ball, Broadcast-Problem, Chokepoint, SPOF

• Current releases are only marginally less braindead.

37

SPOF!

Page 41: Openstack Reality Check (DACH Tag Edition)

Lesson #5: Ok, networking doesn't work, either.

38

Page 42: Openstack Reality Check (DACH Tag Edition)

Let's look around for a non-broken solution

39

Page 43: Openstack Reality Check (DACH Tag Edition)

OpenContrail: the good, …

• Open Source Project by Juniper
• Uses existing hardware routing infrastructure
• scales, no SPOFs, no Chokepoints
• based on MPLS, BGP, and other well understood protocols
• actually delivers stuff (as opposed to e.g. OpenDaylight)

40

Page 44: Openstack Reality Check (DACH Tag Edition)

OpenContrail: the bad, …

• Juniper bought Contrail, doesn't understand how to market it

• little and outdated documentation, bad release management, very bad packaging

• funny support organisation (the support is awesome, the organisation is funny)

41

Page 45: Openstack Reality Check (DACH Tag Edition)

OpenContrail: the bad, …

• During the OpenContrail build process, the scons-based build process will
  • download the full source of tools such as Bind and Curl
  • patch them wildly
  • put the resulting binaries into a .deb package
  • which will of course conflict with half a dozen Ubuntu packages

42

Page 46: Openstack Reality Check (DACH Tag Edition)

… and the ugly.

https://www.flickr.com/photos/59145750@N03/5559004171 Mara Tr. (CC-BY 2.0)

Page 47: Openstack Reality Check (DACH Tag Edition)

Technology-Jenga

• OpenContrail’s "Stack"

• vrouter (.ko), if-map, Sandesh (XML over Thrift!)

• C++, Python, node.js, irond (Java)

• redis, Cassandra, Zookeeper

• xmpp, BGP, MPLS

• Up Next in OpenContrail 2.2: Kafka

• Put that into a job profile, try to find a candidate. Is it Pedro?

44

Page 48: Openstack Reality Check (DACH Tag Edition)

Lesson #6: The Jenga problem is actually not limited to Contrail scope.

45

Page 49: Openstack Reality Check (DACH Tag Edition)

Let's look what else is in the box…

https://www.flickr.com/photos/markjsebastian/4217877353/ Mark Sebastian (CC BY-SA 2.0)

Page 50: Openstack Reality Check (DACH Tag Edition)

General requirements as of 2015

• Distributed Anything:
  • NTP, centralized Logging, centralized Monitoring
  • functional validation of components before join, clean cluster startup
  • cluster comms are Kyle-Kingsbury-proof
  • CA, encrypted data in flight, optionally encrypted data at rest
  • User-Story regarding Live-Upgrades, with Canaries

47

Page 51: Openstack Reality Check (DACH Tag Edition)

OpenStack delivers this…

https://www.flickr.com/photos/z287marc/3189567558/ z287marc (CC BY 2.0)

Page 52: Openstack Reality Check (DACH Tag Edition)

A wild hack session appears…

https://www.flickr.com/photos/nauright/11341288174/ Romana Klee (CC BY-SA 2.0)

Page 53: Openstack Reality Check (DACH Tag Edition)

Not an isolated problem…

• Start a HEAT stack with a few networks and 20 VMs
• … and the cluster switches off.
• Nova does not communicate with Cinder, but has a hard timeout.
• "Are we stupid or is OpenStack stupid? Let's test a few public clouds…"

50

Page 54: Openstack Reality Check (DACH Tag Edition)

Result…

• Phone ringing…
• "Whatever you are doing, could you please do something else?"

51

"Rotes Telefon", http://de.wikipedia.org/wiki/Datei:Jimmy_Carter_Library_and_Museum_99.JPG User:Piotrus (CC-BY 3.0)

Page 55: Openstack Reality Check (DACH Tag Edition)

Not an isolated problem…

52

Implementation tested on Devstack in VMware Fusion on a MacBook Air in St. Oberholz?

Page 56: Openstack Reality Check (DACH Tag Edition)

Not an isolated problem…

53

https://thespeechatimeforchoosing.files.wordpress.com/2012/12/facepalm.jpg

Keystone

Page 57: Openstack Reality Check (DACH Tag Edition)

Nova can't count has no clue whatsoever

54

Page 58: Openstack Reality Check (DACH Tag Edition)

Not an isolated problem…

55

Ceilometer

http://de.wikipedia.org/wiki/Datei:Deritend_Car_Park.jpg User:Oosoom (CC-BY)

Page 59: Openstack Reality Check (DACH Tag Edition)

What is the real problem?

https://www.flickr.com/photos/jdhancock/8395113234 JD Hancock (CC BY 2.0)

Page 60: Openstack Reality Check (DACH Tag Edition)

„Any VM, anywhere.“

57

Page 61: Openstack Reality Check (DACH Tag Edition)

„As a hoster/enterprise/department, I want a virtualization platform which does…“

58

Page 62: Openstack Reality Check (DACH Tag Edition)

Problem spec vs. product spec

• Requirements regarding
  • Tenant Isolation,
  • Billing Model,
  • Operations Model,
  • Development Model,
  • Scalability

59

Page 63: Openstack Reality Check (DACH Tag Edition)

Prototypes vs. Product

"Prototype in the round file", https://www.flickr.com/photos/generated/3313311558 Jared Tarbell (CC-BY 2.0)

Page 64: Openstack Reality Check (DACH Tag Edition)

Stabilize the core with actual engineering vs. Moar components

61

Page 65: Openstack Reality Check (DACH Tag Edition)

“The Big Tent”

https://www.flickr.com/photos/fmckinlay/2410261506 Fiona Shields (CC BY-NC-ND 2.0)

Page 66: Openstack Reality Check (DACH Tag Edition)

"The problem with the Big Tent is that it is full of clowns."

https://www.flickr.com/photos/fmckinlay/2410261506 Fiona Shields (CC BY-NC-ND 2.0)

Page 67: Openstack Reality Check (DACH Tag Edition)

–Maik Zumstrull

„Instead of having a functional solution I can now choose between 13 differently deficient components.“

64

Page 68: Openstack Reality Check (DACH Tag Edition)

Democracy as a software architecture model

• Product definition == target specification
• Who is our customer and what do they need?
• What are the mandatory/desirable/optional properties of the product to become part of the solution space instead of the problem space?

65

Page 69: Openstack Reality Check (DACH Tag Edition)

Democracy as a software architecture model

• Quality Assurance
• "But OpenStack does have Blueprints, CI and code review"
• Without a target specification you have no idea about requirements on scale, workload types, security, …

66

Page 70: Openstack Reality Check (DACH Tag Edition)

Who actually votes on your commit…

Page 71: Openstack Reality Check (DACH Tag Edition)

Democracy as a software architecture model

• Limited, learnable technology stack
• One recommended and tested solution with a documented best practice for deployment instead of a plugin architecture.
• Reusable solutions for comparable problems in neighboring subprojects (“Architecture Refactoring”).

68

Page 72: Openstack Reality Check (DACH Tag Edition)

Monasca

Page 73: Openstack Reality Check (DACH Tag Edition)

How to win at Hipsterbingo…

• »Monasca is an open-source multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution that integrates with OpenStack. It uses a REST API for high-speed metrics processing and querying and has a streaming alarm engine and notification engine.«

70

Bingo!

Page 74: Openstack Reality Check (DACH Tag Edition)

How to win at Hipsterbingo…

• »Uses a number of underlying technologies: Apache Kafka, Apache Storm, Zookeeper, MySQL, Vagrant, Dropwizard, InfluxDB, Vertica.«

71

Page 75: Openstack Reality Check (DACH Tag Edition)

Democracy as a software architecture model

• Useful vs. Hyped
• “Docker is like OpenStack, but from a Dev instead of an Ops perspective. Wouldn't it be great to unify the projects?”

72

https://twitter.com/martinisoft/status/527191803603468288

Page 76: Openstack Reality Check (DACH Tag Edition)

And finally…

73

Page 77: Openstack Reality Check (DACH Tag Edition)

And finally…

73

Page 78: Openstack Reality Check (DACH Tag Edition)

TL;DR please?

74

Page 79: Openstack Reality Check (DACH Tag Edition)

TL;DR

• OpenStack right now: not a product, but a box of nuts and bolts. Significant assembly required.

• Not all parts are actually bad, but the quality varies. A lot.

• No other project has the momentum.
• Most other projects do not have multi-tenancy, multi-node.
• Many haven't even grasped the actual problem. Including Docker.

75

Page 80: Openstack Reality Check (DACH Tag Edition)

TL;DR

• Deep in the Gartner hype cycle phase 2:

• Users and hence real-world requirements missing.

• No unified view on acceptable solutions.

• Vendors think they can "win" and "own" the market or products, pushing their embrace and extend agendas.

76

Page 81: Openstack Reality Check (DACH Tag Edition)

77

?

Page 82: Openstack Reality Check (DACH Tag Edition)