what is the cloud? (english)
What is the Cloud?
Kristian Köhntopp, Old Fart, SysEleven
© 2015 Kristian Köhntopp
Chapter 1: Hardware for Hipsters
Help, too much compute!
Image: http://hpserver.by/images/detailed/1/hp_dl380p_gen8_inside_in_t7e8-xt.jpg © 2014 HP Press Material
CPU: 8 used, 40 unused. RAM: 16 used, 240 unused.
CPU: 4 used, 44 unused. RAM: 8 used, 248 unused.
Java Appserver, PHP Appserver
"Solution": virtual machines
Hardware Node
vSwitch, vRouter
VM VM VM VM VM
"Solution": virtual machines
4 Cores, 8 GB RAM, 50 GB Ephemeral Disk
8 Cores, 32 GB RAM, 50 GB Ephemeral Disk
2 TB Persistent Volume
2 Network Interfaces
2 Cores, 4 GB RAM, 50 GB Ephemeral Disk
Booting an Instance, parts needed.
• Turn Boot Image into Ephemeral Disk
• Attach Volume
• Attach Network
• Start VM
• DHCP
• Config: Hostname, Startscript
8 Cores, 32 GB RAM, 50 GB Ephemeral Disk
2 TB Persistent Volume
Glance
Cinder
Neutron
Nova
cloud-init
Handling hypervisor failure
• Essential: Persistent data from volume.
• Everything else is faster to recreate than to recover.
• Iff: Setup is completely automated.
8 Cores, 32 GB RAM, 50 GB Ephemeral Disk
2 TB Persistent Volume
Puppet, Ansible, Salt, Chef
More than a single machine…
CPU, RAM
Storage, Network
Overlay, Underlay
[x] It's complicated…
• Underlay:
  • Multiple Hosts (how many?), shared Storage, sufficient amount of network
• Overlay:
  • freely definable Networks, freely definable Storage, definable Guests, definable Firewall- and Loadbalancer-Rulesets
Infrastructure as Code…
Merkel: “The Internet is uncharted territory for all of us” (#neuland)
Image: https://en.wikipedia.org/wiki/File:Angela_Merkel_Juli_2010_-_3zu4.jpg, Armin Linnartz
Problem 1: Storage
• Filer?
  • Pro: proven technology, sufficient bandwidth, storage network separated.
  • Contra: How to scale this in size and financially? Storage network separated.
• Alternatives?
I can haz live migration, plz?
• Yes, but for a price.
• Live migration good for reboot of Underlay nodes, fixup of scheduling problems, data recovery
• requires: Shared Storage
Problem 2: Network-Capacity
• Convergence: Use disks in computes for storage.
• Hyperconvergence: Fold storage network into production network.
• Examples: HDFS, Ceph, Quobyte, …
• 3 Copies, one Off-Rack
• Latency? IOPS? Bandwidth?
• How much network per compute node?
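The "3 Copies, one Off-Rack" rule can be sketched as a small placement function. This is only an illustration under stated assumptions: the function name, the rack and host names, and the "first hosts in sorted order" choice are all hypothetical, and real systems such as Ceph or HDFS use their own, more elaborate placement logic.

```python
def place_replicas(hosts_by_rack, local_rack, copies=3):
    """Pick `copies` hosts: all but one in the local rack, one off-rack.

    Hypothetical helper; picks hosts in sorted order to stay deterministic.
    """
    local = sorted(hosts_by_rack.get(local_rack, []))
    remote = sorted(
        (rack, host)
        for rack, hosts in hosts_by_rack.items()
        if rack != local_rack
        for host in hosts
    )
    if not remote:
        raise ValueError("no off-rack host available")
    if len(local) < copies - 1:
        raise ValueError("not enough hosts in the local rack")
    placement = [(local_rack, host) for host in local[:copies - 1]]
    placement.append(remote[0])  # the mandatory off-rack copy
    return placement


# Illustrative topology, not taken from the slides.
racks = {"rack1": ["b", "a", "c"], "rack2": ["d"]}
placement = place_replicas(racks, "rack1")
print(placement)
```

Every copy that does not land on a local disk crosses the network, which is why the questions about latency, IOPS and bandwidth follow directly from this rule.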
Mercury Redstone Connector MR-1 (1960) https://www.flickr.com/photos/jurvetson/5691350527 Steve Jurvetson (CC-BY)
2005: 50 DL360 = 50 Cores, 50 GBit/s Net, ~ 2 Racks
2015: 2 HU, 48 Cores, 2x 10 GBit/s Net = ~40% of the 2005 Net per Core
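The "~40%" figure is network bandwidth per core, 2015 versus 2005. A minimal sketch of the arithmetic, using only the totals given on the slides; the function name is illustrative:

```python
def net_per_core_gbit(total_gbit, cores):
    """Network bandwidth available per core, in Gbit/s."""
    return total_gbit / cores


per_core_2005 = net_per_core_gbit(total_gbit=50, cores=50)      # 50 DL360
per_core_2015 = net_per_core_gbit(total_gbit=2 * 10, cores=48)  # one 2 HU box
ratio = per_core_2015 / per_core_2005

# What 2x 25 Gbit/s would give the same 48-core box.
per_core_25g = net_per_core_gbit(total_gbit=2 * 25, cores=48)

print(f"2005: {per_core_2005:.2f} Gbit/s per core")
print(f"2015: {per_core_2015:.2f} Gbit/s per core ({ratio:.0%} of 2005)")
print(f"2x 25 Gbit/s: {per_core_25g:.2f} Gbit/s per core")
```

With 2x 25 Gbit/s the box would be back at roughly the per-core bandwidth of 2005.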
Ohai, can I haz 2x 25 Gbit/s, plz?
„Be careful what you wish for!“
16 DL380 with 2x 25 GBit/s per Rack, Ceph (Dramatization, Do Not Attempt)
Top of Rack Switch
18U Rack, 18U Rack, 18U Rack
2x 10 GBit/s (2400 MB/sec)
2x 25 GBit/s (6000 MB/sec)
20 Boxen, 40 Interfaces, ~ 1 TBit/s aggregated
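The bandwidth figures in this diagram can be checked with a few lines of arithmetic. The raw conversion (1 Gbit/s = 125 MB/s) is exact; reading the slide's slightly lower MB/sec numbers as raw bandwidth minus protocol overhead is an assumption, not something the slide states.

```python
def raw_mb_per_s(links, gbit_per_link):
    """Raw line rate in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return links * gbit_per_link * 1000 / 8


raw_10g = raw_mb_per_s(2, 10)  # per box with 2x 10 Gbit/s
raw_25g = raw_mb_per_s(2, 25)  # per box with 2x 25 Gbit/s

# Slide figures relative to the raw line rate.
share_10g = 2400 / raw_10g
share_25g = 6000 / raw_25g

# 20 boxes with 2 interfaces each at 25 Gbit/s.
aggregate_tbit = 20 * 2 * 25 / 1000

print(raw_10g, raw_25g, share_10g, share_25g, aggregate_tbit)
```

Both slide figures come out at the same fraction of the raw rate, and 40 interfaces at 25 Gbit/s add up to the "~ 1 TBit/s aggregated" that the top-of-rack switch would have to handle.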
Storage Traffic (East-West Traffic)
Internet (North South Traffic)
VM with Volume
Capacity problem? What capacity problem?
Terasort to watch the world burn
http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort by http://www.slideshare.net/pramodbiligiri/presentations
Meanwhile, at the Chocolate Factory…
Google “Jupiter” Superblock, “1 Petabit/sec of total bisection bandwidth” © 2015 Google Press Release
Build principle: Leaf and Spine
http://bradhedlund.com/2012/01/25/construct-a-leaf-spine-design-with-40g-or-10g-an-observation-in-scaling-the-fabric/
Net >> Storage
• Usable Storage needs a lot of network
• “Leaf and Spine” needs a central flow controller
• Several vendors grokked that.
• But there are no large scale functional deployments.
Contrail
Midonet
Side note: Power, Cooling
“With great power comes a great electricity bill…”
Image: web.de Amalienbadstrasse, Karlsruhe, (C) 2004 Kristian Köhntopp
High Density
• 6 blade centers or 16 2HU servers ~ 20kW per rack
• Air cooled:
  • “specific heat capacity” (Heat 1kg by 1K)
  • hot aisle/cold aisle
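Why specific heat capacity matters for an air-cooled 20 kW rack can be shown with a short calculation. The constants are textbook values for air at roughly room temperature; the 10 K temperature rise between cold aisle and hot aisle is an assumption chosen for illustration, not a number from the slides.

```python
AIR_CP_J_PER_KG_K = 1005.0     # specific heat capacity of air, approx.
AIR_DENSITY_KG_PER_M3 = 1.2    # density of air at about 20 °C, approx.


def airflow_for(power_w, delta_t_k):
    """Air needed to carry away `power_w` at a temperature rise of `delta_t_k`.

    Returns (mass flow in kg/s, volume flow in m^3/s).
    """
    mass_flow = power_w / (AIR_CP_J_PER_KG_K * delta_t_k)
    return mass_flow, mass_flow / AIR_DENSITY_KG_PER_M3


# 20 kW per rack from the slide, 10 K rise assumed.
mass_flow, volume_flow = airflow_for(power_w=20_000, delta_t_k=10)
print(f"{mass_flow:.2f} kg/s, {volume_flow:.2f} m^3/s, {volume_flow * 3600:.0f} m^3/h")
```

Under these assumptions a single rack needs on the order of two kilograms of air per second, which is what hot aisle/cold aisle separation has to deliver.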
Chapter 2: Overlay and Underlay
Scheduler, Spread Strategy
VMHost
CPU: 8 used, 40 unused. RAM: 16 used, 240 unused.
CPU: 4 used, 44 unused. RAM: 8 used, 248 unused.
Java Appserver, PHP Appserver
IMBA: Uneven Resource Usage
Which resource is needed most?
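One way to answer "which resource is needed most" is to compare each resource's used share of the box. The sketch below uses the chart numbers for the box with 8 of 48 CPUs and 16 of 256 RAM in use; treating the RAM figures as GB is an assumption, and all function names are illustrative.

```python
def utilisation(used, capacity):
    """Fraction of each resource that is in use."""
    return {name: used[name] / capacity[name] for name in capacity}


def most_needed(used, capacity):
    """The resource with the highest used share."""
    share = utilisation(used, capacity)
    return max(share, key=share.get)


def instances_that_fit(per_instance, capacity):
    """How many identical instances fit before one resource runs out."""
    return min(capacity[name] // per_instance[name] for name in capacity)


def stranded(per_instance, capacity):
    """What is left over of each resource once the box is 'full'."""
    count = instances_that_fit(per_instance, capacity)
    return {name: capacity[name] - count * per_instance[name] for name in capacity}


BOX = {"cores": 48, "ram_gb": 256}   # RAM unit assumed to be GB
APP = {"cores": 8, "ram_gb": 16}

bottleneck = most_needed(APP, BOX)
fit = instances_that_fit(APP, BOX)
leftover = stranded(APP, BOX)
print(bottleneck, fit, leftover)
```

Filling the box with this workload exhausts the cores while most of the RAM stays unused, which is the imbalance the slide points at.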
Resources
• 48 Cores:
  • 256 GB RAM, 2x 10 GBit/s
  • 12x 3TB Disk (200 IOPS ea) or 4x 2TB SSD (20k IOPS ea)
• per Core (“Compute Unit”)
  • 5 GB RAM, 400 MBit/s, 50 IOPS Disk, 1500 IOPS SSD
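The per-core figures follow from dividing the box by its 48 cores. A minimal sketch of that division; the dictionary keys are illustrative names:

```python
BOX = {
    "cores": 48,
    "ram_gb": 256,
    "net_mbit": 2 * 10_000,   # 2x 10 GBit/s
    "disk_iops": 12 * 200,    # 12 disks, 200 IOPS each
    "ssd_iops": 4 * 20_000,   # 4 SSDs, 20k IOPS each
}


def per_core(box):
    """Exact share of every resource that belongs to one core."""
    return {name: value / box["cores"] for name, value in box.items() if name != "cores"}


unit = per_core(BOX)
for name, value in unit.items():
    print(f"{name}: {value:.2f}")
```

The exact quotients are about 5.33 GB RAM, 417 MBit/s, 50 disk IOPS and 1667 SSD IOPS, so the slide's Compute Unit (5 GB, 400 MBit/s, 50 IOPS, 1500 IOPS) rounds down and leaves a little headroom.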
Flavors
• “Compute Unit”: “1/48 of a box”
  • 5 GB RAM, 400 MBit/s, 50 IOPS Disk, 1500 IOPS SSD
• Flavor:
  • x Compute Units
  • Flavor i = 2 * Flavor (i-1)
  • no clipping waste
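The doubling rule for flavors can be sketched as follows. The flavor index, the dictionary layout and the greedy packing helper are illustrative assumptions; the Compute Unit values are the ones from the slide.

```python
COMPUTE_UNIT = {"cores": 1, "ram_gb": 5, "net_mbit": 400, "disk_iops": 50, "ssd_iops": 1500}


def flavor(index):
    """Flavor i is 2**i Compute Units, so each flavor doubles the previous one."""
    units = 2 ** index
    return {name: units * value for name, value in COMPUTE_UNIT.items()}


def fill_box(free_units, max_index):
    """Greedily place the largest flavors first; return (placed sizes, leftover)."""
    placed = []
    for index in range(max_index, -1, -1):
        size = 2 ** index
        while free_units >= size:
            placed.append(size)
            free_units -= size
    return placed, free_units


medium = flavor(3)                 # 8 Compute Units
placed, leftover = fill_box(48, 5)
print(medium)
print(placed, leftover)
```

Because every flavor is a whole number of Compute Units in every dimension, a 48-unit box can be filled completely (here as 32 + 16) without one resource running out while others are left stranded.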
Isolation: Quota on everything
• CPU Cores
• RAM
• Disk I/O (IOPS, MB/s)
• Network I/O (Bit/s)
VMHost
Quota with Token Bucket
Arrival Rate
Volume = Elasticity
Consumption
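The token bucket named on this slide can be sketched as a small class. Tokens arrive at a fixed rate, the bucket volume bounds how much burst is allowed (the elasticity), and each request consumes tokens. Class and parameter names are illustrative, and time is passed in explicitly so the example stays deterministic; this is a minimal sketch, not how any particular hypervisor enforces its quotas.

```python
class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens added per second (arrival rate)
        self.capacity = capacity    # bucket volume (maximum burst)
        self.tokens = capacity      # start with a full bucket
        self.last = now

    def _refill(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def consume(self, amount, now):
        """Take `amount` tokens if available; return True if the request may pass."""
        self._refill(now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


# Example: 50 IOPS sustained, bursts of up to 100 operations.
bucket = TokenBucket(rate=50, capacity=100)
results = [
    bucket.consume(100, now=0.0),  # burst drains the full bucket
    bucket.consume(1, now=0.0),    # nothing left at the same instant
    bucket.consume(50, now=1.0),   # one second later, 50 tokens have arrived
    bucket.consume(1, now=1.0),    # and they are used up again
]
print(results)
```

The same structure works for any of the quotas listed for isolation: only the unit of a token changes (an I/O operation, a byte, a bit).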
One Image, many instances
Hardware Node
Ubuntu 14.04 LTS
Appserver 1
Appserver 2
Database Master
copy on write
download
Glance
More SSD for everybody!
QCOW2: Turn linear I/O into random I/O
Ephemeral vs. Persistent Volume
MySQL DB Master
/dev/vda: 50 GB, tied to VM, lifetime defined by VM
/dev/vdb: configurable size, detachable/attachable, lifetime variable (billed)
MySQL DB Master
/dev/vda
Floating IP
Internal IP 1
Internal IP 2
Distributed Anything
ZK ZK ZK
ZK ZK ZK
Distributed Anything
• “Cluster membership” using MySQL, Redis, MongoDB, … does not work.
• “Paxos”, “Raft” and other provable consensus algorithms do work (ZK, etcd, consul)
  • when applied correctly
• “Kyle Kingsbury Proof” (http://aphyr.com/tags/jepsen)
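Majority-based consensus only makes progress on the side of a partition that still holds a majority of the ensemble. A minimal sketch of that rule; the ensemble sizes in the example are assumptions for illustration (a six-node ensemble split into two groups of three, and a five-node one split three to two).

```python
def majority(ensemble_size):
    """Smallest number of members that forms a majority."""
    return ensemble_size // 2 + 1


def has_quorum(reachable, ensemble_size):
    """True if this side of a partition can still make decisions."""
    return reachable >= majority(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many members may fail while a majority remains."""
    return ensemble_size - majority(ensemble_size)


# Six nodes split 3/3: neither side has a majority, the service stops.
split_six = (has_quorum(3, 6), has_quorum(3, 6))
# Five nodes split 3/2: exactly one side keeps working.
split_five = (has_quorum(3, 5), has_quorum(2, 5))

print(split_six, split_five)
print(tolerated_failures(5), tolerated_failures(6))
```

An even-sized ensemble tolerates no more failures than the next smaller odd one, which is one reason "when applied correctly" includes choosing the ensemble size and its placement with care.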
Control Systems
• "should" vs. "is" state
• within the same Paxos-Domain
• State Transition
• Check, Update of "is"-state by measurement
• self regulating
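The "should" vs. "is" idea can be sketched as a small reconciliation loop: compare the two states, derive the transitions, apply them, and measure again until nothing is left to do. Everything here is illustrative; in particular the plain dictionary standing in for the measured "is" state, and the service names, are assumptions.

```python
def reconcile(should, is_state):
    """Derive the state transitions that move `is_state` towards `should`."""
    actions = []
    for name in sorted(should.keys() - is_state.keys()):
        actions.append(("create", name, should[name]))
    for name in sorted(is_state.keys() - should.keys()):
        actions.append(("delete", name, None))
    for name in sorted(should.keys() & is_state.keys()):
        if should[name] != is_state[name]:
            actions.append(("update", name, should[name]))
    return actions


def apply_actions(actions, is_state):
    """Stand-in for real provisioning: mutate the simulated 'is' state."""
    for verb, name, value in actions:
        if verb == "delete":
            is_state.pop(name, None)
        else:
            is_state[name] = value


def control_loop(should, is_state, max_rounds=5):
    """Repeat check and update until the states agree; return rounds needed."""
    for round_no in range(max_rounds):
        actions = reconcile(should, is_state)
        if not actions:
            return round_no
        apply_actions(actions, is_state)
    raise RuntimeError("did not converge")


should = {"web": 3, "db": 1}          # desired instance counts
is_state = {"web": 2, "cache": 1}     # what a measurement found
first_actions = reconcile(should, is_state)
rounds = control_loop(should, is_state)
print(first_actions, rounds, is_state)
```

The loop is self-regulating in the sense of the slide: it never assumes an action succeeded, it only compares the next measurement against the "should" state.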
Distributed Anything vs. Performance
• “Microservices”
• “Pile of network traversals”
• Disconnect, Partition
• Throughput, Jitter
• Asynchronous Calls? Straggler Handling? Total Jitter!
• non-linear performance, non-linear operational complexity
• Virtualization is High Density Computing.
• That's not cheaper, only different.
• In particular, a new network design is needed.
• The SDN issue is mostly open, and a much harder nut to crack than all other topics.
• “Infrastructure as Code” is cool.
• “Automated Provisioning”.
• An insufficient network is visible to the upper layers as insufficient fsync/commit performance.
• “Microservices”, “distributed anything”: avoid, if at all possible. If not, do it properly. Overhead!
?