lecture 21 - cloudsswift/classes/cs739-fa19... · –scale-up for most apps •elimination of...

11/21/19


Cloud Computing

CS 739 Fall 2019


What is cloud computing

• Illusion of infinite computing resources available on demand
  – Scale-up for most apps
• Elimination of up-front commitment
  – Small initial investment, scale only as needed
• Pay-per-use on short-term basis
  – Transfer purchase risk (will equipment be used?) to cloud provider
• Result: Cost-associativity
  – Use of 1 CPU for 100 hours costs the same as 100 CPUs for one hour
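Cost-associativity can be checked with a few lines of arithmetic. The sketch below assumes a flat, purely usage-based price; the function name and the price per CPU-hour are hypothetical and not taken from any provider.

```python
import math


def rental_cost(cpus: int, hours: float, price_per_cpu_hour: float) -> float:
    """Pay-per-use cost: only the CPU-hours consumed are charged."""
    return cpus * hours * price_per_cpu_hour


PRICE = 0.10  # hypothetical price per CPU-hour

serial_cost = rental_cost(cpus=1, hours=100, price_per_cpu_hour=PRICE)
parallel_cost = rental_cost(cpus=100, hours=1, price_per_cpu_hour=PRICE)

# Same number of CPU-hours, so the bill is the same either way.
print(math.isclose(serial_cost, parallel_cost))
```

Under this pricing the only thing that changes is the wall-clock time to finish, which is why parallel batch jobs fit the cloud so well.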


Historical Context

• Utility computing – late 1960s
  – Genesis of Multics at MIT/GE
  – Central pool of mainframes serves dial-up customers
  – Developed as time-sharing bureaus
• Single server, many clients
  – No real connectivity between servers


Death of Utility Computing

• Killer Micros: cheap PCs that afforded more flexibility, more power than central computer
  – Text terminals couldn’t compete on flexibility, GUI
  – Micros more cost effective EXCEPT for management
    • Upgrades, patches, software installs


Rise of Cloud Computing

• Data centers ride commodity-part tide
  – Computing in the large now similar in price to client-side computing
• Internet properties develop technology to efficiently deploy, manage, serve large clusters
  – Container-based data centers
  – Scalable, replicated, distributed, reliable storage
• Rich programming models allow better client-side UI
  – AJAX, JavaScript, HTML


Enabling Technologies

• Virtualization
  – For platform-as-a-service and remote management
• Web Services
  – XML-RPC, SOA, etc.
  – APIs to services
• Cluster management experience by large internet companies
  – E.g. Amazon, Google, Microsoft
• Cluster management software
  – Storage: GFS; data-parallel execution: Dryad


Cloud Computing Models

• First wave: Software-as-a-Service (SaaS)
  – Rather than install software on customer machines, run it in a provider data center
    • E.g.: HotMail, Salesforce.com
  – Web-scale software: serve everybody


Software-as-a-Service

• Think Gmail, Google Docs
• What are the benefits?
  – No end-customer management; provider can do on-line upgrades, provisioning
  – No hardware purchase
  – Vendor sees how customer uses software, can adapt
• What are the downsides?
  – The internet goes down
  – Edge bandwidth can be low
  – Browser lacks polish of real UIs


Infrastructure-as-a-Service

• Provide virtual machines, networking
  – Augment with higher level services:
    • Storage
    • Monitoring
    • Security
• Customer provides complete software stack
• Prevents lock-in
  – Few APIs to write against beyond data management


Platform-as-a-Service

• Third wave (PaaS)
  – Providers supply generic platform for running Software-as-a-Service AND collaborative web software
• Allow third parties to host their applications on common infrastructure
• Google AppEngine, Windows Azure (sort of)
  – Still API-rich
• Provide many services (load balancing, request distribution, scheduling)


Coming: Data as a service

• Step 1: storage as a service
  – Dropbox/OneDrive/Google Drive
  – Amazon Simple Storage Service
  – Why: Cheaper than managing storage locally
• Step 2: data as a service
  – Windows Azure Data Market
    • Rent data from someone else rather than collecting your own
    • E.g. make large data sets available for sale
      – Geographical databases
      – Historical data sets (e.g. weather)


What drives Cloud Computing

• Economics:
  – Cost of providing computing services locally
  – Utilization of enterprise infrastructure
  – Provisioning for worst-case


Cost of Provisioning

• Within a company:
  – Real estate/power is expensive (often better used for office space)
  – Lacks large economies of scale
  – Management typically 1 person per 10-100 machines
• Within a data center:
  – Put where land/power is cheap
  – Buy in bulk (containers, thousands of machines, gigabits of bandwidth)
  – Low management overhead: lots of self-managing systems, 1 person per 1,000-10,000 machines


Elastic Provisioning

• Companies must provision for worst case
  – Leads to low utilization (1-20%) most of the time
  – Leads to overload some of the time (Slashdot effect)
  – Hard to grow rapidly if popularity surges
    • E.g. doubling every day
    • May lose customers if bad service
• Cloud provisioning can start large, multiplex machines over many customers
  – Higher average utilization
  – Higher resource availability to surging sites
  – Cheap to decommission resources as popularity falls
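The utilization gap between worst-case and elastic provisioning can be illustrated with a toy demand trace. This is a minimal sketch: the trace is invented, and the elastic case is idealized so that capacity follows demand exactly, which real platforms only approximate.

```python
def utilization(demand: list[float], capacity: list[float]) -> float:
    """Fraction of provisioned capacity actually used over the trace."""
    used = sum(min(d, c) for d, c in zip(demand, capacity))
    return used / sum(capacity)


# Hypothetical hourly demand in machines: mostly quiet, with one surge.
demand = [10, 10, 10, 10, 10, 10, 10, 100]

static_capacity = [max(demand)] * len(demand)  # provision for the worst case
elastic_capacity = list(demand)                # idealized: capacity tracks demand

static_util = utilization(demand, static_capacity)
elastic_util = utilization(demand, elastic_capacity)
print(f"static: {static_util:.1%}, elastic: {elastic_util:.1%}")
```

With this trace the statically provisioned site sits near the top of the 1-20% range quoted above, while still having no headroom if the surge is larger than expected.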


Cloud Economics 101

• Cloud Computing User: Static provisioning for peak - wasteful, but necessary for SLA

[Figure: two panels plotting Demand and Capacity against Time, one for a “statically provisioned” data center and one for a “virtual” data center in the cloud; a region is labeled “Unused resources”.]


Elastic Provisioning (2)

• Billing based on usage
  – Shifts risk from customer to provider
  – Separates cost of memory, I/O, disk, CPU and bills appropriately
  – Reduces risk of trying cloud computing
• Shifts capital expenses (buying things) to operational expenses (cost of providing service)
  – Small upfront cost good for starting out


When is data center computing cheaper?

• Unpredictable/fluctuating workloads
  – Expensive to provision your own
  – Good for:
    • Startups who don’t know their popularity
    • Big companies with predictable peak loads (e.g. Christmas, Olympics)
• Not super-CPU intensive
  – Raw CPU, memory, disk is cheaper locally (not scalable, not replicated)
  – Extra charge for renting relatively small (2.5x for CPU, 20-50% for disk)
  – QUESTION: Why is disk cheaper? It must be replicated…
    • Dedup?
    • Volume purchases?


Example

• Zynga started off in Amazon EC2
  – Large growth in demand for some games
• Over time, built private cloud
  – Older games had predictable workloads
  – Much cheaper in private data center
• New games with bursts of popularity start in EC2
• “Surge usage” – overflow to cloud only when private datacenters at capacity
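The surge-usage policy amounts to filling private capacity first and sending only the excess to the public cloud. A minimal sketch, with a hypothetical function name and made-up numbers rather than anything Zynga published:

```python
def split_load(demand: int, private_capacity: int) -> tuple[int, int]:
    """Return (served_privately, overflow_to_cloud) for one time step."""
    private = min(demand, private_capacity)
    return private, demand - private


# Hypothetical machine counts for a normal day and a surge day.
print(split_load(demand=80, private_capacity=100))
print(split_load(demand=130, private_capacity=100))
```

Only the overflow is billed at cloud rates, so the predictable base load keeps the cheaper private price.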


Cloud Economics

• Some costs cheaper in cloud
  – Fixed costs of buildings, machines, power
    • Cheaper when bought in bulk
  – Variable costs cheaper
    • Bandwidth much cheaper in bulk (e.g. 10x)
    • System management much cheaper (e.g. 10-100x)
  – Massive redundancy
  – Massive homogeneity


Energy & Cloud Computing?

• Cloud Computing saves Energy?
  • Don’t buy machines for local use that are often idle
  • More efficient cooling
    • Newest data centers are air cooled only using outside air
  • Better to ship bits as photons over fiber vs. ship electrons over transmission lines to spin disks, power processors locally
    o Clouds use nearby (hydroelectric) power
    o Leverage economies of scale of cooling, power distribution


What apps can move to the cloud?

• Compute over same data multiple times
  – Amortize cost of uploading data
• Perfectly scalable parallel apps (batch)
  – Can compute N times faster on N times as many machines
  – E.g. NY Times, Washington Post scanning documents
• CPU/data intensive apps for mobile devices
  – Provide heavy lifting, data persistence
• Apps with widely variable resource demands
  – Animoto rendering service


Differences to App Writers

• State no longer lives in a file system
  – Storage systems for shared data (e.g. S3, databases) instead
• Applications scale both ways
  – Up for bigger loads
  – Down for multi-tenancy
• Idle cycles aren’t free


What apps cannot move to the cloud?

• Video games?
  – Heavy client-side CPU component
  – But:
    • Compute graphics in cloud and stream to client
    • OnLive.com
• Banking
  – Need better security
• Offline apps


Cloud vs. Grid

• Grid: more about batch scheduling parallel jobs
  – Each machine generally runs only jobs for one customer
  – Jobs typically use multiple machines
  – Jobs are not interactive
  – Big data set, large data objects
  – Scientific computing
• Cloud
  – Multi-tenancy (multiple customers on a machine)
  – Resource-based billing
  – Public (often)
  – Fine-grained storage


Public vs private cloud

• Public: someone else owns infrastructure
  – Many apps from many companies
• Private: your organization owns infrastructure
  – Many apps from your company
• Reasoning:
  – Private cloud cheaper for predictable workloads – e.g. Zynga for mature games
  – Private cloud more secure for sensitive workloads
    • MS development/test
    • DoD/Military


Serverless Cloud Computing

• What does cloud computing do for you?
  – Infinite capacity
  – Pay as you dedicate resources
  – Scalable platform services (e.g., reliable storage)
• What does it not do?
  – Redundancy/replication of compute
  – Geo-distribution
  – Load balancing
  – Autoscaling on load
  – Monitoring
  – Logging
  – System upgrades/patching (many VMs running old, insecure software)
  – Migration to new hardware types when available


Serverless Idea

1. Manage a set of user-defined functions
2. Take an event sent over HTTP or received from an event source
3. Determine function(s) to which to dispatch the event
4. Find an existing instance of function or create a new one
5. Send the event to the function instance
6. Wait for a response
7. Gather execution logs
8. Make the response available to the user
9. Stop the function when it is no longer needed
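The nine steps above can be sketched as a toy, single-process dispatcher. Everything here is an illustrative assumption (the `MiniDispatcher` class, the event shape with `function` and `payload` keys, the log strings); a real platform runs instances in separate containers or VMs and handles events asynchronously.

```python
from typing import Any, Callable

Handler = Callable[[dict], Any]


class MiniDispatcher:
    """Toy stand-in for a serverless platform (hypothetical, in-process)."""

    def __init__(self) -> None:
        self.functions: dict[str, Handler] = {}  # step 1: registered functions
        self.instances: dict[str, Handler] = {}  # currently warm instances
        self.logs: list[str] = []

    def register(self, name: str, handler: Handler) -> None:
        self.functions[name] = handler

    def handle(self, event: dict) -> Any:
        name = event["function"]                 # steps 2-3: route the event
        if name not in self.instances:           # step 4: reuse or create
            self.instances[name] = self.functions[name]
            self.logs.append(f"cold start: {name}")
        result = self.instances[name](event)     # steps 5-6: invoke and wait
        self.logs.append(f"invoked: {name}")     # step 7: gather logs
        return result                            # step 8: hand back response

    def stop(self, name: str) -> None:
        self.instances.pop(name, None)           # step 9: reclaim the instance


dispatcher = MiniDispatcher()
dispatcher.register("greet", lambda event: "hello " + event["payload"])
reply = dispatcher.handle({"function": "greet", "payload": "world"})
dispatcher.handle({"function": "greet", "payload": "again"})
dispatcher.stop("greet")
```

Only the first call logs a cold start; the second reuses the warm instance, which is the distinction the later scheduling slides build on.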


What does it solve?

• Vs. containers:
  – 25 second startup median
  – 80% of time spent on package installation
  – Slow to scale
• With AWS Elastic Beanstalk autoscale
  – Must scale whole VMs
  – Takes 20 seconds to deploy new instance
    • 20 seconds of unsatisfied requests


Use cases

• Original use case: infrequent events
  – Example: run virus scanner when file uploaded to cloud
    • If upload 10 files/day, VM too heavyweight – idle 99.9% of time
• New use case: radical scaling
  – Want to transcode 1-hour video in 1 minute → use 1,000,000 functions in parallel
  – Works because can get additional resources for short period very cheaply
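The 99.9% idle figure is easy to reproduce. The sketch below assumes each scan takes about 8.64 seconds, a value chosen only so that 10 scans add up to 0.1% of a day; the slides do not state a scan time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def idle_fraction(events_per_day: int, seconds_per_event: float) -> float:
    """Fraction of the day a dedicated VM would sit idle."""
    busy_seconds = events_per_day * seconds_per_event
    return 1 - busy_seconds / SECONDS_PER_DAY


# Hypothetical scan time; 10 scans/day then keep the VM busy 86.4 s/day.
frac = idle_fraction(events_per_day=10, seconds_per_event=8.64)
print(f"{frac:.1%}")
```

A dedicated VM is billed for the whole day regardless, which is what makes per-invocation billing attractive for workloads like this.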


Serverless Economics

• Compared to VMs:
  – 100% utilized VM for long period is always cheaper
    • E.g. medium-latency (10s of minutes or longer) batch jobs
  – But pay added cost of management: replication, load balancing
• When better:
  – Bursty workloads – web requests
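One way to make the comparison concrete is to compute the busy fraction at which an always-on VM and per-use functions cost the same. The prices and memory size below are hypothetical placeholders, not real provider rates, and the sketch ignores the management cost the slide mentions for VMs.

```python
HOURS_PER_MONTH = 730.0

# Hypothetical prices, chosen only for illustration.
VM_PRICE_PER_HOUR = 0.05
FN_PRICE_PER_GB_SECOND = 0.00002
MEMORY_GB = 1.0


def monthly_vm_cost() -> float:
    """An always-on VM is billed for every hour, busy or idle."""
    return VM_PRICE_PER_HOUR * HOURS_PER_MONTH


def monthly_function_cost(busy_fraction: float) -> float:
    """Functions are billed only for the seconds they actually run."""
    busy_seconds = busy_fraction * HOURS_PER_MONTH * 3600
    return busy_seconds * MEMORY_GB * FN_PRICE_PER_GB_SECOND


def break_even_busy_fraction() -> float:
    """Busy fraction above which the always-on VM becomes cheaper."""
    return VM_PRICE_PER_HOUR / (3600 * MEMORY_GB * FN_PRICE_PER_GB_SECOND)


break_even = break_even_busy_fraction()
print(f"VM wins above {break_even:.0%} utilization")
```

With these made-up numbers a fully busy VM is cheaper, while a bursty workload that is busy only a small fraction of the time is cheaper on functions.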


Basic Architecture

• Credit: Ion Stoica


How Serverless Functions Work

• Customer: provide a handler function
  – HTTP request
  – Trigger from storage service
    • Object/data changed
  – Stateless: no local memory from one request to the next
• Abilities:
  – Processing
  – Access platform data services
  – Invoke APIs, other functions
  – Return result


Handling state

• Option 1: provide state with request
  – All information passed in
  – Example: scanning request for vulnerability (e.g. CSS)
  – Output is filtered request/detection
• Option 2: platform services
  – Blob, table, queue storage
• Challenge: platform service mismatch
  – Target VMs with internal state, so only store state for persistence & caching
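The two options can be contrasted with a pair of small handlers. This is a sketch under stated assumptions: the event fields, the `TableStore` class (an in-memory stand-in for a platform table service), and the naive `<script` check are all hypothetical, and the check is not a real vulnerability scanner.

```python
from typing import Any


# Option 1: everything the function needs arrives in the request.
def scan_request(event: dict) -> dict:
    body = event["body"]
    suspicious = "<script" in body.lower()  # naive placeholder check
    return {"allowed": not suspicious, "body": "" if suspicious else body}


# Option 2: state lives in a platform service outside the function.
class TableStore:
    """In-memory stand-in for a hypothetical platform table service."""

    def __init__(self) -> None:
        self._rows: dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._rows.get(key, default)

    def put(self, key: str, value: Any) -> None:
        self._rows[key] = value


def count_visit(event: dict, store: TableStore) -> int:
    # The handler keeps nothing locally between requests.
    count = store.get(event["user"], 0) + 1
    store.put(event["user"], count)
    return count


store = TableStore()
first = count_visit({"user": "alice"}, store)
second = count_visit({"user": "alice"}, store)
```

Because `count_visit` holds no state of its own, any instance can serve any request, at the price of a storage round trip on every call.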


Serverless Platforms

• Platform:
  – Find a virtual machine
  – Create a container
  – Install function code
  – Send request to container
  – Terminate container
• Question: how do we make this fast?
  – Launch a VM: 1 minute
  – Launch a container: 20 seconds


Fast Serverless Functions

• Fast VM launch:
  – OS provided by platform → can pre-launch VMs and keep them around
• Fast container launch:
  – Standard runtimes (Python, Java, JavaScript) → pre-install files
• Fast function invocation:
  – Time spent in runtime library initialization → re-use same runtime


Provider Scheduling Functions

• On first request:
  – Cold start: find a VM/container, send code to start
• On second request:
  – Find existing container, send next request to it
• On flood of requests:
  – Decide when to create a new instance
    • How much queuing, how much spawning new?
• On quiescence:
  – Decide when to terminate existing instances
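The queue-versus-spawn and terminate-on-quiescence decisions can be sketched as two small policy functions. The policy shown (tolerate a fixed queue depth, reap after a fixed idle timeout) is one simple possibility assumed for illustration; the slides do not say which policy any provider uses, and all names and thresholds are hypothetical.

```python
def plan(queued: int, idle_instances: int, max_queue: int) -> tuple[int, int]:
    """Return (dispatch_to_warm, spawn_new) for the waiting requests."""
    dispatch = min(queued, idle_instances)     # warm instances go first
    still_waiting = queued - dispatch
    spawn = max(0, still_waiting - max_queue)  # queue up to max_queue, spawn rest
    return dispatch, spawn


def reap(last_used: dict[str, float], now: float, idle_timeout: float) -> list[str]:
    """Return instance ids that have been idle longer than idle_timeout."""
    return [name for name, t in last_used.items() if now - t > idle_timeout]


print(plan(queued=1, idle_instances=0, max_queue=0))   # first request: cold start
print(plan(queued=1, idle_instances=1, max_queue=0))   # second request: reuse
print(plan(queued=10, idle_instances=2, max_queue=3))  # flood: queue some, spawn some
print(reap({"a": 0.0, "b": 90.0}, now=100.0, idle_timeout=60.0))
```

Raising `max_queue` trades request latency for fewer cold starts; raising `idle_timeout` trades provider memory for more warm hits.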


Example: AWS Lambdas

• Cold launch: 200 ms
• Hot launch
• Scaling:
  – Pack additional functions on same machine as existing VMs
  – Shutdown: if more than 1 instance, within 5 minutes; the last one after 30 minutes


Other platform functions

• Load balance: find which VM/host to use
• Fault tolerance: handle failure of worker hosts
• Upgrade: migrate to new OS, VMM, hardware


Billing/Sizing model

• Developer chooses requested memory capacity
  – 128 MB to 1800 MB for AWS
• CPU scheduling proportional to memory
  – 128 MB -> 1/8th of a core, 256 MB -> 1/4th of a core
• Power-of-2 sizing allows efficient placement
  – Easy to allocate fixed-sized regions
• Billing: memory x time running
  – GCF bug: not charged for background thread
• Challenges: developer must know how much memory is needed
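The sizing and billing rules reduce to two formulas. The sketch assumes the CPU share scales linearly at one core per 1024 MB, which is inferred from the two data points above (128 MB and 256 MB) rather than stated by any provider, and it bills in GB-seconds without applying a price.

```python
MB_PER_CORE = 1024  # assumption inferred from 128 MB -> 1/8 core


def cpu_share(memory_mb: int) -> float:
    """Fraction of a core granted for a given memory size."""
    return memory_mb / MB_PER_CORE


def billed_gb_seconds(memory_mb: int, duration_s: float) -> float:
    """Billing unit: configured memory (in GB) times running time."""
    return (memory_mb / 1024) * duration_s


print(cpu_share(128), cpu_share(256))
print(billed_gb_seconds(memory_mb=512, duration_s=2.0))
```

Since CPU follows memory, a compute-bound function may need to request more memory than it uses just to get a larger CPU share, which is one reason the sizing choice is hard for developers.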


AWS isolation

• Question: how to safely isolate tenants from each other?
  – Options: Containers on an OS, VMs?
• Azure Functions: containers on an OS
  – Fewer resources, OS bug can compromise isolation
• AWS Lambdas: each tenant gets a VM, containers in that VM (old data)
  – More resources, more isolation


Tradeoffs

                             Lambdas   Containers   VMs
Isolation                    Low       Medium       High
Flexibility                  Low       Medium       High
Overhead consumption         High      Medium       Low
Overhead latency (startup)   Low       Medium       High
Management                   Low       Medium       High


Key Distinction of Serverless

• Decouple execution and storage
  – Stateless functions (easy to scale, recover)
  – Platform state management
  – Key: separation is like Bigtable, Frangipani
• Execute code without managing resource allocation
  – Provider handles # of hosts, VMs, instances, and where they run
• Pay-by-use instead of resource allocation
  – VM: pay by hour running, even if idle
  – Serverless: pay only while executing; idle is free
