lecture 21 - cloudsswift/classes/cs739-fa19... · –scale-up for most apps •elimination of...
TRANSCRIPT
11/21/19
1
Cloud Computing
CS 739 Fall 2019
1
What is cloud computing
• Illusion of infinite computing resources available on demand– Scale-up for most apps
• Elimination of up-front commitment– Small initial investment, scale only as needed
• Pay-per-use on short-term basis– transfer purchase risk (will equipment be used?) to
cloud provider• Result: Cost-associativity
– Use of 1 CPU for 100 hours costs the same as 100 CPUs for one hour
2
11/21/19
2
Historical Context
• Utility computing – late 1960’s– Genesis of Multics at MIT/GE– Central pool of mainframes serve dial-up
customers– Developed as time-sharing bureaus
• Single server, many clients– No real connectivity between servers
3
Death of Utility Computing
• Killer Micros: cheap PCs that afforded more flexibility, more power then central computer– text terminals couldn’t compete on flexibility, GUI– Micros more cost effective EXCEPT for
management• upgrades, patches, software installs
4
11/21/19
3
Rise of Cloud Computing
• Data centers ride commodity-part tide– Computing in the large now similar in price to client-
side computing• Internet properties develop technology to
efficiently deploy, manage, serve large clusters– container-based data centers– scalable, replicated, distributed, reliable storage
• Rich programming models allow better client-side UI– AJAX, JavaScript, HTML
5
Enabling Technologies
• Virtualization– For platform-as-a-service and remote management
• Web Services– XML-RPC, SOA, etc. – APIs to services
• Cluster management experience by large internet companies– E.g. Amazon, Google, Microsoft
• Cluster management software– Storage: GFS, Dryad
6
11/21/19
4
Cloud Computing Models
• First wave: Software-as-a-Service (SaaS)– Rather than install software on customer
machines, run it in a provider data center• E.g.: HotMail, Salesforce.com
– Web-scale software: serve everybody
7
Software-as-a-Service
• Think GMail, google docs• What are the benefits?
– No end-customer management; provider can do on-line upgrades, provisioning
– No hardware purchase– Vendor sees how customer uses software, can adapt
• What are the downsides?– The internet goes down– Edge bandwidth can be low– Browser lacks polish of real UIs
8
11/21/19
5
Infrastructure-as-a-Service
• Provide virtual machines, networking– Augment with higher level services:
• Storage• Monitoring• Security
• Customer provides complete software stack• Prevents lock in
– Few APIs to write against beyond data management
9
Platform-as-a-Service
• Third wave (PaaS)– Providers supply generic platform for running
Software-as-a-Service AND collaborative web software
• Allow third parties to host their applications on common infrastructure
• Google AppEngine, Windows Azure (sort of)– Still API-rich
• Provide many services (load balancing, request distribution, scheduling)
10
11/21/19
6
Coming: Data as a service
• Step 1: storage as a service– Dropbox/OneDrive/Google Drive– Amazon Simple Storage Service– Why: Cheaper than managing storage locally
• Step 2: data as a service– Windows Azure Data Market
• Rent data from someone else rather than collecting your own
• E.g make large data sets available for sale– Geographical databases– Historical data sets (e.g. weather)
11
What drives Cloud Computing
• Economics:– Cost of providing computing services locally– Utilization of enterprise infrastructure– Provisioning for worst-case
12
11/21/19
7
Cost of Provisioning
• Within a company:– Real estate/power is expensive (often better used for
office space)– Lacks large economies of scale– Management typically 1 person 10-100 machines
• Within a data center:– Put where land/power is cheap– Buy in bulk (containers, thousands of machines,
gigabits of bandwidth)– Low management overhead: lots of self-managing
systems, 1 person/ 1000-10,000 machines
13
Elastic Provisioning
• Companies must provision for worst case– Leads to low utilization (1-20%) most of the time– Leads to overload some of the time (slashdot effect)– Hard to grow rapidly if popularity surges
• e.g. doubling every day• May lose customers if bad service
• Cloud provisioning can start large, multiplex machines over many customers– Higher average utilization– Higher resource availability to surging sites– Cheap to decommission resources as popularity falls
14
11/21/19
8
Unused resources
Cloud Economics 101
• Cloud Computing User: Static provisioning for peak - wasteful, but necessary for SLA
“Statically provisioned”data center
“Virtual” data center in the cloud
Demand
Capacity
Time
Demand
Capacity
Time
15
Elastic Provisioning (2)
• Billing based on usage– Shifts risk from customer to provider– Separates cost of memory, I/O, disk, CPU and bill
appropriately– Reduces risk of trying cloud computing
• Shifts capital expenses (buying things) to operational expense (cost of providing service)– Small upfront cost good for starting out
16
11/21/19
9
When is data center computing cheaper?
• Unpredictable/fluxuating workloads– Expensive to provision your own– Good for:
• startups who don’t know their popularity• big companies with predictable peak loads (e.g. christmas, olympics)
• Not super-CPU intensive– Raw CPU, memory, disk is cheaper locally (not scalable, not
replicated)– Extra charge for renting relatively small (2.5 x for CPU, 20-50%
for disk)– QUESTION: Why disk cheaper? it must be replicated ….
• Dedup?• Volume purchases?
17
Example
• Zynga started off in Amazon EC2– Large growth in demand for some games
• Over time, built private cloud– Older games had predictable workloads– Much cheaper in private data center
• New games with bursts of popularity start in EC2
• “Surge usage” – overflow to cloud only when private datacenters at capacity
18
11/21/19
10
Cloud Economics
• Some costs cheaper in cloud– Fixed costs of buildings, buy machines, power
• Cheaper when bought in bulk
– Variable costs cheaper• Bandwidth much cheaper in bulk (e.g. 10x)• System management much cheaper (e.g. 10-100x)
– Massive redundancy– Massive homogeneity
19
Energy & Cloud Computing?• Cloud Computing saves Energy?
• Don’t buy machines for local use that are often idle• More efficient cooling
• Newest data centers are air cooled only using outside air
• Better to ship bits as photons over fiber vs. ship electrons over transmission lines to spin disks, power processors locallyo Clouds use nearby (hydroelectric) powero Leverage economies of scale of cooling, power distribution
20
11/21/19
11
What apps can move to the cloud?
• Compute over same data multiple times– Amortize cost of uploading data
• Perfectly scalable parallel apps (batch)– Can compute N times faster on N more machines– E.g. NY times, WA post scanning documents
• CPU/data intensive apps for mobile devices– provide heavy lifting, data persistence
• Apps with widely variable resource demands– Animoto rendering service
21
Differences to App Writers
• State no longer lives in a file system– Storage systems for shared data (e.g. S3,
databases) instead
• Applications scale both ways– Up for bigger loads– Down for multi-tenancy
• Idle cycles aren’t free
22
11/21/19
12
What apps cannot move to the cloud?
• Video games? – Heavy client-side CPU component– But:
• Compute graphics in cloud and stream to client • OnLive.com
• Banking– Need better security
• Offline apps
23
Cloud vs. Grid• Grid: more about batch scheduling parallel jobs
– Each machine generally runs only jobs for one customer– Jobs typically use multiple machines– Jobs are not interactive– Big data set, large data objects– Scientific computing
• Cloud– Multi-tenancy (multiple customers on a machine)– Resource-based billing– Public (often)– Fine-grained storage
24
11/21/19
13
Public vs private cloud
• Public: someone else owns infrastructure– Many apps from many companies
• Private: your organization owns infrastructure– Many apps from your company
• Reasoning:– Private cloud cheaper for predictable workloads – e.g.
zynga for mature games– Private cloud more secure for sensitive workloads
• MS development/test• DoD/Military
25
Serverless Cloud Computing• What does cloud computing do for you?
– Infinite capacity– Pay as you dedicate resources– Scalable platform services (e.g., reliable storage)
• What does it not do?– Redundancy/replication of compute– Geo-distribution– Load balancing– Autoscaling on load– Monitoring– Logging– System upgrades/patching (many VMs running old insecure)– Migration to new hardware types when available
26
11/21/19
14
Serverless Idea1. Manage a set of user defined functions2. Take an event sent over HTTP or received from an
event source3. Determine function(s) to which to dispatch the event4. Find an existing instance of function or create a new
one5. Send the event to the function instance6. Wait for a response7. Gather execution logs8. Make the response available to the user9. Stop the function when it is no longer needed.
27
What does it solve?
• VS containers:– 25 second startup median– 80% of time spent on package installation– Slow to scale
• With AWS Elastic Beanstalk autoscale– Must scale whole VMs– Takes 20 seconds to deploy new instance
• 20 seconds of unsatisfied requests
28
11/21/19
15
Use cases
• Original use case: infrequent events– Example: run virus scanner when file uploaded to
cloud• If upload 10 files/day, VM too heavyweight – idle 99.9%
of time
• New use case: radical scaling– Want to transcode 1-hour video in 1 minute à
use 1,000,000 functions in parallel– Works because can get additional resources for
short period very cheaply
29
Serverless Economics
• Compared to VMs:– 100% utilized VM for long period is always
cheaper• E.g. medium-latency (10s of minutes or longer) batch
jobs– But pay added cost of management: replication,
load balancing
• When better:– Bursty workloads – web requests
30
11/21/19
16
Basic Architecture
• Credit: Ion Stoica
31
How Serverless Functions Work
• Customer: provide a handler function– HTTP request– Trigger from storage service
• Object/data changed– Stateless: no local memory from one request to the
next• Abilities:
– Processing– Access platform data services– Invoke APIs, other functions– Return result
32
11/21/19
17
Handling state
• Option 1: provide state with request– All information passed in.– Example: scanning request for vulnerability (e.g. CSS)– Output is filtered request/detection
• Option 2: platform services– Blob, table, queue storage
• Challenge: platforms service mismatch– Target VMs with internal state, so only store state for
persistence & caching
33
Serverless Platforms
• Platform:– Find a virtual machine– Create a container– Install function code– Send request to container– Terminate container
• Question: how do we make this fast?– Launch a VM: 1 minute– Launch a container: 20 seconds
34
11/21/19
18
Fast Serverless Functions
• Fast VM launch:– OS provided by platform à can pre-launch VMs
and keep them around• Fast container launch:
– Standard runtimes (python, java, javascript) àpre-install files
• Fast function invocation:– Time spent in runtime library initialization à re-
use same runtime
35
Provider Scheduling Functions
• On first request:– Cold start: find a VM/Container, send code to start
• On second request:– Find existing container, send next request to it
• On flood of requests:– Decide when to create a new instance
• How much queuing, how much spawning new?
• On quiescence– Decide when to terminate existing instances
36
11/21/19
19
Example: AWS Lambdas
• Cold launch: 200 ms• Hot launch• Scaling:
– Pack additional functions on same machine as existing VMs
– Shutdown > 1 in 5 minutes last one after 30 minutes
37
Other platform functions
• Load balance: find which VM/host to use• Fault tolerance: handle failure of worker hosts• Upgrade: migrate to new OS, VMM, hardware
38
11/21/19
20
Billing/Sizing model
• Developer chooses requested memory capacity– 128mb – 1800 mb for AWS
• CPU scheduling proportional to memory– 128mb -> 1/8th of a core, 256 MB -> 1/4th of a core
• Power-of-2 sizing allows efficient placement– Easy to allocate fixed-sized regions
• Billing: memory x time running– GCF bug: not charge for background thread
• Challenges: developer must know how much memory needed
39
AWS isolation
• Question: how safely isolate tenants from each other?– Options: Containers on an OS, VMs?
• Azure functions: containers on an OS– Fewer resources, OS bug can compromise
isolation• AWS Lambdas: each tenant gets a VM,
containers in that VM (old data)– More resources, more isolation
40
11/21/19
21
TradeoffsLambdas Containers VMs
Isolation Low Medium high
Flexibility Low Medium high
Overhead consumption
High Medium low
Overhead latency (startup)
Low Medium High
Management Low Medium High
41
Key Distinction of Serverless
• Decouple execution and storage– Stateless functions (easy to scale, recover)– Platform state management– Key: separation is like bigtable, frangipani
• Execute code with managing resource allocation– Provider handles # of hosts, VMs, instances, and
where they run• Pay-by-use instead of resources allocation
– VM: pay by hour running, even if idle– Serverless: pay only while execute; idle is free
42