TRANSCRIPT

CS433/533 Computer Networks
Lecture 12: CDN
2/16/2012
Admin
Programming assignment 1 status
Recap: High-Performance Network Servers
o Avoid blocking (so that we can reach bottleneck throughput): introduce threads
o Limit unlimited thread overhead: thread pool, async I/O
o Coordinating data access: synchronization (lock, synchronized)
o Coordinating behavior, avoiding busy-wait: wait/notify; FSM
o Extensibility/robustness: language support, design for interfaces
Recap: Operational Laws
o Utilization law: U = X S
o Forced flow law: X_i = V_i X
o Bottleneck device: largest D_i = V_i S_i
o Little's law: Q_i = X_i R_i
o Bottleneck analysis:
  X(N) <= min{ 1/D_max, N/(D + Z) }
  R(N) >= max{ D, N D_max - Z }
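These asymptotic bounds are easy to evaluate numerically. A minimal sketch (the per-device demands and think time below are made-up illustrative numbers, not from the lecture):

```python
# Sketch: asymptotic bottleneck bounds from the operational laws.
# demands[i] is the service demand D_i = V_i * S_i at device i.

def bottleneck_bounds(demands, Z, N):
    """Upper bound on throughput X(N) and lower bound on response time R(N)
    for N clients with think time Z."""
    D = sum(demands)        # total service demand
    Dmax = max(demands)     # demand at the bottleneck device
    X_upper = min(1.0 / Dmax, N / (D + Z))
    R_lower = max(D, N * Dmax - Z)
    return X_upper, R_lower

# Example: CPU demand 0.04 s, disk demand 0.1 s (bottleneck), think time 2 s
X, R = bottleneck_bounds([0.04, 0.1], Z=2.0, N=50)
```

With a disk demand of 0.1 s as the bottleneck, throughput saturates at 1/D_max = 10 requests/s regardless of how many more clients are added.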
Recap: Why Multiple Servers?
o Scalability: beyond the capacity and geolocation of a single server
o Redundancy and fault tolerance
  • Administration/maintenance (e.g., incremental upgrade)
  • Redundancy (e.g., to handle failures)
o System/software architecture
  • Resources may be naturally distributed at different machines (e.g., run a single copy of a database server due to a single license; access to a resource from a third party)
  • Security (e.g., front end, business logic, and database)
Recap: Load Direction: Basic Architecture
Four components:
o Server state monitoring
  • Load (incl. failed or not); what requests it can serve
o Path properties between clients and servers
  • E.g., bandwidth, delay, loss, network cost
o Server selection algorithm
  • Algorithm to choose site(s) and server(s)
o Server direction mechanism
  • Inform/direct a client to the chosen server(s)
[Figure: a client in the Internet choosing between Site A and Site B]
Recap: Load Direction
[Figure: server state and the path properties between servers/clients feed the server selection algorithm, which, given the specific request of a client, notifies the client about the selection (direction mechanism)]
Basic Direction Mechanisms
o Implicit
  • IP anycast: the same IP address is shared by multiple servers and announced at different parts of the Internet; the network directs different clients to different servers (e.g., Limelight)
  • Load balancer (smart switch) indirection; reverse proxy
o Explicit
  • Mirror/server listing: the client is given a list of candidate DNS names
  • DNS name resolution gives a list of server addresses; a single server IP address may be a virtual IP address for a cluster of physical servers (smart switch)
Direction Mechanisms Are Often Combined
[Figure: DNS name1 and DNS name2 resolve to IP1, IP2, ..., IPn; each address fronts a load balancer or proxy for a cluster of servers (Cluster1 in US East, Cluster2 in US West, Cluster2 in Europe)]
Example: Netflix
Example: Netflix Manifest File
The client player authenticates and then downloads the manifest file from servers in the Amazon cloud.
Example: Netflix Manifest File
Example: Wikipedia Architecture
http://wikitech.wikimedia.org/images/8/81/Bergsma_-_Wikimedia_architecture_-_2007.pdf
DNS Indirection and Rotation
[Figure: clients' queries go through their routers to the DNS server for cnn.com (157.166.255.18). The server rotates the answer per query: one client receives the address list 157.166.226.25, 157.166.226.26, while another receives 157.166.226.26, 157.166.226.25, so different clients contact different servers.]
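The rotation in the figure can be sketched as follows (a toy model of round-robin DNS, not a real resolver):

```python
# Sketch: round-robin DNS rotation as in the cnn.com example. Each query
# returns the address list rotated by one, so successive clients see a
# different first address and load spreads across the servers.
from collections import deque

class RotatingDNS:
    def __init__(self, addresses):
        self.addrs = deque(addresses)

    def resolve(self):
        answer = list(self.addrs)
        self.addrs.rotate(-1)   # the next client sees a rotated list
        return answer

dns = RotatingDNS(["157.166.226.25", "157.166.226.26"])
first = dns.resolve()   # ["157.166.226.25", "157.166.226.26"]
second = dns.resolve()  # ["157.166.226.26", "157.166.226.25"]
```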
Example: Amazon Elastic Load Balancing
o Use the elb-create-lb command to create an Elastic Load Balancer.
o Use the elb-register-instances-with-lb command to register the Amazon EC2 instances that you want to load balance with the Elastic Load Balancer.
o Elastic Load Balancing automatically checks the health of your load-balanced Amazon EC2 instances. You can optionally customize the health checks by using the elb-configure-healthcheck command.
o Traffic to the DNS name provided by the Elastic Load Balancer is automatically distributed across your load-balanced, healthy Amazon EC2 instances.
http://aws.amazon.com/documentation/elasticloadbalancing/
Details: Step 1
1. Call CreateLoadBalancer with the following parameters:
  o AvailabilityZones = us-east-1a
  o Listeners
    • Protocol = HTTP
    • InstancePort = 8080
    • LoadBalancerPort = 80
  o LoadBalancerName = MyLoadBalancer
The operation returns the DNS name of your LoadBalancer. You can then map that to any other domain name (such as www.mywebsite.com) using a CNAME or some other technique.

PROMPT> elb-create-lb MyLoadBalancer --headers --listener "lb-port=80,instance-port=8080,protocol=HTTP" --availability-zones us-east-1a

Result:
DNS-NAME
DNS-NAME  MyLoadBalancer-2111276808.us-east-1.elb.amazonaws.com

http://docs.amazonwebservices.com/ElasticLoadBalancing/latest/DeveloperGuide/
Details: Step 2
2. Call ConfigureHealthCheck with the following parameters:
  o LoadBalancerName = MyLoadBalancer
  o Target = HTTP:8080/ping
    • Note: make sure your instances respond to /ping on port 8080 with an HTTP 200 status code.
  o Interval = 30
  o Timeout = 3
  o HealthyThreshold = 2
  o UnhealthyThreshold = 2

PROMPT> elb-configure-healthcheck MyLoadBalancer --headers --target "HTTP:8080/ping" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

Result:
HEALTH-CHECK  TARGET          INTERVAL  TIMEOUT  HEALTHY-THRESHOLD  UNHEALTHY-THRESHOLD
HEALTH-CHECK  HTTP:8080/ping  30        3        2                  2
Details: Step 3
3. Call RegisterInstancesWithLoadBalancer with the following parameters:
  o LoadBalancerName = MyLoadBalancer
  o Instances = [ i-4f8cf126, i-0bb7ca62 ]

PROMPT> elb-register-instances-with-lb MyLoadBalancer --headers --instances i-4f8cf126,i-0bb7ca62

Result:
INSTANCE  INSTANCE-ID
INSTANCE  i-4f8cf126
INSTANCE  i-0bb7ca62
Discussion
What are the advantages and disadvantages of using DNS for load direction?
Clustering with VIP: Basic Idea
o Clients get a single service IP address, called the virtual IP address (VIP)
o A virtual server (also referred to as a load balancer, vserver, or smart switch) listens at the VIP address and port
o A virtual server is bound to a number of physical servers running in a server farm
o A client sends a request to the virtual server, which in turn selects a physical server in the server farm and directs the request to the selected physical server
VIP Clustering
[Figure: clients connect to virtual IP addresses (VIPs) at a smart switch in front of a server array; L4: TCP; L7: HTTP, SSL, etc.]
Goals:
o server load balancing
o failure detection
o access control filtering
o priorities/QoS
o request locality
o transparent caching
What to switch/filter on?
o L3: source IP and/or VIP
o L4: (TCP) ports, etc.
o L7: URLs and/or cookies
o L7: SSL session IDs
Big Picture
Load Balancer (LB): Basic Structure
[Figure: a client sends a packet with D=VIP, S=client to the LB listening at the VIP; the LB fronts Server1 (RIP1), Server2 (RIP2), and Server3 (RIP3)]
Problem of the basic structure?
Problem
o A client-to-server packet has the VIP as its destination address, but the real servers use RIPs, so if the LB just forwards the packet from the client to a real server, the real server may drop the packet
o A reply from a real server to the client has the real server's IP as its source, so the client will drop the packet
Real Server TCP Socket Space
[Figure: a packet with D=VIP, S=client arrives at a real server whose TCP socket space contains:
o state: listening; address: {*:6789, *:*}; completed connection queue: C1, C2; sendbuf; recvbuf
o state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf; recvbuf
o ...
No socket matches the VIP as a local address.]
Solution 1: Network Address Translation (NAT)
o The LB does the rewriting/translation
o Thus, the LB is similar to a typical NAT gateway with an additional scheduling function
Example: Virtual Server via NAT
[Figure: a load balancer translating addresses between clients and real servers]
LB/NAT Flow
[Figures: packet flow through the LB/NAT in the request and response directions]
SLB/NAT Flow: Details
1. When a user accesses a virtual service provided by the server cluster, a request packet destined for the virtual IP address (the IP address that accepts requests for the virtual service) arrives at the load balancer.
2. The load balancer examines the packet's destination address and port number. If they match a virtual service in the virtual server rule table, a real server is selected from the cluster by a scheduling algorithm and the connection is added to a hash table that records connections. Then the destination address and port of the packet are rewritten to those of the selected server, and the packet is forwarded to the server. When an incoming packet belongs to an established connection, the connection is found in the hash table and the packet is rewritten and forwarded to the right server.
3. The request is processed by one of the physical servers.
4. When response packets come back, the load balancer rewrites the source address and port of the packets to those of the virtual service. When a connection terminates or times out, the connection record is removed from the hash table.
5. A reply is sent back to the user.
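Steps 2 and 4 above boil down to two rewriting rules plus a connection table. A toy sketch (the addresses and the packets-as-dicts representation are hypothetical, not LVS code):

```python
# Sketch of the LB/NAT rewriting logic. The hash table maps a connection
# (src, dst) pair to the chosen real server; scheduling is round-robin.
import itertools

VIP = ("10.0.0.1", 80)
REAL_SERVERS = [("192.168.0.2", 8080), ("192.168.0.3", 8080)]
_rr = itertools.cycle(REAL_SERVERS)   # round-robin scheduler
connections = {}                      # (src, dst) -> real server

def inbound(pkt):
    """Client -> VIP: pick (or recall) a real server, rewrite destination."""
    key = (pkt["src"], pkt["dst"])
    if pkt["dst"] == VIP:
        server = connections.setdefault(key, next(_rr))
        pkt = dict(pkt, dst=server)
    return pkt

def outbound(pkt):
    """Real server -> client: rewrite the source back to the VIP."""
    if pkt["src"] in REAL_SERVERS:
        pkt = dict(pkt, src=VIP)
    return pkt

req = inbound({"src": ("1.2.3.4", 5555), "dst": VIP})
rsp = outbound({"src": req["dst"], "dst": ("1.2.3.4", 5555)})
```

Note how every response packet passes through `outbound`, which is exactly why response rewriting dominates the LB's load.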
LB/NAT Advantages and Disadvantages
Advantages:
o Only one public IP address is needed for the load balancer; real servers can use private IP addresses
o Real servers need no change and are not aware of the load balancing
Problems:
o The load balancer must be on the critical path
o The load balancer may become the bottleneck due to the load of rewriting request and response packets
  • Typically, rewriting responses carries much more load because there are typically many more response packets
LB with Direct Reply
[Figure: the client sends requests to the LB at the VIP; Server1, Server2, and Server3 each also use the VIP as an IP address and send replies directly to the client ("direct reply")]
LB/DR Architecture
[Figure: the load balancer and the real servers are connected by a single switch]
Why the IP Address Matters
o Each network interface card listens to an assigned MAC address
o A router is configured with the range of IP addresses connected to each interface (NIC)
o To send to a device with a given IP, the router needs to translate the IP to the MAC (device) address
o The translation is done by the Address Resolution Protocol (ARP)
ARP Protocol
o ARP is "plug-and-play": nodes create their ARP tables without intervention from a network administrator
o A broadcast protocol:
  • The router broadcasts a query frame containing the queried IP address; all machines on the LAN receive the ARP query
  • The node with the queried IP receives the ARP frame and replies with its MAC address
ARP in Action
[Figure: a packet with D=VIP, S=client arrives at router R on the LAN with the LB]
o Router R broadcasts an ARP query: who has VIP?
o ARP reply from the LB: I have VIP; my MAC is MAC_LB
o Data packet from R to LB: destination MAC = MAC_LB
LB/DR Problem
[Figure: router R on a LAN where the LB and several real servers are all configured with the VIP]
o ARP and race condition: when router R gets a packet with destination address VIP, it broadcasts an ARP request: who has VIP? One of the real servers may reply before the load balancer.
o Solution: configure the real servers not to respond to ARP requests
LB via Direct Routing
o The virtual IP address is shared by the real servers and the load balancer.
o Each real server has a non-ARPing loopback alias interface configured with the virtual IP address, and the load balancer has an interface configured with the virtual IP address to accept incoming packets.
o The workflow of LB/DR is similar to that of LB/NAT:
  • The load balancer directly routes a packet to the selected server: it simply changes the MAC address of the data frame to that of the server and retransmits it on the LAN (how does it know the real server's MAC?)
  • When the server receives the forwarded packet, it determines that the packet is for the address on its loopback alias interface, processes the request, and finally returns the result directly to the user
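The key point, that only the layer-2 destination changes while the IP header stays untouched, can be sketched as (all MAC and IP values here are hypothetical):

```python
# Sketch of LB/DR forwarding: the load balancer swaps only the destination
# MAC address and re-sends the frame on the shared LAN; the destination IP
# remains the shared VIP, which the real server accepts on its loopback alias.
SERVER_MACS = ["00:16:3e:00:00:01", "00:16:3e:00:00:02"]

def direct_route(frame, pick=0):
    """Rewrite only the L2 destination; the L3 dst stays the (shared) VIP."""
    routed = dict(frame)               # copy; the original frame is untouched
    routed["dst_mac"] = SERVER_MACS[pick]
    return routed

frame = {"dst_mac": "00:16:3e:00:00:aa",  # the LB's MAC, from the ARP reply
         "dst_ip": "10.0.0.1",            # the VIP
         "payload": b"GET / HTTP/1.1"}
out = direct_route(frame, pick=1)
```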
LB/DR Advantages and Disadvantages
Advantages:
o Real servers send response packets to clients directly, avoiding the LB as a bottleneck
Disadvantages:
o Servers must have a non-ARP alias interface
o The load balancer and each server must have one of their interfaces in the same LAN segment
Example Implementation of LB
o An example open-source implementation is Linux Virtual Server (linux-vs.org)
  • Used by www.linux.com, sourceforge.net, wikipedia.org
  • More details on the ARP problem: http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.arp_problem.html
o Many commercial LB servers from F5, Cisco, ...
o For more details, read Chapter 2 of Load Balancing Servers, Firewalls, and Caches
Question to Think About
How do you test whether Amazon ELB uses LB/NAT or LB/DR?
Discussion: Problem of the Load Balancer Architecture
[Figure: the client sends packets with D=VIP, S=client to the LB, which fronts Server1, Server2, and Server3]
A major remaining problem is that the LB becomes a single point of failure (SPOF).
Solutions
o Redundant load balancers, e.g., two load balancers
o Fully distributed load balancing, e.g., Microsoft Network Load Balancing (NLB)
Microsoft NLB
o No dedicated load balancer
o All servers in the cluster receive all packets
o All servers within the cluster simultaneously run a mapping algorithm to determine which server should handle the packet. The servers not required to service the packet simply discard it.
o Mapping (ranking) algorithm: compute the "winning" server according to host priorities, multicast or unicast mode, port rules, affinity, load percentage distribution, client IP address, client port number, and other internal load information
http://technet.microsoft.com/en-us/library/cc739506%28WS.10%29.aspx
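A toy model of such fully distributed mapping (an illustrative hash-based stand-in, not Microsoft's actual ranking algorithm):

```python
# Sketch: every server sees every packet and independently hashes the client
# address; only the "winning" server processes it, the rest drop it silently.
# Because the mapping is deterministic, no coordination traffic is needed.
import hashlib

def winner(client_ip, client_port, n_servers):
    h = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
    return int.from_bytes(h[:4], "big") % n_servers

def handles(my_rank, client_ip, client_port, n_servers):
    # Each host runs the same deterministic mapping, so exactly one accepts.
    return winner(client_ip, client_port, n_servers) == my_rank

n = 4
w = winner("1.2.3.4", 5555, n)
accepted = [rank for rank in range(n) if handles(rank, "1.2.3.4", 5555, n)]
```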
Discussion
Compare the design of using a load balancer vs. Microsoft NLB.
Forward Cache/Proxy
Web caches/proxies are placed at the entrance of an ISP.
o The client sends all HTTP requests to the web cache
o If the object is at the web cache, the cache immediately returns the object in an HTTP response
o Else the cache requests the object from the origin server, then returns the HTTP response to the client
[Figure: clients send HTTP requests to a proxy server, which forwards misses to origin servers and relays the HTTP responses back]
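The hit/miss logic above can be sketched as follows (the dict-based cache and the stand-in fetch function are hypothetical simplifications of a real HTTP proxy):

```python
# Sketch: forward-proxy caching. On a hit the object is returned
# immediately; on a miss it is fetched from the origin and stored.
cache = {}

def fetch_from_origin(url):
    return f"<content of {url}>"      # placeholder for a real HTTP request

def proxy_get(url):
    """Return (response, was_hit)."""
    if url in cache:
        return cache[url], True       # hit: return the object immediately
    body = fetch_from_origin(url)     # miss: ask the origin server
    cache[url] = body
    return body, False

_, hit1 = proxy_get("http://example.com/a.html")   # miss, fills the cache
_, hit2 = proxy_get("http://example.com/a.html")   # hit
```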
Forward Web Proxy/Cache
Web caches give good performance because very often:
o a single client repeatedly accesses the same document, and
o a nearby client also accesses the same document
The cache hit ratio increases logarithmically with the number of users.
[Figure: clients 1-3 share one ISP cache and clients 4-6 another, both in front of an application server]
Benefits of Forward Web Caching
Assume the cache is "close" to the client (e.g., in the same network):
o Smaller response time: the cache is "closer" to the client
o Decreased traffic to distant servers: the link out of the institutional/local ISP network is often the bottleneck
[Figure: an institutional network with a 10 Mbps LAN and an institutional cache reaches origin servers on the public Internet over a 1.5 Mbps access link]
What Went Wrong with Forward Web Caches?
o Web protocols evolved extensively to accommodate caching, e.g., HTTP 1.1
o However, web caching was developed from a strong ISP perspective, leaving content providers out of the picture
  • It is the ISP who places a cache and controls it
  • The ISP's only interest in using web caches is to reduce bandwidth
Outline
o Load direction/distribution
  • Basic load direction mechanisms
  • Path properties (to be covered later)
  • Case studies: Content Distribution Networks, Akamai, and YouTube
Content Distribution Networks
o Content Distribution Networks (CDNs) provide examples of Internet-scale load distribution for content publishers
o CDN design perspective:
  • Performance scalability (high throughput, going beyond single-server throughput)
  • Geographic scalability (low propagation latency, going to close-by servers)
  • Low-cost operation
Akamai
o Akamai, the original and largest commercial CDN, operates around 91,000 servers in over 1,000 networks in 70 countries
o Akamai (AH kuh my) is Hawaiian for intelligent, clever, and informally "cool". Founded April 1999 in Boston, MA by MIT students
o Akamai evolution:
  • Files/streaming (our focus at this moment)
  • Secure pages and whole pages
  • Dynamic page assembly at the edge (ESI)
  • Distributed applications
Akamai Scalability Bottleneck
See the Akamai 2009 investor analysts meeting.
Basics of the Akamai Architecture
o Content publishers (e.g., CNN, NYTimes)
  • provide base HTML documents
  • run origin server(s)
o Akamai runs
  • edge servers for hosting content: deep deployment into 1,000 networks
  • customized DNS redirection servers to select edge servers based on closeness to the client browser and server load
Linking to Akamai
o Originally, URL "Akamaization" of embedded content, e.g.,
  <IMG SRC="http://www.provider.com/image.gif">
  changed to
  <IMG SRC="http://a661.g.akamai.net/hash/image.gif">
o URL Akamaization is becoming obsolete and is supported mostly for legacy reasons
  • Currently most content publishers prefer to use a DNS CNAME to link to Akamai servers (a CNAME is an alias)
o Note that this DNS redirection unit is per customer, not per individual file.
Akamai Load Direction Flow
[Figure: a web client's local DNS server (LDNS) walks a hierarchy of CDN DNS servers and the customer's DNS servers, steps (1)-(6)]
o The client requests the site
o The client gets a CNAME entry with a domain name in Akamai
o Multiple redirections through the hierarchy find nearby edge servers
o The client is given 2 nearby web replica servers (fault tolerance)
For more details see the "global hosting system" patent: F.T. Leighton, D.M. Lewin, US Patent 6,108,703, 2000.
Exercise: Zoo Machine
o Check any web page of the New York Times and find a page with an image
o Find the URL
o Use % dig +trace +recurse to see the Akamai load direction
Akamai Load Direction
[Figure: measured Akamai load direction]

Akamai Load Redirection Framework
[Figure: clients are directed to edge servers]
If the directed edge server does not have the requested content, the edge server goes back to the original server (source).
Load Direction Formulation: Input
Potentially related input:
o p(m, e): path properties (from a client site m to an edge server e)
  • Akamai might use one-hop detour routing (see akamai-detour.pdf)
o a_km: request arrival rate from client site m to publisher k
o u_k: service rate for requests for publisher k
o x_e: load on edge server e
o caching state of a server e
Load Direction Formulation
o Details of the Akamai algorithms are proprietary
o So what we discuss is our own formulation plus the measurements of some researchers
[Figure: request client sites (e.g., Yale, ATT) with arrival rates a_k directed to edge servers]
Load Direction: Control Parameters
o Control interval: T
o Mapping from client sites to servers
  • Server pool S_m^k(t): the pool of edge servers that can be assigned to client site m for publisher k, at time t
[Figure: request client sites (e.g., Yale, ATT) mapped to pools of edge servers]
Load Direction: Comments
An algorithm (column 12 of the Akamai patent) at a local DNS server:
o Compute the load to each publisher k (called a serial number)
o Sort the publishers by increasing load
o For each publisher, associate a list of random servers generated by a hash function
o Assign the publisher to the first server that does not overload
We can formulate more complex versions.
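The four steps above can be sketched as follows (our reading of the patent text, not Akamai's code; the capacity, the publisher loads, and the hash-based candidate list are all made-up for illustration):

```python
# Sketch of the patent-style assignment loop: publishers are considered in
# order of load; each gets the first server on its hash-generated candidate
# list that still has capacity.
import hashlib

CAPACITY = 100   # illustrative per-server capacity

def candidate_servers(publisher, n_servers, k=3):
    """Deterministic pseudo-random server list from a hash of the publisher."""
    h = hashlib.sha256(publisher.encode()).digest()
    return [h[i] % n_servers for i in range(k)]

def assign(publisher_loads, n_servers):
    server_load = [0] * n_servers
    assignment = {}
    for pub, load in sorted(publisher_loads.items(), key=lambda kv: kv[1]):
        for s in candidate_servers(pub, n_servers):
            if server_load[s] + load <= CAPACITY:  # first that does not overload
                server_load[s] += load
                assignment[pub] = s
                break
    return assignment

result = assign({"cnn": 60, "nytimes": 30, "fox": 20}, n_servers=4)
```

Because the candidate list is derived from a hash of the publisher, every local DNS server computes the same mapping without coordination.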
Experimental Study of Akamai Load Balancing
Methodology:
o 2-month-long measurement
o 140 PlanetLab nodes (clients): 50 in the US and Canada, 35 in Europe, 18 in Asia, 8 in South America, the rest randomly scattered
o Every 20 sec, each client queries an appropriate CNAME for Yahoo, CNN, Fox News, NY Times, etc.
[Figure: a web client queries an Akamai low-level DNS server, which maps it to Akamai web replicas 1, 2, 3, ...]
See http://www.aqualab.cs.northwestern.edu/publications/Ajsu06DBA.pdf
Server Pool: to Yahoo
[Figure: web replica IDs over time (06/1/05 16:16) seen by Client 1 (Berkeley) and Client 2 (Purdue) for target a943.x.a.yimg.com (Yahoo); the server pool shifts between day and night]
Server Pool: Multiple Akamai Hosted Sites
[Figure: number of Akamai web replicas observed, per client]
Load Balancing Dynamics
[Figure: redirection dynamics observed at Berkeley, Brazil, and Korea]
Redirection Effectiveness: Measurement Methodology
[Figure: a PlanetLab node queries an Akamai low-level DNS server and pings the 9 best Akamai replica servers]
Do Redirections Reveal Network Conditions?
o Rank = r1 + r2 - 1
  • 16 means perfect correlation
  • 0 means poor correlation
o Brazil is poor
o MIT and Amsterdam are excellent
Server Diversity for Yahoo
[Figure: the majority of PlanetLab nodes see between 10 and 50 Akamai edge servers; nodes far away from Akamai hot-spots are good overlay-to-CDN mapping candidates]
Akamai Streaming Architecture
o A content publisher (e.g., a radio or TV station) encodes streams and transfers them to entry points
o When a user watches a stream from an edge server, the server subscribes to a reflector
o A set of streams (e.g., some popular, some not) is grouped into a bucket called a portset; a set of reflectors will distribute a given portset
o Compare with the web architecture.
Akamai Streaming: Resource Naming
Each unique stream is identified by a URL called an Akamai Resource Locator (ARL), e.g.,

mms://a1897.l3072828839.c30728.g.lm.akamaistream.net/D/1897/30728/v0001/reflector:28839

[Figure: annotated ARL fields — Windows media player (mms), live media service, portset, stream ID, customer # (NBA)]
Akamai Streaming Load Direction
o From ARL to edge server: similar to web direction
o From edge server to reflector:
  if (stream is active) then forward to the client
  else if (VoD) then fetch from the original server
  else use Akamai DNS to query portset + region code
Streaming Redirection Interval
o 40% use 30 sec
o 10% do not have any redirection (default edge server cluster in Boston: 72.246.103.0/24 and 72.247.145.0/24)
Overlapping of Servers
Testing Akamai Streaming Load Balancing
(a) Add 7 probing machines to the same edge server
(b) Observe the slowdown
(c) Notice that Akamai removed the edge server from DNS; the probing machines stop
YouTube
o 02/2005: founded by Chad Hurley, Steve Chen, and Jawed Karim, who were all early employees of PayPal
o 10/2005: first round of funding ($11.5M)
o 03/2006: 30M video views/day
o 07/2006: 100M video views/day
o 11/2006: acquired by Google
o 10/2009: Chad Hurley announced in a blog that YouTube was serving well over 1B video views/day (avg. 11,574 video views/sec)
http://video.google.com/videoplay?docid=-6304964351441328559#
Pre-Google Team Size
o 2 sysadmins
o 2 scalability software architects
o 2 feature developers
o 2 network engineers
o 1 DBA
o 0 chefs
YouTube Design Flow

while (true) {
  identify_and_fix_bottlenecks();
  drink();
  sleep();
  notice_new_bottleneck();
}
YouTube Major Components
o Web servers
o Video servers
o Thumbnail servers
o Database servers
  • We will cover the social networking/database bottleneck/consistency issues later in the course
YouTube: Web Servers
o Components: NetScaler load balancer; Apache; Python app servers; databases
o Python
  • Web code (CPU) is not the bottleneck
  • JIT to C to speed up; C extensions
  • Pre-generate HTML responses
  • Development speed is more important
[Figure: NetScaler in front of Apache, the Python app servers, and the databases]
YouTube: Video Server
[Figure: see "Statistics and Social Network of YouTube Videos", 2008]
YouTube: Video Popularity
[Figures: video popularity distributions, from "Statistics and Social Network of YouTube Videos", 2008]
How do you design a system to handle such a highly skewed distribution?
YouTube: Video Server Architecture
Tiered architecture:
o CDN servers (for the most popular videos)
  • Low delay; mostly in-memory operation
o YouTube colo servers (for the others: not popular, 1-20 views per day)
[Figure: requests for the most popular videos go to the CDN; the others go to YouTube Colo 1 ... Colo N]
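The tiered routing decision can be sketched as follows (the popularity threshold, colo count, and view counts are made-up parameters, not YouTube's actual values):

```python
# Sketch: requests for hot videos go to the CDN tier (in-memory serving);
# the long tail is spread across the disk-bound YouTube colos.
def route_request(video_id, daily_views, n_colos=2, popular_threshold=1000):
    if daily_views.get(video_id, 0) >= popular_threshold:
        return "CDN"                          # hot content, served from memory
    return f"colo{hash(video_id) % n_colos}"  # long tail, disk-bound colos

views = {"hit": 50000, "tail": 7}
where_hit = route_request("hit", views)
where_tail = route_request("tail", views)
```

The design exploits the skew: a small in-memory tier absorbs most of the request volume, so the colos only need to handle the low-rate tail.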
YouTube Redirection Architecture
[Figure: redirection among YouTube servers]
YouTube Video Servers
o Each video is hosted by a mini-cluster consisting of multiple machines
o Video servers use the lighttpd web server for video transmission
  • Apache had too much overhead (used in the first few months and then dropped)
  • Async I/O: uses epoll to wait on multiple fds
  • Switched from a single-process to a multiple-process configuration to handle more connections
Thumbnail Servers
o Thumbnails are served by a few machines
o Problems running thumbnail servers:
  • A high number of requests/sec, as web pages can display 60 thumbnails per page
  • Serving a lot of small objects implies lots of disk seeks and problems with file system inode and page caches; may run into per-directory file limits
  • Solution: storage switched to Google BigTable (we will cover this later)
Thumbnail Server Software Architecture
o Design 1: Squid in front of Apache
  • Problems: Squid worked for a while, but as load increased performance eventually decreased, going from 300 requests/second to 20; under high loads Apache performed badly; changed to lighttpd
o Design 2: lighttpd by default (by default lighttpd uses a single thread)
  • Problem: often stalled due to I/O
o Design 3: switched to multiple processes contending on a shared accept
  • Problems: high contention overhead; individual caches
Thumbnail Server: lighttpd/AIO
Discussion: Problems of Traditional Content Distribution
[Figure: clients 1 through n all reach a single application server via DNS]
Objectives of P2P
o Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
o Bypass DNS to find clients with resources!
  • Examples: instant messaging, Skype
[Figure: P2P clients connect directly across the Internet]
But P2P Is Not New
o The original Internet was a p2p system:
  • The original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
  • No DNS or routing infrastructure, just connected by phone lines
  • Computers also served as routers
o P2P is simply an iteration of scalable distributed systems
Backup Slides