Advanced Computer Networks 263-3501-00
Layer-7 switching
Patrick Stuedi
Spring Semester 2014
© Oriana Riva, Department of Computer Science | ETH Zürich
Outline
• Last time
– Datacenter TCP
• Today
– L7 Switching
Slides adapted from Prof. Roscoe
Course overview
covered in basic ETH “Operating Systems and Networks” course
Wireless networking technologies
Datacenter networking
We are now here: accessing the datacenter
Challenge: accessing services
• Large web applications are typically replicated over several data centers
• Within a data center, applications share many machines
So:
• What address does, e.g., www.search.ch resolve to?
• What entity does this address refer to?
• What does this entity do?
Requirements
• “Close by” datacenter
• Load balance across machines in a center
• Target machines where the user’s state is kept
• Accessed using TCP (HTTP, SSL, …)
Option 1: IP Anycast
• One IP address refers to multiple destinations
– BGP advertises multiple destinations
– Packets end up at the “nearest” AS to the source
• Problems:
– IP layer ⇒ only reliable for stateless protocols (UDP): all packets of a TCP flow must go to the same machine
– Service location pushed into BGP ⇒ couples routing with end-system provision
Option 2: DNS
• Insight: who says the answer is always the same?
• Idea: “smart” DNS server authoritative for the service
• A query for, e.g., www.google.com or www.bing.com returns a different “A” record depending on:
– Source address of browser machine
– Current state of the service (load, failures)
– A random number
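The selection logic of such a “smart” authoritative server can be sketched in a few lines. This is only an illustration under stated assumptions: the function `pick_a_record`, the replica table `REPLICAS`, its `serves` prefixes, the `slack` threshold, and all addresses (taken from documentation ranges) are hypothetical and not from the lecture.

```python
import ipaddress
import random


def pick_a_record(client_ip, replicas, rng=random, slack=0.1):
    """Return the address to place in the "A" record, or None if nothing is up."""
    client = ipaddress.ip_address(client_ip)
    # Failures: never hand out a replica that is known to be down.
    alive = [r for r in replicas if r["alive"]]
    if not alive:
        return None
    # Source address: prefer replicas whose (assumed) service area covers the client.
    nearby = [r for r in alive if client in ipaddress.ip_network(r["serves"])]
    candidates = nearby or alive
    # Load: keep only replicas within `slack` of the least-loaded candidate.
    lowest = min(r["load"] for r in candidates)
    lightest = [r for r in candidates if r["load"] <= lowest + slack]
    # Random number: break any remaining tie at random.
    return rng.choice(lightest)["addr"]


# Hypothetical replica table: address, client prefix it is "close" to, load, health.
REPLICAS = [
    {"addr": "192.0.2.10", "serves": "203.0.113.0/24", "load": 0.2, "alive": True},
    {"addr": "192.0.2.11", "serves": "203.0.113.0/24", "load": 0.9, "alive": True},
    {"addr": "198.51.100.10", "serves": "198.51.100.0/24", "load": 0.1, "alive": True},
]
```

Calling `pick_a_record("203.0.113.7", REPLICAS)` picks among the replicas close to that client; marking a replica as not alive changes the answer on the next query, which is why such records need short TTLs.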
Using CNAMEs
First DNS resolver returns CNAME
Regional service resolver can be more specific
[Figure label: timeout]
DNS does not solve the problem
Need IP address for every instance of the service
100,000 machines ⇒ 100,000 globally routable IP addresses – expensive!
Machine fails ⇒ need to update DNS state
DNS state changes rapidly ⇒ short TTL on queries ⇒ even higher load on DNS servers
Next step: use 1 IP address
• Use Network Address Translation
• Hash source addresses to server machines
[Figure: clients with IP addresses A and B reach the datacenter through the Internet at the single public address 173.194.35.19. Behind it sit servers 10.1.1.1 to 10.1.1.7. Hash(A) = 6 and Hash(B) = 2 select the target servers.]
Stateless hashing
Hash(Source IP)
• Completely static
– No dynamic load balancing
Hash(Source IP, Source TCP port)
• Better, but still static
– Limited to 64k destinations per client machine (the source port is a 16-bit field)
• Known as a “Layer-4 load balancer”
Basic problem: nothing else is known by the end of the handshake!
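Both hash variants can be sketched as one function. This is a minimal illustration, not how a hardware switch computes it: SHA-256 stands in for whatever cheap hash a real device would use, and `pick_server` and `SERVERS` are hypothetical names (the server addresses mirror the 10.1.1.x example).

```python
import hashlib
import ipaddress

# The seven private server addresses from the NAT example.
SERVERS = [f"10.1.1.{i}" for i in range(1, 8)]


def pick_server(src_ip, src_port=None, servers=SERVERS):
    """Map a client to a server using only fields known at the TCP handshake."""
    key = ipaddress.ip_address(src_ip).packed
    if src_port is not None:
        # Layer-4 variant: include the 16-bit source port in the hash key.
        key += src_port.to_bytes(2, "big")
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With the IP-only key, every connection from one client lands on the same server. Adding the port lets connections from a single client spread over several servers, but the mapping still ignores load and server health.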
Why is static hashing bad?
• Machine failure/upgrade/provisioning
– Can’t update hash function efficiently in switch
• Load balancing
– Can’t avoid a heavily-loaded machine
– Can’t spread load from a small group of clients
• Lack of locality
– Resource being accessed
– Client accessing the resource
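One way to see the failure problem is to count how many clients change servers when a machine is removed. The sketch below assumes the simplest scheme, hash modulo the number of servers; `bucket`, `moved_fraction`, and the synthetic client list `CLIENTS` are hypothetical.

```python
import hashlib


def bucket(client_ip, n_servers):
    """Static mapping: hash of the client address modulo the server count."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def moved_fraction(clients, before, after):
    """Fraction of clients mapped to a different server index after resizing."""
    moved = sum(1 for c in clients if bucket(c, before) != bucket(c, after))
    return moved / len(clients)


# 10,000 synthetic client addresses from a documentation range.
CLIENTS = [f"198.51.{i // 250}.{i % 250 + 1}" for i in range(10_000)]
```

For a uniform hash, `moved_fraction(CLIENTS, 7, 6)` is about 6/7 in expectation: losing one of seven machines reshuffles most clients, so any per-client state or locality on the servers is lost.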
What else might we want to hash on?
HTTP Host: header
• Introduced in HTTP/1.1, where it is mandatory
• Hash based on virtual host avoids replicating all service state everywhere
– Different services have different virtual hosts
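Switching on the virtual host amounts to extracting the Host header from the request and looking up a server pool. A minimal sketch, assuming the full request head is already buffered; `host_of`, `pool_for`, and the `POOLS` table are hypothetical, and IPv6 literals in the Host header are not handled.

```python
def host_of(request: bytes):
    """Return the lower-cased Host header value without port, or None."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii").lower()
            return host.split(":", 1)[0]  # drop an optional ":port" suffix
    return None


# Hypothetical mapping from virtual host to the servers holding its state.
POOLS = {
    "www.example.com": ["10.1.1.1", "10.1.1.2"],
    "mail.example.com": ["10.1.1.3"],
}


def pool_for(request: bytes, pools=POOLS):
    """Select the server pool for a request, or None for an unknown host."""
    return pools.get(host_of(request))
```

Each pool then only needs the state of its own virtual host, which is the saving the slide refers to.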
Switching on URL
• Locality:
– Allows state to be partitioned across machines
• Isolation:
– Rare, computationally intensive URLs can be sequestered
– Sensitive data can be kept on more expensive, audited machines
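URL switching can be sketched as a longest-prefix match on the request path. The rule table `RULES`, the pool names, and the helpers `request_path` and `pool_for_path` are hypothetical and chosen only to mirror the locality and isolation cases above.

```python
# Hypothetical rules: path prefix -> server pool.
RULES = {
    "/": "general-pool",
    "/images/": "static-pool",    # locality: one partition of the content
    "/reports/": "batch-pool",    # isolation: rare, expensive URLs
    "/account/": "audited-pool",  # isolation: sensitive data
}


def request_path(request: bytes) -> str:
    """Extract the path (without query string) from the HTTP request line."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii")
    _method, target, _version = request_line.split(" ")
    return target.split("?", 1)[0]


def pool_for_path(path: str, rules=RULES):
    """Pick the pool whose prefix is the longest match for the path."""
    matches = [prefix for prefix in rules if path.startswith(prefix)]
    if not matches:
        return None
    return rules[max(matches, key=len)]
```

Longest-prefix matching is one possible design choice; it lets a catch-all rule coexist with more specific ones.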
Hashing on cookies
• Enables partitioning of servers by
– User state
– Session state
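Hashing on a cookie keeps all requests of one session on the same server. A minimal sketch, assuming the session identifier travels in a cookie named `session`; that name, the function `session_server`, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from http.cookies import SimpleCookie


def session_server(cookie_header: str, servers, cookie_name="session"):
    """Map the session cookie to a server; None if the cookie is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parses "name=value; name2=value2"
    if cookie_name not in jar:
        return None
    value = jar[cookie_name].value
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Requests without the cookie (for example, a user's first request) return `None` here and would need a fallback policy such as the Layer-4 hash.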
How to do it?
• Problem:
– Don’t know the hash key until after the HTTP request
– Typically the first segment after the three-way handshake (3WS)
• Solution:
– Don’t establish connection to server until client has sent HTTP request
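Late-binding of TCP connection
[Figure: time-sequence diagram with Client, Switch, and Server (label: Port = 3620). Messages in order: TCP connection setup + HTTP GET from client to switch; TCP connection setup + HTTP GET from switch to server; HTTP response from server to switch; HTTP response from switch to client. Acks not shown.]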
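The control flow of late binding can be sketched as a small event-driven model. It does no real networking: the class `LateBindingSwitch`, its event names, and the returned action tuples are hypothetical, and the server-selection policy is passed in as a callable.

```python
class LateBindingSwitch:
    """Toy model: answer the handshake, buffer the request, then pick a server."""

    def __init__(self, choose_server):
        self.choose_server = choose_server  # callable: request bytes -> server
        self.pending = {}                   # client -> bytes buffered so far
        self.bound = {}                     # client -> chosen server

    def on_syn(self, client):
        # The switch completes the handshake itself; no server is involved yet.
        self.pending[client] = b""
        return ("syn-ack", client)

    def on_data(self, client, payload):
        if client in self.bound:
            # Already bound: later segments go straight to the chosen server.
            return ("forward", self.bound[client], payload)
        buffered = self.pending.get(client, b"") + payload
        if b"\r\n\r\n" not in buffered:
            # Request head incomplete: keep buffering, still no server connection.
            self.pending[client] = buffered
            return ("wait", None, b"")
        # Full request head seen: now the hash key (host, URL, cookie) is known.
        server = self.choose_server(buffered)
        self.bound[client] = server
        self.pending.pop(client, None)
        return ("connect-and-replay", server, buffered)
```

The `connect-and-replay` action stands for the second “TCP connection setup + HTTP GET” in the diagram: only at that point does the switch open the server-side connection and replay the buffered request.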
Naïve implementation (from Maltz & Bhagwat)
c = accept() client connection;
<authenticate client>
s = socket();
connect(s) to server;
send(c) OK message;
while (1) {
    read() from c, write() to s;
    read() from s, write() to c;
    if (c and s return EOF) {
        close(c); close(s);
        break;
    }
}
<service next request>
Inefficient: data copies between the two connections
TCP Splicing
• Proposed around 1997 by Maltz & Bhagwat at IBM
• Key idea:
– Take two established TCP connections and splice them
– Transfer segments unmodified between them
– Remap port numbers and sequence numbers on the fly
• Advantages:
– Very simple calculation per packet
– Not much state to maintain per spliced connection
– No segmentation/reassembly
– No buffering
Splicing pseudocode (from Maltz & Bhagwat)
What state is needed?
For each packet, need to do the following:
• IP header operations:
– Rewrite source and destination IP addresses
– Update IP header checksum
• TCP header operations:
– Rewrite source and destination port numbers
– Apply fixed offset to sequence number
– Apply fixed offset to acknowledgement number
– Update TCP header checksum
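The two arithmetic steps, the fixed offsets and the checksum update, can be sketched as follows. This is an illustration rather than the pseudocode from Maltz & Bhagwat: the function names are hypothetical, and the checksum is patched incrementally in the style of RFC 1624 so that the whole segment does not have to be summed again.

```python
MOD32 = 1 << 32  # TCP sequence and acknowledgement numbers wrap at 2**32


def remap_numbers(seq, ack, seq_delta, ack_delta):
    """Apply the per-splice fixed offsets, with 32-bit wrap-around."""
    return (seq + seq_delta) % MOD32, (ack + ack_delta) % MOD32


def ones_add(a, b):
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def checksum(words):
    """Full Internet checksum over a list of 16-bit words."""
    total = 0
    for w in words:
        total = ones_add(total, w)
    return ~total & 0xFFFF


def update_checksum(old_checksum, old_word, new_word):
    """Patch a checksum after one 16-bit word changed: HC' = ~(~HC + ~m + m')."""
    total = ones_add(~old_checksum & 0xFFFF, ~old_word & 0xFFFF)
    total = ones_add(total, new_word)
    return ~total & 0xFFFF
```

A 32-bit field such as the sequence number counts as two 16-bit words, so `update_checksum` would be applied once per half. The state per spliced connection is then just the rewritten addresses and ports plus the two offsets.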
It’s easy to do in hardware
• ArrowPoint CS-800 Content Switch from 1998
– Acquired by Cisco soon after
• Forwarding ASIC for TCP splicing
• Various load balancing policies
– Round robin, measurement-based
• Failure detection for servers and automatic failover
– Request timeout, heartbeat msg
• New server added dynamically
• >16 GbE ports on each “side”
• Around 15,000 HTTP connection requests / second
References
• “Host Anycasting Service”, C. Partridge, T. Mendez, W. Milliken, Internet RFC 1546, November 1993.
• “TCP Splicing for Application Layer Proxy Performance”, David A. Maltz and Pravin Bhagwat. IBM Research Report 21139 (Computer Science/Mathematics), IBM Research Division, 1998.
• “Cisco Data Center Infrastructure 2.5 Design Guide”, Cisco Systems, November 2, 2011. http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DC_Infra2_5/DCI_SRND_2_5_book.html (very marketing oriented, and not the only way to do it, but gives an idea of the complexity!)