DNS and CDNs (Content Distribution Networks)
Paul Francis
Cornell Computer Science
What do all of these have in common?
http://www.cnn.com/news/story.html (HTTP, web)
mailto:… (email)
sip:… (Session Initiation Protocol)
They all have a DNS name somewhere
Why is DNS so important?
Names are easier to remember than IP addresses … ???
And in any event, IP addresses are not “dependable”
  • They change often (dialup)
  • They are not all unique
DNS is the “core” of the Internet
So “we” (humans, and applications) like to deal with dependable, stable, friendly DNS names
The names get “mapped” into IP addresses by lower layers, namely by the Domain Name System (DNS)
Then the learned IP address is put into packets, and IP routing gets the packets across the Internet
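Applications normally hand the name to the operating system and let it do the mapping. A minimal Python sketch using the standard library's `socket.getaddrinfo`; the `resolve` helper is our own illustrative name, and the example uses a numeric address so it runs without network access.

```python
import socket


def resolve(name: str, port: int = 80) -> list[str]:
    """Map a name to IP addresses using the system resolver (which may query DNS)."""
    results = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


if __name__ == "__main__":
    # A numeric "name" comes back as-is without a DNS lookup; try "www.cnn.com" when online.
    print(resolve("127.0.0.1"))
```

The returned addresses are what end up in the packets; from then on, IP routing does the rest.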
[Figure: DNS query/reply]
Why all these dots?
Why falcon.cs.cornell.edu? Why not “cornell-falcon” or something?
It wasn’t always that way
Twenty years ago, this was a valid email address: george@isi
How did my computer learn the IP address of “isi”?
The “host table” and DNS
Before DNS, there was the host table
  • This was a complete list of all the hosts in the Internet!
  • It was copied every night to every machine on the Internet!
At some point, this was perceived as a potential scaling bottleneck…
  • So a distributed directory called the “Domain Name System” (DNS) was invented
The host table (historic)
Host Name    IP Address
mit-dlab     133.65.14.77
isi-mail     24.72.188.13
mit-lcs      133.65.29.1
…            …
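Conceptually, the host table was one flat name-to-address map held in full by every machine. A minimal sketch using the three example rows above; the whitespace-separated layout and the `load_host_table`/`lookup` helpers are illustrative assumptions, not the historical file syntax.

```python
from typing import Optional

# Hypothetical host-table text built from the example rows above
# (the layout is an assumption, not the historical file format).
HOST_TABLE_TEXT = """\
mit-dlab 133.65.14.77
isi-mail 24.72.188.13
mit-lcs  133.65.29.1
"""


def load_host_table(text: str) -> dict[str, str]:
    """Parse 'name address' lines into one flat, global name-to-address map."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            name, address = fields
            table[name] = address
    return table


def lookup(table: dict[str, str], name: str) -> Optional[str]:
    """Every machine holds a full copy, so a lookup is purely local."""
    return table.get(name)


table = load_host_table(HOST_TABLE_TEXT)
print(lookup(table, "isi-mail"))
```

Lookups are trivial; the scaling problem is that every new or changed host means a new version of the whole table has to reach every machine.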
Distributed Directory
A primary goal of DNS was to have a distributed “host table”, so that each site could manage its own name-to-address mapping
But also, it should scale well!
DNS is simple but powerful
Only one type of query
  Query(domain name, RR type)
  • Resource Record (RR) type is like an attribute type
  Answer(values, additional RRs)
Limited number of RR types
  • Hard to make new RR types
    Not for technical reasons… rather because each requires global agreement
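To make Query(domain name, RR type) concrete, here is a sketch of how such a query is laid out on the wire, following the message format of RFC 1035 (a 12-byte header, then one question). The helper names and the query ID are our own; a real client would also send the bytes over UDP port 53 and parse the Answer.

```python
import struct

# A few RR type codes assigned in the DNS specifications (RFC 1035 and later).
RR_TYPES = {"A": 1, "NS": 2, "MX": 15, "AAAA": 28, "SRV": 33, "NAPTR": 35}
CLASS_IN = 1  # the Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels followed by a zero byte (the root)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        data = label.encode("ascii")
        if len(data) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(data))
        out += data
    out.append(0)
    return bytes(out)


def build_query(query_id: int, name: str, rr_type: str, recursion_desired: bool = True) -> bytes:
    """Build Query(domain name, RR type) as a DNS message carrying one question."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit; opcode 0 = standard query
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0 (16-bit, big-endian).
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!2H", RR_TYPES[rr_type], CLASS_IN)
    return header + question


message = build_query(0x1234, "www.cnn.com", "A")
print(message.hex())
```

A new RR type is just another 16-bit number in the question; as the slide says, the obstacle is global agreement on what it means, not the encoding.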
DNS is the core of the Internet
Global name space
  • Can be the core of a naming or identifying scheme
Global directory service
  • Can resolve a name to nearly every computer on the planet
Important DNS RR types
NS: Points to next Name Server down the tree
A: Contains the IPv4 address
  • AAAA for IPv6
MX: Contains the name of the mail server
Service-oriented RR types
  • SRV: Contains the host names and ports of services on servers
    One way to learn what port number to use
  • NAPTR: Essentially a generalized mapping from one name space (e.g. phone numbers) to another (e.g. SIP URLs)
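As an example of learning a port number from DNS, this sketch picks a server and port from a set of SRV records. It is a simplified version of the selection rule in RFC 2782 (lowest priority first, then a weighted random choice; zero-weight records are handled more loosely than the RFC describes). The record values and host names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int  # lower value = more preferred
    weight: int    # relative share among records with the same priority
    port: int      # the port number the service listens on
    target: str    # host name; its address comes from a separate A/AAAA lookup


def pick_srv(records: list[SrvRecord], rng: random.Random) -> tuple[str, int]:
    """Return (target, port): lowest priority first, then a weighted random choice."""
    if not records:
        raise LookupError("no SRV records")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        chosen = rng.choice(candidates)  # all weights zero: pick uniformly
    else:
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen.target, chosen.port


# Hypothetical records for a SIP service.
records = [
    SrvRecord(10, 60, 5060, "sip1.example.com"),
    SrvRecord(10, 40, 5060, "sip2.example.com"),
    SrvRecord(20, 0, 5080, "backup.example.com"),
]
print(pick_srv(records, random.Random(1)))
```

The target's address still needs an A or AAAA lookup, although servers are encouraged to include it among the additional RRs of the SRV answer.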
DNS tree structure
.
    com.
    jp.
    us.
    edu.
        cmu.edu.
        mit.edu.
        cornell.edu.
            eng.cornell.edu.
            cs.cornell.edu.
                foo.cs.cornell.edu  A  10.1.1.1
                bar.cs.cornell.edu  A  10.1.1.1
(Links from each level to the next are NS RR “pointers”)
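This also answers “why all these dots?”: reading a name right to left walks down the tree, and each dot marks a point where a parent can delegate a child with NS RRs. A small sketch listing the nodes from the root down to a name; whether each node is really a separate zone is up to its administrators (foo.cs.cornell.edu above is just an A record inside cs.cornell.edu.). The function name is our own.

```python
def zone_chain(name: str) -> list[str]:
    """List the tree nodes from the root down to a name, reading labels right to left."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # the root
    for i in range(len(labels) - 1, -1, -1):
        # Each step prepends one more label: edu. -> cornell.edu. -> cs.cornell.edu. ...
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(zone_chain("foo.cs.cornell.edu"))
```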
Primary and secondary servers
[Figure: name servers for cornell.edu. and cs.cornell.edu.]
NS RRs point to both primary and secondary servers
RRs are initially configured into primary server
Primary server replicates RRs onto secondary servers periodically (updates are incremental)
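A toy model of incremental replication: the primary numbers every change, and a secondary periodically compares version numbers and pulls only what it is missing. In real DNS the version number is the zone's SOA serial and the incremental transfer is IXFR (RFC 1995); the classes here are hypothetical, ignore deletions and serial-number wraparound, and the 10.1.1.2 update is made up.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryServer:
    """Holds the authoritative, operator-configured copy of a zone."""
    serial: int = 0
    records: dict[str, str] = field(default_factory=dict)
    journal: list[tuple[int, str, str]] = field(default_factory=list)  # (serial, name, value)

    def update(self, name: str, value: str) -> None:
        """Configure an RR on the primary; each change gets a new serial number."""
        self.serial += 1
        self.records[name] = value
        self.journal.append((self.serial, name, value))

    def changes_since(self, serial: int) -> list[tuple[int, str, str]]:
        """Incremental transfer: only the changes newer than the secondary's serial."""
        return [entry for entry in self.journal if entry[0] > serial]


@dataclass
class SecondaryServer:
    """Replica that periodically pulls changes from the primary."""
    serial: int = 0
    records: dict[str, str] = field(default_factory=dict)

    def refresh(self, primary: PrimaryServer) -> int:
        """Compare serials; if behind, apply the missing changes. Returns how many were applied."""
        if primary.serial <= self.serial:
            return 0
        changes = primary.changes_since(self.serial)
        for serial, name, value in changes:
            self.records[name] = value
            self.serial = serial
        return len(changes)


primary, secondary = PrimaryServer(), SecondaryServer()
primary.update("foo.cs.cornell.edu", "10.1.1.1")
primary.update("bar.cs.cornell.edu", "10.1.1.1")
print(secondary.refresh(primary))  # first refresh copies both changes
primary.update("bar.cs.cornell.edu", "10.1.1.2")
print(secondary.refresh(primary))  # later refreshes copy only what changed
```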
Resolver structure and configuration
[Figure: the DNS tree (., com., jp., edu., cmu.edu., cornell.edu., eng.cornell.edu., cs.cornell.edu.) with a resolver and a client's stub resolver]
Static configuration of root servers
Stub resolver resides on client host, points to configured recursive server
Resolver manages DNS queries on behalf of stub resolvers
1. Stub resolver sends recursive query
2, 3, 4, … Resolver makes iterative queries to servers
N. Resolver returns final answer to stub resolver (which also caches result)
Resolver caches results for efficiency
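A toy model of the sequence above: the recursive resolver starts at its statically configured root server and follows NS referrals down the tree until it gets an A record, caching both answers and referrals. The server names and in-memory data are hypothetical (the A records echo the tree slide); a real resolver speaks the DNS protocol over the network and honors TTLs.

```python
# Hypothetical in-memory name servers. Each maps a name (or zone) to either an
# answer ("A", address) or a referral ("NS", server for that child zone).
SERVERS = {
    "root-server": {"edu.": ("NS", "edu-server")},
    "edu-server": {"cornell.edu.": ("NS", "cornell-server")},
    "cornell-server": {"cs.cornell.edu.": ("NS", "cs-server")},
    "cs-server": {
        "foo.cs.cornell.edu.": ("A", "10.1.1.1"),
        "bar.cs.cornell.edu.": ("A", "10.1.1.1"),
    },
}


def suffixes(name: str) -> list[str]:
    """foo.cs.cornell.edu. -> [foo.cs.cornell.edu., cs.cornell.edu., cornell.edu., edu.]"""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def ask(server: str, name: str) -> tuple[str, str, str]:
    """One iterative query: the server returns its closest match as (owner, type, value)."""
    data = SERVERS[server]
    for owner in suffixes(name):  # longest match first
        if owner in data:
            rr_type, value = data[owner]
            return owner, rr_type, value
    raise LookupError(f"{server} has no data for {name}")


class RecursiveResolver:
    """Handles stub resolvers' recursive queries by making iterative queries."""

    def __init__(self, root_server: str = "root-server", max_steps: int = 16):
        self.root_server = root_server  # statically configured root server
        self.max_steps = max_steps
        self.answers: dict[str, str] = {}    # name -> address (TTLs omitted here)
        self.referrals: dict[str, str] = {}  # zone -> name server (cached NS RRs)
        self.queries_sent = 0

    def _closest_known_server(self, name: str) -> str:
        for zone in suffixes(name):
            if zone in self.referrals:
                return self.referrals[zone]
        return self.root_server

    def resolve(self, name: str) -> str:
        if name in self.answers:
            return self.answers[name]
        server = self._closest_known_server(name)
        for _ in range(self.max_steps):
            self.queries_sent += 1
            owner, rr_type, value = ask(server, name)
            if rr_type == "A":
                self.answers[name] = value
                return value
            self.referrals[owner] = value  # remember the NS referral
            server = value                 # and follow it one level down the tree
        raise LookupError(f"too many referrals while resolving {name}")


resolver = RecursiveResolver()
print(resolver.resolve("foo.cs.cornell.edu."), resolver.queries_sent)
print(resolver.resolve("bar.cs.cornell.edu."), resolver.queries_sent)
```

The first lookup costs four queries (root, edu., cornell.edu., cs.cornell.edu.). The second goes straight to the cs.cornell.edu. server because the NS referral is cached, which is how caching keeps load off the top of the hierarchy.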
DNS cache management
All RRs have Time-to-live (TTL) values
  • When the TTL expires, cache entries are removed
NS RRs tend to have long TTLs
  • Cached for a long time
  • Reduces load on higher-level servers
A RRs may have much shorter TTLs
  • On the order of one minute for some web services
  • On the order of one day for typical hosts
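A minimal sketch of TTL-based cache expiry. The `TtlCache` class, the fake clock, and the cached values (including the documentation address 192.0.2.1) are illustrative assumptions; the two TTLs mirror the one-day and one-minute figures above.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """DNS answer cache: each entry is dropped TTL seconds after it was stored."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so the example can use a fake clock
        self._entries: dict[tuple[str, str], tuple[str, float]] = {}

    def put(self, name: str, rr_type: str, value: str, ttl: float) -> None:
        self._entries[(name, rr_type)] = (value, self._clock() + ttl)

    def get(self, name: str, rr_type: str) -> Optional[str]:
        entry = self._entries.get((name, rr_type))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rr_type)]  # TTL expired: remove the entry
            return None
        return value


now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("cs.cornell.edu.", "NS", "cs-server", ttl=86_400)  # NS: about one day
cache.put("www.cnn.com.", "A", "192.0.2.1", ttl=60)          # A: about one minute
now[0] = 120.0
print(cache.get("www.cnn.com.", "A"), cache.get("cs.cornell.edu.", "NS"))
```

Two minutes later the short-lived A record is gone and must be re-fetched, while the NS record is still answered from cache.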
Caching is the key to performance
Without caching, the small number of machines at the top of the hierarchy would be overwhelmed
But what if you want to change the IP address of a host? How do you change all those cached entries around the world? You can’t… you wait until they time out on their own, then make your change
Changing a DNS name
Say your TTL was set to one day
  • This means that even if you change DNS now, some hosts will continue to use the old address for a day
So, give the host two IP addresses for a while (the old one and the new one)
  • But DNS only answers with the new one
After a day, the old one is cleaned out of caches, and you can remove it from the host
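The timing argument as a small worked function: a resolver that cached the old answer just before the change may keep using it for up to one TTL, so both addresses must stay on the host until the change time plus the TTL. The function name and the two documentation addresses are hypothetical.

```python
DAY = 24 * 60 * 60  # seconds


def addresses_in_use(t: float, change_time: float, ttl: float, old: str, new: str) -> set[str]:
    """Addresses clients may still use at time t if DNS switched from old to new at change_time.

    A cache filled just before the change can hold the old answer for up to one TTL.
    """
    if t < change_time:
        return {old}
    if t < change_time + ttl:
        return {old, new}
    return {new}


old, new = "192.0.2.10", "198.51.100.10"  # hypothetical addresses
for hours in (0, 12, 24):
    print(hours, sorted(addresses_in_use(hours * 3600, 0, DAY, old, new)))
```

With a one-day TTL, both addresses are in use for the whole first day; only from hour 24 onward can the old one be removed.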
DNS Issues
DoS attacks on the (13) root servers
  • DoS = Denial of Service
Misconfiguration issues
But on the whole DNS is an incredible system, and is in many important respects the “core” of the Internet
Next, Content Distribution Networks
Idea here is to replicate a “web server” in many places over the Internet
  • Latency to a single centralized web server farm may be too high
  • A centralized web server farm may fail
Content Routing Principle (a.k.a. Content Distribution Network)
[Figure: Internet topology of sites (S) attached to ISPs and backbone ISPs, which interconnect at exchange points (IX), plus hosting centers]
[Figure: the same topology with content servers (CS) and an origin server (OS) added]
Content origin here, at origin server
Content servers distributed throughout the Internet
[Figure: clients (C) fetching from nearby content servers (CS) rather than from the origin server (OS)]
Content is served from content servers nearer to the client
Two basic types of CDN: cached and pushed
[Figure: clients (C), content servers (CS), and the origin server (OS) on the same topology]
Cached CDN
[Figure: clients (C), content servers (CS), and the origin server (OS)]
1. Client requests content.
2. CS checks cache; if miss, gets content from origin server.
3. CS caches content, delivers to client.
4. Delivers content out of cache on subsequent requests.
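The four steps map onto a pull-through cache. A minimal sketch, assuming a hypothetical `fetch_from_origin` callback in place of a real HTTP fetch to the origin server; it has no expiry or size limit, which a real CS would need.

```python
from typing import Callable


class CachingContentServer:
    """A content server (CS) in a cached CDN: pull from the origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch_from_origin = fetch_from_origin  # stands in for an HTTP fetch to the OS
        self._cache: dict[str, bytes] = {}
        self.origin_fetches = 0

    def handle_request(self, url: str) -> bytes:
        # Step 1: a client request arrives. Step 2: check the cache.
        if url not in self._cache:
            # Miss: get the content from the origin server, then cache it (step 3).
            self.origin_fetches += 1
            self._cache[url] = self._fetch_from_origin(url)
        # Deliver to the client; later requests are served from cache (step 4).
        return self._cache[url]


origin_content = {"/images/logo.gif": b"GIF89a..."}  # hypothetical origin data
cs = CachingContentServer(origin_content.__getitem__)
for _ in range(3):
    cs.handle_request("/images/logo.gif")
print(cs.origin_fetches)
```

Three requests cause a single origin fetch. The cache key is only the URL, which is why personalized content does not fit this model.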
Pushed CDN
[Figure: clients (C), content servers (CS), and the origin server (OS)]
1. Origin Server pushes content out to all CSs.
2. Request served from CSs.
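For contrast, a sketch of the pushed model: the origin publishes each object to every CS ahead of time, and a CS only serves what it has received. The class names, CS names, and URL are hypothetical.

```python
from typing import Optional


class PushedContentServer:
    """A CS in a pushed CDN: it only serves what the origin has pushed to it."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, bytes] = {}

    def receive_push(self, url: str, content: bytes) -> None:
        self.store[url] = content

    def handle_request(self, url: str) -> Optional[bytes]:
        # Step 2: serve directly from local storage; None means "never pushed here".
        return self.store.get(url)


class OriginServer:
    def __init__(self, content_servers: list[PushedContentServer]):
        self.content_servers = content_servers

    def publish(self, url: str, content: bytes) -> None:
        # Step 1: push the object to every CS before any client asks for it.
        for cs in self.content_servers:
            cs.receive_push(url, content)


servers = [PushedContentServer(f"cs{i}") for i in range(3)]  # hypothetical CS names
origin = OriginServer(servers)
origin.publish("/video/intro.mpg", b"...")
print([cs.handle_request("/video/intro.mpg") for cs in servers])
```

No request ever waits on the origin, but every CS stores every object, and each change must be republished to all of them to keep them synchronized.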
CDN benefits
Content served closer to client
  • Less latency, better performance
Load spread over multiple distributed CSs
  • More robust (to ISP failure as well as other failures)
  • Handles flash crowds better (load spread over ISPs)
But well-connected, replicated hosting centers can do this too
CDN costs and limitations
Cached CDNs can’t deal with dynamic/personalized content
  • More and more content is dynamic
  • “Classic” CDNs limited to images
Managing content distribution is non-trivial
  • Tension between content lifetimes and cache performance
  • Dynamic cache invalidation
  • Keeping pushed content synchronized and current
What if lots of clients try to access the same CS?
[Figure: many clients (C) converging on one content server (CS)]
How can the CDN spread this load around?
[Figure: the same crowd of clients (C) and the available content servers (CS)]
Guess what: DNS!
Smart DNS server monitors load on the content servers
When it answers a DNS request, it picks a server that is not overloaded (and near the client)
The DNS answer has a small TTL (30 seconds to one minute)
  • Small TTL allows the DNS load balancer to make fine-grained load decisions
  • Can quickly offload a busy or even crashed content server
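A sketch of the “smart DNS server” decision: skip crashed servers, prefer ones under a load threshold, pick the nearest of those, and answer with a short TTL. The 80% threshold, the 30-second TTL, the distance estimates, and the documentation addresses are assumptions. A real system also has to estimate distance somehow; note that the DNS server sees the query from the client's recursive resolver, not from the client itself.

```python
from dataclasses import dataclass


@dataclass
class ContentServer:
    address: str
    load: float         # fraction of capacity in use, from the DNS server's monitoring
    distance_ms: float  # estimated network distance to the querying client
    alive: bool = True


def pick_server(servers: list[ContentServer], max_load: float = 0.8,
                ttl: int = 30) -> tuple[str, int]:
    """Answer an A query with a nearby, not-overloaded server and a short TTL."""
    live = [s for s in servers if s.alive]  # crashed servers are never handed out
    if not live:
        raise LookupError("no live content servers")
    usable = [s for s in live if s.load < max_load]
    if not usable:
        usable = [min(live, key=lambda s: s.load)]  # everyone busy: least-loaded one
    best = min(usable, key=lambda s: s.distance_ms)
    return best.address, ttl


servers = [
    ContentServer("192.0.2.1", load=0.95, distance_ms=5),    # nearest, but overloaded
    ContentServer("192.0.2.2", load=0.40, distance_ms=20),
    ContentServer("192.0.2.3", load=0.10, distance_ms=80),
    ContentServer("192.0.2.4", load=0.00, distance_ms=1, alive=False),  # crashed
]
print(pick_server(servers))
```

Because the answer expires within seconds, the next query from the same resolver reflects updated load and liveness information.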
How well do CDNs work?
Hard to say…
  • Some evidence suggests they are not so good at picking nearby servers
  • Internet bandwidth is improving, so it is less important to pick nearby servers
  • Central hosting centers are easier to manage, and perform increasingly well
  • In fact, Akamai is beginning to find it difficult to justify its service!