National Level Conference On Information Processing 2011
CONTENT DISTRIBUTION NETWORKS
Trupti V.G [1], Rekha V [2]
1 - M.Tech (CE), 1st sem; 2 - Asst. Prof.
Dept of Computer Science & Engineering
SJB Institute of Technology, Bangalore
[1] [email protected] [2] [email protected]
Abstract- This paper introduces Content Delivery Networks (CDNs) and shows how traffic is reduced by distributing content across several replica servers, making the content highly accessible and available. Success in Internet applications involves user interactions whose quality is mainly affected by application response time. CDNs have recently emerged as a distributed solution that serves content faster than contacting a centralized server. The paper also describes the CDN architecture, in which clients, origin servers and surrogate servers form a CDN topology: a set of surrogate servers (distributed around the world) that cache the origin server's content; routers and network elements that deliver content requests to the optimal location and the optimal surrogate server; and an accounting mechanism that provides logs and information to the origin servers. Under a CDN, client-server communication is replaced by two communication flows: one between the client and the surrogate server, and another between the surrogate server and the origin server. This distinction into two communication flows reduces congestion (particularly at popular servers) and increases content distribution and availability. This report describes a CDN from a different point of view, paying particular attention to the implementation of a CDN. More specifically, CDNs maintain multiple Points of Presence (PoPs) with clusters of (so-called surrogate) servers that store copies of identical content, such that users' requests are satisfied by the most appropriate site.
Keywords- surrogate servers, Multicast, Computer Networking
I INTRODUCTION
With the growth of the World Wide Web, traffic on many popular websites increased: client requests multiplied, and all of them had to be serviced by the server. These websites therefore have a strong motivation to provide better service.
Thus CDNs came into existence, and they have been used to provide better service. A Content Delivery Network (CDN) is an overlay network on top of the Internet which pushes content closer to end users. This is achieved by strategically placing servers, called surrogates, next to these users and serving them the desired content. The surrogates typically act as intelligent, transparent proxy caches that retrieve content from the origin server before responding. As the origin server is accessed less often, backbone traffic is reduced and network bandwidth is used efficiently. Besides, load can be balanced among the servers. CDN research has primarily focused on techniques for efficiently redirecting user requests to appropriate surrogates, in order to reduce request latency and balance load, and on placement strategies that position server replicas to achieve better performance. Many CDN service providers, such as Akamai, offer overview whitepapers but hide the real implementation as a trade secret and a fundamental key to their business success. This paper tries to provide some implementation hints for establishing the basis of a CDN. It also presents a CDN model architecture, describing the main building blocks from an implementation point of view. A CDN attempts to reduce network latency by avoiding congested paths.
CDNs maintain multiple Points of Presence (PoPs) with clusters of (so-called surrogate) servers that store copies of identical content, such that users' requests are satisfied by the most appropriate site.
II INSIGHT AND PERSPECTIVES OF CDNs
A Content Delivery Network (CDN) is an
overlay network on top of the Internet which
pushes content closer to end users. It is achieved by
strategically placing servers, called surrogates, next
to these users and serving them the desired content.
A. CDN Topology
The CDN topology involves:
A set of surrogate servers (distributed around the world) that cache the origin server's content;
Routers and network elements that deliver content requests to the optimal location and the optimal surrogate server; and
An accounting mechanism that provides logs and information to the origin servers.
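The three topology elements and the two communication flows can be sketched as a minimal data model. All class and field names below are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class SurrogateServer:
    """A surrogate that caches copies of origin content."""
    name: str
    region: str
    cache: dict = field(default_factory=dict)  # url -> content

@dataclass
class OriginServer:
    content: dict  # url -> content

@dataclass
class CDN:
    origin: OriginServer
    surrogates: list
    logs: list = field(default_factory=list)  # the accounting mechanism

    def fetch(self, surrogate: SurrogateServer, url: str) -> str:
        # First flow: client <-> surrogate. On a cache miss, the second
        # flow (surrogate <-> origin) fills the surrogate's cache.
        if url not in surrogate.cache:
            surrogate.cache[url] = self.origin.content[url]
        self.logs.append((surrogate.name, url))  # accounting log entry
        return surrogate.cache[url]
```

After the first fetch the surrogate serves the content from its cache, so the origin is contacted only once per object, which is the source of the congestion reduction described above.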
Under a CDN, the client-server communication is replaced by two communication flows: one between the client and the surrogate server, and another between the surrogate server and the origin server. This distinction into two communication flows reduces congestion (particularly at popular servers) and increases content distribution and availability. To maintain (worldwide) distributed copies of identical content, the practice for a CDN is to locate its surrogate servers within strategic data centers (relying on multiple network providers), over a globally distributed infrastructure.
Dept of ISE, AMCEC 18
Some notable content delivery providers are Akamai Technologies, the leading one, with more than 20,000 servers over 10,000 networks, and Inktomi, a Yahoo company, which provides services for load balancing and streaming media.
B. Content Delivery Networks Worldwide
Organizations offering content to a geographically distributed and potentially large audience (such as the Web) are attracted to CDNs; the trend is for them to sign a contract with a CDN provider and offer their site's content over this CDN. CDNs are widely used in the Web community, but a fundamental problem is that the costs involved are quite high.
Fig 1. Content Delivery Networks Overview
C. Issues Involved in CDNs
Surrogate Server Placement: Choosing the best location for each surrogate server is important for every CDN infrastructure, since the location of the surrogate servers affects important issues in the content delivery process. Determining the best network locations for CDN surrogate servers (known as the Web server replica placement problem) is critical for content outsourcing performance and for the overall content distribution process. The CDN topology is built such that client-perceived performance is maximized and the infrastructure's cost is minimized. Effective surrogate server placement may therefore reduce the number of surrogate servers needed and the size of the content replicated on them, in an effort to combine high quality of service with low CDN prices. In this context, several placement algorithms have been proposed (such as Greedy, which incrementally places replicas; Hot Spot, which places replicas near the clients generating the greatest load; and Tree-based placement). These algorithms specify the locations of the surrogate servers in order to achieve improved performance at low infrastructure cost. Earlier experimentation has shown that the greedy placement strategy can yield close to optimal performance.
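The greedy strategy mentioned above can be sketched as follows, assuming a known client-to-site distance matrix; the cost model (total distance from each client to its nearest replica) and all names are illustrative:

```python
def greedy_placement(clients, sites, dist, k):
    """Incrementally pick k surrogate sites: at each step choose the
    site that most reduces the total client-to-nearest-replica distance."""
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for s in sites:
            if s in chosen:
                continue
            trial = chosen + [s]
            # total distance from every client to its closest replica
            cost = sum(min(dist[c][t] for t in trial) for c in clients)
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen
```

Each iteration is a full scan over the candidate sites, which is why greedy placement is tractable yet, as the experiments cited above indicate, close to optimal in practice.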
Content Selection: Choosing the content that should be outsourced in order to meet customers' needs is another important issue, known as the content selection problem. An obvious choice is to outsource the entire set of the origin server's objects to the surrogate servers (so-called entire replication). The greatest advantage of entire replication is its simplicity; however, such a solution is not feasible or practical because, although disk prices are continuously dropping, the sizes of Web objects (such as audio or video on demand) keep increasing as well. Moreover, the problem of updating such a huge collection of Web objects is unmanageable. Therefore, the challenge of the content selection problem is to find a sophisticated management strategy for the replication of Web content.
A typical way is to group Web content based on
either correlation or access frequency and then
replicate objects in units of content clusters.
Two types of content clustering have been
proposed:
User sessions-based: The content of the Web log files is exploited in order to group together sets of user navigation sessions showing similar characteristics. Clustering user sessions is useful for discovering both groups of users exhibiting similar browsing patterns and groups of pages having related content, based on how often URL references occur together across sessions.
URL-based: Web content is clustered using the Web site topology (considered as a directed graph), where Web pages are vertices and hyperlinks are arcs. The Web pages (URLs) are clustered by eliminating arcs between dissimilar pages. The most popular objects of a Web site are identified (the so-called hot data) and replicated in units of clusters, where the correlation distance between every pair of URLs is based on a certain correlation metric. Furthermore, several coarse-grain dynamic replication schemes are used. With these replication schemes, the performance of Web services can be significantly improved.
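The URL-based scheme above can be sketched as follows. A precomputed pairwise similarity score stands in for the paper's correlation metric (a hypothetical simplification), arcs below a similarity threshold are eliminated, and the surviving connected components are the content clusters:

```python
def cluster_urls(edges, similarity, threshold):
    """Keep only hyperlink arcs whose endpoints are similar enough,
    then group pages into connected components (the clusters)."""
    # Build an undirected adjacency structure from the surviving arcs.
    adj = {}
    for u, v in edges:
        if similarity.get((u, v), similarity.get((v, u), 0.0)) >= threshold:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    pages = {p for e in edges for p in e}
    seen, clusters = set(), []
    for p in sorted(pages):
        if p in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, comp = [p], set()
        while stack:
            q = stack.pop()
            if q in comp:
                continue
            comp.add(q)
            stack.extend(adj.get(q, ()))
        seen |= comp
        clusters.append(comp)
    return clusters
```

Replication then proceeds per cluster rather than per object, which is the unit-of-clusters idea described above.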
D. CDN Pricing
Commercially oriented Web sites turn to CDNs to contend with high traffic while providing high data quality and increased security for their clients, in order to increase their profit and popularity. CDN providers charge their customers (the owners of the Web sites) according to the traffic delivered by their surrogate servers to the clients.
The most indicative factors affecting the pricing of CDN services include:
Bandwidth cost;
Variation in traffic distribution;
Size of content replicated over surrogate servers;
Number of surrogate servers;
Reliability and stability of the whole system; and
Security issues of outsourcing content delivery.
E. Meeting CDN User Preferences
Meeting user preferences is crucial for CDNs, which adopt a content management task by which the content is personalized to meet the specific needs of each individual user. User preferences are learnt from Web usage data by using data mining techniques. Some indicative objectives of content personalization over CDNs are:
Deliver the appropriate content to the interested users in a timely, scalable, and cost-effective manner;
Increase the quality of the published content by ensuring it is accurate, authorized, updated, easily searched and retrieved, as well as personalized according to various users and audiences;
Manage the content throughout its entire life cycle, from creation, acquisition, or migration to publication and retirement; and
Meet security requirements, since introducing content personalization on CDNs raises security issues such as authentication, signing, encryption, access control, auditing, and resource control for ensuring content security and users' privacy.
III. CONTENT DISTRIBUTION
ARCHITECTURE
The architecture comprises six basic elements. The relationships between the blocks are as follows:
The origin server delegates its URI namespace to the request routing system (1) and publishes content (2) to be distributed to the remote surrogates (3) by the distribution system. The client requests content from what it perceives to be the origin server, but the request is handled by the request routing system (4), which redirects the client to the optimal surrogate server (5). The surrogate servers periodically send information to the accounting system (6), which summarizes it into detailed statistics and sends it as feedback to the origin server (7) and the request routing system.
Fig 2. General Architecture of CDN
A. Sequence of Actions Taken During a Content Transaction
1. The client connects to a portal, e.g. www.portal.com, through a web browser. A portal consists of a set of surrogates that together build a CDN.
2. The request is processed by the authoritative DNS server, which is responsible for mapping the name www.portal.com into at least one IP address. This is the best point at which to introduce the Request Routing System, and it is the one mostly used by current CDN companies. In fact, the DNS server is nothing but an interface: another process, call it the Redirector, is the one in charge of determining the optimal surrogate.
3. The Redirector process is mainly composed of an algorithm that accepts input parameters and produces a response, typically a list of IP addresses. The choice of an appropriate server depends on client proximity, server overhead and network congestion.
4. Tracking server overhead and network congestion implies some kind of continuous monitoring of the system, for example through SNMP. This is addressed by another process, say the SNMP Monitor, responsible for capturing periodic information about the servers and the network.
5. The client retrieves a list of IP addresses ordered (best first) by the performance estimated by the Redirector process. Once the client enters the portal through one of the surrogates, it has to select a content item. This content is typically in a multimedia format and is delivered as a stream by a media server, so both a web server and a media server are needed.
6. Once the desired content is selected by a user, a new resolution phase is needed, as the selected content represents a new input parameter. It is also important to note that the target web surrogates may differ from the target media surrogates. This resolution phase takes place at the HTTP level, with the first contacted surrogate acting as the interface.
7. In order to deliver the content as a multimedia stream, some kind of plug-in is required inside the browser, such as RealPlayer, QuickTime or, in an open way, a simple Java applet. This plug-in connects to the media server in order to retrieve the content.
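The two resolution phases in the sequence above can be sketched as follows. The `make_redirector` scoring table is a hypothetical stand-in for the Redirector's real selection algorithm, and all names are illustrative:

```python
def dns_phase(hostname, cdn_sites, redirector):
    """Phase 1 (steps 1-4): the authoritative DNS checks whether the
    site belongs to a CDN and, if so, asks the Redirector for a list
    of surrogate IPs ordered best-first."""
    if hostname not in cdn_sites:
        return None  # fall back to ordinary hierarchical DNS
    return redirector(cdn_sites[hostname], content=None)

def portal_phase(cdn_id, content, redirector):
    """Phase 2 (steps 5-7): once the client selects a content item,
    the serving portal asks the Redirector again, this time with the
    content as an extra input parameter."""
    ips = redirector(cdn_id, content=content)
    return ips[0]  # IP embedded in the media-player applet

def make_redirector(scores):
    """Hypothetical Redirector: ranks surrogates by a per-content
    score (lower is better)."""
    def redirector(cdn_id, content):
        table = scores[(cdn_id, content)]
        return sorted(table, key=table.get)
    return redirector
```

Keeping the two phases separate mirrors the observation in step 6 that the target web surrogates and target media surrogates may differ.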
B. A Look at the Components
Some of the components of the architecture are described below:
DNS server: The function of our DNS is simply to map CDN server names into CDN identifiers. Once a client request for a certain website arrives at the DNS server, it is filtered depending on the content: if the site is associated with a certain CDN, the DNS server obtains the corresponding CDN identifier and resends the request to the Redirector module. Otherwise, the request can be forwarded to a local DNS server following the hierarchical DNS operation.
Fig 3. Data Exchange between the Modules
Redirector module: The Redirector is a key process of the whole system, as it is the one in charge of deciding an adequate surrogate for each client request. There are two different, though similar, functional modes, related to the number of input parameters that the included algorithm supports. In the first mode, which takes place during the DNS resolution phase, the Redirector module retrieves the CDN identifier and a client IP address. The latter parameter (the IP address) is unnecessary at this stage if only scalability is targeted. The second mode takes place after the client has selected the content. This time the surrogate that is serving the client has to interact in the background with the Redirector module to retrieve an optimal surrogate for serving this content, which is a key parameter in the selection strategy.
It is also important to serve content from a nearby surrogate in order to obtain a low response time; therefore, client proximity is estimated and taken into account. If the CDN environment remains local (iCDN) and the number of surrogates is not considerable, a simple way of calculating proximity consists of sending pings from each surrogate to the client within a time interval; each of the surrogates will then serve a client request with the same probability. If a surrogate is loaded above an established limit, it will not be considered by the algorithm.
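This selection rule can be sketched as follows, assuming precomputed ping RTTs and load figures. The 10 ms equivalence band below is an illustrative choice, not from the paper:

```python
import random

def select_surrogate(surrogates, rtt_ms, load, load_limit):
    """Discard surrogates loaded above the limit, then pick among the
    remainder; surrogates with near-equal RTT serve a request with the
    same probability (the iCDN case described above)."""
    candidates = [s for s in surrogates if load[s] <= load_limit]
    if not candidates:
        return None  # triggers the empty-list / error-message path
    best_rtt = min(rtt_ms[s] for s in candidates)
    # Treat surrogates within 10 ms of the best as equivalent (assumption).
    near = [s for s in candidates if rtt_ms[s] - best_rtt <= 10]
    return random.choice(near)
```

Returning None when every surrogate is overloaded corresponds to the empty-list case handled in the data-exchange description later in this section.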
The SNMP Monitor captures status information from the surrogates. This information is of two types: on the one hand, the monitor stores data about the available resources at each portal or surrogate (memory, CPU utilization and number of connections); on the other hand, it tracks information about the network status between clients and portals. Whereas the first type of data is read periodically, the second type is requested asynchronously by the Redirector module each time a client issues a request to the CDN.
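The monitor's two data paths might be structured as below; `poll_agent` and `probe_network` are hypothetical stand-ins for real SNMP and ping calls:

```python
class SNMPMonitor:
    """Sketch of the monitor's two data paths: periodic resource polls
    (memory, CPU, connections) and asynchronous per-request network
    probes. The callables are injected stand-ins for real SNMP/ping."""

    def __init__(self, poll_agent, probe_network):
        self.poll_agent = poll_agent        # surrogate -> resource dict
        self.probe_network = probe_network  # (client, portal) -> metrics
        self.resources = {}                 # refreshed periodically
        self.network = {}                   # filled on demand

    def periodic_poll(self, surrogates):
        # First path: scheduled reads of per-surrogate resources.
        for s in surrogates:
            self.resources[s] = self.poll_agent(s)

    def on_client_request(self, client, portal):
        # Second path: asynchronous probe driven by the Redirector
        # each time a client issues a request to the CDN.
        key = (client, portal)
        self.network[key] = self.probe_network(client, portal)
        return self.network[key]
```

Separating the two paths matches the text: resource data ages gracefully between polls, while network data must be fresh per request.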
The surrogates or portals act as CDN entry points for the clients and are in charge of serving them the desired content. The portals store static content (web pages) and generate dynamic content. Once a portal receives a client request for streaming media content, it first interacts with the Redirector module to obtain the IP address of an optimal surrogate. The portal then generates an applet that contains a media player and sends it to the client, including the IP address of the optimal server. The client then runs the applet and plays the multimedia content.
The CDN Manager is responsible for initializing all CDN parameter values, as well as for managing how and where to store content according to a certain policy. That includes cache time control, content transfers between portals, content inclusion, content deletion, etc.
C. Database Design
Any system that stores data and bases its behavior (at least partially) on stored data must include an effective design of its database structure. The database design is highly dependent on the content to be published. In the case of our CDN, there are several important databases associated with the different modules of the architecture.
There is a global content database that includes
three data tables:
table_lessons: it includes some important reference information (the title of the lesson, the corresponding subject, faculty and teacher).
lessons_CDN: it associates a lesson with a portal.
copies_lessons: it indicates which surrogate has an available copy of a certain lesson.
The first two data tables of the content database are remotely replicated on each surrogate, so that each surrogate has local knowledge of the content available in the CDN. The SNMP Monitor has its own database to store all the information obtained by the SNMP agents, either periodically (CPU usage, used memory and connections) or asynchronously (pings and network hops). Note that ping mechanisms may pose a problem if a client is behind a firewall that rejects ICMP messages. The redirection algorithm, as part of the Redirector module, also has its own database to store values of server load and server proximity.
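The three content tables could look as follows in SQL, sketched here with Python's sqlite3. The table names come from the text; every column name beyond them is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_lessons (          -- reference data per lesson
    lesson_id INTEGER PRIMARY KEY,
    title TEXT, subject TEXT, faculty TEXT, teacher TEXT);
CREATE TABLE lessons_CDN (            -- lesson -> portal association
    lesson_id INTEGER, portal TEXT);
CREATE TABLE copies_lessons (         -- which surrogate holds a copy
    lesson_id INTEGER, surrogate TEXT);
""")
conn.execute("INSERT INTO table_lessons VALUES "
             "(1, 'Queuing Theory', 'Networks', 'Engineering', 'Smith')")
conn.execute("INSERT INTO lessons_CDN VALUES (1, 'www.portal.com')")
conn.execute("INSERT INTO copies_lessons VALUES (1, 'surrogate-eu-1')")

# Which surrogates can serve lesson 1?
rows = conn.execute(
    "SELECT surrogate FROM copies_lessons WHERE lesson_id = 1").fetchall()

# Which portal offers the lesson titled 'Queuing Theory'?
portal = conn.execute(
    "SELECT c.portal FROM table_lessons l "
    "JOIN lessons_CDN c ON l.lesson_id = c.lesson_id "
    "WHERE l.title = ?", ("Queuing Theory",)).fetchone()[0]
```

Replicating table_lessons and lessons_CDN on every surrogate, as described above, lets such queries run locally without contacting a central database.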
D. Data Exchange between the Components
The good performance of a CDN depends significantly on the correct communication between the processes of the system. This communication takes place in the form of messages, whose exchange is illustrated in Fig. 3.
Two different routes can be distinguished:
A DNS resolution phase: it redirects a client to a portal using a load-balancing algorithm (4 steps); and
A portal resolution phase: when the client has already entered a portal and is going to select a streamed multimedia content item from a list of available ones (7 steps).
If no server is available, an empty list is
sent and an error message is forwarded to the
client. Besides these messages that occur in a
content transaction, there are additional ones
related to management tasks, such as content
transfers, cache control, etc.
IV. CONCLUSIONS
In this paper we described Content Delivery Networks: how content is distributed worldwide to provide better accessibility of data for the clients.
CDNs are still at an early stage of development and their future evolution remains an open issue. The challenge is to strike a delicate balance between costs and customer satisfaction. In this framework, caching-related practices, content personalization processes, and data mining techniques seem to offer an effective roadmap for the further evolution of CDNs.
CDNs can deliver data to a multicast group, so clients can join the group at any time they need particular content, by sending session messages. This technique saves network bandwidth and makes the system scalable.
V. FUTURE WORK
The client-server communication flow is replaced in a CDN by two communication flows, namely one between the origin server and the surrogate server and the other between the surrogate server and the client. Since CDNs support streaming media content, congestion can still take place; congestion control can therefore be applied to the communication flow between the client and the surrogate server to reduce it further.
REFERENCES
[1] Baruffa, G., Femminella, M., Frescura, F., Micanti, P., Parisi, A. and Reali, G., "Multicast Distribution of Digital Cinema", NEM Summit, September 2008.
[2] Byers, J. and Kwon, G., "STAIR: Practical AIMD Multirate Multicast Congestion Control", 3rd Int'l Workshop on Networked Group Communication, 2001, pp. 100-112.
[3] Floyd, S., Jacobson, V., Liu, C., McCanne, S. and Zhang, L., "SRM: Scalable Reliable Multicast. A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing", IEEE/ACM Transactions on Networking, 1997, pp. 784-803.
[4] Pallis, G. and Vakali, A., "Insight and Perspectives for Content Delivery Networks", Communications of the ACM, 49(1), 2006, pp. 101-106.
[5] Molina, B., Palau, C., Esteve, M., Alonso, I. and Ruiz, V., "On content delivery network implementation", Computer Communications, vol. 29, no. 12, pp. 2396-2412, September 2006.
[6] Matrawy, A. and Lambadaris, I., "A Survey of Congestion Control Schemes for Multicast Video Applications", IEEE Communications Surveys & Tutorials, 2004, pp. 22-31.
[7] Akamai Technologies, www.akamai.com.