
    National Level Conference On Information Processing 2011

    CONTENT DISTRIBUTION NETWORKS

    Trupti V.G [1], Rekha V [2]

    [1] M.Tech (CE), 1st sem; [2] Asst. Prof.

    Dept of Computer Science & Engineering

    SJB Institute of Technology, Bangalore

    [1] [email protected] [2] [email protected]

    Abstract- This paper introduces Content Delivery Networks (CDNs) and
    describes how traffic is reduced by distributing content across several
    surrogate servers, making the content highly accessible and available.
    Success in Internet applications involves user interactions whose
    quality is mainly affected by application response time. CDNs have
    emerged as a distributed solution that serves content faster than
    contacting a centralized server. The paper also describes the CDN
    architecture, in which clients, origin servers and surrogate servers
    form a CDN topology. The topology comprises a set of surrogate servers
    (distributed around the world) that cache the origin servers' content;
    routers and network elements that deliver content requests to the
    optimal location and the optimal surrogate server; and an accounting
    mechanism that provides logs and information to the origin servers.
    Under a CDN, the client-server communication is replaced by two
    communication flows: one between the client and the surrogate server,
    and another between the surrogate server and the origin server. This
    separation into two communication flows reduces congestion
    (particularly at popular servers) and increases content distribution
    and availability. This report tries to describe a CDN from a different
    point of view, paying particular attention to the implementation
    process of a CDN. More specifically, CDNs maintain multiple Points of
    Presence (PoP) with clusters of (so-called surrogate) servers that
    store copies of identical content, such that users' requests are
    satisfied by the most appropriate site.

    Keywords- surrogate servers, Multicast, Computer Networking

    I. INTRODUCTION

    With the growth of the World Wide Web, traffic to many popular websites
    has increased; that is, their servers have to service more client
    requests. These websites are motivated to provide better service, and
    CDNs came into existence to meet this need. A Content Delivery Network
    (CDN) is an overlay network on top of the Internet which pushes content
    closer to end users. This is achieved by strategically placing servers,
    called surrogates, next to these users and serving them the desired
    content. The surrogates typically act as intelligent and transparent
    proxy caches that have previously retrieved content from the origin
    server before responding. As the origin server is accessed less often,
    backbone traffic is reduced and network bandwidth is used efficiently.
    Besides, load can be balanced among the servers. Work on CDNs has
    primarily focused on techniques for efficiently redirecting user
    requests to appropriate surrogates to reduce request latency and
    balance load, and on placement strategies for server replicas to
    achieve better performance. Although many CDN service providers such as
    Akamai offer overview whitepapers, they keep the real implementation
    secret as a fundamental key to their business success. This paper tries
    to provide some implementation hints for establishing a basic CDN. It
    also presents a CDN model architecture with a description of the main
    building blocks from an implementation point of view. A CDN attempts to
    reduce network latency by avoiding congested paths.

    CDNs maintain multiple Points of Presence (PoP) with clusters of
    (so-called surrogate) servers that store copies of identical content,
    such that users' requests are satisfied by the most appropriate site.

    II. INSIGHT AND PERSPECTIVES OF CDNs

    A Content Delivery Network (CDN) is an

    overlay network on top of the Internet which

    pushes content closer to end users. It is achieved by

    strategically placing servers, called surrogates, next

    to these users and serving them the desired content.

    A. CDN Topology

    CDN topology involves:

    A set of surrogate servers (distributed around the world) that cache
    the origin servers' content;

    Routers and network elements that deliver content requests to the
    optimal location and the optimal surrogate server; and

    An accounting mechanism that provides logs and information to the
    origin servers.

    Under a CDN, the client-server communication is replaced by two
    communication flows: one between the client and the surrogate server,
    and another between the surrogate server and the origin server. This
    separation into two communication flows reduces congestion
    (particularly at popular servers) and increases content distribution
    and availability. To maintain (worldwide) distributed copies of
    identical content, the practice for a CDN is to locate its surrogate
    servers within strategic data centers (relying on multiple network
    providers), over a globally distributed infrastructure.

    Dept of ISE, AMCEC 18

    Among the CDN providers, Akamai Technologies is the leading one, with
    more than 20000 servers over 10000 networks. Inktomi, a Yahoo company,
    provides services for load balancing and streaming media.

    B. Content Delivery Network World Wide

    Organizations offering content to a geographically distributed and
    potentially large audience (such as the Web) are attracted to CDNs, and
    the trend is for them to sign a contract with a CDN provider and offer
    their site's content over this CDN. CDNs are widely used in the Web
    community, but a fundamental problem is that the costs involved are
    quite high.

    Fig 1. Content Delivery Networks Overview

    C. Issues involved in CDN

    Surrogate Servers Placement: Choosing the best location for each
    surrogate server is important for every CDN infrastructure, since the
    location of surrogate servers is related to important issues in the
    content delivery process. Determining the best network locations for
    CDN surrogate servers (known as the Web server replica placement
    problem) is critical for content outsourcing performance and the
    overall content distribution process. A CDN topology is built such that
    the client-perceived performance is maximized and the infrastructure's
    cost is minimized. Therefore, effective surrogate server placement may
    reduce the number of surrogate servers needed and the size of the
    content replicated on them, in an effort to combine high quality of
    service with low CDN prices. In this context, several placement
    algorithms have been proposed (such as Greedy, which incrementally
    places replicas; Hot Spot, which places replicas near the clients
    generating the greatest load; and tree-based replica placement). These
    algorithms specify the locations of the surrogate servers in order to
    achieve improved performance with low infrastructure cost. Earlier
    experimentation has shown that the greedy placement strategy can yield
    close to optimal performance.
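
    The greedy strategy mentioned above can be illustrated with a minimal
    sketch. It is not taken from the paper: the cost model (client load
    multiplied by distance to the nearest chosen site) and the names
    total_cost, greedy_placement, dist and load are assumptions made for
    this example only.

```python
from typing import List, Sequence


def total_cost(dist: Sequence[Sequence[float]],
               load: Sequence[float],
               chosen: Sequence[int]) -> float:
    """Assumed cost model: each client is served by its nearest chosen
    site, and pays its load multiplied by the distance to that site."""
    return sum(load[c] * min(dist[c][s] for s in chosen)
               for c in range(len(load)))


def greedy_placement(dist: Sequence[Sequence[float]],
                     load: Sequence[float],
                     k: int) -> List[int]:
    """Place up to k replicas one at a time. Each round adds the
    candidate site that gives the lowest total cost together with the
    sites already chosen; earlier choices are never revisited."""
    chosen: List[int] = []
    candidates = set(range(len(dist[0])))
    for _ in range(min(k, len(candidates))):
        # sorted() makes tie-breaking deterministic (lowest site index).
        best = min(sorted(candidates),
                   key=lambda s: total_cost(dist, load, chosen + [s]))
        chosen.append(best)
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    # dist[client][site]: hypothetical network distances.
    dist = [[1, 5, 9], [2, 4, 8], [9, 5, 1], [8, 4, 2]]
    load = [10, 1, 3, 1]  # hypothetical request volume per client
    sites = greedy_placement(dist, load, k=2)
    print(sites, total_cost(dist, load, sites))
```

    Because each round only extends the current selection, the result is
    not guaranteed to be optimal in general; the sketch only shows the
    incremental nature of the approach.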

    Content Selection: Choosing the content that should be outsourced in
    order to meet customers' needs is another important issue, known as the
    content selection problem. An obvious choice is to outsource the entire
    set of the origin server's objects to the surrogate servers (so-called
    entire replication). The greatest advantage of entire replication is
    its simplicity; however, such a solution is not feasible or practical
    because, although disk prices are continuously dropping, the sizes of
    Web objects are increasing as well (such as audio or video on demand).
    Moreover, the problem of updating such a huge collection of Web objects
    is unmanageable. Therefore, the challenge of the content selection
    problem is to find a sophisticated management strategy for the
    replication of Web content.

    A typical approach is to group Web content based on either correlation
    or access frequency and then replicate objects in units of content
    clusters. Two types of content clustering have been proposed:

    Users' sessions-based: The content of the Web log files is exploited in
    order to group together a set of users' navigation sessions showing
    similar characteristics. Clustering users' sessions is useful for
    discovering both groups of users exhibiting similar browsing patterns
    and groups of pages having related content, based on how often URL
    references occur together across them.

    URL-based: Web content is clustered using the Web site topology (which
    is considered as a directed graph), where Web pages are vertices and
    hyperlinks are arcs. The Web pages (URLs) are clustered by eliminating
    arcs between dissimilar pages. The most popular objects from a Web site
    (the so-called hot data) are identified and replicated in units of
    clusters, where the correlation distance between every pair of URLs is
    based on a certain correlation metric. Furthermore, several
    coarse-grain dynamic replication schemes are used. By using these
    replication schemes, the performance of Web services can be
    significantly improved.
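
    As a rough illustration of URL-based clustering, the sketch below
    removes arcs whose similarity falls under a threshold and treats the
    remaining connected groups of pages as clusters. The similarity scores,
    the threshold and the function name cluster_urls are hypothetical; the
    paper does not specify a particular correlation metric.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Arc = Tuple[str, str]


def cluster_urls(pages: Sequence[str],
                 arcs: Sequence[Arc],
                 similarity: Dict[Arc, float],
                 threshold: float) -> List[List[str]]:
    """Drop arcs between dissimilar pages, then return the connected
    components of what is left (arc direction is ignored here)."""
    neighbours = defaultdict(set)
    for src, dst in arcs:
        if similarity[(src, dst)] >= threshold:  # keep similar pages linked
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    clusters: List[List[str]] = []
    seen = set()
    for page in pages:
        if page in seen:
            continue
        # Depth-first walk collecting every page reachable from `page`.
        stack, component = [page], []
        seen.add(page)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    pages = ["/", "/news", "/news/a", "/video"]
    arcs = [("/", "/news"), ("/news", "/news/a"), ("/", "/video")]
    # Hypothetical similarity score for each hyperlink.
    similarity = {arcs[0]: 0.2, arcs[1]: 0.9, arcs[2]: 0.1}
    print(cluster_urls(pages, arcs, similarity, threshold=0.5))
```

    Each resulting cluster could then be replicated as a unit, with the
    most frequently requested clusters considered first.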

    D. CDN Pricing

    Commercially oriented Web sites turn to CDNs to contend with high
    traffic problems while providing high data quality and increased
    security for their clients, in order to increase their profit and
    popularity. CDN providers charge their customers (owners of Web sites)
    according to their traffic (delivered by their surrogate servers to the
    clients).

    The most indicative factors affecting the pricing of CDN services
    include:

    Bandwidth cost;

    Variation in traffic distribution;

    Size of content replicated over surrogate servers;

    Number of surrogate servers;

    Reliability and stability of the whole system; and

    Security issues of outsourcing content delivery.

    E. Meet CDN User Preferences

    Meeting user preferences is crucial for CDNs. It requires a content
    management task by which the content is personalized to meet the
    specific needs of each individual user. User preferences are learnt
    from Web usage data by using data mining techniques. Some indicative
    objectives of content personalization over CDNs are:

    Deliver the appropriate content to the interested users in a timely,
    scalable, and cost-effective manner;

    Increase the quality of the published content by ensuring it is
    accurate, authorized, updated, easily searched and retrieved, as well
    as personalized according to various users and audiences;

    Manage the content throughout its entire life cycle, from creation,
    acquisition, or migration to publication and retirement; and

    Meet security requirements, since introducing content personalization
    on CDNs will help address the security issues raised, such as
    authentication, signing, encryption, access control, auditing, and
    resource control, for ensuring content security and users' privacy.

    III. CONTENT DISTRIBUTION ARCHITECTURE

    The architecture comprises six basic elements. The relationships
    between blocks are as follows:

    The origin server delegates its URI namespace to the request routing
    system (1), and publishes content (2) to be distributed to the remote
    surrogates (3) by the distribution system. The client requests content
    from what it perceives to be the origin server, but the request is
    handled by the request routing system (4), which redirects the client
    to the optimum surrogate server (5). The surrogate servers periodically
    send information to the accounting system (6), which summarizes it in
    detailed statistics and sends it as feedback to the origin server (7)
    and the request routing system.

    Fig 2. General Architecture of CDN

    A. Sequence of actions taken during a content transaction

    1. The client connects to a portal, e.g. www.portal.com, through a web
       browser. A portal consists of a set of surrogates that together
       build a CDN.

    2. The request is processed by the authoritative DNS server, which is
       responsible for mapping the name www.portal.com to at least one IP
       address. This is the best point at which to introduce the Request
       Routing System, and it is the one mostly used by current CDN
       companies. In fact, the DNS server is nothing but an interface:
       another process, call it Redirector, is the one in charge of
       determining the optimal surrogate.

    3. The Redirector process mainly consists of an algorithm that accepts
       input parameters and produces a response, typically a list of IP
       addresses. The choice of an appropriate server depends on client
       proximity, server overhead and network congestion.

    4. Tracking server overhead and network congestion implies some form of
       continuous monitoring of the system, for example through SNMP. This
       is handled by another process, say SNMP Monitor, responsible for
       periodically capturing information about the servers and the
       network.

    5. The client retrieves a list of IP addresses in decreasing order of
       the performance estimated by the Redirector process. Once the client
       enters the portal through one of the surrogates, it has to select
       content. This content is typically in a multimedia format and is
       delivered as a stream by a media server, so both a web server and a
       media server are needed.

    6. Once the desired content is selected by a user, a new resolution
       phase is needed, as the selected content constitutes a new input
       parameter. It is also important to note that target web surrogates
       could be different from target media surrogates. The resolution
       phase takes place at the HTTP level, with the first contacted
       surrogate acting as the interface.

    7. In order to deliver the content in a streaming multimedia format,
       some kind of plug-in is required inside the browser, such as
       RealPlayer, QuickTime or, as an open alternative, a simple Java
       applet. This plug-in connects to the media server in order to
       retrieve the content.

    B. Look at the Components

    Some of the components of the architecture are given below:

    DNS server: The function of our DNS is simply to map CDN name servers
    to CDN identifiers. Once a client request for a certain website arrives
    at the DNS server, the server filters it depending on the content: if
    the site is associated with a certain CDN, the DNS server obtains the
    corresponding CDN identifier and passes the request on to the
    Redirector module. Otherwise, the request can be forwarded to a local
    DNS server following the hierarchical DNS operation.
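
    The filtering step performed by the DNS server can be pictured as a
    simple table lookup. The sketch below is only an illustration: the
    function name resolve, the identifier "cdn-1" and the returned labels
    "redirector" and "local_dns" are invented for this example and do not
    describe a real DNS implementation.

```python
from typing import Dict, Tuple


def resolve(site: str, cdn_ids: Dict[str, str]) -> Tuple[str, str]:
    """Decide where a name request goes next.

    If the site belongs to a CDN, return ("redirector", <CDN identifier>);
    otherwise return ("local_dns", <site>) so that ordinary hierarchical
    DNS resolution can take over.
    """
    # Host names are case-insensitive and may carry a trailing root dot.
    key = site.lower().rstrip(".")
    cdn_id = cdn_ids.get(key)
    if cdn_id is not None:
        return ("redirector", cdn_id)
    return ("local_dns", site)


if __name__ == "__main__":
    # Hypothetical mapping from site names to CDN identifiers.
    cdn_ids = {"www.portal.com": "cdn-1"}
    print(resolve("www.portal.com", cdn_ids))
    print(resolve("example.org", cdn_ids))
```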

    Fig 3. Data Exchange between the modules

    Redirector module: The Redirector is a key process of the whole system,
    as it is the one in charge of deciding on an adequate surrogate for
    each client request. There are two different, though similar,
    functional modes, which differ in the number of input parameters that
    the included algorithm supports. In the first mode, which takes place
    at the DNS resolution phase, the Redirector module receives the CDN
    identifier and a client IP address. The latter parameter (IP address)
    is unnecessary at this stage if only scalability is targeted. The
    second mode takes place after the client has selected the content. This
    time the surrogate that is serving the client has to interact in the
    background with the Redirector module to retrieve an optimal surrogate
    for serving this content, which is a key parameter in the selection
    strategy.

    It is also important to serve content from a nearby surrogate in order
    to obtain a low response time; therefore, client proximity is estimated
    and taken into account. If the CDN environment remains local (iCDN) and
    the number of surrogates is not considerable, a simple way of
    calculating proximity consists of sending pings from each surrogate to
    the client. Regarding load, when the surrogates are comparably loaded
    within a time interval, each of them will serve a client request with
    the same probability; if a surrogate is overloaded above an established
    limit, it will not be considered in the algorithm.
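
    A minimal sketch of such a selection step is given below. It is one
    possible reading of the description, not the algorithm of any
    particular CDN: the Surrogate fields, the default load limit of 0.8 and
    the rule of ordering by ping time first and load second are all
    assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Surrogate:
    ip: str
    load: float                     # assumed utilisation in [0.0, 1.0]
    rtt_ms: Optional[float] = None  # ping time to the client, if measured


def redirect(surrogates: List[Surrogate],
             load_limit: float = 0.8,
             rng: Optional[random.Random] = None) -> List[str]:
    """Return surrogate IPs, best candidate first.

    Surrogates loaded above load_limit are left out. If every remaining
    surrogate has a ping measurement, order by proximity and then by load;
    otherwise shuffle, so each one is equally likely to come first.
    """
    rng = rng or random.Random()
    usable = [s for s in surrogates if s.load <= load_limit]
    if all(s.rtt_ms is not None for s in usable):
        usable.sort(key=lambda s: (s.rtt_ms, s.load))
    else:
        rng.shuffle(usable)
    return [s.ip for s in usable]


if __name__ == "__main__":
    # Hypothetical surrogates; addresses are from a private range.
    pool = [
        Surrogate("10.0.0.1", load=0.30, rtt_ms=40.0),
        Surrogate("10.0.0.2", load=0.95, rtt_ms=5.0),   # overloaded
        Surrogate("10.0.0.3", load=0.50, rtt_ms=12.0),
    ]
    print(redirect(pool))
```

    When every surrogate exceeds the limit, the function returns an empty
    list, which the caller would have to treat as "no server available".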

    The SNMP Monitor captures status information from the surrogates. This
    information is of two types: on the one hand, the monitor stores data
    about available resources in each portal or surrogate (memory, CPU
    utilization and number of connections); on the other hand, the monitor
    tracks information about network status between clients and portals.
    Whereas the first type of data is read periodically, the second type is
    requested asynchronously by the Redirector module each time a client
    issues a request to the CDN.

    The surrogates or portals act as CDN entry points for the clients and
    are in charge of serving them the desired content. The portals store
    static content (web pages) and generate dynamic content. Once a portal
    receives a client request for streaming media content, it first
    interacts with the Redirector module to obtain an optimum surrogate IP
    address. After that, the portal generates an applet that contains a
    media player and sends it to the client, including the IP address of
    the optimum server. The client then starts the applet and plays the
    multimedia content.

    The CDN Manager is responsible for initializing all CDN parameter
    values, as well as managing how and where to store content according to
    a certain policy. That includes cache time control, content transfers
    between portals, content inclusion, content deletion, etc.

    C. Database Design

    Any system that stores data and bases its behavior (at least partially)
    on that data must include an effective design of its database
    structure. The database design is highly dependent on the content to be
    published. In the case of our CDN, there are various important
    databases associated with the different modules of the architecture.

    There is a global content database that includes three data tables:

    table_lessons: it includes some important information for reference
    (the title of the lesson, the corresponding subject, faculty and
    teacher).

    lessons_CDN: it associates a lesson with a portal.

    copies_lessons: it indicates which surrogate has an available copy of a
    certain lesson.

    The first two data tables of the content database are remotely
    replicated on each surrogate, so that each surrogate has local
    knowledge of the available content in the CDN. The SNMP Monitor has its
    own database to store all the information obtained by the SNMP agents,
    either periodically (CPU usage, used memory and connections) or
    asynchronously (pings and network hops). Note that ping mechanisms may
    pose a problem if a client is behind a firewall that rejects ICMP
    messages. The redirection algorithm, as part of the Redirector module,
    also has its own database to store values of server load and server
    proximity.
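
    The three tables of the content database could be laid out roughly as
    in the sketch below, which uses Python's built-in sqlite3 module with
    an in-memory database. Only the table names and the lesson attributes
    (title, subject, faculty, teacher) come from the text; the key columns,
    the column types and the helper functions are assumptions made for this
    example.

```python
import sqlite3
from typing import List

# Column names other than title/subject/faculty/teacher are hypothetical.
SCHEMA = """
CREATE TABLE table_lessons (
    lesson_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    subject   TEXT,
    faculty   TEXT,
    teacher   TEXT
);
CREATE TABLE lessons_CDN (
    lesson_id INTEGER REFERENCES table_lessons(lesson_id),
    portal    TEXT NOT NULL
);
CREATE TABLE copies_lessons (
    lesson_id INTEGER REFERENCES table_lessons(lesson_id),
    surrogate TEXT NOT NULL
);
"""


def open_content_db() -> sqlite3.Connection:
    """Create an empty in-memory content database with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def surrogates_with_copy(conn: sqlite3.Connection, title: str) -> List[str]:
    """List the surrogates that hold a copy of the lesson with this title."""
    rows = conn.execute(
        "SELECT c.surrogate FROM copies_lessons AS c "
        "JOIN table_lessons AS t ON t.lesson_id = c.lesson_id "
        "WHERE t.title = ? ORDER BY c.surrogate",
        (title,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_content_db()
    conn.execute("INSERT INTO table_lessons VALUES "
                 "(1, 'Networks 101', 'Networking', 'Engineering', 'A. Teacher')")
    conn.execute("INSERT INTO lessons_CDN VALUES (1, 'portal-a')")
    conn.executemany("INSERT INTO copies_lessons VALUES (?, ?)",
                     [(1, "surrogate-2"), (1, "surrogate-1")])
    print(surrogates_with_copy(conn, "Networks 101"))
```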

    D. Data Exchange between the Components

    Good CDN performance depends significantly on correct communication
    between the processes of the system. This communication takes place in
    the form of messages, whose exchange is illustrated in Fig. 3.

    Two different routes can be distinguished:

    A DNS resolution phase: it redirects a client to a portal using a load
    balancing algorithm (4 steps); and

    A portal resolution phase: it applies when the client has already
    entered a portal and is going to select streaming multimedia content
    from a list of available items (7 steps).

    If no server is available, an empty list is sent and an error message
    is forwarded to the client. Besides these messages that occur in a
    content transaction, there are additional ones related to management
    tasks, such as content transfers, cache control, etc.

    IV. CONCLUSIONS

    In this paper we described Content Delivery Networks and how content is
    distributed worldwide to give clients better access to data.

    CDNs are still in an early stage of development and their future
    evolution remains an open issue. The challenge is to strike a delicate
    balance between costs and customers' satisfaction. In this framework,
    caching-related practices, content personalization processes, and data
    mining techniques seem to offer an effective roadmap for the further
    evolution of CDNs.

    CDNs deliver data to a multicast group, so clients can join the group
    at any time they need particular content by sending session messages.
    This technique saves network bandwidth and makes the system scalable.

    V. FUTURE WORK

    In a CDN, the client-server communication flow is replaced by two
    communication flows, namely one between the origin server and the
    surrogate server and the other between the surrogate server and the
    client. Since CDNs support streaming media content, congestion can
    occur on the flow between the client and the surrogate server;
    congestion control applied to this flow could reduce it further.
