grid computing project

Upload: sujith-kms

Post on 07-Apr-2018


  • 8/6/2019 Grid Computing Project

    1/24

    Grid Computing

    Vincent Poon, University of Pennsylvania

    Oby Sumampouw, University of Pennsylvania

    ABSTRACT

    Grid computing brings the diverse resources of multiple administrative domains to bear on

    large scale computing problems. Recent advances in desktop computing power and

    network bandwidth have generated widespread interest and investment in grid

    technologies. This paper examines the current status of grid computing through a review

    of recent literature on the topic. The analysis focuses on how grids are implemented, the

    benefits and drawbacks of grid computing, and both public and private applications of grid

    technologies.

    Keywords: grid computing, distributed computing, Internet, Information Technology

    CIT 595


    Table of Contents

    Introduction

    Implementation
        What is the Grid?
        Grid Computing Architecture
        GRID Middleware (Globus Toolkit)
        GRID Framework

    Benefits

    Drawbacks, Risks, and Limitations
        Security
        Impact on Network Traffic
        Accounting and Charging for Grid Resources
        Amdahl's Law

    Applications
        SETI@Home
        Folding@Home
        Private/Corporate Applications

    Conclusion

    Appendix
        Task Summary


    Introduction

    Since the dawn of computers, there have always been computational problems requiring
    massive amounts of processing power. Despite Moore's Law, the large-scale calculations
    demanded by these problems have consistently exceeded the computational capabilities of even
    the fastest processors available. Several approaches have been taken to satisfy this demand,
    including supercomputing and cluster computing. These approaches typically rely on multiple
    processors or computers operating in parallel to act as a single, ultra-fast computer. They
    have had their share of successes, but have also been limited by high costs and short life
    spans.

    We think that two general trends in computing portend a potential explosion in grid
    computing. First, while personal computer processors have achieved exponential increases in
    speed and capability, the processing demands of the average user's typical workload have not
    kept pace. As a result, most CPUs remain idle for large amounts of time. Second, the rapid
    expansion of the Internet and broadband access has essentially created a high-speed network
    between most of the personal computers currently in use. Grid computing offers the potential
    to take advantage of these two trends by breaking up computational problems into smaller
    pieces and transmitting these pieces over the Internet to harness the large number of idle
    CPU cycles. In this way, grid computing represents a low-cost, efficient use of computer
    resources to solve large-scale computational problems.

    Implementation

    What is the Grid?

    The term "grid computing" is not rigidly defined. According to [1], grid computing has
    three characteristics, as shown in Figure 1:

    1.) Decentralized resource coordination:

    All resources within the network are handled at the local level. Grid computing handles
    the integration and distribution of users from multiple domains. The grid must also address
    the security issues that emerge from the interactions among many users. This approach is the
    opposite of traditional client-server resource coordination, where resources are heavily
    centralized on the server.

    2.) Open source, standard and general purpose protocols and interfaces:

    Grid computing is used to handle a wide variety of applications with users on different
    domains. Therefore it is important that the communication protocols among the nodes are
    implemented in a standard way. Open source development plays a significant role in ensuring
    that the protocols can be extended to serve specific applications.

    3.) Delivery of high quality services:

    Grid computing must be able to handle complex interactions among resources in a
    responsive and coordinated way.

    Figure 1. The Basic Foundation of a grid enabled application Source [2]

    In some scientific communities, grid computing refers to CPU scavenging, where idle
    machines are converted into a shared computing resource, such as the system used by
    SETI@home [3] to search for extraterrestrial life. However, based on the three criteria
    listed above, SETI@home and Folding@home should not be considered grid applications. The
    public nature of SETI@home and Folding@home creates security compromises, and they are prone
    to malicious attack [2]. In addition, we think that since SETI@home and Folding@home clients
    cannot interact with each other, they do not follow the specification of the grid computing
    model. However, since both projects are commonly acknowledged as grid computing examples [4],
    we will introduce a few distributed computing models that can be considered grid computing
    under a broader definition.

    a) Internet computing uses the Internet as a means of solving large problems. Large
    problems are divided into smaller subproblems and distributed through the Internet to small
    computing resources such as personal computers and laptops. SETI@home and Folding@home
    use this model. Resource nodes take part in Internet computing by installing a client
    program. The client program downloads a small problem, uses unused CPU cycles to solve it,
    and sends the solution back to the server. The server assigns a unique ID tag to each chunk
    of the problem, and each chunk is solved by several users. This redundant problem solving is
    conducted to maintain accuracy and to prevent backlog from nodes that fail to solve a
    problem.
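    The work-unit scheme described above can be sketched as follows. This is a minimal
    illustration under stated assumptions, not the actual SETI@home or Folding@home server
    logic: the names WorkUnitServer, REPLICAS, next_assignment, submit and accepted are
    hypothetical, and accepting a result by simple majority is an assumed policy.

```python
from collections import Counter, defaultdict
import itertools

REPLICAS = 3  # assumed redundancy level: each work unit goes to this many clients


class WorkUnitServer:
    """Toy server that tags chunks of a problem and validates redundant results."""

    def __init__(self, chunks):
        ids = itertools.count(1)
        # Assign a unique ID tag to each chunk of the problem.
        self.units = {next(ids): chunk for chunk in chunks}
        self.assigned = defaultdict(set)   # unit id -> clients holding that unit
        self.results = defaultdict(list)   # unit id -> answers returned so far

    def next_assignment(self, client):
        """Hand the client a unit it has not seen that still needs replicas."""
        for uid, chunk in self.units.items():
            holders = self.assigned[uid]
            if client not in holders and len(holders) < REPLICAS:
                holders.add(client)
                return uid, chunk
        return None

    def submit(self, uid, answer):
        self.results[uid].append(answer)

    def accepted(self, uid):
        """Return the majority answer once all replicas have reported, else None."""
        answers = self.results[uid]
        if len(answers) < REPLICAS:
            return None
        value, votes = Counter(answers).most_common(1)[0]
        return value if votes > REPLICAS // 2 else None


if __name__ == "__main__":
    server = WorkUnitServer([[1, 2, 3], [4, 5, 6]])
    for client in ("alice", "bob", "carol"):
        while (job := server.next_assignment(client)) is not None:
            uid, chunk = job
            # "carol" stands in for a faulty or dishonest node.
            server.submit(uid, sum(chunk) if client != "carol" else -1)
    print(server.accepted(1), server.accepted(2))
```

    In this sketch a single bad node is outvoted by the two honest replicas, which is the
    accuracy benefit the redundancy is meant to provide.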

    Management of unused CPU cycles is delegated to the client's operating system, usually by
    setting the default program priority to the lowest priority. For example, the Folding@home
    client program runs with a default niceness of 19, which is the lowest scheduling priority
    in Linux. Internet computing's major advantages are scalability and a high degree of
    independence from network latency due to its decentralized nature. However, due to the free
    and open nature of the client program, this computing model is prone to security attacks.
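    As a rough illustration of how a client can hand idle-cycle management to the operating
    system, the sketch below raises its own niceness to 19 before doing any work. The function
    name run_at_lowest_priority is hypothetical, and this is not the Folding@home client's code;
    it only uses Python's standard os.nice call, which exists on Unix-like systems.

```python
import os


def run_at_lowest_priority(task):
    """Lower this process to the lowest scheduling priority, then run task()."""
    if hasattr(os, "nice"):      # os.nice is only available on Unix-like systems
        current = os.nice(0)     # an increment of 0 just reports the current niceness
        os.nice(19 - current)    # raise niceness to 19, the lowest priority
    return task()


if __name__ == "__main__":
    total = run_at_lowest_priority(lambda: sum(i * i for i in range(1_000_000)))
    print(total)
```

    Note that an unprivileged process can raise its niceness but generally cannot lower it
    again, so the change lasts for the lifetime of the process.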

    b) P2P, or peer-to-peer, can be regarded as a storage grid. The advantage of a peer-to-peer
    distributed model is decentralized control. The nodes interact among themselves and relieve
    the central server of the burden of managing resources. Kazaa, Limewire and Napster are
    prime examples of the P2P model. In this model, resources such as data and network bandwidth
    are located in local clients called peers. Peers can share and leverage unused resources by
    aggregating cycles and sharing digital content. Available download bandwidth is directly
    correlated with the number of clients available. This is the greatest strength of the P2P
    model. Unfortunately, since P2P has no centralized control, it is hard to provide an
    efficient search mechanism, and it tends to create high network latency because the speed of
    the network depends on the number of users aggregated for a certain resource. To alleviate
    this problem, other P2P models have been developed: hybrid decentralized P2P, in which a
    server holds metadata about each resource so that searching is faster, and partially
    centralized P2P, in which several nodes are gathered and managed by one larger node that
    acts like a pseudo-server.
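    The hybrid decentralized model can be sketched in a few lines. The classes IndexServer and
    Peer and their methods are hypothetical names chosen for illustration; no real P2P client is
    being reproduced. The point is that the server stores only metadata (who holds what), while
    the content itself moves directly between peers.

```python
from collections import defaultdict


class IndexServer:
    """Central metadata index: knows who has a resource, but stores no content."""

    def __init__(self):
        self._holders = defaultdict(set)   # resource name -> peer names

    def register(self, peer_name, resource):
        self._holders[resource].add(peer_name)

    def lookup(self, resource):
        return sorted(self._holders.get(resource, ()))


class Peer:
    def __init__(self, name, index):
        self.name, self.index, self.files = name, index, {}

    def share(self, resource, data):
        self.files[resource] = data
        self.index.register(self.name, resource)   # only metadata goes to the server

    def fetch(self, resource, peers):
        """Ask the index who holds the resource, then copy it peer to peer."""
        for holder in self.index.lookup(resource):
            if holder != self.name:
                data = peers[holder].files[resource]
                self.share(resource, data)          # the downloader becomes a source
                return data
        return None
```

    Because every successful download registers a new holder, the number of sources grows with
    the number of clients, which mirrors the bandwidth property described above.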

    Grid Computing Architecture

    In general, grid architecture can be represented as in Figure 2.

    [Figure 2 layers, from top level to bottom level: Application, Collective, Resource,
    Connectivity, Fabric]

    Figure 2. High level concept of Grid Computing Architecture

    Explanation of each layer:

    a) Fabric:

    Fabric is the lowest layer in grid architecture. Unlike in conventional computer
    architecture, where the lowest layer represents logic gates, the fabric is an abstract layer
    that represents local computing resources such as storage, networking and computational
    resources.

    b) Connectivity:

    The connectivity layer connects several fabrics into one giant node of fabric. It provides
    secure connections and is implemented using network protocols such as the Internet protocol
    suite (TCP/IP) and application protocols such as DNS.

    c) Resource:

    The resource layer deals with the management of many connectivity layers. It comprises
    information protocols, used to obtain information about configuration, load and usage
    policies, and management protocols, which negotiate the policies for handling resource
    requirements and operations.

    d) Collective:

    The collective layer consists of the protocols for interactions among several different
    resources. This layer includes directory services, accounting and payment, collaboration
    services, and scheduling services, to name a few.

    e) Application:

    The application layer is the highest layer in grid computing architecture. This layer calls
    the other layers to perform desired actions. The application layer is simply the program we
    are working with to solve our problems.

    GRID Middleware (Globus Toolkit)

    Since grid computing is relatively new, standards are still being developed to accommodate
    the openness and integrity of grids. There are two competing industry standards groups: the
    Global Grid Forum, started in 1999, and the Enterprise Grid Alliance, founded in 2004 [5].
    Examples of middleware and APIs used for developing grid applications include the Globus
    Toolkit, the Berkeley Open Infrastructure for Network Computing (BOINC), the Simple Grid
    Protocol and the Java CoG Kit.

    The Globus Toolkit (GT) was developed by the Globus Alliance, a division of the Global Grid
    Forum. The Globus Alliance comprises R&D groups based at several universities, such as the
    University of Chicago, the University of Edinburgh and the University of Southern
    California. GT is the de facto standard for grid computing [2] and comprises three main
    services:


    a) Core services:

    The basic infrastructure that enables grid computing, such as resource management for
    naming and locating computational resources on remote systems, security and system-level
    services, and status monitoring.

    b) Security services:

    Security is implemented using the standard GSI (Grid Security Infrastructure) and CAS
    (Community Authorization Service). GSI offers services such as basic certification, PKI, and
    many other security libraries.

    c) Data/Resource management:

    Protocols to ensure rapid and secure data transfer among resource nodes. There are four
    protocols: GridFTP, Reliable File Transfer (RFT), Replica Location Service (RLS), and
    Extensible Input/Output (XIO). GRAM (Globus Resource Allocation Manager) handles resource
    management for GT. TeraGrid [5], TIGER and Taiwan UniGrid [6] are examples of grid projects
    that use GT.

    GRID Framework

    Like many other high performance computing models, a grid-enabled application has a
    typical framework, as shown in Figure 3.

    Figure 3. The typical framework of a grid-enabled application. Source [2]


    FindServiceData(), which provides information about the service, such as status and
    registry

    SetTerminationTime(), which sets how long until the service is terminated

    Destroy(), which allows the client to destroy instances

    OGSA interfaces, which are called WSDL portTypes, also implement additional methods
    such as:

    SubscribeToNotificationTopic(), which allows delivery of notifications via third-party
    messaging services

    RegisterService(), which registers the Grid Service Handle (GSH)

    CreateService(), which creates new grid service instances

    and many other interfaces.

    Benefits

    Grid computing offers several benefits over regular cluster computing and supercomputer
    models. For example, grid computing enables several resource nodes, such as regular desktop
    computers, supercomputers and even cluster computers, to be connected as one giant computer.
    This is possible because grid computing has a transparency layer that hides from the user
    the fact that the grid is a network of computers. In addition, grids offer several other
    benefits, such as:

    a) The ability to use computing resources regardless of their location, even when they are
    managed by different people and organizations [8]. Unlike in regularly connected networks
    (e.g. the Internet or client-server networks), a user in Chicago could access a file on a
    computer in Atlanta as if the file were on his or her personal desktop. This is possible
    because the grid treats the storage and computing power of several clusters as a single
    computer by implementing the transparency layer and virtualization software.

    b) Internet computing and P2P offer a cheap way to solve large problems. This is
    especially useful for scientists working on problems that require massive computing
    facilities but who lack the funding to purchase them. The grid model allows members of the
    public to contribute to science in ways that were not available before.


    c) Unlike supercomputers, cluster computing builds a giant computing resource from
    commodity components such as regular Intel Pentium processors and regular DDR SDRAM.
    Thus its cost and scalability are superior to those of supercomputers. Grids have similar
    performance and scalability to cluster computing and may cost less. According to [6], once
    network speed surpasses a certain threshold, it no longer affects grid performance much.

    Drawbacks, Risks, and Limitations

    Security

    Grid computing poses a variety of unique security challenges. In a more traditional
    client-server model, a client is authenticated by a server to use the server's resources. In
    a grid computing environment, however, resources from different administrative domains are
    brought to bear on a single computation. As [9] points out, it is quite possible in such an
    environment for a particular grid resource to act as both a server and a client. When a user
    first sends a computation out onto the grid, the first resource to receive the request acts
    as a server. Yet this initial server may quickly become a client as it requests assistance
    from other resources on the grid.

    This scenario highlights one of the security demands of grid computing: delegation.
    That is, a user needs to be able to delegate authority to the grid application he or she is
    running, so that the application can then authorize any subprocesses it needs to run on other
    grid resources. With the resources of a grid widely spread out, all using different security
    policies with various levels of security, this can be quite a challenge. To solve this
    problem, a proxy can be used. If the proxy is recognized and trusted by all administrative
    domains, the user can log in to the proxy, and all requests for new grid tasks would be
    handled by the proxy. Of course, this would necessitate a global identification system [10],
    since different administrative domains may contain the same local login IDs. One proposed
    solution is to use a naming system similar to DNS, where components are added to an ID
    progressively until it is globally unique.
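    The DNS-like naming idea can be illustrated with a short sketch. The function name
    globally_unique_id, the dot separator and the example labels are all assumptions made for
    illustration; the source does not specify a concrete scheme.

```python
def globally_unique_id(local_id, domain_labels, taken):
    """Append domain labels, most specific first, until the ID is unused.

    local_id:      the user's login ID within his or her own domain
    domain_labels: hypothetical labels for that domain, e.g. ["cis", "upenn", "edu"]
    taken:         set of IDs already registered with the global system
    """
    candidate = local_id
    if candidate not in taken:
        return candidate
    for label in domain_labels:
        candidate = f"{candidate}.{label}"
        if candidate not in taken:
            return candidate
    raise ValueError(f"no unique ID for {local_id!r} within {domain_labels!r}")


if __name__ == "__main__":
    taken = {"jsmith", "jsmith.cis"}
    print(globally_unique_id("jsmith", ["cis", "upenn", "edu"], taken))
```

    Two users who share the local login "jsmith" in different domains would end up with
    different qualified IDs, which is what a trusted proxy would need to tell them apart.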

    In addition to the authentication issues just mentioned, grids also need to manage
    confidentiality. The nature of grid computing is such that the data being computed will be
    copied to many different machines, each a potential security leak. One of the original
    drivers of grid computing was the demand for large-scale, high performance computing by the
    scientific communities. The need for security in these initial scientific applications was
    minimal, since scientific research is typically done openly with peer review and public
    funding. As such, researchers didn't have to worry about the confidentiality of data being
    sent over public grids. As grid computing expands beyond its scientific roots into the
    private arena, however, grids will be running more mission-critical, confidential
    applications, and keeping the data sent over the grid secret will become a priority.

    On the flip side, every machine that participates in a grid network wants to ensure its own

    security with respect to the opening of its resources to the grid. Each machine needs to guard

    against malicious code that it might receive from the grid (or an unauthorized party posing as a

    grid member) as a computing task. It is crucial that this security safeguard is in place at the local

    machine level, to prevent a major risk of grid computing: the propagation of viruses, worms, or

    other malicious content over the grid [11].

    Thus, there are dual security concerns at play in grid computing: the desire for
    confidentiality of the data sent over the grid, and the desire to protect each
    resource/machine from malicious data/code. One way of solving these problems is to run grid
    processes in a sandbox [12], where the local client system has limited access to the grid
    data, and the grid code being run has limited access to the local system. Encryption can be
    combined with this model when transferring data between resources on the grid, so as to
    prevent unauthorized parties that intercept the data from reading it. For example, a
    public-key cryptosystem such as RSA can be employed either to encrypt the data directly or
    to open a secure channel of communication between grid participants.

    Maintaining the integrity of data on the grid is also an important security challenge. That
    is, after a user's task is complete, he or she needs to ensure that no individual part of the
    grid has tampered with the data being computed. This can be dealt with by using MD5
    checksums, redundancy (where the same task is parceled out to several grid resources), or a
    combination of the two.
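    A minimal sketch of combining checksums with redundancy is given below, using Python's
    standard hashlib module. The function names are hypothetical, and the strict-majority rule
    is an assumed acceptance policy rather than one taken from the cited sources.

```python
import hashlib
from collections import Counter


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def verify_transfer(data: bytes, expected_md5: str) -> bool:
    """Check that data received from the grid matches the sender's checksum."""
    return md5_hex(data) == expected_md5


def agreed_result(replica_results):
    """Return the result a strict majority of replicas agree on, else None."""
    if not replica_results:
        return None
    digests = [md5_hex(r) for r in replica_results]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None
    return replica_results[digests.index(digest)]
```

    MD5 is used here only because the text names it. It catches accidental corruption, but it
    is no longer considered collision-resistant, so a sketch aimed at deliberate tampering
    could swap in hashlib.sha256 without other changes.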


    Finally, a security challenge for future grid growth will perhaps not be a technical hurdle,

    but a legal one. Grids already encompass geographically widespread and diverse administrative

    domains, with some even spanning multiple continents. With laws and policies widely differing

    between countries and states with respect to encryption and privacy [13], grid administrators will

    need to find a way to secure the grid while still respecting local laws. In addition, legal
    statutes may govern the unauthorized installation of middleware, even if the computer
    resources wouldn't have been used for anything else. For example, David McOwen was sued for
    $415,000 for installing grid computing programs on the computers at his college, even though
    these programs were set to use only idle CPU resources [14].

    Impact on Network Traffic

    One of the concerns about grid computing is its impact on network traffic. Grid computing
    was first developed in the mid-1990s, and at the time many individual users did not even have
    broadband access in their homes. From the start, then, grid computing designers have had a
    motivation to minimize the impact of their software on network traffic. Grid computing
    employs sophisticated scheduling and caching to reduce the impact on the user's network
    capacity. Users can control settings that allow them to decide when to receive and transmit
    data, and how much data to cache. For example, the BOINC client can restrict network usage to
    certain times of the day, and maximum upload/download rates can be set. Finally, bandwidth
    monitors can be used [15] to ensure that a minimal level of network capacity is available to
    the user at all times. Of course, all of these measures can slow down the grid's overall
    processing speed.

    Accounting and Charging for Grid Resources

    The public, research-oriented grids currently in use do not have any measures in place to
    account for the expenditure of resources such as network bandwidth. The middleware for these
    projects is distributed with licenses that free the originators from any liability with
    respect to the use of the software. Any bandwidth or power expenditures, then, are paid for
    by the users of the middleware, who in essence voluntarily give up these resources for the
    progress of the grid project.


    As the use of grid computing expands and more data is pushed out over the grid, how-

    ever, accounting and charging systems will be needed to keep track of resource expenditures and

    payments. In traditional computing paradigms, charging for bandwidth and server processing/

    storage usage is fairly straightforward using conventional metering based on time or amount

    used on a per-client basis. The challenge in grid systems is that any given task can use a wide

    array of resources spread out across the grid simultaneously. Thus, as [16] points out, if proper

    charging of usage is to take place, it is imperative that all administrative domains within a grid

    agree to the same standard of accounting and charging.

    Deciding which standard to apply can be an involved process, and can depend on what

    the grid is primarily used for. For example, some grid computing tasks may involve large

    amounts of data analysis and data mining, thereby requiring heavy bandwidth usage to transfer

    the data sets across the grid, whereas other tasks might use very limited bandwidth but require

    intense processor usage. [17] lists several examples of resources that might be metered and

    charged for:

    CPU time

    Memory Usage

    Page faults

    Storage Usage

    Bandwidth Consumption

    Software and Libraries accessed

    Signals Received/Context Switches
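    To make the metering idea concrete, the sketch below totals a charge for one task whose work
    was spread over several grid resources. Everything in it is hypothetical: the Usage fields
    cover only a few of the listed resources, and the units and prices in RATES are invented for
    illustration, not taken from [17] or any real grid.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Metered consumption for one task on one resource (hypothetical units)."""
    cpu_seconds: float = 0.0
    memory_mb_hours: float = 0.0
    storage_mb_hours: float = 0.0
    bandwidth_mb: float = 0.0


# Hypothetical price list, in arbitrary currency units per metered unit.
RATES = {
    "cpu_seconds": 0.002,
    "memory_mb_hours": 0.0001,
    "storage_mb_hours": 0.00005,
    "bandwidth_mb": 0.01,
}


def charge(usages, rates=RATES):
    """Sum the cost of a task whose work was spread over several resources."""
    return sum(
        getattr(u, metric) * price
        for u in usages
        for metric, price in rates.items()
    )


if __name__ == "__main__":
    # A CPU-heavy share on one node and a bandwidth-heavy share on another.
    print(charge([Usage(cpu_seconds=1000), Usage(bandwidth_mb=500)]))
```

    The calculation only works if every administrative domain reports usage in the same units
    against the same price list, which is why a shared accounting standard matters.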

    A particular grid, then, may charge based on any combination of such resources. A grid

    could also have varying grid service classes [18], where, for instance, some classes would re-

    ceive lower latency and other benefits but be charged at higher rates. The accounting for this,

    however, would still be complicated, and some have suggested a flat-rate pricing model to sim-

    plify this process. Others have proposed market based schemes [19], where grid resources are

    considered producers and the users of the grid are considered consumers. In such a scheme, the

    producers would offer a set of services for a given price in an auction, and consumers would

    consequently bid on these services. The ostensible benefit of such a system would be similar to

    the advantages of private markets in other economies - e.g. the users with the most urgent (as de-

    termined by amount bid at auction) computing tasks would get serviced first. And just as there


    are brokers in real-world economies, software brokers have been proposed - intelligent soft-

    ware agents that seek out resources at the best prices for their owners. In the future, public par-

    ticipants in grid networks may even get compensated monetarily for allowing their spare com-

    puter resources to be used by whoever is willing to pay for it. The hope is that utility (in this

    case computational success) is maximized for all under such a free-market system.
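    A market-based scheme of this kind might be sketched as follows. This is one possible
    matching rule under stated assumptions, not the mechanism proposed in [19]: the function
    name allocate is hypothetical, a higher asking price is assumed to indicate a more capable
    resource, and each winner is assumed to pay the asking price.

```python
def allocate(offers, bids):
    """Serve the highest bidder first with the priciest offer it can afford.

    offers: list of (resource_name, asking_price) posted by producers
    bids:   list of (user_name, bid_amount) submitted by consumers
    Returns a list of (user_name, resource_name, price_paid).
    """
    # Assumption: a higher asking price signals a more capable resource.
    available = sorted(offers, key=lambda offer: offer[1], reverse=True)
    matches = []
    for user, amount in sorted(bids, key=lambda bid: bid[1], reverse=True):
        for i, (resource, asking) in enumerate(available):
            if amount >= asking:
                matches.append((user, resource, asking))
                del available[i]   # safe: we stop iterating right after deleting
                break
    return matches


if __name__ == "__main__":
    offers = [("cluster-a", 5.0), ("desktop-b", 2.0)]
    bids = [("u1", 3.0), ("u2", 10.0), ("u3", 1.0)]
    print(allocate(offers, bids))
```

    In the example, the most urgent user (the highest bidder) is matched first, and the user
    whose bid is below every asking price is not served at all.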

    Amdahl's Law

    Amdahl's law is a general statement about the limitations of parallelization in computing,
    which grid computing inherently relies upon. In his original paper [20], Amdahl referred to
    an inevitable portion of the computational load that he called "data management
    housekeeping", and pointed out that this portion is mostly sequential and hence will limit
    the gains that can be achieved through parallel processing. Although he did not give any
    equations in his paper, a common formulation of his ideas is [21]:

    \[ S = \frac{1}{(1 - f) + \frac{f}{k}} \]

    where S is the overall system speedup, f is the fraction of work performed by the component
    being analyzed, and k is the speedup of the new component. Applying the equation to
    parallelization, we find that if the fraction of the work that can be made parallel is not
    100%, then doubling the number of CPUs does not double the speed of the overall system.
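    A short worked example may help. The sketch below evaluates the common formulation
    S = 1 / ((1 - f) + f / k), treating k as the number of CPUs; the parallel fraction f = 0.9
    is an assumed value chosen only for illustration.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup S when a fraction f of the work is sped up by a factor k."""
    if not 0.0 <= f <= 1.0 or k <= 0:
        raise ValueError("need 0 <= f <= 1 and k > 0")
    return 1.0 / ((1.0 - f) + f / k)


if __name__ == "__main__":
    f = 0.9  # assumed parallel fraction, for illustration only
    for cpus in (1, 2, 4, 8, 1024):
        print(f"{cpus:5d} CPUs -> speedup {amdahl_speedup(f, cpus):.2f}")
```

    With f = 0.9, going from 2 to 4 CPUs raises the speedup from about 1.82 to about 3.08 rather
    than doubling it, and no number of CPUs can push the speedup past 1 / (1 - f) = 10.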

    Within a grid computing context, we can take Amdahl's law even further by noticing that
    even the speedup from parallelizing a process is itself limited by communication time [22].
    That is, even if a process can be sped up by dividing it into chunks and calculating these
    chunks separately over a grid, this speedup is limited by the time it takes to transfer the
    initial data through the grid to the grid resources and eventually back to the user after the
    calculation is complete. Even if we were to assume instantaneous computation of results by
    the grid resources, the calculation time can be no smaller than the time it takes to
    communicate data over the network (which in turn involves the significant security and
    potential accounting overhead described earlier).
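    The communication limit can be illustrated with a simple timing model. This is a sketch
    under assumptions of our own, not a model taken from [22]: the function name grid_speedup is
    hypothetical, the parallel work is assumed to divide evenly among workers, and transfer time
    is treated as a fixed cost that does not overlap with computation.

```python
def grid_speedup(serial_time, parallel_time, workers, transfer_time):
    """Speedup over one machine when data must also travel across the grid.

    serial_time:   seconds of work that cannot be parallelized
    parallel_time: seconds of work that divides evenly among the workers
    transfer_time: seconds spent shipping inputs out and results back
    """
    single_machine = serial_time + parallel_time
    on_grid = serial_time + parallel_time / workers + transfer_time
    return single_machine / on_grid


if __name__ == "__main__":
    # Illustrative numbers only: 10 s serial, 990 s parallel, 40 s of transfer.
    for workers in (1, 10, 99, 10_000):
        print(workers, round(grid_speedup(10, 990, workers, 40), 2))
```

    With these illustrative numbers the speedup can never reach 1000 / (10 + 40) = 20, however
    many workers join, because the serial and transfer times remain even when the parallel part
    becomes negligible.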


    Amdahl's law thus gives the theoretical bounds on possible speedups from the use of grid
    computing. In practice, we find that no grid application in use to date relies on low latency
    or fast response times. Even if a grid has more processing capability than any single
    computer, it will never have the fast communication time inherent in a single computer, and
    this places significant limitations on the applications of grid computing.

    Applications

    SETI@Home

    One of the earliest and most successful public-resource grid computing projects was
    SETI@Home, part of the Search for Extra-Terrestrial Intelligence. The project uses grid
    computing to analyze radio signals from outer space for signs of intelligent life. This
    analysis requires fast Fourier transforms and adjustments to correct for what is known as
    Doppler drift [23], all of which require large amounts of computation. In fact, even with
    3.96 million users as of 2002, the project still receives more raw data than it can analyze,
    creating a growing backlog of data to be examined. The following diagram [15] depicts the
    overall process:

    [Diagram components: tapes from Arecibo, data splitters, work unit storage, data server,
    the Internet, 2.4 million users, user database, science database]

    First, data is sent on 35GB tapes to a centralized server location, where the data on the

    tapes is broken up into work units. The data is actually very amenable to this process and is ideal

    for grid computing, since observations of different portions of the sky are independent of one

    another, and hence can be broken up into work units fairly easily. These work units are stored on

    a data server that distributes them upon request (using the HTTP protocol to avoid firewalls)


    from users that have the client software installed. This is a more limited form of grid computing

    in that the clients do not communicate with each other, but instead send the work units directly

    back to the data server upon completion. This simplifies security and synchronization issues,

    and most internet computing projects have followed this model.

    One interesting and innovative aspect of SETI@Home is its use of two databases [24].

    One is a science database, which is needed to store results from completed work units. But it is

    the user database that enabled the project to garner as much support as it has. The user database

    stores information about the submitter whenever it receives a completed work unit, recording a

    variety of stats such as team, country, and total CPU time contributed. This allows for fun,

    friendly competition between different teams and countries, which in turn helps spread the word

    about the project. One problem with this, however, is that some users go to extremes and send

    fake or manipulated data to increase their stats [25]. To combat this, the SETI@Home project

    uses a redundancy level of 2 to 3, and has looked into embedding encrypted tags into work units

    to verify that no tampering has taken place.

    Folding@Home

    As discussed previously, one of the challenges in grid computing is designing algorithms
    for the task at hand that can be massively parallelized. The simulation of protein folding
    was in the past an example of an application that required enormous amounts of computing
    power but could not easily be spread out over more than a few hundred CPUs. The Pande group
    at Stanford, however, came up with a method using ensemble dynamics that made it easy to
    divide the work of protein folding simulations into separate computations, resulting in an
    almost linear speedup with the number of processors [26]. They formed a public distributed
    computing project called Folding@Home in 2000 based on their algorithm, and as of March 2007
    almost 2 million CPUs have contributed to the project.

    The actual implementation of Folding@Home is very similar to that of SETI@Home: work units
    are created and distributed in the same fashion. What is unique about Folding@Home is that
    the developers have ported the code so that it can take advantage of a wide variety of
    resources. [27] shows the mix of contributing platforms as of March 2007:


    OS Type             Current TFLOPS   Active CPUs   Total CPUs
    Windows                        155        163467      1630664
    Mac OS X/PowerPC                 7          8974        95656
    Mac OS X/Intel                  10          3180         7864
    Linux                           43         25570       216555
    GPU                             45           769         2287
    PLAYSTATION3                   392         29920        43712
    Total                          652        231880      1996738

    Official support for the ATI Radeon X1900 GPU was added in September 2006, and support
    for the PS3 was added on March 15, 2007. As can be seen from the table, these sources provide
    a much higher ratio of TFLOPS per processor than desktop CPUs, and have boosted the project
    much closer to its goal of 1 petaflop. Thus, the Folding@Home project is a proof of concept that
    grid networks can take advantage of a panoply of computing resources to tackle large problems.
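
To make the per-processor comparison concrete, the short sketch below divides each platform's current TFLOPS by its active CPU count, using the figures from the table above. The helper name `gflops_per_active_processor` is ours, and the result is expressed in GFLOPS only to keep the numbers readable.

```python
# (platform, current TFLOPS, active CPUs), copied from the March 2007 table.
platforms = [
    ("Windows", 155, 163467),
    ("Mac OS X/PowerPC", 7, 8974),
    ("Mac OS X/Intel", 10, 3180),
    ("Linux", 43, 25570),
    ("GPU", 45, 769),
    ("PLAYSTATION3", 392, 29920),
]


def gflops_per_active_processor(tflops, active_cpus):
    """Average throughput of one active processor, in GFLOPS."""
    return tflops * 1000.0 / active_cpus


ratios = {name: gflops_per_active_processor(t, a) for name, t, a in platforms}

# Print platforms from highest to lowest per-processor throughput.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {ratio:8.2f} GFLOPS per active processor")
```

On these figures the GPU and PS3 clients each deliver more than ten times the per-processor throughput of the Windows clients, which is the gap the paragraph above refers to.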

    Private/Corporate Applications

    Most businesses in the private sector already have large investments in IT and computing
    resources. Yet these resources are typically not used uniformly, which creates an opportunity
    for efficiency gains from grid computing. According to IBM's vice president of grid
    computing [28], in a typical enterprise environment, Windows desktops and servers have roughly
    5 to 10% utilization, and Unix servers have between 10 and 20% utilization. By using grid
    computing, companies can lower their IT costs by using idle desktop cycles rather than
    purchasing new servers, and can divert resources from idle divisions to busier ones. For these
    reasons, grid computing is gaining traction in the enterprise market. Indeed, corporate
    investment in grid computing has grown rapidly in the past few years. Worldwide spending
    totaled $719 million in 2005 and $1.8 billion in 2006, and analysts expect it to reach
    $12 billion in 2007 and $24.5 billion by 2011 [29]. The following table shows the results of a
    survey in [30] citing reasons given for implementing grid technology:

    Reason                                           Respondents
    Reduce overall capital costs                             69%
    Increase performance/service levels                      62%
    Greater flexibility in assigning IT resources            52%
    Improve utilization rates                                41%
    Reduce IT staffing costs                                 41%
    Reduce IT upgrade cycle                                  17%
    Reduce data center floorspace                            17%

    Corporations are finding other uses for grid computing besides taking advantage of excess
    cycles on desktops. For example, eBay is using grid computing to spread work across its
    more than 15,000 servers [31]. Its system administrators normally have to manage each server
    individually, but with grid technology they can manage entire domains together. One problem
    eBay is facing, however, is the lack of common grid computing standards in the industry.
    Industry organizations such as the Enterprise Grid Alliance (EGA) are working to resolve such
    issues.

    Many companies are experimenting with grid computing by incrementally adding grid
    technologies alongside their current IT systems. Rather than immediately install grid
    middleware on all the desktops and risk bringing down mission-critical systems, many companies
    are adding dedicated grids that are used for the most resource-intensive computations. For
    example, UPS recently moved its billing application from a mainframe to a Linux grid [32].

    In this approach, the grid does not completely replace the traditional mainframe, but
    complements it. So far, it has been a success: the UPS team discovered that a process that took
    270 minutes on the mainframe could be done in less than 40 minutes on a mere two-server, 8-CPU
    grid. As predicted by Amdahl's law, however, they found diminishing returns when adding a
    third or fourth server, with only a few percentage points of performance differential. Another
    major problem UPS ran into was licensing: grid computing doesn't help if you don't have any
    software to run on the grid. It turns out that many software licenses are node-locked, which
    means they tie the software to a designated computer. Grid computing requires concurrent-use
    licenses, which allow more than one user to run the software simultaneously.
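
The diminishing returns mentioned above follow directly from the standard statement of Amdahl's law. The sketch below is only an illustration: the parallel fraction of 0.95 and the figure of 4 CPUs per server are assumptions chosen for the example, not values reported for the UPS billing application.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


PARALLEL_FRACTION = 0.95  # hypothetical share of the job that parallelizes
CPUS_PER_SERVER = 4       # assumed from "two-server, 8-CPU"

previous = None
for servers in (2, 3, 4):
    cpus = servers * CPUS_PER_SERVER
    speedup = amdahl_speedup(PARALLEL_FRACTION, cpus)
    gain = "" if previous is None else f" (gain over previous: {speedup - previous:.2f})"
    print(f"{servers} servers, {cpus:2d} CPUs: speedup {speedup:.2f}{gain}")
    previous = speedup
```

Each added server contributes less than the one before, and no number of servers can push the speedup past 1 / (1 - p), which is 20 for the assumed fraction.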

    Within the realm of incremental approaches, Sun Microsystems offers an innovative and
    perhaps ironic approach. In the past, computer vendors offered server time on a per-use basis:
    for example, a company might pay to use a server for a given amount of time. This approach fell
    out of favor when the price of PCs fell dramatically. Yet now Sun is reviving the on-demand
    computing approach with the Sun Grid Utility [33], which allows the public to use Sun's
    grid for $1 per CPU hour. For example, if a job uses 1000 of the grid CPUs for one minute, it
    would count as 16.67 CPU hours, and hence cost $17 [34]. This allows companies to tap into
    large amounts of computing power when they need it, and reduces capital costs for startups,
    which don't have to purchase servers immediately. Sun's ostensible strategy is to give people a
    chance to experience the capability of grids, as a way of driving business to Sun's grid
    computing offerings. This heralds a potential future where large corporations will be able to
    sell their idle CPU cycles to drive down their computing costs.
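
The pricing arithmetic in the example above can be written out as a short sketch. The function names are ours, and rounding the bill up to a whole CPU hour is an assumption inferred from the $17 figure; the source does not state the actual rounding rule.

```python
import math

RATE_PER_CPU_HOUR = 1.00  # dollars, as quoted for the Sun Grid Utility


def cpu_hours(cpus, minutes):
    """CPU hours consumed by a job using `cpus` processors for `minutes`."""
    return cpus * minutes / 60.0


def job_cost(cpus, minutes, rate=RATE_PER_CPU_HOUR):
    """Cost in dollars, billing whole CPU hours rounded up (assumed rule)."""
    return math.ceil(cpu_hours(cpus, minutes)) * rate


hours = cpu_hours(1000, 1)
print(f"{hours:.2f} CPU hours, billed at ${job_cost(1000, 1):.0f}")
```

Under this model a wide, short job and a narrow, long job cost the same as long as they consume the same number of CPU hours.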

    Conclusion

    Thus, grid computing delivers high quality services through a decentralized coordination
    of resources. Open source protocols and standardized interfaces are now bringing the advanced
    distributed job handling capabilities of grid networks to a wider audience than ever before. The
    advantages of this are clear: more efficient use of computational resources and increased
    productivity, at a lower cost than other computing paradigms. Perhaps more importantly, grid
    computing offers solutions to problems so large that they were previously considered infeasible
    or cost-prohibitive.

    Even so, risks remain: security concerns and Amdahl's law place significant limitations
    on the ultimate reach of grid computing. But if future development of grid computing remains
    consistent with its historical trends, bright researchers and a private sector with a vested
    interest will continue to develop new and innovative methodologies to minimize these drawbacks.

    Appendix

    Task Summary

    Vincent Poon

    Drawbacks, Risks, Limitations

    Applications of Grid Computing

    Oby Sumampouw

    Implementation of Grid Computing

    Benefits of Grid Computing


    References

    1 Foster, I. What is the Grid? A Three Point Checklist. Argonne National Laboratory & University of Chicago, July 20, 2002. Available online at http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf

    2 Silva, V. Grid Computing for Developers. Hingham, Massachusetts, Charles River Media, Inc. 2006.

    3 Sullivan III, W. T., Werthimer, D., Bowyer, S., Cobb, J., Gedye, D. & Anderson, D. A new major SETI project based on Project Serendip data and 100 000 personal computers. In Proc. 5th Int. Conf. Bioastronomy (ed. C. B. Cosmovici, J. Bowyer & D. Werthimer). Bologna, Italy: Editrice Composition. IAU Colloquium No. 161. 2001

    4 Abbas, A. Grid Computing: A Practical Guide to Technology and Applications. Hingham, Massachusetts, Charles River Media, Inc. 2004.

    5 Beckman, P.H. Building The Tera Grid. Philosophical Transactions of The Royal Society A. (2005) 363, p1715-1728.

    6 Chang, H., Li, K., Lin, Y., Yang, C., Wang, H., Lee, L. Performance Issues of Grid Computing Based on Different Architecture Cluster Computing Platforms. Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA'05) Vol 2, p321-324. Issued 28-30 March 2005.

    7 Gannon, D., Chiu, K., Govindaraju, M., and Slominski, A. An Analysis of the Open Grid Services Architecture. Department of Computer Science, Indiana University, Bloomington, IN. Available online at http://www.extreme.indiana.edu/~aslom/papers/ogsa_analysis3.html

    8 Coveney, P.V. Scientific Grid Computing. Philosophical Transactions of The Royal Society A. (2005) 363, p1707-1713.

    9 Foster, I. The Grid: a new infrastructure for 21st century science. Physics Today, v 55, n 2, Feb. 2002, p 42-7.


    10 Humphrey, M.; Thompson, M. Security for Grids. Proceedings of the IEEE, v 93, n 3, March, 2005, p 644-652

    11 Johnston, W.; Jackson, K.; Talwar, S. Overview of Security Considerations for Computational and Data Grids. Proceedings 10th IEEE International Symposium on High Performance Distributed Computing, 2001, p 439-40

    12 Cummings, M.; Huskamp, J. Grid Computing. EDUCAUSE Review, vol. 40, no. 6 (November/December 2005): 116-17.

    13 Ramakrishnan, L. Source. Securing Next Generation Grids. IT Professional, v 6, n 2, March-April 2004, p 34-9

    14 Hermida, A. When Screensavers are a Crime. BBC news online, Jan 28, 2002. HTTP: http://news.bbc.co.uk/1/hi/sci/tech/1782050.stm

    15 Surveyer, J. Grid Computing Uses Spare CPU Power. NetworkWorld, July 15, 2002. HTTP: http://www.networkworld.com/news/tech/2002/0715tech.html

    16 McGinnis, L.F.; Thigpen, W.; Hacker, T.J. Accounting and Accountability for Distributed and Grid Systems. Proceedings CCGRID 2002. 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002, p 284-5

    17 Zhengyou, L.; Zhang, L.; Shoubin, D.; Wenguo, W. Charging and Accounting for Grid Computing System. Grid and Cooperative Computing. Second International Workshop (GCC 2003) (Lecture Notes in Comput. Sci. Vol.3032), 2004, pt. 2, p 644-51 Vol.2

    18 Stiller, B.; Gerke, J.; Flury, P.; Reichl, P. Charging Distributed Services of a Computational Grid Architecture. Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001, p 596-601

    19 Buyya, R.; Abramson, D.; Venugopal, S. The Grid Economy. Proceedings of the IEEE, v 93, n 3, March, 2005, Grid Computing, p 698-714

    20 Amdahl, G. Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Spring Joint Computer Conference, 1967.

    21 Null, L.; Lobur, J. The Essentials of Computer Organization and Architecture, Second Edition, 2006, p 328-329

    22 Browne, J. Performance and Scalability. CS395T Lecture Notes. HTTP: http://www.cs.utexas.edu/~browne/CS395Tf2002/

    23 Korpela, E.; Werthimer, D.; Anderson, D.; Cobb, J.; Lebofsky, M. SETI@HOME - Massively distributed computing for SETI. Computing in Science and Engineering, v 3, n 1, January/February, 2001, p 78-83

    24 Anderson, D.P.; Cobb, J.; Korpela, E.; Lebofsky, M.; Werthimer, D. SETI@home: an experiment in public-resource computing. Communications of the ACM, v 45, n 11, Nov. 2002, p 56-61

    25 Bansal, R. ET or EC? IEEE Antennas and Propagation Magazine, v 43, n 4, Aug. 2001, p 118

    26 Larson, S. M.; Snow, C. D.; Shirts, M.; Pande, V. S. Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology. Computational Genomics, Horizon Press, 2002


    27 Folding@Home client statistics by OS. HTTP: http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats

    28 Thiboudeau, P. IBM Expands Grid Offerings. Computerworld. May 5, 2003. Vol. 37, Iss. 18; p. 7

    29 Vreede, S.V. Grid Computing Market Trends. Faulkner's Advisory for IT Studies, March 2007.

    30 Summit Strategies. Grid Computing Facts. InfoTech Trends, Apr 2004.

    31 Patrick T. EBay Seeks Grid Standards as It Expands Massive System. Computerworld, Sep 25, 2006. Vol. 40, Iss. 39; p. 18

    32 Julie B. How to avoid bumps on the road to grid computing. Network World. Feb 19, 2007. Vol. 24, Iss. 7; p. 32

    33 Solheim, S. Sun Grid Goes Live. InfoWorld, 3/20/2006, Vol. 28 Issue 12, p17

    34 Sun Utility Computing website, HTTP: http://www.sun.com/service/sungrid/
