TRANSCRIPT
CLOUD COMPUTING: FEATURES, ISSUES AND CHALLENGES
Demetris Delgeris
INTRODUCTION
Cloud Computing
One definition: cloud computing is any situation in which computing is done at a remote location ("out in the clouds") rather than on your desktop or portable device.
You tap into that computing power over an Internet connection.
“The cloud is a smart, complex, powerful computing system in the sky that people can just plug into."
--Web browser pioneer Marc Andreessen
Refers to both:
the applications delivered as services over the Internet and
the hardware and systems software in the datacenters that provide those services.
The services themselves have long been referred to as Software as a Service (SaaS).
The datacenter hardware and software is what we will call a Cloud.
Public Cloud: offered as pay-as-you-go to the general public
Private Cloud: internal datacenters of a business
CLOUD TYPES
Two different but related types of clouds are:
those that provide computing instances on demand
and those that provide computing capacity
The first is designed to scale out by providing additional computing instances
The second is designed to support data- or compute-intensive applications via scaling capacity.
FEATURES
WHAT’S NEW
Scaling: company infrastructures scale across several (or more) datacenters.
Pricing: a pay-as-you-go model in which you pay only for the services you need. No capital expenditure is required.
Simplicity: writing code for high-performance and distributed computing used to be relatively complicated (explicitly passing messages between nodes and other specialized methods).
Cloud-based storage service APIs and MapReduce-style (parallel programming) APIs are relatively simple compared to previous methods.
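The MapReduce style referred to above can be sketched in a few lines. This is a local, single-process illustration only; the function names are ours, not the API of any real framework, which would distribute the map and reduce phases across cluster nodes:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cloud", "the cloud is elastic"]
result = reduce_phase(map_phase(docs))
```

The programmer writes only the two small functions; the framework handles distribution, which is the simplicity the slide points at.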
HARDWARE POINT OF VIEW
3 new aspects:
The illusion of infinite computing resources available on demand
The elimination of an up-front commitment by Cloud users
The ability to pay for use of computing resources on a short-term basis as needed
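The short-term pay-per-use point can be made concrete with back-of-the-envelope arithmetic. All prices below are hypothetical, chosen only to illustrate the comparison:

```python
# Illustrative arithmetic only: the figures are hypothetical, not quotes
# from any provider. It contrasts an up-front purchase with pay-as-you-go
# for a short-lived workload.

server_purchase = 8000.0      # hypothetical up-front cost of one server (USD)
hourly_rate = 0.50            # hypothetical per-hour cloud instance rate (USD)
hours_needed = 3 * 24 * 30    # a three-month project, running around the clock

cloud_cost = hourly_rate * hours_needed
# For short-term needs the pay-per-use total stays below the capital outlay.
short_term_wins = cloud_cost < server_purchase
```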
CLOUD COMPUTING SERVICE MODELS
Software As A Service: Gmail
Defined as service-on-demand, where a provider will license software tailored to user needs
Customers can utilize greater computing power while saving on cost, space, and power consumption
End users don’t have control over cloud infrastructure
CLOUD COMPUTING SERVICE MODELS
Platform As A Service: Google App Engine
Provides all the facilities necessary to support the complete process of building and delivering web applications and services, all available over the internet
Has to possess development infrastructure including programming environment, tools and configuration management
CLOUD COMPUTING SERVICE MODELS
Infrastructure As A Service: Amazon EC2
Defined as the delivery of computer infrastructure as a service
A fully outsourced service, so businesses do not have to purchase servers, software or equipment
Infrastructure providers can dynamically allocate resources for service providers
LAYERS OF CLOUD COMPUTING
A cloud client consists of computer hardware and computer software that relies on cloud computing for application delivery, or that is specifically designed for delivery of cloud services
Cloud application services deliver software as a service over the Internet, eliminating the need to install and run the application on the customer’s own PC and simplifying maintenance and support.
Cloud platform services deliver a computing platform as a service, often consuming cloud infrastructure and sustaining cloud applications.
LAYERS OF CLOUD COMPUTING
Cloud infrastructure services deliver computer infrastructure as a service
The servers layer consists of computer hardware and software products that are specifically designed for the delivery of cloud services.
CLOUD APPLICATION CHARACTERISTICS
These represent ideals that people want for applications in the cloud
Incremental Scalability
Agility: the cloud provides flexible, automated management to distribute computing resources among the cloud’s users
Availability: Cloud environments take advantage of the large numbers of servers by enabling high levels of availability
SLA-driven: Clouds are managed dynamically based on service-level agreements that define policies like delivery parameters, costs, and other factors
APIs: Because clouds virtualize resources as a service they must have an application programming interface
ISSUES AND CHALLENGES
CLOUD STORAGE ISSUE
Cloud storage is a model of networked computer data storage where data is stored on multiple virtual servers, generally hosted by third parties, rather than on dedicated servers.
Hosting companies operate large data centers; people who need their data hosted buy or lease storage capacity from them and use it for their storage needs.
Requires an ability to keep data synchronized even though it is stored in two or more distinct geographies.
CLOUD STORAGE ISSUE
That requires attacking three key issues:
The efficient transfer of large data blocks or files over long distances,
Caching technologies that can help to overcome some of the distance delays, and
Synchronization and coordination among the storage sites
The second and third items on the list are being addressed by a number of storage technology vendors
The issue of efficient high-speed transfer remains elusive.
PERFORMANCE ISSUE: HIGH-SPEED TRANSFERS
The delays in accessing storage at a distance can be orders of magnitude greater than for local storage.
Much of that delay arises in the low-level details of transmitting large blocks of data over long-distance links using traditional networking technologies.
Goal: to raise the performance of remote storage to a level where the delta between it and local storage can be concealed with caching.
Moving data between remote sites depends on a reliable transport such as TCP.
TCP uses a sliding-window protocol with a congestion window to react to failures and congestion within the network.
Such failures and congestion result in dropped packets and can degrade network performance.
The congestion window determines the number of bytes that the sender can transmit before it must stop and wait for an acknowledgement from the receiver.
Performance metric is the bandwidth-delay product of the interconnect.
It is a measure of the amount of data that can be stored ‘in transit’ on the wire.
If TCP’s max congestion window is much smaller than the bandwidth-delay product for the wire connection, the transport will not be able to keep the wire continuously full.
For LANs, the bandwidth-delay product of a reasonable wire length is on the same order of magnitude as a typical TCP congestion window size.
As the bandwidth-delay product of the wire increases, either due to increasing length or increasing bandwidth, so too must the congestion window size.
A larger window size means more bytes are in flight at any given point, so:
There is greater risk of loss due to congestion drops or errors.
As the congestion window size increases, so too does the overhead of recovering from a lost byte.
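The bandwidth-delay argument is easy to check numerically. The link speeds and round-trip times below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope bandwidth-delay product (BDP) calculation.

def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bytes that can be 'in transit' on the wire: bandwidth x delay."""
    return bandwidth_bps / 8 * rtt_seconds

lan_bdp = bdp_bytes(1e9, 0.0005)   # 1 Gb/s LAN, 0.5 ms round trip
wan_bdp = bdp_bytes(1e9, 0.100)    # same bandwidth across a 100 ms WAN

classic_window = 64 * 1024          # classic 64 KiB TCP window limit

# On the LAN the window covers the BDP; on the WAN it falls far short,
# so TCP cannot keep the long wire continuously full.
lan_ok = classic_window >= lan_bdp
wan_ok = classic_window >= wan_bdp
```

The same link speed that a 64 KiB window saturates on a LAN leaves a long-distance wire mostly idle, which is exactly why the window must grow with the bandwidth-delay product.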
RDMA OVER WANS
RDMA is a message-passing service:
Applications exchange messages with each other directly using shared memory buffers.
Efficiently implementing the RDMA access method over a network depends on a network transport mechanism suitable for transporting memory buffers between servers.
InfiniBand Architecture instead of TCP:
Extremely low end-to-end latencies
The ability to reduce the memory bandwidth burden on end nodes
A packet congestion window instead of a byte congestion window
DATA MOBILITY ISSUE
Cloud data may reside at a location geographically far away from the organization that owns the data.
Cloud providers may decide to keep moving data from one location to another.
There are several reasons for this, including:
Reducing the cost of storing data
Efficient retrieval of data
Efficient linking of data resident at different locations
Resource optimization
High levels of data mobility have negative implications for:
Data security and data protection
Data availability
DATA LOCATION ISSUE
Applications have little or no information regarding the location of their data in the network.
Without this information, applications cannot optimize their execution by moving computation closer to data, data closer to users, or related data closer to each other.
The current state-of-the-art solution involves guesswork: the cloud determines data placement by predicting the application's future access patterns from its past history, treating the application as a black box.
This is counter-productive, since the application typically has more accurate information than the cloud about its own future behaviour.
Another idea is to expose the location of data to applications and allow them to optimize their own execution. We want applications to be able to estimate the time taken to update or retrieve data from different network locations.
DATA LOCATION ISSUE
The Contour system uses replication topologies to monitor network interactions.
The basic functionality provided by Contour is data-access latency estimation: applications can estimate the time taken to read or write data from any compute node in the network.
On top of this it supports closest-node discovery and constraint satisfaction.
This is useful if the application wants to choose an existing compute node to run a particular task based on the data it accesses.
It is also useful for cloud allocation (moving data closer to given network locations, or requesting new resources near existing data).
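A sketch of how an application might use latency estimates of this kind to pick a compute node. The estimate table and function names here are hypothetical illustrations, not the actual Contour API:

```python
# Estimated read latency (ms) from each candidate compute node to the data.
# These numbers are made up for the example.
estimated_latency_ms = {
    "node-a": 42.0,
    "node-b": 3.5,
    "node-c": 17.0,
}

def closest_node(estimates):
    """Closest-node discovery: pick the node with the lowest estimated latency."""
    return min(estimates, key=estimates.get)

def nodes_within(estimates, budget_ms):
    """Constraint satisfaction: all nodes meeting a latency budget."""
    return sorted(n for n, ms in estimates.items() if ms <= budget_ms)

best = closest_node(estimated_latency_ms)          # where to run the task
candidates = nodes_within(estimated_latency_ms, 20.0)
```

Exposing the estimates to the application, rather than having the cloud guess, is the design choice the slides argue for.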
RELIABILITY ISSUE
Cloud computing is more service-oriented than resource-oriented.
The reliability of cloud computing is critical but hard to analyze due to its characteristics: massive-scale service sharing, wide-area networking, heterogeneous software/hardware components and the complicated interactions among them.
Cloud Service Reliability: the probability that a cloud service under consideration can be
successfully completed for a user in a specified period of time.
Possible Failures: overflow, timeout, data/computing resource missing,
software/hardware/network failure
Request-stage failures: overflow, timeout
The due time for a specific service is the allowed time from the submission of the job request to the completion of the job.
If a job request is not served by a scheduler before its due time, it is dropped. The dropping rate is denoted by μd.
Arrivals of job-request submissions follow a Poisson process with arrival rate λd.
State n (n=0,1,…,N) represents the number of requests in the queue.
At state N, the arrival of a new request makes the request queue overflow, so that request is dropped and the queue stays at state N.
The service rate of a request by a schedule server is μr.
If n ≤ S, then all n requests can be served immediately by the S schedule servers, so the total departure rate is nμr.
If n > S, only S requests are being served simultaneously, so the departure rate is Sμr.
The total rate at which requests in the queue reach their due time and are dropped is nμd (n=1,2,…,N).
qn is the steady-state probability of the system being at state n (n=0,1,…,N). It is easy to derive qn by solving the following Chapman-Kolmogorov equations:
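Under the birth-death dynamics just described (arrivals at rate λd; total leave rate min(n,S)μr + nμd at state n), one consistent form of the balance equations is the following reconstruction:

```latex
% Balance (Chapman-Kolmogorov) equations for the birth-death chain:
\lambda_d\, q_{n-1} \;=\; \bigl(\min(n, S)\,\mu_r + n\,\mu_d\bigr)\, q_n,
\qquad n = 1, 2, \ldots, N,
\qquad \sum_{n=0}^{N} q_n = 1 .
```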
To study the timeout failure, suppose the current length of the request queue is n (n=0,1,…,N-1) when the new service request under consideration arrives. The probability density function of waiting time to complete the n requests by S schedule servers is
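One consistent form, under the simplifying assumption that all S servers remain busy while the n queued requests complete (so completions occur at rate Sμr), is the Erlang density:

```latex
% Waiting time W_n to complete n queued requests, completions at rate S\mu_r:
f_{W_n}(t) \;=\; \frac{(S\mu_r)^{n}\, t^{\,n-1}}{(n-1)!}\; e^{-S\mu_r t},
\qquad t \ge 0 .
```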
If the waiting time is longer than the due time Td, a timeout failure occurs.
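This model is simple to evaluate numerically. The rate values below are illustrative assumptions; the steady-state recursion follows the birth-death description above, and the timeout probability uses the Erlang waiting time (all S servers assumed busy):

```python
import math

lam = 4.0     # arrival rate (lambda_d), requests per unit time (assumed)
mu_r = 1.0    # service rate per schedule server (assumed)
mu_d = 0.1    # per-request due-time (dropping) rate (assumed)
S, N = 3, 10  # schedule servers, queue capacity

# Solve the balance equations lam*q[n-1] = (min(n,S)*mu_r + n*mu_d)*q[n].
q = [1.0]
for n in range(1, N + 1):
    q.append(q[-1] * lam / (min(n, S) * mu_r + n * mu_d))
total = sum(q)
q = [x / total for x in q]        # normalize so the q[n] sum to 1

overflow_prob = q[N]              # a new arrival at state N is dropped

def timeout_prob(n, due_time):
    """P(waiting time for n queued requests exceeds due_time):
    survival function of the Erlang(n, S*mu_r) distribution."""
    rate = S * mu_r
    x = rate * due_time
    return sum(math.exp(-x) * x**k / math.factorial(k) for k in range(n))

p_late = timeout_prob(5, due_time=3.0)
```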
NETWORKING ISSUE
The network's mission in cloud computing: connecting the servers into a resource pool and then connecting users to the correct resources
Public Networking Issues
Not all cloud computing providers will support encrypted tunnels, so your information may be sent in the open on the Internet.
Where encryption is available, using it will certainly increase delay and may impact performance.
The only way to reduce delay without compromising security is by minimizing transit “hops”
reaching a given cloud computing service may involve transiting several provider networks
The best ISP combination in terms of delay will almost always be one with the smallest number of hops.
NETWORKING ISSUE
Private Networking Issues
Enterprises will access their own private clouds using the same technology they employed for access to their data centers. (Internet VPN)
All cloud computing implementations will rely on intra-cloud networking to link users with resources
The performance of those connections will then impact cloud computing performance overall.
Security Principles: CIA (Confidentiality, Integrity, Availability)
Provider Security
Threats
The provider controls servers, network, etc.
The customer must trust the provider's security
Failures may violate the CIA principles
Countermeasures
Verify and monitor the provider's security
SECURITY ISSUE
Attacks from other customers
Threats
Provider resources are shared with untrusted parties
Customer data and applications must be separated
Failures will violate the CIA principles
Countermeasures
VPNs, VLANs and firewalls for network separation
Strong cryptography
SUMMARY AND CONCLUSIONS
SUMMARY
Pros:
Reduced hardware equipment/maintenance cost for end users
Improved performance
Accessibility
Flexibility
Cons:
Performance limited by Internet connection speed
Availability
Security
3 major services:
Infrastructure as a Service
Platform as a Service
Software as a Service
CONCLUSIONS
With cloud computing, the "unit of computing" has moved from a single computer or rack of computers to a data center of computers, resulting in increased complexity.
It has also introduced software, systems, and programming models that significantly reduce the complexity of accessing and using these resources.
Cloud computing provides supercomputing-class power.
The applications and data served by the cloud are available to a broad group of users.
REFERENCES
M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, and I. Stoica, "Above the Clouds: A Berkeley View of Cloud Computing", EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009.
Mladen A. Vouk “Cloud Computing – Issues, Research and Implementations “, Department of Computer Science, North Carolina State University, Journal of Computing and Information Technology - CIT 16, 2008, 4, 235–246
Tharam Dillon, Chen Wu and Elizabeth Chang, “Cloud Computing: Issues and Challenges”, Curtin University of Technology Perth, Australia, 2010 24th IEEE International Conference on Advanced Information Networking and Applications
Robert L. Grossman, "The Case for Cloud Computing", University of Illinois at Chicago and Open Data Group, IT Professional magazine.
Paul Grun, Storage at a Distance; Using RoCE as a WAN Transport, System Fabric Works, Inc.
Paul T. Jaeger, Jimmy Lin, Justin M. Grimes “Cloud Computing and Information Policy: Computing in a Policy Cloud?”, Journal of Information Technology & Politics
Yuan-Shun Dai, Bo Yang, Jack Dongarra, Gewei Zhang “Cloud Service Reliability: Modeling and Analysis”, Innovative Computing Laboratory, Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, TN, USA
Birjodh Tiwana, Mahesh Balakrishnan, Marcos K. Aguilera, Hitesh Ballani, Z. Morley Mao, "Location, Location, Location! Modeling Data Proximity in the Cloud".
QUESTIONS