starting for the cloud, ow2 conference nov10
OW2 Annual Conference 2010, November 24-25, La Cantine, Paris. www.ow2.org.
Starting for the cloud -- two issues in cluster:
resource allocation and overload management
Ziyou Wang, Yan Li, Chao You, Minghui Zhou, Peking University
Agenda
• Cloud Computing: Challenges
• Resource Allocation
  • Shared cluster
  • Resource allocation planning
• Overload Management
  • Examples
  • Automatic degradation mechanism
  • Considerations
Cloud Computing: Challenges
• The emergence of cloud computing makes it cost-efficient for application providers to lease computing resources from a third-party provider.
• Benefits: increased resource utilization, improved business agility, decreased power consumption…
• But how to allocate the cloud's various resources effectively among different applications is still an open problem.
• When applications hosted in the cloud face overload, meaning that the demand on at least one of the cloud's resources exceeds the capacity of that resource, what can we do to handle the situation?
• … …
Shared Cluster
• Consider one kind of cloud implementation: given that the workloads of different web applications are not correlated, a large-scale cluster, called a shared cluster or data center, is maintained to host a large number of applications simultaneously.
• Each application runs on a subset of nodes.
• Each node may run multiple applications.
[Figure labels: Users, Enterprises, Third parties]
Resource Allocation: a scenario
• Because the cluster's resources are no longer occupied by a single application, the cluster must allocate resources on demand.
• For example:
[Figure: shared cluster. Application users reach the cluster through a dispatcher. Nodes run middleware and are connected by a high-throughput, low-latency network: Node 1 (app A, app C), Node 16 (app B, app A), Node 99 (app B, app A), Node 150 (app D, app C), and other nodes. A repository stores the apps. When the workload of apps A and C increases, new instances are placed in the data center and the workload is re-allocated.]
Self-adaptive Resource Allocation Model
[Figure: self-adaptive resource allocation, consisting of resource allocation planning and resource allocation execution, with requests as input]
Our Resource Allocation Work
[Figure: middleware architecture. Components: Dispatcher (receives requests), Repository, Management Console (commands, messages), Communicator, Local valuator, Resource allocation planning, Resource allocation execution (Resource partitioner, App deployer), and a Virtual Machine Monitor hosting VMs, each running a customized JOnAS with one app (app a … app x). The resource allocation planning modules of different middleware instances cooperate.]
For resource allocation planning, we propose a decentralized approach:
• Nodes decide their own resource allocation.
• Market-based coordination is adopted to help them make resource decisions.
So far, the approach has been evaluated with a series of simulated experiments, and it is being implemented in the cluster with JOnAS.
Resource Allocation Planning
• To support application prioritization, applications can be assigned different utility values. Accordingly, the goal of resource management is to maximize the total utility value of the requests satisfied.
• Inspired by human markets, we model the shared cluster as a market, where shares of application requests are treated as goods and nodes act as dealers that exchange goods.
• Based on its local valuation of the goods, each node autonomously and continuously trades with others to find a combination of application shares that fits the node's resource constraints and maximizes its income.
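The node-local decision described above can be sketched as a small search problem. This is a minimal sketch under assumptions that the talk does not state: each candidate share carries a single scalar resource cost, a node's income is the sum of the utility values of the shares it hosts, and the names `Share` and `best_combination` are hypothetical. It enumerates every combination, which is only practical for a handful of candidate shares.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Share:
    app: str
    cost: int      # assumed: resource units the share needs on this node
    utility: int   # assumed: income the node earns by serving the share


def best_combination(shares, capacity):
    """Return the feasible combination of shares with the highest income."""
    best, best_income = (), 0
    for k in range(1, len(shares) + 1):
        for combo in combinations(shares, k):
            # Skip combinations that violate the node's resource constraint.
            if sum(s.cost for s in combo) > capacity:
                continue
            income = sum(s.utility for s in combo)
            if income > best_income:
                best, best_income = combo, income
    return list(best), best_income


# Hypothetical candidate shares for one node with 100 resource units.
candidates = [
    Share("A", 40, 50),
    Share("B", 30, 45),
    Share("C", 50, 60),
    Share("D", 20, 10),
]
chosen, income = best_combination(candidates, capacity=100)
print(sorted(s.app for s in chosen), income)
```

In the approach described in the talk, a node would not search globally like this; it would trade shares with other nodes step by step, so the exhaustive search only illustrates the objective each node is pursuing.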
Resource Allocation Planning
• When a node wants to sell, more than one node may want to buy.
• To have the seller transfer the goods to the appropriate buyers, an auction mechanism is adopted.
[Figure: auction example among Node 1 (app A, app C), Node 50 (app A, app B), Node 65 (app B, app C), Node 100 (app B, app D), and other nodes, each running middleware.
1. Node 1 multicasts an offer: (app C, 50%, 100 req/s).
2.1 Each node performs its local valuation.
2.2 Two nodes bid: "want C, 35%" and "want C, 20%".
3. The seller sorts the bids.
4. The seller notifies the buyers ("Sell C 30%", "Sell C 20%") and informs the dispatcher of the update (app C, 30% to N50, 20% to N65).
Dispatcher table before: N1 (app C, 70%), N65 (app C, 10%). After: N1 (app C, 20%), N50 (app C, 30%), N65 (app C, 30%).]
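The auction steps above (multicast, valuation, bid, sort, notify) can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the slides say only that the seller sorts the bids, so sorting by a per-percent bid price is assumed here, the price values are made up, and `run_auction` and `apply_update` are hypothetical names. The quantities (50% of app C offered, bids for 35% and 20%, the dispatcher percentages) are taken from the example in the talk.

```python
def run_auction(offered, bids):
    """Split `offered` percent of an app among bidders.

    bids: list of (node, wanted_percent, price) tuples.
    Assumption: the highest-priced bids are served first.
    """
    allocation = {}
    remaining = offered
    for node, wanted, _price in sorted(bids, key=lambda b: b[2], reverse=True):
        if remaining <= 0:
            break
        granted = min(wanted, remaining)
        allocation[node] = granted
        remaining -= granted
    return allocation, remaining


def apply_update(table, seller, allocation):
    """Return a new dispatcher table after moving shares away from the seller."""
    updated = dict(table)
    for node, granted in allocation.items():
        updated[seller] -= granted
        updated[node] = updated.get(node, 0) + granted
    return updated


# Dispatcher view of app C before the auction (percent of requests per node).
table = {"N1": 70, "N65": 10}

# N1 offers 50% of app C; the prices 1.0 and 1.5 are hypothetical.
bids = [("N50", 35, 1.0), ("N65", 20, 1.5)]
allocation, unsold = run_auction(50, bids)
new_table = apply_update(table, "N1", allocation)
print(allocation, unsold, new_table)
```

With these assumed prices the sketch reproduces the update shown in the talk: 30% of app C goes to N50 and 20% to N65, leaving N1 with 20%.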
Our Resource Allocation Work
For resource allocation execution:
• Integrate a VMM into the middleware.
• Automatically load the app and partition the resources at runtime via the VMM.
• Customize JOnAS for the app, and store the customized image in the repository.
• Dispatch the workload proportionally.
We currently use OpenVZ, a lightweight OS-level VMM, as a case study, and are working to integrate it into the middleware.
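Proportional workload dispatching, the last item above, can be illustrated with a smooth weighted round-robin. The talk does not say which dispatching algorithm is used, so this technique is swapped in purely as an example; the function name `dispatch_sequence` and the share values are hypothetical.

```python
from collections import Counter


def dispatch_sequence(shares, n_requests):
    """Pick a node for each request so that counts follow the shares.

    shares: node -> integer weight (for example, percent of an app's requests).
    Uses smooth weighted round-robin, which also interleaves the nodes
    instead of sending long bursts to one node.
    """
    total = sum(shares.values())
    current = {node: 0 for node in shares}
    chosen = []
    for _ in range(n_requests):
        # Every node earns credit equal to its weight on each request.
        for node, weight in shares.items():
            current[node] += weight
        # The node with the most credit serves the request and pays `total`.
        best = max(current, key=current.get)
        current[best] -= total
        chosen.append(best)
    return chosen


# Hypothetical shares of one application across three nodes.
shares = {"N1": 20, "N50": 30, "N65": 50}
counts = Counter(dispatch_sequence(shares, 100))
print(dict(counts))
```

Over any window whose length equals the sum of the weights, each node receives exactly its share of the requests, which makes the dispatcher's behavior easy to check against the allocation plan.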
Examples
• On September 11th, 2001, for instance, the workload on a popular news web site increased by an order of magnitude in 30 minutes, doubling every 7 minutes in that period.
• April 21st, 2010 was China's national day of mourning for the victims of the Yushu earthquake. Theatre and sporting performances were cancelled, karaoke bars were shut, and the culture ministry ordered the suspension of all online music, games, comics, films, and TV shows.
• Too many people chose to visit an online shopping site instead.
When does overload happen?
• Overload prevention is a critical goal: a system should remain operational under overload, even when the incoming request rate is several times greater than the system's capacity.
• It is well known that the workload seen by Internet applications varies over multiple time scales, often unpredictably.
• Unexpected things are always happening:
  • Being featured on national television or in a major newspaper
  • Under-provisioning for sales-boosting holidays
The TaoBao Architecture
• Apache + Application Server + MySQL
• 200+ applications, thousands of components
• 12k servers
• 2k~3k Java servers
[Figure labels: Search, Product Browsing, Product Recommendation, Shop Cart]
The Reality – Manual Service Degradation
In response to overload:
• CNN replaced its front page with a simple HTML page that could be transmitted in a single Ethernet packet.
• Taobao turned off a subsystem.
All these techniques were applied manually, though a better approach would be to degrade service gracefully and automatically in response to load. This raises several questions:
• Which point causes the overload?
• Which resource is the bottleneck?
• Which service should be degraded or turned off?
• Are all users affected or not?
Automatic Degradation Mechanism
• Overload Priority defines the priorities of the different services and the degradation actions that can be taken.
• Overload Detection signals when the application enters an unstable state.
• Overload Localization is then triggered to locate the bottleneck resource.
• Overload Controller takes appropriate actions to degrade some unnecessary services, releasing resources to support key services.
[Figure: the automatic degradation mechanism. Components: Resource Monitoring, Overload Detection, Overload Localization, Overload Controller, and Overload Priority. Performance metrics are collected from the applications' services, and degradation actions are applied to them.]
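The four roles of the mechanism (priority, detection, localization, controller) can be sketched in a few functions. This is a minimal sketch under assumptions that are not in the talk: overload is detected by comparing per-resource demand with capacity, the bottleneck is the resource with the highest demand-to-capacity ratio, the only degradation action is turning a service off, and a higher priority number means a more important service. The service names come from the Taobao slide; the priorities, usage figures, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    priority: int          # assumed: higher number = more important
    usage: dict            # resource name -> units consumed while enabled
    enabled: bool = True


def total_demand(services, capacity):
    """Resource monitoring: sum the usage of all enabled services."""
    return {r: sum(s.usage.get(r, 0) for s in services if s.enabled)
            for r in capacity}


def detect_overload(demand, capacity):
    """Overload: demand on at least one resource exceeds its capacity."""
    return any(demand[r] > capacity[r] for r in capacity)


def locate_bottleneck(demand, capacity):
    """Localization: the resource with the highest demand/capacity ratio."""
    return max(capacity, key=lambda r: demand[r] / capacity[r])


def control(services, capacity):
    """Controller: turn off low-priority services until overload clears."""
    actions = []
    while True:
        demand = total_demand(services, capacity)
        if not detect_overload(demand, capacity):
            break
        bottleneck = locate_bottleneck(demand, capacity)
        candidates = [s for s in services
                      if s.enabled and s.usage.get(bottleneck, 0) > 0]
        if not candidates:
            break  # nothing left to degrade on this resource
        victim = min(candidates, key=lambda s: s.priority)
        victim.enabled = False
        actions.append((bottleneck, victim.name))
    return actions


capacity = {"cpu": 100, "memory": 100}
services = [
    Service("search", 3, {"cpu": 40, "memory": 20}),
    Service("product browsing", 4, {"cpu": 30, "memory": 30}),
    Service("product recommendation", 1, {"cpu": 45, "memory": 25}),
    Service("shop cart", 5, {"cpu": 20, "memory": 15}),
]
actions = control(services, capacity)
print(actions)
```

In this hypothetical run the CPU is the bottleneck, so the controller switches off the lowest-priority service that uses CPU and leaves the key services running.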
Automatic Application Degradation
• Cluster level degradation (coarse-grained)
  • Sub-system level degradation
  • Resource management
  • Service differentiation
• Node level degradation (fine-grained)
  • Component level degradation
  • Middleware level degradation
Considerations
• It is hard to stay transparent to the user (what can be degraded, and sometimes how?).
• Used alone, degradation can help delay overload, but it needs to be combined with other techniques to be fully effective:
  • Dynamic resource allocation
  • Admission control
  • Service differentiation
  • … …