The increasing demands of cloud and big data applications require data center networks to be scalable and
bandwidth-rich. Current data center network architectures often use rigid topologies to increase
network bandwidth. A major limitation is that they can hardly support incremental network growth.
Recent work has been investigating new network architectures and protocols to achieve three important
design goals of large data centers, namely, high throughput, routing and forwarding scalability, and
flexibility for incremental growth. Unfortunately, existing data center network architectures focus on
one or two of the above properties and pay little attention to the others. In this paper, we design a novel
flexible data center network architecture, Space Shuffle (S2), which applies greedy routing on multiple
ring spaces to achieve high throughput, scalability, and flexibility. The proposed greedy routing
protocol of S2 effectively exploits the path diversity of densely connected topologies and enables key-
based routing. Extensive experimental studies show that S2 provides high bisectional bandwidth and
throughput, near-optimal routing path lengths, extremely small forwarding state, fairness among
concurrent data flows, and resiliency to network failures.
ETPL
PDS - 001 Space Shuffle: A Scalable, Flexible, and High-Performance Data Center
Network
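The greedy routing idea above can be made concrete with a small sketch. This is an illustrative model, not the S2 implementation: switches get coordinates in several virtual ring spaces, and each hop forwards to the neighbor closest to the destination under circular distance. The distance measure (minimum over spaces) and the stop-on-no-progress fallback are simplifying assumptions.

```python
def ring_dist(a, b):
    """Circular distance between two points on a unit ring."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def multi_ring_dist(u, v):
    """Distance between two switches, taken as the minimum ring distance
    over all virtual ring spaces (a simplifying assumption)."""
    return min(ring_dist(a, b) for a, b in zip(u, v))

def greedy_route(coords, neighbors, src, dst, max_hops=64):
    """Greedily forward toward dst, always picking the neighbor that
    strictly reduces the multi-ring distance to the destination."""
    path = [src]
    cur = src
    while cur != dst and len(path) <= max_hops:
        best = min(neighbors[cur],
                   key=lambda n: multi_ring_dist(coords[n], coords[dst]))
        if multi_ring_dist(coords[best], coords[dst]) >= multi_ring_dist(coords[cur], coords[dst]):
            break  # no progress: greedy stuck (real S2 has a fallback here)
        path.append(best)
        cur = best
    return path
```

In the actual protocol, coordinates in multiple spaces give many distance-reducing neighbors, which is what exploits the path diversity the abstract mentions.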
In current data centers, an application (e.g., MapReduce, Dryad, search platform, etc.) usually
generates a group of parallel flows to complete a job. These flows compose a coflow and only
completing them all is meaningful to the application. Accordingly, minimizing the average Coflow
Completion Time (CCT) becomes a critical objective of flow scheduling. However, achieving this goal
in today’s Data Center Networks (DCNs) is quite challenging, not only because the scheduling problem
is theoretically NP-hard, but also because it is tough to perform practical flow scheduling in large-scale
DCNs. In this paper, we find that minimizing the average CCT of a set of coflows is equivalent to the
well-known problem of minimizing the sum of completion times in a concurrent open shop. As there
are abundant existing solutions for concurrent open shop, we open up a variety of techniques for coflow
scheduling. Inspired by the best known result, we derive a 2-approximation algorithm for coflow
scheduling, and further develop a decentralized coflow scheduling system, D-CAS, which avoids the
system problems associated with current centralized proposals while addressing the performance
challenges of decentralized approaches. Trace-driven simulations indicate that D-CAS achieves a
performance close to Varys, the state-of-the-art centralized method, and significantly outperforms
Baraat, the only existing decentralized method.
ETPL
PDS - 002 Towards Practical and Near-optimal Coflow Scheduling for Data Center
Networks
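The equivalence to concurrent open shop can be illustrated with a permutation schedule. The sketch below orders coflows smallest-bottleneck-first, a simple heuristic in the same spirit; it is not the paper's 2-approximation algorithm, and the unit-rate port model is an assumption.

```python
def bottleneck(coflow):
    """A coflow is given as a dict port -> total bytes it needs on that
    port. Its bottleneck is the most loaded port (unit-rate ports assumed)."""
    return max(coflow.values())

def schedule_coflows(coflows):
    """Order coflows smallest-bottleneck-first, then compute each coflow's
    completion time under a serial permutation schedule: on every port,
    coflows run in the chosen order, and a coflow finishes when its
    slowest port finishes (the concurrent-open-shop objective)."""
    order = sorted(range(len(coflows)), key=lambda i: bottleneck(coflows[i]))
    port_clock = {}   # cumulative work scheduled so far on each port
    cct = {}
    for i in order:
        finish = 0.0
        for p, load in coflows[i].items():
            port_clock[p] = port_clock.get(p, 0.0) + load
            finish = max(finish, port_clock[p])
        cct[i] = finish
    return order, cct
```

Running the short coflow first here lowers the average completion time, which is exactly the sum-of-completion-times objective from concurrent open shop.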
Large scale iterative graph computation presents an interesting systems challenge due to two well
known problems: (1) the lack of access locality and (2) the lack of storage efficiency. This paper
presents PathGraph, a system for improving iterative graph computation on graphs with billions of
edges. First, we improve the memory and disk access locality for iterative computation algorithms on
large graphs by modeling a large graph using a collection of tree-based partitions. This enables us to
use path-centric computation rather than vertex-centric or edge-centric computation. For each tree
partition, we re-label vertices using DFS in order to preserve consistency between the order of vertex
ids and vertex order in the paths. Second, a compact storage that is optimized for iterative graph parallel
computation is developed in the PathGraph system. Concretely, we employ delta-compression and
store tree-based partitions in a DFS order. By clustering highly correlated paths together as tree-based
partitions, we maximize sequential access and minimize random access on storage media. Third but
not least, our path-centric computation model is implemented using a scatter/gather programming
model. We parallelize the iterative computation at the tree-partition level and perform sequential local updates
for vertices in each tree partition to improve the convergence speed. To provide well balanced
workloads among parallel threads at the tree-partition level, we introduce a task queue with multiple
stealing points, which allows work stealing from several positions in the queue. We evaluate
the effectiveness of PathGraph by comparing it with recent representative graph processing systems such
as GraphChi and X-Stream. Our experimental results show that our approach outperforms the two
systems on a number of graph algorithms for both in-memory and out-of-core graphs. While our
approach achieves better data balance and load balance, it also shows better speedup than the two
systems as the number of threads grows.
ETPL
PDS - 003 PathGraph: A Path Centric Graph Processing System
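The DFS re-labeling step, which keeps vertex ids consistent with vertex order along paths, can be sketched as follows. The tree representation (a dict from vertex to children) and the iterative traversal are illustrative choices, not PathGraph's actual data layout.

```python
def dfs_relabel(tree, root):
    """Assign new vertex ids in DFS preorder so that vertices along any
    root-to-leaf path receive increasing ids, keeping id order consistent
    with path order. Iterative to avoid recursion limits on deep trees."""
    new_id, counter, stack = {}, 0, [root]
    while stack:
        v = stack.pop()
        if v in new_id:
            continue
        new_id[v] = counter
        counter += 1
        # Push children in reverse so they pop in their original order.
        for c in reversed(tree.get(v, [])):
            stack.append(c)
    return new_id
```

After relabeling, storing vertices in id order means a path scan is (mostly) a sequential scan, which is the locality property the abstract is after.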
The shift to the in-memory data processing paradigm has had a major influence on the development of
cluster data processing frameworks. Numerous frameworks from the industry, open source community
and academia are adopting the in-memory paradigm to achieve functionalities and performance
breakthroughs. However, despite the advantages of these in-memory frameworks, in practice they are
susceptible to memory-pressure related performance collapse and failures. The contributions of this
paper are two-fold. Firstly, we conduct a detailed diagnosis of the memory pressure problem and
identify three preconditions for the performance collapse. These preconditions not only explain the
problem but also shed light on the possible solution strategies. Secondly, we propose a novel
programming abstraction called the leaky buffer that eliminates one of the preconditions, thereby
addressing the underlying problem. We have implemented a leaky-buffer enabled hashtable in Spark,
and we believe it can also substitute for hashtables that perform similar hash aggregation
operations in other programs or data processing frameworks. Experiments on a range of memory-intensive
aggregation operations show that the leaky buffer abstraction can drastically reduce the
occurrence of memory-related failures, improve performance by up to 507% and reduce memory usage
by up to 87.5%.
ETPL
PDS - 004 Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from
Cluster Data Processing Frameworks
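A toy version of the leaky-buffer idea for hash aggregation might look like the following. The spill policy (evict half the keys when over capacity), the JSON-lines spill format, and the class name are all assumptions for illustration; Spark's actual implementation differs.

```python
import json
import os
import tempfile

class LeakyHashTable:
    """Sketch of a 'leaky buffer' aggregation table: keep at most
    `capacity` keys in memory; when full, leak (spill) part of the
    aggregated entries to disk instead of letting memory grow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.mem = {}
        self.spill_file = tempfile.NamedTemporaryFile(
            'w+', delete=False, suffix='.jsonl')

    def add(self, key, value):
        self.mem[key] = self.mem.get(key, 0) + value
        if len(self.mem) > self.capacity:
            self._leak()

    def _leak(self):
        # Spill half of the entries to bound memory use (illustrative policy).
        victims = sorted(self.mem)[: len(self.mem) // 2]
        for k in victims:
            self.spill_file.write(json.dumps([k, self.mem.pop(k)]) + '\n')
        self.spill_file.flush()

    def result(self):
        # Merge spilled partial aggregates back with the in-memory ones.
        self.spill_file.seek(0)
        out = dict(self.mem)
        for line in self.spill_file:
            k, v = json.loads(line)
            out[k] = out.get(k, 0) + v
        self.spill_file.close()
        os.unlink(self.spill_file.name)
        return out
```

The key property: memory use is bounded by `capacity` regardless of how many distinct keys arrive, which removes the precondition for the collapse described above.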
Among uncertain graph queries, reachability, i.e., the probability that one vertex is reachable from
another, is likely the most fundamental one. Although this problem has been studied within the field
of network reliability, solutions are implemented on a single computer and can only handle small
graphs. However, as the size of graph applications continually increases, the corresponding graph data
can no longer fit within a single computer’s memory and must therefore be distributed across several
machines. Furthermore, the computation of probabilistic reachability queries is #P-complete, making it
very expensive even on small graphs. In this paper, we develop an efficient distributed strategy, called
DistR, to solve the problem of reachability query over large uncertain graphs. Specifically, we perform
the task in two steps: distributed graph reduction and distributed consolidation. In the distributed graph
reduction step, we find all of the maximal subgraphs of the original graph, whose reachability
probabilities can be calculated in polynomial time, compute them and reduce the graph accordingly.
After this step, only a small graph remains. In the distributed consolidation step, we transform the
problem into a relational join process and provide an approximate answer to the #P-complete
reachability query. Extensive experimental studies show that our distributed approach is efficient in
terms of both computational and communication costs, and has high accuracy.
ETPL
PDS - 005 DistR: A Distributed Method for the Reachability Query over Large
Uncertain Graphs
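For intuition, the quantity being computed can be estimated by a naive possible-world sampling baseline. This is far less efficient than DistR's reduction-plus-join strategy and is included only to make the problem concrete; the graph encoding is an assumption of this sketch.

```python
import random

def sample_reachable(edges, s, t, trials=20000, seed=7):
    """Estimate P(t is reachable from s) in an uncertain graph given as
    {(u, v): probability}. Each trial samples a possible world by keeping
    every edge independently with its probability, then runs a DFS."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                hits += 1
                break
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return hits / trials
```

Sampling needs many trials for tight estimates, which is exactly why the exact-on-reducible-subgraphs approach above matters at scale.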
The alignment of many short sequences of DNA, called reads, to a long reference genome is a common
task in molecular biology. When the problem is expanded to handle typical workloads of billions of
reads, execution time becomes critical. In this paper we present a novel reconfigurable architecture for
minimal perfect sequencing (RAMPS). While existing solutions attempt to align a high percentage of
the reads using a small memory footprint, RAMPS focuses on performing fast exact matching. Using
the human genome as a reference, RAMPS aligns short reads hundreds of thousands of times faster
than current software implementations such as SOAP2 or Bowtie, and about a thousand times faster
than GPU implementations such as SOAP3. Whereas other aligners require hours to preprocess
reference genomes, RAMPS can preprocess the reference human genome in a few minutes, opening
the possibility of using new reference sources that are more genetically similar to the newly sequenced
data.
ETPL
PDS - 006 RAMPS: A Reconfigurable Architecture for Minimal Perfect Sequencing
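The exact-matching core of such an aligner can be illustrated in software with an ordinary dictionary standing in for the hardware minimal perfect hash; function names here are illustrative, not from RAMPS.

```python
def build_index(reference, read_len):
    """Map every substring of length read_len in the reference to the
    positions where it occurs. (RAMPS uses a minimal perfect hash in
    hardware; a Python dict plays the same role in this sketch.)"""
    index = {}
    for i in range(len(reference) - read_len + 1):
        index.setdefault(reference[i:i + read_len], []).append(i)
    return index

def align_exact(index, read):
    """Return all reference positions where the read matches exactly."""
    return index.get(read, [])
```

Preprocessing is a single pass over the genome, which mirrors why index construction can be fast, and each query is a constant-time lookup, which mirrors the fast exact-match path.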
Due to the increasing usage of cloud computing applications, it is important to minimize the energy
cost consumed by a data center and, at the same time, to improve quality of service via data center
management. One promising approach is to switch some servers in a data center to the idle mode to
save energy while keeping a suitable number of servers in the active mode to provide timely
service. In this paper, we design both online and offline algorithms for this problem. For the offline
algorithm, we formulate data center management as a cost minimization problem by considering
energy cost, delay cost (to measure service quality), and switching cost (to change servers’ active/idle
mode). Then, we analyze certain properties of an optimal solution which lead to a dynamic
programming based algorithm. Moreover, by revising the solution procedure, we successfully
eliminate the recursive procedure and achieve an optimal offline algorithm with a polynomial
complexity. For the online algorithm, we design it by considering the worst case scenario for future
workload. In simulation, we show this online algorithm can always provide near-optimal solutions.
ETPL
PDS - 007 Cost Minimization Algorithms for Data Center Management
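The offline problem admits a straightforward dynamic program over (time slot, active-server count), sketched below under an illustrative cost model: linear energy cost per active server, a linear delay penalty for load beyond capacity, a linear switching cost, and a free choice of the initial server count. The paper's exact model and its polynomial-time refinement differ in detail.

```python
def min_cost_schedule(load, max_servers, e_cost=1.0, d_cost=5.0, s_cost=2.0):
    """Offline DP: best[m] = minimum cost of any schedule ending the
    current slot with m active servers. Per-slot cost = energy * m +
    delay penalty for unserved load + switching cost for changing m."""
    INF = float('inf')
    best = [0.0] * (max_servers + 1)   # initial server count is free here
    for lam in load:
        nxt = [INF] * (max_servers + 1)
        for m in range(max_servers + 1):
            slot = e_cost * m + d_cost * max(0.0, lam - m)
            for prev in range(max_servers + 1):
                c = best[prev] + slot + s_cost * abs(m - prev)
                if c < nxt[m]:
                    nxt[m] = c
        best = nxt
    return min(best)
```

This naive DP is O(T * M^2); the structural properties analyzed in the paper are what allow cutting the recursion down to a polynomial algorithm with better constants.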
This work presents an on-line, energy- and communication-aware scheduling strategy for SaaS
applications in data centers. The applications are composed of various services and represented as
workflows. Each workflow consists of tasks related to each other by precedence constraints and
represented by Directed Acyclic Graphs (DAGs). The proposed scheduling strategy combines
advantages of state-of-the-art workflow scheduling strategies with energy-aware independent task
scheduling approaches. The process of scheduling consists of two phases. In the first phase, virtual
deadlines of individual tasks are set in the central scheduler. These deadlines are determined using a
novel strategy that favors tasks which are less dependent on other tasks. During the second phase, tasks
are dynamically assigned to computing servers based on the current load of network links and servers
in a data center. The proposed approach, called Minimum Dependencies Energy-efficient DAG
(MinD+ED) scheduling, has been implemented in the GreenCloud simulator. It outperforms other
approaches in terms of energy efficiency, while keeping an acceptable level of tardiness.
ETPL
PDS - 008 Minimum Dependencies Energy-Efficient Scheduling in Data Centers
The advent of software defined networking enables flexible, reliable and feature-rich control planes
for data center networks. However, the tight coupling of centralized control and complete visibility
leads to a wide range of issues among which scalability has risen to prominence due to the excessive
workload on the central controller. By analyzing the traffic patterns from a couple of production data
centers, we observe that data center traffic is usually highly skewed and thus edge switches can be
clustered into a set of communication-intensive groups according to traffic locality. Motivated by this
observation, we present LazyCtrl, a novel hybrid control plane design for data center networks where
network control is carried out by distributed control mechanisms inside independent groups of switches
while complemented with a global controller. LazyCtrl aims at bringing laziness to the global controller
by dynamically devolving most of the control tasks to independent switch groups to process frequent
intra-group events near the datapath while handling rare inter-group or other specified events by the
controller. We implement LazyCtrl and build a prototype based on Open vSwitch and Floodlight.
Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain
in multi-tenant clouds and the central controller can be significantly shielded by staying “lazy”, with
its workload reduced by up to 82%.
ETPL
PDS - 009 LazyCtrl: A Scalable Hybrid Network Control Plane Design for Cloud
Data Centers
The increasing pervasiveness of mobile devices, along with the use of technologies like GPS, WiFi
networks, RFID, and sensors, allows for the collection of large amounts of movement data. These
data can be analyzed to extract descriptive and predictive models that can be properly
exploited to improve urban life. From a technological viewpoint, Cloud computing can play an
essential role by helping city administrators to quickly acquire new capabilities and reducing initial
capital costs by means of a comprehensive pay-as-you-go solution. This paper presents a workflow-
based parallel approach for discovering patterns and rules from trajectory data, in a Cloud-based
framework. Experimental evaluation has been carried out on both real-world and synthetic trajectory
data, up to one million trajectories. The results show that, given the high complexity and large
volumes of data involved in the application scenario, the trajectory pattern mining process benefits
from the scalable execution environment offered by a Cloud architecture in terms of execution time,
speed-up, and scale-up.
ETPL
PDS - 010 Trajectory Pattern Mining for Urban Computing in the Cloud
HLA-based simulation systems are prone to load imbalances due to the lack of management of shared
resources in distributed environments. Such imbalances cause these simulations to lose performance
in terms of execution time. As a result, many dynamic load balancing systems have been
introduced to manage distributed load. These systems use specific methods, depending on load or
application characteristics, to perform the required balancing. Load prediction is a technique that has
been used extensively to enhance load redistribution heuristics towards preventing load imbalances. In
this paper, several efficient Time Series model variants are presented and used to enhance prediction
precision for large-scale distributed simulation-based systems. These variants extend Holt’s time
series model and correct issues arising from its implementation in the predictive module of a
dynamic load balancing system for HLA-based distributed simulations. A set
of migration decision-making techniques is also proposed to enable a prediction-based load balancing
system to be independent of any prediction model, promoting a more modular construction.
ETPL
PDS - 011 Time Series-Oriented Load Prediction Model and Migration Policies for
Distributed Simulation Systems
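The underlying Holt model is compact enough to state directly. The sketch below is the textbook double exponential smoothing (a level plus a trend term), not the paper's corrected variants; the default smoothing constants are arbitrary.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear (double) exponential smoothing: maintain a smoothed
    level and a smoothed trend, then extrapolate `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend
```

On a perfectly linear load series the forecast is exact; the variants discussed in the paper address where the plain model drifts on real, noisy load traces.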
With the increasing complexity of today’s high-performance computing (HPC) architectures,
simulation has become an indispensable tool for exploring the design space of HPC systems—in
particular, networks. In order to make effective design decisions, simulations of these systems must
possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely
manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC
network simulation frameworks, however, are constrained in one or more of these areas. In this work,
we present a simulation framework for modeling two important classes of networks used in today’s
IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer
Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies
at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-
event simulation. Our simulation framework meets all the requirements of a practical network
simulation and can assist network designers in design space exploration. First, it uses validated and
detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second,
instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit
simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to
achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems.
Third, our models give network designers a choice in simulating a broad range of network workloads,
including HPC application workloads using detailed network traces, an ability that is rarely offered in
parallel with high-fidelity network simulations.
ETPL
PDS - 012 Enabling Parallel Simulation of Large-Scale HPC Network Systems
The Internet was designed with the end-to-end principle where the network layer provided merely the
best-effort forwarding service. This design makes it challenging to add new services into the Internet
infrastructure. However, as the Internet connectivity becomes a commodity, users and applications
increasingly demand new in-network services. This paper proposes PacketCloud, a cloudlet-based
open platform to host in-network services. Different from standalone, specialized middleboxes,
cloudlets can efficiently share a set of commodity servers among different services, and serve the
network traffic in an elastic way. PacketCloud can help both Internet Service Providers (ISPs) and
emerging application/content providers deploy their services at strategic network locations. We have
implemented a proof-of-concept prototype of PacketCloud. PacketCloud introduces a small additional
delay, and can scale well to handle high-throughput data traffic. We have evaluated PacketCloud in
both a fully functional emulated environment, and the real Internet.
ETPL
PDS - 013 PacketCloud: A Cloudlet-Based Open Platform for In-Network Services
In past decades, significant attention has been devoted to task allocation and load balancing in
distributed systems. Although there have been some related surveys on this subject, each offers
only a preliminary review of a single type of distributed system. To correlate the studies across
varying types of distributed systems and build a comprehensive taxonomy of
them, this survey mainly categorizes and reviews the representative studies on task allocation and load
balancing according to the general characteristics of varying distributed systems. First, this survey
summarizes the general characteristics of distributed systems. Based on these general characteristics,
this survey reviews the studies on task allocation and load balancing with respect to the following
aspects: 1) typical control models; 2) typical resource optimization methods; 3) typical methods for
achieving reliability; 4) typical coordination mechanisms among heterogeneous nodes; and 5) typical
models considering network structures. For each aspect, we summarize the existing studies and discuss
the future research directions. Through the survey, the related studies in this area can be well
understood based on how they can satisfy the general characteristics of distributed systems.
ETPL
PDS - 015 A Survey of Task Allocation and Load Balancing in Distributed Systems
Cloud data owners prefer to outsource documents in encrypted form for the purpose of privacy
preservation. It is therefore essential to develop efficient and reliable ciphertext search techniques. One
challenge is that relationships between documents are normally concealed in the process of
encryption, which leads to significant degradation of search accuracy. Moreover, the volume of data
in data centers has grown dramatically, making it even more challenging to
design ciphertext search schemes that can provide efficient and reliable online information retrieval on
large volume of encrypted data. In this paper, a hierarchical clustering method is proposed to support
more search semantics and also to meet the demand for fast ciphertext search within a big data
environment. The proposed hierarchical approach clusters the documents based on the minimum
relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on
the maximum cluster size is reached. In the search phase, this approach achieves linear
computational complexity even as the document collection grows exponentially in size. In order to
verify the authenticity of search results, a structure called minimum hash sub-tree is designed in this
paper. Experiments have been conducted using the collection set built from the IEEE Xplore. The
results show that with a sharp increase of documents in the dataset the search time of the proposed
method increases linearly whereas the search time of the traditional method increases exponentially.
Furthermore, the proposed method has an advantage over the traditional method in the rank privacy
and relevance of retrieved documents.
ETPL
PDS - 016 An Efficient Privacy-Preserving Ranked Keyword Search Method
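The threshold-based clustering step can be sketched greedily as below. The seed-based membership test, the symmetric pairwise relevance dictionary, and enforcing the size bound directly (instead of recursively splitting into sub-clusters, as the paper does) are all simplifications; the hash sub-tree for result verification is omitted entirely.

```python
def cluster_docs(docs, relevance, min_rel, max_size):
    """Greedy threshold clustering: a document joins the first cluster
    whose seed it is sufficiently relevant to (>= min_rel) and that has
    room (< max_size); otherwise it starts a new cluster.

    `relevance` maps (seed_doc, doc) pairs to a relevance score."""
    clusters = []   # each cluster is a list of doc ids; [0] is the seed
    for d in docs:
        for c in clusters:
            if len(c) < max_size and relevance[(c[0], d)] >= min_rel:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters
```

At query time, search only descends into clusters whose seed is relevant to the query, which is what turns an exhaustive scan into the near-linear behavior reported above.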
Software-defined networking (SDN) is an emerging network paradigm that simplifies network
management by decoupling the control plane and data plane, such that switches become simple data
forwarding devices and network management is controlled by logically centralized servers. In SDN-
enabled networks, network flow is managed by a set of associated rules that are maintained by switches
in their local Ternary Content Addressable Memories (TCAMs) which support high-speed parallel
lookup on wildcard patterns. Since TCAM is an expensive hardware and extremely power-hungry,
each switch has only limited TCAM space and it is inefficient and even infeasible to maintain all rules
at local switches. On the other hand, if we eliminate TCAM occupation by forwarding all packets to
the centralized controller for processing, it results in a long delay and heavy processing burden on the
controller. In this paper, we strive for the fine balance between rule caching and remote packet
processing by formulating a minimum weighted flow provisioning (MWFP) problem with an objective
of minimizing the total cost of TCAM occupation and remote packet processing. We propose an
efficient offline algorithm for the case where the network traffic is given; otherwise, we propose two online algorithms
with guaranteed competitive ratios. Finally, we conduct extensive experiments by simulations using
real network traffic traces. The simulation results demonstrate that our proposed algorithms can
significantly reduce the total cost of remote controller processing and TCAM occupation, and the
solutions obtained are nearly optimal.
ETPL
PDS - 017 Cost Minimization for Rule Caching in Software Defined Networking
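The tension between TCAM occupation and remote processing admits a classic break-even (ski-rental style) online policy, sketched below: pay the remote-processing cost per packet until the amount paid for a flow reaches the cost of caching its rule, then install the rule. This is a well-known competitive heuristic, not the paper's algorithms, and the cost model (one-time TCAM charge, no eviction) is a simplification.

```python
def online_rule_cache(packet_stream, tcam_cost, remote_cost):
    """Break-even online caching: forward a flow's packets to the
    controller until the remote cost paid for that flow reaches the
    TCAM cost of its rule, then cache the rule. Per flow, this pays at
    most twice the better of always-remote vs. cache-immediately."""
    paid = {}          # remote cost paid so far, per flow
    cached = set()
    total = 0.0
    for flow in packet_stream:
        if flow in cached:
            continue   # rule already in TCAM: no per-packet cost here
        total += remote_cost
        paid[flow] = paid.get(flow, 0.0) + remote_cost
        if paid[flow] >= tcam_cost:
            cached.add(flow)
            total += tcam_cost
    return total, cached
```

The paper's online algorithms refine this trade-off with proven competitive ratios under the full MWFP cost model, including TCAM occupation over time.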
In this paper we study energy conservation in the Internet. We observe that different traffic volumes
on a link can result in different energy consumption; this is mainly due to such technologies as trunking
(IEEE 802.1AX), adaptive link rates, etc. We design a green Internet routing scheme, where the routing
can lead traffic in a way that is green. Unlike previous studies, which switch network components
such as line cards and routers into sleep mode, we do not prune the Internet topology.
We first develop a power model, and validate it using real commercial routers. Instead of developing
a centralized optimization algorithm, which requires additional protocols such as MPLS to materialize
in the Internet, we choose a hop-by-hop approach. It is thus much easier to integrate our scheme into
the current Internet. We progressively develop three algorithms, which are loop-free, substantially
reduce energy consumption, and jointly consider green and QoS requirements such as path stretch. We
further analyze the power saving ratio, the routing dynamics, and the relationship between hop-by-hop
green routing and QoS requirements. We comprehensively evaluate our algorithms through
simulations on synthetic, measured, and real topologies, with synthetic and real traffic traces. We show
that the power saving in the line cards can be as much as 50 percent.
ETPL
PDS - 018 A Hop-by-Hop Routing Mechanism for Green Internet
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those
requiring geometric locality of data (particle methods, crash simulations). We present, to our
knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In
contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects
subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our
algorithm does recursive multi-section with a given number of parts in each dimension. By computing
multiple cut lines concurrently and intelligently deciding when to migrate data while computing the
partition, we minimize data movement compared to efficient implementations of recursive bisection.
We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan
on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and
scales better than RCB in terms of run-time without degrading the load balance. Our implementation
partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak
scaling up to 6K cores.
ETPL
PDS - 019 Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm
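The one-dimensional multi-section step, cutting a coordinate axis into a given number of equally loaded slabs, can be sketched with quantile cuts. The real implementation computes cuts in parallel by probing rather than globally sorting, and also handles weighted points; both are omitted here.

```python
import bisect

def multisection_cuts(points, parts):
    """Compute parts-1 cut positions along one dimension so that each
    slab holds (nearly) the same number of points: the 1D step of a
    multi-jagged partition, done here by sorting."""
    xs = sorted(points)
    n = len(xs)
    return [xs[(n * k) // parts] for k in range(1, parts)]

def assign_part(cuts, x):
    """Index of the slab containing coordinate x."""
    return bisect.bisect_left(cuts, x)
```

Multi-section applies this per dimension with several cuts at once, instead of recursively bisecting, which is what reduces the number of data-movement rounds compared to RCB.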
Multi-tenancy is one of the key features of cloud computing, which provides scalability and economic
benefits to the end-users and service providers by sharing the same cloud platform and its underlying
infrastructure with the isolation of shared network and compute resources. However, resource
management in the context of multi-tenant cloud computing is becoming one of the most complex tasks
due to the inherent heterogeneity and resource isolation. This paper proposes a novel cloud-based
workflow scheduling (CWSA) policy for compute-intensive workflow applications in multi-tenant
cloud computing environments, which helps minimize the overall workflow completion time,
tardiness, and cost of execution, while effectively utilizing idle cloud resources. The
proposed algorithm is compared with the state-of-the-art algorithms, i.e., First Come First Served
(FCFS), EASY Backfilling, and Minimum Completion Time (MCT) scheduling policies to evaluate
the performance. Further, a proof-of-concept experiment of real-world scientific workflow applications
is performed to demonstrate the scalability of the CWSA, which verifies the effectiveness of the
proposed solution. The simulation results show that the proposed scheduling policy improves the
workflow performance and outperforms the aforementioned alternative scheduling policies under
typical deployment scenarios.
ETPL
PDS - 020 Workflow Scheduling in Multi-Tenant Cloud Computing Environments
It is imperative for cloud storage systems to be able to provide deadline guaranteed services according
to service level agreements (SLAs) for online services. Despite many previous works on deadline-aware
solutions, most focus on scheduling workflows or resource reservation in datacenter
networks but neglect the server overload problem in cloud storage systems, which prevents providing
deadline-guaranteed services. In this paper, we introduce a new form of SLA, which enables each
tenant to specify a percentage of its requests it wishes to serve within a specified deadline. We first
identify the multiple objectives (i.e., traffic and latency minimization, resource utilization
maximization) in developing schemes to satisfy the SLAs. To satisfy the SLAs while achieving the
multi-objectives, we propose a Parallel Deadline Guaranteed (PDG) scheme, which schedules data
reallocation (through load re-assignment and data replication) using a tree-based bottom-up parallel
process. The observation from our model also motivates our deadline strictness clustered data
allocation algorithm that maps tenants with the similar SLA strictness into the same server to enhance
SLA guarantees. We further enhance PDG in supplying SLA guaranteed services through two
algorithms: i) a prioritized data reallocation algorithm that deals with request arrival rate variation, and
ii) an adaptive request retransmission algorithm that deals with SLA requirement variation. Our trace-
driven experiments on a simulator and Amazon EC2 show the effectiveness of our schemes for
guaranteeing the SLAs while achieving the multi-objectives.
ETPL
PDS - 022 Deadline Guaranteed Service for Multi-Tenant Cloud Storage
Dynamic Bin Packing (DBP) is a variant of classical bin packing, which assumes that items may arrive
and depart at arbitrary times. Existing works on DBP generally aim to minimize the maximum number
of bins ever used in the packing. In this paper, we consider a new version of the DBP problem, namely,
the MinTotal DBP problem, which targets minimizing the total cost of the bins used over time. It is
motivated by the request dispatching problem arising from cloud gaming systems. We analyze the
competitive ratios of modified versions of the commonly used First Fit, Best Fit, and Any Fit
packing algorithms (the family of algorithms that open a new bin only when no currently open bin
can accommodate the item to be packed) for the MinTotal DBP problem. We show that the
competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum
item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded
for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (β > 1
is a constant), the competitive ratio has an upper bound of β/(β - 1) · μ + 3β/(β - 1) + 1. For the general case,
the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First
Fit packing algorithm that can achieve a competitive ratio no larger than (5/4)μ + 19/4 when μ is not
known and can achieve a competitive ratio no larger than μ + 5 when μ is known.
ETPL
PDS - 021 Dynamic Bin Packing for On-Demand Cloud Resource Allocation
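To make the Any Fit family concrete, the following is a minimal First Fit sketch in Python (illustrative names only, not the paper's implementation; item departures and the per-bin time cost of the MinTotal objective are omitted for brevity):

```python
class Bin:
    def __init__(self, capacity=1.0):
        self.capacity = capacity
        self.load = 0.0

    def fits(self, size):
        return self.load + size <= self.capacity

def first_fit(bins, size):
    """Place an item into the first open bin that fits; open a new bin
    only when no currently open bin can accommodate it (the Any Fit rule)."""
    for b in bins:
        if b.fits(size):
            b.load += size
            return b
    b = Bin()
    b.load = size
    bins.append(b)
    return b

bins = []
for size in [0.5, 0.6, 0.4, 0.7]:
    first_fit(bins, size)
print(len(bins))  # 3 bins: {0.5, 0.4}, {0.6}, {0.7}
```

In the dynamic setting analyzed in the paper, items would additionally depart at arbitrary times and each bin would accrue cost for as long as it stays open, which is what separates MinTotal DBP from classical bin packing.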
Hadoop is a widely-used implementation framework of the MapReduce programming model for large-
scale data processing. Hadoop performance however is significantly affected by the settings of the
Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-
consuming, if at all practical. This paper proposes an approach, called RFHOC, to automatically tune
the Hadoop configuration parameters for optimized performance for a given application running on a
given cluster. RFHOC constructs two ensembles of performance models using a random-forest
approach for the map and reduce stage respectively. Leveraging these models, RFHOC employs a
genetic algorithm to automatically search the Hadoop configuration space. The evaluation of RFHOC
using five typical Hadoop programs, each with five different input data sets, shows that it achieves a
performance speedup by a factor of 2.11× on average and up to 7.4× over the recently
proposed cost-based optimization (CBO) approach. In addition, RFHOC's performance benefit
increases with input data set size.
ETPL
PDS - 023 RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's
Configuration
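The search loop described in the abstract can be sketched as follows. This is an illustrative toy, not RFHOC itself: the parameter names are examples, and a simple stand-in function replaces the trained random-forest ensembles that RFHOC would query for predicted runtime.

```python
# Genetic search over a toy configuration space, guided by a surrogate
# performance model (stand-in for RFHOC's random-forest predictions).
import random

random.seed(0)

SPACE = {
    "io.sort.mb": [100, 200, 400, 800],       # parameter names and values
    "mapred.reduce.tasks": [4, 8, 16, 32],    # here are illustrative only
    "mapred.compress.map.output": [0, 1],
}
KEYS = list(SPACE)

def predicted_runtime(cfg):
    # Stand-in for the random-forest model: lower is better.
    return (abs(cfg["io.sort.mb"] - 400) / 100
            + abs(cfg["mapred.reduce.tasks"] - 16) / 4
            + (1 - cfg["mapred.compress.map.output"]))

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in KEYS}

def mutate(cfg, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

pop = [random_cfg() for _ in range(20)]
for _ in range(30):
    pop.sort(key=predicted_runtime)   # elitism: keep the fittest half
    parents = pop[:10]
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]
best = min(pop, key=predicted_runtime)
print(predicted_runtime(best))  # near-optimal, typically 0.0
```

The real system would evaluate candidates against two model ensembles (map and reduce stages) rather than a single toy function, but the select/crossover/mutate structure is the same.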
With the booming cloud computing industry, computational resources are readily and elastically
available to the customers. In order to attract customers with various demands, most Infrastructure-as-
a-service (IaaS) cloud service providers offer several pricing strategies such as pay as you go, pay less
per unit when you use more (the so-called volume discount), and pay even less when you reserve. The
diverse pricing schemes among different IaaS service providers or even in the same provider form a
complex economic landscape that nurtures the market of cloud brokers. By strategically scheduling
multiple customers' resource requests, a cloud broker can fully take advantage of the discounts offered
by cloud service providers. In this paper, we focus on how a broker can help a group of customers to
fully utilize the volume discount pricing strategy offered by cloud service providers through cost-
efficient online resource scheduling. We present a randomized online stack-centric scheduling
algorithm (ROSA) and theoretically prove the lower bound of its competitive ratio. Three special cases
of the offline concave cost scheduling problem and the corresponding optimal algorithms are
introduced. Our simulation shows that ROSA achieves a competitive ratio close to the theoretical lower
bound under the special cases. Trace-driven simulation using Google cluster data demonstrates that
ROSA is superior to the conventional online scheduling algorithms in terms of cost saving.
ETPL
PDS - 024 Online Resource Scheduling Under Concave Pricing for Cloud
Computing
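The volume-discount pricing that the broker exploits is a concave cost curve: the marginal price per unit falls as usage grows. A minimal sketch (the tier boundaries and rates below are invented for illustration):

```python
# Tiered volume-discount pricing: a concave total-cost function of the kind
# a cloud broker can exploit by aggregating customers' requests.
def volume_cost(units):
    """Total cost under pay-less-per-unit-when-you-use-more tiers."""
    tiers = [(100, 1.00), (400, 0.80), (float("inf"), 0.60)]  # (width, rate)
    cost, remaining = 0.0, units
    for width, rate in tiers:
        used = min(remaining, width)
        cost += used * rate
        remaining -= used
        if remaining == 0:
            break
    return cost

print(volume_cost(100))      # 100.0
print(volume_cost(500))      # 100*1.0 + 400*0.8 = 420.0
print(5 * volume_cost(100))  # five separate 100-unit requests: 500.0
```

Because the aggregated request (420.0) is cheaper than five separate ones (500.0), batching customers' demand saves money, which is precisely the opportunity the broker's online scheduling targets.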
In the cloud, for achieving access control and keeping data confidential, the data owners could adopt
attribute-based encryption to encrypt the stored data. Users with limited computing power, however,
are likely to delegate the decryption task to the cloud servers to reduce the computing
cost. As a result, attribute-based encryption with delegation emerges. Still, there are caveats and
questions remaining in the previous relevant works. For instance, during the delegation, the cloud
servers could tamper with or replace the delegated ciphertext and return a forged computing result with
malicious intent. They may also cheat eligible users by telling them that they are ineligible for
the purpose of cost saving. Furthermore, the access policies used during encryption may not be flexible
enough either. Since a policy over general circuits achieves the strongest form of access control,
our work considers a construction realizing circuit ciphertext-policy attribute-based hybrid encryption
with verifiable delegation. In such a system, combined with verifiable computation
and the encrypt-then-MAC mechanism, the data confidentiality, the fine-grained access control and the
correctness of the delegated computing results are well guaranteed at the same time. Besides, our
scheme achieves security against chosen-plaintext attacks under the k-multilinear Decisional Diffie-
Hellman assumption. Moreover, an extensive simulation campaign confirms the feasibility and
efficiency of the proposed solution.
ETPL
PDS - 025 Circuit Ciphertext-Policy Attribute-Based Hybrid Encryption with
Verifiable Delegation in Cloud Computing
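The encrypt-then-MAC pattern underlying the scheme's integrity guarantee can be illustrated in a few lines: the MAC is computed over the ciphertext, so a server that tampers with or replaces the delegated ciphertext is detected before decryption. This toy uses a SHA-256 keystream as a placeholder cipher, which is NOT a real or secure cipher, and has none of the attribute-based machinery of the actual construction:

```python
# Toy encrypt-then-MAC: authenticate the ciphertext so tampering by a
# delegated server is detected. The XOR "cipher" is a placeholder only.
import hashlib
import hmac
import os

def xor_stream(key, data):
    # Placeholder keystream derived with SHA-256; illustrative, not secure.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(enc_key, mac_key, plaintext):
    ct = xor_stream(enc_key, plaintext)
    tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # MAC over ciphertext
    return ct, tag

def open_sealed(enc_key, mac_key, ct, tag):
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was tampered with or replaced")
    return xor_stream(enc_key, ct)

ek, mk = os.urandom(32), os.urandom(32)
ct, tag = seal(ek, mk, b"delegated data")
assert open_sealed(ek, mk, ct, tag) == b"delegated data"

forged = bytes([ct[0] ^ 1]) + ct[1:]  # server flips one ciphertext bit
try:
    open_sealed(ek, mk, forged, tag)
except ValueError as e:
    print(e)  # tampering detected before any decryption happens
```

The verifiable-delegation construction in the paper additionally lets the user check the correctness of the server's partial-decryption result, which this sketch does not attempt to model.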
Cloud computing provides promising platforms for executing large applications, with enormous
computational resources offered on demand. In a Cloud model, users are charged based on their usage
of resources and the required quality of service (QoS) specifications. Although there are many existing
workflow scheduling algorithms in traditional distributed or heterogeneous computing environments,
they have difficulties in being directly applied to the Cloud environments since Cloud differs from
traditional heterogeneous environments by its service-based resource managing method and pay-per-
use pricing strategies. In this paper, we highlight such difficulties, and model the workflow scheduling
problem which optimizes both makespan and cost as a Multi-objective Optimization Problem (MOP)
for the Cloud environments. We propose an evolutionary multi-objective optimization (EMO)-based
algorithm to solve this workflow scheduling problem on an infrastructure as a service (IaaS) platform.
Novel schemes for problem-specific encoding and population initialization, fitness evaluation and
genetic operators are proposed in this algorithm. Extensive experiments on real world workflows and
randomly generated workflows show that the schedules produced by our evolutionary algorithm
exhibit greater stability on most of the workflows under the instance-based IaaS computing and pricing
models. The results also show that our algorithm can achieve significantly better solutions than existing
state-of-the-art QoS optimization scheduling algorithms in most cases. The conducted experiments are
based on the on-demand instance types of Amazon EC2; however, the proposed algorithm can easily
be extended to the resources and pricing models of other IaaS services.
ETPL
PDS - 026 Evolutionary Multi-Objective Workflow Scheduling in Cloud
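At the core of any evolutionary multi-objective scheduler is the Pareto-dominance test over the (makespan, cost) pair. A small self-contained sketch with toy schedule data (not the paper's encoding or operators):

```python
# Pareto dominance and front extraction for the bi-objective
# (makespan, cost) workflow scheduling problem; both objectives minimized.
def dominates(a, b):
    """Schedule a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(schedules):
    """Keep only schedules not dominated by any other candidate."""
    return [s for s in schedules
            if not any(dominates(o, s) for o in schedules if o is not s)]

# (makespan_seconds, cost_dollars) of candidate workflow schedules
candidates = [(120, 5.0), (90, 8.0), (150, 4.0), (100, 9.0), (90, 7.5)]
print(pareto_front(candidates))  # [(120, 5.0), (150, 4.0), (90, 7.5)]
```

An EMO algorithm evolves a population toward this front, giving the user a set of makespan/cost trade-offs to choose from rather than a single schedule.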
With simple access interfaces and flexible billing models, cloud storage has become an attractive
solution to simplify the storage management for both enterprises and individual users. However,
traditional file systems with extensive optimizations for local disk-based storage backends cannot fully
exploit the inherent features of the cloud to obtain desirable performance. In this paper, we present the
design, implementation, and evaluation of Coral, a cloud based file system that strikes a balance
between performance and monetary cost. Unlike previous studies that treat cloud storage as just a
normal backend of existing networked file systems, Coral is designed to address several key issues in
optimizing cloud-based file systems such as the data layout, block management, and billing model.
With carefully designed data structures and algorithms, such as identifying semantically correlated data
blocks, kd-tree based caching policy with self-adaptive thrashing prevention, effective data layout, and
optimal garbage collection, Coral achieves good performance and cost savings under various
workloads as demonstrated by extensive evaluations.
ETPL
PDS - 027 Coral: A Cloud-Backed Frugal File System
With the prevalence of cloud computing and virtualization, more and more cloud services including
parallel soft real-time applications (PSRT applications) are running in virtualized data centers.
However, current hypervisors do not provide adequate support for them because of soft real-time
constraints and synchronization problems, which result in frequent deadline misses and serious
performance degradation. CPU schedulers in underlying hypervisors are central to these issues. In this
paper, we identify and analyze CPU scheduling problems in hypervisors. Then, we design and
implement a parallel soft real-time scheduler according to the analysis, named Poris, based on Xen. It
addresses both soft real-time constraints and synchronization problems simultaneously. In our
proposed method, priority promotion and dynamic time slice mechanisms are introduced to determine
when to schedule virtual CPUs (VCPUs) according to the characteristics of soft real-time applications.
Besides, considering that PSRT applications may run in a virtual machine (VM) or multiple VMs, we
present parallel scheduling, group scheduling and communication-driven group scheduling to
accelerate synchronizations of these applications and make sure that tasks are finished before their
deadlines under different scenarios. Our evaluation shows Poris can significantly improve the
performance of PSRT applications whether they run in a single VM or across multiple VMs. For example,
compared to the Credit scheduler, Poris decreases the response time of the web search benchmark by up
to 91.6 percent.
ETPL
PDS - 028 Poris: A Scheduler for Parallel Soft Real-Time Applications in
Virtualized Environments
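The two mechanisms named in the abstract, priority promotion and dynamic time slices, can be sketched schematically as follows. This is an illustrative model in Python, not Xen scheduler code; the priority levels and slice lengths are invented:

```python
# Schematic model of priority promotion and dynamic time slices for
# soft real-time VCPUs; levels and slice lengths are illustrative.
PROMOTED, BASE = 0, 1   # lower value = dispatched first

class VCPU:
    def __init__(self, name, soft_real_time=False):
        self.name = name
        # Priority promotion: VCPUs of soft real-time VMs start promoted.
        self.priority = PROMOTED if soft_real_time else BASE

def pick_next(run_queue):
    """Dispatch the highest-priority runnable VCPU (stable within a level)."""
    run_queue.sort(key=lambda v: v.priority)
    return run_queue[0]

def time_slice(run_queue, default_ms=30, rt_ms=5):
    """Dynamic time slice: shrink the slice while promoted (real-time)
    VCPUs are waiting, so they reach the CPU with low latency."""
    if any(v.priority == PROMOTED for v in run_queue):
        return rt_ms
    return default_ms

rq = [VCPU("batch-0"), VCPU("rt-0", soft_real_time=True), VCPU("batch-1")]
print(pick_next(rq).name)  # rt-0
print(time_slice(rq))      # 5
```

Poris additionally coordinates sibling VCPUs of the same PSRT application (parallel, group, and communication-driven group scheduling) so that synchronizing tasks make progress together, which this single-queue sketch does not model.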
Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning
data among a group of computing nodes. We start this study by discovering a serious performance
problem of the existing parallel Frequent Itemset Mining algorithms. Given a large dataset, data
partitioning strategies in the existing solutions suffer high communication and mining overhead
induced by redundant transactions transmitted among computing nodes. We address this problem by
developing a data partitioning approach called FiDoop-DP using the MapReduce programming model.
The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining
on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram-based data partitioning
technique, which exploits correlations among transactions. Incorporating the similarity metric and the
Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into a data
partition to improve locality without creating an excessive number of redundant transactions. We
implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by
IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP is
conducive to reducing network and computing loads by virtue of eliminating redundant transactions
on Hadoop nodes. FiDoop-DP significantly improves the performance of the existing parallel frequent-
pattern scheme, by up to 31% with an average of 18%.
ETPL
PDS - 029 FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop
Clusters
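The similarity machinery behind the partitioning step can be illustrated with MinHash, the standard signature used in Locality-Sensitive Hashing: transactions with similar item sets agree on a large fraction of signature slots, so hashing signatures co-locates similar transactions. A self-contained sketch (hash choice and signature length are arbitrary, and this is not FiDoop-DP's actual code):

```python
# MinHash signatures for market-basket transactions: the fraction of
# matching slots is an unbiased estimate of Jaccard similarity.
import hashlib

def minhash(items, n_hashes=128):
    """One slot per hash function: the minimum hash value over all items."""
    return tuple(min(int(hashlib.sha256(f"{i}:{it}".encode()).hexdigest(), 16)
                     for it in sorted(items))
                 for i in range(n_hashes))

def est_similarity(a, b, n_hashes=128):
    sa, sb = minhash(a, n_hashes), minhash(b, n_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / n_hashes

t1 = {"milk", "bread", "butter"}
t2 = {"milk", "bread", "butter", "eggs"}  # Jaccard(t1, t2) = 3/4
t3 = {"bolt", "nut", "washer"}            # Jaccard(t1, t3) = 0
print(round(est_similarity(t1, t2), 2))   # close to 0.75
print(est_similarity(t1, t3))             # 0.0
```

FiDoop-DP combines this kind of signature with Voronoi diagram-based partitioning so that highly similar transactions land in the same partition, avoiding the redundant cross-node transaction traffic that plagues equal-size partitioning.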
With the growing popularity of cloud computing, the number of cloud service providers and services
has significantly increased. Thus, selecting the best cloud services becomes a challenging task for
prospective cloud users. The process of selecting cloud services involves various factors such as
characteristics and models of cloud services, user requirements and knowledge, and service level
agreement (SLA), to name a few. This paper investigates cloud service selection tools,
techniques and models, taking into account the distinguishing characteristics of cloud services. It
also reviews and analyses academic research as well as commercial tools in order to identify their
strengths and weaknesses in the cloud services selection process. It proposes a framework in order to
improve cloud service selection by taking into account service capabilities, quality attributes, the
user's level of knowledge, and service level agreements. The paper also envisions various directions for future
research.
ETPL
PDS - 030 Trends and Directions in Cloud Service Selection
Virtualization of servers and networks is a key technique to resolve the conflict between the increasing
demands on computing power and the high cost of hardware in data centers. In order to map virtual
networks to physical infrastructure efficiently, designers have to make careful decisions on the
allocation of limited resources, which makes placement of virtual networks in data centers a critical
issue. In this paper, we study the placement of virtual networks in fat-tree data center networks. In
order to meet the requirements of instant parallel data transfer between multiple computing units, we
propose a model of multicast-capable virtual networks (MVNs). We then design four virtual machine
(VM) placement schemes to embed MVNs into fat-tree data center networks, named Most-Vacant-Fit
(MVF), Most-Compact-First (MCF), Mixed-Bidirectional-Fill (MBF), and Malleable-Shallow-Fill
(MSF). All these VM placement schemes guarantee the nonblocking multicast capability of each MVN
while simultaneously achieving significant savings in the cost of network hardware. In addition, each
VM placement scheme has its unique features. The MVF scheme has zero interference to existing
computing tasks in data centers; the MCF scheme leads to the greatest cost saving; the MBF scheme
simultaneously possesses the merits of MVF and MCF, and it provides an adjustable parameter
allowing cloud providers to achieve a preferred balance between cost and overhead; the MSF
scheme performs at least as well as MBF, and possesses some additional predictable features. Finally,
we compare the performance and overhead of these VM placement schemes, and present simulation
results to validate the theoretical results.
ETPL
PDS - 031 Placement and Performance Analysis of Virtual Multicast Networks in
Fat-Tree Data Center Networks
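The Most-Vacant-Fit idea can be sketched in a few lines: place the VMs of an MVN one by one, always into the rack with the most free slots, which spreads the MVN out and minimizes interference with existing tenants. This is an illustrative greedy sketch with toy rack data, not the paper's scheme, which additionally enforces nonblocking multicast in the fat-tree:

```python
# Greedy Most-Vacant-Fit style placement: each VM goes to whichever rack
# currently has the most free slots. Rack names and sizes are toy data.
def most_vacant_fit(free_slots, n_vms):
    """free_slots maps rack name -> free VM slots (mutated in place).
    Returns rack name -> number of this MVN's VMs placed there."""
    placement = {rack: 0 for rack in free_slots}
    for _ in range(n_vms):
        rack = max(free_slots, key=free_slots.get)  # most vacant rack
        if free_slots[rack] == 0:
            raise ValueError("not enough capacity for the MVN")
        free_slots[rack] -= 1
        placement[rack] += 1
    return {rack: n for rack, n in placement.items() if n}

racks = {"rack-a": 4, "rack-b": 2, "rack-c": 6}
print(most_vacant_fit(racks, 5))  # {'rack-a': 2, 'rack-c': 3}
```

By contrast, a Most-Compact-First style scheme would prefer the fullest rack that still fits, trading interference for hardware cost, which mirrors the MVF/MCF trade-off the abstract describes.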