The increasing demands of cloud and big data applications require data center networks to be scalable and
bandwidth-rich. Current data center network architectures often use rigid topologies to increase
network bandwidth. A major limitation is that they can hardly support incremental network growth.
Recent work has been investigating new network architectures and protocols to achieve three important
design goals of large data centers, namely, high throughput, routing and forwarding scalability, and
flexibility for incremental growth. Unfortunately, existing data center network architectures focus on
one or two of the above properties and pay little attention to the others. In this paper, we design a novel
flexible data center network architecture, Space Shuffle (S2), which applies greedy routing on multiple
ring spaces to achieve high throughput, scalability, and flexibility. The proposed greedy routing
protocol of S2 effectively exploits the path diversity of densely connected topologies and enables key-
based routing. Extensive experimental studies show that S2 provides high bisectional bandwidth and
throughput, near-optimal routing path lengths, extremely small forwarding state, fairness among
concurrent data flows, and resiliency to network failures.
ETPL
PDS - 001 Space Shuffle: A Scalable, Flexible, and High-Performance Data Center
Network
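The greedy routing idea above can be made concrete with a small sketch. This is an illustrative model, not the S2 implementation: switches get coordinates in several virtual ring spaces, and each hop forwards to the neighbor closest to the destination under circular distance. The distance measure (minimum over spaces) and the stop-on-no-progress fallback are simplifying assumptions.

```python
def ring_dist(a, b):
    """Circular distance between two points on a unit ring."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def multi_ring_dist(u, v):
    """Distance between two switches, taken as the minimum ring distance
    over all virtual ring spaces (a simplifying assumption)."""
    return min(ring_dist(a, b) for a, b in zip(u, v))

def greedy_route(coords, neighbors, src, dst, max_hops=64):
    """Greedily forward toward dst, always picking the neighbor that
    strictly reduces the multi-ring distance to the destination."""
    path = [src]
    cur = src
    while cur != dst and len(path) <= max_hops:
        best = min(neighbors[cur],
                   key=lambda n: multi_ring_dist(coords[n], coords[dst]))
        if multi_ring_dist(coords[best], coords[dst]) >= multi_ring_dist(coords[cur], coords[dst]):
            break  # no progress: greedy stuck (real S2 has a fallback here)
        path.append(best)
        cur = best
    return path
```

In the actual protocol, coordinates in multiple spaces give many distance-reducing neighbors, which is what exploits the path diversity the abstract mentions.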
In current data centers, an application (e.g., MapReduce, Dryad, search platform, etc.) usually
generates a group of parallel flows to complete a job. These flows compose a coflow and only
completing them all is meaningful to the application. Accordingly, minimizing the average Coflow
Completion Time (CCT) becomes a critical objective of flow scheduling. However, achieving this goal
in today’s Data Center Networks (DCNs) is quite challenging, not only because the scheduling problem
is theoretically NP-hard, but also because it is tough to perform practical flow scheduling in large-scale
DCNs. In this paper, we find that minimizing the average CCT of a set of coflows is equivalent to the
well-known problem of minimizing the sum of completion times in a concurrent open shop. As there
are abundant existing solutions for concurrent open shop, we open up a variety of techniques for coflow
scheduling. Inspired by the best known result, we derive a 2-approximation algorithm for coflow
scheduling, and further develop a decentralized coflow scheduling system, D-CAS, which avoids the
system problems associated with current centralized proposals while addressing the performance
challenges of decentralized approaches. Trace-driven simulations indicate that D-CAS achieves a
performance close to Varys, the state-of-the-art centralized method, and significantly outperforms
Baraat, the only existing decentralized method.
ETPL
PDS - 002 Towards Practical and Near-optimal Coflow Scheduling for Data Center
Networks
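The equivalence to concurrent open shop can be illustrated with a permutation schedule. The sketch below orders coflows smallest-bottleneck-first, a simple heuristic in the same spirit; it is not the paper's 2-approximation algorithm, and the unit-rate port model is an assumption.

```python
def bottleneck(coflow):
    """A coflow is given as a dict port -> total bytes it needs on that
    port. Its bottleneck is the most loaded port (unit-rate ports assumed)."""
    return max(coflow.values())

def schedule_coflows(coflows):
    """Order coflows smallest-bottleneck-first, then compute each coflow's
    completion time under a serial permutation schedule: on every port,
    coflows run in the chosen order, and a coflow finishes when its
    slowest port finishes (the concurrent-open-shop objective)."""
    order = sorted(range(len(coflows)), key=lambda i: bottleneck(coflows[i]))
    port_clock = {}   # cumulative work scheduled so far on each port
    cct = {}
    for i in order:
        finish = 0.0
        for p, load in coflows[i].items():
            port_clock[p] = port_clock.get(p, 0.0) + load
            finish = max(finish, port_clock[p])
        cct[i] = finish
    return order, cct
```

Running the short coflow first here lowers the average completion time, which is exactly the sum-of-completion-times objective from concurrent open shop.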
Large scale iterative graph computation presents an interesting systems challenge due to two well
known problems: (1) the lack of access locality and (2) the lack of storage efficiency. This paper
presents PathGraph, a system for improving iterative graph computation on graphs with billions of
edges. First, we improve the memory and disk access locality for iterative computation algorithms on
large graphs by modeling a large graph using a collection of tree-based partitions. This enables us to
use path-centric computation rather than vertex-centric or edge-centric computation. For each tree
partition, we re-label vertices using DFS in order to preserve consistency between the order of vertex
ids and vertex order in the paths. Second, a compact storage that is optimized for iterative graph parallel
computation is developed in the PathGraph system. Concretely, we employ delta-compression and
store tree-based partitions in a DFS order. By clustering highly correlated paths together as tree-based
partitions, we maximize sequential access and minimize random access on storage media. Third but
not least, our path-centric computation model is implemented using a scatter/gather programming
model. We parallelize the iterative computation at the tree-partition level and perform sequential local updates
for vertices in each tree partition to improve the convergence speed. To provide well balanced
workloads among parallel threads at the tree-partition level, we introduce a task queue with multiple
stealing points, which allows work stealing from several positions in the queue. We evaluate
the effectiveness of PathGraph by comparing it with recent representative graph processing systems such
as GraphChi and X-Stream. Our experimental results show that our approach outperforms the two
systems on a number of graph algorithms for both in-memory and out-of-core graphs. While our
approach achieves better data balance and load balance, it also shows better speedup than the two
systems as the number of threads grows.
ETPL
PDS - 003 PathGraph: A Path Centric Graph Processing System
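The DFS re-labeling step, which keeps vertex ids consistent with vertex order along paths, can be sketched as follows. The tree representation (a dict from vertex to children) and the iterative traversal are illustrative choices, not PathGraph's actual data layout.

```python
def dfs_relabel(tree, root):
    """Assign new vertex ids in DFS preorder so that vertices along any
    root-to-leaf path receive increasing ids, keeping id order consistent
    with path order. Iterative to avoid recursion limits on deep trees."""
    new_id, counter, stack = {}, 0, [root]
    while stack:
        v = stack.pop()
        if v in new_id:
            continue
        new_id[v] = counter
        counter += 1
        # Push children in reverse so they pop in their original order.
        for c in reversed(tree.get(v, [])):
            stack.append(c)
    return new_id
```

After relabeling, storing vertices in id order means a path scan is (mostly) a sequential scan, which is the locality property the abstract is after.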
The shift to the in-memory data processing paradigm has had a major influence on the development of
cluster data processing frameworks. Numerous frameworks from the industry, open source community
and academia are adopting the in-memory paradigm to achieve functionalities and performance
breakthroughs. However, despite the advantages of these in-memory frameworks, in practice they are
susceptible to memory-pressure related performance collapse and failures. The contributions of this
paper are two-fold. Firstly, we conduct a detailed diagnosis of the memory pressure problem and
identify three preconditions for the performance collapse. These preconditions not only explain the
problem but also shed light on the possible solution strategies. Secondly, we propose a novel
programming abstraction called the leaky buffer that eliminates one of the preconditions, thereby
addressing the underlying problem. We have implemented a leaky-buffer enabled hashtable in Spark,
and we believe it can also substitute for hashtables that perform similar hash aggregation
operations in other programs or data processing frameworks. Experiments on a range of memory-intensive
aggregation operations show that the leaky buffer abstraction can drastically reduce the
occurrence of memory-related failures, improve performance by up to 507% and reduce memory usage
by up to 87.5%.
ETPL
PDS - 004 Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from
Cluster Data Processing Frameworks
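A toy version of the leaky-buffer idea for hash aggregation might look like the following. The spill policy (evict half the keys when over capacity), the JSON-lines spill format, and the class name are all assumptions for illustration; Spark's actual implementation differs.

```python
import json
import os
import tempfile

class LeakyHashTable:
    """Sketch of a 'leaky buffer' aggregation table: keep at most
    `capacity` keys in memory; when full, leak (spill) part of the
    aggregated entries to disk instead of letting memory grow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.mem = {}
        self.spill_file = tempfile.NamedTemporaryFile(
            'w+', delete=False, suffix='.jsonl')

    def add(self, key, value):
        self.mem[key] = self.mem.get(key, 0) + value
        if len(self.mem) > self.capacity:
            self._leak()

    def _leak(self):
        # Spill half of the entries to bound memory use (illustrative policy).
        victims = sorted(self.mem)[: len(self.mem) // 2]
        for k in victims:
            self.spill_file.write(json.dumps([k, self.mem.pop(k)]) + '\n')
        self.spill_file.flush()

    def result(self):
        # Merge spilled partial aggregates back with the in-memory ones.
        self.spill_file.seek(0)
        out = dict(self.mem)
        for line in self.spill_file:
            k, v = json.loads(line)
            out[k] = out.get(k, 0) + v
        self.spill_file.close()
        os.unlink(self.spill_file.name)
        return out
```

The key property: memory use is bounded by `capacity` regardless of how many distinct keys arrive, which removes the precondition for the collapse described above.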
Among uncertain graph queries, reachability, i.e., the probability that one vertex is reachable from
another, is likely the most fundamental one. Although this problem has been studied within the field
of network reliability, solutions are implemented on a single computer and can only handle small
graphs. However, as the size of graph applications continually increases, the corresponding graph data
can no longer fit within a single computer’s memory and must therefore be distributed across several
machines. Furthermore, the computation of probabilistic reachability queries is #P-complete, making it
very expensive even on small graphs. In this paper, we develop an efficient distributed strategy, called
DistR, to solve the problem of reachability query over large uncertain graphs. Specifically, we perform
the task in two steps: distributed graph reduction and distributed consolidation. In the distributed graph
reduction step, we find all of the maximal subgraphs of the original graph, whose reachability
probabilities can be calculated in polynomial time, compute them and reduce the graph accordingly.
After this step, only a small graph remains. In the distributed consolidation step, we transform the
problem into a relational join process and provide an approximate answer to the #P-complete
reachability query. Extensive experimental studies show that our distributed approach is efficient in
terms of both computational and communication costs, and has high accuracy.
ETPL
PDS - 005 DistR: A Distributed Method for the Reachability Query over Large
Uncertain Graphs
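For intuition, the quantity being computed can be estimated by a naive possible-world sampling baseline. This is far less efficient than DistR's reduction-plus-join strategy and is included only to make the problem concrete; the graph encoding is an assumption of this sketch.

```python
import random

def sample_reachable(edges, s, t, trials=20000, seed=7):
    """Estimate P(t is reachable from s) in an uncertain graph given as
    {(u, v): probability}. Each trial samples a possible world by keeping
    every edge independently with its probability, then runs a DFS."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                hits += 1
                break
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return hits / trials
```

Sampling needs many trials for tight estimates, which is exactly why the exact-on-reducible-subgraphs approach above matters at scale.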
The alignment of many short sequences of DNA, called reads, to a long reference genome is a common
task in molecular biology. When the problem is expanded to handle typical workloads of billions of
reads, execution time becomes critical. In this paper we present a novel reconfigurable architecture for
minimal perfect sequencing (RAMPS). While existing solutions attempt to align a high percentage of
the reads using a small memory footprint, RAMPS focuses on performing fast exact matching. Using
the human genome as a reference, RAMPS aligns short reads hundreds of thousands of times faster
than current software implementations such as SOAP2 or Bowtie, and about a thousand times faster
than GPU implementations such as SOAP3. Whereas other aligners require hours to preprocess
reference genomes, RAMPS can preprocess the reference human genome in a few minutes, opening
the possibility of using new reference sources that are more genetically similar to the newly sequenced
data.
ETPL
PDS - 006 RAMPS: A Reconfigurable Architecture for Minimal Perfect Sequencing
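The exact-matching core of such an aligner can be illustrated in software with an ordinary dictionary standing in for the hardware minimal perfect hash; function names here are illustrative, not from RAMPS.

```python
def build_index(reference, read_len):
    """Map every substring of length read_len in the reference to the
    positions where it occurs. (RAMPS uses a minimal perfect hash in
    hardware; a Python dict plays the same role in this sketch.)"""
    index = {}
    for i in range(len(reference) - read_len + 1):
        index.setdefault(reference[i:i + read_len], []).append(i)
    return index

def align_exact(index, read):
    """Return all reference positions where the read matches exactly."""
    return index.get(read, [])
```

Preprocessing is a single pass over the genome, which mirrors why index construction can be fast, and each query is a constant-time lookup, which mirrors the fast exact-match path.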
Due to the increasing usage of cloud computing applications, it is important to minimize the energy
cost consumed by a data center and, at the same time, to improve quality of service via data center
management. One promising approach is to switch some servers in a data center to the idle mode to
save energy while keeping a suitable number of servers in the active mode to provide timely
service. In this paper, we design both online and offline algorithms for this problem. For the offline
algorithm, we formulate data center management as a cost minimization problem by considering
energy cost, delay cost (to measure service quality), and switching cost (to change servers’ active/idle
mode). Then, we analyze certain properties of an optimal solution which lead to a dynamic
programming based algorithm. Moreover, by revising the solution procedure, we successfully
eliminate the recursive procedure and achieve an optimal offline algorithm with a polynomial
complexity. For the online algorithm, we design it by considering the worst case scenario for future
workload. In simulation, we show this online algorithm can always provide near-optimal solutions.
ETPL
PDS - 007 Cost Minimization Algorithms for Data Center Management
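The offline problem admits a straightforward dynamic program over (time slot, active-server count), sketched below under an illustrative cost model: linear energy cost per active server, a linear delay penalty for load beyond capacity, a linear switching cost, and a free choice of the initial server count. The paper's exact model and its polynomial-time refinement differ in detail.

```python
def min_cost_schedule(load, max_servers, e_cost=1.0, d_cost=5.0, s_cost=2.0):
    """Offline DP: best[m] = minimum cost of any schedule ending the
    current slot with m active servers. Per-slot cost = energy * m +
    delay penalty for unserved load + switching cost for changing m."""
    INF = float('inf')
    best = [0.0] * (max_servers + 1)   # initial server count is free here
    for lam in load:
        nxt = [INF] * (max_servers + 1)
        for m in range(max_servers + 1):
            slot = e_cost * m + d_cost * max(0.0, lam - m)
            for prev in range(max_servers + 1):
                c = best[prev] + slot + s_cost * abs(m - prev)
                if c < nxt[m]:
                    nxt[m] = c
        best = nxt
    return min(best)
```

This naive DP is O(T * M^2); the structural properties analyzed in the paper are what allow cutting the recursion down to a polynomial algorithm with better constants.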
This work presents an on-line, energy- and communication-aware scheduling strategy for SaaS
applications in data centers. The applications are composed of various services and represented as
workflows. Each workflow consists of tasks related to each other by precedence constraints and
represented by Directed Acyclic Graphs (DAGs). The proposed scheduling strategy combines
advantages of state-of-the-art workflow scheduling strategies with energy-aware independent task
scheduling approaches. The process of scheduling consists of two phases. In the first phase, virtual
deadlines of individual tasks are set in the central scheduler. These deadlines are determined using a
novel strategy that favors tasks which are less dependent on other tasks. During the second phase, tasks
are dynamically assigned to computing servers based on the current load of network links and servers
in a data center. The proposed approach, called Minimum Dependencies Energy-efficient DAG
(MinD+ED) scheduling, has been implemented in the GreenCloud simulator. It outperforms other
approaches in terms of energy efficiency, while keeping an acceptable level of tardiness.
ETPL
PDS - 008 Minimum Dependencies Energy-Efficient Scheduling in Data Centers
The advent of software defined networking enables flexible, reliable and feature-rich control planes
for data center networks. However, the tight coupling of centralized control and complete visibility
leads to a wide range of issues among which scalability has risen to prominence due to the excessive
workload on the central controller. By analyzing the traffic patterns from a couple of production data
centers, we observe that data center traffic is usually highly skewed and thus edge switches can be
clustered into a set of communication-intensive groups according to traffic locality. Motivated by this
observation, we present LazyCtrl, a novel hybrid control plane design for data center networks where
network control is carried out by distributed control mechanisms inside independent groups of switches
while complemented with a global controller. LazyCtrl aims at bringing laziness to the global controller
by dynamically devolving most of the control tasks to independent switch groups to process frequent
intra-group events near the datapath while handling rare inter-group or other specified events by the
controller. We implement LazyCtrl and build a prototype based on Open vSwitch and Floodlight.
Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain
in multi-tenant clouds and the central controller can be significantly shielded by staying “lazy”, with
its workload reduced by up to 82%.
ETPL
PDS - 009 LazyCtrl: A Scalable Hybrid Network Control Plane Design for Cloud
Data Centers
The increasing pervasiveness of mobile devices, along with the use of technologies like GPS, WiFi
networks, RFID, and sensors, allows for the collection of large amounts of movement data. These
data can be analyzed to extract descriptive and predictive models that can be properly
exploited to improve urban life. From a technological viewpoint, Cloud computing can play an
essential role by helping city administrators to quickly acquire new capabilities and reducing initial
capital costs by means of a comprehensive pay-as-you-go solution. This paper presents a workflow-
based parallel approach for discovering patterns and rules from trajectory data, in a Cloud-based
framework. Experimental evaluation has been carried out on both real-world and synthetic trajectory
data, up to one million trajectories. The results show that, given the high complexity and large
volumes of data involved in the application scenario, the trajectory pattern mining process benefits
from the scalable execution environment offered by a Cloud architecture in terms of execution time,
speed-up, and scale-up.
ETPL
PDS - 010 Trajectory Pattern Mining for Urban Computing in the Cloud
HLA-based simulation systems are prone to load imbalances due to the lack of management of shared
resources in distributed environments. Such imbalances cause these simulations to lose performance
in terms of execution time. As a result, many dynamic load balancing systems have been
introduced to manage distributed load. These systems use specific methods, depending on load or
application characteristics, to perform the required balancing. Load prediction is a technique that has
been used extensively to enhance load redistribution heuristics towards preventing load imbalances. In
this paper, several efficient Time Series model variants are presented and used to enhance prediction
precision for large-scale distributed simulation-based systems. These variants extend Holt’s time
series model and correct issues arising from its implementation in the predictive module of a
dynamic load balancing system for HLA-based distributed simulations. A set
of migration decision-making techniques is also proposed to enable a prediction-based load balancing
system to be independent of any prediction model, promoting a more modular construction.
ETPL
PDS - 011 Time Series-Oriented Load Prediction Model and Migration Policies for
Distributed Simulation Systems
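The underlying Holt model is compact enough to state directly. The sketch below is the textbook double exponential smoothing (a level plus a trend term), not the paper's corrected variants; the default smoothing constants are arbitrary.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear (double) exponential smoothing: maintain a smoothed
    level and a smoothed trend, then extrapolate `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend
```

On a perfectly linear load series the forecast is exact; the variants discussed in the paper address where the plain model drifts on real, noisy load traces.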
With the increasing complexity of today’s high-performance computing (HPC) architectures,
simulation has become an indispensable tool for exploring the design space of HPC systems—in
particular, networks. In order to make effective design decisions, simulations of these systems must
possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely
manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC
network simulation frameworks, however, are constrained in one or more of these areas. In this work,
we present a simulation framework for modeling two important classes of networks used in today’s
IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer
Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies
at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-
event simulation. Our simulation framework meets all the requirements of a practical network
simulation and can assist network designers in design space exploration. First, it uses validated and
detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second,
instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit
simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to
achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems.
Third, our models give network designers a choice in simulating a broad range of network workloads,
including HPC application workloads using detailed network traces, an ability that is rarely offered in
parallel with high-fidelity network simulations.
ETPL
PDS - 012 Enabling Parallel Simulation of Large-Scale HPC Network Systems
The Internet was designed with the end-to-end principle where the network layer provided merely the
best-effort forwarding service. This design makes it challenging to add new services into the Internet
infrastructure. However, as the Internet connectivity becomes a commodity, users and applications
increasingly demand new in-network services. This paper proposes PacketCloud, a cloudlet-based
open platform to host in-network services. Different from standalone, specialized middleboxes,
cloudlets can efficiently share a set of commodity servers among different services, and serve the
network traffic in an elastic way. PacketCloud can help both Internet Service Providers (ISPs) and
emerging application/content providers deploy their services at strategic network locations. We have
implemented a proof-of-concept prototype of PacketCloud. PacketCloud introduces a small additional
delay, and can scale well to handle high-throughput data traffic. We have evaluated PacketCloud in
both a fully functional emulated environment, and the real Internet.
ETPL
PDS - 013 PacketCloud: A Cloudlet-Based Open Platform for In-Network Services
In past decades, significant attention has been devoted to task allocation and load balancing in
distributed systems. Although there have been some related surveys on this subject, each offers
only a preliminary review of a single type of distributed system. To correlate the studies across
varying types of distributed systems and build a comprehensive taxonomy of
them, this survey mainly categorizes and reviews the representative studies on task allocation and load
balancing according to the general characteristics of varying distributed systems. First, this survey
summarizes the general characteristics of distributed systems. Based on these general characteristics,
this survey reviews the studies on task allocation and load balancing with respect to the following
aspects: 1) typical control models; 2) typical resource optimization methods; 3) typical methods for
achieving reliability; 4) typical coordination mechanisms among heterogeneous nodes; and 5) typical
models considering network structures. For each aspect, we summarize the existing studies and discuss
the future research directions. Through the survey, the related studies in this area can be well
understood based on how they can satisfy the general characteristics of distributed systems.
ETPL
PDS - 015 A Survey of Task Allocation and Load Balancing in Distributed Systems
Cloud data owners prefer to outsource documents in encrypted form for the purpose of privacy
preservation. It is therefore essential to develop efficient and reliable ciphertext search techniques. One
challenge is that relationships between documents are normally concealed in the process of
encryption, which leads to significant degradation of search accuracy. Moreover, the volume of data
in data centers has grown dramatically, making it even more challenging to
design ciphertext search schemes that can provide efficient and reliable online information retrieval on
large volume of encrypted data. In this paper, a hierarchical clustering method is proposed to support
more search semantics and also to meet the demand for fast ciphertext search within a big data
environment. The proposed hierarchical approach clusters the documents based on the minimum
relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on
the maximum cluster size is reached. In the search phase, this approach achieves linear
computational complexity even as the document collection grows exponentially in size. In order to
verify the authenticity of search results, a structure called minimum hash sub-tree is designed in this
paper. Experiments have been conducted using the collection set built from the IEEE Xplore. The
results show that with a sharp increase of documents in the dataset the search time of the proposed
method increases linearly whereas the search time of the traditional method increases exponentially.
Furthermore, the proposed method has an advantage over the traditional method in the rank privacy
and relevance of retrieved documents.
ETPL
PDS - 016 An Efficient Privacy-Preserving Ranked Keyword Search Method
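The threshold-based clustering step can be sketched greedily as below. The seed-based membership test, the symmetric pairwise relevance dictionary, and enforcing the size bound directly (instead of recursively splitting into sub-clusters, as the paper does) are all simplifications; the hash sub-tree for result verification is omitted entirely.

```python
def cluster_docs(docs, relevance, min_rel, max_size):
    """Greedy threshold clustering: a document joins the first cluster
    whose seed it is sufficiently relevant to (>= min_rel) and that has
    room (< max_size); otherwise it starts a new cluster.

    `relevance` maps (seed_doc, doc) pairs to a relevance score."""
    clusters = []   # each cluster is a list of doc ids; [0] is the seed
    for d in docs:
        for c in clusters:
            if len(c) < max_size and relevance[(c[0], d)] >= min_rel:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters
```

At query time, search only descends into clusters whose seed is relevant to the query, which is what turns an exhaustive scan into the near-linear behavior reported above.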
Software-defined networking (SDN) is an emerging network paradigm that simplifies network
management by decoupling the control plane and data plane, such that switches become simple data
forwarding devices and network management is controlled by logically centralized servers. In SDN-
enabled networks, network flow is managed by a set of associated rules that are maintained by switches
in their local Ternary Content Addressable Memories (TCAMs) which support high-speed parallel
lookup on wildcard patterns. Since TCAM is an expensive hardware and extremely power-hungry,
each switch has only limited TCAM space and it is inefficient and even infeasible to maintain all rules
at local switches. On the other hand, if we eliminate TCAM occupation by forwarding all packets to
the centralized controller for processing, it results in a long delay and heavy processing burden on the
controller. In this paper, we strive for the fine balance between rule caching and remote packet
processing by formulating a minimum weighted flow provisioning (MWFP) problem with an objective
of minimizing the total cost of TCAM occupation and remote packet processing. We propose an
efficient offline algorithm for the case where the network traffic is given; otherwise, we propose two online algorithms
with guaranteed competitive ratios. Finally, we conduct extensive experiments by simulations using
real network traffic traces. The simulation results demonstrate that our proposed algorithms can
significantly reduce the total cost of remote controller processing and TCAM occupation, and the
solutions obtained are nearly optimal.
ETPL
PDS - 017 Cost Minimization for Rule Caching in Software Defined Networking
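The tension between TCAM occupation and remote processing admits a classic break-even (ski-rental style) online policy, sketched below: pay the remote-processing cost per packet until the amount paid for a flow reaches the cost of caching its rule, then install the rule. This is a well-known competitive heuristic, not the paper's algorithms, and the cost model (one-time TCAM charge, no eviction) is a simplification.

```python
def online_rule_cache(packet_stream, tcam_cost, remote_cost):
    """Break-even online caching: forward a flow's packets to the
    controller until the remote cost paid for that flow reaches the
    TCAM cost of its rule, then cache the rule. Per flow, this pays at
    most twice the better of always-remote vs. cache-immediately."""
    paid = {}          # remote cost paid so far, per flow
    cached = set()
    total = 0.0
    for flow in packet_stream:
        if flow in cached:
            continue   # rule already in TCAM: no per-packet cost here
        total += remote_cost
        paid[flow] = paid.get(flow, 0.0) + remote_cost
        if paid[flow] >= tcam_cost:
            cached.add(flow)
            total += tcam_cost
    return total, cached
```

The paper's online algorithms refine this trade-off with proven competitive ratios under the full MWFP cost model, including TCAM occupation over time.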
In this paper we study energy conservation in the Internet. We observe that different traffic volumes
on a link can result in different energy consumption; this is mainly due to such technologies as trunking
(IEEE 802.1AX), adaptive link rates, etc. We design a green Internet routing scheme, where the routing
can lead traffic in a way that is green. Unlike previous studies, which switch network components
such as line cards and routers into sleep mode, we do not prune the Internet topology.
We first develop a power model, and validate it using real commercial routers. Instead of developing
a centralized optimization algorithm, which requires additional protocols such as MPLS to materialize
in the Internet, we choose a hop-by-hop approach. It is thus much easier to integrate our scheme into
the current Internet. We progressively develop three algorithms, which are loop-free, substantially
reduce energy consumption, and jointly consider green and QoS requirements such as path stretch. We
further analyze the power saving ratio, the routing dynamics, and the relationship between hop-by-hop
green routing and QoS requirements. We comprehensively evaluate our algorithms through
simulations on synthetic, measured, and real topologies, with synthetic and real traffic traces. We show
that the power saving in the line cards can be as much as 50 percent.
ETPL
PDS - 018 A Hop-by-Hop Routing Mechanism for Green Internet
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those
requiring geometric locality of data (particle methods, crash simulations). We present, to our
knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In
contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects
subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our
algorithm does recursive multi-section with a given number of parts in each dimension. By computing
multiple cut lines concurrently and intelligently deciding when to migrate data while computing the
partition, we minimize data movement compared to efficient implementations of recursive bisection.
We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan
on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and
scales better than RCB in terms of run-time without degrading the load balance. Our implementation
partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak
scaling up to 6K cores.
ETPL
PDS - 019 Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm
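The one-dimensional multi-section step, cutting a coordinate axis into a given number of equally loaded slabs, can be sketched with quantile cuts. The real implementation computes cuts in parallel by probing rather than globally sorting, and also handles weighted points; both are omitted here.

```python
import bisect

def multisection_cuts(points, parts):
    """Compute parts-1 cut positions along one dimension so that each
    slab holds (nearly) the same number of points: the 1D step of a
    multi-jagged partition, done here by sorting."""
    xs = sorted(points)
    n = len(xs)
    return [xs[(n * k) // parts] for k in range(1, parts)]

def assign_part(cuts, x):
    """Index of the slab containing coordinate x."""
    return bisect.bisect_left(cuts, x)
```

Multi-section applies this per dimension with several cuts at once, instead of recursively bisecting, which is what reduces the number of data-movement rounds compared to RCB.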
Multi-tenancy is one of the key features of cloud computing, which provides scalability and economic
benefits to the end-users and service providers by sharing the same cloud platform and its underlying
infrastructure with the isolation of shared network and compute resources. However, resource
management in the context of multi-tenant cloud computing is becoming one of the most complex tasks
due to the inherent heterogeneity and resource isolation. This paper proposes a novel cloud-based
workflow scheduling (CWSA) policy for compute-intensive workflow applications in multi-tenant
cloud computing environments, which helps minimize the overall workflow completion time,
tardiness, and cost of execution, while effectively utilizing idle cloud resources. The
proposed algorithm is compared with the state-of-the-art algorithms, i.e., First Come First Served
(FCFS), EASY Backfilling, and Minimum Completion Time (MCT) scheduling policies to evaluate
the performance. Further, a proof-of-concept experiment of real-world scientific workflow applications
is performed to demonstrate the scalability of the CWSA, which verifies the effectiveness of the
proposed solution. The simulation results show that the proposed scheduling policy improves the
workflow performance and outperforms the aforementioned alternative scheduling policies under
typical deployment scenarios.
ETPL
PDS - 020 Workflow Scheduling in Multi-Tenant Cloud Computing Environments
It is imperative for cloud storage systems to be able to provide deadline guaranteed services according
to service level agreements (SLAs) for online services. Despite many previous works on deadline-aware
solutions, most focus on scheduling workflows or resource reservation in datacenter
networks but neglect the server overload problem in cloud storage systems, which prevents providing
deadline-guaranteed services. In this paper, we introduce a new form of SLA, which enables each
tenant to specify a percentage of its requests it wishes to serve within a specified deadline. We first
identify the multiple objectives (i.e., traffic and latency minimization, resource utilization
maximization) in developing schemes to satisfy the SLAs. To satisfy the SLAs while achieving the
multi-objectives, we propose a Parallel Deadline Guaranteed (PDG) scheme, which schedules data
reallocation (through load re-assignment and data replication) using a tree-based bottom-up parallel
process. The observation from our model also motivates our deadline strictness clustered data
allocation algorithm that maps tenants with the similar SLA strictness into the same server to enhance
SLA guarantees. We further enhance PDG in supplying SLA guaranteed services through two
algorithms: i) a prioritized data reallocation algorithm that deals with request arrival rate variation, and
ii) an adaptive request retransmission algorithm that deals with SLA requirement variation. Our trace-
driven experiments on a simulator and Amazon EC2 show the effectiveness of our schemes for
guaranteeing the SLAs while achieving the multi-objectives.
ETPL
PDS - 022 Deadline Guaranteed Service for Multi-Tenant Cloud Storage
Dynamic Bin Packing (DBP) is a variant of classical bin packing, which assumes that items may arrive
and depart at arbitrary times. Existing works on DBP generally aim to minimize the maximum number
of bins ever used in the packing. In this paper, we consider a new version of the DBP problem, namely,
the MinTotal DBP problem, which targets minimizing the total cost of the bins used over time. It is
motivated by the request dispatching problem arising from cloud gaming systems. We analyze the
competitive ratios of modified versions of the commonly used First Fit, Best Fit, and Any Fit
packing algorithms (the family of algorithms that open a new bin only when no currently open bin
can accommodate the item to be packed) for the MinTotal DBP problem. We show that the
competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum
item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded
for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (β > 1
is a constant), the competitive ratio has an upper bound of β/(β - 1) · μ + 3β/(β - 1) + 1. For the general case,
the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First
Fit packing algorithm that can achieve a competitive ratio no larger than (5/4)μ + 19/4 when μ is not
known and can achieve a competitive ratio no larger than μ + 5 when μ is known.
ETPL
PDS - 021 Dynamic Bin Packing for On-Demand Cloud Resource Allocation
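To make the Any Fit family concrete, the following is a minimal First Fit sketch in Python (illustrative names only, not the paper's implementation; item departures and the per-bin time cost of the MinTotal objective are omitted for brevity):

```python
class Bin:
    def __init__(self, capacity=1.0):
        self.capacity = capacity
        self.load = 0.0

    def fits(self, size):
        return self.load + size <= self.capacity

def first_fit(bins, size):
    """Place an item into the first open bin that fits; open a new bin
    only when no currently open bin can accommodate it (the Any Fit rule)."""
    for b in bins:
        if b.fits(size):
            b.load += size
            return b
    b = Bin()
    b.load = size
    bins.append(b)
    return b

bins = []
for size in [0.5, 0.6, 0.4, 0.7]:
    first_fit(bins, size)
print(len(bins))  # 3 bins: {0.5, 0.4}, {0.6}, {0.7}
```

In the dynamic setting analyzed in the paper, items would additionally depart at arbitrary times and each bin would accrue cost for as long as it stays open, which is what separates MinTotal DBP from classical bin packing.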
Hadoop is a widely-used implementation framework of the MapReduce programming model for large-
scale data processing. Hadoop performance however is significantly affected by the settings of the
Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-
consuming, if at all practical. This paper proposes an approach, called RFHOC, to automatically tune
the Hadoop configuration parameters for optimized performance for a given application running on a
given cluster. RFHOC constructs two ensembles of performance models using a random-forest
approach for the map and reduce stage respectively. Leveraging these models, RFHOC employs a
genetic algorithm to automatically search the Hadoop configuration space. The evaluation of RFHOC
using five typical Hadoop programs, each with five different input data sets, shows that it achieves a
performance speedup by a factor of 2.11× on average and up to 7.4× over the recently
proposed cost-based optimization (CBO) approach. In addition, RFHOC's performance benefit
increases with input data set size.
ETPL
PDS - 023 RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's
Configuration
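The search loop described in the abstract can be sketched as follows. This is an illustrative toy, not RFHOC itself: the parameter names are examples, and a simple stand-in function replaces the trained random-forest ensembles that RFHOC would query for predicted runtime.

```python
# Genetic search over a toy configuration space, guided by a surrogate
# performance model (stand-in for RFHOC's random-forest predictions).
import random

random.seed(0)

SPACE = {
    "io.sort.mb": [100, 200, 400, 800],       # parameter names and values
    "mapred.reduce.tasks": [4, 8, 16, 32],    # here are illustrative only
    "mapred.compress.map.output": [0, 1],
}
KEYS = list(SPACE)

def predicted_runtime(cfg):
    # Stand-in for the random-forest model: lower is better.
    return (abs(cfg["io.sort.mb"] - 400) / 100
            + abs(cfg["mapred.reduce.tasks"] - 16) / 4
            + (1 - cfg["mapred.compress.map.output"]))

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in KEYS}

def mutate(cfg, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

pop = [random_cfg() for _ in range(20)]
for _ in range(30):
    pop.sort(key=predicted_runtime)   # elitism: keep the fittest half
    parents = pop[:10]
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]
best = min(pop, key=predicted_runtime)
print(predicted_runtime(best))  # near-optimal, typically 0.0
```

The real system would evaluate candidates against two model ensembles (map and reduce stages) rather than a single toy function, but the select/crossover/mutate structure is the same.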
With the booming cloud computing industry, computational resources are readily and elastically
available to the customers. In order to attract customers with various demands, most Infrastructure-as-
a-service (IaaS) cloud service providers offer several pricing strategies such as pay as you go, pay less
per unit when you use more (the so-called volume discount), and pay even less when you reserve. The
diverse pricing schemes among different IaaS service providers or even in the same provider form a
complex economic landscape that nurtures the market of cloud brokers. By strategically scheduling
multiple customers' resource requests, a cloud broker can fully take advantage of the discounts offered
by cloud service providers. In this paper, we focus on how a broker can help a group of customers to
fully utilize the volume discount pricing strategy offered by cloud service providers through cost-
efficient online resource scheduling. We present a randomized online stack-centric scheduling
algorithm (ROSA) and theoretically prove the lower bound of its competitive ratio. Three special cases
of the offline concave cost scheduling problem and the corresponding optimal algorithms are
introduced. Our simulation shows that ROSA achieves a competitive ratio close to the theoretical lower
bound under the special cases. Trace-driven simulation using Google cluster data demonstrates that
ROSA is superior to the conventional online scheduling algorithms in terms of cost saving.
ETPL
PDS - 024 Online Resource Scheduling Under Concave Pricing for Cloud
Computing
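The volume-discount pricing that the broker exploits is a concave cost curve: the marginal price per unit falls as usage grows. A minimal sketch (the tier boundaries and rates below are invented for illustration):

```python
# Tiered volume-discount pricing: a concave total-cost function of the kind
# a cloud broker can exploit by aggregating customers' requests.
def volume_cost(units):
    """Total cost under pay-less-per-unit-when-you-use-more tiers."""
    tiers = [(100, 1.00), (400, 0.80), (float("inf"), 0.60)]  # (width, rate)
    cost, remaining = 0.0, units
    for width, rate in tiers:
        used = min(remaining, width)
        cost += used * rate
        remaining -= used
        if remaining == 0:
            break
    return cost

print(volume_cost(100))      # 100.0
print(volume_cost(500))      # 100*1.0 + 400*0.8 = 420.0
print(5 * volume_cost(100))  # five separate 100-unit requests: 500.0
```

Because the aggregated request (420.0) is cheaper than five separate ones (500.0), batching customers' demand saves money, which is precisely the opportunity the broker's online scheduling targets.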
In the cloud, for achieving access control and keeping data confidential, the data owners could adopt
attribute-based encryption to encrypt the stored data. Users with limited computing power, however,
are likely to delegate the decryption task to the cloud servers to reduce the computing
cost. As a result, attribute-based encryption with delegation emerges. Still, there are caveats and
questions remaining in the previous relevant works. For instance, during the delegation, the cloud
servers could tamper with or replace the delegated ciphertext and return a forged computing result with
malicious intent. They may also cheat eligible users by telling them that they are ineligible for
the purpose of cost saving. Furthermore, the access policies used during encryption may not be flexible
enough either. Since a policy over general circuits achieves the strongest form of access control,
our work considers a construction realizing circuit ciphertext-policy attribute-based hybrid encryption
with verifiable delegation. In such a system, combined with verifiable computation
and the encrypt-then-MAC mechanism, the data confidentiality, the fine-grained access control and the
correctness of the delegated computing results are well guaranteed at the same time. Besides, our
scheme achieves security against chosen-plaintext attacks under the k-multilinear Decisional Diffie-
Hellman assumption. Moreover, an extensive simulation campaign confirms the feasibility and
efficiency of the proposed solution.
ETPL
PDS - 025 Circuit Ciphertext-Policy Attribute-Based Hybrid Encryption with
Verifiable Delegation in Cloud Computing
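The encrypt-then-MAC pattern underlying the scheme's integrity guarantee can be illustrated in a few lines: the MAC is computed over the ciphertext, so a server that tampers with or replaces the delegated ciphertext is detected before decryption. This toy uses a SHA-256 keystream as a placeholder cipher, which is NOT a real or secure cipher, and has none of the attribute-based machinery of the actual construction:

```python
# Toy encrypt-then-MAC: authenticate the ciphertext so tampering by a
# delegated server is detected. The XOR "cipher" is a placeholder only.
import hashlib
import hmac
import os

def xor_stream(key, data):
    # Placeholder keystream derived with SHA-256; illustrative, not secure.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(enc_key, mac_key, plaintext):
    ct = xor_stream(enc_key, plaintext)
    tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # MAC over ciphertext
    return ct, tag

def open_sealed(enc_key, mac_key, ct, tag):
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was tampered with or replaced")
    return xor_stream(enc_key, ct)

ek, mk = os.urandom(32), os.urandom(32)
ct, tag = seal(ek, mk, b"delegated data")
assert open_sealed(ek, mk, ct, tag) == b"delegated data"

forged = bytes([ct[0] ^ 1]) + ct[1:]  # server flips one ciphertext bit
try:
    open_sealed(ek, mk, forged, tag)
except ValueError as e:
    print(e)  # tampering detected before any decryption happens
```

The verifiable-delegation construction in the paper additionally lets the user check the correctness of the server's partial-decryption result, which this sketch does not attempt to model.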
Cloud computing provides promising platforms for executing large applications, with enormous
computational resources offered on demand. In a Cloud model, users are charged based on their usage
of resources and the required quality of service (QoS) specifications. Although there are many existing
workflow scheduling algorithms in traditional distributed or heterogeneous computing environments,
they have difficulties in being directly applied to the Cloud environments since Cloud differs from
traditional heterogeneous environments by its service-based resource managing method and pay-per-
use pricing strategies. In this paper, we highlight such difficulties, and model the workflow scheduling
problem which optimizes both makespan and cost as a Multi-objective Optimization Problem (MOP)
for the Cloud environments. We propose an evolutionary multi-objective optimization (EMO)-based
algorithm to solve this workflow scheduling problem on an infrastructure as a service (IaaS) platform.
Novel schemes for problem-specific encoding and population initialization, fitness evaluation and
genetic operators are proposed in this algorithm. Extensive experiments on real world workflows and
randomly generated workflows show that the schedules produced by our evolutionary algorithm
exhibit greater stability on most of the workflows under the instance-based IaaS computing and pricing
models. The results also show that our algorithm can achieve significantly better solutions than existing
state-of-the-art QoS optimization scheduling algorithms in most cases. The conducted experiments are
based on the on-demand instance types of Amazon EC2; however, the proposed algorithm can easily
be extended to the resources and pricing models of other IaaS services.
ETPL
PDS - 026 Evolutionary Multi-Objective Workflow Scheduling in Cloud
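At the core of any evolutionary multi-objective scheduler is the Pareto-dominance test over the (makespan, cost) pair. A small self-contained sketch with toy schedule data (not the paper's encoding or operators):

```python
# Pareto dominance and front extraction for the bi-objective
# (makespan, cost) workflow scheduling problem; both objectives minimized.
def dominates(a, b):
    """Schedule a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(schedules):
    """Keep only schedules not dominated by any other candidate."""
    return [s for s in schedules
            if not any(dominates(o, s) for o in schedules if o is not s)]

# (makespan_seconds, cost_dollars) of candidate workflow schedules
candidates = [(120, 5.0), (90, 8.0), (150, 4.0), (100, 9.0), (90, 7.5)]
print(pareto_front(candidates))  # [(120, 5.0), (150, 4.0), (90, 7.5)]
```

An EMO algorithm evolves a population toward this front, giving the user a set of makespan/cost trade-offs to choose from rather than a single schedule.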
With simple access interfaces and flexible billing models, cloud storage has become an attractive
solution to simplify the storage management for both enterprises and individual users. However,
traditional file systems with extensive optimizations for local disk-based storage backends cannot fully
exploit the inherent features of the cloud to obtain desirable performance. In this paper, we present the
design, implementation, and evaluation of Coral, a cloud based file system that strikes a balance
between performance and monetary cost. Unlike previous studies that treat cloud storage as just a
normal backend of existing networked file systems, Coral is designed to address several key issues in
optimizing cloud-based file systems such as the data layout, block management, and billing model.
With carefully designed data structures and algorithms, such as identifying semantically correlated data
blocks, kd-tree based caching policy with self-adaptive thrashing prevention, effective data layout, and
optimal garbage collection, Coral achieves good performance and cost savings under various
workloads as demonstrated by extensive evaluations.
ETPL
PDS - 027 Coral: A Cloud-Backed Frugal File System
With the prevalence of cloud computing and virtualization, more and more cloud services including
parallel soft real-time applications (PSRT applications) are running in virtualized data centers.
However, current hypervisors do not provide adequate support for them because of soft real-time
constraints and synchronization problems, which result in frequent deadline misses and serious
performance degradation. CPU schedulers in underlying hypervisors are central to these issues. In this
paper, we identify and analyze CPU scheduling problems in hypervisors. Then, we design and
implement a parallel soft real-time scheduler according to the analysis, named Poris, based on Xen. It
addresses both soft real-time constraints and synchronization problems simultaneously. In our
proposed method, priority promotion and dynamic time slice mechanisms are introduced to determine
when to schedule virtual CPUs (VCPUs) according to the characteristics of soft real-time applications.
Besides, considering that PSRT applications may run in a virtual machine (VM) or multiple VMs, we
present parallel scheduling, group scheduling and communication-driven group scheduling to
accelerate synchronizations of these applications and make sure that tasks are finished before their
deadlines under different scenarios. Our evaluation shows Poris can significantly improve the
performance of PSRT applications whether they run in a single VM or across multiple VMs. For example,
compared to the Credit scheduler, Poris decreases the response time of the web search benchmark by up
to 91.6 percent.
ETPL
PDS - 028 Poris: A Scheduler for Parallel Soft Real-Time Applications in
Virtualized Environments
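The two mechanisms named in the abstract, priority promotion and dynamic time slices, can be sketched schematically as follows. This is an illustrative model in Python, not Xen scheduler code; the priority levels and slice lengths are invented:

```python
# Schematic model of priority promotion and dynamic time slices for
# soft real-time VCPUs; levels and slice lengths are illustrative.
PROMOTED, BASE = 0, 1   # lower value = dispatched first

class VCPU:
    def __init__(self, name, soft_real_time=False):
        self.name = name
        # Priority promotion: VCPUs of soft real-time VMs start promoted.
        self.priority = PROMOTED if soft_real_time else BASE

def pick_next(run_queue):
    """Dispatch the highest-priority runnable VCPU (stable within a level)."""
    run_queue.sort(key=lambda v: v.priority)
    return run_queue[0]

def time_slice(run_queue, default_ms=30, rt_ms=5):
    """Dynamic time slice: shrink the slice while promoted (real-time)
    VCPUs are waiting, so they reach the CPU with low latency."""
    if any(v.priority == PROMOTED for v in run_queue):
        return rt_ms
    return default_ms

rq = [VCPU("batch-0"), VCPU("rt-0", soft_real_time=True), VCPU("batch-1")]
print(pick_next(rq).name)  # rt-0
print(time_slice(rq))      # 5
```

Poris additionally coordinates sibling VCPUs of the same PSRT application (parallel, group, and communication-driven group scheduling) so that synchronizing tasks make progress together, which this single-queue sketch does not model.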
Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning
data among a group of computing nodes. We start this study by discovering a serious performance
problem of the existing parallel Frequent Itemset Mining algorithms. Given a large dataset, data
partitioning strategies in the existing solutions suffer high communication and mining overhead
induced by redundant transactions transmitted among computing nodes. We address this problem by
developing a data partitioning approach called FiDoop-DP using the MapReduce programming model.
The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining
on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram-based data partitioning
technique, which exploits correlations among transactions. Incorporating the similarity metric and the
Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into a data
partition to improve locality without creating an excessive number of redundant transactions. We
implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by
IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP is
conducive to reducing network and computing loads by virtue of eliminating redundant transactions
on Hadoop nodes. FiDoop-DP significantly improves the performance of the existing parallel frequent-
pattern scheme, by up to 31% with an average of 18%.
ETPL
PDS - 029 FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop
Clusters
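The similarity machinery behind the partitioning step can be illustrated with MinHash, the standard signature used in Locality-Sensitive Hashing: transactions with similar item sets agree on a large fraction of signature slots, so hashing signatures co-locates similar transactions. A self-contained sketch (hash choice and signature length are arbitrary, and this is not FiDoop-DP's actual code):

```python
# MinHash signatures for market-basket transactions: the fraction of
# matching slots is an unbiased estimate of Jaccard similarity.
import hashlib

def minhash(items, n_hashes=128):
    """One slot per hash function: the minimum hash value over all items."""
    return tuple(min(int(hashlib.sha256(f"{i}:{it}".encode()).hexdigest(), 16)
                     for it in sorted(items))
                 for i in range(n_hashes))

def est_similarity(a, b, n_hashes=128):
    sa, sb = minhash(a, n_hashes), minhash(b, n_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / n_hashes

t1 = {"milk", "bread", "butter"}
t2 = {"milk", "bread", "butter", "eggs"}  # Jaccard(t1, t2) = 3/4
t3 = {"bolt", "nut", "washer"}            # Jaccard(t1, t3) = 0
print(round(est_similarity(t1, t2), 2))   # close to 0.75
print(est_similarity(t1, t3))             # 0.0
```

FiDoop-DP combines this kind of signature with Voronoi diagram-based partitioning so that highly similar transactions land in the same partition, avoiding the redundant cross-node transaction traffic that plagues equal-size partitioning.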
With the growing popularity of cloud computing, the number of cloud service providers and services
has significantly increased. Thus, selecting the best cloud services becomes a challenging task for
prospective cloud users. The process of selecting cloud services involves various factors such as
characteristics and models of cloud services, user requirements and knowledge, and service level
agreement (SLA), to name a few. This paper investigates cloud service selection tools,
techniques and models, taking into account the distinguishing characteristics of cloud services. It
also reviews and analyses academic research as well as commercial tools in order to identify their
strengths and weaknesses in the cloud services selection process. It proposes a framework in order to
improve cloud service selection by taking into account service capabilities, quality attributes, the
user's level of knowledge, and service level agreements. The paper also envisions various directions for future
research.
ETPL
PDS - 030 Trends and Directions in Cloud Service Selection
Virtualization of servers and networks is a key technique to resolve the conflict between the increasing
demands on computing power and the high cost of hardware in data centers. In order to map virtual
networks to physical infrastructure efficiently, designers have to make careful decisions on the
allocation of limited resources, which makes placement of virtual networks in data centers a critical
issue. In this paper, we study the placement of virtual networks in fat-tree data center networks. In
order to meet the requirements of instant parallel data transfer between multiple computing units, we
propose a model of multicast-capable virtual networks (MVNs). We then design four virtual machine
(VM) placement schemes to embed MVNs into fat-tree data center networks, named Most-Vacant-Fit
(MVF), Most-Compact-First (MCF), Mixed-Bidirectional-Fill (MBF), and Malleable-Shallow-Fill
(MSF). All these VM placement schemes guarantee the nonblocking multicast capability of each MVN
while simultaneously achieving significant savings in the cost of network hardware. In addition, each
VM placement scheme has its unique features. The MVF scheme has zero interference to existing
computing tasks in data centers; the MCF scheme leads to the greatest cost saving; the MBF scheme
simultaneously possesses the merits of MVF and MCF, and it provides an adjustable parameter
allowing cloud providers to achieve a preferred balance between cost and overhead; the MSF
scheme performs at least as well as MBF, and possesses some additional predictable features. Finally,
we compare the performance and overhead of these VM placement schemes, and present simulation
results to validate the theoretical results.
ETPL
PDS - 031 Placement and Performance Analysis of Virtual Multicast Networks in
Fat-Tree Data Center Networks
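The Most-Vacant-Fit idea can be sketched in a few lines: place the VMs of an MVN one by one, always into the rack with the most free slots, which spreads the MVN out and minimizes interference with existing tenants. This is an illustrative greedy sketch with toy rack data, not the paper's scheme, which additionally enforces nonblocking multicast in the fat-tree:

```python
# Greedy Most-Vacant-Fit style placement: each VM goes to whichever rack
# currently has the most free slots. Rack names and sizes are toy data.
def most_vacant_fit(free_slots, n_vms):
    """free_slots maps rack name -> free VM slots (mutated in place).
    Returns rack name -> number of this MVN's VMs placed there."""
    placement = {rack: 0 for rack in free_slots}
    for _ in range(n_vms):
        rack = max(free_slots, key=free_slots.get)  # most vacant rack
        if free_slots[rack] == 0:
            raise ValueError("not enough capacity for the MVN")
        free_slots[rack] -= 1
        placement[rack] += 1
    return {rack: n for rack, n in placement.items() if n}

racks = {"rack-a": 4, "rack-b": 2, "rack-c": 6}
print(most_vacant_fit(racks, 5))  # {'rack-a': 2, 'rack-c': 3}
```

By contrast, a Most-Compact-First style scheme would prefer the fullest rack that still fits, trading interference for hardware cost, which mirrors the MVF/MCF trade-off the abstract describes.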