Making Cloud Intermediate Data Fault-Tolerant
Steven Y. Ko*, Imranul Hoque, Brian Cho, Indranil Gupta
Presentation by John Shu
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
*Dept. of Computer Science, Princeton University, Princeton, NJ, USA
Agenda
Terminology
Trends
Motivation
Key Issues
Solutions
Implementation
Results
Conclusion
Terminology
MapReduce: MapReduce is a software framework from Google that supports distributed computing over large data sets on clusters of machines.
Map Step: The master node partitions the input and distributes the pieces to worker nodes, which send their answers back to the master.
Reduce Step: The master node combines the workers' answers to produce the result for the original input.
Terminology
MapReduce: The paper outlines three major phases in the standard implementation.
Map Step: The Map phase executes user-provided functions in parallel. The input is divided into chunks stored in the DFS; each Map task reads a chunk, generates Intermediate Data (ID), and stores it for the next stage.
Shuffle Step: The Shuffle phase moves the generated Intermediate Data among the machines in the cluster. The communication pattern is all-to-all, from the Map tasks to the Reduce tasks.
Reduce Step: The Reduce phase executes user-provided functions in parallel over the MapReduce cluster and writes its output to the DFS. For a single-stage job this is the final output; for a multi-stage job it is itself ID. A toy sketch of the three phases follows.
Trends
With the advent of cloud computing, the already existing need for copious amounts of data processing is hitting record highs; Yahoo's web-graph generation alone receives 280 TB of input a day.
Parallel dataflow programming offers a very feasible solution through frameworks like MapReduce, Dryad, Pig, Hive, etc.
Organizations like A9.com, AOL, Facebook, Yahoo, and The New York Times use Hadoop, an open-source implementation of MapReduce.
Parallel programs written in these frameworks run in data centers, on both private clouds (e.g., Microsoft's) and public ones (e.g., UIUC's).
Motivation
In these huge data facilities, the focus is on efficient performance and productivity without sacrificing processing time.
Parallel dataflow programs generate enormous amounts of distributed data that is short-lived yet crucial both for completion of the job and for its runtime performance.
This distributed data is called Intermediate Data (ID). This paper focuses on minimizing the effects of runtime server failures on the availability of ID and on performance metrics such as job completion time.
ID Background
Intermediate Data is the data generated during the execution of parallel dataflow programs, derived directly or indirectly from the input data; it excludes both the final output and the input data itself.
The framework operates as a sequence of MapReduce jobs. During their execution, ID is produced as the output of one stage and serves as the input to the next.
The ID therefore has to be distributed among the nodes in the cluster, and its continuous propagation adds up to truly enormous amounts of data.
ID Background
[Figure from Making Cloud Intermediate Data Fault-Tolerant, Steven Y. Ko et al.]
Key Issues
Intermediate Data is short-lived: it is used immediately, written once, and read once. It is stored in blocks distributed at large scale across the cluster.
The blocks produced by the previous stage have to be ready before the next stage can start executing; in essence, performance and job completion depend on the ID being generated before the next stage runs.
ID can be lost on server failure, and for some small-scale Hadoop applications a single failure has prolonged completion time by about 50%.
Existing Solutions
First approach: data is stored in the local file system and fetched remotely by tasks of the next stage. Data is not replicated here, so a failure results in re-execution of tasks.
Second approach: data is written back to a distributed file system (DFS), where it is automatically replicated. This adequately supports fault tolerance but incurs significant overhead. Both approaches are sketched below.
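For contrast with the scheme proposed next, here is a minimal sketch of the two existing approaches (the function names and list-based "disks" are hypothetical, not a real Hadoop or HDFS API): the local store returns immediately but keeps no replica, while the DFS-style write blocks the writer until every replica is durable.

```python
def local_store_write(block, local_disk):
    # Approach 1: local store. No network cost, but the only copy lives
    # on this node; lose the node and the tasks that produced the block
    # must re-execute.
    local_disk.append(block)

def dfs_write(block, local_disk, replica_disks):
    # Approach 2: DFS write-back. The writer blocks until every replica
    # is durable, so failures are tolerated, but each write pays the
    # full replication cost up front.
    local_disk.append(block)
    for disk in replica_disks:
        disk.append(block)  # stand-in for a synchronous network copy

local, r1, r2 = [], [], []
local_store_write("block-A", local)    # fast, but unreplicated
dfs_write("block-B", local, [r1, r2])  # durable, but the writer waits
print(local, r1, r2)  # ['block-A', 'block-B'] ['block-B'] ['block-B']
```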
Proposed Solution
The goal is to achieve performance as good as the local store while providing the fault tolerance of the DFS approach.
With the local store, a single failure results in cascaded re-execution; the cost model sketched below shows why.
If this can be avoided by a robust fault-tolerant system that maintains low overhead, the result is a better system.
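To see why cascaded re-execution is so costly, here is a toy cost model with hypothetical stage durations (not measurements from the paper): losing a node just as stage k completes destroys the locally stored ID of every stage up through k, forcing all of them to run again.

```python
# Toy cost model of cascaded re-execution under the local-store approach.
stage_times = [10, 10, 10, 10]  # minutes per stage (illustrative values)

def completion_time(fail_stage):
    """Total time when a node dies just as `fail_stage` finishes,
    losing the locally stored ID of every stage up through it."""
    normal = sum(stage_times)
    # The lost ID must be regenerated, so stages 0..fail_stage re-run.
    redone = sum(stage_times[: fail_stage + 1])
    return normal + redone

print(sum(stage_times))               # 40: failure-free baseline
print(completion_time(fail_stage=1))  # 60: a 50% prolongation
print(completion_time(fail_stage=3))  # 80: the whole job runs twice
```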
Proposed Solution: The How?
Asynchronous replication allows writers to proceed without waiting for replication to complete.
Rack-level replication always places replicas of intermediate data blocks on machines within the same rack.
Selective replication replicates only the data to be consumed locally, reducing the total amount of replication to be done. A sketch combining the three techniques follows.
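A minimal sketch of how the three techniques might compose (hypothetical names throughout; the actual ISS implementation inside Hadoop differs): the writer returns as soon as its local write lands, a background thread performs the copy, the replica target is a machine in the same rack, and only blocks that will be consumed locally are queued, since a block fetched by a remote reducer effectively gains a second copy at its consumer anyway.

```python
import threading
import queue

replication_queue = queue.Queue()

def write_block(block, local_store, rack_peers, consumed_locally):
    local_store.append(block)   # local write; the writer proceeds at once
    # Selective: only locally-consumed blocks need an explicit replica.
    if consumed_locally:
        target = rack_peers[0]  # rack-level: the replica stays in-rack
        replication_queue.put((block, target))  # asynchronous: just enqueue

def replicator():
    # Background thread drains the queue, so writers never wait on it.
    while True:
        item = replication_queue.get()
        if item is None:
            break
        block, target = item
        target.append(block)    # stand-in for an in-rack network copy

local, rack_peer = [], []
worker = threading.Thread(target=replicator, daemon=True)
worker.start()
write_block("map-out-0", local, [rack_peer], consumed_locally=True)
write_block("map-out-1", local, [rack_peer], consumed_locally=False)
replication_queue.put(None)     # shut the replicator down
worker.join()
print(local, rack_peer)  # both blocks stored locally; only 'map-out-0' replicated
```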
Implementation
The scheme described above was dubbed ISS (Intermediate Storage System).
It replicates the Map and Reduce outputs with significantly less overhead while preventing cascaded re-execution for multi-stage MapReduce programs.
ISS is not a standalone framework; it is implemented as an extension to Hadoop, and it performs well enough to eliminate the need for a separate shuffle phase altogether.
Results
Conclusion
We have shown the need for, presented requirements towards, and designed a new intermediate storage system (ISS) that treats intermediate data in dataflow programs as a first-class citizen in order to tolerate failures.
We have also shown that our asynchronous rack-level selective replication mechanism is effective and masks interference very well.
Under a failure, ISS incurs only up to 18% overhead compared to Hadoop with no failures.
References
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2008.
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the 2007 EuroSys Conference (EuroSys), 2007.
M. K. Aguilera, A. Merchant, M. A. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2007.
Questions?