Making Cloud Intermediate Data Fault-Tolerant
Steven Y. Ko*, Imranul Hoque, Brian Cho, Indranil Gupta
Presentation by John Shu
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
*Dept. of Computer Science, Princeton University, Princeton, NJ, USA
Agenda
Terminology
Trends
Motivation
Key Issues
Solutions
Implementation
Results
Conclusion
Terminology
MapReduce: MapReduce is a software framework from Google that supports distributed computing over large data sets on clusters of machines.
Map Step: The master node partitions the input and distributes the pieces to worker nodes, which send their answers back to the master.
Reduce Step: The master node combines the workers' answers to produce the result for the original input.
Terminology
MapReduce: The paper outlines three major phases in the standard implementation.
Map Step: The Map phase executes user-provided functions in parallel. The input is divided into chunks stored in the DFS; each Map task reads a chunk, generates Intermediate Data (ID), and stores it for the next stage.
Shuffle Step: The Shuffle phase moves the generated Intermediate Data among the machines in the cluster. The communication pattern is all-to-all, from the Map tasks to the Reduce tasks.
Reduce Step: The Reduce phase executes user-provided functions in parallel over the MapReduce cluster and writes its output to the DFS. For a single-stage job this is the final output; for a multi-stage job it is itself ID. A toy sketch of the three phases follows.
Trends
With the advent of cloud computing, the already existing need for copious amounts of data processing is hitting record highs; Yahoo's web-graph generation alone receives 280 TB of input a day.
Parallel dataflow programming offers a very feasible solution through frameworks like MapReduce, Dryad, Pig, Hive, etc.
Organizations like A9.com, AOL, Facebook, Yahoo, and The New York Times use Hadoop, an open-source implementation of MapReduce.
Parallel programs written in these frameworks run in data centers, on both private clouds (e.g., Microsoft's) and public ones (e.g., UIUC's).
Motivation
In these huge data facilities, the focus is on efficient performance and productivity without sacrificing processing time.
Parallel dataflow programs generate enormous amounts of distributed data that is short-lived yet crucial both for completion of the job and for its runtime performance.
This distributed data is called Intermediate Data (ID). This paper focuses on minimizing the effects of runtime server failures on the availability of ID and on performance metrics such as job completion time.
ID Background
Intermediate Data is the data generated during the execution of parallel dataflow programs, derived directly or indirectly from the input data; it excludes both the final output and the input data itself.
The framework operates as a sequence of MapReduce jobs. During their execution, ID is produced as the output of one stage and serves as the input to the next.
The ID therefore has to be distributed among the nodes in the cluster, and its continuous propagation adds up to truly enormous amounts of data.
ID Background
[Figure from Making Cloud Intermediate Data Fault-Tolerant, Steven Y. Ko et al.]
Key Issues
Intermediate Data is short-lived: it is used immediately, written once, and read once. It is stored in blocks distributed at large scale across the cluster.
The blocks produced by the previous stage have to be ready before the next stage can start executing; in essence, performance and job completion depend on the ID being generated before the next stage runs.
ID can be lost on server failure, and for some small-scale Hadoop applications a single failure has prolonged completion time by about 50%.
Existing Solutions
First approach: data is stored in the local file system and fetched remotely by tasks of the next stage. Data is not replicated here, so a failure results in re-execution of tasks.
Second approach: data is written back to a distributed file system (DFS), where it is automatically replicated. This adequately supports fault tolerance but incurs significant overhead. Both approaches are sketched below.
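For contrast with the scheme proposed next, here is a minimal sketch of the two existing approaches (the function names and list-based "disks" are hypothetical, not a real Hadoop or HDFS API): the local store returns immediately but keeps no replica, while the DFS-style write blocks the writer until every replica is durable.

```python
def local_store_write(block, local_disk):
    # Approach 1: local store. No network cost, but the only copy lives
    # on this node; lose the node and the tasks that produced the block
    # must re-execute.
    local_disk.append(block)

def dfs_write(block, local_disk, replica_disks):
    # Approach 2: DFS write-back. The writer blocks until every replica
    # is durable, so failures are tolerated, but each write pays the
    # full replication cost up front.
    local_disk.append(block)
    for disk in replica_disks:
        disk.append(block)  # stand-in for a synchronous network copy

local, r1, r2 = [], [], []
local_store_write("block-A", local)    # fast, but unreplicated
dfs_write("block-B", local, [r1, r2])  # durable, but the writer waits
print(local, r1, r2)  # ['block-A', 'block-B'] ['block-B'] ['block-B']
```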
Proposed Solution
The goal is to achieve performance as good as the local store while providing the fault tolerance of the DFS approach.
With the local store, a single failure results in cascaded re-execution; the cost model sketched below shows why.
If this can be avoided by a robust fault-tolerant system that maintains low overhead, the result is a better system.
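To see why cascaded re-execution is so costly, here is a toy cost model with hypothetical stage durations (not measurements from the paper): losing a node just as stage k completes destroys the locally stored ID of every stage up through k, forcing all of them to run again.

```python
# Toy cost model of cascaded re-execution under the local-store approach.
stage_times = [10, 10, 10, 10]  # minutes per stage (illustrative values)

def completion_time(fail_stage):
    """Total time when a node dies just as `fail_stage` finishes,
    losing the locally stored ID of every stage up through it."""
    normal = sum(stage_times)
    # The lost ID must be regenerated, so stages 0..fail_stage re-run.
    redone = sum(stage_times[: fail_stage + 1])
    return normal + redone

print(sum(stage_times))               # 40: failure-free baseline
print(completion_time(fail_stage=1))  # 60: a 50% prolongation
print(completion_time(fail_stage=3))  # 80: the whole job runs twice
```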
Proposed Solution: The How?
Asynchronous replication allows writers to proceed without waiting for replication to complete.
Rack-level replication always places replicas of intermediate data blocks on machines within the same rack.
Selective replication replicates only the data to be consumed locally, reducing the total amount of replication to be done. A sketch combining the three techniques follows.
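A minimal sketch of how the three techniques might compose (hypothetical names throughout; the actual ISS implementation inside Hadoop differs): the writer returns as soon as its local write lands, a background thread performs the copy, the replica target is a machine in the same rack, and only blocks that will be consumed locally are queued, since a block fetched by a remote reducer effectively gains a second copy at its consumer anyway.

```python
import threading
import queue

replication_queue = queue.Queue()

def write_block(block, local_store, rack_peers, consumed_locally):
    local_store.append(block)   # local write; the writer proceeds at once
    # Selective: only locally-consumed blocks need an explicit replica.
    if consumed_locally:
        target = rack_peers[0]  # rack-level: the replica stays in-rack
        replication_queue.put((block, target))  # asynchronous: just enqueue

def replicator():
    # Background thread drains the queue, so writers never wait on it.
    while True:
        item = replication_queue.get()
        if item is None:
            break
        block, target = item
        target.append(block)    # stand-in for an in-rack network copy

local, rack_peer = [], []
worker = threading.Thread(target=replicator, daemon=True)
worker.start()
write_block("map-out-0", local, [rack_peer], consumed_locally=True)
write_block("map-out-1", local, [rack_peer], consumed_locally=False)
replication_queue.put(None)     # shut the replicator down
worker.join()
print(local, rack_peer)  # both blocks stored locally; only 'map-out-0' replicated
```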
Implementation
The scheme described above was dubbed ISS (Intermediate Storage System).
It replicates the Map and Reduce outputs with significantly less overhead while preventing cascaded re-execution for multi-stage MapReduce programs.
ISS is not a standalone framework; it is implemented as an extension to Hadoop, and it performs well enough to eliminate the need for a separate shuffle phase altogether.
Results
Conclusion
We have shown the need for, presented requirements towards, and designed a new intermediate storage system (ISS) that treats intermediate data in dataflow programs as a first-class citizen in order to tolerate failures.
We have also shown that our asynchronous rack-level selective replication mechanism is effective and masks interference very well.
Under a failure, ISS incurs only up to 18% overhead compared to Hadoop with no failures.
References
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2008.
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the 2007 EuroSys Conference (EuroSys), 2007.
M. K. Aguilera, A. Merchant, M. A. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2007.
Questions?