
Paper Presentation

Medusa: An Efficient Cloud Fault-Tolerant MapReduce [1]

Pedro A. R. S. Costa, Xiao Bai, Fernando M. V. Ramos, Miguel Correia

2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing

Presented by : Muhammad Salman Aslam (G-00763290)

Department of Computer Science, VSE,

George Mason University

M S Aslam - GMU VSE 11/13/17

Sequence of Presentation

Basics of Map Reduce

Distributed Map Reduce (Overview and Issues / Problem Definition)

Proposed Solution (Medusa)

Medusa Design

Medusa Scheduler Model

Q & A (Design and Model)

Experimental Evaluation and Results

Critique

Important Takeaways

Q & A (Final)

• Additional Comments / Observations by the Presenter in white Font Color

• Not included in the Paper by Authors - Open to disagreement / discussion


Map Reduce - Basics

MAP Reduce Basics (Architecture)

Architecture

Master Node

Many Worker Nodes

Work is distributed

Distributed Nodes

[Figure: a client submits jobs to the master node, which runs the job tracker; slave nodes 1 to N each run a task tracker and workers]

MAP Reduce Basics (Operations / Workflow)

Client submits a job to the master, which acts as the manager

Map and Reduce tasks run at the nodes, monitored by the Node Manager

Map Reduce Functions

Map

Process a key/value pair to generate intermediate key/value pairs

Reduce

Merge all intermediate values associated with the same key

Partition

By default : hash(key) mod R

Goal is to keep the partitions Well balanced
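The three functions above can be sketched for a word count job as follows. This is a minimal single-process illustration, not Hadoop code: the names `map_fn`, `reduce_fn`, `partition` and `run_job` are made up for this example, and CRC32 stands in for the hash function because Python's built-in `hash` is not stable across runs.

```python
import zlib
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def partition(key, num_reducers):
    """Partition: hash(key) mod R, using CRC32 as a stable hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(lines, num_reducers=3):
    # Shuffle: route every intermediate pair to one of the R partitions.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            partitions[partition(key, num_reducers)][key].append(value)
    # Each partition is reduced independently, as a separate reducer would do.
    result = {}
    for part in partitions:
        for key, values in part.items():
            word, count = reduce_fn(key, values)
            result[word] = count
    return result


counts = run_job(["the quick brown fox", "the lazy dog", "The fox"])
```

Because all pairs with the same key land in the same partition, each reducer sees every value for its keys; a well-balanced hash keeps the reducers evenly loaded.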



Map Reduce Scaling and Distributed Deployments

• Review

• Issues

Revisiting Architecture

Think of the nodes as part of one cluster

Typically cluster sizes grow large as larger jobs are designed

A cloud may hold one MapReduce cluster

How to Handle Faults

Typical Fault Handling

Monitoring and restart tasks

Node manager and Resource Manager Crashes

Restart the Node

Spin up a new machine and restart execution

Tasks crash

Restart the Task

Redistribute the task to a different node

Adding checksums to the files in HDFS to detect data corruption on disks

These techniques cover a single cloud only.

Distributed Map Reduce

Maybe use several clouds!

Due to

Size of Problem

By Design

To build efficiency

Redundancy

Fault Tolerance

[Figure: Cloud 1, Cloud 2 and Cloud 3 combined into a cloud of clouds]

Map Reduce Fault Tolerance Needed

Distribution by design for Fault Tolerance

Build Fault Tolerance

Complete cloud Failure

Power outages

Calamities

Malicious Attacks

INSIDER THREAT

[Figure: Cloud 1, Cloud 2 and Cloud 3 combined into a cloud of clouds]

Proposed Solution

Medusa Design

Challenges Addressed

Transparent Solution

Keep MapReduce API (no changes),

No modification of Map Reduce Jobs necessary

No Modification to Hadoop Framework.

Tolerate additional faults (MapReduce only addresses machine crash faults)

Arbitrary faults

Cloud outages

Malicious faults

Minimum replication

Matching or Better performance

Solution - An efficient Middleware named MEDUSA



Medusa Design

Work as a Middleware

Distributing and Managing MapReduce jobs among clouds

Minimizing amount of data replication

Ensuring efficient completion of the entire MapReduce

Distribute Jobs to ensure lower workspan

Copy data between clouds with high pairwise bandwidth

Select Clouds with high computational Power

Ensuring Fault Tolerance

Traditional Approach

If up to f resource managers can be faulty, at least f+1 matching results are needed to rule out f identical wrong answers

With up to f faulty clouds, the traditional approach therefore distributes 2f+1 replicas of the Map and aggregation jobs

Medusa launches only f+1 replicas and provides the same fault tolerance
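The replication argument can be illustrated with a small sketch. It assumes that the job starts with f+1 replicas and that one extra replica is launched whenever fewer than f+1 outputs agree; `run_replica` is a hypothetical callable returning the output digest produced by a cloud, and none of the names come from the Medusa code base.

```python
from collections import Counter


def run_with_minimal_replication(run_replica, clouds, f):
    """Run f+1 replicas first; add one replica per round until f+1 outputs agree.

    run_replica(cloud) is a hypothetical callable returning that cloud's digest.
    Returns the accepted digest and the number of replicas actually executed.
    """
    needed = f + 1
    if len(clouds) < 2 * f + 1:
        raise ValueError("the worst case needs 2f+1 clouds to out-vote f faults")
    digests = Counter()
    used = 0
    while used < len(clouds):
        # The first round launches f+1 replicas, later rounds one extra replica.
        batch = needed if used == 0 else 1
        for cloud in clouds[used:used + batch]:
            digests[run_replica(cloud)] += 1
        used += batch
        digest, votes = digests.most_common(1)[0]
        if votes >= needed:
            return digest, used
    raise RuntimeError("no digest reached f+1 matching replicas")


# Fault-free run with f = 1: two replicas are enough.
no_fault = run_with_minimal_replication(lambda cloud: "ok", ["A", "B", "C"], 1)

# Cloud B returns a wrong digest: a third replica is needed to reach f+1 = 2.
outputs = {"A": "ok", "B": "bad", "C": "ok"}
one_fault = run_with_minimal_replication(outputs.get, ["A", "B", "C"], 1)
```

In the fault-free case only f+1 replicas run; the 2f+1 cost is paid only when a mismatch is actually observed.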


Map Reduce on a Federation

[Figure: a federation of distributed clouds 1 to N, each connected to the Medusa proxy]

Medusa - 2 Phase Operation

1. Vanilla Map Reduce

Rank Clusters (Scheduling Algorithm)

Copy Data (f=1 -> only 2 replicas)

Run Job

Verify Output

2. Global Map Reduce

Rank Clusters (Minimize Data Transfer)

Copy Job

Run Job

Verify Output

3. Return Result

Medusa – Important Issues

Fault Tolerance

Build Integrity

Data Digest using Hash Function (SHA-256)

Send a hash of the data when transmitting a job, to confirm a successful transfer

In the verification step the hash of the result is compared between the replicated jobs

If the hashes match, accept the result; otherwise re-run the job

Efficiency

Ensure the best makespan (the job finishes in the least time)

Select clouds to Replicate the job optimally

Estimate Transmission time

Estimate Processing time
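The integrity checks listed under Fault Tolerance can be sketched with Python's standard `hashlib`. The helper names (`sha256_digest`, `transfer_ok`, `outputs_agree`) are hypothetical and only illustrate the idea of comparing SHA-256 digests; they are not Medusa's API.

```python
import hashlib


def sha256_digest(chunks):
    """Digest a stream of byte chunks without loading it all into memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def transfer_ok(sent_digest, received_chunks):
    """Transmission check: the receiver recomputes the digest sent with the data."""
    return sha256_digest(received_chunks) == sent_digest


def outputs_agree(replica_outputs):
    """Verification step: all replicas must produce the same output digest."""
    return len({sha256_digest(chunks) for chunks in replica_outputs}) == 1


sent = sha256_digest([b"abc"])
intact = transfer_ok(sent, [b"a", b"bc"])      # same bytes, different chunking
corrupted = transfer_ok(sent, [b"abd"])        # one byte changed in transit
same_result = outputs_agree([[b"k1\t3\n"], [b"k1\t3\n"]])
```

Comparing digests rather than full outputs keeps the amount of data exchanged between clouds during verification small.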



Medusa – Scheduler Model

Estimation Model

Total Job Time (Workspan)

Transmission time

Processing time

Scheduler algorithm (pseudocode) is not included in the paper

but is available in the open-source distribution

Medusa Scheduler : Model

Estimate Workspan

Running time (Phase 1)

Transmission time + Processing time

Running time (Phase 2)

Copy the phase 1 output to all clouds that do not have that result

Select assignments with minimal t1 and t2 over all clouds i

Schedule job replicas to ensure lowest workspan
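Since the scheduler pseudocode is not in the paper, the following is only a guess at the selection step: a greedy rule that picks the f+1 clouds with the smallest estimated t1 + t2 and takes the slowest chosen replica as the workspan estimate. The function name, the greedy rule and the sample numbers are all assumptions made for illustration.

```python
import heapq


def pick_clouds(estimates, f):
    """Pick the f+1 clouds with the lowest estimated running time.

    estimates maps a cloud name to (t1, t2): the estimated running times of
    phase 1 and phase 2 in seconds. Replicas run in parallel, so the workspan
    estimate is the total time of the slowest chosen cloud.
    """
    chosen = heapq.nsmallest(f + 1, estimates, key=lambda c: sum(estimates[c]))
    workspan = max(sum(estimates[c]) for c in chosen)
    return chosen, workspan


# Illustrative numbers only, not measurements from the paper.
estimates = {"WV": (40, 10), "ChgI": (70, 15), "ChgII": (45, 10), "CA": (90, 20)}
chosen, workspan = pick_clouds(estimates, f=1)
```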



Medusa Scheduler : Estimating transmission time

Transmission time :

$$t_{trans}(i, j) = \frac{l(i, j)}{2} + \frac{S}{T(i, j)}$$

Terms

$t_{trans}(i, j)$ : Transmission time between clouds i and j

$l(i, j)$ : Distance between i and j (in terms of round-trip time)

$T(i, j)$ : Network throughput (transmission rate)

$S$ : Data size

Parameters tracking

Throughput : Medusa tracks inter-cloud throughput using iperf

RTT : Measured periodically
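The transmission model (half the round-trip time plus data size divided by throughput) is easy to evaluate by hand. The sketch below assumes seconds for RTT, MB for data size and MB/s for throughput; the units and the sample values are assumptions, not figures from the paper.

```python
def transmission_time(rtt_s, size_mb, throughput_mb_per_s):
    """t_trans(i, j) = l(i, j) / 2 + S / T(i, j), with the units assumed above."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return rtt_s / 2 + size_mb / throughput_mb_per_s


# 1500 MB over a link with 80 ms RTT and 25 MB/s measured throughput.
forward = transmission_time(0.08, 1500, 25)   # 0.04 + 60 = 60.04 s

# Throughput may differ per direction, so T(i, j) and T(j, i) are kept apart.
reverse = transmission_time(0.08, 1500, 10)   # 0.04 + 150 = 150.04 s
```

For large inputs the S / T term dominates, which is why tracking per-direction throughput matters more than the RTT term.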


Medusa Scheduler : Estimating Processing time

Scheduler Model

$$y' = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \beta_0$$

Terms

$y'$ : Predicted data processing time

$x_1, x_2, \ldots, x_n$ : n features

$\beta_0, \beta_1, \ldots, \beta_n$ : estimated parameters ($\beta_0$ is the intercept)

Parameters estimated based on

Job Configuration (Input size, Number of map jobs, Number of reduce jobs)

Cloud capacity (Clock, Cores, Memory)

Overhead (Queued jobs, number of MapReduce jobs running, % completion)
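A linear model of this form can be fitted from historical runs with ordinary least squares. The sketch below solves the normal equations in plain Python so it has no dependencies; the two features (input size and queued jobs) and the data points are invented for illustration and are not the paper's data or Medusa's actual fitting code.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations; returns [b0, b1, ..., bn]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 for the intercept
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(beta, x):
    """y' = b0 + b1*x1 + ... + bn*xn."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Hypothetical history: (input size in GB, queued jobs) -> processing time in s,
# generated from y = 5 + 2*x1 + 3*x2 so the fit can be checked exactly.
history_X = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5)]
history_y = [5 + 2 * a + 3 * b for a, b in history_X]
beta = fit_linear(history_X, history_y)
estimate = predict(beta, (4, 2))
```

In practice the history would be refreshed as jobs finish, so the coefficients track the current load on each cloud.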



Q & A

Basics – Medusa Design - Model



Experiments

Evaluations/Results

Evaluation : Setup

Applications (from GridMix Benchmarks)

Word Count

Web Data Scan

Monster query

Up to 6 GB of problem size generated

Larger sizes also used in some experiments – confirms trends

Evaluation Platform : ExoGENI testbed

Geographical distribution

Word count : Chicago, California and West Virginia

Others : Pittsburgh, Massachusetts and Texas

Each cloud - One Resource Manager + three Node Managers (4 hosts)

Each experiment repeated 40 times

Linear regression based on 30 historical points

Cloud stressed with external jobs with random workloads

Evaluations

Comparisons

Baseline : Round Robin based scheduling

Medusa

Compare % cloud usage on various clouds(varying Input)

Compare the Workspan

Fault tolerance

With No faults

With Faults
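For reference, the round-robin baseline can be sketched as a scheduler that hands each job replica to the next cloud in cyclic order, ignoring cloud load and network throughput. The function name and the choice of two replicas per job (f = 1) are assumptions for this illustration.

```python
from collections import Counter
from itertools import cycle


def round_robin_usage(num_jobs, clouds, replicas=2):
    """Assign each job replica to the next cloud in turn; return % usage per cloud."""
    order = cycle(clouds)
    usage = Counter()
    for _ in range(num_jobs):
        # Consecutive picks differ, so replicas of one job land on different
        # clouds as long as replicas <= len(clouds).
        for _ in range(replicas):
            usage[next(order)] += 1
    total = sum(usage.values())
    return {cloud: 100.0 * usage[cloud] / total for cloud in clouds}


usage = round_robin_usage(40, ["WV", "ChgI", "ChgII", "CA"])
```

The resulting distribution is uniform by construction, which is the behaviour the cloud-usage comparison measures against.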



Results : RR vs Md Workspan [1]

Experiments

Run WordCount, WebScan and MonsterQuery

Varying Input Data Size (1000 – 60000 MB)

Results

Medusa has Better Workspan in all experiments

Word Count : Almost 300% improvement

WebScan : Better result and Medusa more stable

Monster query – Marginally better result

Observation:

Medusa heavily outperforms RR under the right conditions (cloud and restrictive throughput)

Better performance and more stable under any arrangement

Word Count run on different clouds shows a marked improvement of almost 300%

Results : RR vs Md Workspan (cntd.)

Experiments

Measure Network Throughput between clouds

Results

Word Count : Throughput is lower and variable

WebScan/MonsterQuery : Network throughput is higher and consistent

Cloud to Cloud : Throughput differs if the direction is reversed (Why?)

Observation:

Word Count throughputs are over-taxing

In a restrictive throughput scenario Medusa outperforms RR

When throughput is high and varies less, Medusa's advantages are overshadowed; however, Medusa is still stable

Results : RR vs Md Workspan (cntd.)

Experiments

Measure Cloud usage (work distribution)

Results

Round Robin : Uniform Work distribution

Medusa : Distribute based on estimated Workspan

More workload on Chicago II and WV

Observation:

RR – Assumes even cloud performances

Variation only because of high RTT

Medusa – Takes into account the cloud performance and Network Throughput

End Result is better Workspan by Medusa

Medusa's workload distribution follows similar trends for Chicago I and CA, and for WV and Chicago II

[Figure: Medusa - Work eff. for input data size; % cloud usage vs. input data size (1500, 3000, 4500, 6000) for WV, ChgI, ChgII and CA]

[Figure: Round Robin - Work eff. for input data size; % cloud usage vs. input data size (1500, 3000, 4500, 6000) for WV, ChgI, ChgII and CA, with polynomial trend lines]

Work Load Distribution Combined

How workload is distributed on each cloud

How RR and Md select clouds for different data sizes

Observation:

Medusa favors distribution to the cloud with the highest network throughput

Some clouds seem to provide more advantage for a problem of a particular data size, e.g. Chicago I

Some clouds are selected very consistently, e.g. WV (with Medusa)

[Figure: Combined RR and MD - Proc usage on clouds; % cloud usage per cloud (WV, ChgI, ChgII, CA under RR and MD) for input sizes 1500, 3000, 4500 and 6000]

[Figure: Combined RR and MD - Proc Usage with Input size; % cloud usage vs. input size (1500, 3000, 4500, 6000) for WV, ChgI, ChgII and CA under RR and MD]

Results : Fault Tolerance

Experiments

Medusa With no Faults

Medusa with faults

Malicious faults - Digest corrupted after the vanilla job; the job cannot be rescheduled to the same cloud

Arbitrary faults - Digest corrupted after the vanilla job, but the job can be rescheduled to the same cloud

Cloud outage - Crash the resource manager

Results

Malicious Fault : Workspan doubles (i.e. the fault is detected and an extra replica is launched to arrive at the correct result before moving to global MapReduce; the same cloud is not trusted again for the replica)

Arbitrary Fault : Workspan improves (i.e. with high probability the job is relaunched on the same server)

Cloud Outage : Requires the highest increase in Workspan. Unlike digest faults, a cloud outage takes more time to detect, and the copy of the job has to be looked up in another cloud.

Observation:

Medusa is able to detect and correct arbitrary, malicious and cloud outage faults

With large input sizes Medusa has a better Workspan for Word Count even with a fault, compared to the no-fault RR Workspan.

Critique

Model for estimated workspan not detailed

Cloud processing power estimates not presented

Why is throughput different when directions are reversed?

Stark difference in throughput between apps (were the jobs not normalized?)

Only 1 fault assumed (f=1)

Mix of faults (Arbitrary + Cloud outage) not explored

Detection of malicious and arbitrary faults: what is the difference?

Cloud stats presented only for the Word Count clouds

Maintaining the state of cloud-to-cloud throughput is cumbersome. How expensive?

Counter critique

Source code is made available and can be explored by the community

Important Takeaways

Distributed MapReduce is effective

But prone to faults

Medusa is an efficient scheduler

Takes into account cloud conditions and network throughput

Experimental results support high gains under the right conditions and applications

Always more efficient and stable

Medusa is effective for fault tolerance and recovery

Also effective at recovering from faults and detecting malicious/arbitrary faults

A very slight cost overhead

Cloud conditions and transmission delays contribute significantly to Workspan regardless of the technique used

Future Work

Determine the right mix of techniques for various application types and clouds

Cloud classification can be done offline periodically and learned over time?

Maybe an affinity-aware application deployment strategy

Apply hashes other than SHA-256 for the digest

More malicious fault types should be studied

Q & A

Final



References

1. Pedro A. R. S. Costa, Xiao Bai, Fernando M. V. Ramos, Miguel Correia, “Medusa: An Efficient Cloud Fault-Tolerant MapReduce,” 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2016.

2. J. Dean and S. Ghemawat, “MapReduce: a flexible data processing tool,” Communications of the ACM, vol. 53, pp. 72–77, Jan. 2010.

3. T. Kurze, M. Klems, D. Bermbach, et al., “Cloud federation,” in Proceedings of the 2nd International Conference on Cloud Computing, Grids, and Virtualization, Sept. 2011.

4. M. Kandias, N. Virvilis, and D. Gritzalis, “The insider threat in cloud computing,” in Critical Information Infrastructure Security, vol. 6983 of LNCS, pp. 93–103, Springer Berlin Heidelberg, 2013.

Additional Notes

Application Basics

WordCount in a multi-cloud system can be considered as building the inverted indexes of a multi-site web search engine for each search site and then aggregating the results

WebdataScan is a benchmarking application that extracts samples from a large data set

Monsterquery is a benchmarking application that queries part of the data from a large data set

---------------------------------------------------------------------------------------------------------------------------------------

Message queuing service (MQ) uses reliable channels so that no messages are lost, duplicated or corrupted. In practice, this is provided by establishing TCP/IP connections (a message is only lost if the cloud is unreachable). It is also used for cloud outage detection.

-----------------------------------------------------------------------------------------------------------

Features for the Processing time on each cloud are retrieved from filesystem information on Node Masters

jobs that are currently running, the percentage of completion of jobs, the number of MapReduce jobs queued, and the size of the input data


