HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
DESCRIPTION
An increasing number of popular applications are data-intensive in nature. In the past decade, the World Wide Web has been adopted as an ideal platform for developing data-intensive applications, since the communication paradigm of the Web is sufficiently open and powerful. Data-intensive applications such as data mining and web indexing need to access ever-expanding data sets ranging from a few gigabytes to several terabytes or even petabytes. Google, for example, leverages the MapReduce model to process approximately twenty petabytes of data per day in a parallel fashion.

In this talk, we introduce Google's MapReduce framework for processing huge datasets on large clusters. We first outline the motivations behind MapReduce, then describe its dataflow, and next show a couple of example applications. Finally, we present our research project on the Hadoop Distributed File System (HDFS).

The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Data locality is not taken into account when launching speculative map tasks, because most map tasks are assumed to be data-local. Unfortunately, neither the homogeneity nor the data-locality assumption holds in virtualized data centers, and we show that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. We therefore address the problem of how to place data across nodes so that each node has a balanced data-processing load. Given a data-intensive application running on a Hadoop MapReduce cluster, our data placement scheme adaptively balances the amount of data stored on each node to improve data-processing performance. Experimental results with two real data-intensive applications show that our data placement strategy consistently improves MapReduce performance by rebalancing data across the nodes before a data-intensive application runs on a heterogeneous Hadoop cluster.

TRANSCRIPT
1
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
Xiao Qin
Department of Computer Science and Software Engineering
Auburn University
http://www.eng.auburn.edu/~xqin
Slides 2-20 are adapted from notes by Subbarao Kambhampati (ASU), Dan Weld (U. Washington), Jeff Dean, Sanjay Ghemawat, (Google, Inc.)
2
Motivation
• Large-Scale Data Processing
  – Want to use 1000s of CPUs
  – But don't want the hassle of managing things
• MapReduce provides
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates
3
Map/Reduce
• Map/Reduce
  – Programming model from Lisp (and other functional languages)
• Many problems can be phrased this way
• Easy to distribute across nodes
• Nice retry/failure semantics
4
Distributed Grep
[Diagram: a very big data set is split into chunks; grep runs on each chunk in parallel; the per-chunk matches are concatenated (cat) into the final list of all matches.]
5
Distributed Word Count
[Diagram: a very big data set is split into chunks; count runs on each chunk in parallel; the partial counts are merged into a single merged count.]
6
Map Reduce
• Map:
  – Accepts input key/value pair
  – Emits intermediate key/value pair
• Reduce:
  – Accepts intermediate key/value* pair
  – Emits output key/value pair

[Diagram: very big data → MAP → intermediate pairs routed by a partitioning function → REDUCE → result.]
7
Map in Lisp (Scheme)
• (map f list [list2 list3 …])
• (map square '(1 2 3 4))
  – (1 4 9 16)
• (reduce + '(1 4 9 16))
  – (+ 16 (+ 9 (+ 4 1)))
  – 30
• (reduce + (map square (map – l1 l2)))
  (square is a unary operator; + is a binary operator)
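For readers more comfortable with Python, a rough equivalent of the Scheme snippets above (not part of the original slides) might look like this sketch, with functools.reduce playing the role of Scheme's reduce:

```python
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)           # 30

# (reduce + (map square (map - l1 l2))): sum of squared element-wise differences
l1, l2 = [5, 7, 9], [1, 2, 3]
ssd = reduce(lambda a, b: a + b,
             map(lambda d: d * d, map(lambda a, b: a - b, l1, l2)))  # 77
```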
8
Map/Reduce ala Google
• map(key, val) is run on each item in set
  – emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
  – emits final output
9
Count words in docs
– Input consists of (url, contents) pairs
– map(key=url, val=contents):
  • For each word w in contents, emit (w, "1")
– reduce(key=word, values=uniq_counts):
  • Sum all "1"s in values list
  • Emit result "(word, sum)"
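A minimal single-process sketch of the word-count map and reduce functions above (this is not the Hadoop API; the names map_fn, reduce_fn, and run are illustrative):

```python
from collections import defaultdict

def map_fn(url, contents):
    # emit (word, 1) for every word in the document
    for word in contents.split():
        yield word, 1

def reduce_fn(word, counts):
    # sum all partial counts for one word
    return word, sum(counts)

def run(docs):
    intermediate = defaultdict(list)
    for url, contents in docs.items():            # map phase
        for word, one in map_fn(url, contents):
            intermediate[word].append(one)        # group by key ("shuffle")
    return dict(reduce_fn(w, c) for w, c in intermediate.items())  # reduce phase

print(run({"doc1": "see bob throw", "doc2": "see spot run"}))
# {'see': 2, 'bob': 1, 'throw': 1, 'spot': 1, 'run': 1}
```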
10
Count, Illustrated
map(key=url, val=contents):
  For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
  Sum all "1"s in values list
  Emit result "(word, sum)"

Input documents: "see bob throw", "see spot run"
Map output: (see, 1), (bob, 1), (throw, 1), (see, 1), (spot, 1), (run, 1)
Reduce output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1)
11
Grep
– Input consists of (url+offset, single line)
– map(key=url+offset, val=line):
  • If contents matches regexp, emit (line, "1")
– reduce(key=line, values=uniq_counts):
  • Don't do anything; just emit line
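Sketched in the same style, a distributed-grep mapper and reducer might look like the following; the pattern "error" is just a placeholder regexp, and the two functions could be driven by the same kind of toy runner as the word-count sketch above:

```python
import re

PATTERN = re.compile(r"error")  # placeholder regular expression

def map_fn(key, line):
    # key is (url, offset); emit the line once if it matches the pattern
    if PATTERN.search(line):
        yield line, 1

def reduce_fn(line, counts):
    # identity reduce: just pass the matching line through
    return line
```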
12
Reverse Web-Link Graph
• Map
  – For each URL linking to target, …
  – Output <target, source> pairs
• Reduce
  – Concatenate list of all source URLs
  – Output: <target, list(source)> pairs
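A small single-process sketch of this link-graph reversal (the page-to-links dictionary is invented sample data, not from the slides):

```python
from collections import defaultdict

def map_fn(source_url, links):
    # links: list of target URLs found on the source page
    for target in links:
        yield target, source_url

def reduce_fn(target, sources):
    # collect every page that links to this target
    return target, list(sources)

pages = {"a.com": ["b.com", "c.com"], "b.com": ["c.com"]}
inverted = defaultdict(list)
for src, links in pages.items():
    for tgt, s in map_fn(src, links):
        inverted[tgt].append(s)
print({t: reduce_fn(t, s)[1] for t, s in inverted.items()})
# {'b.com': ['a.com'], 'c.com': ['a.com', 'b.com']}
```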
13
Model is Widely Applicable
MapReduce Programs in the Google Source Tree

Example uses:
distributed grep         distributed sort          web link-graph reversal
term-vector per host     web access log stats      inverted index construction
document clustering      machine learning          statistical machine translation
...                      ...                       ...
14
Implementation Overview

Typical cluster:
• 100s/1000s of 2-CPU x86 machines, 2–4 GB of memory
• Limited bisection bandwidth
• Storage is on local IDE disks
• GFS: distributed file system manages data (SOSP '03)
• Job scheduling system: jobs made up of tasks, scheduler assigns tasks to machines

Implementation is a C++ library linked into user programs
15
Execution
• How is this distributed?
1. Partition input key/value pairs into chunks, run map() tasks in parallel
2. After all map()s are complete, consolidate all emitted values for each unique emitted key
3. Now partition the space of output map keys, and run reduce() in parallel (see the sketch below)
• If map() or reduce() fails, re-execute!
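The three steps can be mimicked by a toy single-process driver like the one below (the names run_job and num_reducers are assumptions for illustration; the real system distributes these phases across machines and handles failures):

```python
from collections import defaultdict

def run_job(map_fn, reduce_fn, inputs, num_reducers=2):
    # 1. Run map() over every input pair (here sequentially, not in parallel)
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in inputs:
        for k, v in map_fn(key, value):
            # 2. Partition intermediate keys and consolidate values per key
            partitions[hash(k) % num_reducers][k].append(v)
    # 3. Run reduce() over each partition of the intermediate key space
    output = {}
    for part in partitions:
        for k, values in part.items():
            output[k] = reduce_fn(k, values)
    return output

# Word count expressed through the toy driver
print(run_job(lambda k, v: [(w, 1) for w in v.split()],
              lambda k, vs: sum(vs),
              [("doc1", "a b a")]))   # {'a': 2, 'b': 1}
```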
16
Job Processing
[Diagram: one JobTracker coordinating TaskTracker 0 through TaskTracker 5 for a "grep" job.]

1. Client submits the "grep" job, indicating code and input files
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to the tasktrackers
3. After map(), tasktrackers exchange map output to build the reduce() keyspace
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work
5. reduce() output may go to NDFS
17
Execution
18
Parallel Execution
19
Task Granularity & Pipelining
• Fine-granularity tasks: map tasks >> machines
  – Minimizes time for fault recovery
  – Can pipeline shuffling with map execution
  – Better dynamic load balancing
• Often use 200,000 map & 5,000 reduce tasks
• Running on 2,000 machines
20
MapReduce outside Google
• Hadoop (Java)
  – Emulates MapReduce and GFS
• The architecture of Hadoop MapReduce and DFS is master/slave

            Master       Slave
MapReduce   jobtracker   tasktracker
DFS         namenode     datanode
Improving MapReduce Performance through Data Placement in
Heterogeneous Hadoop Clusters
Download Software at: http://www.eng.auburn.edu/~xqin/software/hdfs-hc
This HDFS-HC tool was described in our paper - Improving MapReduce Performance via Data Placement in Heterogeneous Hadoop Clusters - by J. Xie, S. Yin, X.-J. Ruan, Z.-Y. Ding, Y. Tian, J. Majors, and X. Qin, published in Proc. 19th Int'l Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010.
22
Hadoop Overview
(J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI '04, pages 137–150, 2004)
23
One time setup
• Set hadoop-site.xml and slaves
• Start the namenode
• Run Hadoop MapReduce and DFS
• Upload your data to DFS
• Run your job…
• Download your data from DFS
24
Hadoop Distributed File System
(http://lucene.apache.org/hadoop)
25
Motivational Example
[Diagram: task timelines in minutes for Node A (fast, 1 task/min), Node B (2x slower), and Node C (3x slower).]
26
The Native Strategy
[Diagram: per-node time in minutes under the native data placement for Nodes A, B, and C, broken into loading, transferring, and processing; the task counts shown are 6, 3, and 2.]
27
Our Solution: Reducing Data Transfer Time

[Diagram: per-node time in minutes after redistribution (Nodes A', B', and C'), broken into loading, transferring, and processing; the task counts shown are 6, 3, and 2.]
28
Preliminary Results
Impact of data placement on performance of grep
29
Challenges
• Does the computing ratio depend on the application?
• Initial data distribution
• Data skew problem
  – New data arrival
  – Data deletion
  – New node joining
  – Data updating
30
Measure Computing Ratios
• Computing ratio
• Fast machines process large data sets
[Diagram: task timelines for Node A (1 task/min), Node B (2x slower), and Node C (3x slower).]
31
Steps to Measure Computing Ratios
Node     Response time (s)   Ratio   # of File Fragments   Speed
Node A   10                  1       6                     Fastest
Node B   20                  2       3                     Average
Node C   30                  3       2                     Slowest
1. Run the application on each node with the same size of data and collect each node's response time
2. Set the ratio of the shortest response time to 1, and set the ratios of the other nodes accordingly
3. Calculate the least common multiple of these ratios
4. Compute the portion (number of file fragments) for each node, as sketched below
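A small sketch of this calculation using the response times from the table above; math.lcm needs Python 3.9+, and the integer LCM step assumes the ratios round to whole numbers (the 3.3 ratios measured later would need rounding):

```python
from math import lcm  # Python 3.9+

# Measured response times (seconds) for the same input size on each node
response = {"A": 10, "B": 20, "C": 30}

fastest = min(response.values())
ratios = {n: t / fastest for n, t in response.items()}        # A: 1.0, B: 2.0, C: 3.0
common = lcm(*(round(r) for r in ratios.values()))            # lcm(1, 2, 3) = 6
fragments = {n: round(common / r) for n, r in ratios.items()} # A: 6, B: 3, C: 2

print(ratios, fragments)
```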
32
Initial Data Distribution
• Input files are split into 64 MB blocks
• Round-robin data distribution algorithm (a sketch follows the diagram)

[Diagram: the namenode dispatches the blocks of File1 (1–9, a–c) to datanodes A, B, and C.]

Portion 3:2:1
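One way to sketch such a portion-weighted round-robin placement (illustrative only, assuming the portions, e.g. 3:2:1, have already been derived from the computing ratios; this is not the actual HDFS-HC block placer):

```python
from itertools import cycle

def place_blocks(num_blocks, portions):
    """Assign block ids to nodes round-robin, weighted by each node's portion."""
    # Expand e.g. {"A": 3, "B": 2, "C": 1} into the repeating sequence A A A B B C
    order = [node for node, p in portions.items() for _ in range(p)]
    assignment = {node: [] for node in portions}
    for block_id, node in zip(range(num_blocks), cycle(order)):
        assignment[node].append(block_id)
    return assignment

print(place_blocks(12, {"A": 3, "B": 2, "C": 1}))
# Node A holds 6 blocks, B holds 4, C holds 2
```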
33
Data Redistribution
1. Get the network topology, the ratio, and the utilization of each node
2. Build and sort two lists: an under-utilized node list L1 and an over-utilized node list L2
3. Select a source node and a destination node from the lists
4. Transfer data
5. Repeat steps 3 and 4 until the lists are empty (a sketch follows the diagram below)
[Diagram: the namenode moves blocks from the over-utilized nodes (list L2) to the under-utilized nodes (list L1) until the data held by datanodes A, B, and C matches the 3:2:1 portion.]
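A rough sketch of that redistribution loop, with utilization reduced to simple per-node block counts and illustrative names; the real HDFS-HC module works on HDFS blocks, network topology, and utilization thresholds:

```python
def rebalance(current, target):
    """Move blocks from over-utilized to under-utilized nodes until each node
    holds (approximately) its target number of blocks."""
    # Build the two sorted lists: over-utilized (L2) and under-utilized (L1)
    over = sorted((n for n in current if current[n] > target[n]),
                  key=lambda n: current[n] - target[n], reverse=True)
    under = sorted((n for n in current if current[n] < target[n]),
                   key=lambda n: target[n] - current[n], reverse=True)
    moves = []
    while over and under:
        src, dst = over[0], under[0]
        n = min(current[src] - target[src], target[dst] - current[dst])
        current[src] -= n
        current[dst] += n
        moves.append((src, dst, n))          # the "transfer data" step
        if current[src] == target[src]:
            over.pop(0)
        if current[dst] == target[dst]:
            under.pop(0)
    return moves

print(rebalance({"A": 4, "B": 4, "C": 4}, {"A": 6, "B": 4, "C": 2}))
# [('C', 'A', 2)]
```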
34
Sharing Files among Multiple Applications
• The computing ratio depends on the data-intensive application
  – Redistribution
  – Redundancy
35
Experimental Environment
Five nodes in a heterogeneous Hadoop cluster
Node     CPU Model          CPU (Hz)            L1 Cache (KB)
Node A   Intel Core 2 Duo   2 × 1 GHz = 2 GHz   204
Node B   Intel Celeron      2.8 GHz             256
Node C   Intel Pentium 3    1.2 GHz             256
Node D   Intel Pentium 3    1.2 GHz             256
Node E   Intel Pentium 3    1.2 GHz             256
36
Grep and WordCount
• Grep is a tool that searches for a regular expression in a text file
• WordCount is a program that counts the words in a text file
37
Computing ratio for two applications
Computing ratios of the five nodes with respect to the Grep and WordCount applications

Computing Node   Ratio for Grep   Ratio for WordCount
Node A           1                1
Node B           2                2
Node C           3.3              5
Node D           3.3              5
Node E           3.3              5
38
Response Time of Grep and WordCount on Each Node
• The computing ratio is application dependent
• The computing ratio is independent of data size
39
Six Data Placement Decisions
40
Impact of data placement on performance of Grep
41
Impact of data placement on performance of WordCount
42
Conclusion
• Identified the performance degradation caused by heterogeneity.
• Designed and implemented a data placement mechanism in HDFS.
43
Future Work
• Data redundancy issue
• Dynamic data distribution mechanism
• Prefetching
44
Fellowship Program, Samuel Ginn College of Engineering at Auburn University
• Dean's Fellowship: $32,000 per year plus tuition fellowship
• College Fellowship: $24,000 per year plus tuition fellowship
• Departmental Fellowship: $20,000 per year plus tuition fellowship
• Tuition Fellowships: provide a full tuition waiver for a student with a 25 percent or greater full-time-equivalent (FTE) assignment. Both graduate research assistants (GRAs) and graduate teaching assistants (GTAs) are eligible.
45
http://www.eng.auburn.edu/programs/grad-school/fellowship-program/
46
http://www.eng.auburn.edu
Download the presentation slides:
http://www.slideshare.net/xqin74
Google: slideshare Xiao Qin
50
Questions