MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat, Google. Communications of the ACM (CACM), January 2008, Vol. 51, No. 1


Page 1:

MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean and Sanjay Ghemawat

Google

Communications of the ACM (CACM), January 2008, Vol. 51, No. 1

Page 2:

Earlier Paper

Dean, J. and Ghemawat, S. 2004. MapReduce: Simplified data processing on large clusters.

In Proceedings of Operating Systems Design and Implementation (OSDI), San Francisco, CA, 137-150.

Page 3:

Motivation:

Process large amounts of raw data to produce derived data

Use hundreds or thousands of CPUs

Make this easy

MapReduce provides:

Automatic parallelization and distribution

Fault tolerance

I/O scheduling

Status and monitoring

Page 4:

Programming Model

Two programmer-specified functions:

Map

Input: a key/value pair → (k1, v1)
Output: a list of intermediate key/value pairs → list(k2, v2)

Reduce

Input: an intermediate key and a list of values → (k2, list(v2))
Output: a list of values → list(v2)
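The same signatures can be written down as Python type hints; this is a minimal sketch for orientation, and the alias names (MapFn, ReduceFn) are not part of the paper or the library.

from typing import Callable, Iterable, Tuple, TypeVar

K1 = TypeVar("K1"); V1 = TypeVar("V1")   # input key/value types
K2 = TypeVar("K2"); V2 = TypeVar("V2")   # intermediate key/value types

# map:    (k1, v1)       -> list(k2, v2)
MapFn = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]

# reduce: (k2, list(v2)) -> list(v2)
ReduceFn = Callable[[K2, Iterable[V2]], Iterable[V2]]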

Page 5:

Counting Example

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
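The pseudocode above can be exercised end to end with a small, purely illustrative Python sketch; the run_mapreduce driver below is not the library described in the paper, just enough shuffle-and-group logic to run the two functions on one machine.

from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: an iterable of counts
    yield sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle: group every intermediate value by its key (the library's job).
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    # Reduce each key's values, visiting keys in increasing order
    # (mirroring the ordering guarantee discussed later).
    return {k: list(reduce_fn(k, vs)) for k, vs in sorted(groups.items())}

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog and the fox")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'and': [1], 'brown': [1], 'dog': [1], 'fox': [2], 'lazy': [1], 'quick': [1], 'the': [3]}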

Page 6:

Other Examples

Distributed Grep: The map function emits a line if it matches a supplied pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output.

Count of URL Access Frequency: The map function processes logs of web page requests and outputs 〈URL, 1〉. The reduce function adds together all values for the same URL and emits a 〈URL, total count〉 pair.

Page 8:

Other Examples

Reverse Web-Link Graph: The map function outputs 〈target, source〉 pairs for each link to a target URL found in a page named source. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair 〈target, list(source)〉.

Term-Vector per Host: A term vector summarizes the most important words that occur in a document or a set of documents as a list of 〈word, frequency〉 pairs. The map function emits a 〈hostname, term vector〉 pair for each input document (where the hostname is extracted from the URL of the document). The reduce function is passed all per-document term vectors for a given host. It adds these term vectors together, throwing away infrequent terms, then emits a final 〈hostname, term vector〉 pair.
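As a concrete illustration of the reverse web-link graph, here is a small Python sketch of the two functions; extract_links is a toy helper added only to make the example runnable, and the grouping of intermediate pairs by key is assumed to be done by a driver such as the one in the word-count sketch.

import re

def extract_links(page_html):
    # Crude href extractor, just enough for the sketch.
    return re.findall(r'href="([^"]+)"', page_html)

def map_links(source_url, page_html):
    # Emit a (target, source) pair for every link to a target URL in the page.
    for target in extract_links(page_html):
        yield (target, source_url)

def reduce_links(target_url, sources):
    # Concatenate the list of all source URLs pointing at this target.
    yield list(sources)

page = ("http://a.example", '<a href="http://b.example">b</a>')
print(list(map_links(*page)))    # [('http://b.example', 'http://a.example')]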

Page 10:

Other Examples

Inverted Index: The map function parses each document and emits a sequence of 〈word, document ID〉 pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs and emits a 〈word, list(document ID)〉 pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.

Distributed Sort: The map function extracts the key from each record and emits a 〈key, record〉 pair. The reduce function emits all pairs unchanged. This computation depends on the partitioning and ordering functions of the MapReduce library.

Page 12:

1. fork

Split input files

Page 13:

2. assign map/reduce

Assign map and reduce tasks to idle workers

Page 14:

3. read

Read the contents of the corresponding input split

Page 15:

4. local write

The output of map is written to local disk

Page 16:

5. remote read

Intermediate data is remotely read by the reduce workers and sorted so that all occurrences of the same key are grouped together

Page 17:

6. write

Reduce workers apply the reduce function and write their results to a final output file, one per reduce partition (a fuller sketch of steps 1-6 follows below)
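The six steps can be compressed into a single-process sketch of the data flow; the split strategy, the hash partitioner, and the function names here are illustrative only, not the distributed implementation.

from collections import defaultdict

def run_job(input_records, map_fn, reduce_fn, M=4, R=2):
    # 1. Split the input into M pieces.
    splits = [input_records[i::M] for i in range(M)]
    # 2-4. Each "map task" processes one split and buckets its output into
    #      R partitions (written to local disk in the real system).
    map_outputs = []
    for split in splits:
        buckets = [defaultdict(list) for _ in range(R)]
        for key, value in split:
            for k2, v2 in map_fn(key, value):
                buckets[hash(k2) % R][k2].append(v2)
        map_outputs.append(buckets)
    # 5-6. Each "reduce task" reads its partition from every map task's output,
    #      sorts by key, applies reduce, and produces one output file.
    outputs = []
    for r in range(R):
        merged = defaultdict(list)
        for buckets in map_outputs:
            for k2, vs in buckets[r].items():
                merged[k2].extend(vs)
        outputs.append({k: list(reduce_fn(k, vs)) for k, vs in sorted(merged.items())})
    return outputs

With the word-count map_fn and reduce_fn from the earlier sketch, run_job(docs, map_fn, reduce_fn) returns R per-partition dictionaries whose union is the full word count.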

Page 18:

Master Data Structures

For each task:

Store idle, in-progress, completed

Tracks the locations of the intermediate data produced by map tasks

Uses this data structure for fault tolerance

Page 19:

Worker Failures

Master pings workers

If a worker does not answer, it is marked as failed

All tasks (both in-progress and completed) from that worker are reset to idle

Nuances:

Completed map tasks are rescheduled because their output lives on the failed machine's local disk and is therefore inaccessible
Completed reduce tasks are not rescheduled because their output is stored in the global file system (a small sketch of this rule follows below)
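A hedged sketch of how the master's bookkeeping and this rule fit together; the Task fields and state names are invented for illustration and are not the paper's actual data structures.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: str                     # "idle", "in-progress", or "completed"
    worker: Optional[str] = None   # worker the task is or was assigned to

def handle_worker_failure(tasks, failed_worker):
    for t in tasks:
        if t.worker != failed_worker:
            continue
        if t.kind == "map":
            # Map output sits on the failed machine's local disk, so even
            # completed map tasks go back to idle and are rescheduled.
            t.state, t.worker = "idle", None
        elif t.state == "in-progress":
            # Completed reduce output is already in the global file system.
            t.state, t.worker = "idle", None

tasks = [Task("map", "completed", "w1"),
         Task("reduce", "completed", "w1"),
         Task("reduce", "in-progress", "w1")]
handle_worker_failure(tasks, "w1")
print([(t.kind, t.state) for t in tasks])
# [('map', 'idle'), ('reduce', 'completed'), ('reduce', 'idle')]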

Page 20:

Locality

Network transfers are expensive

Every worker has subsets of the data

When possible, tasks are assigned to workers that already have the necessary data on local disk

Page 21:

Task Granularity

Map phase is divided into M pieces (usually 16-64 MB)

Reduce phase is divided into R pieces

M and R are usually much larger than the number of workers

This improves dynamic load balancing

Page 22:

Task Granularity

Upper bounds on granularity:

O(M+R) scheduling decisions
O(M*R) state kept in memory by the master

R is also constrained by output requirements

In practice:

An M such that each task handles 16-64 MB of input makes the locality optimization most effective
An R that is a small multiple of the number of workers

Example:

M = 200,000, R = 5,000, workers = 2,000 (worked through in the sketch below)
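A back-of-the-envelope check of these numbers; the paper states that the O(M*R) master state costs roughly one byte per map/reduce task pair, and the input-size figure below is an inference from the 16-64 MB split target, not a number from the slide.

M, R, workers = 200_000, 5_000, 2_000

scheduling_decisions = M + R            # O(M+R): about 205,000 decisions
state_bytes = M * R                     # O(M*R): ~1e9 bytes at ~1 byte per pair
print(state_bytes / 2**30, "GiB")       # ~0.93 GiB of master memory

# At 16-64 MB of input per map task, M = 200,000 corresponds to inputs of
# roughly 3-12 TiB.
print(M * 16 * 2**20 / 2**40, "to", M * 64 * 2**20 / 2**40, "TiB")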

Page 24:

Backup Tasks

Stragglers (machines that take an unusually long time to finish their last few tasks) are an issue

When the job is close to completion (there are more idle workers than remaining in-progress tasks), backup copies of the in-progress tasks are scheduled; a task counts as done when either the original or the backup finishes (see the sketch below)
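A toy sketch of that rule; the scheduler shape and the launch callback are assumptions made for illustration, not the paper's scheduler.

def maybe_schedule_backups(in_progress, idle_workers, launch):
    # in_progress: ids of tasks still running; idle_workers: workers with nothing to do.
    # Near the end of the job, spend idle capacity on duplicate executions so a
    # single straggler cannot hold up completion; whichever copy finishes first wins.
    if len(idle_workers) <= len(in_progress):
        return
    for task, worker in zip(in_progress, idle_workers):
        launch(task, worker)   # start a backup copy of `task` on `worker`

maybe_schedule_backups(["map-17", "reduce-3"], ["w5", "w6", "w7"],
                       lambda task, worker: print("backup of", task, "on", worker))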

Page 25:

Partitioning

R is the number of reduce tasks/output files.

Intermediate data is partitioned into R pieces using a partitioning function on the intermediate key

e.g. “hash(key) mod R”

Results in well balanced partitions

Can specify special partitioning function

“hash(Hostname(urlkey)) mod R”
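A small sketch of the default and the custom partitioner mentioned above; the hostname extraction via urllib is an assumption, since the paper does not give an implementation of Hostname().

from urllib.parse import urlparse

def default_partition(key, R):
    # Default: spread intermediate keys roughly evenly over the R reduce tasks.
    return hash(key) % R

def host_partition(url_key, R):
    # Custom: all URLs from the same host land in the same reduce partition,
    # and therefore in the same output file.
    return hash(urlparse(url_key).netloc) % R

R = 8
print(host_partition("http://example.com/a", R) == host_partition("http://example.com/b", R))  # True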

Page 26:

Ordering Guarantees

Within a partition, reducers process intermediate key/value pairs in increasing key order

Makes it easy to generate sorted output files per partition

Page 27:

Combiner Function

Significant repetition of intermediate keys

Example:

〈the, 1〉 occurs many times (Zipf distribution)

A combiner function can be specified

Each map worker pre-processes its output using the combiner before sending data over the network

Partial combining significantly speeds up certain classes of operations (the 2004 paper has an example in Appendix A)
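A sketch of a combiner for the word-count example: the combiner has the same logic as the reduce function, except that its output feeds the shuffle rather than the final output file; run_map_task below is illustrative, not the library's interface.

from collections import defaultdict

def map_fn(key, value):
    for word in value.split():
        yield (word, 1)

def combine_fn(word, counts):
    # Same aggregation as reduce, but emitted as key/value pairs for the shuffle.
    yield (word, sum(counts))

def run_map_task(records, map_fn, combine_fn):
    # Buffer the map output, then locally pre-aggregate each key before
    # anything crosses the network.
    buffered = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):
            buffered[k2].append(v2)
    combined = []
    for k2, vs in buffered.items():
        combined.extend(combine_fn(k2, vs))
    return combined

print(run_map_task([("d1", "the cat and the dog and the fox")], map_fn, combine_fn))
# [('the', 3), ('cat', 1), ('and', 2), ('dog', 1), ('fox', 1)]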

Page 28:

Other Refinements

Input and Output Types: custom input/output formats through a reader interface

Side Effects: map or reduce tasks may produce auxiliary files as additional outputs

Skipping Bad Records: bugs that crash deterministically on specific records occur, sometimes outside programmer control (e.g. in third-party libraries). MapReduce can detect such records and skip them to avoid repeated crashes (see the sketch after this list)

Local Execution: an alternative (sequential) implementation for debugging

Status Information: the master maintains an HTTP server exposing worker and job status

Counters: MapReduce can count a variety of events and provide statistics on them
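A hedged sketch of the bad-record mechanism referenced above: the worker remembers which record it is processing, reports it to the master if the user code crashes, and later attempts skip records that have already failed; names such as report_failure are assumptions, not the real interface.

failed_records = set()              # in the real system this set lives on the master

def report_failure(record_id):
    failed_records.add(record_id)

def run_map_with_skipping(records, map_fn):
    output = []
    for record_id, (key, value) in enumerate(records):
        if record_id in failed_records:
            continue                # this record crashed before: skip it this time
        try:
            output.extend(map_fn(key, value))
        except Exception:
            report_failure(record_id)   # tell the master before the task dies
            raise
    return output

def crashy_map(key, value):
    if "bad" in value:
        raise ValueError("third-party bug")
    yield (key, len(value))

records = [("a", "ok"), ("b", "bad input"), ("c", "also ok")]
try:
    run_map_with_skipping(records, crashy_map)     # first attempt dies on record 1
except ValueError:
    pass
print(run_map_with_skipping(records, crashy_map))  # retry skips it: [('a', 2), ('c', 7)]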

Page 29:

Grep

10^10 100-byte records, scanned for a rare three-character pattern

64 MB map pieces (M = 15,000)

Output in 1 file (R=1)

Page 30:

Sort

10^10 100-byte records (about 1 TB of data)

Modeled after TeraSort benchmark

Page 31:

Sort

Page 32:

When is it not appropriate?

No Real-time Processing (MR is best suited to batch jobs)

Not every computation is easy to express as a MapReduce

No communication between tasks

Shuffling data is still expensive

Page 33:

Conclusion

MapReduce is fairly ubiquitous in Google operations

Used for indexing and many other applications

Lets programmers avoid thinking too hard about parallelization

Handles failures elegantly

Questions?
