MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Communications of the ACM (CACM), January 2008, Vol. 51, No. 1
Overview Basic Functionality Refinements Performance Conclusion
Earlier Paper
Dean, J. and Ghemawat, S. 2004. MapReduce: Simplified data processing on large clusters.
In Proceedings of Operating Systems Design and Implementation (OSDI). San Francisco, CA, 137-150.
Motivation:
Process large amounts of data to produce other data
Use hundreds or thousands of CPUs
Make this easy
MapReduce provides:
Automatic parallelization and distribution
Fault tolerance
I/O scheduling
Status and monitoring
Programming Model
Two programmer specified functions:
Map
Input: key/value pairs → (k1, v1)
Output: intermediate key/value pairs → list(k2, v2)
Reduce
Input: intermediate key/value pairs → (k2, list(v2))
Output: list of values → list(v2)
Counting Example
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
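The pipeline the pseudocode above describes can be sketched in plain Python. This is a minimal in-memory simulation, not the paper's distributed implementation: `map_reduce` here plays the role of the library (including the shuffle that groups intermediate values by key).

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # Emit an intermediate <word, "1"> pair for each word.
    for word in contents.split():
        yield (word, "1")

def reduce_fn(word, values):
    # Sum the counts emitted for this word.
    return str(sum(int(v) for v in values))

def map_reduce(inputs, map_fn, reduce_fn):
    # Shuffle: group intermediate values by key (the library's job).
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    # Apply reduce to each key's list of values.
    return {k: reduce_fn(k, vs) for k, vs in sorted(groups.items())}

docs = [("d1", "the quick fox"), ("d2", "the lazy dog")]
print(map_reduce(docs, map_fn, reduce_fn))
# {'dog': '1', 'fox': '1', 'lazy': '1', 'quick': '1', 'the': '2'}
```

In the real system the groups would be spread across R reduce workers rather than held in one dictionary.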
Other Examples
Distributed Grep: The map function emits a line if it matches a supplied pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output.
Count of URL Access Frequency: The map function processes logs of web page requests and outputs 〈URL, 1〉. The reduce function adds together all values for the same URL and emits a 〈URL, total count〉 pair.
Other Examples
Reverse Web-Link Graph: The map function outputs 〈target, source〉 pairs for each link to a target URL found in a page named source. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair 〈target, list(source)〉.
Term-Vector per Host: A term vector summarizes the most important words that occur in a document or a set of documents as a list of 〈word, frequency〉 pairs. The map function emits a 〈hostname, term vector〉 pair for each input document (where the hostname is extracted from the URL of the document). The reduce function is passed all per-document term vectors for a given host. It adds these term vectors together, throwing away infrequent terms, then emits a final 〈hostname, term vector〉 pair.
Other Examples
Inverted Index: The map function parses each document and emits a sequence of 〈word, document ID〉 pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs, and emits a 〈word, list(document ID)〉 pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.
Distributed Sort: The map function extracts the key from each record and emits a 〈key, record〉 pair. The reduce function emits all pairs unchanged. This computation depends on the partitioning and ordering guarantees of the MapReduce library.
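The inverted-index example above can be sketched compactly. This is an in-memory illustration of the map/shuffle/reduce stages, with the shuffle's grouping-by-word collapsed into a dictionary:

```python
from collections import defaultdict

def inverted_index(docs):
    # docs: list of (document ID, text) pairs.
    # Map emits <word, doc ID>; the shuffle groups pairs by word;
    # reduce sorts the document IDs for each word.
    postings = defaultdict(set)
    for doc_id, text in docs:
        for word in text.split():
            postings[word].add(doc_id)
    return {w: sorted(ids) for w, ids in postings.items()}

docs = [(1, "to be or not to be"), (2, "to do or not to do")]
idx = inverted_index(docs)
print(idx["to"], idx["be"])  # [1, 2] [1]
```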
1. fork
The library splits the input files into M pieces and starts copies of the program on the cluster
2. assign map/reduce
Assign map and reduce tasks to idle workers
3. read
Map workers read the contents of the corresponding input split
4. local write
The output of map is buffered and periodically written to local disk
5. remote read
Reduce workers remotely read the intermediate data from the map workers' local disks
6. write
Each reduce worker processes its partition and writes the result to a final output file
Master Data Structures
For each task:
Stores its state: idle, in-progress, or completed
Tracks the locations and sizes of the intermediate data produced by completed map tasks
Uses this information for fault tolerance
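The bookkeeping described above can be sketched as follows. This is a hypothetical, simplified sketch of the master's per-task state (the names `Master`, `map_done`, `worker_failed`, and the location string are illustrative, not the paper's API):

```python
class Master:
    """Simplified sketch of the master's per-task bookkeeping."""
    IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

    def __init__(self, n_map, n_reduce):
        self.map_state = {i: Master.IDLE for i in range(n_map)}
        self.reduce_state = {i: Master.IDLE for i in range(n_reduce)}
        # For each completed map task: locations of its intermediate
        # file regions, to be handed to the reduce workers.
        self.intermediate = {}

    def assign_map(self, task):
        self.map_state[task] = Master.IN_PROGRESS

    def map_done(self, task, region_locations):
        self.map_state[task] = Master.COMPLETED
        self.intermediate[task] = region_locations

    def worker_failed(self, map_tasks_on_worker):
        # Both in-progress and completed map tasks on a failed machine
        # go back to idle: their local output is no longer reachable.
        for t in map_tasks_on_worker:
            self.map_state[t] = Master.IDLE
            self.intermediate.pop(t, None)

m = Master(n_map=3, n_reduce=1)
m.assign_map(0)
m.map_done(0, ["workerA:/disk/part-0"])  # hypothetical location string
m.worker_failed([0])
print(m.map_state[0])  # idle again; task 0 will be rescheduled
```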
Worker Failures
Master pings workers
If worker doesn’t answer, worker is marked as failed
All tasks (both in-progress and completed) from that worker are reset to idle
Nuances:
Completed map tasks must be re-executed because their output lives on the failed machine's local disk and is thus inaccessible
Completed reduce tasks need not be re-executed because their output is stored in the global file system
Locality
Network transfers are expensive
The input data (managed by GFS) is replicated on the local disks of the cluster machines
When possible, map tasks are assigned to workers that already hold the necessary data
Task Granularity
The map phase is divided into M tasks (input splits of typically 16-64 MB)
The reduce phase is divided into R tasks
M and R are usually much larger than the number of workers
This improves dynamic load balancing
Task Granularity
Upper bounds on granularity:
O(M+R) scheduling decisions
O(M×R) state kept in memory by the master
R is also constrained by output requirements
In practice:
An M such that each task processes 16-64 MB of input makes the locality optimization most effective
An R that is a small multiple of the number of workers
Example:
M = 200,000
R = 5,000
Workers = 2,000
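Plugging the example numbers into the bounds above shows why M×R matters: the scheduling work stays modest, but the per-pair state is a billion entries (the paper notes roughly one byte per map/reduce task pair, so on the order of a gigabyte of master memory).

```python
M, R, workers = 200_000, 5_000, 2_000

scheduling_decisions = M + R   # O(M + R)
state_entries = M * R          # O(M * R) pieces of in-memory state

print(scheduling_decisions)    # 205000
print(state_entries)           # 1000000000 (~1 GB at ~1 byte/entry)
print(M // workers)            # 100 map tasks per worker, on average
```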
Backup Tasks
Stragglers (machines that take unusually long to finish their last few tasks) are an issue
When the job is close to completion (there are more idle workers than remaining tasks), the master schedules backup executions of the in-progress tasks; whichever copy finishes first wins
Partitioning
R is the number of reduce tasks/output files.
Intermediate data is partitioned into R partitions by applying a function to the intermediate keys
e.g. “hash(key) mod R”
Results in well balanced partitions
Users can specify a custom partitioning function
“hash(Hostname(urlkey)) mod R”
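Both partitioners above are one-liners. The sketch below is illustrative (the `hostname` helper is a toy URL parser introduced here, not the library's): the default spreads keys evenly, while the custom version sends all URLs from one host to the same reduce partition, and hence the same output file.

```python
R = 4  # number of reduce tasks / output files

def default_partition(key):
    # Default: hash(key) mod R spreads keys evenly across partitions.
    return hash(key) % R

def hostname(url):
    # Toy URL parser for illustration only.
    return url.split("/")[2]

def host_partition(url_key):
    # Custom partitioner: hash(Hostname(urlkey)) mod R, so all URLs
    # from the same host end up in the same output file.
    return hash(hostname(url_key)) % R

a = host_partition("http://example.com/page-a")
b = host_partition("http://example.com/page-b")
print(a == b)  # True: same host, same reduce partition
```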
Ordering Guarantees
Within a partition, reducers process intermediate key/value pairs in increasing key order
Makes it easy to generate sorted output files per partition
Combiner Function
There is often significant repetition of intermediate keys
Example:
〈the, 1〉 occurs many times (Zipf distribution)
A combiner function can be specified
Each map worker partially merges its output with the combiner before sending data over the network
Partial combining significantly speeds up certain classes of operations (the 2004 paper has an example in Appendix A)
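The effect of a combiner on the word-count example can be sketched as follows. Without it, a mapper ships one ("the", 1) pair per occurrence over the network; with it, the counts are pre-summed per map task, so only one pair per distinct word leaves the worker (assuming, as in word count, that the reduce operation is commutative and associative):

```python
from collections import Counter

def map_with_combiner(doc):
    # Raw map output: one ("word", 1) pair per occurrence.
    raw = [(w, 1) for w in doc.split()]
    # Combiner: pre-sum counts locally before the network transfer.
    combined = Counter()
    for word, count in raw:
        combined[word] += count
    return list(combined.items())

pairs = map_with_combiner("the cat and the hat and the bat")
print(sorted(pairs))
# [('and', 2), ('bat', 1), ('cat', 1), ('hat', 1), ('the', 3)]
```

Eight raw pairs shrink to five combined pairs here; on real Zipf-distributed text the savings are far larger.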
Other Refinements
Input and Output Types: custom file formats are supported through a reader interface
Side Effects: users may produce auxiliary output files from their map/reduce operators
Skipping Bad Records: bugs (sometimes outside programmer control, such as in third-party libraries) can crash deterministically on specific records. MapReduce can detect this and skip those records to avoid repeated crashes
Local Execution: an alternative sequential implementation aids debugging
Status Information: the master runs an HTTP server that reports worker and job status
Counters: MapReduce can count a variety of events and report statistics on them
Grep
10^10 100-byte records, scanned for a rare three-character pattern
64 MB map pieces (M = 15,000)
Output in one file (R = 1)
Sort
10^10 100-byte records
Modeled after TeraSort benchmark
When is it not appropriate?
No Real-time Processing (MR is best suited to batch jobs)
Not every computation is easy to express as a MapReduce
No communication between tasks
Shuffling data is still expensive
Conclusion
MapReduce is fairly ubiquitous in Google operations
Used for indexing and many other applications
Lets programmers largely avoid thinking about parallelization and distribution
Handles failures elegantly
Questions?