Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Principles of Data Management
Lecture #16 (MapReduce & DFS for Big Data)
Instructor: Mike Carey
Today’s News Bulletin
- Project dates
  - Query execution layer is due on 3/17
- Upcoming lectures
  - Today: MapReduce (and distributed file systems)
  - Next week: Wrap-up & review, in-class endterm
- Other upcoming events
  - The long-lost midterms will appear on Tuesday!
- Class participation opportunities
  - Teaching evaluations (after Tuesday ☺)
  - End-of-term opinion survey (watch for it!)
Motivation
- Google needed to process web-scale data
  - Data much larger than what fits on one machine
  - Needed parallel processing to get results in a reasonable time
  - Wanted to use cheap commodity machines to do the job
- Credits: Some of the following slide content is excerpted from the Google OSDI 2004 talk where MapReduce was publicly born and/or Google’s SOSP 2003 talk on the underlying DFS.
Requirements
- Solution must
  - Scale to 1000s of compute nodes
  - Automatically handle faults
  - Provide monitoring of jobs
  - Be easy for programmers to use
MapReduce Programming Model
- Input and output are sets of key/value pairs
- Programmer provides two functions
  - map(K1, V1) -> list(K2, V2)
    - Produces a list of intermediate key/value pairs for each input key/value pair
  - reduce(K2, list(V2)) -> list(K3, V3)
    - Produces a list of result values for all intermediate values that are associated with the same intermediate key
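The two signatures above can be sketched as a tiny single-process driver. This is only an illustration under stated assumptions: `run_mapreduce`, `words_by_length_map`, and `words_by_length_reduce` are hypothetical names, everything runs in memory, and a real system would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, group by intermediate key, then reduce (all in one process)."""
    # Map phase: each (k1, v1) input pair yields zero or more (k2, v2) pairs.
    grouped = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)  # grouping by k2 stands in for the shuffle
    # Reduce phase: one call per distinct intermediate key.
    output = []
    for k2 in sorted(grouped):
        output.extend(reduce_fn(k2, grouped[k2]))
    return output

# Hypothetical job: group the words of each document by their length.
def words_by_length_map(doc_name, contents):
    for word in contents.split():
        yield (len(word), word)

def words_by_length_reduce(length, words):
    yield (length, sorted(words))

result = run_mapreduce(
    [("d1", "a bb cc"), ("d2", "dd e")],
    words_by_length_map,
    words_by_length_reduce,
)
print(result)
```

The programmer writes only the two small functions; grouping by key and calling reduce once per key is the framework's job.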
MapReduce Pipeline
[Figure: Read from DFS -> Map -> Shuffle -> Reduce -> Write to DFS]
MapReduce in Action
- Map (k1, v1) -> list(k2, v2)
  - Processes one input key/value pair
  - Produces a set of intermediate key/value pairs
- Reduce (k2, list(v2)) -> list(k3, v3)
  - Combines intermediate values for one particular key
  - Produces a set of merged output values (usually one)
MapReduce Architecture
[Figure labels: MapReduce Job Tracker; MapReduce (x4); Network; Distributed File System]
Software Components
- Job Tracker (Master)
  - Maintains cluster membership of workers
  - Accepts MR jobs from clients and dispatches tasks to workers
  - Monitors workers’ progress
  - Restarts tasks in the event of failure
- Task Tracker (Worker)
  - Provides an environment to run a task
  - Maintains and serves intermediate files between Map and Reduce phases
MapReduce Parallelism
Hash Partitioning
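Only the figure label survives here, so the following is a hedged sketch of what hash partitioning of map output commonly looks like, not a reconstruction of the slide. The names `partition` and `shuffle` are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash function a real system uses (Python's built-in `hash` on strings varies between runs, so it is avoided).

```python
import zlib

def partition(key, num_reducers):
    """Map an intermediate key to a reducer index in [0, num_reducers)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Route each (key, value) pair to the bucket of the reducer that owns its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [("line", 1), ("this", 1), ("line", 1), ("yet", 1)]
buckets = shuffle(pairs, 3)
for reducer_index, bucket in enumerate(buckets):
    print(reducer_index, bucket)
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer, no matter which map task produced it.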
Example 1: Count Word Occurrences
- Input: Set of (Document name, Document Contents)
- Output: Set of (Word, Count(Word))
- map(k1, v1):
    for each word w in v1
      emit(w, 1)
- reduce(k2, v2_list):
    int result = 0;
    for each v in v2_list
      result += v;
    emit(k2, result)
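Example 1: Count Word Occurrences
- Map input
  - Map 1: "this is a line", "this is another line"
  - Map 2: "another line", "yet another line"
- Map output
  - Map 1: (this, 1) (is, 1) (a, 1) (line, 1) (this, 1) (is, 1) (another, 1) (line, 1)
  - Map 2: (another, 1) (line, 1) (yet, 1) (another, 1) (line, 1)
- Sorted map output
  - Map 1: (a, 1) (another, 1) (is, 1) (is, 1) (line, 1) (line, 1) (this, 1) (this, 1)
  - Map 2: (another, 1) (another, 1) (line, 1) (line, 1) (yet, 1)
- Reduce output
  - Reduce 1: (a, 1) (another, 3) (is, 2)
  - Reduce 2: (line, 4) (this, 2) (yet, 1)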
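For readers who want to replay this example, here is a minimal single-process Python sketch of the same word count on the same four input lines. The names `wc_map`, `wc_reduce`, `word_count`, and the split labels are hypothetical; the grouping dictionary merely stands in for the distributed shuffle.

```python
from collections import defaultdict

def wc_map(doc_name, contents):
    # Emit (word, 1) for every word occurrence in the input.
    for word in contents.split():
        yield (word, 1)

def wc_reduce(word, counts):
    # Sum all the 1s that were emitted for this word.
    yield (word, sum(counts))

splits = {
    "map1": ["this is a line", "this is another line"],
    "map2": ["another line", "yet another line"],
}

def word_count(splits):
    grouped = defaultdict(list)
    for task_name, lines in splits.items():
        for line in lines:
            for word, one in wc_map(task_name, line):
                grouped[word].append(one)
    result = {}
    for word in sorted(grouped):
        for key, total in wc_reduce(word, grouped[word]):
            result[key] = total
    return result

counts = word_count(splits)
print(counts)
# {'a': 1, 'another': 3, 'is': 2, 'line': 4, 'this': 2, 'yet': 1}
```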
(Picture borrowed from Shiv Babu @ Duke University)
MapReduce Pipeline Revisited
Example 2: Equijoins
- Input: Rows of Relation R, Rows of Relation S
- Output: R join S on R.x = S.y
- map(k1, v1)
    if (input == R)
      emit(v1.x, ["R", v1])
    else
      emit(v1.y, ["S", v1])
- reduce(k2, v2_list)
    for r in v2_list where r[1] == "R"
      for s in v2_list where s[1] == "S"
        emit(1, result(r[2], s[2]))
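The join pseudocode can be tried out with a small Python sketch. Assumptions to note: rows are plain dictionaries, `join_map`, `join_reduce`, and `equijoin` are hypothetical names, the driver runs in one process, and the reducer emits the join key as its output key (the slide emits the constant 1).

```python
from collections import defaultdict

def join_map(relation_name, row):
    # Tag each row with its relation and key it by the join attribute.
    if relation_name == "R":
        yield (row["x"], ("R", row))
    else:
        yield (row["y"], ("S", row))

def join_reduce(join_key, tagged_rows):
    # All R rows and S rows sharing this key meet here; pair them up.
    r_rows = [row for tag, row in tagged_rows if tag == "R"]
    s_rows = [row for tag, row in tagged_rows if tag == "S"]
    for r in r_rows:
        for s in s_rows:
            yield (join_key, {**r, **s})

def equijoin(R, S):
    groups = defaultdict(list)
    for name, rows in (("R", R), ("S", S)):
        for row in rows:
            for key, tagged in join_map(name, row):
                groups[key].append(tagged)
    return [out for key in sorted(groups) for out in join_reduce(key, groups[key])]

R = [{"x": 1, "a": "r1"}, {"x": 2, "a": "r2"}]
S = [{"y": 1, "b": "s1"}, {"y": 1, "b": "s2"}, {"y": 3, "b": "s3"}]
joined = equijoin(R, S)
for key, row in joined:
    print(key, row)
```

Keys that appear in only one relation (2 and 3 here) reach a reducer with an empty R list or S list, so they produce no output, as an inner join requires.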
Other Examples
- Distributed grep
- Inverted index construction
- Machine learning
- Distributed sort
- Fuzzy join
- …
- Or: A Pig script or a Hive query (which are then auto-converted to a Hadoop MapReduce job series under the covers), e.g., at Netflix
Fault Tolerant Evaluation
- Task fault tolerance is achieved through re-execution (roll forward, not back!)
- Consumers read data only after the producer has generated it completely
  - This is an important property to isolate faults to one task
- Task completion is committed through the Master
- Cannot handle master failure
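A rough sketch of re-execution with commit-on-completion follows. Everything in it is an assumption made for illustration: the `committed` dictionary stands in for the master's record of finished tasks, `run_with_reexecution` and `flaky_map_task` are hypothetical names, and a `RuntimeError` stands in for a worker failure.

```python
def run_with_reexecution(task_id, task_fn, committed, max_attempts=3):
    """Re-run a task from scratch until it completes, then commit its output."""
    for attempt in range(1, max_attempts + 1):
        try:
            output = task_fn(attempt)
        except RuntimeError:
            # Partial output of the failed attempt is simply dropped;
            # nothing is rolled back, the task just runs again.
            continue
        committed[task_id] = output  # visible to consumers only when complete
        return attempt
    raise RuntimeError(f"task {task_id} failed {max_attempts} times")

def flaky_map_task(attempt):
    partial = []
    for i, word in enumerate(["this", "is", "a", "line"]):
        if attempt == 1 and i == 2:
            raise RuntimeError("worker died mid-task")
        partial.append((word, 1))
    return partial

committed = {}
attempts = run_with_reexecution("map-0", flaky_map_task, committed)
print(attempts, committed)
```

Since consumers only ever see `committed`, the half-finished output of the first attempt never leaks downstream, which is what keeps a fault confined to one task.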
Task Granularity and Pipelining
- Fine granularity tasks
  - Many more map tasks than machines
    - Minimizes time for fault recovery
    - Pipelines shuffling with map execution
    - Better load balancing
Optimization: Combiners
- Sometimes partial aggregation is possible on the Map side
- May cut down the amount of data needing to be transferred to the reducer (significantly in some cases, like grouped aggregation in Hive)
- combine(K2, list(V2)) -> (K2, list(V2))
- For the Word Occurrence Count example, Combine == Reduce (Q: Why?)
Example 1: Word Count Revisited (With Combiners)
- Map input
  - Map 1: "this is a line", "this is another line"
  - Map 2: "another line", "yet another line"
- Map output
  - Map 1: (this, 1) (is, 1) (a, 1) (line, 1) (this, 1) (is, 1) (another, 1) (line, 1)
  - Map 2: (another, 1) (line, 1) (yet, 1) (another, 1) (line, 1)
- Combiner output (intermediate results)
  - Map 1: (a, 1) (another, 1) (is, 2) (line, 2) (this, 2)
  - Map 2: (another, 2) (line, 2) (yet, 1)
- Reduce output
  - Reduce 1: (a, 1) (another, 3) (is, 2)
  - Reduce 2: (line, 4) (this, 2) (yet, 1)
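The effect of the combiner on this data can be checked with a short Python sketch. The helper names `map_words` and `combine` are hypothetical. The same summing function is used both as combiner and as final reducer, which works here because addition is associative and commutative, so partial sums can be summed again.

```python
from collections import Counter

def map_words(lines):
    # Plain map output: one (word, 1) pair per word occurrence.
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    # Sum counts per word; usable as the combiner and as the reducer.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

map1 = map_words(["this is a line", "this is another line"])
map2 = map_words(["another line", "yet another line"])

pairs_without_combiner = len(map1) + len(map2)
pairs_with_combiner = len(combine(map1)) + len(combine(map2))
final_counts = combine(combine(map1) + combine(map2))

print(pairs_without_combiner, pairs_with_combiner)  # 13 8
print(final_counts)
```

On this small input the combiner cuts the shuffled pairs from 13 to 8 while leaving the final counts unchanged.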
Optimization: Redundant Execution
- Slow workers lengthen completion time
- Slowness happens because of
  - Other jobs consuming resources
  - Bad disks/network, etc.
- Solution: Near the end of the job, spawn extra copies of long-running tasks
  - Whichever copy finishes first wins
  - Kill the rest
- In Hadoop this is called “speculative execution”
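The "first copy wins, kill the rest" idea can be imitated with threads. This is a toy sketch, not how Hadoop implements it: `run_copy` and `run_speculatively` are hypothetical names, sleeping stands in for real work, and a shared `threading.Event` stands in for killing the losing copies.

```python
import concurrent.futures as cf
import threading
import time

def run_copy(name, seconds, cancel):
    """Pretend to work for `seconds`; stop early if another copy already won."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if cancel.is_set():
            return None  # this copy was killed
        time.sleep(0.01)
    return name

def run_speculatively(copies):
    """Start all copies of one task and return the result of the first to finish."""
    cancel = threading.Event()
    with cf.ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(run_copy, name, secs, cancel) for name, secs in copies]
        done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        cancel.set()  # kill the rest
        return next(iter(done)).result()

winner = run_speculatively([("original on slow worker", 0.5), ("backup copy", 0.05)])
print(winner)
```

The job pays for some duplicated work but no longer waits for the slowest worker.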
Optimization: Locality
- Task scheduling policy
  - Ask DFS (next topic!) for locations of replicas of input file blocks
  - Map tasks scheduled so that input blocks are machine local or rack local
- Effect: Tasks read data at local disk speeds
- Without this, rack switches limit data rate
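One plausible way to express this scheduling preference is sketched below. The function `pick_worker`, the `rack_of` table, and the node names are all hypothetical; a real scheduler weighs many more factors.

```python
def pick_worker(replica_hosts, idle_workers, rack_of):
    """Prefer a worker holding the input block, then one in the same rack, then any."""
    # 1. Machine local: an idle worker that stores a replica of the block.
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker, "machine-local"
    # 2. Rack local: an idle worker sharing a rack with some replica.
    replica_racks = {rack_of[host] for host in replica_hosts}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker, "rack-local"
    # 3. Otherwise the block must cross rack switches.
    return idle_workers[0], "remote"

rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}
replicas = ["n1", "n3"]  # locations reported by the DFS for one input block

print(pick_worker(replicas, ["n2", "n3"], rack_of))
print(pick_worker(replicas, ["n4", "n2"], rack_of))
print(pick_worker(replicas, ["n4"], rack_of))
```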
Distributed (Big!) Filesystem
- Used as the “store” for MapReduce data
- MapReduce reads its input from DFS and writes its output to DFS
- Provides a “shared disk” view to applications using local storage on shared-nothing hardware
- Provides redundancy by replication to protect from node/disk failures
DFS Architecture
Taken from Ghemawat’s SOSP ’03 paper (The Google File System)
- Single Master (with backups) that tracks the DFS file name to chunk mapping
- Several chunk servers that store chunks on local disks
- Chunk size ~ 64MB or larger
- Chunks are replicated
- Master only used for chunk lookups; does not participate in transfer of data
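The division of labor between master and chunk servers can be sketched as follows. The `Master` class, `locate` function, file name, chunk handle, and server names are hypothetical; only the 64 MB chunk size comes from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide

class Master:
    """Holds only metadata: (file name, chunk index) -> (chunk handle, replica servers)."""
    def __init__(self):
        self.chunks = {}

    def lookup(self, file_name, chunk_index):
        return self.chunks[(file_name, chunk_index)]

def locate(master, file_name, byte_offset):
    """Translate a byte offset into the chunk to read and the offset inside it."""
    chunk_index = byte_offset // CHUNK_SIZE
    handle, servers = master.lookup(file_name, chunk_index)
    # The client would now contact one of `servers` directly for the bytes.
    return handle, servers, byte_offset % CHUNK_SIZE

master = Master()
master.chunks[("/logs/web.log", 3)] = ("chunk-0042", ["cs1", "cs5", "cs9"])

location = locate(master, "/logs/web.log", 200 * 1024 * 1024)
print(location)
```

An offset of 200 MB falls in chunk index 3 (bytes 192 MB to 256 MB), 8 MB into that chunk. The master answers the lookup and then steps out of the way.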
Chunk Replication
- Several replicas of each chunk
  - Replicas usually spread across racks and data centers to maximize availability
  - 3 replicas common (local, same rack, different rack)
- Master tracks location of each replica of a chunk
- When chunk failure is detected, master automatically rebuilds a new replica to maintain the replication level
- Automatically picks chunk servers for new replicas based on utilization
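A simplified sketch of utilization-based re-replication is shown below. The function name, the utilization numbers, and the server names are invented for illustration, and rack placement is deliberately ignored to keep the example short.

```python
def choose_new_replica_servers(live_replicas, utilization, target=3):
    """Pick the least-utilized servers that do not already hold the chunk."""
    needed = max(target - len(live_replicas), 0)
    candidates = sorted(
        (server for server in utilization if server not in live_replicas),
        key=lambda server: (utilization[server], server),
    )
    return candidates[:needed]

# Fraction of disk space in use on each chunk server (made-up numbers).
utilization = {"cs1": 0.9, "cs2": 0.2, "cs3": 0.5, "cs4": 0.1}

# Two of three replicas were lost; only cs1 still holds the chunk.
new_servers = choose_new_replica_servers(["cs1"], utilization)
print(new_servers)
```

With the numbers above, the two missing replicas go to `cs4` and `cs2`, the emptiest servers that lack the chunk.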
MapReduce & DFS: Summary
- Google laid a foundation for a new flurry of large-scale data storage and processing with their MR and DFS work in the early 2000s
- Apache open source versions soon sprang up outside of Google: Hadoop MapReduce & HDFS
- Today, Big Data use cases are addressed with a mix of parallel RDBMS technologies as well as more “flexible” Hadoop-based technologies
- So where are we now…?
(Pig)
Today’s Tangled World
Additional Reading
- Original MapReduce Paper (*** MUST READ! ***)
  - “MapReduce: Simplified Data Processing on Large Clusters” by Jeffrey Dean and Sanjay Ghemawat in OSDI ’04
- Original DFS Paper
  - “The Google File System” by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in SOSP ’03
- MapReduce vs. Parallel DBMS Papers in CACM (Jan. 2010)
  - “MapReduce and Parallel DBMSs: Friends or Foes?” by Michael Stonebraker, Daniel Abadi, David DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin
  - “MapReduce: A Flexible Data Processing Tool” by Jeffrey Dean and Sanjay Ghemawat
- EDBT “Ogres & Onions Keynote” Paper
  - “Inside "Big Data Management": Ogres, Onions, or Parfaits?” by Vinayak Borkar, Michael J. Carey, and Chen Li in EDBT ’12 (or watch the movie)