Introduction to Hadoop Map Reduce
Email: [email protected] us: +91 8099776681
www.kerneltraining.com
MapReduce
Prerequisites for learning MapReduce:
1. Hadoop framework
2. A distributed storage system such as HDFS
3. Parallel programming concepts
Overview of MapReduce workflow (diagram): in the Mapping Phase, the Mapper turns the Map Input List into the Map Output List; in the Reducing Phase, the Reducer turns the Reduce Input List into the Reduce Output List.
Overview of MapReduce Framework (word count example)

Input File:
Delhi Mumbai Delhi
Bangalore Delhi Chennai
Mumbai Delhi Chennai

Map Input (one <key, value> record per line; here the key is the line number):
<1, Delhi Mumbai Delhi>
<2, Bangalore Delhi Chennai>
<3, Mumbai Delhi Chennai>

Map Phase, Map Output (one <word, 1> pair per word):
<Delhi, 1><Mumbai, 1><Delhi, 1>
<Bangalore, 1><Delhi, 1><Chennai, 1>
<Mumbai, 1><Delhi, 1><Chennai, 1>

Shuffle/Sort (pairs grouped by key):
<Delhi, 1><Delhi, 1><Delhi, 1><Delhi, 1>
<Bangalore, 1>
<Mumbai, 1><Mumbai, 1>
<Chennai, 1><Chennai, 1>

Reduce Phase input:
<Delhi, (1,1,1,1)><Bangalore, 1>
<Mumbai, (1,1)><Chennai, (1,1)>
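The three phases above can be traced in a few lines of plain Java. This is a single-process sketch, not Hadoop code: the class and method names (`WordCountTrace`, `map`, `shuffle`, `reduce`) are illustrative, the keys are the line numbers 1 to 3 as on the slide, and a `TreeMap` stands in for the framework's sort, so keys come out in alphabetical order rather than in the slide's order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, single-process simulation of the word count example (not Hadoop's API).
class WordCountTrace {

    // Map Phase: <line number, line> -> one <word, 1> pair per word.
    // The line number key is not needed by word count, so it is ignored.
    static List<Map.Entry<String, Integer>> map(int lineNo, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle/Sort: group all map output values by key (TreeMap keeps keys sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce Phase: <word, (1,1,...)> -> <word, count>.
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then one reduce call per word.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            mapOutput.addAll(map(i + 1, lines.get(i)));  // keys 1, 2, 3 as on the slide
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapOutput).forEach((word, ones) -> result.put(word, reduce(word, ones)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "Delhi Mumbai Delhi",
                "Bangalore Delhi Chennai",
                "Mumbai Delhi Chennai");
        System.out.println(run(input));  // {Bangalore=1, Chennai=2, Delhi=4, Mumbai=2}
    }
}
```

The printed totals match the final reduce output of the example: Delhi 4, Mumbai 2, Chennai 2, Bangalore 1.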
Responsibilities to tackle various phases
Input: create 'Input Splits' and individual records (Framework)
Map: user-defined logic (User)
Shuffling: Framework
Reduce: user-defined logic (User)

For the word count input above (Delhi Mumbai Delhi / Bangalore Delhi Chennai / Mumbai Delhi Chennai), the reduce phase turns <Delhi, (1,1,1,1)><Bangalore, 1><Mumbai, (1,1)><Chennai, (1,1)> into the final output:
<Delhi, 4><Bangalore, 1><Mumbai, 2><Chennai, 2>
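To make the split of responsibilities concrete, here is a hypothetical mini-framework (`MiniFramework`, `UserMapper` and `runJob` are invented names, not Hadoop's API). The framework part creates the records, buffers and sorts the emitted pairs, and calls the reducer once per key; the user supplies only the two functions marked as user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical mini-framework showing which phase belongs to whom (not Hadoop's API).
class MiniFramework {

    // User-defined map logic: receives one record and emits <K2, V2> pairs via `emit`.
    interface UserMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    static <K2 extends Comparable<K2>, V2, V3> Map<K2, V3> runJob(
            List<String> lines,
            UserMapper<Integer, String, K2, V2> mapper,
            BiFunction<K2, List<V2>, V3> reducer) {
        // Framework: shuffle buffer, grouped and sorted by key.
        Map<K2, List<V2>> groups = new TreeMap<>();
        BiConsumer<K2, V2> emit = (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v);

        // Framework: create individual <line number, line> records and hand each to the user.
        for (int i = 0; i < lines.size(); i++) {
            mapper.map(i + 1, lines.get(i), emit);  // User: map logic
        }

        // Framework: one reduce call per distinct key.
        Map<K2, V3> output = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> g : groups.entrySet()) {
            output.put(g.getKey(), reducer.apply(g.getKey(), g.getValue()));  // User: reduce logic
        }
        return output;
    }

    public static void main(String[] args) {
        // User code: word count expressed as two small functions.
        UserMapper<Integer, String, String, Integer> wordMapper = (lineNo, line, emit) -> {
            for (String w : line.split("\\s+")) {
                emit.accept(w, 1);
            }
        };
        BiFunction<String, List<Integer>, Integer> sumReducer =
                (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum();

        List<String> input = List.of("Delhi Mumbai Delhi", "Bangalore Delhi Chennai", "Mumbai Delhi Chennai");
        System.out.println(runJob(input, wordMapper, sumReducer));
    }
}
```

Swapping in a different pair of user functions changes the job without touching the framework loop, which is the point of the division shown on the slide.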
Components of MapReduce (diagram, mapper side)
The Driver defines the job. The InputFormat calculates the input splits (Input Split 1 to Input Split 4) over the stored data blocks (Block A, Block B, Block C). Each Mapper process runs a Record Reader, which reads one input split and passes <K, V> pairs to the Mapper; the Mappers pass their own <K, V> pairs on to the reduce process.
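As a rough sketch of the "calculates the input splits" step, the code below carves one file into byte-range splits. It is modeled on the split loop in Hadoop's `FileInputFormat` as we understand it (split size = max(minimum, min(maximum, block size)), plus a 10% slop so a small tail is merged into the last split); block locations, compression and multi-file input are deliberately ignored, and the class name `SplitCalculator` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of how an InputFormat might carve one file into input splits.
// Assumption: loosely modeled on Hadoop's FileInputFormat; not its actual code.
class SplitCalculator {

    // One split = the byte range [start, start + length) of the input file.
    record Split(long start, long length) {}

    // Allow the last split to be up to 10% larger instead of creating a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits.add(new Split(fileLength - remaining, remaining));  // last, possibly short, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long size = splitSize(128 * mb, 1, Long.MAX_VALUE);  // 128 MB blocks give 128 MB splits
        for (Split s : computeSplits(400 * mb, size)) {
            System.out.println("split at " + s.start() / mb + " MB, length " + s.length() / mb + " MB");
        }
    }
}
```

With these numbers a 400 MB file yields four splits (128, 128, 128 and 16 MB), while a 260 MB file yields only two, because the 132 MB tail is within the slop allowance.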
Components of MapReduce (diagram, complete)
The Shuffle passes the Mappers' <K, V> pairs to the Reducers, one Reducer per reduce process. Each Reducer passes its <K, V> pairs to a Writer, which writes the Output Data. The OutputFormat defines the Writers, and the Driver defines the job as a whole: its InputFormat, Mapper, Reducer and OutputFormat.
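The Shuffle's routing decision can be sketched as follows. The `partition` formula is the one used by Hadoop's default `HashPartitioner`, (hash & Integer.MAX_VALUE) % numReduceTasks; everything else (the class `ShuffleSketch`, in-memory lists instead of spill files and network copies) is a simplification for illustration. The property that matters is that every value for a given key lands at the same Reducer, already grouped and sorted by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the shuffle between Mappers and Reducers.
class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: non-negative hash modulo reducer count.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route each map output pair to its reducer and group values by key (TreeMap sorts keys).
    static List<Map<String, List<Integer>>> shuffle(List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<Map<String, List<Integer>>> perReducer = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            perReducer.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            perReducer.get(partition(kv.getKey(), numReducers))
                      .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        }
        return perReducer;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("Delhi", 1), Map.entry("Mumbai", 1), Map.entry("Delhi", 1),
                Map.entry("Bangalore", 1), Map.entry("Chennai", 1));
        List<Map<String, List<Integer>>> inputs = shuffle(mapOutput, 2);
        for (int r = 0; r < inputs.size(); r++) {
            System.out.println("Reducer " + r + " receives " + inputs.get(r));
        }
    }
}
```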
Deciding factors for choosing MapReduce
Questions we must ask before deciding on MapReduce:
• Can the input files be processed independently of each other?
• Can the problem be broken into smaller tasks such that each task can be processed independently?
• Can the partial results of processing the smaller tasks be aggregated or consolidated?
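The third question can be checked on the word count example: counting words in independent chunks and then merging the partial counts gives the same answer as counting everything at once, because addition is associative and commutative. The sketch below (hypothetical class `PartialAggregation`) performs that check; a statistic such as an exact median would fail it, since per-chunk medians cannot simply be merged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Checks whether partial results from independent chunks can be consolidated.
class PartialAggregation {

    // Count words in one independent chunk (one "small task").
    static Map<String, Integer> countChunk(List<String> chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : chunk) {
            for (String w : line.split("\\s+")) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Consolidate partial results; since addition is associative and commutative,
    // the order in which chunks are merged does not matter.
    static Map<String, Integer> consolidate(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Delhi Mumbai Delhi", "Bangalore Delhi Chennai", "Mumbai Delhi Chennai");
        Map<String, Integer> merged = consolidate(List.of(
                countChunk(lines.subList(0, 1)), countChunk(lines.subList(1, 3))));
        System.out.println(merged.equals(countChunk(lines)));  // true
    }
}
```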
Design Patterns
A design pattern is a template for solving a common, general data manipulation problem with MapReduce.
• Summarization Patterns
• Filtering Patterns
• Join Patterns
• Job Chaining Patterns
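As a small illustration of the filtering pattern, the sketch below keeps or drops each record on its own, which is why such a job needs only a map step. In Hadoop this is typically set up as a map-only job with `job.setNumReduceTasks(0)`; the class name `FilteringPattern` and the example predicate are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Filtering pattern sketch: a map-only step, no reducer needed.
class FilteringPattern {

    // Each record is tested independently, which is what lets many mappers run in parallel.
    static List<String> filter(List<String> records, Predicate<String> keep) {
        return records.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Delhi Mumbai Delhi", "Bangalore Delhi Chennai", "Mumbai Delhi Chennai");
        // Hypothetical filter: keep only lines that mention Chennai.
        System.out.println(filter(lines, l -> l.contains("Chennai")));
    }
}
```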
Case Study – Summarization Pattern
• Goal: from the sample logs provided by airmobile, find each subscriber and the corresponding number of downloaded bytes. Each line holds the subscriber ID (substring 15,26) and the bytes downloaded (substring 87,97).
• The sample log files follow this format, with one record per line. From each line, the Customer ID and the Downloaded Bytes have to be extracted for analysis.
(K1, V1): input to the user-defined map function (the key is the byte offset at which the line starts)
• (0, subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17)
• (121, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17)
• (242, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17)
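The keys 0, 121 and 242 look like byte offsets of line starts, which is what Hadoop's default `TextInputFormat` passes to `map()` as the key. A minimal sketch of that record creation follows, assuming UTF-8 text with `\n` line endings already held in memory; the class `LineRecords` is hypothetical, and Hadoop's own `LineRecordReader` additionally handles lines that straddle split boundaries, which this ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of line-oriented record creation: key = byte offset of the line start, value = the line.
class LineRecords {

    static List<Map.Entry<Long, String>> records(String text) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.add(Map.entry(offset, line));
            // Advance past the line's UTF-8 bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Delhi Mumbai Delhi\nBangalore Delhi Chennai\nMumbai Delhi Chennai\n";
        records(text).forEach(r -> System.out.println("(" + r.getKey() + ", " + r.getValue() + ")"));
    }
}
```

For the word count input this yields the keys 0, 19 and 43, since the first two lines are 18 and 23 bytes long plus one newline each.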
list(K2, V2): output from the user-defined map function
• (28052627, 8.4621702216543)
• (28052639, 9.672631216721858)
• (28052627, 8.64072609693471)
(K2, list(V2)): input to the user-defined reduce function
• (“28052627”, (8.4621702216543, 8.64072609693471))
• (“28052639”, (9.672631216721858))
Mapper Class
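The original Mapper class is not reproduced in this transcript. As a stand-in, here is a plain-Java sketch of the map logic the case study describes, assuming the positions 15,26 and 87,97 are 0-based `[begin, end)` indices as used by `String.substring`, and that the bytes field parses as a number. In an actual Hadoop job this body would live in the `map` method of a subclass of `org.apache.hadoop.mapreduce.Mapper` and emit through `context.write`; the class and constant names here are hypothetical.

```java
import java.util.Map;

// Plain-Java sketch of the case study's map logic (not the slide's original class).
class SubscriberBytesMapper {

    // Assumed 0-based [begin, end) positions, taken from the case study description.
    static final int SUB_BEGIN = 15, SUB_END = 26;
    static final int BYTES_BEGIN = 87, BYTES_END = 97;

    // (K1 = byte offset, V1 = log line) -> (K2 = subscriber ID, V2 = downloaded bytes).
    // The offset key is not needed here. Returns null for records that cannot be parsed.
    static Map.Entry<String, Double> map(long offset, String line) {
        if (line.length() < BYTES_END) {
            return null;  // too short to hold both fields: skip the record
        }
        try {
            String subscriber = line.substring(SUB_BEGIN, SUB_END).trim();
            double bytes = Double.parseDouble(line.substring(BYTES_BEGIN, BYTES_END).trim());
            return Map.entry(subscriber, bytes);
        } catch (NumberFormatException e) {
            return null;  // bytes field is not a number: skip the record
        }
    }

    public static void main(String[] args) {
        // Synthetic 97-character line built only to exercise the assumed positions.
        String line = " ".repeat(15) + "28052627   " + " ".repeat(61) + "12345.5   ";
        System.out.println(map(0L, line));
    }
}
```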
Reducer Class
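The Reducer class is likewise missing from the transcript. Assuming the summarization is a per-subscriber sum of downloaded bytes, the reduce logic comes down to a loop over the grouped values. The class name is hypothetical; in Hadoop the equivalent would be the `reduce` method of an `org.apache.hadoop.mapreduce.Reducer` subclass, which receives the values as an `Iterable`.

```java
import java.util.List;

// Plain-Java sketch of the reduce logic, assuming bytes are summed per subscriber.
class SubscriberBytesReducer {

    // (K2 = subscriber ID, list(V2) = bytes per record) -> total bytes for that subscriber.
    static double reduce(String subscriber, List<Double> bytes) {
        double total = 0.0;
        for (double b : bytes) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        // Reduce input taken from the (K2, list(V2)) slide.
        System.out.println(reduce("28052627", List.of(8.4621702216543, 8.64072609693471)));
        System.out.println(reduce("28052639", List.of(9.672631216721858)));
    }
}
```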
Driver Class
MapReduce Output
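The output itself is not reproduced here. For orientation, Hadoop's default `TextOutputFormat` writes one line per reduce output pair, with key and value separated by a tab, into one file per reducer (for example `part-r-00000`). The sketch below (hypothetical `TextOutputSketch`) renders a result map in that layout, using the word count totals from earlier.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the default text output layout: "key<TAB>value" per line.
class TextOutputSketch {

    static String render(Map<String, ?> reduceOutput) {
        StringBuilder sb = new StringBuilder();
        reduceOutput.forEach((k, v) -> sb.append(k).append('\t').append(v).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Word count totals, sorted by key as a reducer would see them.
        Map<String, Integer> totals = new TreeMap<>(Map.of("Delhi", 4, "Bangalore", 1, "Mumbai", 2, "Chennai", 2));
        System.out.print(render(totals));
    }
}
```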
THANK YOU for attending the demo of Hadoop Map Reduce