Hadoop 101: North East Wisconsin Code Camp
TRANSCRIPT
![Page 1: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/1.jpg)
HADOOP 101
Cluster Computing Made Easy
![Page 2: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/2.jpg)
Show of Hands
![Page 3: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/3.jpg)
Big Data
Volume
Variety
Velocity
![Page 5: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/5.jpg)
Common Types of Analysis
Text mining
Index building
Graph creation and analysis
Pattern recognition
Collaborative filtering
Prediction Models
Sentiment Analysis
Risk Assessment
![Page 6: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/6.jpg)
Hadoop
Hadoop is a cluster storage and computing framework.
![Page 7: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/7.jpg)
Changing of the Guard
“Scale out guarantees that hardware and software will fail”
“I don’t want to see any more 2001 papers about how awesome my IT team was because they could reshard my database on demand.”
![Page 8: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/8.jpg)
Storage
[Diagram: storage blocks labeled A and B, each shown four times]
![Page 10: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/10.jpg)
Tunneling Through the Cost Barrier
![Page 11: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/11.jpg)
Solutions
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.”
![Page 14: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/14.jpg)
Cluster Computing Complexities
Process management
Communication
Data movement
Task coordination
Partial failures
Scheduling
Tracking
Robustness
Resilience
Performance
Simplicity
![Page 16: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/16.jpg)
Where Do You Fit?
[Diagram: Input Split 1, 2, ... n, each -> Record Reader -> Mapper -> Partitioner -> Shuffle and Sort -> Reducer -> Output Format -> Output File (two Reducers shown)]
![Page 17: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/17.jpg)
Storage
[Diagram: storage blocks labeled A and B, each shown four times]
![Page 18: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/18.jpg)
Where Do You Fit?
[Diagram: Input Split A and Input Split B, each -> Record Reader -> Mapper -> Partitioner -> Shuffle and Sort -> Reducer -> Output Format -> Output File (two Reducers shown)]
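The Partitioner step in this pipeline decides which Reducer receives each key. The sketch below is a plain-Python illustration, not Hadoop's Java API: the function names `partition` and `route` are hypothetical, and CRC32 is an assumed stand-in for a stable hash. The idea it shows is hashing the key and taking the result modulo the number of reducers, so every pair with the same key lands on the same reducer.

```python
import zlib
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index from a stable hash of the key (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs: List[Tuple[str, int]], num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Group mapper output by the reducer index that should receive it."""
    buckets: Dict[int, List[Tuple[str, int]]] = {i: [] for i in range(num_reducers)}
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


# Two reducers, matching the two Reducer boxes in the diagram.
buckets = route([("the", 1), ("cat", 1), ("the", 1), ("mat", 1)], num_reducers=2)
for index, bucket in buckets.items():
    print(index, bucket)
```

Which bucket a given word falls into depends on the hash, so the printed assignment is not meaningful in itself; the property that matters is that both ("the", 1) pairs end up together.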
![Page 19: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/19.jpg)
Mapper Purpose
Sanitize Data
Select Subsets
Convert
[Diagram: Input Split A -> Record Reader -> Mapper -> Partitioner]
![Page 20: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/20.jpg)
Mapper
Input:
Key
Value
Context
Output:
Key
Value
[Diagram: Input Split A -> Record Reader -> Mapper -> Partitioner, with the Mapper highlighted]
![Page 21: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/21.jpg)
Word Count Mapper
Input: (Long, Text)
Key: 0
Value: “the cat sat on the mat”
Output: (Text, Long)
Key Value
the 1
cat 1
sat 1
on 1
the 1
mat 1
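The word count mapper on this slide can be sketched in a few lines. This is a plain-Python simulation rather than Hadoop's Java Mapper class: the name `word_count_map` is hypothetical, and the function yields its output pairs instead of writing them to a Context object.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in one input record.

    `key` is the record's position in the input and is not used for counting;
    `value` is the line of text.
    """
    for word in value.split():
        yield (word, 1)


pairs = list(word_count_map(0, "the cat sat on the mat"))
for word, count in pairs:
    print(word, count)
```

Note that the mapper does no adding up: "the" is emitted twice with a count of 1 each time, and the totals are left for the reducer.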
![Page 23: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/23.jpg)
Reducer
Input:
Key
Values // This is an iterable
Context
Output:
Key
Value
![Page 24: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/24.jpg)
Reducer
Input:
Key Values
cat 1
mat 1
on 1
sat 1
the 1, 1
Reducer: reduce(){ }
Output (part-r-00001):
Key Value
cat 1
mat 1
on 1
sat 1
the 2
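The step from mapper output to the reducer's result can be simulated in plain Python. This is a sketch under stated assumptions, not Hadoop's Java Reducer class: `shuffle_and_sort` and `word_count_reduce` are hypothetical names, and the grouping that the framework performs between map and reduce is imitated here with a sort followed by `itertools.groupby`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Sort mapper output by key and collect the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def word_count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum the counts for one word; `values` is an iterable, as on the slide."""
    return key, sum(values)


mapper_output = [("the", 1), ("cat", 1), ("sat", 1), ("on", 1), ("the", 1), ("mat", 1)]
grouped = list(shuffle_and_sort(mapper_output))
reduced = [word_count_reduce(key, values) for key, values in grouped]
for word, total in reduced:
    print(word, total)
```

The reducer is called once per key, so "the" arrives with the values 1, 1 and leaves with a total of 2.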
![Page 25: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/25.jpg)
Demo
MRUnit
Mapper
Reducer
Run the whole cycle
![Page 26: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/26.jpg)
Platform
![Page 27: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/27.jpg)
Bibliography
Rear Admiral Hopper http://www.youtube.com/watch?v=1-vcErOPofQ
Mike Olson talk http://web.archive.org/web/20130729201323id_/http://itc.conversationsnetwork.org/shows/detail4868.html
Large-Scale C++ Software Design by John Lakos http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
![Page 28: Hadoop 101: North East Wisconsin Code Camp](https://reader034.vdocument.in/reader034/viewer/2022042817/55a6c5181a28ab81428b47d3/html5/thumbnails/28.jpg)
Jim Argeropoulos
@exploremqt
https://github.com/exploremqt