1
Hadoop
Ali Sharza Khan
High Performance Computing
2
Table of Contents
• Hadoop
• Where did Hadoop come from?
• What problems can Hadoop solve?
• Where does Hadoop apply?
• How is Hadoop architected?
• Two main parts of Hadoop
• Conclusion
3
Hadoop
• What is Hadoop?
– An open-source project
– Processes large data sets in parallel
4
Where did Hadoop come from?
• Google (its published MapReduce and Google File System papers inspired Hadoop)
• Yahoo, Facebook, Twitter and LinkedIn are actively contributing to Hadoop.
5
What problems can Hadoop solve?
• When you have a lot of data
• When you need to run analytics that are deep and computationally intensive
6
Where does Hadoop apply?
• Search engines
• Finance
• Online retail
• Government
• Media and entertainment
• Research institutions and other markets
7
How is Hadoop architected?
• Every server has 2, 4 or 8 CPUs.
• Each server operates on its own little piece of data.
• Hadoop clusters at Yahoo span 25,000 servers and store 25 petabytes of application data.
• The largest cluster has 3,500 servers.
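The idea that each server works only on its own piece of data can be sketched in a few lines. This is a minimal illustration, not Hadoop code: threads stand in for servers, and the names `split_into_pieces`, `process_piece` and `cluster_sum` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(records, num_servers):
    """Divide records into num_servers roughly equal contiguous pieces."""
    size, extra = divmod(len(records), num_servers)
    pieces, start = [], 0
    for i in range(num_servers):
        # The first `extra` pieces take one additional record each.
        end = start + size + (1 if i < extra else 0)
        pieces.append(records[start:end])
        start = end
    return pieces


def process_piece(piece):
    """Work done by one (simulated) server: it sees only its own piece."""
    return sum(piece)


def cluster_sum(records, num_servers=4):
    """Run process_piece on every piece in parallel, then combine the results."""
    pieces = split_into_pieces(records, num_servers)
    # Assumption: a thread pool stands in for a cluster of servers.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partial_sums = list(pool.map(process_piece, pieces))
    return sum(partial_sums)
```

For example, `cluster_sum(list(range(1, 101)))` splits the numbers 1 to 100 into four pieces, sums each piece separately, and adds the four partial sums. Taken at face value, the Yahoo figures above work out to roughly 1 terabyte per server on average (25 petabytes across 25,000 servers).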
8
Cloudera CEO Interview
http://www.youtube.com/watch?v=qNP4_ICDeqE
9
Two main parts of Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce Framework
– Map Phase
– Reduce Phase
– JobTracker (the master)
– TaskTracker (the slave)
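The master/slave relationship between the JobTracker and the TaskTrackers can be pictured with a toy model. The classes below only borrow the names from the slide; they are not Hadoop's API. Round-robin assignment is an assumption made to keep the sketch short and is not a description of how Hadoop actually schedules tasks.

```python
from collections import defaultdict


class TaskTracker:
    """Toy slave: runs whatever task the master hands it."""

    def __init__(self, name):
        self.name = name

    def run(self, task, split):
        return task(split)


class JobTracker:
    """Toy master: hands one task per input split to the slaves."""

    def __init__(self, trackers):
        self.trackers = trackers

    def run_job(self, task, splits):
        assignments = defaultdict(list)
        results = []
        for index, split in enumerate(splits):
            # Assumption: plain round-robin assignment, for illustration only.
            tracker = self.trackers[index % len(self.trackers)]
            assignments[tracker.name].append(index)
            results.append(tracker.run(task, split))
        return results, dict(assignments)
```

A possible use: create two trackers, `TaskTracker("slave-1")` and `TaskTracker("slave-2")`, pass them to `JobTracker`, and call `run_job(len, splits)` with three splits. The master returns one result per split along with a record of which slave handled which split.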
10
MapReduce Framework
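The Map and Reduce phases can be illustrated with the classic word-count example, written here in plain Python rather than against Hadoop's API. The function names are hypothetical. The `shuffle` step is not named on the slide; it is the grouping that happens between the two phases, included so the sketch runs end to end.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key so each reduce call sees one key's values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because `map_phase` looks at one line at a time and `reduce_phase` looks at one key at a time, both phases can in principle be spread across many servers, which is the property the framework relies on.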
11
Conclusion
• Why is Hadoop able to deal with lots of data?
• Why is Hadoop able to answer complicated computational questions?