Hadoop Ali Sharza Khan High Performance Computing 1

Post on 31-Dec-2015


Hadoop

Ali Sharza Khan

High Performance Computing

Table of Contents

• Hadoop

• Where did Hadoop come from?

• What problems can Hadoop solve?

• Where does Hadoop apply?

• How is Hadoop architected?

• Two main parts of Hadoop

• Conclusion

Hadoop

• What is Hadoop?

– An open-source project

– Processes large data sets in parallel

Where did Hadoop come from?

• Google (Hadoop's design follows Google's published papers on MapReduce and the Google File System)

• Yahoo, Facebook, Twitter and LinkedIn are actively contributing to Hadoop.

What problems can Hadoop solve?

• Where you have a lot of data

• Where you run analytics that are deep and computationally intensive

Where does Hadoop apply?

• Search engines

• Finance

• Online retail

• Government

• Media and entertainment

• Research institutions and other markets

How is Hadoop architected?

• Every server has 2, 4, or 8 CPUs.

• Each server operates on its own little piece of data.

• Hadoop clusters at Yahoo span 25,000 servers and store 25 petabytes of application data.

• The largest cluster has 3,500 servers.
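
The idea that each server works only on its own piece of data can be sketched in a few lines of Python. This is a toy illustration, not Hadoop code: the function names (`split_for_servers`, `process_locally`, `run_cluster`) are hypothetical, and a thread pool stands in for separate machines. The last helper works out the average share implied by the Yahoo figures, assuming decimal units (1 PB = 1,000 TB) and ignoring replication.

```python
from concurrent.futures import ThreadPoolExecutor


def split_for_servers(records, num_servers):
    """Deal records out so each (hypothetical) server gets its own slice."""
    return [records[i::num_servers] for i in range(num_servers)]


def process_locally(piece):
    """Work done by one server on its own piece only (here: a sum)."""
    return sum(piece)


def run_cluster(records, num_servers):
    pieces = split_for_servers(records, num_servers)
    # Threads stand in for servers; a real cluster uses separate machines.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partials = list(pool.map(process_locally, pieces))
    # Combine the per-server partial results into one answer.
    return sum(partials)


def average_tb_per_server(total_pb, servers):
    """Average application data per server, in TB (1 PB = 1,000 TB)."""
    return total_pb * 1000 / servers


if __name__ == "__main__":
    print(run_cluster(list(range(1, 101)), num_servers=4))  # 5050
    print(average_tb_per_server(25, 25000))  # 1.0
```

Under these assumptions, 25 petabytes spread over 25,000 servers comes to about 1 TB of application data per server on average.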

Cloudera CEO Interview

http://www.youtube.com/watch?v=qNP4_ICDeqE

Two main parts of Hadoop

• HDFS (Hadoop Distributed File System)

• MapReduce framework

– Map phase

– Reduce phase

– JobTracker (the master)

– TaskTracker (the slave)
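
The master/slave split listed above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the classes reuse the names JobTracker and TaskTracker for readability but are not Hadoop's Java classes, and the round-robin assignment is a simplification chosen here. Real Hadoop scheduling is more involved, for example it takes data locality and task failures into account.

```python
class TaskTracker:
    """Worker ("slave"): runs the tasks it is handed on its own data."""

    def __init__(self, name):
        self.name = name
        self.completed = []  # ids of the tasks this worker has run

    def run(self, task_id, func, data):
        result = func(data)
        self.completed.append(task_id)
        return result


class JobTracker:
    """Master: splits a job into tasks and hands them to TaskTrackers."""

    def __init__(self, trackers):
        self.trackers = trackers

    def run_job(self, splits, map_func, reduce_func):
        # Map phase: one task per input split, assigned round-robin
        # (an assumption made for this sketch).
        map_outputs = []
        for task_id, split in enumerate(splits):
            tracker = self.trackers[task_id % len(self.trackers)]
            map_outputs.append(tracker.run(task_id, map_func, split))
        # Reduce phase: combine all map outputs into the final result.
        return reduce_func(map_outputs)


if __name__ == "__main__":
    workers = [TaskTracker("tt1"), TaskTracker("tt2")]
    master = JobTracker(workers)
    # Count records: each map task counts its split, reduce adds the counts.
    print(master.run_job([[1, 2], [3, 4], [5]], len, sum))  # 5
```

Here the master never touches the records itself; it only decides which worker handles which split and then combines the results.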

MapReduce Framework
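
As a rough picture of how data flows through the framework, the classic word-count example can be written in plain Python. This is a single-process sketch, not Hadoop's API: the function names are hypothetical, and the `shuffle` step models the grouping by key that happens between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key so each reducer sees one word's counts."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    # Run the map phase over every line, then group and reduce.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
    # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the map calls run on many servers at once, each over its own piece of the input, and the grouped pairs are sent across the network to the reducers.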

Conclusion

• Why is Hadoop able to deal with lots of data?

• Why is Hadoop able to answer complicated computational questions?