Hadoop training by Keylabs
TRANSCRIPT
![Page 1: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/1.jpg)
BigData
An Introduction by Keylabs
![Page 2: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/2.jpg)
Need For A New Processing Platform (BigData)
What is BigData?
- Twitter (over ~7 TB/day)
- Facebook (over ~10 TB/day)
- Google (over ~20 PB/day)
Where does it come from?
Existing systems (vertical scalability)
Why Hadoop (horizontal scalability)?
![Page 3: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/3.jpg)
Origin of Hadoop
![Page 4: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/4.jpg)
Companies Using Hadoop
Yahoo, Google, Facebook, LinkedIn, IBM, Amazon, Hortonworks, Cloudera, NY Times … the list goes on.
![Page 5: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/5.jpg)
What is Hadoop?
Flexible infrastructure for large-scale computation and data processing on a network of commodity hardware.
Written almost entirely in Java.
Open source and distributed under the Apache License.
Hadoop Core Components: HDFS & MapReduce.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
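To make the "simple programming models" concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps behind a word count. It is plain Python and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and on a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big Hadoop"])
```

The programmer writes only the map and reduce functions; grouping by key and distributing the work is the framework's job.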
![Page 6: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/6.jpg)
What Hadoop is Not?
A File system
A database
An online transaction processing (OLTP) system
A replacement for all programming logic
![Page 7: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/7.jpg)
Three Vs of Hadoop and counting…
![Page 8: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/8.jpg)
Hadoop Introduction and Architecture
![Page 9: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/9.jpg)
Hadoop High-Level Architecture
![Page 10: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/10.jpg)
Hadoop Architecture
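[Diagram: an admin node runs the Job Tracker and the Name Node; each of three worker nodes runs a Task Tracker and a Data Node. The Job Tracker and Task Trackers form the MapReduce engine; the Name Node and Data Nodes form the HDFS cluster.]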
![Page 11: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/11.jpg)
Hadoop Cluster
![Page 12: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/12.jpg)
Distributed File System: Hadoop Distributed File System
Read 1 TB of data
1 machine: 4 I/O channels, each channel 100 MB/s
10 machines: 4 I/O channels per machine, each channel 100 MB/s
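The comparison above can be worked out with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1,000,000 MB), that every channel is kept fully busy, and that the data is spread evenly across machines; the function name `read_time_seconds` is an illustrative assumption.

```python
def read_time_seconds(data_mb, machines, channels_per_machine, mb_per_sec_per_channel):
    """Time to read data_mb when all channels on all machines read in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec


ONE_TB_IN_MB = 1_000_000  # assumption: decimal units

single_machine = read_time_seconds(ONE_TB_IN_MB, 1, 4, 100)   # 2500 s, about 42 minutes
ten_machines = read_time_seconds(ONE_TB_IN_MB, 10, 4, 100)    # 250 s, about 4 minutes
```

Ten machines read the same terabyte roughly ten times faster, which is the motivation for spreading data across a cluster.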
![Page 13: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/13.jpg)
What’s so Special About Open Source Hadoop?
![Page 14: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/14.jpg)
HDFS - Hadoop Distributed File System
Design of HDFS
Where HDFS is not a good fit
Why is a block in HDFS so large?
Advantages of HDFS
![Page 15: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/15.jpg)
HDFS is not for:
Low-latency data access
Large numbers of small files
Multiple writers or arbitrary file modifications
![Page 16: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/16.jpg)
HDFS Architecture
![Page 17: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/17.jpg)
Let us Zoom into HDFS
![Page 18: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/18.jpg)
NameNode
Deeper things about the NameNode
Please note down these points.
![Page 19: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/19.jpg)
DataNode
What is DataNode?
![Page 20: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/20.jpg)
NameNode and DataNodes
![Page 21: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/21.jpg)
Data Replication
What is Data Replication
![Page 22: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/22.jpg)
Data Replication & Rack Awareness
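As a rough illustration of rack awareness, the sketch below follows the commonly described HDFS default for a replication factor of 3: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The function `place_replicas` and the rack and node names are hypothetical. The sketch picks nodes deterministically, whereas HDFS chooses among eligible nodes at random, and it assumes some remote rack has at least two nodes.

```python
def place_replicas(writer_node, topology):
    """Return three nodes for a block's replicas.

    topology maps a rack name to the list of node names in that rack.
    Assumes writer_node is in the topology and that some other rack
    holds at least two nodes.
    """
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)
    remote_rack = next(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    first = writer_node                  # replica 1: the writer's own node
    second = topology[remote_rack][0]    # replica 2: a node on a different rack
    third = topology[remote_rack][1]     # replica 3: another node on that remote rack
    return [first, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", topology)
```

With this layout the block survives the loss of an entire rack, while two of the three copies share a rack so that only one copy has to cross the inter-rack link during the write.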
![Page 23: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/23.jpg)
File Write Operation
![Page 24: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/24.jpg)
A client writing the data to HDFS
![Page 25: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/25.jpg)
File Write Operation in Depth - 1
![Page 26: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/26.jpg)
File Write Operation in Depth - 2
![Page 27: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/27.jpg)
File Write Operation in Depth - 3
![Page 28: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/28.jpg)
File Write Operation in Depth - 4
![Page 29: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/29.jpg)
File Write Operation in Depth - 4
![Page 30: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/30.jpg)
File Write Operation – Unhappy Path
![Page 31: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/31.jpg)
File Read Operation
![Page 32: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/32.jpg)
A client reading data from HDFS
![Page 33: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/33.jpg)
File Read Operation in Depth - 1
![Page 34: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/34.jpg)
File Read Operation in Depth - 2
![Page 35: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/35.jpg)
File Read Operation in Depth - 3
![Page 36: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/36.jpg)
File Read Operation - Unhappy Path
![Page 37: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/37.jpg)
Secondary NameNode
![Page 38: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/38.jpg)
Hadoop Cluster – A Typical Scenario
![Page 39: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/39.jpg)
Hadoop Ecosystem
![Page 40: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/40.jpg)
Data Loading Techniques and Analysis
![Page 41: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/41.jpg)
When should we go for Hadoop?
The data is too large for a single machine
Processes are independent
Online analytical processing (OLAP)
Better scalability is needed
Work can run in parallel
Unstructured data
![Page 42: Hadoop training by keylabs](https://reader031.vdocument.in/reader031/viewer/2022020307/55a987861a28ab76248b46f7/html5/thumbnails/42.jpg)
THANK YOU FOR YOUR ATTENTION!