bn1028 demo hadoop administration and development
![Page 1: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/1.jpg)
Introduction to Hadoop Administration and Development
BN1028 – Demo PPT
Demo Hadoop Admin & Development
![Page 2: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/2.jpg)
http://www.conlinetraining.com/courses/hadoop-administration-development-online-training/
Agenda
• What is Big Data and Hadoop?
• Challenges of Big Data
• Technologies that support Big Data
• What is Hadoop? And Why Hadoop?
• Hadoop Eco System
• Use Cases of Hadoop
• HDFS
• MapReduce
• Hadoop Cluster
• Pig
• Hive
• HBase
• ZooKeeper
• Flume
![Page 3: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/3.jpg)
What is Big Data and Hadoop?
• Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.
• It enables applications to work with thousands of nodes and petabytes of data.
• Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
![Page 4: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/4.jpg)
History of Hadoop
![Page 5: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/5.jpg)
What is Big Data?
• Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
![Page 6: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/6.jpg)
Before Hadoop
![Page 7: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/7.jpg)
Big Data Growth?
Sources: web logs; RFID; sensor networks; social networks; social data; Internet text/index; call detail records; astronomy, atmospheric science, and biological data; military surveillance; medical records; photography and video archives
![Page 8: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/8.jpg)
Present Hadoop
![Page 9: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/9.jpg)
Challenges of Big Data
• How can we capture and store big data at the right time?
• How can we process big data at the right time?
• How can we analyze and use big data at the right time and deliver it to the right people?
• Traditional systems can't scale, are not reliable, and are expensive.
![Page 10: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/10.jpg)
Why Hadoop?
■ Accessible: Hadoop runs on large clusters of commodity machines or on the cloud (EC2).
■ Robust: Hadoop is architected with the assumption of frequent hardware malfunctions. It can gracefully handle most such failures.
■ Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
■ Simple: Hadoop allows users to quickly write efficient parallel code.
■ Data Locality: Move computation to the data.
■ Replication: Use replication across servers to deal with unreliable storage/servers.
Characteristics
![Page 11: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/11.jpg)
Hadoop Adoption Drivers
■ Business Drivers: the bigger the data, the higher the value
■ Financial Drivers: cost advantage of open source plus commodity hardware; low cost per TB
■ Technical Drivers: existing systems failing under growing requirements (the 3 Vs)
Adoption Drivers
![Page 12: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/12.jpg)
Who Uses Hadoop?
A few users of Hadoop:
• Yahoo - 100,000+ CPUs in 36,000+ computers
• Facebook - a 1,100-machine cluster with 8,800 cores and more than 12 PB of storage, and a 300+ machine cluster with 2,400 cores and about 3 PB of raw storage
• LinkedIn –
• eBay
• IBM
• IIIT, Hyderabad - 10 to 30 nodes, Quad 6600s, 4 GB RAM and 1 TB disk
• PSG Tech, Coimbatore - 5 to 10 nodes. Cluster nodes vary from 2950 Quad Core Rack Server with 2x6MB cache and 4 x 500 GB SATA hard drives to E7200 / E7400 processors with 4 GB RAM and 160 GB HDD.
• Rackspace
• Google – University initiative
• Adobe
• New York Times
![Page 13: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/13.jpg)
Low Cost per TB
● Typical Hardware:
● Two quad-core processors
● 24 GB RAM
● 12 x 1 TB SATA disks (JBOD mode, no need for RAID)
● 1 Gigabit Ethernet card
● Cost: $5K/node
● Effective HDFS Space:
● ¼ reserved for temp shuffle space, which leaves 9 TB/node
● 3-way replication leads to 3 TB effective HDFS space/node
● But assuming 7x compression, that becomes ~20 TB/node
Effective cost per user TB: $250/TB
Other solutions cost in the range of $5K to $100K per user TB
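The arithmetic on this slide can be checked with a short Python sketch. The function name and parameter names below are hypothetical, chosen only for illustration; the input values are the ones quoted on the slide. The exact result is 21 TB and roughly $238 per TB, which the slide rounds to ~20 TB and $250/TB.

```python
def effective_cost_per_tb(raw_tb, temp_fraction, replication, compression, node_cost):
    """Walk through the slide's per-node capacity and cost estimate.

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    # Space left after reserving a fraction of the disks for temp shuffle data
    usable_tb = raw_tb * (1 - temp_fraction)
    # Each block is stored `replication` times, so unique data is a fraction of that
    hdfs_tb = usable_tb / replication
    # Compression lets the same HDFS space hold more user data
    user_tb = hdfs_tb * compression
    # Hardware cost divided by the user data one node can hold
    cost_per_tb = node_cost / user_tb
    return usable_tb, hdfs_tb, user_tb, cost_per_tb


# Values from the slide: 12 x 1 TB disks, 1/4 temp space,
# 3-way replication, 7x compression, $5K per node
usable_tb, hdfs_tb, user_tb, cost_per_tb = effective_cost_per_tb(
    raw_tb=12, temp_fraction=0.25, replication=3, compression=7, node_cost=5000
)
print(f"After temp space:  {usable_tb:.0f} TB/node")
print(f"After replication: {hdfs_tb:.0f} TB/node")
print(f"With compression:  {user_tb:.0f} TB/node")
print(f"Cost per user TB:  ${cost_per_tb:.0f}")
```

The 7x compression ratio dominates the result, and it depends heavily on the data; with no compression the same node would hold 3 TB of user data, or about $1,667 per TB.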
![Page 14: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/14.jpg)
Comparison with RDBMS
![Page 15: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/15.jpg)
Hadoop Eco System
![Page 16: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/16.jpg)
Use Cases of Hadoop
![Page 17: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/17.jpg)
HDFS
![Page 18: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/18.jpg)
MapReduce
![Page 19: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/19.jpg)
Hadoop Cluster
![Page 20: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/20.jpg)
Pig, Hive, HBase
![Page 21: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/21.jpg)
ZooKeeper
![Page 22: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/22.jpg)
Flume
![Page 23: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/23.jpg)
Benefits of Hadoop
![Page 24: Bn1028 demo hadoop administration and development](https://reader034.vdocument.in/reader034/viewer/2022052606/58ef804d1a28abf9768b4569/html5/thumbnails/24.jpg)
Email us : [email protected]
Visit : www.conlinetraining.com