A Study on the Viability of Hadoop Usage on the Umfort Cluster for the Processing and Storage of CReSIS Polar Data
Mentor: Je’aime Powell, Dr. Mohammad Hasan
Members: JerNettie Burney, Jean Bevins, Cedric Hall, Glenn M. Koch
Abstract
The primary focus of this research was to explore the capabilities of Hadoop as a software package for processing, storing, and managing CReSIS polar data in a clustered environment. The investigation examined Hadoop's functionality and usage through a review of published work. The team's research aimed to determine whether Hadoop was a viable software package to implement on the Elizabeth City State University (ECSU) Umfort computing cluster. Using case studies, the team compared processing, storage, management, and job distribution methods. A final determination of the benefits of Hadoop for storing and processing data on the Umfort cluster was then made.
INTRODUCTION
• Hadoop is a set of open source technologies for the distributed storage and processing of large data sets
• Hadoop originated from the open source web search engine Apache Nutch
• Hadoop has been adopted by over 100 different companies
Hadoop Functionality
• Hadoop is broken down into different components
• Some of the most important components of the Hadoop ecosystem include MapReduce, Zookeeper, HDFS, Hive, JobTracker, NameNode, and HBase
• Because these components can be combined as needed, Hadoop can be adapted to meet the needs of various organizations
Functionality
[Diagram: Hadoop MapReduce, Zookeeper, HBase, JobTracker, NameNode, Hive, HDFS]
MapReduce
• Framework for processing large data sets in parallel across a cluster
• MapReduce is broken down into two steps, map and reduce
• The map step distributes an operation across the servers; the reduce step combines their results into a single result set
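The two MapReduce steps can be illustrated with the classic word-count example. The sketch below is written in plain Python (3.9+) rather than against Hadoop's Java API, and it runs every map and reduce call in a single process; in a real Hadoop job the framework would spread the map calls across servers and perform the grouping (shuffle) step itself. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between the two steps."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    # In Hadoop each map call could run on a different server; here all run locally.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["ice sheet radar", "radar echo", "ice"]))
```

For example, `run_job(["ice sheet radar", "radar echo", "ice"])` counts `ice` and `radar` twice each and `sheet` and `echo` once.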
Hive
• Data warehouse infrastructure built on top of Hadoop
• Goal is to provide acceptable wait times for interactive data browsing, queries over small data sets, and test queries
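Hive accepts SQL-like queries (HiveQL) and translates them into MapReduce jobs. As a rough sketch, assuming a hypothetical table `radar_readings` with columns `segment` and `thickness_m`, a `GROUP BY` average could be decomposed into map and reduce steps as below. Hive's actual query planner is considerably more involved; this only illustrates the idea.

```python
from collections import defaultdict

# Hypothetical HiveQL query this sketch mimics (table and column names are illustrative):
#   SELECT segment, AVG(thickness_m) FROM radar_readings GROUP BY segment;

Row = tuple[str, float]  # (segment, thickness_m)


def map_rows(rows: list[Row]) -> list[tuple[str, float]]:
    # Map: project each row to (GROUP BY key, column being aggregated).
    return [(segment, thickness) for segment, thickness in rows]


def reduce_avg(pairs: list[tuple[str, float]]) -> dict[str, float]:
    # Shuffle + reduce: gather the values for each key, then apply AVG().
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


def group_by_avg(rows: list[Row]) -> dict[str, float]:
    return reduce_avg(map_rows(rows))
```

With made-up rows `[("A", 1000.0), ("A", 1200.0), ("B", 800.0)]`, `group_by_avg` averages segment `A` to 1100.0 and leaves `B` at 800.0.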
Zookeeper
• Centralized service used to maintain configuration information, provide naming, provide distributed synchronization, and provide group services
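ZooKeeper organizes its data as a tree of nodes (znodes) addressed by slash-separated paths, and every update increments a per-node version number so that clients can make conditional updates. The class below is a toy, single-process stand-in written only to illustrate that model for storing shared configuration. It is not the ZooKeeper client API, and the `/umfort/config` path used in the example is hypothetical.

```python
class ToyZNodeStore:
    """In-memory stand-in for a ZooKeeper-style hierarchical namespace (not the real client API)."""

    def __init__(self) -> None:
        # The root node always exists; every node stores (data, version).
        self._nodes: dict[str, tuple[bytes, int]] = {"/": (b"", 0)}

    @staticmethod
    def _parent(path: str) -> str:
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path: str, data: bytes) -> None:
        if path in self._nodes:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"no parent for: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path: str) -> tuple[bytes, int]:
        return self._nodes[path]

    def set(self, path: str, data: bytes, expected_version: int) -> int:
        # Conditional update: succeeds only if nobody changed the node since it was read.
        _, version = self._nodes[path]
        if version != expected_version:
            raise ValueError(f"version mismatch on {path}: {version} != {expected_version}")
        self._nodes[path] = (data, version + 1)
        return version + 1
```

A client that read `/umfort/config` at version 0 can update it once; a second writer still holding version 0 gets a `ValueError` instead of silently overwriting the newer value.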
HDFS
• Distributed storage system used by Hadoop
• Designed to run on low-cost commodity hardware
• Keeps operating when individual machines fail, because each data block is replicated across several nodes
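HDFS stores each file as a sequence of blocks and keeps several copies (replicas) of every block on different DataNodes, which is what lets it keep serving data when a machine fails. The sketch below assumes a tiny block size and a simple round-robin placement purely for illustration; real HDFS uses much larger blocks (64 MB by default in Hadoop 1.x), a default replication factor of 3, and a rack-aware placement policy. The node names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    # A file is stored as fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str], replication: int = 3) -> list[list[str]]:
    # Simplified placement: each block goes to `replication` distinct nodes,
    # rotating the starting node per block. Real HDFS placement is rack-aware.
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placements = []
    for block in range(num_blocks):
        start = block % len(datanodes)
        placements.append([datanodes[(start + r) % len(datanodes)] for r in range(replication)])
    return placements


def readable_after_failure(placements: list[list[str]], failed: set[str]) -> bool:
    # A block stays readable as long as at least one replica is on a healthy node.
    return all(any(node not in failed for node in replicas) for replicas in placements)
```

With four DataNodes and three replicas per block, losing any single node (for example `dn1`) leaves every block readable, while losing three nodes can make some blocks unavailable.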
NameNode
• Essential piece of the HDFS file system
• Keeps a directory tree of all files in the file system and tracks where across the cluster each file's data is kept; it does not store the file data itself
• The NameNode is a single point of failure for an HDFS cluster: when the NameNode fails, the file system goes offline
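The NameNode's role can be sketched as a metadata service: it holds the directory tree and the list of blocks for each file, while the file contents live on DataNodes. The hypothetical `ToyNameNode` class below models only that bookkeeping and the single-point-of-failure behavior (every call fails once the node is marked offline). It does not reflect the NameNode's real interfaces, and the file paths and block IDs are made up.

```python
class ToyNameNode:
    """Illustrative NameNode: keeps the directory tree and block map, never the file bytes."""

    def __init__(self) -> None:
        self.tree: dict = {}                       # nested dicts represent directories
        self.block_map: dict[str, list[str]] = {}  # file path -> block IDs
        self.online = True

    def _check(self) -> None:
        if not self.online:
            # With a single NameNode, losing it makes the whole namespace unavailable.
            raise ConnectionError("NameNode is down: file system offline")

    def add_file(self, path: str, block_ids: list[str]) -> None:
        self._check()
        *dirs, name = path.strip("/").split("/")
        node = self.tree
        for directory in dirs:
            node = node.setdefault(directory, {})
        node[name] = None  # a None leaf marks a file
        self.block_map[path] = block_ids

    def list_dir(self, path: str) -> list[str]:
        self._check()
        node = self.tree
        for part in filter(None, path.strip("/").split("/")):
            node = node[part]
        return sorted(node)

    def blocks_of(self, path: str) -> list[str]:
        self._check()
        return self.block_map[path]
```

Clients would ask this service where a file's blocks are and then read the data directly from DataNodes, which is why an offline NameNode blocks access even though the data itself is intact.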
Hadoop Process
[Diagram: Application, JobTracker, NameNode, HDFS, TaskTracker]
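In the Hadoop 1.x process outlined above, an application submits a job to the JobTracker, which asks the NameNode where the input blocks are stored and then hands map tasks to TaskTrackers, preferring ones that already hold the data. The greedy function below is a simplified sketch of that data-locality idea, assuming each TaskTracker reports a number of free task slots; the real JobTracker and pluggable schedulers such as the Fair Scheduler use more elaborate policies. Block and node names are hypothetical.

```python
def assign_map_tasks(
    block_locations: dict[str, list[str]],
    free_slots: dict[str, int],
) -> dict[str, str]:
    """Assign one map task per block, preferring a TaskTracker that stores that block."""
    slots = dict(free_slots)  # copy so the caller's slot counts are not modified
    assignment: dict[str, str] = {}
    for block, hosts in block_locations.items():
        # Data-local trackers first (the locations would come from the NameNode).
        local = [host for host in hosts if slots.get(host, 0) > 0]
        # Otherwise fall back to any tracker with a free slot (remote read).
        candidates = local or [tracker for tracker, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError("no free TaskTracker slots")
        tracker = candidates[0]
        slots[tracker] -= 1
        assignment[block] = tracker
    return assignment
```

Running work where the data already sits avoids shipping large blocks over the network, which is the main reason Hadoop pairs the JobTracker with the NameNode's block map.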
HBase
• HBase is the Hadoop database
• The goal of HBase is to host very large tables, with billions of rows by millions of columns
• HBase also provides Cascading, Hive and Pig source modules
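HBase's data model is a sparse, versioned map: each row key maps to columns named `family:qualifier`, and each cell can hold several timestamped versions. Because only cells that were actually written are stored, tables can be extremely wide. The `ToySparseTable` class below is an in-memory illustration of that model only, not the HBase client API; the row keys, columns, and values are hypothetical.

```python
from collections import defaultdict
from typing import Optional


class ToySparseTable:
    """Sketch of HBase's data model: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self) -> None:
        # Only cells that were written take up space, so very wide, sparse rows are cheap.
        self._rows: dict[str, dict[str, dict[int, bytes]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        self._rows[row][column][ts] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        # Return the newest version of the cell, as an HBase read does by default.
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def columns(self, row: str) -> list[str]:
        return sorted(self._rows.get(row, {}))
```

Writing the same cell twice with timestamps 1 and 2 keeps both versions, and `get` returns the one written at timestamp 2.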
Case Studies
• Many institutions and companies use Hadoop
• Organizations using Hadoop services include:
– Facebook
– eBay
– Google
– San Diego Supercomputer Center
• Google first created MapReduce
• Distributed File System
• Hadoop Hive system
eBay
[Diagram: Fair Scheduler, NameNode, Zookeeper, JobTracker, HBase]
The San Diego Supercomputer Center
• MapReduce
Conclusion
Umfort current
• xCAT - Management
• Linux ext3 over NFS - Storage
• TORQUE - Job Distribution
• MATLAB - Processing
Umfort proposed using Hadoop
• Hadoop NameNode and Zookeeper - Management
• Hadoop Distributed File System (HDFS) - Storage
• Hadoop JobTracker - Job Distribution
• MapReduce - Processing
Conclusion (cont.)
• Benefits:
– Homogeneous product
– Support
– Cost efficient
Future Work
• Installation
• Implementation
• Testing
– Repeat the summer 2009 Polar Grid team's project using Hadoop
– Convert CReSIS data into a GIS database