project--2 nd review_2
TRANSCRIPT
![Page 1: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/1.jpg)
Deploying and Researching Hadoop in Virtual Machines
![Page 2: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/2.jpg)
Hadoop:
• Hadoop is an open-source software platform.
• It is derived from Google's MapReduce and GFS (Google File System).
• Hadoop is an open-source implementation of MapReduce.
• The Apache Hadoop project develops open-source software for reliable, scalable distributed computing.
Definition:
• Basically, it is a way of storing enormous data sets across clusters of computers.
• It is designed to be robust and efficient.
• The Apache Hadoop software library is a framework for the distributed processing of large data sets.
• It is designed to scale up from single servers to thousands of machines.
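As a rough illustration of the MapReduce model that Hadoop implements, the sketch below runs the map, shuffle and reduce steps in a single Python process on a word-count job. The function names are made up for this example and are not part of any Hadoop API; a real job would run these phases across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in-process over a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each call to `map_phase` and `reduce_phase` only looks at its own input, the framework is free to run them on different machines, which is what makes the model scale.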
![Page 3: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/3.jpg)
Who uses Hadoop?
![Page 4: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/4.jpg)
Abstract:
• The emergence of Hadoop and the maturity of virtualization make it feasible to run Hadoop in virtual machines.
• This project introduces the technologies used, such as CloudStack, MapReduce and Hadoop.
• It shows how to deploy Hadoop in virtual machines obtained from CloudStack.
• We run some Hadoop programs on the virtual cluster.
![Page 5: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/5.jpg)
Introduction:
• Nowadays, the most frequently used programs are Internet-based services.
• Google's MapReduce has been reported to process about 20 PB of data per day.
• Ability to read and write data.
• A reliable shared storage and analysis system (HDFS and MapReduce).
• Enables applications to work with large data sets across many machines.
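To put the 20 PB per day figure in perspective, the short calculation below converts it to a sustained rate. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a perfectly even load over 24 hours, which is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate_gb_per_s(petabytes_per_day):
    """Convert a daily volume in decimal petabytes to gigabytes per second."""
    bytes_per_day = petabytes_per_day * 10**15
    return bytes_per_day / SECONDS_PER_DAY / 10**9


if __name__ == "__main__":
    # 20 PB/day works out to roughly 231 GB every second.
    print(round(sustained_rate_gb_per_s(20), 1))
```

A rate of this size is far beyond what a single machine can read from disk, which is why the work has to be spread over a cluster.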
![Page 6: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/6.jpg)
Literature survey:
• Ignoring data locality in different types of environments can easily reduce MapReduce performance.
• Experimental results on two real data-intensive applications show that their data placement strategy improves MapReduce performance.
• The first generation of Hadoop had two single points of failure: the NameNode and JobTracker processes.
• Hadoop MapReduce has two main services: the JobTracker and the TaskTracker.
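The data locality point can be made concrete with a toy scheduler. The sketch below is not Hadoop's actual JobTracker logic; it is a simple greedy assignment, with hypothetical block and node names, that prefers a node already storing a block and falls back to a remote node only when no local slot is free.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign each input block to a node, preferring nodes that hold a replica.

    block_locations: dict mapping block id -> list of nodes storing that block
    free_slots: dict mapping node -> number of free map slots
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, replicas in block_locations.items():
        # First choice: a node that already stores the block (data-local).
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block must be copied.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free map slots left")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["vm1", "vm2"], "b2": ["vm1"], "b3": ["vm3"]}
    print(assign_map_tasks(blocks, {"vm1": 1, "vm2": 1, "vm3": 1}))
```

In this run block `b2` ends up on `vm2` as a non-local task, because the greedy pass already gave the only slot on `vm1` to `b1`. Every non-local task means a block has to travel over the network, which is the cost the bullet on data locality refers to.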
![Page 7: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/7.jpg)
Existing System:
• Terabytes of data need to be processed efficiently on a daily basis.
• The existing system uses a single virtual machine.
• The disadvantage is the potential for poor performance under heavy load, which is the problem to be solved.
![Page 8: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/8.jpg)
Proposed System:
• The proposed system uses CloudStack infrastructure.
• MapReduce is designed to run on a cluster, and managing thousands of commodity PCs is a big job.
• Hadoop applications are deployed on virtual machines.
• Perhaps the biggest problem is power consumption.
![Page 9: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/9.jpg)
Modules:
• Module 1: The user starts the NameNode, DataNode, JobTracker and TaskTracker nodes on the virtual machine.
• Module 2: The user observes the virtual machines running on the cluster infrastructure.
• Module 3: The user can connect to any virtual machine running on the cluster by providing the required details.
• Module 4: The user can deploy files on the connected virtual machine and do research on any virtual machine.
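Module 1 can be sketched as a small helper that lists the commands needed to bring up the four daemons. This assumes a Hadoop 1.x layout, where `bin/hadoop-daemon.sh start <daemon>` and `bin/hadoop namenode -format` are the usual entry points; the install path `/usr/local/hadoop` is a placeholder, and the function only builds the command list, it does not run anything.

```python
HADOOP_HOME = "/usr/local/hadoop"  # assumed install path, adjust as needed

# Daemons named in Module 1: HDFS daemons first, then MapReduce daemons.
DAEMONS = ["namenode", "datanode", "jobtracker", "tasktracker"]


def startup_commands(hadoop_home=HADOOP_HOME, format_first=False):
    """Return the commands that would start the daemons on one machine."""
    commands = []
    if format_first:
        # Formatting erases HDFS metadata; only do it on a brand-new cluster.
        commands.append([f"{hadoop_home}/bin/hadoop", "namenode", "-format"])
    for daemon in DAEMONS:
        commands.append([f"{hadoop_home}/bin/hadoop-daemon.sh", "start", daemon])
    return commands


if __name__ == "__main__":
    for cmd in startup_commands(format_first=True):
        print(" ".join(cmd))
```

Each entry is a list of arguments, so it could be passed to `subprocess.run` on the virtual machine once the path has been checked against the actual installation.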
![Page 10: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/10.jpg)
Hardware Requirements
• Pentium 4 Processor
• 8 GB RAM
• 64-bit OS (Ubuntu)
• 200 GB HDD
![Page 11: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/11.jpg)
Software Requirements
• Java 6
• Eclipse Indigo (with Hadoop configuration)
• Hadoop Appliance
• Cygwin
• CloudStack
![Page 12: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/12.jpg)
ARCHITECTURE
![Page 13: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/13.jpg)
3-Tier Architecture
![Page 14: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/14.jpg)
Master/Slave Architecture
![Page 15: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/15.jpg)
HDFS Architecture
![Page 16: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/16.jpg)
DESIGNING
![Page 17: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/17.jpg)
CLASS DIAGRAM
Start node
• Attributes: nameNodePort : number, dataNamePort : number, hdfsPort : number, command : string, nodeName : string
• Operations: start(), format()
Research
• Attributes: query : string
• Operations: submit(), cancel()
Deploy files
• Attributes: fileName : string, path : string, directory : string
• Operations: deploy(), cancel()
Connect to VM
• Attributes: portNo : number, hostName : string
• Operations: connect(), cancel()
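One way to read the class diagram is as four small record types. The sketch below mirrors the attribute and operation names from the diagram using Python dataclasses; the method bodies are placeholders invented for this example (they only validate ports and return a description), and the `cancel()` operations are left out for brevity.

```python
from dataclasses import dataclass


def _check_port(port):
    """Reject values outside the valid TCP port range."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")


@dataclass
class StartNode:
    nameNodePort: int
    dataNamePort: int  # spelled as in the diagram
    hdfsPort: int
    command: str
    nodeName: str

    def start(self):
        for port in (self.nameNodePort, self.dataNamePort, self.hdfsPort):
            _check_port(port)
        return f"start {self.nodeName}: {self.command}"

    def format(self):
        return f"format {self.nodeName}"


@dataclass
class ConnectToVM:
    portNo: int
    hostName: str

    def connect(self):
        _check_port(self.portNo)
        return f"connect {self.hostName}:{self.portNo}"


@dataclass
class DeployFiles:
    fileName: str
    path: str
    directory: str

    def deploy(self):
        return f"deploy {self.path}/{self.fileName} -> {self.directory}"


@dataclass
class Research:
    query: str

    def submit(self):
        return f"submit {self.query!r}"


if __name__ == "__main__":
    print(ConnectToVM(portNo=22, hostName="vm1").connect())
```

Keeping each screen's inputs in its own type makes it easy to validate them (for example the port range) before anything is sent to a virtual machine.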
![Page 18: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/18.jpg)
USE CASE DIAGRAM
Actor: user
Use cases: name node, data node, start job tracker, connect to VM, deploy files, research on files, logout
![Page 19: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/19.jpg)
SEQUENCE DIAGRAM
user HDFS
start name node
response
data node
response
job tracker
response
deploy files
response
research on files
response
logout
response
![Page 20: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/20.jpg)
COLLABORATION DIAGRAM
user HDFS
1: start name node
2: response
3: data node
4: response
5: job tracker
6: response
7: deploy files
8: response
9: research on files
10: response
11: logout
12: response
![Page 21: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/21.jpg)
TESTING
• Black Box Testing
• White Box Testing
• Grey Box Testing
• Regression Testing
![Page 22: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/22.jpg)
Test Cases

| Name | Input | Output |
| --- | --- | --- |
| Activate Root Account | Username and password | Successfully enabled |
| Starting Management Server | Management server details | Successfully started |
| Adding Pod | Pod details | Successfully added |
| Adding Zone | Zone details | Successfully added |
| Adding Cluster | Cluster details | Successfully added |
| Primary Storage | Primary storage details | Successfully added |
| Secondary Storage | Secondary storage details | Successfully added |
![Page 23: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/23.jpg)
OUTPUT SCREENS
![Page 24: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/24.jpg)
Home Page
![Page 25: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/25.jpg)
Dash Board
![Page 26: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/26.jpg)
Instances
![Page 27: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/27.jpg)
Network
![Page 28: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/28.jpg)
Events
![Page 29: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/29.jpg)
Accounts
![Page 30: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/30.jpg)
Domains
![Page 31: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/31.jpg)
Infrastructure
![Page 32: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/32.jpg)
Projects
![Page 33: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/33.jpg)
Global Settings
![Page 34: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/34.jpg)
Service Settings
![Page 35: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/35.jpg)
Conclusion:
• This project presents CloudStack, the MapReduce programming model and Hadoop, which allows distributed parallel processing, and shows that it is feasible to deploy and research Hadoop in virtual machines. The advantages are easier management, fuller use of computing resources, a more reliable Hadoop and lower power consumption. Some methods to optimize Hadoop in virtual machines are also discussed.
![Page 36: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/36.jpg)
Future Enhancements
• Rights Management: For example, we can assign a test administrator to be responsible for an experimental course. The experimental teachers can then only view and count information related to that course, and have no permission for other courses.
• Experimental Control and Report Submission: The instructor can specify which experimental projects can be carried out, and the system keeps an experimental record that saves information about the experimental projects students have taken in the pilot project, which facilitates faculty management.
![Page 37: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/37.jpg)
![Page 38: project--2 nd review_2](https://reader035.vdocument.in/reader035/viewer/2022070522/58ee210b1a28ab2e078b46c3/html5/thumbnails/38.jpg)
Thank you for
watching…!