an introduction to apache hadoop
Post on 10-May-2015
Introduction to Apache Hadoop
Presenter: Prem Chand Mali, Mindfire Solutions
Date: 30/01/2014
About Me
• SCJP/OCJP - Oracle Certified Java Programmer
• MCP: 70-480 - Specialist certification in HTML5 with JavaScript and CSS3 Exam
Skills : Java, Swing, Spring, Hibernate, JavaFX, jQuery, PrototypeJS, ExtJS.
Connect Me : https://www.facebook.com/prem.c.mali http://www.linkedin.com/in/premmali https://twitter.com/prem_mali https://plus.google.com/106150245941317924019/about/p/pub
Contact Me : premchandm@mindfiresolutions.com / prem.c.mali@gmail.com mfsi_premchandm
Agenda
History
What is Apache Hadoop
Why Apache Hadoop
HDFS
MapReduce
Q & A
History
• Nutch crawler-based search.
• GFS and MapReduce papers published.
• Yahoo! hired Doug Cutting and gave him a dedicated team.
What is Apache Hadoop ?
• Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports running applications on large clusters of commodity hardware.
• Hadoop is designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be handled automatically in software by the framework.
• Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
What is Apache Hadoop ?
• The Apache Hadoop framework is composed of the following modules :
– Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
– Hadoop MapReduce - a programming model for large-scale data processing.
– Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
– Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
Why Apache Hadoop ?
• State of Data
– 90% of data was generated in the past three years.
– Type of data
  • Unstructured
  • Semi-structured
  • Relational
– The relational world can handle gigabytes of data.
• Distributed
• Scalable
• Flexible
• Fault tolerant
• Intelligent
HDFS
• HDFS is the primary distributed storage used by Hadoop applications. It consists of the following two types of components.
– NameNode
– DataNode
• HDFS is well suited for distributed storage and distributed processing using commodity hardware.
• Hadoop supports shell-like commands to interact with HDFS directly.
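The split between the two HDFS component types can be illustrated with a small in-memory toy model. This is only a sketch, not Hadoop code: the class names `ToyDataNode` and `ToyNameNode`, the 4-character block size, and the replication factor of 2 are all assumptions made for illustration. In real HDFS, clients exchange block data directly with DataNodes and the NameNode holds only metadata; here the reads and writes are routed through the NameNode object purely to keep the example short.

```python
import itertools


class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores raw blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block_id -> block contents
        self.alive = True  # flip to False to simulate a machine failure


class ToyNameNode:
    """Hypothetical stand-in for the NameNode: tracks which blocks
    make up each file and which DataNodes hold a replica."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # filename -> [(block_id, [datanode names])]
        self._next_start = itertools.cycle(range(len(datanodes)))

    def put(self, filename, data):
        """Split data into fixed-size blocks and replicate each block."""
        placements = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{filename}#{offset // self.block_size}"
            chunk = data[offset:offset + self.block_size]
            start = next(self._next_start)
            # Pick `replication` distinct nodes, round-robin from `start`.
            targets = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            for node in targets:
                node.blocks[block_id] = chunk
            placements.append((block_id, [n.name for n in targets]))
        self.block_map[filename] = placements

    def get(self, filename):
        """Reassemble a file from the first live replica of each block."""
        by_name = {n.name: n for n in self.datanodes}
        parts = []
        for block_id, holders in self.block_map[filename]:
            for name in holders:
                node = by_name[name]
                if node.alive and block_id in node.blocks:
                    parts.append(node.blocks[block_id])
                    break
            else:
                raise IOError(f"all replicas of {block_id} are unavailable")
        return "".join(parts)


if __name__ == "__main__":
    nodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
    namenode = ToyNameNode(nodes)
    namenode.put("notes.txt", "hello hadoop")
    nodes[0].alive = False  # simulate one failed machine
    print(namenode.get("notes.txt"))
```

Because every block is stored on two different nodes in this sketch, the file can still be read after a single node is marked as failed, which mirrors the assumption that hardware failures are common and should be handled in software.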
MapReduce
• MapReduce is a combination of the following three things.
– Map
– Shuffle
– Reduce
• It does its job through the JobTracker and TaskTrackers.
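The three phases listed above can be sketched in plain Python with a word count, the usual introductory example. This runs in a single process and does not use Hadoop or its JobTracker/TaskTracker scheduling; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the documents."""
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster, the map calls would run in parallel on the machines holding the input blocks, and the shuffle would move the grouped pairs across the network to the reducers; the data flow between the phases is the same as in this sketch.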
Question and Answer
Thank you
www.mindfiresolutions.com
https://www.facebook.com/MindfireSolutions
http://www.linkedin.com/company/mindfire-solutions
http://twitter.com/mindfires