Hadoop Cluster Configuration and Data Loading - Module 2
TRANSCRIPT

Hadoop Cluster Configuration and Data Loading
Hadoop Cluster Specification
• Hadoop is designed to run on commodity hardware.
• "Commodity" does not mean "low-end."
• Processor: 2 quad-core 2-2.5 GHz CPUs
• Memory: 16-24 GB ECC RAM
• Storage: 4 × 1 TB SATA disks
• Network: Gigabit Ethernet
Hadoop Cluster Architecture
Hadoop Cluster Configuration Files

Filename        | Format                   | Description
hadoop-env.sh   | Bash script              | Environment variables that are used in the scripts to run Hadoop.
core-site.xml   | Hadoop configuration XML | Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
hdfs-site.xml   | Hadoop configuration XML | Configuration settings for HDFS daemons: the namenode, the secondary namenode, and the datanodes.
mapred-site.xml | Hadoop configuration XML | Configuration settings for MapReduce daemons: the jobtracker and the tasktrackers.
masters         | Plain text               | A list of machines (one per line) that each run a secondary namenode.
slaves          | Plain text               | A list of machines (one per line) that each run a datanode and a tasktracker.
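The table above can be made concrete with a short example. The fragment below sketches a minimal core-site.xml for a single-node setup; the hostname and port (localhost:9000) are assumed values for illustration, not taken from this module:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: settings common to HDFS and MapReduce -->
<configuration>
  <property>
    <!-- URI of the default file system, i.e. where the namenode runs -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

The same property/name/value layout applies to hdfs-site.xml and mapred-site.xml.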
Hadoop Cluster Modes
• Standalone (or local) mode: no daemons are running and everything runs in a single JVM. Standalone mode is suitable for running MapReduce programs during development, since it is easy to test and debug them.
• Pseudo-distributed mode: the Hadoop daemons run on the local machine, thus simulating a cluster on a small scale.
• Fully distributed mode: the Hadoop daemons run on a cluster of machines.
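As a sketch of how the configuration files select a mode, pseudo-distributed operation is typically enabled by pointing the daemons at localhost and reducing the replication factor to 1; the port number below is an assumption for illustration:

```xml
<!-- hdfs-site.xml: one machine, so keep a single replica of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml: run the jobtracker on the local machine -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```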
Multi-Node Hadoop Cluster
Reference: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
A Typical Production Hadoop Cluster

Slave nodes:
• Balanced workload: four to six 1 TB disks; dual quad-core processors; 24 GB RAM
• Compute-intensive workload: four to six 1 TB or 2 TB disks; dual hexa/quad-core processors; 24-48 GB RAM
• I/O-intensive workload: twelve 1 TB disks; dual quad-core processors; 24-48 GB RAM
• HBase clusters: twelve 1 TB disks; dual hexa/quad-core processors; 48-96 GB RAM
Network (all slave nodes): dual 1 GB links for all nodes in a 20-node rack and 2 × 10 GB interconnect links per rack going to a pair of central switches.

Master nodes:
• All workload patterns / HBase clusters: four to six 2 TB disks; dual quad-core processors; memory depends on the number of file system objects to be created by the NameNode.

Reference: http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
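The note that master memory "depends on the number of file system objects" is typically acted on in hadoop-env.sh, which the configuration-file table earlier lists as the place for environment variables. A sketch, where the 4000 MB heap is an assumed figure for illustration, not a recommendation from this module:

```
# hadoop-env.sh (fragment): enlarge the namenode's JVM heap,
# since the namenode keeps the entire filesystem namespace in memory.
export HADOOP_NAMENODE_OPTS="-Xmx4000m $HADOOP_NAMENODE_OPTS"
```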
MapReduce Job Execution (Map Task)
MapReduce Job Execution (Reduce Task)
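The two slides above illustrate the map and reduce tasks. As a language-neutral sketch of the dataflow they depict (map, shuffle/sort, reduce), a word count can be simulated in plain Python; this illustrates the programming model only and is not Hadoop's API:

```python
from collections import defaultdict

def map_task(line):
    # Map task: emit a (key, value) pair for each word in the input split.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key
    # (the framework does this between the map and reduce phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    # Reduce task: combine all values for one key into a final result.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
mapped = [pair for line in lines for pair in map_task(line)]
counts = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # 2
```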
Hadoop Shell Commands
• Create a directory in HDFS at the given path(s).
  Usage: hadoop fs -mkdir <paths>
  Example: hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2
• List the contents of a directory.
  Usage: hadoop fs -ls <args>
  Example: hadoop fs -ls /user/saurzcode
• Upload a file to HDFS.
  Usage: hadoop fs -put <localsrc> ... <HDFS_dest_path>
  Example: hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/
• Download a file from HDFS.
  Usage: hadoop fs -get <hdfs_src> <localdst>
  Example: hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/
Hadoop Shell Commands (contd.)
• See the contents of a file.
  Usage: hadoop fs -cat <path[filename]>
  Example: hadoop fs -cat /user/saurzcode/dir1/abc.txt
• Move a file from source to destination.
  Usage: hadoop fs -mv <src> <dest>
  Example: hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
• Remove a file in HDFS.
  Usage: hadoop fs -rm <arg>
  Example: hadoop fs -rm /user/saurzcode/dir1/abc.txt
• Remove a directory and its contents recursively.
  Usage: hadoop fs -rmr <arg>
  Example: hadoop fs -rmr /user/saurzcode/
Hadoop Shell Commands (contd.)
• Display the last few lines of a file.
  Usage: hadoop fs -tail <path[filename]>
  Example: hadoop fs -tail /user/saurzcode/dir1/abc.txt
• Display the aggregate length of a file.
  Usage: hadoop fs -du <path>
  Example: hadoop fs -du /user/saurzcode/dir1/abc.txt
Hadoop Copy Commands
• Copy a file from source to destination within HDFS.
  Usage: hadoop fs -cp <source> <dest>
  Example: hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
• Copy a file from the local file system to HDFS.
  Usage: hadoop fs -copyFromLocal <localsrc> URI
  Example: hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt
• Copy a file from HDFS to the local file system.
  Usage: hadoop fs -copyToLocal URI <localdst>
  Example: hadoop fs -copyToLocal /user/saurzcode/abc.txt /home/saurzcode/abc.txt