![Page 1: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/1.jpg)
HADOOP
Installation and Deployment of a Single Node on a Linux System
Presented by:
Liv Nguekap
And
Garrett Poppe
![Page 2: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/2.jpg)
Topics
● Create hadoopuser and group
● Edit sudoers
● Set up SSH
● Install JDK
● Install Hadoop
● Editing Hadoop settings
● Running Hadoop
● Resources
![Page 3: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/3.jpg)
Add Hadoopuser
![Page 4: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/4.jpg)
Edit sudoers
![Page 5: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/5.jpg)
Set up SSH
● sudo chown hadoopuser ~/.ssh
● sudo chmod 700 ~/.ssh
● sudo chmod 600 ~/.ssh/id_rsa
● sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
● sudo chmod 600 ~/.ssh/authorized_keys
● Edit /etc/ssh/sshd_config
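The commands above assume a key pair already exists in ~/.ssh. A minimal sketch of the full sequence follows, assuming OpenSSH's ssh-keygen is installed. SSH_DIR is a hypothetical variable that points at a scratch directory so the steps can be tried safely; on a real node it would be ~/.ssh of hadoopuser, and none of the steps need sudo when run as that user.

```shell
# Sketch only: SSH_DIR is a hypothetical variable; on a real node it would be ~/.ssh.
SSH_DIR="$(mktemp -d)"

if command -v ssh-keygen >/dev/null 2>&1; then
  # Generate an RSA key pair with an empty passphrase so the Hadoop
  # start scripts can log in without prompting.
  ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"

  # Authorize the new public key for logins to this same machine.
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"

  # OpenSSH is strict about permissions on these files.
  chmod 700 "$SSH_DIR"
  chmod 600 "$SSH_DIR/id_rsa" "$SSH_DIR/authorized_keys"
else
  echo "ssh-keygen not found; install the OpenSSH client first" >&2
fi
```

With the real ~/.ssh directory, `ssh localhost` should then log in without a password prompt, provided sshd is running and allows public key authentication.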
![Page 6: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/6.jpg)
Install JDK
● Log in as hadoopuser
● Uninstall previous versions of JDK
● Download current version of JDK
● Install JDK
● Edit JAVA_HOME and PATH variables in “~/.bashrc” file
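The last step could look like the sketch below. The JDK directory shown is an assumption for illustration, not the path used in the presentation; substitute the directory where the JDK was actually installed.

```shell
# Lines to append to ~/.bashrc.
# The JDK path is hypothetical; use the real install directory on your system.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
# Put the JDK tools (java, javac, jps) on the command search path.
export PATH="$PATH:$JAVA_HOME/bin"
```

After `source ~/.bashrc`, running `java -version` is a quick way to confirm that the intended JDK is the one being picked up.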
![Page 7: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/7.jpg)
Install Hadoop
● Download current stable release
● Untar the download
● tar xzvf hadoop-2.4.1.tar.gz
● Move the untarred folder
● sudo mv hadoop-2.4.1 /usr/local/hadoop
● Change ownership and create the NameNode and DataNode directories
● sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop
● mkdir -p ~/hadoopspace/hdfs/namenode
● mkdir -p ~/hadoopspace/hdfs/datanode
![Page 8: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/8.jpg)
Install Hadoop
● Edit Hadoop variables in “~/.bashrc” file
● After editing the file, apply the changes with the command:
● “source ~/.bashrc”
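The exact variables shown on the original slide are not recoverable from this transcript. A minimal sketch of a commonly used set follows, assuming the /usr/local/hadoop install location from the previous slide.

```shell
# Lines to append to ~/.bashrc (a common minimal set, not necessarily the slide's).
# /usr/local/hadoop is the install location used earlier in this deck.
export HADOOP_HOME=/usr/local/hadoop
# Directory holding core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# bin holds the hadoop and hdfs commands; sbin holds start-dfs.sh and start-yarn.sh.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```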
![Page 9: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/9.jpg)
Editing Hadoop settings
● Go to directory located at /usr/local/hadoop/etc/hadoop
● Create a copy of mapred-site.xml.template as mapred-site.xml
![Page 10: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/10.jpg)
Editing Hadoop settings
● Edit mapred-site.xml
● Add code between <configuration> tags
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
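For reference, here is a sketch of what the complete file could look like once the property sits inside the <configuration> element. CONF_DIR is a hypothetical scratch directory; on the node described here it would be /usr/local/hadoop/etc/hadoop. The value is written without surrounding whitespace, which is the safer form.

```shell
# Sketch only: CONF_DIR is a scratch directory standing in for
# /usr/local/hadoop/etc/hadoop.
CONF_DIR="$(mktemp -d)"

# Write a complete mapred-site.xml that tells MapReduce to run on YARN.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# Show the result.
cat "$CONF_DIR/mapred-site.xml"
```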
![Page 11: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/11.jpg)
Editing Hadoop settings
● Edit yarn-site.xml
● Add code between <configuration> tags
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
![Page 12: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/12.jpg)
Editing Hadoop settings
● Edit core-site.xml
● Add code between <configuration> tags
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
![Page 13: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/13.jpg)
Editing Hadoop settings
● Edit hdfs-site.xml
● Add code between <configuration> tags
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
</property>
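Writing several properties by hand is error-prone, so a small helper can generate them. The sketch below is one possible approach, not the presenters' method: `property` is a hypothetical helper function, and CONF_DIR and HDFS_ROOT are scratch locations standing in for /usr/local/hadoop/etc/hadoop and ~/hadoopspace/hdfs. The property names are kept as on the slide; Hadoop 2.x lists dfs.name.dir and dfs.data.dir as deprecated aliases of dfs.namenode.name.dir and dfs.datanode.data.dir.

```shell
# Sketch only: scratch locations standing in for the real config and data directories.
CONF_DIR="$(mktemp -d)"
HDFS_ROOT="$CONF_DIR/hadoopspace/hdfs"
mkdir -p "$HDFS_ROOT/namenode" "$HDFS_ROOT/datanode"

# Hypothetical helper: print one <property> element for a name and a value.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Assemble hdfs-site.xml from the three properties on the slide.
{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  property dfs.replication 1
  property dfs.name.dir "file://$HDFS_ROOT/namenode"
  property dfs.data.dir "file://$HDFS_ROOT/datanode"
  echo '</configuration>'
} > "$CONF_DIR/hdfs-site.xml"

cat "$CONF_DIR/hdfs-site.xml"
```

A replication factor of 1 fits a single node, since there is only one DataNode to hold each block.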
![Page 14: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/14.jpg)
Editing Hadoop settings
● Edit “hadoop-env.sh”
● Set the JAVA_HOME variable to the current JDK path.
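If the JDK path is not known offhand, it can usually be derived from the location of javac. The sketch below assumes a conventional JDK layout with javac under bin; `jdk_home_from` is a hypothetical helper and the example path is made up.

```shell
# Hypothetical helper: strip the trailing /bin/javac from a resolved javac path
# to get the JDK directory.
jdk_home_from() {
  printf '%s\n' "${1%/bin/javac}"
}

# Example with a made-up path. On a real system, pass the resolved location:
#   jdk_home_from "$(readlink -f "$(command -v javac)")"
JDK_HOME="$(jdk_home_from /usr/lib/jvm/jdk1.7.0/bin/javac)"

# The line to place in hadoop-env.sh:
echo "export JAVA_HOME=$JDK_HOME"
```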
![Page 15: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/15.jpg)
Editing Hadoop settings
● Format the namenode using the command “hdfs namenode -format”
![Page 16: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/16.jpg)
Running Hadoop
● Start services
● “start-dfs.sh”
● “start-yarn.sh”
![Page 17: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/17.jpg)
Running Hadoop
● Use jps command to make sure all services are running.
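On a single node, start-dfs.sh and start-yarn.sh are expected to bring up five daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below checks jps-style output for them. `missing_daemons` is a hypothetical helper, and the sample input uses made-up PIDs rather than captured output.

```shell
# Hypothetical helper: read jps-style output ("<pid> <name>" per line) on stdin
# and print each expected single-node daemon that does not appear in it.
missing_daemons() {
  awk '
    { seen[$2] = 1 }
    END {
      n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
      for (i = 1; i <= n; i++) if (!(want[i] in seen)) print want[i]
    }'
}

# Illustrative input only. On a real node run: jps | missing_daemons
printf '%s\n' '101 NameNode' '102 DataNode' '103 ResourceManager' '104 Jps' | missing_daemons
```

For this sample the helper prints SecondaryNameNode and NodeManager. Matching on the whole name avoids counting SecondaryNameNode as NameNode.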
![Page 18: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/18.jpg)
Running Hadoop
● Open a web browser.
● Type “localhost:50070” into the address bar to access the web interface.
![Page 19: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/19.jpg)
Part 2
● WRITING MAPREDUCE PROGRAMS FOR HADOOP
![Page 20: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/20.jpg)
Languages/scripts used
● We will talk about two languages used to write MapReduce programs in Hadoop:
● 1) Pig Script (also called Pig Latin)
● 2) Java
![Page 21: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/21.jpg)
Pig
● What is Pig?
● Pig is a high-level platform for creating MapReduce programs used with Hadoop.
● It is somewhat similar to SQL.
![Page 22: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/22.jpg)
How Pig Works
● Pig has two modes of execution:
● 1) Local Mode - To run Pig in local mode, you need access to a single machine.
● 2) Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.
![Page 23: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/23.jpg)
Syntax to run Pig
● To run Pig in Local Mode, use:
● pig -x local id.pig
● To run Pig in Mapreduce Mode, use:
● pig id.pig or pig -x mapreduce id.pig
![Page 24: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/24.jpg)
Ways to run Pig
● Whether in local or mapreduce mode, there are 3 ways of running Pig:
● 1) Grunt shell
● 2) Batch or script file
● 3) Embedded Program
![Page 25: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/25.jpg)
Sample Grunt Shell Code
![Page 26: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/26.jpg)
Grunt Shell Commands
![Page 27: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/27.jpg)
Grunt Shell Commands
![Page 28: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/28.jpg)
Batch
● To run Pig in batch mode, the entire Pig script is written to a file, and that file is run with Pig.
● A sample command for the file totalmiles.pig is:
● pig totalmiles.pig
![Page 29: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/29.jpg)
Content of file totalmiles.pig
![Page 30: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/30.jpg)
Content of 1987 flight data file
![Page 31: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/31.jpg)
JAVA
● We tested Hadoop's MapReduce functionality with a Java program called WordCount.java
● The compiled WordCount class is provided in the examples that come with the Hadoop installation
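To get a feel for what WordCount computes, the same result can be approximated locally with standard shell tools. This is only an analogy, not the Hadoop job itself: splitting on whitespace stands in for the map step, sorting for the shuffle, and counting for the reduce. `wordcount` is a hypothetical function name.

```shell
# Local stand-in for WordCount, not the Hadoop job itself.
wordcount() {
  tr -s '[:space:]' '\n' |             # "map": emit one word per line
    sed '/^$/d' |                      # drop empty lines from leading whitespace
    sort |                             # "shuffle": bring identical words together
    uniq -c |                          # "reduce": count each group
    awk '{ print $2 "\t" $1 }'         # print as word<TAB>count
}

printf 'hello hadoop\nhello pig\n' | wordcount
```

For the two sample lines this prints hadoop 1, hello 2 and pig 1, one tab-separated pair per line.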
![Page 32: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/32.jpg)
Where to find the Hadoop Examples
![Page 33: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/33.jpg)
JAVA
![Page 34: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/34.jpg)
Launching WordCount job
![Page 35: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/35.jpg)
WordCount Processing
![Page 36: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/36.jpg)
WordCount Processing
![Page 37: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/37.jpg)
Results
![Page 38: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/38.jpg)
Results
![Page 39: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/39.jpg)
WordCount.java - Map
![Page 40: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/40.jpg)
WordCount.java - Reduce
![Page 41: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/41.jpg)
● Fin
● Thank YOU!!
![Page 42: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap](https://reader031.vdocument.in/reader031/viewer/2022022506/5ac18b557f8b9a357e8ccd29/html5/thumbnails/42.jpg)
Resources
● http://alanxelsys.com/hadoop-v2-single-node-installation-on-centos-6-5/
● http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/
● http://hadoop.apache.org/
● http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount
● https://pig.apache.org/docs/r0.10.0/basic.html