TRANSCRIPT
Apache Hadoop 2 Installation in Pseudo Mode
Download URLs
1. Hadoop: https://archive.apache.org/dist/hadoop/core/stable/
2. Hive: http://people.apache.org/~hashutosh/hive-0.10.0-rc0/
3. Pig: http://ftp.udc.es/apache/pig/pig-0.12.0/
4. HBase: http://archive.apache.org/dist/hbase/hbase-0.94.10/
Step 1: Generate ssh key
$ssh-keygen -t rsa -P ""
Step 2: Copy id_rsa.pub to authorized_keys
$cd .ssh
$cp id_rsa.pub authorized_keys
$chmod 644 authorized_keys
Step 3: Passwordless ssh to localhost
$cd ~
$ssh localhost
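Steps 1-3 can be run as one sketch (assuming OpenSSH is installed, sshd is running on localhost, and a standard ~/.ssh layout):

```shell
# Create an RSA key pair with an empty passphrase and authorize it
# for the current user.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# -P "" sets an empty passphrase; -q suppresses the banner output
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa" -q
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 644 "$HOME/.ssh/authorized_keys"
# Afterwards `ssh localhost` should log in without a password prompt.
```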
Step 4: Untar tarballs
$tar -xvzf hadoop-2.2.0.tar.gz
Step 5: Configuration files
$cd hadoop-2.2.0/etc/hadoop/
$vim core-site.xml
Add the following properties to core-site.xml
<property><name>fs.defaultFS</name><value>hdfs://172.17.196.14</value></property>
<property><name>io.native.lib.available</name><value>true</value></property>
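For reference, these two properties assembled into a complete core-site.xml (the `<configuration>` wrapper is required; with no port given in the URI, clients use the NameNode's default RPC port, 8020):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.196.14</value>
  </property>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
</configuration>
```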
$vim hdfs-site.xml
Add the following properties to hdfs-site.xml
<property><name>dfs.datanode.data.dir</name><value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/data</value>
</property>
<property><name>dfs.namenode.name.dir</name><value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/name</value>
</property>
<property><name>dfs.replication</name><value>1</value>
</property>
<property><name>dfs.permissions</name><value>false</value>
</property>
$vim mapred-site.xml
Add the following properties to mapred-site.xml
<property><name>mapreduce.cluster.temp.dir</name><value>/home/hadoop/hadoop-2.2.0/temp</value><final>true</final>
</property>
<property><name>mapreduce.cluster.local.dir</name><value>/home/hadoop/hadoop-2.2.0/local</value><final>true</final>
</property>
<property><name>mapreduce.framework.name</name><value>yarn</value>
</property>
$vim yarn-site.xml
Add the following properties to yarn-site.xml
<property><name>yarn.resourcemanager.resource-tracker.address</name><value>localhost:6000</value>
</property>
<property><name>yarn.resourcemanager.scheduler.address</name><value>localhost:6001</value>
</property>
<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property><name>yarn.resourcemanager.address</name><value>localhost:6002</value>
</property>
<property><name>yarn.nodemanager.local-dirs</name><value>/home/hadoop/hadoop-2.2.0/yarn_nodemanager</value>
</property>
<property>
<name>yarn.nodemanager.address</name><value>0.0.0.0:6003</value>
</property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>10240</value>
</property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>/home/hadoop/hadoop-2.2.0/app-logs</value>
</property>
<property><name>yarn.nodemanager.log-dirs</name><value>/home/hadoop/hadoop-2.2.0/logs</value>
</property>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value>
</property>
$vim slaves
Add localhost to the slaves file
Step 6: Set .bashrc
$cd ~
$vim .bashrc
export JAVA_HOME=/usr/
export HADOOP_HOME=/home/ahadoop2/hadoop-2.2.0
export HADOOP_CONF_DIR=/home/ahadoop2/hadoop-2.2.0/etc/hadoop
export PIG_HOME=/home/ahadoop2/pig-0.12.0
export HBASE_HOME=/home/ahadoop2/hbase-0.96.0-hadoop2
export HIVE_HOME=/home/ahadoop2/hive-0.11.0
export PIG_CLASSPATH=/home/ahadoop2/hadoop-2.2.0/etc/hadoop
export CLASSPATH=$PIG_HOME/pig-withouthadoop.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:$HBASE_HOME/lib/hbase-client-0.96.0-hadoop2.jar:$HBASE_HOME/lib/hbase-common-0.96.0-hadoop2.jar:$HBASE_HOME/lib/hbase-server-0.96.0-hadoop2.jar:$HBASE_HOME/lib/commons-httpclient-3.1.jar:$HBASE_HOME/lib/commons-collections-3.2.1.jar:$HBASE_HOME/lib/commons-lang-2.6.jar:$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:$HBASE_HOME/lib/guava-12.0.1.jar:$HBASE_HOME/lib/protobuf-java-2.5.0.jar:$HBASE_HOME/lib/commons-codec-1.7.jar:$HBASE_HOME/lib/zookeeper-3.4.5.jar:$HIVE_HOME/lib/hive-jdbc-0.11.0.jar:$HIVE_HOME/lib/hive-metastore-0.11.0.jar:$HIVE_HOME/lib/hive-serde-0.11.0.jar:$HIVE_HOME/lib/hive-common-0.11.0.jar:$HIVE_HOME/lib/hive-service-0.11.0.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/postgresql-9.2-1003.jdbc3.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/slf4j-api-1.6.1.jar:$HIVE_HOME/lib/commons-logging-1.0.4.jar:/home/ahadoop2/Hadoop2Training.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:
Step 7: Load .bashrc
$cd ~
$. .bashrc
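A quick sanity check after sourcing .bashrc: confirm the Hadoop variables are set and its bin directory is on PATH (the paths are this tutorial's assumed install locations):

```shell
# Re-export the key variables as .bashrc does, then check PATH.
export HADOOP_HOME=/home/ahadoop2/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop on PATH" ;;
  *)                      echo "hadoop missing from PATH" ;;
esac
```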
Step 8: Formatting the name node
$cd ~
$hadoop namenode -format
Step 9: Starting the Cluster
$cd ~/hadoop-2.2.0/sbin
$./start-all.sh
To view the started daemons:
$jps
This should show:
NameNode
DataNode
SecondaryNameNode
NodeManager
ResourceManager
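The jps check above can be scripted. Below is a small hypothetical helper (not part of the Hadoop distribution) that scans jps output and reports any of the five expected pseudo-mode daemons that did not start:

```shell
# check_daemons: pass it the output of `jps`; it prints one line for
# each expected daemon that is not present in that output.
check_daemons() {
  for d in NameNode DataNode SecondaryNameNode NodeManager ResourceManager; do
    # grep -w matches whole words, so "SecondaryNameNode" in the
    # output does not falsely satisfy the "NameNode" check.
    echo "$1" | grep -qw "$d" || echo "missing: $d"
  done
}
# Usage: check_daemons "$(jps)"
```

Empty output means all five daemons are up; otherwise restart the cluster and inspect the logs under $HADOOP_HOME/logs.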
Apache HBase Installation in Pseudo Mode
Step 1: Untar the tarballs
$tar -xvzf hbase-0.96.0-hadoop2.tar.gz
Step 2: Configuration files
$cd hbase-0.96.0-hadoop2/conf
$vim hbase-site.xml
Copy the following properties into hbase-site.xml
<property><name>hbase.rootdir</name><value>hdfs://localhost:8020/hbase</value><description>The directory shared by RegionServers</description>
</property>
<property><name>hbase.cluster.distributed</name><value>true</value>
</property>
<property><name>dfs.replication</name><value>1</value>
</property>
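Assembled, these go inside the same `<configuration>` wrapper the Hadoop files use. Note that hbase.rootdir must point at the NameNode address and port configured for HDFS (8020 is the default RPC port):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
    <description>The directory shared by RegionServers</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```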
$vim regionservers
Add localhost to the regionservers file
Step 3: Add the Hadoop jars from the Hadoop directory to the HBase lib directory
$cd /home/hadoop/hadoop-2.2.0/share/hadoop/common/
$cp hadoop-common-2.2.0.jar /home/hadoop/hbase-0.96.0-hadoop2/lib/
Step 4: Start HBase
$cd ~
$start-hbase.sh
Step 5: To view the started daemons
$jps
This should show:
HMaster
HRegionServer
HQuorumPeer
Step 6: To open the HBase shell
$hbase shell
Step 7: Before connecting to HBase from Java
Start the HBase REST service by executing the following command
$hbase-daemon.sh start rest -p 8090
Apache Hive Installation
Step 1: Untar the tarballs
$tar -xvzf hive-0.11.0.tar.gz
Step 2: Configuring a remote PostgreSQL database for the Hive Metastore
Before you can run the Hive metastore with a remote PostgreSQL database, you must configure a connector to the remote PostgreSQL database, set up the initial database schema, and configure the PostgreSQL user account for the Hive user.
Install and start PostgreSQL if you have not already done so. Edit the postgresql.conf file and set the listen_addresses property to '*' so that PostgreSQL accepts connections from other hosts, then configure authentication for your network by adding an entry for it to pg_hba.conf.
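For example, a hypothetical pg_hba.conf entry (the subnet here is an assumption based on this tutorial's example machine; substitute your own network) allowing hosts on it to connect to the metastore database as hiveuser with md5 password authentication:

```
# TYPE  DATABASE    USER        ADDRESS            METHOD
host    metastore   hiveuser    172.17.196.0/24    md5
```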
Start PostgreSQL Server
$ su postgres
$cd $postgres_home/bin
$./pg_ctl start -D path_to_data_dir
Install the Postgres JDBC Driver
Copy the PostgreSQL JDBC driver into $HIVE_HOME/lib/
Create the metastore database and user account
Proceed as in the following example:
$sudo -u postgres psql
postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# CREATE DATABASE metastore;
postgres=# \q
$psql -U hiveuser -d metastore
You are now connected to database 'metastore' as hiveuser.
metastore=# \i /home/hadoop/hive-0.11.0/scripts/metastore/upgrade/postgres/hive-schema-0.10.0.postgres.sql
Step 3: Configuration files
$cd hive-0.11.0/conf
$vim hive-site.xml
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value><description>location of default database for the warehouse</description>
</property>
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:postgresql://<postgresql instance ip>:5432/metastore</value>
</property>
<property><name>javax.jdo.option.ConnectionDriverName</name><value>org.postgresql.Driver</value>
</property>
<property><name>javax.jdo.option.ConnectionUserName</name><value>hiveuser</value>
</property>
<property><name>javax.jdo.option.ConnectionPassword</name><value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name><value>false</value>
</property>
<property><name>hive.metastore.uris</name><value>thrift://<namenode ip>:9083</value><description>IP address (or fully-qualified domain name) and port of the metastore
host</description></property>
<property><name>datanucleus.autoStartMechanism</name><value>SchemaTable</value>
</property>
</configuration>
Step 4: Start the Hive metastore
$hive --service metastore
Step 5: To open the Hive console
$hive
hive> show tables;
OK
Step 6: Before connecting to Hive from Java
Start HiveServer by executing the following command
$hive --service hiveserver
Apache Pig Installation
Step 1: Untar the tarballs
$tar -xvzf pig-0.12.0.tar.gz
Step 2: Delete the two jars (the pig jar and the pig-withouthadoop jar) from the Pig home directory and add the replacement pig-withouthadoop.jar to the Pig installation directory (uploaded to Knowmax at the same path)
Step 3: To open pig grunt
$pig