
Upload: sushantbit04

Post on 16-Jul-2015


Apache Hadoop 2 Installation in Pseudo Mode

Download URL

1. Hadoop: https://archive.apache.org/dist/hadoop/core/stable/
2. Hive: http://people.apache.org/~hashutosh/hive-0.10.0-rc0/
3. Pig: http://ftp.udc.es/apache/pig/pig-0.12.0/
4. Hbase: http://archive.apache.org/dist/hbase/hbase-0.94.10/

Step 1: Generate ssh key

$ssh-keygen -t rsa -P ""

Step 2: Copy id_rsa.pub to authorized_keys

$cd .ssh
$cp id_rsa.pub authorized_keys
$chmod 644 authorized_keys

Step 3: Passwordless ssh to localhost

$cd ~
$ssh localhost
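Steps 1 to 3 can also be run as a single script. The sketch below writes to a throwaway directory (`KEYDIR`) instead of the real ~/.ssh so it can be tried safely; for the actual setup, point the -f path at ~/.ssh/id_rsa and append the public key to ~/.ssh/authorized_keys.

```shell
# Sketch of steps 1-3 against a throwaway key directory (does not touch ~/.ssh).
# Substitute ~/.ssh for $KEYDIR in a real install.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$KEYDIR/id_rsa" -q         # empty passphrase, no prompt
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"  # authorize our own key
chmod 644 "$KEYDIR/authorized_keys"                    # same mode as in step 2
```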

Step 4: Untar tarballs


$tar -xvzf hadoop-2.2.0.tar.gz

Step 5: Configuration files

$cd hadoop-2.2.0/etc/hadoop/
$vim core-site.xml

Add following properties in core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>

<property>
  <name>io.native.lib.available</name>
  <value>true</value>
</property>

$vim hdfs-site.xml

Add following property in hdfs-site.xml

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/data</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/name</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
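It helps to create the storage directories named above before formatting the namenode, so a typo in the paths does not surface only when the daemons start. A sketch, using a scratch prefix in place of /home/hadoop/hadoop-2.2.0:

```shell
# Create the dfs.namenode.name.dir and dfs.datanode.data.dir paths up front.
# PREFIX is a stand-in for /home/hadoop/hadoop-2.2.0 so this is safe anywhere.
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/pseudo/dfs/data" "$PREFIX/pseudo/dfs/name"
```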

$vim mapred-site.xml

Add following property in mapred-site.xml


<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/temp</value>
  <final>true</final>
</property>

<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/home/hadoop/hadoop-2.2.0/local</value>
  <final>true</final>
</property>

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

$vim yarn-site.xml

Add following property in yarn-site.xml

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:6000</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:6001</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:6002</value>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/home/hadoop/hadoop-2.2.0/yarn_nodemanager</value>
</property>

<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:6003</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/home/hadoop/hadoop-2.2.0/app-logs</value>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/home/hadoop/hadoop-2.2.0/logs</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
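The yarn-site.xml above pins the ResourceManager and NodeManager to fixed ports 6000-6003, so the daemons will fail to bind if anything else holds those ports. A sketch that reports their status before startup (assumes bash's /dev/tcp, where a failed connect means "free"):

```shell
# Report whether the YARN ports configured above are free or already in use.
check_ports() {
  for port in "$@"; do
    if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
      echo "port $port: in use"    # something accepted the connection
    else
      echo "port $port: free"      # connect failed, nothing is listening
    fi
  done
}

check_ports 6000 6001 6002 6003
```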

$vim slaves

Add localhost in slaves file

Step 6: set .bashrc

$cd ~
$vim .bashrc

export JAVA_HOME=/usr/
export HADOOP_HOME=/home/ahadoop2/hadoop-2.2.0
export HADOOP_CONF_DIR=/home/ahadoop2/hadoop-2.2.0/etc/hadoop
export PIG_HOME=/home/ahadoop2/pig-0.12.0
export HBASE_HOME=/home/ahadoop2/hbase-0.96.0-hadoop2
export HIVE_HOME=/home/ahadoop2/hive-0.11.0
export PIG_CLASSPATH=/home/ahadoop2/hadoop-2.2.0/etc/hadoop

export CLASSPATH=$PIG_HOME/pig-withouthadoop.jar:\
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:\
$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:\
$HBASE_HOME/lib/hbase-client-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/hbase-common-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/hbase-server-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/commons-httpclient-3.1.jar:\
$HBASE_HOME/lib/commons-collections-3.2.1.jar:\
$HBASE_HOME/lib/commons-lang-2.6.jar:\
$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:\
$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:\
$HBASE_HOME/lib/guava-12.0.1.jar:\
$HBASE_HOME/lib/protobuf-java-2.5.0.jar:\
$HBASE_HOME/lib/commons-codec-1.7.jar:\
$HBASE_HOME/lib/zookeeper-3.4.5.jar:\
$HIVE_HOME/lib/hive-jdbc-0.11.0.jar:\
$HIVE_HOME/lib/hive-metastore-0.11.0.jar:\
$HIVE_HOME/lib/hive-serde-0.11.0.jar:\
$HIVE_HOME/lib/hive-common-0.11.0.jar:\
$HIVE_HOME/lib/hive-service-0.11.0.jar:\
$HIVE_HOME/lib/libfb303-0.9.0.jar:\
$HIVE_HOME/lib/postgresql-9.2-1003.jdbc3.jar:\
$HIVE_HOME/lib/libthrift-0.9.0.jar:\
$HIVE_HOME/lib/slf4j-api-1.6.1.jar:\
$HIVE_HOME/lib/commons-logging-1.0.4.jar:\
/home/ahadoop2/Hadoop2Training.jar

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:
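Every jar on the CLASSPATH above is version-pinned, and a typo or missing file only surfaces much later as a ClassNotFoundException. A small sketch that reports CLASSPATH entries that do not exist on disk:

```shell
# Print each classpath entry that does not exist on disk.
check_classpath() {
  echo "$1" | tr ':' '\n' | while read -r entry; do
    if [ -n "$entry" ] && [ ! -e "$entry" ]; then
      echo "missing: $entry"
    fi
  done
}

check_classpath "${CLASSPATH:-}"   # no output means every entry exists
```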

Step 7: Load .bashrc

$cd ~
$. .bashrc

Step 8: Formatting the name node

$cd ~
$hdfs namenode -format

Step 9: Starting Cluster

$cd ~/hadoop-2.2.0/sbin
$./start-all.sh

To view the started daemons:

$jps

This should show the following daemons:

NameNode
DataNode
SecondaryNameNode
NodeManager
ResourceManager
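The jps listing above can be checked mechanically. A sketch that reports any of the five expected daemons that failed to come up (jps ships with the JDK):

```shell
# Given jps output as its argument, print any expected daemon that is absent.
check_daemons() {
  for d in NameNode DataNode SecondaryNameNode NodeManager ResourceManager; do
    echo "$1" | grep -qw "$d" || echo "not running: $d"
  done
}

check_daemons "$(jps 2>/dev/null || true)"
```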

Apache HBase Installation in Pseudo Mode

Step 1: Untar the tarballs

$tar -xvzf hbase-0.96.0-hadoop2.tar.gz

Step 2: Configuration files

$cd hbase-0.96.0-hadoop2/conf


$vim hbase-site.xml

Copy following properties in hbase-site.xml

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
  <description>The directory shared by RegionServers</description>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

$vim regionservers

Add localhost in regionservers file

Step 3: Add hadoop jars from hadoop directory to hbase lib directory

$cd /home/hadoop/hadoop-2.2.0/share/hadoop/common/
$cp hadoop-common-2.2.0.jar /home/hadoop/hbase-0.96.0-hadoop2/lib/

Step 4: Start HBase

$cd ~
$start-hbase.sh

Step 5: To view the started daemons

$jps

HMaster
HRegionServer
HQuorumPeer

Step 6: To view hbase shell

$hbase shell

Step 7: Before connecting to hbase using java

Start hbase rest service by executing following command


$hbase-daemon.sh start rest -p 8090

Apache Hive Installation

Step 1: Untar the tarballs

$tar -xvzf hive-0.11.0.tar.gz

Step 2: Configuring a remote PostgreSQL database for the Hive Metastore

Before you can run the Hive metastore with a remote PostgreSQL database, you must configure a connector to the remote PostgreSQL database, set up the initial database schema, and configure the PostgreSQL user account for the Hive user.

Install and start PostgreSQL if you have not already done so. Edit the postgresql.conf file and set the listen_addresses property to '*' so that the server accepts connections from other hosts. Then configure authentication for your network in pg_hba.conf by adding a line that permits the Hive metastore user to connect from your network.

Start PostgreSQL Server

$ su postgres

$cd $postgres_home/bin

$./pg_ctl start -D path_to_data_dir

Install the Postgres JDBC Driver

Copy postgresql-jdbc driver in $HIVE_HOME/lib/

Create the metastore database and user account

Proceed as in the following example:

$ sudo -u postgres psql
postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# CREATE DATABASE metastore;
postgres=# \q

$ psql -U hiveuser -d metastore

You are now connected to database 'metastore' with hiveuser.

metastore=# \i /home/hadoop/hive-0.11.0/scripts/metastore/upgrade/postgres/hive-schema-0.10.0.postgres.sql

Step 3: Configuration files

$cd hive-0.11.0/conf
$vim hive-site.xml

<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://<postgresql instance ip>:5432/metastore</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>

<property>


  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>

</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<namenode ip>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>

</property>

</configuration>
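The hive-site.xml above contains two placeholders, `<postgresql instance ip>` and `<namenode ip>`; left unsubstituted they make the file invalid XML and the metastore will fail to start. A grep-based sketch that flags them:

```shell
# Flag unresolved "<... ip>" placeholders of the style used in the
# hive-site.xml above.
check_placeholders() {
  if grep -n '<[a-z][a-z ]* ip>' "$1"; then
    echo "unresolved placeholders in $1"
  else
    echo "$1 looks fully substituted"
  fi
}

# Demo against a throwaway file containing one of the placeholders:
TMP=$(mktemp)
printf '<value>thrift://<namenode ip>:9083</value>\n' > "$TMP"
check_placeholders "$TMP"
```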

Step 4: Start hive metastore

$hive --service metastore

Step 5: To view hive console

$hive
hive> show tables;
OK

Step 6: Before connecting to hive using java

Start hiveserver by executing following command

$hive --service hiveserver

Apache Pig Installation

Step 1: Untar the tarballs

$tar -xvzf pig-0.12.0.tar.gz

Step 2: Delete the two jars (the pig jar and the pig-withouthadoop jar) from the Pig home directory, then add pig-withouthadoop.jar to the Pig installation directory (uploaded in knowmax at the same path)

Step 3: To open pig grunt

$pig
