Hadoop on EC2
Posted on 29-Nov-2014
Configuring and running Hadoop clusters, using Cloudera distribution
My farm
Start
Confirm
OK, it's running
Set /etc/hosts
Logging into an EC2 machine
ec2_login.sh:
#!/bin/sh
ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1
For example,
ec2_login.sh sh1
Run command on the cluster
run_on_cluster.sh:
#!/bin/bash
for i in {1..6}
do
  ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i "$1"
done
For example:
run_on_cluster.sh 'ifconfig | grep cast'
Result of running ifconfig
inet addr:10.220.141.227 Bcast:10.220.141.255
inet addr:10.95.31.140 Bcast:10.95.31.255
inet addr:10.220.214.15 Bcast:10.220.215.255
inet addr:10.94.245.56 Bcast:10.94.245.255
inet addr:10.127.17.143 Bcast:10.127.17.255
inet addr:10.125.79.225 Bcast:10.125.79.255
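The internal IPs can be pulled out of that output mechanically rather than copied by hand; a minimal sketch, assuming the classic "inet addr:" ifconfig format shown above (the `extract_ips` helper name is mine, not from the deck):

```shell
# Hypothetical helper: print only the internal IP of each "inet addr:" line.
# Assumes the classic "inet addr:<ip> Bcast:<ip>" ifconfig output format.
extract_ips() {
  grep -o 'inet addr:[0-9.]*' | cut -d: -f2
}
```

Then something like `run_on_cluster.sh 'ifconfig | grep cast' | extract_ips` would print one internal IP per node, ready to paste into the conf files below.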
Edit local conf
gedit masters core-site.xml mapred-site.xml slaves
10.220.141.227 - masters, core-site.xml
10.95.31.140 - mapred-site.xml
10.220.214.15 - slaves
...
10.94.245.56
10.127.17.143
10.125.79.225
masters
10.220.141.227
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.220.141.227</value>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.95.31.140:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/mapred</value>
  </property>
</configuration>
slaves
10.220.214.15
10.94.245.56
10.127.17.143
10.125.79.225
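The slaves file can be generated rather than hand-edited; a sketch, assuming the slave IPs are kept in a shell variable (the variable name is mine):

```shell
# Sketch: write the slave IPs, one per line, into a local "slaves" file.
# Word-splitting of the unquoted $SLAVE_IPS on spaces is intended here.
SLAVE_IPS="10.220.214.15 10.94.245.56 10.127.17.143 10.125.79.225"
printf '%s\n' $SLAVE_IPS > slaves
```

The same one-liner works for masters with a single IP.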
update-hadoop-cluster.sh
#!/bin/bash
for i in {1..6}
do
  scp -i ~/.ssh/hadoop.pem -r ~/projects/hadoop/conf ubuntu@hc$i:/home/ubuntu/
done
run_on_cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'
Important gotchas
sudo chkconfig hadoop-0.20-namenode off
Repeat for each installed service.
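Instead of repeating the chkconfig call by hand, the installed services can be enumerated; a sketch (the `list_disable_cmds` helper and its dry-run style are mine, not from the deck):

```shell
# Sketch: print the chkconfig disable command for every hadoop-0.20 init
# script found under the given directory (normally /etc/init.d).
# Printing instead of executing keeps this a dry run; pipe to sh to apply.
list_disable_cmds() {
  for service in "$1"/hadoop-0.20-*; do
    [ -e "$service" ] || continue
    echo "sudo chkconfig $(basename "$service") off"
  done
}
```

Usage: review the output of `list_disable_cmds /etc/init.d`, then pipe it to `sh` to actually disable the services.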
Important gotchas - 2
On EC2 in /etc/hosts you have
# Added by cloud-init
127.0.1.1 domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
Instead, use 127.0.0.1:
# Added by me - the developer
127.0.0.1 domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
# !!! for remote access, use internal ip:
10.220.169.157 domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
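The 127.0.1.1 edit can be scripted instead of done by hand on each node; a minimal sketch (the `fix_hosts` name is mine; it only rewrites the cloud-init line, the internal-IP line must still be added with each node's own address):

```shell
# Sketch: rewrite cloud-init's "127.0.1.1" alias line to "127.0.0.1",
# printing the fixed file to stdout; redirect (or use sed -i) to apply.
fix_hosts() {
  sed 's/^127\.0\.1\.1/127.0.0.1/' "$1"
}
```

Combined with run_on_cluster.sh, this fixes every node's /etc/hosts in one pass.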
Now start Hadoop services
On each node
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
Do it with a script:
run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'
Verify HDFS and MR
Verify!
hadoop fs -ls /
copy to
copy from
run MR
better yet...
w3m http://localhost:50070
w3m next screen
Start HBase
start-hbase.sh
#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master start
sudo /etc/init.d/hadoop-zookeeper-server start
sudo /etc/init.d/hadoop-hbase-regionserver start
w3m http://localhost:60010
Stop HBase
1. Do compaction
2. stop-hbase.sh
#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master stop
sleep 5
sudo /etc/init.d/hadoop-zookeeper-server stop
sudo /etc/init.d/hadoop-hbase-regionserver stop
Amazon EMR
Whirr