application development on aws moulikrishna koppolu chandan singh rana

Application Development On AWS MOULIKRISHNA KOPPOLU CHANDAN SINGH RANA

Proposals Recommender System 1. Amazon Ec2 2. Hadoop 3. Mahout 4. Amazon review Dataset Analytical System 1. Amazon EMR 2. Amazon Ec2 3. Hive 4. S3

Dataset (Amazon Review Dataset) Arts Jewelery Automotives Baby Products

Recommender System Implementation

EC2 Set up Launch 3 EC2 micro instances for 3-node Hadoop setup Choose Amazon Machine Image (Ubuntu14.04 LTS) Set up parameters like the storage, number of instances, security parameters. Create a key-pair group for all three EC2 instances to allow access to them through a client like putty Download the key and set up putty for all three instances using instances hostname and the key

Download Hadoop 1.2.1 Configure Hadoop on all three instances Start all services from the master node, which in turn runs corresponsding services on both secondary master node and slave node Test the Hadoop working by running a sample Map-Reduce job on master node. [Command : Hadoop jar hadoopexamples-1.2.1.jar pi 10 100000 ]

Download mahout 0.9 on master node Mahout 0.9 only works with Hadoop 1.X versions. Mahout 1.10 is available for Hadoop 2.X but with completely different setup.

Upload the Amazon review data set in the form of csv. To run a recommender job on mahout, the dataset should not have any string literals. So the csv file is converted into a set of integers. Using mahout recommender jar, run the recommendation job using command hadoop jar /home/ubuntu/mahout-distribution-0.9/mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input u.data --output output

Results

Analytical system Steps involved: 1. Set up S3 bucket instance 2. Set up emr cluster 3. Set up hive program as a job in emr 4. Create table and Upload the data into the hive table using hiveQL

Sample HiveQl query CREATE EXTERNAL TABLE IF NOT EXISTS sampledb( name STRING, course STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," lines terminated by '\n' LOCATION 's3://moulibucket/Files';

THANK YOU!

application development on aws moulikrishna koppolu chandan singh rana

Documents

s3 slide

key slide

hiveql slide

moulibucketfiles slide

output output slide

download hadoop

node hadoop setup

amazon ec2