application development on aws moulikrishna koppolu chandan singh rana

Download Application Development On AWS MOULIKRISHNA KOPPOLU CHANDAN SINGH RANA

If you can't read please download the document

Upload: jane-oneal

Post on 24-Dec-2015

220 views

Category:

Documents


1 download

TRANSCRIPT

  • Slide 1
  • Application Development On AWS MOULIKRISHNA KOPPOLU CHANDAN SINGH RANA
  • Slide 2
  • Proposals Recommender System 1. Amazon Ec2 2. Hadoop 3. Mahout 4. Amazon review Dataset Analytical System 1. Amazon EMR 2. Amazon Ec2 3. Hive 4. S3
  • Slide 3
  • Dataset (Amazon Review Dataset) Arts Jewelery Automotives Baby Products
  • Slide 4
  • Recommender System Implementation
  • Slide 5
  • EC2 Set up Launch 3 EC2 micro instances for 3-node Hadoop setup Choose Amazon Machine Image (Ubuntu14.04 LTS) Set up parameters like the storage, number of instances, security parameters. Create a key-pair group for all three EC2 instances to allow access to them through a client like putty Download the key and set up putty for all three instances using instances hostname and the key
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Recommender System Implementation
  • Slide 11
  • Download Hadoop 1.2.1 Configure Hadoop on all three instances Start all services from the master node, which in turn runs corresponsding services on both secondary master node and slave node Test the Hadoop working by running a sample Map-Reduce job on master node. [Command : Hadoop jar hadoopexamples-1.2.1.jar pi 10 100000 ]
  • Slide 12
  • Recommender System Implementation
  • Slide 13
  • Download mahout 0.9 on master node Mahout 0.9 only works with Hadoop 1.X versions. Mahout 1.10 is available for Hadoop 2.X but with completely different setup.
  • Slide 14
  • Recommender System Implementation
  • Slide 15
  • Slide 16
  • Upload the Amazon review data set in the form of csv. To run a recommender job on mahout, the dataset should not have any string literals. So the csv file is converted into a set of integers. Using mahout recommender jar, run the recommendation job using command hadoop jar /home/ubuntu/mahout-distribution-0.9/mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input u.data --output output
  • Slide 17
  • Results
  • Slide 18
  • Slide 19
  • Analytical system Steps involved: 1. Set up S3 bucket instance 2. Set up emr cluster 3. Set up hive program as a job in emr 4. Create table and Upload the data into the hive table using hiveQL
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Sample HiveQl query CREATE EXTERNAL TABLE IF NOT EXISTS sampledb( name STRING, course STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," lines terminated by '\n' LOCATION 's3://moulibucket/Files';
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • THANK YOU!