Big Data with Hadoop and Cloud Computing
Post on 14-Jan-2015
Big Data with Hadoop and Cloud Computing
Researcher’s Blog - http://clean-clouds.com
“Big Data Processing” relevant for Enterprises
• Big Data used to be discarded, or left un-analyzed and archived.
– Loss of information, insight, and opportunities to extract new value.
• How is Big Data beneficial?
– Energy companies – geophysical analysis.
– Science and medicine – empiricism is growing faster than experimentation.
– Disney – customer behavior patterns across its stores and theme parks.
• Pursuit of a “Competitive Advantage” is the driving factor for Enterprises:
– Data mining (log processing, click-stream analysis, similarity algorithms, etc.), financial
simulation (Monte Carlo simulation), file processing (e.g., resizing JPEGs), web indexing.
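Monte Carlo simulation, listed above as a typical financial workload, is a canonical embarrassingly parallel job: many independent random trials whose results are aggregated at the end. A minimal Python sketch (using the toy π-estimation case rather than a real financial model) shows the pattern:

```python
import random

def monte_carlo_pi(n_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling random points in the unit square
    and counting how many land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

# Each batch of samples is independent, so the work shards
# trivially across MapReduce workers or EC2 instances.
estimate = monte_carlo_pi(100_000)
```

Because no trial depends on another, batches can be farmed out to any number of cloud instances and the partial counts summed afterwards.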
Cloud Computing ~ brings economy to Big Data Processing
• Big Data Processing can be implemented by HPC & Cloud.
1) An HPC implementation is very costly w.r.t. CAPEX & OPEX.
2) Cloud Computing is efficient because of its pay-per-use nature.
• MapReduce programming model is used for processing big data sets.
• Pig, Hive, Hadoop, … are used for Big Data Processing:
– Pig – SQL-like operations that apply to datasets.
– Hive – performs SQL-like data analysis on data.
– Hadoop – processes vast amounts of data (focal point here).
• Use EC2 instances to analyze “Big Data” in Amazon IaaS.
• Amazon Elastic MapReduce reduces complex set-up & management.
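The MapReduce programming model referenced above can be illustrated with a minimal single-process Python sketch (an illustration of the model only, not the Hadoop API): a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group:

```python
from collections import defaultdict

def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values (here, sum the counts).
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data with hadoop", "big data processing"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

In Hadoop the same three stages run distributed: mappers work on separate input splits, the framework shuffles over the network, and reducers write the aggregated output.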
Cost Comparison of Alternatives
Use case: Analyze Next-Generation Sequencing data to understand the genetics of cancer (figures as per the Amazon EC2 cost comparison calculator).
• HPC
– 100 steady & 200 peak-load servers; 68.4 GB memory, 1690 GB storage
– Cost: $1,746,769
– CAPEX & OPEX; time-consuming set-up; management of Hadoop clusters
• Amazon IaaS
– 400 reserved + 600 on-demand Standard Extra Large instances; 15 GB RAM, 1690 GB storage
– Time-consuming set-up; management of Hadoop clusters
• Amazon Elastic MapReduce
– Elastic: 1000 Standard Extra Large instances; 15 GB RAM, 1690 GB storage
– Cost: $377,395
– Elastic, easy to use, reliable; auto turn-off of resources
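The saving from elastic, pay-per-use billing can be sketched with a back-of-the-envelope calculation. The hourly rate and busy-hour figures below are hypothetical placeholders for illustration, not actual AWS pricing:

```python
# Hypothetical figures for illustration only -- not real AWS prices.
HOURS_PER_YEAR = 24 * 365
ON_DEMAND_RATE = 0.68  # $/hour per instance (assumed)

def always_on_cost(instances: int) -> float:
    """Cluster provisioned for peak load and billed around the clock."""
    return instances * ON_DEMAND_RATE * HOURS_PER_YEAR

def elastic_cost(instances: int, busy_hours: int) -> float:
    """Elastic cluster that auto-turns-off outside busy hours."""
    return instances * ON_DEMAND_RATE * busy_hours

peak_cost = always_on_cost(1000)          # sized for peak, never turned off
auto_cost = elastic_cost(1000, 2000)      # same size, ~2000 busy hours/year
```

The comparison shows why auto turn-off matters: the same fleet billed only for its busy hours costs a fraction of an always-on cluster sized for peak load.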
Future Direction
• Current experiments & identified areas
– Social network analysis
– Managing data centers
– Collective Intelligence – algorithms and visualization techniques
– Predictive analytics
• Accelerators exploration
– Apache Whirr – cloud-neutral way to run services
– Apache Mahout – scalable machine learning library
– Cascading – distributed computing framework
– HAMA – define and execute fault-tolerant data processing workflows
• Exploration of a LAMP-like stack for Big Data aggregation, processing and analytics
Thank You