engineering big data with hadoop
DESCRIPTION
This presentation introduces big data with Hadoop.
TRANSCRIPT
ENGINEERING BIG DATA WITH
HADOOP
BY International School of Engineering
{We Are Applied Engineering}
Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
OVERVIEW
• WHAT IS BIG DATA?
• EXPLOSION OF DATA
• DATA CONTRIBUTIONS
• DATA EXPLOSION
• WHO ARE THE PLAYERS?
• BIG DATA – BIG PICTURE – LANDSCAPE
• BIG DATA – ENTERPRISE ROLES
• WHAT IS HADOOP?
• EVOLUTION OF HADOOP
• HADOOP ECOSYSTEM
• HADOOP ECOSYSTEM MAP
• HADOOP: 30,000 FEET VIEW
• BIG DATA & ANALYTICS CASE STUDIES
• VIDEO OF HADOOP ECOSYSTEM
WHAT IS BIG DATA?
• High-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.
-Gartner
HIGH VOLUME
HIGH VELOCITY
HIGH VARIETY
EXPLOSION OF DATA
Source: http://www.emc.com/leadership/digital-universe/iview/index.htm
DATA CONTRIBUTIONS
DATA EXPLOSION
Bing ingests more than 7 petabytes a month
The Twitter community generates over 1 terabyte of tweets every day
Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes
Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf
WHO ARE THE PLAYERS?
BIG DATA – BIG PICTURE – LANDSCAPE
BIG DATA – ENTERPRISE ROLES
INTRODUCTION TO HADOOP
WHAT IS HADOOP?
• Flexible
– Structured/Unstructured
– Text/Binary
– Schema/Schema-less
• 100% Open Source
• Scalable
– Petabytes of Data
– Thousands of Nodes
Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/
How does an Elephant Sneak up on you?
EVOLUTION OF HADOOP
HADOOP ECOSYSTEM
Chukwa Sqoop Zookeeper Pig
HBase Avro Mahout Flume
Whirr
MapReduce Engine
Hama
Hive
Hadoop Distributed File System
Hadoop Common
Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
HADOOP ECOSYSTEM MAP
Hadoop Evolution – Map Explained!
• How did it all start? Huge data on the web!
• Nutch was built to crawl this web data
• Huge data had to be saved: HDFS was born!
• How to use this data? The MapReduce framework was built for coding and running analytics, in Java or
in any language via Hadoop Streaming
• How to get in unstructured data (web logs, click streams, Apache logs, server logs)?
FUSE, WebDAV, Chukwa, Flume, Scribe
• Hiho and Sqoop for loading data into HDFS: RDBMS can join the Hadoop bandwagon!
Continued
• High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql
• BI tools with advanced UI reporting (drill-down etc.): Intellicus
• Workflow tools over MapReduce processes and high-level languages: Oozie
• Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue, Karmasphere,
Eclipse plugin, Cacti, Ganglia
• Support frameworks: Avro (serialization), ZooKeeper (coordination)
• More high-level interfaces/uses: Mahout, Elastic MapReduce
• OLTP also possible: HBase
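The MapReduce model behind the framework in the bullets above can be illustrated with a word count. This is a minimal sketch, not Hadoop's own code: the function names (mapper, reducer, run_locally) are hypothetical, and the whole pipeline runs in one process. It follows the tab-separated key/value convention that Hadoop Streaming uses, where the mapper and reducer would normally be separate scripts reading standard input and writing standard output, with the framework sorting records by key in between.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Reduce phase: sum the counts of consecutive records sharing a key.

    Assumes records arrive sorted by key, as they would after the shuffle.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (stand-in for the shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))
```

Calling run_locally on a few lines of text returns one "word<TAB>count" record per distinct word. On a cluster the sort step is done by the framework across many nodes, so only the mapper and reducer logic would need to be supplied.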
• Distribute data initially
– Let processors / nodes work on local data
– Minimize data transfer over network
– Replicate data multiple times for increased availability
• Write applications at a high level
– Programmers should not have to worry about network programming, temporal
dependencies, low-level infrastructure, etc.
• Minimize talking between nodes (shared-nothing)
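The principles above (distribute data first, replicate it, and let nodes work on local data) can be sketched as a toy block placer and task scheduler. Everything here is an assumption for illustration: place_blocks and schedule_tasks are hypothetical names, and the round-robin placement is a simplification, since real HDFS placement also takes rack topology into account.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy used only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def schedule_tasks(placement, free_slots):
    """Give each block's task to a node that already stores the block.

    `free_slots` maps node -> number of tasks it can still accept. A task
    falls back to any node with a free slot (a remote read over the
    network) only when no replica holder has capacity.
    Returns block -> (node, is_local).
    """
    slots = dict(free_slots)
    assignment = {}
    for block, holders in placement.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        candidates = local or [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for block {block}")
        # Prefer the candidate with the most free slots.
        node = max(candidates, key=lambda n: slots[n])
        slots[node] -= 1
        assignment[block] = (node, node in holders)
    return assignment
```

With more replicas per block, the scheduler has more nodes to choose from that can read the block locally, which is one reason replication helps both availability and the goal of minimizing data transfer over the network.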
HADOOP: 30,000 FEET VIEW
BIG DATA & ANALYTICS
Case Studies
YAHOO - PERSONALIZATION
YAHOO SEARCH ASSIST
For a detailed description of the HADOOP ECOSYSTEM components,
check out our video on
Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033
For Individuals (+91) 9502334561/62
For Corporates (+91) 9618 483 483
Facebook: www.facebook.com/insofe
SlideShare: www.slideshare.net/INSOFE
International School of Engineering