Big Data & Hadoop


Page 1: Big Data & Hadoop

Page 2: Big Data & Hadoop

Course Topics

Week 1
- Understanding Big Data
- Introduction to HDFS
- Playing around with a cluster
- Data loading techniques

Week 2
- MapReduce basics, types, and formats
- Use cases for MapReduce
- Analytics using Pig
- Understanding Pig Latin

Week 3
- Analytics using Hive
- Understanding HiveQL
- NoSQL databases
- Understanding HBase

Week 4
- Zookeeper, Sqoop, Flume
- Debugging MapReduce programs in Eclipse
- Real-world datasets and analysis
- Planning a career in Big Data

Page 3: Big Data & Hadoop

What is Big Data?

Page 4: Big Data & Hadoop

Facebook Example

Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network. On average, 3.2 billion likes and comments are posted every day.

Page 5: Big Data & Hadoop

Twitter Example

- Twitter has over 500 million registered users.
- The USA leads with 141.8 million accounts, 27.4 percent of all Twitter users, well ahead of Brazil, Japan, the UK, and Indonesia.
- 79% of US Twitter users are more likely to recommend brands they follow.
- 67% of US Twitter users are more likely to buy from brands they follow.
- 57% of all companies that use social media for business use Twitter.

Page 6: Big Data & Hadoop

Other Industry Use Cases

- Insurance
- Healthcare
- Genome sequencing
- Utilities

Page 7: Big Data & Hadoop

Hadoop Users

http://wiki.apache.org/hadoop/PoweredBy

Page 8: Big Data & Hadoop

Data volume is growing exponentially

- Estimated global data volume:
  - 2011: 1.8 ZB
  - 2015: 7.9 ZB
- The world's information doubles every two years.
- Over the next 10 years:
  - The number of servers worldwide will grow by 10x.
  - The amount of information managed by enterprise data centers will grow by 50x.
  - The number of "files" enterprise data centers handle will grow by 75x.

Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study

Page 9: Big Data & Hadoop

Unstructured data is exploding

Page 10: Big Data & Hadoop

Why DFS?

Read 1 TB of data:
- 1 machine: 4 I/O channels, 100 MB/s per channel
- 10 machines: 4 I/O channels each, 100 MB/s per channel

Page 11: Big Data & Hadoop

Why DFS?

Read 1 TB of data:
- 1 machine (4 I/O channels at 100 MB/s each): 45 minutes
- 10 machines (4 I/O channels each, 100 MB/s per channel)

Page 12: Big Data & Hadoop

Why DFS?

Read 1 TB of data:
- 1 machine (4 I/O channels at 100 MB/s each): 45 minutes
- 10 machines (4 I/O channels each, 100 MB/s per channel): 4.5 minutes
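These timings follow from simple throughput arithmetic; as a back-of-the-envelope check (treating 1 TB as roughly $10^6$ MB):

$$
t_{1\ \text{machine}} = \frac{10^6\ \text{MB}}{4 \times 100\ \text{MB/s}} = 2500\ \text{s} \approx 42\ \text{min},
\qquad
t_{10\ \text{machines}} \approx \frac{2500\ \text{s}}{10} \approx 4.2\ \text{min},
$$

which the slides round to 45 and 4.5 minutes. Ten machines reading disjoint tenths of the data in parallel deliver roughly ten times the aggregate I/O throughput, and that is the core motivation for a distributed file system.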

Page 13: Big Data & Hadoop

What is a Distributed File System (DFS)?

Page 14: Big Data & Hadoop

What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

Companies using Hadoop:

- Yahoo
- Google
- Facebook
- Amazon
- AOL
- IBM
- And many more at http://wiki.apache.org/hadoop/PoweredBy
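To make the "simple programming model" concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The class names and input/output paths are illustrative, not part of the slides:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on mappers
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

It would be packaged into a jar and submitted with something like "hadoop jar wordcount.jar WordCount /input /output", where the two paths (illustrative here) live in HDFS.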

Page 15: Big Data & Hadoop

Hadoop Ecosystem

Page 16: Big Data & Hadoop

Hadoop Core Components:

- HDFS (Hadoop Distributed File System): storage
- MapReduce: processing
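To give the storage component a concrete face, here is a minimal sketch of writing and then reading a file through the HDFS Java API. The path is illustrative, and the Configuration is assumed to pick up a running cluster's settings from core-site.xml on the classpath:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
  public static void main(String[] args) throws Exception {
    // Reads fs.defaultFS and related settings from the cluster configuration.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Illustrative path; any HDFS location the user can write to works.
    Path file = new Path("/user/demo/hello.txt");

    // Write a small file, overwriting it if it already exists.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeBytes("hello, HDFS\n");
    }

    // Read the file back and print its first line.
    try (BufferedReader in =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      System.out.println(in.readLine());
    }
  }
}

The design point worth noticing is that the client code never says which machines hold the data: HDFS presents one file system namespace, and the framework handles block placement and replication underneath.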

Page 17: Big Data & Hadoop

Any questions? See you in the next class.

Thank you.
Sainagaraju Vaduka