Big Data Analysis for Page Ranking using Map/Reduce

R.Renuka, R.Vidhya Priya, III B.Sc., IT, S.F.R.College for Women, Sivakasi.

Upload: vidhya-kumar

Post on 08-Aug-2015



Overview

Introduction
What is Big Data?
Why Big Data?
4 V’s of Big Data
Big Data Analytics Technologies
Map/Reduce
Applications
Case Study
Conclusion

Introduction

Data have outgrown the storage and processing capabilities of a single host.

Fundamental challenges:
– how to store voluminous data,
– how to work with voluminous data, and
– how to understand data and turn it into a competitive advantage.

What is Big Data?

‘Big data’ is similar to ‘small data’, but bigger.

Handling bigger data requires different approaches: techniques, tools and architectures.

The aim is to solve new problems, and old problems in a better way.

The Blind men and the Elephant

Why Big Data?

Key enablers for the growth of “Big Data” are:

– Increase of processing power
– Increase of storage capacities
– Availability of data

4 V’s of Big Data

Big Data Analytics Technologies

Hadoop

PLATFORA

WibiData

PIG

Hive

MapReduce

NoSQL databases

Column-oriented databases

Hadoop

Hadoop is a distributed file system and data processing engine.

Hadoop has two components:
– The Hadoop Distributed File System (HDFS)
– The MapReduce programming model

Map / Reduce

A high-level abstraction framework for distributed processing of large datasets.

It provides fault tolerance and parallelization.

Computation consists of two phases: Map and Reduce.

It uses a master-slave architecture, and computation occurs on multiple slave nodes.

It tries to provide data locality as much as possible, that is, to run each computation on the node that already stores the data.

MR model

Map
– Process a key/value pair to generate intermediate key/value pairs

Reduce
– Merge all intermediate values associated with the same key

Users implement an interface of two primary methods:

1. Map: (key1, val1) → [(key2, val2)]
2. Reduce: (key2, [val2]) → [val3]
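The two methods above can be illustrated with a small single-process sketch. It is not Hadoop code: the driver `run_map_reduce`, the functions `map_fn` and `reduce_fn`, and the word-count task are hypothetical names and choices made for this example only. The driver groups intermediate values by key in memory, standing in for the shuffle step that a real cluster performs across nodes.

```python
from collections import defaultdict

def map_fn(key1, val1):
    """Map: (key1, val1) -> list of (key2, val2).
    Here: (document id, text) -> one (word, 1) pair per word."""
    return [(word.lower(), 1) for word in val1.split()]

def reduce_fn(key2, values):
    """Reduce: (key2, [val2]) -> [val3].
    Here: (word, [1, 1, ...]) -> [total count]."""
    return [sum(values)]

def run_map_reduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map every record, group the
    intermediate pairs by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key1, val1 in records:
        for key2, val2 in map_fn(key1, val1):
            intermediate[key2].append(val2)  # group values by key (shuffle)
    return {key2: reduce_fn(key2, vals) for key2, vals in intermediate.items()}

# Toy input, invented for illustration.
documents = [("doc1", "big data is big"), ("doc2", "data about data")]
counts = run_map_reduce(documents, map_fn, reduce_fn)
print(counts)
```

Because `map_fn` looks at one record at a time and `reduce_fn` looks at one key at a time, each call could run on a different node, which is what makes the model easy to parallelize.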

Applications

Homeland Security

Finance

Smarter Healthcare

Multi-channel sales

Telecom

Manufacturing

Traffic Control

Trading Analytics

Fraud and Risk

Log Analysis

Search Quality

Retail

Case Study
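The transcript does not preserve the content of the case study slide. Going only by the title of the talk, one iteration of page ranking might be expressed in the map/reduce style roughly as follows. Everything here is an assumption made for illustration and not the authors' case study: the function names `map_page`, `reduce_page` and `pagerank_iteration`, the three-page toy graph, the damping factor of 0.85, and the simplification that every page has at least one outgoing link.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the slides

def map_page(page, state):
    """Map: (page, (rank, out_links)) -> intermediate pairs.
    Emits the page's link list (so the graph survives the iteration)
    and an equal share of its rank to every page it links to."""
    rank, out_links = state
    pairs = [(page, ("links", out_links))]
    for target in out_links:
        pairs.append((target, ("rank", rank / len(out_links))))
    return pairs

def reduce_page(page, values, num_pages):
    """Reduce: (page, [values]) -> (new_rank, out_links).
    Sums the rank shares received and applies the damping formula."""
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    new_rank = (1 - DAMPING) / num_pages + DAMPING * total
    return (new_rank, out_links)

def pagerank_iteration(graph):
    """Hypothetical in-memory driver for one map/shuffle/reduce round."""
    grouped = defaultdict(list)
    for page, state in graph.items():
        for key, value in map_page(page, state):
            grouped[key].append(value)
    n = len(graph)
    return {page: reduce_page(page, vals, n) for page, vals in grouped.items()}

# Toy link graph, invented for illustration: A -> B, C; B -> C; C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
graph = {page: (1 / len(links), outs) for page, outs in links.items()}
for _ in range(20):
    graph = pagerank_iteration(graph)
ranks = {page: rank for page, (rank, _) in graph.items()}
print(ranks)
```

In this toy graph page C is linked from both A and B, so it ends up with a higher rank than B, which is linked only from A. A production job would also need to handle pages with no outgoing links and decide when to stop iterating.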

Conclusion

Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse. It’s about the ability to make better decisions and take meaningful actions at the right time.

Queries ??