Committed to Deliver…

Page 1: Committed to Deliver…

Page 2:

We are leaders in the Hadoop ecosystem. We support, maintain, and monitor Hadoop and provide services on top of it, whether you run Apache Hadoop, the Facebook distribution, or the Cloudera distribution in your own data center, or on a cluster of machines on Amazon EC2, Rackspace, etc.

We provide a scalable end-to-end solution: one that scales to large data sets (terabytes or petabytes).

Low-cost solution: based on open-source frameworks of the kind used by Google, Yahoo, and Facebook.

Solutions optimized to meet tight SLAs and maximize performance.

Page 3:

– Project Initiation
  • Project planning
  • Requirement collection
  • Proof of concept (POC) using Hadoop technology

– Team Building
  • Highly skilled Hadoop experts
  • A dedicated team for the project

– Agile Methodology
  • Small iterations
  • Easy to implement changing requirements

– Support
  • Long-term relationship to support the developed product
  • Scope to change based on business/technical needs

Page 4:

Our combined experience has led us to adopt a methodology that ensures quality work. We:

• Evaluate the available hardware and understand the client's requirements.
• Peek through the data: analyze it, prototype using Map/Reduce code, and show the results to our clients. We iterate and continuously improve, developing a better understanding of the data.
• Develop the various tasks in parallel:
  ◦ Data collection
  ◦ Data storage in HDFS
  ◦ Map/Reduce analytics jobs
  ◦ A scheduler to run Map/Reduce jobs and coordinate them
  ◦ Transformation of the output into OLAP cubes (dimension and fact tables)
  ◦ A custom interface to retrieve the Map/Reduce output
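The "prototype using Map/Reduce code" step above can be sketched with a tiny in-memory Map/Reduce harness. This is a hypothetical illustration only; a real POC would run the same mapper and reducer as a Hadoop job over HDFS data:

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a Hadoop Map/Reduce job:
    map each record to (key, value) pairs, shuffle by key,
    then reduce each key's values to a final result."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: count orders per site -- a typical first POC query.
orders = [
    {"site": "web", "amount": 100},
    {"site": "mobile", "amount": 50},
    {"site": "web", "amount": 75},
]
counts = run_mapreduce(orders,
                       mapper=lambda r: [(r["site"], 1)],
                       reducer=lambda k, vs: sum(vs))
print(counts)  # {'web': 2, 'mobile': 1}
```

The same mapper/reducer pair ports directly to Hadoop Streaming once the prototype's output looks right to the client.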

Page 5:

We are experts in time-series data; in other words, we receive time-stamped data.

We have ample experience writing efficient, fast, and robust Map/Reduce code that implements ETL functions.

We have hardened Hadoop to enterprise standards, providing features like high availability, data collection, and data merging.

Writing Map/Reduce jobs is not enough: we wrote layers on top of Hadoop that use Hive and Pig to transform data into OLAP cubes for easy UI consumption.
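A minimal sketch of what one such ETL map function over time-stamped records might look like. The `epoch,metric,value` line layout is an illustrative assumption; the deck does not describe the real feed formats:

```python
from datetime import datetime, timezone

def etl_map(line):
    """Map stage of a hypothetical ETL job over time-stamped
    records: parse a raw CSV line, drop malformed rows, and
    emit ((day, metric), value) pairs keyed for daily aggregation.
    The 'epoch,metric,value' layout is an illustrative assumption."""
    try:
        epoch, metric, value = line.strip().split(",")
        day = datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%Y-%m-%d")
        return [((day, metric), float(value))]
    except ValueError:
        return []  # skip bad input rather than failing the whole job

pairs = etl_map("1300000000,bytes_out,512.0")
print(pairs)  # [(('2011-03-13', 'bytes_out'), 512.0)]
```

Tolerating malformed rows in the mapper, rather than raising, is what keeps a long-running ETL job robust against dirty input.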

Page 6:

The following pages give a brief overview of our client work.

Page 7:

[Architecture diagram: a Collector feeds the Hadoop Cluster; Map/Reduce output and training data are exposed through a Thrift service to a Web UI display.]

Page 8:

[Pipeline diagram: External News Collector; Map/Reduce (filtering, term-frequency collection); Map/Reduce (training set) with training data; Map/Reduce categorization index; DFS client; Hive interface.]

Page 9:

We were asked to analyze a client's sales data and extract valuable information from it.

The data was in a 9-tuple format: <OrderID, EmailID, MobileNum, ProductID, PayableAmount, DeliveryCharges, ModeOfPayment, OrderStatus, OrderSite>.

We were asked to provide information such as the unique subscriber count (by email address used) and the per-day transaction amount.

We deployed a Hadoop cluster on three machines:
◦ Deployed our collector to pump data from the DB into HDFS.
◦ Wrote Map/Reduce jobs to generate OLAP cubes.
◦ Provided a Hive interface to extract the results and show them in the UI.
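The two requested reports can be sketched as a single pass over the order records. The field names below are assumptions derived from the 9-tuple above, and the in-memory loop stands in for the Map/Reduce jobs that ran on the cluster:

```python
from collections import defaultdict

def sales_stats(orders):
    """Compute the two requested reports over order records:
    unique subscriber count (distinct email addresses) and
    total transaction amount per day. Runs in memory here;
    on the cluster the same logic was expressed as M/R jobs."""
    emails = set()
    per_day = defaultdict(float)
    for order in orders:
        emails.add(order["email"])
        per_day[order["day"]] += order["payable_amount"]
    return len(emails), dict(per_day)

orders = [
    {"email": "a@x.com", "day": "2011-01-01", "payable_amount": 10.0},
    {"email": "b@x.com", "day": "2011-01-01", "payable_amount": 5.0},
    {"email": "a@x.com", "day": "2011-01-02", "payable_amount": 7.5},
]
unique_subscribers, daily_totals = sales_stats(orders)
print(unique_subscribers, daily_totals)  # 2 {'2011-01-01': 15.0, '2011-01-02': 7.5}
```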

Page 10:

Input record fields: OrderID | EmailID | MobileNum | PayableAmount | DeliveryCharges | ModeOfPayment | OrderStatus | OrderSite

Day-granularity cube: Actual Number of Customers | Forecast Number of Customers | Total Aggregated Amount | Forecast Aggregated Amount

Per-subscriber cube: EmailID | Payable Amount
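The deck does not say how the "forecast" columns of the cube were produced. As one plausible, purely illustrative choice, a simple moving-average forecast could fill them from the actual daily figures:

```python
def forecast_next(history, window=3):
    """Forecast the next day's value as the mean of the last
    `window` observed values. The deck does not state how the
    'forecast' columns were produced; a moving average is just
    one plausible, illustrative choice."""
    recent = history[-window:]
    return sum(recent) / len(recent)

customers_per_day = [100, 120, 110]
print(forecast_next(customers_per_day))  # 110.0
```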

Page 11:

We delivered an end-to-end reporting solution to Guavus.

The data was provided by the Sprint network (a Tier 1 company); we had to develop a reporting engine to analyze it and generate OLAP cubes.

We were asked to evaluate petabytes of data and provide an ETL solution.

We deployed a Hadoop cluster on 10 Linux machines.

We wrote a collector that read binary data and pushed it into the Hadoop cluster.

We wrote Map/Reduce jobs (which run for about 4 hours) every day; the goal was to provide analytics on stream data.

We generate OLAP cubes and store the results in Infinity DB (a column database) and Hive.
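A sketch of what the decoding step of such a binary collector might look like. The 12-byte record layout (big-endian u32 timestamp, u32 subscriber id, f32 amount) is a hypothetical stand-in; the actual feed format is not described in the deck:

```python
import struct

def parse_binary_records(blob):
    """Decode a stream of fixed-width binary records before
    pushing them into HDFS. The 12-byte layout (u32 timestamp,
    u32 subscriber id, f32 amount, big-endian) is a hypothetical
    stand-in for the proprietary feed format."""
    record = struct.Struct(">IIf")
    usable = len(blob) - len(blob) % record.size  # ignore a trailing partial record
    out = []
    for offset in range(0, usable, record.size):
        ts, sub_id, amount = record.unpack_from(blob, offset)
        out.append({"ts": ts, "subscriber": sub_id, "amount": amount})
    return out

blob = struct.pack(">IIf", 1300000000, 42, 1.5) + struct.pack(">IIf", 1300000060, 7, 2.0)
records = parse_binary_records(blob)
print(len(records), records[0]["subscriber"])  # 2 42
```

Fixed-width records let the collector process the stream without buffering whole files, which matters when the daily volume is measured in terabytes.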

Page 12:

[Architecture diagram: Reporting UI / Web Interface; Report Generation Task (Map/Reduce framework); Data Collector; Query Engine; Hadoop Configuration; Distributed Storage Framework (Hadoop / HDFS); Infinity DB / Hive / Pig.]

Page 13:

[Deployment diagram: multiple Data Collectors feed the Hadoop infrastructure; Map/Reduce tasks run under a monitor / overall scheduler; results land in Infinity DB / Hive / Pig and reach the UI display via the Rubix framework.]

Page 14:

For HT we are developing a syndication clustering algorithm.

We have a large collection of old news documents and were asked to cluster them; clustering them manually was nearly impossible.

We implemented a clustering algorithm in Map/Reduce using cosine similarity and clustered the documents.
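The core of the approach can be sketched as follows: represent each document as a term-frequency vector and use cosine similarity to find the most closely related pair. This is a minimal in-memory illustration, not the production Map/Reduce job:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors
    (Counters): 1.0 means identical direction, 0.0 orthogonal."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_pair(docs):
    """Find the most similar pair of documents -- the step a
    clustering pass repeats to grow groups of related stories."""
    vectors = [Counter(d.lower().split()) for d in docs]
    return max(((i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))),
               key=lambda ij: cosine_similarity(vectors[ij[0]], vectors[ij[1]]))

docs = [
    "election results announced today",
    "cricket match ends in draw",
    "final election results are announced",
]
print(closest_pair(docs))  # (0, 2)
```

In the Map/Reduce version, the map stage emits pairwise similarities and the reduce stage selects the minimum-distance pairs, as the diagram on the next page indicates.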

Page 15:

[Pipeline diagram: a list of XML news files is transformed into integer vectors V1, V2, …, VN (one XML news file maps to one vector); the Map stage applies cosine similarity between vectors; the Reduce stage picks the minimum-distance pair of vectors and builds a list of closely related stories, all on the Hadoop platform. Related stages: clustering algorithm, C-Bayes classification, document categorization.]

Page 16:

Office Locations:
India: A-82, Sector 57, Noida, UP, 201301
Japan: 2-8-6-405, Higashi Tabata, Kita-ku, Tokyo, Japan
General Inquiries: [email protected]
Sales Inquiries: [email protected]