Bizosys at Fifth Elephant


Posted on 12-Jul-2015


©2013 BIZOSYS TECHNOLOGIES PRIVATE LIMITED

15 Billion computations in 187 milliseconds with a Big Join in Hadoop

Business Drivers

1. Support 6 months of data as opposed to 2 days

2. Near real-time calculation with optimal infrastructure

The Use-case: Assessing Market Risk of an Investment Portfolio

Acc   Equity   Qty
A1    MSFT     100
A1    ORCL     500
A2    CISCO    400

X

Equity   Model1   Model2
MSFT     $78.00   $77.12
ORCL     $33.78   $31.09
CISCO    $32.12   $16.00

What is the total portfolio value for Model1?
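The question above amounts to joining the positions table with the price table on the equity symbol and summing quantity times price. A minimal Python sketch of that join, using only the figures on this slide; the names `positions`, `prices` and `portfolio_value` are illustrative and do not come from the talk:

```python
from decimal import Decimal

# Positions from the slide: (account, equity, quantity).
positions = [
    ("A1", "MSFT", 100),
    ("A1", "ORCL", 500),
    ("A2", "CISCO", 400),
]

# Price of each equity under each price model, as shown on the slide.
prices = {
    "MSFT": {"Model1": Decimal("78.00"), "Model2": Decimal("77.12")},
    "ORCL": {"Model1": Decimal("33.78"), "Model2": Decimal("31.09")},
    "CISCO": {"Model1": Decimal("32.12"), "Model2": Decimal("16.00")},
}


def portfolio_value(model):
    """Join positions with prices on equity and sum quantity * price."""
    return sum(qty * prices[equity][model] for _, equity, qty in positions)


print(portfolio_value("Model1"))  # 37538.00
```

With three positions the join is trivial; the rest of the deck is about doing the same lookup-and-multiply at a much larger scale.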

Problem with The Big Join:

Acc   Equity   Qty
A1    MSFT     100
A1    ORCL     500
A2    CISCO    400

X

Equity   Model1   Model2
MSFT     $78.00   $77.12
ORCL     $45.12   $49.77
CISCO    $32.12   $16.00

3M positions X 2M products * 5000 Models/Day

15 Billion Calculations
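The slide does not spell out how the 15 billion figure is derived. One reading, stated here as an assumption, is that every position is valued once under every price model. A short sketch of that arithmetic, with illustrative variable names:

```python
positions = 3_000_000     # 3M positions (from the slide)
products = 2_000_000      # 2M products (from the slide)
models_per_day = 5_000    # 5000 price models per day (from the slide)

# Assumption: one quantity * price calculation per position per model.
calculations = positions * models_per_day
print(f"{calculations:,}")  # 15,000,000,000

# Size of the price side of the join: one price per product per model.
price_entries = products * models_per_day
print(f"{price_entries:,}")  # 10,000,000,000
```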

Schema Design…

Price Model   DAY1                                                               DAY N
Model1        Product 1 - Price, Product 2 - Price, …, Product 2000000 - Price   …
…             …                                                                  …
Model 5000    …                                                                  …

Date          All Positions
XX-XXX-XXXX   Acc Id 1 – ProductId 1 - 23 stocks, …, Acc Id 22000 – ProductId 200000 - 111 stocks

Why is 1 price model packed in 1 HBase Cell?

[Chart: GBs required for Product-Price model data, axis from 0 to 600, comparing "2M Products in 1 Cell" with "2M Products in 2M Cells"]

Eventual Consistency Overhead

Get rid of “HBase Cell meta-data” payload
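A rough sketch of why packing helps. It assumes each price is stored as one 8-byte double at a position given by the product index, which is consistent with the "8 * 2 = 16M" figure used later in the deck but is not confirmed by the slides. The per-cell overhead uses the fixed fields of the HBase KeyValue layout (key length, value length, row length, family length, timestamp, key type) and ignores compression, block encoding and tags. The row key, family and qualifier lengths are hypothetical:

```python
import struct

N_PRODUCTS = 2_000_000   # products per price model (from the slides)
BYTES_PER_PRICE = 8      # assumption: one 8-byte double per product

# Fixed KeyValue fields: key len (4) + value len (4) + row len (2)
# + family len (1) + timestamp (8) + key type (1).
KEYVALUE_FIXED_BYTES = 20


def pack_prices(prices):
    """Pack prices into one byte string; product i sits at offset i * 8."""
    return struct.pack(f"<{len(prices)}d", *prices)


def price_at(blob, product_index):
    """Read one price back without unpacking the whole cell."""
    return struct.unpack_from("<d", blob, product_index * BYTES_PER_PRICE)[0]


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate stored size of one cell, in bytes."""
    return KEYVALUE_FIXED_BYTES + row_len + family_len + qualifier_len + value_len


# Hypothetical key sizes: 8-byte row key, 1-byte family, 4-byte qualifier.
one_cell = cell_size(8, 1, 4, N_PRODUCTS * BYTES_PER_PRICE)
many_cells = N_PRODUCTS * cell_size(8, 1, 4, BYTES_PER_PRICE)

print(f"{one_cell:,}")    # 16,000,033 bytes for one packed cell
print(f"{many_cells:,}")  # 82,000,000 bytes for 2M separate cells
```

Under these assumptions the key metadata, repeated two million times, outweighs the prices themselves several times over. The exact ratio depends on the real key sizes, which the slides do not give.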

Why is the Region Server set at 16*64 MB?

1 Thread per Price Model

64 Price Models/Machine

78 64-core machines** @ 78 Region Servers

Enable Parallel Computing

**This is based on the scalability factor from performance testing (150 ms per price model with parallel computing)
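The sizing figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers from the deck; the reading that 16 MB is the size of one packed price model (8 bytes times 2M products) is taken from the coprocessor slide, and the variable names are illustrative:

```python
MB_PER_MODEL = 16         # one packed price model: 8 * 2M, per the slides
MODELS_PER_MACHINE = 64   # one thread per price model on a 64-core machine
TOTAL_MODELS = 5000       # price models per day

# Data each Region Server works on at once: 16 * 64 MB.
memory_per_server_mb = MB_PER_MODEL * MODELS_PER_MACHINE
print(memory_per_server_mb)  # 1024

# Machines needed if every model gets its own core.
machines_exact = TOTAL_MODELS / MODELS_PER_MACHINE
print(machines_exact)  # 78.125
```

The exact quotient is slightly above the 78 machines quoted on the slide; the deck does not say how the remainder is handled.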

Why are HBase Coprocessors used?

[Diagram: regions spread across machines, each running an HBase Coprocessor; three Mappers pass the final output of models to one Reducer, which produces the Value @ Risk output for 1 day]

Region 1, Machine 1: 1 Cell = 1st Price Model = 2 Million product prices = 8 * 2 = 16M

Region 2, Machine 1: 1 Cell = 2nd Price Model = 2 Million product prices = 8 * 2 = 16M

Region 78, Machine 78: 1 Cell = 5000th Price Model = 2 Million product prices = 8 * 2 = 16M

Mappers: final output of models

Reducer: Value @ Risk output for 1 Day

Map-Reduce does not jam the network.
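The point of this slide, as far as it can be read from the diagram, is that the heavy multiplication happens next to the data and only one small result per price model crosses the network. The sketch below imitates that flow in plain Python with a thread pool standing in for the regions. It is a toy under stated assumptions, not HBase coprocessor code: the data reuses the three-equity example from the use-case slide, and `map_region` and `reduce_outputs` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy positions: product index -> total quantity held.
positions = {0: 100, 1: 500, 2: 400}

# Each "region" holds packed price models: model id -> prices by product index.
regions = [
    {"Model1": [78.00, 33.78, 32.12]},
    {"Model2": [77.12, 31.09, 16.00]},
]


def value_model(prices):
    """Portfolio value under one price model."""
    return sum(qty * prices[product] for product, qty in positions.items())


def map_region(region):
    """Runs next to the data: returns one number per model, not the prices."""
    return {model: value_model(prices) for model, prices in region.items()}


def reduce_outputs(partials):
    """Merge the small per-region results into one result set."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


with ThreadPoolExecutor() as pool:
    totals = reduce_outputs(pool.map(map_region, regions))

for model, total in sorted(totals.items()):
    print(f"{model}: {total:.2f}")
```

Each region ships back a handful of numbers instead of a 16 MB price vector, which is the property the slide summarises as not jamming the network. How the Value @ Risk figure is derived from the per-model values is not described in the deck and is left out here.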

Why is price-model-id stored as row-key?

Reading sequentially (HBase Scanner) is a lot faster than random row reads
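HBase stores rows sorted by row key in byte order, so price models keyed by id sit next to each other and a batch of them can be read with one seek followed by a contiguous scan. The sketch below imitates this with a sorted Python list. The key format `model-NNNN` and the zero padding are assumptions for illustration; the slides do not show the actual key layout.

```python
from bisect import bisect_left


def row_key(model_id):
    """Zero-padded so that byte order matches numeric order (an assumption)."""
    return f"model-{model_id:04d}".encode()


# A sorted list stands in for the sorted rows of an HBase table.
rows = sorted(row_key(i) for i in range(1, 5001))


def scan(start_id, stop_id):
    """Range scan: one seek to the start key, then a contiguous read.

    The stop key is exclusive, as with an HBase Scan.
    """
    lo = bisect_left(rows, row_key(start_id))
    hi = bisect_left(rows, row_key(stop_id))
    return rows[lo:hi]


batch = scan(1, 65)  # 64 consecutive price models
print(len(batch), batch[0], batch[-1])  # 64 b'model-0001' b'model-0064'

# Without padding, byte order and numeric order disagree.
print(sorted([b"model-2", b"model-10"]))  # [b'model-10', b'model-2']
```

A random-read design would issue one lookup per model instead, paying the seek cost every time.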

The Final Building Blocks

Batch Mode Indexing                      Real-Time computation
Hadoop Distributed File System
Hadoop Map-Reduce                        Hadoop HBase
HSearch Indexer                          HSearch Coprocessor
MR Indexing Job with Lucene Analyzers    VAR RealTime MR Plug-In
HSearch Adapter
VAR Computation Application

Why We Like HBase

• Scalable
• Real-Time
• Apache Licensed

Why We Built HSearch

• Search and Analysis inside Hadoop
• Real-time Map-Reduce
• Extreme Parallelization

WHY

• Index, search and analyze multi-structure big data in milliseconds.
• Search/analyze as events unfold - for any additions or changes at sources.
• Plug in custom algos/code with runtime data grouping and computing.

HOW

• Distribute index with auto-sharding and auto-replication - handle Big Data
• Parallelize indexing, searching, grouping - in milliseconds
• Binary serde, compress, (may encrypt) at storage and transmission - securely
• Cache everything - serving thousands of users
• Redundize everything - with very limited support engineers

Available on

Apache Licensed

hadoopsearch.net

https://github.com/bizosys/


For more information regarding Bizosys business, please write to sunil@bizosys.com

http://www.bizosys.com

