Building An Elastic Real Time NoSQL Platform

Creating a platform for unlimited elastic computation power and storage

Upload: dfilppi

Post on 28-Jun-2015

TRANSCRIPT

Page 1: Building an elastic real time no sql platform

Creating a platform for unlimited elastic computation power and storage

Building An Elastic Real Time NoSQL Platform

Page 2: Building an elastic real time no sql platform

® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Motivation

• Complete elastic solution stack
• Applications that need massive “strategic” storage (disk-based NoSQL) and a real time (“tactical”) component
• Horizontally and vertically scalable
• Highly available
• Self healing
• Fault tolerant: suitable for a commodity hardware strategy
• Simplified management and monitoring, vs conventional, multi-product solutions

Page 3: Building an elastic real time no sql platform

What Is Real-Time?

• It’s all relative.
• In this context, it means “really fast”.
• How fast is really fast? Reads as low as 5 μs, and typically under 1 ms for a fully replicated write.

Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/

Page 4: Building an elastic real time no sql platform

Two Layer Approach

• Advantage: minimal “impedance mismatch” between layers.
– Both are NoSQL cluster technologies, with similar advantages.
• Grid layer serves as an in memory cache for interactive requests.
• Grid layer serves as a real time computation fabric for CEP, with a real time map/reduce capability that is limited to allocated memory.

[Diagram labels: Raw Event Stream (×3), In Memory Compute Cluster, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE (×2)]

Page 5: Building an elastic real time no sql platform

Two Layer Approach (continued)

• Grid layer doing CEP can act as a filter: many raw events get converted to semantic/business events, reducing meaningless data verbosity.
• Grid layer provides scalable messaging.
• NoSQL layer provides unlimited cheap storage on commodity hardware.
• NoSQL layer provides virtually unlimited scale processing power.

Page 6: Building an elastic real time no sql platform

Basics Of In Memory DataGrid Technology

• An In Memory Data Grid (IMDG) is a data store
• Grid just means “cluster”
• Data can be partitioned across cluster nodes
• Processing power near data storage
• Distributed hash table
• Application optimized data model denormalization
• Nodes are typically configured with one or more replicas (sound familiar yet?)
• Not a “cache”: a system of record, but can be used as a cache, or both
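The partitioning and replica points above can be illustrated with a small sketch. This is not how any particular grid product routes data; the class name `PartitionedStore`, the MD5-based routing, and the "next partition holds the backup" rule are all assumptions chosen to keep the example short.

```python
import hashlib

class PartitionedStore:
    """Toy in-memory store: each key maps to one primary partition plus backups."""

    def __init__(self, partitions=4, backups=1):
        self.partitions = partitions
        self.backups = backups
        # One dict per partition stands in for a cluster node's memory.
        self.nodes = [dict() for _ in range(partitions)]

    def _owners(self, key):
        # Stable hash so every client routes a key to the same partition.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        primary = int(digest, 16) % self.partitions
        # Assumption: backups live on the next partitions, wrapping around.
        return [(primary + i) % self.partitions for i in range(self.backups + 1)]

    def put(self, key, value):
        # Write to the primary and to every backup.
        for node in self._owners(key):
            self.nodes[node][key] = value

    def get(self, key, failed=()):
        # Read from the first owner that is not marked as failed.
        for node in self._owners(key):
            if node not in failed:
                return self.nodes[node].get(key)
        raise KeyError(key)

store = PartitionedStore(partitions=4, backups=1)
store.put("order-17", {"amount": 250})
primary = store._owners("order-17")[0]
# The value is still readable when the primary partition is unavailable.
survivor = store.get("order-17", failed={primary})
```

With one backup, every entry exists on exactly two partitions, which is what lets the read succeed after the primary is marked as failed.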

Page 7: Building an elastic real time no sql platform

Advanced Capabilities

• Business logic (code) co-resident with data shards
• Scalable messaging
• Dynamic code execution across cluster
• Multi-language support
• Object-oriented
• Document-oriented/schema free
• Multi-level indexing
• SQL Queries
• Full ACID transaction support
• Elastic scaling (automatic and manual)
• Write-behind persistence
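As a rough illustration of code running next to data shards and of in memory map/reduce, the sketch below counts event types per partition and then merges the partial results. The partition contents and the function names `count_types` and `merge` are made up for the example; in a real grid each "map" call would execute on the node that owns the partition rather than in one process.

```python
from collections import Counter
from functools import reduce

# Each list stands in for the events held in one partition's memory.
partitions = [
    [{"type": "order"}, {"type": "payment"}],
    [{"type": "order"}, {"type": "order"}],
    [{"type": "activation"}],
]

def count_types(local_events):
    # "Map" step: runs next to the data and sees only its own partition.
    return Counter(event["type"] for event in local_events)

def merge(left, right):
    # "Reduce" step: combines the small partial results on the caller side.
    return left + right

partials = [count_types(p) for p in partitions]
totals = reduce(merge, partials, Counter())
```

Only the small per-partition counters cross the network in this model, not the events themselves.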

Page 8: Building an elastic real time no sql platform

Features: IMDG vs NoSQL

[Diagram comparing Data Grid and Disk Based NoSQL. Feature labels:]

• Eventual/Tunable Consistency
• Fault Tolerant
• Highly Available
• Horizontally Scalable
• Low Latency
• Parallel Execution
• Code co-location
• Unlimited scale
• Service remoting
• Transactional
• Elastic
• Messaging
• Complex Event Processing
• Platform Independent
• Flexible Schema
• Cloud enabled
• Hadoop tools

Page 9: Building an elastic real time no sql platform

Vive La Difference

The IMDG complements a NoSQL store:
– Can serve as a short term request cache (side cache or inline)
– Can serve as a cache for map/reduce (MR) results
– Enables event driven architectures / CEP
– In memory map/reduce
– Very fast writes, regardless of NoSQL store
– Transactional layer: can essentially turn “eventual” consistency into pure transactional persistency without a performance hit
– Highly available and independently scalable

Page 10: Building an elastic real time no sql platform

A Complete Scalable Application Platform

[Diagram labels: Raw Event Stream (×3), In Memory Compute Cluster, Real Time Events (×2), Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE (×2)]

Page 11: Building an elastic real time no sql platform

Key Implementation Issues

• Grid must support reliable asynchronous persistence
– If not reliable: in-flight data is at risk. Ideally tunable to accommodate differing risk tolerance.
– If not asynchronous: too slow
– If not persistent: obviously nothing gets sent to disk
• To do more than a distributed cache, grid must support code and data partitioning
– Ideally, code is collocated in memory with data partition
– Needed to support CEP, application, and service remoting capabilities
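A minimal sketch of asynchronous (write-behind) persistence, assuming a plain dict as a stand-in for the NoSQL layer: the caller's write returns once memory and a FIFO queue are updated, and a background thread drains the queue to the backing store. The class `WriteBehindGrid` and its methods are hypothetical. The "reliable" part is deliberately not shown; a real grid would also replicate the pending queue to a backup node so in-flight data survives a failure.

```python
import queue
import threading

class WriteBehindGrid:
    def __init__(self, backing_store):
        self.memory = {}
        self.backing_store = backing_store   # hypothetical stand-in for the NoSQL layer
        self.pending = queue.Queue()          # FIFO of writes not yet persisted
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, key, value):
        # The caller returns as soon as memory and the queue are updated.
        self.memory[key] = value
        self.pending.put((key, value))

    def _drain(self):
        # Background loop: persist queued writes in arrival order.
        while True:
            key, value = self.pending.get()
            self.backing_store[key] = value
            self.pending.task_done()

    def flush(self):
        # Block until every queued write has reached the backing store.
        self.pending.join()

disk = {}
grid = WriteBehindGrid(disk)
for i in range(100):
    grid.write(f"event-{i}", i)
grid.flush()
```

The explicit `flush()` exists only so the example can be checked; in normal operation the application never waits on the backing store.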

Page 12: Building an elastic real time no sql platform

Key Implementation Issues (continued)

• Grid ideally supports FIFO entry ordering
– Key to using grid as a queue
– Key to scaling messaging without an additional tier
– Combined with co-located business logic, operates at memory speeds
• Write speed on the NoSQL layer
– Grid is, in effect, queuing entries to the NoSQL layer
– If the NoSQL layer cannot keep up, in memory grid backs up
– This behavior is an asset, unless an unanticipated, sustained flood occurs.
– The faster the write speed the better
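The difference between a short burst and a sustained flood can be shown with a toy simulation. All numbers here (arrival rates, write rate, capacity) are invented for illustration, and `simulate_backlog` is a hypothetical helper, not part of any product.

```python
from collections import deque

def simulate_backlog(arrivals_per_tick, writes_per_tick, capacity):
    """Return the backlog size after each tick and the number of rejected entries."""
    backlog = deque()
    history, rejected = [], 0
    for tick, arrivals in enumerate(arrivals_per_tick):
        for n in range(arrivals):
            if len(backlog) < capacity:
                backlog.append((tick, n))   # FIFO: oldest entry is written first
            else:
                rejected += 1               # grid memory is full
        # The NoSQL layer absorbs at most writes_per_tick entries per tick.
        for _ in range(min(writes_per_tick, len(backlog))):
            backlog.popleft()
        history.append(len(backlog))
    return history, rejected

# A short burst is absorbed by the grid and drains on its own.
burst_history, burst_rejected = simulate_backlog(
    [5, 20, 5, 5, 5], writes_per_tick=10, capacity=50)
# A sustained flood fills the grid and entries start being rejected.
flood_history, flood_rejected = simulate_backlog(
    [30] * 5, writes_per_tick=10, capacity=50)
```

In the burst case the backlog peaks and returns to zero with nothing lost; in the flood case the backlog stays pinned near capacity, which is the situation the slide warns about.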

Page 13: Building an elastic real time no sql platform

Use Case 1 – Event Cloud

Complex event processing

Collect events in real time
• Interactions
• Orders
• Bills
• Payments
• Activations
• …

Transform into decision factors
• Good customer
• Pays 3-6 days early
• Decreasing usage
• Missed payment
• Unusual bill
• App usage

Original events, possibly scrubbed or annotated, are passed through

Business logic derived “synthetic events” constructed from raw event stream. Possible rule engine integration (e.g. Drools).

Derived events and analytics passed on to NoSQL layer

Other events forwarded to external listeners, systems
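The step from raw events to decision factors might look something like the sketch below. The event fields (`due`, `paid`, `customer`) and the rule that a payment after its due date counts as a "Missed payment" are assumptions for illustration; a production system could delegate these rules to a rule engine instead of hard-coding them.

```python
from datetime import date

def derive_events(raw_events):
    """Yield each raw event unchanged, followed by any derived business events."""
    for event in raw_events:
        yield event                       # original events are passed through
        if event["type"] != "payment":
            continue
        days_early = (event["due"] - event["paid"]).days
        if 3 <= days_early <= 6:
            yield {"type": "derived", "customer": event["customer"],
                   "factor": "Pays 3-6 days early"}
        elif days_early < 0:
            # Assumption: a payment made after the due date counts as missed.
            yield {"type": "derived", "customer": event["customer"],
                   "factor": "Missed payment"}

raw = [
    {"type": "order", "customer": "c1"},
    {"type": "payment", "customer": "c1",
     "due": date(2011, 6, 10), "paid": date(2011, 6, 6)},
    {"type": "payment", "customer": "c2",
     "due": date(2011, 6, 10), "paid": date(2011, 6, 15)},
]
out = list(derive_events(raw))
derived = [e for e in out if e["type"] == "derived"]
```

The output stream contains the three original events plus two derived ones, so downstream storage receives both the raw record and the business-level interpretation.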

Page 14: Building an elastic real time no sql platform

Use Case 2 – Time Bounded

Time Bounded – suited to operations with daily business cycle (e.g. trading)

Current day (or other time period that will fit in memory) held in memory, along with related application state, caching etc…

Still streaming operations to underlying NoSQL platform, or hold for end of day flush if back end can’t write fast enough.

Supports application hosting, messaging, and complex event processing.

External clients are aware of “current day” store, vs archival.

Large scale reports/analytics run in background on NoSQL archive.
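A small sketch of the time bounded model, using the "hold for end of day flush" variant: the current period lives in memory, and a rollover moves it to the archive. `CurrentDayStore` and the dict-based archive are hypothetical stand-ins, and days are plain strings to keep the example short.

```python
class CurrentDayStore:
    def __init__(self, archive):
        self.archive = archive      # hypothetical stand-in for the NoSQL archive
        self.day = None
        self.entries = []

    def record(self, day, entry):
        if self.day is not None and day != self.day:
            self.end_of_day()        # business day rolled over
        self.day = day
        self.entries.append(entry)

    def end_of_day(self):
        # Flush the whole period to the archive, then free the memory.
        self.archive.setdefault(self.day, []).extend(self.entries)
        self.entries = []

    def query(self, day):
        # Clients choose between the in-memory "current day" and the archive.
        if day == self.day:
            return list(self.entries)
        return list(self.archive.get(day, []))

archive = {}
day_store = CurrentDayStore(archive)
day_store.record("2011-06-01", "trade-1")
day_store.record("2011-06-01", "trade-2")
day_store.record("2011-06-02", "trade-3")
current = day_store.query("2011-06-02")
previous = day_store.query("2011-06-01")
```

Queries for the current day never touch the archive, while older days are answered from it, which mirrors the client awareness described above.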

Page 15: Building an elastic real time no sql platform

Use Case 3 - LRU

Grid holds a subset of NoSQL store, and supports an LRU (least recently used) caching model.

In line or side-cache.

Appropriate only in cases where, like any cache, usage pattern does not generate many cache misses.

Still supports CEP, messaging, and computation scaling (provided grid product supports it).
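The LRU model can be sketched with the standard library's `OrderedDict`. The class `LruSideCache`, the dict standing in for the NoSQL store, and the tiny capacity are assumptions for the example; the hit and miss counters are there to make the point about usage patterns measurable.

```python
from collections import OrderedDict

class LruSideCache:
    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store    # hypothetical stand-in for the NoSQL store
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backing_store[key]       # slow path: read from the NoSQL store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

nosql = {f"k{i}": i for i in range(10)}
cache = LruSideCache(nosql, capacity=2)
for key in ["k1", "k2", "k1", "k3", "k2"]:
    cache.get(key)
```

With a working set larger than the cache, most of these reads fall through to the backing store, which is the access pattern the slide says to avoid.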

Page 16: Building an elastic real time no sql platform

Wishlist

• This platform concept is still at an early stage
• For Gigaspaces, integrations already exist for Cassandra and MongoDB.
• Customers are currently implementing solutions
• Stuff I’d like to see:
– Unified management and scaling. Shared infrastructure.
– Grid/NoSQL aware Hive façade that can run MR jobs on both. Perhaps integration with other Hadoop tools.
– Deeper integration, to further optimize write speed/capacity, and perhaps offload some in-memory aspects of the underlying NoSQL platform to minimize duplication and possibly optimize elasticity.

Page 17: Building an elastic real time no sql platform

Conclusion

• Two shared nothing “NoSQL” architectures complementing each other
• Fully elastic/scalable
• Ultra high performance/low latency combined with unlimited scale.
• Full application stack
• Highly reliable and self-healing
• Scalable complex event handling
• Multi-language
• Simple. Two products.
