#geodesummit: combining stream processing and in-memory data grids for near-real time aggregation...
TRANSCRIPT
Combining Stream Processing and In-Memory Data
Grids for Near-Real Time Aggregation and Notifications
• @omallassi
• Principal Architect @ Murex
• Backbone for Capital Markets; Front-Office to Back-Office to Risk, across multiple asset classes
Post-trade and consolidated information are inputs for decision making. Cycle time tends to be more and more « near real time » (depending on the asset class).
Our mission:
- Process an increasing volume of trades/events
- Aggregate trade and event data based on use-case-specific criteria
- Accommodate real-time and historical data inputs
10,000-foot view of finance…
- Decision Making: determination of the best investments based on market trends and existing investments in the portfolio (FO)
- Acquisition & Verification: procurement of the assets, e.g. the gold, the equities, the futures, etc. (MO)
- Operation and Maintenance: management and use of assets (BO)
- Risk Management: risk control on these decisions
- Store immutable events
- Filter & aggregate these events based on the demanded perspectives, on real-time or historical events
- Notify about updates on aggregates
Solution Summary
« As A Service »: perspectives can be requested at any time, on any type of event. Be scalable, resilient to failure, and ensure low latency (sub-millisecond).
And, of course…
Flexibility: this is a framework to build and manage perspectives
Historical and real-time events:
- are stored in an Event Log; each event is identified by a unique and strictly monotonic offset
- are aggregated through the same graph of computation (DAG)
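A minimal plain-Java sketch of the Event Log idea described above (names and structure are hypothetical, not the talk's actual implementation): every appended event gets a unique, strictly monotonic offset, so real-time consumers and historical replays see the same ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical append-only event log: each event receives a unique,
// strictly monotonic offset, so historical replay and real-time
// consumption go through the same ordered sequence.
final class EventLog {
    record Event(long offset, String payload) {}

    private final AtomicLong nextOffset = new AtomicLong(0);
    private final List<Event> store = new ArrayList<>();

    synchronized Event append(String payload) {
        Event e = new Event(nextOffset.getAndIncrement(), payload);
        store.add(e);
        return e;
    }

    // Replay every event with offset >= from, in order (historical reads).
    synchronized List<Event> replayFrom(long from) {
        return store.stream().filter(e -> e.offset() >= from).toList();
    }
}

public class EventLogDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("trade-1");
        log.append("trade-2");
        EventLog.Event e = log.append("trade-3");
        System.out.println(e.offset());            // 2: strictly monotonic
        System.out.println(log.replayFrom(1).size()); // 2: offsets 1 and 2
    }
}
```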
- Ensure horizontal scalability (and distribution)
- Avoid locking and move back to a single-threaded model (per aggregate)
- Limit the number of TCP hops
- Limit the usage of disk
Key Architectural Principles
High Level Architecture
Apache Geode (Continuous Query), Aeron, Apache Storm
Perspectives are described using a custom DSL (on top of Storm Flux + JEXL)
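The real DSL sits on top of Storm Flux and JEXL; as a rough, hypothetical stand-in (plain Java, no Flux/JEXL dependency), here is how a tiny where-clause of such a DSL could be compiled into a predicate over an event:

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the DSL layer: the actual framework uses
// Storm Flux + JEXL expressions; here a tiny "field = value" where-clause
// is compiled into a predicate over an event represented as a map.
public class DslSketch {
    static Predicate<Map<String, Object>> compileWhere(String clause) {
        String[] parts = clause.split("=", 2);   // e.g. "assetClass = FX"
        String field = parts[0].trim();
        String expected = parts[1].trim();
        return event -> expected.equals(String.valueOf(event.get(field)));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> p = compileWhere("assetClass = FX");
        Map<String, Object> fx = Map.of("assetClass", "FX");
        Map<String, Object> eq = Map.of("assetClass", "Equity");
        System.out.println(p.test(fx)); // true
        System.out.println(p.test(eq)); // false
    }
}
```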
Apache Storm: stream processing engine
- Not micro-batch
- Aggregations are expressed as a (distributed) DAG
- The framework ensures routing of the events, based on groupBy, to well-known threads
- Routing strategies can be custom
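The routing idea above can be sketched in plain Java (this is an assumption-laden illustration, not Storm's actual grouping API): every event carrying the same groupBy key is mapped to the same task index, so each aggregate is always updated by one well-known thread.

```java
// Sketch of fields-grouping-style routing (plain Java, not the Storm API):
// events with the same groupBy key always hash to the same task index,
// preserving the single-threaded model per aggregate.
public class FieldsGroupingSketch {
    static int chooseTask(Object groupByKey, int numTasks) {
        // Math.floorMod keeps the index non-negative for any hash code.
        return Math.floorMod(groupByKey.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int tasks = 4;
        int a = chooseTask("portfolio-42", tasks);
        int b = chooseTask("portfolio-42", tasks);
        System.out.println(a == b);              // same key -> same task, always
        System.out.println(a >= 0 && a < tasks); // index stays in range
    }
}
```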
Apache Storm 101
select * from source where …
group by x.y.z
- The framework on which the Event Log is built
- High availability and resilience
- Horizontal scalability and distribution
- Control of data partitioning and region collocation
- Advanced storage configuration: in-memory, overflow to disk, etc.
- Advanced notifications (via Continuous Queries)
Why Apache Geode?
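The advanced notifications mentioned above rely on Geode Continuous Queries, which evaluate an OQL query server-side and push matching updates to listeners. As a plain-Java sketch of that subscription pattern (hypothetical names, deliberately not the Geode API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of a continuous-query-style subscription: a
// predicate is checked on every aggregate update and matching updates
// fan out to registered listeners (Geode's real CQs do this server-side).
class ContinuousQuerySketch {
    record Aggregate(String key, double value) {}

    private final Predicate<Aggregate> filter;
    private final List<Consumer<Aggregate>> listeners = new ArrayList<>();

    ContinuousQuerySketch(Predicate<Aggregate> filter) { this.filter = filter; }

    void addListener(Consumer<Aggregate> l) { listeners.add(l); }

    // Called on each aggregate update; matching updates notify listeners.
    void onUpdate(Aggregate a) {
        if (filter.test(a)) listeners.forEach(l -> l.accept(a));
    }

    public static void main(String[] args) {
        var cq = new ContinuousQuerySketch(a -> a.value() > 100.0);
        List<String> notified = new ArrayList<>();
        cq.addListener(a -> notified.add(a.key()));
        cq.onUpdate(new Aggregate("book-A", 50.0));  // filtered out
        cq.onUpdate(new Aggregate("book-B", 250.0)); // notifies
        System.out.println(notified);                // [book-B]
    }
}
```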
- Distributed & scalable « query engine » (DAG)
- Routing of events through the DAG
- Cluster management
- On-demand perspective deployment
- Resilience (failed engines are automatically restarted)
Why Apache Storm?
- Storm / Geode are running in their dedicated JVMs
- Storm groupings ensure distribution across multiple threads and multiple JVMs
- Single-threaded model
- Horizontal scalability with the number of threads / JVMs
- Multiple TCP hops
« Usual » Deployment Pattern
« Low latency » Deployment Pattern
- Storm/Geode collocated inside the same JVM
- Events are routed to the right JVM based on a routing key
- Use the first element of groupBy as PartitionResolver
- Storm custom groupings enable multi-threading
- Single-threaded model (per aggregate)
- Regions (events, aggregates) are collocated
- Horizontal scalability with the number of threads / JVMs
- Limited and known number of TCP hops
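The collocation trick above can be sketched in plain Java (assumed names; the real system implements a Geode PartitionResolver and a Storm custom grouping): the first element of the groupBy clause doubles as the routing key, so the Geode partition owning an aggregate and the Storm task updating it derive their placement from the same value.

```java
import java.util.List;

// Hypothetical sketch: both the Storm grouping and the Geode partitioning
// derive placement from the first groupBy element, so the thread updating
// an aggregate and the region entry storing it land in the same JVM.
public class RoutingKeySketch {
    static String routingKey(List<String> groupBy) {
        return groupBy.get(0); // first groupBy element, as on the slide
    }

    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        List<String> groupBy = List.of("portfolio", "currency");
        String key = routingKey(groupBy);
        // Same key on both sides -> Storm task and Geode partition agree,
        // keeping the number of TCP hops limited and known.
        System.out.println(key); // portfolio
        System.out.println(partitionFor(key, 8) == partitionFor(key, 8)); // true
    }
}
```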
- The Event Log provides a way to work on real-time and historical data with the same code
- Collocation of Storm and Geode is powerful
- This is a powerful and general pattern implementation which gives us an efficient and open framework
- Efficiency: performance requirements are reached
- Openness: the DAG can easily be extended with CEP engines or rules engines; notifications are based on solutions like Aeron or Geode Continuous Queries
To conclude
Join the Apache Geode Community Today!
• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/