#geodesummit: combining stream processing and in-memory data grids for near-real time aggregation...
TRANSCRIPT
Combining Stream Processing and In-Memory Data
Grids for Near-Real Time Aggregation and Notifications
• @omallassi
• Principal Architect @ Murex
• Backbone for Capital Markets; Front-Office to Back-Office to Risk, across multiple asset classes
Post-trade and consolidated information are inputs for decision making. Cycle time tends to be more and more « near real time » (depending on the asset class).
Our mission:
- Process an increasing volume of trades/events
- Aggregate trade and event data based on use-case-specific criteria
- Accommodate real-time and historical data inputs
10,000-foot view of finance…
- Decision Making: determination of the best investments based on market trends and existing investments in the portfolio (FO)
- Acquisition & Verification: procurement of the assets, e.g. the gold, the equities, the futures, etc. (MO)
- Operation and Maintenance: management and use of assets (BO)
- Risk Management: risk control on these decisions
- Store immutable events
- Filter & aggregate these events based on the demanded perspectives, on real-time or historical events
- Notify about updates on aggregates
Solution Summary
« As A Service »: perspectives can be requested at any time, on any type of event. Be scalable, resilient to failure, and ensure low latency (sub-millisecond).
And, of course…
Flexibility: this is a framework to build and manage perspectives
Historical and real-time events:
- are stored in an Event Log; each event is identified by a unique and strictly monotonic offset
- are aggregated through the same graph of computation (DAG)
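A minimal plain-Java sketch of the Event Log idea described above (names and structure are hypothetical, not the talk's actual implementation): every appended event gets a unique, strictly monotonic offset, so real-time consumers and historical replays see the same ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical append-only event log: each event receives a unique,
// strictly monotonic offset, so historical replay and real-time
// consumption go through the same ordered sequence.
final class EventLog {
    record Event(long offset, String payload) {}

    private final AtomicLong nextOffset = new AtomicLong(0);
    private final List<Event> store = new ArrayList<>();

    synchronized Event append(String payload) {
        Event e = new Event(nextOffset.getAndIncrement(), payload);
        store.add(e);
        return e;
    }

    // Replay every event with offset >= from, in order (historical reads).
    synchronized List<Event> replayFrom(long from) {
        return store.stream().filter(e -> e.offset() >= from).toList();
    }
}

public class EventLogDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("trade-1");
        log.append("trade-2");
        EventLog.Event e = log.append("trade-3");
        System.out.println(e.offset());            // 2: strictly monotonic
        System.out.println(log.replayFrom(1).size()); // 2: offsets 1 and 2
    }
}
```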
- Ensure horizontal scalability (and distribution)
- Avoid locking and move back to a single-threaded model (per aggregate)
- Limit the number of TCP hops
- Limit the usage of disk
Key Architectural Principles
High Level Architecture
Apache Geode (Continuous Query), Aeron, Apache Storm
Perspectives are described using a custom DSL (on top of Storm Flux + JEXL)
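The real DSL sits on top of Storm Flux and JEXL; as a rough, hypothetical stand-in (plain Java, no Flux/JEXL dependency), here is how a tiny where-clause of such a DSL could be compiled into a predicate over an event:

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the DSL layer: the actual framework uses
// Storm Flux + JEXL expressions; here a tiny "field = value" where-clause
// is compiled into a predicate over an event represented as a map.
public class DslSketch {
    static Predicate<Map<String, Object>> compileWhere(String clause) {
        String[] parts = clause.split("=", 2);   // e.g. "assetClass = FX"
        String field = parts[0].trim();
        String expected = parts[1].trim();
        return event -> expected.equals(String.valueOf(event.get(field)));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> p = compileWhere("assetClass = FX");
        Map<String, Object> fx = Map.of("assetClass", "FX");
        Map<String, Object> eq = Map.of("assetClass", "Equity");
        System.out.println(p.test(fx)); // true
        System.out.println(p.test(eq)); // false
    }
}
```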
Apache Storm: stream processing engine
- Not micro-batch
- Aggregations are expressed as a (distributed) DAG
- The framework ensures routing of the events, based on groupBy, to well-known threads
- Routing strategies can be custom
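The routing idea above can be sketched in plain Java (this is an assumption-laden illustration, not Storm's actual grouping API): every event carrying the same groupBy key is mapped to the same task index, so each aggregate is always updated by one well-known thread.

```java
// Sketch of fields-grouping-style routing (plain Java, not the Storm API):
// events with the same groupBy key always hash to the same task index,
// preserving the single-threaded model per aggregate.
public class FieldsGroupingSketch {
    static int chooseTask(Object groupByKey, int numTasks) {
        // Math.floorMod keeps the index non-negative for any hash code.
        return Math.floorMod(groupByKey.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int tasks = 4;
        int a = chooseTask("portfolio-42", tasks);
        int b = chooseTask("portfolio-42", tasks);
        System.out.println(a == b);              // same key -> same task, always
        System.out.println(a >= 0 && a < tasks); // index stays in range
    }
}
```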
Apache Storm 101
select * from source where …
group by x.y.z
- The framework on which the Event Log is built
- High availability and resilience
- Horizontal scalability and distribution
- Control of data partitioning and region collocation
- Advanced storage configuration: in-memory, overflow to disk, etc.
- Advanced notifications (via Continuous Queries)
Why Apache Geode?
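The advanced notifications mentioned above rely on Geode Continuous Queries, which evaluate an OQL query server-side and push matching updates to listeners. As a plain-Java sketch of that subscription pattern (hypothetical names, deliberately not the Geode API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of a continuous-query-style subscription: a
// predicate is checked on every aggregate update and matching updates
// fan out to registered listeners (Geode's real CQs do this server-side).
class ContinuousQuerySketch {
    record Aggregate(String key, double value) {}

    private final Predicate<Aggregate> filter;
    private final List<Consumer<Aggregate>> listeners = new ArrayList<>();

    ContinuousQuerySketch(Predicate<Aggregate> filter) { this.filter = filter; }

    void addListener(Consumer<Aggregate> l) { listeners.add(l); }

    // Called on each aggregate update; matching updates notify listeners.
    void onUpdate(Aggregate a) {
        if (filter.test(a)) listeners.forEach(l -> l.accept(a));
    }

    public static void main(String[] args) {
        var cq = new ContinuousQuerySketch(a -> a.value() > 100.0);
        List<String> notified = new ArrayList<>();
        cq.addListener(a -> notified.add(a.key()));
        cq.onUpdate(new Aggregate("book-A", 50.0));  // filtered out
        cq.onUpdate(new Aggregate("book-B", 250.0)); // notifies
        System.out.println(notified);                // [book-B]
    }
}
```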
- Distributed & scalable « query engine » (DAG)
- Routing of events through the DAG
- Cluster management
- On-demand perspective deployment
- Resilience (failed engines are automatically restarted)
Why Apache Storm?
- Storm / Geode are running in their dedicated JVMs
- Storm groupings ensure distribution across multiple threads and multiple JVMs
- Single-threaded model
- Horizontal scalability with the number of threads / JVMs
- Multiple TCP hops
« Usual » Deployment Pattern
« Low latency » Deployment Pattern
- Storm/Geode collocated inside the same JVM
- Events are routed to the right JVM based on a routing key
- Use the first element of groupBy as PartitionResolver
- Storm custom groupings enable multi-threading
- Single-threaded model (per aggregate)
- Regions (events, aggregates) are collocated
- Horizontal scalability with the number of threads / JVMs
- Limited and known number of TCP hops
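The collocation trick above can be sketched in plain Java (assumed names; the real system implements a Geode PartitionResolver and a Storm custom grouping): the first element of the groupBy clause doubles as the routing key, so the Geode partition owning an aggregate and the Storm task updating it derive their placement from the same value.

```java
import java.util.List;

// Hypothetical sketch: both the Storm grouping and the Geode partitioning
// derive placement from the first groupBy element, so the thread updating
// an aggregate and the region entry storing it land in the same JVM.
public class RoutingKeySketch {
    static String routingKey(List<String> groupBy) {
        return groupBy.get(0); // first groupBy element, as on the slide
    }

    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        List<String> groupBy = List.of("portfolio", "currency");
        String key = routingKey(groupBy);
        // Same key on both sides -> Storm task and Geode partition agree,
        // keeping the number of TCP hops limited and known.
        System.out.println(key); // portfolio
        System.out.println(partitionFor(key, 8) == partitionFor(key, 8)); // true
    }
}
```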
- The Event Log provides a way to work on real-time and historical data with the same code
- Collocation of Storm and Geode is powerful
- This is a powerful and general pattern implementation which gives us an efficient and open framework
- Efficiency: performance requirements are reached
- Openness: the DAG can easily be extended with CEP engines or rules engines; notifications are based on solutions like Aeron or Geode Continuous Queries
To conclude
Join the Apache Geode Community Today!
• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/