the architecture of wemlin hub

Post on 27-Jun-2015

127 Views

Category:

Software

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Real-time systems always have an interesting architectures, and Wemlin Hub is no exception. In this talk we will examine this particular architecture of a sub-system of Netcetera’s Wemlin portfolio, a conduit responsible for gathering real-time passenger information data from transport authorities enriching it and consequently delivering them to people’s phones as fast as possible. A data hub for real-time passenger information, in short, which we call Wemlin Hub. The inspection of the broader context in which this sub-system operates in will uncover the key requirements on the basis of which the architectural decisions were being made. The carefully crafted restrictions and constraints eventually lead to greater freedom of change and recomposition in areas where this freedom is traditionally hard to win. We will see how this architecture allows us low cost zero-downtime system updates; graceful recovery from dependent on external system failures; horizontal and vertical scaling and most importantly, non-stop operation at low costs.

TRANSCRIPT

The Architecture of Wemlin HubOgnen Ivanovski, Netcetera Jazoon ‘14

Wemlin

Data

Data

Planning Software

Planning Software

AVCS

AVCS

Wemlin Hub

gWemlin Hub

Plan Data

transport schedule (plan) over certain time

package in nature

Formats: HAFAS, GTFS, VDV 453 REF, VDV 454 REF, REST

RT Streams

updates on planned data

stream in nature

Formats: GTFS RT, VDV-453, VDV-454, REST

What does the hub do

Gets the data from the source

Figures out what the hell the source is talking about

Stores the data and passes it to the clients

Getting Data

different protocols & implementations

remote system availability issues

Figuring things out

different sources, no referential guarantees

every data source is different

data quality issues

Example: station refeferences

Example: trip references

Data Enrichment

line colors

location

station metadata (e.g. which lines stop there, which stations are near by)

Serve many users

Summary

standardized data formats (but implementations vary)

disparate systems (referencing is hard)

enrichment

Key ConcernsRT

low latency

throughput

available

excllent failure management !

Batch

throughput

Key Concerns

processing programs the same for both batch and RT data

Key Concerns

fast reaction to load changes

Architecture

Constraints

decisions hard to change

Microkernel Architectural Style

Pipelines

all components must hook up together via small set of interfaces/contracts

microkernel style

pure (no external dependencies)

has been done successfully many times (httpd, tomcat to name a few)

Pipelines

sink / tap

filter

transform

aggregate

tap

filter

transform

aggregate

sink

tap

sink

filter!public interface Filter { ! boolean accept(Object obj); !}

stateless

transform public interface Transformer { ! Object transform(Object original); !}

stateless

aggregate public interface Aggregator { ! Optional<?> aggregate(Object obj); !}

CompositionpipeElement = PipelineBuilder.from(gtfsInputJunction()) .transform(new StoppingPlaceResolver()) .filter(new InvalidStopsFilter()) .transform(new UnresolvedLineVehicleTypeAdder()) .transform(new UnresolvedLineNameAdjuster()) .transform(new LineResolver()) .transform( new LineColorsEnricher(...)) .aggregate(new CacheAggregator(cache())) .to(nullSink());

Model

immutable: model objects are values (changing means a new object)

algebraic: each object identity is defined by it's contents.

pure: in the sense of no external dependencies

Schedule

Trip

Stop

Station

Line

Projection

StoppingPlace

ArchitectureModel

Pure Java

Immutable

Algebraic

Inverse References

!

Pipeline

Functional Microkernel

Filter (stateless, pure function)

Transformer (stateless, pure function)

Aggregator (stateful, function)

Sink (consumer) / Tap (producer)

pipeElement = PipelineBuilder.from(gtfsInputJunction()) .transform(new StoppingPlaceResolver()) .filter(new InvalidStopsFilter()) .transform(new UnresolvedLineVehicleTypeAdder()) .transform(new UnresolvedLineNameAdjuster()) .transform(new LineResolver()) .transform( new LineColorsEnricher(...)) .aggregate(new CacheAggregator(cache())) .to(nullSink());

pipeline “compile” assembly

Assemblies

Spring Integration based

Thread-pool based

Fork-join based

Segments

parallel / serial

separated by queues

segmentation based on the used interfaces

Fork-Join

to be effective, one must batch

batching introduces latency (in RT)

trick

batch mode

RT mode

Fork-Join

RT mode

batch mode

Memoization at construction time

Problem: immutable objects —> lots of created objects

!

@DesignatedFactoryMethod

AspectJ runtime weaver

hornetbackends frontend

Scaling

Storage

In-memory custom store

replacable

just an Aggregator

makes blue-green deployments a beeze

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

Gain

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

Architecture is Important

It’s the set of constraints you choose for your system

It how those constraints work in concert

It happens on a smaller scale than you usually think

AcknowledgmentsClojure (especially Rich Hickey’s talk on values, state and identity)

Laminahttps://github.com/ztellman/lamina

Apache Stormhttps://storm.incubator.apache.org

Casadinghttp://www.cascading.org

Akka http://akka.io

Clojure Transducers

Thank You

ognen.ivanovski@netcetera.com

@ognenivanovski ?

top related