the architecture of wemlin hub

The Architecture of Wemlin HubOgnen Ivanovski, Netcetera Jazoon ‘14

Wemlin

Planning Software

Wemlin Hub

gWemlin Hub

Plan Data

transport schedule (plan) over certain time

package in nature

Formats: HAFAS, GTFS, VDV 453 REF, VDV 454 REF, REST

RT Streams

updates on planned data

stream in nature

Formats: GTFS RT, VDV-453, VDV-454, REST

What does the hub do

Gets the data from the source

Figures out what the hell the source is talking about

Stores the data and passes it to the clients

Getting Data

different protocols & implementations

remote system availability issues

Figuring things out

different sources, no referential guarantees

every data source is different

data quality issues

Example: station refeferences

Example: trip references

Data Enrichment

line colors

location

station metadata (e.g. which lines stop there, which stations are near by)

Serve many users

Summary

standardized data formats (but implementations vary)

disparate systems (referencing is hard)

enrichment

Key ConcernsRT

low latency

throughput

available

excllent failure management !

throughput

Key Concerns

processing programs the same for both batch and RT data

Key Concerns

fast reaction to load changes

Architecture

Constraints

decisions hard to change

Microkernel Architectural Style

Pipelines

all components must hook up together via small set of interfaces/contracts

microkernel style

pure (no external dependencies)

has been done successfully many times (httpd, tomcat to name a few)

Pipelines

sink / tap

filter

transform

aggregate

filter

transform

aggregate

filter!public interface Filter { ! boolean accept(Object obj); !}

stateless

transform public interface Transformer { ! Object transform(Object original); !}

stateless

aggregate public interface Aggregator { ! Optional<?> aggregate(Object obj); !}

CompositionpipeElement = PipelineBuilder.from(gtfsInputJunction()) .transform(new StoppingPlaceResolver()) .filter(new InvalidStopsFilter()) .transform(new UnresolvedLineVehicleTypeAdder()) .transform(new UnresolvedLineNameAdjuster()) .transform(new LineResolver()) .transform( new LineColorsEnricher(...)) .aggregate(new CacheAggregator(cache())) .to(nullSink());

immutable: model objects are values (changing means a new object)

algebraic: each object identity is defined by it's contents.

pure: in the sense of no external dependencies

Schedule

Station

Projection

StoppingPlace

ArchitectureModel

Pure Java

Immutable

Algebraic

Inverse References

Pipeline

Functional Microkernel

Filter (stateless, pure function)

Transformer (stateless, pure function)

Aggregator (stateful, function)

Sink (consumer) / Tap (producer)

pipeElement = PipelineBuilder.from(gtfsInputJunction()) .transform(new StoppingPlaceResolver()) .filter(new InvalidStopsFilter()) .transform(new UnresolvedLineVehicleTypeAdder()) .transform(new UnresolvedLineNameAdjuster()) .transform(new LineResolver()) .transform( new LineColorsEnricher(...)) .aggregate(new CacheAggregator(cache())) .to(nullSink());

pipeline “compile” assembly

Assemblies

Spring Integration based

Thread-pool based

Fork-join based

Segments

parallel / serial

separated by queues

segmentation based on the used interfaces

Fork-Join

to be effective, one must batch

batching introduces latency (in RT)

batch mode

RT mode

Fork-Join

RT mode

batch mode

Memoization at construction time

Problem: immutable objects —> lots of created objects

@DesignatedFactoryMethod

AspectJ runtime weaver

hornetbackends frontend

Scaling

Storage

In-memory custom store

replacable

just an Aggregator

makes blue-green deployments a beeze

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

(functional) microkernel style !

execution stragety freedom

parallelism

scalability

memoization at constructor time

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

parallelism

scalability

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

parallelism

scalability

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

parallelism

scalability

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

parallelism

scalability

cacheability

composability

WinsConstraint

immutable model

algebraic model

Inverse references

controlled state

parallelism

scalability

cacheability

composability

Architecture is Important

It’s the set of constraints you choose for your system

It how those constraints work in concert

It happens on a smaller scale than you usually think

AcknowledgmentsClojure (especially Rich Hickey’s talk on values, state and identity)

Laminahttps://github.com/ztellman/lamina

Apache Stormhttps://storm.incubator.apache.org

Casadinghttp://www.cascading.org

Akka http://akka.io

Clojure Transducers

Thank You

ognen.ivanovski@netcetera.com

@ognenivanovski ?

the architecture of wemlin hub

transformnew lineresolver

rt data

public interface filter

gtfs rt

public interface transformer

filternew invalidstopsfilter

created objects

immutable objects

Software

exploiting a modern data architecture · a data hub on...

ltec 4550 networking components. 2 networking components hub...

dell emc integrated system for microsoft azure stack hub ·...

reference architecture: lenovo intelligent insights with sap...

oracle innovation hub · 2018-09-04 · built on tech...

huddle hub one / huddle hub one+ - configuration guide...

architecture tools hub 3

frequently asked questions - ebdale hub - city of … ·...

cloudera enterprise data hub reference architecture for...

implementing arcgis server as a hub within an enterprise...

architecture of wemlin hub

power hub rajasthan : the future energry hub of india

greater texas center marketplace · a4.2 - hub - cross...

avionics architecture using can - currawong engineering ·...

eosc services & architecture: the eosc-hub approach ·...

architecture, urban design and the building industry:...

supply, installation & operations of satellite hub...

introduction to big datathousands hours of annotated data....

thursday 31 may 2018 · practice research symposium design...

central application hub with home page menu information...