building reactive data-centric applications with vortex, apache spark and reactivex
TRANSCRIPT
Angelo Corsaro, PhD Chief Technology Officer
Reactive Data Centric Architectures with Vortex, Spark
and ReactiveX
Getting Reactive
Cop
yrig
ht P
rism
Tech
, 201
5
Up to recently Reactive Systems were mainstream only domains such as Industrial, Military, Telco, Finance and Aerospace
To address the challenges imposed by the Internet of Thing and Big Data Applications an increasing number of architects and developers is recognising the importance of the techniques and technologies used for building Reactive Systems
The Raising Importance of being Reactive
Cop
yrig
ht P
rism
Tech
, 201
5
The techniques and technologies traditionally used in a class of Reactive and Interactive Systems is gaining quite a bit of attention from the mainstream developers community
Recently, the Reactive Manifesto was defined to summarise the key traits of “Reactive Applications”
Reactive Manifesto
Responsive
Scalable
Event-Driven Resilient
http://www.reactivemanifesto.org/
Cop
yrig
ht P
rism
Tech
, 201
5
Event-Driven: modular, asynchronous
Responsive: Timely react to stimuli
Resilient: Tolerate failures functionally and ideally temporally
Scalable: Gracefully scale up and down depending on load demands
Reactive Manifesto
Responsive
Scalable
Event-Driven Resilient
http://www.reactivemanifesto.org/
Getting Data Centric
Cop
yrig
ht P
rism
Tech
, 201
5
The prevailing application-/service-centric mindset by focusing on the control-flow as opposed than the data-flow has lead to design systems that are hard to extend, scale and make fault-tolerant
An increasing number of architects has realised that the remedy is to change the main point of attention: Data is the centre of the universe — applications are ephemeral
The Importance of being Data-Centric
Cop
yrig
ht P
rism
Tech
, 201
5
Data-Centric Manifesto
http://datacentricmanifesto.org
the data-centric revolution puts data at the centre of the system
applications are optional visitors to the data
Cop
yrig
ht P
rism
Tech
, 201
5
Control-Flow-Centric Lightbulb
Example
Lightbulb- status(): bool- on():- off():- intensity(): float- intensity(i):
Cop
yrig
ht P
rism
Tech
, 201
5
Data-Flow-Centric Lightbulb
Example
Lightbulb- intensity: float- on: bool
Reactive and Data Centric Architectures
Cop
yrig
ht P
rism
Tech
, 201
5
Data-Centric: data is what matters. Application autonomously and asynchronously transform data
Responsive: Timely react to stimuli
Resilient: Tolerate failures functionally and ideally temporally
Scalable: Gracefully scale up and down depending on load demands
Reactive Data-Centric Manifesto
Responsive
Scalable
Data-Centric Resilient
http://www.reactivemanifesto.org/
Reactive Data-Centric Systems with Vortex
Step 1: Data Centricity in Vortex
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex provides a Distributed Data Space abstraction where applications can autonomously and asynchronously read and write data enjoying spatial and temporal decoupling
Its built-in dynamic discovery isolates applications from network topology and connectivity details
Vortex’ Data Space is completely decentralised
High Level Abstraction
DDS Global Data Space
...
Data Writer
Data Writer
Data Writer
Data Reader
Data Reader
Data Reader
Data Reader
Data Writer
TopicAQoS
TopicBQoS
TopicCQoS
TopicDQoS
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex supports the definition of Data Models.
These data models allow to naturally represent physical and virtual entities characterising the application domain
Vortex types are extensible and evolvable, thus allowing incremental updates and upgrades
Data Centricity
Cop
yrig
ht P
rism
Tech
, 201
5
A Topic defines a domain-wide information’s class
A Topic is defined by means of a (name, type, qos) tuple, where
• name: identifies the topic within the domain
• type: is the programming language type associated with the topic. Types are extensible and evolvable
• qos: is a collection of policies that express the non-functional properties of this topic, e.g. reliability, persistence, etc.
Topic
TopicTypeName
QoS
struct TemperatureSensor { @key long sid; float temp; float hum; }
Cop
yrig
ht P
rism
Tech
, 201
5
As explained in the previous slide a topic defines a class/type of information
Topics can be defined as Singleton or can have multiple Instances
Topic Instances are identified by means of the topic key
A Topic Key is identified by a tuple of attributes -- like in databases
Remarks: - A Singleton topic has a single domain-wide instance - A “regular” Topic can have as many instances as the number of different key
values, e.g., if the key is an 8-bit character then the topic can have 256 different instances
Topic and Instances
Cop
yrig
ht P
rism
Tech
, 201
5Vortex “knows” about application data types and uses this information provide type-safety and content-based routing
Content Awarenessstruct TemperatureSensor { @key long sid; float temp; float hum; }
sid temp hum101 25.3 0.6507 33.2 0.7913 27,5 0.551307 26.2 0.67
“temp > 25 OR hum >= 0.6”
sid temp hum101 25.3 0.6507 33.2 0.71307 26.2 0.67
Type
TempSensor
Cop
yrig
ht P
rism
Tech
, 201
5
For data to flow from a DataWriter (DW) to one or many DataReader (DR) a few conditions have to apply:
The DR and DW domain participants have to be in the same domain
The partition expression of the DR’s Subscriber and the DW’s Publisher should match (in terms of regular expression match)
The QoS Policies offered by the DW should exceed or match those requested by the DR
Quality of ServiceDomain
Participant
DURABILITY
OWENERSHIP
DEADLINE
LATENCY BUDGET
LIVELINESS
RELIABILITY
DEST. ORDER
Publisher
DataWriter
PARTITION
DataReader
Subscriber
DomainParticipant
offered QoS
Topicwrites reads
Domain Idjoins joins
produces-in consumes-from
RxO QoS Policies
requested QoS
Step 2: Reactive Architectures with Vortex
Cop
yrig
ht P
rism
Tech
, 201
5
Reactive Systems1
A Reactive System designates a permanently operating system that responds, reacts, to external stimuli at a speed determined by its environment, e.g. an industrial control system.
Reactive Systems
[1] Nicolas Halbwachs, Synchronous Programming of Reactive Systems
Cop
yrig
ht P
rism
Tech
, 201
5
Transformational Systems
A Transformational System designates a system that terminates after transforming its input into its output, e.g. a compiler
Transformational Systems
Cop
yrig
ht P
rism
Tech
, 201
5
Interactive Systems
An Interactive System designates a systems that permanently communicates with its environment at its own speed, e.g. a computer game
Interactive Systems
Cop
yrig
ht P
rism
Tech
, 201
5
Reactive react to an environment which “cannot wait”, e.g. reacting late has harsh consequences
response time needs to be deterministic
distributed concurrent systems
data-centric
Reactive vs. Interactive Systems
The term reactive systems is increasingly used to denote interactive systems with stringent response time, performance and scalability constraints.
Interactive react to an environment which “doesn’t like to wait”, e.g. reacting late induces QoS degradation
response time needs to be acceptable
distributed concurrent systems
data-centric
Cop
yrig
ht P
rism
Tech
, 201
5
Concurrency Aside from the concurrency between the system and its environment, it is natural to decompose these systems as an ensemble of concurrent components that cooperate to achieve the intended behaviour
Strict Temporal Requirements Reactive Systems have strict requirements with respect to the rate at which the need to process input as well as the response time for their outputs
Determinism The output of these systems are generally determined by the input and the occurrence time, e.g. no scheduling effects on the temporal properties of the output
Reliability Reactive Systems are often life/mission critical, as such reliability is of utmost importance
Reactive Systems Key Traits
Architectural Styles for Reactive Systems
Cop
yrig
ht P
rism
Tech
, 201
5
Stream Processing is a commonly adopted architectural style for building systems that operate over continuous (theoretically infinite) streams of data
Stream Processing is often applied in:
- Reactive Systems
- Interactive Systems
- Signal Processing Systems
- Functional Stream Programming
- Big Data Analytics
Stream Processing
Cop
yrig
ht P
rism
Tech
, 201
5
Stream Processing Architectures are ideal for modelling systems that react to streams of data produced by the cyber-phisical-systems, such as data produced by sensors, the stock market, etc.
Stream Processing Systems operate in real-time over streams and generate in turns other streams of data providing information on what is happening or suggesting actions to perform, such as by stock X, raise alarm Y, or detected spatial violation, etc.
Stream Processing
Cop
yrig
ht P
rism
Tech
, 201
5
Stream Processing Systems are typically modelled as collection of concurrent modules communicating via typed data channels
Modules usually play one of the following roles:
- Sources: Injecting data into the System
- Filters/Actors: Performing some computation over sources
- Sinks: Consuming the data produced by the system
Stream Processing
Filter
Source
FilterFilter
Stream Sink
Stream Processing with Vortex
Cop
yrig
ht P
rism
Tech
, 201
5
A stream represents a potentially infinite sequence of data samples of a given type T
As such, in Vortex, a stream can be naturally represented with a Topic
Example:
- Topic<PressureType>
- Topic<MarketData>
Streams in Vortex
Filter
FilterFilter
Stream
Cop
yrig
ht P
rism
Tech
, 201
5
Sources in Vortex are mapped to DataWriters for a given Topic
Sinks in Vortex are mapped to DataReaders for a given Topic
Source and Sinks in Vortex
Filter
Source
FilterFilter
Sink
Cop
yrig
ht P
rism
Tech
, 201
5
Filters implement bit and pieces of the application business logic
Each Filter has one or more input stream and one or more output streams
Obviously, a Filter consumes data from an input stream through a Vortex data reader and pushed data into an output stream through a Vortex data writer
Filters in Vortex
Filter
FilterFilter
Vortex Architectural Benefits
Cop
yrig
ht P
rism
Tech
, 201
5
Stream Processing Architectures in general and Vortex-based architecture in particular promote loose coupling
Anonymous Communication => No Spatial Coupling
Non-Blocking and Asynchronous Interaction => No control or temporal coupling
In Vortex the only thing that matters is the kind of information a module is interested in consuming or producing — who is producing the data or when the data has been produced is not relevant
Loose Coupling
Cop
yrig
ht P
rism
Tech
, 201
5
Location Transparency is an important architectural property as it makes it possible to completely decouple the logical decomposition of a system from its actual physical deployment
Vortex Automatic Discovery brings location transparency to the next level by enabling systems that are zero-conf, self-forming and self-healing
Location Transparency
Module
ModuleModule
Module
ModuleModule
Physical Deployment Unit
Logical Deployment Unit
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex provides full control over temporal decoupling by means of its Durability Policy
As such the data produced by a Source can be:
- Volatile: available for those that are around at the time at production
- Transient: available as far as the system/source is running
- Durable: available forever
Temporal Decoupling
tSource
t
t
Sink
Sink
Volatile Durability
tSource
t
t
Sink
Sink
Transient Durability
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex provides mechanism for detecting traditional faults as well as performance failures
The Fault-Detection mechanism is controlled by means of the Vortex Liveliness policy
Performance Failures can be detected using the Deadline Policy which allows to receive notification when data is not received within the expected delays
Failure Detection
Source
tSink
Fault Notification
TD
Source
tSink
Performance Failure Notification
P
t
P P
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex provides a built-in fault-masking mechanism that allow to replicate Sources and transparently switch over when a failure occurs
At any point in time the “active” source is the one with the highest strength. Where the strength is an integer parameter controller by the user
Fault-Masking
Source
tSink
Source t
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex provides a built-in fault-masking mechanism that allow to replicate Sources and transparently switch over when a failure occurs
At any point in time the “active” source is the one with the highest strength. Where the strength is an integer parameter controller by the user
Fault-Masking
Source
tSink
Source t
Cop
yrig
ht P
rism
Tech
, 201
5
Performance
Device-2-Device
• Peer-to-Peer Intra-core latency as low as 8 µs
• Peer-to-Peer latency as low as 30 µs
• Point-to-Point throughput well over 2.5M msg/sec
Device-2-Cloud
• Routing latency as low as 4 µs
• Linear scale out
• 44K* msgs/sec with a single router, 4x times more the average Tweets per second in the world (~6000 tweets/sec)!
*2048 bytes message payload
Cop
yrig
ht P
rism
Tech
, 201
5
Introduced in 2004, Vortex is an Object Management Group (OMG) Standard for efficient, secure and interoperable, platform- and programming-language independent data sharing
The Vortex Standard
Vortex has been recently recommended as one of the IIoT data-sharingplatform for the Industrial Internet Consortium Reference Architecture
Vortex and ReactiveX
Cop
yrig
ht P
rism
Tech
, 201
5
Reactive and Interactive Systems continuously interact with their environment
Stimuli coming from the cyber-physical world are often represented as events to be handled by the Reactive/Interactive System
- e.g. think about the event from a GUI, or the data available event in Vortex
Events are often handled through some form of callbacks, such as listeners, functors, etc. This stile of event handling leads to inversion of control
Dealing with Events
Cop
yrig
ht P
rism
Tech
, 201
5
The Inversion of Control induced by callback-based code in known to make applications brittle and harder to understand
As an mental exercise, think about the call-back style code you’d have to write to in Java for drawing when the mouse is being dragged. Although the logic is simple, it is far cry from having a “declarative style”
The problem of callback management, infamously known as Callback Hell, has become even more important due to the surge of Reactive Systems…
Can we escape the Callback Hell?
The Callback Hell
Cop
yrig
ht P
rism
Tech
, 201
5
Reactive Programming is a paradigm based on a relaxed form of Synchronous Dataflow Programming
The Reactive Programming is built around the notion of continuous time-varying values and propagation of change
Reactive Programming facilitates the declarative development of non-blocking event driven applications — in essence you express what should be done and the runtime decides when to do it
Reactive Programming was popularised in the context of Functional Programming Languages by the seminal done by Elliot and Hudak in 1997 as part of Fran — a framework for composing richly interactive multimedia animations
Reactive Programming
Cop
yrig
ht P
rism
Tech
, 201
5
There is an increasing number of Reactive Programming framework. The Rx framework introduced by Erik Meijer is becoming a reference
.NET Rx
Arguably the first framework that brought Reactive Programming to the masses. Available for the .NET platform
ReactiveX (http://reactivex.io)
An Open Source implementation of Rx for the JVM contributed by NetFlix, it supports, Java, Scala, Kotlin, Clojure, Groovy and JRuby
Reactive Programming Libraries
Cop
yrig
ht P
rism
Tech
, 201
5
ReactiveX is a Java implementation of .NET Reactive Extensions — a library for composing asynchronous and event-based programs by using observable sequences.
ReactiveX allow you to compose streams of data declaratively while abstracting away low level concerns such as threading, synchronization, concurrent data structures, and non-blocking I/O
The two main concepts at the foundation of ReactiveX are Observables and Observers
ReactiveX Overview
Cop
yrig
ht P
rism
Tech
, 201
5
ReactiveX Observables allow the composition of flows and sequences of asynchronous data
You can think of Observables as some kinds of asynchronous collections that push data to you
In essence an observable can be created from any synchronous or asynchronous stream of data.
- you can create an observables that represents the data coming into a Vortex Data Reader, the events from a Button, or even time
- you can also create an observable that represent an actual container
ReactiveX Observables
Cop
yrig
ht P
rism
Tech
, 201
5
Observable Examples
t1 2 3
val ticks: Observable[Long] = Observable.interval(1 seconds)
n4 50
t1 2 3
val evens: Observable[Long] = ticks.filter(n => n % 2 == 0)
n4 50 x x x
tval bufs: Observable[Seq[Long]] = evens.buffer(2,1)
n0 4 6 82
Cop
yrig
ht P
rism
Tech
, 201
5
The Observer receives notification concerning new values, errors or the completion of the stream — e.g. no more data
ReactiveX Observers
trait Observer[-‐T] { def onNext(value: T): Unit def onError(error: Throwable): Unit def onCompleted(): Unit }
Cop
yrig
ht P
rism
Tech
, 201
5
Beside from the standard ReactiveX/RxScala Observable, each framework that wants to integrate with ReactiveX/RxScala has to provide its own factory methods to create observables
For Vortex you have to use the DddsObservable defined as follows:
Vortex Observables
object DdsObservable {
def fromDataReaderData[T](dr: DataReader[T]): Observable[T]
def fromDataReaderEvents[T](dr: DataReader[T]): Observable[ReaderEvent[T]]
// more methods which we’ll ignore for the time being
}
Coding Lab
Cop
yrig
ht P
rism
Tech
, 201
5
We will use the omnipresent Shapes Application to illustrate the benefits of using Reactive Programming with Vortex
Three Topics
- Circle, Square, Triangle
One Type:
Shapes Application
struct ShapeType { string color; long x; long y; long shapesize; }; #pragma keylist ShapeType color
Spotted shapes represent subscriptions
Pierced shapes represent publications
Cop
yrig
ht P
rism
Tech
, 201
5
We want to show a triangle that positioned midway between every single pair of circle and square of the same colour
Problem 1: Shape in the Middle
Cop
yrig
ht P
rism
Tech
, 201
5
Logically
Shapes Marble Diagrams
t
t
• Conceptually, we need to select on stream, look at it one element at the time and match it with the corresponding item in the other stream
• Then, we can produce the average
Cop
yrig
ht P
rism
Tech
, 201
5
Logically
Shapes Marble Diagrams
t
t
t
x x
Cop
yrig
ht P
rism
Tech
, 201
5
Logically
Shapes Marble Diagrams
t
t
t
x x x x
P.S.: The dropping may seem a bit counter-intuitive but it is in part due to how the Vortex streams work
Cop
yrig
ht P
rism
Tech
, 201
5
Midpoint Triangle val circles = VortexObservable.fromDataReaderData { DataReader[ShapeType](Topic[ShapeType](circle)) } val squares = VortexObservable.fromDataReaderData { DataReader[ShapeType](Topic[ShapeType](square)) } val ttopic = Topic[ShapeType](triangle) val tdw = DataWriter[ShapeType](ttopic)
// Compute the average between circle and square of matching color with flatMap val triangles = circles.flatMap { c => squares.dropWhile(_.color != c.color).take(1).map { s => new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4) } }
triangles.subscribe(tdw.write(_))
Cop
yrig
ht P
rism
Tech
, 201
5
Using for-comprehension // Compute the average between circle and square of matching colour with flatMap val triangles = circles.flatMap { c => squares.dropWhile(_.color != c.color).take(1).map { s => new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4) } }
// Compute the average between circle and square of matching colour with for comprehension val triangles = for { c <-‐ circles; s <-‐ squares.dropWhile(_.color != c.color).take(1) } yield new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4)
Cop
yrig
ht P
rism
Tech
, 201
5
Count the number of times a circle bounces on the left or top border and make the number of bounces proportional to the position of a square of the same color
Problem 2: Square Race
Cop
yrig
ht P
rism
Tech
, 201
5
Square Raceval circles = DdsObservable.fromReaderData {
DataReader[ShapeType](Topic[ShapeType]("Circle")) }
val sdw = DataWriter[ShapeType](Topic[ShapeType]("Square"))
val squares = circles.filter(s => { s.x == 0 || s.y == 0 }).map(s => { val x = xcoord(s.color) + delta xcoord(s.color) = x new ShapeType(s.color, x, ycoord(s.color), size) })
squares.subscribe(sdw.write(_))
Big Data with Vortex and Spark
Cop
yrig
ht P
rism
Tech
, 201
5
Spark is a fast and general-purpose — in memory — cluster computing system that provides high-level APIs in Java, Scala and Python
Spark has an optimised engine that supports general execution graphs
Spark supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming
Apache Spark
Cop
yrig
ht P
rism
Tech
, 201
5
Resilient Distributed Data Sets
Cop
yrig
ht P
rism
Tech
, 201
5
RDD Transformations
Cop
yrig
ht P
rism
Tech
, 201
5
RDD Operations
Cop
yrig
ht P
rism
Tech
, 201
5
Spark API
Cop
yrig
ht P
rism
Tech
, 201
5
Vortex can be used as a source or a sink for Apache Spark Streams
This allows to seamlessly integrate Vortex Application with Spark
… And at the same time
Vortex & Spark
Live Spark Lab
Cop
yrig
ht P
rism
Tech
, 201
5ReactiveX Examples:
- https://github.com/PrismTech/moliere-demo-rx
Spark Streaming for DDS
- https://github.com/kydos/spark-streaming-dds
Resources
Cop
yrig
ht P
rism
Tech
, 201
5