high performance distributed computing with dds and scala

9
High Performance Distributed Computing with DDS and Scala Angelo Corsaro Chief Technology Officer PrismTech Corp. 4 rue Angiboust Marcoussis, France [email protected] ABSTRACT The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains, such as, web analytics, social media, automated trading, smart grids, etc., have to deal with. The challenges faced by these applications, commonly called Big Data Applications, are manifold as the staggering growth in volumes is com- plicating the collection, storage, analysis and distribution of data. This paper focuses on the collection and distri- bution of data and introduces the Data Distribution Ser- vice for Real-Time Systems (DDS) an Object Management Group (OMG) standard for high performance data dissemi- nation used today in several Big Data applications. The pa- per then shows how the combination of DDS with functional programming languages, or at least incorporating functional features like the Scala programming language, makes a very natural and effective combination for dealing with Big Data applications. 1. INTRODUCTION The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains, such as, web analytics, social media, automated trading, smart grids, etc., have to deal with. The challenges faced by these applications, commonly called Big Data Applications, are manifold as the staggering growth in volumes is com- plicating the collection, storage, analysis and distribution of data. In this paper we focus our attention on the challenges tied to the collection and distribution of the large volumes of data characteristic of Big Data applications – the first and last stage on the pipeline shown in Figure 1 – and partly on its storage. Big data applications have to be capable of collecting as well as distributing massive amounts of data, much of which needs to be timely processed. As a result these applications need to optimally exploit networking as well as computing resources. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DEBS 2012, July 16–20, 2012, Berlin, Germany. Copyright 2012 ACM 978-1-4503-1315-5 ...$10.00. Collect Store Analyse Distribute Data Data Figure 1: Stages characteristics of Big Data appli- cations. In this paper we introduce the DDS an OMG standard for high performance data dissemination used today in several Big Data applications for data collection, dissemination as well as for high performance data durability. We also show how the combination of DDS with functional programming languages, or at least programming languages incorporating functional features like Scala [7], makes it a very natural and effective combination for dealing with the challenges of Big Data. DDS [8] is an OMG Publish/Subscribe (P/S) standard that enables scalable, real-time, dependable and high perfor- mance data exchanges between publishers and subscribers. DDS addresses the needs of mission- and business-critical applications, such as, financial trading, air traffic control and management, defense, aerospace, smart grids, and com- plex supervisory and telemetry systems. That key challenges addressed by DDS are to provide a P/S technology in which data exchanged between publishers and subscribers are: Real-time, meaning that the right information is de- livered at the right place at the right time–all the time. Failing to deliver key information within the re- quired deadlines can lead to life-, mission- or business- threatening situations. For instance in a financial trad- ing 1ms can make the difference between losing or gain- ing $1M. Likewise, in a supervisory applications for power-grids, failing to meet deadlines under an over- load situation could lead to severe blackout, such as the one experienced by the northeastern US and Canada in 2003 [1]. Dependable, ensuring availability, reliability, safety and integrity in spite of hardware and software failures. For instance, the lives of thousands of air travelers de- pend on the reliable functioning of an air traffic control and management system. These systems must ensure 99.999% availability and ensure that critical data is delivered reliably, regardless of experienced failures. High-performance, which necessitates the ability to distribute very high volumes of data with very low la-

Upload: angelo-corsaro

Post on 15-Jan-2015

3.175 views

Category:

Technology


0 download

DESCRIPTION

The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains, such as, web analytics, social media, automated trading, smart grids, etc., have to deal with. The challenges faced by these applications, commonly called Big Data Applications, are manifold as the staggering growth in volumes is com- plicating the collection, storage, analysis and distribution of data. This paper focuses on the collection and distri- bution of data and introduces the Data Distribution Ser- vice for Real-Time Systems (DDS) an Object Management Group (OMG) standard for high performance data dissemi- nation used today in several Big Data applications. The pa- per then shows how the combination of DDS with functional programming languages, or at least incorporating functional features like the Scala programming language, makes a very natural and effective combination for dealing with Big Data applications.

TRANSCRIPT

Page 1: High Performance Distributed Computing with DDS and Scala

High Performance Distributed Computingwith DDS and Scala

Angelo CorsaroChief Technology Officer

PrismTech Corp.4 rue Angiboust

Marcoussis, [email protected]

ABSTRACTThe past few years have witnessed a tremendous increase inthe amount of real-time data that applications in domains,such as, web analytics, social media, automated trading,smart grids, etc., have to deal with. The challenges faced bythese applications, commonly called Big Data Applications,are manifold as the staggering growth in volumes is com-plicating the collection, storage, analysis and distributionof data. This paper focuses on the collection and distri-bution of data and introduces the Data Distribution Ser-vice for Real-Time Systems (DDS) an Object ManagementGroup (OMG) standard for high performance data dissemi-nation used today in several Big Data applications. The pa-per then shows how the combination of DDS with functionalprogramming languages, or at least incorporating functionalfeatures like the Scala programming language, makes a verynatural and effective combination for dealing with Big Dataapplications.

1. INTRODUCTIONThe past few years have witnessed a tremendous increase

in the amount of real-time data that applications in domains,such as, web analytics, social media, automated trading,smart grids, etc., have to deal with. The challenges faced bythese applications, commonly called Big Data Applications,are manifold as the staggering growth in volumes is com-plicating the collection, storage, analysis and distribution ofdata.In this paper we focus our attention on the challenges tied

to the collection and distribution of the large volumes ofdata characteristic of Big Data applications – the first andlast stage on the pipeline shown in Figure 1 – and partlyon its storage. Big data applications have to be capable ofcollecting as well as distributing massive amounts of data,much of which needs to be timely processed. As a resultthese applications need to optimally exploit networking aswell as computing resources.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.DEBS 2012,July 16–20, 2012, Berlin, Germany.Copyright 2012 ACM 978-1-4503-1315-5 ...$10.00.

Collect Store Analyse DistributeData Data

Figure 1: Stages characteristics of Big Data appli-

cations.

In this paper we introduce the DDS an OMG standard forhigh performance data dissemination used today in severalBig Data applications for data collection, dissemination aswell as for high performance data durability. We also showhow the combination of DDS with functional programminglanguages, or at least programming languages incorporatingfunctional features like Scala [7], makes it a very natural andeffective combination for dealing with the challenges of BigData.

DDS [8] is an OMG Publish/Subscribe (P/S) standardthat enables scalable, real-time, dependable and high perfor-mance data exchanges between publishers and subscribers.DDS addresses the needs of mission- and business-criticalapplications, such as, financial trading, air traffic controland management, defense, aerospace, smart grids, and com-plex supervisory and telemetry systems. That key challengesaddressed by DDS are to provide a P/S technology in whichdata exchanged between publishers and subscribers are:

• Real-time, meaning that the right information is de-livered at the right place at the right time–all thetime. Failing to deliver key information within the re-quired deadlines can lead to life-, mission- or business-threatening situations. For instance in a financial trad-ing 1ms can make the difference between losing or gain-ing $1M. Likewise, in a supervisory applications forpower-grids, failing to meet deadlines under an over-load situation could lead to severe blackout, such as theone experienced by the northeastern US and Canadain 2003 [1].

• Dependable, ensuring availability, reliability, safetyand integrity in spite of hardware and software failures.For instance, the lives of thousands of air travelers de-pend on the reliable functioning of an air traffic controland management system. These systems must ensure99.999% availability and ensure that critical data isdelivered reliably, regardless of experienced failures.

• High-performance, which necessitates the ability todistribute very high volumes of data with very low la-

Page 2: High Performance Distributed Computing with DDS and Scala

tencies. As an example, financial auto-trading appli-cations must handle millions of messages per second,each delivered relialy with minimal latency, e.g., on theorder of tens of microseconds.

These characteristics make DDS a perfect fit for Big Dataapplications.

2. THE DATA DISTRIBUTION SERVICEP/S is a paradigm for one-to-many communication that

provides anonymous, decoupled, and asynchronous commu-nication between producers of data – the publishers – andconsumers of data – the subscribers. This paradigm is atthe foundation of many technologies used today to developand integrate distributed applications (such as social appli-cation, e-services, financial trading, etc.), while ensure thecomposed parts of the applications remain loosely coupledand independently evolvable.Different implementations of the P/S abstraction have

emerged to support the needs of different application do-mains. DDS [8] is an OMG P/S standard that enables scal-able, real-time, dependable and high performance data ex-changes between publishers and subscribers. DDS addressesthe needs of mission- and business-critical applications, suchas, financial trading, air traffic control and management, de-fense, aerospace, smart grids, and complex supervisory andtelemetry systems. That key challenges addressed by DDSare to provide a P/S technology in which data exchangedbetween publishers and subscribers are:

• Real-time, meaning that the right information is de-livered at the right place at the right time–all thetime. Failing to deliver key information within the re-quired deadlines can lead to life-, mission- or business-threatening situations. For instance in a financial trad-ing 1ms can make the difference between losing or gain-ing $1M. Likewise, in a supervisory applications forpower-grids, failing to meet deadlines under an over-load situation could lead to severe blackout, such as theone experienced by the northeastern US and Canadain 2003 [1].

• Dependable, thus ensuring availability, reliability, safetyand integrity in spite of hardware and software failures.For instance, the lives of thousands of air travelers de-pend on the reliable functioning of an air traffic controland management system. These systems must ensure99.999% availability and ensure that critical data isdelivered reliably, regardless of experienced failures.

• High-performance, which necessitates the ability todistribute very high volumes of data with very low la-tencies. As an example, financial auto-trading appli-cations must handle millions of messages per second,each delivered relialy with minimal latency, e.g., on theorder of tens of microseconds.

2.1 DDS Standard OrganizationThe key members of the OMG DDS standards family are

shown in Figure 2 and consist of the DDS v1.2 API [8] andthe Data Distribution Service Interoperability Wire Proto-col (DDSI) [9] and the DDS Extensible and Dynamic TopicTypes (DDS-XTYPES) [10] specification. The DDS APIstandard defines the key DDS abstractions and their seman-tics, this standard also ensures source code portability across

different vendor implementations. The DDSI Standard en-sures on the wire interoperability across DDS implementa-tions from different vendors. The DDS-XTYPES standardextends the basic DDS type-system with polymorphic struc-tural types supporting extensibility and evolvability.

Ownership DurabilityContent

Subscription

Minimum Profile

Data Centric Publish Subscriber (DCPS)

Application

DDS Interoperability Wire Protocol - DDSI-RTPS

Application

ApplicationUDP/IP

X-T

ypes

C/C++ C# Java Scala

Figure 2: The DDS Standard.

The DDS standard was formally adopted by the OMGin 2004. It quickly became the established P/S technol-ogy for distributing high volumes of data dependably andwith predictable low latencies in applications such as radarprocessors, flying and land drones, combat management sys-tems, air traffic control and management, high performancetelemetry, supervisory control and data acquisition systems,and automated stocks and options trading. Along with widecommercial adoption, the DDS standard has been mandatedas the technology for real-time data distribution by organi-zation worldwide, including the US Navy, the Departmentof Defence (DoD) Information-Technology Standards Reg-istry (DISR) the UK Mistry of Defence (MoD), the Mili-tary Vehicle Association (MILVA), and EUROCAE–the Eu-ropean organization that regulates standards in Air TrafficControl and Management.

2.2 Key DDS Concepts and EntitiesBelow we summarize the key architectural concepts and

entities in DDS.

2.2.1 Global data spaceThe key abstraction at the foundation of DDS is a fully

distributed Global Data Space (GDS) (see Figure 3). TheDDS specification requires a fully distributed implementa-tion of the GDS to avoid single points of failure or singlepoints of contention. Publishers and Subscribers can joinor leave the GDS at any point in time as they are dynam-ically discovered. The dynamic discovery of Publisher andSubscribers is performed by the GDS and does not rely onany kind of centralized registry, such as those found in otherP/S technologies like the Java Message Service (JMS) [6].The GDS also discovers application defined data types andpropagates them as part of the discovery process.

Since DDS provides a GDS equipped with dynamic discov-ery, there is no need for applications to configure anythingexplicitly when a system is deployed. Applications will beautomatically discovered and data will begin to flow. More-

Page 3: High Performance Distributed Computing with DDS and Scala

...

TopicA

TopicBTopicC

TopicD

Data

Writer

Data

Writer

Data

Writer

Data

Writer

Data

Reader

Data

Reader

Data

Reader

Data

Reader

Figure 3: The DDS Global Data Space.

over, since the GDS is fully distributed the crash of one nodewill not induce unknown consequences on the system avail-ability, i.e., in DDS there is no single point of failure and thesystem as a whole will continue to run even if applicationscrash/restart or connect/disconnect.

2.2.2 TopicThe information that populates the GDS is defined by

means of DDS Topics. A topic, identified by a unique name,a data type, and a collection of Quality of Service (QoS)policies, defines conceptually a class of data streams. Theunique name provides a mean of uniquely referring to giventopics, the data type defines the type of the stream of data,and the QoS captures all the non-functional aspect of theinformation, such as its temporal properties or its availabil-ity.

Topic types..DDS Topics can be specified using several different syn-

taxes, such as Interface Definition Language (IDL), eXten-sible Markup Langauge (XML), Unified Modeling Langauge(UML), and annotated Java. For instance, Listing 1 showsa type declaration for an hypothetical temperature sensortopic-type. Some of the attributes of a topic-type can bemarked as representing the key of the type.

Listing 1: Topic type declaration for an hypotheticaltemperature sensor

1 struct TempSensorType {@Key

3 long id;float temp;

5 float hum;};

The key allows DDS to define the selector that identifies aspecific data stream, commonly called a topic instance. Forinstance, the topic type declaration in Listing 1 defines theid attributes as being the key of the type. Each uniqueid value therefore identifies a specific data stream (topicinstance) for which DDS will manage the entire life-cycle.This allows applications to (1) identify the specific sourceof data, such as the specific physical sensor whose id=5,and (2) learn about relevant life-cycle events such as the thecreation and disposal of streams. In addition, DDS providesapplications with information concerning the existence of

producers for each data-stream. Figure 4 provides a visual

Topic

InstancesInstances

struct TempSensor { @key long id; float temp; float hum;};

id =701

id =809

id =977

Figure 4: Topic Instances and Data Streams.

representation of the relationship existing between a topic,its instances, and the associated data streams.

Topic QoS..The Topic QoS provides a mechanism to express relevant

non-functional properties of a topic. Section 4 presents adetailed description of the DDS QoS model, but at this pointwe simply mention the ability of defining QoS for topics tocapture the key non-functional invariant of the system andmake them explicit and visible.

Content filters..DDS supports defining content filters over a specific topic.

These content filters are defined by instantiating a Content-FilteredTopic for an existing topic and providing a filterexpression. The filter expression follows the same syntax ofthe WHERE clause on a SQL statement and can operateon any of the topic type attributes. For instance, a filterexpression for a temperature sensor topic could be "id =101 AND (temp > 35 OR hum > 65)".

2.2.3 DataWriters and DataReadersSince a topic defines the subjects produced and consumed,

DDS provides two abstractions for writing and reading thesetopics: DataWriterss and DataReaders, respectively. BothDataReaders and DataWriters are strongly typed and aredefined for a specific topic and topic type.

2.2.4 Publishers, Subscribers and PartitionsDDS also defines Publishers and Subscribers, which group

together sets of DataReaders and DataWriters and performsome coordinated actions over them, as well as manage thecommunication session. DDS also supports the concept of aPartition which can be used to organize data flows within aGDS to decompose un-related sets of topics.

2.3 Example of applying DDSNow that we presented the key architectural concepts and

entities in DDS, we will show the anatomy of a simple DDSapplication in Scala using the Escalier DDS API [4]. List-ing 2 shows the steps required to write a sample for thetemperature sensor topic describe earlier. In this specificexample, since no domain is explicitly joined, the runtimewill assume that the application is interested in joining thedefault domain.

Page 4: High Performance Distributed Computing with DDS and Scala

Listing 2: A simple DDS writer.

object TempSensorWriter {2 def main(args: Array[String]) {

4 // Define a topicval topic =

6 Topic[TempSensor]("TTempSensor")

8 // Create a Writerval writer =

10 DataWriter[TempSensor](topic)

12 // Create a Sampleval ts =

14 new TempSensor(0, 25, 0.6F)

16 // Write the samplewriter ! ts

18 }}

Note that no QoS has been specified for any of the DDSentities defined in this code fragment, so the behavior willbe the default QoS.

Listing 3: A simple DDS reader.

1 object SensorLogger {def main(args: Array[String]) {

3

// Define a topic5 val topic =

Topic[TempSensor]("TTempSensor")7

// Create a Reader9 val reader =

DataReader[TempSensor](topic)11

// Register a reaction for printingdata

13 reader.reactions += {case DataAvailable(dr) => {

15 (reader read) foreach (println)}

17 }}

19 }

Listing 3 shows the steps required to read the data samplespublished on a given topic. The code fragment shows theapplication registering a reaction with the DDS data readerby providing a partially specified function. The partiallydefined function uses pattern matching to select the eventin which it is interested. Specifically, the code in Listing 3registers a reaction that will print received data on the con-sole. It is worth noticing how the elegance and simplicityof this simple application comes from a combination of thedata-centric DDS paradigm and the use of Scala functionalfeatures such as partial and higher order functions.

3. THE DDS TYPE SYSTEMStrong typing plays a key role in developing software sys-

tems that are efficient, easier to extend, less expensive tomaintain, and in which runtime errors are limited to thosecomputational aspects that either cannot be decided at compile-time or that cannot be detected at compile-time by the typesystem. Since DDS targets mission- and business-criticalsystems where safety, extensibility, and maintainability are

critically important, DDS adopted a strongly, and somewhatstatically, typed system to define type properties of a topic.This section provides an overview of the DDS type system.

3.1 Structural Polymorphic Types SystemDDS provides a polymorphic structural type system [3, 2,

10], which means that the type system not only supportspolymorphism, but also bases its sub-typing on the structureof a type, as opposed than its name. For example, considerthe types declared in Listing 4.

Listing 4: Nominal vs. Structural Subtypes

1 struct Coord2D {long x;

3 long y;};

5

struct Coord3d : Coord2D {7 long z;};

9

struct Coord {11 long x;

long y;13 long z;

};

In a nominal polymorphic type system, the Coord3D wouldbe a subtype of Coord2D, which would be expressed bywriting Coord3D <: Coord2D. In a nominal type sys-tem, however, there would be no relationship between theCoord2D/Coord3D with the type Coord.

Conversely, in a polymorphic structural type system likethe one used by DDS the type Coord is a subtype of the typeCoord2D—thus Coord <: Coord2D and it is structurallythe same type as the type Coord3D.

The main advantage of a polymorphic structural type sys-tem over nominal type systems is that the former considersthe structure of the type as opposed to the name to de-termine sub-types relationships. As a result, polymorphicstructural types systems are more flexible and especiallywell-suited for evolvable and large-scale distributed systems.In particular, distributed systems often need to evolve incre-mentally to provide a new functionality to most subsystemsand systems, without requiring a complete upgrade and re-deployment.

In the example above the type Coord was a monotonicextension of the type Coord2D since it was added some newattributes “at the end.” The DDS type system can handleboth attribute reording and attribute removal with equalalacrity.

3.2 Type annotationsThe DDS type systems supports an annotation system

very similar to that available in Java [5]. It also defines a setof built-in annotations that can be used to control the exten-sibility of a type, as well as the properties of the attributesof a given type. Some important built-in annotations aredescribed below:

• The @ID annotation can be used to assign a globalunique ID to the data members of a type. This ID isused to deal efficiently with attributes reordering.

• The @Key annotation can be used to identify the typemembers that constitute the key of the topic type.

Page 5: High Performance Distributed Computing with DDS and Scala

HISTORY

LIFESPAN

DURABILITY

DEADLINE

LATENCY BUDGET

TRANSPORT PRIO

TIME-BASED FILTER

RESOURCE LIMITS

USER DATA

TOPIC DATA

GROUP DATA

OWENERSHIP

OWN. STRENGTH

LIVELINESS

ENTITY FACTORY

DW LIFECYCLE

DR LIFECYCLE

PRESENTATION

RELIABILITY

PARTITION

DEST. ORDER

RxO QoS Local QoS

Figure 5: DDS QoS Policies.

• The @Optional annotations can be used to expressthat an attribute is optional and might not be setby the sender. DDS provides specific accessors foroptional attributes that can be used to safely checkwhether the attribute is set or not. In addition, to savebandwidth, DDS will not send optional attributes forwhich a value has not been provided.

• The @Shared annotation can be used to specify thatthe attribute must be referenced through a pointer.This annotations helps avoid situations when large data-structures (such as images or large arrays) would beallocated contiguously with the topic type, which maybe undesirable in resource-contrained embedded sys-tems.

• The @Extensibility annotation can be used to con-trol the level of extensibility allowed for a given topictype. The possible values for this annotation are (1)Final to express the fact that the type is sealed andcannot be evolved – as a result this type cannot besubstituted by any other type that is structurally itssubtype, (2) Extensible to express that only mono-tonic extensibility should be considered for a type, and(3) Mutable to express that the most generic struc-tural subtype rules for a type should be applied whenconsidering subtypes.

In summary, the DDS type system provides all the advan-tages of a strongly-typed system, together with the flexibil-ity of structural type systems. This combinations supportsthe key requirements typical of a large class of distributedsystems since it preserves types end-to-end, while providingtype-safe extensibility and incremental system evolution andhigh performance!

4. THE DDS QOS MODELDDS relies on QoS policies to provides applications with

explicit control over a wide set of non-functional properties,such as data availability, data delivery, data timeliness andresource usage – Figure 5 shows the full list of availableQoS. Each QoS policy might be applicable to one or moreDDS entities, such as, topics, data readers, and data writers.Policies controlling end-to-end properties are considered aspart of the subscription’s matching.DDS uses a request vs. offered QoS matching approach in

which a data reader matches a data writer if and only if theQoS it is requesting for the given topic does not exceed (e.g.,is no more stringent than) the QoS with which the data isproduced by the data writer.

DDS subscriptions are matched against the topic type andname, as well as against the QoS being offered/requested bydata writers and readers. This DDS matching mechanismensures that (1) types are preserved end-to-end due to thetopic type matching and (2) end-to-end QoS invariants arealso preserved.

The reminder of this section describes the most importantQoS policies in DDS.

4.1 Data AvailabilityDDS provides the following QoS policies that control the

availability of data to domain participants:

• The DURABILITY QoS policy controls the lifetime ofthe data written to the global data space in a DDS do-main. Supported durability levels are (1) VOLATILE,which specifies that once data is published it is notmaintained by DDS for delivery to late joining appli-cations, (2) TRANSIENT_LOCAL, which specifies thatpublishers store data locally so that late joining sub-scribers get the last published item if a publisher is stillalive, (3) TRANSIENT, which ensures that the GDSmaintains the information outside the local scope ofany publishers for use by late joining subscribers, and(4) PERSISTENT, which ensures that the GDS storesthe information persistently so to make it available tolate joiners even after the shutdown and restart of thewhole system. Durability is achieved by relying ona durability service whose properties are configuredby means of the DURABILITY_SERVICE QoS of non-volatile topics.

• The LIFESPANQoS policy controls the interval of timeduring which a data sample is valid. The default valueis infinite, with alternative values being the time-spanfor which the data can be considered valid.

• The HISTORY QoS policy controls the number of datasamples (i.e., subsequent writes of the same topic) thatmust be stored for readers or writers. Possible valuesare the last sample, the last n samples, or all samples.

The data availability QoS policies decouple applications intime and space. They also enable these applications to coop-erate in highly dynamic environments characterized by con-tinuous joining and leaving of publisher/subscribers. Suchproperties are particularly relevant for distributed systemssince they increase the decoupling of the component parts.

4.2 Data DeliveryDDS provides the following QoS policies that control how

data is delivered and how publishers can claim exclusiverights on data updates:

• The PRESENTATION QoS policy gives control on howchanges to the information model are presented to sub-scribers. This QoS gives control on the ordering as wellas the coherency of data updates. The scope at whichit is applied is defined by the access scope, which canbe one of INSTANCE, TOPIC, or GROUP level.

• The RELIABILITY QoS policy controls the level of re-liability associated with data diffusion. Possible choicesare RELIABLE and BEST_EFFORT distribution.

Page 6: High Performance Distributed Computing with DDS and Scala

• The PARTITION QoS policy gives control over the as-sociation between DDS partitions (represented by astring name) and a specific instance of a publisher/sub-scriber. This association provides DDS implementa-tions with an abstraction that allow to segregate traf-fic generated by different partitions, thereby improvingoverall system scalability and performance.

• The DESTINATION_ORDERQoS policy controls the or-der of changes made by publishers to some instanceof a given topic. DDS allows the ordering of differentchanges according to source or destination timestamps.

• The OWNERSHIP QoS policy controls whether it is al-lowed for multiple data writers to concurrently up-date a given topic instance. When set to EXCLUSIVE,this policy ensures that only one among active datawriters–namely the one with the highest OWNERSHIP-_STRENGTH–will change the value of a topic instance.The other writers, those with lower strength, are stillable to write, yet their updates will not have an impactvisible on the distributed system. In case of failure ofthe highest strength data writer, DDS automaticallyswitches to the next among the remaining data writ-ers.

The data delivery QoS policies control the reliability andavailability of data, thereby allowing the delivery of the rightdata to the right place at the right time. More elaborateways of selecting the right data are offered by the DDScontent-awareness profile, which allows applications to se-lect information of interest based upon their content. TheseQoS policies are particularly useful since they can be used tofinely tune how—and to whom—data is delivered, thus lim-iting not only the amount of resources used, but also mini-mizing the level of interference by independent data streams.

4.3 Data timelinessDDS provides the following QoS policies to control the

timeliness properties of distributed data:

• The DEADLINE QoS policy allows applications to de-fine the maximum inter-arrival time for data. DDScan be configured to automatically notify applicationswhen deadlines are missed.

• The LATENCY_BUDGET QoS policy provides a meansfor applications to inform DDS how long time the mid-dleware can take in order to make data available tosubscribers. When set to zero, DDS sends the dataright away, otherwise it uses the specified interval toexploit temporal locality and batch data into biggermessages so to optimize bandwidth, CPU and batteryusage.

• The TRANSPORT_PRIORITY QoS policy allows appli-cations to control the priority associated with the dataflowing on the network. This priority is used by DDSto prioritize more important data relative to less im-portant data.

The DEADLINE, LATENCY_BUDGET and TRANSPORT-_PRIORITY QoS policy provide the controls necessaryto build priority pre-emptive distributed real-time sys-tems. In these systems, the TRANSPORT_PRIORITYis derived from a static priority scheduling analysis,

such as Rate Monotonic Analysis, the DEADLINE QoSpolicy represents the natural deadline of informationand is used by DDS to notify violations, finally theLATENCY_BUDGET is used to optimize the resource uti-lization in the system.

These DDS data timeliness QoS policies provide control overthe temporal properties of data. Such properties are partic-ularly relevant since they can be used to define and controlthe temporal aspects of various subsystem data exchanges,while ensuring that bandwidth is exploited optimally.

4.4 ResourcesDDS defines the following QoS policies to control the net-

work and computing resources that are essential to meetdata dissemination requirements:

• The TIME_BASED_FILTER QoS policy allows appli-cations to specify the minimum inter-arrival time be-tween data samples, thereby expressing their capabil-ity to consume information at a maximum rate. Sam-ples that are produced at a faster pace are not deliv-ered. This policy helps a DDS implementation op-timize network bandwidth, memory, and processingpower for subscribers that are connected over limitedbandwidth networks or which have limited computingcapabilities.

• The RESOURCE_LIMITS QoS policy allows applica-tions to control the maximum available storage to holdtopic instances and related number of historical sam-ples DDS’s QoS policies support the various elementsand operating scenarios that constitute net-centric mis-sion critical information management. By controllingthese QoS policies it is possible to scale DDS fromlow-end embedded systems connected with narrow andnoisy radio links, to high-end servers connected to high-speed fiber-optic networks.

These DDS resource QoS policies provide control over the lo-cal and end-to-end resources, such as memory and networkbandwidth. Such properties are particularly useful in en-vironments characterized by largely heterogeneous subsys-tems, devices, and network connections that often requiredown-sampling, as well as overall controlled limit on theamount of resources used.

4.5 ConfigurationThe QoS policies described above, provide control over the

most important aspects of data delivery, availability, timeli-ness, and resource usage. DDS also supports the definitionand distribution of user specified bootstrapping informationvia the following QoS policies:

• The USER_DATA QoS policy allows applications to as-sociate a sequence of octets to domain participant,data readers and data writers. This data is then dis-tributed by means of a built-in topic—which are topicspre-defined by the DDS standard and used for inter-nal purposes. This QoS policy is commonly used todistribute security credentials.

• The TOPIC_DATA QoS policy allows applications toassociate a sequence of octet with a topic. This boot-strapping information is distributed by means of a

Page 7: High Performance Distributed Computing with DDS and Scala

built-in topic. A common use of this QoS policy isto extend topics with additional information, or meta-information, such as IDL type-codes or XML schemas.

• The GROUP_DATA QoS policy allows applications toassociate a sequence of octets with publishers and sub-scribers – this bootstrapping information is distributedby means built-in topics. A typical use of this infor-mation is to allow additional application control oversubscriptions matching.

These DDS configuration QoS policies provide useful a mech-anism for bootstrapping and configuring applications in adistributed systems.

5. DDS CACHES AND CHANNELSEach DDS data writer and data reader has an associated

data cache. The data writer cache stores a subset of thesamples written. The data reader cache stores a subset ofthe samples produced by its matching data writers. Theexact behaviour of the cache, w.r.t. the samples it will store,is controlled through DDS QoS policies.One way of abstracting data readers and data writers

caches and their semantics under different QoS policies, alongwith data distribution, is to reason in terms of channels.A DDS data writer and its matching data readers can bethought of as connected by a logical typed communicationchannel (see Figure 6). DDS’ QoS policies define the prop-erties controlling the set of samples, produced by the datawriter, that will be eventually received by the matching datareaders.At the two extreme, DDS provides support for Last n-

Values Channel and for FIFO Channel. In order to reasonabout the samples that a reader might receive we will beconsidering the RELIABILITY QoS as set to RELIABLE.If that was not the case, the only guarantee that can beprovided to a matching data reader is that it will receive asubset of the data produced by the matching data writer.This subset will in general be different for each matchingreader.In the remainder of this section we will investigate the QoS

policies that provide control over the communication chan-nel and will describe the semantics associated with thesechannels. As a final remark it is worth pointing out thatthese channels are logical and fully distributed.

DR

DR

DR

TopicDW

Figure 6: The logical channel between a DDS data

writer and its matching readers

5.1 Last n-Values ChannelTo obtain a Last n-Values Channel the HISTORY QoS

policy has to be set to KEEP_LAST for a depth n. In thiscase we are guaranteed that the data reader will be receiv-ing the last n-samples produced by the data writer. TheDURABILITY QoS can be used in this case to decouple theproduction of data by the data writer with the consumptionof data by the data reader. The Last n-Values Channel isuseful to represent distributed state. For instance in severalBig Data analytics application, when performing the analysephase (see Figure 1) what matters are the current values, orperhaps the last n values, of the relevant indexes as opposedto the the full set of changes since the last analyse phase.

Last n-Values Channels should always be used when mod-elling distributed state as they allow to minimize the use ofcomputational, storage and network resources.

5.2 FIFO ChannelTo obtain a FIFO Channel the HISTORY QoS policy has

to be set to KEEP_ALL. In this case we are guaranteed thatthe data reader will be receiving all the samples produced bythe matching data writers. The DURABILITY QoS can beused to decouple the production of data, by the data writer,with the consumption of data, by the data reader, but be-ware that for TRANSIENT and PERSISTENT DURABILITYthe set of samples stored by the durability service is con-trolled through a specific policy, thus this QoS does not nec-essarily imply that all samples have to be persisted.

The FIFO Channel is useful to represent queues as well asdistributed events. This kind of channel is also very usefulwhen implementing distributed algorithms such as mutualexclusion, or distributed queues over DDS.

5.3 DDS Read and TakeTo complete the picture, along with describing the two key

channel semantics provided by DDS we need to explain howreceived data can be retrieved by the application. To eachDDS data reader is associated a local cache where the datareceived by the matching data writers is stored. The appli-cation can read the content of the cache using both state aswell as content selectors in which case the samples remainin the cache and are simply marked as read, i.e. the state ofthe samples is changed. Otherwise, the application can takethe samples from the cache, in which case the samples areremoved by the cache.

In general, the read operation is used in combination withLast n-Values Channel, while the take operation is used withthe FIFO Channels. It is a common beginners programmingerror to use read with FIFO Channels with the result thatthe resource usage will keep growing – unless the user hadspecified some resource limits.

5.4 SelectorsAs mentioned before, DDS provides both content as well

as state selectors to specify the properties of the samplesthat should be read.

5.4.1 Content Selectors

Content Filters.One of the mechanisms provided by DDS to select the

content that will be received by a data reader are ContentFiltered Topics. These kind of topics are created by associ-

Page 8: High Performance Distributed Computing with DDS and Scala

ating query expression with an existing topic (see Listing 5).The query expression specify the filter that will be evaluatedagainst the samples produced by the matching writers to de-termine whether the sample is relevant for the reader or not.In the example shown in Listing 5, only the samples whosetemperature is greater than 25 or whose humidity is higherthan 75% that will be seen by the data reader.

Listing 5: Content Filtered Topic.

1 val topic =Topic[TempSensor]("TTempSensor")

3 val query =Query("temp > %0 || hum > %1", List(25,

0.75F)5 val ftopic =

ContentFilteredTopic[TempSensor]("CFTempSensor",topic, query)

7 val reader =DataReader[TempSensor](ftopic)

Queries.DDS also provide a mechanism for executing queries over

the content of the data reader cache (see Listing 6). Themain difference between queries and content filtered topicsis that the former allows to select samples of the cache basedon their content, while the latter controls the samples thatget into the cache based on their content.

Listing 6: Query Selector.

1 val data =(reader filterContent(query)) read

List Comprehension.Another way of manipulating the data available on the

DDS cache is through list comprehension. In this case, thehigher order functions defined as part of the Scala List classcan be used to perform arbitrary operations on the itemsavailable on the cache. As an example, assuming that wewanted to compute the moving average of the temperatureover the last n samples, we could have used a DDS datareader with history depth set to n and then simply com-puted the moving average using the List fold right functionas shown in Listing 7.

Listing 7: List Comprehensions in DDS with fold-right.

val data =2 reader readval mavg =

4 (data :\ 0) (_ + _) / (data size)

Another way to achieve the same goal is to first map thelist of TempSensorType to a list of temperatures and thecompute the moving average. This approach is shown inFigure 8

Listing 8: List Comprehensions in DDS with map.

val data=2 (reader read) map (_.temp)val mavg =

4 (data sum) / (data size)

It is worth noticing how the strongly typed nature of DDSperfectly mixes with the Scala type system producing ele-gant, compact and very efficient code.

Streams/Instances.Often applications need to access the samples belonging

to a specific stream or topic instance. One way to achievethis is to query for a specific key value. A better way is toselect a stream by using an instance handle – the DDS spe-cific way of identifying a stream or topic instance. Listing 9shows how the data belonging to a specific stream can beprogrammatically selected.

Listing 9: Query Selector.

val handle = reader lookup (key)2 val data =

(reader instance(handle)) read

5.4.2 State SelectorsIn addition to content-based selectors, DDS also provides

state selectors which allow to select samples based on theirstate. Specifically, DDS state selectors allow to select sam-ples with respect to three different states, the READ_STATEthe VIEW_STATE and the INSTANCE_STATE. The READ-_STATE allows to distinguish between new samples and sam-ples that have already been read. The VIEW_STATE allowsto distinguish between samples belonging to a new instanceas opposed to samples belonging to an instance that werealready known. Finally the INSTANCE_STATE allows to dis-tinguish between samples belonging to an instance for whichthere are still active writers as opposed to a stream for whichthere are no more writers. The Listing 10 shows an exampleof state-based selector.

Listing 10: Instance Selector.

1 val data =(reader

3 filterState(SampleState.NewData)) read

Finally, it is worth mentioning that the content and stateselectors can be mixed and matched at will to perform com-plex selection operations as shown in the example of List-ing 11. Additionally, selectors can be used with read as wellas with take operations.

Listing 11: Example of Selectors composition.

val data =2 (reader

4 instance(handle)filterContent(query)

6 filterState(SampleState.NewData)) read

Listing 11 is showing an examples in which the data selectedis that belonging to a specific stream for which a query holds,and that have never been read before.

Obviously, list comprehension can be used for operatingon samples content and state. Differently from DDS built-inselectors, list comprehension operates on the samples reador taken, as opposed to being used for selecting the samples

Page 9: High Performance Distributed Computing with DDS and Scala

to read or take from the cache. Thus when using list com-prehension one should always be careful in selecting first theset of samples from the cache.

6. CONCLUDING REMARKSThis paper has introduced the key concepts at the foun-

dation of the OMG DDS and has shown how it combinationwith programming languages exposing functional features,such as Scala, makes a very natural and elegant combina-tion.The DDS standard is used today a large number of mission-

and business- critical systems and is becoming more andmore relevant to Big Data applications due to its perfor-mance its scalability and extensibility.

7. REFERENCES[1] http://bit.ly/nebout03, 2003.

[2] L. Cardelli. Type systems. ACM Comput. Surv.,28:263–264, 1996.

[3] L. Cardelli and P. Wegner. On understanding types,data abstraction, and polymorphism. ACM ComputingSurveys, 17(4):471–522, 1985.

[4] A. Corsaro. https://github.com/kydos/escalier, 2011.

[5] J. Gosling, B. Joy, G. Steele, and G. Bracha.Java(TM) Language Specification, The (3rd Edition)(Java (Addison-Wesley)). Addison-WesleyProfessional, 2005.

[6] S. Microsystems. The java message servicespecification v1.1, 2002.

[7] M. Odersky, L. Spoon, and B. Venners. Programmingin Scala. Artima Inc, 2 edition, 2011.

[8] OMG. Data distribution service for real-time systemsv1.2, 2007.

[9] OMG. Data distribution service interoperability wireprotocol v2.1, 2008.

[10] OMG. Dynamic and extensible topic types, 2010.