
Evaluating DDS, MQTT, and ZeroMQ Under Different IoT Traffic Conditions

Zhuangwei Kang, Vanderbilt University, Nashville, Tennessee

[email protected]

Robert Canady, Vanderbilt University, Nashville, Tennessee

[email protected]

Abhishek Dubey, Vanderbilt University, Nashville, Tennessee

[email protected]

Aniruddha Gokhale, Vanderbilt University, Nashville, Tennessee

[email protected]

Shashank Shekhar, Siemens Technology, Princeton, New Jersey

[email protected]

Matous Sedlacek, Siemens Technology, Munich, Germany

[email protected]

Abstract

Publish/Subscribe (pub/sub) semantics are critical for IoT applications due to their loosely coupled nature. Although OMG DDS, MQTT, and ZeroMQ are mature pub/sub solutions used for IoT, prior studies show that their performance varies significantly under different load conditions and QoS configurations, which makes middleware selection and configuration decisions hard. Moreover, the load conditions and role of QoS settings in prior comparison studies are not comprehensive and well-documented. To address these limitations, we (1) propose a set of performance-related properties for pub/sub middleware and investigate their support in DDS, MQTT, and ZeroMQ; (2) perform systematic experiments under three representative, lab-based real-world IoT use cases; and (3) improve DDS performance by applying three of our proposed QoS properties. Empirical results show that DDS has the most thorough QoS support, and more reliable performance in most scenarios. In addition, its Multicast, TurboMode, and AutoThrottle QoS policies can effectively improve DDS performance in terms of throughput and latency.

CCS Concepts: • Software and its engineering → Message oriented middleware; Publish-subscribe / event-based architectures; • General and reference → Evaluation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

M4IoT ’20, Dec 7-11, 2020, Delft, The Netherlands. © 2020 Association for Computing Machinery. ACM ISBN 9781450370288. . . $15.00

Keywords: Publish/Subscribe Middleware, Benchmarking, MQTT, DDS, ZeroMQ, Performance Evaluation

1 Introduction

Distributed deployment of real-time applications and high-speed dissemination of massive data have been hallmarks of Internet of Things (IoT) platforms. IoT applications typically adopt publish/subscribe (pub/sub) middleware for asynchronous and cross-platform communication. OMG Data Distribution Service (DDS), ZeroMQ, and MQTT are three representative pub/sub technologies that have entirely different architectures (decentralized data-centric, decentralized message-centric, and centralized message-centric, respectively). All of them implement the pub/sub messaging pattern and provide a set of configurable parameters for customizing middleware behavior and resource allocation. Accordingly, an essential question that needs to be answered is how to choose an appropriate middleware for a given workload condition, and which parameters should be considered when making configuration decisions.

To that end, we propose a set of QoS properties that are tied to the performance of IoT pub/sub applications and investigate which of those properties are supported by DDS, MQTT, and ZeroMQ. We then conduct a systematic set of experiments to assess their performance in three pub/sub use cases (high-frequency, periodic, and sporadic), which provides us with baselines for further performance optimization. Empirical results show that DDS has the most stable performance in the above scenarios and provides the most in-line support for the QoS properties we propose. To understand the performance under high-frequency data flows, we further experimentally explore the impact of three competitive QoS policies (Multicast, TurboMode, and AutoThrottle) on improving DDS application performance under such conditions.

M4IoT ’20, Dec 7-11, 2020, Delft, The Netherlands Kang, et al.

2 QoS Properties Essential for IoT Systems

The primary purpose of this section is to introduce several performance-related QoS policies available in DDS, MQTT, and ZeroMQ that are necessary for performance-sensitive IoT applications. As with other networked applications, the performance (throughput and latency) of pub/sub middleware mainly depends on the network congestion status and internal resource configuration. In this paper, we look at the network aspect; the principle we followed in choosing these features is whether a feature can effectively change the amount of data entering and leaving the network pipeline per unit time, or the processing logic of messages in the network.

2.1 Reliability

The Reliability policy determines whether messages are transmitted reliably, which critically affects the data-processing and decision-making processes of IoT applications. DDS provides two types of transmission guarantees: Best Effort (i.e., no guarantees) and Reliable (i.e., retry until success). MQTT provides three service levels: QoS 0, QoS 1, and QoS 2, representing "at-most-once", "at-least-once", and "exactly-once" delivery, respectively. MQTT’s QoS 0 is equivalent to DDS’ Best Effort and QoS 2 to DDS’ Reliable delivery. The ZeroMQ pub/sub pattern may lose messages when the network is congested.

2.2 Multicast

Multicast allows IoT applications to scale better when multiple subscribers exist. Both DDS and ZeroMQ allow users to set the number of hops that multicast messages can traverse, which is intended to avoid flooding large networks with multicast traffic. Although MQTT runs over a unicast protocol, it emulates application-layer "multicast" by wrapping point-to-point TCP connections, which is also known as multicast-over-unicast. The protocols used by ZeroMQ for multicast include Pragmatic General Multicast (PGM) [5] and Encapsulated Pragmatic General Multicast (EPGM).

2.3 Intelligent Batching

Message batching is intended to improve the throughput of IoT applications. It avoids the frequent system calls through the network stack that message-by-message processing incurs, and can be performed at a fixed interval or over a fixed number of messages. The TurboMode QoS in DDS automatically decides the optimal number of bytes in a batch on the publisher side based on the data sample size, writing speed, and the real-time system state. Message batching in ZeroMQ can be enabled at either the publisher or the subscriber side. Rather than deciding a timeout or batch size for a single chunk, ZeroMQ always forwards all messages queued in memory at the moment to the network interface card in one go. However, due to the lack of a strict reliability guarantee in ZeroMQ’s pub/sub pattern, messages sent in a large batch risk being lost when the network is congested. At the subscriber side, if the subscriber can keep up with the publisher and there is no application-level queue backlog, ZeroMQ turns off its batching function for the sake of lower end-to-end delay. Otherwise, batching is enabled to help the worker thread flush backlogged messages faster. MQTT does not support intelligent batching.
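To make the two generic batching triggers mentioned above concrete (a fixed interval or a fixed number of messages), the following is a minimal Python sketch. It is not the TurboMode or ZeroMQ implementation; the class name `Batcher`, its parameters, and the injected `send` callback are hypothetical names chosen for illustration only.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Illustrative batcher: flushes on message count or on batch age.

    Hypothetical helper, not part of DDS, MQTT, or ZeroMQ.
    """

    def __init__(self, send: Callable[[List[bytes]], None],
                 max_messages: int = 8, max_delay_s: float = 0.01,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send                  # called once per flushed batch
        self._max_messages = max_messages  # count-based trigger
        self._max_delay_s = max_delay_s    # interval-based trigger
        self._clock = clock
        self._pending: List[bytes] = []
        self._first_ts: Optional[float] = None

    def publish(self, msg: bytes) -> None:
        """Queue a message and flush if a trigger has fired."""
        if not self._pending:
            self._first_ts = self._clock()  # age is measured from the first message
        self._pending.append(msg)
        self.poll()

    def poll(self) -> None:
        """Flush if the batch is full or older than the allowed delay."""
        if not self._pending:
            return
        too_many = len(self._pending) >= self._max_messages
        too_old = self._clock() - self._first_ts >= self._max_delay_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand all queued messages to the sender in a single call."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._first_ts = None
            self._send(batch)
```

In this sketch the age trigger is only evaluated when `publish` or `poll` is called, so a real publisher would need to call `poll` periodically. The trade-off is the one described in the text: a larger batch means fewer trips through the network stack, at the cost of added delay for the first message in the batch.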

2.4 Rate Limit

Rate Limit is a flow control mechanism that protects network resources from being monopolized by malicious actors in an IoT cluster by specifying the maximum rate at which a publisher may send samples to the network. DDS allows users to define a Custom Flow Controller that maintains a separate FIFO queue for each connection, in which data instances generated by asynchronous publishers can be placed. ZeroMQ supports rate limiting only when a multicast protocol (PGM/EPGM) is enabled, and implements it by setting the multicast window size. The MQTT server shapes the egress traffic of each TCP/SSL connection based on the Leaky Bucket (LB) algorithm.
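The Leaky Bucket idea can be illustrated with a short Python sketch. This is a generic textbook form of the algorithm (the bucket used as a meter), not the code of any particular MQTT broker; the class name `LeakyBucket` and its parameters are assumptions made for this example.

```python
import time
from typing import Callable


class LeakyBucket:
    """Generic leaky-bucket meter (illustrative, not a broker's implementation).

    The bucket drains at `rate_bytes_per_s`. A message is admitted only if
    adding its size does not overflow `capacity_bytes`.
    """

    def __init__(self, rate_bytes_per_s: float, capacity_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self._clock = clock
        self._level = 0.0          # bytes currently in the bucket
        self._last = clock()       # time of the last drain update

    def _leak(self) -> None:
        """Drain the bucket according to the time elapsed since the last update."""
        now = self._clock()
        self._level = max(0.0, self._level - (now - self._last) * self.rate)
        self._last = now

    def try_send(self, size_bytes: int) -> bool:
        """Return True and account for the message if it fits, else False."""
        self._leak()
        if self._level + size_bytes > self.capacity:
            return False
        self._level += size_bytes
        return True
```

A caller that receives `False` can either drop the message or retry later; which of the two a given broker does is outside the scope of this sketch.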

2.5 Message LifeSpan

A message expiration mechanism bounds the longest time that a message is regarded as valid in the system, which improves overall memory utilization and avoids delivering stale data. In the reliable transport mode of DDS, if the LifeSpan QoS policy is not configured, unacknowledged data instances will reside in the publisher’s writing buffer for a long time and be re-transmitted endlessly, which not only wastes memory and bandwidth, but also destroys the timeliness of messages. With a similar intent, MQTT implementations (e.g., EMQ X, HiveMQ, and VerneMQ) that adopt the latest MQTT specification (MQTT 5.0) allow publishers to specify a property for each sample called message-expiry-interval that defines the lifespan of the message in seconds. There is no analogue of the LifeSpan QoS in current ZeroMQ implementations.
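The effect of a lifespan on a publisher-side buffer can be sketched in a few lines of Python. This is a simplified model under stated assumptions (a single fixed lifespan, a monotonic clock, and no acknowledgment handling); `ExpiringBuffer` is a hypothetical name and does not correspond to a DDS or MQTT API.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class ExpiringBuffer:
    """Illustrative write buffer that discards samples older than a lifespan."""

    def __init__(self, lifespan_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lifespan_s = lifespan_s
        self._clock = clock
        # Each entry is (expiry deadline, sample). With one fixed lifespan
        # and a monotonic clock, deadlines are already in ascending order.
        self._items: Deque[Tuple[float, bytes]] = deque()

    def write(self, sample: bytes) -> None:
        """Store a sample together with its expiry deadline."""
        self._items.append((self._clock() + self._lifespan_s, sample))

    def purge_expired(self) -> int:
        """Drop expired samples from the front; return how many were dropped."""
        now = self._clock()
        dropped = 0
        while self._items and self._items[0][0] <= now:
            self._items.popleft()
            dropped += 1
        return dropped

    def pending(self) -> List[bytes]:
        """Return the samples that are still valid and eligible for (re)sending."""
        self.purge_expired()
        return [sample for _, sample in self._items]
```

Without the purge step, every unacknowledged sample would stay in `pending()` indefinitely, which mirrors the endless re-transmission problem described above.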

2.6 Topic Priority

Topic priority aims to distinguish the importance of different messages, which ensures that the overall utility of a multi-topic IoT system is maximized under a given resource limitation. DDS enables this property via the Transport_Priority policy, which is expressed as a 32-bit signed integer. An analogous parameter in ZeroMQ is called ZMQ_TOS, which is implemented with the same logic as in DDS. Unlike the above system-level implementations, the topic priority in MQTT applications is designated as a configurable attribute of the message queue in the broker, with values from 0 to 255.
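As a rough illustration of queue-level prioritization with values from 0 to 255, the sketch below dispatches queued messages by priority and keeps FIFO order within one priority level. It assumes that a larger value means higher priority, which the text does not state; `PriorityDispatchQueue` is a hypothetical name and not a broker API.

```python
import heapq
import itertools
from typing import List, Tuple


class PriorityDispatchQueue:
    """Illustrative dispatch queue: highest priority first, FIFO within a level.

    Assumption: larger priority values are more important.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, bytes]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, topic: str, payload: bytes, priority: int) -> None:
        """Queue a message with a priority in the range 0..255."""
        if not 0 <= priority <= 255:
            raise ValueError("priority must be in the range 0..255")
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), topic, payload))

    def get(self) -> Tuple[str, bytes]:
        """Remove and return the most important queued message."""
        _, _, topic, payload = heapq.heappop(self._heap)
        return topic, payload

    def __len__(self) -> int:
        return len(self._heap)
```

For example, an alarm topic queued with priority 200 would be dispatched before telemetry queued earlier with priority 10.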

3 Understanding the Performance Baseline

The purpose of this section is to understand the baseline performance of DDS, MQTT, and ZeroMQ under three representative IoT workload conditions (high-frequency, periodic, and sporadic). We performed our tests on an ARM cluster comprising 10 Raspberry Pi 3 Model B boards, each with a 1.20 GHz CPU, four physical cores, and 1 GB of memory. Software details of the cluster are as follows: Raspbian 9 OS, Linux 4.14.91 kernel, and GCC 4.7.3. The cluster bandwidth of 95 Mbps was measured using iperf3 on Linux. Interference was minimized by pinning publishers and subscribers to separate cores. To avoid latency measurement errors introduced by system clock jitter, we synchronized the system clock on each node using the Precision Time Protocol (PTP) [4]. Our test results indicate that PTP can keep the clock offset between Raspberry Pi boards within 200 microseconds even if the CPU is 80 percent utilized.

We leveraged RTI-Perftest [7] to monitor and benchmark the throughput, latency, and CPU utilization of the DDS test application. RTI-Perftest is a highly configurable command-line benchmarking tool developed by RTI for evaluating the performance of applications that use RTI Connext DDS Professional 6.0¹ as middleware. RTI-Perftest measures throughput by counting the number of bytes received by the subscriber per second. To avoid measurement errors caused by system clock jitter, RTI-Perftest calculates the one-way delay by sending latency test data samples in a ping-pong manner. For MQTT and ZeroMQ, we developed custom testing tools using open source APIs. All tests were repeated five times and each run lasted 90 seconds. We measured performance metrics every 5 seconds.

3.1 High-frequency Data-flow Tests

This test was motivated by a fault diagnosis system where vibration and sound information gathered by acoustic sensors is continuously propagated to ADC or cloud servers for further ML-based analysis. Hence, in this test, we configured a publisher to flood a continuous data-flow to one or multiple subscribers at an unlimited rate. All three middleware were profiled with default QoS settings. Figure 1 plots mean throughput versus payload size.
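The two measurements described earlier in this section (bytes received per second, and one-way delay derived from ping-pong samples) reduce to simple arithmetic, sketched below in Python. This is not RTI-Perftest code. In particular, taking the one-way delay as half of the round-trip time is an assumption of this sketch that presumes symmetric paths, and both function names are hypothetical.

```python
from typing import List, Sequence


def throughput_mbps(bytes_received: int, interval_s: float) -> float:
    """Convert a byte count observed over an interval into megabits per second."""
    return bytes_received * 8 / interval_s / 1e6


def one_way_latency_us(round_trip_us: Sequence[float]) -> List[float]:
    """Estimate one-way delay as half the round-trip time.

    Assumption: the forward and return paths take equal time.
    """
    return [rtt / 2.0 for rtt in round_trip_us]
```

Because a ping-pong round trip is timed on a single node, the estimate does not depend on the clock offset between the publisher and subscriber nodes.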

The 1-1 subplot indicates that application throughput grows as the payload size increases. When the message is smaller than 1 KB, ZeroMQ performs best: its throughput is 1715.43%-3069.23% higher than that of MQTT and 18.8%-110.66% higher than that of DDS. The reason is that there is no bandwidth or CPU pressure for either ZeroMQ or DDS at this point, and since ZeroMQ is built on top of the socket layer (lower than DDS), it spends less time than DDS executing application-level data processing. However, when the message is larger than 1 KB, ZeroMQ throughput does not converge smoothly and becomes lower than that of DDS. The shape of the DDS curve is as expected: its throughput first increases and then converges to the maximum bandwidth (95 Mbps). Compared with DDS and ZeroMQ, the throughput of MQTT is poor due to its broker-centric architecture.

In the 1-7 test, the throughput of DDS and MQTT is one-seventh of the physical bandwidth since messages are forwarded to the seven subscribers one by one. ZeroMQ, on the other hand, achieves the highest throughput, which reveals that the overhead of ZeroMQ packets in one-to-many cases is significantly lower than that of DDS and MQTT.

Figure 1. High-frequency Data-flow Tests: 1pub-1sub & 1pub-7sub, PubRate: unlimited. Figure shows mean throughput over five runs. QoS settings: Unicast, Reliable, No Batching.

¹ https://www.rti.com/products/connext-6

3.2 Periodic Data-flow Tests

The Periodic use case maps to continuous data-flows in real life, where the size and frequency of data change infrequently, e.g., a wind farm monitoring system in which windmills that rotate at a constant speed send telemetry data at a fixed rate, through which vendors can continuously track the operational status of the windmills. In this kind of scenario, latency is of more interest than throughput, as the timeliness of monitoring data is more important. In the following test, we simulated this use case and configured a publisher to send messages to a subscriber at a specified rate. The publishing rate varies from 200 to 1000 samples per second. Figure 2 shows the 90th percentile latency versus publishing rate for small, medium, and large messages.
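For readers who want to reproduce the 90th percentile statistic from raw latency samples, the following Python sketch uses the nearest-rank definition. The paper does not state which percentile definition its tooling uses, so the choice of nearest-rank and the function name `percentile_nearest_rank` are assumptions of this example.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method.

    The result is always one of the observed samples: the smallest value
    such that at least pct percent of the samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)  # 1-based rank
    return ordered[rank - 1]
```

Compared with the mean, a high percentile is less affected by the bulk of fast deliveries and better reflects the delay that most samples stay under, which matches the timeliness concern of the periodic use case.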

Figure 2. Periodic Data-flow Tests: 1pub-1sub. Figure shows 90th percentile latency over five runs. QoS settings: Unicast, Reliable, No Batching.

For small (64 B) and medium (2 KB) messages, MQTT latency rises slightly at first and then remains flat. The initial rise occurs because the MQTT broker needs more time to process ingress/egress traffic as the sending rate increases. However, once the broker cannot keep up with the publisher and the writing buffer on the publisher node is exhausted, the publisher’s writing process is blocked, which keeps the actual message dissemination rate and latency unchanged. In addition, the change in latency is not very pronounced for DDS and ZeroMQ because (1) their participants connect in an end-to-end manner, and (2) the subscriber can keep up with the publisher since bandwidth is not stressed.

For large samples (32 KB), MQTT latency remains flat as the publishing rate varies, and is consistently the lowest of the three when the dissemination rate exceeds 400 samples per second, for similar reasons as before. DDS and ZeroMQ share the same trend, with DDS latency slightly higher than that of ZeroMQ when the publishing rate is 200 samples/sec. ZeroMQ and DDS latency increase suddenly as the publishing rate passes 400 and 600 samples per second, respectively, and then fluctuate marginally. We believe the reasons are twofold: (1) the bandwidth is exhausted when the publishing rate surpasses 400 samples per second (400 samples/s * 32 KB > 95 Mbps); (2) the publishing process is blocked as the local buffer is exhausted, which is the same reason as for MQTT.
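The bandwidth argument in reason (1) can be checked with a short calculation. The Python sketch below assumes 1 KB = 1024 bytes and ignores protocol header overhead, neither of which is specified in the text; the function name `offered_load_mbps` is hypothetical.

```python
def offered_load_mbps(rate_per_s: float, payload_bytes: int) -> float:
    """Payload bits offered to the network per second, in Mbps (headers ignored)."""
    return rate_per_s * payload_bytes * 8 / 1e6


LINK_MBPS = 95.0          # measured cluster bandwidth reported in Section 3
PAYLOAD_BYTES = 32 * 1024  # assumption: 1 KB = 1024 bytes

for rate in (200, 400, 600):
    load = offered_load_mbps(rate, PAYLOAD_BYTES)
    status = "exceeds" if load > LINK_MBPS else "fits within"
    print(f"{rate} samples/s -> {load:.1f} Mbps ({status} {LINK_MBPS} Mbps)")
```

At 200 samples/s the payload alone needs about 52.4 Mbps, while at 400 samples/s it needs about 104.9 Mbps, which is above the 95 Mbps link. With 1 KB taken as 1000 bytes the figure at 400 samples/s is 102.4 Mbps, so the conclusion is the same under either convention.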

3.3 Sporadic Data-flow Tests

Multiple independent pub/sub applications may be co-located in the same LAN environment. Consider the same wind farm monitoring system example: if an anomalous status is identified on some windmills (e.g., a windmill stops spinning due to slow wind speeds or mechanical stoppage), it is necessary to send a large volume of failure detection data to the central control system. As a result, co-located applications may experience latency deterioration due to bandwidth contention caused by the bursty traffic.

To that end, we generated a periodic data-flow (PDF) and a sporadic data-flow (SDF) using two one-to-one pub/sub applications with separate topics. The packet size of the SDF application was set to 2 MB and the publishing rate of the PDF application was set to 25 Mbps of the physical bandwidth. Also, the SDF application began 30 seconds later than the PDF application and executed for 10 seconds.

Figure 3 depicts the latency of the PDF application under the interference of the SDF application. MQTT latency has a higher standard deviation than that of DDS and ZeroMQ, which implies that MQTT is more sensitive to the bursty data stream due to the presence of the broker. DDS tolerates the situation better than ZeroMQ.

4 DDS-focused Evaluations

Since DDS provides more modularized and pluggable QoS properties, this section probes it further. Prior works [2][6] reveal that transport protocol, message batching, and flow control are non-trivial aspects of performance tuning in networked applications. We therefore evaluate DDS along these dimensions, which correspond to the Multicast, TurboMode, and AutoThrottle QoS policies in DDS. The setup was similar to that of the High-frequency Data-flow Tests.

Figures 4 and 5 reveal that the publisher’s CPU usage remains flat in the beginning and then gradually decreases as the message size increases, which indicates that messages are produced less often as message size increases since data samples need more time to be delivered. The writing process is blocked during this period until more packets can be put into the pipe.

Figure 3. Sporadic Data-flow Test: 1pub-1sub. Figure shows the 90th percentile latency of the PDF application; the standard deviation on each bar indicates the interference caused by the co-located SDF application. Payload(SDF): 2MB, PubRate(SDF): unlimited, PubRate(PDF): 25Mbps.

TurboMode improves throughput by 23.6%-848.3% and 7.5%-1166.7% for messages smaller than 1 KB in the 1-1 and 1-7 tests, respectively, because the network stack is traversed less frequently. Moreover, it does not always lead to higher latency in the 1-n test, because the time spent sending individual messages to multiple receivers in a unicast manner may be longer than that of sending multiple messages in a batch. Multicast effectively improves application performance in the 1-7 test (272.3%-842.2%), but the throughput does not converge to the physical bandwidth as the payload increases, which we believe is due to a limitation of the network switch.

When the AutoThrottle mode is enabled, our results show that the application throughput decreased by an average of 18.0% and 18.7% in the 1-1 and 1-7 tests, respectively. Likewise, the latency decreased by 3.6% and 8.8%, respectively. Since the AutoThrottle feature needs to keep tracking the system state (send window occupancy and number of NACK messages) to make throttling decisions, it results in higher CPU utilization than the normal configuration.

Figure 4. DDS QoS Test: 1pub-1sub, PubRate: unlimited. Figure shows mean throughput, latency, and CPU utilization. QoS(Base): Unicast, Reliable, No Batching; QoS(Multicast): Multicast, Reliable, No Batching; QoS(TurboMode): Unicast, Reliable, TurboMode; QoS(AutoThrottle): Unicast, Reliable, No Batching, AutoThrottle.

Figure 5. DDS QoS Test: 1pub-7sub, PubRate: unlimited. Figure shows mean throughput, latency, and CPU utilization. QoS(Base): Unicast, Reliable, No Batching; QoS(Multicast): Multicast, Reliable, No Batching; QoS(TurboMode): Unicast, Reliable, TurboMode; QoS(AutoThrottle): Unicast, Reliable, No Batching, AutoThrottle.

5 Related Work

Performance assessment of pub/sub middleware has been conducted in many prior works under different experimental settings. Pereira et al. [9] propose a set of qualitative and quantitative dimensions for benchmarking IoT middleware. They used a large dataset to simulate a smart city use case, and evaluated the performance of two middleware platforms (FIWARE² and oneM2M³) along their proposed benchmarking dimensions. Similarly, the quantitative analysis in [1] presents the performance (throughput and latency) variation between Open MQ, Active MQ, and Mantaray MQ for different message sizes. However, since the middleware they investigated are all broker-based, they studied the publishing and subscribing processes separately, rather than performing end-to-end tests. In [10], the authors provided an overview of the round trip time (RTT) differences among OPC UA, ROS, DDS, and MQTT under different CPU and network load conditions. Dobbelaere et al. [3] established a generic comparison framework based on the core functionality of pub/sub systems. Using this framework, they delved into a qualitative and quantitative comparison of two commercially supported middleware: Kafka and RabbitMQ. To avoid interference from the network layer, they executed their experiments on a single host with empirical application configurations. Luzuriaga et al. [8] present an experimental evaluation of AMQP and MQTT in the context of unstable network conditions. Their assessments are based on a simple one-to-one publish/subscribe scenario. Compared to these works, we designed test cases based on real-life scenarios and provided users with more insightful guidelines on selecting QoS policies to improve middleware performance.

² https://www.fiware.org
³ http://www.onem2m.org

6 Concluding Remarks

This paper empirically evaluates the performance of three pub/sub technologies, OMG DDS, MQTT, and ZeroMQ, for representative IoT scenarios (high-frequency, periodic, and sporadic). DDS provides more comprehensive and modularized QoS support than the others, and also demonstrates better overall latency and throughput in most evaluated scenarios. Specifically, DDS gained higher throughput than ZeroMQ and MQTT in the high-frequency data-flow use case. In the periodic data-flow use case, ZeroMQ has lower latency than DDS for small (64 B) and medium (2 KB) messages, while DDS latency outperforms ZeroMQ when sending large messages (32 KB). MQTT is more sensitive to the in-parallel sporadic data-flow, and DDS can successfully shield the interference. Our results also reveal that DDS’s Multicast QoS can effectively improve throughput in multi-subscriber scenarios. The TurboMode property can intelligently decide an appropriate batch size for different payloads and significantly improve throughput for small messages. The AutoThrottle property results in lower throughput and latency, and higher CPU utilization.

Our future work will include: (1) examining more QoS settings (not only for DDS, but also MQTT and ZeroMQ) experimentally; (2) evaluating more middleware using real-world workloads instead of synthetic data-flows; (3) designing intelligent decision-making algorithms to automatically configure and adaptively adjust middleware QoS parameters under various (dynamic) load conditions.

Acknowledgments

This work is supported by a grant from Siemens Technology. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Siemens Technology.

References

[1] Sanjay P Ahuja and Naveen Mupparaju. 2014. Performance Evaluation and Comparison of Distributed Messaging Using Message Oriented Middleware. Computer and Information Science 7, 4 (2014), 9.

[2] Sowmya Balasubramanian, Dipak Ghosal, Kamala Narayanan Balasubramanian Sharath, Eric Pouyoul, Alex Sim, Kesheng Wu, and Brian Tierney. 2018. Auto-tuned publisher in a pub/sub system: Design and performance evaluation. In 2018 IEEE International Conference on Autonomic Computing (ICAC). IEEE, 21–30.

[3] Philippe Dobbelaere and Kyumars Sheykh Esmaili. 2017. Kafka versus RabbitMQ: A comparative study of two industry reference publish/subscribe implementations: Industry Paper. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 227–238.

[4] John C Eidson, Mike Fischer, and Joe White. 2002. IEEE-1588™ Standard for a precision clock synchronization protocol for networked measurement and control systems. In Proceedings of the 34th Annual Precise Time and Time Interval Systems and Applications Meeting. 243–254.

[5] Jim Gemmell, Todd Montgomery, Tony Speakman, and Jon Crowcroft. 2003. The PGM reliable multicast protocol. IEEE Network 17, 1 (2003), 16–22.

[6] Joe Hoffert, Aniruddha Gokhale, and Douglas C Schmidt. 2011. Timely autonomic adaptation of publish/subscribe middleware in dynamic environments. International Journal of Adaptive, Resilient and Autonomic Systems (IJARAS) 2, 4 (2011), 1–24.

[7] Real-Time Innovations. 2020. rtiperftest. https://github.com/rticommunity/rtiperftest.

[8] Jorge E Luzuriaga, Miguel Perez, Pablo Boronat, Juan Carlos Cano, Carlos Calafate, and Pietro Manzoni. 2015. A comparative evaluation of AMQP and MQTT protocols over unstable and mobile networks. In 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC). IEEE, 931–936.

[9] Carlos Pereira, João Cardoso, Ana Aguiar, and Ricardo Morla. 2018. Benchmarking Pub/Sub IoT middleware platforms for smart services. Journal of Reliable Intelligent Environments 4, 1 (2018), 25–37.

[10] Stefan Profanter, Ayhun Tekat, Kirill Dorofeev, Markus Rickert, and Alois Knoll. 2019. OPC UA versus ROS, DDS, and MQTT: performance evaluation of industry 4.0 protocols. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT).
