Dependable Communication Protocols in Ad-Hoc
Networks
Vadim Drabkin
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-05 - 2008
Dependable Communication Protocols in Ad-Hoc
Networks
Research Thesis
Submitted in Partial Fulfillment of the Requirements
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Vadim Drabkin
Submitted to the Senate of the Technion — Israel Institute of Technology
SIVAN, 5768 Haifa June, 2008
This Research Thesis was done under the supervision of Assoc. Prof. Roy Friedman in the Department of Computer Science.
It is my privilege to thank Roy Friedman for his insightful guidance that made this work possible, and for bringing me up as a scientist, researcher and an entrepreneur.
It is a pleasure to thank my colleagues at the Technion, Gabi Kliot and Marc Segal z”l, for fruitful discussions and for the wonderful time we had together.
I feel that no words can express my deep gratitude to my wonderful wife Anna, who gave me unconditional love, support, and understanding through the better and worse times of my research. Also, to my grandparents, Enoch and Ester Tsvaig, my mother, Larisa Drabkin, and to my sister Regina Bricker to whom I owe my life and education from the very beginning.
The generous financial help of the Technion is gratefully acknowledged.
Contents
Abstract
Notation and Abbreviations
1 Introduction
2 Related work
2.1 Multicast & Broadcast
2.2 Byzantine Failures & Failure Detectors
2.3 Group Communication & Replication
3 Preliminaries
3.1 Ad-Hoc Networks Model
3.2 Malicious Failures
3.2.1 Limitations on the power of the malicious nodes
3.3 Failure Detectors
3.3.1 Interfacing with the Failure Detectors
4 Reliable Probabilistic Dissemination in Wireless Ad-Hoc Networks
4.1 System Model
4.2 Common Reliable Dissemination Techniques
4.2.1 Probabilistic Flooding
4.2.2 Counter Based Broadcast
4.2.3 Lazy Gossip
4.3 The RAPID Protocol
4.3.1 Basic RAPID
4.3.2 Enhanced RAPID
4.3.3 Maliciousness Resilient RAPID
4.4 Simulations
4.4.1 Setup
4.4.2 Results
5 Overlay Based Reliable Broadcast in Wireless Ad-Hoc Networks
5.1 System Model and Definitions
5.2 Failure Detectors and Nodes’ Architecture
5.3 The Broadcast Problem
5.4 The Dissemination Protocol
5.4.1 The Dissemination Task in Detail
5.4.2 Gossiping and Message Recovery in Detail
5.5 Overlay Maintenance
5.6 Correctness Proof
5.6.1 Protocol Analysis
5.6.2 Fast Dissemination with Eventually Perfect Failure Detectors
5.6.3 Fast Dissemination with Interval Failure Detectors
5.7 Results
6 Byzantine Resilient Group Communication
6.1 Model, Assumptions and Problem Statement
6.1.1 Basic Concepts
6.1.2 Byzantine Virtual Synchrony
6.2 Overview of the Solution
6.2.1 JazzEnsemble and Fuzzy Membership
6.2.2 Fuzzy Mute and Fuzzy Verbose Failure Detectors
6.2.3 Intra-View Reliable Delivery
6.2.4 Byzantine Membership Maintenance
6.2.5 Total Ordering
6.2.6 Efficient Implementations of building blocks
6.3 Performance Evaluation
7 Summary and Future Directions
7.1 Summary of Results
7.2 Future Directions
A Practical Application
A.1 Future Work
References
Hebrew Abstract
List of Figures
3.1 Failure Detectors’ Interface
4.1 An upper bound on the probability that an arbitrary node does not receive a message m
4.2 A transmission by a node s can be received by all nodes within its transmission range: p, n1, ..., nk
4.3 Basic RAPID (executed by node p)
4.4 Enhanced RAPID (lines that were modified w.r.t. Figure 4.3 are boxed while lines 18–27 were added)
4.5 Maliciousness Resilient RAPID (lines that were modified w.r.t. Figure 4.4 are boxed)
4.6 Message delivery ratio when all nodes are mobile vs. varying values of β
4.7 Network load in terms of total number of transmissions when all nodes are mobile vs. varying values of β
4.8 Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes)
4.9 Latency to deliver a message to 98% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes)
4.10 Message delivery ratio when all nodes are mobile vs. varying number of broadcasting nodes
4.11 Network load in terms of total number of transmissions when all nodes are mobile vs. varying number of broadcasting nodes
4.12 Latency to deliver a message to X% of the nodes when all nodes are mobile (with 100 broadcasting nodes)
4.13 Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying number of selfish nodes (with 100 broadcasting nodes)
4.14 Message delivery ratio vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments)
4.15 Network load in terms of total number of transmissions vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments)
4.16 Message delivery ratio when all nodes are static vs. varying density (with 100 broadcasting nodes)
4.17 Network load in terms of total number of transmissions when all nodes are static vs. varying density (with 100 broadcasting nodes)
5.1 A node’s architecture
5.2 Malicious Resilient Dissemination Algorithm
5.3 Malicious Resilient Dissemination Algorithm – continued
5.4 Malicious overlay
5.5 Message delivery ratio when all nodes are static
5.6 Network load in terms of total number of messages sent when all nodes are static
5.7 Latency to deliver a message to X% of the nodes when all nodes are static (with 200 broadcasting nodes that send one message per second)
5.8 Latency to deliver a message to X% of the nodes when nodes are mobile (with 200 broadcasting nodes that send one message per second)
5.9 Message delivery ratio when all nodes are mobile
5.10 Network load in terms of total number of messages sent when nodes are mobile
5.11 Message delivery ratio when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)
5.12 Message delivery ratio when nodes are mobile vs. varying number of malicious nodes (out of a total of 200 nodes)
5.13 Network load when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)
5.14 Network load when nodes are mobile with varying number of malicious nodes (out of a total of 200 nodes)
5.15 Latency to deliver a message to X% of the nodes when all nodes are static vs. varying number of malicious nodes
5.16 Latency to deliver a message to X% of the nodes when nodes are mobile vs. varying number of malicious nodes
6.1 A node’s architecture
6.2 Message headers and data in layers (drawing taken from Ensemble’s reference manual)
6.3 Pseudo Code of Membership Protocol
6.4 Pseudo Code of Suspicion Protocol
6.5 Pseudo Code of Merge Protocol
6.6 Pseudo Code of FLUSH Protocol
6.7 Main variables held by each process pi
6.8 ♦Pmute-Based Vector Byzantine Consensus Protocol Executed by pi (n > 6f)
6.9 Uniform Broadcast Protocol Executed by pi (n > 6f)
6.10 Throughput measurements (the line for public key cryptography is hardly visible, as it is so close to 0 compared with the other lines)
6.11 Latency measurements (the line for public key cryptography is dropped since it is orders of magnitude higher than the others)
6.12 Throughput Measurements: the cost of total ordering and uniform broadcast with and without symmetric-key cryptography
6.13 Time to establish a new view
A.1 WiPeer architecture
List of Tables
4.1 Delivery ratio and message count vs. the number of selfish nodes (with 100 broadcasting nodes)
6.1 Recovery time from problematic scenarios
Abstract
Mobile Ad-Hoc Networks (MANETs) are networks of mobile devices that are formed in an
ad-hoc manner. The devices that participate in such networks have wireless communication
capabilities with limited range transmitters, and thus can directly communicate with other
devices that are within their range. Some of the devices occasionally volunteer to forward
some of the messages they receive, or in other words, act as routers, thereby forming a
multi-hop network with a wider reach. Yet, there is no fixed infrastructure, the network is
continuously changing, and routers are elected on demand. In other words, the networking
issues are handled in an ad-hoc manner.
Unlike infrastructure based networks in which routers are usually considered to be
trusted entities, in ad-hoc networks routing is performed by the devices themselves. Thus,
there is a high risk that some of the nodes of an ad-hoc network would not respect the
networking protocols. This can be due to maliciousness, or simply selfishness (trying to
save battery power). Thus, the possibility of having faulty nodes in the system motivates
the development of reliable broadcast protocols for ad-hoc networks.
Group communication systems have proven themselves as powerful middleware for build-
ing reliable networked applications in wired environments. These systems relieve program-
mers from many of the tedious and highly complex issues involved in designing such ap-
plications, allowing them to focus on the essential aspects of the application being devel-
oped, resulting in faster development time and fewer bugs. During the last few years,
group communication systems have become standard building blocks in many clustering
and replication products in both industry and academia.
In this thesis we present a novel ReliAble ProbabIlistic Dissemination protocol, called
RAPID, for mobile wireless ad-hoc networks, which tolerates message omissions, node
crashes, and selfish behavior. The protocol employs a combination of probabilistic for-
warding with deterministic corrective measures. The forwarding probability is set based
on the observed number of nodes in each one-hop neighborhood, while the deterministic
corrective measures include deterministic gossiping as well as timer based corrections of the
probabilistic process. These aspects of the protocol are motivated by a theoretical analysis,
which explains why this unique protocol design is inherent to ad-hoc networks environments.
Since the protocol only relies on local computations and probability, it is highly resilient to
mobility and failures. By adding authentication, it can even be made tolerant of malicious nodes.
As an additional contribution, we present an efficient overlay-based reliable broadcast protocol
for ad-hoc networks. The use of an overlay results in a significant reduction in the
number of messages. The protocol overcomes different types of node failures by combining
digital signatures, gossiping of message signatures, and failure detectors.
Last, we present and explore a reliable group communication system in wireless mobile
ad-hoc networks. The objective is to enable easy, efficient, and correct development of
applications in this environment. The main challenge was to develop protocols that are
resilient to attacks by malicious nodes, yet are scalable, robust and efficient. This included
adapting several existing protocols as well as developing new membership maintenance
mechanisms, followed by a thorough benchmarking of the system.
Notation and Abbreviations
$r_p$: Transmission range of node $p$
$N^t(1, p)$: Set of direct neighbors of a node $p$ at time $t$
$N^t(k, p)$: A transitive closure with length $k$ of $N^t(1, p)$ at time $t$
OVERLAY: Set of nodes that belong to the overlay
$OL^t(1, p)$: Set of direct neighbors of a node $p$ that belong to the overlay at time $t$
$OL^t(k, p)$: A transitive closure with length $k$ of $OL^t(1, p)$ at time $t$
β: Required number of forwarders of a message
$d_{avg}$: The average number of neighbors of any node
$Q$: The probability that a message is successfully received by a neighboring node
$P$: The probability of broadcasting a message
$k_p$: The private key of device $p$
$f$: The number of failures in the system
Mute: Failure detector that detects mute failures
Verbose: Failure detector that detects verbose failures
Mute: Failure detector that detects node failures
$I_{mute}$: Interval failure detector that detects mute failures
$I_{verbose}$: Interval failure detector that detects verbose failures
Chapter 1
Introduction
Wireless mobile ad-hoc networks (MANETs) are formed when an ad-hoc collection of devices
equipped with wireless communication capabilities happens to be in proximity to each
other [112]. When some of these devices agree to forward messages for other devices, a
multi-hop network is formed. One of the aspects of ad-hoc networks is that they are formed
without any pre-existing infrastructure or management authority. Also, due to mobility,
the physical structure of the network is continuously evolving.
MANETs offer a potential for a variety of new applications and improved services for
mobile users, especially as the computing power of mobile devices becomes stronger. Exam-
ple applications include interactive distributed games, ad-hoc transactions and e-commerce,
collaborative (shared white-board and video conferencing) applications, and enhancing the
bandwidth and reach of cellular communication (e.g., for Wi-Fi enabled cell-phones) [62, 63].
Broadcast is a basic service for many collaborative applications, as it enables any device
to disseminate information to all other participants in the network. In particular, a useful
broadcast service should be both efficient and provide a good level of reliability, meaning
that most nodes in the system will receive almost every broadcast message.
Yet, some of the applications we mentioned above require stronger semantics than broad-
cast and can benefit from a group communication service that provides the application a
wider choice of fine grain semantics. Group communication is an umbrella term that de-
scribes a wide variety of toolkits, algorithms, and services that promote the ease of commu-
nication among a number of nodes [15]. The GC toolkits that have been developed typically
serve as a middleware, enabling developers to easily add communication primitives such as
reliable multicast, virtual synchrony, and total ordering to their applications. These pro-
tocols are typically not trivial to implement, thus contributing to the popularity of the
component model for such toolkits, where developers can simply select the services that are
required for their particular task and use them in a transparent, abstract manner. In recent
years, group communication toolkits have become standard building blocks in commercial
and academic clustering systems.
Despite the large body of work on group communication, most systems assumed a rel-
atively benign failure model, which largely excludes Byzantine failures [74]. In particular,
while some group communication systems support signatures and authentication of mes-
sages, the vast majority assume that all group members can be trusted. In contrast, under
the Byzantine failure model, a process can deviate arbitrarily from its protocol. This can
be either a result of a bug or hardware malfunction, or due to malicious behavior.
One of the main reasons why most systems ignore Byzantine faults is that group com-
munication has been largely used to coordinate clustered applications, all running within
the same LAN. It is often assumed that in such closed environments, all participants can be
trusted. In particular, when combining this assumption with the performance hit and pro-
tocol complexity associated with accommodating Byzantine failures, most projects opted
not to handle such failures.
However, given the rise in security attacks against computer systems, as well as the
desire to utilize group communication in new application domains, such as in ad-hoc net-
works [112], the need for Byzantine tolerance re-emerges. This is because the likelihood
that a node might be compromised is no longer negligible. Thus, if we want the system to
remain robust in these situations, it must be able to tolerate Byzantine failures. On the
other hand, we would like to ensure reasonable performance, since otherwise the system
would also be useless. In particular, we assume that the occurrence of Byzantine faults is
rare, and thus we believe that it is important to focus on the performance of the system
during normal runs, when there are no Byzantine faults. Yet, of course, the system must
still be able to recover from Byzantine faults and do useful work when they occur.
The design of traditional group communication systems requires each group member
to communicate periodically with all other group members. Yet, in multi-hop networks,
a node p can receive messages only from nodes whose distance from p is less than their
transmission range. Therefore, some nodes will have to forward those messages from other
nodes to their neighbors. It is even more complicated if some of the nodes choose to behave
in a Byzantine manner and not to forward these messages. Thus, in order to implement a
group communication system in multi-hop networks, a reliable broadcast protocol
is needed.
The simplest way to obtain broadcast in a multi-hop network is by employing flooding [111].
That is, the sender sends the message to everyone in its transmission range.
Each device that receives a message for the first time delivers it to the application and also
forwards it to all other devices in its range. While this form of dissemination is very robust,
it is also very wasteful and may cause contention and a large number of collisions [114].
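The flooding rule described above can be sketched in a few lines. This is an illustration only, assuming a static topology given as an adjacency map and ignoring collisions and message loss; the function name `flood` and the five-node line topology are hypothetical and do not come from the thesis.

```python
from collections import deque


def flood(neighbors, source):
    """Simulate plain flooding on a static topology.

    neighbors: dict mapping each node to the set of nodes within its range.
    Returns (delivered, transmissions): the set of nodes that received the
    message and the number of local broadcasts that were performed.
    """
    delivered = {source}
    transmissions = 0
    pending = deque([source])  # nodes that still have to broadcast
    while pending:
        sender = pending.popleft()
        transmissions += 1  # one local broadcast reaches every neighbor
        for node in neighbors[sender]:
            if node not in delivered:  # first reception: deliver and forward
                delivered.add(node)
                pending.append(node)
    return delivered, transmissions


# Hypothetical line topology a - b - c - d - e.
line = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
delivered, transmissions = flood(line, "a")
```

In this idealized setting every reachable node transmits exactly once, so the number of transmissions equals the number of nodes reached, which is the redundancy that the alternatives discussed next try to reduce.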
Common alternatives to flooding are either to perform a constrained flooding on top
of a deterministic overlay, e.g., [75, 109, 121, 123], or to perform a probabilistic flooding,
e.g., [57, 76].
In the probabilistic approach, whenever a node receives a message, it applies some locally
computable probabilistic mechanism to randomly determine whether it should broadcast the
message or not [24, 57, 76]. Probabilistic protocols are appealing since they are very simple,
and are inherently robust to failures and mobility. Yet, as was discovered in [57, 76, 104],
in order to obtain very high reliability levels with pure probabilistic broadcasting, one has
to set the retransmission probability to relatively high values. Consequently, such schemes
still generate a large number of redundant messages.
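As a rough illustration of pure probabilistic broadcasting, the sketch below lets every node that receives a message for the first time flip a local coin with a fixed retransmission probability `p`. It assumes an idealized, loss-free, static network; the helper `grid` and all names are hypothetical and are not taken from any of the cited protocols.

```python
import random


def grid(side):
    """Hypothetical side x side grid topology; nodes are (row, col) pairs."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c): {
            (r + dr, c + dc)
            for dr, dc in steps
            if 0 <= r + dr < side and 0 <= c + dc < side
        }
        for r in range(side)
        for c in range(side)
    }


def probabilistic_flood(neighbors, source, p, rng):
    """Each first-time receiver rebroadcasts with probability p.

    The source always transmits. Returns (delivered, transmissions).
    """
    delivered = {source}
    transmissions = 0
    pending = [source]
    while pending:
        sender = pending.pop()
        transmissions += 1
        for node in sorted(neighbors[sender]):
            if node in delivered:
                continue
            delivered.add(node)
            if rng.random() < p:  # local coin flip decides on forwarding
                pending.append(node)
    return delivered, transmissions


topology = grid(4)
all_forward = probabilistic_flood(topology, (0, 0), 1.0, random.Random(1))
none_forward = probabilistic_flood(topology, (0, 0), 0.0, random.Random(1))
```

With `p = 1.0` the sketch degenerates to plain flooding, and with `p = 0.0` only the source's one-hop neighbors are reached; intermediate values trade the number of transmissions against coverage.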
Other approaches [24, 57, 114, 115] combine probabilistic forwarding with some addi-
tional locally computable mechanism, such as counter-based, distance-based, location-based,
or any combination of those, to determine whether it should rebroadcast the message or
not. That way, the number of messages is further reduced. Yet, those protocols suffer
from increased latency. Finally, those schemes cannot ensure high reliability for arbitrary
topologies, and cannot cope with selfish (and malicious) behavior.
Our work began by investigating the various aspects of designing and enhancing
an effective group communication toolkit for MANETs that overcomes Byzantine failures.
Soon, we understood that some membership protocols require reliable dissemination of
messages to all group members in multi-hop wireless networks. Having the reliable dissemination
building block enabled us to concentrate on group communication that overcomes
Byzantine failures in single hop networks. For didactic reasons we first present two flavors of
reliable broadcast protocols: RAPID, a reliable probabilistic broadcast protocol and BDP,
an overlay-based broadcast protocol. Then we turn our attention to a group communication
system that tolerates Byzantine failures.
Our contributions are:
• We demonstrate an efficient reliable probabilistic broadcast protocol for wireless ad-
hoc networks. The protocol employs a combination of probabilistic forwarding with
deterministic corrective measures. The forwarding probability is set based on the
locally observed network’s density. Additionally, we employ timer based corrections
that may cause a node to change its decision on whether to broadcast a message
or not. Finally, the protocol employs a deterministic gossip based mechanism that
recovers messages that were not delivered by the probabilistic dissemination.
• We analyze the relationship between the number of nodes that rebroadcast messages in
each one-hop neighborhood in a probabilistic dissemination protocol and the expected
reliability of this protocol (or in other words, the percentage of nodes that will receive
the message). We show that there is an optimal number, in the sense that this number
of retransmitting nodes, which is relatively small, is enough to ensure good reliability
and this number does not depend on the network’s density.
• We present an efficient overlay-based reliable broadcast protocol for wireless ad-hoc
networks. The protocol overcomes malicious failures by combining digital signatures,
gossiping of message signatures, and failure detectors. These ensure that messages
dropped or modified by malicious nodes will be detected and retransmitted and that
the overlay will eventually consist of enough correct processes to enable message dis-
semination. An appealing property of the protocol is that it only requires the existence
of one correct node in each one-hop neighborhood.
• We develop a Byzantine resilient membership protocol. In addition, we analyze the
sources of performance degradation associated with various aspects of overcoming
Byzantine failures.
• We present a novel Byzantine uniform broadcast protocol that terminates in two
communication steps (instead of three) at a cost of f < n/6.
• We present a novel vector-oriented Byzantine consensus protocol that allows the processes
to decide in one communication step in favorable circumstances. The protocol
is failure-detector-based and assumes f < n/6.
• We demonstrate the design of a practical application that uses the JazzEnsemble group
communications toolkit and targets wireless mobile ad-hoc networks (MANETs).
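The resilience condition f < n/6 stated for the uniform broadcast and consensus protocols is plain arithmetic, and the following sketch spells it out. The function names are hypothetical helpers for illustration and are not part of JazzEnsemble or of the protocols themselves.

```python
def max_tolerated_faults(n):
    """Largest integer f that satisfies f < n/6, i.e. n > 6f."""
    if n < 1:
        raise ValueError("a group needs at least one process")
    return (n - 1) // 6


def min_group_size(f):
    """Smallest group size n that satisfies n > 6f."""
    if f < 0:
        raise ValueError("the number of faults cannot be negative")
    return 6 * f + 1
```

For instance, under this condition a group of 6 processes cannot tolerate any Byzantine process, while 7 processes can tolerate one.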
Road-map: The rest of this thesis is organized as follows. Chapter 2 discusses rele-
vant related work such as broadcast protocols and group communication systems. Chapter 3
presents the Ad-Hoc networks model. Chapter 4 describes the reliable probabilistic broad-
cast and Chapter 5 describes the overlay-based broadcast protocols. The details of the
Byzantine JazzEnsemble protocols are given in Chapter 6. Finally, Chapter 7 summarizes
the thesis and outlines potential future work.
Chapter 2
Related work
2.1 Multicast & Broadcast
A good survey of broadcast and multicast protocols for wireless ad hoc networks can be
found in [112]. In particular, (multicast) routing in MANET can be classified into proactive,
e.g., OLSR [31], reactive, e.g., AODV [92] and DSR [65], and mixtures of both, e.g., ZRP [56],
as well as geographic routing [66, 71, 77, 98]. These protocols, however, ignore Byzantine
failures.
The simplest probabilistic broadcast protocol is probabilistic flooding [57, 114]. In this
scheme, each node rebroadcasts a message with a predefined probability P. Works by Haas
et al. [57] and Sasson et al. [104] study the rebroadcasting probability P with regard to the
so-called phase transition phenomenon. Both works establish that the delivery distribution
has a bimodal behavior with regard to some threshold probability $P_c$, in the sense that for
any $P > P_c$ almost all nodes will receive the message and for $P < P_c$ almost none. Both
works show that the threshold probability $P_c$ is around 0.59–0.65; in [104] this is done
analytically based on percolation theory while in [57] it is obtained by simulations. It is
also noted in [57] that the threshold probability depends on nodes density, yet without
providing any theoretical means to evaluate this dependence. We have studied the delivery
distribution using probabilistic methods in Section 4.2. We have shown that by making
a few probabilistic assumptions, the delivery distribution function behaves in a concave
manner rather than being bimodal. That is, nodes coverage initially grows fast with P.
Then, at some critical point, the added coverage becomes negligible with further increase
of P. Our protocol is designed with corrective measures that compensate for situations in
which the simplifying assumptions do not hold.
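The saturation effect described above can be illustrated with a deliberately simplified calculation. It is only a sketch under assumptions that are not taken from the analysis in Section 4.2: a node has n neighbors that already hold the message m, each of them rebroadcasts independently with probability P, and each transmission is received independently with probability Q.

```latex
\begin{align*}
\Pr[\text{node misses } m] &= (1 - PQ)^{n}, \\
\mathrm{coverage}(P) &= 1 - (1 - PQ)^{n}, \\
\frac{d^{2}}{dP^{2}}\,\mathrm{coverage}(P)
  &= -\,n(n-1)\,Q^{2}\,(1 - PQ)^{n-2} \;\le\; 0, \\
P = \frac{\beta}{n} \;\Longrightarrow\;
\Pr[\text{node misses } m]
  &= \Bigl(1 - \frac{\beta Q}{n}\Bigr)^{n} \;\le\; e^{-\beta Q}.
\end{align*}
```

Under these assumptions the coverage is concave in P, so each further increase of P adds less coverage. Choosing P so that about β neighbors retransmit (with β ≤ n) bounds the miss probability by a quantity that depends on β and Q but not on the number of neighbors n.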
Other probabilistic approaches [57, 85, 114, 115] include counter-based, distance-based,
and location-based mechanisms. The main idea in these schemes is that the additional
space coverage obtained by each additional broadcast decreases with the number of broad-
casts. For example, [57] presents a variant of the probabilistic protocol in which every node
monitors the transmissions of its neighbors and rebroadcasts a message if it has not heard
M transmissions of the same message. Yet, those protocols suffer from increased latency
due to the packet delay introduced at each hop (as explained in Section 4.3.2) and none
of them guarantee a reliable dissemination of messages to all nodes (as explained in Sec-
tion 4.3.2). On the other hand, the RAPID protocol, presented in this thesis, guarantees
reliable dissemination in any topology.
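The counter-based rule mentioned above can be sketched as follows. This is a minimal illustration, assuming that a node counts every copy it overhears (including the first one) and takes its decision when a waiting timer expires; the class name, the method names, and this exact counting convention are assumptions, not the definition used in [57].

```python
class CounterBasedNode:
    """Counter-based rebroadcast decision for a single node.

    `threshold` plays the role of M in the text: the node stays silent
    once it has overheard that many copies of the same message.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.heard = {}  # message id -> number of copies overheard so far

    def on_receive(self, msg_id):
        """Record one overheard copy; return True on the first reception."""
        self.heard[msg_id] = self.heard.get(msg_id, 0) + 1
        return self.heard[msg_id] == 1

    def on_timer(self, msg_id):
        """Called when the waiting period for msg_id expires.

        Returns True if the node should rebroadcast the message.
        """
        return self.heard.get(msg_id, 0) < self.threshold


node = CounterBasedNode(threshold=3)
first = node.on_receive("m1")            # first copy arrives
node.on_receive("m1")                    # a neighbor's rebroadcast is overheard
decision_after_two = node.on_timer("m1")
node.on_receive("m1")                    # a third copy is overheard
decision_after_three = node.on_timer("m1")
```

The waiting period before `on_timer` fires is what adds delay at every hop in schemes of this kind.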
The works in [107, 127] utilize an adapted probabilistic flooding that makes use of local
density. The approaches of those works are based on the observation that the retransmission
probability P should be adjusted relative to the local node density. In [127] this is
done through counters, while in [107] uniform density is assumed. However, those
works contain little theoretical analysis of the proposed schemes and like other counter-
based schemes can also fail to provide reliability on certain topologies. To the best of our
knowledge, our work is the first to provide a theoretical analysis of the optimal usage of
nodes density in order to set P.
The work in [24] studies three variants of the above ideas. The first is to retransmit
with probability $k/n_i$, where k is some constant and $n_i$ is the size of the neighborhood.
The second method is based on having each node learn its 2-hop neighborhood and then
computing the rebroadcasting probability based on the intersections of 1-hop neighborhoods. The
final scheme in [24] also computes the probability according to $k/n_i$, but adds a mechanism
in which, if a node suspects that some of its neighbors did not receive the message, it
rebroadcasts the message regardless of its initial decision. Unlike the work in [24], we formally
analyze the value of k. Also, we include a gossip and recovery mechanism, whereas none of
the protocols in [24] do so. Consequently, RAPID is more reliable than any of the schemes
of [24]. Moreover, RAPID has a variant that can deal with many forms of malicious behavior
while the other protocols do not.
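As a small illustration of the first variant of [24], the density-dependent retransmission probability can be sketched as follows. The function names are hypothetical, and capping the probability at 1 and always transmitting when no neighbors are known are our own assumptions, since the text above does not specify these corner cases.

```python
import random


def rebroadcast_probability(k, n_i):
    """Probability k / n_i, capped at 1 (the cap is our assumption)."""
    if n_i <= 0:
        return 1.0          # no known neighbors: always transmit (assumption)
    return min(1.0, k / n_i)


def decides_to_rebroadcast(k, n_i, rng):
    """Draw the random rebroadcast decision of one node."""
    return rng.random() < rebroadcast_probability(k, n_i)


rng = random.Random(7)                          # fixed seed for repeatability
dense = rebroadcast_probability(k=4, n_i=20)    # dense neighborhood
sparse = rebroadcast_probability(k=4, n_i=3)    # sparse neighborhood, capped
always = decides_to_rebroadcast(4, 3, rng)      # probability 1, so always True
```

In a dense neighborhood the probability drops, so that roughly k neighbors retransmit on average, while in a sparse neighborhood every node retransmits.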
The color-based scheme was recently proposed in [68]. In this scheme, each node
forwards a message if, after a random waiting time, it can assign the message a color from a
given pool that it has not already overheard. Using geometric analysis, the authors have shown that the size of
the rebroadcasting group is within a small constant factor of the optimum. The color-based
scheme is actually an advanced type of a counter-based scheme, and thus incurs similar
latencies and does not guarantee high reliability on arbitrary topologies. The bounds on
the size of the rebroadcasting set in homogeneous dense networks in [68] are similar to our
analysis in Section 4.2.1. Yet, our analysis is much simpler and holds for every probabilistic
algorithm that picks nodes uniformly at random in a homogeneous network, while their analysis
only holds for color-based schemes.
A number of works have been designed to provide a reliable dissemination of messages
to all nodes. An approach called Mistral tries to compensate for missing messages in
probabilistic dissemination by using forward error correction techniques [93]. In contrast,
our approach to recovery of messages is based on gossip. Also, Mistral cannot cope with
malicious behavior.
Demers et al. were the first to use gossip in the context of replicated databases in [34].
This idea was later adopted and extended in followup works such as the MNAK layer of
the Ensemble system in 1996 [58]. Additionally, randomized gossip has been used as a
method of ensuring reliable delivery of broadcast/multicast messages while maintaining
high throughput in the PBcast/Bimodal work [14] as well as in several followup papers,
e.g., [40]. In a way, the idea in our work is the inverse of the idea in the PBcast/Bimodal work.
In PBcast/Bimodal, each node deterministically sends every message to all the nodes
and later gossips about the existing messages with a random subset of nodes. Conversely,
in RAPID each node disseminates the messages to a random set of nodes (chosen among
its physical neighbors) and later deterministically gossips about the existing messages with
all its neighbors.
A generic framework for presenting gossip protocols was proposed in [64]. In particular,
it highlighted the advantages of designing gossiping protocols using a pull-push approach
for higher reliability. This framework was later extended to ad-hoc networks in [12, 53].
An example of a protocol for ad-hoc networks that uses a pull-push approach and is easily
expressed in the above framework is [79]. Both RAPID and BDP can also be seen as specific
instantiations of pull-push dissemination.
An additional protocol for reliable broadcast and manycast in ad hoc networks called
Scribble has been proposed in [120]. In Scribble, the responsibility for dissemination ini-
tially rests with the manycast originator, which periodically broadcasts the message, and is
subsequently passed around to other nodes. The termination condition in Scribble is deter-
mined by piggybacking a bit vector for all known nodes that have received the broadcast
message. Scribble does not employ probabilistic mechanisms and thus suffers from increased
latency and a higher message overhead.
Drum, a gossip-based multicast protocol that is resistant to DoS attacks,
was presented in [10]. Drum focuses on multicast only, and as a gossip-based protocol,
it relies on a high level of redundancy. Drum achieves DoS resistance using a combination
of pull and push operations, separate resource bounds for different operations, and the use
of random ports in order to reduce the chance of a port being attacked.
Random walk techniques have also been used to maintain group membership in ad-hoc
networks [36], as well as reliable multicast. Yet, these services only provide probabilistic
guarantees, with a probability that is based on the network density and the cover time of
the random walk agent.
Spanning tree based overlays have often been used as the main scheme for disseminating
messages to large groups, e.g., in IP multicast [90, 111] and in the MBone [42, 80]. More
sophisticated overlays such as hypercubes and Harary graphs have been explored, e.g.,
in [47, 78], as well as distributed hash tables like SCRIBE [101].
There has been a lot of work on securing point-to-point routing schemes against mali-
cious nodes. One example is the protocol presented in [7]. In this work, the authors describe
a mechanism for detecting malicious faults along a path and then discovering alternative
paths. Another secure routing protocol (SRP) has been proposed in [88]. SRP requires a
secure association between each pair of source and destination but assumes that Byzantine
nodes do not collude. Yet another protocol, SMT [89], protects pairwise communication by
breaking the message into several pieces based on a coding scheme that allows reconstruct-
ing the message even when some pieces are lost. Each piece is then sent along a different
path. Additional examples of secure point-to-point routing include, e.g., [103, 124, 126].
The work of Minsky and Schneider [84] explored disseminating information using gossip
in wired networks, when some nodes can be faulty. This is done by trusting only gossip that has
gained the support of at least f + 1 nodes, where f is the number of potential Byzantine
nodes. Several other works have also proposed a Byzantine multicast scheme that sends a
message along f + 1 distinct paths [33, 81]. Similarly, [16] has studied how to reduce the
possibility of interception by using multiple paths chosen in a stochastic manner.
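The f + 1 support rule can be illustrated with a minimal sketch. With at most f Byzantine nodes, any f + 1 distinct endorsers include at least one correct node. The function name and the representation of received gossip as (node, value) pairs are hypothetical and serve illustration only; they are not taken from [84].

```python
def accepted_values(endorsements, f):
    """Return the values endorsed by at least f + 1 distinct nodes.

    endorsements: iterable of (node_id, value) pairs, a hypothetical
    representation of the gossip a node has received so far.
    """
    supporters = {}
    for node_id, value in endorsements:
        # A set ignores repeated endorsements by the same node.
        supporters.setdefault(value, set()).add(node_id)
    return {v for v, nodes in supporters.items() if len(nodes) >= f + 1}


received = [("a", "x"), ("b", "x"), ("b", "x"), ("c", "y"), ("d", "x")]
trusted = accepted_values(received, f=2)       # "x" has 3 distinct supporters
too_strict = accepted_values(received, f=3)    # nothing reaches 4 supporters
```

Counting distinct endorsers rather than messages matters, since a single faulty node may repeat the same claim many times.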
Reliable Byzantine tolerant broadcast and multicast in networks where all nodes can
communicate directly with each other has been formally described in [18], and has been
explored, e.g., in [82]. Additionally, Byzantine tolerant atomic broadcast in general network
topologies that maintain connectivity between correct nodes has been investigated in [33].
Also, the works in [9, 20] have proposed a formal framework for defining and implementing
reliable multicast protocols in a hybrid failure environment (Byzantine, crash, and omission)
based on modern cryptography. In particular, they have investigated the computational
complexity of such protocols.
A framework for fault-tolerance by adaptation was proposed in [28]. In this framework, a
simple protocol is run during normal operation alongside some failure detection mechanism.
Once a failure is detected, the execution switches to a masking protocol. This idea was
demonstrated in [28] on the broadcast problem, which results in a solution somewhat similar
to the BDP protocol. However, in [28] it was not mentioned how the overlay (a tree in their
case) is constructed and maintained. Also, the masking protocol was flooding, whereas we
avoid flooding even when failures are detected. Instead, in BDP, local message recovery
is first attempted. Moreover, in [28] it was not explained when and how to return to the
simple protocol once a failure is compensated for. Finally, our work encapsulates failure
detection behind failure detectors, which results in a modular implementation.
A reliable broadcast protocol in Mobile Ad-Hoc Networks that uses message depen-
dencies was presented in [108]. The protocol does not require explicit acknowledgements;
instead message dependencies provide implicit acknowledgements. Additionally, their pro-
tocol can be used to determine message stability, an important property for practical im-
plementations of a broadcast protocol.
2.2 Byzantine Failures & Failure Detectors
Byzantine failures were introduced by Lamport, Shostak, and Pease in the context of
synchronous systems in [74]. Byzantine failures have been mostly studied in the context of
the consensus problem. The first randomized protocols to solve consensus in asynchronous
Byzantine systems were proposed by Ben-Or [13] and Rabin [94]. Both Toueg and
Bracha have presented randomized asynchronous consensus protocols that are optimal with
respect to the number of processes that can exhibit a Byzantine behavior in [113] and [17],
respectively. Since then, many protocols have been published including, e.g., [21, 23, 41]
(this is far from being an exhaustive list).
The notion of a failure detector, which captures the required functional properties of
failure detection without specifying explicit timing assumptions, was initiated by Chandra
and Toueg in the context of the Consensus problem [27]. Mute failure detectors were ini-
tially proposed in [37, 38] in order to solve Byzantine Consensus in otherwise asynchronous
systems. They were later used also in [11, 48, 69].
2.3 Group Communication & Replication
Group communication has a noted research history [15, 30]. Yet, most of the systems de-
veloped ignore Byzantine failures. The few group communication systems that focus on se-
curity include SecureRing [70], Ensemble [58, 100], Rampart [99], Antigone [83], ITUA [97],
Cactus [59], and Secure Spread [6]. We elaborate on these systems below.
Ensemble’s architecture addresses security [100], yet it only protects the system from
external attacks, and does not handle Byzantine failures. Antigone is a framework that en-
ables specifying flexible application security policies [83]. The framework allows controlling
various quality of service issues, including security, but does not handle Byzantine failures
either.
Secure Spread [6] is a group communication system that was designed to provide group
communication over WANs. Spread integrates two low-level protocols: the Ring protocol
in each site and the hop protocol connecting the sites. Secure Spread relies on strong
synchronization guarantees to assure that no member can receive and decrypt messages
after it left the group and no new member can receive and decrypt messages sent before it
joined the group. Secure Spread also ignores Byzantine failures.
Rampart was the first group communication system to handle Byzantine attacks [99].
Rampart allows dynamic group membership and it must exclude faulty replicas from the
group to make progress (e.g., to remove a faulty primary and elect a new one). The
SecureRing system consists of a reliable delivery protocol, a group membership protocol,
and a Byzantine fault detector [70]. The system protects a low-level ring by authenticating
each transmission of the token and data message received. Both Rampart and SecureRing
can guarantee safety if fewer than 1/3 of the replicas are faulty. Additionally, Rampart and
SecureRing provide group membership protocols that can be used to implement recovery,
but only in the presence of benign faults.
The ITUA project has the goal of developing a middleware-based intrusion tolerance
solution that helps build survivable distributed applications [97]. They have also taken
the approach of extending an existing layered group communication system, in their case,
C-Ensemble, and making it resilient to Byzantine faults. ITUA uses an adaptive and
unpredictable response as a major technique to cope with an attacker and its architecture
separates the role of detection from replication management. The Cactus project also enjoys
a layered micro-protocol architecture that allows adaptability and flexibility [59], and also
has a Byzantine tolerant protocol stack. Interestingly, the programming model of Cactus
is not virtually synchronous.
Rampart, SecureRing, Cactus, and ITUA all suffer from limited performance since they
use costly protocols and rely intensively on public key cryptography. On the other hand, the
BFT system, by Castro and Liskov [25], provides state-machine replication for an (almost)
asynchronous network, where fewer than a third of the replicas may fail. BFT operates in
epochs, where each epoch is made up of two phases, an optimistic phase and a recovery
phase. In the optimistic phase, one node is designated as a leader; this node decides on the
ordering of messages and notifies all other nodes about it using Bracha’s uniform broadcast
protocol [17]. If the leader becomes suspected, then it is replaced using an agreement
protocol. BFT uses MACs to authenticate all messages and public-key cryptography is
used only to exchange the symmetric key pairs to compute the MACs. We have also taken
the approach of using symmetric key pairs. However, we implement a fully fledged Group
Communication system with a proven scalability of up to 50 nodes, whereas BFT only
supported active replication.
The Query/Update (a.k.a. Q/U) protocol offers an optimistic quorum approach for
providing efficient Byzantine tolerant replication [2]. According to the Q/U protocol, clients
access servers trying to obtain a quorum that ensures atomic execution of queries and
updates. However, unlike traditional quorum based approaches, the Q/U protocol enables
a one pass execution of operations in “good scenarios”, i.e., when there are no conflicting
accesses to the same objects. On the other hand, concurrent accesses are resolved using
a probabilistic back-off protocol. This same technique is used to eliminate the need for
locks while supporting read-modify-write semantics as well as operations accessing multiple
objects atomically. The benefit of this approach, compared to consensus based approaches
like BFT, is in its improved fault-scalability, or in other words, the performance degradation
involved in tolerating an increasing number of faults. This approach can be viewed as
trading guaranteed termination for a probabilistic one (due to the back-off protocol), in
exchange for better scalability.
Another approach that exploits speculation extensively to reduce the overhead and the
latency of BFT replication is Zyzzyva [72]. In failure-free and synchronous executions,
Zyzzyva is extremely efficient since requests complete in 3 one-way message delays. Yet, if
there is even a single faulty replica, Zyzzyva requires 5 one-way message delays. To decrease
the latency, the authors propose Zyzzyva5, which requires 5f + 1 replicas and completes
requests in 3 one-way message delays even if there are f faults in the replicas (except for
the primary).
The BAR (Byzantine, Altruist, Rational) framework was recently introduced in order
to handle systems in which some nodes are Byzantine, some are rational, and the rest are
altruist [4]. That is, Byzantine nodes can deviate arbitrarily from their protocol, rational
nodes only deviate from the protocol if they can gain something by that, and altruists always
obey the protocol. A generic set of services that accommodate this generalized failure model
was also developed, as well as a specific collaborative storage service, nicknamed BAR-B [4].
The BAR model is more suitable for collaborative systems, in which services are hosted on
the participants’ machines, and therefore it is likely that many of these nodes will only
participate if they are given an incentive to do so. On the other hand, this comes at a
substantial performance cost, and is thus not suitable for a system based on dedicated servers.
It may be interesting to investigate adapting the BAR approach to ad-hoc networks, as the
latter is also prone to Byzantine and rational behavior.
Another optimized Byzantine atomic broadcast protocol is the Parsimonious Broadcast
protocol [96]. This protocol also includes a leader based optimistic phase and a recovery
phase. Yet, it utilizes a consistent broadcast service rather than uniform broadcast, in order
to reduce the message complexity from $O(n^2)$ to $O(n)$. However, the protocol requires
public-key cryptography. Based on our results and the results of BFT [25], this might be a
limiting factor in its practical applicability (there are no performance measurements in [96]),
in particular with respect to throughput [45].
When the execution of client requests is computation-intensive, it is worth splitting the
decision on the execution order from the execution itself [125]. For example, the authors of [95]
use an agreement cluster of 3f + 1 nodes to decide only on the ordering of executions, and
then pass the execution itself to a set, called primary committee, of only f + 1 nodes. The
generated replies of the primary committee are then compared by the agreement cluster.
If all replies are the same, then they are returned to the client. Otherwise, if there is a
mismatch, the request is sent to additional f servers; a reply that repeats at least f + 1
times is declared correct and is sent to the client. In this case, a new primary committee
is also elected by the agreement cluster. The savings for compute-intensive requests come
from the fact that on average, each request is executed by only f + 1 nodes. This is in
contrast with having a request executed on all 3f + 1 nodes required to decide on the
ordering. However, this only makes sense when the requests are indeed compute-intensive,
since the mechanism described above involves non-negligible overheads. Our results are
applicable to splitting approaches since they can help optimize the performance of the
agreement cluster.
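The reply comparison step of the splitting approach described above can be sketched as follows. This is a simplified illustration under our own assumptions and not the mechanism of [95] itself: the function name `resolve` is hypothetical, replies are modeled as plain values, and the replies of the f additional servers are passed in eagerly instead of being requested on demand.

```python
from collections import Counter


def resolve(primary_replies, extra_replies, f):
    """Pick the reply to return to the client, or None if undecided.

    primary_replies: replies of the f + 1 primary committee members.
    extra_replies:   replies of the f additional servers, used only
                     when the primary committee disagrees.
    """
    if len(set(primary_replies)) == 1:
        return primary_replies[0]             # fast path: all replies agree
    counts = Counter(primary_replies + extra_replies)
    reply, votes = counts.most_common(1)[0]
    # A reply is declared correct only if it repeats at least f + 1 times.
    return reply if votes >= f + 1 else None


f = 1
fast = resolve(["ok", "ok"], [], f)           # agreement among f + 1 = 2 nodes
slow = resolve(["ok", "bad"], ["ok"], f)      # mismatch, one extra server
undecided = resolve(["ok", "bad"], [], f)     # no reply repeats f + 1 times
```

In the fast path only f + 1 executions take place, which is where the savings for compute-intensive requests originate.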
The MAFTIA [119] project has explored two different approaches to building intrusion-
tolerant group communication protocols. The first approach is to use a linear secret sharing
scheme based on a generalized adversary structure that can model a more realistic set of fault
assumptions. The second approach is based on the use of a Trusted Timely Computing Base
(TTCB). Moreover, the use of TTCB was explored as another means of solving Byzantine
Consensus efficiently in [32].
In general, when it comes to benign failures, the approach of having a simple and efficient
protocol most of the time and only fixing things when needed has been practiced in the
area of group communication for a long time. For example, the Horus system included 4
optional total ordering protocols [52], two of which were leader based and two were token
based. In all of them, during normal computation, the protocol is very simple and proceeds
very efficiently, while a failure of the leader or token holder is being compensated for during
the computation of the new view.
Chapter 3
Preliminaries
3.1 Ad-Hoc Networks Model
We assume a collection of nodes placed in a given finite size area. A node in the system
is a device owning an omni-directional antenna that enables wireless communication. A
transmission of a node p can be received by all nodes within a disk centered on p whose
radius depends on the transmission power, referred to in the following as the transmission
disk ; the radius of the transmission disk is called the transmission range. The combination
of the nodes and the transitive closure of their transmission disks forms a wireless ad-hoc
network.1
We denote the transmission range of device p by $r_p$. This means that a node q can only
receive messages sent by p if the distance between p and q is smaller than $r_p$. A node q is
a direct neighbor of another node p if q is located within the transmission disk of p. In the
following, $N^t(1, p)$ refers to the set of direct neighbors of a node p at time t and $N^t(k, p)$
refers to the transitive closure with length k of $N^t(1, p)$ at time t. By considering $N^t(1, p)$
as a relation (defining the set $N^t(1, p)$), we say that a node p has a path to a node q at time
t if q appears in the transitive closure of the $N^t(1, p)$ relation.
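To make the definitions concrete, the following sketch computes the direct neighbors and the k-hop neighborhood of a node for one snapshot of node positions, that is, for a fixed time t. The function names, the two-dimensional coordinates, and the example values are hypothetical. Reading the transitive closure with length k as "reachable in at most k hops" is our interpretation.

```python
import math


def direct_neighbors(positions, ranges, p):
    """N(1, p): nodes q that lie inside the transmission disk of p."""
    px, py = positions[p]
    return {
        q for q, (qx, qy) in positions.items()
        if q != p and math.hypot(qx - px, qy - py) < ranges[p]
    }


def k_hop_neighbors(positions, ranges, p, k):
    """N(k, p): nodes reachable from p in at most k hops (p excluded)."""
    reached, frontier = set(), {p}
    for _ in range(k):
        frontier = {
            q for u in frontier
            for q in direct_neighbors(positions, ranges, u)
        } - reached - {p}
        reached |= frontier
    return reached


# Three nodes on a line, each with transmission range 1.5 (made-up values).
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
ranges = {"a": 1.5, "b": 1.5, "c": 1.5}
one_hop = direct_neighbors(positions, ranges, "a")      # only b is in range
two_hop = k_hop_neighbors(positions, ranges, "a", 2)    # c is reached via b
```

Because each node has its own range, the direct neighbor relation computed this way need not be symmetric.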
As nodes can physically move, there is no guarantee that a neighbor q of p at time t will
remain in the transmission disk of p at a later time t′ > t. Additionally, messages can be
lost. For example, if two nodes p and q transmit a message at the same time, then if there
exists a node r that is a direct neighbor of both, then r will not receive either message, in
which case we say that there was a collision. Yet, we assume that a message is delivered
with positive probability.
1 In practice, the transmission range does not behave exactly as a disk due to various physical phenomena.
However, for the description of the protocols in this work it does not matter, and on the other hand, a disk
assumption greatly simplifies the formal model. In any event, our simulation results are carried out on a
simulator that simulates a real transmission range behavior including distortions, background noise, etc.
3.2 Malicious Failures
A node is said to incur a malicious failure if it tries to harm the execution of the protocol
that it is supposed to perform. A node that incurs a malicious failure is referred to as a
malicious node, while the rest of the nodes are called correct. In particular, a malicious node
can avoid generating messages that it is expected to, fail to deliver messages it received from
the network, send different versions of a message to different nodes, etc. If a node is correct,
then it is presumed to be correct throughout the execution of the protocol. A special type of
failure is called crash, which means that the process stops incurring and generating events
of any kind. Note that a malicious failure is a sub-instance of a Byzantine failure where
a node can deviate in any arbitrary manner from its specification [74]. In this work, we
assume that up to f out of the total of n nodes in the system may experience some kind of
failures.
3.2.1 Limitations on the power of the malicious nodes
We assume that the malicious nodes are limited by the following restrictions:
1. Pairs of correct processes that are neighbors of each other are connected by fair-lossy
communication channels [3]. In other words, each message has a positive probability
of being delivered and malicious nodes cannot constantly collide all the messages in
their area.
2. The malicious nodes cannot impersonate other nodes. This can be realized using
cryptography [106].
3. Either malicious nodes cannot control the way the devices move, or there is a limitation
on their speed.
4. The dynamic subgraph of correct nodes is continuously connected.
Validity and implications of the assumptions
The first assumption is necessary, since otherwise, malicious nodes can cause a denial of
service attack by constantly sending messages that jam the network (at the MAC level).
This assumption is legitimate whenever the malicious nodes are battery operated, in which
case, after some time, their battery will drain and they will not be able to constantly collide
all the messages. Also, this assumption can be reasonable if the malicious code is only
executed in user mode, and therefore has no direct access to the wireless network card.
Alternatively, various electronic warfare techniques, such as frequency hopping can be used
to overcome jamming attacks [29].
The second assumption is common in research that deals with malicious nodes. Without
this assumption, the malicious node can pretend to be someone else, send corrupt messages
and cause other nodes to suspect correct nodes. This assumption can be fulfilled using
cryptography. The price is having cryptographic infrastructure in place.
The third assumption prevents malicious nodes from harming the communication between
correct nodes in one part of the network and then instantly moving to another part of the network
to harm the communication there as well, and so on. This assumption
is valid whenever malicious nodes are not much stronger than the correct nodes, or
maliciousness is caused by code intrusion to some user’s mobile devices and therefore their
speed cannot be much higher than the speed of correct nodes.
Finally, assumption 4 guarantees that malicious nodes cannot disconnect the subgraph of
correct nodes. This assumption makes it possible to disseminate all the messages to correct nodes
relatively fast. It is possible to slightly weaken this assumption by assuming the following:
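Let $V^*$ be the set of correct nodes. Then,
\[
\forall s, d, t \;\; \exists v_1, v_2, \ldots, v_k \in V^*, \; t_1, t_2, \ldots, t_{k-1} \text{ such that }
(t_{k-1} \geq t_{k-2} \geq \ldots \geq t_1 > t) \wedge
(v_1 \in N^{t}(1, s), \; v_2 \in N^{t_1}(1, v_1), \; \ldots, \; v_i \in N^{t_{i+1}}(1, v_{i+1}), \; \ldots, \; v_k \in N^{t_{k-1}}(1, d))
\]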
The implication of weakening assumption 4 is that the dissemination of messages to correct
nodes will take more time and each node will have to store all the messages until all the
correct nodes receive them. Hence, it may cause both higher latency and a considerable
increase in the size of buffers that every correct node should have in order to disseminate
all the messages to all correct nodes.
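The weakened form of assumption 4 only asks for a path whose hops become available at non-decreasing times, so the subgraph of correct nodes need not be connected at any single instant. The following sketch checks for such a time-respecting path. It is an illustration under simplifying assumptions of ours: time is discrete, `snapshots` maps each time to the links among correct nodes at that time, links are treated as symmetric, and a node may forward over several hops within one snapshot. None of these names appear in the formal model.

```python
def has_journey(snapshots, s, d, t):
    """Check for a time-respecting path of correct nodes from s to d.

    snapshots: dict mapping a time to a set of (u, v) links among correct
               nodes at that time; links are treated as undirected.
    t:         the time at which the message is sent by s.
    """
    reached = {s}                       # nodes that already hold the message
    for time in sorted(x for x in snapshots if x >= t):
        links = snapshots[time]
        changed = True
        while changed:                  # several hops may share one time
            changed = False
            for u, v in links:
                if (u in reached) != (v in reached):
                    reached |= {u, v}
                    changed = True
        if d in reached:
            return True
    return d in reached


# The link s-a exists at time 1 and the link a-d at time 2 (made-up example).
right_order = {1: {("s", "a")}, 2: {("a", "d")}}
wrong_order = {1: {("a", "d")}, 2: {("s", "a")}}
delivered = has_journey(right_order, "s", "d", t=0)
blocked = has_journey(wrong_order, "s", "d", t=0)
```

In the first example node a has to keep the message between time 1 and time 2, which reflects the larger buffers mentioned above.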
3.3 Failure Detectors
A failure detector is a distributed oracle that provides hints about the operational status of
processes. Each local oracle (module) monitors a subset of the processes in the system, and
maintains a list of those that it currently suspects to be faulty. Each failure detector oracle
can make mistakes by erroneously adding processes to its list of suspects, i.e., it can suspect
that a process p is faulty even though p is still running and behaves in a correct way. If
this module later believes that suspecting p was a mistake, it can remove p from its list of
suspected nodes. Thus, each module may repeatedly add and remove processes from its list
of suspects. Furthermore, at any given time the failure detector modules at two different
processes may have different lists of suspects.
An inspection of many middleware systems and protocol implementations shows that
most of the messages that are sent in those systems have a header part and a data part.
The header part can often be anticipated based on local information only while the data
part cannot. For example, the type of a message, the id of the sender, and a sequence
number of the message are part of the header. On the other hand, the information that the
application level intended to send is part of the data.
Based on this, we define a mute failure as a failure to send a message with an expected
header w.r.t. the protocol. A process is considered mute if it experiences a mute
failure. Similarly, a verbose failure is sending messages too often w.r.t. the protocol, and a
process is considered verbose if it experiences a verbose failure. Note that both types of
failures can be detected accurately in a synchronous system based on local knowledge only.
This is because in synchronous systems each message has a known bounded deadline, so it
is possible to tell that a message is missing. Similarly, it is possible to accurately measure
the rate of messages received and verify that it is below an agreed upon threshold.
Obtaining synchronous communication in ad-hoc networks with standard hardware and
operating systems is extremely difficult. On the other hand, observations of communication
networks indicate that they tend to behave in a timely manner for large fractions of the
time. This is captured by the notion of the class $\Diamond P_{mute}$ of failure detectors [11, 37, 38, 48].
This class includes all failure detectors that satisfy the following properties:
• Muteness Strong Completeness: Eventually, every mute or permanently disconnected
process is permanently suspected by every correct process.
• Eventual Strong Accuracy: There is a time after which no correct process that is not
disconnected is suspected.
In other words, such failure detectors are assumed to eventually (i.e., during periods of
timely network behavior) detect mute failures accurately. In this eventuality, all nodes
that suffer a mute failure are suspected (known as completeness) and only such nodes
are suspected (known as accuracy). This approach has the benefit that all synchrony
assumptions are encapsulated behind the functional specification of the failure detector (i.e.,
its ability to eventually detect mute failures and verbose failures in an accurate manner).
This also frees protocols that are based on such failure detectors from the implementation
details related to timers and timeouts, thus making them both more general and more
robust.
In a similar manner to $\Diamond P_{mute}$, we define $\Diamond P_{verbose}$ as a class of failure detectors that
eventually reliably detect verbose failures and $\Diamond P_{trust}$ as a class of failure detectors that
eventually reliably detect node misbehavior. Throughout this work we assume that the failure
detector Mute is in the class $\Diamond P_{mute}$, Verbose is in the class $\Diamond P_{verbose}$, and the Trust failure
detector is in the class $\Diamond P_{trust}$.
An inherent problem with malicious failures is that by definition they are tightly related
to the semantics of a given protocol. Thus, a pure general detector, like the Chandra
and Toueg ones [27] can never be used to detect them. On the other hand, modularity
principles advocate the use of a failure detection module rather than having an ad-hoc
detection mechanism interleaved in the code of each protocol.
3.3.1 Interfacing with the Failure Detectors
Recall that the goal of the Mute failure detector is to detect when a process fails to send a message with a header it is supposed to send. To notify this failure detector about such messages, its interface includes one method called expect (see Figure 3.1). This method accepts as parameters a message header to look for, a set of nodes that are supposed to send this message, and an indication of whether all of these nodes must send the message or one of them is enough. Note that the header passed to this method can include wildcards as well as exact values for each of the header's fields. In this work we do not focus on how such a failure detector is implemented. Intuitively, a simple implementation consists of setting a timeout for each message reported to the failure detector with the expect method. When the timer expires, the corresponding nodes that failed to send the anticipated messages are suspected for a certain period of time (see discussion in [37, 38]).
Mute: expect(message_header, set_of_nodes, one_or_all)
    This method notifies the Mute failure detector about an expected message. It accepts as parameters the expected message header, the set of nodes that are supposed to send the message, and a one_or_all indication. The latter parameter indicates whether ALL nodes are assumed to send the message or only ONE of them.
Verbose: indict(node_id)
    This method indicts the node node_id for being too verbose. It causes the Verbose failure detector to increment the suspicion level of node_id.
Trust: suspect(node_id, suspicion_reason)
    This method notifies the Trust failure detector that the level of trust of node node_id should be reduced based on the provided suspicion_reason.
Figure 3.1: Failure Detectors' Interface
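The timeout-based implementation outlined above can be sketched as follows. This is only a minimal illustration under our own assumptions, not the implementation discussed in [37, 38]: the class name MuteDetector, the methods on_receive, tick and is_suspected, the explicit now argument, and the two timing parameters are all hypothetical, and header wildcards are not handled.

```python
class MuteDetector:
    """Toy timeout-based Mute failure detector (illustrative sketch only)."""

    def __init__(self, timeout, suspicion_period):
        self.timeout = timeout                    # assumed: how long to wait for an expected message
        self.suspicion_period = suspicion_period  # assumed: how long a node stays suspected
        self.pending = []                         # entries: [header, waiting_nodes, need_all, deadline]
        self.suspected_until = {}                 # node id -> time at which the suspicion lapses

    def expect(self, header, nodes, need_all, now):
        """Register an expected message from `nodes` (all of them, or any one)."""
        self.pending.append([header, set(nodes), need_all, now + self.timeout])

    def on_receive(self, header, sender):
        """Record that `sender` sent a message carrying `header` (exact match only)."""
        for entry in self.pending:
            if entry[0] == header and sender in entry[1]:
                if entry[2]:
                    entry[1].discard(sender)      # ALL: only this sender is satisfied
                else:
                    entry[1].clear()              # ONE: any sender satisfies the expectation
        self.pending = [e for e in self.pending if e[1]]

    def tick(self, now):
        """Suspect every node that still owes a message whose timer has expired."""
        still_pending = []
        for entry in self.pending:
            if now >= entry[3]:
                for node in entry[1]:
                    self.suspected_until[node] = now + self.suspicion_period
            else:
                still_pending.append(entry)
        self.pending = still_pending

    def is_suspected(self, node, now):
        return self.suspected_until.get(node, float("-inf")) > now
```

Passing the current time explicitly keeps the sketch deterministic; a deployed detector would presumably rely on real timers instead.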
The goal of the Verbose failure detector is to detect verbose nodes. Such nodes try
to overload the system by sending too many messages that may cause other nodes to react
with messages of their own, thereby degrading the performance of the system. Detecting
such nodes is therefore useful in order to allow nodes to stop reacting to messages from
these nodes. The interface of Verbose exports one method called indict. This method
simply indicts a process that has sent too many messages of a certain type.
Practically, we assume that Verbose maintains a counter for each node that was listed
in any invocation of its method. The counter is incremented on each such event, and after
a given threshold, the node is considered a suspect. Verbose also includes a method
for specifying general requirements on the minimal spacing between consecutive
arrivals of messages of the same type. Such a method is typically invoked at initialization
time. As it is not directly accessed by our protocol's code, we do not discuss it any
further.
In order to recover from mistakes, both the Mute and the Verbose failure detectors
employ an aging mechanism. That is, the suspicion counters for each node are periodically
decremented.
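The counter, threshold, and aging behavior described above can be illustrated with a small sketch. The class name VerboseDetector, the method names age and is_suspected, and the choice to suspect a node once its counter reaches (rather than exceeds) the threshold are our assumptions for illustration; only indict appears in the interface of Figure 3.1.

```python
class VerboseDetector:
    """Toy Verbose failure detector with per-node counters and aging (sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed: counter value at which a node becomes a suspect
        self.counters = {}           # node id -> suspicion counter

    def indict(self, node_id):
        """Increment the suspicion counter of node_id."""
        self.counters[node_id] = self.counters.get(node_id, 0) + 1

    def age(self):
        """Aging step, meant to be invoked periodically: decrement every positive counter."""
        for node_id, count in self.counters.items():
            if count > 0:
                self.counters[node_id] = count - 1

    def is_suspected(self, node_id):
        return self.counters.get(node_id, 0) >= self.threshold
```

With a threshold of 3, a node indicted three times is suspected, and a single aging step is enough to clear the suspicion unless the node is indicted again.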
Finally, we also define the Trust failure detector, which maintains a trust level for every
node known to it. Trust suspects a node q if q is suspected by either the Mute failure
detector or the Verbose failure detector, or if the suspect method has been invoked for q
(if, for example, q tried to send a forged message).
Chapter 4
Reliable Probabilistic Dissemination in Wireless Ad-Hoc Networks
This chapter studies three common approaches for achieving scalable reliable broadcast in
ad-hoc networks, namely probabilistic flooding, counter based broadcast, and lazy gossip. The
strengths and weaknesses of each scheme are analyzed, and a new protocol that combines
these three techniques, called RAPID, is developed.
Specifically, the analysis in this work focuses on the tradeoffs between reliability (the percentage of nodes that receive each message), latency, and the message overhead of the
protocol. Each of these methods excels in some of these parameters, but no single method
wins in all of them. This motivates a combined protocol that benefits from all
of these methods and allows trading between them smoothly.
4.1 System Model
We assume the same model as in Section 3.1. We also assume that some of the nodes may act
selfishly, i.e., they may refuse to forward messages of other nodes. Such nodes are called
selfish whereas the others are called correct. We assume that the correct nodes in the system
continuously form a connected sub-network. More severe malicious behavior is discussed
in Sections 4.3.3 and 4.4.2.
4.2 Common Reliable Dissemination Techniques
In this section we present the various techniques used for dissemination in wireless ad hoc
networks and discuss their properties.
4.2.1 Probabilistic Flooding
In the probabilistic approach, whenever a node receives a message, it applies some locally
computable probabilistic mechanism to randomly determine whether it should broadcast
the message or not [24, 57, 76]. Probabilistic protocols are appealing since they are very
simple and are inherently robust to failures and mobility. Moreover, these protocols enable
messages to advance asynchronously, and therefore they exhibit very low latency in delivering messages. Yet, as was empirically discovered in [57, 76, 104], in order to obtain very
high reliability levels with pure probabilistic broadcasting, one has to set the retransmission
probability to high values. This in turn translates into a very large number of redundant
messages.
Below, we obtain the following results about probabilistic flooding. First, we provide a model
for analyzing the tradeoff in probabilistic flooding between the number of randomly selected
nodes that retransmit each message in a given one-hop neighborhood and the expected reliability level. In other words, this analysis formally captures the tradeoff between efficiency
and reliability offered by pure probabilistic flooding. This, for example, enables designers
to decide on a forwarding probability based on their goals w.r.t. this tradeoff.
Second, our formal analysis shows that in order to achieve a given tradeoff point between
reliability and efficiency, it is enough that a constant number of nodes in each one-hop
neighborhood retransmit a message. Constant here means independent of the node
density. In other words, the forwarding probability of each node should be set in inverse
proportion to the size of its neighborhood. This probability can be expressed as $\beta/n_i$,
where $n_i$ is the neighborhood size of node i and β is the required constant number of forwarders.
Further, the behavior of the reliability w.r.t. the forwarding probability is concave with a
knee at values of β between 2.5 and 3.5 (Figure 4.1). Setting the forwarding probability to
these values results in delivery to 80%-90% of the nodes very quickly and very efficiently.
However, for boosting the reliability beyond these levels, it makes more sense to utilize some
complementing measures.
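As a small illustration of the rule above, the following sketch computes the forwarding probability $\beta/n_i$ of a node and shows that the expected number of forwarders in a neighborhood stays at β regardless of density. The function name forwarding_probability is hypothetical, and capping the value at 1 for sparse neighborhoods is our assumption here (it matches the min(1, ·) form used later by RAPID).

```python
def forwarding_probability(beta, neighborhood_size):
    """Forwarding probability beta / n_i, capped at 1 for sparse neighborhoods."""
    if neighborhood_size <= 0:
        return 1.0   # assumed convention: a node with no known neighbors always forwards
    return min(1.0, beta / neighborhood_size)


# Expected number of forwarders in a neighborhood of size n_i is n_i * (beta / n_i).
for n_i in (2, 7, 35, 350):
    p = forwarding_probability(3.5, n_i)
    print(n_i, p, n_i * p)
```

For neighborhoods larger than β the product n_i * p equals β, which is the density-independent constant referred to in the text.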
Finally, we show that regardless of the forwarding probability, pure probabilistic protocols cannot ensure 100% reliability. This again hints that probabilistic flooding should be
aided by another mechanism if one wishes to ensure extremely high levels of reliability. We
now turn to the details of the analysis.
Formal Analysis of Probabilistic Flooding Probability
Our theoretical analysis in this section relies on a formal graph model of 2-dimensional
wireless ad hoc networks. The network connectivity graph G = (V, E) of an ad hoc network
is a special case of a d-dimensional Unit Disk graph, in which n nodes are embedded in
the surface of a d-dimensional unit torus, and any two nodes within Euclidean distance r of
each other are connected. When the nodes are placed uniformly at random on the surface
the graph is known as a Random Geometric Graph (RGG) [91] and is denoted by $G^d(n, r)$.
Specifically, the $G^2(n, r)$ graph is often used to model the network connectivity graph of
2-dimensional wireless ad hoc networks and sensor networks [55]. In our case we assume n
nodes are placed uniformly at random in a rectangular area of size $a \times b$ and the transmission
radius r is scaled accordingly such that $G^2(n, r)$ is connected with high probability. It has been
shown (by Gupta and Kumar [55] and in [87]) that for r satisfying $\pi r^2 \ge ab\,\frac{\log n + c(n)}{n}$, $G^2(n, r)$
is asymptotically connected with probability one if and only if $c(n) \to \infty$ as $n \to \infty$. We
will therefore assume that r satisfies the above condition and the network is connected.
We stress here that the uniform distribution of nodes in the space is only used in the
theoretical analysis of this section, in order to set the retransmission probability in the
most efficient way. The correctness of RAPID does not depend on this assumption. If
the uniformity assumption does not hold, our protocol in Section 4.3 will ensure reliable
delivery in any case, albeit possibly with a higher communication cost.
Denote by $d_{avg}$ the average number of neighbors of any node in $G^2(n, r)$. It is well
known that $d_{avg} \le \frac{\pi r^2 (n-1)}{ab}$ and for large networks, when the edge effect is negligible,
$d_{avg} \sim \frac{\pi r^2 (n-1)}{ab}$. It has been previously shown in [12] that the maximal and minimal
degrees are of the order of $d_{avg}$ with high probability. That is, the actual degree of any
node in $G^2(n, r)$ is close to $d_{avg}$ with high probability.
Assume some broadcasting algorithm A, which picks, for every message m, a set of nodes
S that transmit m. Every node in S is picked with probability $P = \frac{\beta}{d_{avg}}$ from all network
nodes, independently of all other nodes, where β is a parameter called the reliability factor
of algorithm A. Informally, β is the average number of nodes in each one-hop neighborhood
that retransmit m. Also assume that a message that was sent has a probability Q of being
successfully received by a neighboring node. Let $Y_p$ be a random variable corresponding
to the number of times that node p has received a given message. We calculate below an
upper bound on the probability that an arbitrary node will not receive m, or in other words,
$\Pr(Y_p = 0)$.
Claim 4.2.1 For any node p, the probability that p does not receive a message m is upper
bounded by $e^{-\beta Q}$.
Proof: S is the set of all nodes that transmit message m. Notice that the size of S is a
binomial random variable with mean nP.
For each q ∈ S and any node p, let $X_{p,q}$ be a 0-1 random variable indicating whether
the node p receives a message m that was sent by the node q or not. Node p can receive a
message m sent by q if and only if q is a neighbor of p in $G^2(n, r)$ and m has not collided
with other messages. Since two nodes are neighbors if and only if they are at distance at
most r from each other, $\Pr(X_{p,q} = 1) = \frac{Q \pi r^2}{ab}$.
Let $Y_p$ be the random variable indicating the number of times node p has received m.
Naturally, if p ∈ S, $\Pr(Y_p = 0) = 0$. Otherwise,
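\[
\begin{aligned}
\Pr(Y_p = 0) &= \sum_{i=0}^{n} \Pr(Y_p = 0 \mid |S| = i)\,\Pr(|S| = i) \\
&= \sum_{i=0}^{n} \prod_{q \in S,\, |S| = i} \Pr(X_{p,q} = 0)\,\Pr(|S| = i) \\
&= \sum_{i=0}^{n} \left(1 - \frac{Q \pi r^2}{ab}\right)^{i} \binom{n}{i} P^{i} (1 - P)^{n-i} \\
&= \sum_{i=0}^{n} \binom{n}{i} \left(P - \frac{P Q \pi r^2}{ab}\right)^{i} (1 - P)^{n-i} = \left(1 - \frac{P Q \pi r^2}{ab}\right)^{n} \\
&\le \left(e^{-\frac{P Q \pi r^2}{ab}}\right)^{n} = e^{-\frac{\beta}{d_{avg}} \cdot \frac{Q \pi r^2}{ab} \cdot n} \le e^{-\frac{\beta ab}{\pi r^2 (n-1)} \cdot \frac{Q \pi r^2}{ab} \cdot n} \le e^{-\beta Q}
\end{aligned}
\]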
In the fourth line we have used the binomial theorem and in the last line the
inequality $1 - x < e^{-x}$, which holds for all x > 0.
Figure 4.1 depicts the upper bound $e^{-\beta Q}$ on the value of $\Pr(Y_p = 0)$ for an arbitrary
node p as a function of β and Q.
[Plot: non-receive probability (0 to 1) as a function of Beta (0 to 10), with one curve for each successful receive probability: 1, 0.8, and 0.6]
Figure 4.1: An upper bound on the probability that an arbitrary node does not receive a message m
It can be seen from the figure that the probability that a given node does not receive
a message m is small even for quite small values of β. For example, for Q = 0.8 and β = 3.5, $\Pr(Y_p = 0)$
is at most $e^{-2.8} \approx 0.06$. That is, if on average only β = 3.5 nodes in every one-hop
neighborhood transmit m and Q = 0.8, at least approximately 94% of all nodes will receive m.
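The bound of Claim 4.2.1 is easy to evaluate numerically. The sketch below tabulates $e^{-\beta Q}$ next to the closed form $(1 - PQ\pi r^2/(ab))^n$ from the proof, under the simplifying assumption that $d_{avg}$ equals its asymptotic value $\pi r^2 (n-1)/(ab)$, so that the closed form reduces to $(1 - \beta Q/(n-1))^n$. The function names and the choice n = 1000 are ours, for illustration only.

```python
import math


def non_receive_bound(beta, q):
    """Upper bound e^(-beta * Q) from Claim 4.2.1."""
    return math.exp(-beta * q)


def non_receive_closed_form(n, beta, q):
    """(1 - P*Q*pi*r^2/(ab))^n with P = beta/d_avg and d_avg = pi*r^2*(n-1)/(ab) (assumed)."""
    return (1.0 - beta * q / (n - 1)) ** n


for beta in (1.0, 2.5, 3.5, 5.0):
    bound = non_receive_bound(beta, 0.8)
    closed = non_receive_closed_form(1000, beta, 0.8)
    print(f"beta={beta}: bound={bound:.4f}, closed form={closed:.4f}")
```

Under these assumptions the closed form stays just below the bound, which suggests that the bound is fairly tight for large n.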
Discussion: A broadcasting algorithm that sets the retransmission probability P inversely
proportional to the average degree has a number of advantages. First, the number of
transmissions (which is equal to the average size of set S) is constant with respect to the
number of nodes n and to the nodes’ density.
\[
E(|S|) = nP = \frac{n\beta}{d_{avg}} = \frac{n\beta}{\frac{\pi r^2 (n-1)}{ab}} \le \frac{ab\beta}{\pi r^2}.
\]
That is, the number of transmissions does not depend on the overall number of nodes,
but rather only on the physical size of the network, the transmission radius and the re-
quired reliability level. Hence, for a given physical network, there is a minimal number of
retransmissions that is required to guarantee high broadcasting reliability, and this number
is constant with respect to the number of nodes and to the nodes’ density. In particular,
such a broadcast protocol is highly efficient in dense networks.
Second, a probabilistic broadcasting algorithm that picks nodes uniformly at random
with probability inversely proportional to the average degree, can achieve high coverage of
the network with relatively few redundant messages. Most (but not all) of the network
nodes will receive almost every message while using a relatively small group S.
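To make the density independence discussed above concrete, the following sketch evaluates the expected number of transmitters nP for several network sizes in the same physical area and compares it with the density-free estimate $ab\beta/(\pi r^2)$. The function names, the 1000 by 1000 area, and the radius of 250 are hypothetical values chosen for illustration, and $d_{avg}$ is taken at its asymptotic value, ignoring edge effects.

```python
import math


def expected_transmitters(n, beta, r, a, b):
    """n * P with P = beta / d_avg and d_avg = pi*r^2*(n-1)/(a*b) (asymptotic value, assumed)."""
    d_avg = math.pi * r * r * (n - 1) / (a * b)
    return n * min(1.0, beta / d_avg)


def density_free_estimate(beta, r, a, b):
    """The estimate a*b*beta / (pi*r^2), which does not depend on n."""
    return a * b * beta / (math.pi * r * r)


estimate = density_free_estimate(3.5, 250.0, 1000.0, 1000.0)
for n in (200, 2000, 20000):
    print(n, round(expected_transmitters(n, 3.5, 250.0, 1000.0, 1000.0), 2), round(estimate, 2))
```

In this setting the number of transmitters stays close to the same value while n grows by two orders of magnitude, which is the efficiency gain in dense networks described above.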
On the Impossibility of Absolute Reliability
Notice that no pure probabilistic protocol can ensure absolute dissemination reliability.
Consider an example of a node q with neighbors $n_1, \ldots, n_k$, each of which forwards each
message it receives with probability $P_{n_i}$. Hence, with probability $Prob = \prod_{i=1,\ldots,k}(1 - P_{n_i})$,
no node will retransmit the message and therefore q will not receive the message. No matter
how high the probabilities $P_{n_i}$ are (as long as they are below 1), Prob is non-zero and can sometimes be non-negligible. In
particular, even if the average density across the whole network is high, if nodes are scattered
in a somewhat random manner, there is a likelihood that some parts of the network will
have low density. In those parts k can even be less than 2. Thus, the probability that
there will be some node q that will not receive some messages is non-negligible in any pure
probabilistic protocol.
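The product in the argument above can be evaluated directly. This is a minimal sketch; the function name prob_no_neighbor_forwards and the sample probabilities are hypothetical.

```python
import math


def prob_no_neighbor_forwards(forward_probs):
    """Probability that none of q's neighbors forwards, given independent forwarding decisions."""
    return math.prod(1.0 - p for p in forward_probs)


# A sparse corner of the network: q has only two neighbors, each forwarding with probability 0.9.
print(prob_no_neighbor_forwards([0.9, 0.9]))
# A denser neighborhood with ten neighbors, each forwarding with probability 0.35.
print(prob_no_neighbor_forwards([0.35] * 10))
```

Even with two very eager neighbors, roughly one message in a hundred would be missed by q in this model, which is why a complementary recovery mechanism is needed for very high reliability.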
4.2.2 Counter Based Broadcast
The shortcomings of probabilistic flooding have led to the development of the counter-based
approach [24, 57, 114, 115] and its distance-based and location-based derivatives (and their
combinations). The idea in these schemes is that rather than placing the randomness
directly on the retransmission probability, the randomness is placed on the timing of the
rebroadcasting. That is, every node p that receives a message m for the first time decides
to rebroadcast the message after some random time. If during this chosen period p hears k
(the counter) retransmissions of m, then p aborts its retransmission.
[Diagram with nodes s, p, q and n_1, ..., n_i, ..., n_k]
Figure 4.2: A transmission by a node s can be received by all nodes within its transmission range: p, n_1, ..., n_k
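The counter-based rule can be sketched for a single neighborhood as follows. The sketch makes strong simplifying assumptions that are ours and not part of the cited protocols: all nodes hear each other, transmissions are instantaneous and never lost, and the function name counter_based_round and its parameters are hypothetical.

```python
import random


def counter_based_round(num_nodes, k, max_delay, rng):
    """Return the ids of the nodes that rebroadcast in one fully connected neighborhood."""
    # Every node that received the message draws a random rebroadcast time.
    timers = sorted((rng.uniform(0.0, max_delay), node) for node in range(num_nodes))
    heard = 0            # retransmissions overheard so far by the nodes still waiting
    transmitters = []
    for _fire_time, node in timers:
        if heard >= k:
            break        # this node and all later ones have heard k copies and abort
        transmitters.append(node)
        heard += 1
    return transmitters


print(counter_based_round(num_nodes=20, k=3, max_delay=0.01, rng=random.Random(1)))
```

In this idealized setting exactly min(k, num_nodes) nodes rebroadcast, regardless of which random delays are drawn.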
Interestingly, this is another way to ensure a constant number of retransmissions in each
neighborhood. But, as opposed to the probabilistic method, the number of retransmissions
is deterministically guaranteed by the protocol. Despite this, as we show below, even the
counter based approach cannot guarantee reliable delivery of all messages on an arbitrary
topology. In fact, if we assume that the nodes are uniformly distributed in the network,
and that the random function used for setting the retransmission time is independent of
the node’s location, then we can utilize our formal analysis from Section 4.2.1 to calculate
the reliability level of a counter-based protocol for a given value of k.
Empirical studies have shown that counter-based schemes can obtain high delivery ratios
with relative efficiency [24, 57, 114, 115]. Yet, these works do not include a formal analysis
of this behavior. Moreover, as we now discuss, counter-based schemes are inherently slower
than probabilistic schemes.
Latency
As mentioned before, the rebroadcasting time of each node is set randomly. However, in
order for the protocol to succeed, the values should be set from a sufficiently large range so
that the number of collisions will be small, or even zero [60, 61]. In other words, the range
from which the rebroadcast timing is chosen must be proportional to the number of nodes
in each neighborhood. To ensure zero collisions, the birthday paradox implies
that the range should be roughly $sl \times n_i^2$, where $sl$ is the minimal slot required for a
message transmitted by one node to be heard by any other node in its neighborhood and $n_i$
is the size of the neighborhood of node i. For example, routing protocols in ad hoc networks
usually apply a random delay uniformly distributed between 0 and 10 milliseconds [19].
On the other hand, with probabilistic flooding as we suggest, and assuming β ≤ 3.5, at
most 3.5 nodes might retransmit simultaneously in each neighborhood. Hence, the jitter
applied to probabilistic forwarding can be much shorter than for counter-based schemes.
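The birthday-paradox argument can be illustrated with a simple slotted model, which is our assumption for this sketch: every sender independently picks one of a fixed number of equally likely slots, and a collision occurs whenever two senders pick the same slot. The function name collision_probability is hypothetical.

```python
def collision_probability(senders, slots):
    """Probability that at least two of `senders` pick the same one of `slots` slots."""
    p_all_distinct = 1.0
    for i in range(senders):
        p_all_distinct *= max(0.0, 1.0 - i / slots)
    return 1.0 - p_all_distinct


# With the same jitter range of 100 slots: about beta senders versus a whole neighborhood.
print(collision_probability(4, 100))
print(collision_probability(30, 100))
```

With the same range, a handful of probabilistic forwarders rarely collide, whereas a full neighborhood of 30 nodes almost surely does, so the latter needs a range that grows roughly quadratically with the neighborhood size.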
On the Impossibility of Absolute Reliability
We claim that no counter-based scheme can guarantee reliable delivery of all messages on
an arbitrary topology. Consider the scenario of Figure 4.2. When node s broadcasts a
message m, nodes p and $n_1, \ldots, n_k$ receive it. If some of the $n_i$ nodes rebroadcast the message
before node p, p will refrain from rebroadcasting m and therefore q will not receive m. For
any counter-based scheme and for any value of the counter in p, there could be as many $n_i$
nodes as needed, such that each $n_i$ is a neighbor of s and p, but not of q. Then, all $n_i$ nodes
might rebroadcast m before p, thereby satisfying the counter in p and preventing p from
rebroadcasting the message m.
4.2.3 Lazy Gossip
In lazy gossip [67], nodes periodically gossip with their neighbors about the ids of messages
they have received. Yet, this gossiping is performed in a deterministic manner, in the sense
that each node sends such a gossip message as a broadcast to all its neighbors. Whenever a
node q learns that one of its neighbors p has a message that q has missed, q explicitly asks
p to retransmit this message. Here, there can be a few optimizations such as broadcasting
requests for retransmissions, etc.
Lazy gossip incurs a constant per node message overhead due to the need to periodically
gossip about messages. The overall network overhead grows with the network density.
However, due to its deterministic nature, lazy gossip can obtain absolute reliability.
The shortcomings of lazy gossip come mainly from its very high latency and the fact
that for reliability, it must gossip multiple times for each message. The latency stems from
the fact that messages are propagated only due to gossips, and these only occur periodically.
In order to keep the message overhead reasonable, gossips might be performed once every
several seconds, in which case forwarding a message across multiple hops can take dozens of
seconds. Also, due to message loss, obtaining absolute reliability involves unlimited memory
consumption and unbounded message sizes, at least in theory.
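The gossip and retransmission-request exchange of lazy gossip can be sketched as follows. This is a simplified illustration and not the protocol of [67]: message loss, periodic timers, and purging are omitted, and the class LazyGossipNode, its methods, and the helper gossip_round are hypothetical names.

```python
class LazyGossipNode:
    """Toy lazy-gossip participant that advertises ids and serves retransmission requests."""

    def __init__(self):
        self.store = {}   # message id -> payload

    def gossip(self):
        """Ids advertised to all neighbors in one periodic gossip message."""
        return set(self.store)

    def on_gossip(self, advertised_ids):
        """Return the ids this node is missing and should explicitly request."""
        return set(advertised_ids) - set(self.store)

    def on_request(self, requested_ids):
        """Return the requested messages that this node actually holds."""
        return {mid: self.store[mid] for mid in requested_ids if mid in self.store}

    def on_data(self, messages):
        self.store.update(messages)


def gossip_round(sender, receiver):
    """One gossip from sender to receiver, followed by request and retransmission."""
    missing = receiver.on_gossip(sender.gossip())
    receiver.on_data(sender.on_request(missing))
    return missing


p, q = LazyGossipNode(), LazyGossipNode()
p.store[("p", 1)] = "hello"
print(gossip_round(p, q), q.store)
```

Each hop requires a full gossip period before the missing message is even requested, which is the source of the high latency discussed above.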
4.3 The RAPID Protocol
For didactic purposes, we develop our protocol in a few steps. The basic version of our
protocol appears in Figure 4.3, whereas an enhanced version of the protocol that sends even
fewer messages and provides a higher delivery ratio is depicted in Figure 4.4. A maliciousness
resilient version of RAPID appears in Section 4.3.3. In all figures we make use of two
primitives. The primitive prob_bcast denotes an immediate broadcast to all the direct
neighbors of the sender with a given probability. The primitive lazycast initiates periodic
broadcasting of the given message to the direct neighbors of the sender.
Our protocol is based on the following principles: Each node calculates its broadcast
probability according to the number of observed neighbors at a given moment. Since in
our protocol each node needs to know the number of its one-hop neighbors, every node
periodically sends a heartbeat/hello message (unless it has already sent another message
during a predefined time interval).
The rebroadcasting probability used by RAPID is set to $\min(1, \beta/|N^t(1,k)|)$. β is a parameter of the protocol and corresponds directly to the communication overhead. A larger β
yields a higher reliability level, but at a larger communication cost. As can be seen
in Figure 4.1 (the knee in the graph), a good tradeoff between the number of retransmissions
and the reliability level is achieved when β is set to around 3.5. We further explore the
effect of parameter β on RAPID in the simulation section.
In parallel, every node p periodically broadcasts to its neighbors the headers of messages
p received from other nodes, which is called gossiping. This technique enables nodes who
miss some messages that exist in the system to request these messages from their neighbors.
Notice that nodes only send headers of messages they possess. Hence, the header of a message that does not exist will not be disseminated in the network. Also, whenever possible,
gossip messages are piggybacked on other messages in order to further reduce the generated
traffic. Unlike many other gossiping mechanisms from distributed computing [14], in our
case, gossiping is deterministic, in the sense that a gossip message from p is broadcast to
all of p's neighbors at once.
When examining the graph in Figure 4.1, it can be seen that the reliability level obtained
depends on the probability that a transmission will not be lost. Specifically, in wireless
networks, most message losses are caused by collisions. Hence, to reduce the chance
of collision, and thereby be able to obtain reliability levels similar to the bottommost
line of Figure 4.1, RAPID employs jitter. That is, when a node decides to rebroadcast a
message, it waits for a short random time before doing so. Hence, the small probability of
rebroadcasting plus the short jitter before rebroadcasting means that RAPID very rarely
causes message collisions. The value of jitter is discussed in Section 4.4.1.
4.3.1 Basic RAPID
The Dissemination Task in Detail
This protocol is a combination of probabilistic flooding with lazy gossip. Hence, its message
dissemination consists of the following steps: (1) The originator p of a message m sends
m||header(m) to all nodes in $N^t(1, p)$ (Lines 1–4 in Figure 4.3). The header part of m
includes a sequence number and the identifier of the originator. (2) The originator p of m
then starts a periodic gossip of header(m) to all nodes in $N^t(1, p)$ (Line 5). (3) When a
node p receives a message m for the first time, p accepts m (Lines 6–7). (4) p broadcasts
m with probability $\min(1, \beta/|N^t(1,p)|)$ (Line 8; our protocol was simulated with β equal to
3.5). (5) p starts a periodic gossip of header(m) to all nodes in $N^t(1, p)$ (Line 9). (6) If a
node p receives a message m it has already received beforehand, then m is ignored.
Gossiping and Message Recovery in Detail
The gossiping and message recovery part of the protocol is composed of the following subtasks:
Upon send(msg) by application do
(1)   header := msg_id||node_id;
(2)   data_msg := header||msg;
(3)   gos_msg := header;
(4)   prob_bcast(prob = 1, data_msg, DATA);
(5)   lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by p_j do
(6)   if (have not received this msg before) then
(7)     Accept(p_j, msg); /* forward to the application */
(8)     prob_bcast(prob = min(1, β/|N^t(1,p)|), msg, DATA);
(9)     lazycast(gos_msg, GOSSIP);
(10)  endif;

Upon receive(gos_msg, GOSSIP) sent by p_j do
(11)  if (there is no msg that fits the gos_msg) then
(12)    /* Ask the neighbors to send the real message */
(13)    prob_bcast(prob = min(1, β/|N^t(1,p)|), gos_msg, REQUEST);
(14)  endif;

Upon receive(gos_msg, REQUEST) sent by p_j do
(15)  if (I have the msg that matches gos_msg) then
(16)    prob_bcast(prob = min(1, β/|N^t(1,p)|), msg, DATA_REPLY);
(17)  endif;

Figure 4.3: Basic RAPID (executed by node p)
1. When p receives a message m, p gossips header(m) to other nodes in $N^t(1, p)$ (Line 9).
Note that p does not forward gossips about messages it has not received yet. This is
done in order to make the recovery process more efficient.
2. When p receives a gossip header(m) for a message m it has not received yet, p asks its
neighbors to forward m to itself using a REQUEST message (Lines 11–14). Intuitively,
since p received a GOSSIP message about m, one of p’s neighbors should have m and
supply it when needed.
3. When p receives a REQUEST for a message m, yet p has not received m, p ignores
this request. Otherwise, p broadcasts the missing message (Lines 15–17).
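The event handlers of Basic RAPID described in the steps above can be sketched in Python as follows. This is an illustrative sketch only and not the simulated implementation: the class BasicRapidNode, the outbox list standing in for the radio, and the gossiping set standing in for lazycast are our assumptions, and periodic timers, heartbeats, and purging are omitted.

```python
import random


class BasicRapidNode:
    """Toy model of the Basic RAPID handlers of one node (sketch)."""

    def __init__(self, node_id, beta=3.5, rng=None):
        self.node_id = node_id
        self.beta = beta
        self.rng = rng or random.Random()
        self.neighbors = set()   # one-hop neighbors, assumed to be learned from heartbeats
        self.received = {}       # header -> payload
        self.gossiping = set()   # headers advertised periodically (stands in for lazycast)
        self.outbox = []         # (type, header, payload) broadcasts handed to the radio
        self.delivered = []      # payloads accepted by the application

    def _prob(self):
        return min(1.0, self.beta / max(1, len(self.neighbors)))

    def _prob_bcast(self, prob, kind, header, payload=None):
        if self.rng.random() < prob:
            self.outbox.append((kind, header, payload))

    def send(self, msg_id, payload):        # Lines 1-5
        header = (msg_id, self.node_id)
        self.received[header] = payload
        self._prob_bcast(1.0, "DATA", header, payload)
        self.gossiping.add(header)

    def on_data(self, header, payload):     # Lines 6-10 (DATA or DATA_REPLY)
        if header in self.received:
            return                          # duplicates are ignored
        self.received[header] = payload
        self.delivered.append(payload)
        self._prob_bcast(self._prob(), "DATA", header, payload)
        self.gossiping.add(header)

    def on_gossip(self, header):            # Lines 11-14
        if header not in self.received:
            self._prob_bcast(self._prob(), "REQUEST", header)

    def on_request(self, header):           # Lines 15-17
        if header in self.received:
            self._prob_bcast(self._prob(), "DATA_REPLY", header, self.received[header])
```

A driver that delivers each outbox entry to the neighbors of the sender would be needed to turn this into a simulation; it is left out to keep the sketch short.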
One issue that needs to be taken care of is purging received messages, in order to avoid
unbounded memory requirements. This can be done either using timeouts, or by employing
a stability detection mechanism [54, 108]. In this work, we have chosen to use timeout
based purging due to its simplicity. Clearly, in this case there is a tradeoff in setting
the timeout value: a long timeout increases the reliability, but also increases the memory
consumption. From our experiments, it turns out that even with short timeouts we
can reach reliability above 99.9% in most cases.
4.3.2 Enhanced RAPID
The basic RAPID protocol has an important drawback: if all nodes in a given neighborhood
decide not to broadcast a message, the dissemination of this message would be severely
delayed, as it will only be propagated through the gossip/request mechanism, which is slow.
In order to deal with this drawback and improve the reliability and the latency of
RAPID, we slightly change the protocol by adding a complementary counter-based-like
mechanism. That is, whenever p initially decides probabilistically not to rebroadcast m,
but later on does not hear any other rebroadcast of m, p adds m to its casting
queue. Thus, either p will hear a retransmission of m by one of its neighbors, or p will
retransmit m itself. This optimization boosts the reliability of the protocol by ensuring
that a message will be propagated to almost every neighborhood of the network.
Upon send(msg) by application do
(1)   header := msg_id||node_id;
(2)   data_msg := header||msg;
(3)   gos_msg := header;
(4)   prob_bcast(prob = 1, data_msg, DATA);
(5)   lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by p_j do
(6)   if (have not received this msg before) then
(7)     Accept(p_j, msg); /* forward it to the application */
(8)     cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), msg, DATA);
(9)     lazycast(gos_msg, GOSSIP);
(10)  endif;

Upon receive(gos_msg, GOSSIP) sent by p_j do
(11)  if (there is no message that fits the gos_msg) then
(12)    /* Node asks its neighbors to send the real message */
(13)    cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), gos_msg, REQUEST);
(14)  endif;

Upon receive(gos_msg, REQUEST) sent by p_j do
(15)  if (I have the msg that matches gos_msg) then
(16)    cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), msg, DATA_REPLY);
(17)  endif;

Interceptor
(18)  if (m that appears in cast_queue was received) and (m.type==REQUEST or m.type==DATA_REPLY) then
(19)    cast_queue.remove(m);
(20)  endif;

Upon Expiration of timer of msg in cast_queue do
(21)  cast_queue.remove(msg);
(22)  pr = the probability attached to msg;
(23)  type = the message type associated with msg;
(24)  prob_bcast(prob = pr, msg, type);
(25)  if (msg was not broadcasted) then
(26)    cast_queue.add(prob = 1, time=long_jitter, msg, type);
(27)  endif;

Figure 4.4: Enhanced RAPID (lines that were modified w.r.t. Figure 4.3 are boxed while lines 18–27 were added)
The Dissemination Task in Detail
The pseudo-code for the enhanced version of RAPID is listed in Figure 4.4. In this code,
we use a queue called cast_queue. The add method of this queue accepts the following
parameters: the sending probability, a time parameter, the message itself, and the type of
the message. The time is used to set a timer that expires after the corresponding
amount of time elapses. The probability and type are stored alongside the message inside
the queue.
Dissemination in Enhanced RAPID works the same as in Basic RAPID (Section 4.3.1)
except for step 4 of the first paragraph of Section 4.3.1. In Enhanced RAPID, whenever
a node p receives a message m for the first time, it schedules a rebroadcast of m with
probability $\min(1, \beta/|N^t(1,p)|)$ to occur after some random jitter (Line 8 in Figure 4.4). If a
received message has not been rebroadcast, neither by p nor by any of its neighbors,
then p decides to rebroadcast m after all, by invoking prob_bcast with probability 1
(Lines 25–27).
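The behavior of the casting queue can be sketched with an explicit event clock. The sketch follows the pseudo-code of Figure 4.4 as we read it (in particular, only REQUEST and DATA_REPLY entries are cancelled by the interceptor), but the class CastQueue, the methods intercept and run_until, the sent log, and the explicit now argument are hypothetical names introduced for illustration.

```python
import heapq
import itertools
import random


class CastQueue:
    """Toy casting queue with per-message timers (sketch of Lines 18-27)."""

    def __init__(self, long_jitter, rng):
        self.long_jitter = long_jitter
        self.rng = rng
        self.heap = []                 # (expiry time, sequence number, entry)
        self.seq = itertools.count()   # tie-breaker so that entries are never compared
        self.sent = []                 # (time, type, msg) of messages actually broadcast

    def add(self, prob, delay, msg, kind, now):
        entry = {"prob": prob, "msg": msg, "kind": kind, "cancelled": False}
        heapq.heappush(self.heap, (now + delay, next(self.seq), entry))

    def intercept(self, msg, kind):
        """Lines 18-20: a neighbor already sent this REQUEST or DATA_REPLY."""
        if kind in ("REQUEST", "DATA_REPLY"):
            for _, _, entry in self.heap:
                if entry["msg"] == msg and entry["kind"] == kind:
                    entry["cancelled"] = True

    def run_until(self, now):
        """Lines 21-27: fire every timer that has expired by time `now`."""
        while self.heap and self.heap[0][0] <= now:
            expiry, _, entry = heapq.heappop(self.heap)
            if entry["cancelled"]:
                continue
            if self.rng.random() < entry["prob"]:
                self.sent.append((expiry, entry["kind"], entry["msg"]))
            else:
                # Not broadcast: retry unconditionally after the long jitter.
                self.add(1.0, self.long_jitter, entry["msg"], entry["kind"], expiry)
```

For instance, an entry added with probability 0 is never sent when its short timer fires, and is then sent with certainty once the long jitter has elapsed, unless it is cancelled in the meantime.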
Gossiping and Message Recovery in Detail
The main difference between gossiping in Basic RAPID vs. Enhanced RAPID is in the
cancelling of REQUEST and DATA REPLY messages. That is, in the enhanced protocol
every node p monitors its neighbors and if p planned to broadcast such a message m, but p
heard a transmission of m by its neighbor node, then p cancels the transmission of m. This
cancelling is done in order to eliminate redundant REQUEST and DATA REPLY messages
due to the broadcast nature of wireless networks. In addition, if p decided not to broadcast
m, but it does not hear the transmission of m by any of its neighbors, p broadcasts m.
These issues are handled in Lines 13, 15–17 and 25–27.
Latency of RAPID
In both RAPID and counter-based protocols [24, 57, 114, 115], nodes wait for a certain
amount of time before they rebroadcast a message. Yet, the average waiting time is much
shorter in RAPID than in counter-based protocols. Notice that in Figure 4.4 we employ
two jitter lengths, short_jitter and long_jitter. The first is used to prevent collisions, while
the second is used as a corrective measure, as discussed above, and is similar to the counter
based approach. Notice that in order to be effective, the duration of the jitter must be
proportional to the number of expected concurrent transmissions. The expected number
of concurrent transmitters competing for transmission due to the probabilistic mechanism
is quite small (β). On the other hand, in the situations in which long_jitter is used in our
protocol, and similarly in counter-based protocols, all nodes in the neighborhood might
transmit concurrently. Hence, long_jitter must be long enough to accommodate that.
Consequently, short_jitter can be much shorter than long_jitter. For example, if the target is
to completely eliminate collisions with high probability, then following the birthday paradox,
the length of the jitter must be proportional to $s^2$, where s is the expected number of
concurrent senders. Moreover, most times in RAPID the timer-based corrective measure
will not be used, so the average latency is mostly dominated by short_jitter. The actual values
used for both jitters are described in Section 4.4.1.
4.3.3 Maliciousness Resilient RAPID
Due to its probabilistic nature, RAPID can be resilient to many forms of malicious behavior.
Since the decisions that every node takes are based only on the number of its neighbors and
the transmissions it hears, the attacks that a malicious node can perform are quite limited.
We describe below how the protocol was modified in order to overcome these attacks.
Malicious Tolerant RAPID in Detail
We use digital signatures in order to prevent a malicious node from forging others’ messages
or trying to impersonate other nodes. Each device p holds a private key kp, known only to
itself, with which p can digitally sign every message it sends [106]. We assume a malicious
node cannot forge signatures and that each device can obtain the public key of every other
device, and can thus authenticate the sender of any signed message.
Upon send(msg) by application do
(1)    gos_msg := msg_id||node_id||sig(msg_id||node_id);
(2)    data_msg := msg_id||node_id||msg||sig(msg_id||node_id||msg)||sig(msg_id||node_id);
(3)    prob_bcast(prob = 1, data_msg, DATA);
(4)    lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by pj do
(5)    if (verify_signature(msg) = true) then
(6)        if (have not received this msg before) then
(7)            Accept(pj, msg); /* forward it to the application */
(8)            cast_queue.add(prob = min(1, β/|trusted_neighbors|), time = random(0, short_jitter), msg, DATA);
(9)            lazycast(gos_msg, GOSSIP);
(10)       endif;
(11)   else /* the message is not correct */
(12)       suspect(pj);
(13)   endif;

Upon receive(gos_msg, GOSSIP) sent by pj do
(14)   if (verify_signature(gos_msg) = true) then
(15)       if (there is no message that fits the gos_msg) then
(16)           expect(gos_msg, pj);
(17)           /* The node asks the node that sent the gossip to send the real message */
(18)           send(gos_msg, REQUEST, pj);
(19)       endif;
(20)   else /* the message is not correct */
(21)       suspect(pj);
(22)   endif;

Upon receive(gos_msg, REQUEST, pk) sent by pj do
(23)   if (verify_signature(gos_msg) = true) then
(24)       if (I am pk and I have the msg that matches gos_msg) then
(25)           prob_bcast(prob = 1, msg, DATA_REPLY);
(26)       endif;
(27)   else /* the message is not correct */
(28)       suspect(pj);
(29)   endif;

Figure 4.5: Maliciousness Resilient RAPID (lines that were modified w.r.t. Figure 4.4 are boxed)

The originator p of a message m adds two signatures to m before it broadcasts m. The
first signature is calculated over the concatenation of m, p's node id, and m's message id,
in order to bind the content of the message to the node id of its originator and
the message id. The second signature is calculated over p's node id and the message
id only. The objective of attaching the second signature to the message is to speed up
the dissemination of gossip messages in the system. That is, in our protocol, every time
a node q receives a data message m, q sends a gossip message about m to its neighbors.
However, the first signature binds both the message header (sender id and message id) with
the message data. Thus, a node that receives a message m cannot generate a valid gossip
message for m only based on the first signature. The second signature is the one that should
be sent with the gossip message. This enables any node that receives m to immediately
start gossiping about m, and be able to attach a valid signature that was generated by the
originator of m, to the gossip message. Otherwise, without the second signature, a receiver
q of m would have had to wait for a separate gossip message about m before q could have
started gossiping about m.
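The role of the two signatures can be illustrated with a small sketch. It is not the thesis implementation: a real deployment would use public-key digital signatures, whereas here an HMAC from the Python standard library stands in for sig(), under the simplifying assumption that the verifier also knows the key. The field names and the "|" separator are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in for the originator's digital signature (assumption: HMAC).
    return hmac.new(key, payload, hashlib.sha256).digest()


def header(msg_id: bytes, node_id: bytes) -> bytes:
    return msg_id + b"|" + node_id


def make_data_msg(key: bytes, msg_id: bytes, node_id: bytes, body: bytes) -> dict:
    h = header(msg_id, node_id)
    return {
        "msg_id": msg_id,
        "node_id": node_id,
        "body": body,
        "sig_full": sign(key, h + b"|" + body),  # binds header and data
        "sig_header": sign(key, h),              # covers the header only
    }


def make_gossip(data_msg: dict) -> dict:
    # Any receiver of the data message can build a valid gossip message by
    # copying the originator's header signature; no new signature is needed.
    return {k: data_msg[k] for k in ("msg_id", "node_id", "sig_header")}


def verify_data(key: bytes, d: dict) -> bool:
    h = header(d["msg_id"], d["node_id"])
    return (hmac.compare_digest(sign(key, h + b"|" + d["body"]), d["sig_full"])
            and hmac.compare_digest(sign(key, h), d["sig_header"]))


def verify_gossip(key: bytes, g: dict) -> bool:
    expected = sign(key, header(g["msg_id"], g["node_id"]))
    return hmac.compare_digest(expected, g["sig_header"])
```

In this sketch, substituting the first signature for the header signature in a gossip message fails verification, which mirrors the reason the originator attaches the second signature.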
The pseudo-code for the maliciousness resilient protocol appears in Figure 4.5. Here
we introduce four new primitives: send, verify_signature, suspect and expect. In addition, the
retransmission probability is now computed based on the number of trusted neighbors
(trusted_neighbors). The set of trusted neighbors of a node p consists of the one-hop
neighbors that p has not yet suspected of being malicious. The primitive send is a point-to-point
send. The primitive verify_signature verifies that sig(m) matches m. If it does not,
then m is ignored and the node that sent it is suspected by the receiver of the message.
The primitive suspect permanently removes a node pj that was caught forging a message
(i.e., pj sent a message with a signature that fails to authenticate) from the list of trusted
neighbors. The primitive expect, in contrast, accepts two parameters: a gossip message and a
node id pj. The node p that executes expect sets a timer such that the message matching the
gossip must be received from pj before the timer expires. If such a message is not received
in time, then pj is temporarily removed from the list of trusted neighbors of p. We use expect to
temporarily suspect a node that sent us a gossip message but refused to deliver the corresponding
message.
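The bookkeeping behind suspect, expect and the trusted-neighbor count can be sketched as follows. This is an illustrative model only: the class name, the explicit clock argument now, and the penalty parameter (how long a temporary suspicion lasts) are assumptions, since the text does not specify the duration of a temporary removal.

```python
class TrustState:
    """Tracks which one-hop neighbors a node currently trusts."""

    def __init__(self, neighbors, beta, timeout, penalty):
        self.neighbors = set(neighbors)
        self.beta = beta
        self.timeout = timeout      # how long expect() waits for the data
        self.penalty = penalty      # assumed length of a temporary suspicion
        self.permanent = set()      # nodes caught with a bad signature
        self.suspended_until = {}   # node -> time its temporary suspicion ends
        self.pending = {}           # (msg_id, node) -> deadline set by expect()

    def suspect(self, node):
        # Permanent removal, as for a node that sent a badly signed message.
        self.permanent.add(node)

    def expect(self, msg_id, node, now):
        self.pending[(msg_id, node)] = now + self.timeout

    def delivered(self, msg_id, node):
        # The awaited message arrived in time; cancel the timer.
        self.pending.pop((msg_id, node), None)

    def tick(self, now):
        # Expired expectations turn into temporary suspicions.
        for (msg_id, node), deadline in list(self.pending.items()):
            if deadline <= now:
                del self.pending[(msg_id, node)]
                self.suspended_until[node] = now + self.penalty

    def trusted(self, now):
        return {n for n in self.neighbors
                if n not in self.permanent
                and self.suspended_until.get(n, now) <= now}

    def rebroadcast_probability(self, now):
        # min(1, beta / |trusted_neighbors|), as in line 8 of Figure 4.5.
        count = len(self.trusted(now))
        return 1.0 if count == 0 else min(1.0, self.beta / count)
```

Each suspected neighbor shrinks the trusted set, so the rebroadcast probability of the suspecting node rises, which is the compensating effect discussed in the text.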
As mentioned before, in the maliciousness resilient version of RAPID, each node only counts
those one-hop neighbors that it has not yet suspected of being malicious. This is because
a malicious node might not execute the protocol correctly, and in particular might refuse
to forward some messages even when it should do so probabilistically. Hence, if a correct
node p is located in an area with many malicious nodes, then p's broadcast probability
becomes higher, because p ignores those malicious nodes when counting its
neighbors. Even if malicious nodes manage to mislead a correct node p by pretending to
be correct nodes, the worst thing that can happen is that p's broadcast probability will
be lower. In this case, any message m that is not sent by the probabilistic rebroadcasting
mechanism will still be forwarded to p's neighbors, either because p does not hear a retransmission
by any of its neighbors or via the gossip/request protocol. Either way, the reliability of the
protocol will not be degraded. The only thing that can suffer is the latency of delivering
the message to all the nodes.
Also, notice that the protocol in Figure 4.5 uses point-to-point requests (for missing
messages) and unconditional replies (a node that is asked for a message sends it to the
requesting node regardless of other nodes and other messages), rather than probabilistically
broadcasting requests and replies as in the previous versions of the protocol. This is done in
order to prevent attacks in which malicious nodes “convince” some nodes not to send their
messages. For example, consider the following scenario, which is possible with the recovery
scheme of Figure 4.4. A malicious node p can continuously broadcast REQUEST messages
such that its close neighbors will hear the transmission of the messages, while the rest of
its neighbors will not hear the transmissions of those REQUEST messages. Consequently,
the nearby neighbors of p will not broadcast REQUEST messages even if they miss some
messages, since they have heard the transmissions of the corresponding REQUEST messages
by p. Hence, these neighbors of p will never obtain messages that they failed to receive using
the probabilistic dissemination phase. In a similar attack, a malicious node p always
rebroadcasts DATA messages in response to REQUEST messages, but does so such that
only the close neighbors of p receive each DATA message and therefore never
retransmit it themselves. In this case, the other neighbors of p might never receive such
messages. Hence, by using point-to-point requests for missing messages, we slightly increase
the overhead of the protocol, but in return we increase the reliability
of the protocol.
It would have been possible to use a mechanism similar to the one used in Enhanced
RAPID (Figure 4.4) in lines 13 and 16, but that would have required an additional twist. In order to
continue using the scheme of lines 13 and 16, each node would have had to store additional
information about messages it has decided not to broadcast due to broadcasts by its neighbors.
If some node p receives the same REQUEST (GOSSIP) message several times and
p has cancelled the rebroadcast of the corresponding DATA (REQUEST) message, then
p would have to rebroadcast the message (with probability 1) immediately. For simplicity, the code in
Figure 4.5 does not include this optimization.
Resilience Against Malicious Attacks
Below we list a number of specific attacks that Maliciousness
Resilient RAPID overcomes. These attacks include: (1) forwarding a message with the wrong data, (2)
not forwarding some or all messages (this is known as selfish behavior; see footnote 1), (3) sending gossip
messages without ever supplying the real messages in order to confuse other nodes, (4)
trying to cause collisions with others' messages, and (5) sending messages as point-to-point messages
instead of broadcast messages, thus causing a correct node to decide not to rebroadcast a
message, even if it is the only one among all its neighbors that has received the message.
As mentioned above, the first attack is solved by adding signatures. That is, the origi-
nator of a message m signs the message with its private key and attaches this signature to
the message. Thus, every node p that receives m from q checks m’s signature and if the
signature does not match the content of m, p will suspect q and will not accept the message.
Moreover, p will no longer count q as one of its neighbors for the purpose of calculating the
rebroadcasting probability.
The second attack is solved by the monitoring mechanism. If a malicious node does
not rebroadcast a message m to all its neighbors, then our protocol guarantees that in any
case one of its neighbors will do it. Hence, as long as the correct nodes form a connected
sub-network, every message will be disseminated to all of them.
The third attack is solved using a simple timeout mechanism. When a node p receives
a gossip from q about a message m that p is missing, then in addition to sending a request
for m to q, p starts a timer. If p does not receive the requested message from q after
the timeout, it starts suspecting q as being malicious. In this case, p stops counting q for
calculating its rebroadcasting probability.
As for the fourth attack, in our model we assumed that all messages are delivered with a
non-zero probability. Hence, by assumption, the fourth attack is not possible. The rationale
behind this is twofold: first, if malicious nodes are allowed to collide with all messages, then no
protocol can ensure reliable delivery. Second, if all nodes are battery operated, jamming
the channel will drain the battery very quickly, and hence such an attack cannot last for too
long. In particular, whenever malicious nodes are only selfish, rather than mean, the
fourth attack does not make sense in any case, since it hurts everyone, including themselves.

Footnote 1: Giving incentives for nodes to participate is beyond the scope of this work. Here we only focus on
overcoming selfish behavior so that it does not prevent correct nodes from receiving messages, assuming
that the correct nodes form a connected sub-network.
Finally, if a malicious node sends a point-to-point message instead of rebroadcasting
it, our gossip mechanism will ensure that the message will still be propagated, yet with
an increased delay. In addition, some lower level mechanisms can be used, such as forcing
nodes to send messages and listen to messages only on IP-multicast addresses. Moreover, it
is possible to verify that a received IP-multicast message was also sent to a MAC destination
broadcast address rather than to a point-to-point destination address.
4.4 Simulations
In this section, we evaluate the performance of RAPID and compare it with the performance
of the counter-based protocol of Tseng et al. [114] and with the performance of the GOSSIP3
protocol [57]. In GOSSIP3, when a node q receives a message, it broadcasts the message
to its neighbors with probability P and discards it with probability 1 − P. In
addition, if q initially received a message and did not broadcast it, but
later did not receive the message from at least M other nodes, then q broadcasts the message
(see footnote 2). The reason for choosing
GOSSIP3 is that it is one of the best studied probabilistic protocols in the literature and
was found to be the best probabilistic broadcast mechanism among all the ones explored
in [57]. In our simulations we have measured the percentage of messages delivered to all the
nodes (delivery ratio), the latency to deliver a message to varying percentages of the nodes,
the load imposed on the network (number of transmitted messages) and the influence of
mute (selfish) nodes on the performance of our protocol.
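The GOSSIP3 forwarding rule described above can be written as a short sketch. The function names and the notion of a single count of copies heard by the end of a waiting period are assumptions made for illustration; the exact timing used by GOSSIP3 is defined in [57].

```python
import random


def gossip3_first_decision(p: float, rng: random.Random) -> bool:
    """Return True if the node rebroadcasts immediately (with probability p)."""
    return rng.random() < p


def gossip3_fallback(copies_heard: int, m: int) -> bool:
    """A node that stayed silent rebroadcasts later if it heard the message
    from fewer than m other nodes."""
    return copies_heard < m


def gossip3_transmits(p: float, m: int, copies_heard: int,
                      rng: random.Random) -> bool:
    """Overall outcome for one node and one message."""
    return gossip3_first_decision(p, rng) or gossip3_fallback(copies_heard, m)
```

With M = 1, a node that decided not to rebroadcast does so anyway only when it hears no other copy of the message.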
Footnote 2: GOSSIP3 was simulated with probability P = 0.65 and M = 1.

4.4.1 Setup
We have used the JiST/SWANS simulator [116] to evaluate the protocols. In JiST/SWANS,
nodes use the two-ray ground radio propagation model with the IEEE 802.11 MAC protocol and
54Mb/sec throughput. Communication between nodes is by broadcast. Two concurrent
broadcasts can collide, in which case the messages will not be received by some of the
nodes. The collision may occur without the broadcasting node detecting the problem, a
phenomenon known as the hidden terminal problem [5]. In order to reduce the number
of collisions, we have employed a staggering technique (Figure 4.4). That is, each time a
node is supposed to send a message, it delays the sending by a random period of at most
short jitter, which was set to 3 milliseconds. In addition, in the TSENG protocol and in
the counter based mechanism in RAPID we have used a long jitter of $0.33 \times s^2$ milliseconds
(where $s$ is the expected number of concurrent senders).
The transmission range was set to roughly 200 meters (see footnote 3). The nodes were placed at uniformly
random locations in a square area of 3500x3500 m^2, and unless mentioned otherwise,
the results are reported for networks of 1,000 nodes, which corresponds to roughly 10 neighbors
per node. We have also checked other network sizes (2500x2500 m^2 and 4500x4500 m^2)
with similar density, but the results were qualitatively the same, regardless of the specific
network size and exact number of nodes. An additional analysis of varying network density
is presented in Section 4.4.2.
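The figure of roughly 10 neighbors per node follows from the area of a transmission disk relative to the deployment area. A minimal sketch of this arithmetic, assuming uniform placement and ignoring border effects (the function name is illustrative):

```python
import math


def expected_neighbors(num_nodes: int, side_m: float, range_m: float) -> float:
    """Expected number of nodes inside one transmission disk.

    Assumes nodes are placed uniformly at random in a side_m x side_m square
    and ignores the smaller neighborhoods of nodes close to the border.
    """
    return num_nodes * math.pi * range_m ** 2 / side_m ** 2


if __name__ == "__main__":
    # Default setup: 1,000 nodes, 3500 m side, 200 m range.
    print(round(expected_neighbors(1000, 3500, 200), 2))
    # Sparse and denser configurations used in the density experiments.
    print(round(expected_neighbors(200, 2500, 200), 2))
    print(round(expected_neighbors(400, 2500, 200), 2))
```

The same formula gives about 4 and about 8 nodes per neighborhood for 200 and 400 nodes in a 2500x2500 m^2 area, the two densities discussed with the density results.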
Mobility was modelled by the Random-Waypoint model [65]. In this model, each node
picks a random target location and moves there at a randomly chosen speed. The node
then waits for a random amount of time and then chooses a new location, etc. In our
case, the speed of movement ranged from 1-10 m/s. Being aware of recent criticisms of the
Random-Waypoint model [22], we set the pause time to be 0 seconds and discarded the first
1000 seconds of simulation time.
In our simulations the number of broadcasting nodes varied from 1 to 200 and the size
of data messages was set to 512 bytes (less than one UDP/IP packet). In every simulation,
every broadcasting node sends 100 messages, and the simulation is terminated after a cool
down period. Each data point was generated as an average of 10 runs. Unless
otherwise mentioned, we use the default values defined in JiST/SWANS. We have used the
default Java pseudo random number generator, initialized with the current system time in
milliseconds as a seed.
Footnote 3: In SWANS one can choose the transmission power, which translates into a transmission range based on power degradation and background noise.
In the graphs, we have used the following notation: the enhanced version of our prob-
abilistic dissemination protocol from Figure 4.4 is denoted RAPID; a restricted version of
the enhanced RAPID in which the gossip and the recovery mechanism were disabled is
denoted RAPID-NO-GOSSIP; the counter-based protocol of Tseng et al. [114] is denoted
TSENG; GOSSIP3 is the probabilistic protocol by Haas et al. [57]. We limited the number
of times each message is gossiped by nodes in RAPID to 1. Additional gossip attempts
slightly improve the delivery ratios at the cost of additional messages.
4.4.2 Results
Broadcasting Probability - exploring β
Figures 4.6, 4.7, 4.8 and 4.9 explore the delivery ratio, the number of transmissions and
the latency (in seconds) against the broadcast probability of nodes in RAPID. Since the
broadcasting probability of node i is expressed as β/ni, where ni is the neighborhood size
of i, increasing the value of β leads to an increase in the broadcasting probability of i. The
following discussion and simulations analyze the influence of β values on the latency, the
delivery ratio and the number of transmissions of RAPID.
We can see in Figures 4.8 and 4.9 that when we increase the β value, the latency of
RAPID decreases. It can be explained by the fact that more nodes decide to broadcast
the received message and therefore more nodes receive messages from their neighbors by
the probabilistic mechanism and not due to the completion or recovery mechanisms. Yet,
when the value of β increases, more messages are injected into the network, as can be seen
in Figure 4.7. In addition, the value of β has hardly any influence on the reliability of
RAPID: even for β = 1.5, RAPID delivers all messages to more than 99% of the nodes. With
such a low β, more messages are delivered via the completion and the recovery phases, which
increases the latency but still keeps the reliability of RAPID as high as 99%, as can be seen
in Figure 4.6. Hence, the decision of whether to use RAPID with a low or high value of β
can be made based on the tradeoff between the latency and the message load for a given
application. In the following sections, we present RAPID with β = 3.5 since it gives a good
tradeoff between throughput and latency.
[Figure 4.6: Message delivery ratio when all nodes are mobile vs. varying values of β. Plot of % received messages against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint; #nodes=1000; #senders=100; length=3500 m.]

[Figure 4.7: Network load in terms of total number of transmissions when all nodes are mobile vs. varying values of β. Plot of #transmissions against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint; #nodes=1000; #senders=100; length=3500 m.]
Changing the Number of Broadcasting Nodes
Figures 4.10 and 4.11 present a comparison of RAPID with other protocols in mobile net-
works. Figure 4.10 shows the percentage of nodes that received all messages vs. the number
of nodes that initiate one new broadcast per second. RAPID delivers a very high percentage
of messages (99.9%), even when the number of broadcasting nodes is as high as 200.
RAPID-NO-GOSSIP, GOSSIP3 and TSENG also deliver a high percentage of messages when
the number of broadcasting nodes is relatively small (about 50 nodes). Yet, when the number
of broadcasting nodes increases and more messages are injected into the network, the
percentage of messages that RAPID-NO-GOSSIP, GOSSIP3 and TSENG deliver to all the
nodes decreases substantially. The reason for this degradation is the fact that when the
number of concurrent messages in the system is too high, many collisions occur causing
messages to be lost. Given that RAPID-NO-GOSSIP, GOSSIP3 and TSENG only employ
a probabilistic dissemination mechanism, they cannot recover these lost messages.
[Figure 4.8: Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes). Plot of latency against % nodes for β = 1.5, 2.5, 3.5, 4.5, 5.5, 6.5 and 7.5; protocol=RAPID; #nodes=1000; #senders=100; length=3500 m.]

[Figure 4.9: Latency to deliver a message to 98% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes). Plot of latency against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint; #nodes=1000; #senders=100; length=3500 m.]

[Figure 4.10: Message delivery ratio when all nodes are mobile vs. varying number of broadcasting nodes. Plot of % received messages against #senders for RAPID, RAPID-NO-GOSSIP, GOSSIP3 and TSENG; Mobility=RandomWaypoint; #nodes=1000; length=3500 m.]

[Figure 4.11: Network load in terms of total number of transmissions when all nodes are mobile vs. varying number of broadcasting nodes. Plot of #transmissions against #senders for RAPID, RAPID-NO-GOSSIP, GOSSIP3 and TSENG; Mobility=RandomWaypoint; #nodes=1000; length=3500 m.]

[Figure 4.12: Latency to deliver a message to X% of the nodes when all nodes are mobile (with 100 broadcasting nodes). Plot of latency against % nodes for RAPID, RAPID-NO-GOSSIP, GOSSIP3 and TSENG; Mobility=RandomWaypoint; #nodes=1000; #senders=100; length=3500 m.]

[Figure 4.13: Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying number of selfish nodes (with 100 broadcasting nodes). Plot of latency against % nodes for RAPID-0, RAPID-50, RAPID-100 and RAPID-200; Mobility=RandomWaypoint; #nodes=1000; #senders=100; length=3500 m.]

[Figure 4.14: Message delivery ratio vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments). Plot of % received messages against #senders for RAPID, GOSSIP3 and TSENG, each in MOBILE and STATIC variants; #nodes=1000; length=3500 m.]

[Figure 4.15: Network load in terms of total number of transmissions vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments). Plot of #transmissions against #senders for RAPID, GOSSIP3 and TSENG, each in MOBILE and STATIC variants; #nodes=1000; length=3500 m.]

[Figure 4.16: Message delivery ratio when all nodes are static vs. varying density (with 100 broadcasting nodes). Plot of % received messages against #nodes for RAPID-100, RAPID-NO-GOSSIP-100, GOSSIP3-100 and TSENG-100; Mobility=Static; #senders=100; length=2500 m.]

[Figure 4.17: Network load in terms of total number of transmissions when all nodes are static vs. varying density (with 100 broadcasting nodes). Plot of #transmissions against #nodes for RAPID-100, RAPID-NO-GOSSIP-100, GOSSIP3-100 and TSENG-100; Mobility=Static; #senders=100; length=2500 m.]

Interestingly, the gap between the reliability of RAPID-NO-GOSSIP and TSENG and
the reliability of GOSSIP3 grows as the number of broadcasting nodes is increased. This
is because RAPID-NO-GOSSIP and TSENG generate significantly fewer messages than
GOSSIP3 and therefore there are fewer collisions. Recall that the rebroadcasting probability
of GOSSIP3 is fixed at 0.65. Conversely, in RAPID-NO-GOSSIP (and RAPID) the
rebroadcasting probability is set to the minimal value required to ensure continued dissemination
with high probability, depending on the number of observed neighbors of each
node. In practice, with this specific network density (roughly 10 neighbors per node) and
β = 3.5, the rebroadcasting probability in our protocol is close to 0.35. This can also be
observed when looking at the total number of
transmissions, which is reported in Figure 4.11.
We can also observe in Figure 4.11 that RAPID sends more messages than RAPID-NO-
GOSSIP and TSENG, in order to overcome the collisions and message loss. Hence, the
decision of whether to use RAPID or RAPID-NO-GOSSIP (or TSENG) can be made based
on the tradeoff between reliability and load for a given application.
Figure 4.12 explores the latency to deliver messages to a varying percentage of the nodes
when the number of broadcasting nodes is 100. As can be seen in the graphs, GOSSIP3 is
significantly faster than all other protocols. Yet, GOSSIP3 delivers messages only to 95%
of the nodes, while RAPID delivers the messages to 99.6% of the nodes within 0.15 seconds,
which is good enough for most envisioned applications of MANETs. In the famous “no
free lunch” analogy, RAPID trades off latency (but still keeps it reasonable) for increased
reliability and reduced message overhead. As expected, RAPID is much faster than TSENG,
because the timeout between broadcasts of nodes in RAPID is smaller than the
timeout of broadcasts in TSENG, as explained in Section 4.3.2, and because the recovery of missing
messages in RAPID is faster than the completion protocol in TSENG. Finally, RAPID
(with gossip) is faster than RAPID-NO-GOSSIP due to the recovery protocol that it runs
in parallel to probabilistic dissemination.
Impact of Mobility
Figures 4.14 and 4.15 explore the impacts of mobility. We have run simulations while varying
the speed of nodes (from 1 to 10 meters/sec) and discovered that the results are qualitatively
the same. Thus, we only present the results when the speed of nodes was between 1 and
5 meters/sec and when all the nodes are static. As can be seen in Figures 4.14 and 4.15,
when nodes are mobile, the performance of RAPID (in terms of delivery ratio and number
of transmitted messages) is slightly better than when all nodes are static. This is because
with mobility, the information about messages propagates faster to all areas of the network.
Additionally, when a node moves, its chances of overhearing a message in one of the visited
locations are higher than when it stays in the same place. Finally, when nodes move, they
appear to be in more neighborhoods, which slightly reduces the retransmission probability.
Network Density
Figures 4.16 and 4.17 explore the delivery ratio and the number of transmissions against the
density of the nodes. We can see that when the number of nodes is 200 and the network size
is 2500x2500 m^2 (the average density is about 4 nodes per neighborhood), RAPID with 100
broadcasting nodes delivers all messages to 52.4% of the nodes, while GOSSIP3 delivers all
messages to 38.04% of the nodes and TSENG delivers all messages to 42.5% of the nodes.
These results are explained by the very poor network connectivity. We can also see that
when the number of nodes is 400 (the average density is about 8 nodes per neighborhood), GOSSIP3 with 100
broadcasting nodes delivers all messages to 94.9% of the nodes, TSENG delivers all messages
to 95.4% of the nodes, and the delivery ratio of RAPID is above 98%.
Interestingly, this echoes the results of [102]. Moreover, we know from Gupta and
Kumar's connectivity bound for ad hoc networks [55] that the network's connectivity is
ensured with high probability when $r \geq a\sqrt{\frac{C \ln(n)}{n}}$, with $r$ being the transmission range,
$a$ the length of the network area, $C$ a constant such that $C > \frac{1}{\pi}$, and $n$ the number of
nodes. Recall that in our case, $r = 200$ and $a = 2{,}500$. With these numbers, we get that
for $n = 200$, the network is not likely to be connected, but for $n = 400$, the network is
already connected. Hence, with $n = 200$, no protocol can achieve high delivery ratios, yet
with $n = 400$, good reliability can already be obtained.
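The two cases can be verified by evaluating the bound directly. The sketch below is only a numeric check: it plugs in the limiting value C = 1/π (the bound requires C strictly greater than this, so the computed range is a lower limit on the threshold), and the function name is illustrative.

```python
import math


def min_range_for_connectivity(n: int, side_m: float,
                               c: float = 1 / math.pi) -> float:
    """Smallest range r satisfying r >= a * sqrt(C * ln(n) / n).

    side_m is the side length a of the square area. The default c is the
    limiting value 1/pi, so the true threshold is slightly larger.
    """
    return side_m * math.sqrt(c * math.log(n) / n)


if __name__ == "__main__":
    for n in (200, 400):
        r_min = min_range_for_connectivity(n, 2500)
        verdict = "satisfied" if 200 >= r_min else "not satisfied"
        print(f"n={n}: required range about {r_min:.0f} m, bound {verdict} at r=200 m")
```

For a = 2,500 the required range exceeds 200 m when n = 200 and falls below 200 m when n = 400, in line with the delivery ratios reported for the two densities.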
When looking at the total number of transmissions in Figure 4.17, we can observe
that RAPID scales much better than GOSSIP3 with the density of the network. The
number of transmissions is almost constant (slightly increasing, mainly due to the gossip
messages and increased collisions) because RAPID tunes its rebroadcasting
probability based on the number of observed neighbors. This validates our theoretical
analysis in Section 4.2.1. RAPID achieves a slightly better delivery ratio than
RAPID-NO-GOSSIP. Yet, if the number of messages is more important, we may use
RAPID-NO-GOSSIP, which sends even fewer messages than RAPID (since RAPID-NO-GOSSIP
tunes its rebroadcast probability according to the number of observed neighbors just like
RAPID, yet it does not gossip about the existing messages).
Selfish Nodes
Figure 4.13 explores the latency to deliver a message to X% of the nodes when the total
number of nodes in the system is 1,000 and some nodes are selfish, i.e., refuse to rebroadcast
messages. In this graph, we use the notation RAPID-Y to indicate that RAPID was run
with Y selfish nodes. Surprisingly, the latency does not grow with the number of selfish
nodes. This is because, on the one hand, selfish nodes do not rebroadcast others' messages, but on
the other hand, they do not send gossip messages and therefore cause fewer collisions. We
can also see that even when the number of selfish nodes is 200 (20% of all nodes), RAPID
delivers the messages to 98.99% of the nodes within 0.14 seconds. We would like to point
out that by fine tuning the rate of gossips and the other timers in the system, it is possible
to reduce the quantitative latency numbers even further.
# Selfish    Delivery ratio    Message overhead
0            99.61%            4200144
50           99.57%            4071811.7
100          99.55%            4006395.7
200          98.99%            3816120.6

Table 4.1: Delivery ratio and message count vs. the number of selfish nodes (with 100 broadcasting nodes)

Table 4.1 presents the delivery ratio and the message overhead in mobile networks for
varying numbers of selfish nodes. We can see that the number of selfish nodes hardly
influences the delivery ratio of RAPID, which stays at roughly 99% or above even with 200
selfish nodes (98.99%). Interestingly, we also notice that the message overhead becomes
smaller as the number of selfish nodes increases. One could expect the message overhead
to increase with the number of selfish nodes; in particular, the protocol must send more
REQUEST messages for recovering missing messages that were not rebroadcast by selfish
nodes. However, selfish nodes do not send gossip messages. This reduces both the number
of retransmissions and the number of message collisions. Hence, overall, this results in a
reduced number of message transmissions.
Chapter 5
Overlay Based Reliable Broadcast in Wireless Ad-Hoc Networks
5.1 System Model and Definitions
We assume the same model as Section 3.1. We also assume an abstract entity called
an overlay, which is simply a collection of nodes. Nodes that belong to the overlay are
called overlay nodes. Nodes that do not belong to the overlay are called non-overlay nodes.
In the following, OVERLAY refers to the set of nodes that belong to the overlay and
$OL^t(1, p) \equiv N^t(1, p) \cap \mathrm{OVERLAY}$ (the neighbors of p that belong to the overlay at time t).
Later in this chapter we give examples of a couple of known overlay maintenance protocols
that we adapted to our environment.
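As a small illustration of the definition of the overlay neighbors of p, the sketch below computes the intersection of a node's one-hop neighborhood with the overlay using Python sets. The node names and the snapshot are hypothetical and only serve the example.

```python
def overlay_neighbors(neighbors, overlay):
    """Return OL^t(1, p): the one-hop neighbors of p that are overlay nodes."""
    return set(neighbors) & set(overlay)


# Hypothetical snapshot at some time t.
N1_p = {"a", "b", "c", "d"}   # N^t(1, p): nodes within p's transmission disk
OVERLAY = {"b", "d", "x"}     # nodes that consider themselves overlay nodes
OL1_p = overlay_neighbors(N1_p, OVERLAY)
print(sorted(OL1_p))
```

Node x belongs to the overlay but is not a neighbor of p, so it is not part of the result.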
5.2 Failure Detectors and Nodes’ Architecture
We assume that each node is equipped with three types of failure detectors, Mute, Ver-
bose, and Trust as defined in Section 3.3 (see also illustration in Figure 5.1). The Trust
failure detector collects the reports of Mute and Verbose, as well as detections of messages
with bad signatures and other locally observable deviations from the protocol. In return,
Trust maintains a trust level for each neighboring node. This information is fed into the
overlay, as illustrated in Figure 5.1. As we describe later in the chapter, the information
obtained from Trust is used to ensure that there are enough correct nodes in the overlay
so that the correct nodes of the overlay form a connected graph and that each correct node
is within the transmission disk of an overlay node that does not exhibit detectable malicious
behavior.

Figure 5.1: A node's architecture (components shown: appl, FD interceptor, Reliable Broadcast Mechanism, VERBOSE, MUTE, TRUST, multicast overlay, network)
Interval Failure Detectors
Since the specification of eventual failure detectors requires the accuracy property to hold
from some point on forever [27], they are not practical in a real long-running system. Hence,
we present a new type of failure detector called an interval failure detector. We define Imute
as the class of failure detectors that detect, during an interval called the suspicion interval,
mute failures that occur during special intervals called mute intervals. Formally, the
Imute failure detector is defined by two properties:
Interval Strong Accuracy: Non-mute processes are not suspected by any other correct
process during a certain interval that we call suspicion free interval.
Interval Local Completeness: Every process p that suffers a mute failure with respect to
a correct process q during a mute interval is suspected by q during a suspicion interval.
In a similar manner to Imute, we define Iverbose as the class of failure detectors that detect
verbose failures. In Section 5.6 we show that if the failure detector belongs to an interval
failure detector class, then during periods of good connectivity messages are disseminated
quickly (i.e., via the overlay nodes).
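The interplay between mute intervals and suspicion intervals can be sketched as follows. This is only an illustrative model under stated assumptions, not the detector of Section 3.3: the class name IntervalMuteDetector, the methods expect, observe, tick and suspects, and the rule that all candidate nodes are suspected when none of them transmits in time are assumptions made for this example.

```python
class IntervalMuteDetector:
    """Toy interval-style mute detector (assumed semantics, see lead-in)."""

    def __init__(self, mute_interval, suspicion_interval):
        self.mute_interval = mute_interval
        self.suspicion_interval = suspicion_interval
        self._pending = {}          # header -> (candidate nodes, deadline)
        self._suspected_until = {}  # node -> time at which the suspicion ends

    def expect(self, header, candidates, now):
        """Expect a transmission of `header` by ANY of `candidates`."""
        deadline = now + self.mute_interval
        self._pending.setdefault(header, (frozenset(candidates), deadline))

    def observe(self, header, sender):
        """A transmission by one of the candidates fulfils the expectation."""
        entry = self._pending.get(header)
        if entry is not None and sender in entry[0]:
            del self._pending[header]

    def tick(self, now):
        """Turn expired expectations into time-limited suspicions."""
        for header, (candidates, deadline) in list(self._pending.items()):
            if now >= deadline:
                for node in candidates:
                    self._suspected_until[node] = now + self.suspicion_interval
                del self._pending[header]

    def suspects(self, now):
        """Nodes suspected at time `now`."""
        return {n for n, until in self._suspected_until.items() if now < until}


detector = IntervalMuteDetector(mute_interval=10, suspicion_interval=30)
detector.expect("h1", {"q"}, now=0)
detector.tick(10)
print(detector.suspects(10))
```

Unlike an eventual failure detector, a suspicion in this sketch expires once the suspicion interval has passed, so a node that was silent only temporarily is trusted again later.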
5.3 The Broadcast Problem
Intuitively, the broadcast problem states that a message sent by a correct node should
usually be delivered to all correct nodes. We capture this by the eventual dissemination
and the validity properties. Eventual dissemination specifies the ability of a protocol to
disseminate a message to all the nodes in the system. Validity specifies that when a correct
node accepts a message, then this message was indeed generated by the claimed originator.
Formally, we assume a primitive broadcast(p,m) that can be invoked by a node p
in order to disseminate a message m to all other nodes in the system, and a primitive
accept(p,q,m) in which a message claimed to be originated by q is accepted at a node p.
Eventual dissemination: If a correct node p invokes broadcast(p,−) infinitely often,
then eventually every correct node q invokes accept(q,p,−) infinitely often (see footnote 1).
Validity: If a correct node p invokes accept(p,q,m) and q is correct, then indeed q invoked
broadcast(q,m) beforehand. Moreover, for the same message m, a correct node p can
only invoke accept(p,q,m) once.
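The validity property can be checked mechanically on a recorded execution. The sketch below is a hypothetical trace checker, where accept(p,q,m) is read as in the definition of the primitive above (p accepts a message claimed to originate at q); the event encoding and the function name check_validity are assumptions made for this example.

```python
def check_validity(trace, correct):
    """Check the validity property on a time-ordered list of events.

    Events are ("broadcast", p, m) or ("accept", p, q, m), where p accepts
    a message m that claims q as its originator.
    """
    broadcasts = set()
    accepted = set()
    for event in trace:
        if event[0] == "broadcast":
            _, p, m = event
            broadcasts.add((p, m))
        else:
            _, p, q, m = event
            if p not in correct:
                continue  # the property only constrains correct acceptors
            if (p, q, m) in accepted:
                return False  # the same message was accepted twice
            accepted.add((p, q, m))
            if q in correct and (q, m) not in broadcasts:
                return False  # accepted a message that q never broadcast
    return True


good_trace = [("broadcast", "q", "m1"), ("accept", "p", "q", "m1")]
print(check_validity(good_trace, correct={"p", "q"}))
```

A trace in which p accepts m1 without a preceding broadcast by a correct q, or accepts it twice, is rejected by the checker.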
5.4 The Dissemination Protocol
Our protocol includes three concurrent tasks. First, messages are disseminated over the
overlay by the overlay nodes. Second, signatures about sent messages are gossiped among
all nodes in the system. This allows all nodes to learn about the existence of messages
they did not receive either due to collisions or due to a malicious behavior by an overlay
node. When a node p discovers that it misses a message following a gossip it heard from q,
then p requests the missing message from q as well as from its overlay neighbors. The third
and final task is the maintenance of the overlay, whose goal is to ensure that the evolving
overlay indeed disseminates messages to all correct nodes. Note that the dissemination and
recovery tasks are independent of the overlay maintenance. In any event, for performance
reasons, most overlay maintenance messages can be piggybacked on gossip messages.

Footnote 1: Clearly, with this property it is possible to implement a reliable delivery mechanism. In order to bound the buffers used by such a mechanism, it is common to use flow control mechanisms.
The pseudo-code of the main protocol appears in Figures 5.2 and 5.3 and is described in
detail in Sections 5.4.1 and 5.4.2. These figures use two primitives. The primitive broadcast denotes a
broadcast of a message with a given TTL value, i.e., it reaches through flooding all nodes in
the corresponding hop distance from the sender. The primitive lazycast initiates periodic
broadcasting of the given message only to the immediate neighbors of the sender. The
overlay maintenance is described in Section 5.5.
5.4.1 The Dissemination Task in Detail
Dissemination consists of the following steps (described from the point of view of a node p):
(1) The originator p of a message m sends m||sig(m) to all nodes in $N^t(1, p)$. The header part of m includes a sequence number and the identifier of the originator (Line 3 in Figure 5.2).
(2) The originator p of m then gossips sig(m) to all nodes in $N^t(1, p)$ (Line 4).
(3) When a node p receives a message m for the first time, p first verifies that sig(m) matches m. If it does, then p accepts m. If the node that sent m is not an originator of m and is not in $OL^t(1, p)$, p instructs its Mute failure detector to expect a transmission of m by any of its overlay neighbors. Moreover, if p is also an overlay node, then p forwards m to all nodes in $N^t(1, p)$ (Lines 5–13). However, if m does not fit sig(m), then m is ignored and the process that sent it is suspected by the Trust failure detector (Lines 22–24).
(4) If a node p receives a message m it has already received beforehand, then m is ignored.
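The receive path of the dissemination task can be sketched in Python as follows. This is a simplified illustration of Lines 5–25 of Figure 5.2 under stated assumptions: signature verification is abstracted into a boolean sig_ok, the handler returns a list of symbolic actions instead of sending packets, and the names Node and on_data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    ident: str
    in_overlay: bool
    overlay: set                                   # nodes believed to be overlay nodes
    received: set = field(default_factory=set)     # headers of accepted messages
    gossip_heard: set = field(default_factory=set) # headers already gossiped to us


def on_data(node, header, originator, sig_ok, sender, ttl):
    """Handle a DATA message and return the resulting actions."""
    if header in node.received:
        return []                                  # duplicates are ignored
    if not sig_ok:
        return [("trust_suspect", sender)]         # bad signature
    node.received.add(header)
    actions = [("accept", originator, header)]
    if sender not in node.overlay and sender != originator:
        # A correct message arrived, but not via an overlay node.
        actions.append(("mute_expect", header))
    if node.in_overlay:
        actions.append(("broadcast", header, 1))
    elif ttl == 2:
        actions.append(("broadcast", header, ttl - 1))
    if header in node.gossip_heard:
        actions.append(("lazycast_gossip", header))
    return actions


p = Node("p", in_overlay=True, overlay={"o1"})
print(on_data(p, "h1", "s", True, "s", 1))
```

Returning actions rather than performing network I/O keeps the sketch testable; a real implementation would replace the tuples with calls to the broadcast and lazycast primitives and to the failure detectors.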
5.4.2 Gossiping and Message Recovery in Detail
Intuitively, the idea here is that nodes gossip about messages they received (or sent) to
all their neighbors. This way, if a node hears a gossip about a message that it has never
received, it can explicitly request the message both from its overlay neighbors and from the node
from which it received the gossip. If any of the contacted nodes has the message, it forwards
it to the requesting node. Messages can be purged either after a timeout, or by using a
stability detection mechanism. In this work, we have chosen to use timeout based purging
due to its simplicity.
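Timeout-based purging can be illustrated with a small buffer that remembers when each message arrived and drops it once it is older than a configured timeout. This is a minimal sketch; the class name MessageBuffer and the explicit now arguments (used instead of a real clock) are assumptions made so that the example is deterministic.

```python
class MessageBuffer:
    """Stores messages by header and purges them after a fixed timeout."""

    def __init__(self, purge_timeout):
        self.purge_timeout = purge_timeout
        self._store = {}  # header -> (message, arrival time)

    def add(self, header, message, now):
        # Keep the first copy; later duplicates do not refresh the timer.
        self._store.setdefault(header, (message, now))

    def get(self, header):
        entry = self._store.get(header)
        return entry[0] if entry is not None else None

    def purge(self, now):
        """Remove and return the headers whose timeout has elapsed."""
        expired = [h for h, (_, t) in self._store.items()
                   if now - t >= self.purge_timeout]
        for h in expired:
            del self._store[h]
        return expired


buf = MessageBuffer(purge_timeout=10)
buf.add("h1", "payload", now=0)
print(buf.get("h1"), buf.purge(now=10))
```

A node can answer a recovery request only while the message is still in the buffer, which is why the timeout has to cover the dissemination time analyzed in Section 5.6.1.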
Upon send(msg) by application do
(1)  message := msg_id || node_id || msg || sig(msg_id || node_id || msg);
(2)  gossip_message := msg_id || node_id || sig(msg_id || node_id);
(3)  broadcast(message, DATA, ttl=1);
(4)  lazycast(gossip_message, GOSSIP, ttl=1);

Upon receive(message, DATA, ttl) sent by p_j do
(5)  if (have not received this message before) then
(6)    if (authenticate-signature(message) = true) then
(7)      Accept(p_i, p_j, message) /* forward it to the application */;
(8)      if (p_j ∉ OVERLAY and p_j is not originator of message) then
(9)        /* The correct message was received (but not from the overlay node) */
(10)       Mute.expect(message.header, OL^t(1, current_node), ANY);
(11)     endif;
(12)     if (current_node ∈ OVERLAY) then
(13)       broadcast(message, DATA, 1);
(14)     else /* the message is correct and I am not in the overlay */;
(15)       if (ttl = 2) then
(16)         broadcast(message, DATA, ttl-1);
(17)       endif;
(18)     endif;
(19)     if (already received a gossip message about message before) then
(20)       lazycast(gossip_message, GOSSIP, ttl=1);
(21)     endif;
(22)   else /* the message is not correct */;
(23)     Trust.suspect(p_j, “bad signature reason”); /* notify the trust failure detector */
(24)   endif;
(25) endif;

Upon receive(gossip_message, GOSSIP, ttl) sent by p_j do
(26) if (authenticate-signature(gossip_message) = true) then
(27)   if (there is no message that fits the gossip_message) then
(28)     Mute.expect(gossip_message.header, p_j, ANY);
(29)     if (p_j is not originator of message that fits the gossip_message) then
(30)       /* The node asks from the node that sent the gossip_message and from overlay nodes to */
(31)       /* send the real message */ ;
(32)       broadcast(gossip_message, REQUEST_MSG, ttl=1, p_j);
(33)     endif;
(34)   else /* the message that fits the gossip_message was received */ ;
(35)     if (gossip_message have not been sent yet) then
(36)       lazycast(gossip_message, GOSSIP, ttl=1);
(37)     endif;
(38)   endif;
(39) else /* the message is not correct */;
(40)   Trust.suspect(p_j, “bad signature reason”);
(41) endif;

Figure 5.2: Malicious Resilient Dissemination Algorithm
Upon receive(missing_message, REQUEST_MSG, ttl, p_k) sent by p_j do
(42) if (authenticate-signature(missing_message) = true) then
(43)   if (current_node ∈ OVERLAY or current_node = p_k) then
(44)     if (message that matches missing_message was received) then
(45)       if (current_node ∈ OVERLAY) then
(46)         Verbose.indict(p_j);
(47)       endif;
(48)       broadcast(message, DATA, ttl=1, p_j);
(49)     else /* the message that fits the gossip_message was not received */;
(50)       if (p_j is not originator of the message that matches missing_message) then
(51)         if (current_node ∈ OVERLAY) then
(52)           broadcast(missing_message, FIND_MISSING_MSG, 2, p_k);
(53)         endif;
(54)       else
(55)         Verbose.indict(p_j);
(56)       endif;
(57)     endif;
(58)   endif;
(59) else /* the message is not correct */;
(60)   Trust.suspect(p_j, “bad signature reason”);
(61) endif;

Upon receive(missing_message, FIND_MISSING_MSG, ttl, p_k) sent by p_j do
(62) if (authenticate-signature(missing_message) = true) then
(63)   if (message that matches missing_message was not received) then
(64)     if (ttl = 2) then
(65)       broadcast(missing_message, FIND_MISSING_MSG, ttl-1);
(66)     endif;
(67)   else /* message that matches missing_message was received */
(68)     if (current_node ∈ OVERLAY or current_node = p_k) then
(69)       if (p_j ∈ N^t(1, current_node)) then
(70)         if (current_node ∈ OVERLAY) then
(71)           Verbose.indict(p_j);
(72)         endif;
(73)         broadcast(message, DATA, 1);
(74)       else
(75)         broadcast(message, DATA, 2);
(76)       endif;
(77)     endif;
(78)   endif;
(79) else /* the message is not correct */;
(80)   Trust.suspect(p_j, “bad signature reason”);
(81) endif;

Figure 5.3: Malicious Resilient Dissemination Algorithm (continued)
In addition to signatures, which detect impersonation, several mechanisms are in place to
overcome malicious failures. In order to prevent a malicious overlay
node from blocking the dissemination of a message, the search for a missing message can be
initiated by limited flooding with TTL=2, which ensures that the recovery request
reaches beyond a single malicious overlay node. This is done in addition to requesting the message
from the process that gossiped about its existence. Also, when a node determines that it has
received a request for a missing message too often, or that such a request is unjustified, it
notifies its Verbose failure detector.
More accurately, the gossiping and message recovery task is composed of the following
subtasks:
1. When a node p receives a gossip header(m) for a message m it has already received,
p gossips header(m) to the other nodes in $N^t(1, p)$ (Lines 34–38). Otherwise,
p does not forward the gossip. In particular, p only gossips about messages it has already
received and does not forward gossips about messages it has not received yet. This is
done in order to make the recovery process more efficient, and in order to help detect
mute failures more accurately (see footnote 2).
2. When p receives a gossip header(m) for a message m it has not received, p asks its
overlay neighbors and the sender q of the gossip to forward m to it using a
REQUEST_MSG message. p also instructs its Mute failure detector to expect a
transmission of m by q (Lines 27–33). Intuitively, since q gossiped about m, it should have
m and supply it when needed. If q gossips about messages that do not exist or q does
not want to supply them, it will be suspected.
3. When an overlay node p receives a REQUEST_MSG for the same message m too
many times from the same node q, it causes p’s Verbose failure detector to suspect
q (Lines 43–47 in Figure 5.3).
4. When an overlay node p receives a REQUEST_MSG for the message m, yet p has
not received m, then p sends a FIND_MISSING_MSG message to nodes in $OL^t(2, p)$
asking them to retransmit m (Lines 49–53). (The message is sent to overlay nodes at
distance 2 in order to bypass a potential neighboring malicious node.) Intuitively, if
p receives a REQUEST_MSG from q for a message m, and p does not have m, then
it means that some neighbor r of q has gossiped header(m) to q. Therefore, at least
one node in $N^t(1, q)$ has m. Since real messages are broadcast by the overlay nodes
faster than the gossips about these messages, it means that m is missing and therefore p
asks nodes in $OL^t(2, p)$ to retransmit m.
5. When an overlay node p receives a FIND_MISSING_MSG message for m from a node q
and p has m, then p first broadcasts m to q. If q ∈ $N^t(1, p)$, then p notifies its Verbose
failure detector about it (Lines 67–73). Intuitively, if q is p’s neighbor and p is an
overlay node that has m, then p has already broadcast m to its neighbors and therefore q
should have m.
6. When a non-overlay node p receives a FIND_MISSING_MSG message for m (which it
gossiped about) from a node q, p broadcasts m to q (Lines 67–73).

Footnote 2: It is possible to piggyback the first gossip of a message by the sender and by overlay nodes on the actual message. This saves one message and makes the recovery of messages a bit faster, since gossips about messages advance slightly faster this way. For clarity of presentation, we separate these two types of messages in the pseudo-code.
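Subtask 3 above relies on a notion of receiving the same request "too many times". One simple way to model this is a per-requester, per-message counter with a threshold, as sketched below. The threshold value and the names RequestMonitor and on_request are assumptions made for this illustration; the text does not fix a concrete policy.

```python
from collections import Counter


class RequestMonitor:
    """Counts REQUEST_MSG messages per (requester, message header) pair."""

    def __init__(self, max_requests):
        self.max_requests = max_requests  # assumed tolerance before indicting
        self._counts = Counter()

    def on_request(self, requester, header):
        """Record one request; return True if the requester looks verbose."""
        self._counts[(requester, header)] += 1
        return self._counts[(requester, header)] > self.max_requests


monitor = RequestMonitor(max_requests=2)
results = [monitor.on_request("q", "h1") for _ in range(3)]
print(results)
```

In a full implementation, a True result would be reported to the Verbose failure detector, which in turn feeds the Trust failure detector.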
5.5 Overlay Maintenance
Overlay maintenance is executed by a distributed protocol. There is no global knowledge
and each node must decide whether it considers itself an overlay node or not. Thus, the
collection of overlay nodes is simply the set of all nodes that consider themselves as such. At
the same time, every correct overlay node periodically publishes this fact to its neighbors,
so in particular, each overlay node eventually knows about all its correct overlay neighbors.
The goal of the protocol is to ensure that indeed the overlay can serve as a good backbone
for dissemination of messages. This means that eventually between every pair of correct
nodes p and q there will be a path consisting of overlay nodes that do not exhibit externally
visible malicious behavior. At the same time, for efficiency reasons, the overlay should
consist of as few nodes as possible.
For scalability and resiliency reasons, we are interested in a self-stabilizing distributed
algorithm in which every node decides whether it participates in the overlay based only on
the knowledge of its neighbors. Recall that the neighbors of p are the nodes that appear in
the transmission disk of p. Thus, p can communicate directly with them and every message
p sends is received by all of them.
In order to enable nodes to decide locally if they should become overlay nodes, we need
some deterministic symmetry breaking rule. In this work we utilize the overlay maintenance
protocols of [44]. The work of [44] defined the goodness number as a generic function that
associates each node with a value taken from some ordered domain. The goodness number
represents the node’s appropriateness to serve in the overlay. This way, it is possible to
compare any two nodes using their goodness number and to prefer to elect the one whose
value is highest to the overlay. Since in a malicious environment nodes can lie about their
goodness number, this becomes a useless criterion. Thus, we replace the notion of a goodness
number with the node’s id (which is unforgeable, by assumption).
Each node has a local status, which can be either active or passive; active means that
the node is in the overlay whereas passive means that it is not. The local state of each node
includes a status (active or passive), and its knowledge of the local states of all its neighbors
(based on the last local state they reported to it). Additionally, every node p maintains a
variable overlay_trust for each of its neighbors q, which can be either trusted, untrusted or
unknown; untrusted means that the Trust failure detector of p suspects q, unknown means
that the Trust failure detector of p does not suspect q but another neighbor of p that p
trusts reported to p that it suspects q, and trusted means that p has no reason to suspect
q. Also, p records for each neighbor the list of its active neighbors. We assume that overlay
maintenance messages are signed as well.
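The three values of the per-neighbor trust variable can be derived from local suspicions and from reports of other neighbors, as in the sketch below. It is a simplified reading of the paragraph above: "a neighbor that p trusts" is approximated as "a neighbor that p does not suspect", and the function name overlay_trust and its arguments are hypothetical.

```python
def overlay_trust(q, my_suspects, reports):
    """Classify neighbor q as 'trusted', 'untrusted' or 'unknown'.

    my_suspects: nodes suspected by the local Trust failure detector.
    reports: dict mapping a neighbor to the set of nodes it says it suspects.
    """
    if q in my_suspects:
        return "untrusted"
    for reporter, suspected in reports.items():
        # Reports from nodes we suspect ourselves are disregarded.
        if reporter != q and reporter not in my_suspects and q in suspected:
            return "unknown"
    return "trusted"


print(overlay_trust("q", my_suspects=set(), reports={"r": {"q"}}))
```

The intermediate value unknown lets a node react to suspicions it cannot observe itself, without fully adopting the judgment of a neighbor that may be lying.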
In order to ensure the appropriateness of the overlay, we need to verify that the overlay
includes alternatives to each detected mute or verbose node. Ideally, we would like to
eliminate these nodes from the overlay, but as they are malicious, they may continue to
consider themselves as overlay nodes. Thus, the best we can do is make sure that there is
an alternative path in the overlay that does not pass through such nodes, and that correct
nodes do not consider mute and verbose nodes as their overlay neighbors.
The protocol for deciding if a node should be in the overlay consists of computation steps
that are taken periodically and repeatedly by each node. In each computation step, each
node makes a local computation about whether it thinks it should be in the overlay or not,
and then exchanges its local information with its neighbors. For simplicity, we concentrate
below on the local computation steps only.
Additionally, if a node p receives a message from its neighbor q in which q reports that
it suspects a node r, then p changes r’s overlay_trust to unknown, unless p already suspects
either q or r. This is done because a malicious node might be suspected only by some of
its neighbors. Therefore, a node that suspects one of its neighbors should notify its other
neighbors about this suspicion in order to preserve connectivity of correct nodes in the
overlay. Note that a malicious node may abuse this and cause its correct neighbors to join
the overlay. In other words, a malicious node can cause correct nodes to unnecessarily join
the overlay, but it cannot destroy the connectivity of the overlay w.r.t. correct nodes.
Our goal is to ensure that a node elects itself to the overlay if it has the highest identifier
among its trusted neighbors. Below, we mention a couple of overlay maintenance protocols
that realize this intuition (by making the goodness number of a node be its identifier).
Specifically, we have implemented two overlay maintenance protocols, namely the
Connected Dominating Set (CDS) and the Maximal Independent Set with Bridges (MIS+B)
of [44], augmented with trust levels (i.e., the overlay_trust variable; see footnote 3). Since, other than
adding the trust level, the protocols are the same as in [44], we do not repeat them here.
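The symmetry-breaking intuition stated above (a node elects itself if it has the highest identifier among its trusted neighbors) can be written down in a few lines. The sketch below captures only this intuition and is not the CDS or MIS+B protocol of [44]; the function name and the dictionary encoding of trust levels are assumptions.

```python
def should_join_overlay(my_id, neighbor_trust):
    """Return True if my_id is higher than the id of every trusted neighbor.

    neighbor_trust maps a neighbor id to 'trusted', 'untrusted' or 'unknown'.
    A node without trusted neighbors elects itself.
    """
    trusted = [n for n, level in neighbor_trust.items() if level == "trusted"]
    return all(my_id > n for n in trusted)


print(should_join_overlay(5, {3: "trusted", 9: "untrusted", 4: "unknown"}))
```

Because untrusted neighbors are excluded from the comparison, a suspected node with a high identifier cannot keep its correct neighbors out of the overlay.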
5.6 Correctness Proof
Let us remind the reader that in Section 3.2 we assumed that there are enough correct nodes
so that non-malicious nodes form a connected graph. With this assumption, we prove the
following validity and eventual dissemination properties. We present proofs only for the Mute
failure detector, since the proofs for the Trust and Verbose failure detectors are trivial. For
this, we first introduce a few definitions.
1. gossip_timeout - the time between two consecutive gossip messages by a correct node.
2. request_timeout - the time between receiving a gossip message and sending a request
message.
3. rebroadcast_timeout - the time between receiving a request message and sending the
message that fits the request.
4. β - transmission time (the latency of a message from the sender to the receiver).
5. δ - the number of new messages that are injected into the network every second.
6. max_timeout = gossip_timeout + request_timeout + rebroadcast_timeout + 3 × β

Footnote 3: The CDS and MIS+B protocols in [44] are in fact self-stabilizing generalizations of the work of [122].
In the following, a pair of nodes p and q are well connected at time t if both are correct
and p ∈ $N^t(1, q)$ and q ∈ $N^t(1, p)$ during the time interval [t, t + max_timeout]. We denote
this relation by WC(p, q, t). In order to ensure dissemination of messages, we assume the
following: starting with some time t′, for every t > t′, the graph induced by all pairs of
well connected nodes at time t is connected, and this graph includes all correct nodes in
the network (see footnote 4). This can be seen as a refinement of the similar requirement in [33] to mobile
ad-hoc networks.
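The definition of max_timeout and of the well connected relation can be made concrete with a short sketch. The timeouts below are hypothetical values in milliseconds, and the continuous interval [t, t + max_timeout] is approximated by a finite list of neighborhood snapshots, which is an assumption of this example.

```python
def max_timeout(gossip_timeout, request_timeout, rebroadcast_timeout, beta):
    """max_timeout = gossip + request + rebroadcast timeouts + 3 * beta."""
    return gossip_timeout + request_timeout + rebroadcast_timeout + 3 * beta


def well_connected(p, q, correct, snapshots):
    """Approximate WC(p, q, t) over sampled neighborhood snapshots.

    snapshots: list of dicts mapping a node to its set of one-hop neighbors,
    sampled during [t, t + max_timeout].
    """
    if p not in correct or q not in correct:
        return False
    return all(p in snap.get(q, set()) and q in snap.get(p, set())
               for snap in snapshots)


# Hypothetical parameters in milliseconds.
bound = max_timeout(gossip_timeout=1000, request_timeout=500,
                    rebroadcast_timeout=500, beta=100)
snaps = [{"p": {"q"}, "q": {"p"}}, {"p": {"q"}, "q": {"p", "r"}}]
print(bound, well_connected("p", "q", {"p", "q"}, snaps))
```

The three β terms correspond to the three transmissions in one recovery round: the gossip, the request, and the retransmitted message.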
Theorem 5.6.1 The protocol satisfies the validity property.
Proof: According to the protocol, the originator of a message m adds a signature sig(m)
and then disseminates the message m||sig(m) to other nodes. Note that upon receiving
m||sig(m), every correct node checks whether sig(m) corresponds to m before accepting
m. As a part of the model’s basic assumptions, a malicious node cannot forge signatures.
Therefore, no correct node will accept a message other than m as if it was m. Moreover,
according to the protocol, correct nodes filter duplicates of messages they have already
received.
Theorem 5.6.2 The protocol satisfies the eventual dissemination property.
Proof: We show that a message m that is sent infinitely often by a correct originator p
is disseminated to all the correct nodes. Assume, by way of contradiction, that there is a
message m that is not received by some correct process. Let k be the smallest number such
that there exists a correct node q ∈ $N^t(k, p)$ that does not receive m.
Recall that by assumption, during every time interval all the well connected nodes form
a connected graph that includes all correct nodes. Therefore there exists a correct node l
∈ $N^t(k - 1, p)$ such that the distance between q and l is smaller than $r_l$ and l received m.
According to the protocol, l will send a gossip about m to its neighbors and, if requested
by its neighbors, l will also send m. Thus, q will receive m either from its overlay node or
from l. This contradicts the minimality of k.

Footnote 4: One can weaken this requirement by saying that starting with some time t′, there are infinitely many times for which the graph induced by all pairs of well connected nodes is connected, and this graph includes all correct nodes in the network. The price of doing this is that the dissemination time of a message to all the nodes will grow proportionally to the durations in which this graph is not connected.
5.6.1 Protocol Analysis
In this section we compute a bound on the time required to disseminate a message to all the
nodes and present a limit on the size of buffers that every node must maintain to successfully
disseminate all the messages. In order to provide a bound on the dissemination time we
assume that messages do not collide.
The Dissemination Time:
In the following, CR(m, t) refers to the number of correct processes that received m by time
t.
Lemma 5.6.3 Let m be a message sent by some correct process p. Let $t_1$ be the time at
which the first correct process received m. Let $t_2$ be the time at which the last correct process
received m. During the interval $[t_1, t_2]$, for all $t_3, t_4$ such that $t_2 \ge t_4 > t_3 \ge t_1$ and
$(t_4 - t_3) \ge$ max_timeout, $CR(m, t_4) > CR(m, t_3)$.
Proof: Assume by way of contradiction that the lemma does not hold. Therefore, there
are two neighboring nodes u and q such that q received the message m at time t and u does
not receive m by time t + max_timeout. Recall our assumption that the graph induced
by well connected nodes is connected and includes all correct nodes. After getting the
gossip that fits the correct message m (that q received before), q broadcasts the gossip
to its neighbors after at most gossip_timeout seconds. If its neighbors do not have it,
they send a request message after at most request_timeout seconds and then after at most
rebroadcast_timeout seconds q broadcasts the message to its neighbors. Therefore, after at
most max_timeout seconds the message will be disseminated to u. A contradiction.
Theorem 5.6.4 Let m be a message sent by some correct process p at time t. Then all
correct nodes will receive m by time t + max_timeout × (n − 1).
Proof: According to Lemma 5.6.3, every max_timeout seconds at least one node (that has
not received the message before) will get m. Since the graph of correct nodes is connected
and the number of correct nodes is at most n, all correct nodes will receive message m after
at most max_timeout × (n − 1) seconds.

Figure 5.4: Malicious overlay
The dissemination time depends on the mobility of nodes. If all nodes are static, each
message will be disseminated to all the nodes in at most max_timeout × n/2 seconds, where
n is the number of nodes. The explanation for this bound is as follows: According to
Lemma 5.6.3, a node that has message m will broadcast it to its neighbors in at most
max_timeout seconds. In the worst case, as illustrated in Figure 5.4, all nodes that belong
to the overlay are malicious and therefore all messages will be disseminated using the
gossip-request mechanism. Due to the assumption that the graph of correct nodes is connected,
the maximal number of hops in the network is n/2 (in every hop there is one malicious
overlay node and one correct node). Therefore, the message must pass n/2 hops, which
takes at most max_timeout × n/2 seconds. In mobile networks, each message will be
disseminated to all the nodes according to Theorem 5.6.4 within max_timeout × (n − 1)
seconds.
Buffers Size:
The size of the buffers that every node should have depends on the mobility of the nodes. If
none of the nodes move, every node has to hold every message for max_timeout seconds,
i.e., the time it takes to disseminate the message to all its neighbors. Therefore, every
node in a static network should have a buffer of size max_timeout × δ messages.
In mobile networks, every message should be kept until all the nodes receive the message.
As we showed, every message is disseminated to all the nodes within max_timeout × (n − 1)
seconds. Therefore, the buffer size of every node in a mobile network should be max_timeout
× (n − 1) × δ messages.
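The bounds derived in this section are easy to evaluate for concrete parameters. The sketch below encodes the static bound max_timeout × n/2, the mobile bound max_timeout × (n − 1), and the corresponding buffer sizes; the function names and the sample values are hypothetical.

```python
def dissemination_bound(max_timeout, n, static):
    """Upper bound (seconds) on the time to reach all n nodes."""
    return max_timeout * n / 2 if static else max_timeout * (n - 1)


def buffer_bound(max_timeout, n, delta, static):
    """Upper bound on the number of messages a node has to buffer.

    delta is the number of new messages injected into the network per second.
    """
    if static:
        return max_timeout * delta
    return max_timeout * (n - 1) * delta


# Hypothetical parameters: max_timeout = 2 s, n = 100 nodes, delta = 4 msg/s.
for static in (True, False):
    print(static,
          dissemination_bound(2, 100, static),
          buffer_bound(2, 100, 4, static))
```

For these sample values the mobile buffer bound is n − 1 = 99 times larger than the static one, which reflects that in a mobile network a message may have to be kept until every node has received it.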
In the following sections, we show that the messages are propagated fast (via the overlay
nodes) during certain periods if the failure detectors behave like eventually perfect failure
detectors or like interval failure detectors.
5.6.2 Fast Dissemination with Eventually Perfect Failure Detectors
In the following, we show that if the MUTE failure detector indeed belongs to ♦Pmute, then
eventually messages are disseminated to all correct nodes by the overlay. The significance of
this is that dissemination along overlay nodes is fast, since it need not wait for the periodic
gossip mechanism.
Lemma 5.6.5 Assume that the MUTE failure detector ∈ ♦Pmute. Then eventually the
non-mute overlay nodes form a connected graph COL such that every correct node is either
in COL, or within the transmission range of a non-mute node in COL.
Proof: Eventually, ♦Pmute of all correct nodes will suspect all the mute nodes. Thus, the
goodness number in the overlay maintenance protocol for mute nodes will be lower than
all other nodes. Consequently, the overlay built by the maintenance protocol will have the
desired property.
Theorem 5.6.6 Eventually, when there are no collisions, most messages propagate to all
the nodes via the overlay nodes, if the MUTE failure detector ∈ ♦Pmute.
Proof: In Lemma 5.6.5, we showed that eventually, the non-mute nodes of the overlay
form a connected graph that covers all non-mute nodes. Therefore, eventually, all messages
are propagated by overlay nodes to all correct nodes, which proves the theorem.
5.6.3 Fast Dissemination with Interval Failure Detectors
In this section we discuss the conditions under which our protocol implements Imute cor-
rectly. We show that, if the MUTE failure detector indeed belongs to Imute then during
periods of good connectivity messages are disseminated to all correct nodes by the overlay.
Observation 5.6.1 If suspicion_interval ≥ f × mute_interval then there will be an
interval $CI_i$ in which at least one overlay node in $N_i$ will be correct. We call $CI_i$ a correct
interval for node i.
Observation 5.6.2 There exists a suspicion_interval such that there are $CI_1, CI_2, \ldots, CI_n$
for which $CI_1 \cap CI_2 \cap \ldots \cap CI_n \neq \emptyset$.
Observation 5.6.3 In order to prevent false suspicions of the overlay nodes, the mute_interval
of the Imute failure detector should be larger than (n − 1) × max_timeout.
Observation 5.6.4 If some correct node p decides that it is not in the overlay, then after
some finite time all of its correct neighbors know that p ∉ OVERLAY. This immediately
follows from the protocols that maintain the overlay.
Let $OL(p_1, p_2, t_{start}, t_{end})$ be a relation such that $p_1$ believes that $p_2 \in OL^t(1, p_1)$ during
the interval $[t_{start}, t_{end}]$.
Lemma 5.6.7 Every malicious overlay node q that is mute w.r.t. another node p during
an interval [t, t + mute_interval] and satisfies OL(p, q, t, t + mute_interval) will be suspected
by p during suspicion_interval.
Proof: Let q be a malicious overlay node that does not forward the message m. Let
p be a correct node that satisfies OL(p, q, t, t + mute_interval). We assume that q does not
forward m to p during mute_interval and we show that q will be suspected during
suspicion_interval.
According to Theorem 5.6.4, all correct nodes will receive m after at most max_timeout
× (n − 1) seconds. Therefore, according to the protocol, after receiving m, p will activate
its MUTE failure detector and if q does not forward messages during mute_interval, it will
be suspected during suspicion_interval by p.
Lemma 5.6.8 Non-mute processes are not suspected by any correct process during
suspicion_free_interval.
Proof: A non-mute non-overlay node p cannot be suspected, since according to the
protocol a non-overlay node can be suspected by the MUTE failure detector only if it is not
forwarding a message m that it gossiped about. Since p is not mute, it will always forward
the message m and therefore will not be suspected. According to Observation 5.6.4, if p
leaves the overlay, then after some finite time that is smaller than mute_interval all its
correct neighbors believe that p ∉ OVERLAY. Therefore, p’s neighbors will not expect p
to broadcast messages and thus will not suspect it either.
Similarly, a non-mute overlay node p also cannot be suspected. This is because according
to the protocol, an overlay node can be suspected by the MUTE failure detector only if
it is not forwarding a message m that it received from another overlay node or from the
originator of m. Yet, since p is not mute, if p has m, it will always forward it. If p does not
have the message m, p will send a FIND_MISSING_MSG message and it will receive the
missing message m either from nodes that belong to $N^t(2, p)$ or after at most max_timeout
× (n − 1) seconds (as we show in Theorem 5.6.4). Once p receives m, it will forward m to
its neighboring nodes and therefore p will not be suspected since mute_interval > (n − 1)
× max_timeout (according to Observation 5.6.3).
Lemma 5.6.9 Assume that the MUTE failure detector ∈ Imute. Then there is an interval
in which the non-mute overlay nodes form a connected graph such that every correct node
is either in the overlay, or within the transmission range of a non-mute overlay node.
Proof: Lemma 5.6.8 shows that non-mute nodes are not suspected and according to
Lemma 5.6.7 there is an interval such that Imute of all correct nodes suspects all mute
nodes. Thus, none of the mute nodes will be trusted. Consequently, the overlay built by
the maintenance protocol will have the desired property.
Theorem 5.6.10 If the MUTE failure detector ∈ Imute then there is an interval such that,
when there are no collisions, most messages propagate to all the nodes via the overlay nodes.
Proof: Lemma 5.6.7 shows that there is a certain interval when every mute overlay node
is suspected. In Lemma 5.6.9, we showed that there is an interval such that the non-mute
nodes of the overlay form a connected graph that covers all non-mute nodes. Therefore,
during a certain interval, all messages are propagated by overlay nodes to all correct nodes,
which proves the theorem.
Figure 5.5: Message delivery ratio when all nodes are static
(plot: % received messages vs. #messages sent per second; curves: BDP(MIS), BDP(CDS), OVERLAY(MIS), OVERLAY(CDS), FLOODING)
Figure 5.6: Network load in terms of total number of messages sent when all nodes are static
(plot: #sent messages, in units of 10^5, vs. #messages sent per second; curves: BDP(MIS), BDP(CDS), OVERLAY(MIS), OVERLAY(CDS), FLOODING)
5.7 Results
We have measured the performance of our protocol using the SWANS/JIST simulator [1]. In
the simulations, we have compared the performance of our protocol with the performance of
flooding on the one hand and of simple dissemination along an overlay (without recovery of lost
messages) on the other. Here, flooding is an example of a protocol that is very robust against maliciousness, but
also very wasteful. At the other extreme, dissemination along an overlay without message
recovery is very efficient, but very unreliable as well. We have measured the percentage
of messages delivered to all nodes, the latency to deliver a message to all and to most of
the nodes, and the load imposed on the network. It is also important to note that our
performance measurements included the overhead of the overlay maintenance as well as the
gossip messages (although overlay maintenance messages are piggybacked on gossip messages).
In order to reduce the number of collisions, we have employed a staggering technique.
That is, each time a node is supposed to send a message, it delays the sending by a random
period of up to several milliseconds.
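The staggering step can be illustrated with a minimal sketch. The function name, the uniform distribution, and the 5 ms upper bound are assumptions made for illustration; the text only states that the delay is random and up to several milliseconds.

```python
import random

def staggered_send_time(now_ms, max_jitter_ms=5.0, rng=random):
    """Return the time (in ms) at which a message scheduled for `now_ms` is sent.

    The delay is drawn uniformly from [0, max_jitter_ms]; both the distribution
    and the default bound are illustrative assumptions.
    """
    return now_ms + rng.uniform(0.0, max_jitter_ms)

# Five nodes that all want to send at t = 1000 ms end up spread over a few ms.
rng = random.Random(42)
send_times = [staggered_send_time(1000.0, rng=rng) for _ in range(5)]
```

Spreading the transmissions over a short window lowers the chance that neighboring nodes transmit at the same instant, at the cost of a small added delay per hop.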
Figure 5.7: Latency to deliver a message to X% of the nodes when all nodes are static (with 200 broadcasting nodes that send one message per second)
(plot: latency (sec) vs. % nodes; curves: BDP, OVERLAY)
Figure 5.8: Latency to deliver a message to X% of the nodes when nodes are mobile (with 200 broadcasting nodes that send one message per second)
(plot: latency (sec) vs. % nodes; curves: BDP, OVERLAY)
In the simulations, mobility was modelled by the Random-Waypoint model [65]. In this
model, each node picks a random target location and moves there at a randomly chosen
speed. The node then waits for a random amount of time and then chooses a new location,
etc. In our case, the speed of movement ranged from 0.5 to 1.5 m/s, which corresponds to
walking speed. Also, the maximal waiting time was set to 20 seconds. Each simulation
lasted 5 minutes (of simulation time) and each data point was generated as an average of
10 runs. The transmission range was set to roughly 80 meters⁵ with a simulation area of
200x200 meters, the message size was set to 1KB (less than one UDP/IP packet), and the
network bandwidth to 1Mbps. In each simulation, two nodes were generating messages at
variable rates. We have run simulations with a varying number of nodes, but discovered that
with the exception of very sparse networks, the results are qualitatively the same. Thus, we
only present the results when the number of nodes is fixed at 200. In the graphs, we denote
the flooding protocol by FLOODING, our malicious resilient dissemination protocol by
BDP(MIS) and BDP(CDS) depending on the overlay mechanism used (see Section 5.5), and
by OVERLAY(MIS) and OVERLAY(CDS) the simple overlay dissemination mechanism
that has no message recovery. We limited the number of times each message is gossiped
to two. Additional gossip attempts slightly improve the delivery ratios, but at the cost of
additional messages. Finally, the main malicious behavior checked was that of being mute, as
this has the most adverse effect on the performance of the system.
⁵ In fact, in SWANS one can choose the transmission power, which translates into a transmission range based on power degradation and background noise.
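The Random-Waypoint movement described above can be sketched as follows. This is not the SWANS implementation: the function name, the trace format, and the uniform distributions for target, speed, and pause are assumptions, while the numeric defaults (200x200 m area, 0.5 to 1.5 m/s, pauses of at most 20 s, 5 minutes) are taken from the text.

```python
import math
import random

def random_waypoint(rng, area=200.0, speed_range=(0.5, 1.5),
                    max_pause=20.0, duration=300.0):
    """Return a list of (arrival_time, x, y) waypoints for a single node."""
    t = 0.0
    x, y = rng.uniform(0, area), rng.uniform(0, area)   # initial position
    trace = [(t, x, y)]
    while t < duration:
        tx, ty = rng.uniform(0, area), rng.uniform(0, area)  # random target
        speed = rng.uniform(*speed_range)                    # random speed (m/s)
        t += math.hypot(tx - x, ty - y) / speed              # travel time
        x, y = tx, ty
        trace.append((t, x, y))
        t += rng.uniform(0, max_pause)                       # wait at the target
    return trace

trace = random_waypoint(random.Random(1))
```

Each trace entry records when the node arrives at a waypoint; the pause is accounted for in the gap before the next arrival.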
Figure 5.9: Message delivery ratio when all nodes are mobile
(plot: % received messages vs. #messages sent per second; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
Figure 5.10: Network load in terms of total number of messages sent when nodes are mobile
(plot: #sent messages, in units of 10^5, vs. #messages sent per second; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
The results of the simulations in static networks with no malicious nodes are presented
in Figures 5.5, 5.6, and 5.7. As can be seen from the graphs, in this benign case, all protocols
obtain very high delivery rates. Essentially, in all protocols the latency to deliver a message
to all nodes remains well below 200ms. However, the load on the network of the flooding
protocol grows dramatically with the number of neighbors each node has (or in other words,
the density of the network). Thus, from an energy standpoint, flooding is much worse and
less scalable than the others. Due to the staggering we used, even the flooding approach
resulted in a relatively small number of collisions that were compensated for by its high
redundancy, which explains its high delivery ratios. However, with higher sending rates, it
is expected to perform much worse.
Since MIS+B and CDS performed almost the same, yet MIS+B is much more computationally
efficient, in the rest of this work we only present the results for the
MIS+B overlay. Figures 5.8, 5.9 and 5.10 present the simulation results for a mobile network.
Here, flooding continues to behave well in terms of delivery ratio and latency (and
badly in terms of network load). However, we start seeing a significant difference between our
dissemination protocol (BDP) and a simple dissemination with no gossip and no recovery of
messages (OVERLAY). While BDP maintains delivery rates close to flooding (and close to
100%), without gossip the delivery rate drops to 40%. Generally speaking, all protocols
deliver messages fast. However, OVERLAY only delivers messages to about 40% of the nodes.
Also, in BDP the latency slightly grows for the last nodes proportionally to the frequency
of a single gossip exchange.
Figure 5.11: Message delivery ratio when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)
(plot: % received messages vs. #faulty nodes; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
Figure 5.12: Message delivery ratio when nodes are mobile vs. varying number of malicious nodes (out of a total of 200 nodes)
(plot: % received messages vs. #faulty nodes; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
Figures 5.11 and 5.12 explore the delivery ratio of the different protocols with varying
number of malicious nodes. As can be seen, when no recovery mechanism is employed, the
delivery rate drops dramatically. On the other hand, both our protocol and the flooding
protocol maintain very high delivery rates. Interestingly, when nodes are mobile, the impact
of malicious nodes is weakened. This can be explained by the fact that the overlay adapts
itself to the evolving network topology. Thus, a malicious node does not necessarily remain
in the overlay throughout the execution.
Figures 5.13 and 5.14 explore the network load imposed by the different protocols as a
function of the number of malicious nodes. In the static case, the network load imposed
by BDP exhibits a linear increase with the number of malicious nodes. On the other hand,
the network load imposed by flooding slightly improves. This can be explained by the fact
that if malicious nodes avoid sending messages, then fewer messages are sent. As for the
dynamic case, here we also observe the interesting phenomenon that mobility improves the
asymptotic behavior of the protocols. Again, this can be explained by the fact that the
overlay structure evolves with the network topology, making it “harder” for malicious nodes
to block message dissemination along the overlay.
Figure 5.13: Network load when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)
(plot: #sent messages, in units of 10^4, vs. #faulty nodes; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
Figure 5.14: Network load when nodes are mobile with varying number of malicious nodes (out of a total of 200 nodes)
(plot: #sent messages, in units of 10^4, vs. #faulty nodes; curves: BDP(MIS), OVERLAY(MIS), FLOODING)
Figures 5.15 and 5.16 explore the latency to deliver a message to X% of the nodes when
some nodes are malicious (out of 200 nodes and a sending rate of 1 message per second).
Clearly, the latency grows with the number of malicious nodes. Also, in the static malicious
case, almost all nodes receive the message in less than a second, and only when there are
many malicious nodes may it take several seconds to deliver a message to the last 20% of
the nodes. In the mobile case we see the same qualitative behavior, but the latency starts
growing beyond one second at 60% of the nodes. We would like to point out that by fine
tuning the rate of gossips and the other timers in the system, it is possible to dramatically
reduce the quantitative latency numbers. However, the important thing to note is that
with malicious nodes, without a best-effort recovery mechanism, it is almost impossible to
ensure reliable delivery just by retransmission. This is because without an additional recovery
mechanism, the malicious nodes might collude to block all messages from reaching some
parts of the network.
Figure 5.15: Latency to deliver a message to X% of the nodes when all nodes are static vs. varying number of malicious nodes
(plot: latency (sec) vs. % nodes; curves: BDP-0, BDP-1, BDP-2, BDP-8, BDP-14)
Figure 5.16: Latency to deliver a message to X% of the nodes when nodes are mobile vs. varying number of malicious nodes
(plot: latency (sec) vs. % nodes; curves: BDP-0, BDP-1, BDP-2, BDP-8, BDP-14)
Chapter 6
Byzantine Resilient Group Communication
In this chapter we outline Byzantine JazzEnsemble, a group communication system that
tolerates Byzantine failures. We have designed Byzantine JazzEnsemble to perform well in
the normal case, i.e., when no Byzantine failures occur, yet be resilient to them if they do.
Byzantine JazzEnsemble can serve as a building block in different collaborative applications,
as we show in Appendix A. In this work we address only single-hop ad-hoc networks;
extending Byzantine JazzEnsemble to multi-hop ad-hoc networks is left for future work.
6.1 Model, Assumptions and Problem Statement
6.1.1 Basic Concepts
We assume the standard group-communication/middleware enhanced distributed comput-
ing model. That is, we assume a collection of n nodes (also called processes), each with an
architecture similar to the one illustrated in Figure 6.1. In particular, a node includes an
application module, a group communication module, and a network module.
Physically, nodes can only communicate by sending and receiving messages over the
network. From a theoretical standpoint, the network itself can be modeled as being driven
by a scheduler that controls the timing in which messages are received and is also allowed
to drop messages. Furthermore, the scheduler may decide at each moment for any pair
of nodes whether they are connected or disconnected. When a pair of nodes p and q are
connected, most messages sent between p and q are delivered by the scheduler within a
known bounded latency and while preserving their FIFO sending order. The few messages
that are delayed, dropped, or reordered when p and q are connected are chosen randomly and
in a manner oblivious to their content. Otherwise, the network is disconnected. We may
treat the connected and disconnected properties as relations; we assume that the scheduler
maintains symmetry and transitivity for these relations at all times. Thus, if p is connected
to q, then q is connected to p, and if there is a third process r that p is connected to, then
q and r are also connected.¹
Figure 6.1: A node’s architecture
(diagram labels: application, Group Communication, Failure Detector, network; events: CAST, SEND, JOIN, LEAVE, VIEW, CAST_DELIVER, SEND_DELIVER, NET_SEND, NET_CAST, NET_RECEIVE)
The application module executes a program that involves communicating with the ap-
plication modules at other nodes by exchanging messages with them. The group communi-
cation module is responsible for providing an abstract communication model that has much
stronger semantics than the network module. In particular, in this work we are interested
in the Byzantine tolerant version of the strong virtual synchrony model, which is defined
below.
As is typically done in distributed computing, each module of a node can be modeled
as an automaton. In this work, we are mainly concerned with the group communication
module. The automaton of this module accepts input events from the application module,
the network module, or timer events. In turn, the group communication module performs some
computation that may change its state, returns some output events to the application module
and network module, or sets future timer events. The input events include send – a
request by the application to send a message to a specific node, cast – a request by the
application to send the same message to all nodes, net-receive – receiving a message from
the network, as well as membership related events that will be introduced below. The output
events are net-send and net-cast to the network, send-deliver and cast-deliver
to the application, as well as membership related events, as reported below. The actions
performed by the group communication layer in response to a given event are governed by
its specification, which is also known as a transition function in automata theory.
¹ In some networks, these relations may not be transitive, yet transitivity can be obtained by a peer-to-peer routing protocol.
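To make the automaton view concrete, the following toy sketch maps each input event to a list of output events. The class name, the tuple encoding of events, and the `delivered` counter are hypothetical; the sketch has no reliability, ordering, or membership logic and only mirrors the event names listed above.

```python
from dataclasses import dataclass

@dataclass
class GroupCommAutomaton:
    node_id: str
    delivered: int = 0  # example of local state that transitions may change

    def step(self, event, **args):
        """Transition function: one input event in, a list of output events out."""
        if event == "send":           # application sends msg to a specific node
            return [("net-send", args["dest"], args["msg"])]
        if event == "cast":           # application sends msg to all nodes
            return [("net-cast", args["msg"])]
        if event == "net-receive":    # a message arrives from the network
            self.delivered += 1
            kind = "cast-deliver" if args["broadcast"] else "send-deliver"
            return [(kind, args["src"], args["msg"])]
        raise ValueError(f"unknown input event: {event}")

gc = GroupCommAutomaton("p")
outputs = gc.step("send", dest="q", msg="hello")
```

A real layer would consult and update much richer state in each transition, but the shape (state plus input event yields new state plus output events) stays the same.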
With this model, for a process pi, we can define a process history hi to be the sequence of
events occurring at pi. A collection of process histories, one for each process in the system,
is called an execution. We assume in this work that all executions are well formed in the
sense that in every execution σ, if the history of a process pi in σ includes a net-receive
event with some message m sent by some process pj to pi (or to everyone in the case of
a broadcast), then the history hj includes a net-send or net-cast event with the message
m sent to pi by pj (or to everyone in the case of a broadcast).
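The well-formedness condition can be checked mechanically on a recorded execution. The sketch below assumes a hypothetical encoding in which an execution is a dictionary from process identifiers to histories, and events are tuples `("net-send", dest, msg)`, `("net-cast", msg)`, or `("net-receive", src, msg)`.

```python
def is_well_formed(execution):
    """Check that every net-receive has a matching net-send or net-cast.

    `execution` maps a process id to its history (a list of event tuples).
    The tuple encoding is an assumption made for this sketch.
    """
    for pi, history in execution.items():
        for event in history:
            if event[0] != "net-receive":
                continue
            _, pj, msg = event
            sender_history = execution.get(pj, [])
            sent_directly = ("net-send", pi, msg) in sender_history
            sent_to_all = ("net-cast", msg) in sender_history
            if not (sent_directly or sent_to_all):
                return False
    return True

good = {
    "p": [("net-send", "q", "m1"), ("net-cast", "m2")],
    "q": [("net-receive", "p", "m1"), ("net-receive", "p", "m2")],
}
bad = {"p": [], "q": [("net-receive", "p", "m1")]}
```

Here `good` is well formed, whereas `bad` contains a receive event for a message that was never sent.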
Each process has a local clock. The local clocks are not synchronized. However, we
assume the existence of a global clock that is known to external observers of the system.
For a given history h and two events e1 and e2, we denote by e1 →h e2 the fact that e1 is
ordered before e2 in h. Similarly, for a given execution σ, we denote by e1 →σ e2 the fact
that e1 occurred in the global time of σ before e2.
6.1.2 Byzantine Virtual Synchrony
We assume an abstract entity called a group. The application module of a node can invoke
a join event, indicating that it wishes to join a group, or a leave event, indicating that it
is no longer interested in the group. During the time interval between a join event and a
subsequent leave event, the node is said to be a member of the group. The collection of
correct nodes that are members of the group at a given time t is called the group membership
at time t.
The virtual synchrony model presents an abstraction to the application, in which the
application periodically receives a view event; such an event reports an estimate of the
current membership. The view event includes a view ID and an ordered membership list; for
a view event v, we denote its view identifier by v.vid and the view membership by v.mbrs.
The Byzantine virtual synchrony model includes two aspects: The first relates the contents
of views delivered to applications in different nodes to one another and to reality. The
second places restrictions on message delivery within a given view. The formal definition
appears below (the definition does not explicitly address join and leave events for simplic-
ity). It is split into Byzantine view synchrony, which only addresses views, and Byzantine
virtual synchrony, which adds certain requirements about message agreement and reliable
delivery.
In the definitions below, we use the following notation for convenience: For any view
events v1 and v2 and history hi, we denote C(hi, v1, v2) the fact that v1 →hi v2 and there
does not exist a third view event v3 such that v1 →hi v3 →hi v2 (this corresponds to saying
that v1 and v2 are consecutive view events in hi).
Definition 6.1.1 (Byzantine View Synchrony) An execution σ is Byzantine view syn-
chronous if it obeys the following restrictions:
1. For every view event v that is included in a history hi of a correct process, pi ∈ v.mbrs.
2. For every history hi of a correct process and every two view events v1 →hi v2, we have
v1.vid < v2.vid.
3. For every two correct nodes pi and pj and any two view events vi ∈ hi and vj ∈ hj
for which vi.vid = vj .vid, we also have vi.mbrs = vj .mbrs.
4. For every two correct nodes pi and pj that from some point on in σ are continuously
connected, there is a point in hi from which all view events v in hi are such that
pj ∈ v.mbrs.
5. For every correct node pi, if from some point on in σ there is another node pj that is
always disconnected from pi, or pj crashes, then from some point on in hi, for every
view event v in hi we have pj 6∈ v.mbrs.
6. If two correct nodes pi, pj ∈ v1, and C(hi, v1, v2) and pj 6∈ v2, then some process pk
∈ v1 suspected pj in v1.
84
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-05 - 2008
7. For any two correct nodes pi and pj and views v1 and v2, if C(hi, v1, v2) and pj ∈v1.mbrs ∩ v2.mbrs, then the history hj (of pj) also includes v1.
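Items 1 to 3 of the definition, as well as the relation C, can be checked on finite recorded histories; the remaining items refer to connectivity, suspicions, or eventual behavior and are not covered here. The sketch assumes a hypothetical encoding in which each correct process maps to the ordered list of its view events, and a view is a pair `(vid, mbrs)` with `mbrs` a tuple.

```python
def consecutive(views, v1, v2):
    """C(h, v1, v2): v2 immediately follows v1 among the view events of h."""
    return any(a == v1 and b == v2 for a, b in zip(views, views[1:]))

def check_items_1_to_3(histories):
    """`histories` maps each correct process to its ordered list of views."""
    membership_of = {}  # vid -> membership first seen for that vid
    for p, views in histories.items():
        for vid, mbrs in views:
            if p not in mbrs:                                  # Item 1
                return False
            if membership_of.setdefault(vid, mbrs) != mbrs:    # Item 3
                return False
        vids = [vid for vid, _ in views]
        if any(a >= b for a, b in zip(vids, vids[1:])):        # Item 2
            return False
    return True

ok = {"p": [(1, ("p",)), (2, ("p", "q"))], "q": [(2, ("p", "q"))]}
disagree = {"p": [(1, ("p", "q"))], "q": [(1, ("q",))]}
```

In `ok`, p first installs a singleton view and then a joint view with q; in `disagree`, the two processes install view 1 with different membership lists, which violates Item 3.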
Intuitively, Items 1 and 2 are sanity checks that ensure that a node is included in its
own view and that view identifiers are monotonically increasing. Item 3 requires correct
processes to agree on the membership of joint views. Items 4 and 5 require the membership
lists in views to eventually resemble the true list of connected correct processes. Item 6 verifies
that a node is not removed from the view without being suspected by another node in the
view. This prevents spurious views from occurring and therefore rules out useless
implementations in which singleton views are continuously installed. Finally, Item 7 requires
that if a node pj appears in two consecutive views of another node pi, then pj has at least
installed the first of these two views. Thus, each view also serves as a confirmation and
synchronization point w.r.t. the previous view. The main difference between this definition
and the benign version of View Synchrony that appears in [51] is that here we restrict the
behavior of correct processes (and not of the alive processes) and we separate between being
connected and being correct.
Notice that if a process pi is included in the membership list of some view v, i.e.,
pi ∈ v.mbrs, it does not automatically mean that pi has also installed this view, i.e., that
v ∈ hi. Furthermore, without some strong synchronization assumptions, the 5th item in
the definition of Byzantine view synchrony cannot be satisfied [26]. Rather than adding
such explicit assumptions to our model, we assume that each node is equipped with a failure
detector module, as in Figure 6.1. The failure detector at process pi may occasionally report
some other processes as suspected. These reports may be erroneous, but it is assumed that
the failure detector is bounded in the kinds of mistakes it can make. It has been
previously shown in [30] that in benign failure models, Items 4 and 5 in the definition above
are equivalent to what is known as an eventually perfect failure detector, also denoted ◇P
in the literature. At this point in the paper, we do not restrict the failure detector type, but
rather relax the 4th requirement. Specifically, we only require that for some parameter k, if
there exists a subset of nodes of size at least k such that the failure detectors of these nodes
never suspect each other from some point on, then eventually these nodes continuously
remain in each other’s views. The ratio between k and f (the number of Byzantine nodes)
may depend on the exact failure detector used and possibly also on the protocols chosen.
In most cases, one is likely to require that k is at least 3f + 1.
We would like to emphasize that the definition of Byzantine View Synchrony supports
what is known as the partitionable membership model, in which there can be multiple concurrent
views of the same group. In particular, a process can join the group, yet still be partitioned
from the rest of the group, or in other words, have its own view of the group, at least for a
while. If the members of two such views become connected for sufficiently long, they should
merge and create a joint view (as called for by Item 4).
For our next and final definition, we introduce the following notation: We denote
I(hi, v1) the set of events ev such that ev appears in some history hi after a view v1
(v1 →hi ev) and there is no other view v2 for which v1 →hi v2 →hi ev (this corresponds to
saying that ev occurred in view v1 in hi). Finally, for a given message m and process pi
that sends m, we denote si(m) the corresponding send event at pi; similarly, for a message
m and process pi that receives m, we denote ri(m) the corresponding receive event at pi.
Definition 6.1.2 (Byzantine Virtual Synchrony) An execution σ is Byzantine virtu-
ally synchronous if it obeys the following restrictions:
1. σ is Byzantine view synchronous.
2. Let m be a message and si(m) and rj(m) be corresponding send and receive events at
correct processes pi and pj, respectively. If for some view v1 si(m) ∈ I(hi, v1), then
rj(m) ∈ I(hj , v1).
3. Let m be a broadcast message such that si(m) ∈ I(hi, v1) for some correct process pi
and view v1, and let v2 be a view such that C(hi, v1, v2). Then for each correct process
pj such that both v1 ∈ hj and v2 ∈ hj, we have rj(m) ∈ hj.
4. Let m be a broadcast message such that ri(m) ∈ I(hi, v1) for some correct process pi
and view v1, and let v2 be a view such that C(hi, v1, v2). Then for each correct process
pj such that v1 ∈ hj and v2 ∈ hj, we have rj(m) ∈ hj.
5. Let m1 and m2 be two messages sent by a process pi that is either correct, or crashes
during σ, but does not suffer any other Byzantine failure in σ. Furthermore, assume
si(m1) →hi si(m2) and both si(m1) ∈ I(hi, v1) and si(m2) ∈ I(hi, v1). Then if for
some correct process pj rj(m2) ∈ I(hj , v1), then rj(m1) ∈ I(hj , v1) as well.
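Items 2 and 5 of this definition can be illustrated on a single sender and a single receiver. The sketch assumes a hypothetical encoding in which a history is a list of `("view", vid)`, `("send", msg)`, and `("recv", msg)` events, message identifiers are unique, and all receive events in the receiver's history refer to messages from this one sender.

```python
def events_by_view(history):
    """Map each view id v to the send/recv events in I(h, v)."""
    current, result = None, {}
    for kind, value in history:
        if kind == "view":
            current = value
            result[current] = []
        elif current is not None:
            result[current].append((kind, value))
    return result

def same_view_delivery(sender_hist, receiver_hist):
    """Item 2: a message is received in the view in which it was sent."""
    sent_in = {m: v for v, evs in events_by_view(sender_hist).items()
               for kind, m in evs if kind == "send"}
    return all(sent_in.get(m, v) == v
               for v, evs in events_by_view(receiver_hist).items()
               for kind, m in evs if kind == "recv")

def no_fifo_holes(sender_hist, receiver_hist):
    """Item 5: within a view, the received messages form a prefix of those sent."""
    received = events_by_view(receiver_hist)
    for v, evs in events_by_view(sender_hist).items():
        got = {m for kind, m in received.get(v, []) if kind == "recv"}
        gap = False
        for kind, m in evs:
            if kind != "send":
                continue
            if m not in got:
                gap = True          # m was skipped ...
            elif gap:
                return False        # ... yet a later message was received
    return True

s = [("view", 1), ("send", "m1"), ("send", "m2"), ("view", 2), ("send", "m3")]
r_ok = [("view", 1), ("recv", "m1"), ("recv", "m2"), ("view", 2), ("recv", "m3")]
r_late = [("view", 1), ("recv", "m1"), ("view", 2), ("recv", "m2")]
r_hole = [("view", 1), ("recv", "m2"), ("view", 2)]
```

The history `r_late` violates Item 2 because m2 is received in a later view than the one in which it was sent, and `r_hole` violates Item 5 because m2 is received in view 1 without m1.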
Intuitively, Item 2 implies that a message can only be received in the same view in
which it was sent; Item 3 implies reliable delivery of messages sent by correct members that
remain in the same view; Item 4 implies agreement on which messages were received in a
terminating view; Item 5 implies no message omissions (or no FIFO holes even from crashed
processes). Here, again, the definition is similar to the benign case as it appears in [51],
with the exception that we only restrict the behavior of correct processes. An interesting
aspect of Item 4 is that a Byzantine process can send two distinct versions of the same
message to two different correct processes. This situation cannot occur in the benign failure
model. Ensuring that correct processes also agree on the content of a message is known as
uniform broadcast [86].
6.2 Overview of the Solution
Section 6.2.1 presents the architecture of JazzEnsemble, while the rest of the section
describes the adaptation of JazzEnsemble to a Byzantine environment.
6.2.1 JazzEnsemble and Fuzzy Membership
JazzEnsemble is an experimental variant of Ensemble. JazzEnsemble implements the ideas
of fuzzy group membership [43] and also supports various optimizations and protocol layers
that enable it to operate in ad-hoc networks, including, e.g., support for routing in ad-hoc
networks. Both Ensemble and JazzEnsemble have the same general architecture and the
same glue mechanism, and many of the layers of JazzEnsemble are simply taken as is from
Ensemble. The main differences are in a few layers that are related to ad-hoc networking
and to fuzzy failure detection, and to benefiting from fuzzy membership notifications.
The main architecture of Ensemble is nicely described in [58] while its security architec-
ture is described in [100]. A detailed discussion of the adaptations done in JazzEnsemble
to accommodate ad-hoc networks appears in [39]. The main aspects of JazzEnsemble that
are relevant as background for this work are those related to fuzzy membership. We thus
briefly repeat them here.
The idea of fuzzy membership is that rather than viewing membership as a binary
property, the system should maintain a fuzziness level for each view member. This indicates
the degree to which the corresponding member seems to be alive and responsive (i.e., low
fuzziness level is a good thing while high fuzziness is bad). The fuzziness level of each
member is made available to all the group communication system’s layers, and each can
utilize it in order to optimize its behavior w.r.t. nodes with high fuzziness level. With
this, it is possible to have long timeouts for failure detection (and view changes) without
compromising the performance of the system. At the same time, the fuzziness level is
hidden from the application, which continues to enjoy the relatively simple strong virtual
synchrony model.
To better understand how fuzziness levels help, consider for example the issue of flow
control [110]. Flow control restricts the number of messages (or bytes) that a sender can
send without hearing an acknowledgement, which is known as a sending window. This
prevents overflowing the network and the receivers’ buffers. The problem in multicast flow
control is that until a sender receives acknowledgements from all intended receivers, it
should not advance its sending window. With fuzzy membership, we modify this behavior
to allow the sender to advance its sending window as soon as all nodes with low fuzziness
level acknowledge the message. This way, we avoid pausing due to slow nodes, since the
fuzziness level of slow nodes is high.
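The modified window-advance rule can be sketched as a single predicate. The representation of fuzziness as a number in [0, 1], the threshold value, and the function name are assumptions for illustration and are not taken from JazzEnsemble.

```python
def can_advance_window(acked, fuzziness, threshold=0.5):
    """Return True once every member with fuzziness below `threshold` has acked.

    `acked` is the set of nodes that acknowledged the message and `fuzziness`
    maps each view member to its fuzziness level (low is good). Both the
    numeric scale and the threshold are hypothetical.
    """
    return all(node in acked
               for node, level in fuzziness.items()
               if level < threshold)

fuzziness = {"a": 0.1, "b": 0.2, "c": 0.9}    # c is slow or unresponsive
ready = can_advance_window({"a", "b"}, fuzziness)
```

With a threshold above every fuzziness level, the predicate degenerates to the classic rule of waiting for acknowledgements from all intended receivers.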
Similarly, in order to ensure reliable delivery, nodes must keep messages they receive
for possible retransmission. In order to save buffer space, we can utilize fuzziness levels by
compressing messages that were already acknowledged by all members with low fuzziness.
As reported in [46], using similar principles, it is also possible to expedite view changes
in some cases, while offering replicated state machine semantics. Finally, we utilize the
fuzziness levels as unreliable failure detectors in our Byzantine consensus protocols, as
presented later in this paper.
JazzEnsemble supports the notion of fuzzy membership by adding a special event for
notifying about changes in fuzziness levels, by adding flags to existing events, and through
modification to the failure detection, flow control, reliable broadcast, and membership man-
agement layers. These changes are discussed in more detail in [39].
Figure 6.2: Message headers and data in layers (drawing taken from Ensemble’s reference manual)
(diagram labels: Protocol Stack, SENDER, RECEIVER, Message, Header, Event, Protocol Layer)
6.2.2 Fuzzy Mute and Fuzzy Verbose Failure Detectors
Let us note that the standard heartbeat based failure detection mechanism of non-Byzantine
tolerant group communication systems is not sufficient for overcoming Byzantine failures.
This is because a node can send heartbeats in a timely manner, yet otherwise behave in an
arbitrary manner.
When considering the structure of messages sent and manipulated by layered group
communication systems, it is clear that at each layer it is possible to identify a header part
and a data part. In particular, the header part includes the information added, manipulated
and verified by the layer. For example, the header for a layer that implements reliable FIFO
delivery often includes a message type, a sequence number for the message, and possibly
the sequence number of the last acknowledged message. Often, a given layer L is completely
unaware of headers belonging to lower layers in the stack, whereas the application data (in
application driven messages) as well as the headers added by higher layers are part of the
data as far as L is concerned. See the illustration in Figure 6.2.
Moreover, often such a layer L can expect to receive messages with known headers from
other nodes in the group. For example, consider a reliable FIFO delivery layer L at process
p that recently sent a message m to a node q. The layer L at p expects to see a message
from q that includes an acknowledgement for m within a given timeout. A failure by layer
L at p to see such a message from q is called a mute failure of q with respect to p. Another
example of a mute failure is a coordinator of a membership maintenance layer that fails
to generate a new view when expected by the other members. Additionally, often it is
possible to assume that a correct layer should not generate messages with certain headers
too frequently. For example, if flow control restricts the rate of messages, then q should
not send messages faster than this limit. Similarly, there are situations in which a layer
L at p knows that a certain message header from q should not be received if q is correct.
As an example, consider an acknowledgement for a message that was not sent in a reliable
FIFO layer. We refer to such behavior as a verbose failure of q with respect to p.
Interestingly, a large percentage of Byzantine attacks against many layers are either
mute failures or verbose failures (see footnote 2). Moreover, with the above observations, layered group
communication systems match perfectly the model proposed in Section 3.3. This suggests
replacing the standard failure detection mechanism with mute and verbose failure detectors.
That is, we add a component to the system that allows each layer to register statistics timers
and counters. Whenever an inappropriate mute or verbose behavior is noticed by some layer
L, the layer can invoke the corresponding method of the mute or verbose failure detector,
instantiated with the corresponding counter or timer, to record this misbehavior. Thus, in
order to cope with mute processes and verbose processes we use failure detectors that are
similar to Mute and Verbose failure detectors from Section 3.3.
Yet, similarly to the detection of crash failures in the benign failure model, when running
in a somewhat asynchronous system, it is hard, if not impossible, to find good timeouts for
deciding that a node is truly faulty. Being too eager would result in eliminating from the
view many legitimate members. On the other hand, being too lenient may result in serious
performance degradations. Thus, the solution we adopt is in the form of fuzzy mute and
fuzzy verbose failure detectors. That is, these failure detection modules maintain a fuzzy
mute level and a fuzzy verbose level for each group member. These fuzziness levels are
reported to all layers of the micro-protocol stack, and each layer can decide how to handle
members with high levels of muteness or verbosity. In particular, there is a suspicion layer
that initiates removal of nodes whose fuzzy mute or fuzzy verbose levels are above a given
threshold. In order to handle false detections caused by network overloads and short-lived
disconnections, we also reduce fuzziness levels using an aging mechanism. The interface of
these failure detectors is similar to the one that appears in Section 3.3.1.
Footnote 2: Of course, a node can send a corrupt message, or try to impersonate another node. However, such
behavior can be trivially recognized by the cryptographic mechanism.
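The fuzzy detection idea can be illustrated with a minimal sketch. It is not the JazzEnsemble implementation: the class name `FuzzyDetector`, the additive scoring, the multiplicative aging, and the values of `threshold` and `decay` are all assumptions made for illustration. One instance would track muteness and another verbosity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FuzzyDetector:
    """Keeps one fuzziness level per member (hypothetical structure)."""
    threshold: float = 3.0  # assumed suspicion threshold
    decay: float = 0.5      # assumed aging factor applied once per period
    levels: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, weight: float = 1.0) -> None:
        """Called by a layer that observed a mute or verbose misbehavior."""
        self.levels[node] = self.levels.get(node, 0.0) + weight

    def age(self) -> None:
        """Periodic aging: old misbehavior gradually stops counting."""
        for node in self.levels:
            self.levels[node] *= self.decay

    def suspects(self) -> List[str]:
        """Members whose fuzziness level is above the threshold."""
        return sorted(n for n, lvl in self.levels.items() if lvl > self.threshold)
```

With such a structure, a burst of missed acknowledgements raises a member's level above the threshold, while a quiet period lets aging bring it back down, which is the behavior the text relies on to tolerate short-lived disconnections.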
6.2.3 Intra-View Reliable Delivery
Intra-view reliable delivery involves issues like flow control to avoid congesting the network
or overrunning receivers' buffers, fragmentation and reassembly of messages that are larger than
UDP's MTU, and ensuring reliable FIFO delivery of both point-to-point and broadcast
messages. There is also the issue of filtering messages sent from other views and preventing
the admission of corrupted messages (see footnote 3). Filtering bad messages (corrupt or from a different view)
is done at the lowest part of the system. It is obtained by indicating the view ID on each
message and by signing it. If the message is corrupt, its digest will not match its content, and it
will be dropped. Similarly, if the message was sent in a different view by a correct process,
it will be eliminated based on its view ID and will not even reach any layer.
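The filtering step can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the signature scheme or the wire format, so the sketch substitutes an HMAC-SHA256 tag over a shared key for the signature, and the 8-byte view ID header and the function names `seal` and `accept` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32    # length of an HMAC-SHA256 tag
HEADER_LEN = 8  # assumed fixed-width view ID field


def seal(key: bytes, view_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the view ID and an authentication tag."""
    header = view_id.to_bytes(HEADER_LEN, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + tag + payload


def accept(key: bytes, current_view: int, packet: bytes) -> Optional[bytes]:
    """Return the payload, or None if the packet must be dropped."""
    if len(packet) < HEADER_LEN + TAG_LEN:
        return None
    header = packet[:HEADER_LEN]
    tag = packet[HEADER_LEN:HEADER_LEN + TAG_LEN]
    payload = packet[HEADER_LEN + TAG_LEN:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # corrupt or forged: tag does not match content
    if int.from_bytes(header, "big") != current_view:
        return None  # authentic, but sent in a different view
    return payload
```

Checking the tag before the view ID means a forged view ID is treated as corruption rather than as a stale message.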
As for the layers that handle flow control and reliable delivery of messages, including
recovery of lost messages, these layers implement well known protocols. In particular, these
layers are almost the same in the Byzantine protocol stack of JazzEnsemble, the benign
protocol stack of JazzEnsemble, and Ensemble. The only differences are the ones related
to fuzzy mute and fuzzy verbose failures. The differences are fairly technical, and are
therefore omitted from this dissertation. For the rest of this work, we assume that the
system provides reliable delivery of messages within views, i.e., it satisfies all intra-view
requirements of Byzantine virtual synchrony. Below, we concentrate on the complementary
protocols that handle view changes, which together provide the overall required Byzantine
virtual synchrony semantics.
6.2.4 Byzantine Membership Maintenance
As in Ensemble (and in fact, this dates back to Horus [117]), a new node that tries to join
the system first establishes a singleton view with only itself in it. From that point on, the
membership protocols are responsible for merging concurrent views or eliminating faulty
nodes by establishing new views that exclude them. In particular, the goal of the membership
maintenance is to provide the Byzantine Virtual Synchrony model. This includes the
following aspects:
Footnote 3: We use the term broadcast to mean sending the same message to all members of the view in which the
message was sent.

Periodically do
(1)   Mute.expect(HEARTBIT, p0 ... pn−1);
(2)   if (I_AM_COORD == TRUE) then
(3)     Periodically GOSSIP about the view to other nodes
(4)   else
(5)     Mute.expect(GOSSIP, pcoord);
(6)   endif

Upon receive(view, GOSSIP_MESSAGE) sent by pj do
(7)   Verbose.indict(pj);
(8)   if (pj is not suspected by FD) then
(9)     if (I_AM_COORD == TRUE) then
(10)      try_to_merge_with_another_view(pj);
(11)    else
(12)      Mute.expect(MERGE, pcoord);
(13)    endif;
(14)  endif;

Upon Trust.suspect(pi) do
(15)  Suspect_Node(pi, FALSE);

Figure 6.3: Pseudo Code of Membership Protocol
Eliminating Suspected Nodes
Each node in JazzEnsemble employs a local failure detection mechanism (Lines 1, 5, 7 and 12 in Figure 6.3
and Line 30 in Figure 6.4) in order to suspect nodes that seem to be faulty, and
reports such suspicions to other nodes (Line 15 in Figure 6.3 and Line 1 in Figure 6.4). When
some nodes are suspected, JazzEnsemble tries to establish a new view without the suspected
nodes. However, in order to prevent Byzantine nodes from removing correct nodes from the
system, only nodes that are suspected by enough other nodes can be removed. Moreover,
we would like to ensure that only nodes that are agreed upon by the correct members of
the current view would be eliminated from the next view. This is obtained by utilizing a
Byzantine consensus protocol (Lines 7 and 14 in Figure 6.4, Lines 3 and 32 in Figure 6.5, and
Line 30 in Figure 6.6) whose details appear in Section 6.2.6. Handling suspicions is treated
in Lines 9–12 in Figure 6.4. Specifically, whenever a node pi locally suspects another node
pj, e.g., the fuzzy muteness or fuzzy verbosity levels of pj surpass a certain threshold, or
pj was caught trying to send a forged message, etc., pi marks pj as suspected (Line 1 in
Figure 6.4). Whenever pi has some nodes marked as suspected, it slanders these
nodes to all other view members. In turn, if a node pk receives at least f + 1 slanders
about a node pj, then pk also marks pj as suspected. Notice that if f + 1 nodes slander
pj, then at least one correct node locally suspects pj, and so it is safe to adopt this
suspicion.

procedure Suspect_Node(pi, start_byz_consensus)
(1)   bcast(pi, SUSPECT);
(2)   if (have not suspected pi before) then
(3)     timer.set(pi, SUSPECT, t1); /* setting suspicion timer for t1 seconds */
(4)     update suspicions_vector and increase suspicions_threshold_counter;
(5)   endif
(6)   if ((suspicions_threshold_counter > k) OR (start_byz_consensus == TRUE)) then
(7)     Byzantine_Consensus(suspicions_vector);
(8)   endif

Upon receive(pi, SUSPECT) sent by pj do
(9)   increase suspicions counter for pi if we have not received this message from pj before;
(10)  if ((suspicions counter for pi ≥ f + 1) AND (we do not suspect pi)) then
(11)    Suspect_Node(pi, FALSE);
(12)  endif

Upon timer.expire(pi, SUSPECT) do
(13)  if (have not started Byzantine Consensus) then
(14)    Byzantine_Consensus(suspicions_vector);
(15)  endif

procedure handle_BYZANTINE_CONSENSUS_DECISION_Event(msg)
(16)  if (I am suspected) then
(17)    create a singleton view;
(18)    return;
(19)  endif
(20)  cr := view_id mod number of non-suspected nodes;
(21)  coord := min(i) : {i ≥ cr AND msg[i] = 0};
(22)  if (I_AM_COORD == TRUE) then
(23)    init_FLUSH_Protocol();
(24)  else /* I am not the coordinator of the group */
(25)    timer.set(coord, FLUSH, t2); /* waiting for FLUSH message from coordinator */
(26)  endif

Upon timer.expire(coord, NEW_VIEW) do
(27)  if (I_AM_COORD == FALSE) then
(28)    Suspect_Node(coord, TRUE);
(29)  else if (VIEW_CAUSED_BY_MERGE == TRUE AND I_AM_COORD == TRUE) then
(30)    Verbose.indict(big_group_coord);
(31)    new_view_msg := Create_New_View_Message();
(32)    Uniform_Broadcast(new_view_msg);
(33)  endif

procedure handle_NEW_VIEW_Event(new_view)
(34)  if (new_view contains a correct new view) then
(35)    Install(new_view); /* installing new view */
(36)  else /* the new view is not correct */
(37)    if (I_AM_COORD == FALSE) then
(38)      Suspect_Node(pcoord, TRUE);
(39)    else /* I am the coordinator of the view */
(40)      create a singleton view;
(41)    endif;
(42)  endif;

Figure 6.4: Pseudo Code of Suspicion Protocol
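The slander rule of Lines 9–12 in Figure 6.4 can be sketched in a few lines. The class and method names below are hypothetical, and the sketch leaves out the timers and the broadcast itself; it only shows the counting of distinct slanderers and the f + 1 adoption threshold.

```python
from collections import defaultdict
from typing import Dict, Set


class SuspicionTracker:
    """Counts distinct slanderers per target (illustrative only)."""

    def __init__(self, f: int) -> None:
        self.f = f  # assumed upper bound on the number of Byzantine nodes
        self.slanderers: Dict[str, Set[str]] = defaultdict(set)
        self.suspected: Set[str] = set()

    def local_suspect(self, target: str) -> None:
        """The local failure detector flagged the target."""
        self.suspected.add(target)

    def on_slander(self, sender: str, target: str) -> bool:
        """Record a slander; return True if it made us adopt the suspicion."""
        self.slanderers[target].add(sender)  # a set ignores repeated slanders
        enough = len(self.slanderers[target]) >= self.f + 1
        if enough and target not in self.suspected:
            self.suspected.add(target)
            return True
        return False
```

Because slanders are stored per sender, a single Byzantine node that repeats its slander many times still counts once, so at least one correct node must be among any f + 1 slanderers.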
Additionally, the first time in a given view that a node pi marks another node pj as
suspected, pi starts a timer and a counter for the number of nodes it suspects (Lines 2–4 in
Figure 6.4). Once the timer expires, or the number of nodes that pi suspects goes beyond
a predefined threshold, or the coordinator is suspected, pi starts a Byzantine consensus
protocol in order to decide on the failed nodes (Lines 6–8 and 13–15 in
Figure 6.4). Once the Byzantine consensus protocol terminates, the ith non-faulty node,
where i is the old view identifier modulo the number of members that are not suspected, is
supposed to generate a new view (Lines 20–23 in Figure 6.4 and Lines 14–17 in Figure 6.5).
If this node does not generate a new view, or if it generates a wrong view (Lines 37–38 in Figure 6.4),
the view change protocol is re-executed, and in particular the Byzantine
consensus protocol (Lines 27–28 in Figure 6.4).
Note that the layered structure of JazzEnsemble allows us to utilize any known Byzantine
consensus protocol. In particular, the layer implementing consensus already enjoys intra-
view reliable delivery, and thus we can use any protocol that assumes this capability [17,
21, 23].
Notice, however, that we would like to decide on which nodes are faulty and which are
not. In other words, we must decide on a binary vector of suspicions. One option is to
use a Byzantine consensus protocol that works with any value domain in which the binary
vector can be viewed as a binary encoding of some value. However, we claim that this is
not adequate. The reason is that if all nodes think that some node pj is suspected, yet
there is a disagreement about another node pk, then the result would be a disagreement
about the suspicion vector, which means that any suspicion vector becomes a valid decision
value for the consensus protocol (by the definition of the Byzantine consensus problem). In
particular, this could result in never eliminating pj from the view, despite the fact that all
nodes suspect it!
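A small worked example may make the problem concrete. The two helper functions below are hypothetical and only compute which decision values the validity requirement forces; they are not a consensus protocol. Inputs are the suspicion vectors of the correct nodes, where 1 means "suspected".

```python
from typing import List, Optional


def forced_whole_vector(inputs: List[List[int]]) -> Optional[List[int]]:
    """Standard validity on the whole vector: a value is forced only if
    every correct node proposes exactly the same vector."""
    first = inputs[0]
    return list(first) if all(v == first for v in inputs) else None


def forced_per_entry(inputs: List[List[int]]) -> List[Optional[int]]:
    """Element-wise validity: each entry on which all correct nodes agree
    is forced independently; None marks an unconstrained entry."""
    return [col[0] if len(set(col)) == 1 else None for col in zip(*inputs)]


# Entry 1 plays the role of pj (suspected by everyone),
# entry 2 plays the role of pk (nodes disagree).
inputs = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
whole = forced_whole_vector(inputs)   # None: any vector is a valid decision
entries = forced_per_entry(inputs)    # [0, 1, None]: pj must be removed
```

Treating the vector as a single value leaves the decision unconstrained as soon as one entry is disputed, whereas the element-wise formulation still forces the removal of pj.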
In this work we use an adaptation of the mute failure detector based protocol reported
in [49] since this protocol is very simple, and since it terminates in one communication round
in favorable circumstances, i.e., when there is no Byzantine behavior other than process
crashes and network disconnections. In this protocol, we do the equivalent of running the
mute failure detector based protocol of [49] n times in parallel, once for each view member,
as listed in Algorithm 6.8. Yet, rather than actually invoking the protocol n times, we
invoke it once in a way that operates in parallel on each entry of the vector, providing an
independent element-wise Byzantine consensus semantics for each of the vector’s bits. The
details appear in Section 6.2.6 below.
Handling Verbose Nodes: As mentioned before, a simple attack that Byzantine nodes
can mount at all layers of JazzEnsemble is sending spurious messages in order to slow down the
entire group. In particular, in the case of membership, this means initiating too many view
changes that in fact do not result in eliminating or incorporating any node. Such behavior
is captured by the verbose failure detector, which will eventually trigger a suspicion that
such a node is Byzantine.
Merging Views
In order for concurrent views to locate each other, we employ an IP multicast based discovery
mechanism. That is, the coordinator of each view is supposed to periodically multicast a
message announcing its existence and the view it represents (Lines 2–3 in Figure 6.3).
This message is called a gossip message. All nodes in the system are supposed to listen
for gossip messages (this is in contrast to Ensemble and the non-Byzantine version of
JazzEnsemble, in which only coordinators listen for these messages). If correct nodes of a
view do not see gossip messages sent by their own coordinator, then they consider it a mute
failure on behalf of the coordinator (Line 5 in Figure 6.3).
When a coordinator of a view receives such a gossip message, it checks whether it
should try to merge with the reported view (Lines 9–11 in Figure 6.3). In particular, it
checks that the view identifier of the gossiped view is not older than its own view identifier,
that the membership lists of the two views do not intersect, and that both views agree on the
same protocol stack. If these conditions hold, then the coordinator is supposed
to try merging with the gossiped view (Lines 1–6 in Figure 6.5) using a merge request
message, which is again sent using IP multicast (Lines 7–8 in Figure 6.5). In addition, the
coordinator starts a timer; if the timer expires before it has received an answer
from the coordinator of the other group, it cancels the merge and creates a new local view
(Lines 9 and 25–27 in Figure 6.5).

procedure try_to_merge_with_another_view(pj)
(1)   if (my group wants to join the other group) then
(2)     VIEW_CAUSED_BY_MERGE := TRUE;
(3)     Byzantine_Consensus(suspicions_vector);
(4)   else /* maybe the other coordinator wants to merge with my group */
(5)     send(view, GOSSIP_MESSAGE, pj);
(6)   endif;

procedure handle_END_OF_FLUSH_PROTOCOL_Event()
(7)   if (VIEW_CAUSED_BY_MERGE == TRUE AND I_AM_COORD_OF_SMALL_GROUP == TRUE) then
(8)     bcast(msg, MERGE_REQUEST);
(9)     timer.set(big_group_coord, MERGE_REPLY, t4);
(10)  else if (VIEW_CAUSED_BY_MERGE == TRUE AND I_AM_COORD_OF_BIG_GROUP == TRUE) then
(11)    bcast(msg, MERGE_GRANTED);
(12)    new_view_msg := Create_New_View_Message();
(13)    Uniform_Broadcast(new_view_msg);
(14)  else if (VIEW_CAUSED_BY_MERGE == FALSE AND I_AM_COORD == TRUE) then
(15)    new_view_msg := Create_New_View_Message();
(16)    Uniform_Broadcast(new_view_msg);
(17)  endif

Upon receive(msg, MERGE_REPLY) sent by pj do
(18)  if (msg.type == MERGE_GRANTED) then
(19)    /* The big group wants to merge with us. */
(20)    Uniform_Broadcast(msg); /* notifying other nodes that we received MERGE_GRANTED message */
(21)    timer.set(big_group_coord, NEW_VIEW, t5); /* if I do not receive a view, I will generate a new view */
(22)  else
(23)    merge_denied_by_big_group();
(24)  endif;

Upon timer.expire(pi, MERGE_REPLY) do
(25)  merge_denied_by_big_group();

procedure merge_denied_by_big_group()
(26)  new_view_msg := Create_New_View_Message();
(27)  Uniform_Broadcast(new_view_msg);

Upon receive(MERGE_REQUEST) sent by pj do
(28)  if (my group does not want to merge with the group of pj) then
(29)    bcast(msg, MERGE_DENIED);
(30)  else /* we want to merge with it */
(31)    VIEW_CAUSED_BY_MERGE := TRUE;
(32)    Byzantine_Consensus(suspicions_vector);
(33)  endif;

Figure 6.5: Pseudo Code of Merge Protocol
Notice that the checks performed by the coordinator are deterministic, and can be done
by any group member based on its local knowledge. In order to save bandwidth, only
the coordinator sends a merge request. However, in order to protect against Byzantine
coordinators, all other nodes execute the same checks, and if the coordinator was supposed
to send a merge request, then they notify their fuzzy mute detector to expect it (Line 12
in Figure 6.3). Thus, if the coordinator does not send the merge request message, it will
eventually be suspected as mute. Moreover, the view members verify the contents of the
merge request message, and if it is bogus, they will also suspect the coordinator as being
Byzantine.
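Since the merge checks depend only on local knowledge, every member can evaluate them in the same way. The following sketch illustrates this; the `ViewInfo` record, the use of integer view identifiers, and the function name are assumptions introduced for the example, not JazzEnsemble's data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ViewInfo:
    """What a gossip message is assumed to carry about a view."""
    view_id: int                # assumed to be totally ordered by age
    members: FrozenSet[str]
    stack: Tuple[str, ...]      # names of the protocol layers, top to bottom


def should_try_merge(mine: ViewInfo, gossiped: ViewInfo) -> bool:
    """Deterministic check: same inputs give the same answer at every member."""
    return (
        gossiped.view_id >= mine.view_id                 # gossiped view is not older
        and mine.members.isdisjoint(gossiped.members)    # membership lists do not intersect
        and mine.stack == gossiped.stack                 # both use the same protocol stack
    )
```

A non-coordinator that evaluates this check to true can tell its mute failure detector to expect a merge request from its coordinator, which is how a silent coordinator eventually becomes suspected.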
Similarly, when the coordinator pi of one group receives a merge request message from
the coordinator of another group, then pi performs similar sanity checks on it. If the message
is good, pi starts a merging procedure that eventually leads to a new view (Lines 10–13,
18–24, and 28–33 in Figure 6.5). Otherwise, pi sends a
merge denied message that cancels the merge (Lines 28–29 in Figure 6.5).
Forming a New View
In order to reduce the performance impact of a Byzantine coordinator, we replace the
coordinator on each view change. The new coordinator is chosen as the ith non-faulty node,
where i is the old view identifier modulo the number of members that are not suspected.
Clearly, one can use other methods. However, it is preferable that the chosen method
be locally computable, so that each node can locally verify who should act as coordinator
(Lines 20–21 in Figure 6.4).
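The selection rule can be sketched by following Lines 20–21 of Figure 6.4 literally. The function name and the representation of the decision as a list of bits (0 for "not suspected") are assumptions for illustration. Note that a suitable index always exists: cr is smaller than the number of non-suspected members, so fewer than that many positions lie below cr and at least one non-suspected member sits at position cr or later.

```python
from typing import List


def choose_coordinator(view_id: int, decision: List[int]) -> int:
    """Return the index of the next coordinator.

    decision[i] == 0 means member i is not suspected in the decided vector.
    """
    alive = decision.count(0)
    if alive == 0:
        raise ValueError("no non-suspected member left in the view")
    cr = view_id % alive                 # Line 20 of Figure 6.4
    for i in range(cr, len(decision)):   # Line 21: smallest i >= cr with decision[i] == 0
        if decision[i] == 0:
            return i
    raise AssertionError("unreachable: cr < alive guarantees a match")
```

Every correct member that holds the same decision vector and view identifier computes the same index, so a wrong self-appointed coordinator can be detected locally.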
When the coordinator of the new view sends a new view message, we must ensure
that all correct view members receive the same view message. This is easily obtained by
employing a uniform Byzantine delivery protocol (Line 32 in Figure 6.4 and Lines 13 and 16 in
Figure 6.5). Here again, in principle, we could use any existing protocol, such as the one
by Bracha [17]. In practice, we have chosen to develop an optimized protocol that obtains
uniform broadcast with only two communication steps (instead of three in [17]), at the price
of requiring f < n/6. The protocol is described in Section 6.2.6.
Message Agreement Inside a View
Recall that at this point we already rely on the fact that we have a mechanism
for detecting lost messages and for recovery of such messages (if needed) by retransmission.
Thus, the only two things we still need to worry about are the following:
1. Whenever two correct nodes deliver two versions of the same message to their respec-
tive application module, then these two versions are the same.
2. If a correct node pi delivers a message m that was sent by another node that was
eliminated from a view V 1, then any other correct node pj that continues with pi to
its consecutive view V 2 will also deliver m during V 1.
In order to overcome the first problem, i.e., ensuring that every pair of correct nodes
agree on the content of a message they deliver to their respective application modules, we
use a Byzantine uniform broadcast layer, as described above. Yet, if the message is large,
it is possible to optimize and broadcast uniformly just the digest of the message. This is
because here we only need to ensure that one version of the same message is delivered to
all correct nodes. Once a correct message digest is received, the rest is taken care of in any
case by the reliable retransmission mechanism.
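The digest optimization can be sketched as follows. The choice of SHA-256 and the function names are assumptions; the text only requires a collision-resistant digest that is agreed upon through uniform broadcast, after which any body obtained by retransmission is checked against it.

```python
import hashlib


def digest(message: bytes) -> bytes:
    """Short fingerprint that is broadcast uniformly instead of the full body."""
    return hashlib.sha256(message).digest()


def accept_body(agreed_digest: bytes, body: bytes) -> bool:
    """A retransmitted body is delivered only if it matches the agreed digest."""
    return digest(body) == agreed_digest
```

A Byzantine sender that ships different bodies to different receivers gains nothing: at most one of those bodies matches the digest that the correct nodes agreed on.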
The second problem is solved using what is known in the literature as a flush protocol.
Specifically, we say that a broadcast message is stable if it was acknowledged by every
member that is not considered faulty. Thus, the coordinator does not send the new view
message until all the messages of the terminating view are stable (Lines 13, 16 in Figure 6.5).
Moreover, as part of the uniform broadcast mechanism of new views, a process does not
echo the view message until it knows that all messages it is aware of from the terminating
view are stable.
The Pseudo code of FLUSH Protocol by coordinator

procedure init_FLUSH_Protocol()
(1)   bcast(FLUSH); /* broadcasting FLUSH message */
(2)   timer.set(ALL_NODES, FLUSH_REPLY, t3); /* waiting for FLUSH_REPLY message from other nodes */

Upon timer.expire(ALL_NODES, FLUSH_REPLY) do
(3)   if (have not received FLUSH_REPLY message from all nodes) then
(4)     set 0 for every node that has not sent a FLUSH_REPLY message
        and 1 otherwise in missing_flush_reply_messages_vec;
(5)     Uniform_Broadcast(missing_flush_reply_messages_vec);
(6)     missing_flush_replies_nodes := all entries in missing_flush_reply_messages_vec that contain 0;
(7)     timer.set(missing_flush_replies_nodes, MISSING_FLUSH_REPLIES, t4);
(8)   endif

Upon timer.expire(missing_flush_replies_nodes, MISSING_FLUSH_REPLIES) do
(9)   if (have not received FLUSH_REPLY message from all nodes in missing_flush_replies_nodes) then
(10)    set 1 in suspicions_vector for every node that has not sent a FLUSH_REPLY message;
(11)    /* suspicions_vector contains all the nodes that have not sent FLUSH_REPLY message */
(12)    Byzantine_Consensus(suspicions_vector);
(13)  endif

Upon receive(FLUSH_REPLY) sent by pi do
(14)  if (received FLUSH_REPLY message from all nodes) then
(15)    GENERATE End_of_FLUSH_Protocol EVENT;
(16)  endif

The Pseudo code of FLUSH Protocol by regular node

Upon timer.expire(coord, FLUSH) do
(17)  if (have not received FLUSH message from coordinator) then
(18)    /* start suspecting coordinator and run Byzantine Consensus to try to remove him */
(19)    Suspect_Node(pcoord, TRUE);
(20)  endif

Upon receive(FLUSH) sent by pcoord do
(21)  send(FLUSH_REPLY, coord);
(22)  timer.set(coord, NEW_VIEW, t3); /* waiting for New View message from coordinator via Uniform Broadcast */

Upon receive(missing_flush_replies_nodes, MISSING_FLUSH_REPLIES) via Uniform Broadcast do
(23)  timer.set(missing_flush_replies_nodes, MISSING_FLUSH_REPLIES, t4);
(24)  if (my entry in msg equals to 0) then
(25)    bcast(msg, FLUSH_REPLY, coord);
(26)  endif

Upon timer.expire(missing_flush_replies_nodes, MISSING_FLUSH_REPLIES) do
(27)  if (have not received FLUSH_REPLY message from all nodes in missing_flush_replies_nodes) then
(28)    set 1 in suspicions_vector for every node that has not sent a FLUSH_REPLY message;
(29)    /* suspicions_vector contains all the nodes that have not sent FLUSH_REPLY message */
(30)    Byzantine_Consensus(suspicions_vector);
(31)  endif

Figure 6.6: Pseudo Code of FLUSH Protocol
Flush Protocol: The flush protocol is managed by the coordinator of the view. The
coordinator broadcasts a flush message to all the members of the view and starts a timer
(Lines 1–2 in Figure 6.6). Once the timer expires, the coordinator uniformly broadcasts the
vector of nodes that have not answered the flush message (missing_flush_reply_messages_vec)
and sets another timer (Lines 3–8 in Figure 6.6). The purpose of this uniform broadcast is to
cause the correct nodes to monitor the nodes that appear in missing_flush_reply_messages_vec.
In this way, if correct nodes do not hear the broadcast of a flush reply message by some
of the nodes in missing_flush_reply_messages_vec, they will start suspecting those nodes
and will suggest those nodes as faulty in the next execution of the Byzantine consensus
protocol (Lines 27–31 in Figure 6.6).
When a node receives a flush message from the coordinator of the view, it sends
the list of stable messages in a flush reply message (Line 21 in Figure 6.6) and starts a
timer (Line 22 in Figure 6.6). In addition, when a node receives a missing flush replies
message from the coordinator of the view and the node appears as one that has not sent
a flush reply message, it broadcasts the list of stable messages in a
flush reply message (Lines 24–26 in Figure 6.6) and also starts a timer (Line 23 in Figure 6.6).
Once a timer expires, if some of the nodes have not sent the flush reply message, they are
suspected (Lines 9–13 and 27–31 in Figure 6.6). Otherwise, the coordinator is suspected for not generating a new
view (Lines 27–28 in Figure 6.4 and Lines 17–20 in Figure 6.6). Finally, if the coordinator
received flush reply messages from all the nodes, it tries to create a new view (Lines 14–16
in Figure 6.6 and Lines 7–17 in Figure 6.5).
Small Views
If the membership size n of a view is small, we can use a Byzantine consensus protocol and
a uniform broadcast protocol that work with f < n/3. If n drops below that bound (i.e., n ≤ 3f), then there is
not much that can be done due to the theoretical lower bounds. However, by distinguishing
between the number of Byzantine nodes and the number of disconnected and crashed nodes,
we may be able to employ somewhat more resilient protocols that still work efficiently, along
the lines of [73].
Finally, a member of a small view that is unhappy with its view members, but does not
have enough supporters to establish the view it believes in, can always establish a singleton
view and try to gradually merge with nodes it trusts. Handling this is left for future work.
6.2.5 Total Ordering
While total ordering is not strictly required by virtual synchrony, it is a common option
in most group communication systems. Adding total ordering to virtual synchrony enables
obtaining atomic delivery, which is a basic mechanism for implementing a replicated state
machine semantics [105].
We have implemented total ordering as follows: Nodes accumulate all the messages
they receive. Each node picks a subset of these messages, chosen by some deterministic and
fair rule, and proposes them in the Consensus protocol. Once a batch of such messages is
decided on, these messages are delivered, and the process moves on to pick the next subset
to be proposed in the Consensus protocol and so forth.
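The batching loop can be sketched as below. The concrete selection rule (lowest sequence number first, ties broken by sender ID), the message identifier format, and the function names are assumptions chosen for illustration; the text only requires that the rule be deterministic and fair. The consensus step itself is outside the sketch.

```python
from typing import List, Set, Tuple

# Hypothetical message identifier: (sender ID, per-sender sequence number).
Msg = Tuple[str, int]


def next_proposal(pending: Set[Msg], batch_size: int) -> List[Msg]:
    """Pick the next batch to propose, using a rule every node can reproduce."""
    ordered = sorted(pending, key=lambda m: (m[1], m[0]))
    return ordered[:batch_size]


def deliver(pending: Set[Msg], decided: List[Msg]) -> List[Msg]:
    """Deliver a decided batch in its decided order and forget its messages."""
    pending.difference_update(decided)
    return list(decided)
```

Because the rule is a pure function of the pending set, nodes that have accumulated the same messages propose the same batch, which is the situation in which the consensus protocol can decide quickly.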
As for the Byzantine consensus protocol, we have utilized the mute failure detector based
protocol of [49], which has the nice property that it terminates in a single communication
step in good scenarios (no failures and all processes initially propose the same message).
Interestingly, as we have discovered during our experiments, if the size of the subset of
messages to decide on is sufficiently large, and when there is a continuous load, or bursty
traffic, the amortized cost of deciding on each message becomes one communication step.
Specifically, in the first invocation of Consensus in each burst, there might be disagreement
regarding the proposals and therefore multiple communication steps are required to decide.
However, during this time, all nodes continue to accumulate messages. Thus, given that
the subsets of messages to be proposed to Consensus are chosen using a deterministic rule,
then the subsequent invocations of consensus terminate in one communication round!
Notice also that when the application messages are small, then the values proposed
to the Byzantine consensus protocol are the messages themselves. Thus, this implements
atomic broadcast without needing to run a separate uniform broadcast protocol. On the
other hand, when messages are large, it makes more sense to run the Consensus protocol
only on messages’ unique identifiers. However, in this case, we do need a separate uniform
broadcast mechanism, similar to the one we described in Section 6.2.4, to ensure that indeed
all correct nodes receive the same copy (content-wise) of a given message.
6.2.6 Efficient Implementations of Building Blocks
Vector Byzantine Consensus Protocol
In the vector Byzantine consensus problem, we assume that each node starts with a vector of
input bits of size n, known as the input vector. The goal is to have each correct process decide
on an output vector of size n, also called a decision vector. Yet, notice that as this protocol
is being run within a view, some otherwise correct nodes might become disconnected due
to the network. These nodes cannot be required to terminate their computation. Thus,
we introduce the notion of a core component (see footnote 4). That is, we assume that among the set of
n nodes participating in the computation, there is a subset of at least n− f correct nodes
that are also connected, which we call the core component. With this definition, we say
that a protocol solves the Vector Byzantine Consensus problem if it ensures the following
requirements (these are simple extensions of standard Byzantine consensus requirements;
we repeat them here for completeness):
Vector Byzantine Validity: Let Vi be the decision vector of some core process pi that
decides. Then for each k, if the value of the entry Uj [k] in the input vector Uj of all
core processes is v, then Vi[k] = v.
Vector Byzantine Agreement: Let Vi be the decision vector of some core process pi
that decides and Vj the decision vector of another core process pj that decides. Then
for every k, Vi[k] = Vj [k].
Byzantine Termination: Eventually, every core process decides on some decision vector.
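The two safety requirements above can be phrased as executable checks, which is convenient when testing an implementation against recorded runs. The function names and the dictionary representation (core process ID mapped to its vector) are assumptions for this sketch; termination is a liveness property and cannot be checked this way.

```python
from typing import Dict, List


def vector_validity(core_inputs: Dict[str, List[int]], decision: List[int]) -> bool:
    """For each entry k on which all core inputs agree, the decision must match."""
    for k, column in enumerate(zip(*core_inputs.values())):
        if len(set(column)) == 1 and decision[k] != column[0]:
            return False
    return True


def vector_agreement(core_decisions: Dict[str, List[int]]) -> bool:
    """All core processes that decided must hold identical decision vectors."""
    vectors = list(core_decisions.values())
    return all(v == vectors[0] for v in vectors)
```

For instance, with core inputs [0, 1, 0] and [0, 1, 1], both [0, 1, 0] and [0, 1, 1] are valid decisions, while [1, 1, 0] is not, because entry 0 was 0 at every core process.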
As indicated above, the Byzantine consensus protocol we employ is a simple extension of
the protocol of [49] to vectors and is based on having a ♦Pmute failure detector. The protocol
ensures safety even when the failure detector is not obeying the properties of ♦Pmute.
The only requirement that depends on the properties of ♦Pmute to hold is termination.
Practically, in the actual implementation inside JazzEnsemble we use the fuzziness levels of
nodes as an approximation for ♦Pmute.
The pseudo-code is listed in Figure 6.7 and Algorithm 6.8. Intuitively, the protocol
includes two phases. In the first phase, each process collects the current estimates regarding4A similar approach was used in [35] for handling benign failures in partitionable networks.
102
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-05 - 2008
Variables:esti[ ] – a vector of current estimates of pi about the decision valuesdominatingi[ ] – a vector of majority estimates of pi about the decision valuesneed coord[ ] – a vector that contains the indexes of values that need to adopt the coordinator’s valueVi[ ][ ] – a matrix that contains current estimates of other processes
Figure 6.7: Main variables held by each process pi
pi jildz lk i"r miwfgeny mixwir mipzyn :6.7 xei`
the value that should be decided on. If some value is overwhelmingly dominating, we can
safely decide on it in the second phase. Otherwise, if there is a single value that was
reported by a significant number of the nodes, but not enough for a safe decision, we adopt
this value as the estimate for the next round, but do not decide on it yet. If even this does
not happen, and we were able to obtain the value of the coordinator without suspecting it,
then we adopt the value suggested to us by the coordinator. The idea is that if no value gets
enough support, then it means that we are not bound by validity to decide on any specific
value. In this case, if we are lucky and the round is controlled by a correct coordinator,
everyone will adopt the coordinator’s value and will be able to decide in the next round.
The fact that we replace a coordinator on each round ensures that eventually there will
be such a coordinator. On the other hand, if the current coordinator is mute, the failure
detector ensures that we will not wait for it forever.
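The three outcomes described above can be illustrated for a single vector entry. The following is a minimal sketch, not the thesis implementation: the function name `classify_entry`, the `BOTTOM` marker, and the returned labels are hypothetical, and the thresholds are copied from lines 11, 18 and 37 of the protocol figure. In the full protocol a process decides only when every entry reaches the n − f threshold and no entry needs the coordinator; the sketch looks at one entry in isolation.

```python
from collections import Counter

BOTTOM = None  # stands for the "no value received" marker


def classify_entry(column, own_estimate, n, f):
    """Classify one vector entry from the values collected for it in a round.

    column holds one slot per process (BOTTOM where nothing was received).
    Returns (action, value) with action in {"decide", "adopt", "need_coord"}.
    """
    counts = Counter(v for v in column if v is not BOTTOM)
    missing = sum(1 for v in column if v is BOTTOM)

    # A value reported by more than n/2 processes dominates (line 11);
    # otherwise the process keeps its own estimate as the candidate (line 12).
    dominating = own_estimate
    for value, count in counts.items():
        if count > n / 2:
            dominating = value

    support = counts.get(dominating, 0)
    if support >= n - f:
        return ("decide", dominating)      # overwhelming support (line 37 passes)
    if support >= n - 2 * f - missing:
        return ("adopt", dominating)       # significant support, not yet safe (line 18)
    return ("need_coord", dominating)      # fall back to the coordinator's value
```

For example, with n = 7 and f = 1, seven identical reports lead to "decide", five identical reports with one missing slot lead to "adopt", and a three-way split leads to "need_coord".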
Proof of the Vector Byzantine Consensus Protocol: In the following lemmas, we
prove the correctness of the algorithm for an arbitrary entry k in the vector. Since the
proof holds for each entry in the vector, it also holds for the entire vector. The proofs are
adaptations of the corresponding ones given in [49] to incorporate the notions of vectors
and core subsets; they are given here for completeness.
Lemma 6.2.1 Let us assume n > 4f , and consider the situation where, at the beginning
of a round r, all core processes pi have the same estimate value v[k] (i.e., esti[k] = v[k]).
They will never change their estimates thereafter.
Proof: Note that in every round, each core process collects at least (n − f) estimates.
Since at the beginning of round r, all core processes have v[k] as their initial estimate and
procedure Byzantine_consensus(dominatingi[ ])
(1)  init: r ← 1; esti ← dominatingi; cr ← hash(n, view_id);
(2)  loop
     ---------------------- Step 1 of round r ----------------------
(3)    Vi ← [⊥, . . . , ⊥] (n ∗ n times); c ← ((c + 1) mod n); r ← (r + 1);
(4)    broadcast val(r, esti);
(5)    wait until val(ri, −) or dec(−) messages have been received from all non-suspected processes
         and from at least (n − f) distinct processes
       /* We build here the matrix of estimates */
(6)    for all j do
(7)      if (val(ri, estj) or dec(estj) received from pj) then Vi[j] ← estj
(8)      end if
(9)    end for
       /* We look for the columns in which the majority value appears more than n/2 times */
(10)   for all k do
(11)     if (∃v ≠ ⊥ : #v(Vi[ ][k]) > n/2) then dominatingi[k] ← v;
(12)     else dominatingi[k] ← esti[k]
(13)     end if
(14)   end for
     ---------------------- Step 2 of round r ----------------------
(15)   if (i = cr) then broadcast coord(r, dominatingi)
(16)   end if
(17)   for all k do
(18)     if (#dominatingi[k](Vi[ ][k]) ≥ (n − 2f − #⊥(Vi[ ][k]))) then
(19)       esti[k] ← dominatingi[k];
(20)     else need_coord[k] ← 1;
(21)     end if
(22)   end for
(23)   if (∃ k s.t. need_coord[k] = 1) then
(24)     wait until (coord(r, −) or dec(−) received from pc or pc is suspected)
(25)     if (coord(r, x) or dec(x) received from pc) then
(26)       coord_vali ← x;
(27)     else coord_vali ← dominatingi;
(28)     end if
(29)     for all k do
(30)       if (need_coord[k] = 1) then
(31)         esti[k] ← coord_vali[k];
(32)       end if
(33)     end for
(34)     goto 3
(35)   else
(36)     for all k do
(37)       if (#dominatingi[k](Vi[ ][k]) < (n − f)) then
(38)         goto 3;
(39)       end if
(40)     end for
         /* if we haven’t jumped for any of the fields in the array, we can decide */
(41)     broadcast dec(esti); return (esti);
(42)   end if
(43) end loop
Figure 6.8: ♦Pmute-Based Vector Byzantine Consensus Protocol Executed by pi (n > 6f)
as there are at most f Byzantine processes, then every core process pi will collect at least
n−2f estimates equal to its own estimate v[k]. As n > 4f , it follows that v[k] is a majority
value in Vi[ ][k] and therefore dominatingi[k] is set to v[k] (line 11). Hence, esti[k] is set to
dominatingi[k] = v[k] (line 19).
Lemma 6.2.2 [Validity] If all the core processes propose the same value v[k], then no value
v′[k] ≠ v[k] can be decided.
Proof: This lemma is an immediate consequence of Lemma 6.2.1 when we consider r = 1.
As all estimates of core processes remain equal to v[k], it follows from line 41 that no value
v′[k] ≠ v[k] can be returned by a core process.
Lemma 6.2.3 [Agreement] Let n > 6f . No two core processes decide different values.
Proof: Let r be the first round during which a core process pi decides, and let v[k] be the
value of entry k that it decides. Due to lines 11 and 41, it follows that dominatingi[k] =
v[k] and #v(Vi[ ][k]) ≥ n − f. Due to the fact that at most f processes are not in the core
component, it follows that, in the worst case, pj sees the same values as pi except for
#⊥(Vj[ ][k]) entries that are equal to ⊥ in Vj[ ][k] (those being equal to v in Vi[ ][k]), and
at most f other entries (those possibly corresponding to Byzantine processes that sent v[k]
to pi and v′[k] ≠ v[k] to pj). It follows that #v(Vj[ ][k]) ≥ n − f − (f + #⊥(Vj[ ][k])), i.e.,
#v(Vj[ ][k]) ≥ n − 2f − #⊥(Vj[ ][k]) for any core process pj. As #⊥(Vj[ ][k]) ≤ f, we get
#v(Vj[ ][k]) ≥ n − 3f and, as n > 6f, it follows that v[k] is a majority value in Vj[ ][k].
Hence, dominatingj [k] = v[k] (line 11).
Moreover, as #v(Vj [ ][k]) ≥ n − 2f − #⊥(Vj [ ][k]), the test at line 18 is satisfied for any
core process pj and, accordingly, any core pj sets estj [k] to v[k] at line 19. If pj decides
at line 41, it decides v[k]. If pj proceeds to the next round, due to Lemma 6.2.1, no value
v′[k] ≠ v[k] can be decided.
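The two resilience bounds used in Lemmas 6.2.1 and 6.2.3 reduce to the following arithmetic, spelled out here for convenience. It shows why n − 2f (respectively n − 3f) matching entries form a strict majority of the n slots in a column.

```latex
\begin{align*}
n > 4f &\;\Longrightarrow\; 2f < \tfrac{n}{2} \;\Longrightarrow\; n - 2f > \tfrac{n}{2}, \\
n > 6f &\;\Longrightarrow\; 3f < \tfrac{n}{2} \;\Longrightarrow\; n - 3f > \tfrac{n}{2}.
\end{align*}
```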
Lemma 6.2.4 No core process can block forever in a round.
Proof: The lemma follows immediately from the following observations. At each round
r: (a) as there are at most f non-core processes, no core process can block forever at line
5, and (b) as the failure detector satisfies the Muteness Strong Completeness property, no
core process can block forever at line 24.
Lemma 6.2.5 [Termination] Let n > 6f . Each core process eventually decides.
Proof: Let t be the time after which the failure detector is accurate, i.e., no core process
is suspected (due to the Eventual Strong Accuracy of the failure detector, such a time t
does exist). Let r be the first round that starts after t and is coordinated by a core process
pc. Let us observe that, due to Lemma 6.2.4 and the use of dec() messages (if any), any
core process pi that has not yet decided starts round r. During r, let dominatingc[k] = v[k].
Claim. At the end of r (where dominatingc[k] = v[k]), all core processes pi have esti[k] =
v[k]. End of the claim.
Due to the claim, it follows that all the core processes (that have not yet decided) start
the round r + 1 with the same estimate value v[k]. Moreover, due to (1) the fact that
there are at least (n − f) core processes, (2) the fact that the failure detector is accurate
(i.e., no core process is suspected), (3) the dec () messages sent by the processes that
have already decided (if any), and (4) the waiting statement of line 5 (messages are received
from all core processes), it follows that all the core processes pi are such that
#v(Vi[ ][k]) ≥ n − f, and v[k] is the only such value (because n − f > f). So, for any core pi,
we have dominatingi[k] = v[k] at line 11. Consequently, the test of line 18 is satisfied (for
every entry in the vector esti) and the test of line 37 is not satisfied for any column in the
matrix Vi, and each core process pi decides accordingly by the end of r + 1.
Proof of the claim. Let us first observe that if each core process executes line 31 for entry
k, it adopts v[k] as its new estimate and the claim trivially follows.
Let us consider the case where a process pi executes line 19, namely, esti[k] ← dominatingi[k].
Let dominatingi[k] = w. We have to show that v[k] = w. As pi executes line 19, the test of
line 18 is satisfied and we have #w(Vi[ ][k]) ≥ n−2f−#⊥(Vi[ ][k]). Moreover, as (1) pi is in
the core, (2) there are at most f non-core processes, (3) we are after the time t (and conse-
quently, each core process receives a message from each core process), we can conclude that
the entries m such that Vi[m][k] = ⊥ correspond to faulty processes. Consequently, for any
core process pj, we have #w(Vj[ ][k]) ≥ n − 2f − #⊥(Vi[ ][k]) − (f − #⊥(Vi[ ][k])), i.e.,
#w(Vj[ ][k]) ≥ n − 3f. So, when we consider the coordinator pc, we get #w(Vc[ ][k]) ≥ n − 3f. As
n > 6f, we have #w(Vc[ ][k]) ≥ n − 3f > n/2, and so w is a majority value in the vector Vc[ ][k].
It then follows from line 11, that dominatingc[k] = w. Hence w = v[k]. It follows that all
core processes pi have esti[k] = v[k] at the end of r. End of proof of the claim.
Theorem 6.2.6 Let n > 6f . The protocol described in Algorithm 6.8 solves the vector
Byzantine consensus problem.
Proof: The proof follows from the Lemmas 6.2.2, 6.2.3 and 6.2.5.
An Efficient Byzantine Uniform Broadcast Protocol
In the formal problem of uniform broadcast, a process is trying to send a message v to
all other processes such that all of them will deliver the same message. As in the case of
Byzantine consensus, in this work we assume that the view includes n processes, out of
which there is a core component of at least n− f processes that are correct and connected.
A protocol implements Uniform Byzantine Broadcast if it obeys the following requirements:
Broadcast Uniform Delivery: If a correct process p delivers a message v, then all other
core processes also deliver the value v. In particular, if two core processes deliver
values v and u respectively, then v = u.
Broadcast Termination: If a core process sends a message v, then every core process
delivers v.
The optimized protocol for implementing uniform broadcast appears in Figure 6.9. In-
tuitively, all messages that are sent in the k’th broadcast by p are tagged with (p, k), thereby
eliminating possible interference between broadcasts. There are two types of messages in
the protocol: initial and echo. The algorithm starts when the originator of the mes-
sage p sends an (initial,v, k) message, where v is the content of the actual message p
wishes to disseminate. Following this, the processes report to each other the value they
received via (echo,v, k) messages. If at least (n/2 + f + 1) (echo,v, k) messages (or
the (initial,v, k) message) are received by a process, it sends an (echo,v, k) to the other
processes (if it has not done so yet), and if the process receives (n/2 + 2f + 1) (echo,v, k)
messages, it delivers v. As is shown in the proof, this is enough to ensure uniform broadcast.
Function Uniform_broadcast(vi, k)
step 0: (only by the originator): Send (initial,vi, k) to all the processes;
step 1: Wait until Receive one (initial,v, k) message or (n/2 + f + 1) (echo,v, k) messages for some v;
        Send (echo,v, k) to all the processes;
step 2: Wait until Receive (n/2 + 2f + 1) (echo,v, k) messages for some v;
        // The node accumulates echo messages it received from Step 1:
        // if the node gets at least (n/2 + 2f + 1) (echo,v, k) messages in both steps, it can decide
        Deliver(v);
Figure 6.9: Uniform Broadcast Protocol Executed by pi (n > 6f)
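The steps of Figure 6.9 can be sketched as a small per-process state machine for one broadcast instance (p, k). This is an illustrative sketch under stated assumptions, not the JazzEnsemble code: the class and method names are hypothetical, n/2 is read as integer division, message transport and authentication are left to the caller, and echoes are counted per distinct sender.

```python
class UniformBroadcastProcess:
    """State of one process for a single broadcast instance (p, k)."""

    def __init__(self, n, f):
        self.n, self.f = n, f
        self.echo_senders = {}   # value -> set of process ids that echoed it
        self.echoed = None       # the value this process echoed, if any
        self.delivered = None    # the value this process delivered, if any

    def _maybe_echo(self, value, outbox):
        # Step 1: a correct process echoes at most one value per broadcast.
        if self.echoed is None:
            self.echoed = value
            outbox.append(("echo", value))

    def on_initial(self, value):
        """Handle the originator's initial message; returns messages to send."""
        outbox = []
        self._maybe_echo(value, outbox)
        return outbox

    def on_echo(self, sender, value):
        """Handle an echo from `sender`; returns messages to send."""
        outbox = []
        senders = self.echo_senders.setdefault(value, set())
        senders.add(sender)
        # Step 1 (alternative trigger): enough echoes for the same value.
        if len(senders) >= self.n // 2 + self.f + 1:
            self._maybe_echo(value, outbox)
        # Step 2: deliver once enough distinct processes echoed the value.
        if self.delivered is None and len(senders) >= self.n // 2 + 2 * self.f + 1:
            self.delivered = value
        return outbox
```

With n = 7 and f = 1, the echo threshold is 5 and the delivery threshold is 6, so a process that has seen five echoes for v sends its own echo and delivers v on the sixth.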
Correctness proof: As in the case of the proof of Byzantine consensus, we assume that
the terminating view includes a core component of n − f nodes, where n is the number
of nodes in the view. We show that if f < n/6, then the protocol in Figure 6.9 indeed
implements Uniform Byzantine Broadcast.
Lemma 6.2.7 For any given k, if two core processes p and q deliver values v and u re-
spectively, then u = v.
Proof: Assume by way of contradiction that the lemma does not hold. In order for p to
deliver v it must have received (n/2 + 2f + 1) (echo,v, k) messages, and therefore at least
n/2 + f + 1 (echo,v, k) messages from core processes. Similarly, q must have received at
least n/2+ f +1 (echo,u, k) messages from core processes. Therefore, some core process r
must have sent both (echo,v, k) and (echo,u, k) messages. But core processes, which by
definition are also correct, can send only one version of each message during a broadcast.
A contradiction. Therefore, u = v.
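The counting step in this proof can be made explicit. Writing E_v and E_u (names introduced here only for illustration) for the sets of core processes whose echoes for v and u were counted by p and q, and reading n/2 as integer division, the two sets cannot be disjoint because together they exceed the view size:

```latex
\[
|E_v| + |E_u| \;\ge\; 2\left(\left\lfloor \tfrac{n}{2} \right\rfloor + f + 1\right)
\;\ge\; (n - 1) + 2f + 2 \;=\; n + 2f + 1 \;>\; n .
\]
```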
Lemma 6.2.8 For any given k, if a core process p delivers the value v, then every other
core process will eventually deliver v.
Proof: If p delivers v, then p received (n/2 + 2f + 1) (echo,v, k) messages. At least
n/2 + f + 1 of these messages were sent by core processes. Therefore, every other core
process receives at least n/2 + f + 1 (echo,v, k) messages and sends its own (echo,v, k)
message. Thus, at least (n−f) processes will send an (echo,v, k) message. Every core process
will eventually receive at least (n−f) ≥ (n/2+2f +1) (echo,v, k) messages and will deliver
v.
Lemma 6.2.9 For any k, if a core process p sends v, then all the core processes will deliver
v.
Proof: Suppose a core process p sends v; every other core process will receive an (initial,v, k)
message and will send an (echo,v, k) message. Therefore, every core process q will re-
ceive (n − f) ≥ (n/2 + 2f + 1) (echo,v, k) messages from core processes, and at most
f < (n/2 + 2f + 1) different messages from non-core processes. Therefore, q will deliver v.
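Both termination lemmas rely on the inequality (n − f) ≥ (n/2 + 2f + 1). The short check below is a sketch that assumes n/2 denotes integer division, an interpretation the text does not state explicitly; the function names are hypothetical. Under that reading the inequality holds for every n > 6f, whereas with exact division it would require n ≥ 6f + 2.

```python
def delivery_threshold(n, f):
    """Echo count needed to deliver, reading n/2 as integer division (assumption)."""
    return n // 2 + 2 * f + 1


def core_can_deliver(n, f):
    """True if the n - f core echoes alone reach the delivery threshold."""
    return n - f >= delivery_threshold(n, f)


# Check every view size with n > 6f over a small range of f.
all_ok = all(
    core_can_deliver(n, f)
    for f in range(0, 9)
    for n in range(6 * f + 1, 6 * f + 60)
)
```

The boundary case n = 6f fails the check for f ≥ 1, which matches the requirement f < n/6.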
6.3 Performance Evaluation
Our measurements were carried out on an IBM Blade Center cluster, comprising 25 dual-
processor 2.2GHz PowerPC blades (JS20), each with 4GB of RAM and interconnected via
Gigabit Ethernet switches and running SuSE Linux Enterprise Server 9. Every blade has
only one NIC, and thus all applications running on the same blade share the same NIC,
even if they run on a different CPU. The blades were otherwise unloaded. We have run our
tests with groups ranging from 8 to 50 processes. In all tests we had only one process per
CPU. Additionally, in tests of up to 24 nodes, each process was run on a different blade,
while with larger groups we had two processes on each blade (so in large groups each two
processes shared a NIC, but were run on different CPUs). Also, due to the configuration
of our Blade Center, when the group size was above 12, part of the communication had to
cross two internal switches. Last, JazzEnsemble is implemented in OCaml. Therefore, we
relied on the OCaml CryptoKit for handling cryptography.
Figure 6.10: Throughput measurements, in 16-byte messages per second as a function of group size, for JazzEns, ByzEns+NoCrypto, ByzEns+SymCrypto, ByzEns+NoCrypto+Total, and ByzEns+PubCrypto (512 bits) (the line for public key cryptography is hardly visible, as it is so close to 0 compared with the other lines)
Figure 6.11: Latency measurements, as the average latency of 1-byte messages in ms as a function of group size, for JazzEns, ByzEns+NoCrypto, ByzEns+SymCrypto, and ByzEns+NoCrypto+Total (the line for public key cryptography is dropped since it is orders of magnitude higher than the others)
We have used the Ensemble Ring demo application to measure the performance of the
system. In this demo, the application advances in rounds. In each round, a node sends a
burst of k messages and waits until it receives k messages from all other nodes, at which point
it moves to the next round. Thus, assigning k = 1 allows measuring the network latency.
Throughput is measured as the number of broadcast messages successfully delivered per
second (if a message is delivered to n nodes, we count it as one message for throughput
calculations).
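The counting convention can be illustrated with a small calculation. This is a sketch only: the function name and the input format (a list of measured round durations in seconds) are hypothetical, and it assumes that every one of the n nodes sends its burst of k messages in every round.

```python
def ring_demo_metrics(n, k, round_durations):
    """Return (throughput, average round time) for a run of the round-based demo.

    n: group size; k: burst size per node per round;
    round_durations: measured duration of each round, in seconds.
    """
    total_time = sum(round_durations)
    rounds = len(round_durations)
    # Each broadcast is counted once, no matter how many nodes deliver it.
    broadcasts = n * k * rounds
    throughput = broadcasts / total_time
    average_round_time = total_time / rounds
    return throughput, average_round_time
```

With k = 1, the average round time approximates the network latency, as noted above.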
As can be seen in Figure 6.10 and Figure 6.11, the performance is fairly scalable with
up to 50 members. We attribute some of the minor dip in throughput above 12 nodes to
the extra switch that some messages need to travel. Similarly, part of the minor dip above
24 nodes is due to the fact that each pair of processes shared a NIC in such large groups
(yet, each process was run on a separate CPU). Moreover, the OS kernel runs only on one
of the two processors; we discovered that any process that runs on the same processor as
the kernel enjoys better performance than processes that run on the second processor!
Additionally, we can see that without cryptography and uniform broadcast (for a discussion
of uniform broadcast, see Section 6.2.4 and the discussion after Definition 6.1.2), the
performance is about 85-90% of the performance of the non-Byzantine version of our system.
In other words, handling all attacks on reliable delivery, flow control, and membership
maintenance reduces the throughput by about 10%-15%.
Symmetric key cryptography (AES with a 128-bit key) reduces the performance by
about half. This includes signing each message n− 1 times with a symmetric key. On the
other hand, the throughput with public key cryptography with a 512-bit key drops to a few
dozen messages per second, making it almost useless.
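Signing a broadcast n − 1 times with symmetric keys can be sketched as computing one authentication tag per receiver. The sketch below is illustrative and makes several assumptions that are not taken from the text: it uses HMAC-SHA256 from the Python standard library instead of the AES-based scheme and the OCaml CryptoKit used in the measurements, it assumes one pre-shared key per pair of members, and all function names are hypothetical.

```python
import hashlib
import hmac


def sign_for_group(message, pairwise_keys):
    """Return one tag per receiver: n - 1 tags for a group of n members.

    pairwise_keys maps each receiver id to the key shared with that receiver.
    """
    return {
        receiver: hmac.new(key, message, hashlib.sha256).digest()
        for receiver, key in pairwise_keys.items()
    }


def verify_from_sender(message, tag, shared_key):
    """Check the tag addressed to this receiver using the key shared with the sender."""
    expected = hmac.new(shared_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

The sender's cost grows linearly with the group size, which is consistent with the observation that symmetric-key authentication costs throughput but remains usable, unlike public-key signatures in these measurements.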
The line labelled “ByzEns+NoCrypto+Total” in Figure 6.10 illustrates the performance
of atomic delivery, obtained by placing a Byzantine consensus layer to order messages in a
total order (as described in Section 6.2.5).6 As can be seen, the performance is lower than
without total ordering with up to 24 nodes, with a significant drop above 24 nodes. The
drop in performance above 24 nodes is largely attributed to the fact that when we utilize
two processes on the same blade server, they both share the same NIC (but separate CPUs).
This means that when running Byzantine consensus, we are limited by the NIC’s capacity
due to the extra messages injected by this protocol.
Figure 6.12 focuses on the attainable throughput of the Byzantine version of JazzEnsem-
ble while also ensuring total ordering and uniform broadcast. As can be seen, symmetric
key roughly halves the throughput for both total ordering and uniform delivery (recall that
due to our use of consensus in the implementation of total ordering, total ordering already
satisfies uniform broadcast). The reason why uniform delivery is worse than total ordering
is that the implementation of consensus can decide on multiple messages in one instance.
Thus, the cost of the consensus protocol is amortized over multiple messages. Due to a bug in
JazzEnsemble, we were not able to implement a similar optimization for uniform delivery.
In general, both these protocols deliver reasonable performance for small clusters. However,
the performance decays as the cluster grows, due to the fact that both protocols require
O(n^2) messages, or to be precise, O(n) broadcasts (with consensus amortizing this cost
over multiple messages). Interestingly, the performance decay looks linear rather than
quadratic. The reason is that the network is switched. Thus, the extra load imposed on each
link and each group member grows only according to O(n)!
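The difference between the network-wide cost and the per-member cost can be made concrete with a small count. This sketch assumes, for illustration only, that each of the n members broadcasts once per protocol instance and that a broadcast results in n − 1 point-to-point deliveries over the switched network; the function name is hypothetical.

```python
def load_per_instance(n):
    """Message counts when each of n members broadcasts once to the n - 1 others."""
    total_point_to_point = n * (n - 1)   # grows as O(n^2) over the whole network
    received_per_member = n - 1          # grows as O(n) on any single member's link
    return total_point_to_point, received_per_member
```

For n = 10 this gives 90 deliveries network-wide but only 9 arriving at any single member, which is the quantity that bounds each member's link in a switched network.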
In any event, we would like to emphasize once again that this is without packing/batching
optimizations [52]. When incorporating such optimizations, from sporadic testing, we believe
that for small messages we can get a performance boost of at least a factor of 10, and
as much as a factor of 90 for 1 byte messages.
6 Also, the graph ends at 44 nodes rather than 50 since 6 nodes were trashed due to a UPS malfunction during an electric break.
Figure 6.12: Throughput measurements: the cost of total ordering and uniform broadcast with and without symmetric-key cryptography (16-byte messages per second as a function of group size; series: NoCrypto+Total, NoCrypto+Uniform, NoCrypto+Total+Uniform, SymCrypto+Total, SymCrypto+Uniform, SymCrypto+Total+Uniform)
Figure 6.13: Time to establish a new view (seconds to stability as a function of group size; series: ByzEns+NoCrypto merge->init, ByzEns+NoCrypto leave->init)
Figure 6.13 shows the latency to establish a new view for both merging a new node and
recovering from a failed or departed node (once the failure was detected). As can be seen,
this latency grows with the view size, and is roughly the same in both cases. However, even
with 50 nodes, it takes about 0.35 seconds to establish the view. On the other hand, the
exponential nature of the graph suggests that in order to grow to much larger groups, a more
scalable overlay based solution might be needed. However, overlays tend to be vulnerable
to Byzantine failures, so finding a practical solution to this is not a trivial task.
Finally, Table 6.1 details the time to recover from several scenarios. The scenarios
include a member leaving the group after sending a leave message, a node that becomes
mute and does not send anything, a coordinator that becomes mute, a node that becomes
too verbose and suspects other nodes all the time, and a coordinator that sends a view that
is different from the one expected. In all cases, the time presented is from the detection
of the failure until a new view is installed (but in the case of muteness and verbosity, does
not include the failure detection time itself as this is a tunable parameter). As can be seen,
in all cases, the recovery time is similar and is always less than 20 milliseconds. The main
difference seems to be in whether all nodes start the consensus protocol roughly at the same
time and with the same value or not. These numbers were obtained with a group of 12
nodes. In groups of 50 nodes, the latency may grow up to 350 milliseconds; in those cases
the view latency is vastly dominated by the synchronization time of the view, as appears
in Figure 6.13.
Name             Meaning                                                           Recovery Time
ByzLeave         A node sends a leave message and then leaves                      0.013 sec
ByzMuteNode      A node is mute and not sending anything                           0.015 sec
ByzMuteCoord     The coordinator is mute                                           0.018 sec
ByzVerboseNode   A Byzantine node is too verbose and suspects nodes all the time   0.016 sec
CoordBadView     The coordinator sends wrong view                                  0.014 sec
Table 6.1: Recovery time from problematic scenarios
Chapter 7
Summary and Future Directions
In this chapter we summarize the outcome of this work. First, we present a short summary
of the results and discuss them. Then, we list a few key directions in which this work may
evolve further.
7.1 Summary of Results
In this thesis we have presented a scalable Byzantine group communication system. Our
system enjoys several interesting properties: It is a generic group communication system and
therefore can be used as a building block for various distributed applications. The system
is designed to perform well in the normal case, i.e., when no Byzantine failures occur, yet
be resilient to them if they do, as validated by our performance measurements. Also, our
protocols do not rely on protocol level signatures, and only sign (and authenticate) each
message once before sending it to (or receiving from) the network. The only exception to
this is retransmitting by a third node, which requires signatures at a low level of the system.
By examining our performance measurements, and in particular when focusing on the
sources of overhead in handling Byzantine failures, one can make the following observations:
The cost of handling Byzantine failures other than cryptography, total ordering, and
uniform broadcast (to be precise, ensuring that a Byzantine node does not send different
versions of its application messages to different nodes) is relatively small;
about 10-15% in our measurements. Also, public key cryptography is extremely expensive
to perform in software. Yet, using symmetric key cryptography while signing each broadcast
message n− 1 times results in acceptable performance degradation even in groups of up to
50 nodes. Furthermore, as security becomes increasingly important, it is conceivable that
in the future most computers will include hardware accelerators for it, which will reduce
its cost even further. Moreover, even with total ordering (by Byzantine consensus), the
performance of the system is still quite reasonable. Only a decade ago, a throughput of
4,000 messages per second on a cluster of 44 nodes would have been considered excellent
for a non-Byzantine tolerant system. However, the performance does degrade as the cluster
size increases.
Our layered architecture enabled us to perform fine-grained measurements regarding the
cost of various performance limiting factors in fending off Byzantine faults. When con-
sidering the scalability problems of total ordering and uniform delivery, one is faced with
the following tradeoffs: First, public-key cryptography is too expensive. Moreover, due
to its CPU intensive behavior, its detrimental effect on throughput is even more consid-
erable than its impact on latency, as highlighted in [45]. Second, existing techniques for
implementing Byzantine resilient total ordering and uniform delivery that do not rely on
public-key cryptography are not very scalable. In contrast, the approach of [2] replaces the
use of Byzantine consensus and uniform delivery with a quorum approach. However, it only
ensures probabilistic termination, and requires increased clients to servers communication.
This makes it less attractive when the communication between clients and servers is worse
than the communication among the servers themselves, e.g., when all the servers are in the
same farm.
Additionally, we have presented RAPID and BDP, two reliable broadcast protocols for
mobile ad-hoc networks. BDP disseminates messages along the arcs of a logical overlay.
The protocol relies on signatures to prevent messages from being forged. It also employs
gossiping of headers of known messages to prevent a malicious overlay node from stopping
the dissemination of messages to the rest of the system. Moreover, for efficiency reasons,
the overlay maintenance mechanism is augmented to ensure that enough correct nodes are
elected to the overlay so that malicious nodes do not disconnect the overlay beyond the time
required to detect such behavior. Finally, the detection of observable malicious behaviors,
such as mute and verbose failures, is encapsulated within corresponding failure detector
modules. The use of failure detectors simplifies the presentation of the protocol and makes
it more generic and robust. This is because the protocol need not deal explicitly with issues
like timers and timeouts.
We show that for non-sparse networks, BDP behaves very well. That is, BDP obtains
very high delivery ratios while sending much fewer messages than flooding. When there is
no malicious activity, it is almost as economical as a protocol that has no recovery mecha-
nism (and in particular, much more efficient than flooding). When some malicious failures
occur, BDP still remains more efficient than flooding, while maintaining a comparable de-
livery rate. In contrast, when there are malicious failures or mobility, having no recovery
mechanism results in a significant drop in delivery rates. Additionally, we made the
interesting observation that malicious failures have a somewhat reduced impact when the nodes
are mobile. Intuitively, when nodes are mobile, there is a lower chance that malicious nodes
will constantly be at critical positions on the message dissemination paths for all messages.
Yet, the problem with deterministic overlays is that due to the combination of mobility and
the decentralized nature of MANETs, maintaining overlays in MANETs is a complex and
expensive task. Finally, it is hard to make overlays resilient to malicious or even selfish
behavior in highly mobile networks.
We have also developed RAPID, a probabilistic reliable broadcast protocol for mobile
ad-hoc networks. The protocol includes a probabilistic flooding phase that is complemented
by two corrective measures, namely, counter based forwarding and a deterministic gossip
based mechanism. The latter enables recovering messages that were not delivered by the
probabilistic dissemination process while maintaining low communication overhead. The
probabilistic flooding part of the protocol takes advantage of the locally observed network’s
density in order to send a small number of messages, yet one that is still sufficient to deliver
the message to most nodes. This is in accordance with our formal analysis. This provides
very rapid dissemination of the message to most nodes in the system with low message
overhead, and in a way that is scalable in the network density.
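The idea of density-aware probabilistic forwarding can be illustrated as follows. This is a hypothetical sketch and not the RAPID rule itself: the specific formula (forwarding with probability min(1, beta / number of observed neighbors)), the parameter name `beta`, and the function names are assumptions chosen only to show how a locally observed density can lower the per-node forwarding probability in dense neighborhoods.

```python
import random


def forwarding_probability(observed_neighbors, beta):
    """Probability of rebroadcasting, decreasing as local density grows (assumed rule)."""
    if observed_neighbors <= 0:
        return 1.0  # no known neighbors: always forward
    return min(1.0, beta / observed_neighbors)


def should_forward(observed_neighbors, beta, rng=random):
    """Draw the forwarding decision for one received message."""
    return rng.random() < forwarding_probability(observed_neighbors, beta)
```

Under this assumed rule, the expected number of rebroadcasts in a neighborhood stays close to beta regardless of how many neighbors hear the message, which is one way to keep the overhead scalable in the network density.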
7.2 Future Directions
As noted in Section 6, our Byzantine group communication system supports only single
hop networks. One of the interesting directions is to extend it to support multi-hop ad-hoc
networks. Recall that in ad-hoc networks, nodes cannot necessarily communicate directly
with each other. Instead, some nodes act as forwarders for the entire group. The main
places where this affects our work are that we need a Byzantine routing mechanism, and
the fact that the stability protocol and the failure detection must become gossip based [47].
Both the stability and failure detection protocols in Byzantine JazzEnsemble require
receiving messages from all group members, which was the case in single hop networks. Yet,
in multi hop networks, a node p can receive messages only from nodes whose distance from
p is less than their transmission range. Therefore, some devices will have to forward the
stability and failure detection messages from other nodes to their neighbors. It is even
more complicated if some of the nodes choose to behave in a Byzantine manner and not to
forward these messages. Thus, a promising direction is to exploit BDP and RAPID, which
we developed, in Byzantine JazzEnsemble in order to reliably disseminate messages to all
the nodes.
Even if we use a malicious-tolerant protocol like BDP, which sends a relatively small number
of messages, the number of messages that is sent by the stability and failure detection
protocols that are used in Byzantine JazzEnsemble is very high. In order to reduce the
number of messages, gossip based failure detection and stability detection protocols have
been proposed for MANET in benign failure models [47, 50]1. The challenge is how to
adapt these protocols to Byzantine failure prone environments in which a Byzantine node
can alter the messages it is forwarding.
The simplest way of overcoming many of the potential attacks against these protocols
is through the use of public-key cryptography. However, as we have discovered in this
work, the cost of public-key cryptography has the potential of rendering such protocols too
expensive. Thus, the gossip based failure detection and stability detection protocols that
will be developed should avoid using public-key cryptography in order to perform well in
practice.
1 The work presented in [118] introduces a scalable gossip based protocol that provides timely detection in large wired networks.
Finally, one of the main problems in mobile ad-hoc networks is power. As nodes are
mobile, they are typically battery operated. It turns out that the network card consumes
roughly the same amount of energy when it sends a message, receives a message, or listens
for messages. The main way to save energy is therefore to put the card into sleep mode. The
IEEE 802.11 standard includes a Power Save Mode that deals with this problem in wireless
LANs when all messages are point to point. There have also been a few attempts to extend
this to multi-hop networks with point-to-point messages, such as [8]. Recently, many of
the mobile devices that reach the market are equipped with more than one networking
interface, such as Wi-Fi and Bluetooth. Since Bluetooth consumes considerably less energy
than Wi-Fi, it is possible to put the Wi-Fi card into sleep mode while keeping the Bluetooth
interface active. When a node wants to forward a message to its neighbor p, it sends a
short wake-up message to p over the Bluetooth interface, which causes p to start listening
on its Wi-Fi interface as well. After receiving the message, p can put its Wi-Fi card back
into sleep mode until the next wake-up interrupt. Such a technique can save a considerable
amount of energy and keep a network active for a longer period of time.
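The wake-up scheme described above can be sketched as a small state machine. This is a simulation under stated assumptions rather than driver code: the class Node, its methods, and the helper forward are hypothetical names, and the two radios are modeled as plain method calls.

```python
class Node:
    """A device whose Wi-Fi card sleeps while its Bluetooth interface stays on."""

    def __init__(self, name):
        self.name = name
        self.wifi_awake = False  # the Wi-Fi card sleeps by default
        self.inbox = []

    def on_bluetooth_wakeup(self):
        # A short wake-up message arrived over Bluetooth: start listening on Wi-Fi.
        self.wifi_awake = True

    def on_wifi_message(self, payload):
        if not self.wifi_awake:
            return False         # card asleep: the transmission is missed
        self.inbox.append(payload)
        self.wifi_awake = False  # sleep again until the next wake-up
        return True

def forward(receiver, payload):
    """Wake the receiver over Bluetooth, then send the payload over Wi-Fi."""
    receiver.on_bluetooth_wakeup()
    return receiver.on_wifi_message(payload)

p = Node("p")
missed = p.on_wifi_message("sent without wake-up")
delivered = forward(p, "m1")
```

Here the first transmission is lost because p's Wi-Fi card is asleep, while the second one is delivered and p returns to sleep afterwards. A real implementation would also have to account for the latency of waking the card.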
An interesting problem is to use the techniques mentioned above to develop a Byzantine
broadcast protocol for multi-hop ad hoc networks that enables most nodes to sleep
most of the time in order to reduce their energy consumption.
Appendix A
Practical Application
WiPeer is software that enables direct communication between computers without the
need for Internet access. The software contains a user-friendly interface and a suite of
collaborative applications, operating in peer-to-peer mode over both WiFi and Ethernet
LANs. It includes a common management utility into which the other applications can be
plugged. Current applications include automatic discovery of devices on the same
network, presence notifications, instant messaging, file sharing, distributed file search, and
several multiplayer games. WiPeer’s technology is based on the JazzEnsemble group
communication toolkit, which was designed to target wireless mobile ad-hoc networks (MANETs).
Therefore, any communication between two nearby devices is performed over the direct link
(in peer-to-peer mode), without relying on a central server or infrastructure. This
dramatically improves the user’s experience, since such communication enjoys higher bandwidth
and lower latency than infrastructure-based communication. WiPeer’s extensible core makes
it possible to add new applications within very short development cycles. Thus, WiPeer may
serve as a platform for mobile applications, such as mobile multiplayer games and
productivity applications.
Figure A.1 describes WiPeer’s architecture, which is designed so that most of the
components can be reused even when running on different platforms. Thus, only components
that depend on functionality inherent to the underlying platform need to be rewritten,
such as network management, the GUI (graphical user interface), and power control of
networking cards. All other components may remain the same and need not be rewritten.
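The split between reusable and platform-dependent components can be sketched as follows. This is only an illustration of the design idea, not WiPeer’s actual code: the names Platform, Chat, FakePlatform, and their methods are hypothetical.

```python
from abc import ABC, abstractmethod

class Platform(ABC):
    """Hypothetical boundary around the platform-dependent parts (network, power)."""

    @abstractmethod
    def send(self, data: bytes) -> None: ...

    @abstractmethod
    def set_radio_power(self, on: bool) -> None: ...

class Chat:
    """Platform-independent component: written once and reused on every platform."""

    def __init__(self, platform: Platform):
        self.platform = platform

    def say(self, text: str) -> None:
        self.platform.send(text.encode("utf-8"))

class FakePlatform(Platform):
    """Stand-in for one concrete platform; a port would rewrite only this class."""

    def __init__(self):
        self.sent = []
        self.radio_on = True

    def send(self, data: bytes) -> None:
        self.sent.append(data)

    def set_radio_power(self, on: bool) -> None:
        self.radio_on = on

platform = FakePlatform()
Chat(platform).say("hi")
```

Porting to a new platform then means supplying another Platform implementation, while Chat and similar components stay unchanged.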
A.1 Future Work
One of the most important goals is to extend the range of communication between
devices. To achieve this, we plan to implement multi-hop routing protocols and add
them to the JazzEnsemble group communication system. This will enable devices that are not
in direct range of each other to communicate via intermediate nodes. In this
scenario, other devices serve as proxy repeaters that relay others’ messages. This
kind of networking extends the reach of proximity-based communication without
using any infrastructure. One can imagine an entire school being connected into one big
multi-hop wireless network, in which all pupils’ data exchanges are performed directly over
WiFi without using the infrastructure of cellular service providers.
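As a rough illustration of relaying through intermediate nodes, the sketch below finds a fewest-hops path using breadth-first search, assuming that two devices can talk directly whenever their distance does not exceed a common radio range. It is not one of the routing protocols planned for JazzEnsemble; the function route, the device names, the coordinates, and the range values are all hypothetical.

```python
from collections import deque
from math import dist

def route(positions, radio_range, src, dst):
    """Return the list of hops from src to dst, or None if dst is unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back through the relays to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for other, pos in positions.items():
            if other not in parent and dist(positions[node], pos) <= radio_range:
                parent[other] = node
                queue.append(other)
    return None

# Hypothetical device positions in arbitrary distance units.
classroom = {"alice": (0, 0), "bob": (80, 0), "carol": (160, 0)}

relayed = route(classroom, 100, "alice", "carol")
out_of_reach = route(classroom, 50, "alice", "carol")
```

In this example alice and carol are out of direct range, so bob acts as the repeater. With a shorter range no path exists and the function returns None.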
Figure A.1: WiPeer architecture. Components shown: network management; power control; group communication (discovery, membership, reliable communication); management module; Chat, File Sharing, and Presence applications; SDK API; graphical user interface; external games; external applications. Some components are marked as platform dependent.
References
[1] Swans/jist. http://jist.ece.cornell.edu/.
[2] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie. Fault-
Scalable Byzantine Fault-Tolerant Services. In Proc. 20th ACM SIGOPS Sym-
posium on Operating Systems Principles (SOSP), pages 59–74, October 2005.
[3] M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. On imple-
menting omega with weak reliability and synchrony assumptions. In PODC ’03:
Proceedings of the twenty-second annual symposium on Principles of distributed
computing, pages 306–314, New York, NY, USA, 2003. ACM.
[4] A. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J.-P. Martin, and C. Porth. BAR
Fault-Tolerance for Cooperative Services. In Proc. 20th ACM SIGOPS Sympo-
sium on Operating Systems Principles (SOSP), pages 45–58, October 2005.
[5] D. Allen. Hidden terminal problems in Wireless LAN’s. In IEEE 802.11 Working
Group Papers, 1993.
[6] Y. Amir, G. Ateniese, D. Hasse, Y. Kim, C. Nita-Rotaru, T. Schlossnagle,
J. Schultz, J. Stanton, and G. Tsudik. Secure Group Communication in Asyn-
chronous Networks with Failures: Integration and Experiments. In Proc. of the
20th International Conference on Distributed Computing Systems, pages 330–
343, 2000.
[7] B. Awerbuch, D. Holmer, C. Nita-Rotaru, and H. Rubens. An On-Demand Se-
cure Routing Protocol Resilient to Byzantine Failures. In Proc. ACM Workshop
on Wireless Security (WiSe), Atlanta, GA, September 2002.
[8] B. Awerbuch, D. Holmer, and H. Rubens. The Pulse Protocol: Energy Efficient
Infrastructure Access. In Proc. of the 23rd Conference of the IEEE Communi-
cations Society (Infocom), March 2004.
[9] M. Backes and C. Cachin. Reliable Broadcast in a Computational Hybrid Model
with Byzantine Faults, Crashes, and Recoveries. In Proc. of the International
Conference on Dependable Systems and Networks (DSN), June 2003.
[10] G. Badishi, I. Keidar, and A. Sasson. Exposing and Eliminating Vulnerabilities
to Denial of Service Attacks in Secure Gossip-Based Multicast. In Proc. of the
International Conference on Dependable Systems and Networks (DSN), pages
201–210, June – July 2004.
[11] R. Baldoni, J. Helary, and M. Raynal. From Crash-Fault Tolerance to Arbitrary-
Fault Tolerance: Towards a Modular Approach. In Proc. of the IEEE Interna-
tional Conference on Dependable Systems and Networks (DSN), pages 273–282,
June 2000.
[12] Z. Bar-Yossef, R. Friedman, and G. Kliot. RaWMS - Random Walk based
Lightweight Membership Service for Wireless Ad Hoc Networks. In Proc. of
the 7th ACM Intr. Symposium on Mobile Ad Hoc Networking and Computing
(MobiHoc), pages 238–249, 2006.
[13] M. Ben-Or. Another Advantage of Free Choice: Completely Asynchronous
Agreement Protocols. In Proc. 2nd ACM Symposium on Principles of Dis-
tributed Computing, pages 27–30, 1983.
[14] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky.
Bimodal Multicast. ACM Transactions on Computer Systems, 17(2):41–88, May
1999.
[15] K. P. Birman. Building Secure and Reliable Network Applications. Manning
Publishing Company and Prentice Hall, December 1996.
[16] S. Bohacek, J. Hespanha, J. Lee, C. Lim, and K. Obraczka. Enhancing Security
via Stochastic Routing. In Proc. of the 11th IEEE International Conference on
Computer Communications and Networks, pages 58–62, May 2002.
[17] G. Bracha. An Asynchronous (n − 1)/3-Resilient Consensus Protocol. In Proc.
3rd ACM Symposium on Principles of Distributed Computing, pages 154–162,
1984.
[18] G. Bracha and S. Toueg. Asynchronous Consensus and Broadcast Protocols.
Journal of the ACM, 32(4):824–840, October 1985.
[19] J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva. A performance
comparison of multi-hop wireless ad hoc network routing protocols. In Proc. of
the 4th annual ACM/IEEE International Conference on Mobile Computing and
Networking (MobiCom), pages 85–97, 1998.
[20] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup. Secure and Efficient Asyn-
chronous Broadcast Protocols. In Proc. of Advances in Cryptology: CRYPTO
2001, pages 524–541, 2001.
[21] C. Cachin, K. Kursawe, and V. Shoup. Random Oracles in Constantinople:
Practical Asynchronous Byzantine Agreement Using Cryptography. In Proc.
19th ACM Symposium on Principles of Distributed Computing, pages 123–132,
2000.
[22] T. Camp, J. Boleng, and V. Davies. A survey of mobility models for ad hoc
network research. Wireless Communications & Mobile Computing (WCMC),
2(5):483–502, 2002.
[23] R. Canetti and T. Rabin. Fast Asynchronous Byzantine Agreement with Opti-
mal Resilience. In Proc. 25th Annual ACM Symposium on Theory of Computing,
pages 42–51, 1993.
[24] J. Cartigny and D. Simplot. Border Node Retransmission Based Probabilistic
Broadcast Protocols in Ad-Hoc Networks. Telecommunication Systems, 22(1–
4):189–204, 2003.
[25] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance and Proactive
Recovery. ACM Transactions on Computer Systems, 20(4):398–461, 2002.
[26] T. Chandra, V. Hadzilacos, S. Toueg, and B. Charron-Bost. On the Impossibility
of Group Membership. In Proc. of the 15th ACM Symposium of Principles of
Distributed Computing, pages 322–330, May 1996.
[27] T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Sys-
tems. Journal of the ACM, 43(4):685–722, July 1996.
[28] I. Chang, M. Hiltunen, and R. Schlichting. Affordable Fault Tolerance Through
Adaptation. In Proc. of Workshop on Fault-Tolerant Parallel and Distributed
Systems (LNCS 1388), pages 585–603, April 1998.
[29] M.-H. Chek and Y.-K. Kwok. On Adaptive Frequency Hopping to Combat IEEE
802.11b with Practical Resource Constraints. In International Symposium on
Parallel Architectures, Algorithms and Networks (ISPAN), pages 391–396, May
2004.
[30] G. Chockler, I. Keidar, and R. Vitenberg. Group Communication Specifications:
A Comprehensive Study. ACM Computing Surveys, 33(4):427–469, 2001.
[31] T. Clausen, P. Jacquet, and A. Laouiti. Optimized Link State Routing Protocol.
In Proc. IEEE International Multi Topic Conference (INMIC), December 2001.
[32] M. Correia, N. Neves, L. Lung, and P. Veríssimo. Low Complexity
Byzantine-Resilient Consensus. Distributed Computing, 17(3):237–249, March 2005.
[33] F. Cristian, H. Aghili, R. Strong, and D. Dolev. Atomic Broadcast: From Simple
Diffusion to Byzantine Agreement. In Proc. of the 15th International Conference
on Fault-Tolerant Computing, pages 200–206, Austin, Texas, 1985.
[34] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Stur-
gis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database
maintenance. In Proc. of the 6th annual ACM Symposium on Principles of
Distributed Computing (PODC), pages 1–12, New York, NY, USA, 1987. ACM
Press.
[35] D. Dolev, R. Friedman, I. Keidar, and D. Malki. Failure Detectors in Omission
Failure Environments. Technical Report TR96–1608, Department of Computer
Science, Cornell University, 1996.
[36] S. Dolev, E. Schiller, and J. Welch. Random Walk for Self-Stabilizing Group
Communication in Ad Hoc Networks. In Proc. of the 21st Annual Symposium
on Principles of Distributed Computing, pages 259–259, 2002.
[37] A. Doudou, B. Garbinato, R. Guerraoui, and A. Schiper. Muteness Failure
Detectors: Specification and Implementation. In Proc. 3rd European Dependable
Computing Conference, pages 71–87, 1999.
[38] A. Doudou and A. Schiper. Muteness Detectors for Consensus with Byzantine
Processes (Brief Announcement). In Proc. 17th ACM Symposium on Principles
of Distributed Computing (PODC), page 315, 1998.
[39] V. Drabkin, R. Friedman, A. Kama, and B. Mudrik. JazzEnsemble: a Group
Communication System for MANET. Technical report, Computer Science, Tech-
nion, 2005.
[40] P. T. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and
A.-M. Kermarrec. Lightweight Probabilistic Broadcast. ACM Transactions on
Computer Systems, 21(4):341–374, 2003.
[41] P. Feldman and S. Micali. Optimal Algorithms for Byzantine Agreement. In
Proc. 20th Annual ACM Symposium on Theory of Computing, pages 148–161,
1988.
[42] S. Floyd, V. Jacobson, S. McCanne, C. Liu, and L. Zhang. A Reliable Multicast
Framework for Light-Weight Sessions and Application Level Framing. In Proc.
ACM SIGCOMM’95, August 1995.
[43] R. Friedman. Fuzzy Group Membership. In Proc. of FuDiCo 2002: Interna-
tional Workshop on Future Directions of Distributed Computing, pages 60–63,
Bertinoro, Italy, June 2002.
[44] R. Friedman, M. Gradinariu, and G. Simon. Locating Cache Proxies in
MANETs. In Proc. 5th ACM International Symposium on Mobile Ad Hoc Net-
working and Computing (MobiHoc), pages 175–186, May 2004.
[45] R. Friedman and E. Hadad. On the Significance of Latency vs. Throughput in
Analyzing the Performance of Distributed Systems. IEEE Distributed Systems
Online: “Distributed Wisdom” Column, January 2006.
[46] R. Friedman and A. Kama. Strong Replication Semantics in Mobile Ad-Hoc
Networks. Technical report, Computer Science, Technion, 2005.
[47] R. Friedman, S. Manor, and K. Guo. Scalable Hypercube Based Stability De-
tection. IEEE Transactions on Parallel and Distributed Systems, 13(8), August
2002.
[48] R. Friedman, A. Mostefaoui, and M. Raynal. Simple and Efficient Oracle-Based
Consensus Protocols for Asynchronous Byzantine Systems. In Proc. of the 23rd
IEEE International Symposium on Reliable Distributed Systems (SRDS), pages
228–237, October 2004.
[49] R. Friedman, A. Mostefaoui, and M. Raynal. Simple and Efficient Oracle-Based
Consensus Protocols for Asynchronous Byzantine Systems. IEEE Transactions
on Dependable and Secure Computing, 2(1):46–56, March 2005.
[50] R. Friedman and G. Tcharny. Evaluating Failure Detection in Mobile Ad-Hoc
Networks. International Journal of Wireless and Mobile Computing, 1(8), 2005.
[51] R. Friedman and R. van Renesse. Strong and Weak Virtual Synchrony in Horus.
In Proc. of the 15th Symposium on Reliable Distributed Systems, pages 140–149,
October 1996.
[52] R. Friedman and R. van Renesse. Packing Messages as a Tool for Boosting the
Performance of Total Ordering Protocols. In Proc. of the Sixth IEEE Interna-
tional Symposium on High Performance Distributed Computing, pages 233–242,
August 1997.
[53] D. Gavidia, S. Voulgaris, and M. van Steen. Epidemic-style Monitoring in Large-
Scale Sensor Networks. Technical Report IR-CS-012, Vrije Universiteit, Nether-
lands, March 2005.
[54] K. Guo and I. Rhee. Message Stability Detection for Reliable Multicast. In
Proc. of IEEE INFOCOM’2000, March 2000.
[55] P. Gupta and P. Kumar. Critical Power for Asymptotic Connectivity in Wireless
Networks. In Stochastic Analysis, Control, Optimization and Applications,
Birkhäuser, Boston, pages 547–566, 1998.
[56] Z. Haas. A New Routing Protocol for the Reconfigurable Wireless Networks. In
Proc. IEEE Int. Conf. on Universal Personal Communications (ICUP), October
1997.
[57] Z. Haas, J. Halpern, and L. Li. Gossip-Based Ad Hoc Routing. In Proc. of
the 21st Conference of the IEEE Communication Society (INFOCOM), pages
1707–1716, June 2002.
[58] M. Hayden. The Ensemble System. Technical Report TR98-1662, Department
of Computer Science, Cornell University, January 1998.
[59] M. Hiltunen, R. Schlichting, and C. Ugarte. Survivability Issues in Cactus. In
Proc. of the IEEE Information Survivability Workshop, October 1998.
[60] IETF Mobile Ad hoc Networks Working Group. Jitter considerations in Mobile Ad Hoc
Networks (MANETs).
[61] IETF Mobile Ad hoc Networks Working Group. The Optimized Link State Routing
Protocol version 2.
[62] F. Ingelrest, D. Simplot-Ryl, and I. Stojmenovic. Broadcasting in Hybrid Ad
Hoc Networks. In Proc. 2nd Annual Conference on Wireless On demand Net-
work Systems and Services (WONS), 2005.
[63] I. Ioannidis and B. Carbunar. Scalable Routing in Hybrid Cellular and Ad-Hoc
Networks. In 1st IEEE International Conference on Mobile Ad Hoc and Sensor
Systems (MASS), October 2004. Poster.
[64] M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer
sampling service: experimental evaluation of unstructured gossip-based
implementations. In Proc. of the 5th Middleware, pages 79–98, 2004.
[65] D. Johnson and D. Maltz. Dynamic Source Routing in Ad Hoc Wireless Net-
works. In Mobile Computing, volume 353. 1996.
[66] B. Karp. Geographic Routing for Wireless Networks. PhD thesis, Harvard
University, 2000.
[67] A.-M. Kermarrec and M. van Steen. Gossiping in Distributed Systems. SIGOPS
Operating Systems Review, 41(5):2–7, 2007.
[68] A. Keshavarz-Haddad, V. J. Ribeiro, and R. H. Riedi. Color-based broadcasting
for ad hoc networks. In Proc. of the 4th IEEE Int. Symposium on Modeling and
Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), pages 49–58,
April 2006.
[69] K. Kihlstrom, L. Moser, and P. Melliar-Smith. Solving Consensus in a Byzantine
Environment Using an Unreliable Fault Detector. In Proc. of the Int. Conference
on Principles of Distributed Systems, pages 61–75, 1997.
[70] K. Kihlstrom, L. Moser, and P. Melliar-Smith. The SecureRing Group Com-
munication System. ACM Transactions on Information and System Security,
4(4):371–406, 2001.
[71] Y. Ko and N. H. Vaidya. Geocasting in Mobile Ad Hoc Networks: Location-
Based Multicast Algorithms. In Proc. 2nd IEEE Workshop on Mobile Computer
Systems and Applications, page 101, 1999.
[72] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: specula-
tive byzantine fault tolerance. In SOSP ’07: Proceedings of twenty-first ACM
SIGOPS symposium on Operating systems principles, pages 45–58, 2007.
[73] L. Lamport. Lower Bounds for Asynchronous Consensus. In A. Schiper,
A. Shvartsman, H. Weatherspoon, and B. Zhao, editors, Future Directions in
Distributed Computing: Research and Position Papers, number 2584 in LNCS,
pages 22–23. Springer, 2003.
[74] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM
Transactions on Programming Languages and Systems, 4(3):382–401, July 1982.
[75] A. Laouiti, A. Qayyum, and L. Viennot. Multipoint Relaying: An Efficient
Technique for Flooding in Mobile Wireless Networks. In Proc. 35th IEEE Annual
Hawaii International Conference on System Sciences (HICSS), 2001.
[76] P. Levis, N. Patel, D. Culler, and S. Shenker. Trickle: A self-regulating algorithm
for code propagation and maintenance in wireless sensor networks, 2004.
[77] J. Li, J. Jannotti, D. S. J. De Couto, D. R. Karger, and R. Morris. A Scalable
Location Service for Geographic Ad Hoc Routing. In Proc. 6th Annual
International Conference on Mobile Computing and Networking (MobiCom), pages
120–130, 2000.
[78] M.-J. Lin, K. Marzullo, and S. Masini. Gossip versus Deterministically
Constrained Flooding on Small Networks. In Proc. 14th International Conference on
Distributed Computing, pages 253–267, October 2000.
[79] J. Luo, P. Eugster, and J.-P. Hubaux. PILOT: ProbabilistIc lightweight group
communication system for mobile ad hoc networks. IEEE Trans. on Mobile
Computing, 3(2):164–179, April–June 2004.
[80] M. Macedonia and D. Brutzman. MBone Provides Audio and Video Across the
Internet. IEEE Computer, 27(4):30–36, April 1994.
[81] D. Malkhi, Y. Mansour, and M. Reiter. Diffusion Without False Rumors: on
Propagating Updates in a Byzantine Environment. Theoretical Computer Sci-
ence, 1–3(299):289–306, April 2003.
[82] D. Malkhi and M. Reiter. A High-Throughput Secure Reliable Multicast Pro-
tocol. Journal of Computer Security, 5:113–127, 1997.
[83] P. McDaniel, A. Prakash, and P. Honeyman. Antigone: A Flexible Framework
for Secure Group Communication. Technical Report CITI TR 99-2, University
of Michigan, Ann Arbor, MI, USA, September 1999.
[84] Y. Minsky and F. Schneider. Tolerating Malicious Gossip. Distributed Comput-
ing, 16(1):49–68, 2003.
[85] H. Miranda, S. Leggio, L. Rodrigues, and K. Raatikainen. A Power-Aware
Broadcasting Algorithm. In Proc. of the 17th Annual IEEE International
Symposium on Personal, Indoor and Mobile Radio Communications
(PIMRC’06), September 2006.
[86] G. Neiger and S. Toueg. Automatically Increasing the Fault-Tolerance of Dis-
tributed Algorithms. Journal of Algorithms, 11(3):374–419, September 1990.
[87] P. Panchapakesan and D. Manjunath. On the Transmission Range in Dense
Ad Hoc Radio Networks. In Proc. of IEEE Signal Processing Communication
(SPCOM), 2001.
[88] P. Papadimitratos and Z. Haas. Secure Routing for Mobile and Ad Hoc Net-
works. In Proc. Communication Networks and Distributed Systems Modeling
and Simulations Conference, January 2002.
[89] P. Papadimitratos and Z. Haas. Secure Message Transmission in Mobile and Ad
Hoc Networks. Ad Hoc Networks, 1, July 2003.
[90] S. Paul, K. K. Sabnani, J. C. Lin, and S. Bhattacharya. Reliable Multicast
Transport Protocol (RMTP). IEEE Journal on Selected Areas in Communica-
tions, 15(3):407–421, April 1997. Special issue on Network Support for Multi-
point Communication.
[91] M. D. Penrose. Random Geometric Graphs. Oxford University Press, 2003.
[92] C. Perkins. Ad Hoc On Demand Distance Vector (AODV)
Routing. Internet Draft, draft-ietf-manet-aodv-00.txt,
citeseer.nj.nec.com/article/perkins99ad.html, 1997.
[93] S. Pleisch, M. Balakrishnan, K. Birman, and R. van Renesse. MISTRAL: Ef-
ficient Flooding in Mobile Ad-hoc networks. In Proc. of the 7th ACM Inter-
national Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc),
pages 1–12, 2006.
[94] M. Rabin. Randomized Byzantine Generals. In Proc. 24th IEEE Symposium on
Foundations of Computer Science, pages 403–409, 1983.
[95] H. Ramasamy, A. Agbaria, and W. Sanders. Parsimony-Based Approach for
Obtaining Resource-Efficient and Trustworthy Execution. In Proc. 2nd Latin-
American Dependable Computing Symposium (LADC), pages 206–225, October
2005.
[96] H. Ramasamy and C. Cachin. Parsimonious Asynchronous Byzantine-Fault-
Tolerant Atomic Broadcast. In Proc. 9th International Conference on Principles
of Distributed Systems, December 2005.
[97] H. Ramasamy, P. Pandey, J. Lyons, M. Cukier, and W. Sanders. Quantifying
the Cost of Providing Intrusion Tolerance in Group Communication Systems.
In Proc. of the IEEE Conference on Dependable Systems and Networks, pages
229–238, 2002.
[98] A. Rao, C. Papadimitriou, S. Shenker, and I. Stoica. Geographic Routing with-
out Location Information. In Proc. 9th Annual International Conference on
Mobile Computing and Networking (MobiCom), pages 96–108, 2003.
[99] M. Reiter. Distributed Trust with the Rampart Toolkit. Communications of
the ACM, 39(4):70–74, April 1996.
[100] O. Rodeh. Secure Group Communication. PhD thesis, School of Computer
Science and Engineering, The Hebrew University of Jerusalem, 2001.
[101] A. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel. SCRIBE: The
Design of a Large Scale Event Notification Infrastructure. In Proceedings of 3rd
International Workshop on Networked Group Communication, November 2001.
[102] E. Royer, P. Melliar-Smith, and L. Moser. An Analysis of the Optimum Node
Density for Ad hoc Mobile Networks. In Proc. of the IEEE International Con-
ference on Communications, June 2001.
[103] K. Sanzgiri, B. Dahill, B. Levine, C. Shields, and E. Belding-Royer. A Secure
Routing Protocol for Ad Hoc Networks. In Proc. of the IEEE International
Conference on Network Protocols (ICNP), November 2002.
[104] Y. Sasson, D. Cavin, and A. Schiper. Probabilistic Broadcast for Flooding in
Wireless Mobile Ad hoc Networks. In Proc. of the IEEE Wireless Comm. and
Networking Conference (WCNC), March 2003.
[105] F. B. Schneider. The state machine approach: a tutorial. Technical Report TR
86-800, Department of Computer Science, Cornell University, December 1986.
Revised June 1987.
[106] B. Schneier. Applied Cryptography. Wiley, 1996.
[107] D. Scott and A. Yasinsac. Dynamic probabilistic retransmission in ad hoc net-
works. In Proc. of the Int. Conference on Wireless Networks (ICWN), pages
158–164, Las Vegas, Nevada, June 2004.
[108] K. Singh, A. Nedos, G. Gaertner, and S. Clarke. Message Stability and Reliable
Broadcasts in Mobile Ad-Hoc Networks. In Proc. of the 4th ADHOC-NOW,
pages 297–310, October 2005.
[109] I. Stojmenovic, M. Seddigh, and J. Zunic. Dominating Sets and Neighbor Elim-
ination Based Broadcasting Algorithms in Wireless Networks. IEEE Transac-
tions on Parallel and Distributed Systems, 13(1):14–25, January 2002.
[110] A. Tanenbaum. Computer Networks (4th edition). Prentice Hall PTR, 2003.
[111] A. S. Tanenbaum. Computer Networks. Prentice Hall, 1996. 3rd Ed.
[112] C. Toh. Ad Hoc Mobile Wireless Networks. Prentice Hall, 2002.
[113] S. Toueg. Randomized Byzantine Agreement. In Proc. 3rd ACM Symposium on
Principles of Distributed Computing, pages 163–178, 1984.
[114] Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, and J.-P. Sheu. The broadcast storm problem
in a mobile ad hoc network. Wireless Networks, 8(2/3):153–167, 2002.
[115] Y.-C. Tseng, S.-Y. Ni, and E.-Y. Shih. Adaptive approaches to relieving broad-
cast storms in a wireless multihop mobile ad hoc networks. In Proc. of the 21st
International Conference on Distributed Computing Systems (ICDCS), pages
481–488, 2001.
[116] Cornell University. JiST/SWANS Java in Simulation Time / Scalable Wireless Ad
Hoc Network Simulator.
[117] R. van Renesse, K. Birman, and S. Maffeis. Horus: A Flexible Group Commu-
nication System. Communications of the ACM, 39(4):76–83, April 1996.
[118] R. van Renesse, Y. Minsky, and M. Hayden. A Gossip Style Failure Detection
Service. In IFIP Intl. Conference on Distributed Systems Platforms and Open
Distributed Processing (Middleware ’98), pages 55–70, April 1998.
[119] P. Veríssimo, N. Neves, and M. Correia. The Middleware Architecture of
MAFTIA: A Blueprint. In Proc. of the IEEE Third Information Survivability
Workshop, October 2000.
[120] E. Vollset and P. Ezhilchelvan. Enabling reliable many-to-many communication
in ad-hoc pervasive environments. In Proc. of the 2nd Intr. Workshop on Mobile
Peer-to-Peer Computing (MP2P), 2005.
[121] C. Wu and Y. Tay. AMRIS: A Multicast Protocol for Ad-Hoc Wireless Networks.
In Proc. of the IEEE MILCOM, Nov. 1999.
[122] J. Wu, M. Gao, and I. Stojmenovic. On Calculating Power-Aware Connected
Dominating Sets for Efficient Routing in Ad Hoc Wireless Networks. In Proc.
of the 30th International Conference on Parallel Processing (ICPP), pages 346–
353, 2001.
[123] J. Wu and H. Li. On Calculating Connected Dominating Sets for Efficient
Routing in Ad Hoc Wireless Networks. In Proc. of the 3rd DialM, pages 7–14,
1999.
[124] S. Yi, P. Naldurg, and R. Kravets. Security Aware Ad Hoc Routing for
Wireless Networks. In Proc. ACM Symposium on Mobile Ad Hoc Networking and
Computing, October 2001.
[125] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin. Separating
Agreement from Execution for Byzantine Fault Tolerant Services. In Proc. of the
19th ACM Symposium on Operating Systems Principles, pages 253–267, 2003.
[126] M. Zapata and N. Asokan. Secure Ad Hoc Routing Protocol. In Proc. ACM
Workshop on Wireless Security, 2002.
[127] Q. Zhang and D. P. Agrawal. Dynamic probabilistic broadcasting in MANETs.
Journal of Parallel and Distributed Computing, 65(2):220–233, 2005.
Dependable Communication Protocols in Ad-Hoc Networks
Vadim Drabkin
Dependable Communication Protocols in Ad-Hoc Networks
Research Thesis
Submitted in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
Vadim Drabkin
Submitted to the Senate of the Technion - Israel Institute of Technology
Sivan 5768, Haifa, June 2008
The research was carried out under the supervision of Assoc. Prof. Roy Friedman in the Faculty of Computer Science.
I thank the Technion for the generous financial support of my studies.
Contents
Abstract in English
Notation and Abbreviations in English
1 Introduction
2 Related Work
2.1 Multicast & Broadcast
2.2 Byzantine Failures & Failure Detectors
2.3 Group Communication & Replication
3 Preliminaries
3.1 Model of Ad-Hoc Networks
3.2 Malicious Failures
3.2.1 Limitations on the Power of Malicious Nodes
3.3 Failure Detectors
3.3.1 Failure Detector Interface
4 Reliable Probabilistic Dissemination in Wireless Ad-Hoc Networks
4.1 Model Definition
4.2 Techniques for Reliable Message Dissemination
4.2.1 Probabilistic Dissemination
4.2.2 Dissemination Based on Message Counting
4.2.3 Slow Gossip
4.3 The RAPID Protocol
4.3.1 Basic RAPID
4.3.2 Extended RAPID
4.3.3 RAPID Resilient to Malicious Failures
4.4 Simulations
4.4.1 Simulation Setup
4.4.2 Results
5 Reliable Overlay-Based Dissemination in Wireless Ad-Hoc Networks
5.1 Model Definition
5.2 Failure Detectors and Node Architecture
5.3 The Dissemination Problem
5.4 The Dissemination Protocol
5.4.1 Details of the Dissemination Mechanism
5.4.2 Details of the Message Recovery Mechanism
5.5 Overlay Maintenance
5.6 Correctness Proof
5.6.1 Protocol Analysis
5.6.2 Fast Dissemination with a Failure Detector That Makes No Mistakes After Some Period
5.6.3 Fast Dissemination with a Failure Detector That Operates over Certain Time Intervals
5.7 Results
6 Group Communication Resilient to Byzantine Attacks
6.1 Model Definition
6.1.1 Basic Concepts . . . . . . . . 81
6.1.2 Virtual Synchrony Resilient to Byzantine Failures . . . . . . . . 83
6.2 Overview of the Solution . . . . . . . . 87
6.2.1 JazzEnsemble and Partial Group Membership . . . . . . . . 87
6.2.2 Partial Failure Detectors . . . . . . . . 89
6.2.3 Reliable Dissemination Within the Group . . . . . . . . 91
6.2.4 Group Membership Management Resilient to Byzantine Attacks . . . . . . . . 91
6.2.5 Total Order . . . . . . . . 101
6.2.6 Efficient Implementation of the Building Blocks of the Group Communication Protocol . . . . . . . . 102
6.3 Results . . . . . . . . 109
7 Summary and Future Directions 115
7.1 Summary of Research Results . . . . . . . . 115
7.2 Future Directions . . . . . . . . 118
A A Practical Application 121
A.1 Future Work . . . . . . . . 122
Bibliography 122
Extended Abstract xi
List of Figures
3.1 Failure detector interface . . . . . . . . 26
4.1 Upper bound on the probability that some node does not receive message m . . . . . . . . 33
4.2 A broadcast by node s is received by all nodes within its transmission range: p, n1, ..., nk . . . . . . . . 35
4.3 Basic RAPID (executed by node p) . . . . . . . . 39
4.4 Extended RAPID (lines changed relative to Figure 4.3 are boxed, and lines 18-27 were added) . . . . . . . . 41
4.5 RAPID resilient to malicious failures (lines changed relative to Figure 4.4 are boxed) . . . . . . . . 44
4.6 Percentage of messages delivered to all nodes when the nodes are mobile, as a function of β . . . . . . . . 51
4.7 Network load when the nodes are mobile, as a function of β . . . . . . . . 51
4.8 Latency to deliver a message to X% of the nodes when all nodes are mobile, as a function of β (with 100 nodes sending new messages) . . . . . . . . 52
4.9 Latency to deliver a message to 98% of the nodes when all nodes are mobile, as a function of β (with 100 nodes sending new messages) . . . . . . . . 52
4.10 Percentage of messages delivered to all nodes when the nodes are mobile, as a function of the number of nodes sending new messages . . . . . . . . 52
4.11 Network load when the nodes are mobile, as a function of the number of nodes sending new messages . . . . . . . . 52
4.12 Latency to deliver a message to X% of the nodes when all nodes are mobile (with 100 nodes sending new messages) . . . . . . . . 53
4.13 Latency to deliver a message to X% of the nodes when all nodes are mobile, as a function of the number of selfish nodes (with 100 nodes sending new messages) . . . . . . . . 53
4.14 Percentage of messages delivered to all nodes, as a function of the number of nodes sending new messages (protocol comparison with mobile and static nodes) . . . . . . . . 53
4.15 Network load as a function of the number of nodes sending new messages (protocol comparison with mobile and static nodes) . . . . . . . . 53
4.16 Percentage of messages delivered to all nodes when the nodes are static, as a function of node density (with 100 nodes sending new messages) . . . . . . . . 54
4.17 Network load when the nodes are static, as a function of node density (with 100 nodes sending new messages) . . . . . . . . 54
5.1 Node architecture . . . . . . . . 60
5.2 Dissemination algorithm resilient to malicious failures . . . . . . . . 63
5.3 Dissemination algorithm resilient to malicious failures (continued) . . . . . . . . 64
5.4 A malicious overlay . . . . . . . . 71
5.5 Percentage of messages delivered to all nodes when the nodes are static . . . . . . . . 75
5.6 Network load when the nodes are static . . . . . . . . 75
5.7 Latency to deliver a message to X% of the nodes when all nodes are static (with 200 nodes sending new messages every second) . . . . . . . . 76
5.8 Latency to deliver a message to X% of the nodes when all nodes are mobile (with 200 nodes sending new messages every second) . . . . . . . . 76
5.9 Percentage of messages delivered to all nodes when the nodes are mobile . . . . . . . . 77
5.10 Network load when the nodes are mobile . . . . . . . . 77
5.11 Percentage of messages delivered to all nodes when the nodes are static, as a function of the number of malicious nodes (out of 200 nodes) . . . . . . . . 78
5.12 Percentage of messages delivered to all nodes when the nodes are static, as a function of the number of malicious nodes (out of 200 nodes) . . . . . . . . 78
5.13 Network load when the nodes are static, as a function of the number of malicious nodes (out of 200 nodes) . . . . . . . . 79
5.14 Network load when the nodes are mobile, as a function of the number of malicious nodes (out of 200 nodes) . . . . . . . . 79
5.15 Latency to deliver a message to X% of the nodes when all nodes are static, as a function of the number of malicious nodes . . . . . . . . 80
5.16 Latency to deliver a message to X% of the nodes when all nodes are mobile, as a function of the number of malicious nodes . . . . . . . . 80
6.1 Node architecture . . . . . . . . 82
6.2 Message headers and data within the Ensemble layers (figure taken from the Ensemble manual) . . . . . . . . 89
6.3 Pseudo-code of the membership protocol . . . . . . . . 92
6.4 Pseudo-code of the suspicion protocol . . . . . . . . 93
6.5 Pseudo-code of the merge protocol . . . . . . . . 96
6.6 Pseudo-code of the flush protocol . . . . . . . . 99
6.7 Main variables maintained by each process pi . . . . . . . . 103
6.8 Byzantine agreement protocol for vectors of values, using ♦Pmute and executed by process pi (n > 6f) . . . . . . . . 104
6.9 Uniform broadcast protocol executed by node pi . . . . . . . . 108
6.10 System throughput in number of messages delivered to the nodes (the line for public-key encryption is barely visible, since it is close to 0 compared with the other lines) . . . . . . . . 110
6.11 Latency (the line for public-key encryption was omitted, since latency with public-key encryption is orders of magnitude larger than the latencies measured without it) . . . . . . . . 110
6.12 System throughput: the cost of total ordering and uniform broadcast of messages, with and without symmetric-key encryption . . . . . . . . 112
6.13 Time until installation of a new view . . . . . . . . 112
A.1 WiPeer architecture . . . . . . . . 122
List of Tables
4.1 Percentage of messages delivered to all nodes and network load, as a function of the number of selfish nodes (with 100 nodes sending new messages) . . . . . . . . 57
6.1 Recovery time from problematic scenarios . . . . . . . . 113
Extended Abstract
In recent years, wireless networks have been studied intensively because of their potential in civilian and military applications alike. An important family of wireless networks is the family of ad-hoc networks.
zxeywz zleki ilra mipwzd ly sqe` xy`k zexvep (MANET ) zeihegl` wed-c` zezyx
yeniya jxev `ll zin`pic dxeva zexvep dl` zezyx .ipyd ly ezaxwa cg` mi`vnp zihegl`
,mixg` mipwzd ly zerced xiardl oken el` zezyxa mipwzddn wlg xy`k .idylk zizyza
.mited zaexn wed-c` zyx zxvep
Wireless ad-hoc networks have the potential to become the next major communication medium, one that will bring about an Internet-like revolution in the way we access, share, and even consume information. This is due to the great flexibility of these networks, their self-management, and the fact that they need no physical infrastructure; consequently, using them requires no investment in infrastructure, is free for the user, and offers high transfer rates. In addition, ad-hoc networks are highly survivable, so beyond their everyday uses they are particularly well suited to disaster areas, to managing the modern battlefield, and to failed and developing countries, where infrastructure either does not exist or is in poor condition.
Mobile ad-hoc networks do not rely on any infrastructure. Routing is performed by the devices themselves, which act as relay stations, thereby creating an ad-hoc network with extended range. Moreover, the network topology changes constantly, and relay stations are selected on demand. In other words, all routing is done in an ad-hoc manner. The routing problem is especially acute in ad-hoc networks: unlike infrastructure-based networks, in which relay stations are usually considered reliable entities, ad-hoc networks have no relay stations that can be trusted. Messages are forwarded by the devices themselves, so there is a reasonable chance that some devices will be unreliable or will even try to sabotage the operation of the network. The presence of malicious or selfish devices (for example, devices that try to conserve their battery by not forwarding the messages of other devices) therefore requires the development of protocols that are resilient to such behavior.
Message dissemination is an important service for many collaborative applications, since it allows any device to easily disseminate information to all the devices in the network. In particular, message dissemination must be efficient and reliable; in other words, it must ensure that most devices in the network receive every message sent through the dissemination service.
However, some collaborative applications require services that are more complex than message dissemination, and they can therefore benefit from a group communication system, which offers a very wide range of services for collaborative communication. A group communication system bundles development toolkits, algorithms, and various services that significantly ease communication among group members and, in particular, the development of collaborative applications. Group communication systems typically allow application developers to add protocols such as reliable dissemination, virtual synchrony, total ordering of messages, and so on. These protocols are usually very complex to develop, so using a group communication system greatly shortens application development time and also increases the reliability of the resulting applications. In recent years, group communication systems have become standard building blocks in academic and commercial cluster systems.
Despite the extensive research invested in group communication systems, most of the systems developed so far assume a simple failure model that usually excludes Byzantine failures. In particular, some group communication systems support signing and authentication of messages, but the vast majority assume that all group members are correct and will not try to harm the system. This stands in sharp contrast to the Byzantine failure model, in which a process may deviate arbitrarily from its protocol. Such a deviation can result from a bug, a hardware problem, or malicious behavior.
One of the main reasons that most group communication systems ignored Byzantine failures is that they were used in cluster systems running on a single LAN, and the prevailing assumption is that devices on the same LAN are correct and can be trusted. Moreover, given this assumption, together with the added complexity of handling Byzantine failures and the significant performance penalty it incurs, most systems were developed without taking Byzantine failures into account.
However, the rise in the number of attacks against computer systems, together with the desire to use group communication systems in additional environments such as ad-hoc networks, has renewed the need for these systems to withstand Byzantine failures. In ad-hoc networks, the chance that a device participating in group communication is Byzantine is no longer negligible. Hence, if we want our system to keep functioning in an ad-hoc environment, it must be resilient to Byzantine failures. We also want to ensure that the performance of the system remains reasonable, since otherwise it would not be useful. In particular, we assume that Byzantine failures are usually relatively rare, and we therefore believe it is important to focus on the performance of the system when no Byzantine failures occur, while still being able to overcome such failures when they do occur and to continue doing useful work.
The specification of a typical group communication system requires every group member to communicate periodically with the other members of the group. In multi-hop networks, however, a device p can receive messages only from devices whose distance from p is smaller than their transmission range. Therefore, some devices must forward messages from other devices to their neighbors. This becomes even more complicated if some devices behave in a Byzantine manner and fail to forward some of the messages. Thus, implementing a group communication system in a multi-hop ad-hoc network requires a protocol that enables reliable dissemination of messages to all the devices in the network.
The simplest way to disseminate messages in a multi-hop network is flooding. That is, the device that generates a message transmits it to all devices within its transmission range; subsequently, each device that receives the message for the first time retransmits it to all devices within its own transmission range. While this form of dissemination is very reliable, it is also very wasteful and can cause a large number of collisions.
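The flooding rule just described can be illustrated with a minimal sketch. It assumes an idealized, collision-free network given as an adjacency map; the function name `flood` and the graph representation are illustrative choices, not part of the thesis.

```python
from collections import deque

def flood(neighbors, source):
    """Simulate flooding on an idealized (loss-free) network.

    neighbors maps each node to the nodes within its transmission range.
    Returns the set of nodes that received the message and the number of
    transmissions performed.
    """
    received = {source}
    transmissions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        transmissions += 1  # every node that holds the message broadcasts it once
        for peer in neighbors[node]:
            if peer not in received:  # first reception triggers a later rebroadcast
                received.add(peer)
                queue.append(peer)
    return received, transmissions

# Four nodes that all hear each other: one transmission would suffice,
# yet flooding performs one transmission per node.
clique = {i: [j for j in range(4) if j != i] for i in range(4)}
reached, sent = flood(clique, 0)
```

In the fully connected example every node rebroadcasts even though the source's single transmission already reached everyone, which is the wastefulness noted above.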
Common alternatives to flooding are probabilistic flooding and flooding over a sub-network (overlay) of devices. In the probabilistic approach, when a device receives a message it performs a local computation to decide whether or not to rebroadcast it. Probabilistic protocols are usually very simple and are also resilient to device failures and mobility. However, for their reliability to be high, the rebroadcast probability must be fairly high, and as a result the number of redundant messages sent remains very large.
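A minimal sketch of the probabilistic approach follows, under the same idealized assumptions (adjacency map, no collisions or losses). The fixed rebroadcast probability `p` and the function name are assumptions made for illustration; they are not taken from the thesis.

```python
import random
from collections import deque

def probabilistic_flood(neighbors, source, p, rng):
    """Flooding in which each non-source node rebroadcasts with probability p.

    Returns the set of nodes that received the message and the number of
    transmissions performed.
    """
    received = {source}
    transmissions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # Local decision: the source always transmits, others flip a biased coin.
        if node != source and rng.random() >= p:
            continue
        transmissions += 1
        for peer in neighbors[node]:
            if peer not in received:
                received.add(peer)
                queue.append(peer)
    return received, transmissions

# A ring of six nodes, each hearing only its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
everyone, cost_full = probabilistic_flood(ring, 0, 1.0, random.Random(1))
nearby, cost_none = probabilistic_flood(ring, 0, 0.0, random.Random(1))
```

With `p = 1.0` the sketch degenerates to plain flooding, and with `p = 0.0` only the source's direct neighbors are reached; intermediate values trade redundant transmissions against the risk that some devices never receive the message.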
Other approaches combine probabilistic flooding with local computations based on the distance from the transmitting device, the number of copies of the same message that the device has overheard, the location of the device, and so on, in order to decide whether the device should rebroadcast the message. These methods do reduce the number of transmissions, but they suffer from high latency in disseminating messages to all devices. They also cannot guarantee reliable dissemination of messages to all devices across different network topologies, and they have difficulty coping with malicious attacks by devices.
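One such local rule, counting overheard copies, can be sketched as follows. This is a simplified illustration, not the thesis's algorithm: the random backoff interval, the `threshold` parameter, and the collision-free network model are all assumptions.

```python
import heapq
import random

def counter_based_flood(neighbors, source, threshold, rng):
    """Counter-based rebroadcast: after a random backoff, a node rebroadcasts
    only if it has overheard fewer than `threshold` copies of the message.
    """
    copies = {node: 0 for node in neighbors}  # copies overheard by each node
    received = {source}
    transmissions = 0
    events = [(0.0, source)]  # (time at which the node decides, node)
    while events:
        time, node = heapq.heappop(events)
        if node != source and copies[node] >= threshold:
            continue  # enough copies overheard during the backoff: stay silent
        transmissions += 1
        for peer in neighbors[node]:
            copies[peer] += 1
            if peer not in received:
                received.add(peer)
                # The backoff is what adds latency to this family of schemes.
                heapq.heappush(events, (time + rng.uniform(0.1, 1.0), peer))
    return received, transmissions

# Five nodes that all hear each other.
clique = {i: [j for j in range(5) if j != i] for i in range(5)}
reached, sent = counter_based_flood(clique, 0, 2, random.Random(0))
```

In this dense example only the source and the first node to finish its backoff transmit; the remaining nodes have already overheard two copies and suppress their rebroadcast. The waiting period needed to count copies is one source of the added latency mentioned above.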
This doctoral thesis deals with adapting group communication systems to wireless ad-hoc networks in a way that preserves their good performance on the one hand, yet makes them resilient to Byzantine attacks on the other. As the basis for our work we used JazzEnsemble, a group communication system adapted to wireless ad-hoc networks. In the course of the work we examined the ways in which hostile devices can harm group communication systems in general and JazzEnsemble in particular, and we developed protocols that are resilient to these attacks. In addition, to extend the communication range between devices in ad-hoc networks, we developed two protocols that enable reliable dissemination of messages to all devices in wireless ad-hoc networks. The first protocol, called BDP, builds and maintains an overlay (a sub-network made up of the devices themselves) in order to deliver messages to all devices in the network. The devices that are members of this overlay are responsible for disseminating all messages to all devices. The advantage of BDP is that the number of devices participating in the overlay is small relative to the number of devices in the original network, so the number of messages sent in order to deliver all messages to all devices is small compared with the alternatives. The second protocol we developed is called RAPID. RAPID is a probabilistic protocol that ensures reliable dissemination of messages in multi-hop ad-hoc networks. To ensure reliable dissemination to all devices, RAPID runs, in addition to the probabilistic dissemination, a deterministic process that checks that all devices have received all messages; if some devices are missing some of the messages, the deterministic protocol ensures that the missing messages are supplied by devices that hold them. This protocol is better suited than BDP to mobile wireless networks, since, unlike BDP, dissemination in RAPID is probabilistic and does not rely on any infrastructure or on the locations of the devices.
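The idea of complementing probabilistic dissemination with a deterministic recovery step can be sketched as below. This is not RAPID's actual mechanism, which is specified in Chapter 4; it is a simplified round-based exchange in which every device learns which messages its neighbors hold and fetches the ones it lacks. The function name, the synchronous rounds, and the `store` representation are assumptions.

```python
def recover_missing(neighbors, store):
    """Round-based recovery: in each round every node obtains from its
    neighbors any message it is missing. `store` maps each node to the set
    of message identifiers it holds and is updated in place.

    Returns the number of rounds in which at least one message was recovered.
    """
    rounds = 0
    changed = True
    while changed:
        changed = False
        # Snapshot so that a message travels at most one hop per round.
        snapshot = {node: set(ids) for node, ids in store.items()}
        for node, peers in neighbors.items():
            for peer in peers:
                missing = snapshot[peer] - store[node]
                if missing:
                    store[node] |= missing
                    changed = True
        if changed:
            rounds += 1
    return rounds

# A line of four nodes; the probabilistic phase reached only nodes 0 and 1.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
store = {0: {"m1"}, 1: {"m1"}, 2: set(), 3: set()}
rounds_needed = recover_missing(line, store)
```

In a connected network of correct devices this exchange eventually supplies every missing message, at the cost of extra rounds; handling devices that withhold messages requires the additional mechanisms developed in the thesis.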
In addition, as part of our research on wireless ad-hoc networks, we developed an application called WiPeer, which allows devices located near one another to form a new wireless network and to communicate with one another without depending on any infrastructure.