Dependable Communication Protocols in Ad-Hoc Networks

Vadim Drabkin

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-05 - 2008

Research Thesis

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Vadim Drabkin

Submitted to the Senate of the Technion — Israel Institute of Technology

Sivan 5768, Haifa, June 2008


This Research Thesis was done under the supervision of Assoc. Prof. Roy Friedman in the Department of Computer Science.

It is my privilege to thank Roy Friedman for his insightful guidance that made this work possible, and for bringing me up as a scientist, researcher and an entrepreneur.

It is a pleasure to thank my colleagues at the Technion, Gabi Kliot and Marc Segal z”l, for fruitful discussions and for the wonderful time we had together.

I feel that no words can express my deep gratitude to my wonderful wife Anna, who gave me unconditional love, support, and understanding through the better and worse times of my research. Also, to my grandparents, Enoch and Ester Tsvaig, my mother, Larisa Drabkin, and to my sister Regina Bricker, to whom I owe my life and education from the very beginning.

The generous financial help of the Technion is gratefully acknowledged.


Contents

Abstract

Notation and Abbreviations

1 Introduction

2 Related work

2.1 Multicast & Broadcast

2.2 Byzantine Failures & Failure Detectors

2.3 Group Communication & Replication

3 Preliminaries

3.1 Ad-Hoc Networks Model

3.2 Malicious Failures

3.2.1 Limitations on the power of the malicious nodes

3.3 Failure Detectors

3.3.1 Interfacing with the Failure Detectors

4 Reliable Probabilistic Dissemination in Wireless Ad-Hoc Networks

4.1 System Model

4.2 Common Reliable Dissemination Techniques

4.2.1 Probabilistic Flooding

4.2.2 Counter Based Broadcast

4.2.3 Lazy Gossip

4.3 The RAPID Protocol

4.3.1 Basic RAPID

4.3.2 Enhanced RAPID

4.3.3 Maliciousness Resilient RAPID

4.4 Simulations

4.4.1 Setup

4.4.2 Results

5 Overlay Based Reliable Broadcast in Wireless Ad-Hoc Networks

5.1 System Model and Definitions

5.2 Failure Detectors and Nodes’ Architecture

5.3 The Broadcast Problem

5.4 The Dissemination Protocol

5.4.1 The Dissemination Task in Detail

5.4.2 Gossiping and Message Recovery in Detail

5.5 Overlay Maintenance

5.6 Correctness Proof

5.6.1 Protocol Analysis

5.6.2 Fast Dissemination with Eventually Perfect Failure Detectors

5.6.3 Fast Dissemination with Interval Failure Detectors

5.7 Results

6 Byzantine Resilient Group Communication

6.1 Model, Assumptions and Problem Statement

6.1.1 Basic Concepts

6.1.2 Byzantine Virtual Synchrony

6.2 Overview of the Solution

6.2.1 JazzEnsemble and Fuzzy Membership

6.2.2 Fuzzy Mute and Fuzzy Verbose Failure Detectors

6.2.3 Intra-View Reliable Delivery

6.2.4 Byzantine Membership Maintenance

6.2.5 Total Ordering

6.2.6 Efficient Implementations of building blocks

6.3 Performance Evaluation

7 Summary and Future Directions

7.1 Summary of Results

7.2 Future Directions

A Practical Application

A.1 Future Work

References

Hebrew Abstract


List of Figures

3.1 Failure Detectors’ Interface

4.1 An upper bound on the probability that an arbitrary node does not receive a message m

4.2 A transmission by a node s can be received by all nodes within its transmission range: $p, n_1, \ldots, n_k$

4.3 Basic RAPID (executed by node p)

4.4 Enhanced RAPID (lines that were modified w.r.t. Figure 4.3 are boxed while lines 18–27 were added)

4.5 Maliciousness Resilient RAPID (lines that were modified w.r.t. Figure 4.4 are boxed)

4.6 Message delivery ratio when all nodes are mobile vs. varying values of β

4.7 Network load in terms of total number of transmissions when all nodes are mobile vs. varying values of β

4.8 Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes)

4.9 Latency to deliver a message to 98% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes)

4.10 Message delivery ratio when all nodes are mobile vs. varying number of broadcasting nodes

4.11 Network load in terms of total number of transmissions when all nodes are mobile vs. varying number of broadcasting nodes

4.12 Latency to deliver a message to X% of the nodes when all nodes are mobile (with 100 broadcasting nodes)

4.13 Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying number of selfish nodes (with 100 broadcasting nodes)

4.14 Message delivery ratio vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments)

4.15 Network load in terms of total number of transmissions vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments)

4.16 Message delivery ratio when all nodes are static vs. varying density (with 100 broadcasting nodes)

4.17 Network load in terms of total number of transmissions when all nodes are static vs. varying density (with 100 broadcasting nodes)

5.1 A node’s architecture

5.2 Malicious Resilient Dissemination Algorithm

5.3 Malicious Resilient Dissemination Algorithm – continued

5.4 Malicious overlay

5.5 Message delivery ratio when all nodes are static

5.6 Network load in terms of total number of messages sent when all nodes are static

5.7 Latency to deliver a message to X% of the nodes when all nodes are static (with 200 broadcasting nodes that send one message per second)

5.8 Latency to deliver a message to X% of the nodes when nodes are mobile (with 200 broadcasting nodes that send one message per second)

5.9 Message delivery ratio when all nodes are mobile

5.10 Network load in terms of total number of messages sent when nodes are mobile

5.11 Message delivery ratio when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)

5.12 Message delivery ratio when nodes are mobile vs. varying number of malicious nodes (out of a total of 200 nodes)

5.13 Network load when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes)

5.14 Network load when nodes are mobile with varying number of malicious nodes (out of a total of 200 nodes)

5.15 Latency to deliver a message to X% of the nodes when all nodes are static vs. varying number of malicious nodes

5.16 Latency to deliver a message to X% of the nodes when nodes are mobile vs. varying number of malicious nodes

6.1 A node’s architecture

6.2 Message headers and data in layers (drawing taken from Ensemble’s reference manual)

6.3 Pseudo Code of Membership Protocol

6.4 Pseudo Code of Suspicion Protocol

6.5 Pseudo Code of Merge Protocol

6.6 Pseudo Code of FLUSH Protocol

6.7 Main variables held by each process $p_i$

6.8 $\Diamond P_{mute}$-Based Vector Byzantine Consensus Protocol Executed by $p_i$ ($n > 6f$)

6.9 Uniform Broadcast Protocol Executed by $p_i$ ($n > 6f$)

6.10 Throughput measurements (the line for public key cryptography is hardly visible, as it is so close to 0 compared with the other lines)

6.11 Latency measurements (the line for public key cryptography is dropped since it is orders of magnitude higher than the others)

6.12 Throughput Measurements: the cost of total ordering and uniform broadcast with and without symmetric-key cryptography

6.13 Time to establish a new view

A.1 WiPeer architecture


List of Tables

4.1 Delivery ratio and message count vs. the number of selfish nodes (with 100 broadcasting nodes)

6.1 Recovery time from problematic scenarios


Abstract

Mobile Ad-Hoc Networks (MANETs) are networks of mobile devices that are formed in an ad-hoc manner. The devices that participate in such networks have wireless communication capabilities with limited-range transmitters, and thus can directly communicate with other devices that are within their range. Some of the devices occasionally volunteer to forward some of the messages they receive, or in other words, act as routers, thereby forming a multi-hop network with a wider reach. Yet, there is no fixed infrastructure, the network is continuously changing, and routers are elected on demand. In other words, the networking issues are handled in an ad-hoc manner.

Unlike infrastructure-based networks, in which routers are usually considered to be trusted entities, in ad-hoc networks routing is performed by the devices themselves. Thus, there is a high risk that some of the nodes of an ad-hoc network will not respect the networking protocols. This can be due to maliciousness, or simply selfishness (trying to save battery power). The possibility of having faulty nodes in the system therefore motivates the development of reliable broadcast protocols for ad-hoc networks.

Group communication systems have proven themselves as powerful middleware for building reliable networked applications in wired environments. These systems relieve programmers from many of the tedious and highly complex issues involved in designing such applications, allowing them to focus on the essential aspects of the application being developed, resulting in faster development time and fewer bugs. During the last few years, group communication systems have become standard building blocks in many clustering and replication products in both industry and academia.

In this thesis we present a novel ReliAble ProbabIlistic Dissemination protocol, called RAPID, for mobile wireless ad-hoc networks, which tolerates message omissions, node crashes, and selfish behavior. The protocol employs a combination of probabilistic forwarding with deterministic corrective measures. The forwarding probability is set based on the observed number of nodes in each one-hop neighborhood, while the deterministic corrective measures include deterministic gossiping as well as timer-based corrections of the probabilistic process. These aspects of the protocol are motivated by a theoretical analysis, which explains why this unique protocol design is inherent to ad-hoc network environments. Since the protocol relies only on local computations and probability, it is highly resilient to mobility and failures. By adding authentication, it can even be made tolerant of malicious nodes.

As an additional contribution, we present an efficient overlay-based reliable broadcast protocol for ad-hoc networks. The use of an overlay results in a significant reduction in the number of messages. The protocol overcomes different types of node failures by combining digital signatures, gossiping of message signatures, and failure detectors.

Last, we present and explore a reliable group communication system for wireless mobile ad-hoc networks. The objective is to enable easy, efficient, and correct development of applications in this environment. The main challenge was to develop protocols that are resilient to attacks by malicious nodes, yet are scalable, robust, and efficient. This included adapting several existing protocols as well as developing new membership maintenance mechanisms, followed by a thorough benchmarking of the system.


Notation and Abbreviations

$r_p$ : Transmission range of node p

$N^t(1, p)$ : Set of direct neighbors of a node p at time t

$N^t(k, p)$ : A transitive closure with length k of $N^t(1, p)$ at time t

OVERLAY : Set of nodes that belong to the overlay

$OL^t(1, p)$ : Set of direct neighbors of a node p that belong to the overlay at time t

$OL^t(k, p)$ : A transitive closure with length k of $OL^t(1, p)$ at time t

β : Required number of forwarders of a message

$d_{avg}$ : The average number of neighbors of any node

Q : The probability that a message is successfully received by a neighboring node

P : The probability of broadcasting a message

$k_p$ : The private key of device p

f : The number of failures in the system

Mute : Failure detector that detects mute failures

Verbose : Failure detector that detects verbose failures

Mute : Failure detector that detects node failures

$I_{mute}$ : Interval failure detector that detects mute failures

$I_{verbose}$ : Interval failure detector that detects verbose failures
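As an illustration of the neighborhood notation above, the following Python sketch computes a k-hop neighborhood from a snapshot of one-hop neighbor sets. It reads $N^t(k, p)$ as the set of nodes reachable from p in at most k hops at time t, excluding p itself. That reading, the dictionary representation, and the function name `k_hop_neighborhood` are assumptions made for this example, not definitions taken from the thesis.

```python
from collections import deque


def k_hop_neighborhood(one_hop, p, k):
    """Return the nodes reachable from p in at most k hops (p excluded).

    one_hop maps each node to its direct neighbors at one instant in time,
    i.e. a snapshot of N^t(1, .) under the assumptions stated above.
    """
    distance = {p: 0}
    queue = deque([p])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in one_hop.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {q for q in distance if q != p}


# A four-node chain a - b - c - d (hypothetical topology).
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(k_hop_neighborhood(chain, "a", 2))
```

The same function applies to the overlay variant $OL^t(k, p)$ if the dictionary is restricted to overlay nodes, again under the reading assumed here.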


Chapter 1

Introduction

Wireless mobile ad-hoc networks (MANETs) are formed when an ad-hoc collection of devices equipped with wireless communication capabilities happen to be in proximity to each other [112]. When some of these devices agree to forward messages for other devices, a multi-hop network is formed. One of the aspects of ad-hoc networks is that they are formed without any pre-existing infrastructure or management authority. Also, due to mobility, the physical structure of the network is continuously evolving.

MANETs offer a potential for a variety of new applications and improved services for mobile users, especially as the computing power of mobile devices grows. Example applications include interactive distributed games, ad-hoc transactions and e-commerce, collaborative applications (shared white-board and video conferencing), and enhancing the bandwidth and reach of cellular communication (e.g., for Wi-Fi enabled cell-phones) [62, 63].

Broadcast is a basic service for many collaborative applications, as it enables any device to disseminate information to all other participants in the network. In particular, a useful broadcast service should be both efficient and provide a good level of reliability, meaning that most nodes in the system will receive almost every broadcast message.

Yet, some of the applications mentioned above require stronger semantics than broadcast and can benefit from a group communication service that provides the application a wider choice of fine-grained semantics. Group communication (GC) is an umbrella term that describes a wide variety of toolkits, algorithms, and services that promote the ease of communication among a number of nodes [15]. The GC toolkits that have been developed typically serve as middleware, enabling developers to easily add communication primitives such as reliable multicast, virtual synchrony, and total ordering to their applications. These protocols are typically not trivial to implement, thus contributing to the popularity of the component model for such toolkits, where developers can simply select the services that are required for their particular task and use them in a transparent, abstract manner. In recent years, group communication toolkits have become standard building blocks in commercial and academic clustering systems.

Despite the large body of work on group communication, most systems have assumed a relatively benign failure model, which largely excludes Byzantine failures [74]. In particular, while some group communication systems support signatures and authentication of messages, the vast majority assume that all group members can be trusted. In contrast, under the Byzantine failure model, a process can deviate arbitrarily from its protocol. This can be the result of a bug or hardware malfunction, or due to malicious behavior.

One of the main reasons why most systems ignore Byzantine faults is that group communication has been largely used to coordinate clustered applications, all running within the same LAN. It is often assumed that in such closed environments, all participants can be trusted. In particular, when combining this assumption with the performance hit and protocol complexity associated with accommodating Byzantine failures, most projects opted not to handle such failures.

However, given the rise in security attacks against computer systems, as well as the desire to utilize group communication in new application domains, such as ad-hoc networks [112], the need for Byzantine tolerance re-emerges. This is because the likelihood that a node might be compromised is no longer negligible. Thus, if we want the system to remain robust in these situations, it must be able to tolerate Byzantine failures. On the other hand, we would like to ensure reasonable performance, since otherwise the system would be useless. In particular, we assume that the occurrence of Byzantine faults is rare, and thus we believe that it is important to focus on the performance of the system during normal runs, when there are no Byzantine faults. Yet, of course, the system must still be able to recover from Byzantine faults and do useful work when they occur.

The design of traditional group communication systems requires each group member to communicate periodically with all other group members. Yet, in multi-hop networks, a node p can receive messages only from nodes whose distance from p is less than their transmission range. Therefore, some nodes will have to forward messages from other nodes to their neighbors. The situation is even more complicated if some of the nodes choose to behave in a Byzantine manner and not forward these messages. Thus, in order to implement a group communication system in multi-hop networks, there is a need for a reliable broadcast protocol.

The simplest way to obtain broadcast in a multi-hop network is by employing flooding [111]. That is, the sender sends the message to everyone in its transmission range. Each device that receives a message for the first time delivers it to the application and also forwards it to all other devices in its range. While this form of dissemination is very robust, it is also very wasteful and may cause contention and a large number of collisions [114].
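The flooding rule described above can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions: the network is a static graph given as a dictionary of neighbor lists, every transmission reaches all neighbors, and collisions and losses are not modeled. The function name `flood` and the example topology are hypothetical.

```python
from collections import deque


def flood(neighbors, source):
    """Simulate plain flooding on a static graph.

    Returns (delivered, transmissions, receptions). Every node that gets
    the message for the first time delivers it and rebroadcasts it once.
    """
    delivered = {source}
    transmissions = 0
    receptions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        transmissions += 1  # one local broadcast by this node
        for neighbor in neighbors[node]:
            receptions += 1  # every neighbor in range hears the broadcast
            if neighbor not in delivered:
                delivered.add(neighbor)
                queue.append(neighbor)
    return delivered, transmissions, receptions


# Three nodes that are all within range of each other (hypothetical).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
delivered, transmissions, receptions = flood(triangle, 0)
redundant = receptions - (len(delivered) - 1)
print(len(delivered), transmissions, receptions, redundant)
```

In this idealized model every node transmits exactly once, so the number of receptions grows with the sum of node degrees. This is the redundancy that the overlay-based and probabilistic alternatives try to reduce.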

Common alternatives to flooding are either to perform a constrained flooding on top of a deterministic overlay, e.g., [75, 109, 121, 123], or to perform a probabilistic flooding, e.g., [57, 76].

In the probabilistic approach, whenever a node receives a message, it applies some locally computable probabilistic mechanism to randomly determine whether it should broadcast the message or not [24, 57, 76]. Probabilistic protocols are appealing since they are very simple and are inherently robust to failures and mobility. Yet, as was discovered in [57, 76, 104], in order to obtain very high reliability levels with pure probabilistic broadcasting, one has to set the retransmission probability to relatively high values. Consequently, such schemes still generate a large number of redundant messages.

Other approaches [24, 57, 114, 115] combine probabilistic forwarding with some additional locally computable mechanism, such as counter-based, distance-based, location-based, or any combination of those, to determine whether a node should rebroadcast the message or not. That way, the number of messages is further reduced. Yet, those protocols suffer from increased latency. Finally, those schemes cannot ensure high reliability for arbitrary topologies, and cannot cope with selfish (and malicious) behavior.

Our work began by investigating the various aspects of designing and enhancing an effective group communication toolkit for MANETs that overcomes Byzantine failures. Soon, we understood that some membership protocols require reliable dissemination of messages to all group members in multi-hop wireless networks. Having the reliable dissemination building block enabled us to concentrate on group communication that overcomes Byzantine failures in single-hop networks. For didactic reasons we first present two flavors of reliable broadcast protocols: RAPID, a reliable probabilistic broadcast protocol, and BDP, an overlay-based broadcast protocol. Then we turn our attention to a group communication system that tolerates Byzantine failures.

Our contributions are:

• We demonstrate an efficient reliable probabilistic broadcast protocol for wireless ad-hoc networks. The protocol employs a combination of probabilistic forwarding with deterministic corrective measures. The forwarding probability is set based on the locally observed density of the network. Additionally, we employ timer-based corrections that may cause a node to change its decision on whether to broadcast a message or not. Finally, the protocol employs a deterministic gossip-based mechanism that recovers messages that were not delivered by the probabilistic dissemination.

• We analyze the relationship between the number of nodes that rebroadcast messages in each one-hop neighborhood in a probabilistic dissemination protocol and the expected reliability of this protocol (in other words, the percentage of nodes that will receive the message). We show that there is an optimal number, in the sense that this number of retransmitting nodes, which is relatively small, is enough to ensure good reliability, and that this number does not depend on the network’s density.

• We present an efficient overlay-based reliable broadcast protocol for wireless ad-hoc networks. The protocol overcomes malicious failures by combining digital signatures, gossiping of message signatures, and failure detectors. These ensure that messages dropped or modified by malicious nodes will be detected and retransmitted and that the overlay will eventually consist of enough correct processes to enable message dissemination. An appealing property of the protocol is that it only requires the existence of one correct node in each one-hop neighborhood.

• We develop a Byzantine resilient membership protocol. In addition, we analyze the sources of performance degradation associated with various aspects of overcoming Byzantine failures.

• We present a novel Byzantine uniform broadcast protocol that terminates in two communication steps (instead of three) at the cost of requiring f < n/6.

• We present a novel vector-oriented Byzantine consensus protocol that allows the processes to decide in one communication step in favorable circumstances. The protocol is failure-detector-based and assumes f < n/6.

• We demonstrate the design of a practical application that uses the JazzEnsemble group communication toolkit and targets wireless mobile ad-hoc networks (MANETs).
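To make the resilience condition f < n/6 in the list above concrete, the sketch below computes the largest number of faulty processes f that a group of n processes can tolerate under a bound of the form f < n/c. The function name is hypothetical, and the comparison with c = 3 refers to the classical bound for Byzantine agreement, which is a general result and not something stated in this chapter.

```python
def max_tolerated_faults(n, ratio):
    """Largest integer f such that f < n / ratio, i.e. n > ratio * f."""
    if n < 1 or ratio < 1:
        raise ValueError("n and ratio must be positive integers")
    # ratio * f < n  <=>  ratio * f <= n - 1  <=>  f <= (n - 1) // ratio
    return (n - 1) // ratio


for n in (4, 7, 13, 25):
    fast = max_tolerated_faults(n, 6)       # f < n/6, as in the list above
    classical = max_tolerated_faults(n, 3)  # f < n/3, classical bound
    print(f"n={n}: f<n/6 allows {fast}, f<n/3 allows {classical}")
```

Under this reading, a group needs at least 7 members before the f < n/6 condition admits even a single faulty process, which is the price paid for the shorter communication pattern.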

Road-map: The rest of this thesis is organized as follows. Chapter 2 discusses relevant related work such as broadcast protocols and group communication systems. Chapter 3 presents the ad-hoc networks model. Chapter 4 describes the reliable probabilistic broadcast protocol, and Chapter 5 describes the overlay-based broadcast protocol. The details of the Byzantine JazzEnsemble protocols are given in Chapter 6. Finally, Chapter 7 summarizes the thesis and outlines potential future work.


Chapter 2

Related work

2.1 Multicast & Broadcast

A good survey of broadcast and multicast protocols for wireless ad-hoc networks can be found in [112]. In particular, (multicast) routing in MANETs can be classified into proactive, e.g., OLSR [31], reactive, e.g., AODV [92] and DSR [65], and mixtures of both, e.g., ZRP [56], as well as geographic routing [66, 71, 77, 98]. These protocols, however, ignore Byzantine failures.

The simplest probabilistic broadcast protocol is probabilistic flooding [57, 114]. In this scheme, each node rebroadcasts a message with a predefined probability P. Works by Haas et al. [57] and Sasson et al. [104] study the rebroadcasting probability P with regard to the so-called phase transition phenomenon. Both works establish that the delivery distribution has a bimodal behavior with regard to some threshold probability, in the sense that for any P above the threshold almost all nodes will receive the message, and for any P below it almost none will. Both works show that the threshold probability is around 0.59 to 0.65; in [104] this is done analytically based on percolation theory, while in [57] it is obtained by simulations. It is also noted in [57] that the threshold probability depends on node density, yet without providing any theoretical means to evaluate this dependence. We study the delivery distribution using probabilistic methods in Section 4.2. We show that, under a few probabilistic assumptions, the delivery distribution function behaves in a concave manner rather than being bimodal. That is, node coverage initially grows fast with P. Then, at some critical point, the added coverage becomes negligible with further increase of P. Our protocol is designed with corrective measures that compensate for situations in which the simplifying assumptions do not hold.
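The dependence of delivery on the rebroadcasting probability P can be explored with a small simulation. The sketch below is only illustrative and rests on several assumptions that are not taken from the cited works: nodes sit on a 4-connected grid instead of a radio model, transmissions are never lost, the source always transmits, and the helper names are hypothetical. It is not meant to reproduce the thresholds reported in [57, 104].

```python
import random
from collections import deque


def grid_neighbors(width, height):
    """Build a 4-connected grid as a dict: node -> list of neighbors."""
    neighbors = {}
    for x in range(width):
        for y in range(height):
            candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            neighbors[(x, y)] = [
                (a, b) for a, b in candidates
                if 0 <= a < width and 0 <= b < height
            ]
    return neighbors


def probabilistic_flood(neighbors, source, p, rng):
    """Each node that first receives the message rebroadcasts with prob. p."""
    delivered = {source}
    transmissions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # The source always transmits; every other node flips a biased coin.
        if node != source and rng.random() >= p:
            continue
        transmissions += 1
        for neighbor in neighbors[node]:
            if neighbor not in delivered:
                delivered.add(neighbor)
                queue.append(neighbor)
    return delivered, transmissions


grid = grid_neighbors(10, 10)
rng = random.Random(42)  # fixed seed so that runs are repeatable
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    runs = [probabilistic_flood(grid, (0, 0), p, rng) for _ in range(200)]
    ratio = sum(len(d) for d, _ in runs) / (200 * len(grid))
    print(f"P={p:.1f} average delivery ratio={ratio:.2f}")
```

Replacing the grid with a denser topology is a simple way to observe how the useful range of P shifts with node density.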

Other probabilistic approaches [57, 85, 114, 115] include counter-based, distance-based, and location-based mechanisms. The main idea in these schemes is that the additional space coverage obtained by each additional broadcast decreases with the number of broadcasts. For example, [57] presents a variant of the probabilistic protocol in which every node monitors the transmissions of its neighbors and rebroadcasts a message if it has not heard M transmissions of the same message. Yet, those protocols suffer from increased latency due to the packet delay introduced at each hop (as explained in Section 4.3.2), and none of them guarantees reliable dissemination of messages to all nodes (as explained in Section 4.3.2). In contrast, the RAPID protocol, presented in this thesis, guarantees reliable dissemination in any topology.
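The counter-based rule mentioned above can be illustrated with a small event-driven simulation. This is a sketch under assumptions that do not come from [57]: each node waits a uniformly random delay after its first reception, the first reception itself counts toward the threshold M, the graph is static, and transmissions are never lost. The function and parameter names are hypothetical.

```python
import heapq
import random


def counter_based_broadcast(neighbors, source, m_threshold,
                            max_delay=1.0, seed=0):
    """Simulate a counter-based broadcast.

    A node rebroadcasts when its random delay expires only if it has heard
    fewer than m_threshold copies of the message by then.
    Returns (delivered, transmitters).
    """
    rng = random.Random(seed)
    heard = {source: 0}           # copies overheard so far, per node
    scheduled = {source}          # nodes that already have a pending timer
    transmitters = []
    events = [(0.0, 0, source)]   # (expiry time, tie-breaker, node)
    tie = 1
    while events:
        now, _, node = heapq.heappop(events)
        # The source always transmits; others apply the counter rule.
        if node != source and heard[node] >= m_threshold:
            continue
        transmitters.append(node)
        for neighbor in neighbors[node]:
            heard[neighbor] = heard.get(neighbor, 0) + 1
            if neighbor not in scheduled:
                scheduled.add(neighbor)
                expiry = now + rng.uniform(0.0, max_delay)
                heapq.heappush(events, (expiry, tie, neighbor))
                tie += 1
    return set(heard), transmitters


# On a chain 0 - 1 - 2 - 3 a low threshold stops the message early.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for m in (1, 2):
    delivered, transmitters = counter_based_broadcast(chain, 0, m)
    print(m, sorted(delivered), transmitters)
```

The chain example shows, in this simplified model, why such schemes cannot guarantee delivery on arbitrary topologies: a node that suppresses its rebroadcast may be the only path to the rest of the network. The random delay before each decision is also where the extra per-hop latency comes from.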

The works in [107, 127] utilize an adapted probabilistic flooding that makes use of local density. These approaches are based on the observation that the retransmission probability P should be adjusted relative to the local node density. In [127] this is done through counters, while in [107] uniform density is assumed. However, those works contain little theoretical analysis of the proposed schemes and, like other counter-based schemes, can also fail to provide reliability on certain topologies. To the best of our knowledge, our work is the first to provide a theoretical analysis of the optimal usage of node density in order to set P.

The work in [24] studies three variants of the above ideas. The first is to retransmit with probability $k/n_i$, where k is some constant and $n_i$ is the size of the neighborhood. The second method is based on having each node learn its 2-hop neighborhood and then compute the rebroadcasting probability based on the intersections of 1-hop neighborhoods. The final scheme in [24] also computes the probability according to $k/n$, but adds a mechanism in which, if a node suspects that some of its neighbors did not receive the message, it rebroadcasts the message regardless of its initial decision. Unlike the work in [24], we formally analyze the value of k. Also, we include a gossip and recovery mechanism, whereas none of the protocols in [24] do so. Consequently, RAPID is more reliable than any of the schemes of [24]. Moreover, RAPID has a variant that can deal with many forms of malicious behavior while the other protocols do not.
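The density-based rule $k/n_i$ discussed above can be sketched as follows. The point of the example is that, when every node in a neighborhood of size $n_i \ge k$ forwards independently with probability $k/n_i$, the expected number of forwarders in that neighborhood is k regardless of density. Capping the probability at 1, the handling of an empty neighborhood, and the function names are assumptions made for this illustration.

```python
import random


def forwarding_probability(k, neighborhood_size):
    """Rebroadcast probability k / n_i, capped at 1 (cap is an assumption)."""
    if neighborhood_size <= 0:
        return 1.0  # assumed fallback when no neighbors are known
    return min(1.0, k / neighborhood_size)


def count_forwarders(k, neighborhood_size, rng):
    """Sample how many of the nodes in one neighborhood decide to forward."""
    p = forwarding_probability(k, neighborhood_size)
    return sum(1 for _ in range(neighborhood_size) if rng.random() < p)


rng = random.Random(7)  # fixed seed so that runs are repeatable
for size in (5, 20, 80):
    trials = [count_forwarders(3, size, rng) for _ in range(2000)]
    average = sum(trials) / len(trials)
    print(f"n_i={size} p={forwarding_probability(3, size):.3f} "
          f"average forwarders={average:.2f}")
```

The sampled number of forwarders still varies around k from one neighborhood to another. Deterministic corrective measures, such as the gossip and recovery mechanism mentioned above, address the cases where too few nodes happen to forward.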


The color-based scheme has been recently proposed in [68]. In this scheme, each node

forwards a message if it can assign it a color from a given pool, which it has not already

overheard after a random time. Using geometric analysis, they have shown that the size of

the rebroadcasting group is within a small constant factor of the optimum. The color-based

scheme is actually an advanced type of a counter-based scheme, and thus incurs similar

latencies and does not guarantee high reliability on arbitrary topologies. The bounds on

the size of the rebroadcasting set in homogeneous dense networks in [68] are similar to our

analysis in Section 4.2.1. Yet, our analysis is much simpler and holds for every probabilistic

algorithm that picks nodes uniformly at random in a homogeneous network, while their analysis

only holds for color-based schemes.

A number of works have been designed to provide a reliable dissemination of messages

to all nodes. An approach called Mistral tries to compensate for missing messages in

probabilistic dissemination by using forward error correction techniques [93]. In contrast,

our approach to recovery of messages is based on gossip. Also, Mistral cannot cope with

malicious behavior.

Demers et al. were the first to use gossip in the context of replicated databases in [34].

This idea was later adopted and extended in followup works such as the MNAK layer of

the Ensemble system in 1996 [58]. Additionally, randomized gossip has been used as a

method of ensuring reliable delivery of broadcast/multicast messages while maintaining

high throughput in the PBcast/Bimodal work [14] as well as in several followup papers,

e.g., [40]. In a way, the idea in our work is the inverse of the idea in the PBcast/Bimodal work.

In PBcast/Bimodal, each node deterministically sends every message to all the nodes

and later gossips about the existing messages with a random subset of nodes. Conversely,

in RAPID each node disseminates the messages to a random set of nodes (chosen among

its physical neighbors) and later deterministically gossips about the existing messages with

all its neighbors.

A generic framework for presenting gossip protocols was proposed in [64]. In particular,

it highlighted the advantages of designing gossiping protocols using a pull-push approach

for higher reliability. This framework was later extended to ad-hoc networks in [12, 53].

An example of a protocol for ad-hoc networks that uses a pull-push approach and is easily

expressed in the above framework is [79]. Both RAPID and BDP can also be seen as specific



instantiations of pull-push dissemination.

An additional protocol for reliable broadcast and manycast in ad hoc networks called

Scribble has been proposed in [120]. In Scribble, the responsibility for dissemination ini-

tially rests with the manycast originator, which periodically broadcasts the message, and is

subsequently passed around to other nodes. The termination condition in Scribble is deter-

mined by piggybacking a bit vector for all known nodes that have received the broadcast

message. Scribble does not employ probabilistic mechanisms and thus suffers from increased

latency and higher message overhead.

Another work that proposed a gossip-based multicast protocol resistant to DoS attacks,

Drum, was presented in [10]. Drum focuses on multicast only, and as a gossip-based protocol,

it relies on a high level of redundancy. Drum achieves DoS resistance using a combination

of pull and push operations, separate resource bounds for different operations, and the use

of random ports in order to reduce the chance of a port being attacked.

Random walk techniques have also been used to maintain group membership in ad-hoc

networks [36], as well as reliable multicast. Yet, these services only provide probabilistic

guarantees, with a probability that is based on the network density and the cover time of

the random walk agent.

Spanning tree based overlays have often been used as the main scheme for disseminating

messages to large groups, e.g., in IP multicast [90, 111] and in the MBone [42, 80]. More

sophisticated overlays such as hypercubes and Harary graphs have been explored, e.g.,

in [47, 78], as well as distributed hash tables like SCRIBE [101].

There has been a lot of work on securing point-to-point routing schemes against mali-

cious nodes. One example is the protocol presented in [7]. In this work, the authors describe

a mechanism for detecting malicious faults along a path and then discovering alternative

paths. Another secure routing protocol (SRP) has been proposed in [88]. SRP requires a

secure association between each pair of source and destination but assumes that Byzantine

nodes do not collude. Yet another protocol, SMT [89], protects pairwise communication by

breaking the message into several pieces based on a coding scheme that allows reconstruct-

ing the message even when some pieces are lost. Each piece is then sent along a different

path. Additional examples of secure point-to-point routing include, e.g., [103, 124, 126].

The work of Minsky and Schneider [84] explored disseminating information using gossip



in wired networks, when some nodes can be faulty. This is done by trusting only gossips that have

gained the support of at least f + 1 nodes, where f is the number of potential Byzantine

nodes. Several other works have also proposed a Byzantine multicast scheme that sends a

message along f + 1 distinct paths [33, 81]. Similarly, [16] has studied how to reduce the

possibility of interception by using multiple paths chosen in a stochastic manner.
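
The f + 1 support rule described above can be sketched as follows. This is only an illustration of the counting argument (with at most f Byzantine nodes, f + 1 distinct supporters include at least one correct node). It assumes that senders are authenticated, so that one node cannot vote under several identities; the class and method names are hypothetical and do not come from [84].

```python
from collections import defaultdict


class GossipFilter:
    """Accept a gossiped item only once f + 1 distinct nodes have vouched for it."""

    def __init__(self, f: int):
        self.f = f
        self._supporters = defaultdict(set)  # item -> set of sender ids

    def on_gossip(self, sender: str, item: str) -> bool:
        """Record that `sender` supports `item`; return True if it is now trusted."""
        self._supporters[item].add(sender)  # repeated votes by one sender count once
        return len(self._supporters[item]) >= self.f + 1
```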

Reliable Byzantine tolerant broadcast and multicast in networks where all nodes can

communicate directly with each other has been formally described in [18], and has been

explored, e.g., in [82]. Additionally, Byzantine tolerant atomic broadcast in general network

topologies that maintain connectivity between correct nodes has been investigated in [33].

Also, the works in [9, 20] have proposed a formal framework for defining and implementing

reliable multicast protocols in a hybrid failure environment (Byzantine, crash, and omission)

based on modern cryptography. In particular, they have investigated the computational

complexity of such protocols.

A framework for fault-tolerance by adaptation was proposed in [28]. In this framework, a

simple protocol is run during normal operation alongside some failure detection mechanism.

Once a failure is detected, the execution switches to a masking protocol. This idea was

demonstrated in [28] on the broadcast problem, which results in a somewhat similar solution

to BDP protocol. However, in [28] it was not mentioned how the overlay (a tree in their

case) is constructed and maintained. Also, the masking protocol was flooding, whereas we

avoid flooding even when failures are detected. Instead, in BDP, local message recovery

is first attempted. Moreover, in [28] it was not explained when and how to return to the

simple protocol once a failure is compensated for. Finally, our work encapsulates failure

detection behind failure detectors, which results in a modular implementation.

A reliable broadcast protocol in Mobile Ad-Hoc Networks that uses message depen-

dencies was presented in [108]. The protocol does not require explicit acknowledgements;

instead message dependencies provide implicit acknowledgements. Additionally, their pro-

tocol can be used to determine message stability, an important property for practical im-

plementations of a broadcast protocol.



2.2 Byzantine Failures & Failure Detectors

Byzantine failures were introduced by Lamport, Shostak, and Pease in the context of

synchronous systems in [74]. Byzantine failures have been mostly studied in the context of

the consensus problem. The first randomized protocols to solve consensus in asynchronous

Byzantine systems have been proposed by Ben-Or [13] and Rabin [94]. Both Toueg and

Bracha have presented randomized asynchronous consensus protocols that are optimal with

respect to the number of processes that can exhibit a Byzantine behavior in [113] and [17],

respectively. Since then, many protocols have been published including, e.g., [21, 23, 41]

(this is far from being an exhaustive list).

The notion of a failure detector, which captures the required functional properties of

failure detection without specifying explicit timing assumptions, was initiated by Chandra

and Toueg in the context of the Consensus problem [27]. Mute failure detectors were ini-

tially proposed in [37, 38] in order to solve Byzantine Consensus in otherwise asynchronous

systems. They were later used also in [11, 48, 69].

2.3 Group Communication & Replication

Group communication has a noted research history [15, 30]. Yet, most of the systems de-

veloped ignore Byzantine failures. The few group communication systems that focus on se-

curity include SecureRing [70], Ensemble [58, 100], Rampart [99], Antigone [83], ITUA [97],

Cactus [59], and Secure Spread [6]. We elaborate on these systems below.

Ensemble’s architecture addresses security [100], yet it only protects the system from

external attacks, and does not handle Byzantine failures. Antigone is a framework that en-

ables specifying flexible application security policies [83]. The framework allows controlling

various quality of service issues, including security, but does not handle Byzantine failures

either.

Secure Spread [6] is a group communication system that was designed to provide group

communication over WANs. Spread integrates two low-level protocols: the Ring protocol

in each site and the hop protocol connecting the sites. Secure Spread relies on strong

synchronization guarantees to assure that no member can receive and decrypt messages



after it left the group and no new member can receive and decrypt messages sent before it

joined the group. Secure Spread also ignores Byzantine failures.

Rampart is the first group communication system that handled Byzantine attacks [99].

Rampart allows dynamic group membership and it must exclude faulty replicas from the

group to make progress (e.g., to remove a faulty primary and elect a new one). The

SecureRing system consists of a reliable delivery protocol, a group membership protocol,

and a Byzantine fault detector [70]. The system protects a low-level ring by authenticating

each transmission of the token and data message received. Both Rampart and SecureRing

can guarantee safety if fewer than 1/3 of the replicas are faulty. Additionally, Rampart and

SecureRing provide group membership protocols that can be used to implement recovery,

but only in the presence of benign faults.

The ITUA project has the goal of developing a middleware based intrusion tolerance

solution that helps build survivable distributed applications [97]. They have also taken

the approach of extending an existing layered group communication system, in their case,

C-Ensemble, and making it resilient to Byzantine faults. ITUA uses an adaptive and

unpredictable response as a major technique to cope with an attacker and its architecture

separates the role of detection from replication management. The Cactus project also enjoys

a layered micro-protocol architecture that allows adaptability and flexibility [59], and also

has a Byzantine tolerant protocol stack. Interestingly, the programming model of Cactus

is not virtually synchronous.

Rampart, SecureRing, Cactus, and ITUA all suffer from limited performance since they

use costly protocols and rely intensively on public key cryptography. On the other hand, the

BFT system, by Castro and Liskov [25], provides state-machine replication for an (almost)

asynchronous network, where fewer than a third of the replicas may fail. BFT operates in

epochs, where each epoch is made up of two phases, an optimistic phase and a recovery

phase. In the optimistic phase, one node is designated as a leader; this node decides on the

ordering of messages and notifies all other nodes about it using Bracha’s uniform broadcast

protocol [17]. If the leader becomes suspected, then it is replaced using an agreement

protocol. BFT uses MACs to authenticate all messages and public-key cryptography is

used only to exchange the symmetric key pairs to compute the MACs. We have also taken

the approach of using symmetric key pairs. However, we implement a fully fledged Group



Communication system with a proven scalability of up to 50 nodes, whereas BFT only

supported active replication.

The Query/Update (a.k.a. Q/U) protocol offers an optimistic quorum approach for

providing efficient Byzantine tolerant replication [2]. According to the Q/U protocol, clients

access servers trying to obtain a quorum that ensures atomic execution of queries and

updates. However, unlike traditional quorum based approaches, the Q/U protocol enables

a one pass execution of operations in “good scenarios”, i.e., when there are no conflicting

accesses to the same objects. On the other hand, concurrent accesses are resolved using

a probabilistic back-off protocol. This same technique is used to eliminate the need for

locks while supporting read-modify-write semantics as well as operations accessing multiple

objects atomically. The benefit of this approach, compared to consensus based approaches

like BFT, is in its improved fault-scalability, or in other words, the performance degradation

involved in tolerating an increasing number of faults. This approach can be viewed as

trading guaranteed termination for a probabilistic one (due to their back-off protocol), in

exchange for better scalability.

Another approach that exploits speculation extensively to reduce the overhead and the

latency of BFT replication is Zyzzyva [72]. In failure-free and synchronous executions,

Zyzzyva is extremely efficient since requests complete in 3 one-way message delays. Yet, if

there is even a single faulty replica, Zyzzyva requires 5 one-way message delays. To decrease

the latency, the authors propose Zyzzyva5, which requires 5f + 1 replicas and completes

requests in 3 one-way message delays even if there are f faults in the replicas (except for

the primary).

The BAR (Byzantine, Altruistic, Rational) framework was recently introduced in order

to handle systems in which some nodes are Byzantine, some are rational, and the rest are

altruistic [4]. That is, Byzantine nodes can deviate arbitrarily from their protocol, rational

nodes deviate from the protocol only if they can gain something by doing so, and altruistic nodes always

obey the protocol. A generic set of services that accommodate this generalized failure model

was also developed, as well as a specific collaborative storage service, nicknamed BAR-B [4].

The BAR model is more suitable for collaborative systems, in which services are hosted on

the participants’ machines, and therefore it is likely that many of these nodes will only

participate if they are given an incentive to do so. On the other hand, this comes at a



substantial performance cost, and is thus not suitable for a system based on dedicated servers.

It may be interesting to investigate adapting the BAR approach to ad-hoc networks, as the

latter are also prone to Byzantine and rational behavior.

Another optimized Byzantine atomic broadcast protocol is the Parsimonious Broadcast

protocol [96]. This protocol also includes a leader based optimistic phase and a recovery

phase. Yet, it utilizes a consistent broadcast service rather than uniform broadcast, in order

to reduce the message complexity from O(n^2) to O(n). However, the protocol requires

public-key cryptography. Based on our results and the results of BFT [25], this might be a

limiting factor in its practical applicability (there are no performance measurements in [96]),

in particular with respect to throughput [45].

When the execution of client requests is computation-intensive, it is worth splitting the

decision on the execution order from the execution itself [125]. For example, the authors of [95]

use an agreement cluster of 3f + 1 nodes to decide only on the ordering of executions, and

then pass the execution itself to a set, called primary committee, of only f + 1 nodes. The

generated replies of the primary committee are then compared by the agreement cluster.

If all replies are the same, then they are returned to the client. Otherwise, if there is a

mismatch, the request is sent to additional f servers; a reply that repeats at least f + 1

times is declared correct and is sent to the client. In this case, a new primary committee

is also elected by the agreement cluster. The savings for compute-intensive requests comes

from the fact that on average, each request is executed by only f + 1 nodes. This is in

contrast with having a request executed on all 3f + 1 nodes required to decide on the

ordering. However, this only makes sense when the requests are indeed compute-intensive,

since the mechanism described above involves non-negligible overheads. Our results are

applicable to splitting approaches since they can help optimize the performance of the

agreement cluster.
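
The reply-checking step of the splitting approach can be sketched as below. This is a simplified illustration of the counting logic described above, not the protocol of [95]: replies are modeled as plain strings, and the function name and signature are assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve_reply(primary: Sequence[str], extra: Sequence[str], f: int) -> Optional[str]:
    """Return the reply to forward to the client, or None if none can be trusted yet.

    primary: replies from the f + 1 nodes of the primary committee.
    extra:   replies from the f additional servers (empty until a mismatch occurs).
    """
    # Fast path: all f + 1 committee replies agree. Since at most f nodes are
    # faulty, at least one of them is correct, so the common reply is correct.
    if len(primary) == f + 1 and len(set(primary)) == 1:
        return primary[0]
    # Mismatch: count replies over the committee plus the additional servers.
    counts = Counter(list(primary) + list(extra))
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    # A reply repeated at least f + 1 times is backed by a correct node.
    return value if count >= f + 1 else None
```

Among the 2f + 1 replies available after a mismatch, at least f + 1 come from correct nodes, so a value with f + 1 votes always exists and no two different values can both reach that count.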

The MAFTIA [119] project has explored two different approaches to building intrusion-

tolerant group communication protocols. The first approach is to use a linear secret sharing

scheme based on a generalized adversary structure that can model a more realistic set of fault

assumptions. The second approach is based on the use of a Trusted Timely Computing Base

(TTCB). Moreover, the use of TTCB was explored as another means of solving Byzantine

Consensus efficiently in [32].



In general, when it comes to benign failures, the approach of having a simple and efficient

protocol most of the time and only fixing things when needed has been practiced in the

area of group communication for a long time. For example, the Horus system included 4

optional total ordering protocols [52], two of which were leader based and two were token

based. In all of them, during normal computation, the protocol is very simple and proceeds

very efficiently, while a failure of the leader or token holder is being compensated for during

the computation of the new view.



Chapter 3

Preliminaries

3.1 Ad-Hoc Networks Model

We assume a collection of nodes placed in a given finite size area. A node in the system

is a device owning an omni-directional antenna that enables wireless communication. A

transmission of a node p can be received by all nodes within a disk centered on p whose

radius depends on the transmission power, referred to in the following as the transmission

disk ; the radius of the transmission disk is called the transmission range. The combination

of the nodes and the transitive closure of their transmission disks forms a wireless ad-hoc

network.¹

We denote the transmission range of device p by r_p. This means that a node q can only

receive messages sent by p if the distance between p and q is smaller than r_p. A node q is

a direct neighbor of another node p if q is located within the transmission disk of p. In the

following, N^t(1, p) refers to the set of direct neighbors of a node p at time t and N^t(k, p)

refers to the transitive closure with length k of N^t(1, p) at time t. By considering N^t(1, p)

as a relation (defining the set N^t(1, p)), we say that a node p has a path to a node q at time

t if q appears in the transitive closure of the N^t(1, p) relation.
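
The definitions above can be illustrated for a single instant t with a small sketch. It assumes nodes are points in the plane and uses the ideal disk model; the function names and the dictionary-based representation of positions and ranges are illustrative assumptions only.

```python
import math


def direct_neighbors(p, positions, ranges):
    """N(1, p): nodes located strictly inside the transmission disk of p."""
    px, py = positions[p]
    return {
        q
        for q, (qx, qy) in positions.items()
        if q != p and math.hypot(qx - px, qy - py) < ranges[p]
    }


def k_hop_neighbors(p, k, positions, ranges):
    """N(k, p): nodes reachable from p by chaining at most k direct-neighbor hops."""
    reached, frontier = set(), {p}
    for _ in range(k):
        nxt = set()
        for u in frontier:
            nxt |= direct_neighbors(u, positions, ranges)
        nxt -= reached
        nxt.discard(p)
        reached |= nxt
        frontier = nxt
    return reached


def has_path(p, q, positions, ranges):
    """True if q appears in the transitive closure of the neighbor relation from p."""
    return q in k_hop_neighbors(p, len(positions) - 1, positions, ranges)
```

For example, with nodes a, b, c placed on a line at distance 1 from each other and a range of 1.5 for every node, b is a direct neighbor of a, c belongs to the 2-hop neighborhood of a, and a has a path to c.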

As nodes can physically move, there is no guarantee that a neighbor q of p at time t will

remain in the transmission disk of p at a later time t′ > t. Additionally, messages can be

lost. For example, if two nodes p and q transmit a message at the same time, then if there

exists a node r that is a direct neighbor of both, then r will not receive either message, in

which case we say that there was a collision. Yet, we assume that a message is delivered

with positive probability.

¹ In practice, the transmission range does not behave exactly as a disk due to various physical phenomena.

However, for the description of the protocols in this work it does not matter, and on the other hand, a disk

assumption greatly simplifies the formal model. At any event, our simulation results are carried out on a

simulator that simulates a real transmission range behavior including distortions, background noise, etc.

3.2 Malicious Failures

A node is said to incur a malicious failure if it tries to harm the execution of the protocol

that it is supposed to perform. A node that incurs a malicious failure is referred to as a

malicious node, while the rest of the nodes are called correct. In particular, a malicious node

can avoid generating messages that it is expected to, fail to deliver messages it received from

the network, send different versions of a message to different nodes, etc. If a node is correct,

then it is presumed to be correct throughout the execution of the protocol. A special type of

failure is called crash, which means that the process stops incurring and generating events

of any kind. Note that a malicious failure is a sub-instance of a Byzantine failure where

a node can deviate in any arbitrary manner from its specification [74]. In this work, we

assume that up to f out of the total of n nodes in the system may experience some kind of

failures.

3.2.1 Limitations on the power of the malicious nodes

We assume that the malicious nodes are limited by the following restrictions:

1. Pairs of correct processes who are neighbors of each other are connected by fair-lossy

communication channels [3]. In other words, each message has a positive probability

of being delivered and malicious nodes cannot constantly collide all the messages in

their area.

2. The malicious nodes cannot impersonate other nodes. This can be realized using

cryptography [106].

3. Either malicious nodes cannot control the way the devices move, or there is a

limitation on their speed.

4. The dynamic subgraph of correct nodes is continuously connected.



Validity and implications of the assumptions

The first assumption is necessary, since otherwise, malicious nodes can cause a denial of

service attack by constantly sending messages that jam the network (at the MAC level).

This assumption is legitimate whenever the malicious nodes are battery operated, in which

case, after some time, their battery will drain and they will not be able to constantly collide

all the messages. Also, this assumption can be reasonable if the malicious code is only

executed in user mode, and therefore has no direct access to the wireless network card.

Alternatively, various electronic warfare techniques, such as frequency hopping can be used

to overcome jamming attacks [29].

The second assumption is common in research that deals with malicious nodes. Without

this assumption, the malicious node can pretend to be someone else, send corrupt messages

and cause other nodes to suspect correct nodes. This assumption can be fulfilled using

cryptography. The price is having cryptographic infrastructure in place.

The third assumption prevents malicious nodes from harming the communication between

correct nodes in one part of the network and then instantly moving to another part of the network

to harm the communication there as well, and so on. This assumption

is valid whenever malicious nodes are not much stronger than the correct nodes, or

maliciousness is caused by code intrusion into some users’ mobile devices and therefore their

speed cannot be much higher than the speed of correct nodes.

Finally, assumption 4 guarantees that malicious nodes cannot disconnect the subgraph of

correct nodes. This assumption makes it possible to disseminate all the messages to correct nodes

relatively fast. It is possible to slightly weaken this assumption by assuming the following:

Let V ∗ be the set of correct nodes. Then,

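$$\forall s, d, t \;\; \exists v_1, v_2, \ldots, v_k \in V^*,\; t_1, t_2, \ldots, t_{k-1} \text{ such that } (t_{k-1} \ge t_{k-2} \ge \ldots \ge t_1 > t) \wedge (v_1 \in N^{t}(1, s),\; v_2 \in N^{t_1}(1, v_1),\; \ldots,\; v_i \in N^{t_{i+1}}(1, v_{i+1}),\; \ldots,\; v_k \in N^{t_{k-1}}(1, d))$$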

The implication of weakening assumption 4 is that the dissemination of messages to correct

nodes will take more time and each node will have to store all the messages until all the

correct nodes have received them. Hence, it may cause both higher latency and a considerable

increase in the size of buffers that every correct node should have in order to disseminate

all the messages to all correct nodes.
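
The intuition behind the weakened assumption, namely that a message may have to be stored by correct relays until a later link appears, can be sketched as follows. This is one possible reading of the condition, under the assumptions that time is discretized into snapshots of the neighbor relation and that several hops may occur within one snapshot; the function name and the data layout are hypothetical.

```python
def eventually_reachable(s, d, snapshots, correct):
    """Check whether a message from s can reach d via store-and-forward.

    snapshots: list, in time order, of dicts mapping a node to the set of its
               direct neighbors at that time.
    correct:   set of correct nodes; only these are trusted to store and relay.
    """
    holders = {s}  # nodes currently buffering the message
    for neighbors in snapshots:
        changed = True
        while changed:  # propagate within the current snapshot
            changed = False
            for u in list(holders):
                for v in neighbors.get(u, ()):
                    if v in correct and v not in holders:
                        holders.add(v)
                        changed = True
        if d in holders:
            return True
    return d in holders
```

The set of holders only grows over time, which mirrors the observation above that every correct node may need to buffer messages until all correct nodes have received them.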



3.3 Failure Detectors

A failure detector is a distributed oracle that provides hints about the operational status of

processes. Each local oracle (module) monitors a subset of the processes in the system, and

maintains a list of those that it currently suspects to be faulty. Each failure detector oracle

can make mistakes by erroneously adding processes to its list of suspects, i.e., it can suspect

that a process p is faulty even though p is still running and behaves in a correct way. If

this module later believes that suspecting p was a mistake, it can remove p from its list of

suspected nodes. Thus, each module may repeatedly add and remove processes from its list

of suspects. Furthermore, at any given time the failure detector modules at two different

processes may have different lists of suspects.

An inspection of many middleware systems and protocol implementations shows that

most of the messages that are sent in those systems have a header part and a data part.

The header part can often be anticipated based on local information only while the data

part cannot. For example, the type of a message, the id of the sender, and a sequence

number of the message are part of the header. On the other hand, the information that the

application level intended to send is part of the data.

Based on this, we define a mute failure as failure to send a message with an expected

header w.r.t. the protocol. A process is considered mute if it experiences a mute

failure. Similarly, a verbose failure is sending messages too often w.r.t. the protocol, and a

process is considered verbose if it experiences a verbose failure. Note that both types of

failures can be detected accurately in a synchronous system based on local knowledge only.

This is because in synchronous systems each message has a known bounded deadline, so it

is possible to tell that a message is missing. Similarly, it is possible to accurately measure

the rate of messages received and verify that it is below an agreed upon threshold.
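
The two local checks that become possible in a synchronous system can be sketched as follows. The function names, the explicit `now` argument, and the sliding-window formulation of the rate threshold are assumptions made for illustration; the text above does not prescribe a particular formulation.

```python
def is_mute(deadline, arrival_time, now):
    """A message is missing if its known deadline has passed without a timely arrival.

    arrival_time is None when the expected message has not arrived at all.
    """
    if now <= deadline:
        return False  # too early to decide
    return arrival_time is None or arrival_time > deadline


def is_verbose(arrival_times, window, max_messages, now):
    """A sender is verbose if it sent more than max_messages within the last window."""
    recent = [t for t in arrival_times if now - window < t <= now]
    return len(recent) > max_messages
```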

Obtaining synchronous communication in ad-hoc networks with standard hardware and

operating systems is extremely difficult. On the other hand, observations of communication

networks indicate that they tend to behave in a timely manner for large fractions of the

time. This is captured by the notion of the class ♦P_mute of failure detectors [11, 37, 38, 48].

This class includes all failure detectors that satisfy the following properties:

• Muteness Strong Completeness: Eventually, every mute or permanently disconnected



process is permanently suspected by every correct process.

• Eventual Strong Accuracy: There is a time after which no correct process that is not

disconnected is suspected.

In other words, such failure detectors are assumed to eventually (i.e., during periods of

timely network behavior) detect mute failures accurately. In this eventuality, all nodes

that suffer a mute failure are suspected (known as completeness) and only such nodes

are suspected (known as accuracy). This approach has the benefit that all synchrony

assumptions are encapsulated behind the functional specification of the failure detector (i.e.,

its ability to eventually detect mute failures and verbose failures in an accurate manner).

This also frees protocols that are based on such failure detectors from the implementation

details related to timers and timeouts, thus making them both more general and more

robust.

In a similar manner to ♦P_mute, we define ♦P_verbose as a class of failure detectors that

eventually reliably detect verbose failures and ♦P_trust as a class of failure detectors that

eventually reliably detect node misbehavior. Throughout this work we assume that the failure

detector Mute is in the class ♦P_mute, Verbose is in the class ♦P_verbose, and the Trust failure

detector is in the class ♦P_trust.

An inherent problem with malicious failures is that by definition they are tightly related

to the semantics of a given protocol. Thus, a purely general detector, like those of Chandra

and Toueg [27], can never be used to detect them. On the other hand, modularity

principles advocate the use of a failure detection module rather than having an ad-hoc

detection mechanism interleaved in the code of each protocol.

3.3.1 Interfacing with the Failure Detectors

Recall that the goal of the Mute failure detector is to detect when a process fails to send a

message with a header it is supposed to. To notify this failure detector about such messages,

its interface includes one method called expect (see Figure 3.1). This method accepts as

parameters a message header to look for, a set of nodes that are supposed to send this

message, and an indication if all of these nodes must send the message or only one of them

is enough. Note that the header passed to this method can include wildcards as well as

exact values for each of the header’s fields. In this work we do not focus on how such a

failure detector is implemented. Intuitively, a simple implementation consists of setting a

timeout for each message reported to the failure detector with the expect method. When

the timer times out, the corresponding nodes that failed to send anticipated messages are

suspected for a certain period of time (see discussion in [37, 38]).

Mute: expect(message header, set of nodes, one or all)

This method notifies the Mute failure detector about an expected message. It accepts as

parameters the expected message header, the set of nodes that are supposed to send the

message, and a one or all indication. The latter parameter indicates if ALL nodes are

assumed to send the message or only ONE of them.

Verbose: indict(node id)

This method indicts a node with node id for being too verbose. It causes the Verbose

failure detector to increment the suspicion level of node id.

Trust: suspect(node id, suspicion reason)

This method notifies the Trust failure detector that the level of trust of node node id

should be reduced based on the provided suspicion reason.

Figure 3.1: Failure Detectors’ Interface
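
The simple timeout-based implementation mentioned above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: headers are modeled as dictionaries in which a None value acts as a wildcard, time is passed in explicitly through hypothetical `now` arguments, and the `on_message`, `tick`, and `suspects` methods are assumptions added to make the example self-contained.

```python
from dataclasses import dataclass

ONE, ALL = "one", "all"


def header_matches(pattern: dict, header: dict) -> bool:
    """A pattern field set to None is a wildcard; other fields must match exactly."""
    return all(v is None or header.get(k) == v for k, v in pattern.items())


@dataclass
class _Expectation:
    pattern: dict
    pending: set      # nodes that still owe a matching message
    mode: str         # ONE or ALL
    deadline: float


class MuteDetector:
    def __init__(self, timeout: float, suspicion_period: float):
        self.timeout = timeout
        self.suspicion_period = suspicion_period
        self._expectations = []
        self._suspected_until = {}

    def expect(self, pattern, nodes, mode, now):
        """Register an expected header from `nodes` (any ONE of them, or ALL)."""
        self._expectations.append(
            _Expectation(dict(pattern), set(nodes), mode, now + self.timeout))

    def on_message(self, sender, header):
        """Cancel the expectations that this message satisfies."""
        for e in self._expectations:
            if sender in e.pending and header_matches(e.pattern, header):
                if e.mode == ONE:
                    e.pending.clear()
                else:
                    e.pending.discard(sender)
        self._expectations = [e for e in self._expectations if e.pending]

    def tick(self, now):
        """Suspect, for a limited period, nodes whose expected message timed out."""
        waiting = []
        for e in self._expectations:
            if now >= e.deadline:
                for node in e.pending:
                    self._suspected_until[node] = now + self.suspicion_period
            else:
                waiting.append(e)
        self._expectations = waiting

    def suspects(self, now):
        return {n for n, until in self._suspected_until.items() if until > now}
```

Letting suspicions expire after a fixed period is one simple way to let a node that was suspected by mistake regain its standing.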

The goal of the Verbose failure detector is to detect verbose nodes. Such nodes try

to overload the system by sending too many messages that may cause other nodes to react

with messages of their own, thereby degrading the performance of the system. Detecting

such nodes is therefore useful in order to allow nodes to stop reacting to messages from

these nodes. The interface of Verbose exports one method called indict. This method

simply indicts a process that has sent too many messages of a certain type.

Practically, we assume that Verbose maintains a counter for each node that was listed

in any invocation of its method. The counter is incremented on each such event, and after

a given threshold, the node is considered to be a suspect. Verbose also includes a method

that allows specifying general requirements about the minimal spacing between consecutive

arrivals of messages of the same type. Such a method is typically invoked at initialization

time. As it is not directly accessed by our protocol’s code, we do not discuss it any

further.



In order to recover from mistakes, both the Mute and the Verbose failure detectors

employ an aging mechanism. That is, the suspicion counters for each node are periodically

decremented.
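
The counter, threshold, and aging behavior described for Verbose can be sketched as below. The class layout, the `age` and `is_suspected` method names, and the choice to suspect a node once its counter reaches the threshold are assumptions made for illustration.

```python
from collections import Counter


class VerboseDetector:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self._counters = Counter()  # node id -> suspicion level

    def indict(self, node_id):
        """Increment the suspicion level of a node that sent too many messages."""
        self._counters[node_id] += 1

    def age(self):
        """Periodic aging: decrement every counter so that old mistakes fade away."""
        for node in list(self._counters):
            self._counters[node] -= 1
            if self._counters[node] <= 0:
                del self._counters[node]

    def is_suspected(self, node_id) -> bool:
        return self._counters[node_id] >= self.threshold
```

A node that stops misbehaving drops below the threshold after enough aging rounds, which is how the detector recovers from false suspicions.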

Finally, we also define the Trust failure detector, which maintains a trust level for every

node known to it. Trust suspects a node q if q is suspected by either the Mute failure

detector or the Verbose failure detector, or if the suspect method has been invoked for q

(if, for example, q tried to send a forged message).
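
The way Trust combines the other two detectors can be sketched as follows. For brevity, this illustration reduces the trust level to a boolean suspicion and takes the Mute and Verbose suspect sets as callables; both simplifications, as well as all names other than suspect, are assumptions.

```python
class TrustDetector:
    def __init__(self, mute_suspects, verbose_suspects):
        # Callables returning the current suspect sets of Mute and Verbose.
        self._mute_suspects = mute_suspects
        self._verbose_suspects = verbose_suspects
        self._reported = {}  # node id -> list of suspicion reasons

    def suspect(self, node_id, suspicion_reason):
        """Record an explicit report, e.g. that node_id sent a forged message."""
        self._reported.setdefault(node_id, []).append(suspicion_reason)

    def is_suspected(self, node_id) -> bool:
        return (node_id in self._mute_suspects()
                or node_id in self._verbose_suspects()
                or node_id in self._reported)
```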



Chapter 4

Reliable Probabilistic Dissemination in Wireless Ad-Hoc Networks

This chapter studies three common approaches for achieving scalable reliable broadcast in

ad-hoc networks, namely probabilistic flooding, counter based broadcast, and lazy gossip. The

strengths and weaknesses of each scheme are analyzed, and a new protocol that combines

these three techniques, called RAPID, is developed.

Specifically, the analysis in this work focuses on the tradeoffs between reliability (the
percentage of nodes that receive each message), latency, and the message overhead of the
protocol. Each of these methods excels in some of these parameters, but no single method
wins in all of them. This motivates a combined protocol that benefits from all
of these methods and allows trading between them smoothly.

4.1 System Model

We assume the same model as Section 3.1. We also assume that some of the nodes may act

selfishly, i.e., they may refuse to forward messages of other nodes. Such nodes are called

selfish whereas the others are called correct. We assume that the correct nodes in the system

continuously form a connected sub-network. More severe malicious behavior is discussed



in Sections 4.3.3 and 4.4.2.

4.2 Common Reliable Dissemination Techniques

In this section we present the various techniques used for dissemination in wireless ad hoc

networks and discuss their properties.

4.2.1 Probabilistic Flooding

In the probabilistic approach, whenever a node receives a message, it applies some locally

computable probabilistic mechanism to randomly determine whether it should broadcast

the message or not [24, 57, 76]. Probabilistic protocols are appealing since they are very

simple and are inherently robust to failures and mobility. Moreover, these protocols enable

messages to advance asynchronously, and therefore they exhibit very low latency in delivering
messages. Yet, as was empirically discovered in [57, 76, 104], in order to obtain very

high reliability levels with pure probabilistic broadcasting, one has to set the retransmission

probability to high values. This in turn translates into a very large number of redundant

messages.

Below, we obtain the following results about probabilistic flooding. First, we provide a model
for analyzing the tradeoff in probabilistic flooding between the number of randomly selected
nodes that retransmit each message in a given one-hop neighborhood and the expected
reliability level. In other words, this analysis formally captures the tradeoff between efficiency
and reliability offered by pure probabilistic flooding. This, for example, enables designers
to decide on a forwarding probability based on their goals w.r.t. this tradeoff.

Second, our formal analysis shows that in order to achieve a given tradeoff point between

reliability and efficiency, it is enough that a constant number of nodes in each one hop

neighborhood will retransmit a message. Constant here means independent of the node
density. In other words, the forwarding probability of each node should be set in inverse
proportion to the size of its neighborhood. This probability can be expressed as $\beta/n_i$,
where $n_i$ is the neighborhood size of node i and β is the required constant number of forwarders.
Further, the reliability as a function of the forwarding probability is concave, with a
knee at values of β between 2.5 and 3.5 (Figure 4.1). Setting the forwarding probability to



these values results in delivery to 80%-90% of the nodes very quickly and very efficiently.

However, for boosting the reliability beyond these levels, it makes more sense to utilize some

complementing measures.

Finally, we show that regardless of the forwarding probability, pure probabilistic protocols
cannot ensure 100% reliability. This again hints that probabilistic flooding should be

aided by another mechanism if one wishes to ensure extremely high levels of reliability. We

now turn to the details of the analysis.

Formal Analysis of Probabilistic Flooding Probability

Our theoretical analysis in this section relies on a formal graph model of 2-dimensional
wireless ad hoc networks. The network connectivity graph $G = (V, E)$ of an ad hoc network
is a special case of a d-dimensional Unit Disk graph, in which n nodes are embedded in
the surface of a d-dimensional unit torus, and any two nodes within Euclidean distance r of
each other are connected. When the nodes are placed uniformly at random on the surface,
the graph is known as a Random Geometric Graph (RGG) [91] and is denoted by $G^d(n, r)$.
Specifically, the $G^2(n, r)$ graph is often used to model the network connectivity graph of
2-dimensional wireless ad hoc networks and sensor networks [55]. In our case we assume n
nodes are placed uniformly at random in the rectangular area $[a, b]$ and the transmission
radius r is scaled accordingly such that $G^2(n, r)$ is connected with high probability. It has been
shown (by Gupta and Kumar [55] and [87]) that for r satisfying
$\pi r^2 \ge ab\,\frac{\log n + c(n)}{n}$, $G^2(n, r)$
is asymptotically connected with probability one if and only if $c(n) \to \infty$ as $n \to \infty$. We
will therefore assume that r satisfies the above condition and the network is connected.

We stress here that the uniform distribution of nodes in the space is only used in the

theoretical analysis of this section, in order to set the retransmission probability in the

most efficient way. The correctness of RAPID does not depend on this assumption. If

the uniformity assumption does not hold, our protocol in Section 4.3 will ensure reliable

delivery in any case, albeit possibly with a higher communication cost.

Denote by $d_{avg}$ the average number of neighbors of any node in $G^2(n, r)$. It is well
known that $d_{avg} \le \frac{\pi r^2 (n-1)}{ab}$ and that for large networks, when the edge effect is negligible,
$d_{avg} \sim \frac{\pi r^2 (n-1)}{ab}$. It has been previously shown in [12] that the maximal and minimal
degrees are of the order of $d_{avg}$ with high probability. That is, the actual degree of any
node in $G^2(n, r)$ is close to $d_{avg}$ with high probability.

Assume some broadcasting algorithm A, which picks, for every message m, a set of nodes
S that transmit m. Every node in S is picked with probability $P = \frac{\beta}{d_{avg}}$ from all network
nodes, independently from all other nodes, where β is a parameter called the reliability factor
of algorithm A. Informally, β is the average number of nodes in each one-hop neighborhood
that retransmit m. Also assume that a message that was sent has a probability Q to be
successfully received by a neighboring node. Let $Y_p$ be a random variable corresponding
to the number of times that node p has received a given message. We calculate below an
upper bound on the probability that an arbitrary node will not receive m, or in other words,
$\Pr(Y_p = 0)$.

Claim 4.2.1 For any node p, the probability that p does not receive a message m is upper
bounded by $e^{-\beta Q}$.

Proof: S is the set of all nodes that transmit message m. Notice that the size of S is a

binomial random variable with mean nP.

For each q ∈ S and any node p, let $X_{p,q}$ be a 0-1 random variable indicating whether
the node p receives a message m that was sent by the node q or not. Node p can receive a
message m sent by q if and only if q is a neighbor of p in $G^2(n, r)$ and m has not collided
with other messages. Since two nodes are neighbors if and only if they are at distance at
most r from each other, $\Pr(X_{p,q} = 1) = \frac{Q \pi r^2}{ab}$.

Let $Y_p$ be the random variable indicating the number of times node p has received m.
Naturally, if p ∈ S, $\Pr(Y_p = 0) = 0$. Otherwise,

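$$
\begin{aligned}
\Pr(Y_p = 0) &= \sum_{i=0}^{n} \Pr(Y_p = 0 \mid |S| = i)\,\Pr(|S| = i) \\
&= \sum_{i=0}^{n} \prod_{q \in S,\ |S| = i} \Pr(X_{p,q} = 0)\,\Pr(|S| = i) \\
&= \sum_{i=0}^{n} \left(1 - \frac{Q \pi r^2}{ab}\right)^{i} \binom{n}{i} P^{i} (1 - P)^{n-i} \\
&= \sum_{i=0}^{n} \binom{n}{i} \left(P - \frac{P Q \pi r^2}{ab}\right)^{i} (1 - P)^{n-i} = \left(1 - \frac{P Q \pi r^2}{ab}\right)^{n} \\
&\le \left(e^{-\frac{P Q \pi r^2}{ab}}\right)^{n} = e^{-\frac{\beta}{d_{avg}} \cdot \frac{Q \pi r^2}{ab} \cdot n} \le e^{-\frac{\beta ab}{\pi r^2 (n-1)} \cdot \frac{Q \pi r^2}{ab} \cdot n} \le e^{-\beta Q}
\end{aligned}
$$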

In the fourth line we have used the binomial theorem, and in the last line the
inequality $1 - x < e^{-x}$, which holds for all x > 0.

Figure 4.1 depicts an upper bound $e^{-\beta Q}$ on the value of $\Pr(Y_p = 0)$ for an arbitrary
node p as a function of β and Q.

[Figure 4.1 plot. x-axis: Beta (0 to 10); y-axis: non-receive probability (0 to 1); curves: successful receive probability 1, 0.8 and 0.6]

Figure 4.1: An upper bound on the probability that an arbitrary node does not receive a message m

It can be seen from the figure that the probability that a given node does not receive
a message m is small for quite small values of β. For example, for Q = 0.8, the bound on
$\Pr(Y_p = 0)$ is approximately 0.06 for β = 3.5. That is, if there are on average only β = 3.5
nodes in every one-hop neighborhood that transmit m and Q = 0.8, approximately 94% of all
nodes will receive m.
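The numbers quoted from Figure 4.1 can be reproduced with a few lines of code. The sketch below evaluates the bound of Claim 4.2.1 and, for comparison, the closed form from the proof. It assumes that $d_{avg}$ equals its large-network value $\pi r^2 (n-1)/ab$, in which case the geometry cancels; the function names are illustrative.

```python
import math


def non_receive_bound(beta: float, q: float) -> float:
    """Upper bound e^(-beta*Q) on Pr(Y_p = 0) from Claim 4.2.1."""
    return math.exp(-beta * q)


def non_receive_closed_form(n: int, beta: float, q: float) -> float:
    """(1 - P*Q*pi*r^2/(ab))^n with P = beta/d_avg.

    Assumption: d_avg = pi*r^2*(n-1)/(ab), so P*Q*pi*r^2/(ab) = beta*Q/(n-1).
    """
    return (1.0 - beta * q / (n - 1)) ** n


if __name__ == "__main__":
    for beta in (1.0, 2.5, 3.5, 5.0):
        print(beta, round(non_receive_bound(beta, 0.8), 4))
```

For β = 3.5 and Q = 0.8 the bound evaluates to about 0.061, which corresponds to the roughly 94% delivery mentioned above.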

Discussion: A broadcasting algorithm that sets the retransmission probability P inversely

proportional to the average degree has a number of advantages. First, the number of

transmissions (which is equal to the average size of set S) is constant with respect to the

number of nodes n and to the nodes’ density.



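$$E(|S|) = nP = \frac{n\beta}{d_{avg}} \approx \frac{n\beta}{\frac{\pi r^2 (n-1)}{ab}} = \frac{n}{n-1} \cdot \frac{ab\beta}{\pi r^2} \approx \frac{ab\beta}{\pi r^2},$$

where the approximations use $d_{avg} \sim \frac{\pi r^2 (n-1)}{ab}$ for large networks.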

That is, the number of transmissions does not depend on the overall number of nodes,

but rather only on the physical size of the network, the transmission radius and the required
reliability level. Hence, for a given physical network, there is a minimal number of

retransmissions that is required to guarantee high broadcasting reliability, and this number

is constant with respect to the number of nodes and to the nodes’ density. In particular,

such a broadcast protocol is highly efficient in dense networks.

Second, a probabilistic broadcasting algorithm that picks nodes uniformly at random

with probability inversely proportional to the average degree, can achieve high coverage of

the network with relatively few redundant messages. Most (but not all) of the network

nodes will receive almost every message while using a relatively small group S.
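A short numeric check of the first point: the sketch below computes the expected number of transmitters $nP = n\beta/d_{avg}$ for growing n over a fixed area. It assumes $d_{avg} = \pi r^2 (n-1)/ab$ (the large-network value); the function name and the sample dimensions are made up for the illustration.

```python
import math


def expected_transmitters(n: int, beta: float, r: float, a: float, b: float) -> float:
    """E(|S|) = n * P with P = beta / d_avg, assuming d_avg = pi*r^2*(n-1)/(a*b)."""
    d_avg = math.pi * r ** 2 * (n - 1) / (a * b)
    return n * beta / d_avg


if __name__ == "__main__":
    # Hypothetical 1000 x 1000 area with transmission radius 100 and beta = 3.5.
    for n in (100, 1_000, 10_000):
        print(n, round(expected_transmitters(n, 3.5, 100.0, 1000.0, 1000.0), 2))
```

The printed values stay close to $ab\beta/(\pi r^2) \approx 111.4$ regardless of n, which illustrates why the scheme is efficient in dense networks.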

On the Impossibility of Absolute Reliability

Notice that no pure probabilistic protocol can ensure absolute dissemination reliability.

Consider an example of a node q with neighbors $n_1, \ldots, n_k$, each of which forwards each
message it receives with probability $P_{n_i}$. Hence, with probability $Prob = \prod_{i=1,\ldots,k} (1 - P_{n_i})$,
no node will retransmit the message and therefore q will not receive the message. No matter
how high the probabilities $P_{n_i}$ are (as long as they are below 1), Prob is non zero and can
sometimes be non negligible. In

particular, even if the average density across the whole network is high, if nodes are scattered

in a somewhat random manner, there is a likelihood that some parts of the network will

have low density. In those parts k can even be less than 2. Thus, the probability that

there will be some node q that will not receive some messages is non-negligible in any pure

probabilistic protocol.
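The product above is easy to evaluate for concrete neighborhoods. The following sketch uses an illustrative function name and made-up forwarding probabilities.

```python
import math
from typing import Iterable


def prob_nobody_forwards(forward_probs: Iterable[float]) -> float:
    """Probability that none of q's neighbors rebroadcasts: prod_i (1 - P_{n_i})."""
    return math.prod(1.0 - p for p in forward_probs)


if __name__ == "__main__":
    # A sparse spot with two neighbors that each forward with probability 0.5.
    print(prob_nobody_forwards([0.5, 0.5]))
    # Even three neighbors that forward with probability 0.9 leave a residual chance.
    print(prob_nobody_forwards([0.9, 0.9, 0.9]))
```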

4.2.2 Counter Based Broadcast

The shortcomings of probabilistic flooding have led to the development of the counter-based

approach [24, 57, 114, 115] and its distance-based and location-based derivatives (and their

combinations). The idea in these schemes is that rather than placing the randomness

directly on the retransmission probability, the randomness is placed on the timing of the
rebroadcasting. That is, every node p that receives a message m for the first time decides
to rebroadcast the message after some random time. If during this chosen period p hears k
(the counter) retransmissions of m, then p decides to abort its retransmission.

[Figure 4.2 diagram: nodes s, p, q and $n_1, \ldots, n_i, \ldots, n_k$]

Figure 4.2: A transmission by a node s can be received by all nodes within its transmission range: p, $n_1, \ldots, n_k$

Interestingly, this is another way to ensure a constant number of retransmissions in each

neighborhood. But, as opposed to the probabilistic method, the number of retransmissions

is deterministically guaranteed by the protocol. Despite this, as we show below, even the

counter based approach cannot guarantee reliable delivery of all messages on an arbitrary

topology. In fact, if we assume that the nodes are uniformly distributed in the network,

and that the random function used for setting the retransmission time is independent of

the node’s location, then we can utilize our formal analysis from Section 4.2.1 to calculate

the reliability level of a counter-based protocol for a given value of k.
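The decision rule of a counter-based scheme can be sketched in a few lines. This is a generic illustration under stated assumptions, not the code of any of the cited protocols: times are measured from the first reception of m, and the function names are hypothetical.

```python
import random
from typing import Iterable


def pick_delay(max_delay: float, rng: random.Random) -> float:
    """Random rebroadcast delay, chosen independently of the node's location."""
    return rng.uniform(0.0, max_delay)


def should_rebroadcast(own_delay: float, heard_at: Iterable[float], k: int) -> bool:
    """Rebroadcast unless at least k retransmissions of m were overheard before the timer fired.

    heard_at lists the times (since the first reception of m) at which
    retransmissions of m by neighbors were overheard.
    """
    heard_before_timer = sum(1 for t in heard_at if t < own_delay)
    return heard_before_timer < k
```

For example, with k = 2 a node whose timer fires at time 5.0 and that overheard retransmissions at times 1.0 and 2.0 stays silent, whereas a node that overheard only one of them rebroadcasts.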

Empirical studies have shown that counter-based schemes can obtain high delivery ratios

with relative efficiency [24, 57, 114, 115]. Yet, these works do not include a formal analysis

of this behavior. Moreover, as we now discuss, counter-based schemes are inherently slower

than probabilistic schemes.

Latency

As mentioned before, the rebroadcasting time of each node is set randomly. However, in

order for the protocol to succeed, the values should be set from a sufficiently large range so

that the number of collisions will be small, or even zero [60, 61]. In other words, the range



from which the rebroadcast timing is chosen must be proportional to the number of nodes

in each neighborhood. For ensuring zero collisions, by using the birthday paradox, we can
deduce that the range should be roughly $sl \times n_i^2$, where sl is the minimal slot required for a
message transmitted by one node to be heard by any other node in its neighborhood and $n_i$
is the size of the neighborhood of node i. For example, routing protocols in ad hoc networks

usually apply a random delay uniformly distributed between 0 and 10 milliseconds [19].

On the other hand, with probabilistic flooding as we suggest, and assuming β ≤ 3.5, at

most 3.5 nodes might retransmit simultaneously in each neighborhood. Hence, the jitter

applied to probabilistic forwarding can be much shorter than for counter-based schemes.
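The birthday-paradox argument can be made concrete with a small calculation. The sketch below assumes an idealized slotted model in which each sender independently picks one of `slots` equally likely slots and a collision occurs whenever two senders pick the same slot; the function name is illustrative.

```python
def no_collision_probability(senders: int, slots: int) -> float:
    """Probability that all senders pick pairwise distinct slots (uniform, independent choice)."""
    if senders > slots:
        return 0.0
    prob = 1.0
    for j in range(senders):
        prob *= (slots - j) / slots
    return prob


if __name__ == "__main__":
    # About 4 concurrent senders (probabilistic flooding with beta <= 3.5)
    # versus a whole neighborhood of 30 nodes, both with 1000 slots.
    print(no_collision_probability(4, 1000))
    print(no_collision_probability(30, 1000))
```

In this model, keeping the collision probability low requires a number of slots that grows roughly quadratically with the number of concurrent senders, which is why a handful of probabilistic forwarders can use a much shorter jitter than a full neighborhood.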

On the Impossibility of Absolute Reliability

We claim that no counter-based scheme can guarantee reliable delivery of all messages on

an arbitrary topology. Consider the scenario of Figure 4.2. When node s broadcasts a
message m, nodes p and $n_1, \ldots, n_k$ receive it. If some of the $n_i$ nodes rebroadcast the message
before node p, p will refrain from rebroadcasting m and therefore q will not receive m. For
any counter-based scheme and for any value of the counter in p, there could be as many $n_i$
nodes as needed, such that each $n_i$ is a neighbor of s and p, but not of q. Then, all $n_i$ nodes
might rebroadcast m before p, thereby satisfying the counter in p and preventing p from
rebroadcasting the message m.

4.2.3 Lazy Gossip

In lazy gossip [67], nodes periodically gossip with their neighbors about the ids of messages

they have received. Yet, this gossiping is performed in a deterministic manner, in the sense

that each node sends such a gossip message as a broadcast to all its neighbors. Whenever a

node q learns that one of its neighbors p has a message that q has missed, q explicitly asks

p to retransmit this message. Here, there can be a few optimizations such as broadcasting

requests for retransmissions, etc.
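The exchange between two neighbors in lazy gossip can be sketched as follows. This is a minimal in-memory illustration with hypothetical names; the periodic timer and the radio are left out.

```python
from typing import Dict, Iterable, Set


class LazyGossipNode:
    """Keeps received messages and answers gossip digests and retransmission requests."""

    def __init__(self) -> None:
        self.messages: Dict[int, bytes] = {}  # message id -> payload

    def gossip_digest(self) -> Set[int]:
        """Ids to advertise to all neighbors in the periodic gossip broadcast."""
        return set(self.messages)

    def missing_from(self, digest: Iterable[int]) -> Set[int]:
        """Ids a neighbor advertised that this node has not received yet."""
        return set(digest) - set(self.messages)

    def serve(self, requested: Iterable[int]) -> Dict[int, bytes]:
        """Answer a retransmission request with the messages this node actually holds."""
        return {i: self.messages[i] for i in requested if i in self.messages}

    def receive(self, msgs: Dict[int, bytes]) -> None:
        self.messages.update(msgs)
```

A node q that hears p's digest calls `missing_from`, asks p for those ids, and stores what `serve` returns.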

Lazy gossip incurs a constant per node message overhead due to the need to periodically

gossip about messages. The overall network overhead grows with the network density.

However, due to its deterministic nature, lazy gossip can obtain absolute reliability.



The shortcomings of lazy gossip come mainly from its very high latency and the fact

that for reliability, it must gossip multiple times for each message. The latency stems from

the fact that messages are propagated only due to gossips, and these only occur periodically.

In order to keep the message overhead reasonable, gossips might be performed once every

several seconds, in which case forwarding a message across multiple hops can take dozens of

seconds. Also, due to message loss, obtaining absolute reliability involves unlimited memory

consumption and unbounded message sizes, at least in theory.

4.3 The RAPID Protocol

For didactic purposes, we develop our protocol in a few steps. The basic version of our
protocol appears in Figure 4.3, whereas an enhanced version of the protocol that sends even
fewer messages and provides a higher delivery ratio is depicted in Figure 4.4. A maliciousness
resilient version of RAPID appears in Section 4.3.3. In all figures we make use of two
primitives. The primitive prob_bcast denotes an immediate broadcast to all the direct
neighbors of the sender with a given probability. The primitive lazycast initiates periodic
broadcasting of the given message to the direct neighbors of the sender.

Our protocol is based on the following principles: Each node calculates its broadcast

probability according to the number of observed neighbors at a given moment. Since in

our protocol each node needs to know the number of its one-hop neighbors, every node

periodically sends a heartbeat/hello message (unless it has already sent another message

during a predefined time interval).

The rebroadcasting probability used by RAPID is set to $\min\left(1, \frac{\beta}{|N^t(1,k)|}\right)$. β is a parameter
of the protocol and corresponds directly to the communication overhead. A larger β
achieves a higher reliability level, but at a larger communication cost. As can be seen

in Figure 4.1 (the knee in the graph), a good tradeoff between the number of retransmissions

and the reliability level is achieved when β is set to around 3.5. We further explore the

effect of parameter β on RAPID in the simulation section.
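The rule for the rebroadcasting probability translates directly into code. In the sketch below the function names are illustrative, and treating a node with no known neighbors as forwarding with probability 1 is an assumption of the example.

```python
import random


def rebroadcast_probability(beta: float, num_neighbors: int) -> float:
    """min(1, beta / |one-hop neighborhood|)."""
    if num_neighbors <= 0:
        return 1.0  # assumption: with no known neighbors, always forward
    return min(1.0, beta / num_neighbors)


def decides_to_rebroadcast(beta: float, num_neighbors: int, rng: random.Random) -> bool:
    """One probabilistic forwarding decision."""
    return rng.random() < rebroadcast_probability(beta, num_neighbors)
```

With β = 3.5, a node with 2 neighbors always forwards, a node with 7 neighbors forwards half of the time, and a node with 35 neighbors forwards one message in ten.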

In parallel, every node p periodically broadcasts to its neighbors the headers of messages

p received from other nodes, which is called gossiping. This technique enables nodes who

miss some messages that exist in the system to request these messages from their neighbors.



Notice that nodes only send headers of messages they possess. Hence, the header of a message
that does not exist will not be disseminated in the network. Also, whenever possible,

gossip messages are piggybacked on other messages in order to further reduce the generated

traffic. Unlike many other gossiping mechanisms from distributed computing [14], in our

case, gossiping is deterministic, in the sense that a gossip message from p is broadcast to

all of p’s neighbors at once.

When examining the graph in Figure 4.1, it can be seen that the reliability level obtained

depends on the probability that a transmission will not be lost. Specifically, in wireless
networks, most message losses are caused by collisions. Hence, to reduce the chance
of collision, and thereby be able to obtain reliability levels similar to the bottommost

line of Figure 4.1, RAPID employs jitter. That is, when a node decides to rebroadcast a

message, it waits for a short random time before doing so. Hence, the small probability of

rebroadcasting plus the short jitter before rebroadcasting means that RAPID very rarely

causes message collisions. The value of jitter is discussed in Section 4.4.1.

4.3.1 Basic RAPID

The Dissemination Task in Detail

This protocol is a combination of probabilistic flooding with lazy gossip. Hence, its message
dissemination consists of the following steps: (1) The originator p of a message m sends
m||header(m) to all nodes in $N^t(1, p)$ (Lines 1–4 in Figure 4.3). The header part of m
includes a sequence number and the identifier of the originator. (2) The originator p of m
then starts a periodic gossip of header(m) to all nodes in $N^t(1, p)$ (Line 5). (3) When a
node p receives a message m for the first time, p accepts m (Lines 6–7). (4) p broadcasts
m with probability $\min\left(1, \frac{\beta}{|N^t(1,p)|}\right)$ (Line 8; our protocol was simulated with β equal to
3.5). (5) p starts a periodic gossip of header(m) to all nodes in $N^t(1, p)$ (Line 9). (6) If a
node p receives a message m it has already received beforehand, then m is ignored.

Gossiping and Message Recovery in Detail

The gossiping and message recovery part of the protocol is composed of the following subtasks:



Upon send(msg) by application do
(1)  header := msg_id||node_id;
(2)  data_msg := header||msg;
(3)  gos_msg := header;
(4)  prob_bcast(prob = 1, data_msg, DATA);
(5)  lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by pj do
(6)  if (have not received this msg before) then
(7)      Accept(pj, msg); /* forward to the application */
(8)      prob_bcast(prob = min(1, β/|N^t(1,p)|), msg, DATA);
(9)      lazycast(gos_msg, GOSSIP);
(10) endif;

Upon receive(gos_msg, GOSSIP) sent by pj do
(11) if (there is no msg that fits the gos_msg) then
(12)     /* Ask the neighbors to send the real message */
(13)     prob_bcast(prob = min(1, β/|N^t(1,p)|), gos_msg, REQUEST);
(14) endif;

Upon receive(gos_msg, REQUEST) sent by pj do
(15) if (I have the msg that matches gos_msg) then
(16)     prob_bcast(prob = min(1, β/|N^t(1,p)|), msg, DATA_REPLY);
(17) endif;

Figure 4.3: Basic RAPID (executed by node p)



1. When p receives a message m, p gossips header(m) to other nodes in $N^t(1, p)$ (Line 9).

Note that p does not forward gossips about messages it has not received yet. This is

done in order to make the recovery process more efficient.

2. When p receives a gossip header(m) for a message m it has not received yet, p asks its

neighbors to forward m to itself using a REQUEST message (Lines 11–14). Intuitively,

since p received a GOSSIP message about m, one of p’s neighbors should have m and

supply it when needed.

3. When p receives a REQUEST for a message m, yet p has not received m, p ignores

this request. Otherwise, p broadcasts the missing message (Lines 15–17).
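The event handlers of Figure 4.3 can be sketched as a small class. This is an illustration under several assumptions and not the simulated implementation: the radio is a hypothetical callback `broadcast(kind, header, payload)`, the neighbor count is a plain attribute instead of being learned from heartbeats, and lazycast is modelled as a set of headers that `gossip_tick` re-broadcasts when an external timer calls it.

```python
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Header = Tuple[int, str]  # (msg_id, node_id)


class BasicRapidNode:
    def __init__(self, node_id: str, beta: float,
                 broadcast: Callable[[str, Header, Optional[bytes]], None],
                 rng: Optional[random.Random] = None) -> None:
        self.node_id = node_id
        self.beta = beta
        self.broadcast = broadcast          # hypothetical radio hook
        self.rng = rng or random.Random()
        self.num_neighbors = 1              # would be maintained from heartbeats
        self.store: Dict[Header, bytes] = {}
        self.gossiped: Set[Header] = set()  # headers under periodic gossip
        self.delivered: List[bytes] = []
        self.next_id = 0

    def _forward_prob(self) -> float:
        return min(1.0, self.beta / max(1, self.num_neighbors))

    def _prob_bcast(self, prob: float, kind: str, header: Header,
                    payload: Optional[bytes]) -> None:
        if self.rng.random() < prob:
            self.broadcast(kind, header, payload)

    def send(self, payload: bytes) -> Header:                # Lines 1-5
        header = (self.next_id, self.node_id)
        self.next_id += 1
        self.store[header] = payload
        self._prob_bcast(1.0, "DATA", header, payload)
        self.gossiped.add(header)
        return header

    def on_data(self, header: Header, payload: bytes) -> None:  # Lines 6-10
        if header in self.store:
            return                                           # duplicate: ignore
        self.store[header] = payload
        self.delivered.append(payload)                       # Accept
        self._prob_bcast(self._forward_prob(), "DATA", header, payload)
        self.gossiped.add(header)

    def on_gossip(self, header: Header) -> None:             # Lines 11-14
        if header not in self.store:
            self._prob_bcast(self._forward_prob(), "REQUEST", header, None)

    def on_request(self, header: Header) -> None:            # Lines 15-17
        if header in self.store:
            self._prob_bcast(self._forward_prob(), "DATA_REPLY", header,
                             self.store[header])

    def gossip_tick(self) -> None:
        """Called periodically: advertise the headers of all held messages."""
        for header in self.gossiped:
            self.broadcast("GOSSIP", header, None)
```

Purging of old messages is left out here.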

One issue that needs to be taken care of is purging received messages, in order to avoid

unbounded memory requirements. This can be done either using timeouts, or by employing

a stability detection mechanism [54, 108]. In this work, we have chosen to use timeout

based purging due to its simplicity. Clearly, in this case there is a tradeoff in setting

the timeout value: a long timeout increases the reliability, but also increases the memory

consumption. From our experiments, it turns out that even with short timeouts we

can reach reliability above 99.9% in most cases.
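Timeout-based purging can be sketched as a store that remembers when each message was first received. The class and method names are hypothetical, and the current time is passed in explicitly so that the example stays deterministic.

```python
from typing import Dict, Hashable, Tuple


class ExpiringStore:
    """Message store with timeout-based purging (illustrative)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.entries: Dict[Hashable, Tuple[bytes, float]] = {}  # header -> (payload, stored at)

    def put(self, header: Hashable, payload: bytes, now: float) -> None:
        # Keep the first reception time if the message is already stored.
        self.entries.setdefault(header, (payload, now))

    def has(self, header: Hashable) -> bool:
        return header in self.entries

    def purge(self, now: float) -> int:
        """Drop every message stored at least `timeout` ago; return how many were dropped."""
        expired = [h for h, (_, t) in self.entries.items() if now - t >= self.timeout]
        for h in expired:
            del self.entries[h]
        return len(expired)
```

A longer timeout keeps messages available for recovery requests for longer, at the price of more memory, which is the tradeoff noted above.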

4.3.2 Enhanced RAPID

The basic RAPID protocol has an important drawback: if all nodes in a given neighborhood

decide not to broadcast a message, the dissemination of this message would be severely

delayed, as it will only be propagated through the gossip/request mechanism, which is slow.

In order to deal with this drawback and improve the reliability and the latency of
RAPID, we slightly change the protocol by adding a complementary counter-based-like
mechanism. That is, whenever p initially probabilistically decides not to rebroadcast m,
but later on p does not hear any other rebroadcasting of m, then p adds m to its casting
queue. Thus, either p will hear a retransmission of m by one of its neighbors, or p will
retransmit m. This optimization of deciding to rebroadcast m even if initially a node p
probabilistically chose not to, but later did not hear any of its neighbors rebroadcast m,
helps boost the reliability of the protocol by ensuring that a message will be propagated
to almost every neighborhood of the network.



Upon send(msg) by application do
(1)  header := msg_id||node_id;
(2)  data_msg := header||msg;
(3)  gos_msg := header;
(4)  prob_bcast(prob = 1, data_msg, DATA);
(5)  lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by pj do
(6)  if (have not received this msg before) then
(7)      Accept(pj, msg); /* forward it to the application */
(8)      cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), msg, DATA);
(9)      lazycast(gos_msg, GOSSIP);
(10) endif;

Upon receive(gos_msg, GOSSIP) sent by pj do
(11) if (there is no message that fits the gos_msg) then
(12)     /* Node asks its neighbors to send the real message */
(13)     cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), gos_msg, REQUEST);
(14) endif;

Upon receive(gos_msg, REQUEST) sent by pj do
(15) if (I have the msg that matches gos_msg) then
(16)     cast_queue.add(prob = min(1, β/|N^t(1,p)|), time=random(0, short_jitter), msg, DATA_REPLY);
(17) endif;

Interceptor
(18) if (m that appears in cast_queue was received) and (m.type==REQUEST or m.type==DATA_REPLY) then
(19)     cast_queue.remove(m);
(20) endif;

Upon Expiration of timer of msg in cast_queue do
(21) cast_queue.remove(msg);
(22) pr = the probability attached to msg;
(23) type = the message type associated with msg;
(24) prob_bcast(prob = pr, msg, type);
(25) if (msg was not broadcasted) then
(26)     cast_queue.add(prob = 1, time=long_jitter, msg, type);
(27) endif;

Figure 4.4: Enhanced RAPID (lines that were modified w.r.t. Figure 4.3 are boxed, while lines 18–27 were added)



The Dissemination Task in Detail

The pseudo-code for the enhanced version of RAPID is listed in Figure 4.4. In this code,
we use a queue called cast_queue. The add method of this queue accepts the following
parameters: the sending probability, a time parameter, the message itself and the type of
the message. The time is used in order to set a timer to expire after the corresponding
amount of time elapses. The probability and type are stored alongside the message inside
the queue.

Dissemination in Enhanced RAPID works the same as in Basic RAPID (Section 4.3.1)
except for step 4 of the first paragraph of Section 4.3.1. In Enhanced RAPID, whenever
a node p receives a message m for the first time, it schedules a rebroadcast of m with
probability $\min\left(1, \frac{\beta}{|N^t(1,p)|}\right)$ to occur after some random jitter (Line 8 in Figure 4.4). If a
received message has never been rebroadcast, neither by p nor by any of its neighbors,
then p decides to rebroadcast m after all, by invoking prob_bcast with probability 1
(Lines 25–27).
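The behaviour of the cast queue (Lines 18–27 of Figure 4.4) can be sketched with a priority queue of timers. This is an illustration under stated assumptions: time is passed in explicitly instead of using real timers, the radio is a hypothetical `transmit(msg, kind)` callback, messages are assumed to be hashable, and `remove` stands for the interceptor cancelling a pending entry.

```python
import heapq
import itertools
import random
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class CastQueue:
    def __init__(self, transmit: Callable[[Hashable, str], None],
                 long_jitter: float, rng: Optional[random.Random] = None) -> None:
        self.transmit = transmit        # hypothetical radio hook
        self.long_jitter = long_jitter
        self.rng = rng or random.Random()
        self.heap: List[Tuple[float, int, Tuple[Hashable, str]]] = []
        self.entries: Dict[Tuple[Hashable, str], Tuple[int, float]] = {}
        self.seq = itertools.count()

    def add(self, prob: float, delay: float, msg: Hashable, kind: str, now: float) -> None:
        """Schedule msg to be sent with probability prob once delay has elapsed."""
        key, seq = (msg, kind), next(self.seq)
        self.entries[key] = (seq, prob)
        heapq.heappush(self.heap, (now + delay, seq, key))

    def remove(self, msg: Hashable, kind: str) -> None:
        """Cancel a pending entry, e.g. after overhearing a neighbor send it."""
        self.entries.pop((msg, kind), None)

    def run_due(self, now: float) -> None:
        """Fire all timers that expired by `now`."""
        while self.heap and self.heap[0][0] <= now:
            due, seq, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is None or entry[0] != seq:
                continue                 # cancelled or superseded entry
            del self.entries[key]
            msg, kind = key
            if self.rng.random() < entry[1]:
                self.transmit(msg, kind)
            else:
                # Not broadcast: retry after the long jitter, this time with probability 1.
                self.add(1.0, self.long_jitter, msg, kind, due)
```

A message that loses the probabilistic draw is therefore still sent after the long jitter unless `remove` cancels it in the meantime.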

Gossiping and Message Recovery in Detail

The main difference between gossiping in Basic RAPID vs. Enhanced RAPID is in the
cancelling of REQUEST and DATA_REPLY messages. That is, in the enhanced protocol
every node p monitors its neighbors, and if p planned to broadcast such a message m, but p
heard a transmission of m by a neighbor, then p cancels the transmission of m. This
cancelling is done in order to eliminate redundant REQUEST and DATA_REPLY messages,
exploiting the broadcast nature of wireless networks. In addition, if p decided not to broadcast
m, but it does not hear the transmission of m by any of its neighbors, p broadcasts m.

These issues are handled in Lines 13, 15–17 and 25–27.

Latency of RAPID

In both RAPID and counter-based protocols [24, 57, 114, 115], nodes wait for a certain

amount of time before they rebroadcast a message. Yet, the average waiting time is much

shorter in RAPID than in counter based protocols. Notice that in Figure 4.4 we employ
two jitter lengths, short_jitter and long_jitter. The first is used to prevent collisions, while



the second is used as a corrective measure, as discussed above, and is similar to the counter

based approach. Notice that in order to be effective, the duration of the jitter must be

proportional to the number of expected concurrent transmissions. The expected number

of concurrent transmitters competing for transmission due to the probabilistic mechanism

is quite small (β). On the other hand, in the situations in which long_jitter is used in our
protocol, and similarly in counter based protocols, all nodes in the neighborhood might
transmit concurrently. Hence, long_jitter must be long enough to accommodate that.
Consequently, short_jitter can be much shorter than long_jitter. For example, if the target is
to completely eliminate collisions with high probability, then following the birthday paradox,
the length of the jitter must be proportional to $s^2$, where s is the expected number of
concurrent senders. Moreover, most times in RAPID the timer-based corrective measure
will not be used, so average latency is mostly dominated by short_jitter. The actual values

used for both jitters are described in Section 4.4.1.

4.3.3 Maliciousness Resilient RAPID

Due to its probabilistic nature, RAPID can be resilient to many forms of malicious behavior.

Since the decisions that every node takes are based only on the number of its neighbors and

the transmissions it hears, the attacks that a malicious node can perform are quite limited.

We describe below how the protocol was modified in order to overcome these attacks.

Malicious Tolerant RAPID in Detail

We use digital signatures in order to prevent a malicious node from forging others’ messages

or trying to impersonate other nodes. Each device p holds a private key $k_p$, known only to

itself, with which p can digitally sign every message it sends [106]. We assume a malicious

node cannot forge signatures and that each device can obtain the public key of every other

device, and can thus authenticate the sender of any signed message.

The originator p of a message m adds two signatures to m before it broadcasts m. The
first signature is calculated on the concatenation of m, p's node id, and m's message id,
in order to bind together the content of the message, the node id of its originator and
the message id. The second signature is computed over p's node id and the message
id. The objective of the second signature being attached to the message is to speed up



Upon send(msg) by application do
(1)  gos_msg := msg_id||node_id||sig(msg_id||node_id);
(2)  data_msg := msg_id||node_id||msg||sig(msg_id||node_id||msg)||sig(msg_id||node_id);
(3)  prob_bcast(prob = 1, data_msg, DATA);
(4)  lazycast(gos_msg, GOSSIP);

Upon receive(msg, DATA or DATA_REPLY) sent by pj do
(5)  if (verify_signature(msg) = true) then
(6)      if (have not received this msg before) then
(7)          Accept(pj, msg); /* forward it to the application */
(8)          cast_queue.add(prob = min(1, β/|trusted_neighbors|), time=random(0, short_jitter), msg, DATA);
(9)          lazycast(gos_msg, GOSSIP);
(10)     endif;
(11) else /* the message is not correct */
(12)     suspect(pj);
(13) endif;

Upon receive(gos_msg, GOSSIP) sent by pj do
(14) if (verify_signature(gos_msg) = true) then
(15)     if (there is no message that fits the gos_msg) then
(16)         expect(gos_msg, pj);
(17)         /* The node asks the node that sent the gossip to send the real message */
(18)         send(gos_msg, REQUEST, pj);
(19)     endif;
(20) else /* the message is not correct */
(21)     suspect(pj);
(22) endif;

Upon receive(gos_msg, REQUEST, pk) sent by pj do
(23) if (verify_signature(gos_msg) = true) then
(24)     if (I am pk and I have the msg that matches gos_msg) then
(25)         prob_bcast(prob = 1, msg, DATA_REPLY);
(26)     endif;
(27) else /* the message is not correct */
(28)     suspect(pj);
(29) endif;

Figure 4.5: Maliciousness Resilient RAPID (lines that were modified w.r.t. Figure 4.4 are boxed)



the dissemination of gossip messages in the system. That is, in our protocol, every time

a node q receives a data message m, q sends a gossip message about m to its neighbors.

However, the first signature binds the message header (sender id and message id) to

the message data. Thus, a node that receives a message m cannot generate a valid gossip

message for m only based on the first signature. The second signature is the one that should

be sent with the gossip message. This enables any node that receives m to immediately

start gossiping about m, and be able to attach a valid signature that was generated by the

originator of m, to the gossip message. Otherwise, without the second signature, a receiver

q of m would have had to wait for a separate gossip message about m before q could have

started gossiping about m.
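The role of the two signatures can be illustrated with the sketch below. It comes with an important caveat: to stay within the Python standard library it uses HMAC-SHA256 as a stand-in for sig(·). An HMAC is a symmetric MAC, not a digital signature, so unlike in the protocol the verifier here needs the signer's key. The dictionary layout and all function names are assumptions of the example.

```python
import hashlib
import hmac
from typing import Dict


def sign(key: bytes, data: bytes) -> bytes:
    """Stand-in for sig(.): HMAC-SHA256 (a symmetric MAC, NOT a real digital signature)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def make_data_msg(key: bytes, msg_id: int, node_id: str, payload: bytes) -> Dict[str, bytes]:
    """Originator side: attach both signatures to the data message."""
    header = f"{msg_id}|{node_id}".encode()
    return {
        "header": header,
        "payload": payload,
        "sig_full": sign(key, header + b"|" + payload),  # binds header and content
        "sig_header": sign(key, header),                 # reusable in gossip messages
    }


def gossip_from_data(data_msg: Dict[str, bytes]) -> Dict[str, bytes]:
    """Any receiver can build a valid gossip message without knowing the originator's key."""
    return {"header": data_msg["header"], "sig_header": data_msg["sig_header"]}


def verify_gossip(key: bytes, gos_msg: Dict[str, bytes]) -> bool:
    return hmac.compare_digest(sign(key, gos_msg["header"]), gos_msg["sig_header"])


def verify_data(key: bytes, data_msg: Dict[str, bytes]) -> bool:
    expected = sign(key, data_msg["header"] + b"|" + data_msg["payload"])
    return hmac.compare_digest(expected, data_msg["sig_full"])
```

The point of the sketch is `gossip_from_data`: because the header signature travels with the data message, a receiver can start gossiping about m immediately, while the first signature alone would not verify against the header.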

The pseudo-code for the maliciousness resilient protocol appears in Figure 4.5. Here
we introduce four new primitives: send, verify_signature, suspect and expect, and the
retransmission probability is computed based on the number of trusted neighbors
(trusted_neighbors). The one-hop neighbors of a node p that it has not suspected yet of
being malicious form its set of trusted neighbors. The primitive send is a point to point
send. The primitive verify_signature verifies that sig(m) matches m. If it does not

then m is ignored and the node that sent it is suspected by the receiver of the message.

The primitive suspect permanently removes a node pj that was caught forging a message

from the list of trusted neighbors (i.e., pj sent a message with a signature that fails to

authenticate). On the other hand, expect accepts two parameters: a gossip message and a

node id pj . In response, the node p that executed expect, sets a timer such that the given

message must be received from pj before the timer expires. If such a message is not received

in time, then pj is temporarily removed from the list of trusted neighbors of p. We use it to

temporarily suspect a node that sent us a gossip but refused to deliver the corresponding

message.
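The bookkeeping behind suspect and expect can be sketched as below. This is a minimal illustration, not the thesis implementation: the class name `TrustedNeighbors`, the explicit `tick` clock, and the fixed `timeout` are assumptions, and the sketch does not model when a temporarily suspected node is reinstated, since that policy is not specified here.

```python
class TrustedNeighbors:
    """Tracks which one-hop neighbors are still counted as trusted."""

    def __init__(self, neighbors, timeout):
        self.neighbors = set(neighbors)
        self.permanent = set()   # removed by suspect(): forged signature seen
        self.temporary = set()   # removed after a missed expect() deadline
        self.pending = {}        # (gossip_id, node) -> deadline
        self.timeout = timeout   # ASSUMPTION: one fixed timeout for all requests

    def suspect(self, node):
        # Permanent removal, e.g. after a signature fails to authenticate.
        self.permanent.add(node)

    def expect(self, gossip_id, node, now):
        # The message announced by gossip_id must arrive from node in time.
        self.pending[(gossip_id, node)] = now + self.timeout

    def on_message(self, gossip_id, node):
        # The expected message arrived, so the timer is cancelled.
        self.pending.pop((gossip_id, node), None)

    def tick(self, now):
        # Temporarily suspect every node whose deadline has passed.
        for (gossip_id, node), deadline in list(self.pending.items()):
            if now >= deadline:
                self.temporary.add(node)
                del self.pending[(gossip_id, node)]

    def trusted(self):
        return self.neighbors - self.permanent - self.temporary
```

The size of `trusted()` is what would feed the rebroadcast probability in the maliciousness resilient version.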

As mentioned before, in the maliciousness resilient version of RAPID, each node only counts

its one-hop neighbors that it has not suspected yet of being malicious. This is because if

a node is malicious, it might not execute the protocol correctly, and in particular refuse

to forward some messages even when it should do so probabilistically. Hence, if a correct

node p is located in an area with many malicious nodes, then p’s broadcast probability

will become higher due to the fact that it will ignore those malicious nodes in counting its

neighbors. Even if malicious nodes manage to mislead a correct node p by pretending to

be correct nodes, the worst thing that can happen is that p’s broadcast probability will

be lower. In this case, any message m that is not sent by the probabilistic rebroadcasting

mechanism will still be forwarded to p’s neighbors either if p does not hear a retransmission

by any of its neighbors or via the gossip/request protocol. Either way, the reliability of the

protocol will not be degraded. The only thing that can suffer is the latency of delivering

the message to all the nodes.

Also, notice that the protocol in Figure 4.5 uses point-to-point requests (for missing

messages) and unconditional replies (a node that is asked for a message will send it to the

requesting node regardless of other nodes and other messages), rather than probabilistically

broadcasting requests and replies as in the previous versions of the protocol. This is done in

order to prevent attacks in which malicious nodes “convince” some nodes not to send their

messages. For example, consider the following scenario, which is possible with the recovery

scheme of Figure 4.4. A malicious node p can continuously broadcast REQUEST messages

such that its close neighbors will hear the transmission of the messages, while the rest of

its neighbors will not hear the transmissions of those REQUEST messages. Consequently,

the nearby neighbors of p will not broadcast REQUEST messages even if they miss some

messages, since they have heard the transmissions of the corresponding REQUEST messages

by p. Hence, these neighbors of p will never obtain messages that they failed to receive using

the probabilistic dissemination phase. A similar attack is for a malicious node p to always

rebroadcast DATA messages in response to REQUEST messages, but to do so such that

only the close neighbors of p will receive that DATA message, and will therefore never

retransmit it themselves. In this case, the other neighbors of p might never receive such

messages. Hence, by using point-to-point requests for missing messages, we slightly increase

the overhead of the protocol on the one hand, but on the other hand, we increase the reliability

of the protocol.

It would have been possible to use a similar mechanism to the one used in Enhanced

RAPID in lines 13 and 16, but that would have required an additional twist. In order to

continue using the scheme of lines 13 and 16, each node would have had to store additional

information about messages it has decided not to broadcast due to broadcasts by its neighbors.

If some node p receives the same REQUEST (GOSSIP) message several times and

p has cancelled the rebroadcast of the corresponding DATA (REQUEST) message, then

p would have to rebroadcast the message (with probability 1) immediately. The code in

Figure 4.5 does not include this optimization for simplicity.

Resilience Against Malicious Attacks

Below we describe several specific attacks that are overcome by Maliciousness

Resilient RAPID. These attacks include: (1) forwarding a message with the wrong data, (2)

not forwarding some/all messages (this is known as selfish behavior; see footnote 1), (3) sending gossip

messages without ever supplying the real messages in order to confuse other nodes, (4)

trying to collide others’ messages, and (5) sending messages as point-to-point messages

instead of broadcast messages, thus causing a correct node to decide not to rebroadcast a

message, even if it is the only one among all its neighbors that has received the message.

As mentioned above, the first attack is solved by adding signatures. That is, the originator

of a message m signs the message with its private key and attaches this signature to

the message. Thus, every node p that receives m from q checks m’s signature and if the

signature does not match the content of m, p will suspect q and will not accept the message.

Moreover, p will no longer count q as one of its neighbors for the purpose of calculating the

rebroadcasting probability.

The second attack is solved by the monitoring mechanism. If a malicious node does

not rebroadcast a message m to all its neighbors, then our protocol guarantees that in any

case one of its neighbors will do it. Hence, as long as the correct nodes form a connected

sub-network, every message will be disseminated to all of them.

The third attack is solved using a simple timeout mechanism. When a node p receives

a gossip from q about a message m that p is missing, then in addition to sending a request

for m to q, p starts a timer. If p does not receive the requested message from q after

the timeout, it starts suspecting q as being malicious. In this case, p stops counting q for

calculating its rebroadcasting probability.

As for the fourth attack, in our model we assumed that all messages are delivered with a non-zero probability. Hence, by assumption, the fourth attack is not possible. The rationale behind this is twofold. First, if malicious nodes are allowed to collide all messages, then no protocol can ensure reliable delivery. Second, if all nodes are battery operated, jamming the channel will drain the battery very quickly, and hence such an attack cannot last for too long. In particular, whenever malicious nodes are only selfish, rather than mean, the fourth attack does not make sense in any case, since it hurts everyone, including themselves.

Footnote 1: Giving incentives for nodes to participate is beyond the scope of this work. Here we only focus on overcoming selfish behavior so that it does not prevent correct nodes from receiving messages, assuming that the correct nodes form a connected sub-network.

Finally, if a malicious node sends a point-to-point message instead of rebroadcasting

it, our gossip mechanism will ensure that the message will still be propagated, yet with

an increased delay. In addition, some lower level mechanisms can be used, such as forcing

nodes to send messages and listen to messages only on IP-multicast addresses. Moreover, it

is possible to verify that a received IP-multicast message was also sent to a MAC destination

broadcast address rather than to a point-to-point destination address.

4.4 Simulations

In this section, we evaluate the performance of RAPID and compare it with the performance

of the counter-based protocol of Tseng et al. [114] and with the performance of the GOSSIP3

protocol [57]. In GOSSIP3, when a node q receives a message, it broadcasts the message

to its neighbors with probability P and discards the message with probability 1 − P. In

addition, if q initially received a message and did not broadcast it, q broadcasts the message

later if it has not received the message from at least M other nodes (footnote 2). The reason for choosing

GOSSIP3 is that it is one of the best studied probabilistic protocols in the literature and

was found to be the best probabilistic broadcast mechanism among all the ones explored

in [57]. In our simulations we have measured the percentage of messages delivered to all the

nodes (delivery ratio), the latency to deliver a message to varying percentages of the nodes,

the load imposed on the network (number of transmitted messages) and the influence of

mute (selfish) nodes on the performance of our protocol.
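The GOSSIP3 rule described above can be sketched as two decisions. This is a simplified reading of the description in this section rather than the reference implementation of [57]: the function names are hypothetical, and the waiting period before the second decision is not modelled.

```python
import random


def gossip3_initial_broadcast(p: float, rng: random.Random) -> bool:
    """First decision on receiving a message: broadcast with probability p."""
    return rng.random() < p


def gossip3_late_broadcast(broadcast_initially: bool,
                           copies_heard: int, m: int) -> bool:
    """Second decision, taken later: a node that stayed silent broadcasts
    after all if it heard the message from fewer than m other nodes."""
    return (not broadcast_initially) and copies_heard < m
```

With the values used in the simulations (P = 0.65 and M = 1), a node that initially stayed silent rebroadcasts only if it overheard no other copy of the message.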

4.4.1 Setup

We have used the JiST/SWANS simulator [116] to evaluate the protocols. In JiST/SWANS, nodes use the two-ray ground radio propagation model with the IEEE 802.11 MAC protocol and 54Mb/sec throughput. Communication between nodes is by broadcast. Two concurrent broadcasts can collide, in which case the messages will not be received by some of the nodes. The collision may occur without the broadcasting node detecting the problem, a phenomenon known as the hidden terminal problem [5]. In order to reduce the number of collisions, we have employed a staggering technique (Figure 4.4). That is, each time a node is supposed to send a message, it delays the sending by a random period, denoted by short_jitter, which was set to 3 milliseconds. In addition, in the TSENG protocol and in the counter-based mechanism in RAPID we have used a long jitter of 0.33 × s² milliseconds (where s is the expected number of concurrent senders).

Footnote 2: GOSSIP3 was simulated with probability P = 0.65 and M = 1.

The transmission range was set to roughly 200 meters (footnote 3). The nodes were placed at uniformly

random locations in a square area of 3500x3500 m², and unless mentioned otherwise,

the results are reported for networks of 1,000 nodes, which corresponds to roughly 10 neighbors

per node. We have also checked other network sizes (2500x2500 m² and 4500x4500 m²)

with similar density, but the results were qualitatively the same, regardless of the specific

network size and exact number of nodes. An additional analysis of varying network density

is presented in Section 4.4.2.
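The figure of roughly 10 neighbors per node can be checked with a back-of-the-envelope estimate. The sketch below assumes uniform placement, an ideal circular transmission disk, and no border effects; the function name `expected_neighbors` is hypothetical.

```python
import math


def expected_neighbors(n_nodes: int, side_m: float, range_m: float) -> float:
    """Approximate mean number of nodes inside one transmission disk.

    ASSUMPTIONS: nodes are uniform in a side_m x side_m square, the radio
    range is an ideal disk of radius range_m, and border effects are ignored.
    """
    density = n_nodes / (side_m ** 2)          # nodes per square meter
    return density * math.pi * range_m ** 2    # nodes per transmission disk


print(round(expected_neighbors(1000, 3500, 200), 2))
```

For 1,000 nodes, a 3500 m side and a 200 m range, this estimate gives slightly more than 10, which agrees with the value quoted in the setup.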

Mobility was modelled by the Random-Waypoint model [65]. In this model, each node

picks a random target location and moves there at a randomly chosen speed. The node

then waits for a random amount of time and then chooses a new location, etc. In our

case, the speed of movement ranged from 1-10 m/s. Being aware of recent criticisms of the

Random-Waypoint model [22], we set the pause time to be 0 seconds and discarded the first

1000 seconds of simulation time.

In our simulations the number of broadcasting nodes varied from 1 to 200 and the size

of data messages was set to 512 bytes (less than one UDP/IP packet). In every simulation,

every broadcasting node sends 100 messages and then, after a cool-down period, the simulation

is terminated. Each data point was generated as an average of 10 runs. Unless

otherwise mentioned, we use the default values defined in JiST/SWANS. We have used the

default Java pseudo random number generator, initialized with the current system time in

milliseconds as a seed.

Footnote 3: In SWANS one can choose the transmission power, which translates into a transmission range based on power degradation and background noise.


In the graphs, we have used the following notation: the enhanced version of our probabilistic

dissemination protocol from Figure 4.4 is denoted RAPID; a restricted version of

the enhanced RAPID in which the gossip and the recovery mechanism were disabled is

denoted RAPID-NO-GOSSIP; the counter-based protocol of Tseng et al. [114] is denoted

TSENG; GOSSIP3 is the probabilistic protocol by Haas et al. [57]. We limited the number

of times each message is gossiped by nodes in RAPID to 1. Additional gossip attempts

slightly improve the delivery ratios at the cost of additional messages.

4.4.2 Results

Broadcasting Probability - exploring β

Figures 4.6, 4.7, 4.8 and 4.9 explore the delivery ratio, the number of transmissions and

the latency (in seconds) against the broadcast probability of nodes in RAPID. Since the

broadcasting probability of node i is expressed as β/n_i, where n_i is the neighborhood size

of i, increasing the value of β leads to an increase in the broadcasting probability of i. The

following discussion and simulations analyze the influence of β values on the latency, the

delivery ratio and the number of transmissions of RAPID.
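The dependence of the broadcasting probability on β can be sketched as follows. Capping the value at 1 and broadcasting with certainty when no neighbors are known are assumptions of this sketch, as are the function names; the text only states the β/n_i form.

```python
import random


def broadcast_probability(beta: float, n_neighbors: int) -> float:
    """Rebroadcast probability beta / n_i for a node with n_i neighbors."""
    if n_neighbors <= 0:
        return 1.0  # ASSUMPTION: a node with no known neighbors always sends
    return min(1.0, beta / n_neighbors)  # ASSUMPTION: cap at probability 1


def should_rebroadcast(beta: float, n_neighbors: int,
                       rng: random.Random) -> bool:
    return rng.random() < broadcast_probability(beta, n_neighbors)
```

For example, β = 3.5 with 10 observed neighbors yields a probability of 0.35, while the same β in a sparse neighborhood of 2 nodes saturates at 1.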

We can see in Figures 4.8 and 4.9 that when we increase the β value, the latency of

RAPID decreases. It can be explained by the fact that more nodes decide to broadcast

the received message and therefore more nodes receive messages from their neighbors by

the probabilistic mechanism and not due to the completion or recovery mechanisms. Yet,

when the value of β increases, more messages are injected into the network, as can be seen

in Figure 4.7. In addition, the value of β has hardly any influence on the reliability of

RAPID: even for β = 1.5, RAPID delivers all messages to more than 99% of nodes. With

such a low β, more messages are delivered via the completion and the recovery phases, which

increases the latency, but still keeps the reliability of RAPID at about 99%, as can be seen

in Figure 4.6. Hence, the decision of whether to use RAPID with a low or high value of β

can be made based on the tradeoff between the latency and the message load for a given

application. In the following sections, we present RAPID with β = 3.5 since it gives a good

tradeoff between throughput and latency.

Figure 4.6: Message delivery ratio when all nodes are mobile vs. varying values of β. (Plot: % received messages against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint, #nodes=1000, #senders=100, length=3500 m.)

Figure 4.7: Network load in terms of total number of transmissions when all nodes are mobile vs. varying values of β. (Plot: #transmissions, in units of 10^6, against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint, #nodes=1000, #senders=100, length=3500 m.)

Changing the Number of Broadcasting Nodes

Figures 4.10 and 4.11 present a comparison of RAPID with other protocols in mobile networks.

Figure 4.10 shows the percentage of nodes that received all messages vs. the number

of nodes that initiate one new broadcast per second. RAPID delivers a very high percentage

of messages (99.9%), even when the number of broadcasting nodes is as high as 200.

RAPID-NO-GOSSIP, GOSSIP3 and TSENG also deliver a high percentage of messages when

the number of broadcasting nodes is relatively small (about 50 nodes). Yet, when the number

of broadcasting nodes increases and more messages are injected into the network, the

percentage of messages that RAPID-NO-GOSSIP, GOSSIP3 and TSENG deliver to all the

nodes decreases substantially. The reason for this degradation is the fact that when the

number of concurrent messages in the system is too high, many collisions occur causing

messages to be lost. Given that RAPID-NO-GOSSIP, GOSSIP3 and TSENG only employ

a probabilistic dissemination mechanism, they cannot recover these lost messages.

Interestingly, the gap between the reliability of RAPID-NO-GOSSIP and TSENG and the reliability of GOSSIP3 grows as the number of broadcasting nodes is increased. This is because RAPID-NO-GOSSIP and TSENG generate significantly fewer messages than GOSSIP3 and therefore there are fewer collisions. Recall that the rebroadcasting probability of GOSSIP3 is fixed at 0.65. Conversely, in RAPID-NO-GOSSIP (and RAPID) the rebroadcasting probability is set to the minimal value required to ensure continued dissemination with high probability, depending on the number of observed neighbors of each node. Practically, with this specific network density, in our protocol the rebroadcasting probability is close to 0.35. This can also be observed when looking at the total number of transmissions, which is reported in Figure 4.11.

Figure 4.8: Latency to deliver a message to X% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes). (Plot: latency against % nodes for beta = 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5; protocol=RAPID, #nodes=1000, #senders=100, length=3500 m.)

Figure 4.9: Latency to deliver a message to 98% of the nodes when all nodes are mobile vs. varying values of β (with 100 broadcasting nodes). (Plot: latency against beta for RAPID and RAPID-NO-GOSSIP; Mobility=RandomWaypoint, #nodes=1000, #senders=100, length=3500 m.)

Figure 4.10: Message delivery ratio when all nodes are mobile vs. varying number of broadcasting nodes. (Plot: % received messages against #senders for RAPID, RAPID-NO-GOSSIP, GOSSIP3 and TSENG; Mobility=RandomWaypoint, #nodes=1000, length=3500 m.)

Figure 4.11: Network load in terms of total number of transmissions when all nodes are mobile vs. varying number of broadcasting nodes. (Plot: #transmissions, in units of 10^6, against #senders; Mobility=RandomWaypoint, #nodes=1000, length=3500 m; the legend was not recovered.)

Figure 4.12: [beginning of caption not recovered] (with 100 broadcasting nodes). (Plot: latency against % nodes; Mobility=RandomWaypoint, #nodes=1000, #senders=100, length=3500 m; the legend was not recovered.)

Figure 4.13: [beginning of caption not recovered] vs. varying number of selfish nodes (with 100 broadcasting nodes). (Plot: latency against % nodes for RAPID-0, RAPID-50, RAPID-100 and RAPID-200; Mobility=RandomWaypoint, #nodes=1000, #senders=100, length=3500 m.)

Figure 4.14: Message delivery ratio vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments). (Plot: % received messages against #senders for RAPID, GOSSIP3 and TSENG, each in MOBILE and STATIC variants; #nodes=1000, length=3500 m.)

Figure 4.15: Network load in terms of total number of transmissions vs. varying number of broadcasting nodes (compare protocols both in static and mobile environments). (Plot: #transmissions, in units of 10^6, against #senders for RAPID, GOSSIP3 and TSENG, each in MOBILE and STATIC variants; #nodes=1000, length=3500 m.)

Figure 4.16: Message delivery ratio when all nodes are static vs. varying density (with 100 broadcasting nodes). (Plot: % received messages against #nodes for RAPID-100, RAPID-NO-GOSSIP-100, GOSSIP3-100 and TSENG-100; Mobility=Static, #senders=100, length=2500 m.)

Figure 4.17: Network load in terms of total number of transmissions when all nodes are static vs. varying density (with 100 broadcasting nodes). (Plot: #transmissions, in units of 10^6, against #nodes for RAPID-100, RAPID-NO-GOSSIP-100, GOSSIP3-100 and TSENG-100; Mobility=Static, #senders=100, length=2500 m.)

We can also observe in Figure 4.11 that RAPID sends more messages than RAPID-NO-GOSSIP

and TSENG, in order to overcome the collisions and message loss. Hence, the

decision of whether to use RAPID or RAPID-NO-GOSSIP (or TSENG) can be made based

on the tradeoff between reliability and load for a given application.

Figure 4.12 explores the latency to deliver messages to a varying percentage of the nodes

when the number of broadcasting nodes is 100. As can be seen by the graphs, GOSSIP3 is

significantly faster than all other protocols. Yet, GOSSIP3 delivers messages only to 95%

of the nodes, while RAPID delivers the messages to 99.6% of the nodes within 0.15 seconds,

which is good enough for most envisioned applications of MANET. In the famous “no

free lunch” analogy, RAPID trades off latency (but still keeps it reasonable) for increased

reliability and reduced message overhead. As expected, RAPID is much faster than TSENG,

because the timeout between broadcasts of nodes in RAPID is smaller than the

broadcast timeout in TSENG, as explained in Section 4.3.2, and the recovery of missing

messages in RAPID is faster than the completion protocol in TSENG. Finally, RAPID

(with gossip) is faster than RAPID-NO-GOSSIP due to the recovery protocol that it runs

in parallel to probabilistic dissemination.

Impact of Mobility

Figures 4.14 and 4.15 explore the impacts of mobility. We have run simulations while varying

the speed of nodes (from 1 to 10 meters/sec) and discovered that the results are qualitatively

the same. Thus, we only present the results when the speed of nodes was between 1 and

5 meters/sec and when all the nodes are static. As can be seen in Figures 4.14 and 4.15,

when nodes are mobile, the performance of RAPID (in terms of delivery ratio and number

of transmitted messages) is slightly better than when all nodes are static. This is because

with mobility, the information about messages propagates faster to all areas of the network.

Additionally, when a node moves, its chances of overhearing a message in one of the visited

locations are higher than when it stays in the same place. Finally, when nodes move, they

appear to be in more neighborhoods, which slightly reduces the retransmission probability.

Network Density

Figures 4.16 and 4.17 explore the delivery ratio and the number of transmissions against the

density of the nodes. We can see that when the number of nodes is 200 and the network size

is 2500x2500 m² (the average density is about 4 nodes per neighborhood), RAPID with 100

broadcasting nodes delivers all messages to 52.4% of the nodes, while GOSSIP3 delivers all

messages to 38.04% of the nodes and TSENG delivers all messages to 42.5% of the nodes.

These results are explained by the very poor network connectivity. We can also see that

when the number of nodes is 400 (the average density is about 8 nodes), GOSSIP3 with 100

broadcasting nodes delivers all messages to 94.9% of the nodes, TSENG delivers all messages

to 95.4% of the nodes, and the delivery ratio of RAPID is above 98%.

Interestingly, this echoes the results of [102]. Moreover, we know from Gupta and Kumar’s connectivity bound for ad hoc networks [55] that the network’s connectivity is ensured with high probability when

$$ r \geq a \sqrt{\frac{C \ln(n)}{n}}, $$

with r being the transmission range, a the length of the network area, C a constant such that C > 1/π, and n the number of nodes. Recall that in our case, r = 200 and a = 2,500. With these numbers, we get that for n = 200, the network is not likely to be connected, but for n = 400, the network is already connected. Hence, with n = 200, no protocol can achieve high delivery ratios, yet with n = 400, good reliability can already be obtained.
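The comparison for n = 200 and n = 400 can be reproduced numerically. The sketch below evaluates the right-hand side of the bound at the limiting value C = 1/π, which is an assumption made for illustration: since C must be strictly greater than 1/π, every admissible C gives a slightly larger threshold. The function name is hypothetical.

```python
import math


def min_range_for_connectivity(n: int, side_m: float,
                               c: float = 1 / math.pi) -> float:
    """Right-hand side a * sqrt(C * ln(n) / n) of the connectivity bound.

    ASSUMPTION: c defaults to the limiting value 1/pi; the bound itself
    requires C > 1/pi, so the true threshold is slightly larger.
    """
    return side_m * math.sqrt(c * math.log(n) / n)


for n in (200, 400):
    print(n, round(min_range_for_connectivity(n, 2500.0), 1))
```

With a = 2,500 the threshold is roughly 230 m for n = 200, above the 200 m transmission range, and roughly 173 m for n = 400, below it, which is consistent with the conclusion drawn above.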

When looking at the total number of transmissions in Figure 4.17, we can observe

that RAPID scales much better than GOSSIP3 with the density of the network. The

number of transmissions is almost constant (slightly increasing mainly due to the gossip

messages and increased collisions) because RAPID tunes its rebroadcasting

probability based on the number of observed neighbors. This validates our theoretical

analysis in Section 4.2.1. RAPID achieves a slightly better delivery ratio than RAPID-NO-GOSSIP.

Yet, if the number of messages is more important, we may use RAPID-NO-GOSSIP,

which sends even fewer messages than RAPID (this is because RAPID-NO-GOSSIP

tunes its rebroadcast probability according to the number of observed neighbors just like

RAPID, yet it does not gossip about the existing messages).

Selfish Nodes

Figure 4.13 explores the latency to deliver a message to X% of the nodes when the total

number of nodes in the system is 1,000 and some nodes are selfish, i.e., refuse to rebroadcast

messages. In this graph, we use the notation RAPID-Y to indicate that RAPID was run

with Y selfish nodes. Surprisingly, the latency does not grow with the number of selfish

nodes. This is because, on the one hand, selfish nodes do not rebroadcast others’ messages, but on

the other hand they do not send gossip messages and therefore cause fewer collisions. We

can also see that even when the number of selfish nodes is 200 (20% of all nodes), RAPID

delivers the messages to 98.99% of the nodes within 0.14 seconds. We would like to point

out that by fine tuning the rate of gossips and the other timers in the system, it is possible

to reduce the quantitative latency numbers even further.

Table 4.1 presents the delivery ratio and the message overhead in mobile networks for varying numbers of selfish nodes. We can see that the number of selfish nodes hardly influences the delivery ratio of RAPID, which delivers roughly 99% or more of the messages to all nodes in every case. Interestingly, we also notice that the message overhead becomes smaller as the number of selfish nodes increases. One could expect the message overhead to increase with the number of selfish nodes. In particular, the protocol must send more REQUEST messages for recovering missing messages that were not rebroadcast by selfish nodes. However, selfish nodes do not send gossip messages. This reduces both the number of retransmissions and the number of message collisions. Hence, overall, this results in a reduced number of message transmissions.

# Selfish nodes    Delivery ratio    Message overhead
0                  99.61%            4200144
50                 99.57%            4071811.7
100                99.55%            4006395.7
200                98.99%            3816120.6

Table 4.1: Delivery ratio and message count vs. the number of selfish nodes (with 100 broadcasting nodes)


Chapter 5

Overlay Based Reliable Broadcast in Wireless Ad-Hoc Networks

5.1 System Model and Definitions

We assume the same model as Section 3.1. We also assume an abstract entity called

an overlay, which is simply a collection of nodes. Nodes that belong to the overlay are

called overlay nodes. Nodes that do not belong to the overlay are called non-overlay nodes.

In the following, OVERLAY refers to the set of nodes that belong to the overlay and

$OL^t(1, p) \equiv N^t(1, p) \cap \mathrm{OVERLAY}$ (the neighbors of p that belong to the overlay at time t).

Later in this chapter we give examples of a couple of known overlay maintenance protocols

that we adapted to our environment.

5.2 Failure Detectors and Nodes’ Architecture

We assume that each node is equipped with three types of failure detectors, Mute, Verbose, and Trust, as defined in Section 3.3 (see also the illustration in Figure 5.1). The Trust failure detector collects the reports of Mute and Verbose, as well as detections of messages with bad signatures and other locally observable deviations from the protocol. In return, Trust maintains a trust level for each neighboring node. This information is fed into the overlay, as illustrated in Figure 5.1. As we describe later in the chapter, the information obtained from Trust is used to ensure that there are enough correct nodes in the overlay so that the correct nodes of the overlay form a connected graph and that each correct node is within the transmission disk of an overlay node that does not exhibit detectable malicious behavior.

Figure 5.1: A node’s architecture. (Diagram components: appl, FD interceptor, Reliable Broadcast Mechanism, VERBOSE, MUTE, TRUST, multicast overlay, network.)

Interval Failure Detectors

Since the specification of eventual failure detectors requires the accuracy property to hold

from some point on forever [27], they are not practical in a real long running system. Hence,

we present a new type of failure detector called the Interval failure detector. We define Imute

as the class of failure detectors that detect mute failures that occur during special intervals

called mute intervals for the duration of an interval called suspicion interval. Formally, the

Imute failure detector is defined by two properties:

Interval Strong Accuracy: Non-mute processes are not suspected by any other correct

process during a certain interval that we call suspicion free interval.

Interval Local Completeness: Every process p that suffers a mute failure with respect to

a correct process q during a mute interval is suspected by q during a suspicion interval.

In a similar manner to Imute, we define Iverbose as a class of failure detectors that detect

verbose failures. In Section 5.6 we show that if the failure detector belongs to an interval

failure detector class, then during periods of good connectivity messages will be disseminated

fast (i.e., via the overlay nodes).

5.3 The Broadcast Problem

Intuitively, the broadcast problem states that a message sent by a correct node should

usually be delivered to all correct nodes. We capture this by the eventual dissemination

and the validity properties. Eventual dissemination specifies the ability of a protocol to

disseminate a message to all the nodes in the system. Validity specifies that when a correct

node accepts a message, then this message was indeed generated by the claimed originator.

Formally, we assume a primitive broadcast(p,m) that can be invoked by a node p

in order to disseminate a message m to all other nodes in the system, and a primitive

accept(p,q,m) in which a message claimed to be originated by q is accepted at a node p.

Eventual dissemination: If a correct node p invokes broadcast(p,−) infinitely often,

then eventually every correct node q invokes accept(q,p,−) infinitely often (see footnote 1).

Validity: If a correct node p invokes accept(p,q,m) and q is correct, then indeed q invoked

broadcast(q,m) beforehand. Moreover, for the same message m, a correct node p can

only invoke accept(p,q,m) once.
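The receiver-side part of the validity property (check the originator's signature, and accept each message at most once) can be sketched as below. This is an illustrative sketch only: the class `Acceptor`, the `verify` callback, and the use of an (originator, sequence number) pair as the message identity are assumptions, not the thesis implementation.

```python
from typing import Callable, Set, Tuple


class Acceptor:
    """Accepts each (originator, seq) message at most once, and only if the
    supplied verify callback confirms the originator's signature."""

    def __init__(self, verify: Callable[[str, int, bytes, bytes], bool]):
        # ASSUMPTION: verify(originator, seq, payload, signature) wraps
        # whatever signature scheme the system uses.
        self.verify = verify
        self.accepted: Set[Tuple[str, int]] = set()

    def try_accept(self, originator: str, seq: int,
                   payload: bytes, signature: bytes) -> bool:
        key = (originator, seq)
        if key in self.accepted:
            return False  # duplicate: accept may be invoked only once
        if not self.verify(originator, seq, payload, signature):
            return False  # not provably generated by the claimed originator
        self.accepted.add(key)
        return True
```

A caller would invoke accept(p,q,m) only when `try_accept` returns True; the first valid copy is delivered, and later copies and forged copies are dropped.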

5.4 The Dissemination Protocol

Our protocol includes three concurrent tasks. First, messages are disseminated over the

overlay by the overlay nodes. Second, signatures about sent messages are gossiped among

all nodes in the system. This allows all nodes to learn about the existence of messages

they did not receive either due to collisions or due to a malicious behavior by an overlay

node. When a node p discovers that it misses a message following a gossip it heard from q,

then p requests the missing message from q as well as from its overlay neighbors. The third

and final task is the maintenance of the overlay, whose goal is to ensure that the evolving

overlay indeed disseminates messages to all correct nodes. Note that the dissemination and recovery tasks are independent of the overlay maintenance. In any event, for performance reasons, most overlay maintenance messages can be piggybacked on gossip messages.

¹ Clearly, with this property it is possible to implement a reliable delivery mechanism. In order to bound the buffers used by such a mechanism, it is common to use flow control mechanisms.

The pseudo-code of the main protocol appears in Figures 5.2 and 5.3 and is described in detail in Sections 5.4.1 and 5.4.2. These figures use two primitives. The primitive broadcast denotes a

broadcast of a message with a given TTL value, i.e., it reaches through flooding all nodes in

the corresponding hop distance from the sender. The primitive lazycast initiates periodic

broadcasting of the given message only to the immediate neighbors of the sender. The

overlay maintenance is described in Section 5.5.

5.4.1 The Dissemination Task in Detail

Dissemination consists of the following steps (described from the point of view of a node p):

(1) The originator p of a message m sends m||sig(m) to all nodes in N^t(1, p). The header part of m includes a sequence number and the identifier of the originator (Line 3 in Figure 5.2).

(2) The originator p of m then gossips sig(m) to all nodes in N^t(1, p) (Line 4).

(3) When a node p receives a message m for the first time, p first verifies that sig(m) matches m. If it does, then p accepts m. If the node that sent m is not an originator of m and is not in OL^t(1, p), p instructs its Mute failure detector to expect a transmission of m by any of its overlay neighbors. Moreover, if p is also an overlay node, then p forwards m to all nodes in N^t(1, p) (Lines 5–13). However, if m does not fit sig(m), then m is ignored and the process that sent it is suspected by the Trust failure detector (Lines 22–24).

(4) If a node p receives a message m it has already received beforehand, then m is ignored.
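Steps (3) and (4) can be sketched as a single receive handler. This is only an illustration under stated assumptions: an HMAC with a per-originator key stands in for the unforgeable signatures of the model, the class Node and its fields are hypothetical, the failure detectors are reduced to plain lists that record the calls made to them, and the TTL=2 forwarding and gossip branches of Figure 5.2 (Lines 14–21) are omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in for sig(m); the thesis assumes real unforgeable signatures.
    return hmac.new(key, payload, hashlib.sha256).digest()


@dataclass
class Node:
    node_id: str
    in_overlay: bool
    keys: dict                                           # originator id -> key (assumption)
    overlay_neighbors: set = field(default_factory=set)  # plays the role of OL^t(1, p)
    seen: set = field(default_factory=set)               # headers already received
    accepted: list = field(default_factory=list)         # messages handed to the application
    expected: list = field(default_factory=list)         # recorded Mute.expect calls
    suspected: list = field(default_factory=list)        # recorded Trust.suspect calls
    outbox: list = field(default_factory=list)           # one-hop broadcasts to perform

    def on_data(self, sender, origin, seq, body, sig):
        header = (origin, seq)
        if header in self.seen:
            return  # step (4): duplicates are ignored
        key = self.keys.get(origin)
        payload = f"{seq}|{origin}|".encode() + body
        if key is None or not hmac.compare_digest(sig, sign(key, payload)):
            self.suspected.append(sender)  # bad signature: suspect the sender
            return
        self.seen.add(header)
        self.accepted.append((origin, seq, body))
        if sender != origin and sender not in self.overlay_neighbors:
            # Received from a non-overlay relay: overlay neighbors should send it too.
            self.expected.append(header)
        if self.in_overlay:
            self.outbox.append((origin, seq, body, sig))  # forward to all 1-hop neighbors
```

For example, an overlay node that receives a correctly signed message directly from its originator accepts it and queues one forward, while a second copy of the same message leaves its state unchanged.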

5.4.2 Gossiping and Message Recovery in Detail

Intuitively, the idea here is that nodes gossip about messages they received (or sent) to

all their neighbors. This way, if a node hears a gossip about a message that it has never

received, it can explicitly request the message both from its overlay neighbors and from the node

from which it received the gossip. If any of the contacted nodes has the message, it forwards

it to the requesting node. Messages can be purged either after a timeout, or by using a

stability detection mechanism. In this work, we have chosen to use timeout based purging

due to its simplicity.
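Timeout-based purging can be sketched as a small buffer keyed by message header. The class name MessageBuffer, the purge_after parameter, and the injectable clock are illustrative assumptions; the thesis does not prescribe a particular data structure.

```python
import time


class MessageBuffer:
    """Keeps messages for recovery and drops them after a fixed timeout."""

    def __init__(self, purge_after, clock=time.monotonic):
        self.purge_after = purge_after   # seconds a message is retained
        self._clock = clock              # injectable so the buffer can be tested
        self._store = {}                 # header -> (message, time stored)

    def put(self, header, message):
        # Keep the first copy only; later duplicates do not reset the timer.
        self._store.setdefault(header, (message, self._clock()))

    def get(self, header):
        self.purge()
        entry = self._store.get(header)
        return entry[0] if entry else None

    def purge(self):
        now = self._clock()
        expired = [h for h, (_, stored) in self._store.items()
                   if now - stored >= self.purge_after]
        for h in expired:
            del self._store[h]
```

A stability detection mechanism would replace the time comparison in purge with a check that all relevant nodes are known to have the message, at the cost of extra bookkeeping.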

Upon send(msg) by application do
(1)   message := msg_id||node_id||msg||sig(msg_id||node_id||msg);
(2)   gossip_message := msg_id||node_id||sig(msg_id||node_id);
(3)   broadcast(message,DATA,ttl=1);
(4)   lazycast(gossip_message,GOSSIP,ttl=1);

Upon receive(message,DATA,ttl) sent by pj do
(5)   if (have not received this message before) then
(6)     if (authenticate-signature(message) = true) then
(7)       Accept(pi,pj,message) /* forward it to the application */;
(8)       if (pj ∉ OVERLAY and pj is not originator of message) then
(9)         /* The correct message was received (but not from the overlay node) */
(10)        Mute.expect(message.header,OL^t(1,current_node),ANY);
(11)      endif;
(12)      if (current_node ∈ OVERLAY) then
(13)        broadcast(message,DATA,1);
(14)      else /* the message is correct and I am not in the overlay */;
(15)        if (ttl = 2) then
(16)          broadcast(message,DATA,ttl-1);
(17)        endif;
(18)      endif;
(19)      if (already received a gossip message about message before) then
(20)        lazycast(gossip_message,GOSSIP,ttl=1);
(21)      endif;
(22)    else /* the message is not correct */;
(23)      Trust.suspect(pj,“bad signature reason”); /* notify the trust failure detector */
(24)    endif;
(25)  endif;

Upon receive(gossip_message,GOSSIP,ttl) sent by pj do
(26)  if (authenticate-signature(gossip_message) = true) then
(27)    if (there is no message that fits the gossip_message) then
(28)      Mute.expect(gossip_message.header,pj,ANY);
(29)      if (pj is not originator of message that fits the gossip_message) then
(30)        /* The node asks from the node that sent the gossip message and from overlay nodes to */
(31)        /* send the real message */;
(32)        broadcast(gossip_message,REQUEST_MSG,ttl=1,pj);
(33)      endif;
(34)    else /* the message that fits the gossip_message was received */;
(35)      if (gossip_message has not been sent yet) then
(36)        lazycast(gossip_message,GOSSIP,ttl=1);
(37)      endif;
(38)    endif;
(39)  else /* the message is not correct */;
(40)    Trust.suspect(pj,“bad signature reason”);
(41)  endif;

Figure 5.2: Malicious Resilient Dissemination Algorithm


Upon receive(missing_message,REQUEST_MSG,ttl,pk) sent by pj do
(42)  if (authenticate-signature(missing_message) = true) then
(43)    if (current_node ∈ OVERLAY or current_node = pk) then
(44)      if (message that matches missing_message was received) then
(45)        if (current_node ∈ OVERLAY) then
(46)          Verbose.indict(pj);
(47)        endif;
(48)        broadcast(message,DATA,ttl=1,pj);
(49)      else /* the message that fits the gossip message was not received */;
(50)        if (pj is not originator of the message that matches missing_message) then
(51)          if (current_node ∈ OVERLAY) then
(52)            broadcast(missing_message,FIND_MISSING_MSG,2,pk);
(53)          endif;
(54)        else
(55)          Verbose.indict(pj);
(56)        endif;
(57)      endif;
(58)    endif;
(59)  else /* the message is not correct */;
(60)    Trust.suspect(pj,“bad signature reason”);
(61)  endif;

Upon receive(missing_message,FIND_MISSING_MSG,ttl,pk) sent by pj do
(62)  if (authenticate-signature(missing_message) = true) then
(63)    if (message that matches missing_message was not received) then
(64)      if (ttl = 2) then
(65)        broadcast(missing_message,FIND_MISSING_MSG,ttl-1);
(66)      endif;
(67)    else /* message that matches missing_message was received */
(68)      if (current_node ∈ OVERLAY or current_node = pk) then
(69)        if (pj ∈ N^t(1, current_node)) then
(70)          if (current_node ∈ OVERLAY) then
(71)            Verbose.indict(pj);
(72)          endif;
(73)          broadcast(message,DATA,1);
(74)        else
(75)          broadcast(message,DATA,2);
(76)        endif;
(77)      endif;
(78)    endif;
(79)  else /* the message is not correct */;
(80)    Trust.suspect(pj,“bad signature reason”);
(81)  endif;

Figure 5.3: Malicious Resilient Dissemination Algorithm – continued


Additionally, there are several mechanisms in place to overcome malicious failures (in addition to signatures, which detect impersonations). In order to prevent a malicious overlay node from blocking the dissemination of a message, the search for a missing message can be initiated by limited flooding with TTL=2, which ensures that the recovery request reaches beyond a single malicious overlay node. This is done in addition to requesting the message from the process that gossiped about its existence. Also, when a node decides that it has received a request for a missing message too often, or that such a request is unjustified, it notifies its Verbose failure detector about it.

More accurately, the gossiping and message recovery task is composed of the following

subtasks:

1. When a node p receives a gossip header(m) for a message m it has already received, p gossips header(m) to the other nodes in N^t(1, p) (Lines 34–38). Otherwise, p does not forward the gossip. In particular, p only gossips about messages it has already received and does not forward gossips about messages it has not received yet. This is done in order to make the recovery process more efficient, and in order to help detect mute failures more accurately.²

2. When p receives a gossip header(m) for a message m it has not received, p asks its overlay neighbors and the sender q of the gossip to forward m to it using a REQUEST_MSG message. p also instructs its Mute failure detector to expect a transmission of m by q (Lines 27–33). Intuitively, since q gossiped about m, it should have m and supply it when needed. If q gossips about messages that do not exist, or q does not want to supply them, it will be suspected.

3. When an overlay node p receives a REQUEST_MSG for the same message m too many times from the same node q, p’s Verbose failure detector suspects q (Lines 43–47 in Figure 5.3).

4. When an overlay node p receives a REQUEST_MSG for the message m, yet p has not received m, then p sends a FIND_MISSING_MSG message to nodes in OL^t(2, p) asking them to retransmit m (Lines 49–53). (The message is sent to overlay nodes at distance 2 in order to bypass a potential neighboring malicious node.) Intuitively, if p receives a REQUEST_MSG from q for a message m, and p does not have m, then it means that some neighbor r of q has gossiped header(m) to q. Therefore, at least one node in N^t(1, q) has m. Since real messages are broadcast by the overlay nodes faster than the gossips about these messages, it means that m is missing and therefore p asks nodes in OL^t(2, p) to retransmit m.

5. When an overlay node p receives a FIND_MISSING_MSG message for m from a node q and p has m, then p broadcasts m to q. If q ∈ N^t(1, p), then p also notifies its Verbose failure detector about it (Lines 67–73). Intuitively, if q is p’s neighbor and p is an overlay node that has m, then p has already broadcast m to its neighbors and therefore q should have m.

6. When a non-overlay node p receives a FIND_MISSING_MSG message for m (that it gossiped about) from a node q, p broadcasts m to q (Lines 67–73).

² It is possible to piggyback the first gossip of a message by the sender and by overlay nodes on the actual message. This saves one message and makes the recovery of messages a bit faster, since gossips about messages advance slightly faster this way. For clarity of presentation, we separate these two types of messages in the pseudo-code.
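The handling of a REQUEST_MSG (subtasks 3 and 4, Lines 42–61 of Figure 5.3) can be sketched as follows. The function handle_request, the action tuples it returns, and the threshold-based VerboseDetector are hypothetical names introduced for illustration; in particular, how many indictments count as "too many" is an assumed parameter, and signature checking is omitted.

```python
from collections import Counter


class VerboseDetector:
    """Toy verbose failure detector: suspects a node after `threshold` indictments."""

    def __init__(self, threshold):
        self.threshold = threshold       # assumed parameter, not given in the text
        self.indictments = Counter()

    def indict(self, node):
        self.indictments[node] += 1

    def suspects(self, node):
        return self.indictments[node] >= self.threshold


def handle_request(header, requester, gossiper, *, me, in_overlay, store, origin, verbose):
    """Return the list of (type, payload, ttl) broadcasts triggered by a REQUEST_MSG.

    store  -- dict header -> message for messages this node has received
    origin -- originator of the requested message (read from its header)
    """
    if not (in_overlay or me == gossiper):
        return []                                   # request is not addressed to us
    if header in store:
        if in_overlay:
            verbose.indict(requester)               # an overlay node already sent it once
        return [("DATA", store[header], 1)]
    if requester != origin:
        if in_overlay:
            return [("FIND_MISSING_MSG", header, 2)]  # look beyond a faulty neighbor
        return []
    verbose.indict(requester)                       # originator asking for its own message
    return []
```

With a threshold of 2, a neighbor that requests the same stored message twice from an overlay node ends up suspected, while a single request is served without consequence.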

5.5 Overlay Maintenance

Overlay maintenance is executed by a distributed protocol. There is no global knowledge

and each node must decide whether it considers itself an overlay node or not. Thus, the

collection of overlay nodes is simply the set of all nodes that consider themselves as such. At

the same time, every correct overlay node periodically publishes this fact to its neighbors,

so in particular, each overlay node eventually knows about all its correct overlay neighbors.

The goal of the protocol is to ensure that indeed the overlay can serve as a good backbone

for dissemination of messages. This means that eventually between every pair of correct

nodes p and q there will be a path consisting of overlay nodes that do not exhibit externally

visible malicious behavior. At the same time, for efficiency reasons, the overlay should

consist of as few nodes as possible.

For scalability and resiliency reasons, we are interested in a self-stabilizing distributed

algorithm in which every node decides whether it participates in the overlay based only on

the knowledge of its neighbors. Recall that the neighbors of p are the nodes that appear in



the transmission disk of p. Thus, p can communicate directly with them and every message

p sends is received by all of them.

In order to enable nodes to decide locally if they should become overlay nodes, we need

some deterministic symmetry breaking rule. In this work we utilize the overlay maintenance

protocols of [44]. The work of [44] defined the goodness number as a generic function that

associates each node with a value taken from some ordered domain. The goodness number

represents the node’s appropriateness to serve in the overlay. This way, it is possible to

compare any two nodes using their goodness numbers and to prefer electing the one whose value is highest to the overlay. Since in a malicious environment nodes can lie about their goodness number, this becomes a useless criterion. Thus, we replace the notion of a goodness number with the node’s id (which is unforgeable, by assumption).

Each node has a local status, which can be either active or passive; active means that

the node is in the overlay whereas passive means that it is not. The local state of each node

includes a status (active or passive), and its knowledge of the local states of all its neighbors

(based on the last local state they reported to it). Additionally, every node p maintains a variable overlay_trust for each of its neighbors q, which can be either trusted, untrusted or

unknown; untrusted means that the Trust failure detector of p suspects q, unknown means

that the Trust failure detector of p does not suspect q but another neighbor of p that p

trusts reported to p that it suspects q, and trusted means that p has no reason to suspect

q. Also, p records for each neighbor the list of its active neighbors. We assume that overlay

maintenance messages are signed as well.

In order to ensure the appropriateness of the overlay, we need to verify that the overlay

includes alternatives to each detected mute or verbose node. Ideally, we would like to

eliminate these nodes from the overlay, but as they are malicious, they may continue to

consider themselves as overlay nodes. Thus, the best we can do is make sure that there is

an alternative path in the overlay that does not pass through such nodes, and that correct

nodes do not consider mute and verbose nodes as their overlay neighbors.

The protocol for deciding if a node should be in the overlay consists of computation steps

that are taken periodically and repeatedly by each node. In each computation step, each

node makes a local computation about whether it thinks it should be in the overlay or not,

and then exchanges its local information with its neighbors. For simplicity, we concentrate



below on the local computation steps only.

Additionally, if a node p receives a message from its neighbor q in which q reports that

it suspects a node r, then p changes r’s overlay_trust to unknown, unless p already suspects

either q or r. This is done because a malicious node might be suspected only by some of

its neighbors. Therefore, a node that suspects one of its neighbors should notify its other

neighbors about this suspicion in order to preserve connectivity of correct nodes in the

overlay. Note that a malicious node may abuse this and cause its correct neighbors to join

the overlay. In other words, a malicious node can cause correct nodes to unnecessarily join

the overlay, but it cannot destroy the connectivity of the overlay w.r.t. correct nodes.
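The update rule for the overlay_trust variable can be sketched as two small functions. The string constants and function names are illustrative assumptions; the sketch follows the rule as stated above, namely that a reported suspicion is ignored when p already suspects either the reporter or the reported node.

```python
TRUSTED, UNKNOWN, UNTRUSTED = "trusted", "unknown", "untrusted"


def on_local_suspicion(trust, q):
    """p's own Trust failure detector suspects q."""
    trust[q] = UNTRUSTED


def on_reported_suspicion(trust, reporter, suspect):
    """Neighbor `reporter` tells p that it suspects `suspect`."""
    if trust.get(reporter, TRUSTED) == UNTRUSTED:
        return  # reports from nodes p already suspects are ignored
    if trust.get(suspect, TRUSTED) == UNTRUSTED:
        return  # p's own suspicion is stronger than a second-hand report
    trust[suspect] = UNKNOWN
```

Here trust is a per-node dictionary mapping each neighbor to its current level; neighbors that are absent from it are treated as trusted.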

Our goal is to ensure that a node elects itself to the overlay if it has the highest identifier

among its trusted neighbors. Below, we mention a couple of overlay maintenance protocols

that realize this intuition (by making the goodness number of a node be its identifier).

Specifically, we have implemented two overlay maintenance protocols, namely the Connected Dominating Set (CDS) and the Maximal Independent Set with Bridges (MIS+B) of [44], augmented with trust levels (i.e., the overlay_trust variable).³ Since other than adding the trust level, the protocols are the same as in [44], we do not repeat them here.
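The intuition behind the election rule (not the actual CDS or MIS+B protocols of [44], which are not reproduced here) can be sketched as a comparison of identifiers restricted to trusted neighbors. The function name and the dictionary representation of trust levels are assumptions for illustration.

```python
def should_join_overlay(my_id, neighbor_trust):
    """Return True if my_id is the highest identifier among the trusted neighbors.

    neighbor_trust -- dict mapping neighbor id -> "trusted", "unknown" or "untrusted"
    """
    trusted = [n for n, level in neighbor_trust.items() if level == "trusted"]
    return all(my_id > n for n in trusted)
```

The sketch also shows the effect noted earlier: if a neighbor with a higher identifier is downgraded from trusted to unknown because of a (possibly false) report, the node stops deferring to it and joins the overlay itself.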

5.6 Correctness Proof

Let us remind the reader that in Section 3.2 we assumed that there are enough correct nodes

so that non-malicious nodes form a connected graph. With this assumption, we prove the

following validity and eventual dissemination properties. We present proofs only for the Mute failure detector, since the proofs for the Trust and Verbose failure detectors are trivial. For this, we first introduce a few definitions.

1. gossip_timeout - the time between two consecutive gossip messages by a correct node.

2. request_timeout - the time between receiving a gossip message and sending a request message.

3. rebroadcast_timeout - the time between getting a request message and sending the message that fits the requested message.

4. β - the transmission time (the latency for a message to arrive at the receiver).

5. δ - the number of new messages that are injected into the network every second.

6. max_timeout = gossip_timeout + request_timeout + rebroadcast_timeout + 3 × β

³ The CDS and MIS+B protocols in [44] are in fact self-stabilizing generalizations of the work of [122].

In the following, a pair of nodes p and q are well connected at time t if both are correct and p ∈ N^t(1, q) and q ∈ N^t(1, p) during the time interval [t, t + max_timeout]. We denote this relation by WC(p, q, t). In order to ensure dissemination of messages, we assume the following: starting with some time t′, for every t > t′, the graph induced by all pairs of well connected nodes at time t is connected, and this graph includes all correct nodes in the network.⁴ This can be seen as a refinement of the similar requirement in [33] to mobile ad-hoc networks.
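As a worked illustration of these definitions, the sketch below computes max_timeout and evaluates the relation WC(p, q, t) on a sampled trace of neighbor sets. Sampling the topology at discrete times and the snapshots format are assumptions made here; the definition itself is stated over a continuous interval.

```python
def max_timeout(gossip_timeout, request_timeout, rebroadcast_timeout, beta):
    # One gossip, one request and one rebroadcast, each followed by a transmission.
    return gossip_timeout + request_timeout + rebroadcast_timeout + 3 * beta


def well_connected(p, q, t, window, snapshots, correct):
    """Approximate WC(p, q, t) on sampled topology snapshots.

    snapshots -- dict: sample time -> dict: node -> set of 1-hop neighbors
    window    -- the value of max_timeout
    correct   -- set of correct nodes
    """
    if p not in correct or q not in correct:
        return False
    times = [s for s in snapshots if t <= s <= t + window]
    return bool(times) and all(
        p in snapshots[s].get(q, set()) and q in snapshots[s].get(p, set())
        for s in times
    )
```

For instance, with gossip_timeout = 2, request_timeout = 1, rebroadcast_timeout = 1 and β = 0.5 (arbitrary values chosen for the example), max_timeout evaluates to 5.5 seconds.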

Theorem 5.6.1 The protocol satisfies the validity property.

Proof: According to the protocol, the originator of a message m adds a signature sig(m)

and then disseminates the message m||sig(m) to other nodes. Note that upon receiving

m||sig(m), every correct node checks if sig(m) corresponds to m before the node accepts

m. As a part of the model’s basic assumptions, a malicious node cannot forge signatures.

Therefore, no correct node will accept a message other than m as if it was m. Moreover,

according to the protocol, correct nodes filter duplicates of messages they have already

received.

Theorem 5.6.2 The protocol satisfies the eventual dissemination property.

Proof: We show that a message m that is sent infinitely often by a correct originator p

is disseminated to all the correct nodes. Assume, by way of contradiction, that there is a

message m that is not received by some correct process. Let k be the smallest number such

that there exists a correct node q ∈ N^t(k, p) that does not receive m.

Recall that by assumption, during every time interval all the well connected nodes form a connected graph that includes all correct nodes. Therefore there exists a correct node l ∈ N^t(k − 1, p) such that the distance between q and l is smaller than r_l and l received m. According to the protocol, l will send a gossip about m to its neighbors and, if requested by its neighbors, l will also send m. Thus, q will receive m either from its overlay node or from l. This contradicts the choice of q as a correct node that does not receive m.

⁴ One can weaken this requirement by saying that starting with some time t′, there are infinitely many times for which the graph induced by all pairs of well connected nodes is connected, and this graph includes all correct nodes in the network. The price of doing this is that the dissemination time of a message to all the nodes will grow proportionally to the durations in which this graph is not connected.

5.6.1 Protocol Analysis

In this section we compute a bound on the time required to disseminate a message to all the

nodes and present a limit on the size of buffers that every node must maintain to successfully

disseminate all the messages. In order to provide a bound on the dissemination time we

assume that messages do not collide.

The Dissemination Time:

In the following, CR(m, t) refers to the number of correct processes that received m by time

t.

Lemma 5.6.3 Let m be a message sent by some correct process p. Let t1 be the time in

which the first correct process received m. Let t2 be the time in which the last correct process

received m. During the interval [t1, t2], for all t3, t4 such that t2 ≥ t4 > t3 ≥ t1 and (t4 − t3) ≥ max_timeout, CR(m, t4) > CR(m, t3).

Proof: Assume by way of contradiction that the lemma does not hold. Therefore, there are two neighboring nodes u and q such that q received the message m at time t and u does not receive m by time t + max_timeout. Recall our assumption that the graph induced by well connected nodes is connected and includes all correct nodes. After getting the gossip that fits the correct message m (that q received before), q broadcasts the gossip to its neighbors after at most gossip_timeout seconds. If its neighbors do not have m, they send a request message after at most request_timeout seconds, and then after at most rebroadcast_timeout seconds q broadcasts the message to its neighbors. Therefore, after at most max_timeout seconds the message will be disseminated to u. A contradiction.

Theorem 5.6.4 Let m be a message sent by some correct process p at time t. Then all correct nodes will receive m by time t + max_timeout × (n − 1).

Figure 5.4: Malicious overlay

Proof: According to Lemma 5.6.3, every max_timeout seconds at least one node (that has not received the message before) will get m. Since the graph of correct nodes is connected and the number of correct nodes is at most n, all correct nodes will receive message m after at most max_timeout × (n − 1) seconds.

The dissemination time depends on the mobility of nodes. If all nodes are static, each message will be disseminated to all the nodes in at most max_timeout × n/2 seconds, where n is the number of nodes. The explanation for this bound is as follows: According to Lemma 5.6.3, a node that has message m will broadcast it to its neighbors in at most max_timeout seconds. In the worst case, as illustrated in Figure 5.4, all nodes that belong to the overlay are malicious and therefore all messages will be disseminated using the gossip-request mechanism. Due to the assumption that the graph of correct nodes is connected, the maximal number of hops in the network is n/2 (in every hop there is one malicious overlay node and one correct node). Therefore, the message has to pass n/2 hops, which takes at most max_timeout × n/2 seconds. In mobile networks, each message will be disseminated to all the nodes according to Theorem 5.6.4 within max_timeout × (n − 1) seconds.
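The round-by-round argument can be illustrated with a small simulation on a fixed graph of correct nodes. This is a sketch under a simplifying assumption that is stronger than Lemma 5.6.3: in every period of max_timeout seconds, each node holding m passes it to all of its neighbors through the gossip-request mechanism (the lemma itself only guarantees at least one new receiver per period). The function name and adjacency-list format are illustrative.

```python
def rounds_to_disseminate(adjacency, source):
    """Count max_timeout periods until every node in a connected graph holds m.

    adjacency -- dict: node -> iterable of neighbors (graph of correct nodes)
    """
    have = {source}
    rounds = 0
    while len(have) < len(adjacency):
        newly = {v for u in have for v in adjacency[u]} - have
        if not newly:
            raise ValueError("graph of correct nodes is not connected")
        have |= newly
        rounds += 1
    return rounds
```

On a path of n nodes with the originator at one end the count is exactly n − 1, which matches the bound of Theorem 5.6.4 when multiplied by max_timeout; denser graphs need fewer rounds.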

Buffers Size:

The size of the buffers that every node should have depends on the mobility of the nodes. If none of the nodes move, every node has to hold every message for max_timeout seconds, i.e., the time it takes to disseminate the message to all its neighbors. Therefore, every node in a static network should have a buffer of size max_timeout × δ messages.

In mobile networks, every message should be kept until all the nodes receive the message. As we showed, every message is disseminated to all the nodes within max_timeout × (n − 1) seconds. Therefore, the buffer size of every node in a mobile network should be max_timeout × (n − 1) × δ messages.
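The two buffer bounds translate directly into a short calculation. The function name and the numeric values in the usage note are arbitrary assumptions chosen for illustration; they are not measurements from the thesis.

```python
import math


def buffer_size(max_timeout, delta, n, mobile):
    """Number of messages a node must be able to buffer.

    max_timeout -- seconds, as defined in Section 5.6
    delta       -- new messages injected into the network per second
    n           -- number of nodes
    mobile      -- True for mobile networks, False for static ones
    """
    hold_time = max_timeout * (n - 1) if mobile else max_timeout
    return math.ceil(hold_time * delta)
```

For example, with max_timeout = 2 seconds, δ = 10 messages per second and n = 200 nodes, a static network needs room for 20 messages per node, whereas the mobile bound grows to 3980 messages.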

In the following sections, we show that the messages are propagated fast (via the overlay

nodes) during certain periods if the failure detectors behave like eventually perfect failure

detectors or like interval failure detectors.

5.6.2 Fast Dissemination with Eventually Perfect Failure Detectors

In the following, we show that if the MUTE failure detector indeed belongs to ♦Pmute, then

eventually messages are disseminated to all correct nodes by the overlay. The significance of

this is that dissemination along overlay nodes is fast, since it need not wait for the periodic

gossip mechanism.

Lemma 5.6.5 Assume that the MUTE failure detector ∈ ♦Pmute. Then eventually the

non-mute overlay nodes form a connected graph COL such that every correct node is either

in COL, or within the transmission range of a non-mute node in COL.

Proof: Eventually, ♦Pmute of all correct nodes will suspect all the mute nodes. Thus, the

goodness number in the overlay maintenance protocol for mute nodes will be lower than

all other nodes. Consequently, the overlay built by the maintenance protocol will have the

desired property.

Theorem 5.6.6 Eventually, when there are no collisions, most messages propagate to all

the nodes via the overlay nodes, if the MUTE failure detector ∈ ♦Pmute.

Proof: In Lemma 5.6.5, we showed that eventually, the non-mute nodes of the overlay

form a connected graph that covers all non-mute nodes. Therefore, eventually, all messages

are propagated by overlay nodes to all correct nodes, which proves the theorem.

5.6.3 Fast Dissemination with Interval Failure Detectors

In this section we discuss the conditions under which our protocol implements Imute correctly. We show that, if the MUTE failure detector indeed belongs to Imute, then during periods of good connectivity messages are disseminated to all correct nodes by the overlay.


Observation 5.6.1 If suspicion_interval ≥ f × mute_interval then there will be an interval CI_i in which at least one overlay node in N_i will be correct. We call CI_i a correct interval for node i.

Observation 5.6.2 There exists a suspicion_interval such that there are CI_1, CI_2, ..., CI_n for which CI_1 ∩ CI_2 ∩ ... ∩ CI_n ≠ ∅.

Observation 5.6.3 In order to prevent false suspicions of the overlay nodes, the mute_interval of the Imute failure detector should be larger than (n − 1) × max_timeout.

Observation 5.6.4 If some correct node p decides that it is not in the overlay, then after some finite time all of its correct neighbors know that p ∉ OVERLAY. This immediately follows from the protocols that maintain the overlay.

Let OL(p1,p2,tstart,tend) be a relation such that p1 believes that p2 ∈ OL^t(1,p1) during the interval [tstart,tend].

Lemma 5.6.7 Every malicious overlay node q that is mute w.r.t. another node p during an interval [t, t + mute_interval] and satisfies OL(p,q,t,t+mute_interval) will be suspected by p during suspicion_interval.

Proof: Let q be a malicious overlay node that does not forward the message m. Let p be a correct node that satisfies OL(p,q,t,t+mute_interval). We assume that q is not forwarding m to p during mute_interval and we will show that q will be suspected during suspicion_interval.

According to Theorem 5.6.4, all correct nodes will receive m after at most max_timeout × (n − 1) seconds. Therefore, according to the protocol, after receiving m, p will activate its MUTE failure detector and if q is not forwarding messages during mute_interval, it will be suspected during suspicion_interval by p.

Lemma 5.6.8 Non-mute processes are not suspected by any correct process during suspicion_free_interval.


Proof: A non-mute non-overlay node p cannot be suspected, since according to the protocol a non-overlay node can be suspected by the MUTE failure detector only if it is not forwarding a message m that it gossiped about. Since p is not mute, it will always forward the message m and therefore will not be suspected. According to Observation 5.6.4, if p leaves the overlay, then after some finite time that is smaller than mute_interval all its correct neighbors believe that p ∉ OVERLAY. Therefore, p’s neighbors will not expect p to broadcast messages and thus will not suspect it either.

Similarly, a non-mute overlay node p also cannot be suspected. This is because according to the protocol, an overlay node can be suspected by the MUTE failure detector only if it is not forwarding a message m that it received from another overlay node or from the originator of m. Yet, since p is not mute, if p has m, it will always forward it. If p does not have the message m, p will send a FIND_MISSING_MSG message and it will receive the missing message m either from nodes that belong to N^t(2, p) or after at most max_timeout × (n − 1) seconds (as we show in Theorem 5.6.4). Once p receives m, it will forward m to its neighboring nodes and therefore p will not be suspected since mute_interval > (n − 1) × max_timeout (according to Observation 5.6.3).

Lemma 5.6.9 Assume that the MUTE failure detector ∈ Imute. Then there is an interval

in which the non-mute overlay nodes form a connected graph such that every correct node

is either in the overlay, or within the transmission range of a non-mute overlay node.

Proof: Lemma 5.6.8 shows that non-mute nodes are not suspected and according to

Lemma 5.6.7 there is an interval such that Imute of all correct nodes suspects all mute

nodes. Thus, none of the mute nodes will be trusted. Consequently, the overlay built by

the maintenance protocol will have the desired property.

Theorem 5.6.10 If the MUTE failure detector ∈ Imute then there is an interval such that,

when there are no collisions, most messages propagate to all the nodes via the overlay nodes.

Proof: Lemma 5.6.7 shows that there is a certain interval when every mute overlay node

is suspected. In Lemma 5.6.9, we showed that there is an interval such that the non-mute

nodes of the overlay form a connected graph that covers all non-mute nodes. Therefore,

during a certain interval, all messages are propagated by overlay nodes to all correct nodes, which proves the theorem.

Figure 5.5: Message delivery ratio when all nodes are static. Axes: %received messages versus #messages sent per second; curves: BDP(MIS), BDP(CDS), OVERLAY(MIS), OVERLAY(CDS), FLOODING.

Figure 5.6: Network load in terms of total number of messages sent when all nodes are static. Vertical axis: #sent messages (×10^5); curves: BDP(MIS), BDP(CDS), OVERLAY(MIS), OVERLAY(CDS), FLOODING.

5.7 Results

We have measured the performance of our protocol using the SWANS/JIST simulator [1]. In the simulations, we have compared the performance of our protocol with the performance of flooding on one hand and of simple dissemination along an overlay (without recovery of lost messages) on the other. Here, flooding is an example of a protocol that is very robust against maliciousness, but also very wasteful. At the other extreme, dissemination along an overlay without message recovery is very efficient, but very unreliable as well. We have measured the percentage of messages delivered to all nodes, the latency to deliver a message to all and to most of the nodes, and the load imposed on the network. It is also important to note that our performance measurements included the overhead of the overlay maintenance as well as the gossip messages (although overlay maintenance messages are piggybacked on gossip messages).

In order to reduce the number of collisions, we have employed a staggering technique.

That is, each time a node is supposed to send a message, it delays the sending by a random

period of up to several milliseconds.
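The staggering technique amounts to adding a small random delay to every scheduled transmission. In the sketch below the 5 ms upper bound and the uniform distribution are assumptions (the text only says "up to several milliseconds"), and the function name is hypothetical.

```python
import random


def staggered_send_time(scheduled_time, max_jitter=0.005, rng=random):
    """Return the actual send time: the scheduled time plus a random delay.

    max_jitter -- upper bound of the delay in seconds (assumed value: 5 ms)
    rng        -- object with a uniform(a, b) method, e.g. random.Random(seed)
    """
    return scheduled_time + rng.uniform(0.0, max_jitter)
```

Because neighboring nodes that react to the same received message draw independent delays, their retransmissions are less likely to start at the same instant and collide.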

In the simulations, mobility was modelled by the Random-Waypoint model [65]. In this

model, each node picks a random target location and moves there at a randomly chosen

speed. The node then waits for a random amount of time and then chooses a new location

etc. In our case, the speed of movement ranged from 0.5-1.5 m/s, which corresponds to

75


0 20 40 60 80 1000

0.02

0.04

0.06

0.08

0.1

0.12

0.14

late

ncy

(sec

)

%nodes

BDPOVERLAY

Figure 5.7: Latency to deliver a message toX% of the nodes when all nodes are static

(with 200 broadcasting nodes that send onemessage per second)

X% l drced xiardl xefg` onf :5.7 xei`200 xy`k) migiip miznvd lk xy`k miznvdn

(diipy lk zeycg zerced migley miznv

0 20 40 60 80 1000

0.5

1

1.5

2

2.5

3

late

ncy

(sec

)

%nodes

BDPOVERLAY

Figure 5.8: Latency to deliver a message toX% of the nodes when nodes are mobile

(with 200 broadcasting nodes that send onemessage per second)

X% l drced xiardl xefg` onf :5.8 xei`200 xy`k) miciip miznvd lk xy`k miznvdn

(diipy lk zeycg zerced migley miznv

walking speed. Also, the maximal waiting time was set to 20 seconds. Each simulation

lasted 5 minutes (of simulation time) and each data point was generated as an average of

10 runs. The transmission range was set to roughly 80 meters (see footnote 5), with a simulation area of
200x200 meters, the message size was set to 1 KB (less than one UDP/IP packet), and the
network bandwidth to 1 Mbps. In each simulation, two nodes were generating messages at

variable rates. We have run simulations with a varying number of nodes, but discovered that

with the exception of very sparse networks, the results are qualitatively the same. Thus, we

only present the results when the number of nodes is fixed at 200. In the graphs, we denote

the flooding protocol by FLOODING, our malicious resilient dissemination protocol by

BDP(MIS) and BDP(CDS) depending on the overlay mechanism used (see Section 5.5), and

by OVERLAY(MIS) and OVERLAY(CDS) the simple overlay dissemination mechanism

that has no message recovery. We limited the number of times each message is gossiped

to two. Additional gossip attempts slightly improve the delivery ratios, but at the cost of

additional messages. Finally, the main malicious behavior checked was being mute, as
this has the most adverse effect on the performance of the system.

5 In fact, in SWANS one can choose the transmission power, which translates into a transmission range based on power degradation and background noise.
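The Random-Waypoint movement described above can be sketched for a single node as follows. This is only an illustration under stated assumptions: the parameter defaults mirror the values quoted in the text (200x200 meter area, 0.5 to 1.5 m/s, pauses of up to 20 seconds), while the function name, the uniform distributions and the one-second sampling step are our own choices and are not taken from the simulator used in this work.

```python
import math
import random


def random_waypoint(duration_s, area=(200.0, 200.0), speed=(0.5, 1.5),
                    max_pause_s=20.0, step_s=1.0, seed=0):
    """Return a list of (time, x, y) samples for one node."""
    rng = random.Random(seed)
    x, y = rng.uniform(0.0, area[0]), rng.uniform(0.0, area[1])
    t = 0.0
    trace = [(t, x, y)]
    while t < duration_s:
        # Pick a random target location and a random speed for this leg.
        tx, ty = rng.uniform(0.0, area[0]), rng.uniform(0.0, area[1])
        v = rng.uniform(*speed)
        travel = math.hypot(tx - x, ty - y) / v
        steps = max(1, math.ceil(travel / step_s))
        for k in range(1, steps + 1):
            frac = k / steps
            t_k = t + travel * frac
            if t_k > duration_s:
                return trace
            trace.append((t_k, x + (tx - x) * frac, y + (ty - y) * frac))
        # Arrive at the target, then wait for a random amount of time.
        x, y, t = tx, ty, t + travel
        t += rng.uniform(0.0, max_pause_s)
    return trace
```

Calling `random_waypoint(300.0)` produces a trace covering one 5-minute run for a single node; a full simulation would generate one such trace per node with different seeds.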

Figure 5.9: Message delivery ratio when all nodes are mobile. [Plot: % received messages; curves: BDP(MIS), OVERLAY(MIS), FLOODING.]

Figure 5.10: Network load in terms of total number of messages sent when nodes are mobile. [Plot: # sent messages (x 10^5).]

The results of the simulations in static networks with no malicious nodes are presented

in Figures 5.5, 5.6, and 5.7. As can be seen from the graphs, in this benign case, all protocols
obtain very high delivery rates. Essentially, in all protocols the latency to deliver a message
to all nodes remains well below 200 ms. However, the load on the network of the flooding

protocol grows dramatically in the number of neighbors each node has (or in other words,

the density of the network). Thus, from an energy standpoint, flooding is much worse and

less scalable than the others. Due to the staggering we used, even the flooding approach

resulted in a relatively small number of collisions that were compensated for by its high

redundancy, which explains its high delivery ratios. However, with higher sending rates, it

is expected to perform much worse.

Since MIS+B and CDS performed almost the same, yet MIS+B is much more computationally
efficient, in the rest of this work we only present the results for the
MIS+B overlay. Figures 5.8, 5.9 and 5.10 present the simulation results for a mobile network.
Here, flooding continues to behave well in terms of delivery ratio and latency (and
badly in terms of network load). However, we start seeing a significant difference between our
dissemination protocol (BDP) and a simple dissemination with no gossip and no recovery of
messages (OVERLAY). While BDP maintains delivery rates close to flooding (and close to

Figure 5.11: Message delivery ratio when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes). [Plot: % received messages vs. # faulty nodes.]

Figure 5.12: Message delivery ratio when nodes are mobile vs. varying number of malicious nodes (out of a total of 200 nodes). [Plot: % received messages vs. # faulty nodes.]

100%), without gossip the delivery rate drops to 40%. Generally speaking, all protocols deliver
messages fast. However, OVERLAY only delivers messages to about 40% of the nodes.
Also, in BDP the latency grows slightly for the last nodes, proportionally to the frequency
of a single gossip exchange.

Figures 5.11 and 5.12 explore the delivery ratio of the different protocols with varying

number of malicious nodes. As can be seen, when no recovery mechanism is employed, the

delivery rate drops dramatically. On the other hand, both our protocol and the flooding

protocol maintain very high delivery rates. Interestingly, when nodes are mobile, the impact

of malicious nodes is weakened. This can be explained by the fact that the overlay adapts

itself to the evolving network topology. Thus, a malicious node does not necessarily remain

in the overlay throughout the execution.

Figures 5.13 and 5.14 explore the network load imposed by the different protocols as a

function of the number of malicious nodes. In the static case, the network load imposed
by BDP exhibits a linear increase with the number of malicious nodes. On the other hand,
the network load imposed by flooding decreases slightly. This can be explained by the fact

that if malicious nodes avoid sending messages, then fewer messages are sent. As for the

dynamic case, here we also observe the interesting phenomenon that mobility improves the

Figure 5.13: Network load when all nodes are static vs. varying number of malicious nodes (out of a total of 200 nodes). [Plot: # sent messages (x 10^4) vs. # faulty nodes.]

Figure 5.14: Network load when nodes are mobile with varying number of malicious nodes (out of a total of 200 nodes). [Plot: # sent messages (x 10^4) vs. # faulty nodes.]

asymptotic behavior of the protocols. Again, this can be explained by the fact that the

overlay structure evolves with the network topology, making it “harder” for malicious nodes

to block message dissemination along the overlay.

Figures 5.15 and 5.16 explore the latency to deliver a message to X% of the nodes when

some nodes are malicious (out of 200 nodes and a sending rate of 1 message per second).

Clearly, the latency grows with the number of malicious nodes. Also, in the static malicious
case, almost all nodes receive the message in less than a second, and only when there are
many malicious nodes may it take several seconds to deliver a message to the last 20% of
the nodes. In the mobile case we see the same qualitative behavior, but the latency starts
growing beyond one second at 60% of the nodes. We would like to point out that by fine-tuning
the rate of gossips and the other timers in the system, it is possible to dramatically
reduce the quantitative latency numbers. However, the important thing to note is that
with malicious nodes, without a best-effort recovery mechanism, it is almost impossible to
ensure reliable delivery just by retransmission. This is because without an additional recovery
mechanism, the malicious nodes might collude to block all messages from reaching some
parts of the network.

Figure 5.15: Latency to deliver a message to X% of the nodes when all nodes are static vs. varying number of malicious nodes. [Plot: latency (sec) vs. % nodes; curves: BDP-0, BDP-1, BDP-2, BDP-8, BDP-14.]

Figure 5.16: Latency to deliver a message to X% of the nodes when nodes are mobile vs. varying number of malicious nodes. [Plot: latency (sec) vs. % nodes; curves: BDP-0, BDP-1, BDP-2, BDP-8, BDP-14.]


Chapter 6

Byzantine Resilient Group Communication

In this chapter we outline Byzantine JazzEnsemble, a group communication system that
tolerates Byzantine failures. We have designed Byzantine JazzEnsemble to perform well in
the normal case, i.e., when no Byzantine failures occur, yet be resilient to them if they do.
Byzantine JazzEnsemble can serve as a building block in different collaborative applications,
as we show in Appendix A. Yet, in this work we address only single-hop ad-hoc networks;
extending Byzantine JazzEnsemble to multi-hop ad-hoc networks is left for future work.

6.1 Model, Assumptions and Problem Statement

6.1.1 Basic Concepts

We assume the standard group-communication/middleware enhanced distributed computing
model. That is, we assume a collection of n nodes (also called processes), each with an

architecture similar to the one illustrated in Figure 6.1. In particular, a node includes an

application module, a group communication module, and a network module.

Physically, nodes can only communicate by sending and receiving messages over the

network. From a theoretical standpoint, the network itself can be modeled as being driven

by a scheduler that controls the timing in which messages are received and is also allowed

to drop messages. Furthermore, the scheduler may decide at each moment for any pair

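Figure 6.1: A node's architecture. [Diagram of a node consisting of an application module, a group communication module with a failure detector, and a network module. Event labels between the application and group communication: SEND, CAST, JOIN, LEAVE, VIEW, SEND_DELIVER, CAST_DELIVER. Event labels between group communication and the network: NET_SEND, NET_CAST, NET_RECEIVE.]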

of nodes whether they are connected or disconnected. When a pair of nodes p and q are

connected, most messages sent between p and q are delivered by the scheduler within a

known bounded latency and while preserving their FIFO sending order. The few messages

that are delayed, dropped, or reordered when p and q are connected are chosen randomly and

in an oblivious manner to their content. Otherwise, the network is disconnected. We may

treat the connected and disconnected properties as relations; we assume that the scheduler

maintains symmetry and transitivity for these relations at all times. Thus, if p is connected

to q, then q is connected to p, and if there is a third process r that p is connected to, then

q and r are also connected (see footnote 1).

The application module executes a program that involves communicating with the application
modules at other nodes by exchanging messages with them. The group communication
module is responsible for providing an abstract communication model that has much

stronger semantics than the network module. In particular, in this work we are interested

in the Byzantine tolerant version of the strong virtual synchrony model, which is defined

below.

As is typically done in distributed computing, each module of a node can be modeled
as an automaton. In this work, we are mainly concerned with the group communication
module. The automaton of this module accepts input events from the application module,
the network module, or timer events. In turn, the group communication module performs some
computation that may change its state, returns some output events to the application module
and network module, or sets future timer events. The input events include send (a
request by the application to send a message to a specific node), cast (a request by the
application to send the same message to all nodes), net-receive (receiving a message from
the network), as well as membership related events that will be introduced below. The output
events are net-send and net-cast to the network, send-deliver and cast-deliver
to the application, as well as membership related events, as reported below. The actions
performed by the group communication layer in response to a given event are governed by
its specification, which is also known as a transition function in automata theory.

1 In some networks, these relations may not be transitive, yet transitivity can be obtained by a peer-to-peer routing protocol.
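To make the automaton view concrete, the following sketch shows a deliberately trivial transition function that merely passes events between the application and the network. It is not the JazzEnsemble implementation; the class name, the dictionary payload and the convention that a missing `dst` denotes a broadcast are assumptions made for illustration.

```python
from typing import List, Tuple

Event = Tuple[str, dict]  # (event name, payload); the payload layout is hypothetical


class PassThroughGroupComm:
    """A trivial automaton: input events in, output events out, plus local state."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.delivered = 0  # example of state changed by the transition function

    def step(self, event: Event) -> List[Event]:
        name, payload = event
        if name == "send":          # application asks to send to one node
            return [("net-send", payload)]
        if name == "cast":          # application asks to send to all nodes
            return [("net-cast", payload)]
        if name == "net-receive":   # message arrived from the network
            self.delivered += 1
            out = "cast-deliver" if payload.get("dst") is None else "send-deliver"
            return [(out, payload)]
        return []                   # timer and membership events are ignored here
```

A real group communication module would replace the body of `step` with the protocol logic (ordering, retransmission, view changes), but the shape of the interface stays the same: each input event yields a possibly empty list of output events.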

With this model, for a process $p_i$, we can define a process history $h_i$ to be the sequence of
events occurring at $p_i$. A collection of process histories, one for each process in the system,
is called an execution. We assume in this work that all executions are well formed in the
sense that in every execution $\sigma$, if the history of a process $p_i$ in $\sigma$ includes a net-receive
event with some message $m$ sent by some process $p_j$ to $p_i$ (or to everyone in the case of
a broadcast), then the history $h_j$ includes a net-send or net-cast event with the message
$m$ sent to $p_i$ by $p_j$ (or to everyone in the case of a broadcast).
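The well-formedness condition can be checked mechanically on a recorded execution. The sketch below assumes a hypothetical event record with a message identifier, a sender and an optional destination (`None` for a broadcast); this representation is ours and is only meant to illustrate the condition.

```python
from typing import Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    kind: str            # "net-send", "net-cast" or "net-receive"
    msg: str             # message identifier
    src: str             # sending process
    dst: Optional[str]   # destination process, or None for a broadcast


def is_well_formed(execution: Dict[str, List[Event]]) -> bool:
    """Every net-receive must be matched by a net-send or net-cast at the sender."""
    for pid, history in execution.items():
        for ev in history:
            if ev.kind != "net-receive":
                continue
            sender_history = execution.get(ev.src, [])
            matched = any(
                s.msg == ev.msg and s.src == ev.src and (
                    s.kind == "net-cast"
                    or (s.kind == "net-send" and s.dst == pid))
                for s in sender_history)
            if not matched:
                return False
    return True
```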

Each process has a local clock. The local clocks are not synchronized. However, we

assume the existence of a global clock that is known to external observers of the system.

For a given history $h$ and two events $e_1$ and $e_2$, we denote by $e_1 \rightarrow_h e_2$ the fact that $e_1$ is
ordered before $e_2$ in $h$. Similarly, for a given execution $\sigma$, we denote by $e_1 \rightarrow_\sigma e_2$ the fact
that $e_1$ occurred in the global time of $\sigma$ before $e_2$.

6.1.2 Byzantine Virtual Synchrony

We assume an abstract entity called a group. The application module of a node can invoke

a join event, indicating that it wishes to join a group, or a leave event, indicating that it

is no longer interested in the group. During the time interval between a join event and a

subsequent leave event, the node is said to be a member of the group. The collection of

correct nodes that are members of the group at a given time t is called the group membership

at time t.

The virtual synchrony model presents an abstraction to the application, in which the
application periodically receives a view event; such an event reports an estimate of the
current membership. The view event includes a view ID and an ordered membership list; for
a view event $v$, we denote its view identifier by $v.vid$ and the view membership by $v.mbrs$.

The Byzantine virtual synchrony model includes two aspects: The first relates the contents

of views delivered to applications in different nodes to one another and to reality. The

second places restrictions on message delivery within a given view. The formal definition

appears below (for simplicity, the definition does not explicitly address join and leave events).
It is split into Byzantine view synchrony, which only addresses views, and Byzantine

virtual synchrony, which adds certain requirements about message agreement and reliable

delivery.

In the definitions below, we use the following notation for convenience: For any view
events $v_1$ and $v_2$ and history $h_i$, we denote by $C(h_i, v_1, v_2)$ the fact that $v_1 \rightarrow_{h_i} v_2$ and there
does not exist a third view event $v_3$ such that $v_1 \rightarrow_{h_i} v_3 \rightarrow_{h_i} v_2$ (this corresponds to saying
that $v_1$ and $v_2$ are consecutive view events in $h_i$).

Definition 6.1.1 (Byzantine View Synchrony) An execution $\sigma$ is Byzantine view synchronous
if it obeys the following restrictions:

1. For every view event $v$ that is included in a history $h_i$ of a correct process, $p_i \in v.mbrs$.

2. For every history $h_i$ of a correct process and every two view events $v_1 \rightarrow_{h_i} v_2$, we have
$v_1.vid < v_2.vid$.

3. For every two correct nodes $p_i$ and $p_j$ and any two view events $v_i \in h_i$ and $v_j \in h_j$
for which $v_i.vid = v_j.vid$, we also have $v_i.mbrs = v_j.mbrs$.

4. For every two correct nodes $p_i$ and $p_j$ that from some point on in $\sigma$ are continuously
connected, there is a point in $h_i$ from which all view events $v$ in $h_i$ are such that
$p_j \in v.mbrs$.

5. For every correct node $p_i$, if from some point on in $\sigma$ there is another node $p_j$ that is
always disconnected from $p_i$, or $p_j$ crashes, then from some point on in $h_i$, for every
view event $v$ in $h_i$ we have $p_j \notin v.mbrs$.

6. If two correct nodes $p_i, p_j \in v_1.mbrs$, and $C(h_i, v_1, v_2)$ and $p_j \notin v_2.mbrs$, then some process
$p_k \in v_1.mbrs$ suspected $p_j$ in $v_1$.

7. For any two correct nodes $p_i$ and $p_j$ and views $v_1$ and $v_2$, if $C(h_i, v_1, v_2)$ and
$p_j \in v_1.mbrs \cap v_2.mbrs$, then the history $h_j$ (of $p_j$) also includes $v_1$.

Intuitively, Items 1 and 2 are sanity checks that ensure that a node is included in its
own view and that view identifiers are monotonically increasing. Item 3 requires correct
processes to agree on the membership of joint views. Items 4 and 5 require the membership
lists in views to eventually resemble the true list of connected correct processes. Item 6 verifies
that a node is not removed from the view without being suspected by another node in the
view. This prevents spurious views from occurring and therefore eliminates a possible useless
implementation in which singleton views are continuously installed. Finally, Item 7 requires
that if a node $p_j$ appears in two consecutive views of another node $p_i$, then $p_j$ has at least
installed the first of these two views. Thus, each view also serves as a confirmation and
synchronization point w.r.t. the previous view. The main difference between this definition
and the benign version of View Synchrony that appears in [51] is that here we restrict the
behavior of correct processes (and not of the alive processes) and we distinguish between being
connected and being correct.
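The safety-style items of the definition (Items 1 to 3) and the consecutiveness relation $C$ can be checked on finite view sequences. The sketch below is an illustration only: the `View` record and the function names are assumptions, and the liveness-style items (4 and 5), which speak about what happens "from some point on", cannot be verified on a finite prefix and are therefore omitted.

```python
from typing import Dict, List, NamedTuple, Tuple


class View(NamedTuple):
    vid: int
    mbrs: Tuple[str, ...]  # ordered membership list


def consecutive(seq: List[View], v1: View, v2: View) -> bool:
    """C(h, v1, v2): v2 is the view event immediately following v1 in h."""
    return any(a == v1 and b == v2 for a, b in zip(seq, seq[1:]))


def check_items_1_to_3(views: Dict[str, List[View]]) -> List[str]:
    """`views` maps each correct process to the view events of its history, in order."""
    violations = []
    seen: Dict[int, Tuple[str, ...]] = {}
    for pid, seq in views.items():
        for i, v in enumerate(seq):
            if pid not in v.mbrs:
                violations.append(f"item 1: {pid} is missing from its view {v.vid}")
            if i > 0 and seq[i - 1].vid >= v.vid:
                violations.append(f"item 2: {pid} installed {v.vid} after {seq[i - 1].vid}")
            if v.vid in seen and seen[v.vid] != v.mbrs:
                violations.append(f"item 3: membership of view {v.vid} differs at {pid}")
            seen.setdefault(v.vid, v.mbrs)
    return violations
```

Checking only adjacent pairs is enough for Item 2, since strict ordering of identifiers is transitive.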

Notice that if a process $p_i$ is included in the membership list of some view $v$, i.e.,
$p_i \in v.mbrs$, it does not automatically mean that $p_i$ has also installed this view, i.e., that
$v \in h_i$. Furthermore, without some strong synchronization assumptions, the 5th item in
the definition of Byzantine view synchrony cannot be satisfied [26]. Rather than adding
such explicit assumptions to our model, we assume that each node is equipped with a failure
detector module, as in Figure 6.1. The failure detector at process $p_i$ may occasionally report
some other processes as suspected. These reports may be erroneous, but it is assumed that
the mistakes the failure detector can make are bounded in some way. It has been
previously shown in [30] that in benign failure models, Items 4 and 5 in the definition above
are equivalent to what is known as an eventually perfect failure detector, also denoted $\Diamond P$
in the literature. At this point, we do not restrict the failure detector type, but
rather relax the 4th requirement. Specifically, we only require that for some parameter $k$, if
there exists a subset of nodes of size at least $k$ such that the failure detectors of these nodes
never suspect each other from some point on, then eventually these nodes continuously
remain in each other's views. The ratio between $k$ and $f$ (the number of Byzantine nodes)
may depend on the exact failure detector used and possibly also on the protocols chosen.
In most cases, one is likely to require that $k$ is at least $3f + 1$.

We would like to emphasize that the definition of Byzantine View Synchrony supports

what is known as the partitionable membership model, in which there can be multiple concurrent

views of the same group. In particular, a process can join the group, yet still be partitioned

from the rest of the group, or in other words, have its own view of the group, at least for a

while. If the members of two such views become connected for sufficiently long, they should

merge and create a joint view (as called for by Item 4).

For our next and final definition, we introduce the following notation: We denote by
$I(h_i, v_1)$ the set of events $ev$ such that $ev$ appears in some history $h_i$ after a view $v_1$
($v_1 \rightarrow_{h_i} ev$) and there is no other view $v_2$ for which $v_1 \rightarrow_{h_i} v_2 \rightarrow_{h_i} ev$ (this corresponds to
saying that $ev$ occurred in view $v_1$ in $h_i$). Finally, for a given message $m$ and process $p_i$
that sends $m$, we denote by $s_i(m)$ the corresponding send event at $p_i$; similarly, for a message
$m$ and process $p_i$ that receives $m$, we denote by $r_i(m)$ the corresponding receive event at $p_i$.

Definition 6.1.2 (Byzantine Virtual Synchrony) An execution $\sigma$ is Byzantine virtually
synchronous if it obeys the following restrictions:

1. $\sigma$ is Byzantine view synchronous.

2. Let $m$ be a message and $s_i(m)$ and $r_j(m)$ be corresponding send and receive events at
correct processes $p_i$ and $p_j$, respectively. If for some view $v_1$, $s_i(m) \in I(h_i, v_1)$, then
$r_j(m) \in I(h_j, v_1)$.

3. Let $m$ be a broadcast message such that $s_i(m) \in I(h_i, v_1)$ for some correct process $p_i$
and view $v_1$, and let $v_2$ be a view such that $C(h_i, v_1, v_2)$. Then for each correct process
$p_j$ such that both $v_1 \in h_j$ and $v_2 \in h_j$, we have $r_j(m) \in h_j$.

4. Let $m$ be a broadcast message such that $r_i(m) \in I(h_i, v_1)$ for some correct process $p_i$
and view $v_1$, and let $v_2$ be a view such that $C(h_i, v_1, v_2)$. Then for each correct process
$p_j$ such that $v_1 \in h_j$ and $v_2 \in h_j$, we have $r_j(m) \in h_j$.

5. Let $m_1$ and $m_2$ be two messages sent by a process $p_i$ that is either correct, or crashes
during $\sigma$, but does not suffer any other Byzantine failure in $\sigma$. Furthermore, assume
$s_i(m_1) \rightarrow_{h_i} s_i(m_2)$ and both $s_i(m_1) \in I(h_i, v_1)$ and $s_i(m_2) \in I(h_i, v_1)$. Then if for
some correct process $p_j$, $r_j(m_2) \in I(h_j, v_1)$, then $r_j(m_1) \in I(h_j, v_1)$ as well.



Intuitively, Item 2 implies that a message can only be received in the same view in
which it was sent; Item 3 implies reliable delivery of messages sent by correct members that
remain in the same view; Item 4 implies agreement on which messages were received in a
terminating view; Item 5 implies no message omissions (or no FIFO holes even from crashed
processes). Here, again, the definition is similar to the benign case as it appears in [51],
with the exception that we only restrict the behavior of correct processes. An interesting
aspect of Item 4 is that it only requires agreement on which messages were received, not on
their content: a Byzantine process can send two distinct versions of the same
message to two different correct processes. This situation cannot occur in the benign failure
model. Ensuring that correct processes also agree on the content of a message is known as
uniform broadcast [86].
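As an illustration of the notation $I(h, v)$ and of Item 2 (same-view delivery), the following sketch tags each send and receive event with the view in which it occurred and compares the two. The event encoding, the assumption that message identifiers are unique, and the function names are ours; only histories of correct processes are expected as input.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# A history is a list of ("view", vid), ("send", msg_id) or ("recv", msg_id) events.
Event = Tuple[str, object]


def tag_with_view(history: List[Event]) -> Iterator[Tuple[Optional[object], str, object]]:
    """Yield (vid, kind, msg_id) so that the event belongs to I(h, view with that vid)."""
    current = None
    for kind, value in history:
        if kind == "view":
            current = value
        else:
            yield current, kind, value


def same_view_delivery(histories: Dict[str, List[Event]]) -> bool:
    """Item 2: a message is received only in the view in which it was sent."""
    sent_in = {}
    for history in histories.values():
        for vid, kind, msg in tag_with_view(history):
            if kind == "send":
                sent_in[msg] = vid
    for history in histories.values():
        for vid, kind, msg in tag_with_view(history):
            # Messages whose sender is not among the given (correct) processes are skipped.
            if kind == "recv" and msg in sent_in and sent_in[msg] != vid:
                return False
    return True
```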

6.2 Overview of the Solution

Section 6.2.1 presents the architecture of JazzEnsemble, while the rest of the section describes
the adaptation of JazzEnsemble to a Byzantine environment.

6.2.1 JazzEnsemble and Fuzzy Membership

JazzEnsemble is an experimental variant of Ensemble. JazzEnsemble implements the ideas

of fuzzy group membership [43] and also supports various optimizations and protocol layers

that enable it to operate in ad-hoc networks, including, e.g., support for routing in ad-hoc

networks. Both Ensemble and JazzEnsemble have the same general architecture and the

same glue mechanism, and many of the layers of JazzEnsemble are simply taken as is from

Ensemble. The main differences are in a few layers that are related to ad-hoc networking

and to fuzzy failure detection, and to benefiting from fuzzy membership notifications.

The main architecture of Ensemble is described in [58], while its security architecture
is described in [100]. A detailed discussion of the adaptations done in JazzEnsemble

to accommodate ad-hoc networks appears in [39]. The main aspects of JazzEnsemble that

are relevant as background for this work are those related to fuzzy membership. We thus

briefly repeat them here.

The idea of fuzzy membership is that rather than viewing membership as a binary

property, the system should maintain a fuzziness level for each view member. This indicates



the degree to which the corresponding member seems to be alive and responsive (i.e., low

fuzziness level is a good thing while high fuzziness is bad). The fuzziness level of each

member is made available to all the group communication system’s layers, and each can

utilize it in order to optimize its behavior w.r.t. nodes with high fuzziness level. With

this, it is possible to have long timeouts for failure detection (and view changes) without

compromising the performance of the system. At the same time, the fuzziness level is

hidden from the application, which continues to enjoy the relatively simple strong virtual

synchrony model.

To better understand how fuzziness levels help, consider for example the issue of flow

control [110]. Flow control restricts the number of messages (or bytes) that a sender can

send without hearing an acknowledgement, which is known as a sending window. This

prevents overflowing the network and the receivers’ buffers. The problem in multicast flow

control is that until a sender receives acknowledgements from all intended receivers, it

should not advance its sending window. With fuzzy membership, we modify this behavior

to allow the sender to advance its sending window as soon as all nodes with low fuzziness

level acknowledge the message. This way, we avoid pausing due to slow nodes, since the

fuzziness level of slow nodes is high.
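The modified window rule can be sketched as follows. This is a simplified illustration and not the JazzEnsemble flow control layer: the class, the numeric fuzziness scale in [0, 1], the threshold of 0.5 and the window size are all assumptions.

```python
class FuzzyWindowSender:
    """Sliding window that waits only for members with a low fuzziness level."""

    def __init__(self, members, window=4, fuzzy_threshold=0.5):
        self.fuzziness = {m: 0.0 for m in members}  # 0.0 = fully responsive
        self.window = window
        self.threshold = fuzzy_threshold
        self.acks = {}       # sequence number -> set of members that acknowledged
        self.base = 0        # oldest message still blocking the window
        self.next_seq = 0

    def can_send(self):
        return self.next_seq < self.base + self.window

    def send(self):
        if not self.can_send():
            raise RuntimeError("sending window is full")
        seq = self.next_seq
        self.acks[seq] = set()
        self.next_seq += 1
        return seq

    def on_ack(self, member, seq):
        if seq in self.acks:
            self.acks[seq].add(member)
        self._advance()

    def set_fuzziness(self, member, level):
        self.fuzziness[member] = level
        self._advance()

    def _advance(self):
        # Only members below the threshold must acknowledge before the window moves.
        required = {m for m, f in self.fuzziness.items() if f < self.threshold}
        while self.base < self.next_seq and required <= self.acks[self.base]:
            # Only the ack bookkeeping is dropped here; a real layer would still
            # keep the message buffered for retransmission to the slow members.
            del self.acks[self.base]
            self.base += 1
```

With three members and a window of two, the sender stalls while a slow member withholds its acknowledgement, and resumes as soon as that member's fuzziness level rises above the threshold.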

Similarly, in order to ensure reliable delivery, nodes must keep messages they receive

for possible retransmission. In order to save buffer space, we can utilize fuzziness levels by

compressing messages that were already acknowledged by all members with low fuzziness.

As reported in [46], using similar principles, it is also possible to expedite view changes

in some cases, while offering replicated state machine semantics. Finally, we utilize the

fuzziness levels as unreliable failure detectors in our Byzantine consensus protocols, as

presented later in this paper.

JazzEnsemble supports the notion of fuzzy membership by adding a special event for
notifying about changes in fuzziness levels, by adding flags to existing events, and through
modifications to the failure detection, flow control, reliable broadcast, and membership management
layers. These changes are discussed in more detail in [39].

Figure 6.2: Message headers and data in layers (drawing taken from Ensemble's reference manual). [Diagram labels: Protocol Stack, Sender, Receiver, Message, Header, Event, Protocol Layer.]

6.2.2 Fuzzy Mute and Fuzzy Verbose Failure Detectors

Note that the standard heartbeat-based failure detection mechanism of group communication
systems that are not Byzantine tolerant is not sufficient for overcoming Byzantine failures.

This is because a node can send heartbeats in a timely manner, yet otherwise behave in an

arbitrary manner.

When considering the structure of messages sent and manipulated by layered group

communication systems, it is clear that at each layer it is possible to identify a header part

and a data part. In particular, the header part includes the information added, manipulated

and verified by the layer. For example, the header for a layer that implements reliable FIFO

delivery often includes a message type, a sequence number for the message, and possibly

the sequence number of the last acknowledged message. Often, a given layer L is completely

unaware of headers belonging to lower layers in the stack, whereas the application data (in

application driven messages) as well as the headers added by higher layers are part of the

data as far as L is concerned. See the illustration in Figure 6.2.

Moreover, often such a layer L can expect to receive messages with known headers from

other nodes in the group. For example, consider a reliable FIFO delivery layer L at process

p that recently sent a message m to a node q. The layer L at p expects to see a message

from q that includes an acknowledgement for m within a given timeout. A failure by layer



L at p to see such a message from q is called a mute failure of q with respect to p. Another

example of a mute failure is a coordinator of a membership maintenance layer that fails

to generate a new view when expected by the other members. Additionally, often it is

possible to assume that a correct layer should not generate messages with certain headers

too frequently. For example, if the flow control restricts the rate of messages, then q should

not send messages faster than this limit. Similarly, there are situations in which a layer

L at p knows that a certain message header from q should not be received if q is correct.

As an example, consider an acknowledgement for a message that was not sent in a reliable

FIFO layer. We refer to such behavior as a verbose failure of q with respect to p.

Interestingly, a large percentage of Byzantine attacks against many layers are either

mute failures or verbose failures (see footnote 2). Moreover, with the above observations, layered group

communication systems match perfectly the model proposed in Section 3.3. This suggests

replacing the standard failure detection mechanism with mute and verbose failure detectors.

That is, we add a component to the system that allows each layer to register statistics timers

and counters. Whenever an inappropriate mute or verbose behavior is noticed by some layer

L, the layer can invoke the corresponding method of the mute or verbose failure detector,

instantiated with the corresponding counter or timer, to record this misbehavior. Thus, in

order to cope with mute processes and verbose processes we use failure detectors that are

similar to Mute and Verbose failure detectors from Section 3.3.

Yet, similarly to the detection of crash failures in the benign failure model, when running

in a somewhat asynchronous system, it is hard, if not impossible, to find good timeouts for

deciding that a node is truly faulty. Being too eager would result in eliminating from the

view many legitimate members. On the other hand, being too lenient may result in serious

performance degradations. Thus, the solution we adopt is in the form of fuzzy mute and

fuzzy verbose failure detectors. That is, these failure detection modules maintain a fuzzy

mute level and a fuzzy verbose level for each group member. These fuzziness levels are

reported to all layers of the micro-protocol stack, and each layer can decide how to handle

members with high levels of muteness or verbosity. In particular, there is a suspicion layer

that initiates removal of nodes whose fuzzy mute or fuzzy verbose levels are above a given

threshold. In order to handle false detection caused by network overloads and short-lived
disconnections, we also reduce fuzziness levels using an aging mechanism. The interface of
these failure detectors is similar to the one that appears in Section 3.3.1.

2 Of course, a node can send a corrupt message, or try to impersonate another node. However, such behavior can be trivially recognized by the cryptographic mechanism.
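A fuzzy detector of this kind can be sketched as a per-member level that is raised on each observed misbehavior and lowered by aging. The sketch below is hypothetical: the method names `expect` and `indict` echo the calls that appear in the membership pseudo code, but their signatures here, the multiplicative aging rule and the threshold are assumptions and are not the interface of Section 3.3.1.

```python
class FuzzyLevels:
    """Per-member suspicion level, raised by indictments and lowered by aging."""

    def __init__(self, members, threshold=3.0, decay=0.5):
        self.level = {m: 0.0 for m in members}
        self.threshold = threshold
        self.decay = decay

    def indict(self, member, weight=1.0):
        self.level[member] += weight

    def age(self):
        # Called periodically so that old, isolated incidents are forgotten.
        for m in self.level:
            self.level[m] *= self.decay

    def suspects(self):
        return sorted(m for m, lv in self.level.items() if lv > self.threshold)


class FuzzyMuteDetector(FuzzyLevels):
    """Raises a member's mute level when an expected message misses its deadline."""

    def __init__(self, members, **kwargs):
        super().__init__(members, **kwargs)
        self.pending = {}  # (member, message type) -> deadline

    def expect(self, msg_type, member, now, timeout):
        self.pending.setdefault((member, msg_type), now + timeout)

    def observed(self, msg_type, member):
        self.pending.pop((member, msg_type), None)

    def tick(self, now):
        for key, deadline in list(self.pending.items()):
            if now > deadline:
                self.indict(key[0])
                del self.pending[key]
```

A verbose detector can reuse `FuzzyLevels` directly, with a layer calling `indict` whenever it sees an unexpected or too frequent header; a suspicion layer would then act on `suspects()`.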

6.2.3 Intra-View Reliable Delivery

Intra-view reliable delivery involves issues like flow control to avoid congesting the network
or overrunning receivers' buffers, fragmenting and reassembling messages that are larger than
UDP's MTU, and ensuring reliable FIFO delivery of both point-to-point and broadcast
messages. There is also the issue of filtering messages sent from other views and preventing
the admission of corrupted messages (see footnote 3). Filtering bad messages (corrupt or from a different view)
is done at the lowest part of the system. It is obtained by indicating the view ID on each
message and by signing it. If the message is corrupt, its digest will not match its content, and it
will be dropped. Similarly, if the message was sent in a different view by a correct process,
it will be eliminated based on its view ID and will not reach any layer.
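The filtering step can be illustrated with a small sketch. Note the substitution: the text says messages are signed, whereas the example below uses an HMAC over a shared key from Python's standard library as a stand-in for the signature; the packet layout (8-byte view ID, 32-byte tag, payload) and the function names are likewise assumptions.

```python
import hashlib
import hmac


def seal(key: bytes, view_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the view ID and an authentication tag over both."""
    header = view_id.to_bytes(8, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + tag + payload


def accept(key: bytes, current_view_id: int, packet: bytes):
    """Return the payload if the packet is intact and from the current view, else None."""
    if len(packet) < 40:
        return None
    header, tag, payload = packet[:8], packet[8:40], packet[40:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None          # corrupt message: the digest does not match the content
    if int.from_bytes(header, "big") != current_view_id:
        return None          # sent in a different view
    return payload
```

Because the view ID is covered by the tag, a packet cannot be replayed into a different view simply by rewriting its header.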

As for the layers that handle flow control and reliable delivery of messages, including

recovery of lost messages, these layers implement well known protocols. In particular, these

layers are almost the same in the Byzantine protocol stack of JazzEnsemble, the benign

protocol stack of JazzEnsemble, and Ensemble. The only differences are the ones related

to fuzzy mute and fuzzy verbose failures. The differences are fairly technical, and are

therefore dropped from this dissertation. For the rest of this work, we assume that the

system provides reliable delivery of messages within views, i.e., it satisfies all intra-view

requirements of Byzantine virtual synchrony. Below, we concentrate on the complementing

protocols that handle view changes, which together provide the overall required Byzantine

virtual synchrony semantics.

6.2.4 Byzantine Membership Maintenance

As in Ensemble (and in fact, this dates back to Horus [117]), a new node that tries to join

the system first establishes a singleton view with only itself in it. From that point on, the

membership protocols are responsible for merging concurrent views or eliminating faulty

3 We use the term broadcast to mean sending the same message to all members of the view in which the message was sent.



Periodically do
(1)   Mute.expect(HEARTBIT, p0...pn−1);
(2)   if (I AM COORD==TRUE) then
(3)     Periodically GOSSIP about the view to other nodes
(4)   else
(5)     Mute.expect(GOSSIP, pcoord);
(6)   endif

Upon receive(view, GOSSIP MESSAGE) sent by pj do
(7)   Verbose.indict(pj);
(8)   if (pj is not suspected by FD) then
(9)     if (I AM COORD==TRUE) then
(10)      try to merge with another view(pj);
(11)    else
(12)      Mute.expect(MERGE, pcoord);
(13)    endif;
(14)  endif;

Upon Trust.suspect(pi) do
(15)  Suspect Node(pi, FALSE);

Figure 6.3: Pseudo Code of Membership Protocol

nodes by establishing new views that exclude them. In particular, the goal of the member-

ship maintenance is to provide the Byzantine Virtual Synchrony model. This includes the

following aspects:

Eliminating Suspected Nodes

Each node in JazzEnsemble employs a local failure detection (Lines 1, 5, 7 and 12 in Figure 6.3

and Line 30 in Figure 6.4) mechanism in order to suspect nodes that seem to be faulty, and

reports such suspicions to other nodes (Line 15 in Figure 6.3 and Line 1 in Figure 6.4). When

some nodes are suspected, JazzEnsemble tries to establish a new view without the suspected

nodes. However, in order to prevent Byzantine nodes from removing correct nodes from the

system, only nodes that are suspected by enough other nodes can be removed. Moreover,

we would like to ensure that only nodes that are agreed upon by the correct members of

the current view would be eliminated from the next view. This is obtained by utilizing a

Byzantine consensus protocol (Lines 7 and 14 in Figure 6.4, Lines 3 and 32 in Figure 6.5, and

Line 30 in Figure 6.6) whose details appear in Section 6.2.6. Handling suspicions is treated



procedure Suspect Node(pi, start byz consensus)
(1)   bcast(pi, SUSPECT);
(2)   if (have not suspected pi before) then
(3)     timer.set(pi, SUSPECT, t1); /* setting suspicion timer for t1 seconds */
(4)     update suspicions vector and increase suspicions threshold counter;
(5)   endif
(6)   if ((suspicions threshold counter > k) OR (start byz consensus == TRUE)) then
(7)     Byzantine Consensus(suspicions vector);
(8)   endif

Upon receive(pi, SUSPECT) sent by pj do
(9)   increase suspicions counter for pi if we have not received this message from pj before;
(10)  if ((suspicions counter for pi ≥ f + 1) AND (we do not suspect pi)) then
(11)    Suspect Node(pi, FALSE);
(12)  endif

Upon timer.expire(pi, SUSPECT) do
(13)  if (have not started Byzantine Consensus) then
(14)    Byzantine Consensus(suspicions vector);
(15)  endif

procedure handle BYZANTINE CONSENSUS DECISION Event(msg)
(16)  if (I am suspected) then
(17)    create a singleton view;
(18)    return;
(19)  endif
(20)  cr := view id mod number of non suspected nodes;
(21)  coord := min(i):{i ≥ cr AND msg[i] = 0};
(22)  if (I AM COORD==TRUE) then
(23)    init FLUSH Protocol();
(24)  else /* I am not the coordinator of the group */
(25)    timer.set(coord, FLUSH, t2); /* waiting for FLUSH message from coordinator */
(26)  endif

Upon timer.expire(coord, NEW VIEW) do
(27)  if (I AM COORD==FALSE) then
(28)    Suspect Node(coord, TRUE);
(29)  else if (VIEW CAUSED BY MERGE == TRUE AND I AM COORD==TRUE) then
(30)    Verbose.indict(big group coord);
(31)    new view msg := Create New View Message();
(32)    Uniform Broadcast(new view msg);
(33)  endif

procedure handle NEW VIEW Event(new view)
(34)  if (new view contains a correct new view) then
(35)    Install(new view); /* Installing new View */
(36)  else /* the new view is not correct */
(37)    if (I AM COORD==FALSE) then
(38)      Suspect Node(pcoord, TRUE);
(39)    else /* I am the coordinator of the view */
(40)      create a singleton view;
(41)    endif;
(42)  endif;

Figure 6.4: Pseudo Code of Suspicion Protocol



in Lines 9–12 in Figure 6.4. Specifically, whenever a node pi locally suspects another node

pj , e.g., the fuzzy muteness or fuzzy verbosity levels of pj surpass a certain threshold, or

pj was caught trying to send a forged message, etc., pi marks pj as suspected (Line 1 in
Figure 6.4). Whenever pi has some nodes marked as suspected, it sends slanders about these
nodes to all other view members. In turn, if a node pk receives at least f + 1 slanders
about a node pj , then pk also marks pj as suspected. Notice that if f + 1 nodes slander

about pj , then at least one correct node locally suspects pj , and so it is safe to adopt this

suspicion.
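As a rough illustration of the adoption rule in Lines 9–12 of Figure 6.4, the following Python sketch counts distinct slanderers per target and adopts a suspicion once f + 1 of them have been heard. The class and method names are hypothetical, and the timers and the consensus trigger are deliberately left out.

```python
from collections import defaultdict


class SlanderTracker:
    """Tracks who slandered whom; a sketch, not the JazzEnsemble code."""

    def __init__(self, f: int):
        self.f = f                         # assumed bound on Byzantine nodes
        self.accusers = defaultdict(set)   # target -> distinct slanderers
        self.suspected = set()             # nodes this node marks as suspected

    def local_suspect(self, target) -> None:
        # Suspicion raised by the local failure detectors.
        self.suspected.add(target)

    def on_slander(self, accuser, target) -> bool:
        """Record a slander; return True if it caused a new suspicion."""
        self.accusers[target].add(accuser)  # duplicates are ignored by the set
        enough = len(self.accusers[target]) >= self.f + 1
        if enough and target not in self.suspected:
            # At least one of f + 1 slanderers is correct, so adopting is safe.
            self.suspected.add(target)
            return True
        return False
```

Storing accusers in a set mirrors the condition that a repeated slander from the same node is counted only once.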

Additionally, the first time in a given view that a node pi marks another node pj as

suspected, pi starts a timer and a counter for the number of nodes it suspects (Lines 2–4 in

Figure 6.4). Once the timer expires, or the number of nodes that pi suspects goes beyond
a predefined threshold, or the coordinator is suspected, pi starts a Byzantine consensus
protocol in order to decide on the failed nodes (Lines 6–8 and 13–15 in
Figure 6.4). Once the Byzantine consensus protocol terminates, the ith non-faulty node,
where i is the old view identifier modulo the number of members that are not suspected, is
supposed to generate a new view (Lines 20–23 in Figure 6.4 and Lines 14–17 in Figure 6.5).
If this node does not generate a new view, or if it generates a wrong view (Lines 37–38 in Figure 6.4),
the view change protocol, and in particular the Byzantine
consensus protocol, is re-executed (Lines 27–28 in Figure 6.4).

Note that the layered structure of JazzEnsemble allows us to utilize any known Byzantine

consensus protocol. In particular, the layer implementing consensus already enjoys intra-

view reliable delivery, and thus we can use any protocol that assumes this capability [17,

21, 23].

Notice, however, that we would like to decide on which nodes are faulty and which are

not. In other words, we must decide on a binary vector of suspicions. One option is to

use a Byzantine consensus protocol that works with any value domain in which the binary

vector can be viewed as a binary encoding of some value. However, we claim that this is

not adequate. The reason is that if all nodes think that some node pj is suspected, yet

there is a disagreement about another node pk, then the result would be a disagreement

about the suspicion vector, which means that any suspicion vector becomes a valid decision

value for the consensus protocol (by the definition of the Byzantine consensus problem). In



particular, this could result in never eliminating pj from the view, despite the fact that all

nodes suspect it!

In this work we use an adaptation of the mute failure detector based protocol reported

in [49] since this protocol is very simple, and since it terminates in one communication round

in favorable circumstances, i.e., when there is no Byzantine behavior other than process

crashes and network disconnections. In this protocol, we do the equivalent of running the

mute failure detector based protocol of [49] n times in parallel, once for each view member,

as listed in Figure 6.8. Yet, rather than actually invoking the protocol n times, we

invoke it once in a way that operates in parallel on each entry of the vector, providing an

independent element-wise Byzantine consensus semantics for each of the vector’s bits. The

details appear in Section 6.2.6 below.
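The difference between treating the suspicion vector as a single value and deciding on it entry by entry can be made concrete with a small Python sketch. The helper names are hypothetical; the functions only compute which inputs a validity property would bind, and they do not run any consensus protocol.

```python
def scalar_validity_binds(input_vectors):
    """Whole-vector view: validity constrains the decision only if
    every correct node proposes exactly the same vector."""
    return len({tuple(vec) for vec in input_vectors}) == 1


def forced_entries(input_vectors):
    """Element-wise view: return {index: value} for every entry on which
    all correct nodes agree; these entries are bound by validity."""
    size = len(input_vectors[0])
    forced = {}
    for k in range(size):
        column = {vec[k] for vec in input_vectors}
        if len(column) == 1:
            forced[k] = column.pop()
    return forced
```

For example, with inputs [0, 1, 0], [0, 1, 1] and [0, 1, 0], the whole-vector view is not bound at all, whereas the element-wise view still forces entry 1 to be decided as suspected.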

Handling Verbose Nodes

As mentioned before, a simple attack that Byzantine nodes
can mount at all layers of JazzEnsemble is sending spurious messages in order to slow down the

entire group. In particular, in the case of membership, this means initiating too many view

changes that in fact do not result in eliminating or incorporating any node. Such behavior

is captured by the verbose failure detector, which will eventually trigger a suspicion that

such a node is Byzantine.

Merging Views

In order for concurrent views to locate each other, we employ an IP multicast based discovery

mechanism. That is, the coordinator of each view is supposed to periodically multicast a

message announcing its existence and the view it represents (Lines 2–3 in Figure 6.3).

This message is called a gossip message. All nodes in the system are supposed to listen

for gossip messages (this is in contrast to Ensemble and the non-Byzantine version of

JazzEnsemble, in which only coordinators listen for these messages). If correct nodes of a

view do not see gossip messages sent by their own coordinator, then they consider it a mute

failure on behalf of the coordinator (Line 5 in Figure 6.3).

When a coordinator of a view receives such a gossip message, it checks whether it

should try to merge with the reported view (Lines 9–11 in Figure 6.3). In particular, it

checks if the view identifier of the gossiped view is not older than its own view identifier,



procedure try to merge with another view(pj)
(1)   if (my group wants to join the other group) then
(2)     VIEW CAUSED BY MERGE := TRUE;
(3)     Byzantine Consensus(suspicions vector);
(4)   else /* may be the other coordinator wants to merge with my group */
(5)     send(view, GOSSIP MESSAGE, pj);
(6)   endif;

procedure handle END OF FLUSH PROTOCOL Event()
(7)   if (VIEW CAUSED BY MERGE == TRUE AND I AM COORD OF SMALL GROUP==TRUE) then
(8)     bcast(msg, MERGE REQUEST);
(9)     timer.set(big group coord, MERGE REPLY, t4);
(10)  else if (VIEW CAUSED BY MERGE == TRUE AND I AM COORD OF BIG GROUP==TRUE) then
(11)    bcast(msg, MERGE GRANTED);
(12)    new view msg := Create New View Message();
(13)    Uniform Broadcast(new view msg);
(14)  else if (VIEW CAUSED BY MERGE == FALSE AND I AM COORD==TRUE) then
(15)    new view msg := Create New View Message();
(16)    Uniform Broadcast(new view msg);
(17)  endif

Upon receive(msg, MERGE REPLY) sent by pj do
(18)  if (msg.type == MERGE GRANTED) then
(19)    /* The big group wants to merge with us. */
(20)    Uniform Broadcast(msg); /* notifying other nodes that we received MERGE GRANTED message */
(21)    timer.set(big group coord, NEW VIEW, t5); /* if I do not receive a view, I will generate a new view */
(22)  else
(23)    merge denied by big group();
(24)  endif;

Upon timer.expire(pi, MERGE REPLY) do
(25)  merge denied by big group();

procedure merge denied by big group()
(26)  new view msg := Create New View Message();
(27)  Uniform Broadcast(new view msg);

Upon receive(MERGE REQUEST) sent by pj do
(28)  if (my group does not want to merge with the group of pj) then
(29)    bcast(msg, MERGE DENIED);
(30)  else /* we want to merge with him */
(31)    VIEW CAUSED BY MERGE := TRUE;
(32)    Byzantine Consensus(suspicions vector);
(33)  endif;

Figure 6.5: Pseudo Code of Merge Protocol



and that the membership lists of the two views do not intersect and both agree on the
same protocol stack. If these conditions hold, then the coordinator is supposed
to try merging with the gossiped view (Lines 1–6 in Figure 6.5) using a merge request
message, which is again sent using IP multicast (Lines 7–8 in Figure 6.5). In addition, the
coordinator starts a timer; if the timer expires before it has received an answer
from the coordinator of the other group, it cancels the merge and creates a new local view
(Lines 9 and 25–27 in Figure 6.5).
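The eligibility check performed on a gossip message can be sketched in Python as follows, with a merge attempted only when all three conditions are satisfied. The ViewInfo record and its field names are hypothetical, and we assume that view identifiers are integers that grow over time, so that "not older" becomes a numeric comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewInfo:
    view_id: int        # assumed to increase with every view change
    members: frozenset  # identifiers of the view members
    stack: tuple        # names of the protocol layers, bottom to top


def should_try_merge(mine: ViewInfo, gossiped: ViewInfo) -> bool:
    """Deterministic check that any member can run on local knowledge."""
    not_older = gossiped.view_id >= mine.view_id
    disjoint = mine.members.isdisjoint(gossiped.members)
    same_stack = mine.stack == gossiped.stack
    return not_older and disjoint and same_stack
```

Because the function depends only on the two view descriptions, every view member can evaluate it and thereby know whether its coordinator is expected to send a merge request.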

Notice that the checks performed by the coordinator are deterministic, and can be done

by any group member based on its local knowledge. In order to save bandwidth, only

the coordinator sends a merge request. However, in order to protect against Byzantine

coordinators, all other nodes execute the same checks, and if the coordinator was supposed

to send a merge request, then they notify their fuzzy mute detector to expect it (Line 12

in Figure 6.3). Thus, if the coordinator does not send the merge request message, it will

eventually be suspected as mute. Moreover, the view members verify the contents of the

merge request message, and if it is bogus, they will also suspect the coordinator as being

Byzantine.

Similarly, when the coordinator pi of one group receives a merge request message from

the coordinator of another group, then pi performs similar sanity checks on it. If the message

is valid, pi starts a merging procedure that eventually leads to a new view (Lines 10–13,
18–24, and 28–33 in Figure 6.5). Otherwise, pi sends a

merge denied message that cancels the merge (Lines 28–29 in Figure 6.5).

Forming a New View

In order to reduce the performance impact of a Byzantine coordinator, we replace the

coordinator on each view change. The new coordinator is chosen as the ith non-faulty node,

where i is the old view identifier modulo the number of members that are not suspected.

Clearly, one can use other methods. However, it is preferable that the chosen method

would be locally computable, so that each node can locally verify who should act as coor-

dinator (Lines 20–21 in Figure 6.4).
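A locally computable selection rule of this kind can be sketched in a few lines of Python. The sketch follows the prose description (the ith non-suspected member, with i the old view identifier modulo the number of non-suspected members); the function name is hypothetical, indices are assumed to be zero-based, and a 0 entry in the decided vector is assumed to mark a member that is not suspected, as in Line 21 of Figure 6.4.

```python
def next_coordinator(view_id: int, decision: list) -> int:
    """Return the index of the member that should coordinate the next view.

    decision[i] == 0 means member i is not suspected (assumption taken
    from the msg[i] = 0 test in Figure 6.4).
    """
    alive = [idx for idx, bit in enumerate(decision) if bit == 0]
    if not alive:
        raise ValueError("every member is suspected")
    return alive[view_id % len(alive)]
```

Since the inputs are the old view identifier and the agreed decision vector, all correct members compute the same coordinator without exchanging further messages.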

When the coordinator of the new view sends a new view message, we must ensure

that all correct view members receive the same view message. This is easily obtained by



employing a uniform Byzantine delivery protocol (Line 32 in Figure 6.4 and Lines 13, 16 in

Figure 6.5). Here again, in principle, we could use any existing protocol, such as the one

by Bracha [17]. Practically, we have chosen to develop an optimized protocol that obtains

uniform broadcast with only two communication steps (instead of three in [17]), at the price
of requiring f < n/6. The protocol is described in Section 6.2.6.

Message Agreement Inside a View

Recall that at this point in the dissertation, we already rely on the fact that we have a mechanism

for detecting lost messages and for recovery of such messages (if needed) by retransmission.

Thus, the only two things we still need to worry about are the following:

1. Whenever two correct nodes deliver two versions of the same message to their respec-

tive application module, then these two versions are the same.

2. If a correct node pi delivers a message m that was sent by another node that was
eliminated from a view V1, then any other correct node pj that continues with pi to
its consecutive view V2 will also deliver m during V1.

In order to overcome the first problem, i.e., ensuring that every pair of correct nodes

agree on the content of a message they deliver to their respective application modules, we

use a Byzantine uniform broadcast layer, as described above. Yet, if the message is large,

it is possible to optimize and broadcast uniformly just the digest of the message. This is

because here we only need to ensure that one version of the same message is delivered to

all correct nodes. Once a correct message digest is received, the rest is taken care of in any

case by the reliable retransmission mechanism.

The second problem is solved using what is known in the literature as a flush protocol.

Specifically, we say that a broadcast message is stable if it was acknowledged by every

member that is not considered faulty. Thus, the coordinator does not send the new view

message until all the messages of the terminating view are stable (Lines 13, 16 in Figure 6.5).

Moreover, as part of the uniform broadcast mechanism of new views, a process does not

echo the view message until it knows that all messages it is aware of from the terminating

view are stable.
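The stability condition that gates the new view message can be written down as a short Python sketch. The function names and the representation of acknowledgements as sets of member identifiers are assumptions made for illustration only.

```python
def is_stable(acks: set, members: set, suspected: set) -> bool:
    """A broadcast is stable once every member that is not considered
    faulty has acknowledged it."""
    required = members - suspected
    return required <= acks


def can_send_new_view(pending_acks: dict, members: set, suspected: set) -> bool:
    """pending_acks maps each message of the terminating view to the set
    of members that acknowledged it (hypothetical bookkeeping)."""
    return all(is_stable(acks, members, suspected)
               for acks in pending_acks.values())
```

Under these assumptions the coordinator would hold back the new view message, and other members would hold back their echo, until can_send_new_view returns True.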



The Pseudo code of FLUSH Protocol by coordinator

procedure init FLUSH Protocol()
(1)   bcast(FLUSH); /* broadcasting FLUSH message */
(2)   timer.set(ALL NODES, FLUSH REPLY, t3); /* waiting for FLUSH REPLY message from other nodes */

Upon timer.expire(ALL NODES, FLUSH REPLY) do
(3)   if (have not received FLUSH REPLY message from all nodes) then
(4)     set 0 for every node that has not sent a FLUSH REPLY message and 1 otherwise in missing flush reply messages vec;
(5)     Uniform Broadcast(missing flush reply messages vec);
(6)     missing flush replies nodes := all entries in missing flush reply messages vec that contain 0;
(7)     timer.set(missing flush replies nodes, MISSING FLUSH REPLIES, t4);
(8)   endif

Upon timer.expire(missing flush replies nodes, MISSING FLUSH REPLIES) do
(9)   if (have not received FLUSH REPLY message from all nodes in missing flush replies nodes) then
(10)    set 1 in suspicions vector for every node that has not sent a FLUSH REPLY message;
(11)    /* suspicions vector contains all the nodes that have not sent FLUSH REPLY message */
(12)    Byzantine Consensus(suspicions vector);
(13)  endif

Upon receive(FLUSH REPLY) sent by pi do
(14)  if (received FLUSH REPLY message from all nodes) then
(15)    GENERATE End of FLUSH Protocol EVENT;
(16)  endif

The Pseudo code of FLUSH Protocol by regular node

Upon timer.expire(coord, FLUSH) do
(17)  if (have not received FLUSH message from coordinator) then
(18)    /* start suspecting coordinator and Run Byzantine Consensus to try to remove him */
(19)    Suspect Node(pcoord, TRUE);
(20)  endif

Upon receive(FLUSH) sent by pcoord do
(21)  send(FLUSH REPLY, coord);
(22)  timer.set(coord, NEW VIEW, t3); /* waiting for New View message from coordinator via Uniform Broadcast */

Upon receive(missing flush replies nodes, MISSING FLUSH REPLIES) via Uniform Broadcast do
(23)  timer.set(missing flush replies nodes, MISSING FLUSH REPLIES, t4);
(24)  if (my entry in msg equals to 0) then
(25)    bcast(msg, FLUSH REPLY, coord);
(26)  endif

Upon timer.expire(missing flush replies nodes, MISSING FLUSH REPLIES) do
(27)  if (have not received FLUSH REPLY message from all nodes in missing flush replies nodes) then
(28)    set 1 in suspicions vector for every node that has not sent a FLUSH REPLY message;
(29)    /* suspicions vector contains all the nodes that have not sent FLUSH REPLY message */
(30)    Byzantine Consensus(suspicions vector);
(31)  endif

Figure 6.6: Pseudo Code of FLUSH Protocol



Flush Protocol: The flush protocol is managed by the coordinator of the view. The

coordinator broadcasts a flush message to all the members of the view and starts a timer

(Lines 1–2 in Figure 6.6). Once the timer expires, the coordinator uniformly broadcasts the

vector of nodes that have not answered the flush message (missing flush reply messages vec)

and sets another timer (Lines 3–8 in Figure 6.6). The purpose of this uniform broadcast is to

cause the correct nodes to monitor the nodes that appear in missing flush reply messages vec.

In this way, if correct nodes do not hear the broadcast of a flush reply message by some
of the nodes in missing flush reply messages vec, they start suspecting those nodes
and propose those nodes as faulty in the next execution of the Byzantine consensus

protocol (Lines 27–31 in Figure 6.6).

When a node receives a flush message from the coordinator of the view, it sends
the list of stable messages in a flush reply message (Line 21 in Figure 6.6) and starts a
timer (Line 22 in Figure 6.6). In addition, when a node receives a missing flush replies
message from the coordinator of the view and is listed in it as not having sent
a flush reply message, it broadcasts (Lines 24–26 in Figure 6.6) the list of stable messages in a

flush reply message and also starts a timer (Line 23 in Figure 6.6). Once a timer expires,

if some of the nodes have not sent the flush reply message, they are suspected (Lines 9–

13, 27–31 in Figure 6.6). Otherwise, the coordinator is suspected for not generating a new

view (Lines 27–28 in Figure 6.4 and Lines 17–20 in Figure 6.6). Finally, if the coordinator

received flush reply messages from all the nodes, it tries to create a new view (Lines 14–16

in Figure 6.6 and Lines 7–17 in Figure 6.5).
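The vector that the coordinator broadcasts after its first timer expires can be sketched in Python. The encoding follows Line 4 of Figure 6.6 (0 for a node whose flush reply is missing, 1 otherwise); the function names and the use of an ordered member list are assumptions made for illustration.

```python
def missing_flush_vector(members: list, replied: set) -> list:
    """Entry is 1 if the member already sent a flush reply, else 0."""
    return [1 if m in replied else 0 for m in members]


def nodes_to_monitor(members: list, vec: list) -> list:
    """Members that every correct node should now expect a broadcast
    flush reply from, and suspect if none arrives before the timer."""
    return [m for m, bit in zip(members, vec) if bit == 0]
```

A node that finds itself in the monitored list would rebroadcast its flush reply, while the others start their own timer for the listed nodes.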

Small Views

If the membership size n of a view is small, we can use a Byzantine consensus protocol and

a uniform broadcast protocol that work with f < n/3. If n drops to 3f or below, then there is

not much that can be done due to the theoretical lower bounds. However, by distinguishing

between the number of Byzantine nodes and the number of disconnected and crashed nodes,

we may be able to employ somewhat more resilient protocols that still work efficiently, along

the lines of [73].
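To see how the resilience bounds translate into concrete view sizes, the following Python helper computes the largest number of Byzantine nodes tolerated under a bound of the form n > ratio · f. It is a back-of-the-envelope aid with a hypothetical name, not part of the system.

```python
def max_faults(n: int, ratio: int) -> int:
    """Largest f such that n > ratio * f (for example ratio = 6 for the
    optimized protocols and ratio = 3 for the classical bound)."""
    if n < 1 or ratio < 1:
        raise ValueError("n and ratio must be positive")
    return (n - 1) // ratio
```

For instance, a view of six members tolerates no Byzantine node under n > 6f but one under n > 3f, which is why switching protocols pays off in small views.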

Finally, a member of a small view that is unhappy with its view members, but does not

have enough supporters to establish the view it believes in, can always establish a singleton



view and try to gradually merge with nodes it trusts. Handling this is left for future work.

6.2.5 Total Ordering

While total ordering is not strictly required by virtual synchrony, it is a common option

in most group communication systems. Adding total ordering to virtual synchrony enables

obtaining atomic delivery, which is a basic mechanism for implementing a replicated state

machine semantics [105].

We have implemented total ordering as follows: Nodes accumulate all the messages
they receive. Each node picks a subset of these messages, chosen by some deterministic and

fair rule, and proposes them in the Consensus protocol. Once a batch of such messages is

decided on, these messages are delivered, and the process moves on to pick the next subset

to be proposed in the Consensus protocol and so forth.
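The batching loop described above can be sketched in Python. The concrete rule used here (order by sequence number, then by sender, and take the first few messages) is only one possible deterministic choice and is an assumption, as are the function names and the representation of a message as a (sender, seq) pair; the consensus call itself is not shown.

```python
def propose_batch(pending: set, batch_size: int) -> tuple:
    """Pick the next proposal from the accumulated messages.

    Ordering by (seq, sender) is deterministic, so two nodes holding the
    same pending set propose the same batch regardless of arrival order.
    """
    ordered = sorted(pending, key=lambda msg: (msg[1], msg[0]))
    return tuple(ordered[:batch_size])


def deliver_batch(decided: tuple, pending: set) -> list:
    """Deliver a decided batch in its agreed order and forget it locally."""
    pending.difference_update(decided)
    return list(decided)
```

Once the backlog at all nodes is identical, every node proposes the same batch, which is the situation in which the consensus protocol can finish in a single step.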

As for the Byzantine consensus protocol, we have utilized the mute failure detector based

protocol of [49], which has the nice property that it terminates in a single communication

step in good scenarios (no failures and all processes initially propose the same message).

Interestingly, as we have discovered during our experiments, if the size of the subset of

messages to decide on is sufficiently large, and when there is a continuous load, or bursty

traffic, the amortized cost of deciding on each message becomes one communication step.

Specifically, in the first invocation of Consensus in each burst, there might be disagreement

regarding the proposals and therefore multiple communication steps are required to decide.

However, during this time, all nodes continue to accumulate messages. Thus, given that

the subsets of messages to be proposed to Consensus are chosen using a deterministic rule,

then the subsequent invocations of consensus terminate in one communication round!

Notice also that when the application messages are small, then the values proposed

to the Byzantine consensus protocol are the messages themselves. Thus, this implements

atomic broadcast without needing to run a separate uniform broadcast protocol. On the

other hand, when messages are large, it makes more sense to run the Consensus protocol

only on messages’ unique identifiers. However, in this case, we do need a separate uniform

broadcast mechanism, similar to the one we described in Section 6.2.4, to ensure that indeed

all correct nodes receive the same copy (content-wise) of a given message.



6.2.6 Efficient Implementations of Building Blocks

Vector Byzantine Consensus Protocol

In the vector Byzantine consensus problem, we assume that each node starts with a vector of

input bits of size n, known as the input vector. The goal is to have each correct process decide

on an output vector of size n, also called a decision vector. Yet, notice that as this protocol

is being run within a view, some otherwise correct nodes might become disconnected due

to the network. These nodes cannot be required to terminate their computation. Thus,

we introduce the notion of a core component.4 That is, we assume that among the set of

n nodes participating in the computation, there is a subset of at least n− f correct nodes

that are also connected, which we call the core component. With this definition, we say

that a protocol solves the Vector Byzantine Consensus problem if it ensures the following

requirements (these are simple extensions of standard Byzantine consensus requirements;

we repeat them here for completeness):

Vector Byzantine Validity: Let Vi be the decision vector of some core process pi that

decides. Then for each k, if the value of the entry Uj [k] in the input vector Uj of all

core processes is v, then Vi[k] = v.

Vector Byzantine Agreement: Let Vi be the decision vector of some core process pi

that decides and Vj the decision vector of another core process pj that decides. Then

for every k, Vi[k] = Vj [k].

Byzantine Termination: Eventually, every core process decides on some decision vector.

As indicated above, the Byzantine consensus protocol we employ is a simple extension of

the protocol of [49] to vectors and is based on having a ♦Pmute failure detector. The protocol
ensures safety even when the failure detector does not obey the properties of ♦Pmute.

The only requirement that depends on the properties of ♦Pmute to hold is termination.

Practically, in the actual implementation inside JazzEnsemble we use the fuzziness levels of

nodes as an approximation for ♦Pmute.

The pseudo-code is listed in Figure 6.7 and Figure 6.8. Intuitively, the protocol
includes two phases. In the first phase, each process collects the current estimates regarding

4 A similar approach was used in [35] for handling benign failures in partitionable networks.



Variables:
esti[ ] – a vector of current estimates of pi about the decision values
dominatingi[ ] – a vector of majority estimates of pi about the decision values
need coord[ ] – a vector that contains the indexes of values that need to adopt the coordinator’s value
Vi[ ][ ] – a matrix that contains current estimates of other processes

Figure 6.7: Main variables held by each process pi

the value that should be decided on. If some value is overwhelmingly dominating, we can

safely decide on it in the second phase. Otherwise, if there is a single value that was

reported by a significant number of the nodes, but not enough for a safe decision, we adopt

this value as the estimate for the next round, but do not decide on it yet. If even this does

not happen, and we were able to obtain the value of the coordinator without suspecting it,

then we adopt the value suggested to us by the coordinator. The idea is that if no value gets

enough support, then it means that we are not bound by validity to decide on any specific

value. In this case, if we are lucky and the round is controlled by a correct coordinator,

everyone will adopt the coordinator’s value and will be able to decide in the next round.

The fact that we replace a coordinator on each round ensures that eventually there will

be such a coordinator. On the other hand, if the current coordinator is mute, the failure

detector ensures that we will not wait for it forever.
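The per-entry logic of a round can be illustrated with a small Python sketch that mirrors the tests at lines 11–12, 18–20 and 37 of Figure 6.8 for a single column k of the matrix. It is a simplified illustration under stated assumptions: the function name is hypothetical, None plays the role of ⊥, and message exchange, the coordinator step and the failure detector are not modelled.

```python
from collections import Counter

BOTTOM = None  # stands for the "no value received" marker


def column_step(column: list, est_k, n: int, f: int):
    """Return (dominating, needs_coord, can_decide) for one vector entry.

    column holds the value reported by each of the n processes for entry k,
    or BOTTOM where nothing was received.
    """
    counts = Counter(v for v in column if v is not BOTTOM)
    bottoms = sum(1 for v in column if v is BOTTOM)

    # Lines 11-12: adopt a strict-majority value if one exists.
    dominating = est_k
    for value, count in counts.items():
        if count > n / 2:
            dominating = value

    support = counts.get(dominating, 0)
    # Lines 18-20: too little support means waiting for the coordinator.
    needs_coord = support < n - 2 * f - bottoms
    # Line 37: an entry may be decided only with support of at least n - f.
    can_decide = (not needs_coord) and support >= n - f
    return dominating, needs_coord, can_decide
```

In the protocol a process decides only when every entry of the vector passes the last test in the same round.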

Proof of the Vector Byzantine Consensus Protocol: In the following lemmas, we

prove the correctness of the algorithm for an arbitrary entry k in the vector. Since the

proof holds for each entry in the vector, it also holds for the entire vector. The proofs are

adaptations of the corresponding ones given in [49] to incorporate the notions of vectors

and core subsets; they are given here for completeness.

Lemma 6.2.1 Let us assume n > 4f , and consider the situation where, at the beginning

of a round r, all core processes pi have the same estimate value v[k] (i.e., esti[k] = v[k]).

They will never change their estimates thereafter.

Proof: Note that in every round, each core process collects at least (n − f) estimates.

Since at the beginning of round r, all core processes have v[k] as their initial estimate and



procedure Byzantine consensus(dominatingi[ ])
(1)  init: r ← 1; esti ← dominatingi; cr ← hash(n, view id);
(2)  loop
     ----- Step 1 of round r -----
(3)    Vi ← [⊥, . . . ,⊥] (n ∗ n times); c ← ((c+1) mod n); r ← (r+1);
(4)    broadcast val(r, esti);
(5)    wait until val(ri,−) or dec(−) messages have been received from all non-suspected processes
       and from at least (n − f) distinct processes
       /* We build here the matrix of estimates */
(6)    for All j: do
(7)      if (val(ri, estj) or dec(estj) received from pj) then Vi[j] ← estj
(8)      end if
(9)    end for
       /* We are looking for the columns in which the majority value appears more than n/2 times */
(10)   for All k: do
(11)     if (∃v ≠ ⊥ : #v(Vi[ ][k]) > n/2) then dominatingi[k] ← v;
(12)     else dominatingi[k] ← esti[k]
(13)     end if
(14)   end for
     ----- Step 2 of round r -----
(15)   if (i = cr) then broadcast coord(r, dominatingi)
(16)   end if
(17)   for All k: do
(18)     if (#dominatingi[k](Vi[ ][k]) ≥ (n − 2f − #⊥(Vi[ ][k]))) then
(19)       esti[k] ← dominatingi[k];
(20)     else need coord[k] ← 1;
(21)     end if
(22)   end for
(23)   if (∃ k s.t. need coord[k]=1) then
(24)     wait until (coord(r,−) or dec(−) received from pc or pc is suspected)
(25)     if (coord(r, x) or dec(x) received from pc) then
(26)       coord vali ← x;
(27)     else coord vali ← dominatingi;
(28)     end if
(29)     for All k: do
(30)       if (need coord[k]=1) then
(31)         esti[k] ← coord vali[k];
(32)       end if
(33)     end for
(34)     goto 3
(35)   else
(36)     for All k: do
(37)       if (#dominatingi[k](Vi[ ][k]) < (n − f)) then
(38)         goto 3;
(39)       end if
(40)     end for
         /* if we haven’t jumped for any of the fields in the array, we can decide */
(41)     broadcast dec(esti); return (esti);
(42)   end if
(43) end loop

Figure 6.8: ♦Pmute-Based Vector Byzantine Consensus Protocol Executed by pi (n > 6f)



as there are at most f Byzantine processes, then every core process pi will collect at least

n−2f estimates equal to its own estimate v[k]. As n > 4f , it follows that v[k] is a majority

value in Vi[ ][k] and therefore dominatingi[k] is set to v[k] (line 11). Hence, esti[k] is set to

dominatingi[k] = v[k] (line 19).

Lemma 6.2.2 [Validity] If all the core processes propose the same value v[k], then no value

v′[k] ≠ v[k] can be decided.

Proof: This lemma is an immediate consequence of Lemma 6.2.1 when we consider r = 1.

As all estimates of core processes remain equal to v[k], it follows from line 41 that no value

v′[k] ≠ v[k] can be returned by a core process.

Lemma 6.2.3 [Agreement] Let n > 6f . No two core processes decide different values.

Proof: Let r be the first round during which a core process pi decides, and let v[k] be the
value of entry k that it decides. Due to the lines 11 and 41, it follows that dominatingi[k] =
v[k] and #v(Vi[ ][k]) ≥ n − f . Due to the fact that at most f processes are not in the core
component, it follows that, in the worst case, pj sees the same values as pi except for
#⊥(Vj[ ][k]) entries that are equal to ⊥ in Vj[ ][k] (those being equal to v in Vi[ ][k]), and
at most f other entries (those possibly corresponding to Byzantine processes that sent v[k]
to pi and v′[k] ≠ v[k] to pj). It follows that #v(Vj[ ][k]) ≥ n − f − (f + #⊥(Vj[ ][k])), i.e.,
#v(Vj[ ][k]) ≥ n − 2f − #⊥(Vj[ ][k]) for any core process pj . As #⊥(Vj[ ][k]) ≤ f , we get
#v(Vj[ ][k]) ≥ n − 3f and, as n > 6f , it follows that v[k] is a majority value in Vj[ ][k].
Hence, dominatingj[k] = v[k] (line 11).

Moreover, as #v(Vj [ ][k]) ≥ n − 2f − #⊥(Vj [ ][k]), the test at line 18 is satisfied for any

core process pj and, accordingly, any core pj sets estj [k] to v[k] at line 19. If pj decides

at line 41, it decides v[k]. If pj proceeds to the next round, due to Lemma 6.2.1, no value

v′[k] ≠ v[k] can be decided.

Lemma 6.2.4 No core process can block forever in a round.

Proof: The lemma follows immediately from the following observations. At each round

r: (a) as there are at most f non-core processes, no core process can block forever at line



5, and (b) as the failure detector satisfies the Muteness Strong Completeness property, no

core process can block forever at line 24.

Lemma 6.2.5 [Termination] Let n > 6f . Each core process eventually decides.

Proof: Let t be the time after which the failure detector is accurate, i.e., no core process

is suspected (due to the Eventual Strong Accuracy of the failure detector, such a time t

does exist). Let r be the first round that starts after t and is coordinated by a core process

pc. Let us observe that, due to Lemma 6.2.4 and the use of dec() messages (if any), any

core process pi that has not yet decided starts round r. During r, let dominatingc[k] = v[k].

Claim. At the end of r (where dominatingc[k] = v[k]), all core processes pi have esti[k] =

v[k]. End of the claim.

Due to the claim, it follows that all the core processes (that have not yet decided) start

the round r + 1 with the same estimate value v[k]. Moreover, due to (1) the fact that

there are at least (n − f) core processes, (2) the fact that the failure detector is accurate

(i.e., no core process is suspected), (3) the dec () messages sent by the processes that

have already decided (if any), and (4) the waiting statement of line 5 (messages are received

from all core processes), it follows that all the core processes pi are such that #v(Vi[ ][k]) ≥ n − f ,

and v[k] is the only such value (because n − f > f). So, for any core pi,

we have dominatingi[k] = v[k] at line 11. Consequently, the test of line 18 is satisfied (for

every entry in the vector esti) and the test of line 37 is not satisfied for any column in the

matrix Vi, and each core process pi decides accordingly by the end of r + 1.

Proof of the claim. Let us first observe that if each core process executes line 31 for entry

k, it adopts v[k] as its new estimate and the claim trivially follows.

Let us consider the case where a process pi executes line 19, namely, esti[k] ← dominatingi[k].

Let dominatingi[k] = w. We have to show that v[k] = w. As pi executes line 19, the test of

line 18 is satisfied and we have #w(Vi[ ][k]) ≥ n−2f−#⊥(Vi[ ][k]). Moreover, as (1) pi is in

the core, (2) there are at most f non-core processes, (3) we are after the time t (and conse-

quently, each core process receives a message from each core process), we can conclude that

the entries m such that Vi[m][k] = ⊥ correspond to faulty processes. Consequently, for any

core process pj , we have #w(Vj [ ][k]) ≥ n−2f−#⊥(Vi[ ][k])−(f−#⊥(Vi[ ][k])), i.e.,

#w(Vj [ ][k]) ≥ n − 3f . So, when we consider the coordinator pc, we get #w(Vc[ ][k]) ≥ n − 3f . As

n > 6f , we have #w(Vc[ ][k]) ≥ n − 3f > n/2, and so w is a majority value in the vector Vc[ ][k].

It then follows from line 11, that dominatingc[k] = w. Hence w = v[k]. It follows that all

core processes pi have esti[k] = v[k] at the end of r. End of proof of the claim.

Theorem 6.2.6 Let n > 6f . The protocol described in Algorithm 6.8 solves the vector

Byzantine consensus problem.

Proof: The proof follows from the Lemmas 6.2.2, 6.2.3 and 6.2.5.

An Efficient Byzantine Uniform Broadcast Protocol

In the formal problem of uniform broadcast, a process is trying to send a message v to

all other processes such that all of them will deliver the same message. As in the case of

Byzantine consensus, in this work we assume that the view includes n processes, out of

which there is a core component of at least n− f processes that are correct and connected.

A protocol implements Uniform Byzantine Broadcast if it obeys the following requirements:

Broadcast Uniform Delivery: If a correct process p delivers a message v, then all other

core processes also deliver the value v. In particular, if two core processes deliver

values v and u respectively, then v = u.

Broadcast Termination: If a core process sends a message v, then every core process

delivers v.

The optimized protocol for implementing uniform broadcast appears in Figure 6.9. In-

tuitively, all messages that are sent in the k’th broadcast by p are tagged with (p, k), thereby

eliminating possible interference between broadcasts. There are two types of messages in

the protocol: initial and echo. The algorithm starts when the originator of the mes-

sage p sends an (initial,v, k) message, where v is the content of the actual message p

wishes to disseminate. Following this, the processes report to each other the value they

received via (echo,v, k) messages. If at least (n/2 + f + 1) (echo,v, k) messages (or

the (initial,v, k) message) are received by a process, it sends an (echo,v, k) to other

processes (if this process has not done so yet). Once the process has received (n/2 + 2f + 1) (echo,v, k)

messages, it delivers v. As is shown in the proof, this is enough to ensure uniform broadcast.

Function Uniform broadcast(vi, k)

step 0: (only by the originator):
        Send (initial, vi, k) to all the processes;

step 1: Wait until Receive one (initial, v, k) message
                or (n/2 + f + 1) (echo, v, k) messages for some v;
        Send (echo, v, k) to all the processes;

step 2: Wait until Receive (n/2 + 2f + 1) (echo, v, k) messages for some v;
        // The node accumulates the echo messages it received from step 1:
        // if the node gets at least (n/2 + 2f + 1) (echo, v, k) messages in both steps, it can decide
        Deliver(v);

Figure 6.9: Uniform Broadcast Protocol Executed by pi (n > 6f)
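
The two thresholds of Figure 6.9 can be sketched as a small per-process state machine. The Python below is a minimal illustration under stated assumptions, not the JazzEnsemble implementation: the class and method names are hypothetical, n/2 is read as integer (floor) division, sender identities are assumed to be authenticated by a lower layer, each sender is counted once per value, and a process is assumed to receive its own echo like any other.

```python
from collections import defaultdict


class UniformBroadcastInstance:
    """State kept by one process for a single broadcast (one originator, one k)."""

    def __init__(self, n: int, f: int):
        self.echo_threshold = n // 2 + f + 1          # step 1: enough echoes to echo
        self.deliver_threshold = n // 2 + 2 * f + 1   # step 2: enough echoes to deliver
        self.echo_senders = defaultdict(set)          # value -> senders that echoed it
        self.echoed = False
        self.delivered = None

    def on_initial(self, value):
        """Handle (initial, v, k); return the value to echo, or None if already echoed."""
        return self._maybe_echo(value)

    def on_echo(self, sender, value):
        """Handle (echo, v, k); return (value to echo or None, delivered value or None)."""
        self.echo_senders[value].add(sender)
        count = len(self.echo_senders[value])
        to_echo = self._maybe_echo(value) if count >= self.echo_threshold else None
        if self.delivered is None and count >= self.deliver_threshold:
            self.delivered = value
        return to_echo, self.delivered

    def _maybe_echo(self, value):
        # A correct process sends at most one echo per broadcast instance.
        if self.echoed:
            return None
        self.echoed = True
        return value


# Failure-free run with n = 7, f = 1: the six core processes all get the initial message.
n, f = 7, 1
core = [UniformBroadcastInstance(n, f) for _ in range(n - f)]
echoes = [(i, p.on_initial("m")) for i, p in enumerate(core)]
for sender, value in echoes:
    for p in core:
        p.on_echo(sender, value)
print([p.delivered for p in core])
```

With n = 7 and f = 1 the thresholds are 5 and 6, so the six echoes of the core processes are just enough for every core process to deliver.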

Correctness proof: As in the case of the proof of Byzantine consensus, we assume that

the terminating view includes a core component of n − f nodes, where n is the number

of nodes in the view. We show that if f < n/6, then the protocol in Figure 6.9 indeed

implements Uniform Byzantine Broadcast.

Lemma 6.2.7 For any given k, if two core processes p and q deliver values v and u re-

spectively, then u = v.

Proof: Assume by way of contradiction that the lemma does not hold. In order for p to

deliver v it must have received (n/2 + 2f + 1) (echo,v, k) messages, and therefore at least

n/2 + f + 1 (echo,v, k) messages from core processes. Similarly, q must have received at

least n/2+ f +1 (echo,u, k) messages from core processes. Therefore, some core process r

must have sent both (echo,v, k) and (echo,u, k) messages. But core processes, which by

definition are also correct, can send only one version of each message during a broadcast.

A contradiction. Therefore, u = v.
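
The pigeonhole step behind "some core process r must have sent both" can be checked numerically. The sketch below is illustrative only (the function name is not from the thesis) and assumes n/2 denotes integer division; it computes the smallest possible overlap between the two sets of core echo senders.

```python
def min_core_overlap(n: int, f: int) -> int:
    """Smallest possible overlap between the core echo sets behind two deliveries."""
    core = n - f                                # size of the core component
    per_delivery = (n // 2 + 2 * f + 1) - f     # echoes that must come from core processes
    return max(0, 2 * per_delivery - core)      # pigeonhole bound on the intersection


print(min_core_overlap(7, 1))    # two sets of 5 among 6 core processes
```

Since 2(n/2 + f + 1) exceeds n − f, the overlap is never empty, which is the contradiction used in the proof.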

Lemma 6.2.8 For any given k, if a core process p delivers the value v, then every other

core process will eventually deliver v.

Proof: If p delivers v, then p received (n/2 + 2f + 1) (echo,v, k) messages. At least

n/2 + f + 1 of these messages were sent by core processes. Therefore, every other core

process receives at least n/2 + f + 1 (echo,v, k) messages and sends its own (echo,v, k)

message. Thus, at least (n−f) processes will send an (echo,v, k) message. Every core process

will eventually receive at least (n−f) ≥ (n/2+2f +1) (echo,v, k) messages and will deliver

v.

Lemma 6.2.9 For any k, if a core process p sends v, then all the core processes will deliver

v.

Proof: Suppose a core process p sends v; every other core process will receive an (initial,v, k)

message and will send an (echo,v, k) message. Therefore, every core process q will re-

ceive (n − f) ≥ (n/2 + 2f + 1) (echo,v, k) messages from core processes, and at most

f < (n/2 + 2f + 1) different messages from non-core processes. Therefore, q will deliver v.
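
Both Lemma 6.2.8 and Lemma 6.2.9 rely on the inequality (n − f) ≥ (n/2 + 2f + 1). The following sketch, with an illustrative function name, checks that it coincides with the condition n > 6f when n/2 is read as integer division. That reading is an assumption: with an exact fraction, the boundary case n = 6f + 1 would give n − f = 5f + 1 against a threshold of 5f + 1.5.

```python
def core_can_deliver(n: int, f: int) -> bool:
    """True when the echoes of the n - f core processes alone reach the delivery threshold."""
    return n - f >= n // 2 + 2 * f + 1


for f in range(1, 4):
    print(f, core_can_deliver(6 * f, f), core_can_deliver(6 * f + 1, f))
```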

6.3 Performance Evaluation

Our measurements were carried out on an IBM Blade Center cluster comprising 25 dual-processor

2.2GHz PowerPC blades (JS20), each with 4GB of RAM, interconnected via

Gigabit Ethernet switches, and running SuSE Linux Enterprise Server 9. Every blade has

only one NIC, and thus all applications running on the same blade share the same NIC,

even if they run on a different CPU. The blades were otherwise unloaded. We have run our

tests with groups ranging from 8 to 50 processes. In all tests we had only one process per

CPU. Additionally, in tests of up to 24 nodes, each process was run on a different blade,

while with larger groups we had two processes on each blade (so in large groups each two

processes shared a NIC, but were run on different CPUs). Also, due to the configuration

of our Blade Center, when the group size was above 12, part of the communication had to

cross two internal switches. Last, JazzEnsemble is implemented in OCaml. Therefore, we

relied on the OCaml CryptoKit for handling cryptography.

We have used the Ensemble Ring demo application to measure the performance of the

system. In this demo, the application advances in rounds. In each round, a node sends a

burst of k messages and waits until it receives k messages from all other nodes, at which point

it moves to the next round. Thus, assigning k = 1 allows measuring the network latency.

Throughput is measured as the number of broadcast messages successfully delivered per

second (if a message is delivered to n nodes, we count it as one message for throughput

calculations).

Figure 6.10: Throughput measurements (the line for public key cryptography is hardly visible, as it is so close to 0 compared with the other lines). Axes: 16-byte messages / second (x 10^4) versus group size. Series: JazzEns, ByzEns+NoCrypto, ByzEns+SymCrypto, ByzEns+NoCrypto+Total, ByzEns+PubCrypto (512 bits).

Figure 6.11: Latency measurements (the line for public key cryptography is dropped since it is orders of magnitude higher than the others). Axes: average latency of 1-byte messages in ms versus group size. Series: JazzEns, ByzEns+NoCrypto, ByzEns+SymCrypto, ByzEns+NoCrypto+Total.
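
The throughput metric described above can be written down as a one-line computation. The sketch below is only an illustration of the counting rule (each broadcast counts once, however many nodes receive it); the function name and the assumption that every node sends exactly k messages in every round are ours.

```python
def ring_demo_throughput(n: int, k: int, rounds: int, elapsed_seconds: float) -> float:
    """Broadcasts delivered per second, counting each broadcast once regardless of receivers."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    broadcasts = n * k * rounds      # every node sends a burst of k messages per round
    return broadcasts / elapsed_seconds


print(ring_demo_throughput(n=10, k=5, rounds=100, elapsed_seconds=2.0))
```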

As can be seen in Figure 6.10 and Figure 6.11, the performance is fairly scalable with

up to 50 members. We attribute some of the minor dip in throughput above 12 nodes to

the extra switch that some messages need to traverse. Similarly, part of the minor dip above

24 nodes is due to the fact that each pair of processes shared a NIC in such large groups

(yet, each process was run on a separate CPU). Moreover, the OS kernel runs only on one

of the two processors; we discovered that any process that runs on the same processor as

the kernel enjoys better performance than processes that run on the second processor!

Additionally, we can see that without cryptography and uniform broadcast⁵, the performance

is about 85-90% of the performance of the non-Byzantine version of our system.

In other words, handling all attacks on reliable delivery, flow control, and membership

maintenance reduces the throughput by about 10%-15%.

5 For a discussion of uniform broadcast, see Section 6.2.4 and the discussion after Definition 6.1.2.

Symmetric key cryptography (AES with a 128-bit key) reduces the performance by

about half. This includes signing each message n− 1 times with a symmetric key. On the

other hand, the throughput with public key cryptography with a 512-bit key drops to a few

dozen messages per second, making it almost useless.
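
To illustrate what "signing each message n − 1 times with a symmetric key" involves, the sketch below computes one authentication tag per receiver from pairwise shared keys. It is a hedged illustration only: it substitutes HMAC-SHA256 from the Python standard library for the AES-based scheme used in the measurements, and the function names and the pairwise-key layout are assumptions rather than the JazzEnsemble design.

```python
import hashlib
import hmac


def sign_for_receivers(message: bytes, pairwise_keys: dict) -> dict:
    """Return one authentication tag per receiver, keyed by receiver id."""
    return {
        receiver: hmac.new(key, message, hashlib.sha256).digest()
        for receiver, key in pairwise_keys.items()
    }


def verify_tag(message: bytes, key: bytes, tag: bytes) -> bool:
    """Check the tag addressed to this receiver using a constant-time comparison."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# A sender in a group of n = 8 shares one 128-bit key with each of the other 7 members.
keys = {receiver: bytes([receiver]) * 16 for receiver in range(1, 8)}
tags = sign_for_receivers(b"payload", keys)
print(len(tags), verify_tag(b"payload", keys[3], tags[3]))
```

The per-message cost grows linearly with the group size, which is consistent with the throughput reduction reported above being a constant factor rather than a collapse.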

The line labelled “ByzEns+NoCrypto+Total” in Figure 6.10 illustrates the performance

of atomic delivery, obtained by placing a Byzantine consensus layer to order messages in a

total order (as described in Section 6.2.5).⁶ As can be seen, the performance is lower than

without total ordering with up to 24 nodes, with a significant drop above 24 nodes. The

drop in performance above 24 nodes is largely attributed to the fact that when we utilize

two processes on the same blade server, they both share the same NIC (but separate CPUs).

This means that when running Byzantine consensus, we are limited by the NIC's capacity

due to the extra messages injected by this protocol.

Figure 6.12 focuses on the attainable throughput of the Byzantine version of JazzEnsem-

ble while also ensuring total ordering and uniform broadcast. As can be seen, symmetric

key cryptography roughly halves the throughput for both total ordering and uniform delivery (recall that

due to our use of consensus in the implementation of total ordering, total ordering already

satisfies uniform broadcast). The reason why uniform delivery is worse than total ordering

is that the implementation of consensus can decide on multiple messages in one instance.

Thus, the cost of the consensus protocol is amortized over multiple messages. Due to a bug in

JazzEnsemble, we were not able to implement a similar optimization for uniform delivery.

In general, both these protocols deliver reasonable performance for small clusters. However,

the performance decays as the cluster grows, due to the fact that both protocols require

O(n²) messages, or to be precise, O(n) broadcasts (with consensus amortizing this cost

over multiple messages). Interestingly, the performance decay looks linear rather than quadratic.

The reason is that the network is switched. Thus, the extra load imposed on each

link and each group member grows only according to O(n)!
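
The gap between the total message count and the per-link load can be made concrete with a small calculation. The sketch below assumes, for illustration only, that each of the n members issues exactly one broadcast per delivered message and that a broadcast is realized as n − 1 point-to-point messages over a switched network; the function name is ours.

```python
def message_load(n: int):
    """Return (total point-to-point messages, messages crossing one member's link)."""
    broadcasts = n                    # assumed: one broadcast per member
    total = broadcasts * (n - 1)      # each broadcast reaches the other n - 1 members
    per_link = 2 * (n - 1)            # a member sends to n - 1 peers and receives from n - 1 peers
    return total, per_link


for size in (10, 20, 40):
    print(size, message_load(size))
```

Doubling the group size roughly quadruples the total but only doubles the load on any single link, which matches the linear decay seen in the measurements.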

In any event, we would like to emphasize once again that this is without packing/batching optimizations [52]. When incorporating such optimizations, from sporadic testing, we believe that for small messages we can get a performance boost of at least a factor of 10, and as much as a factor of 90 for 1 byte messages.

6 Also, the graph ends at 44 nodes rather than 50 since 6 nodes were trashed due to a UPS malfunction during a power outage.

Figure 6.12: Throughput Measurements: the cost of total ordering and uniform broadcast with and without symmetric-key cryptography. Axes: 16-byte messages / second versus group size. Series: NoCrypto+Total, NoCrypto+Uniform, NoCrypto+Total+Uniform, SymCrypto+Total, SymCrypto+Uniform, SymCrypto+Total+Uniform.

Figure 6.13: Time to establish a new view. Axes: seconds to stability versus group size. Series: ByzEns+NoCrypto merge->init, ByzEns+NoCrypto leave->init.

Figure 6.13 shows the latency to establish a new view for both merging a new node and

recovering from a failed or departed node (once the failure was detected). As can be seen,

this latency grows with the view size, and is roughly the same in both cases. However, even

with 50 nodes, it takes about 0.35 seconds to establish the view. On the other hand, the

exponential nature of the graph suggests that in order to grow to much larger groups, a more

scalable overlay based solution might be needed. However, overlays tend to be vulnerable

to Byzantine failures, so finding a practical solution to this is not a trivial task.

Finally, Table 6.1 details the time to recover from several scenarios. The scenarios

include a member leaving the group after sending a leave message, a node that becomes

mute and does not send anything, a coordinator that becomes mute, a node that becomes

too verbose and suspects other nodes all the time, and a coordinator that sends a view that

is different from the one expected. In all cases, the time presented is from the detection

of the failure until a new view is installed (but in the case of muteness and verbosity, does

not include the failure detection time itself as this is a tunable parameter). As can be seen,

in all cases, the recovery time is similar and is always less than 20 milliseconds. The main difference seems to be in whether all nodes start the consensus protocol roughly at the same time and with the same value or not. These numbers were obtained with a group of 12 nodes. In groups of 50 nodes, the latency may grow up to 350 milliseconds; in those cases the view latency is vastly dominated by the synchronization time of the view, as appears in Figure 6.13.

Name              Meaning                                                           Recovery Time
ByzLeave          A node sends a leave message and then leaves                      0.013 sec
ByzMuteNode       A node is mute and not sending anything                           0.015 sec
ByzMuteCoord      The coordinator is mute                                           0.018 sec
ByzVerboseNode    A Byzantine node is too verbose and suspects nodes all the time   0.016 sec
CoordBadView      The coordinator sends a wrong view                                0.014 sec

Table 6.1: Recovery time from problematic scenarios

Chapter 7

Summary and Future Directions

In this chapter we conclude this work. First, we present a short summary

of the results and discuss them. Then, we list a few key directions in which this work may

evolve further.

7.1 Summary of Results

In this thesis we have presented a scalable Byzantine group communication system. Our

system enjoys several interesting properties: It is a generic group communication system and

therefore can be used as a building block for various distributed applications. The system

is designed to perform well in the normal case, i.e., when no Byzantine failures occur, yet

be resilient to them if they do, as validated by our performance measurements. Also, our

protocols do not rely on protocol level signatures, and only sign (and authenticate) each

message once before sending it to (or receiving from) the network. The only exception to

this is retransmitting by a third node, which requires signatures at a low level of the system.

By examining our performance measurements, and in particular when focusing on the

sources of overhead in handling Byzantine failures, one can make the following observations:

The cost of handling Byzantine failures other than cryptography, total ordering, and

uniform broadcast (to be precise, ensuring that a Byzantine node does not send different

versions of its application messages to different nodes) is relatively small;

about 10-15% in our measurements. Also, public key cryptography is extremely expensive

to perform in software. Yet, using symmetric key cryptography while signing each broadcast

message n− 1 times results in acceptable performance degradation even in groups of up to

50 nodes. Furthermore, as security becomes increasingly important, it is conceivable that

in the future most computers will include hardware accelerators for it, which will reduce

its cost even further. Moreover, even with total ordering (by Byzantine consensus), the

performance of the system is still quite reasonable. Only a decade ago, a throughput of

4,000 messages per second on a cluster of 44 nodes would have been considered excellent

for a non-Byzantine tolerant system. However, the performance does degrade as the cluster

size increases.

Our layered architecture enabled us to perform fine-grained measurements regarding the

cost of various performance limiting factors in fending off Byzantine faults. When con-

sidering the scalability problems of total ordering and uniform delivery, one is faced with

the following tradeoffs: First, public-key cryptography is too expensive. Moreover, due

to its CPU intensive behavior, its detrimental effect on throughput is even more consid-

erable than its impact on latency, as highlighted in [45]. Second, existing techniques for

implementing Byzantine resilient total ordering and uniform delivery that do not rely on

public-key cryptography are not very scalable. In contrast, the approach of [2] replaces the

use of Byzantine consensus and uniform delivery with a quorum approach. However, it only

ensures probabilistic termination, and requires increased client-to-server communication.

This makes it less attractive when the communication between clients and servers is worse

than the communication among the servers themselves, e.g., when all the servers are in the

same farm.

Additionally, we have presented RAPID and BDP, two reliable broadcast protocols for

mobile ad-hoc networks. BDP disseminates messages along the arcs of a logical overlay.

The protocol relies on signatures to prevent messages from being forged. It also employs

gossiping of headers of known messages to prevent a malicious overlay node from stopping

the dissemination of messages to the rest of the system. Moreover, for efficiency reasons,

the overlay maintenance mechanism is augmented to ensure that enough correct nodes are

elected to the overlay so that malicious nodes do not disconnect the overlay beyond the time

required to detect such behavior. Finally, the detection of observable malicious behaviors,

such as mute and verbose failures, is encapsulated within corresponding failure detector

modules. The use of failure detectors simplifies the presentation of the protocol and makes

it more generic and robust. This is because the protocol need not deal explicitly with issues

like timers and timeouts.

We show that for non-sparse networks, BDP behaves very well. That is, BDP obtains

very high delivery ratios while sending much fewer messages than flooding. When there is

no malicious activity, it is almost as economical as a protocol that has no recovery mecha-

nism (and in particular, much more efficient than flooding). When some malicious failures

occur, BDP still remains more efficient than flooding, while maintaining a comparable de-

livery rate. In contrast, when there are malicious failures or mobility, having no recovery

mechanism results in a significant drop in delivery rates. Additionally, we made the interesting

observation that malicious failures have a somewhat reduced impact when the nodes

are mobile. Intuitively, when nodes are mobile, there is a lower chance that malicious nodes

will constantly be at critical positions on the message dissemination paths for all messages.

Yet, the problem with deterministic overlays is that due to the combination of mobility and

the decentralized nature of MANETs, maintaining overlays in MANETs is a complex and

expensive task. Finally, it is hard to make overlays resilient to malicious or even selfish

behavior in highly mobile networks.

We have also developed RAPID, a probabilistic reliable broadcast protocol for mobile

ad-hoc networks. The protocol includes a probabilistic flooding phase that is complemented

by two corrective measures, namely, counter based forwarding and a deterministic gossip

based mechanism. The latter enables recovering messages that were not delivered by the

probabilistic dissemination process while maintaining low communication overhead. The

probabilistic flooding part of the protocol takes advantage of the locally observed network’s

density in order to send a small number of messages, yet one that is still sufficient to deliver

the message to most nodes. This is in accordance with our formal analysis. This provides

very rapid dissemination of the message to most nodes in the system with low message

overhead, and in a way that is scalable in the network density.
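
As a rough illustration of density-aware probabilistic flooding, the sketch below forwards a message with a probability that shrinks as the locally observed neighbor count grows. The specific rule min(1, beta / neighbors), the default value of beta, and the function names are assumptions made for this example; the exact rule used by RAPID is the one defined in the chapter that presents the protocol.

```python
import random


def forwarding_probability(neighbor_count: int, beta: float = 2.5) -> float:
    """Probability of rebroadcasting, decreasing with the locally observed density."""
    if neighbor_count <= 0:
        return 1.0                       # no known neighbors: always forward
    return min(1.0, beta / neighbor_count)


def should_forward(neighbor_count: int, rng: random.Random, beta: float = 2.5) -> bool:
    """Draw the forwarding decision for one received message."""
    return rng.random() < forwarding_probability(neighbor_count, beta)


print(forwarding_probability(2), forwarding_probability(10), forwarding_probability(50))
```

Under this assumed rule the expected number of rebroadcasts in a neighborhood stays near beta regardless of density, which is the intuition behind sending few messages in dense networks while still reaching most nodes.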

7.2 Future Directions

As noted in Section 6, our Byzantine group communication system supports only single

hop networks. One of the interesting directions is to extend it to support multi-hop ad-hoc

networks. Recall that in ad-hoc networks, nodes cannot necessarily communicate directly

with each other. Instead, some nodes act as forwarders for the entire group. The main

places where this affects our work are that we need a Byzantine routing mechanism, and

the fact that the stability protocol and the failure detection must become gossip based [47].

Both the stability and failure detection protocols in Byzantine JazzEnsemble require receiving

receive messages from all group members, which was the case in single hop networks. Yet,

in multi hop networks, a node p can receive messages only from nodes whose distance from

p is less than their transmission range. Therefore, some devices will have to forward the

stability and failure detection messages from other nodes to their neighbors. It is even

more complicated if some of the nodes choose to behave in a Byzantine manner and not to

forward these messages. Thus, a promising direction is to exploit BDP and RAPID, which

we developed, in Byzantine JazzEnsemble in order to reliably disseminate messages to all

the nodes.

Even if we use a malicious tolerant protocol like BDP, which sends a relatively small number

of messages, the number of messages that is sent by the stability and failure detection

protocols that are used in Byzantine JazzEnsemble is very high. In order to reduce the

number of messages, gossip based failure detection and stability detection protocols have

been proposed for MANET in benign failure models [47, 50]¹. The challenge is how to

adapt these protocols to Byzantine failure prone environments in which a Byzantine node

can alter the messages it is forwarding.

The simplest way of overcoming many of the potential attacks against these protocols

is through the use of public-key cryptography. However, as we have discovered in this

work, the cost of public-key cryptography has the potential of rendering such protocols too

expensive. Thus, the gossip based failure detection and stability detection protocols that

will be developed should avoid using public-key cryptography in order to perform well in

practice.

1 The work presented in [118] introduces a scalable gossip based protocol that provides timely detection in large wired networks.

Finally, one of the main problems in mobile ad-hoc networks is power. As nodes are

mobile, they are typically battery operated. It turns out that the network card consumes

roughly the same levels of energy when it sends a message, receives a message, and listens for

messages. The main source of energy saving is to put the card in sleeping mode. The IEEE

802.11 standard includes the Power Save Mode in order to deal with this problem in wireless

LANs when all messages are point to point. There have also been a few attempts to extend

this to multi-hop networks with point to point messages, such as [8]. Recently, many of

the mobile devices that reach the market are equipped with more than one networking

interface, such as Wi-Fi and Bluetooth. Since Bluetooth consumes considerably less energy

than Wi-Fi, it is possible to put the Wi-Fi card in a sleeping mode, while keeping the Bluetooth

interface active. When some node wants to forward a message to its neighbor p, it sends a

short wake-up message over the Bluetooth interface to p, which causes p to start listening over

the Wi-Fi interface as well. After receiving the message, p can put its Wi-Fi card in a sleeping

mode again until the next wake-up interrupt. Such a trick can save a considerable amount

of energy and keep a network active for a longer period of time.
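
The wake-up scheme described above can be sketched as a tiny state machine. The Python below is a toy model under assumptions of our own: the class and function names are hypothetical, radios are modeled as method calls, and timing, loss, and authentication of the wake-up message are ignored.

```python
class DualRadioNode:
    """A node whose Wi-Fi card sleeps until a Bluetooth wake-up arrives."""

    def __init__(self, name: str):
        self.name = name
        self.wifi_awake = False    # the Wi-Fi card starts in sleeping mode
        self.inbox = []
        self.wakeups = 0

    def on_bluetooth_wakeup(self):
        """Short wake-up message received over the always-on Bluetooth interface."""
        self.wifi_awake = True
        self.wakeups += 1

    def on_wifi_message(self, payload) -> bool:
        """Receive a payload over Wi-Fi; a sleeping card misses the frame."""
        if not self.wifi_awake:
            return False
        self.inbox.append(payload)
        self.wifi_awake = False    # go back to sleep until the next wake-up
        return True


def forward(receiver: DualRadioNode, payload) -> bool:
    """Sender side: wake the neighbor over Bluetooth, then send over Wi-Fi."""
    receiver.on_bluetooth_wakeup()
    return receiver.on_wifi_message(payload)


p = DualRadioNode("p")
print(p.on_wifi_message(b"lost"), forward(p, b"hello"), p.inbox)
```

In a Byzantine setting the wake-up channel itself becomes an attack surface (for example, repeated spurious wake-ups to drain a battery), which this sketch does not address.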

An interesting problem is to utilize the techniques mentioned above to develop a Byzantine

broadcast protocol for multiple hop ad hoc networks that enables most nodes to sleep

most of the time in order to reduce their energy consumption.

Appendix A

Practical Application

WiPeer is software that enables direct communication between computers, without the

need for Internet access. The software contains a user friendly interface and a suite of col-

laborative applications, operating in peer-to-peer mode over both WiFi communication and

Ethernet LANs. It includes a common management utility, to which the other applications

can be plugged in. Current applications include automatic discovery of devices on the same

network, presence notifications, instant messaging, file sharing, distributed file search, and

several multiplayer games. WiPeer’s technology is based on the JazzEnsemble group com-

munications toolkit that was designed to target wireless mobile ad-hoc networks (MANETs).

Therefore, any communication between two nearby devices is performed over the direct link

(in a peer-to-peer mode), without relying on a central server or infrastructure. It dramat-

ically improves the user’s experience, since such communication enjoys higher bandwidth

and lower latency than infrastructure based communication. WiPeer’s extensible core allows

new applications to be added within very short development cycles, which makes it a highly

extensible platform. Thus, WiPeer may serve as a platform for mobile applications, such as

mobile multiplayer games and productivity applications.

Figure A.1 describes WiPeer’s architecture. WiPeer’s architecture is designed in

a way that enables re-use of most of the components, even when running above different

platforms. Thus, only components that depend on the functionality that is inherent to the

platform that WiPeer runs above need to be rewritten, such as network management, GUI

(graphical user interface) and power control of networking cards. All other components may

remain the same and need not be rewritten.

A.1 Future Work

One of the most important things is to extend the range of the communication between the

devices. In order to achieve this, we plan to implement multi hop routing protocols and to add

them to the JazzEnsemble group communication system. It will enable devices that are not

in direct range of each other to communicate with each other via intermediate nodes. In this

scenario, other devices will be used as proxy repeaters to transmit others’ messages. This

kind of networking allows extending the reach of proximity based communication without

usage of any infrastructure. One can imagine an entire school being connected to one big

multi hop wireless network, in which all pupils’ data exchanges are performed directly over

WiFi without the usage of cellular service providers' infrastructure.

Figure A.1: WiPeer architecture. Components shown: Network management and Power Control (platform dependent); Group communication (discovery, membership, reliable communication); Management module; Chat, File Sharing, Presence; SDK API; Graphical User Interface; External Games; External Applications.

References

[1] Swans/jist. http://jist.ece.cornell.edu/.

[2] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie. Fault-

Scalable Byzantine Fault-Tolerant Services. In Proc. 20th ACM SIGOPS Sym-

posium on Operating Systems Principles (SOSP), pages 59–74, October 2005.

[3] M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. On imple-

menting omega with weak reliability and synchrony assumptions. In PODC ’03:

Proceedings of the twenty-second annual symposium on Principles of distributed

computing, pages 306–314, New York, NY, USA, 2003. ACM.

[4] A. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J.-P. Martin, and C. Porth. BAR

Fault-Tolerance for Cooperative Services. In Proc. 20th ACM SIGOPS Sympo-

sium on Operating Systems Principles (SOSP), pages 45–58, October 2005.

[5] D. Allen. Hidden terminal problems in Wireless LAN’s. In IEEE 802.11 Working

Group Papers, 1993.

[6] Y. Amir, G. Ateniese, D. Hasse, Y. Kim, C. Nita-Rotaru, T. Schlossnagle,

J. Schultz, J. Stanton, and G. Tsudik. Secure Group Communication in Asyn-

chronous Networks with Failures: Integration and Experiments. In Proc. of the

20th International Conference on Distributed Computing Systems, pages 330–

343, 2000.

[7] B. Awerbuch, D. Holmer, C. Nita-Rotaru, and H. Rubens. An On-Demand Se-

cure Routing Protocol Resilient to Byzantine Failures. In Proc. ACM Workshop

on Wireless Security (WiSe), Atlanta, GA, September 2002.

[8] B. Awerbuch, D. Holmer, and H. Rubens. The Pulse Protocol: Energy Efficient Infrastructure Access. In Proc. of the 23rd Conference of the IEEE Communications Society (Infocom), March 2004.

[9] M. Backes and C. Cachin. Reliable Broadcast in a Computational Hybrid Model with Byzantine Faults, Crashes, and Recoveries. In Proc. of the International Conference on Dependable Systems and Networks (DSN), June 2003.

[10] G. Badishi, I. Keidar, and A. Sasson. Exposing and Eliminating Vulnerabilities to Denial of Service Attacks in Secure Gossip-Based Multicast. In Proc. of the International Conference on Dependable Systems and Networks (DSN), pages 201–210, June–July 2004.

[11] R. Baldoni, J. Helary, and M. Raynal. From Crash-Fault Tolerance to Arbitrary-Fault Tolerance: Towards a Modular Approach. In Proc. of the IEEE International Conference on Dependable Systems and Networks (DSN), pages 273–282, June 2000.

[12] Z. Bar-Yossef, R. Friedman, and G. Kliot. RaWMS - Random Walk based Lightweight Membership Service for Wireless Ad Hoc Networks. In Proc. of the 7th ACM Int. Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), pages 238–249, 2006.

[13] M. Ben-Or. Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols. In Proc. 2nd ACM Symposium on Principles of Distributed Computing, pages 27–30, 1983.

[14] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal Multicast. ACM Transactions on Computer Systems, 17(2):41–88, May 1999.

[15] K. P. Birman. Building Secure and Reliable Network Applications. Manning Publishing Company and Prentice Hall, December 1996.

[16] S. Bohacek, J. Hespanha, J. Lee, C. Lim, and K. Obraczka. Enhancing Security via Stochastic Routing. In Proc. of the 11th IEEE International Conference on Computer Communications and Networks, pages 58–62, May 2002.

[17] G. Bracha. An Asynchronous (n−1)/3-Resilient Consensus Protocol. In Proc. 3rd ACM Symposium on Principles of Distributed Computing, pages 154–162, 1984.

[18] G. Bracha and S. Toueg. Asynchronous Consensus and Broadcast Protocols. Journal of the ACM, 32(4):824–840, October 1985.

[19] J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva. A performance comparison of multi-hop wireless ad hoc network routing protocols. In Proc. of the 4th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), pages 85–97, 1998.

[20] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup. Secure and Efficient Asynchronous Broadcast Protocols. In Proc. of Advances in Cryptology: CRYPTO 2001, pages 524–541, 2001.

[21] C. Cachin, K. Kursawe, and V. Shoup. Random Oracles in Constantinople: Practical Asynchronous Byzantine Agreement Using Cryptography. In Proc. 19th ACM Symposium on Principles of Distributed Computing, pages 123–132, 2000.

[22] T. Camp, J. Boleng, and V. Davies. A survey of mobility models for ad hoc network research. Wireless Communications & Mobile Computing (WCMC), 2(5):483–502, 2002.

[23] R. Canetti and T. Rabin. Fast Asynchronous Byzantine Agreement with Optimal Resilience. In Proc. 25th Annual ACM Symposium on Theory of Computing, pages 42–51, 1993.

[24] J. Cartigny and D. Simplot. Border Node Retransmission Based Probabilistic Broadcast Protocols in Ad-Hoc Networks. Telecommunication Systems, 22(1–4):189–204, 2003.

[25] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Transactions on Computer Systems, 20(4):398–461, 2002.

[26] T. Chandra, V. Hadzilacos, S. Toueg, and B. Charron-Bost. On the Impossibility of Group Membership. In Proc. of the 15th ACM Symposium on Principles of Distributed Computing, pages 322–330, May 1996.

[27] T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM, 43(4):685–722, July 1996.

[28] I. Chang, M. Hiltunen, and R. Schlichting. Affordable Fault Tolerance Through Adaptation. In Proc. of Workshop on Fault-Tolerant Parallel and Distributed Systems (LNCS 1388), pages 585–603, April 1998.

[29] M.-H. Chek and Y.-K. Kwok. On Adaptive Frequency Hopping to Combat IEEE 802.11b with Practical Resource Constraints. In International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN), pages 391–396, May 2004.

[30] G. Chockler, I. Keidar, and R. Vitenberg. Group Communication Specifications: A Comprehensive Study. ACM Computing Surveys, 33(4):427–469, 2001.

[31] T. Clausen, P. Jacquet, and A. Laouiti. Optimized Link State Routing Protocol. In Proc. IEEE International Multi Topic Conference (INMIC), December 2001.

[32] M. Correia, N. Neves, L. Lung, and P. Veríssimo. Low Complexity Byzantine-Resilient Consensus. Distributed Computing, 17(3):237–249, March 2005.

[33] F. Cristian, H. Aghili, R. Strong, and D. Dolev. Atomic Broadcast: From Simple Diffusion to Byzantine Agreement. In Proc. of the 15th International Conference on Fault-Tolerant Computing, pages 200–206, Austin, Texas, 1985.

[34] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In Proc. of the 6th annual ACM Symposium on Principles of Distributed Computing (PODC), pages 1–12, New York, NY, USA, 1987. ACM Press.

[35] D. Dolev, R. Friedman, I. Keidar, and D. Malkhi. Failure Detectors in Omission Failure Environments. Technical Report TR96-1608, Department of Computer Science, Cornell University, 1996.

[36] S. Dolev, E. Schiller, and J. Welch. Random Walk for Self-Stabilizing Group Communication in Ad Hoc Networks. In Proc. of the 21st Annual Symposium on Principles of Distributed Computing, page 259, 2002.

[37] A. Doudou, B. Garbinato, R. Guerraoui, and A. Schiper. Muteness Failure Detectors: Specification and Implementation. In Proc. 3rd European Dependable Computing Conference, pages 71–87, 1999.

[38] A. Doudou and A. Schiper. Muteness Detectors for Consensus with Byzantine Processes (Brief Announcement). In Proc. 17th ACM Symposium on Principles of Distributed Computing (PODC), page 315, 1998.

[39] V. Drabkin, R. Friedman, A. Kama, and B. Mudrik. JazzEnsemble: a Group Communication System for MANET. Technical report, Computer Science, Technion, 2005.

[40] P. T. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and A.-M. Kermarrec. Lightweight Probabilistic Broadcast. ACM Transactions on Computer Systems, 21(4):341–374, 2003.

[41] P. Feldman and S. Micali. Optimal Algorithms for Byzantine Agreement. In Proc. 20th Annual ACM Symposium on Theory of Computing, pages 148–161, 1988.

[42] S. Floyd, V. Jacobson, S. McCanne, C. Liu, and L. Zhang. A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing. In Proc. ACM SIGCOMM’95, August 1995.

[43] R. Friedman. Fuzzy Group Membership. In Proc. of FuDiCo 2002: International Workshop on Future Directions of Distributed Computing, pages 60–63, Bertinoro, Italy, June 2002.

[44] R. Friedman, M. Gradinariu, and G. Simon. Locating Cache Proxies in MANETs. In Proc. 5th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), pages 175–186, May 2004.

[45] R. Friedman and E. Hadad. On the Significance of Latency vs. Throughput in Analyzing the Performance of Distributed Systems. IEEE Distributed Systems Online: “Distributed Wisdom” Column, January 2006.

[46] R. Friedman and A. Kama. Strong Replication Semantics in Mobile Ad-Hoc Networks. Technical report, Computer Science, Technion, 2005.

[47] R. Friedman, S. Manor, and K. Guo. Scalable Hypercube Based Stability Detection. IEEE Transactions on Parallel and Distributed Systems, 13(8), August 2002.

[48] R. Friedman, A. Mostefaoui, and M. Raynal. Simple and Efficient Oracle-Based Consensus Protocols for Asynchronous Byzantine Systems. In Proc. of the 23rd IEEE International Symposium on Reliable Distributed Systems (SRDS), pages 228–237, October 2004.

[49] R. Friedman, A. Mostefaoui, and M. Raynal. Simple and Efficient Oracle-Based Consensus Protocols for Asynchronous Byzantine Systems. IEEE Transactions on Dependable and Secure Computing, 2(1):46–56, March 2005.

[50] R. Friedman and G. Tcharny. Evaluating Failure Detection in Mobile Ad-Hoc Networks. International Journal of Wireless and Mobile Computing, 1(8), 2005.

[51] R. Friedman and R. van Renesse. Strong and Weak Virtual Synchrony in Horus. In Proc. of the 15th Symposium on Reliable Distributed Systems, pages 140–149, October 1996.

[52] R. Friedman and R. van Renesse. Packing Messages as a Tool for Boosting the Performance of Total Ordering Protocols. In Proc. of the Sixth IEEE International Symposium on High Performance Distributed Computing, pages 233–242, August 1997.

[53] D. Gavidia, S. Voulgaris, and M. van Steen. Epidemic-style Monitoring in Large-Scale Sensor Networks. Technical Report IR-CS-012, Vrije Universiteit, Netherlands, March 2005.

[54] K. Guo and I. Rhee. Message Stability Detection for Reliable Multicast. In Proc. of IEEE INFOCOM’2000, March 2000.

[55] P. Gupta and P. Kumar. Critical Power for Asymptotic Connectivity in Wireless Networks. In Stochastic Analysis, Control, Optimization and Applications, Birkhäuser, Boston, pages 547–566, 1998.

[56] Z. Haas. A New Routing Protocol for the Reconfigurable Wireless Networks. In Proc. IEEE Int. Conf. on Universal Personal Communications (ICUPC), October 1997.

[57] Z. Haas, J. Halpern, and L. Li. Gossip-Based Ad Hoc Routing. In Proc. of the 21st Conference of the IEEE Communication Society (INFOCOM), pages 1707–1716, June 2002.

[58] M. Hayden. The Ensemble System. Technical Report TR98-1662, Department of Computer Science, Cornell University, January 1998.

[59] M. Hiltunen, R. Schlichting, and C. Ugarte. Survivability Issues in Cactus. In Proc. of the IEEE Information Survivability Workshop, October 1998.

[60] IETF Mobile Ad hoc Networks Working Group. Jitter considerations in Mobile Ad Hoc Networks (MANETs).

[61] IETF Mobile Ad hoc Networks Working Group. The Optimized Link State Routing Protocol version 2.

[62] F. Ingelrest, D. Simplot-Ryl, and I. Stojmenovic. Broadcasting in Hybrid Ad Hoc Networks. In Proc. 2nd Annual Conference on Wireless On demand Network Systems and Services (WONS), 2005.

[63] I. Ioannidis and B. Carbunar. Scalable Routing in Hybrid Cellular and Ad-Hoc Networks. In 1st IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS), October 2004. Poster.

[64] M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: experimental evaluation of unstructured gossip-based implementations. In Proc. of the 5th Middleware, pages 79–98, 2004.

[65] D. Johnson and D. Maltz. Dynamic Source Routing in Ad Hoc Wireless Networks. In Mobile Computing, volume 353. 1996.

[66] B. Karp. Geographic Routing for Wireless Networks. PhD thesis, Harvard University, 2000.

[67] A.-M. Kermarrec and M. van Steen. Gossiping in Distributed Systems. SIGOPS Operating Systems Review, 41(5):2–7, 2007.

[68] A. Keshavarz-Haddad, V. J. Ribeiro, and R. H. Riedi. Color-based broadcasting for ad hoc networks. In Proc. of the 4th IEEE Int. Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), pages 49–58, April 2006.

[69] K. Kihlstrom, L. Moser, and P. Melliar-Smith. Solving Consensus in a Byzantine Environment Using an Unreliable Fault Detector. In Proc. of the Int. Conference on Principles of Distributed Systems, pages 61–75, 1997.

[70] K. Kihlstrom, L. Moser, and P. Melliar-Smith. The SecureRing Group Communication System. ACM Transactions on Information and System Security, 4(4):371–406, 2001.

[71] Y. Ko and N. H. Vaidya. Geocasting in Mobile Ad Hoc Networks: Location-Based Multicast Algorithms. In Proc. 2nd IEEE Workshop on Mobile Computer Systems and Applications, page 101, 1999.

[72] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative byzantine fault tolerance. In SOSP ’07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pages 45–58, 2007.

[73] L. Lamport. Lower Bounds for Asynchronous Consensus. In A. Schiper, A. Shvartsman, H. Weatherspoon, and B. Zhao, editors, Future Directions in Distributed Computing: Research and Position Papers, number 2584 in LNCS, pages 22–23. Springer, 2003.

[74] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982.

[75] A. Laouiti, A. Qayyum, and L. Viennot. Multipoint Relaying: An Efficient Technique for Flooding in Mobile Wireless Networks. In Proc. 35th IEEE Annual Hawaii International Conference on System Sciences (HICSS), 2001.

[76] P. Levis, N. Patel, D. Culler, and S. Shenker. Trickle: A self-regulating algorithm for code propagation and maintenance in wireless sensor networks, 2004.

[77] J. Li, J. Jannotti, D. S. J. De Couto, D. R. Karger, and R. Morris. A Scalable Location Service for Geographic Ad Hoc Routing. In Proc. 6th Annual International Conference on Mobile Computing and Networking (MobiCom), pages 120–130, 2000.

[78] M.-J. Lin, K. Marzullo, and S. Masini. Gossip versus Deterministically Constrained Flooding on Small Networks. In Proc. 14th International Conference on Distributed Computing 2000, pages 253–267, October 2000.

[79] J. Luo, P. Eugster, and J.-P. Hubaux. PILOT: ProbabilistIc lightweight group communication system for mobile ad hoc networks. IEEE Trans. on Mobile Computing, 3(2):164–179, April–June 2004.

[80] M. Macedonia and D. Brutzman. MBone Provides Audio and Video Across the Internet. IEEE Computer, 27(4):30–36, April 1994.

[81] D. Malkhi, Y. Mansour, and M. Reiter. Diffusion Without False Rumors: on Propagating Updates in a Byzantine Environment. Theoretical Computer Science, 299(1–3):289–306, April 2003.

[82] D. Malkhi and M. Reiter. A High-Throughput Secure Reliable Multicast Protocol. Journal of Computer Security, 5:113–127, 1997.

[83] P. McDaniel, A. Prakash, and P. Honeyman. Antigone: A Flexible Framework for Secure Group Communication. Technical Report CITI TR 99-2, University of Michigan, Ann Arbor, MI, USA, September 1999.

[84] Y. Minsky and F. Schneider. Tolerating Malicious Gossip. Distributed Computing, 16(1):49–68, 2003.

[85] H. Miranda, S. Leggio, L. Rodrigues, and K. Raatikainen. A Power-Aware Broadcasting Algorithm. In Proc. of The 17th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC’06), September 2006.

[86] G. Neiger and S. Toueg. Automatically Increasing the Fault-Tolerance of Distributed Algorithms. Journal of Algorithms, 11(3):374–419, September 1990.

[87] P. Panchapakesan and D. Manjunath. On the Transmission Range in Dense Ad Hoc Radio Networks. In Proc. of IEEE Signal Processing Communication (SPCOM), 2001.

[88] P. Papadimitratos and Z. Haas. Secure Routing for Mobile and Ad Hoc Networks. In Proc. Communication Networks and Distributed Systems Modeling and Simulations Conference, January 2002.

[89] P. Papadimitratos and Z. Haas. Secure Message Transmission in Mobile and Ad Hoc Networks. Ad Hoc Networks, 1, July 2003.

[90] S. Paul, K. K. Sabnani, J. C. Lin, and S. Bhattacharya. Reliable Multicast Transport Protocol (RMTP). IEEE Journal on Selected Areas in Communications, 15(3):407–421, April 1997. Special issue on Network Support for Multipoint Communication.

[91] M. D. Penrose. Random Geometric Graphs. Oxford University Press, 2003.

[92] C. Perkins. Ad Hoc On Demand Distance Vector (AODV) Routing. Internet Draft, draft-ietf-manet-aodv-00.txt, citeseer.nj.nec.com/article/perkins99ad.html, 1997.

[93] S. Pleisch, M. Balakrishnan, K. Birman, and R. van Renesse. MISTRAL: Efficient Flooding in Mobile Ad-hoc networks. In Proc. of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), pages 1–12, 2006.

[94] M. Rabin. Randomized Byzantine Generals. In Proc. 24th IEEE Symposium on Foundations of Computer Science, pages 403–409, 1983.

[95] H. Ramasamy, A. Agbaria, and W. Sanders. Parsimony-Based Approach for Obtaining Resource-Efficient and Trustworthy Execution. In Proc. 2nd Latin-American Dependable Computing Symposium (LADC), pages 206–225, October 2005.

[96] H. Ramasamy and C. Cachin. Parsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast. In Proc. 9th International Conference on Principles of Distributed Systems, December 2005.

[97] H. Ramasamy, P. Pandey, J. Lyons, M. Cukier, and W. Sanders. Quantifying the Cost of Providing Intrusion Tolerance in Group Communication Systems. In Proc. of the IEEE Conference on Dependable Systems and Networks, pages 229–238, 2002.

[98] A. Rao, C. Papadimitriou, S. Shenker, and I. Stoica. Geographic Routing without Location Information. In Proc. 9th Annual International Conference on Mobile Computing and Networking (MobiCom), pages 96–108, 2003.

[99] M. Reiter. Distributing Trust with the Rampart Toolkit. Communications of the ACM, 39(4):70–74, April 1996.

[100] O. Rodeh. Secure Group Communication. PhD thesis, School of Computer Science and Engineering, The Hebrew University of Jerusalem, 2001.

[101] A. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel. SCRIBE: The Design of a Large Scale Event Notification Infrastructure. In Proceedings of 3rd International Workshop on Networked Group Communication, November 2001.

[102] E. Royer, P. Melliar-Smith, and L. Moser. An Analysis of the Optimum Node Density for Ad hoc Mobile Networks. In Proc. of the IEEE International Conference on Communications, June 2001.

[103] K. Sanzgiri, B. Dahill, B. Levine, C. Shields, and E. Belding-Royer. A Secure Routing Protocol for Ad Hoc Networks. In Proc. of the IEEE International Conference on Network Protocols (ICNP), November 2002.

[104] Y. Sasson, D. Cavin, and A. Schiper. Probabilistic Broadcast for Flooding in Wireless Mobile Ad hoc Networks. In Proc. of the IEEE Wireless Comm. and Networking Conference (WCNC), March 2003.

[105] F. B. Schneider. The state machine approach: a tutorial. Technical Report TR 86-800, Department of Computer Science, Cornell University, December 1986. Revised June 1987.

[106] B. Schneier. Applied Cryptography. Wiley, 1996.

[107] D. Scott and A. Yasinsac. Dynamic probabilistic retransmission in ad hoc networks. In Proc. of the Int. Conference on Wireless Networks (ICWN), pages 158–164, Las Vegas, Nevada, June 2004.

[108] K. Singh, A. Nedos, G. Gaertner, and S. Clarke. Message Stability and Reliable Broadcasts in Mobile Ad-Hoc Networks. In Proc. of the 4th ADHOC-NOW, pages 297–310, October 2005.

[109] I. Stojmenovic, M. Seddigh, and J. Zunic. Dominating Sets and Neighbor Elimination Based Broadcasting Algorithms in Wireless Networks. IEEE Transactions on Parallel and Distributed Systems, 13(1):14–25, January 2002.

[110] A. Tanenbaum. Computer Networks (4th edition). Prentice Hall PTR, 2003.

[111] A. S. Tanenbaum. Computer Networks. Prentice Hall, 1996. 3rd Ed.

[112] C. Toh. Ad Hoc Mobile Wireless Networks. Prentice Hall, 2002.

[113] S. Toueg. Randomized Byzantine Agreement. In Proc. 3rd ACM Symposium on Principles of Distributed Computing, pages 163–178, 1984.

[114] Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, and J.-P. Sheu. The broadcast storm problem in a mobile ad hoc network. Wireless Networks, 8(2/3):153–167, 2002.

[115] Y.-C. Tseng, S.-Y. Ni, and E.-Y. Shih. Adaptive approaches to relieving broadcast storms in a wireless multihop mobile ad hoc network. In Proc. of the 21st International Conference on Distributed Computing Systems (ICDCS), pages 481–488, 2001.

[116] Cornell University. JiST/SWANS Java in Simulation Time / Scalable Wireless Ad Hoc Network Simulator.

[117] R. van Renesse, K. Birman, and S. Maffeis. Horus: A Flexible Group Communication System. Communications of the ACM, 39(4):76–83, April 1996.

[118] R. van Renesse, Y. Minsky, and M. Hayden. A Gossip Style Failure Detection Service. In IFIP Intl. Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware ’98), pages 55–70, April 1998.

[119] P. Veríssimo, N. Neves, and M. Correia. The Middleware Architecture of MAFTIA: A Blueprint. In Proc. of the IEEE Third Information Survivability Workshop, October 2000.

[120] E. Vollset and P. Ezhilchelvan. Enabling reliable many-to-many communication in ad-hoc pervasive environments. In Proc. of the 2nd Int. Workshop on Mobile Peer-to-Peer Computing (MP2P), 2005.

[121] C. Wu and Y. Tay. AMRIS: A Multicast Protocol for Ad-Hoc Wireless Networks. In Proc. of the IEEE MILCOM, Nov. 1999.

[122] J. Wu, M. Gao, and I. Stojmenovic. On Calculating Power-Aware Connected Dominating Sets for Efficient Routing in Ad Hoc Wireless Networks. In Proc. of the 30th International Conference on Parallel Processing (ICPP), pages 346–353, 2001.

[123] J. Wu and H. Li. On Calculating Connected Dominating Sets for Efficient Routing in Ad Hoc Wireless Networks. In Proc. of the 3rd DialM, pages 7–14, 1999.

[124] S. Yi, P. Naldurg, and R. Kravets. Security Aware Ad Hoc Routing for Wireless Networks. In Proc. ACM Symposium on Mobile Ad Hoc Networking and Computing, October 2001.

[125] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin. Separating Agreement from Execution for Byzantine Fault Tolerant Services. In Proc. of the 19th ACM Symposium on Operating Systems Principles, pages 253–267, 2003.

[126] M. Zapata and N. Asokan. Secure Ad Hoc Routing Protocol. In Proc. ACM Workshop on Wireless Security, 2002.

[127] Q. Zhang and D. P. Agrawal. Dynamic probabilistic broadcasting in MANETs. Journal of Parallel and Distributed Computing, 65(2):220–233, 2005.




mixei` zniyx

26 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . zelitp i`lb ly wynn 3.1

33 . . . . . . . . . . . . . m drced lawi `l edylk znevy zexazqd lr oeilr mqg 4.1

35 . . nk,... ,n1 ,p :ely xeciyd geeha miznvd lk lv` lawzi s znev ici lr xeciy 4.2

39 . . . . . . . . . . . . . . . . . . . . . . . . . (p znev ici lr rvean) iqiqa RAPID 4.3

zipaln dqtewa ze`vnp 4.3 xei`l ziqgi ezpey xy` zexey) agxen RAPID 4.4

41 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (etqep 27--18 zexeye

4.4 xei`l ziqgi ezpey xy` zexey) zeipecf zelitp ipta cinr xy` RAPID 4.5

44 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (zipaln dqtewa ze`vnp

zelzk ,miciip miznvd xy`k miznvd lk lv` elawzdy zerced jqn feg` 4.6

51 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . mipey β ikxra

51 . . . . . . . . . . . mipey β ikxra zelzk ,miciip miznvd xy`k zyxd lr qner 4.7

zelzk ,miciip miznvd lk xy`k miznvdn X% l drced xiardl xefg` onf 4.8

52 . . . . . . . . . . . . . (zeycg zerced migley miznv 100 xy`k) mipey β ikxra

zelzk ,miciip miznvd lk xy`k miznvdn 98% l drced xiardl xefg` onf 4.9

52 . . . . . . . . . . . . . (zeycg zerced migley miznv 100 xy`k) mipey β ikxra

zelzk ,miciip miznvd xy`k miznvd lk lv` elawzdy zerced jqn feg` 4.10

52 . . . . . . . . . . . . . . . . . . . . . . . zeycg zerced mixcynd miznvd xtqna

zerced mixcynd miznvd xtqna zelzk ,miciip miznvd xy`k zyxd lr qner 4.11

52 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . zeycg

100 xy`k) miciip miznvd lk xy`k miznvdn X% l drced xiardl xefg` onf 4.12

53 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (zeycg zerced migley miznv

d



53 . . . . . . (zeycg zerced migley miznv 100 xy`k) miikep`d miznvd xtqna

mixcynd miznvd xtqna zelzk miznvd lk lv` elawzdy zerced jqn feg` 4.14

53 . . . . . . . (migiipe miciip miznvd xy`k milewehext z`eeyd) zeycg zerced

-xt z`eeyd) zeycg zerced mixcynd miznvd xtqna zelzk zyxd lr qner 4.15

53 . . . . . . . . . . . . . . . . . . . . . . . . (migiipe miciip miznvd xy`k milewehe

zelzk ,migiip miznvd xy`k miznvd lk lv` elawzdy zerced jqn feg` 4.16

54 . . . . . . . . . . . (zeycg zerced migley miznv 100 xy`k) miznvd zetitva

100 xy`k) miznvd zetitva zelzk ,migiip miznvd xy`k zyxd lr qner 4.17

54 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (zeycg zerced migley miznv

60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . znev ly dxehwhikx` 5.1

63 . . . . . . . . . . . . . . . . . . zeipecf zelitp ipta cinr xy` dvtd ly mzixebl` 5.2

64 . . . . . . . . . . . . jynd -- zeipecf zelitp ipta cinr xy` dvtd ly mzixebl` 5.3

71 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ipecf zxeywz cly 5.4

75 . . . . . . . migiip miznvd xy`k miznvd lk lv` elawzdy zerced jqn feg` 5.5

75 . . . . . . . . . . . . . . . . . . . . . . . . . . migiip miznvd xy`k zyxd lr qner 5.6

200 xy`k) migiip miznvd lk xy`k miznvdn X% l drced xiardl xefg` onf 5.7

76 . . . . . . . . . . . . . . . . . . . . . . . . (diipy lk zeycg zerced migley miznv

200 xy`k) miciip miznvd lk xy`k miznvdn X% l drced xiardl xefg` onf 5.8

76 . . . . . . . . . . . . . . . . . . . . . . . . (diipy lk zeycg zerced migley miznv

77 . . . . . . . miciip miznvd xy`k miznvd lk lv` elawzdy zerced jqn feg` 5.9

77 . . . . . . . . . . . . . . . . . . . . . . . . . . miciip miznvd xy`k zyxd lr qner 5.10


78 . . . . . . . . . . . . . . . . . . . . . (miznv 200 jezn) miipecfd miznvd xtqna


78 . . . . . . . . . . . . . . . . . . . . . (miznv 200 jezn) miipecfd miznvd xtqna

e


jezn) miipecfd miznvd xtqna zelzk ,migiip miznvd xy`k zyxd lr qner 5.13

79 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (miznv 200

jezn) miipecfd miznvd xtqna zelzk ,miciip miznvd xy`k zyxd lr qner 5.14

79 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (miznv 200

zelzk ,migiip miznvd lk xy`k miznvdn X% l drced xiardl xefg` onf 5.15

80 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . miipecfd miznvd xtqna


80 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . miipecfd miznvd xtqna

6.1 Architecture of a node

6.2 Headers and data of messages inside the Ensemble layers (figure taken from the Ensemble guide)

6.3 Pseudo-code of the membership protocol

6.4 Pseudo-code of the suspicion protocol

6.5 Pseudo-code of the merge protocol

6.6 Pseudo-code of the flush protocol

6.7 Main variables maintained by each process p_i

6.8 Byzantine agreement protocol for vectors of values, using ◇P_mute and executed by process p_i (n > 6f)

6.9 Uniform broadcast protocol executed by node p_i

6.10 System throughput, in number of messages delivered to the nodes (the line representing public-key encryption is barely visible, since it is close to 0 compared with the other lines)

6.11 Latency (the line representing public-key encryption was omitted, since latency with public-key encryption is orders of magnitude larger than the latencies measured without it)

6.12 System throughput: the cost of total ordering of messages and uniform broadcast of messages, with and without symmetric-key encryption

6.13 Time until the installation of a new view

A.1 Architecture of WiPeer

List of Tables

4.1 Percentage of all messages received by all nodes, and network load, as a function of the number of selfish nodes (when 100 nodes send new messages)

6.1 Recovery time from problematic scenarios

Extended Abstract

In recent years, wireless networks have been studied intensively owing to their potential in both civilian and military applications. An important family of wireless networks is the family of ad-hoc networks. Wireless ad-hoc networks (MANETs) are formed when a collection of devices with wireless communication capability are in proximity to one another. These networks are formed dynamically, without the need for any infrastructure. When some of the devices in such a network are willing to forward messages of other devices, a multi-hop ad-hoc network is formed.

Wireless ad-hoc networks have the potential to become the next major communication medium, one that will revolutionize the way we access, share, and even consume information, much as the Internet did. This is due to the great flexibility of these networks, their self-management, and the fact that they need no physical infrastructure; as a result, using them requires no investment in infrastructure, is free for the user, and offers high transfer rates. In addition, ad-hoc networks are exceptionally survivable. Beyond their everyday uses, they are therefore particularly well suited to disaster areas, to managing the modern battlefield, and to failing and developing countries in which infrastructure either does not exist or is in poor condition.

Mobile ad-hoc networks do not rely on any infrastructure. Routing in these networks is performed by the devices themselves, which serve as the relay stations, and this yields an ad-hoc network with extended range. Moreover, the topology of such a network changes constantly, and relay stations are selected on demand. In other words, routing as a whole is done in an ad-hoc manner. The routing problem is especially acute in ad-hoc networks: unlike infrastructure-based networks, in which relay stations are usually regarded as reliable entities, ad-hoc networks have no relay stations that can be trusted. Messages are forwarded by the devices themselves, so there is a reasonable chance that some devices will be unreliable and will even try to sabotage the operation of the network. The presence of malicious or selfish devices in the system (for example, devices that try to conserve their battery and do not forward messages of other devices) therefore requires protocols that withstand such behavior.

Message dissemination is an important service for many collaborative applications, since it allows any device to easily distribute information to all devices in the network. In particular, message dissemination must be efficient and reliable; in other words, it must ensure that most devices in the network receive every message sent through the dissemination service.

Some collaborative applications, however, require services that are more complex than message dissemination, and can therefore benefit from a group communication system, which offers a very wide range of services for collaborative communication. A group communication system bundles development toolkits, algorithms, and various services that significantly ease communication among group members and, in particular, the development of collaborative applications. Group communication systems typically let application developers add protocols such as reliable delivery, virtual synchrony, total ordering of messages, and so on. These protocols are usually very complex to develop, so using a group communication system greatly shortens application development time and also increases the reliability of the applications built on it. In recent years, group communication systems have become standard building blocks in academic and commercial cluster systems.

Despite the extensive research invested in group communication systems, most of the systems developed assume a simple failure model that usually excludes Byzantine failures. In particular, only some group communication systems support signing and authenticating messages, and the vast majority of systems assume that all group members are correct and will not try to harm the system. This stands in sharp contrast to the Byzantine failure model, in which a process may deviate arbitrarily from its protocol. Such a deviation can be caused by a bug, a hardware problem, or malicious behavior.

One of the main reasons most group communication systems ignored Byzantine failures is that they were used mostly in cluster systems running within a single LAN, where the prevailing assumption is that devices on the same LAN are correct and can be trusted. Moreover, given this assumption, together with the added complexity of handling Byzantine failures and the significant performance penalty this incurs, most systems were developed without taking Byzantine failures into account.

However, the growing number of attacks on computer systems, together with the desire to use group communication systems in additional environments such as ad-hoc networks, has renewed the need for these systems to tolerate Byzantine failures. The reason is that in ad-hoc networks the probability that a device taking part in group communication is Byzantine is no longer negligible. Therefore, if we want our system to keep functioning in an ad-hoc environment as well, it must tolerate Byzantine failures. We also want to ensure that the performance of the system remains reasonable, since otherwise it would not be useful. In particular, we assume that the number of Byzantine failures is usually relatively low, and we therefore believe it is important to focus on the performance of the system when there are no Byzantine failures. Nevertheless, when Byzantine failures do occur, the system must be able to overcome them and continue to do useful work.

The specification of a typical group communication system requires every group member to communicate with the other members every so often. In multi-hop networks, however, a device p can receive messages only from devices whose distance from p is smaller than their transmission range. Consequently, some devices must forward messages from other devices to their own neighbors. Matters become even more complicated if some devices behave in a Byzantine manner and fail to forward some of the messages. Implementing a group communication system in a multi-hop ad-hoc network therefore requires a protocol that enables reliable dissemination of messages to all devices in the network.

The simplest way to disseminate messages in a multi-hop network is the flooding protocol. That is, the device that originates a message sends it to all devices within its transmission range; thereafter, every device that receives the message for the first time forwards it to all devices within its own transmission range. While this form of dissemination is very reliable, it is also very wasteful and can cause a large number of collisions.
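The flooding rule described above can be sketched in a few lines. This is only an illustrative simulation under simplifying assumptions that are not part of the thesis: the network is a static, loss-free neighbor graph, a local broadcast reaches every neighbor, and the names `flood` and `neighbors` and the five-node topology are hypothetical.

```python
from collections import deque

def flood(neighbors, source):
    """Simulate flooding on a static graph (assumed loss-free).

    neighbors: dict mapping each node to the nodes within its transmission range.
    Returns the set of nodes that received the message and the number of
    local broadcasts that were performed.
    """
    received = {source}
    transmissions = 0
    queue = deque([source])           # nodes that still have to rebroadcast
    while queue:
        node = queue.popleft()
        transmissions += 1            # one local broadcast by this node
        for nb in neighbors[node]:
            if nb not in received:    # only the first reception triggers a rebroadcast
                received.add(nb)
                queue.append(nb)
    return received, transmissions

# Hypothetical five-node topology, for illustration only.
neighbors = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
received, transmissions = flood(neighbors, "a")
```

In this model every reachable node transmits exactly once, so the number of broadcasts equals the number of nodes reached. That is the redundancy the text refers to: nodes such as "c" rebroadcast even though all of their neighbors have already been covered.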

Common alternatives to flooding are probabilistic flooding and flooding over an overlay, that is, a sub-network of devices. In the probabilistic approach, a device that receives a message performs a local computation to decide whether or not it should rebroadcast the message. Probabilistic protocols are usually very simple and are also resilient to device failures and to device mobility. However, for probabilistic protocols to be highly reliable, the rebroadcast probability must be fairly high, and as a result the number of redundant messages sent is still very large.
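A minimal sketch of the probabilistic variant follows, assuming the simplest possible local rule: each device rebroadcasts a newly received message with a fixed probability `p`. The function name, the fixed-probability rule, and the ring topology are assumptions made for illustration; they are not taken from the thesis.

```python
import random
from collections import deque

def probabilistic_flood(neighbors, source, p, rng):
    """Simulate probabilistic flooding on a static, loss-free graph.

    Each node that hears the message for the first time rebroadcasts it
    with probability p (an assumed, fixed local rule). The source always
    transmits. Returns (nodes that received the message, broadcast count).
    """
    received = {source}
    transmissions = 0
    queue = deque([source])                 # nodes that decided to transmit
    while queue:
        node = queue.popleft()
        transmissions += 1
        for nb in neighbors[node]:
            if nb not in received:
                received.add(nb)
                if rng.random() < p:        # local coin flip: rebroadcast or stay silent
                    queue.append(nb)
    return received, transmissions

def ring(n):
    """Hypothetical ring topology: node i hears nodes i-1 and i+1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

rng = random.Random(2008)                   # seeded for repeatable experiments
reached, sent = probabilistic_flood(ring(10), 0, 0.7, rng)
```

With `p = 1.0` the sketch degenerates into plain flooding, and with `p = 0.0` only the neighbors of the source are reached. Values in between trade redundant transmissions against the risk that the message dies out before reaching every device, which is the tension the paragraph above describes.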

Other approaches combine probabilistic flooding with local computations based on the distance from the transmitting device, the number of copies of the same message the device has overheard, the location of the device, and so on, in order to decide whether the device should rebroadcast the message. These methods do reduce the number of transmissions, but they suffer from high latency in delivering messages to all devices. Furthermore, they cannot guarantee reliable delivery of messages to all devices across different network topologies, and they also have difficulty coping with malicious attacks by devices.

This doctoral thesis deals with adapting group communication systems to wireless ad-hoc networks in a way that preserves their good performance on the one hand, while making them resilient to Byzantine attacks on the other.

As the basis for our work we used JazzEnsemble, a group communication system adapted to wireless ad-hoc networks. In the course of the work we examined the ways in which hostile devices can harm group communication systems in general, and JazzEnsemble in particular, and we developed protocols that are resilient to these attacks. In addition, in order to extend the communication range between devices in ad-hoc networks, we developed two protocols that enable reliable dissemination of messages to all devices in wireless ad-hoc networks. The first protocol is called BDP. To deliver messages to all devices in the network, it builds and maintains an overlay made up of the devices themselves. The devices that are members of this overlay are the ones responsible for disseminating all messages to all devices. The advantage of BDP is that the number of devices participating in the overlay is small relative to the number of devices in the original network, so the number of messages sent in order to deliver all messages to all devices is small compared with the alternatives. The second protocol we developed is called RAPID. RAPID is a probabilistic protocol that guarantees reliable dissemination of messages in multi-hop ad-hoc networks. To guarantee reliable delivery to all devices, RAPID runs, in addition to the probabilistic dissemination, a deterministic process that checks that all devices have received all messages; if some devices are missing some of the messages, the deterministic protocol makes sure that the missing messages are supplied by devices that hold them. This protocol is better suited than BDP to mobile wireless networks because, unlike in BDP, dissemination in RAPID is probabilistic and relies neither on any infrastructure nor on the locations of the devices.
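The idea of backing a probabilistic phase with a deterministic recovery step can be illustrated generically. The sketch below is not RAPID's actual algorithm: it assumes a static, loss-free neighbor graph and correct (non-Byzantine) devices, and it models recovery as synchronous rounds in which every device compares message identifiers with its neighbors and pulls whatever it is missing. The names `recover` and `have` are hypothetical.

```python
def recover(neighbors, have):
    """Generic deterministic recovery by neighbor-to-neighbor comparison.

    neighbors: dict mapping each node to its neighbors.
    have: dict mapping each node to the set of message ids it holds;
          it is updated in place.
    In every round each node pulls the messages that its neighbors held
    at the start of the round. Returns the number of rounds in which at
    least one node learned something new.
    """
    rounds = 0
    changed = True
    while changed:
        changed = False
        # Snapshot models a synchronous round: a message moves one hop per round.
        snapshot = {node: set(ids) for node, ids in have.items()}
        for node, nbs in neighbors.items():
            for nb in nbs:
                missing = snapshot[nb] - have[node]
                if missing:
                    have[node] |= missing
                    changed = True
        if changed:
            rounds += 1
    return rounds

# Hypothetical line topology 0 - 1 - 2 - 3; only node 0 holds message "m1",
# as if a probabilistic phase had died out right at the source.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
have = {0: {"m1"}, 1: set(), 2: set(), 3: set()}
rounds = recover(line, have)
```

On a connected graph this loop terminates with every device holding every message, at the cost of extra rounds proportional to the distance from the nearest holder. A protocol for the setting of this thesis would additionally have to cope with mobility, message loss, and devices that lie about what they hold.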

In addition, as part of our research on wireless ad-hoc networks, we developed an application called WiPeer, which allows devices that are in each other's vicinity to form a new wireless network and communicate with one another without depending on any infrastructure.
