Distributed deterministic 1–2 skip list for peer-to-peer system

Peer-to-Peer Netw. Appl. DOI 10.1007/s12083-013-0222-6


Subhrangsu Mandal · Sandip Chakraborty · Sushanta Karmakar

Received: 1 November 2012 / Accepted: 30 June 2013 © Springer Science+Business Media New York 2013

Abstract Data management in peer-to-peer systems is a challenging task due to the random distribution of data among several participating peers. Efficient data structures like distributed hash tables (DHT) and their variants are designed and implemented to reduce the complexity of data management in such environments. However, DHT has limitations in supporting range queries, and its variants like distributed segment trees often perform poorly when the number of peers is high. Further, distributed lists and distributed balanced trees require a significant amount of time to stabilize after a new peer joins or a peer leaves. In this paper, a new distributed data structure called the deterministic 1–2 skip list is introduced as an alternative solution for data management in peer-to-peer systems. A deterministic skip list can be viewed as an alternative to a balanced tree in which the semantic locality of each key is preserved. Thus it can support range queries as well as single-shot queries. This paper proposes three main operations on this data structure: searching data based on keys, insertion when a new peer joins, and deletion when a peer leaves. The correctness of the proposed operations is analyzed using theoretical arguments and mathematical proofs. The proposed scheme is simulated using the NS-2.34 network simulator, and its efficiency has been compared with DHT, DST, distributed list and distributed tree based data management.

A preliminary version of this work appears in the proceedings of the Second IEEE International Conference on Parallel, Distributed and Grid Computing, 2012 [26].

S. Mandal · S. Chakraborty (✉) · S. Karmakar
Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039 India
e-mail: [email protected]

S. Mandal
e-mail: [email protected]

S. Karmakar
e-mail: [email protected]

Keywords Deterministic skip list · 1–2 skip list · Data structure · Range queries

1 Introduction

Data management in peer-to-peer systems is a challenging task due to the random distribution of data among several peers. Efficient data structures are designed and implemented to reduce the complexity of data searching in such an environment. One of the most widely used data structures is the distributed hash table (DHT), which is implemented for structured peer-to-peer systems [11], overlay networks [20, 24] and content distribution systems [1]. DHTs are used to look up data in a decentralized environment based on (key, value) pairs, similar to hash tables. The advantage of a DHT is its scalable and fault-tolerant architecture, which makes it useful for several distributed search applications. However, a DHT relies on hash-based lookup, which destroys the semantic locality of the keys. In a DHT, all keys are hashed using a hash function that maps each key from the key domain to the hash value domain. Because the hash function is non-linear, it is difficult to preserve the semantic locality of the keys in the hash value domain. Therefore, a DHT performs inefficiently for range queries and is efficient only for single-shot queries. Hence applications that require range queries cannot be implemented efficiently using a DHT. Range queries often involve high-dimensional and multi-attribute ranges [3, 18, 35], which require the DHT to flood the network with many search packets for every individual query. In [43], Zheng et al. have introduced distributed segment trees (DST) to support range queries over DHT. DST is a balanced tree structure in which every intermediate node stores the interval of the sub-tree rooted at that node. To deal with skewed data distribution, DST imposes a threshold δ to limit the number of keys at each node. If a node already has δ keys, any new interval is relayed to its child nodes. DST becomes inefficient when leaf nodes get saturated, i.e. when they already hold δ keys. Thus scalability is a problem for DST, and it performs poorly for a large number of nodes.

A solution to the above problem requires an efficient list or tree-like data structure that preserves the semantic locality of the keys and supports range queries. A set of works has been proposed in the literature [10, 12, 16, 23, 42] to support range queries effectively over peer-to-peer systems. However, distributed list based [10, 42] or tree based [12, 16, 23] data structures require a significant amount of time to stabilize when a new peer joins or a peer leaves [7, 30]. A skip list is a data structure that has the properties of both a list and a balanced tree, and thus makes searching efficient for both single-shot queries and range queries.

Skip list [31] was initially introduced as a centralized data structure that is a probabilistic alternative to balanced trees. The idea behind its design is quite simple. To understand skip lists, we can start from a sorted linked list. To search for an element in a sorted linked list, we might have to traverse all the nodes of the list, so the search complexity is $O(n)$, where n is the total number of elements. Suppose we have two sorted linked lists, where the second list is made of a subset of the elements of the first list, and each node in the second list has a pointer to the node with the equal key value in the first list. The nodes in the second list are determined probabilistically. To search for an element, we walk right along the second list until the next node has a greater key value. We then walk down to the first list and walk right along it until we find the key value or a greater key value. In this case the search cost is $2\sqrt{n}$. If we maintain three sorted lists, the search cost becomes $3\sqrt[3]{n}$, and if we maintain $\log n$ lists, the search cost becomes $2\log n$. These $\log n$ sorted lists behave like a balanced binary tree. Figure 1 shows a skip list. Suppose we need to search for the data item 10 in this skip list. From the node with value 6, we can jump directly to the node with value 10, skipping the node with value 8. With a geometric or negative binomial distribution of references to the next nodes, the search, insert and delete operations in a skip list run in $O(\log n)$ time in the average case, where n is the total number of nodes [31].

Fig. 1 A skip list (keys 2, 4, 6, 8, 10, 12, terminated by NULL)
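The cost figures above follow from a simple balancing argument. As a sketch, assume (a simplification introduced here, not stated in the paper) that each of the k lists keeps about every $n^{1/k}$-th element of the list below it, so a search walks past at most about $n^{1/k}$ nodes per list; $C(k)$ is a symbol introduced here for the resulting search cost:

```latex
\begin{aligned}
C(k) &\approx k\,n^{1/k}, \qquad C(2) = 2\sqrt{n}, \qquad C(3) = 3\sqrt[3]{n},\\
k = \log_2 n \;&\Longrightarrow\; n^{1/k} = 2^{(\log_2 n)/\log_2 n} = 2
\;\Longrightarrow\; C(\log_2 n) \approx 2\log_2 n .
\end{aligned}
```

With $\log_2 n$ lists, each list contributes only about two steps, which is why this arrangement behaves like a balanced binary tree.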

Being probabilistic in nature, skip lists have some drawbacks. The nodes that appear in the next level are determined probabilistically, so the shape of a skip list is not deterministic, and it is not possible to give an upper bound on the worst-case insertion and maintenance cost. To overcome this difficulty, a few deterministic versions of skip lists have been proposed in the literature. In [27], the authors have defined a simple version of deterministic skip list called the 1–2 skip list. A 1–2 skip list is a deterministic skip list in which there are either one or two nodes of height $h-1$ between any two nodes of height h or more. Here the height of a node denotes the maximum level of the lists to which the node belongs. This deterministic 1–2 skip list has a one-to-one correspondence with the 2–3 tree. The idea is further extended to the 1–2–3 skip list, in which there are one, two or three nodes of height $h-1$ between any two nodes of height h. Since the deterministic 1–2 skip list behaves like a balanced 2–3 tree, we are primarily interested in this type of deterministic skip list. Figure 2 shows a deterministic 1–2 skip list.
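To make the 1–2 property concrete, here is a small centralized checker. It is a sketch, not part of the paper: it assumes node heights are given left to right (excluding HEADER and NULL, which are treated as infinitely tall sentinels), and for every h it tests that each gap between consecutive nodes of height h or more holds one or two nodes of height $h-1$. The function name `is_one_two_skip_list` is hypothetical.

```python
from math import inf


def is_one_two_skip_list(heights: list[int]) -> bool:
    """Check the deterministic 1-2 property for node heights given left to right.

    HEADER and NULL are modelled as sentinels of infinite height, so they bound
    every gap. A node that appears only in the bottom list has height 1.
    """
    if not heights:
        return True
    full = [inf] + list(heights) + [inf]
    top = max(heights)
    for h in range(2, top + 2):
        # Positions of nodes of height h or more (always includes both sentinels).
        tall = [i for i, x in enumerate(full) if x >= h]
        for left, right in zip(tall, tall[1:]):
            # Count the nodes of height exactly h - 1 strictly inside this gap.
            gap = sum(1 for x in full[left + 1:right] if x == h - 1)
            if gap not in (1, 2):
                return False
    return True
```

For example, heights `[1, 2, 1]` pass, while `[1, 1, 1]` fail because three height-1 nodes share one gap, and `[1, 2]` fails because the height-2 node has no height-1 node between it and NULL.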

In this paper, we consider the deterministic 1–2 skip list as an alternative data structure for efficient searching in peer-to-peer systems, as it resembles a balanced 2–3 tree and thus has the properties of both a list and a tree. Because of its deterministic link structure, this data structure can provide robust and efficient data management in a distributed way. The algorithms for search, insertion and deletion in a distributed deterministic 1–2 skip list are provided along with their formal specifications and analysis. The proposed set of algorithms differs from the self-stabilizing Tiara [8] and Corona [28] architectures in that it does not require periodic message broadcasts. Whenever a new node needs to be inserted into an existing skip list, or a node is deleted from the skip list, the proposed algorithms stabilize the network on-the-fly. In summary, this paper makes the following contributions:

– It describes a set of algorithms for operating a deterministic 1–2 skip list in a distributed environment for efficient searching. The insertion and deletion algorithms provide effective maintenance of this data structure. The correctness and completeness of the algorithms are also analyzed.

Fig. 2 A deterministic 1–2 skip list


– The search algorithm requires $O(\log n)$ messages, and the insertion and deletion algorithms require $O(\log n)$ messages for the data structure to stabilize. These complexities are similar to those of the centralized algorithms provided in [27].

– The proposed algorithms are also simulated using the network simulator NS-2.34 [29]. The delay for searching a data item and the stabilization delay of the insertion and deletion operations are compared with other data structures used in peer-to-peer systems, such as DHT, DST, distributed lists and distributed balanced trees.

The rest of the paper is organized as follows. Section 2 gives a brief overview of the related work on distributed skip lists. Section 3 provides a general system model for the design of the proposed distributed deterministic 1–2 skip list. Sections 4, 5 and 6 provide the details of the search, insertion, and deletion procedures in a distributed deterministic 1–2 skip list, along with their theoretical analysis. Section 7 reports the results obtained from simulation using the NS-2.34 network simulator framework. Section 8 briefly justifies why the proposed data structure can be used effectively as an alternative to the DHT. Section 9 provides a brief discussion of the concurrency, fault tolerance, and optimization of the proposed scheme. Finally, Section 10 concludes the paper.

2 Related works

There are several variants of skip lists proposed in the literature. Hanson et al. proposed a data structure called the “Interval skip list” (IS-List) [19] to support interval indexing. The IS-List allows stabbing queries as well as dynamic insertion and deletion of intervals. Goodrich et al. have used skip lists for efficient and practical techniques for dynamically maintaining an authenticated dictionary [17]. Skip lists are also used for graphics acceleration in the visualization of complex environments [13], to speed up inverted index lookup for a web-scale search engine [5], and for processing forecasting queries [15].

All the variants of skip lists stated above are designed for centralized environments only, where all the data elements and links are available to a single process. Harvey et al. have proposed a scalable overlay network called SkipNet [20]. SkipNet preserves semantic locality and guarantees routing locality by organizing data using string names. Thus it overcomes a major shortcoming of other overlay networks like Chord [36] and CAN [33]. They have used the skip list architecture as the basis of the overlay, with the string name identifiers as the keys of the skip list. As maintaining a perfect SkipNet is difficult in the presence of insertions and deletions, they have used the randomized version of the skip list. Because of the random nature of the data structure, it is difficult to determine the shape of the skip list, which might create problems with a large number of nodes.

A distributed variant of the skip list, called the skip graph, is introduced in [2] to provide DHT-like functionality. This data structure is based on the randomized version of the skip list and is designed for distributed environments. Skip graphs use several lists at each level, and each node belongs to one of these lists at every level. This modification has been made to increase redundancy and to incorporate fault tolerance and load balancing. As a skip graph combines multiple skip lists, scalability becomes an issue for a large number of nodes. Liu et al. have developed a peer-to-peer asynchronous video streaming system using a skip list [39]. They have used a randomized, distributed skip list to overcome the challenge of on-demand streaming with asynchronous requests and, in general, with VCR-like interactions in a video streaming system. They have found that the skip list based overlay is highly scalable, with smooth playback for diverse interactivities and low overheads.

There are some preliminary works in the literature that introduce skip lists as an alternative data structure for efficient searching in distributed peer-to-peer and overlay networks. Clouser et al. have proposed a peer-to-peer system called Tiara [8], in which they have used a self-stabilizing sparse 0–1 skip list on top of a self-stabilizing sorted list. They have first proposed a self-stabilizing algorithm for a sorted list and then extended it to sparse 0–1 skip lists. The proposed stabilization scheme is based on the shared register communication model. Recently, Scheideler et al. have proposed a self-stabilizing deterministic message-passing skip list called Corona [28]. In this scheme, all the processes forward their status at every level to their neighbors by passing messages periodically. Every node periodically broadcasts messages to check for any violation of the conditions of the deterministic 1–2 skip list. This scheme provides a basic approach to self-stabilizing a deterministic 1–2 skip list in a distributed environment. The main drawback of this scheme is that all processes broadcast messages periodically, which results in an unnecessary flow of messages even when there is no inconsistency.

The set of algorithms proposed in this paper overcomes the shortcomings of the existing architectures proposed in [8] and [28]. The Tiara architecture [8] works on the sparse 0–1 skip list and cannot be extended to 1–2 skip lists. A deterministic 1–2 skip list works better for searching in peer-to-peer and overlay networks, as it resembles a balanced 2–3 tree and thus supports range queries effectively. The Corona architecture [28] provides a self-stabilization algorithm for the deterministic 1–2 skip list by means of periodic broadcasting to check for violations of the skip list properties. The system proposed in this paper stabilizes on-the-fly after an insertion or a deletion operation is executed, and therefore does not require periodic broadcasting.

3 System model

In a distributed environment, a skip list can be considered as a collection of nodes that hold data elements and can be in different lists of different levels. We represent each of these nodes as an individual process. It is assumed that processes communicate only by message passing, and that channels are asynchronous, FIFO and reliable. Each process holds a key, and the nodes are ordered by their keys. The keys are uniform shorthand representations of the data items. Let $k_1$ and $k_2$ be the keys for data items $d_1$ and $d_2$. Then $k_1 = k_2$ if and only if $d_1 = d_2$; otherwise the keys are different. The keys can be constructed from the data items using any well-known hashing algorithm, such as the RC5 checksum [34]. Note that the keys used in a 1–2 skip list are different from those used in a DHT. In a DHT, keys denote the location where the data items need to be stored. In the current context, keys are used as a shorthand representation of large data items. For example, if the data items are large files, the key value is the RC5 checksum of the file contents, which is short and can be used efficiently for pattern matching during the search operation.
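As a hedged illustration of deriving a short key from a large data item, the sketch below uses SHA-1 from Python's standard `hashlib` module. This is a swapped-in technique, not the paper's RC5 checksum, and the 64-bit key width and the name `key_for` are hypothetical choices. Because it is a hash, distinct items receive distinct keys only if no collision occurs, which is exactly what the model above assumes.

```python
import hashlib


def key_for(data: bytes, bits: int = 64) -> int:
    """Derive a fixed-size integer key from a data item's contents.

    Stand-in for the paper's checksum: the SHA-1 digest (160 bits) truncated
    to its top `bits` bits. Equal contents always give equal keys; distinct
    contents give distinct keys only in the absence of collisions.
    """
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)
```

A node storing a file would then insert itself into the skip list under `key_for(file_contents)`.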

The list contains two special nodes, HEADER and NULL, which are the first and the last node of the list respectively. Two dedicated processes function as the HEADER and the NULL. The HEADER contains the key $-\infty$ and the NULL contains the key $\infty$. The keys are sorted in ascending order in the list. As stated above, nodes communicate only by message passing; for simplicity, it is also assumed that every node can communicate with every other node. All the algorithms are described as the actions to be taken when a particular message is received by a process (or equivalently, a node). The variables and symbols used throughout this paper are summarized in Table 1.

Table 1 State variables of node v

Variable Description

level level of a node

$k(v)$ The key value for the data item stored in node v

$nd_{level}(v)$ List of keys of the right neighbors at every level

$lm(v)$ Maximum level of node v

$L_l(v)$ Left neighbor at level l

$R_l(v)$ Right neighbor at level l

Every node v has two local boolean variables $\Upsilon_1(v)$ and $\Upsilon_2(v)$. Let (i) $L_{lm(v)}(v) = u$ and (ii) $R_{lm(v)}(v) = w$. Then:

Definition 1 $\Upsilon_1(v) = \mathrm{TRUE}$ implies $lm(u) > lm(v)$.

Definition 2 $\Upsilon_2(v) = \mathrm{TRUE}$ implies

1. $\Upsilon_1(u) = \mathrm{TRUE}$,

2. $lm(u) = lm(v)$, and

3. $lm(w) > lm(v)$.

$\Upsilon_1(v) = \mathrm{TRUE}$ denotes that v is the first node immediately after node u, where $lm(u) = lm(v) + 1$. Similarly, $\Upsilon_2(v) = \mathrm{TRUE}$ denotes that v is the first node immediately after node u, where $lm(u) = lm(v)$, and w is the first node immediately after node v, where $lm(w) = lm(v) + 1$. Figure 3 shows the cases $\Upsilon_1(v) = \mathrm{TRUE}$ and $\Upsilon_2(v) = \mathrm{TRUE}$. The following lemma characterizes $\Upsilon_1$ and $\Upsilon_2$.
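In the distributed setting each node learns these flags from its neighbors by message passing. Purely to illustrate Definitions 1 and 2, the centralized sketch below computes both flags from a hypothetical left-to-right list of node heights ($lm$ values), with HEADER and NULL modelled as infinitely tall sentinels. The function names `neighbors` and `upsilon_flags` are hypothetical.

```python
from math import inf


def neighbors(full: list, i: int) -> tuple[int, int]:
    """Positions of L_lm(i) and R_lm(i): the nearest nodes at least as tall as node i."""
    h = full[i]
    left = next(j for j in range(i - 1, -1, -1) if full[j] >= h)
    right = next(j for j in range(i + 1, len(full)) if full[j] >= h)
    return left, right


def upsilon_flags(heights: list[int]) -> list[tuple[bool, bool]]:
    """Compute (Upsilon_1(v), Upsilon_2(v)) for every node per Definitions 1 and 2."""
    full = [inf] + list(heights) + [inf]  # HEADER and NULL as infinitely tall sentinels
    ups1: dict[int, bool] = {}
    flags = []
    # Scan left to right, so Upsilon_1 of the left neighbour u is already known.
    for i in range(1, len(full) - 1):
        u, w = neighbors(full, i)
        ups1[i] = full[u] > full[i]  # Definition 1
        ups2 = (full[u] == full[i]) and ups1.get(u, False) and full[w] > full[i]  # Definition 2
        flags.append((ups1[i], ups2))
    return flags
```

On `[1, 1]` the first node has $\Upsilon_1 = \mathrm{TRUE}$ and the second has $\Upsilon_2 = \mathrm{TRUE}$, matching Fig. 3; on `[1, 1, 1]` both flags of the second and third nodes are FALSE, which exposes the three-node gap that violates the 1–2 property.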

Lemma 1 If $\Upsilon_1(v) = \mathrm{TRUE}$ for a node v, then $\Upsilon_2(v) = \mathrm{FALSE}$, and if $\Upsilon_2(v) = \mathrm{TRUE}$, then $\Upsilon_1(v) = \mathrm{FALSE}$.

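Proof The proof follows directly from the definitions of $\Upsilon_1$ and $\Upsilon_2$. If $\Upsilon_1(v) = \mathrm{TRUE}$, then $lm(u) > lm(v)$, which violates condition (2) in the definition of $\Upsilon_2$. Similarly, if $\Upsilon_2(v) = \mathrm{TRUE}$, then $lm(u) = lm(v)$ by condition (2) in the definition of $\Upsilon_2$, which violates the condition in the definition of $\Upsilon_1$.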

These two variables are maintained to detect violations of the properties of the deterministic 1–2 skip list caused by the insertion or deletion of a node. Let v be a node that violates the properties of the 1–2 skip list after an insertion or a deletion, and let $L_{lm(v)}(v) = u$ and $R_{lm(v)}(v) = w$. Then there are 9 possible conditions (1a)–(1i), given in Table 2, that denote the states of node v. These conditions are derived from Definition 1, Definition 2 and Lemma 1.

Lemma 2 The conditions given in (1a)–(1i) are necessary and sufficient to denote the states of a node whose maximum level violates the properties of the 1–2 skip list.

Proof If node v violates the properties of the 1–2 skip list, then node $u = L_{lm(v)}(v)$ can be in one of three possible states:

(i) $lm(u) > lm(v)$

(ii) $lm(u) = lm(v)$ and $\Upsilon_1(u) = \mathrm{TRUE}$

(iii) $lm(u) = lm(v)$ and $\Upsilon_2(u) = \mathrm{TRUE}$


Fig. 3 (a) $\Upsilon_1(v) = \mathrm{TRUE}$, (b) $\Upsilon_2(v) = \mathrm{TRUE}$

Similarly, node $w = R_{lm(v)}(v)$ can be in one of three possible states:

(i) $lm(w) > lm(v)$

(ii) $lm(w) = lm(v)$ and $\Upsilon_1(w) = \mathrm{TRUE}$

(iii) $lm(w) = lm(v)$ and $\Upsilon_2(w) = \mathrm{TRUE}$

Thus, based on the states of u and w, node v has 9 possible combinations, which are reflected in conditions (1a)–(1i).
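Because each neighbor can be in exactly one of three states relative to v, the nine conditions form a 3 × 3 lookup table. The sketch below is not from the paper; the names `NeighborState`, `CONDITIONS` and `classify` are hypothetical. It encodes each state with the values 1, 2 and 3, chosen to mirror the RESPONSE parameters of Algorithm 3, and returns the matching label from Table 2.

```python
from enum import IntEnum


class NeighborState(IntEnum):
    """State of u or w as seen from v; the values mirror the RESPONSE codes of Algorithm 3."""
    EQUAL_UPS1 = 1  # maximum level equals lm(v) and Upsilon_1 = TRUE
    EQUAL_UPS2 = 2  # maximum level equals lm(v) and Upsilon_2 = TRUE
    HIGHER = 3      # maximum level strictly greater than lm(v)


S = NeighborState

# (state of u, state of w) -> condition label from Table 2
CONDITIONS = {
    (S.EQUAL_UPS1, S.EQUAL_UPS2): "1a",
    (S.EQUAL_UPS1, S.EQUAL_UPS1): "1b",
    (S.EQUAL_UPS2, S.EQUAL_UPS1): "1c",
    (S.EQUAL_UPS2, S.EQUAL_UPS2): "1d",
    (S.EQUAL_UPS2, S.HIGHER): "1e",
    (S.EQUAL_UPS1, S.HIGHER): "1f",
    (S.HIGHER, S.EQUAL_UPS2): "1g",
    (S.HIGHER, S.EQUAL_UPS1): "1h",
    (S.HIGHER, S.HIGHER): "1i",
}


def classify(u_state: int, w_state: int) -> str:
    """Return the single condition (1a)-(1i) that holds for this pair of neighbor states."""
    return CONDITIONS[(NeighborState(u_state), NeighborState(w_state))]
```

Since the table is total over the nine pairs and every label is distinct, exactly one condition is selected for any pair of neighbor states, matching the counting argument in the proof.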

Whenever the skip list becomes unstable due to the insertion of a new node or the deletion of a node, each affected node receives state information from its left neighbor and right neighbor at every level using a single round of message passing, and evaluates conditions (1a)–(1i). Note that only one of the nine conditions can be true. From this information, the node can determine whether it should be upgraded or one of its neighbors should be upgraded. The following theorem bounds the number of levels of the skip list, and hence the number of levels at which such operations may be required to stabilize the skip list after an insertion or a deletion.

Theorem 1 A deterministic 1–2 skip list has at most $\log n + 1$ levels, where n is the total number of nodes.

Proof At level 1 there are n nodes. In the worst case, half of the nodes of level 1 are also present at level 2, so the number of nodes at level 2 is at most $n/2$. Similarly, the maximum number of nodes at level 3 is $n/2^2$, and at level l the maximum number of nodes is $n/2^{l-1}$.

At the maximum level there should be at least one node. From this we can derive that, for the maximum value of l,

$$\frac{n}{2^{l-1}} = 1 \tag{2}$$

Solving Eq. (2) gives $l = \log n + 1$. Hence the maximum number of levels of a deterministic 1–2 skip list is $\log n + 1$.

4 Search procedure in a distributed deterministic 1–2 skip list

The search procedure starts from the HEADER of the skip list. If the search is successful, it returns the process id of the node that contains the key value. If the search is unsuccessful, the procedure returns a failure message to the HEADER.

4.1 The detailed search procedure

The whole procedure is initiated by forwarding a “SEARCH” message to the HEADER. The search procedure is described in Algorithm 1 and is explained using the example shown in Fig. 4. Suppose we have to find the node with the key value 10. The search starts at the HEADER at level 3 and continues by sending messages to the node with the key value 7, followed by the node with the key value 10.

Note that the search operation can be extended to support range queries in this data structure. As the list is in sorted order, a range query can be implemented using two search operations: one with the minimum key and another with the maximum key. The nodes between the node with the minimum key and the node with the maximum key contain the resulting data items of the range query.

Table 2 Conditions for 1–2 skip list maintenance

$((lm(u) = lm(v)) \wedge (\Upsilon_1(u) = \mathrm{TRUE})) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_2(w) = \mathrm{TRUE}))$ (1a)
$((lm(u) = lm(v)) \wedge (\Upsilon_1(u) = \mathrm{TRUE})) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_1(w) = \mathrm{TRUE}))$ (1b)
$((lm(u) = lm(v)) \wedge (\Upsilon_2(u) = \mathrm{TRUE})) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_1(w) = \mathrm{TRUE}))$ (1c)
$((lm(u) = lm(v)) \wedge (\Upsilon_2(u) = \mathrm{TRUE})) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_2(w) = \mathrm{TRUE}))$ (1d)
$((lm(u) = lm(v)) \wedge (\Upsilon_2(u) = \mathrm{TRUE})) \wedge (lm(w) > lm(v))$ (1e)
$((lm(u) = lm(v)) \wedge (\Upsilon_1(u) = \mathrm{TRUE})) \wedge (lm(w) > lm(v))$ (1f)
$(lm(u) > lm(v)) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_2(w) = \mathrm{TRUE}))$ (1g)
$(lm(u) > lm(v)) \wedge ((lm(w) = lm(v)) \wedge (\Upsilon_1(w) = \mathrm{TRUE}))$ (1h)
$(lm(u) > lm(v)) \wedge (lm(w) > lm(v))$ (1i)

Fig. 4 Searching key 10 on a deterministic distributed 1–2 skip list (the message (SEARCH, 10, 3) is forwarded from the HEADER via the node with key 7 to the node with key 10, which returns addr(10))

Algorithm 1 Node z received message “SEARCH” from node y
Input κ (key for the data item to be searched), l (current search level)
Output Return the identifier of the node that contains the data element with key κ, otherwise return an error
1: if z = HEADER then
2:   level := lm(z)
3: else
4:   level := l
5: end if
6: if k(z) = κ then
7:   return z
8: else if k(z) < κ then
9:   while level ≠ 0 do
10:    if $nd_{level}(z)$ ≤ κ then
11:      send(“SEARCH”, κ, level) to $R_{level}(z)$
12:      break
13:    else
14:      level := level − 1
15:    end if
16:  end while
17: end if
18: if level = 0 then
19:   send an “ERROR” message to HEADER
20: end if
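To illustrate Algorithm 1 and the range query, the sketch below simulates them on a single machine: each move of `pos` stands for one forwarded “SEARCH” message, and the final reply to the HEADER is counted as one more message. It assumes a hypothetical input format (sorted `keys` plus per-node `heights`, i.e. $lm$ values) in which HEADER and NULL span every level, and all function names are hypothetical. For the range query, instead of a second search for the maximum key, this sketch simply walks level-1 right neighbors from the first key not smaller than `lo` until it passes `hi`.

```python
from math import inf


def make_list(keys, heights):
    """Keys and heights with HEADER (-inf) and NULL (+inf) spanning every level."""
    top = max(heights, default=1)
    return [-inf] + list(keys) + [inf], [top] + list(heights) + [top]


def right(full_h, i, level):
    """R_level of node i: the nearest node to the right whose height reaches `level`."""
    return next(j for j in range(i + 1, len(full_h)) if full_h[j] >= level)


def descend(full_k, full_h, kappa):
    """Follow Algorithm 1 from HEADER; return (last position reached, SEARCH messages forwarded)."""
    pos, level, forwarded = 0, full_h[0], 0
    while full_k[pos] != kappa:
        while level != 0 and full_k[right(full_h, pos, level)] > kappa:
            level -= 1                      # line 14: nd_level(z) > kappa, go down one level
        if level == 0:
            break                           # lines 18-19: the search fails at this node
        pos = right(full_h, pos, level)     # line 11: forward SEARCH at the same level
        forwarded += 1
    return pos, forwarded


def search(keys, heights, kappa):
    """Return (index of kappa in keys or None, messages including the reply to HEADER)."""
    full_k, full_h = make_list(keys, heights)
    pos, forwarded = descend(full_k, full_h, kappa)
    found = pos - 1 if full_k[pos] == kappa else None
    return found, forwarded + 1


def range_query(keys, heights, lo, hi):
    """Return all keys in [lo, hi] by descending to lo, then walking level-1 neighbors."""
    full_k, full_h = make_list(keys, heights)
    pos, _ = descend(full_k, full_h, lo)
    if full_k[pos] < lo:                    # lo itself is absent: start at its successor
        pos += 1
    result = []
    while full_k[pos] <= hi:                # NULL (+inf) always stops the walk
        result.append(full_k[pos])
        pos += 1
    return result
```

With keys `[2, 4, 6, 8, 10, 12]` and heights `[1, 2, 1, 2, 1, 1]` (a valid 1–2 skip list), searching for 10 visits 4 and 8 at level 2 and then 10 at level 1, so `search` returns `(4, 4)`: index 4 in `keys`, three forwards plus the reply.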

The following lemma and theorem show the correctness and completeness of the search procedure.

Lemma 3 Suppose node $N_i$ receives a SEARCH message at level $l_i$ and forwards the same message to node $N_j$ at level $l_j$. Then $l_j \le l_i$.

Proof According to Algorithm 1, when a node $N_i$ receives a SEARCH message at level $l_i$, it performs some local computation. According to line 8 of Algorithm 1, if the stored key is less than the searched key, the search procedure continues. If the stored key does not match the search key, the search key is compared with the stored key of the next node, according to line 10. If that key is less than or equal to the searched key, the message is forwarded to the next node at the same level. Otherwise the level is decremented, according to line 14, until it reaches zero. Therefore, whenever the search message is forwarded, the level value either remains the same or is decremented, until it reaches zero. This implies $l_j \le l_i$.

Theorem 2 The search algorithm eventually terminates, either with a search failure or with the address of the node that contains the searched data.

Proof Let $l_{N_i}$ and $d_{N_i}$ denote the level and the key at node $N_i$ to which the “SEARCH” message is forwarded. From Lemma 3,

$$l_{HEADER} \ge l_{N_1} \ge l_{N_2} \ge \dots \ge l_{NULL} \tag{3}$$

As the list is sorted in ascending order,

$$d_{HEADER} \le d_{N_1} \le d_{N_2} \le \dots \le d_{NULL} \tag{4}$$

Now $d_{HEADER} = -\infty$ and $d_{NULL} = \infty$, so $d_{HEADER} < \kappa < d_{NULL}$ for the searched key κ. Thus, from Eqs. (3) and (4), the “SEARCH” message moves rightwards through a finite list without ever passing κ, while its level value never increases. Hence the level eventually becomes zero, and an unsuccessful search terminates at this point. If the search is successful, it terminates immediately, returning the result.

4.2 Complexity analysis

Theorem 3 The search procedure has a message complexity of $O(\log n)$.

Proof In the proposed distributed search algorithm, every step requires one search message to be forwarded to the next process. At the topmost level of the skip list there can be at most two nodes, so at most 2 messages are forwarded at this level. If the data is not found, the level is decreased by one. From the definition of the deterministic 1–2 skip list, at this level too there are at most two nodes between any two nodes of height greater than the current level, so again the search moves down one level after forwarding at most 2 messages. From Theorem 1, there are at most $\log n + 1$ levels in the skip list, so at most $2\log n + 2$ messages are needed for searching. Finally, one message is required to send the result to the HEADER. Therefore, the message complexity is asymptotically bounded by $O(\log n)$.
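As a worked instance of this bound, write $M(n)$ (a symbol introduced here) for the number of messages in one search, taking logarithms to base 2 as in Theorem 1. For a network of $n = 1024$ nodes:

```latex
M(n) \le \underbrace{2(\log_2 n + 1)}_{\text{SEARCH forwards}}
       + \underbrace{1}_{\text{reply to HEADER}}
     = 2\log_2 n + 3,
\qquad
M(1024) \le 2 \cdot 10 + 3 = 23 .
```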

5 Insertion procedure in a distributed deterministic 1–2 skip list

The insertion procedure is triggered by the new node to be inserted. The new node sends a message to the HEADER, which then initiates the whole procedure. First, the HEADER finds the proper position of the new node and its neighbors at the different levels. After that, the node is inserted into the list at the minimum level and checks for any violation of the properties of the deterministic 1–2 skip list. If any violation is detected, either the node or one of its neighbors is upgraded. This procedure is repeated until all discrepancies are resolved.

5.1 Theoretical basis of the insertion procedure

Suppose node v needs to be inserted in a deterministic 1–2 skip list, with $L_{lm(v)}(v) = u$ and $R_{lm(v)}(v) = w$. We have the following propositions.

Proposition 1 Suppose the following conditions hold:

$$lm(u) = lm(v), \tag{5a}$$

$$\Upsilon_2(u) = \mathrm{TRUE}. \tag{5b}$$

Then $lm(w) > lm(v)$.

Proof Let us assume that $lm(w) = lm(v)$. From the definition of the 1–2 skip list, $lm(w) \not< lm(v)$. Now,

$$\Upsilon_2(u) = \mathrm{TRUE} \Rightarrow lm(L_{lm(u)}(u)) = lm(u) \tag{I.1}$$

From condition (5a) and inference (I.1),

$$lm(L_{lm(u)}(u)) = lm(u) = lm(v) \tag{I.2}$$

From our assumption that $lm(w) = lm(v)$ and inference (I.2),

$$lm(L_{lm(u)}(u)) = lm(u) = lm(w)$$

Thus, before the insertion, there were three consecutive nodes with the same maximum level, which violates the properties of the 1–2 skip list. Hence our assumption is wrong, which proves the proposition by contradiction.

Proposition 2 Suppose the following conditions hold along with condition (5a):

$$\Upsilon_1(u) = \mathrm{TRUE}, \tag{6a}$$

$$lm(w) = lm(v). \tag{6b}$$

Then $\Upsilon_2(w) = \mathrm{TRUE}$.

Proof Let us assume that $\Upsilon_2(w) = \mathrm{FALSE}$. Then, from Lemma 1,

$$\Upsilon_2(w) = \mathrm{FALSE} \Rightarrow \Upsilon_1(w) = \mathrm{TRUE} \tag{I.3}$$

Conditions (5a), (6a) and (6b) and inference (I.3) imply that, before the insertion, there are two consecutive nodes u and w of equal level with $\Upsilon_1(u) = \mathrm{TRUE}$ and $\Upsilon_1(w) = \mathrm{TRUE}$, which violates the definition of $\Upsilon_1$. Hence our assumption is wrong, and the proposition is proved.

Proposition 3 Suppose the following condition holds along with condition (6b):

$$lm(u) > lm(v) \tag{7a}$$

Then $\Upsilon_1(w) = \mathrm{TRUE}$.

Proof Let us assume that $\Upsilon_1(w) = \mathrm{FALSE}$. From Lemma 1,

$$\Upsilon_1(w) = \mathrm{FALSE} \Rightarrow \Upsilon_2(w) = \mathrm{TRUE} \tag{I.4}$$

This implies that, before the insertion, $lm(u) > lm(w)$ (from conditions (6b) and (7a)) and $\Upsilon_2(w) = \mathrm{TRUE}$, which violates the definition of $\Upsilon_2$. Thus our assumption is wrong.

Note that, by Proposition 1, conditions (1c) and (1d) cannot evaluate to true when a new node is inserted. Similarly, by Proposition 2, condition (1b), and by Proposition 3, condition (1g), cannot hold during the insertion procedure. So we are left with five possible conditions. Once node v receives the state information of u and w through a pair of message transmissions, it executes one of the five possible actions shown in Table 3.

Table 3 Insertion procedure for 1–2 skip list

Condition (1a) → upgrade(v), $\Upsilon_1(w) = \mathrm{TRUE}$ (8a)
Condition (1e) → upgrade(u), $\Upsilon_1(v) = \mathrm{TRUE}$ (8b)
Condition (1f) → $\Upsilon_2(v) = \mathrm{TRUE}$ (8c)
Condition (1h) → checkup(w) (8d)
Condition (1i) → $\Upsilon_1(v) = \mathrm{TRUE}$ (8e)

Condition (1a) denotes that u and v are at the same maximum level. So after the insertion, v should be upgraded to $lm(u) + 1$. Note that henceforth the term upgradation is used to mean that the level of a node is incremented by 1. Once v is upgraded, $\Upsilon_1(w)$ should be set to TRUE. Condition (1e) denotes that, before the insertion, u and $L_{lm(u)}(u)$ are at the same level. So after the insertion, u should be upgraded to $lm(u) + 1$, and $\Upsilon_1(v)$ is set to TRUE. Condition (1f) denotes that, before the insertion, $\Upsilon_1(u) = \mathrm{TRUE}$, and after the insertion $lm(w) > lm(v)$, so $\Upsilon_2(v)$ is set to TRUE. Condition (1h) denotes that, after the insertion, $lm(u) > lm(v)$ and $\Upsilon_1(w) = \mathrm{TRUE}$. The following theorem shows the limitation of condition (1h).

Theorem 4 Node v can always determine whether there isany possible upgradation at lm(u), but not at lm(w).

Proof Whenever node v checks the status of its neighbors, if it finds that $lm(u) = lm(v)$ and $\Upsilon_1(u) = \mathrm{TRUE}$, it can conclude that there is no other node with the same lm to the left of its left neighbor, and hence that no upgrade is possible. Similarly, when $lm(u) = lm(v)$ and $\Upsilon_2(u) = \mathrm{TRUE}$, it can conclude that it is the third consecutive node with the same lm (as $\Upsilon_2(u) = \mathrm{TRUE} \Rightarrow \Upsilon_1(L_{lm(u)}(u)) = \mathrm{TRUE}$), and thus the left neighbor should be upgraded. However, the same cannot be concluded when $lm(w) = lm(v)$ and $\Upsilon_1(w) = \mathrm{TRUE}$, because node v cannot decide whether there is another node with the same lm to the right of its right neighbor. Hence the theorem holds.

Therefore, by Theorem 4, node v sends a checkup(w) message to node w to check for any possible upgradation. The following lemma limits the checkup operation to a single hop.

Lemma 4 If a node x receives a checkup message from y, where $x = R_{lm(y)}(y)$, then condition (1h) cannot hold at node x.

Proof If condition (1h) is true at node y, then

$$lm(x) = lm(y) \tag{I.5}$$

This implies $y = L_{lm(x)}(x)$. So, on receiving checkup from y, if condition (1h) were to hold at node x, then $lm(y) > lm(x)$, which contradicts inference (I.5).

Hence the checkup message does not propagate further. Condition (1i) denotes that, after the insertion, both u and w are at higher maximum levels, so v sets $\Upsilon_1(v)$ to TRUE.
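Read together, the upgrades in Table 3 split any gap that now holds three nodes of the same height by upgrading its middle node: in (8a) the gap is u, v, w and v is upgraded, and in (8b) it is $L_{lm(u)}(u)$, u, v and u is upgraded. The sketch below applies that rule on a centralized, hypothetical list of keys and heights, repeating it one level higher whenever an upgrade happens. It is an illustrative analogue, not the paper's message-passing protocol; in particular, it assumes that the checkup of (8d) ends with w upgrading itself when its own right neighbor is a third node of the same height. The function name `insert` is hypothetical.

```python
from bisect import bisect_left
from math import inf


def insert(keys: list, heights: list, key) -> None:
    """Insert `key` at height 1 and restore the 1-2 property bottom-up, in place.

    Centralized analogue of Table 3: whenever the gap around the node being
    checked holds three nodes of the current height, the middle one is upgraded.
    """
    pos = bisect_left(keys, key)
    keys.insert(pos, key)
    heights.insert(pos, 1)
    h = 1
    while True:
        full = [inf] + heights + [inf]  # HEADER and NULL as infinitely tall sentinels
        i = pos + 1                     # index of the checked node inside `full`
        left = max(j for j in range(i) if full[j] > h)                  # nearest taller node on the left
        right = min(j for j in range(i + 1, len(full)) if full[j] > h)  # nearest taller node on the right
        gap = [j for j in range(left + 1, right) if full[j] == h]       # same-height nodes in this gap
        if len(gap) <= 2:
            return                      # one or two nodes: the 1-2 property holds here
        mid = gap[1]                    # three nodes: upgrade the middle one
        heights[mid - 1] += 1
        pos, h = mid - 1, h + 1         # the upgraded node may now break level h + 1
```

Inserting the keys 1 to 7 in order produces heights `[1, 2, 1, 3, 1, 2, 1]`, the perfectly balanced shape that corresponds to a full 2–3 tree of binary nodes.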

5.1.1 The detailed insertion procedure

Suppose node v wants to join the skip list. It then forwards a “JOIN” message to the HEADER. On receiving the “JOIN” message, the HEADER initiates the insertion procedure by finding the proper position of the new node based on its key value, using a procedure similar to search. Suppose the new node v needs to be inserted between node u and node w. Then node v proceeds with the join procedure by sending a “GETINFO” message to node u and node w, as shown in Algorithm 2. The “GETINFO” message contains a parameter isLeft, which is set to TRUE for w and FALSE for u, to indicate whether the sender v is the receiver's left neighbor or its right neighbor.

On receiving a “GETINFO” message, a node first set itsleft and right neighbors accordingly, and then call proceduresendResponse to forward its parameter information to therequester (the node that forwarded

Algorithm 2 Node z received message “GETINFO” from node y

Input isLeft (TRUE if y is a left node, else FALSE), l (maximum level of node y)
Output Set left or right neighbor and forward state information
1: if isLeft = TRUE then
2:   L_l(z) := y
3: else
4:   R_l(z) := y
5: end if
6: call procedure sendResponse(y, l)

Algorithm 3 procedure sendResponse at node z

Input Rid (the node identifier of the requester), l (the maximum level of the requester)
Output Send state information of node z to the node Rid
1: if z ≠ HEADER ∧ lm(z) = l then
2:   if ϒ1(z) = TRUE then
3:     send(“RESPONSE”, 1) to Rid
4:   else
5:     send(“RESPONSE”, 2) to Rid
6:   end if
7: else
8:   send(“RESPONSE”, 3) to Rid
9: end if

“GETINFO”). The operation of this procedure is shown in Algorithm 3. The procedure checks the conditions mentioned in Table 2. There are three cases: i) ϒ1 = TRUE, ii) ϒ2 = TRUE, or iii) the node is the HEADER or its maximum level is greater than the maximum level of the requester. In the first case it forwards a “RESPONSE” message with parameter 1, in the second case a “RESPONSE” message with parameter 2, and in the third case a “RESPONSE” message with parameter 3.
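Algorithm 3 boils down to a three-way classification of the answering node. The Python sketch below mirrors it as a pure function and leaves out message passing; the function name response_code and its arguments are hypothetical stand-ins for the node state (Y1 stands for ϒ1), not part of the protocol.

```python
def response_code(is_header: bool, lm_z: int, requester_level: int, y1: bool) -> int:
    """Parameter node z puts in its "RESPONSE" to a "GETINFO" (mirrors Algorithm 3).

    is_header       -- True if z is the HEADER
    lm_z            -- maximum level of z
    requester_level -- maximum level l of the requesting node
    y1              -- value of the flag Y1(z)
    """
    if not is_header and lm_z == requester_level:
        # Same level as the requester: 1 if z starts its run (Y1), otherwise 2.
        return 1 if y1 else 2
    # The HEADER, or a node at a different (in practice higher) level.
    return 3
```

For example, a non-HEADER node at the requester's level with Y1 false answers 2, while the HEADER always answers 3.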


On receiving the “RESPONSE” messages, a node stabilizes according to the operations listed in Table 3, as shown in Algorithm 4. The node receives two “RESPONSE” messages, one from its left neighbor and one from its right neighbor. Based on the received values, the node sets its state variables, as shown in lines 2–20 of Algorithm 4. Based on these state variables, there are five possible options according to Table 3. The actions triggered for these five options can be summarized as follows:

Case (1) ϒ1 = TRUE at left neighbor, and ϒ2 = TRUE at right neighbor (Condition 1a): The node triggers procedure upgrade.

Case (2) ϒ1 = TRUE at left neighbor, and the maximum level of the right neighbor is greater than the maximum level of this node (Condition 1f): The node sets ϒ2 = TRUE.

Case (3) ϒ2 = TRUE at left neighbor, irrespective of the right neighbor (Condition 1e; note that Conditions 1c and 1d cannot occur): The node sets ϒ1 to TRUE and asks its left neighbor to upgrade.

Case (4) The maximum level of the left neighbor is greater than the maximum level of this node, and ϒ1 = TRUE at right neighbor (Condition 1h): The node sets ϒ1 = TRUE. However, according to Theorem 4, a node cannot decide whether its right neighbor needs to be upgraded, so it forwards a “CHECKUP” message to its right neighbor to check for a possible upgrade.

Case (5) The maximum levels of both the left and the right neighbor are greater than the maximum level of this node (Condition 1i): The node sets ϒ1 = TRUE.
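The five cases can be read as a decision table over the pair of RESPONSE codes received from the left and right neighbors. The hypothetical helper below follows the order of the branches in lines 21–33 of Algorithm 4 and returns a description of the action instead of performing it (Y1 and Y2 stand for ϒ1 and ϒ2); pairs that match none of the branches fall through, as in the pseudocode.

```python
def insertion_action(left: int, right: int) -> str:
    """Action of the inserted node z, given the RESPONSE codes (1, 2 or 3)
    from its left and right neighbours; follows lines 21-33 of Algorithm 4."""
    if left == 1 and right == 2:
        return "upgrade(z)"                      # Case (1), Condition (1a)
    if left == 1 and right == 3:
        return "set Y2(z)"                       # Case (2), Condition (1f)
    if left == 2:
        return "set Y1(z); upgrade(left)"        # Case (3), Condition (1e)
    if left == 3 and right == 1:
        return "set Y1(z); CHECKUP to right"     # Case (4), Condition (1h)
    if left == 3 and right == 3:
        return "set Y1(z)"                       # Case (5), Condition (1i)
    return "no action"                           # no branch of Algorithm 4 applies
```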

Algorithm 5 shows the behavior of a node on receiving a “CHECKUP” message. It checks whether its maximum level equals the maximum level of its right neighbor. If so, it sets ϒ1 of its right neighbor to TRUE and upgrades itself; otherwise it sets ϒ2 to TRUE. The upgrade procedure raises the level of a node by one, as shown in Algorithm 6. It then forwards “GETINFO” messages to its left and right neighbors at the new level to check for further upgrades. If the node rises above the current maximum level of the list, it forwards an “UPHEAD” message to the HEADER and an “UPNULL” message to the NULL. On receiving these messages, the HEADER and the NULL are upgraded, and the procedure terminates at this point.

The execution of the insertion procedure is illustrated with the example in Fig. 5. Suppose a node with key 7 is to be inserted. After the neighbor-finding operations, this node sends a “GETINFO” message with the parameters [FALSE, 1] to the node with key 6 and another “GETINFO” message with the parameters [TRUE, 1] to the node

Algorithm 4 Node z received message “RESPONSE” from node y

Input m (the received parameter)
Output Level upgradation based on Table 3
1: i := lm(z)
2: if m = 1 then
3:   if y = L_i(z) then
4:     state10 := 1
5:   else
6:     state11 := 1
7:   end if
8: else if m = 2 then
9:   if y = L_i(z) then
10:    state20 := 1
11:  else
12:    state21 := 1
13:  end if
14: else
15:   if y = L_i(z) then
16:     state30 := 1
17:   else
18:     state31 := 1
19:   end if
20: end if
21: if state10 = 1 ∧ state21 = 1 then
22:   upgrade(z) /*received “RESPONSE”, 1 from left neighbor and “RESPONSE”, 2 from right neighbor*/
23: else if state10 = 1 ∧ state31 = 1 then
24:   ϒ2(z) := TRUE /*received “RESPONSE”, 1 from left neighbor and “RESPONSE”, 3 from right neighbor*/
25: else if state20 = 1 then
26:   ϒ1(z) := TRUE /*received “RESPONSE”, 2 from the left neighbor*/
27:   upgrade(L_i(z))
28: else if state30 = 1 ∧ state11 = 1 then
29:   ϒ1(z) := TRUE /*received “RESPONSE”, 3 from left neighbor and “RESPONSE”, 1 from right neighbor*/
30:   send(“CHECKUP”) to R_i(z)
31: else if state30 = 1 ∧ state31 = 1 then
32:   ϒ1(z) := TRUE /*received “RESPONSE”, 3 from both left and right neighbor*/
33: end if

with the key 8, respectively. The new node is then inserted between the nodes with keys 6 and 8. After that, it receives a “RESPONSE” message with the parameter 1 from both neighbors, which sets the state variables state10 and state11. So it concludes that it should be upgraded and sends a “GETINFO” message with the parameters [FALSE, 2] to the node with key 4 and one with the parameters [TRUE, 2] to the NULL, respectively.


Algorithm 5 Node z received message “CHECKUP” from node y

Input none
Output Check for upgradation
1: if lm(R_{lm(z)}(z)) = lm(z) then
2:   ϒ1(R_{lm(z)}(z)) := TRUE
3:   upgrade(z)
4: else
5:   ϒ2(z) := TRUE /*Terminate algorithm at this point*/
6: end if

Algorithm 6 Procedure upgrade at node z

Input none
Output Upgrade this node and then check for further upgradation
1: level := level + 1
2: lm(z) := level
3: send(“GETINFO”, FALSE, level) to L_level(z)
4: send(“GETINFO”, TRUE, level) to R_level(z)
5: if level > lm(HEADER) then
6:   send(“UPHEAD”) to HEADER
7:   send(“UPNULL”) to NULL
8:   L_level(z) := HEADER
9:   R_level(z) := NULL
10: end if

It receives a “RESPONSE” message with the parameter 1 from the left neighbor, which sets the variable state10. As the right neighbor is the NULL, it sets the variable state31. So

Fig. 5 Insertion in a deterministic distributed skip list: panels (a)–(c) show the insertion of key 7 between keys 6 and 8 (message labels: GETINFO(FALSE,1), GETINFO(TRUE,1), RESPONSE(1), RESPONSE(1))

the node concludes that no further upgrade is required; since its maximum level also equals that of the HEADER, it terminates the algorithm. After all these actions have been executed, a stable deterministic 1–2 skip list is formed, as shown in Fig. 5c.

The following theorems establish the correctness and termination of the proposed algorithm.

Theorem 5 The insertion algorithm eventually terminates.

Proof From Lemma 2, conditions (1a)–(1i) are the necessary and sufficient conditions to check to make a skip list stable. If v is a node that violates the properties of the 1–2 skip list, then one of the conditions (1a)–(1i), excluding the invalid conditions given in Theorems 1, 2 and 3, must be true. Control propagates further only if one of the conditions (1a), (1e) or (1h) becomes true and an upgrade is required. If condition (1a) becomes true, the node upgrades itself and then checks for any violation of the 1–2 skip list properties at the upgraded level. If condition (1e) becomes true, the node itself becomes part of the stable 1–2 skip list, and control passes to its left neighbor for upgradation. Similarly, if condition (1h) becomes true, the node becomes part of the stable list and control passes to its right neighbor for a possible upgrade. Each upgrade moves control one level up, and by Lemma 4 a checkup travels only one hop, so control cannot circulate indefinitely at a single level. Eventually, either one of the conditions (1f) or (1i) evaluates to true, or control reaches the maximum level of the skip list, where a single upgrade is sufficient. The algorithm terminates at this point.

Theorem 6 After the insertion procedure the properties of1–2 skip list hold true.

Proof It is straightforward from the insertion procedure that the skip list remains sorted after the insertion. According to the insertion procedure, the new node is inserted at level 1 and exchanges a pair of messages to check for a possible upgrade. In a deterministic 1–2 skip list, the only node that can violate the level property is the inserted node or a node that has been upgraded to a new level. The conditions (1a)–(1i) are checked at all such nodes after each upgrade to detect any violation of the 1–2 skip list properties. Since these conditions are sufficient, all nodes are at correct levels when the insertion procedure terminates.
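A centralized model makes the invariant behind Theorem 6 easy to test. The sketch below is our own reference model, not the distributed protocol: it keeps keys and maximum levels as two parallel Python lists, treats the HEADER and the NULL as sentinels taller than every node, and reads the 1–2 property as "between any two consecutive nodes taller than h there are one or two nodes of height exactly h". Our reading of Cases (1), (3) and (4) is that each resolves a run of three equal-height nodes by upgrading the middle one, so insert does exactly that; is_one_two and insert are hypothetical names.

```python
import bisect
import random


def is_one_two(heights):
    """True if every gap holds 1 or 2 nodes of the next lower height.

    heights[i] is lm of the i-th node in key order; HEADER and NULL are
    modelled as sentinels taller than every node.
    """
    if not heights:
        return True
    for h in range(1, max(heights) + 1):
        count = 0                              # nodes of height exactly h in the current gap
        for ht in heights + [float("inf")]:    # the trailing sentinel plays the NULL
            if ht > h:                         # a taller node closes the gap
                if count not in (1, 2):
                    return False
                count = 0
            elif ht == h:
                count += 1
    return True


def insert(keys, heights, key):
    """Insert key at level 1, then repeatedly upgrade the middle of a run of three."""
    i = bisect.bisect_left(keys, key)
    keys.insert(i, key)
    heights.insert(i, 1)
    h = 1
    while True:
        lo = i - 1
        while lo >= 0 and heights[lo] <= h:               # nearest taller node on the left
            lo -= 1
        hi = i + 1
        while hi < len(heights) and heights[hi] <= h:     # nearest taller node on the right
            hi += 1
        run = [j for j in range(lo + 1, hi) if heights[j] == h]
        if len(run) <= 2:                                 # gap is legal: stop
            return
        i = run[1]                                        # middle node moves up one level
        heights[i] += 1
        h += 1


if __name__ == "__main__":
    keys, heights = [], []
    order = list(range(200))
    random.Random(7).shuffle(order)
    for k in order:
        insert(keys, heights, k)
        assert is_one_two(heights)
    print("200 inserts, levels used:", max(heights))
```

Inserting keys 1, 2 and 3 into an empty model leaves heights [1, 2, 1]: the third node creates a run of three at level 1, and the middle one moves up. Since every gap holds at least one node, a list of height H that satisfies the check contains at least 2^H − 1 nodes, which is the logarithmic level bound used in the complexity analysis.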

5.1.2 Complexity analysis

Theorem 7 The message complexity of the insertion operation is O(log n).

Proof Finding all the neighbors of a new node takes 2 × O(log n) messages. According to Theorem 1, an n-node skip list has O(log n) levels, so at most O(log n) upgrades can follow the insertion of a new node. Every upgrade requires one pair of message exchanges. Overall, the insertion procedure therefore uses O(log n) messages.
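Written as a single bound, the counting in this proof reads as follows. The symbols are our notation: S(n) is the number of messages for one search from the HEADER, h(n) is the number of levels (O(log n) by Theorem 1), and c is the constant number of messages per upgrade (the proof counts one message pair).

$$M_{\mathrm{ins}}(n) \;\le\; 2\,S(n) + c\,h(n), \qquad S(n),\, h(n) \in O(\log n) \;\Longrightarrow\; M_{\mathrm{ins}}(n) \in O(\log n).$$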

6 Deletion procedure in distributed deterministic 1–2 skip list

The deletion operation starts from the HEADER. When the HEADER receives a deletion request, it starts a search operation. If the node is present in the list, it sends all its right-neighbor information to its left neighbors by message passing. Once the neighbor information has been completely updated, violations of the 1–2 skip list properties are searched for. Any violation is resolved either by decreasing the lm of a node by one or by increasing the lm of a node by one. The procedure terminates when all discrepancies are resolved.

6.1 Theoretical basis of deletion procedure

Deletion requires downgrade as well as upgrade operations to make a 1–2 skip list stable. By a downgrade operation, we mean that the level of a node is decremented by one. Consider Fig. 6a. Suppose node v is deleted. Then either node u or node w needs to be downgraded. We now have the following theorems.

Theorem 8 Suppose node v gets deleted from a stable 1–2 skip list. Let $u = L_{lm(v)}(v)$ and $w = R_{lm(v)}(v)$. Then the downgrade operation can be either at lm(u) or at lm(w).

Proof Note that if lm(u) = lm(v) or lm(v) = lm(w), then no downgrade operation is needed. There are three possible cases in which a downgrade operation can occur, as shown in Fig. 6.

Case I (lm(u) = lm(w) = lm(v) + 1): If v is deleted, then one of u or w needs to be downgraded to avoid parallel links, as shown in Fig. 6a and b.

Case II (lm(w) = lm(v) + 1, lm(u) > lm(w)): If v is deleted, then w needs to be downgraded to avoid parallel links, as shown in Fig. 6c and d.

Case III (lm(u) = lm(v) + 1, lm(w) > lm(u)): If v is deleted, then u needs to be downgraded to avoid parallel links, as shown in Fig. 6e and f.
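The case analysis above depends only on the three maximum levels, so it can be written as a small lookup. The sketch below (hypothetical name downgrade_candidates) reports which neighbor may need a downgrade; it does not choose between u and w in Case I, a choice the proof leaves open.

```python
def downgrade_candidates(lm_u: int, lm_v: int, lm_w: int) -> set:
    """Neighbours of a deleted node v that may need a downgrade (Theorem 8).

    u and w are v's left and right neighbours at level lm(v).
    """
    if lm_u == lm_v or lm_w == lm_v:
        return set()                  # an equal-height neighbour remains: no downgrade
    if lm_u == lm_w == lm_v + 1:
        return {"u", "w"}             # Case I: downgrade one of the two
    if lm_w == lm_v + 1 and lm_u > lm_w:
        return {"w"}                  # Case II
    if lm_u == lm_v + 1 and lm_w > lm_u:
        return {"u"}                  # Case III
    return set()                      # the proof of Theorem 8 lists no other case
```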

Theorem 9 Suppose node v is downgraded in a 1–2 skip list. Let $l'_m(v)$ be the maximum level of node v before the downgrade operation, and let $u' = L_{l'_m(v)}(v)$ and $w' = R_{l'_m(v)}(v)$. Then the next downgrade operation can be either at $lm(u')$ or at $lm(w')$.

Proof The proof is similar to the proof given in Theorem 8.

Theorem 10 There cannot be more than one level down-grade operation at any node.

Proof We prove this by contradiction. Assume that a downgrade of more than one level can occur at a node. This implies that, after the deletion, there are three consecutive pointers to the same next node. That is possible only if there were two consecutive pointers to the same next node before the deletion, which violates the properties of the 1–2 skip list. This contradicts the assumption that the list was a stable 1–2 skip list before the deletion, and the theorem follows.

Theorem 11 Suppose node v gets downgraded, and let lm(v) be the maximum level of node v after the downgrade operation. Let $u = L_{lm(v)}(v)$ and $w = R_{lm(v)}(v)$. If either u or w gets upgraded as a result of this downgrade operation, then the downgrade operation terminates.

Fig. 6 Deletion in a skip list: panels (a)–(f) show nodes u, v and w for the three cases in the proof of Theorem 8

Proof Suppose that after the downgrade operation at node v, $lm(v) = lm(w) = lm(R_{lm(w)}(w))$. Then node w gets upgraded. From Theorem 9, after node v gets downgraded, the next downgrade operation can be either at $lm(u')$ or at $lm(w')$. But when node w gets upgraded, $L_{lm(w)}(w) = u'$ and $R_{lm(w)}(w) = w'$, as shown in Fig. 7. Thus neither node $u'$ nor node $w'$ needs to be downgraded, and by Theorem 9 no further downgrade operation is possible. So the downgrade operation terminates.

Similarly, it can be shown that if node u gets upgraded after the downgrade of node v, the downgrade operation also terminates.

Theorem 12 The downgrade operation terminates eventually.

Proof Suppose node v gets downgraded. If, before the downgrade operation, $lm(v) = lm(L_{lm(v)}(v))$ or $lm(v) = lm(R_{lm(v)}(v))$, then the downgrade operation terminates, as there will be no further downgrade operation after node v gets downgraded.

If $lm(v) < lm(L_{lm(v)}(v))$ and $lm(v) < lm(R_{lm(v)}(v))$, then by Theorem 11, if an upgrade follows the downgrade operation, the downgrade operation terminates.

Otherwise, the downgrade operation terminates when the downgrade control reaches the maximum level of the skip list. From Theorem 8 and Theorem 9, every time a downgrade operation occurs, the next downgrade operation can occur at the next higher level. So if node v is deleted, then, starting from level lm(v) + 1, the downgrade operation can continue up to the maximum level of the skip list. Once it reaches the maximum level, the downgrade operation terminates.

Fig. 7 Downgrade operation followed by upgradation: (a) v needs to be downgraded; (b) v downgraded, w needs to be upgraded; (c) w upgraded

Once the downgrade operation has terminated according to Theorem 12, the next task is to check for possible upgrades. An upgrade can occur only in the neighborhood of the deleted node. The check starts from the level-1 neighbors of the deleted node and is similar to the operations described for the insertion procedure. Starting from the level-1 neighbors, every node uses one pair of messages to obtain information about its neighbors at its maximum level, and evaluates conditions (1a)–(1i) to check for possible upgrades. The actions taken by a node v are given in Table 4. The following theorem simplifies the neighbor search for upgrades in the deletion procedure.

Theorem 13 Let $u_i = L_i(v)$ and $w_i = R_i(v)$ denote the ith-level left and right neighbors, respectively, of the deleted node v. If $w_j$ (j < i) is upgraded to the ith level, then $L_{lm(w_j)}(w_j) = u_i$ and $R_{lm(w_j)}(w_j) = w_i$. Similarly, if $u_j$ (j < i) is upgraded to the ith level, then $L_{lm(u_j)}(u_j) = u_i$ and $R_{lm(u_j)}(u_j) = w_i$.

Proof The proof idea is illustrated in Fig. 8. Whenever node v gets deleted, $u_i = L_i(w_i)$ and $w_i = R_i(u_i)$. So when a node $w_j$ is upgraded to level i, $L_{lm(w_j)}(w_j) = u_i$ and $R_{lm(w_j)}(w_j) = w_i$. Similarly, when $u_j$ is upgraded to level i, $L_{lm(u_j)}(u_j) = u_i$ and $R_{lm(u_j)}(u_j) = w_i$.

Table 4 Deletion procedure for 1–2 skip list

Condition (1a) → upgrade(v), ϒ1(w) = TRUE (9a)
Condition (1b) → upgrade(v) (9b)
Condition (1c) → upgrade(v) (9c)
Condition (1d) → upgrade(v), ϒ1(w) = TRUE (9d)
Condition (1e) → upgrade(u), ϒ1(v) = TRUE (9e)
Condition (1f) → ϒ2(v) = TRUE (9f)
Condition (1g) → ϒ1(v) = TRUE (9g)
Condition (1h) → checkup(w), ϒ1(v) = TRUE (9h)
Condition (1i) → ϒ1(v) = TRUE (9i)


Fig. 8 Upgradation after deletion: panels (a)–(c) show the deleted node v and the nodes $u_0$, $u_1$, $w_0$, $w_1$

6.1.1 The detailed deletion algorithm

The deletion operation is specified formally by Algorithms 7–17. Suppose node v wants to leave the skip list.

Algorithm 7 Node v wants to delete

1: for i = 1 to (lm(v) − 1) do
2:   send(“RNEXT”, i, R_i(v)) to L_i(v)
3: end for
4: list_l := L_1(v)
5: list_r := R_1(v)
6: for all 1 < i < lm(v) do
7:   list_l := list_l append(L_i(v))
8:   list_r := list_r append(R_i(v))
9: end for
10: send(“UPNEXT”, list_l, list_r, lm(v), R_{lm(v)}(v)) to L_{lm(v)}(v)

Before leaving, v first forwards all its right-neighbor information to its left neighbors using “RNEXT” messages, so that the neighbors can be updated after the deletion, as shown in Algorithm 7. Let u be its left neighbor and w its right neighbor. After the deletion of v, the right neighbor of u becomes w and vice versa. This link and parameter setup (ϒ1 and ϒ2) is done through an “RNEXT” message and an “LNEXT” message, as shown in Algorithm 8 and Algorithm 9. In Algorithm 7, after forwarding the link information, node v prepares two lists, list_l and list_r, containing the identifiers of its left neighbors and of its right neighbors, respectively. These two lists are used to check possible downgrade operations at all its left and right neighbors (a downgrade operation can occur only at the neighbors of the deleted node or at the neighbors of a downgraded node, as proved in Theorem 8 and Theorem 9). Node v then forwards these lists, along with its maximum level and its right neighbor at maximum level, to its left neighbor at maximum level through an “UPNEXT” message, as shown in line 10 of Algorithm 7.
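Algorithm 7 reads only v's own neighbor tables, so it can be sketched as a pure function that returns the messages to send rather than sending them. The dictionaries indexed by level, the tuple encoding of messages and the name prepare_leave are our assumptions; as in lines 4–5, list_l and list_r always start with the level-1 neighbors.

```python
def prepare_leave(left: dict, right: dict, lm: int) -> list:
    """(destination, message) pairs node v emits before leaving (Algorithm 7).

    left[i] and right[i] are v's neighbours at level i, for 1 <= i <= lm.
    """
    out = []
    for i in range(1, lm):                                  # lines 1-3
        out.append((left[i], ("RNEXT", i, right[i])))
    list_l = [left[1]] + [left[i] for i in range(2, lm)]    # lines 4-9
    list_r = [right[1]] + [right[i] for i in range(2, lm)]
    out.append((left[lm], ("UPNEXT", list_l, list_r, lm, right[lm])))  # line 10
    return out
```

For a node at level 2 with level-1 neighbors n2 and n5 and level-2 neighbors HEADER and n7 (illustrative identifiers), the function yields one RNEXT to n2 and one UPNEXT to the HEADER carrying the lists ["n2"] and ["n5"].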

On receiving an “UPNEXT” message, a node first checks whether a downgrade is needed by testing whether two parallel links exist (i.e., its right neighbors at level l and at level l + 1 are the same node, where l is the maximum level of the deleted node, according to Theorem 8), as shown in

Algorithm 8 Node z received message “RNEXT” from node y

Input l (maximum level of the node), addr (id of the node)
Output Set the right neighbors
1: level := l
2: R_l(z) := addr
3: send(“LNEXT”, l) to addr

Algorithm 9 Node z received message “LNEXT” from node y

Input l (maximum level of the node)
Output Set left neighbors
1: L_l(z) := y
2: if (l = lm(z)) ∧ (ϒ2 = TRUE) then
3:   ϒ1 := TRUE
4: end if

line 5 of Algorithm 10. If the maximum level of this node is l + 1, then this node needs to be downgraded. For the downgrade operation, the node first stores the ids of its left and right neighbors at maximum level in the local variables tLnode and tRnode (used later for upgrades, if necessary, to minimize the number of messages, according to Theorem 13). It then decrements its maximum level by one and sends a “CHECKST” message to its new left and right neighbors at maximum level to check for any upgrade needed because of a violation of the skip-list properties.

If there are two parallel links but the maximum level of this node is not l + 1 (or the node is the HEADER), it asks its right neighbor at level l + 1 to downgrade by sending a “DOWNGRADE” message, provided that this neighbor is not the NULL (downgrade is not possible at the NULL), as shown in line 12 of Algorithm 10. If the right neighbor is the NULL, the node extracts the next node from list_l and sends it a “CHECKUP” message to check for any possible upgrade. The same is done if the deletion requires no downgrade, as shown in lines 19–21 of Algorithm 10.
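The branching of Algorithm 10 after the link update can be isolated as follows. This hypothetical helper takes the relevant facts about node z as booleans and integers and names the branch it would take; it reads the condition in line 10 as "parallel links, and (z = HEADER or lm(z) ≠ l + 1)", which is our interpretation of the operator precedence.

```python
def upnext_action(z_is_header: bool, new_right_is_null: bool, lm_z: int, l: int,
                  parallel: bool, up_right_is_null: bool) -> str:
    """Branch taken by node z on "UPNEXT" (Algorithm 10, lines 3-22).

    parallel         -- R_l(z) and R_{l+1}(z) are the same node
    up_right_is_null -- R_{l+1}(z) is the NULL
    """
    if z_is_header and new_right_is_null and lm_z == 1:
        return "none"                         # lines 3-4: the list is now empty
    if parallel and lm_z == l + 1 and not z_is_header:
        return "downgrade self"               # lines 5-9
    if parallel:                              # here z is the HEADER or lm(z) != l + 1
        if up_right_is_null:
            return "CHECKUP next in list_l"   # lines 14-16
        return "DOWNGRADE to right"           # line 12
    return "CHECKUP next in list_l"           # lines 19-21
```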

Algorithm 11 shows the behavior of a node when it receives a “DOWNGRADE” message. It first stores its left and right neighbors' ids in two temporary variables, as before, for possible future upgrades, and then decrements its maximum level by one. After that, it forwards a “CHECKST” message to its left and right neighbors at the new maximum level to check for any possible upgrade.

When a node receives a “CHECKUP” message with list_l and list_r, it first stores the incoming parameters in temporary variables (their purpose, supporting a series of upgrades, is discussed later), as shown in Algorithm 12, and then forwards a “CHECKST”

Algorithm 10 Node z received message “UPNEXT” from node y

Input list_l, list_r, l (maximum level of the node)
Output Check for downgrade and upgrade
1: R_l(z) := y
2: send(“LNEXT”, l) to y
3: if (z = HEADER) ∧ (y = NULL) ∧ (lm(z) = 1) then
4:   return 1 /*No upgrade or downgrade required*/
5: else if (R_l(z) = R_{l+1}(z)) ∧ (l + 1 = lm(z)) ∧ (z ≠ HEADER) then
6:   tLnode := L_{lm(z)}(z) /*Downgrade this node*/
7:   tRnode := R_{lm(z)}(z)
8:   lm(z) := lm(z) − 1
9:   send(“CHECKST”, lm(z)) to L_{lm(z)}(z), R_{lm(z)}(z)
10: else if (R_l(z) = R_{l+1}(z)) ∧ ((z = HEADER) ∨ (lm(z) ≠ l + 1)) then
11:   if R_{l+1}(z) ≠ NULL then
12:     send(“DOWNGRADE”) to R_{l+1}(z) /*Ask right neighbor to downgrade*/
13:   else
14:     b := extract(list_l) /*Right neighbor is NULL. Check upgrade for next node in list_l*/
15:     extract(list_r)
16:     send(“CHECKUP”, list_l, list_r, −1, −1) to b
17:   end if
18: else
19:   b := extract(list_l) /*No downgrade. Check upgrade for next node in list_l*/
20:   extract(list_r)
21:   send(“CHECKUP”, list_l, list_r, −1, −1) to b
22: end if

Algorithm 11 Node z received message “DOWNGRADE” from node y

Input none
Output Downgrade the node
1: tLnode := L_{lm(z)}(z)
2: tRnode := R_{lm(z)}(z)
3: lm(z) := lm(z) − 1
4: send(“CHECKST”, lm(z)) to L_{lm(z)}(z), R_{lm(z)}(z)

Algorithm 12 Node z received message “CHECKUP” from node y

Input list_l, list_r, x (identifier of the previous left neighbor), y (identifier of the previous right neighbor)
Output Check for upgradation at the left and right neighbors at maximum level
1: tLnode := x
2: tRnode := y
3: send(“CHECKST”, lm(z)) to L_{lm(z)}(z), R_{lm(z)}(z)

message to its left and right neighbors at maximum level to obtain their parameters and check for a possible upgrade. On receiving a “CHECKST” message, a node replies with a “RESPONSE” message with parameter 1, 2 or 3 based on its maximum level and its ϒ1 and ϒ2 values, as shown in Algorithm 13. If its maximum level equals the maximum level of the sender, it replies 1 if ϒ1 is true and 2 if ϒ2 is true; otherwise it replies 3.

Algorithm 13 Node z received message “CHECKST” from node y

Input l (maximum level of the node)
Output Send parameter information
1: if (lm(z) = l) ∧ (ϒ1 = TRUE) then
2:   send(“RESPONSE”, 1) to y
3: else if (lm(z) = l) ∧ (ϒ2 = TRUE) then
4:   send(“RESPONSE”, 2) to y
5: else
6:   send(“RESPONSE”, 3) to y
7: end if

Based on the “RESPONSE” messages received from its left and right neighbors, a node triggers actions according to Table 4.

The actions of a node on receiving “RESPONSE” messages from both its left and right neighbors are shown in Algorithm 14. The following cases can arise:

Case (1) ϒ1 = TRUE at both left and right neighbors (Condition 1b), or ϒ2 = TRUE at left neighbor and ϒ1 = TRUE at right neighbor (Condition 1c): The node triggers procedure upgradeD for upgradation, as shown in line 21 of Algorithm 14.

Case (2) The maximum level of the left neighbor is greater than the maximum level of this node, and ϒ1 = TRUE at right neighbor (Condition 1h): The node sets its ϒ1 to TRUE, as shown in line 23 of Algorithm 14, and then sends a “CHECKUP” message to its right neighbor at maximum level to check for any possible upgrade.

Case (3) ϒ1 = TRUE at left neighbor and ϒ2 = TRUE at right neighbor (Condition 1a), or ϒ2 = TRUE at both left and right neighbors (Condition 1d): The node triggers procedure upgradeD and sends a “SETONE” message to its right neighbor, as shown in lines 26 and 27 of Algorithm 14. On receiving a “SETONE” message, a node sets its ϒ1 to TRUE and ϒ2 to FALSE.

Case (4) ϒ2 = TRUE at left neighbor, and the maximum level of the right neighbor is greater than the maximum level of this node (Condition 1e): The node sets its ϒ1 to TRUE and sends an “UPGRADE” message to its left neighbor at maximum level. On receiving an “UPGRADE” message, a node triggers the upgradeD procedure, as shown in Algorithm 16.

Case (5) ϒ1 = TRUE at left neighbor, and the maximum level of the right neighbor is greater than the maximum level of this node (Condition 1f): The node sets ϒ2 = TRUE and triggers the procedure nextCheck, as shown in lines 32 and 33 of Algorithm 14.

Case (6) The maximum level of the left neighbor is greater than the maximum level of this node and ϒ2 = TRUE at right neighbor (Condition 1g), or the maximum levels of both the left and the right neighbor are greater than the maximum level of this node (Condition 1i): In both cases, the node sets ϒ1 = TRUE and triggers the nextCheck procedure, as shown in lines 35 and 36 of Algorithm 14.
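Unlike the five branches of Algorithm 4, the six cases above cover all nine combinations of RESPONSE codes. The hypothetical helper below encodes lines 20–37 of Algorithm 14 as a lookup, returning a description of the action rather than executing it; Y1 and Y2 stand for ϒ1 and ϒ2.

```python
def deletion_action(left: int, right: int) -> str:
    """Action of node z given RESPONSE codes from its left and right
    neighbours during deletion (Algorithm 14, lines 20-37)."""
    pair = (left, right)
    if pair in {(1, 1), (2, 1)}:
        return "upgradeD(z)"                       # Case (1): (1b), (1c)
    if pair == (3, 1):
        return "set Y1(z); CHECKUP to right"       # Case (2): (1h)
    if pair in {(1, 2), (2, 2)}:
        return "upgradeD(z); SETONE to right"      # Case (3): (1a), (1d)
    if pair == (2, 3):
        return "set Y1(z); UPGRADE to left"        # Case (4): (1e)
    if pair == (1, 3):
        return "set Y2(z); nextCheck"              # Case (5): (1f)
    if pair in {(3, 2), (3, 3)}:
        return "set Y1(z); nextCheck"              # Case (6): (1g), (1i)
    raise ValueError(f"invalid RESPONSE codes: {pair}")
```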

The procedure nextCheck is shown in Algorithm 15. According to Theorem 8 and Theorem 9, a downgrade operation can happen only at the maximum-level neighbors of either the deleted node or a downgraded node.

Algorithm 14 Node z received message “RESPONSE” from node y

Input m (the parameter information)
Output Check for upgradation and set parameters according to Table 4
1: if m = 1 then
2:   if y = L_{lm(z)}(z) then
3:     flag10 := 1
4:   else
5:     flag11 := 1
6:   end if
7: else if m = 2 then
8:   if y = L_{lm(z)}(z) then
9:     flag20 := 1
10:  else
11:    flag21 := 1
12:  end if
13: else
14:   if y = L_{lm(z)}(z) then
15:     flag30 := 1
16:   else
17:     flag31 := 1
18:   end if
19: end if
20: if ((flag10 = 1) ∧ (flag11 = 1)) ∨ ((flag20 = 1) ∧ (flag11 = 1)) then
21:   upgradeD(z)
22: else if (flag11 = 1) ∧ (flag30 = 1) then
23:   ϒ1 := TRUE
24:   send(“CHECKUP”, list_l, list_r, tLnode, tRnode) to R_{lm(z)}(z)
25: else if ((flag10 = 1) ∧ (flag21 = 1)) ∨ ((flag20 = 1) ∧ (flag21 = 1)) then
26:   upgradeD(z)
27:   send(“SETONE”) to R_{lm(z)}(z)
28: else if (flag20 = 1) ∧ (flag31 = 1) then
29:   ϒ1 := TRUE
30:   send(“UPGRADE”, list_l, list_r, tLnode, tRnode) to L_{lm(z)}(z)
31: else if (flag10 = 1) ∧ (flag31 = 1) then
32:   ϒ2 := TRUE
33:   nextCheck(list_l, list_r)
34: else if ((flag21 = 1) ∧ (flag30 = 1)) ∨ ((flag30 = 1) ∧ (flag31 = 1)) then
35:   ϒ1 := TRUE
36:   nextCheck(list_l, list_r)
37: end if

Further, from Theorem 11, if a downgrade operation results in an upgrade, then no further downgrade operation is possible. The temporary variables tLnode


Algorithm 15 procedure nextCheck(list_l, list_r)

1: if tLnode = −1 then
2:   b := extract(list_l)
3:   extract(list_r)
4:   if b = φ then
5:     send(“WRAPUP”) to HEADER
6:   else
7:     send(“CHECKUP”, list_l, list_r, tLnode, tRnode) to b
8:   end if
9: else
10:   send(“UPNEXT”, list_l, list_r, lm(z) + 1, tRnode) to tLnode
11: end if

Algorithm 16 Node z received message “UPGRADE” from node y

Input list_l, list_r, x (identifier of the previous left neighbor), y (identifier of the previous right neighbor)
Output Upgrade the node
1: tLnode := x
2: tRnode := y
3: upgradeD(z)

Algorithm 17 procedure upgradeD(z)

1: lm(z) := lm(z) + 1
2: if tLnode = −1 then
3:   L_{lm(z)}(z) := extract(list_l)
4:   R_{lm(z)}(z) := extract(list_r)
5: else
6:   L_{lm(z)}(z) := tLnode
7:   R_{lm(z)}(z) := tRnode
8: end if
9: send(“GETINFO”, FALSE, lm(z)) to L_{lm(z)}(z)
10: send(“GETINFO”, TRUE, lm(z)) to R_{lm(z)}(z)
11: if (L_{lm(z)}(z) = HEADER) ∧ (R_{lm(z)}(z) = NULL) then
12:   send(“WRAPUP”) to HEADER
13: end if

and tRnode keep track of whether there was any upgradation as a result of a downgrade operation. As discussed earlier, these two variables are set on receipt of a “CHECKUP” message, as shown in Algorithm 12. If tLnode is set (i.e., not −1), a further downgrade operation may be possible, so the node forwards an “UPNEXT” message to tLnode. Otherwise, the next node in list_l is extracted. If list_l is empty, the node sends a “WRAPUP” message to the HEADER to indicate the termination of the deletion procedure; otherwise it sends a “CHECKUP” message to the next node in list_l to check for any possible upgrade.
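Procedure nextCheck only decides where control goes next. In the sketch below (hypothetical name next_check), the lists are plain Python lists, extract is modelled as removing the front element so that lower-level neighbors are handled first, and −1 marks an unset tLnode as in Algorithm 10. The front-first order is our assumption, since the algorithm does not state which end extract uses.

```python
def next_check(t_lnode, t_rnode, list_l: list, list_r: list, lm: int):
    """(destination, message) produced by procedure nextCheck (Algorithm 15).

    t_lnode == -1 means "not set". extract() is modelled as popping the
    front of the list, i.e. lower levels are visited first (an assumption).
    """
    if t_lnode == -1:
        b = list_l.pop(0) if list_l else None   # b := extract(list_l); None plays φ
        if list_r:
            list_r.pop(0)                       # extract(list_r)
        if b is None:
            return ("HEADER", ("WRAPUP",))
        return (b, ("CHECKUP", list_l, list_r, t_lnode, t_rnode))
    return (t_lnode, ("UPNEXT", list_l, list_r, lm + 1, t_rnode))
```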

The procedure upgradeD is shown in Algorithm 17. The node first increments its maximum level by one. If this upgrade is the result of a previous downgrade operation, then, according to Theorem 13, its new left and right neighbors are the ones stored in the variables tLnode and tRnode. Otherwise, the left and right neighbors are extracted from list_l and list_r. The node sends “GETINFO” messages to its left and right neighbors to check for further upgrades. If its new left neighbor is the HEADER and its new right neighbor is the NULL, no further upgrade is required, and the node sends a “WRAPUP” message to the HEADER. Note that, because of the deletion procedure, the HEADER may need to decrement its maximum level if it points directly to the NULL at its maximum level. This is done on receipt of a “WRAPUP” message.

The deletion algorithm is illustrated with the example shown in Fig. 9. Suppose the node with key 4 is to be deleted. It sends its right-neighbor information to its left neighbors: it forwards an “RNEXT” message to the node with key 2 and an “UPNEXT” message to the HEADER, and these nodes update their neighbors. Here the HEADER has parallel links but cannot itself be downgraded, so it sends a “DOWNGRADE” message to the node with key 7. This node gets downgraded and checks for a possible upgrade by sending “CHECKST” messages. It receives a “RESPONSE” message with parameter 3 from its left neighbor and another with parameter 1 from its right neighbor, so it sends a “CHECKUP” message to the node with key 10. This node repeats the same procedure and receives “RESPONSE” messages with parameters 1 and 2 from its left and right neighbors, respectively, so it gets upgraded. Then a “CHECKUP” message is sent to the node with key 2, which follows the same procedure and receives a “RESPONSE” message with parameter 3 from both neighbors. No upgrade is required at this point, and list_l is also empty, so the node sends a “WRAPUP” message to the HEADER. The resulting skip list, shown in Fig. 9d, is a stable deterministic 1–2 skip list.

The following theorems show the correctness of the deletion procedure.

Theorem 14 Eventually the deletion operation terminates.

Proof According to Theorem 12, the downgrade operation terminates eventually. Once the downgrade operation has terminated, the upgrade operation starts at the level-1 neighbors of the deleted node. The termination of the upgrade phase can be proved using the same argument as in Theorem 5.


Fig. 9 Deletion on a deterministic distributed 1–2 skip list: panels (a)–(d) show the deletion of key 4 (message labels: (RNEXT,1,addr(5)), (UPNEXT,1,l1,l2,addr(7)), DOWNGRADE, (CHECKST,2), (RESPONSE,1), (RESPONSE,2), (CHECKUP))

Once the upgrade phase terminates, the deletion procedure terminates.

Theorem 15 After deleting a node from the skip list theproperties of 1–2 skip list hold true.

Proof After a node v is deleted from a stable 1–2 skip list, all duplicate links in the skip list are removed by the downgrade operations. According to Theorem 12, the downgrade operation terminates after all possible downgrades, and the upgrades caused by them, are done. The upgradation at step 3 of the deletion algorithm ensures the correct level property based on the conditions (1a)–(1i). As these conditions are sufficient, the deletion algorithm terminates with a correct 1–2 skip list.

6.2 Complexity analysis

Theorem 16 The message complexity of the deletion oper-ation is O(log n).

Proof According to Theorem 1, there are O(log n) levels in an n-node 1–2 skip list, so downgrade operations can occur at most O(log n) times. Every downgrade operation requires forwarding one “UPNEXT” message, so the downgrade phase needs O(log n) messages. Every upgrade requires one pair of “RESPONSE” messages. From Theorem 13, checking for upgrades needs no extra search operation, since neighbors can be extracted directly from list_l, which is piggybacked on the “UPNEXT” message. So the upgrade phase needs another O(log n) messages in the worst case, and the message complexity of the deletion procedure is O(log n).
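Collected into one bound, following the counting in this proof plus the link updates of Algorithm 7 (lm(v) − 1 RNEXT messages, each answered by one LNEXT), the deletion cost looks as follows. The symbols are our notation: S(n) is the cost of the initial search from the HEADER, h(n) = O(log n) the number of levels, and c a constant number of messages per upgrade.

$$M_{\mathrm{del}}(n) \;\le\; \underbrace{S(n)}_{\text{search}} + \underbrace{2\,h(n)}_{\text{RNEXT/LNEXT}} + \underbrace{h(n)}_{\text{UPNEXT}} + \underbrace{c\,h(n)}_{\text{upgrades}} \;\in\; O(\log n).$$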

7 Simulation results

The proposed scheme is simulated using the NS-2.34 [29] network simulator framework. A random physical topology of peers (from 20 to 4000 peers) is constructed according to a Poisson distribution with a mean connectivity probability of 0.5; that is, on average 50% of the nodes are directly connected to each other, and the remaining nodes are at most two hops apart. This type of lower-layer connectivity is assumed in order to reduce the effect of lower-layer delay on the performance of the proposed protocol. Two nodes act as the HEADER and the NULL, and a virtual skip list is built on top of this topology using the NS-2 application-layer message-passing environment. The key values are chosen randomly without any duplicates. For search, insertion and deletion, random P2P clients are chosen using random key values, and each experiment is repeated 20 times to ensure a tight confidence interval. The graphs show the average metric value as well as the confidence factor (the difference between the maximum and minimum values). The simulation metrics are the number of messages required to perform an operation and the delay of the operation. For the search operation, the delay is defined as the time from when the search is triggered at the HEADER until the result (or error) is returned to the HEADER. For the insertion and deletion operations, the delay is defined as the time required to stabilize the whole skip list. This delay metric is compared with other data structures used for data management in peer-to-peer systems, such as DHT, DST, the distributed list [10] and distributed trees [9, 16, 37, 41].
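The aggregation of each metric over repeated runs can be illustrated with a minimal Python sketch. It assumes the per-run values (message counts or delays) have already been collected; `summarize_runs`, `search_delay` and `RUNS` are hypothetical names and not part of the authors' NS-2 scripts.

```python
from statistics import mean

RUNS = 20  # repetitions per configuration, as stated in the text


def summarize_runs(samples):
    """Return (average, confidence factor) for one metric over repeated runs.

    The confidence factor follows the text's definition: the difference
    between the maximum and the minimum observed value.
    """
    if not samples:
        raise ValueError("at least one run is required")
    return mean(samples), max(samples) - min(samples)


def search_delay(t_triggered, t_returned):
    """Search delay: from triggering at the HEADER until the result
    (or error) is returned to the HEADER. Times are in seconds."""
    if t_returned < t_triggered:
        raise ValueError("result cannot return before the search is triggered")
    return t_returned - t_triggered
```

For example, with illustrative (not simulated) values, `summarize_runs([3, 5, 4])` returns an average of 4 and a confidence factor of 2.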

7.1 Performance of the search operation

Figure 10 shows the average message count of the search operation versus the number of P2P clients. The average message count stays below the theoretical upper bound and follows a logarithmic curve. Figure 11 compares the average delay of the search operation for different data structures used in peer-to-peer systems, and Fig. 12 shows a magnified version of the graph for fewer than 1000 P2P clients. As expected, the DHT has a lower delay than the proposed scheme. Although the delay for DST is initially lower than that of the proposed scheme, it tends to increase once the number of P2P clients exceeds 100. Thus the proposed structure performs better than DST with a higher number of P2P clients. Further, as seen from the figure, the proposed scheme performs better than the distributed list and distributed tree based data structures.

Fig. 10 Average message count in search operation (x-axis: number of P2P clients, 0 to 4000; y-axis: number of messages; curves: theoretical upper bound and average messages)

Fig. 11 Average delay in search operation (x-axis: number of P2P clients, 0 to 4000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST, distributed list, distributed balanced tree)

7.2 Performance of the insert operation

Figure 13 shows the number of messages required for an insertion with respect to the number of P2P clients in the list. For an insertion operation, the average number of messages required is less than the theoretical upper bound and follows a logarithmic curve. Figure 14 compares the insertion delay for the different data structures, and Fig. 15 shows a magnified version of the result for fewer than 1000 P2P clients. As before, the distributed list initially performs better than the proposed scheme, but performs poorly when the number of P2P clients is high because of its linear nature. The proposed scheme performs better than the distributed list and distributed tree based data structures, as those structures require a significant amount of time for stabilization.

Fig. 12 Average delay in search operation (Magnified) (x-axis: number of P2P clients, 0 to 1000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)


Fig. 13 Average message count in insert operation (x-axis: number of P2P clients, 0 to 4000; y-axis: number of messages)

7.3 Performance of the delete operation

For the deletion operation, a skip list was built using the insertion algorithm, with each client's key generated randomly from a uniform distribution with minimum value 2^10 and maximum value 2^200. After that, one randomly chosen client was deleted from the skip list, and the number of required messages and the repair delay were averaged after each deletion. Two graphs are plotted from the simulation data: the number of P2P clients versus the average message count, and the number of P2P clients versus the average delay. These are shown in Figs. 16 and 17, respectively.

Figure 16 plots the average message count against the number of P2P clients. The average message count is on par with the theoretical upper bound on the messages required for the deletion operation, and the curve is bounded by a logarithmic curve, which supports the theoretical analysis of the algorithm. Figure 17 compares the delay for list stabilization after a client is deleted, and Fig. 18 shows a magnified version of the results for fewer than 1000 P2P clients. As before, the distributed list performs better when the number of P2P clients is small. The proposed scheme outperforms DST with a higher number of P2P clients. Further, the deletion delay is significantly lower than that of the distributed list and distributed tree based data structures.

Fig. 14 Average delay in insert operation (x-axis: number of P2P clients, 0 to 4000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)

Fig. 15 Average delay in insert operation (Magnified) (x-axis: number of P2P clients, 0 to 1000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)

7.4 Performance for range queries

Figures 19 and 20 compare the average delay of range queries in two data structures: the distributed 1–2 skip list and the DHT. In the figures, the x-axis shows the key range of the query, the y-axis shows the number of P2P clients, and the z-axis shows the average delay of query execution. For range queries in a DHT, every key must be queried independently, as the DHT does not preserve the semantic locality of the keys. For the distributed 1–2 skip list, however, only two queries need to be executed: the query for the minimum key and the query for the maximum key. All other keys lie between these two results because the 1–2 skip list preserves semantic locality. The figures show that when the key ranges in the queries are very small, DHT performs better than the distributed 1–2 skip list; with large key ranges, however, the distributed 1–2 skip list performs significantly better than DHT. In particular, the proposed distributed 1–2 skip list performs better than DHT when the key range of the queries exceeds 30. With 4000 clients and a range query of 48 keys, the query execution delay of the distributed 1–2 skip list is less than 50% of the delay observed for DHT.

Fig. 16 Average message count in delete operation (x-axis: number of P2P clients, 0 to 4000; y-axis: number of messages)

Fig. 17 Average delay in delete operation (x-axis: number of P2P clients, 0 to 4000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)
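The difference between the two range-query strategies can be modelled on a single machine with the Python sketch below. It is not the distributed protocol: `bisect` on a sorted Python list stands in for the O(log n) skip-list search, integer keys are assumed, and `skip_list_range` and `dht_range` are hypothetical names. The sketch counts search operations rather than messages and does not count the cost of collecting the keys that lie between the two boundaries.

```python
import bisect


def skip_list_range(sorted_keys, lo, hi):
    """Model of a range query on a key-ordered structure.

    Two boundary searches locate the smallest key >= lo and the position just
    past the largest key <= hi; the keys in between are then read in order.
    Returns (matching keys, number of search operations issued).
    """
    if lo > hi:
        return [], 0
    start = bisect.bisect_left(sorted_keys, lo)  # search for the minimum key
    end = bisect.bisect_right(sorted_keys, hi)   # search for the maximum key
    return sorted_keys[start:end], 2


def dht_range(stored_keys, lo, hi):
    """Model of the same query on a DHT: hashing destroys key order, so
    every key value in [lo, hi] is looked up independently.
    Returns (matching keys, number of lookups issued)."""
    if lo > hi:
        return [], 0
    hits = [k for k in range(lo, hi + 1) if k in stored_keys]
    return hits, hi - lo + 1


keys = [3, 8, 15, 21, 42, 57]
print(skip_list_range(keys, 10, 50))  # ([15, 21, 42], 2)
print(dht_range(set(keys), 10, 50))   # ([15, 21, 42], 41)
```

Both strategies return the same keys; only the number of independent searches differs, and in this model that number grows with the key range only for the DHT.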

8 Effectiveness of using distributed 1–2 skip list overDHT in peer-to-peer system

Fig. 18 Average delay in delete operation (Magnified) (x-axis: number of P2P clients, 0 to 1000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)

Fig. 19 Average delay for range queries (1–2 skip list) (3-D plot; x-axis: key range, 0 to 60; y-axis: number of P2P clients, 0 to 4000; z-axis: average delay)

The simulation results show that for individual one-shot queries, DHT performs better than the proposed distributed 1–2 skip list. However, for range queries with large enough key ranges, DHT performs very poorly because of the network overhead introduced by flooding the search messages. The proposed distributed 1–2 skip list performs very efficiently in this regard. We believe that the distributed 1–2 skip list can provide a better alternative to DHT in peer-to-peer systems, for the following reasons.

(i) The exact performance of any data structure over the overlay network depends on the physical connectivity. As discussed earlier, the simulation setup assumes a maximum two-hop physical distance between any two peers to avoid the delay introduced by physical connectivity. However, Fig. 21 shows the performance of the data structures when the physical-layer connectivity is a chain topology, one of the extreme connectivity scenarios. In this scenario the performance of DHT and the skip list is almost the same. As discussed in [25], DHT-based systems guarantee that any data object can be located within a small number, O(log n), of overlay hops, where n is the number of peers. The proposed distributed 1–2 skip list can also find a data object within O(log n) overlay hops. However, the number of underlay hops (actual physical connectivity) may be far larger than the number of overlay hops, which can affect the actual delay of search operations, as evident from Figs. 11 and 21. The insert and delete operations show similar behavior in extensive simulations. Therefore, on average, the proposed 1–2 skip list can perform as well as a DHT-based system.

Fig. 20 Average delay for range queries (DHT) (3-D plot; x-axis: key range, 0 to 60; y-axis: number of P2P clients, 0 to 4000; z-axis: average delay)

Fig. 21 Average delay in search operation in a chain network (x-axis: number of P2P clients, 0 to 1000; y-axis: average delay (sec); curves: 1–2 skip list, DHT, DST)

(ii) Most peer-to-peer systems for large-scale data management that involve range queries require high-dimensional, multi-attribute queries, as analyzed in [3, 18, 35]. For such high-dimensional multi-attribute range queries, the key range to be searched is very large, likely O(n), where n is the number of peers in the network. For such queries DHT performs very poorly, as the simulation results show, whereas the proposed distributed 1–2 skip list scales well.

(iii) Most works in the literature suggest using distributed tree structures [9, 16, 22, 32, 37, 41] to support range queries in peer-to-peer systems. The proposed data structure performs better than distributed tree structures.

(iv) The proposed data structure scales to a large number of peers in the network. Even with a large number of peers, the time required for single-shot operations levels off and does not increase as the number of peer nodes grows.
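Item (i) above rests on the gap between overlay hops and underlay hops. The following Python sketch makes that distinction concrete under simplifying assumptions: every physical link costs one hop, an overlay route is given as the list of peers it visits, and `chain`, `underlay_hops` and `route_cost` are hypothetical helpers, not part of the NS-2 setup. In a chain, a route with only a few overlay hops can still cross many physical links.

```python
from collections import deque


def chain(n):
    """Physical chain topology: peer i is linked only to peers i-1 and i+1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def underlay_hops(adj, src, dst):
    """Shortest number of physical (underlay) hops between two peers, via BFS."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adj[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError(f"peer {dst} is unreachable from peer {src}")


def route_cost(adj, overlay_route):
    """Total underlay hops of an overlay route, given as the sequence of peers visited."""
    return sum(underlay_hops(adj, a, b) for a, b in zip(overlay_route, overlay_route[1:]))


adj = chain(16)
route = [0, 8, 12, 15]                          # 3 overlay hops
print(len(route) - 1, route_cost(adj, route))   # 3 15
```

On a fully connected physical graph the same route would cost exactly its 3 overlay hops, which is why the two-hop simulation setup hides most of the underlay effect.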

As a whole, the proposed distributed 1–2 skip list provides a flexible and scalable data structure that performs similarly to DHT for single-shot queries, insertion and deletion, while providing a significant improvement for range queries over large key spaces. Therefore, the proposed distributed 1–2 skip list can be used as an effective alternative to DHT in peer-to-peer systems where range queries are frequently used to process and fetch data items. To the best of our knowledge, many present-day peer-to-peer applications use multi-attribute range queries [4, 6, 14, 38, 40], where the proposed data structure can be explored effectively.

9 Discussion

From the simulation analysis, it has been observed that the proposed scheme outperforms distributed list and distributed tree based data structures. Distributed lists are seldom used for peer-to-peer data management, as they require O(n) messages for a node insertion and a node deletion. Though distributed tree based data structures require a logarithmic number of messages for stabilization after an insertion or a deletion, the hidden cost of tree management (the extra cost of physical message passing, as child nodes may not be direct neighbors of their parents) is higher for the distributed tree than for the proposed scheme. The proposed scheme has the properties of both a list and a tree, and thus reduces the hidden cost of tree management through semantic locality. Further, the list property of the proposed scheme allows some local information to be pre-computed (as described in Theorem 13), which helps reduce control messages. This makes the proposed data structure a good alternative for data management in peer-to-peer systems. As seen from the simulation results, the distributed deterministic 1–2 skip list also outperforms DST even with a large number of nodes. Because of its list properties, the proposed scheme can also implement range queries with the help of two search operations.

In the proposed set of algorithms, insertion and deletion procedures cannot run concurrently. The search procedure, however, can run concurrently, as it does not modify the node structure. For insertion and deletion operations, serializability can be achieved with the cooperation of the HEADER. Every insertion and deletion procedure is initiated by the HEADER, so the HEADER can keep a list of requests to serialize the insertion and deletion operations: it starts executing the next procedure only when the current insertion or deletion has completed. As the system is assumed to be reliable with no sudden failures, concurrency in the system can be achieved with the cooperation of the HEADER.
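The HEADER-based serialization described above can be sketched as a single-process Python model. This is an illustration under assumptions, not the authors' implementation: the `Header` class, its method names and the FIFO order of the request list are hypothetical, and message passing is abstracted into method calls.

```python
from collections import deque


class Header:
    """Toy model of how the HEADER could serialize insertions and deletions.

    Hypothetical: the text only states that the HEADER keeps a list of
    requests and starts the next one after the current one completes.
    """

    def __init__(self):
        self.pending = deque()  # waiting insert/delete requests, FIFO order
        self.active = None      # insert/delete currently being executed
        self.log = []           # order in which operations were started

    def submit(self, op, key):
        if op == "search":
            # Searches do not modify the node structure, so they start at
            # once, even while an insertion or deletion is in progress.
            self.log.append((op, key))
        elif op in ("insert", "delete"):
            self.pending.append((op, key))
            self._start_next()
        else:
            raise ValueError(f"unknown operation: {op}")

    def complete(self):
        """Called when the active insertion/deletion has stabilized the list."""
        if self.active is None:
            raise RuntimeError("no operation in progress")
        self.active = None
        self._start_next()

    def _start_next(self):
        # Start the next structural operation only if none is running.
        if self.active is None and self.pending:
            self.active = self.pending.popleft()
            self.log.append(self.active)
```

For example, submitting an insertion, then a deletion, then a search starts the insertion and the search immediately, while the deletion waits until `complete()` reports that the insertion has finished.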

Note that the HEADER is responsible only for initiating a search, an insertion or a deletion procedure by forwarding the message to the node that can start the procedure. Once control passes to the node responsible for the remaining steps, the HEADER does not intervene in the execution of the algorithm. Therefore the HEADER does not become overloaded by the algorithmic execution of the search, insertion or deletion procedures. However, whenever multiple requests arrive simultaneously, the HEADER controls the serializability of the operations. Alternatively, distributed control for serializability can be designed over the proposed algorithms by incorporating lock-based concurrency management [21]. However, lock-based concurrency control can result in deadlock or starvation, which should be addressed separately; these issues are therefore left for further study.

Based on the proposed set of algorithms, the reliability assumption can be relaxed in a future extension of this work. In a distributed environment, there can be two types of node failures: transient and permanent. Further, a failure can occur either during the execution of the algorithms or while the network is stable. Under stable network conditions, a permanent failure can be handled through a node deletion, and a transient failure can be handled using a node deletion followed by a node join. When a node detects the failure of its left neighbor, it can initiate the repair procedure by executing Algorithm 8. However, to handle node failures during the execution of the algorithm, every node should maintain a state variable that stores its previous state before an upgrade or downgrade. On detecting a failure, the nodes should revert to their previous state and reinitialize the insertion or deletion procedure. The detailed state maintenance, as well as the correctness and termination conditions of this method, are out of the scope of this paper and left for future extension of this work.
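The state-saving idea for failures during execution can be sketched as follows. The sketch assumes that a node's state is just its level (neighbor pointers are omitted) and that a single saved copy suffices; `SkipNode`, `commit` and `rollback` are hypothetical names, and the correctness and termination questions the text defers are not addressed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkipNode:
    """Hypothetical per-node state; only the level is modelled."""
    key: int
    level: int = 0                     # current height of the node
    saved_level: Optional[int] = None  # level before the in-flight upgrade/downgrade

    def upgrade(self):
        self.saved_level = self.level  # remember the previous state first
        self.level += 1

    def downgrade(self):
        if self.level == 0:
            raise ValueError("cannot downgrade below level 0")
        self.saved_level = self.level  # remember the previous state first
        self.level -= 1

    def commit(self):
        # The operation finished without failure: the saved state is no longer needed.
        self.saved_level = None

    def rollback(self):
        # On detecting a failure, revert to the state saved before the change.
        if self.saved_level is not None:
            self.level = self.saved_level
            self.saved_level = None
```

After a rollback, the insertion or deletion would be reinitialized, as described above; this sketch covers only the local revert step.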

10 Conclusion

This paper proposes an alternative data structure, called the deterministic 1–2 skip list, for data management in peer-to-peer systems. The insertion and deletion algorithms for maintaining the data structure have been proposed, analyzed and proved correct using theorems and lemmas. The message complexity of all the algorithms has also been calculated. In a distributed environment, the worst-case message complexity of insertion and deletion has been shown to be O(log n), so the maintenance cost of this data structure is upper-bounded by O(log n). The algorithm for the search operation on this data structure is also given in this paper; its cost is O(log n). The proposed data structure can effectively execute range queries with the help of two search operations. The proposed data structure is simulated using the NS-2.34 network simulator, and its performance is compared with other similar structures such as DHT, DST, distributed lists and distributed trees. The proposed data structure is novel in distributed message-passing applications like peer-to-peer and overlay systems, and can be effectively used to solve the search problem in such environments. The proposed algorithms can be extended to support multiple concurrent operations and to perform correctly in systems with arbitrary failures, which is left as a future course of study.

References

1. Androutsellis-Theotokis S, Spinellis D (2004) A survey of peer-to-peer content distribution technologies. ACM Comput Surv 36:335–371

2. Aspnes J, Shah G (2007) Skip graphs. ACM Trans Algorithms 3(4):37:1–37:25

3. Bharambe AR, Agrawal M, Seshan S (2004) Mercury: supporting scalable multi-attribute range queries. In: ACM SIGCOMM computer communication review, vol 34. ACM, pp 353–366

4. Bharambe AR, Agrawal M, Seshan S (2004) Mercury: supporting scalable multi-attribute range queries. SIGCOMM Comput Commun Rev 34(4):353–366

5. Boldi P, Vigna S (2005) Compressed perfect embedded skip lists for quick inverted-index lookups. In: Proceedings SPIRE 2005. Lecture Notes in Computer Science, pp 25–28

6. Cai M, Frank M (2004) RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on World Wide Web. ACM, pp 650–657

7. Clement J, Herault T, Messika S, Peres O (2008) On the complexity of a self-stabilizing spanning tree algorithm for large scale systems. In: Proceedings of the 2008 14th IEEE pacific rim international symposium on dependable computing, pp 48–55

8. Clouser T, Nesterenko M, Scheideler C (2008) Tiara: a self-stabilizing deterministic skip list. In: Proceedings of the 10th international symposium of stabilization, safety, and security of distributed systems, vol 5340, pp 124–140

9. Crainiceanu A, Linga P, Gehrke J, Shanmugasundaram J (2004) Querying peer-to-peer networks using p-trees. In: Proceedings of the 7th international workshop on the web and databases: colocated with ACM SIGMOD/PODS 2004. ACM, pp 25–30

10. Crainiceanu A, Linga P, Machanavajjhala A, Gehrke J, Shanmugasundaram J (2011) Load balancing and range queries in p2p systems using p-ring. ACM Trans Internet Technol 10(4):16:1–16:30

11. Dabek F, Zhao B, Druschel P, Kubiatowicz J, Stoica I (2003) Towards a common API for structured peer-to-peer overlays. In: Proceedings of international workshop on peer-to-peer systems

12. Dolev S, Kat RI (2004) Hypertree for self-stabilizing peer-to-peer systems. In: Proceedings of the network computing and applications. Third IEEE international symposium, pp 25–32

13. El-Sana J, Azanli E, Varshney A (1999) Skip strips: maintaining triangle strips for view-dependent rendering. In: Proceedings of the conference on visualization ’99: celebrating ten years, pp 131–138

14. Ganesan P, Yang B, Garcia-Molina H (2004) One torus to rule them all: multi-dimensional queries in P2P systems. In: Proceedings of the 7th international workshop on the web and databases. ACM, pp 19–24

15. Ge T, Zdonik S (2008) A skip-list approach for efficiently processing forecasting queries. Proc VLDB Endow 1(1):984–995

16. Gonzalez-Beltran A, Milligan P, Sage P (2008) Range queries over skip tree graphs. Comput Commun 31(2):358–374


17. Goodrich MT, Tamassia R (2001) Efficient authenticated dictionaries with skip lists and commutative hashing. Tech. rep., Johns Hopkins Information Security Institute

18. Gupta A, Agrawal D, Abbadi AE (2003) Approximate range selection queries in peer-to-peer systems. In: Proceedings of the first biennial conference on innovative data systems research CIDR, vol 2003

19. Hanson EN, Johnson T (1992) The interval skip list: a data structure for finding all intervals that overlap a point. In: Proceedings of the 2nd workshop on algorithms and data structures, pp 153–164

20. Harvey NJA, Jones MB, Saroiu S, Theimer M, Wolman A (2003) Skipnet: a scalable overlay network with practical locality properties. In: Proceedings of the 4th conference on USENIX symposium on internet technologies and systems

21. Herlihy MP, Weihl WE (1991) Hybrid concurrency control for abstract data types. J Comput Syst Sci 43(1):25–61

22. Jagadish H, Ooi BC, Tan KL, Vu QH, Zhang R (2006) Speeding up search in peer-to-peer networks with a multi-way tree structure. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM, pp 1–12

23. Jagadish HV, Ooi BC, Vu QH (2005) BATON: a balanced tree structure for peer-to-peer networks. In: Proceedings of the 31st international conference on very large data bases, pp 661–672

24. Jannotti J, Gifford DK, Johnson KL, Kaashoek MF, O’Toole JW Jr (2000) Overcast: reliable multicasting with an overlay network. In: Proceedings of the 4th conference on symposium on operating system design & implementation

25. Lua EK, Crowcroft J, Pias M, Sharma R, Lim S (2005) A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun Surv Tutor 7(2):72–93

26. Mandal S, Chakraborty S, Karmakar S (2012) Deterministic 1–2 skip list in distributed system. In: Proceedings of the second IEEE international conference on parallel, distributed and grid computing

27. Munro JI, Papadakis T, Sedgewick R (1992) Deterministic skip lists. In: Proceedings of the third annual ACM-SIAM symposium on discrete algorithms, pp 367–375

28. Nor RM, Nesterenko M, Scheideler C (2011) Corona: a stabilizing deterministic message-passing skip list. In: Proceedings of the 13th international symposium of stabilization, safety, and security of distributed systems, pp 356–370

29. The network simulator NS-2.34. http://www.isi.edu/nsnam/ns/

30. Onus M (2009) Overlay network construction in highly decentralized networks. Ph.D. thesis, Tempe

31. Pugh W (1990) Skip lists: a probabilistic alternative to balanced trees. Commun ACM 33(6):668–676

32. Ramabhadran S, Ratnasamy S, Hellerstein JM, Shenker S (2004) Prefix hash tree: an indexing data structure over distributed hash tables. In: Proceedings of the 23rd ACM symposium on principles of distributed computing

33. Ratnasamy S, Francis P, Handley M, Karp R, Shenker S (2001) A scalable content-addressable network. In: Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications, pp 161–172

34. Rivest RL (1995) The RC5 encryption algorithm. In: Fast software encryption. Springer, pp 86–96

35. Shu Y, Ooi BC, Tan KL, Zhou A (2005) Supporting multi-dimensional range queries in peer-to-peer systems. In: Proceedings of the fifth IEEE international conference on peer-to-peer computing. IEEE, pp 173–180

36. Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications, pp 149–160

37. Tanin E, Harwood A, Samet H (2007) Using a distributed quadtree index in peer-to-peer networks. VLDB J 16(2):165–178

38. Trunfio P, Talia D, Papadakis H, Fragopoulou P, Mordacchini M, Pennanen M, Popov K, Vlassov V, Haridi S (2007) Peer-to-peer resource discovery in grids: models and systems. Futur Gener Comput Syst 23(7):864–878

39. Wang D, Liu J (2006) Peer-to-peer asynchronous video streaming using skip list. In: Proceedings of the IEEE international conference on multimedia and expo, pp 1397–1400

40. Yu M, Li Z, Zhang L (2007) Supporting multi-attribute queries in peer-to-peer data management systems. In: Proceedings of the eighth international conference on parallel and distributed computing, applications and technologies, pp 515–522

41. Zhang C, Krishnamurthy A, Wang RY (2005) Brushwood: distributed trees in peer-to-peer systems. In: Peer-to-Peer systems IV. Springer, pp 47–57

42. Zhang K, Wang S (2005) Linknet: a new approach for searching in a large peer-to-peer system. In: Proceedings of the 7th asia-pacific web conference on web technologies research and development, pp 241–246

43. Zheng C, Shen G, Li S, Shenker S (2006) Distributed segment tree: support range query and cover query over DHT. In: Proceedings of the fifth international workshop on peer-to-peer systems

Subhrangsu Mandal received his Bachelor of Engineering from Bengal Engineering and Science University, Shibpur, Howrah, India and his Master of Technology from Indian Institute of Technology Guwahati, India. He is currently working as a Software Engineer with Citrix Systems, Inc., and has previously worked with IBM. His research areas include Distributed Algorithms and Network Security.

Sandip Chakraborty received his Bachelor of Engineering from Jadavpur University, Kolkata, India and his Master of Technology from Indian Institute of Technology Guwahati, India. He is currently a doctoral student at Indian Institute of Technology Guwahati, India. He has received a research fellowship from TATA Consultancy Services, India. He is a student member of IEEE, IEEE Communications Society and ACM. His research areas include Wireless Ad Hoc and Mesh Networks, Wireless Sensor Networks, Distributed Algorithms and Performance Modeling of Communication Systems.



Sushanta Karmakar received his PhD in Computer Science and Engineering in 2010 from Indian Institute of Technology (IIT) Kharagpur, India. He received his M.E. and B.E., both in Computer Science and Engineering, from Jadavpur University in 2004 and 2001, respectively. Since December 2009 he has been serving as an Assistant Professor in the Department of CSE at Indian Institute of Technology Guwahati, Guwahati, India. His research interests are in Distributed Algorithms and Fault Tolerance.
