1612 ieee transactions on signal processing, vol. …acsp.ece.cornell.edu/papers/hetong07sp.pdfhe...

1612 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Detecting Encrypted Stepping-Stone ConnectionsTing He, Student Member, IEEE, and Lang Tong, Fellow, IEEE

Abstract—Stepping-stone attacks are often used by networkintruders to hide their identities. In a stepping-stone attack,attacking commands are sent indirectly to the victim througha chain of compromised hosts acting as “stepping stones.” Indefending against such attacks, it is necessary to detect step-ping-stone connections at the compromised hosts. The use ofencrypted connections by the attacker complicates the detec-tion problem and the attacker’s active timing perturbation andinsertion of chaff make it even more challenging. This paperconsiders strategies to identify stepping-stone connections whenthe attacker is able to encrypt the attacking packets and perturbtheir timing. Furthermore, the attacker can also add chaff packetsin the attacking stream. The paper first considers stepping-stoneconnections subject to packet-conserving transformations by theattacker. Two activity-based algorithms are proposed to detectstepping-stone connections with bounded memory or boundeddelay perturbation, respectively. These algorithms are proven tohave exponentially decaying false alarm probabilities if normaltraffic can be modelled as Poisson processes. It is shown thatthe proposed algorithms improve the performance of an existingstepping-stone detection algorithm. This paper then addresses thedetection of stepping-stone connections with both timing pertur-bation and chaff. Robust algorithms are developed to deal withchaff evasion. It is proven that the proposed robust algorithmscan tolerate a number of chaff packets proportional to the size ofthe attacking traffic, and have vanishing false alarm probabilitiesfor Poisson traffic. Simulations using synthetic data are usedto validate the theoretical analysis. Further results using actualInternet traces are shown to demonstrate the performance of theproposed algorithms.

Index Terms—Intrusion detection, network security, nonpara-metric detection, testing on point processes.

I. INTRODUCTION

TO EVADE surveillance, network attackers can hide theiridentity by launching the so-called stepping-stone attack

[1]. In such an attack, as illustrated in Fig. 1, the attacker com-promises a collection of hosts and uses these hosts as steppingstones to relay attacking commands. Because each connection

Manuscript received February 1, 2006; revised July 11, 2006. The associateeditor coordinating the review of this manuscript and approving it for publica-tion was Dr. Dominic K. C. Ho. This work was supported in part by The Teamfor Research in Ubiquitous Secure Technology (TRUST), which receives sup-port from the National Science Foundation under Award CCF-0424422 and thefollowing organizations: Cisco, ESCHER, HP, IBM, Intel, Microsoft, ORNL,Qualcomm, Pirelli, Sun and Symantec, and the U.S. Army Research Laboratoryunder the Collaborative Technology Alliance Program, Cooperative AgreementDAAD19-01-2-0011. The U.S. Government is authorized to reproduce and dis-tribute reprints for Government purposes notwithstanding any copyright nota-tion thereon. Part of this work was presented at the IEEE International Confer-ence on Acoustics, Speech, and Signal Processing (ICASSP) 2006, and at theConference on Information Sciences and Systems (CISS) 2006.

The authors are with the School of Electrical and Computer Engineering,Cornell University, Ithaca, NY 14853 USA (e-mail: [email protected];[email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2006.890881

Fig. 1. Stepping-stone attack.

is made by a separate remote login, a host in the chain can onlysee the identity of its immediate predecessor, and the victim onlysees the identity of the last host. Therefore, the identification ofattackers requires tracing the chain of stepping stones. A keycomponent in such tracing is the detection of stepping-stoneconnections. Donoho et al. proposed in [2] the use of step-ping-stone monitors at each gateway node for detection. A pairof incoming and outgoing streams is called a stepping-stone pairif it is part of a stepping-stone attack. Otherwise, it is calleda normal pair.1 The stepping-stone monitors try to discoverall stepping-stone pairs by examining the incoming-outgoingtraffic. In practice, the monitor has to make decisions by ob-serving live traffic, which may not include the beginning or theend of the connection. Therefore, it is desirable that the de-tection strategy does not require synchronization between in-coming and outgoing streams. Besides, the connections may beencrypted so that the monitor cannot rely on the content of thetraffic. Furthermore, a careful attacker may even actively modifythe traffic each time it passes through a host in order to confusethe monitor.

A. Related Work

Staniford-Chen and Heberlein [1] are the first to consider theproblem of detecting stepping-stone connections. Early tech-niques are based on the content of the traffic (see, e.g., [1] and[3]). These techniques, however, are not applicable to detectingencrypted connections. An alternative is to exploit timing char-acteristics of the traffic. Zhang and Paxson [4] propose to de-tect stepping-stone connections by matching the ending of “off”periods in different connections. Their approach requires thatthe connections are synchronized. Yoda and Etoh [5] proposean algorithm to identify streams with the same traffic patternbut unknown time shift. Wang et al. [6] propose to correlatestreams by examining packet interarrival times, and they showthat their method works well if connections on different pathshave distinctive timing characteristics. The drawback of theseapproaches is that they are vulnerable to active timing perturba-tion by the attacker.

There are a few results on detecting encrypted, timing per-turbed stepping-stone connections; see [2], [7], and [8]. Thekey assumption of these methods is that the attacker is able

1Formal definitions are given in Section II.

1053-587X/$25.00 © 2007 IEEE

HE AND TONG: DETECTING ENCRYPTED STEPPING-STONE CONNECTIONS 1613

to perform a packet-conserving transformation on his traffic,but the transformation is subject to certain constraints. Donohoet al. [2] are the first to consider the bounded delay perturba-tion, where there is a maximum tolerable delay for each at-tacking packet. Assuming that a stepping-stone pair is a renewalprocess and its relay (detailed analysis is done for Poisson pro-cesses), they show that substantial correlation can be revealedeven with timing perturbation. Wang and Reeves, in [7], take awatermark-based approach. They show how to correlate step-ping-stone connections with independent and identically dis-tributed perturbation by introducing watermark into packet in-terarrival times. Blum et al. [8] work along the same line as [2]except that they also assume that the attacker has a boundedpeak rate, and they remove the Poisson assumption on the at-tacking traffic. They propose a detection algorithm called “DE-TECT-ATTACKS” (DA) with no miss, and they are the firstto prove that their algorithm requires a polynomial number ofpackets to satisfy certain false alarm constraint.

A more general category of stepping-stone connections isthe one allowing nonpacket-conserving transformations. Here,the attacker has the ability to mix attacking traffic with nonat-tacking traffic, including dummy traffic called chaff, to evadedetection, or he can repacketize his traffic so that there is no

correspondence between arriving and departing packets.The repacketization is outside the scope of packet level detec-tion, and should be addressed at a lower level; the insertion ofchaff, however, has to be dealt with effectively. Peng et al., in[9], propose an active detection scheme which combines water-marking with packet matching to detect stepping-stone trafficin chaff. They assume packets have bounded delays, and chaffonly appears in the downstream flow. Their scheme injects wa-termarks in the upstream flow, and finds a subsequence in thedownstream flow, whose watermark is closest to the injectedone. Such an active scheme, however, requires the control ofthe stepping-stone host, and it also reveals the activities of thedetector to the attacker, allowing the attacker to compromisethe detector by studying its behavior. Donoho et al. [2] pointout that in principle it is possible to correlate stepping-stonetraffic even if both (bounded) delay and independent chaff areintroduced during the relay. Blum et al. [8] modify their algo-rithm DA into a new algorithm called “DETECT-ATTACKS-CHAFF” (DAC) to deal with chaff. DAC detects stepping-stonetraffic with a limited number of chaff packets by increasing thedetection threshold. The drawback is that such an increase inthe threshold leads to an increase in the false alarm probability,and the attacker can still evade detection by adding an arbi-trarily small fraction of chaff traffic. Indeed, a fixed numberof chaff packets can evade the detection for an attacking trafficof arbitrary size. In a recent paper [10], Zhang et al. proposepacket matching schemes to detect stepping-stone traffic withbounded delay perturbation and/or chaff. For a stepping-stonetraffic with bounded delay but without chaff, they propose a de-tection strategy similar to “DETECT-MATCH” [11], althoughthe detection performance on attacking traffic is not proven, andthey do not have a closed-form characterization for the falsealarm probability. For stepping-stone traffic with both boundeddelay and chaff, they propose a matching strategy which candetect stepping-stone traffic if the chaff is only inserted in the

departing stream. They prove that this strategy has exponen-tially decaying false alarm probability for independent Poissonstreams.

B. Summary of Results and Organization

In this paper, we consider the problem of detecting encryptedstepping-stone connections subject to the attacker’s activemodification. Our strategy does not use the content of thetraffic. Nor is synchronization or active traffic manipulationrequired. We first consider detecting stepping-stone pairs withbounded perturbation but no chaff, and then generalize ourdetection schemes to handle chaff packets. We formulate theproblem of detecting stepping-stone connections as a hypoth-esis testing of independent against correlated point processes.For the traffic perturbation by the attacker, we consider twotypes of constraints: 1) the host has bounded memory and 2)attacking packets have bounded delay. While the bounded delaycondition is a key in [2] and [8]–[10], the bounded memoryconstraint, to the best of our knowledge, has not been addressedin the literature.

Under the bounded memory assumption, we develop a linearcomplexity algorithm based on the maximum variation statistic.The intuition behind this algorithm is that the maximum vari-ation statistic stays bounded for relayed traffic going througha stepping-stone host with limited memory, but diverges un-boundedly for independent traffic. Under the bounded delayassumption, we derive a timing-based algorithm based on theidea of matching arriving packets with departing packets. Byrestricting the search to maps that preserve the order of packets,we reduce the complexity from exponential to linear. We provethat both of the proposed algorithms have no miss for their tar-geting stepping-stone pairs, and exponentially decaying falsealarm probabilities for independent Poisson processes. We thengeneralize the attacker model to allow the presence of chaff. Wedevelop two new algorithms for stepping-stone pairs with bothbounded memory or bounded delay perturbation and chaff. Theidea is to declare a stream pair normal if the optimal chaff-in-serting algorithm would have had to insert a certain fraction ofchaff packets to embed attacking packets into the given streampair. Therefore, the attacker will have to insert at least the samefraction of chaff to evade detection. The threshold on the frac-tion of chaff is chosen to be as large as possible to make the at-tacker’s evasion difficult, but also small enough so that the falsealarm probability will go to zero as the traffic size increases.

We next compare the performance of existing algorithms andthe proposed algorithms. To make the comparison, we analyzethe performance of algorithms DA and DAC proposed by Blumet al. [8]. The original analysis by Blum et al. focuses on samplesize, whereas our result is on error exponent analysis. Amongalgorithms dealing with packet-conserving transformations, weshow that the proposed variation-based algorithm has largerfalse alarm error exponent than the algorithm DA by Blumet al., and it also outperforms the proposed matching-basedalgorithm when the traffic is sufficiently fast. For slow traffic,however, the matching-based algorithm can be much better.For algorithms dealing with chaff, we compare our algorithmswith existing algorithms by Blum et al. and Zhang et al. [8],[10]. We show that, in contrast to the constant chaff tolerance of


the algorithms by Blum et al. and Zhang et al., our algorithmsare capable of handling an amount of chaff growing linearlywith the size of the attacking traffic. The price we paid is thatthe false alarm probabilities of our algorithms no longer haveexponentially decaying upper bounds, but are only guaranteedto be vanishing asymptotically.

The rest of this paper is organized as follows. Section II de-fines the detection problem. Section III presents a variation-based algorithm and its performance analysis under the boundedmemory assumption. Section IV develops a timing-based al-gorithm for stepping-stone pairs with bounded delay and an-alyzes its performance. In Section V, we present robust algo-rithms for stepping-stone pairs with both bounded perturbationand chaff, and analyze their robustness and asymptotic falsealarm probabilities. Section VI compares the performance ofthe proposed algorithms with several existing algorithms for de-tecting stepping-stone pairs with or without chaff, respectively.Section VII gives simulation results on both synthetic data andinternet traces to verify the performance. This paper is con-cluded by Section VIII with a few remarks on the applicationof such detection schemes.

II. PROBLEM STATEMENT

Let the packet arrivals on stream be represented by a pointprocess

where is the th arrival epoch of stream (if ,it is the th packet before the monitor starts). Let

be the set of the elements in. Let be the incoming and be the outgoing streams at a

particular gateway node. Normally, and are independent.If, however, is a relay of in a stepping-stone attack, thenthere will be strong correlation between them as formalized inthe following definition.

Definition 2.1: A pair of streams is a normal pair ifand are independent point processes. It is a stepping-stone

pair if there exists a bijection such thatfor any .

The bijection is a mapping between the arrival and thedeparture times of packets, allowing permutation of packetsduring the relay. The condition that is a bijection imposes apacket-conservation constraint, i.e., no packets are generatedor dropped at the stepping stones. The conditionis the causality constraint, which means that a packet cannotleave the host before it arrives.

If only a subsequence of consists of attackingpackets, then only that part is constrained, as stated in the fol-lowing definition.

Definition 2.2: A pair of streams is a stepping-stonepair with chaff if it is the superposition of a stepping-stone pair

and a pair of chaff streams (either or both ofthem can be empty).

Stream consists of dummy packets called chaffwhich do not need to arrive at the victim. Chaff packets canbe generated or dropped at any stepping stone hosts without

Fig. 2. (a) Cumulative counts N (w)(i = 1; 2). (b) Cumulative differenced(w) and the maximum variation v(w).

affecting the attack. They are artificially inserted by the attackerto evade detection.

We want to test the following binary hypotheses:

is a normal pair

is a stepping-stone pair

with or without chaff

by observing . This is a nonpara-metric hypothesis testing problem; no specific assumptions onthe statistical properties of are imposed at this point.Additional assumptions on normal and stepping-stone pairs willbe introduced later when detailed detection algorithms and anal-ysis are presented.

III. DETECTING STEPPING-STONE PAIRS WITH

BOUNDED MEMORY

We consider the problem of detecting stepping-stone pairswhen the host has bounded memory. Specifically, assume thatthe host’s memory can hold at most packets.2 Then, the dif-ference between the number of incoming and the number of out-going packets during any period can never exceed . We usethis property to define such stepping-stone pairs as follows.

Definition 3.1: A pair of streams is a stepping-stonepair with bounded memory if it is a stepping-stone pair, andfor any

To detect stepping-stone pairs with bounded memory, wederive a counting-based algorithm—DETECT-MAXIMUM-VARIATION (DMV).

Before presenting the algorithm, we need to introduce somedefinitions. Merge and and orderthe union as . Let be the numberof packets monitored in when the total number of monitoredpackets is , i.e.,

where is the indicator function. Sample paths of andare illustrated in Fig. 2(a).

2Similar requirement on buffer size has been considered by Giles and Hajekin the context of timing channels [12].


TABLE IDETECT-MAXIMUM-VARIATION (DMV)

Define the cumulative difference between and as

and let the maximum variation of be

See Fig. 2(b) for an illustration of and .If the stepping-stone host has bounded memory, then the

sample path of will have bounded variation. AlgorithmDMV distinguishes normal and stepping-stone pairs by lookingat the maximum variation. Specifically, note that

where is the differencein the number of incoming and the number of outgoing packetsbetween the th and the th arrivals. For stepping-stone pairswith bounded memory , this difference is bounded by ,i.e., . Using the maximum variation statistics,we define the following detector:3

o.w.

The detector can be implemented by the algorithm DMV shownin Table I.

Algorithm DMV has complexity and uses only constantmemory ( , to be precise).4 Any stepping-stone pairwith bounded memory will be detected after packets, i.e.,miss is totally avoided. We only need to be concerned about thefalse alarm probability, and it is bounded as follows.

Theorem 3.2: If normal pairs consist of independent Poissonprocesses, then the false alarm probability of DMV is boundedby

where . Furthermore, if the two Poisson pro-cesses have the same rates, then the upper bound is tight withrespect to the error exponent, i.e.,

Proof: See the Appendix.

3We use the convention that the detector gives the value 1 for H and 0 forH .

4The log in this paper is always natural logarithm.

Remarks: For a given false alarm probability , we can guar-antee that the false alarm probability of DMV is bounded byby making its upper bound equal to , yielding a sample size

(1)

which grows as as and .For example, if , (1) says that using 1196 packets willguarantee a false alarm probability no greater than 1%.

IV. DETECTING STEPPING-STONE PAIRS

WITH BOUNDED DELAY

Many stepping-stone attacks are interactive. In interactivestepping-stone attacks, the attacker waits for the feedback ofthe previous commands and sends new commands based onthe feedback. Therefore, the delay in such interactive attacksis usually bounded. In this section, we consider detectingstepping-stone pairs with bounded delay, which is defined asfollows.

Definition 4.1: A pair of streams is a stepping-stonepair with bounded delay if it is a stepping-stone pair, and

for any .Our definition of stepping-stone pair with bounded delay

is the same as the one proposed by Donoho et al. in [2]. Thebounded delay model is fundamentally different from thebounded memory model considered in Section III. It has beenshown that the two models have very different scaling behavioron the mutual information between the incoming stream andthe outgoing stream [12]. In Section VI-A-2, we will show thatthey also have difference detection performance with respect tochanges in traffic rates.

We derive a timing-based detection algorithm DETECT-MATCH (DM) to detect such stepping-stone pairs. AlgorithmDM matches the first packets in with their possible relaysin , subject to the maximum delay . For stepping-stonepairs with bounded delay, there must be at least one way ofmatching that satisfies causality and bounded delay constraints,matching the arrivals of packets with the departures of thesame packets. For normal pairs, however, such matching maynot be possible. Algorithm DM uses this property to detectstepping-stone pairs with bounded delay.

A few definitions are needed to present the algorithm. Defineto be the index of the first arrival epoch in on or after

time , i.e.,

For example, is the first epoch in on or after time(see Fig. 4).

Definition 4.2: A match between and is a collection ofpairs , where and , such that

and for any . A length- matchis valid if for all . A match

is order-preserving if implies ,for all .

From this definition, it is easy to see that a stepping-stonepair with bounded delay must have at least one valid match.


Fig. 3. More than one valid match: both the solid and the dotted lines are validmatches.

TABLE IIDETECT-MATCH (DM)

Thus, one way to detect such stepping-stone pairs is by lookingfor a valid match between the arrivals and the departures. Thecomplexity of this approach is, however, exponential.5 Insteadof searching for any valid match, we prove that it suffices tolimit our search to order-preserving, valid matches, as stated inthe following proposition.

Proposition 4.3: If is a valid match, then thereexists a valid match between and that is order-preserving.

Proof: As illustrated in Fig. 3, if is avalid match which does not preserve the order of packets, wecan switch the match to be such that it isstill valid but the order is preserved. By this idea, we can reorder

into . The match isvalid and order-preserving.

By Proposition 4.3, it suffices to consider only the matchesthat preserve the order of packets, and the problem is reduced tofinding the departure that corresponds to the first arrival. Withthis idea in mind, we develop the following detector:

if s.t. the match

is valid

o.w.

which is implemented by the algorithm DM as shown in Table II.To analyze the complexity of DM, note that the inner loop has

operations, and the number of such loops is at most 1 plusthe number of arrivals in the interval in . Thus, thecomplexity of DM is at most

#arrivals in in

Now, we analyze the performance of DM. We will show thatany stepping-stone pairs with bounded delay will be detected

5For example, if there are at most L departures during time �, then the ex-haustive search for a length-n valid match has complexity O(L ).

Fig. 4. Match of s : there are three possible candidates.

by , i.e., there is no miss. We have shown by Proposition4.3 that a stepping-stone pair with bounded delay must havean order-preserving, valid match, and the problem of finding avalid match is reduced to a simpler problem of finding (i.e.,the match of ). There are some constraints on the range of

. The first constraint is causality, which requires .The second is bounded delay. Since the monitor may not havestarted recording from the beginning of the streams, there maybe packets sent before the monitor starts and received after-wards. This phenomenon, however, can only occur during time

because of the bounded delay assumption. Thus, for anystepping-stone pair, has to be the relay of for some

. As a result, has to satisfy ,as shown in Fig. 4. Therefore, DM must be able to return “AT-TACK.”

Next, we show that for independent Poisson normal pairs, thefalse alarm probability goes to zero exponentially, as stated inthe following theorem.

Theorem 4.4: If and are independent Poisson processesof rates and , respectively, then the false alarm probabilityof DM is bounded by

where .Proof: See Appendix.

Remark: Theorem 4.4 gives a few insights into theproblem. Since , we haveif , i.e., DM almost never falsely accuses slowindependent Poisson traffic.

Intuitively, it is easier to match two processes of equal rates.This intuition is strengthened by Theorem 4.4 because

, where , and thus, the upper bound forPoisson traffic of equal rates is larger.

Similar to DMV, we can also estimate the sample size re-quired by DM to achieve a given false alarm probability bycalculating the value that makes the upper bound in Theorem4.4 equal to , i.e.,

For example, if and , then a match length682 suffices to guarantee a false alarm probability bounded by1%. Note that for this match length, DM needs up to

packets on the average to find a valid match.


TABLE IIIDBMC

V. DETECTING STEPPING-STONE PAIRS IN CHAFF

So far we have not considered how to detect stepping-stonetraffic in chaff. In practice, the attacker usually combines trafficperturbation with the insertion of chaff to evade detection. In thissection, we address how to detect bounded memory or boundeddelay stepping-stone pairs in the presence of chaff. The key isto allow a limited violation of constraints so that the detectionscheme can still distinguish stepping-stone pairs and normalpairs while tolerating certain amount of chaff.

A. Bounded Memory Case

As explained in Section III, the maximum variation ofstepping-stone pairs with bounded memory is alwaysbounded by . After inserting chaff, the attacker can make themaximum variation larger than . But with a limited number ofchaff packets, the maximum variation will still be much less thanthat of independent processes. Based on this idea, we propose analgorithm called “DETECT-BOUNDED-MEMORY-CHAFF”(DBMC) presented in Table III. Algorithm DBMC has com-plexity .

If is a pair of stepping-stone streams passing througha host with memory size , the counter in Table III countsthe number of times the memory would have been underflowed

or overflowed ifchaff had not been inserted. Algorithm DBMC makes detectionif the fraction is suspiciously small. Since no attackingpacket can violate the memory constraint (only chaff packetscan), the number of chaff packets is at least . Therefore, toevade DBMC, the attacker has to insert at least chaffpackets in every packets. We conclude that DBMC is robustagainst up to fraction of chaff.

B. Bounded Delay Case

For stepping-stone pairs with bounded delay, we can alwaysmatch the incoming packets with the outgoing packets (perhapsexcept for the first few outgoing packets) so that all matched pairssatisfy causality and bounded delay. When chaff is inserted, wemay not be able to match all the packets. If the attacker does notinsert enough chaff, however, the attacking traffic will have muchmore matched packets than normal traffic. The algorithm is pre-sented in Table IV. Algorithm DBDC has complexity .

TABLE IVDBDC

Algorithm DBDC is inspired by an optimal chaff-inserting al-gorithm called “BOUNDED-GREEDY-MATCH” (BGM) pro-posed by Blum et al. [8]. For every arrival at time , BGMmatches it with the first unmatched departure in ; ifthere is no departure in this interval or all the departures havebeen matched, BGM inserts a chaff packet at arrival ; BGM alsoinserts chaff at all the departures which have no arrivals to matchto. Algorithm DBDC uses a counter to record the number ofchaff packets which would have been inserted had BGM beenused, and reports alarm if the fraction is smaller than apredetermined value. It is shown in [8] that BGM inserts theminimum chaff to embed a stepping-stone pair with boundeddelay into arbitrary point processes.6 If the attacker wants tosend attacking packets through a host with delays bounded by

, he needs to insert at least chaff packets in everypackets to evade DBDC. Therefore, DBDC is robust against

up to fraction of chaff. Here, is a design parameterwhich is an upper bound of the rates of and .

VI. COMPARING THE ALGORITHMS

We have introduced techniques for detecting stepping-stonepairs with various constrained perturbations or a combinationof perturbation and chaff. In practice, stepping-stone pairs mayvary in what conditions they satisfy depending on the nature ofthe attacks. For certain types of stepping-stone pairs, more thanone detection algorithm are applicable. The question is how tocompare the performance of different algorithms in detectingsuch stepping-stone pairs.

A. Algorithms for Packet-Conserving Transformations

1) DMV Versus DA: Blum et al. [8] consider the detectionof stepping-stone pairs that satisfy both the bounded delay andthe bounded peak rate conditions. The underlying idea is thatin interactive stepping-stone attacks, usually not only is thedelay bounded, but the peak rate at which the attacker can issuepackets is also bounded because he needs time to process thefeedback and type new commands. Specifically, Blum et al.consider stepping-stone pairs with bounded delay , and themaximum number of arrivals within time is . The second

6The original proof in [8] is for independent binomial processes, but it holdsfor arbitrary processes.


condition, referred to as the bounded peak rate condition, isformalized in the following definition.

Definition 6.1: A stepping-stone pair has boundedpeak rate if , for all

.Let be the largest number of packets the attacker

can send during the maximum delay . Note that stepping-stonepairs with bounded delay and bounded peak rate always usebounded memory, as stated in the following proposition.

Proposition 6.2: Define be the number of packetson in an interval . For a stepping-stone pair withbounded delay and bounded peak rate, if is the maximumdelay, and is the maximum number of packets that the at-tacker can send within time , then

i.e., the stepping-stone pair uses bounded memory .Proof: See Appendix.

By Proposition 6.2, we conclude that stepping-stone pairswith bounded delay and bounded peak rate are also stepping-stone pairs with bounded delay and bounded memory. Note thatthe inverse is not true, i.e., bounded delay and bounded memorydo not imply bounded peak rate.

Blum et al. in [8] propose a detection algorithm called“DETECT-ATTACKS” (DA) to detect stepping-stone pairswith bounded delay and bounded peak rate. Algorithm DAdivides samples in into groups of size .For each group, it computes the cumulative difference in thatgroup. Then, DA returns “NORMAL” if there exists a groupwith cumulative difference greater than . The detector usingDA is defined as follows:

where

ifo.w.

where is the cumulative dif-ference for packets in the th group.

Blum et al. show that DA has no miss for stepping-stonepairs with bounded delay and bounded peak rate. Moreover, theyprove that packets are needed to guaranteea false alarm probability no more than .

We, however, are interested in the asymptotic behavior of DAin terms of error exponent. Note that [8] does not compute theerror exponent for the false alarm probability of DA. To obtainits error exponent, we introduce the following lemma.

Lemma 6.3: For independent Poisson normal pairs

and when is large enough

where and.

Fig. 5. Statistics used by DA and DMV.

Proof: See Appendix 1.If is large, we can apply Lemma 6.3 to each group of

samples to obtain the upper and lower bounds on thefalse alarm probability of that group. Note that in [8] it is proventhat the single group false alarm probability is upper bounded by

. Hence, the false alarm probability of one group is upperbounded by

ifif

Algorithm DA has a false alarm if all the groupshave false alarms,7 so for large , the total false alarm proba-bility satisfies

(2)

Therefore, for large , the false-alarm error exponent of DA isat most and at least .

We want to compare DA with DMV in detecting stepping-stone pairs with bounded delay and bounded peak rate. ByProposition 6.2, we have shown that such stepping-stone pairssatisfy the bounded memory condition. Thus, DMV also has nomiss. We now compare their false alarm probabilities.

We first point out that DMV always outperforms DA forany realization. One reason is that(see Fig. 5), and another is that DA restarts computation from

at the beginning of each group, whereas DMVkeeps increasing the maximum variation across groups.Therefore, for every realization, if DMV has a false alarm, DAmust have a false alarm too.

Next, we compare their false alarm probabilities. In partic-ular, we are interested in whose false alarm probability has alarger error exponent. From Theorem 3.2 and (2), we see thatthe false-alarm error exponent of DMV is , whereas thatof DA is at most . By Taylor expansionof the error exponents, we have that as

Therefore, for large , the false-alarm error exponent of DMVis at least 3.38 times larger than that of DA.

2) DM Versus DMV: For stepping-stone pairs with bothbounded memory and bounded delay, both DMV and DMcan be used. We are interested in which algorithm performs

7In DA, the sample size n is always a multiple of 2(M + 1) .


better; particularly, we want to compare their asymptoticperformance. Note that we need to give DMV and DM thesame sample size to make a fair comparison. If we definesample size as the total number of monitored packets inboth the incoming and the outgoing streams, then the samplesize required by DM to find a length- match is random;it is at most #departures in . For large , thesample size is approximately . Hence, we should compare

with .Theorems 3.2 and 4.4 suggest that for Poisson processes of

equal rates , DM is preferable if , i.e.,

(3)

Otherwise, DMV is preferable. For example, for and, the threshold is . This threshold phenom-

enon has an intuitive explanation. Algorithm DMV only uses therank statistics, so it does not depend on the rate of the traffic; onthe other hand, DM performs better on slower traffic and worseon faster traffic. The reason for the latter is that meansthe inter-arrival time , which is equivalent to having finiteinter-arrival time but , i.e., for extremely slow traffic, al-most perfect synchrony is required to raise an alarm, and thus,it is unlikely for DM to have false alarms. Similarly, if

, the inter-arrival time ; equivalently, it means havingnon-zero inter-arrival time but , i.e., the delay con-straint is essentially removed, which causes DM to always raisealarms. Therefore, when the traffic is sufficiently slow, DM out-performs DMV, and otherwise DMV performs better than DM.The comparison suggests that the bounded memory condition ismore informative than the bounded delay condition in detectingstepping-stone traffic for . Since theright-hand side merely grows as , the memory bound canbe advantageous even for modest rate and large memory. For ex-ample, for packets, 10 s, we only needpackets per second for the bound memory condition to providebetter detection performance.

B. Algorithms Dealing With Chaff

In Section I-A, we have mentioned several existing detec-tion schemes dealing with chaff evasion (see [8]–[10]). We nowcompare these schemes to our proposed algorithms DBMC andDBDC.

1) DBMC Versus DAC: Detecting stepping-stone traffic withboth bounded memory perturbation and chaff has not been ad-dressed in the literature to the best of our knowledge. In [8],Blum et al. propose an algorithm called “DETECT-ATTACKS-CHAFF” (DAC) for detecting a more restricted class of step-ping-stone traffic with bounded delay, bounded peak rate, andchaff. Algorithm DAC works exactly the same as DA exceptthat the group size is increased from to ,and the threshold is increased from to . It is shown in[8] that DAC is robust against up to chaff packets in every

packets, and for independent Poisson traffic, the falsealarm probability of DAC is bounded by .

By Proposition 6.2, DBMC is also applicable to the step-ping-stone traffic Blum et al. consider for DAC. We compare

their performance in terms of robustness and false alarm prob-ability. As stated in [8], the attacker can evade DAC by in-serting chaff packets in a group of packets.As the traffic size increases, the fraction of chaff needed toevade DAC becomes negligible. Algorithm DBMC, as arguedin Section V-A, is robust against a number of chaff packets con-stituting fraction the total stepping-stone traffic. Thedrawback is that we do not have an explicit upper bound on thefalse alarm probability of DBMC, whereas DAC has exponen-tially decaying false alarm probability.

2) DBDC Versus S-III [10]: For detecting stepping-stonetraffic with both bounded delay and chaff, Peng et al. [9] andZhang et al. [10] both provide partial solutions for the specialcase when chaff only appears in the outgoing traffic. Peng et al.[9] use a watermarking scheme which requires the detector toactively manipulate the traffic, and thus, falls outside the scopeof this paper. Zhang et al. [10] propose a scheme called “S-III”which matches every arrival at with the first unmatched de-parture in , and makes detection if all the arrivals aresuccessfully matched.8 Scheme S-III is proven to have exponen-tially decaying false alarm probability for independent Poissonprocesses. If the attacker can insert chaff in the incoming traffic,however, one chaff packet is enough to defeat S-III. AlgorithmDBDC, on the other hand, is applicable to cases when both theincoming and outgoing streams are subject to chaff insertion.Furthermore, DBDC is robust against chaff traffic of non-zerorate. Its weakness, similar to DBMC, is the lack of guarantee infalse alarm probability.

VII. NUMERICAL RESULTS

We simulate our algorithms on both synthetic data and tracesto verify their performance. For synthetic data, we use indepen-dent Poisson processes of equal rates as our normal pairs; thegoal of using synthetic data is to validate our analysis. For realdata, we use the traces LBL-PKT-4, which contains an hour’sworth of all wide-area traffic between the Lawrence BerkeleyLaboratory and the rest of the world. The traces were made byPaxson and were first used in his paper [13].

A. Simulations on Synthetic Data

In this section, we simulate DA, DMV, and DM on syntheticindependent Poisson processes to verify their false alarm proba-bilities. We let packets, 10 s, and vary the samplesize between 2500 and 5000 packets (including both incomingand outgoing packets).9 The performance of DA and DMV doesnot depend on the traffic rate because they only rely on the rel-ative order of packets. For DM, rate does play a significant roleand will be specified when it is necessary.

We have shown the advantage of DMV over DA andhave quantified their difference in terms of error exponentas in Section VI-A-1. We now show how theirperformance compares for finite . In Fig. 6, we plot the

8In [10], there is also a variation of S-III called “S-IV,” which makes deci-sion by comparing the minimum deviation among all the valid matches with athreshold. The false alarm probability of S-IV is no larger than that of S-III, butS-IV is also easy to be defeated by chaff in the incoming traffic.

9Note that since DA requires the sample size to be a multiple of 2(M+1) =3362 packets, we extend the sample size for DA to 6724.


Fig. 6. P (� ); P (� ); and their bounds; M = 40 packets, 100 000Monte Carlo runs.

Fig. 7. P (� ) under various rates; � = 10 s, 100 000 Monte Carlo runs.

simulated false alarm probabilities of DMV and DA, togetherwith the upper bound on from Theorem 3.2 andthe asymptotic upper and lower bounds on from(2). Simulation shows that the asymptotic bounds in (2) arevalid even for relatively small . Furthermore, itconfirms our claim that the false alarm probability of DMVdecays much faster than that of DA.

We simulate DM for different traffic rates .The simulation results are plotted in Fig. 7. The upper bounds inTheorem 4.4 for rates between 3 and 4.5 are close to 1; the ac-tual false alarm probabilities obtained from simulation are muchlower. The plot shows that the upper bound in Theorem 4.4 is nottight, but it correctly predicts the fact that increaseswith the increase of traffic rate, as argued in Section VI-A-2.

Furthermore, we make an overall comparison by plotting thesimulated false alarm probabilities of DA, DMV, and DM to-gether in Fig. 8. From the plot it is clear that the comparison

Fig. 8. P (� ); P (� ) and P (� ); M = 40 packets, � = 10 s,100 000 Monte Carlo runs.

between DM and DMV depends on the traffic rate. In our sim-ulation, , the threshold rate estimated by (3)is about 1.0375. The simulation verifies the existence of such athreshold rate because the false alarm probability of DM decaysfaster than that of DMV for and slower for .Note, however, that in the estimation of the threshold rate we areconservative about DM. This is because for DMV, Theorem 3.2gives the exact error exponent, whereas for DM, Theorem 4.4only characterizes a lower bound on its error exponent (whichis shown to be not tight). Therefore, we expect that the actualthreshold rate is larger than the one estimated by (3), e.g., in thesimulation the threshold rate is about 4.

B. Simulations on Traces

We extract 134 flows from the TCP packets in LBL-PKT-4.Each flow has at least 1000 packets, and four of them have atleast 10 000 packets. For the testing of false alarm probabilities,we take all combinations of the 134 flows, filter out the pairssatisfying the definition of stepping-stone pairs with boundedmemory or bounded delay,10 and treat the rest as normal pairs.For the testing of miss probabilities, we introduce independenttiming perturbation and chaff into the four flows with 10 000packets to generate independent copies of their stepping-stonerelays. To generate bounded delay perturbation, we add to everypacket a delay chosen independently and uniformly from .To generate bounded memory perturbation, we divide packetsinto segments of size , and randomly generate relaypackets in the th segment for .11 Furthermore,we insert chaff packets in both and according to uni-form distributions on the range of the flows. In this section, welet packets, 5 s, and packets. InDBDC, we also set packets per second.

10The filtering is done by running DMV or DM on the entire flow pairs, andexcluding the pairs reported as “ATTACK.”

11The departures in the (i + 1)th segment can be viewed as relays of thepackets arriving in the ith segment. It is easy to see that such perturbation sat-isfies the bounded memory constraint.


Fig. 9. P (� ); P (� ); P (� ); and P (� ) onLBL-PKT-4 (n is the joint sample size).

Fig. 10. P (� ) and P (� ) (n is the joint sample size).

We first simulate the false alarm probabilities of DBMC,DAC, DBDC, and S-III; see Fig. 9. The false alarm probabil-ities of DBMC, DAC, and DBDC are comparable, and theydo not change much after sample size 1000; the false alarmprobability of S-III, however, keeps decreasing after 1000packets to a much smaller value. From the plot, we see that thefalse alarm probabilities of DBMC, DAC, and DBDC for thetraces do not decay exponentially. It is possible that the falsealarm probability of S-III still decays exponentially, but we donot have enough data in these traces to verify that.

We then simulate the miss probabilities of DBMC and DACon the long flows (of size 10 000) and their synthetic relays (seeFig. 10). For each of the four long flows, we generate 1000 in-dependent relay flows by random segment generation and uni-form chaff insertion. Thus, we totally have 4000 stepping-stonepairs with bounded memory in chaff. The simulation shows thatDBMC has much lower miss probability than DAC. In fact,DBMC detects all the stepping-stone pairs in our simulation,

Fig. 11. P (� ) and P (� ) (n is the joint sample size).

whereas DAC has up to 27.58% miss by sample size 22 000. Theplot also shows that the miss probability of DAC increases withthe increase of the average number of chaff packets. This resultconforms to our analysis because the number of chaff packetsthat are needed to evade DBMC is proportional to the trafficsize, whereas DAC can be evaded by a fixed number of chaffpackets. Note that our robustness claim about DBMC is con-servative; DBMC is robust against up tofraction of chaff no matter how the chaff packets are inserted. Inthe simulation, DBMC survives 0.1 fraction of chaff, which im-plies that the uniform chaff insertion is not optimal for boundedmemory stepping-stone pairs.

We next simulate the miss probabilities of DBDC and S-III;see Fig. 11. For each of the four long flows, we generate 1000independent relay flows by introducing independent delays andchaff packets. The plot confirms that DBDC has a much smallermiss probability than S-III; actually, in the simulation, DBDChas no miss for almost all the sample sizes.12 This is expectedbecause DBDC is robust against up to a certain fraction of chaffpackets no matter where they are inserted, whereas S-III is vul-nerable to the chaff insertion into . We see that in the simula-tion DBDC successfully handles 0.1 fraction of chaff, which islarger than the fraction which DBDC isguaranteed to be able to handle. Similar to the case of DBMC,this shows that the uniform chaff insertion is not optimal forbounded delay stepping-stone pairs, either. From Figs. 10 and11, we see that DAC is more robust to chaff than S-III.

VIII. CONCLUSION

In this paper, we develop techniques to detect encrypted step-ping-stone connections. These techniques can rule out indepen-dent connection pairs and leave a much smaller number of sus-picious connections for further examination. After discoveringall the stepping-stone connections, one can use existing serial-ization methods to determine the intrusion path [14].

12It is except for the sample size 3000, where we have six misses out of 4000Monte Carlo runs.


APPENDIX

Proof of Theorem 3.2 and Lemma 6.3: The proof is basedon the theory of random walk. Let be a simple randomwalk, i.e.,

where are i.i.d. random variables taking value in. Let . Define

the hitting time of or as

or (4)

The following lemma is from [15].Lemma 8.1:

(5)

where . If, then for large

(6)

Moreover, there exist constants andnot depending on , s.t.,

(7)

where .Since

[(5), (6)] give upper and lower bounds on .For the proof of Theorem 3.2, note that for independent

Poisson processes, it is known that is a simple randomwalk. Define extreme values

. A false alarm occurs in DMV if and only if. Note that the false alarm probability is the

largest if is symmetric (i.e., ). Then, wehave

(8)

(9)

where . Here, (8) is by union bound, and (9)is by noticing

and then applying (5) with . Furthermore, by (7),it is easy to see that

For the proof of Lemma 6.3, note that

Applying [(5), (6)] with andgives the desired result.

Proof of Theorem 4.4: Given a match , de-

fine . Algorithm DM has a false alarm if and only ifthere exists , s.t., the order-preserving matchsatisfies , for all .

For , define the interarrival times be ,

and . Let . Then

Therefore, given is a general random walk withsteps ’s. We know that and are independent exponentialrandom variables with mean and , respectively, andthus, ’s are i.i.d. with distribution function

if

if

The probability density function (pdf) of is

if

if

The false alarm probability satisfies

s.t.

Fix a . For , define

Define (Dirac delta function). In [15, p. 53],it is shown that

Then, we have

Let . Simple calculation yieldsthat . Then


Since this is true for all , we have.Proof of Proposition 6.2: If , then

For , let be the number of packetsthat arrive in and departure after andbe the number of packets that arrive in and departurebefore . Then

We have

Since and ,we have

Since and, we have

REFERENCES

[1] S. Staniford-Chen and L. Heberlein, “Holding intruders accountable onthe internet,” in Proc. IEEE Symp. Security Privacy, 1995, pp. 39–49.

[2] D. Donoho, A. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Stani-ford, “Multiscale stepping-stone detection: Detecting pairs of jitteredinteractive streams by exploiting maximum tolerable delay,” in Proc.5th Int. Symp. Recent Adv. Intrusion Detection, Lecture Notes Comput.Sci. 2516, 2002, pp. 17–35.

[3] X. Wang, D. Reeves, S. Wu, and J. Yuill, “Sleepy watermark tracing:An active network-based intrusion response framework,” in Proc. 16thInt. Inf. Security Conf., 2001, pp. 369–384.

[4] Y. Zhang and V. Paxson, “Detecting stepping stones,” in Proc. 9thUSENIX Security Symp., 2000, pp. 171–184.

[5] K. Yoda and H. Etoh, “Finding a connection chain for tracing in-truders,” in Proc. 6th Eur. Symp. Res. Comput. Security, Lecture NotesComput. Sci. 1895, 2000, pp. 191–205.

[6] X. Wang, D. Reeves, and S. Wu, “Inter-packet delay-based correlationfor tracing encrypted connections through stepping stones,” in Proc.7th Eur. Symp. Research Comput. Security, Lecture Notes Comput. Sci.2502, 2002, pp. 244–263.

[7] X. Wang and D. Reeves, “Robust correlation of encrypted attack trafficthrough stepping stones by manipulation of inter-packet delays,” inProc. ACM Conf. Comput. Commun. Security, 2003, pp. 20–29.

[8] A. Blum, D. Song, and S. Venkataraman, “Detection of interactive step-ping stones: Algorithms and confidence bounds,” in Proc. Conf. Adv.Intrusion Detection (RAID), 2004, pp. 258–277.

[9] P. Peng, P. Ning, D. Reeves, and X. Wang, “Active timing-based corre-lation of perturbed traffic flows with chaff packets,” in Proc. 25th IEEEInt. Conf. Distrib. Comput. Syst. Workshops, 2005, pp. 107–113.

[10] L. Zhang, A. Persaud, A. Johson, and Y. Guan, “Detection of steppingstone attack under delay and chaff perturbations,” presented at the 25thIEEE Int. Perform. Comput. Commun. Conf. (IPCCC), Phoenix, AZ,2006.

[11] T. He and L. Tong, “A signal processing perspective to stepping-stonedetection,” presented at the Conf. Inf. Sci. Syst., Princeton, NJ, 2006.

[12] J. Giles and B. Hajek, “An information-theoretic and game-theoreticstudy of timing channels,” IEEE Trans. Inf. Theory, vol. 48, no. 9, pp.2455–2477, Sep. 2002.

[13] V. Paxson and S. Floyd, “Wide-area traffic: The failure of poisson mod-eling,” IEEE/ACM Trans. Netw., vol. 3, pp. 226–244, Jun. 1995.

[14] X. Wang, “The loop fallacy and serialization in tracing intrusion con-nections through stepping stones,” in Proc. ACM Symp. Appl. Comput.,2004, pp. 404–411.

[15] D. Cox and H. Miller, The Theory of Stochastic Processes. New York:Wiley, 1965.

Ting He (S’04) received the B.S. degree in computerscience and technology from Beijing University, Bei-jing, China, in 2003. Since 2003, she has been in theM.S/Ph.D. program in the School of Electrical andComputer Engineering at Cornell University, Ithaca,NY.

She has been a member of the Adaptive Com-munications and Signal Processing Group (ACSP) atCornell University since August, 2003. Her researchinterests include signal processing, sensor networks,information theory, and network security.

Ms. He is a recipient of a Student Paper Award at the 2005 International Con-ference on Acoustic, Speech, and Signal Processing (ICASSP).

Lang Tong (S’87–M’91–SM’01–F’05) receivedthe B.E. degree from Tsinghua University, Beijing,China, in 1985, and the Ph.D. degree in electricalengineering from the University of Notre Dame,Notre Dame, IN, in 1991.

He joined Cornell University, Ithaca, NY, in 1998,where he is currently a Professor. Prior to joiningCornell University, he was on faculty member atthe West Virginia University, Morgantown, and theUniversity of Connecticut, Hartford. He was alsothe 2001 Cor Wit Visiting Professor at the Delft

University of Technology, Delft, The Netherlands. He was a PostdoctoralResearch Affiliate at the Information Systems Laboratory, Stanford University,Palo Alto, CA, in 1991. He received Young Investigator Award from the Officeof Naval Research.

Dr. Tong is a recipient of the Outstanding Young Author Award from the IEEECircuits and Systems Society, the 2004 Best Paper Award (with M. Dong) fromthe IEEE Signal Processing Society, and the 2004 Leonard G. Abraham PrizePaper Award from the IEEE Communications Society (with P. Venkitasubra-maniam and S. Adireddy). He also a coauthor of three student Paper Awards.

1612 ieee transactions on signal processing, vol. …acsp.ece.cornell.edu/papers/hetong07sp.pdfhe...

Documents