
Page 1: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Large-Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation

Jun (Jim) Xu
Networking & Telecommunications Group
College of Computing
Georgia Institute of Technology

(Joint work with Jun Li, Minho Sung, Li Li)

2004 IEEE Symposium on Security and Privacy

Page 2: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Introduction

• Internet DDoS attacks are a real threat

- on websites

· Yahoo, CNN, Amazon, eBay, etc. (Feb. 2000): services were unavailable for several hours

- on Internet infrastructure

· 13 root DNS servers (Oct. 2002): 7 of them were shut down completely

• First step to countering an attack: identification of the attackers

- IP spoofing lets attackers hide their identity

- IP traceback: a mechanism to trace attacks back to their sources

Page 3: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

State of IP Traceback

• Assumptions "inherited" from the literature

- attackers send lots of packets

- the traceback scheme may use only limited space in the IP header

- attackers are aware of the traceback effort and may try to sabotage it

• Two main types of proposed traceback techniques
(1) Probabilistic Packet Marking (PPM) scheme

a. routers: probabilistically mark each packet with partial path info using some coding algorithm

b. victim: reconstructs the attack paths using the corresponding decoding algorithm
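To make the generic PPM idea concrete, here is a minimal sketch in the node-sampling flavour. The marking probability, the Packet fields, and the function names are illustrative assumptions, not the specific coding algorithm of any particular PPM proposal cited later in the talk.

```python
import random

MARK_PROB = 0.04  # illustrative marking probability, not a value from the talk

class Packet:
    """Toy packet; real PPM schemes squeeze these fields into the IP header."""
    def __init__(self):
        self.mark_router = None   # ID of the last router that overwrote the mark
        self.mark_distance = 0    # hops travelled since the mark was written

def ppm_forward(router_id, pkt):
    """Node-sampling flavour of PPM at one router: with small probability,
    overwrite the mark with this router's ID and reset the distance;
    otherwise just age any existing mark by one hop."""
    if random.random() < MARK_PROB:
        pkt.mark_router = router_id
        pkt.mark_distance = 0
    elif pkt.mark_router is not None:
        pkt.mark_distance += 1
    return pkt

# Victim side (decoding, conceptually): given many attack packets, group the
# (mark_router, mark_distance) samples and order routers by distance to
# reconstruct the attack path.
```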

Page 4: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

State of IP Traceback (Cont.)

• Two main types of proposed traceback techniques
(2) Hash-based scheme

a. routers: store packet digests

b. victim: uses recursive lookups to reconstruct the attack path

[Figure: the victim asks each upstream router in turn "Have you seen this packet?"; routers that stored a matching packet digest answer "yes", tracing the path hop by hop back toward the attacker.]
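As a concrete illustration of the router-side digest store, here is a small Bloom-filter sketch; the filter size, the use of salted SHA-256 as the k hash functions, and the class and method names are my own assumptions, not the exact data structure of the hash-based traceback papers.

```python
import hashlib

class DigestBloomFilter:
    """Illustrative Bloom filter holding packet digests at a router."""
    def __init__(self, m_bits=1 << 20, k=12):
        self.m = m_bits                    # number of bits in the filter
        self.k = k                         # number of hash functions
        self.bits = bytearray(m_bits // 8)

    def _positions(self, packet_bytes):
        # Derive k bit positions from salted SHA-256 digests of the packet.
        for salt in range(self.k):
            h = hashlib.sha256(bytes([salt]) + packet_bytes).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, packet_bytes):
        for pos in self._positions(packet_bytes):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def seen(self, packet_bytes):
        # "Have you seen this packet?"  May return a false positive,
        # never a false negative.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(packet_bytes))
```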

Page 5: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Scalability Problems of Two Approaches

• PPM schemes

- limited marking field (17 bits)

- cannot scale to a large number of attackers

• Hash-based scheme

- records 100% of the packet digests

- infeasible for high-speed links

• Our objective: design a traceback scheme that scales both to the number of attackers and to high link speeds

Page 6: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 7: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Design Overview

• Our idea: store digests of sampled packets only

- use a small sampling rate p (such as 3.3%)

- small storage and computational cost

- can scale to OC-192 or OC-768 link speeds

- lets us get across the DRAM/SRAM speed barrier (see the sizing sketch below)

• The challenge of sampling

- single-packet traceback is no longer possible

: need to collect a larger number of attack packets

- independent random sampling will not work

-- need to improve the "correlation factor"

[Figure: attacker-to-victim path; adjacent routers store packet digests, and sampling must be correlated so that neighboring routers record overlapping sets of packets.]
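A rough sizing sketch of why sampling helps at high link speeds. The OC-192 rate and the average packet size below are my own illustrative assumptions; the 3.3% sampling rate and the 0.4 bits of Bloom filter per packet are the example numbers used in this talk.

```python
# Back-of-envelope sizing under stated assumptions.
LINK_BPS      = 10e9    # OC-192 is roughly 10 Gb/s (assumption for illustration)
AVG_PKT_BYTES = 400     # assumed average packet size
SAMPLING_P    = 0.033   # 3.3% sampling rate (talk's example)
BITS_PER_PKT  = 0.4     # Bloom-filter budget per packet (talk's example)

pkts_per_sec      = LINK_BPS / (AVG_PKT_BYTES * 8)   # ~3.1 million packets/s
digests_per_sec   = pkts_per_sec * SAMPLING_P        # ~100 thousand digests/s
filter_bits_per_s = pkts_per_sec * BITS_PER_PKT      # ~1.25 Mbit/s of filter growth

print(f"{pkts_per_sec:.2e} pkts/s, {digests_per_sec:.2e} digests/s, "
      f"{filter_bits_per_s / 1e6:.2f} Mbit/s of Bloom-filter bits")
```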

Page 8: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information-theoretic framework overview

• Information-theoretic framework to solve an optimization problem

(1) Given a fixed resource constraint (e.g., we can use 0.4 bits per packet in the Bloom filter on average), what is the best parameter setting for the number of hash functions and the sampling probability?

- relationship between the resource constraint and the two parameters: resource constraint s = (number of hash functions k) × (sampling probability p)

- two tradeoffs: a higher number of hash functions gives a lower false positive rate in the Bloom filter; a higher sampling probability gives higher sampling correlation (easier traceback)

ex) when s = 0.4, which set is best? (8 hashes, 5% sampling) vs (12 hashes, 3.3% sampling) vs (16 hashes, 2.5% sampling)? See the back-of-envelope comparison below.
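A hedged back-of-envelope comparison of the three candidate settings. This is not the mutual-information objective the talk actually optimizes; it only shows the direction of the two tradeoffs, using the standard Bloom-filter false-positive estimate (each sampled packet gets about s/p = k filter bits) and the ORMS pairwise sampling probability p/(2-p) introduced later in the talk.

```python
import math

# Candidate (k, p) pairs with k * p close to the budget s = 0.4.
candidates = [(8, 0.05), (12, 0.033), (16, 0.025)]

for k, p in candidates:
    # With ~k bits of Bloom filter per stored packet, the classic estimate of
    # the false-positive rate is (1 - e^{-1})^k: it improves as k grows.
    bloom_fp = (1 - math.exp(-1)) ** k
    # Probability that two adjacent routers both record a packet under ORMS:
    # it improves as p grows.
    joint_sampling = p / (2 - p)
    print(f"k={k:2d}, p={p:.3f}: Bloom FP ~ {bloom_fp:.1e}, "
          f"joint sampling prob ~ {joint_sampling:.4f}")
```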

Page 9: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information-theoretic framework overview

• Information-theoretic framework to establish a lower bound

(2) What is the lower bound on the size of the evidence needed to achieve a certain level of traceback accuracy?

- there is a tradeoff between

the number of attack packets used for traceback (evidence)

vs

the accuracy of the traceback

ex) we want to find the minimum size of the evidence for identifying more than 90% of the attack sources

Page 10: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 11: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

One-bit Random Marking and Sampling (ORMS)

• Basic idea

- each router samples packets with probability p

- ORMS makes the correlation factor larger than 50%

: we sample more than 50% of the packets that were sampled at the previous router

- use 1-bit marking to coordinate the sampling

- Sample all marked packets (in steady state, a p/2 fraction of packets arrive marked)

- Sample unmarked packets with probability p/(2-p)

- Total sampling probability: p/2 + (1 - p/2) * p/(2-p) = p

- A sampled packet leaves the router marked ("sample and mark") with probability 1/2 and unmarked ("sample and not mark") with probability 1/2, so the marked fraction of traffic stays at p/2

- Correlation (probability that two adjacent routers both sample a packet): p/2 + (p/2) * p/(2-p) = p/(2-p)

- Correlation factor (sampled by both, given it was sampled upstream): 1/(2-p) ( > 50% because 0 < p < 1 )

[Figure: packets flowing through a router with their one-bit marks (1/0), illustrating the sampling and re-marking decisions above; a code sketch of the per-router decision follows.]
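A minimal sketch of the per-router ORMS decision as described on this slide; the function name and the place where the digest would be stored are illustrative.

```python
import random

def orms_process(incoming_mark: bool, p: float):
    """ORMS at one router: returns (sampled, outgoing_mark).

    Marked packets are always sampled; unmarked packets are sampled with
    probability p / (2 - p).  A sampled packet leaves marked with
    probability 1/2 ("sample and mark") and unmarked otherwise
    ("sample and not mark"), so the marked fraction of traffic stays at p/2.
    """
    if incoming_mark:
        sampled = True
    else:
        sampled = random.random() < p / (2 - p)

    if sampled:
        # Store the packet digest here (e.g., in the Bloom filter sketched earlier).
        outgoing_mark = random.random() < 0.5
    else:
        outgoing_mark = False
    return sampled, outgoing_mark
```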

Page 12: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

One-bit Random Marking and Sampling (ORMS)

• Why not trajectory sampling?
- the attacker can craft packets whose hash values escape sampling

• Designing a tampering-resistant scheme
- Why sample all marked packets (a p/2 fraction of traffic) plus another p/2 fraction drawn from the unmarked packets? Why not simply save all packets that are marked with 1?

- r: the rate of marked packets; tampering can push r away from its stationary value, so a dual-leaky-bucket scheme keeps r at p/2

- "jump-start" at the first hop using the dual-leaky-bucket scheme

[Figure: the attacker sends unmarked attack traffic while other hosts send marked normal traffic, attempting to bias which packets get sampled.]

Page 13: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Traceback Processing

[Figure: attacker-to-victim path; each router on the path stores packet digests.]

1. Collect a set of attack packets Lv

2. Check router S, a neighbor of the victim, with Lv

3. Check each router R (a neighbor of S) with Ls

[Figure: the victim asks S "Have you seen any of these packets?" with Lv; S answers "yes" and is told "You are convicted! Use this evidence to build your Ls"; R is then checked with Ls.]

Page 14: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Traceback Processing

[Figure: same attacker-to-victim path as before, with packet digests stored at each router.]

4. Pass Lv to R, to be used to build R's new Ls

5. Repeat this process hop by hop toward the attacker (a code sketch of this walk follows)

[Figure: R is asked "Have you seen any of these packets?" with Ls, answers "yes", and is told "You are convicted! Use this evidence to build your Ls" from the full Lv.]
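A sketch of the traceback walk from slides 13 and 14. The graph accessors, the conviction threshold, and the Bloom-filter interface are illustrative assumptions (the threshold the scheme actually uses comes from the analysis later in the talk).

```python
def traceback(victim_neighbors, upstream, digest_of, Lv, threshold=5):
    """Recursive traceback sketch.

    victim_neighbors : routers adjacent to the victim
    upstream(r)      : neighbors of router r, one hop further from the victim
    digest_of(r)     : router r's digest store (e.g., the DigestBloomFilter above)
    Lv               : set of attack packets collected at the victim
    threshold        : minimum number of matches needed to convict (illustrative)
    """
    convicted = set()
    frontier = [(s, Lv) for s in victim_neighbors]   # (router, evidence to check it with)
    while frontier:
        router, evidence = frontier.pop()
        if router in convicted:
            continue
        matches = [pkt for pkt in evidence if digest_of(router).seen(pkt)]
        if len(matches) >= threshold:
            convicted.add(router)
            # "You are convicted! Use this evidence to build your Ls":
            # rebuild the evidence set from the full Lv, then check the
            # convicted router's own neighbors with it.
            Ls = [pkt for pkt in Lv if digest_of(router).seen(pkt)]
            for r in upstream(router):
                frontier.append((r, Ls))
    return convicted
```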

Page 15: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 16: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Why do we need a theoretical foundation?

• Information-theoretic framework

- view the traceback system as a communication channel

- tradeoff between the sampling rate and the size of the packet digest: the optimal parameter setting maximizes the channel capacity (i.e., the mutual information)

- tradeoff between the number of packets and the traceback accuracy: information theory allows us to derive, through Fano's inequality, a lower bound on the number of packets (evidence) needed to achieve a given level of traceback accuracy

Page 17: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information Theory Background

• Concepts

- Entropy H(X): measures the uncertainty of X

- Conditional entropy H(X|Y): measures how much uncertainty remains about X given the observation of Y

• Fano's inequality

- Given an observation of Y, our estimate of X is X̂; we denote the error probability by pe = Pr[X̂ ≠ X]

- H(pe) ≥ H(X|Y), if X is binary-valued
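For reference, the general form of Fano's inequality, of which the binary-valued statement above is the special case with |X| = 2:

```latex
p_e = \Pr[\hat{X} \neq X], \qquad
H(p_e) + p_e \log\bigl(|\mathcal{X}| - 1\bigr) \;\ge\; H(X \mid Y).
% For binary-valued X, |X| = 2 makes the second term vanish,
% leaving H(p_e) \ge H(X | Y).
```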

Page 18: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• What we can observe: Xt1 + Xf1 and Yt + Yf

• We want to estimate Z, where Z = 1 if Xt2 > 0 and Z = 0 otherwise

• Question: how do we maximize our accuracy in estimating Z?

• Answer: minimize H(Z | Xt1+Xf1, Yt+Yf)

• Np: number of packets in Lv

[Figure: victim, routers R1 and R2, with evidence sets Lv and Ls; in the legend, subscript t denotes true positives and subscript f denotes (Bloom-filter) false positives; Z = 1 when Xt2 > 0, i.e., when R2 has actually seen attack packets.]

Page 19: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• Parameter tuning:

- k: number of hash functions in the Bloom filter

- to maximize our accuracy in estimating Z, we would like to compute

k* = argmin_k H( Z | Xt1+Xf1, Yt+Yf )

subject to the resource constraint ( s = k · p )

- s: average number of bits used per packet

- p: sampling probability

Page 20: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

Resource constraint: s = k · p = 0.4

Page 21: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• Lower bound on the number of packets to achieve a certain level of traceback accuracy:

Fano's inequality: H(pe) ≥ H( Z | Xt1+Xf1, Yt+Yf )

Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Page 22: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 23: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Simulation set-up

• Three topologies
- Skitter data I, Skitter data II, Bell Labs data (routes from a single host to 192,900, 158,181, and 86,813 destinations, respectively)

• Host setting
- Victim: all three topologies are routes from a single origin to many destinations; we assume this origin to be the victim
- Attackers: randomly distributed among the destination hosts

• Performance metrics (computed as in the sketch after this list)
- False Negative Ratio (FNR): the ratio of the number of missed routers to the number of infected routers
- False Positive Ratio (FPR): the ratio of the number of incorrectly convicted routers to the number of convicted routers
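A small sketch of how the two metrics could be computed from a simulation run; the set-based representation and the function name are my own.

```python
def traceback_metrics(convicted, infected):
    """FNR and FPR as defined on this slide.

    convicted : set of routers the traceback scheme convicted
    infected  : set of routers actually on some attack path
    """
    missed = infected - convicted
    wrongly_convicted = convicted - infected
    fnr = len(missed) / len(infected) if infected else 0.0
    fpr = len(wrongly_convicted) / len(convicted) if convicted else 0.0
    return fnr, fpr
```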

Page 24: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• False Negative & False Positive on Skitter I topology

Simulation results

Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Page 25: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Parameter tuning

Verification of Theoretical Analysis

Parameters: 1000 attackers, s = k · p = 0.4

Page 26: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Error levels by different k values

Verification of Theoretical Analysis

Parameters: 2000 attackers, Np=200,000

Page 27: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Lower bound on the number of packets to achieve a certain level of traceback accuracy

Verification of Theoretical Analysis

Parameters: s = 0.4, k = 12, p = 3.3%

Page 28: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 29: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Related work (not exhaustive)

• PPM (Probabilistic Packet Marking) traceback schemes
- S. Savage et al., "Practical network support for IP traceback," SIGCOMM 2000
- M. T. Goodrich, "Efficient packet marking for large-scale IP traceback," ACM CCS 2002

• Hash-based traceback scheme
- A. Snoeren et al., "Hash-based IP traceback," SIGCOMM 2001

• Analysis of traceback schemes and lower bounds
- M. Adler, "Trade-offs in probabilistic packet marking for IP traceback," ACM STOC 2002

Page 30: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Discussion and future work

1. Is the correlation factor 1/(2-p) optimal for coordination using one bit?

2. What if we use more than one bit for coordinating the sampling?

3. How do we optimally combine PPM and the hash-based scheme? This is a network information theory question.

4. How do we know with 100% certainty that certain packets are attack packets? What if we only know this with p-certainty?

Page 31: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Conclusion

• A new approach to IP traceback is presented
- using sampling, the scheme can scale to very high link speeds

- ORMS, a novel sampling technique, is introduced

• Analysis using the information-theoretic framework
- allows us to compute the optimal parameters

- can be used to compute the trade-off between the amount of evidence and the traceback accuracy

• Simulation study
- demonstrates the high performance of the scheme even with thousands of attackers and a very low (3.3%) sampling rate