
Page 1: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Large-Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation

Jun (Jim) Xu
Networking & Telecommunications Group
College of Computing
Georgia Institute of Technology

(Joint work with Jun Li, Minho Sung, Li Li)

2004 IEEE Symposium on Security and Privacy

Page 2: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Introduction

• Internet DDoS attacks are a real threat

- on websites

· Yahoo, CNN, Amazon, eBay, etc. (Feb. 2000): services were unavailable for several hours

- on Internet infrastructure

· 13 root DNS servers (Oct. 2002): 7 of them were shut down completely

• First step to countering an attack: identification of the attackers

- IP spoofing lets attackers hide their identity

- IP traceback: a mechanism to trace attacks back to their sources

Page 3: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

State of IP Traceback

• Assumptions "inherited" from the literature

- attackers send lots of packets

- the traceback scheme may use only limited space in the IP header

- attackers are aware of the traceback effort and may try to sabotage it

• Two main types of proposed traceback techniques
(1) Probabilistic Packet Marking (PPM) scheme

a. routers: probabilistically mark each packet with partial path info using some coding algorithm

b. victim: reconstructs the attack paths using the corresponding decoding algorithm
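To make the generic PPM idea concrete, here is a minimal sketch in the node-sampling flavour. The marking probability, the Packet fields, and the function names are illustrative assumptions, not the specific coding algorithm of any particular PPM proposal cited later in the talk.

```python
import random

MARK_PROB = 0.04  # illustrative marking probability, not a value from the talk

class Packet:
    """Toy packet; real PPM schemes squeeze these fields into the IP header."""
    def __init__(self):
        self.mark_router = None   # ID of the last router that overwrote the mark
        self.mark_distance = 0    # hops travelled since the mark was written

def ppm_forward(router_id, pkt):
    """Node-sampling flavour of PPM at one router: with small probability,
    overwrite the mark with this router's ID and reset the distance;
    otherwise just age any existing mark by one hop."""
    if random.random() < MARK_PROB:
        pkt.mark_router = router_id
        pkt.mark_distance = 0
    elif pkt.mark_router is not None:
        pkt.mark_distance += 1
    return pkt

# Victim side (decoding, conceptually): given many attack packets, group the
# (mark_router, mark_distance) samples and order routers by distance to
# reconstruct the attack path.
```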

Page 4: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

State of IP Traceback (Cont.)

• Two main types of proposed traceback techniques
(2) Hash-based scheme

a. routers: store packet digests

b. victim: uses recursive lookups to reconstruct the attack path

[Figure: the victim asks each upstream router in turn "Have you seen this packet?"; routers that stored a matching packet digest answer "yes", tracing the path hop by hop back toward the attacker.]
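As a concrete illustration of the router-side digest store, here is a small Bloom-filter sketch; the filter size, the use of salted SHA-256 as the k hash functions, and the class and method names are my own assumptions, not the exact data structure of the hash-based traceback papers.

```python
import hashlib

class DigestBloomFilter:
    """Illustrative Bloom filter holding packet digests at a router."""
    def __init__(self, m_bits=1 << 20, k=12):
        self.m = m_bits                    # number of bits in the filter
        self.k = k                         # number of hash functions
        self.bits = bytearray(m_bits // 8)

    def _positions(self, packet_bytes):
        # Derive k bit positions from salted SHA-256 digests of the packet.
        for salt in range(self.k):
            h = hashlib.sha256(bytes([salt]) + packet_bytes).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, packet_bytes):
        for pos in self._positions(packet_bytes):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def seen(self, packet_bytes):
        # "Have you seen this packet?"  May return a false positive,
        # never a false negative.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(packet_bytes))
```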

Page 5: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Scalability Problems of Two Approaches

• PPM schemes

- limited marking field (17 bits)

- cannot scale to a large number of attackers

• Hash-based scheme

- records 100% of the packet digests

- infeasible for high-speed links

• Our objective: design a traceback scheme that scales both to the number of attackers and to high link speeds

Page 6: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 7: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Design Overview

• Our idea: store digests of sampled packets only

- use a small sampling rate p (such as 3.3%)

- small storage and computational cost

- can scale to OC-192 or OC-768 link speeds

- lets us get across the DRAM/SRAM speed barrier (see the sizing sketch below)

• The challenge of sampling

- single-packet traceback is no longer possible

: need to collect a larger number of attack packets

- independent random sampling will not work

-- need to improve the "correlation factor"

[Figure: attacker-to-victim path; adjacent routers store packet digests, and sampling must be correlated so that neighboring routers record overlapping sets of packets.]
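A rough sizing sketch of why sampling helps at high link speeds. The OC-192 rate and the average packet size below are my own illustrative assumptions; the 3.3% sampling rate and the 0.4 bits of Bloom filter per packet are the example numbers used in this talk.

```python
# Back-of-envelope sizing under stated assumptions.
LINK_BPS      = 10e9    # OC-192 is roughly 10 Gb/s (assumption for illustration)
AVG_PKT_BYTES = 400     # assumed average packet size
SAMPLING_P    = 0.033   # 3.3% sampling rate (talk's example)
BITS_PER_PKT  = 0.4     # Bloom-filter budget per packet (talk's example)

pkts_per_sec      = LINK_BPS / (AVG_PKT_BYTES * 8)   # ~3.1 million packets/s
digests_per_sec   = pkts_per_sec * SAMPLING_P        # ~100 thousand digests/s
filter_bits_per_s = pkts_per_sec * BITS_PER_PKT      # ~1.25 Mbit/s of filter growth

print(f"{pkts_per_sec:.2e} pkts/s, {digests_per_sec:.2e} digests/s, "
      f"{filter_bits_per_s / 1e6:.2f} Mbit/s of Bloom-filter bits")
```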

Page 8: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information-theoretic framework overview

• Information-theoretic framework to solve an optimization problem

(1) Given a fixed resource constraint (e.g., we can use 0.4 bits per packet in the Bloom filter on average), what is the best parameter setting for the number of hash functions and the sampling probability?

- relationship between the resource constraint and the two parameters: resource constraint s = (number of hash functions k) × (sampling probability p)

- two tradeoffs: a higher number of hash functions gives a lower false positive rate in the Bloom filter; a higher sampling probability gives higher sampling correlation (easier traceback)

ex) when s = 0.4, which set is best? (8 hashes, 5% sampling) vs (12 hashes, 3.3% sampling) vs (16 hashes, 2.5% sampling)? See the back-of-envelope comparison below.
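A hedged back-of-envelope comparison of the three candidate settings. This is not the mutual-information objective the talk actually optimizes; it only shows the direction of the two tradeoffs, using the standard Bloom-filter false-positive estimate (each sampled packet gets about s/p = k filter bits) and the ORMS pairwise sampling probability p/(2-p) introduced later in the talk.

```python
import math

# Candidate (k, p) pairs with k * p close to the budget s = 0.4.
candidates = [(8, 0.05), (12, 0.033), (16, 0.025)]

for k, p in candidates:
    # With ~k bits of Bloom filter per stored packet, the classic estimate of
    # the false-positive rate is (1 - e^{-1})^k: it improves as k grows.
    bloom_fp = (1 - math.exp(-1)) ** k
    # Probability that two adjacent routers both record a packet under ORMS:
    # it improves as p grows.
    joint_sampling = p / (2 - p)
    print(f"k={k:2d}, p={p:.3f}: Bloom FP ~ {bloom_fp:.1e}, "
          f"joint sampling prob ~ {joint_sampling:.4f}")
```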

Page 9: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information-theoretic framework overview

• Information-theoretic framework to establish a lower bound

(2) What is the lower bound on the size of the evidence needed to achieve a certain level of traceback accuracy?

- there is a tradeoff between

the number of attack packets used for traceback (evidence)

vs

the accuracy of the traceback

ex) we want to find the minimum size of the evidence for identifying more than 90% of the attack sources

Page 10: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 11: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

One-bit Random Marking and Sampling (ORMS)

• Basic idea

- each router samples packets with probability p

- ORMS makes the correlation factor larger than 50%

: we sample more than 50% of the packets that were sampled at the previous router

- use 1-bit marking to coordinate the sampling

- Sample all marked packets (in steady state, a p/2 fraction of packets arrive marked)

- Sample unmarked packets with probability p/(2-p)

- Total sampling probability: p/2 + (1 - p/2) * p/(2-p) = p

- A sampled packet leaves the router marked ("sample and mark") with probability 1/2 and unmarked ("sample and not mark") with probability 1/2, so the marked fraction of traffic stays at p/2

- Correlation (probability that two adjacent routers both sample a packet): p/2 + (p/2) * p/(2-p) = p/(2-p)

- Correlation factor (sampled by both, given it was sampled upstream): 1/(2-p) ( > 50% because 0 < p < 1 )

[Figure: packets flowing through a router with their one-bit marks (1/0), illustrating the sampling and re-marking decisions above; a code sketch of the per-router decision follows.]
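A minimal sketch of the per-router ORMS decision as described on this slide; the function name and the place where the digest would be stored are illustrative.

```python
import random

def orms_process(incoming_mark: bool, p: float):
    """ORMS at one router: returns (sampled, outgoing_mark).

    Marked packets are always sampled; unmarked packets are sampled with
    probability p / (2 - p).  A sampled packet leaves marked with
    probability 1/2 ("sample and mark") and unmarked otherwise
    ("sample and not mark"), so the marked fraction of traffic stays at p/2.
    """
    if incoming_mark:
        sampled = True
    else:
        sampled = random.random() < p / (2 - p)

    if sampled:
        # Store the packet digest here (e.g., in the Bloom filter sketched earlier).
        outgoing_mark = random.random() < 0.5
    else:
        outgoing_mark = False
    return sampled, outgoing_mark
```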

Page 12: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

One-bit Random Marking and Sampling (ORMS)

• Why not trajectory sampling?
- the attacker can craft packets whose hash values escape sampling

• Designing a tampering-resistant scheme
- Why sample all marked packets (a p/2 fraction of traffic) plus another p/2 fraction drawn from the unmarked packets? Why not simply save all packets that are marked with 1?

- r: the rate of marked packets; tampering can push r away from its stationary value, so a dual-leaky-bucket scheme keeps r at p/2

- "jump-start" at the first hop using the dual-leaky-bucket scheme

[Figure: the attacker sends unmarked attack traffic while other hosts send marked normal traffic, attempting to bias which packets get sampled.]

Page 13: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Traceback Processing

[Figure: attacker-to-victim path; each router on the path stores packet digests.]

1. Collect a set of attack packets Lv

2. Check router S, a neighbor of the victim, with Lv

3. Check each router R (a neighbor of S) with Ls

[Figure: the victim asks S "Have you seen any of these packets?" with Lv; S answers "yes" and is told "You are convicted! Use this evidence to build your Ls"; R is then checked with Ls.]

Page 14: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Traceback Processing

[Figure: same attacker-to-victim path as before, with packet digests stored at each router.]

4. Pass Lv to R, to be used to build R's new Ls

5. Repeat this process hop by hop toward the attacker (a code sketch of this walk follows)

[Figure: R is asked "Have you seen any of these packets?" with Ls, answers "yes", and is told "You are convicted! Use this evidence to build your Ls" from the full Lv.]
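A sketch of the traceback walk from slides 13 and 14. The graph accessors, the conviction threshold, and the Bloom-filter interface are illustrative assumptions (the threshold the scheme actually uses comes from the analysis later in the talk).

```python
def traceback(victim_neighbors, upstream, digest_of, Lv, threshold=5):
    """Recursive traceback sketch.

    victim_neighbors : routers adjacent to the victim
    upstream(r)      : neighbors of router r, one hop further from the victim
    digest_of(r)     : router r's digest store (e.g., the DigestBloomFilter above)
    Lv               : set of attack packets collected at the victim
    threshold        : minimum number of matches needed to convict (illustrative)
    """
    convicted = set()
    frontier = [(s, Lv) for s in victim_neighbors]   # (router, evidence to check it with)
    while frontier:
        router, evidence = frontier.pop()
        if router in convicted:
            continue
        matches = [pkt for pkt in evidence if digest_of(router).seen(pkt)]
        if len(matches) >= threshold:
            convicted.add(router)
            # "You are convicted! Use this evidence to build your Ls":
            # rebuild the evidence set from the full Lv, then check the
            # convicted router's own neighbors with it.
            Ls = [pkt for pkt in Lv if digest_of(router).seen(pkt)]
            for r in upstream(router):
                frontier.append((r, Ls))
    return convicted
```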

Page 15: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 16: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Why do we need a theoretical foundation?

• Information-theoretic framework

- view the traceback system as a communication channel

- tradeoff between the sampling rate and the size of the packet digest: the optimal parameter setting maximizes the channel capacity (i.e., the mutual information)

- tradeoff between the number of packets and the traceback accuracy: information theory allows us to derive, through Fano's inequality, a lower bound on the number of packets (evidence) needed to achieve a given level of traceback accuracy

Page 17: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Information Theory Background

• Concepts

- Entropy H(X): measures the uncertainty of X

- Conditional entropy H(X|Y): measures how much uncertainty remains about X given the observation of Y

• Fano's inequality

- Given an observation of Y, our estimate of X is X̂; we denote the error probability by pe = Pr[X̂ ≠ X]

- H(pe) ≥ H(X|Y), if X is binary-valued
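For reference, the general form of Fano's inequality, of which the binary-valued statement above is the special case with |X| = 2:

```latex
p_e = \Pr[\hat{X} \neq X], \qquad
H(p_e) + p_e \log\bigl(|\mathcal{X}| - 1\bigr) \;\ge\; H(X \mid Y).
% For binary-valued X, |X| = 2 makes the second term vanish,
% leaving H(p_e) \ge H(X | Y).
```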

Page 18: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• What we can observe: Xt1 + Xf1 and Yt + Yf

• We want to estimate Z, where Z = 1 if Xt2 > 0 and Z = 0 otherwise

• Question: how do we maximize our accuracy in estimating Z?

• Answer: minimize H(Z | Xt1+Xf1, Yt+Yf)

• Np: number of packets in Lv

[Figure: victim, routers R1 and R2, with evidence sets Lv and Ls; in the legend, subscript t denotes true positives and subscript f denotes (Bloom-filter) false positives; Z = 1 when Xt2 > 0, i.e., when R2 has actually seen attack packets.]

Page 19: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• Parameter tuning:

- k: number of hash functions in the Bloom filter

- to maximize our accuracy in estimating Z, we would like to compute

k* = argmin_k H( Z | Xt1+Xf1, Yt+Yf )

subject to the resource constraint ( s = k · p )

- s: average number of bits used per packet

- p: sampling probability

Page 20: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

Resource constraint: s = k · p = 0.4

Page 21: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Applications of Information Theory

• Lower bound on the number of packets to achieve a certain level of traceback accuracy:

Fano's inequality: H(pe) ≥ H( Z | Xt1+Xf1, Yt+Yf )

Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Page 22: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 23: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Simulation set-up

• Three topologies
- Skitter data I, Skitter data II, Bell Labs data (routes from a single host to 192,900, 158,181, and 86,813 destinations, respectively)

• Host setting
- Victim: all three topologies are routes from a single origin to many destinations; we assume this origin to be the victim
- Attackers: randomly distributed among the destination hosts

• Performance metrics (computed as in the sketch after this list)
- False Negative Ratio (FNR): the ratio of the number of missed routers to the number of infected routers
- False Positive Ratio (FPR): the ratio of the number of incorrectly convicted routers to the number of convicted routers
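A small sketch of how the two metrics could be computed from a simulation run; the set-based representation and the function name are my own.

```python
def traceback_metrics(convicted, infected):
    """FNR and FPR as defined on this slide.

    convicted : set of routers the traceback scheme convicted
    infected  : set of routers actually on some attack path
    """
    missed = infected - convicted
    wrongly_convicted = convicted - infected
    fnr = len(missed) / len(infected) if infected else 0.0
    fpr = len(wrongly_convicted) / len(convicted) if convicted else 0.0
    return fnr, fpr
```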

Page 24: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• False Negative & False Positive on Skitter I topology

Simulation results

Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Page 25: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Parameter tuning

Verification of Theoretical Analysis

Parameters: 1000 attackers, s = k · p = 0.4

Page 26: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Error levels by different k values

Verification of Theoretical Analysis

Parameters: 2000 attackers, Np=200,000

Page 27: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

• Lower bound on the number of packets to achieve a certain level of traceback accuracy

Verification of Theoretical Analysis

Parameters: s = 0.4, k = 12, p = 3.3%

Page 28: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Outline of the talk

Overview of our solution

Design Detail

Information-theoretic framework

Performance Evaluation

Related work, Future work, Conclusion

Page 29: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Related work (not exhaustive)

• PPM (Probabilistic Packet Marking) traceback schemes
- S. Savage et al., "Practical network support for IP traceback," SIGCOMM 2000
- M. T. Goodrich, "Efficient packet marking for large-scale IP traceback," ACM CCS 2002

• Hash-based traceback scheme
- A. Snoeren et al., "Hash-based IP traceback," SIGCOMM 2001

• Analysis of traceback schemes and lower bounds
- M. Adler, "Trade-offs in probabilistic packet marking for IP traceback," ACM STOC 2002

Page 30: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Discussion and future work

1. Is the correlation factor 1/(2-p) optimal for coordination using one bit?

2. What if we use more than one bit for coordinating the sampling?

3. How do we optimally combine PPM and the hash-based scheme? This is a network information theory question.

4. How do we know with 100% certainty that certain packets are attack packets? What if we only know this with p-certainty?

Page 31: Large-Scale IP Traceback in High-Speed Internet :  Practical Techniques and Theoretical Foundation

Conclusion

• A new approach to IP traceback is presented
- using sampling, the scheme can scale to very high link speeds

- ORMS, a novel sampling technique, is introduced

• Analysis using the information-theoretic framework
- allows us to compute the optimal parameters

- can be used to compute the trade-off between the amount of evidence and the traceback accuracy

• Simulation study
- demonstrates the high performance of the scheme even with thousands of attackers and a very low (3.3%) sampling rate