anycast vs. ddos: evaluating nov. 30

33
Anycast vs. DDoS: Evaluating Nov. 30 Giovane C. M. Moura 1 , Ricardo de O. Schmidt 2 , John Heidemann 3 , Wouter B. de Vries 2 , Moritz Muller 1 , Lan Wei 3 , Christian Hesselman 1 1 SIDN Labs 2 University of Twente 3 USC/ISI DNS-OARC Dallas, Texas, USA 2016-10-16 Copyright © 2016 by John Heidemann Release terms: CC-BY-NC 4.0 international Based on a technical paper “Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event”, to appear at ACM IMC 2016.

Upload: others

Post on 29-Nov-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Anycast vs. DDoS: Evaluating Nov. 30

Anycast vs. DDoS:

Evaluating Nov. 30

Giovane C. M. Moura1, Ricardo de O. Schmidt2,

John Heidemann3, Wouter B. de Vries2,

Moritz Muller1, Lan Wei3, Christian Hesselman1

1SIDN Labs 2University of Twente 3USC/ISI

DNS-OARC Dallas, Texas, USA 2016-10-16

Copyright © 2016 by John Heidemann

Release terms: CC-BY-NC 4.0 international

Based on a technical paper “Anycast vs. DDoS: Evaluating the

November 2015 Root DNS Event”, to appear at ACM IMC 2016.

Page 2: Anycast vs. DDoS: Evaluating Nov. 30

A Bad Day at the Root…

2Anycast vs. DDoS / 2016-10-16

data: RIPE DNSmon

red: >30% loss

(some sites ~99% loss!)

What happened?

Anycast vs. DDoS

in general?

What does “red”

really mean?

Page 3: Anycast vs. DDoS: Evaluating Nov. 30

DDoS: Bad and Getting Worse

• big and getting bigger

– 2012: first 100Gb/s [Arbor12a]

– 2016: 100Gb/s common; 540Gb/s seen; 1Tb/s possible

• easy and getting easier

– 2012: several 1000+-node botnets

– 2016: DDoS-as-a-service (booters): few Gb/s @ US$1

• frequent and getting frequent-er

– 2002: the October 30 DNS root event

– 2016: 3 recent big attacks (2015-11-30, 2015-12-01, 2016-06-25)

3Anycast vs. DDoS / 2016-10-16

Page 4: Anycast vs. DDoS: Evaluating Nov. 30

Does Anycast Defend?

4Anycast vs. DDoS / 2016-10-16

561 root DNS locations

for 13 services (in 2016-01)

large capex and opex

data:

www.root-servers.org

is 561 too few? too many?

what happens under stress?

How Well

Page 5: Anycast vs. DDoS: Evaluating Nov. 30

Our Work: Study Nov. 30 Event

approach and goals

• gather public info aboutNov. 30 event

• study it carefully

– blank

• identify design choices

• generalize for anycast– blank

• suggest future defenses

non-approach and non-goals

• no inside information

• not bashing operators

• not just intentional, but also

emergent policies

• not only about DNS and roots

• not help attackers

5Anycast vs. DDoS / 2016-10-16

Page 6: Anycast vs. DDoS: Evaluating Nov. 30

• public evaluation of anycast under stress)

• public articulation of design options

• evaluation of collateral damage

– (al

Contributions

6Anycast vs. DDoS / 2016-10-16

goals:

• public discussion => greater transparency

• expectation setting

• possible future defenses

prior work for all, but in private

Page 7: Anycast vs. DDoS: Evaluating Nov. 30

• sites may have multiple servers

• 11 use IP anycast sites

– 5 to 144 anycast sites for each anycast letter

– (1 uses primary/secondary, 1 is single site)

• provided by 13 letters

– 12 operators, 13 deployments

– each different

– each thoughtful

– each constrained (peering, funding, etc.)

• one root “.”

– Q: .com’s NS? A: 192.5.6.30

Parts of Root DNS’ Anycast

7Anycast vs. DDoS / 2016-10-16

root: .

letters: a…k…m

sites: K-AMS, K-AVN,

…K-ZRH

servers: K-AMS-1, K-AMS-2,

K-AMS-3

Page 8: Anycast vs. DDoS: Evaluating Nov. 30

Anycast in Good Times

8Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

anycast matches

a user to a (hopefully)

nearby site

X-SYD

(some sites have

more capacity)

another

friendanycast divides the Internet

into catchements(often messy and non-geographic)

Page 9: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress

9Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

attackers

too many attackers

overwhelm your site:

your queries get lost

a similar size attack

may be absorbed

at a bigger site

other

attackers

catchments also

isolate sites from

attackers

X-SYDanother

friend

Page 10: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Reactions to Stress(do nothing?)

10Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

attackers

other

attackers

X-SYDanother

friend

1. nothing: X-SJC is degraded absorber,

protecting X-SYD’s users

Page 11: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Reactions to Stress(withdraw some routes?)

11Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

attackers

other

attackers

X-SYDanother

friend

1. nothing: X-SJC is degraded absorber,

protecting X-SYD’s users

2. withdraw routes from X-SJC;

may shift attackers to big site

Page 12: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Reactions to Stress(withdraw other routes?)

12Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

attackers

other

attackers

X-SYDanother

friend

1. nothing: X-SJC is degraded absorber,

protecting X-SYD’s users

2. withdraw routes from X-SJC;

may shift attackers to big site

3. withdraw wrong routes from X-SJC;

may shift attackers to other site

Page 13: Anycast vs. DDoS: Evaluating Nov. 30

Best Reaction to Stress?

You Don’t Know

13Anycast vs. DDoS / 2016-10-16

you

X-SJC

your

friend

X-PRG

attackers

other

attackers

X-SYDanother

friend

1. nothing: X-SJC is degraded absorber,

protecting X-SYD’s users

2. withdraw routes from X-SJC;

may shift attackers to big site

3. withdraw wrong routes from X-SJC;

may shift attackers to other site

don’t know:

number of attackers

location of attackers

affects of routing change

hard to make

informed choices

don’t fully control

routing and catchments

Page 14: Anycast vs. DDoS: Evaluating Nov. 30

What Actually Happens?

• studying Nov. 30

• we see withdrawals and degraded absorbers

• some clients lose service

• results vary

– by anycast deployment

14Anycast vs. DDoS / 2016-10-16

Page 15: Anycast vs. DDoS: Evaluating Nov. 30

Data About Nov. 30

• RIPE Atlas– ~9000 vantage points (RIPE Atlas probes)

– try every letter every 4 minutes• except A-root, at this time, was every 30 minutes

• CHAOS query identifies server and implies site

• targets letters, not Root DNS (cannot switch letter)

– global, but heavily biased to Europe

– we map server->site• map will be public dataset

• RSSAC-002 reports– self-reports from letters

– not guaranteed when under stress

• BGPmon routing– control plane

15Anycast vs. DDoS / 2016-10-16

6996 RIPE Atlas VPs on 2015-11-30

(looking at K-Root)

Page 16: Anycast vs. DDoS: Evaluating Nov. 30

Summary of the Events

• two events– 2015-11-30t06:50 for 2h40m

– 2015-12-01t05:10 for 1h

• affected 10 of 13 letters

• about 5M q/s or 3.5Gb/s per affected letter– aggregate: 34Gb/s

• real DNS queries, common query names, from spoofed source IPs

• implications:– some letters had high loss

– overall, though DNS worked fine• clients retried other letters (as designed)

– but want to do better

16Anycast vs. DDoS / 2016-10-16

data:

A-Root had full view

(Verisign presentation);

RSSAC-002 reports

Page 17: Anycast vs. DDoS: Evaluating Nov. 30

How About the Letters?

17Anycast vs. DDoS / 2016-10-16

[Mou

ra16

a, figu

re 3; d

ata: RIP

E A

tlas]

some did great:

D, L, M: not attacked

A: no visible loss

most suffered:

a bit (E, F, I, J, K)

or a lot (B, C, G, H)

but does “x%”

measure what

users actually see?

Page 18: Anycast vs. DDoS: Evaluating Nov. 30

30

0 V

anta

ge

Po

ints

(1

/ro

w)

36 hours20

15-1

1-3

0t0

0Z

No

v. 3

0

even

t

Dec

. 1

even

t

yellow:

K-LHR

salmon:

K-FRA

blue: K-AMS

black: failed query

white: K-other

[Moura16a, figure 11;

data: RIPE Atlas]

View from Atlas Vantage Points

18Anycast vs. DDoS / 2016-10-16

K overall:

~30% loss

(not bad)

but these 300 VPs:

70-90% loss to K

=> loss is uneven;

some users very sad

=> “30% loss” may

imply all VPs lose;

doesn’t show

uneven distribution

Page 19: Anycast vs. DDoS: Evaluating Nov. 30

Reachability at K’s Sites

19Anycast vs. DDoS / 2016-10-16

sparkline plot per site

(les

s)

sit

e p

op

ula

rity

(

more

)

sites see fewer VPs, but why?

- query loss? site absorbs attack,

but sad customers

- route change? who? why? where?

few VPs

(during

Nov. 30

event)

few VPs

(Dec. 1

event)

extra VPs

median

median(the “natural”

catchment)

3x median

Page 20: Anycast vs. DDoS: Evaluating Nov. 30

Site Flips from Routing Changes

20Anycast vs. DDoS / 2016-10-16

30

0 V

anta

ge

Po

ints

(1

/ro

w)

36 hours20

15-1

1-3

0t0

0Z

No

v. 3

0

even

t

Dec

. 1

even

t

yellow:

K-LHR

salmon:

K-FRA

blue: K-AMS

black: failed query

white: K-other

[Moura16a, figure 11;

data: RIPE Atlas]

Page 21: Anycast vs. DDoS: Evaluating Nov. 30

Site Flips from Routing Changes

21Anycast vs. DDoS / 2016-10-16

40

Van

tage

Po

ints

(1

/ro

w)

360 minutes (in 4 minute bins)

yellow: K-LHR

blue: K-AMS

black: failed query [Moura16a, figure 11b;

data: RIPE Atlas]

white: K-other

Nov. 30 event

stay at K-LHR;sad during event

flip to K-AMS;(less) sad during event;

back to K-LHR after

flip to K-other

and stay there

flip to K-AMS

Page 22: Anycast vs. DDoS: Evaluating Nov. 30

Confirming Flips in BGP

22Anycast vs. DDoS / 2016-10-16

flips common during

events for most lettersflips seen in BGP

[Mou

ra16

a, figu

re 8; d

ata: RIP

E A

tlas]

[Mou

ra16

a, figu

re 9; d

ata: BG

Pm

on]

Page 23: Anycast vs. DDoS: Evaluating Nov. 30

Flips Across Letters: E and K

23Anycast vs. DDoS / 2016-10-16

sites acquiring VPs

(during event?)

sites shedding VPs

to evaluate flips over two days:

compare minimum and maximum

catchement (measured in VPs/site)

normalize to median VPs

(the natural catchment),

to correct for uneven Atlas locations

(red sites: <20 VPs; not enough

to provide meaningful results)

[Mou

ra16a, fig

ure 5

; data: R

IPE

Atlas]

Page 24: Anycast vs. DDoS: Evaluating Nov. 30

Flips: Implications

• some ISPs are “sticky” and won’t flip– will suffer if their site is overloaded

• some ISPs will flip– but new site may not be much better

• result depends on many factors– actions taken by root operator

– routing choices by operator and peer• and perhaps peer’s peers, depending on congestion location

– implementation choices• DNS, routing

24Anycast vs. DDoS / 2016-10-16

Page 25: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress:

What Should Happen?• consider a service

– 3 sites: s1, s2, S3

– s1 and s2: 1Gb/s

– S3: 10Gb/s

• with clients– 4 clients: c0 to c3

• the attack– A0 and A1

– each: 0.49, 0.99, 4.9, or 6Gb/s

• what is the optimal, ideal defense?– assume static attackers

– defender knows attack strengths

– defender controls routing

• metric: Happiness H: number of clients served

25Anycast vs. DDoS / 2016-10-16

Page 26: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress:

What Should Happen?1. A0+A1 < s1: do nothing; H=4

2. A0 < s1 and A0+A1 > s2: shed load; H=4– vs. H=2 if do nothing

3. A0 > s1 and A0+A1 < s3: keep only big site; H=4

– vs. H=2 if nothing

4. A0+A1 > S3: do nothing (s1 is degraded absorber); H=2

⇒ with today’s uncertainty:“do nothing” looks good

⇒ future goal: what is needed(measurement and control) to do better?

26Anycast vs. DDoS / 2016-10-16

.98 < 1

A0=.49

A1=.49

s1=1

Page 27: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress:

What Should Happen?1. A0+A1 < s1: do nothing; H=4

2. A0 < s1 and A0+A1 > s2: shed load; H=4– vs. H=2 if do nothing

3. A0 > s1 and A0+A1 < s3: keep only big site; H=4

– vs. H=2 if nothing

4. A0+A1 > S3: do nothing (s1 is degraded absorber); H=2

⇒ with today’s uncertainty:“do nothing” looks good

⇒ future goal: what is needed(measurement and control) to do better?

27Anycast vs. DDoS / 2016-10-16

.99 < 1 and 1.98 > 1

A0=.99

s1=1

A1=.99

s2=1

Page 28: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress:

What Should Happen?1. A0+A1 < s1: do nothing; H=4

2. A0 < s1 and A0+A1 > s2: shed load; H=4– vs. H=2 if do nothing

3. A0 > s1 and A0+A1 < s3: keep only big site; H=4

– vs. H=2 if nothing

4. A0+A1 > S3: do nothing (s1 is degraded absorber); H=2

⇒ with today’s uncertainty:“do nothing” looks good

⇒ future goal: what is needed(measurement and control) to do better?

28Anycast vs. DDoS / 2016-10-16

4.9 > 1 and 9.8 < 10

A0=4.9

s1=1

A1=4.9

Page 29: Anycast vs. DDoS: Evaluating Nov. 30

Anycast Under Stress:

What Should Happen?1. A0+A1 < s1: do nothing; H=4

2. A0 < s1 and A0+A1 > s2: shed load; H=4– vs. H=2 if do nothing

3. A0 > s1 and A0+A1 < s3: keep only big site; H=4

– vs. H=2 if nothing

4. A0+A1 > S3: do nothing (s1 is degraded absorber); H=2

⇒ with today’s uncertainty:“do nothing” looks good

⇒ future goal: what is needed(measurement and control) to do better?

29Anycast vs. DDoS / 2016-10-16

12 > 10

A0=6

s1=1

A1=6

Page 30: Anycast vs. DDoS: Evaluating Nov. 30

During An Event:

Active Routing Changes or Not?• no active routing changes

– should expect partial loss in future attacks• inevitable: non-uniform attacker and defender capacity

– overloaded catchments will suffer during attack

– need to pre-deploy excess capacity

– operators understand and are doing these;but what about user expectations?

• active routing changes– important when aggregate attack and defense capacity is similar

• if one exceeds the other, no need to bother

– requires much better measurement and route control• seems like a research problem; AFAIK no tools today

– important to reduce client losses at smaller sites

– seems necessary to get to 0% loss

30Anycast vs. DDoS / 2016-10-16

Page 31: Anycast vs. DDoS: Evaluating Nov. 30

Aside: Collateral Damage

• can an event hurt non-targets?

• yes! …a risk of shared datacenters

31Anycast vs. DDoS / 2016-10-16

D-FRA and D-SYD: less traffic(even though D was not directly attacked)

.NL-FRA and .NL-AMS: no traffic

In other attacks, B-Root’s ISP

saw loss to other customers

[Mou

ra16

a, figu

re 14

; data: R

IPE

atlas]

[Mou

ra16

a, figu

re 15

;

data: S

IDN

]

Page 32: Anycast vs. DDoS: Evaluating Nov. 30

Recommendations

• current approach reasonable– build out capacity in advance

– no active re-routing during attack

– should expect some loss during each attack

• need true diversity to avoid collateral damage

• longer-term– need research to improve measurement and control

– active control can improve loss during some attacks

• how many sites needed?– there is a lot of capacity already

– many small sites seem to increase partial outages

32Anycast vs. DDoS / 2016-10-16

Page 33: Anycast vs. DDoS: Evaluating Nov. 30

Conclusions

• anycast under stress is complicated

– some users will see persistent loss

– “x% loss” is not complete picture

• options:

– pre-deploy + no change during is reasonable choice today

– to avoid loss, will need to do more

• more info:

– paper: http://www.isi.edu/~johnh/PAPERS/Moura16b

– data: https://ant.isi.edu/datasets/anycast/

33Anycast vs. DDoS / 2016-10-16