When the Dike Breaks: Dissecting DNS Defenses During DDoS

Giovane C. M. Moura (1,2), John Heidemann (3), Moritz Müller (1,4), Ricardo de O. Schmidt (5), Marco Davids (1)

RIPE 77, Amsterdam, The Netherlands, 2018-10-15

(1) SIDN Labs, (2) TU Delft, (3) USC/ISI, (4) University of Twente, (5) University of Passo Fundo


Research paper to appear at ACM IMC 2018

• Joint research work to appear at: https://conferences.sigcomm.org/imc/2018/

• Full text (PDF): https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf


DDoS Attacks

• DDoS attacks are on the rise

• Getting bigger, more frequent, cheaper, and easier
  • Arbor: 1.7 Tb/s [2] (2018)
  • Github DDoS: 1.35 Tb/s [1] (2018)
  • Dyn DDoS: 1.2 Tb/s (Mirai IoT) [6] (2016)
  • DDoS as a service: a few dollars with booters [8]

• Many DNS services have been victims of DDoS attacks


DDoS and DNS: two examples

Root DNS DDoS, Nov 2015: no known reports of errors seen by users [3]

Dyn, Oct 2016: some users could not reach popular sites [6]

Two large DDoSes, very different outcomes. Why?


DNS Basics

[Diagram: a user sends the query "example.nl?" to the Internet and receives the answer 192.168.1.1]

• That’s what most users (need to) know about DNS

• Let’s see what really happens


Background: the many parts of DNS

[Diagram, levels shown: stub resolver (e.g., OS/applications); first-level recursives R1a, R1b (e.g., modem) with caches CR1a, CR1b; nth-level recursives Rna ... Rnn (e.g., ISP resolvers) with caches CRna ... CRnb; authoritative servers AT1 ... ATn (e.g., ns1.example.nl)]

Figure 1: Relationship between resolvers, caches, and authoritatives

• DNS query: where’s example.nl? ($ dig A example.nl)
• Answer: example.nl. 3600 IN A 94.198.159.35
• DNS TTL: max time to cache a record
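To make the TTL field concrete, here is a minimal sketch that parses the answer line shown above and computes when a cache would have to discard it. The `ResourceRecord` class and both function names are illustrative assumptions, not part of any DNS library.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    name: str
    ttl: int      # seconds the record may be cached
    rclass: str
    rtype: str
    rdata: str


def parse_record(line: str) -> ResourceRecord:
    """Parse one answer line in dig's presentation format."""
    name, ttl, rclass, rtype, rdata = line.split(maxsplit=4)
    return ResourceRecord(name, int(ttl), rclass, rtype, rdata)


def cache_expiry(fetched_at: float, record: ResourceRecord) -> float:
    """Latest time (same clock as fetched_at) a cache may still serve the record."""
    return fetched_at + record.ttl


record = parse_record("example.nl. 3600 IN A 94.198.159.35")
print(record.rtype, record.rdata, cache_expiry(0.0, record))
```

A record fetched at time 0 with TTL 3600 may be served from cache for at most one hour; whether a given resolver keeps it that long is what the rest of the talk measures.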


Background: the many parts of DNS

[Diagram: the resolver hierarchy of Figure 1, now with a DDoS attack on the authoritative servers AT1 ... ATn]

• How much will resolvers’ built-in defenses help users during a DDoS?


OPS expectation during DDoS

[Diagram: the resolver hierarchy of Figure 1, with a DDoS attack on the authoritative servers]

Figure 2: TTL = how long your star powers will last (answer from cache)


Evaluating DNS Resiliency

• Part 1: evaluate user experience under “normal” operations

• Part 2: Verify results of Part 1 in production zones (.nl)

• Part 3: Emulate DDoSes in the wild to evaluate caching/retries under stress, and to observe the user experience


Part 1: measuring caching in the wild

Setup

1. register our new domain (cachetest.nl)

2. run two unicast IPv4 authoritatives on EC2 Frankfurt

3. Use RIPE Atlas probes and their resolvers as vantage points (VPs, ∼15k)

4. Each VP sends a unique AAAA query, so there is no interference between VPs

  • e.g., 500.cachetest.nl for probeID=500

5. Each AAAA DNS answer encodes a counter that allows us to tell whether it was a cache hit or a miss

  • $PREFIX:$SERIAL:$PROBEID:$TTL

6. Probe every 20min, and run scenarios with different TTLs for 2 to 3 hours (to match various TTLs in the wild)

  • 60, 1800, 3600, and 86400 seconds TTL
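The $PREFIX:$SERIAL:$PROBEID:$TTL encoding in step 5 can be sketched as below. The field widths (four 32-bit fields) and the prefix value are assumptions made for illustration; the slides give only the field order, not the bit layout.

```python
import ipaddress

# Hypothetical 32-bit prefix from the IPv6 documentation range (2001:db8::/32).
PREFIX = 0x20010DB8


def encode_answer(serial: int, probe_id: int, ttl: int) -> ipaddress.IPv6Address:
    """Pack prefix, serial, probe ID, and TTL into one 128-bit AAAA value."""
    for value in (serial, probe_id, ttl):
        if not 0 <= value < 2**32:
            raise ValueError("each field must fit in 32 bits")
    packed = (PREFIX << 96) | (serial << 64) | (probe_id << 32) | ttl
    return ipaddress.IPv6Address(packed)


def decode_answer(address: ipaddress.IPv6Address) -> tuple[int, int, int]:
    """Recover (serial, probe_id, ttl) from an encoded AAAA value."""
    packed = int(address)
    mask = 0xFFFFFFFF
    return (packed >> 64) & mask, (packed >> 32) & mask, packed & mask


answer = encode_answer(serial=3, probe_id=500, ttl=3600)
print(answer, decode_answer(answer))
```

Because the serial travels inside the answer itself, the client can tell which version of the zone it was served without any cooperation from the recursives in between.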


Part 1: measuring caching in the wild

• We control auth servers and clients (stub resolver)

• We do not control recursives

• How efficient is caching in the wild?
  • Remember: TTL sets the upper limit for HOW LONG a record should be cached by recursives


Results: how good is caching in the wild?

[Figure: bar chart of remaining queries (y-axis, 0 to 120000) per experiment (x-axis: 60s, 1800s, 3600s, 86400s, 3600s-10m), split into answer classes AA, CC, AC, and CA. Miss rates annotated per experiment, in that order: 0.0%, 32.6%, 32.9%, 30.9%, 28.5%]

1. Good news: caching works fine for 70% of all 15,000 VPs
  • Even with our unpopular domain

2. Not so good news: ∼30% of answers are cache misses (AC)
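The two-letter classes can be derived from the serial carried in each answer. The sketch below is a guess at the logic, under two assumptions not stated on the slides: the first letter is where the answer actually came from and the second where it was expected from, and the authoritative bumps its serial every probing round. The slides only confirm that AC denotes a cache miss.

```python
from collections import Counter


def classify(answer_serial: int, current_serial: int, expected_from_cache: bool) -> str:
    """Return AA, CC, AC, or CA (first letter: actual source, second: expected)."""
    # Assumption: an answer carrying the current serial was fetched fresh
    # from the authoritative; an older serial means it came from a cache.
    actual = "A" if answer_serial == current_serial else "C"
    expected = "C" if expected_from_cache else "A"
    return actual + expected


# Hypothetical observations: (serial in answer, current serial, cache expected?)
observations = [(5, 5, False), (4, 5, True), (5, 5, True), (4, 5, True)]
counts = Counter(classify(*obs) for obs in observations)
print(dict(counts))
```

With these made-up observations, one answer is a cache miss (AC), two are cache hits (CC), and one is a normal first fetch (AA).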


Why cache misses (Why AC?)

Possible: capacity limits, cache flushes, complex caches

Mostly: complex caches

• cache fragmentation with multiple servers

• (previous work on Google DNS [9])

TTL                   60    1800    3600   86400   3600-10m
AC Answers            37   24645   24091   23202      47262
Public R1              0   12000   11359   10869      21955
  Google Public R1     0    9693    9026    8585      17325
  other Public R1      0    2307    2333    2284       4630
Non-Public R1         37   12645   12732   12333      25307
  Google Public Rn     0    1196    1091     248       1708
  other Rn            37   11449   11641   12085      23599

Table 1: AC answers (cache miss) public resolver classification
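As a quick reading aid for Table 1, the sketch below computes what share of AC answers involved Google Public DNS at either the first level (R1) or a later level (Rn). The numbers are copied from the table; adding the two Google rows together is an interpretation made here, not a figure from the slides.

```python
# AC (cache-miss) answers per experiment, copied from Table 1.
EXPERIMENTS = ["60", "1800", "3600", "86400", "3600-10m"]
AC_TOTAL = [37, 24645, 24091, 23202, 47262]
GOOGLE_R1 = [0, 9693, 9026, 8585, 17325]
GOOGLE_RN = [0, 1196, 1091, 248, 1708]


def google_share(index: int) -> float:
    """Fraction of AC answers that involved Google Public DNS at R1 or Rn."""
    return (GOOGLE_R1[index] + GOOGLE_RN[index]) / AC_TOTAL[index]


for i, name in enumerate(EXPERIMENTS):
    print(f"TTL {name}: {google_share(i):.1%} of AC answers via Google Public DNS")
```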


Part 2: caching in production zones

• OK, in our controlled environment, we show that caching works as expected about 70% of the time

• Are these experiments representative?

• We look at .nl production data
  • We compute ∆t (time since the last query)
  • Compare it to the TTL of 3600s
  • 485k queries from 7,779 recursives
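The ∆t analysis can be sketched as follows: group queries by recursive and name, take the gap between consecutive queries, and compare it to the TTL. The tuple layout and the sample addresses (from the documentation range 192.0.2.0/24) are assumptions for illustration; the real analysis ran on .nl production traces.

```python
def inter_query_gaps(queries):
    """Yield ∆t between consecutive queries from the same (resolver, name) pair.

    `queries` is an iterable of (timestamp_seconds, resolver_ip, qname) tuples.
    """
    last_seen = {}
    for timestamp, resolver, qname in sorted(queries):
        key = (resolver, qname)
        if key in last_seen:
            yield timestamp - last_seen[key]
        last_seen[key] = timestamp


def fraction_early(queries, ttl=3600):
    """Fraction of gaps shorter than the TTL, i.e. re-queries before expiry."""
    gaps = list(inter_query_gaps(queries))
    return sum(gap < ttl for gap in gaps) / len(gaps) if gaps else 0.0


sample = [
    (0, "192.0.2.1", "ns1.example.nl."),
    (3600, "192.0.2.1", "ns1.example.nl."),
    (0, "192.0.2.2", "ns1.example.nl."),
    (900, "192.0.2.2", "ns1.example.nl."),
]
print(fraction_early(sample))
```

In this toy sample one recursive waits the full hour and the other comes back after 15 minutes, so half of the gaps fall below the TTL.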


Part 2: caching in production zones

• Most resolvers re-send queries about every 3600s (the .nl TTL)
• 28% do not respect the 1h TTL
• Yes, the experiments behave like a real zone
• (we also look into the Roots, see paper [4])

[Figure: CDF (y-axis, 0 to 1) of ∆t (x-axis, 0 to 10000)]


OK, so what do we have so far?

• We know how caching works in the wild (both RIPE Atlas and .nl)

• Time to move on to Part 3: emulate DDoS

• Goal: understand client experience under DDoS


Part 3: Emulating DDoS

• Similar setup as other experiments:

• Emulate DDoS: drop incoming queries at certain rates at the authoritative servers, with iptables

• Question: (when) do caches protect clients?

• Or why some DDoS attacks seem to have more impact?

• We show only a few experiments here; many more are in the paper
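One way to drop queries at a given rate with iptables is its `statistic` match in random mode. The sketch below only builds and prints such commands. The slides say iptables was used but do not give the rules, so the chain, the UDP-only filter, and the use of the `statistic` match are all assumptions.

```python
def drop_rule(loss: float, port: int = 53) -> list[str]:
    """Build an iptables command that randomly drops a fraction of DNS/UDP queries."""
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    return [
        "iptables", "-A", "INPUT",
        "-p", "udp", "--dport", str(port),
        "-m", "statistic", "--mode", "random", "--probability", f"{loss:.2f}",
        "-j", "DROP",
    ]


# Print the commands rather than executing them; running iptables needs root.
for loss in (0.5, 0.9, 1.0):
    print(" ".join(drop_rule(loss)))
```

A loss of 1.0 corresponds to the complete-failure scenarios, while 0.5 and 0.9 correspond to the partial-failure experiments later in the talk.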


Scenario A: all servers DOWN

• Worst nightmare for a DNS operator

• Only resolver’s cache can save clients

• TTL=3600s (1 hour)

• We probe every 10 minutes

• At t = 10min, we drop all packets


Complete DDoS: TTL: 60min, 100% failure

[Plot: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 110), with phases marked cache-only and cache-expired]

Figure 3: Scenario A: 100% failure after 10min, TTL: 60min

• DDoS starts after 1st query (fresh cache)

• During DDoS: 35%-70% of clients are served (from cache)
• After the cache expires: only 0.2% of clients are served (serve stale)
  • draft-ietf-dnsop-serve-stale-00


Complete DDoS: changing cache freshness

• Scenario B: cache freshness: about to expire
• How will clients experience the DDoS?

[Plot: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 170), with phases marked normal and cache-only]

Figure 4: Scenario B: 100% failure after 60min, TTL: 60min

• Cache is much less effective (as it times out near the start of the attack)
• Fragmented caches help some clients (by filling later)


Complete DDoS: TTL record influence

• Influence of TTL: reducing from 60min to 30min
• How will clients experience the DDoS?

[Plot: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 170), with phases marked normal, cache-only, and cache-expired]

Figure 5: Scenario C: 100% failure after 60min, TTL: 30min

• User experience worsens with a shorter TTL
• OPs: choose the TTL of your records wisely when engineering for DDoS


Discussion: complete DDoS

• Caching is partially successful during complete DDoS

• OPs: don’t expect protection for clients to last as long as your TTL; it depends on their cache state

• Serving stale content provides the last resort for a Doomsday scenario

  • some ops (Google, OpenDNS) seem to do it, but it is not widespread yet

• TTL of records: the shorter you set them, the less you protect users during a complete DDoS
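Why protection depends on cache state rather than on the TTL alone can be seen with a small idealized model: a record cached before the attack protects a client only until it expires. The function name, the fixed attack end, and the sample fill times are hypothetical; real resolvers add fragmentation, retries, and sometimes serve-stale on top of this.

```python
def protected_minutes(cached_at: int, ttl: int, attack_start: int, attack_end: int) -> int:
    """Minutes of a complete outage covered by a record cached before the attack.

    All arguments are in minutes; assumes cached_at <= attack_start.
    """
    expiry = cached_at + ttl
    return max(0, min(expiry, attack_end) - attack_start)


# Hypothetical clients, attack ending at minute 120 in every case.
print(protected_minutes(cached_at=0, ttl=60, attack_start=10, attack_end=120))   # fresh cache
print(protected_minutes(cached_at=0, ttl=60, attack_start=60, attack_end=120))   # about to expire
print(protected_minutes(cached_at=50, ttl=60, attack_start=60, attack_end=120))  # filled later
print(protected_minutes(cached_at=50, ttl=30, attack_start=60, attack_end=120))  # shorter TTL
```

The same 60-minute TTL yields anywhere from zero to 50 minutes of cover depending on when the cache was filled, and halving the TTL shrinks the window further.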


Partial DDoS

• Not all DDoS attacks are strong enough to bring all servers down

• Some lead to partial failure (Root DNS Nov 2015 [3])
  • Partial failure: some of the available authoritatives fail to answer all queries, or take longer to answer; users then experience longer latencies

• In this case, how would users experience the attack?


Experiment E: 50% success DDoS, TTL: 30min

[Plot 1: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 170), with phases marked normal and 50% packet loss (both NSes)]

[Plot 2: latency in ms (y-axis, 0 to 4000; median, mean, 75th-percentile, and 90th-percentile RTT) vs. minutes after start (x-axis, 0 to 160)]

Good! Most clients are happy, as they retry (but it takes longer)


Experiment H: 90% success DDoS, TTL: 30min

[Plot 1: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 170), with phases marked normal and 90% packet loss (both NSes)]

[Plot 2: latency in ms (y-axis, 0 to 4000; median, mean, 75th-percentile, and 90th-percentile RTT) vs. minutes after start (x-axis, 0 to 160)]

Good! Even at 90% packet loss with TTL 30min, most clients (60%) get an answer! Good engineering!
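A rough intuition for why retries help under heavy loss: if each query is lost independently with probability p, the chance that at least one of k attempts gets through is 1 - p^k. The sketch below is only this textbook model; the attempt counts are hypothetical, and the measured 60% also reflects caching, so the model is not a reproduction of the experiment.

```python
def success_probability(loss: float, attempts: int) -> float:
    """P(at least one of `attempts` queries gets through), assuming independent loss."""
    if not 0.0 <= loss <= 1.0 or attempts < 1:
        raise ValueError("loss must be in [0, 1] and attempts >= 1")
    return 1.0 - loss ** attempts


for attempts in (1, 2, 4, 8):
    print(attempts, f"{success_probability(0.9, attempts):.2f}")
```

Each extra attempt raises the odds of an answer, at the price of extra queries arriving at the authoritatives and extra waiting time for the client.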


Experiment I: 90% success DDoS, TTL: 1min

• What is the influence of TTL in a partial DDoS?

[Plot 1: answers (y-axis, 0 to 20000; OK, SERVFAIL, No answer) vs. minutes after start (x-axis, 0 to 170), with phases marked normal and 90% packet loss (both NSes)]

[Plot 2: latency in ms (y-axis, 0 to 4000; median, mean, 75th-percentile, and 90th-percentile RTT) vs. minutes after start (x-axis, 0 to 160)]

Even with no caching (TTL 1min), 27% get an answer: stale + retries


Retries cost: hammering Auth servers

• Part of DNS resilience is that recursives keep on retrying
• There’s a cost to it, however: 8.1x the queries at the authoritatives in the no-caching case!
• Implications: OPS: be ready for friendly fire
  • usually not noticed during DDoS
  • If your overprovisioning level is 10x, know that 8.1x is friendly fire
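The friendly-fire arithmetic can be written out directly: with capacity expressed as a multiple of the normal query load, retries from legitimate recursives consume part of that capacity before any attack traffic is counted. The function name is an assumption, and the model ignores everything except the two factors quoted above.

```python
def attack_headroom(overprovision: float, retry_factor: float) -> float:
    """Capacity left for attack traffic, in multiples of the normal query load."""
    return max(0.0, overprovision - retry_factor)


# 10x overprovisioning, 8.1x legitimate retry traffic (figures from the slide).
print(round(attack_headroom(10.0, 8.1), 1))
```

Under these assumptions, a server provisioned for 10x its normal load has less than 2x of that load left to absorb the attack itself.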

[Plot: queries (y-axis, 0 to 200000) vs. minutes after start (x-axis, 0 to 170), by query type: NS, A-for-NS, AAAA-for-NS, AAAA-for-PID; phases marked normal and 90% packet loss (both NSes)]

Figure 6: Queries received at Auth Servers. Experiment I: 90% success DDoS, TTL: 1min


Implications

• Caching and retries work really well
  • provided some authoritative stays partially up
  • and caches last longer than the DDoS (as in TLDs, not in CDNs)
  • For DNS OPs: make one auth very strong? (careful with load distribution, see [5])

• Explains prior root DDoS outcomes


Implications

• There is a clear trade-off between TTL and DNS resilience
  • provided caches are filled and not about to expire

• Many commercial websites have short TTLs
  • explains the pain of Dyn’s customers and users’ perception
  • shorter TTLs give them quicker management options (Amazon EC2 resolvers cap all answer TTLs to 60s [7])


Conclusions

• First study to evaluate DNS resilience to DDoS from the user’s perspective

• Evaluate design choices of various vendors using measurements

• Caching and retries: important part of DNS resilience
  • Good engineering: thanks to all IETFers/devs who have built this

• Experiments show when they help and when they won’t

• Consistent with recent outcomes

• DNS community:

  • There’s a clear trade-off between TTL and DDoS robustness, choose wisely

  • Serving stale content is controversial, some deploy it


Questions?

• Paper: https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf
• Contact: [email protected]

• Thanks to RIPE NCC and the reviewers of various drafts:
  • Wes Hardaker, Duane Wessels, Warren Kumari, Stephane Bortzmeyer, Maarten Aertsen, Paul Hoffman, our shepherd Mark Allman, and the anonymous IMC reviewers


References i

[1] Sam Kottler. February 28th DDoS Incident Report | Github Engineering, March 2018. https://githubengineering.com/ddos-incident-report/.

[2] Carlos Morales. NETSCOUT Arbor Confirms 1.7 Tbps DDoS Attack; The Terabit Attack Era Is Upon Us, March 2018. https://www.arbornetworks.com/blog/asert/netscout-arbor-confirms-1-7-tbps-ddos-attack-terabit-attack-era-upon-us/.


References ii

[3] Giovane C. M. Moura, Ricardo de O. Schmidt, John Heidemann, Wouter B. de Vries, Moritz Müller, Lan Wei, and Christian Hesselman. Anycast vs. DDoS: Evaluating the November 2015 root DNS event. In Proceedings of the ACM Internet Measurement Conference, November 2016.


References iii

[4] Giovane C. M. Moura, John Heidemann, Moritz Müller, Ricardo de O. Schmidt, and Marco Davids. When the dike breaks: Dissecting DNS defenses during DDoS (extended). In Proceedings of the ACM Internet Measurement Conference, October 2018.

[5] Moritz Müller, Giovane C. M. Moura, Ricardo de O. Schmidt, and John Heidemann. Recursives in the wild: Engineering authoritative DNS servers. In Proceedings of the ACM Internet Measurement Conference, pages 489–495, London, UK, 2017.


References iv

[6] Nicole Perlroth. Hackers used new weapons to disrupt major websites across U.S. New York Times, page A1, Oct. 22 2016.

[7] Alec Peterson. Ec2 resolver changing ttl on dns answers? Post on the DNS-OARC dns-operations mailing list, https://lists.dns-oarc.net/pipermail/dns-operations/2017-November/017043.html, November 2017.


References v

[8] José Jair Santanna, Roland van Rijswijk-Deij, Rick Hofstede, Anna Sperotto, Mark Wierbosch, Lisandro Zambenedetti Granville, and Aiko Pras. Booters: an analysis of DDoS-as-a-Service attacks. In Proceedings of the 14th IFIP/IEEE International Symposium on Integrated Network Management, Ottawa, Canada, May 2015. IFIP.

[9] Kyle Schomp, Tom Callahan, Michael Rabinovich, and Mark Allman. On measuring the client-side DNS infrastructure. In Proceedings of the 2013 ACM Conference on Internet Measurement Conference, pages 77–90. ACM, October 2013.
