
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2015.2496176, IEEE Transactions on Dependable and Secure Computing


Still Beheading Hydras: Botnet Takedowns Then and Now

Yacin Nadji, Roberto Perdisci, Manos Antonakakis

Abstract—Devices infected with malicious software typically form botnet armies under the influence of one or more command and control (C&C) servers. The botnet problem has reached a level where federal law enforcement agencies have had to step in and take action against botnets by disrupting (or “taking down”) their C&Cs, and thus their illicit operations. Lately, more and more private companies have started to independently take action against botnet armies, primarily focusing on their DNS-based C&Cs. While well-intentioned, their C&C takedown methodology is in most cases ad hoc and limited by the breadth of knowledge available about the malware that facilitates the botnet. With this paper, we aim to bring order, measure, and reason to the botnet takedown problem. We improve an existing takedown analysis system called rza. Specifically, we examine additional botnet takedowns, enhance the risk calculation to use botnet population counts, and include a detailed discussion of policy improvements that can be made to improve takedowns. As part of our system evaluation, we perform a postmortem analysis of the recent 3322.org, Citadel, and No-IP takedowns.


1 INTRODUCTION

Botnets represent a persistent threat to Internet security. To effectively counter botnets, security researchers and law enforcement organizations have recently been relying more and more on botnet takedown operations. Essentially, a botnet takedown consists of identifying and disrupting the botnet’s command-and-control (C&C) infrastructure. For example, in 2009 law enforcement and security operators were able to take down the Mariposa botnet, which at that time consisted of approximately 600,000 bots. The takedown operation was accomplished by first identifying the set of domain names through which bots would locate their C&C network infrastructure. By seizing this set of domains via a collaboration with domain registrars, security operators effectively “sinkholed” the botnet, thus shunting the C&C traffic away from the botmaster and preventing any further commands from being issued to the bots.

While sophisticated botnet developers have attempted, in some cases successfully, to build peer-to-peer (P2P) botnets that entirely avoid the use of C&C domains [22], most modern botnets make frequent use of the domain name system (DNS) to support their C&C infrastructure. This is likely because DNS-based botnets are much easier to develop and manage than their P2P-based counterparts, and yet provide a remarkable level of agility that makes a takedown challenging. For example, the Mariposa case required a coordinated effort involving law enforcement, security operators, and domain registrars across several different countries. In addition, some recent takedown efforts [15] have caused some level of collateral damage, thus raising both technical issues and policy-related questions regarding the efficacy of botnet takedowns.

In this paper, we propose a novel takedown analysis system, which we call rza. Our main goal is to provide a way to “go back in time” and quantitatively analyze past takedown efforts to highlight incomplete takedowns and identify what worked and what could have been done better. Specifically, rza identifies additional domains that are likely part of a botnet’s C&C infrastructure by examining historical relationships in the DNS and analyzing the botnet’s malware samples. This aids the takedown process by identifying domains that may have been missed by hand, both from the network level and the malware level, aggregating this information, and automatically labeling the domains with evidence of their maliciousness. While rza focuses on disrupting botnets that use DNS-based C&C infrastructure, it can also assist in cases where botnets are more advanced and use domain name generation algorithms (DGA) or communicate using a peer-to-peer (P2P) structure. In particular, rza provides the first few steps for remediating advanced C&C infrastructure: (i) identifying DNS-based primary C&C infrastructure, if it exists; (ii) automatically identifying whether the botnet has DGA or P2P capabilities; and (iii) automatically identifying the malware samples that exhibit these behaviors to triage binaries for reverse engineering. To successfully take down DGA/P2P botnets we must fully understand their non-deterministic portions, such as the randomness seed for DGAs [5] or the peer enumeration and selection algorithms for P2P [22]. If we disable a botnet’s primary infrastructure but do not account for the DGA-based backup mechanism, our efforts will be futile.

We show that in past takedowns, likely malicious domain names were left unperturbed. Worse yet, in some cases malicious domains were unintentionally given enterprise-level domain name resolution services. We show that rza can identify additional sets of domain names that ought to be considered in a future takedown, as well as automatically identify malware contingency plans for when their primary C&C infrastructure is disabled.

In this paper, we improve our work described in [19] and present the following contributions:

• We perform two additional botnet takedown postmortem analyses to explain the current state of botnet takedowns. We show that takedowns are still largely ineffective and run the risk of drawbacks from collateral damage or friendly fire.

• We augment risk calculation to include population measurements to further improve the takedown recommendation engine and analyze more recent botnets.

• We expand the policy discussion based on what we have learned in the interim period.

The remainder of the paper is structured as follows: Section 2 provides the necessary background on the DNS, botnet takedowns, and our datasets. Section 3 describes rza in detail and Section 4 describes the process for interrogating malware samples. Section 5 presents our postmortem experiments and analyses of three recent, high-profile takedown attempts. In Section 6.2 we discuss non-technical difficulties associated with performing takedowns that, if alleviated, would likely improve the process with respect to coordination and preventing potential collateral damage.

2 BACKGROUND

In this section, we first provide a historical explanation of some past takedowns and explain why they deserve to be studied in detail. Then, we describe the datasets used by rza to perform takedown analysis and to build the takedown recommendation system.

2.1 Botnets and Takedowns

Botnet takedowns are not uncommon, and may take many different forms. Considering the heterogeneous nature of client machines and the difficulty in keeping individual machines clean from infection, taking down the botnet C&C is an attractive alternative. A successful takedown eliminates most external negative impacts of the botnet, effectively foiling further attacks (e.g., spam, DDoS, etc.) by the infected hosts, which can number in the millions. In the past, takedowns have been performed by revoking sets of C&C IP addresses from hosting providers, de-peering entire Autonomous Systems (AS), or, more recently, sinkholing or revoking C&C domains.

Conficker is an Internet worm that infected millions of computers and remains one of the most nefarious threats seen on the Internet to date [5]. Conficker’s later variants employed a DGA that would generate 50,000 pseudo-random domain names every day to communicate with its C&C server. The takedown of Conficker required immense coordination across hundreds of countries and top-level domains (TLDs), and numerous domain registrars and registries. The takedown efforts were coordinated by the Conficker Working Group (CWG) [5]. The takedown required reverse-engineering the malware binaries and reconstructing the DGA. Then, the CWG pre-registered all 50,000 domains per day that could potentially be used for C&C purposes, thus preventing the botmaster from regaining control of the bots. The success of CWG’s efforts highlights the importance of participation and support from key governing and regulatory bodies, such as ICANN, and the need for cooperation between the private sector and governments around the world.

Mariposa, a 600,000-strong botnet of Spanish origin, provides another example of a takedown operation initiated by a working group that relied on sinkholing known malicious domains. Interestingly, Mariposa’s botmasters were able to evade a full takedown by bribing a registrar to return domain control to the malicious operators [13], underscoring the fact that barriers to successful takedowns are not only technical ones.

The DNSChanger [23] “click-jacking” botnet was also taken down through a working group. DNSChanger altered upwards of 300,000 clients’ DNS configurations to point to rogue DNS resolvers under the control of the attackers. This allowed the attackers to direct infected hosts to illegitimate websites, often replacing advertisements with their own to generate revenue. DNSChanger had to be taken down by physically seizing the botnet’s rogue DNS servers. The takedown was accomplished in late 2011. Largely considered successful, the DNSChanger takedown once again shows the importance of collaboration when performing comprehensive takedowns.

Not all takedowns are performed at the DNS level, however, as shown in the takedowns of McColo [11], AS Troyak [14], and other “bulletproof hosting providers,” or networks known to willingly support malicious activities. These are extreme cases where the networks in question essentially hosted only malicious content, and removing the entire network would disable large swaths of botnets and related malicious network infrastructure. The effect of these takedowns was indirectly measured by witnessing drops in spam levels, for example, a decrease of upwards of two-thirds after McColo’s shutdown [12]. Unfortunately, if a particular botnet relied on the DNS to perform C&C resolutions into these bulletproof networks, once a new host was provisioned the threat would continue. Sure enough, we saw spam levels rise back to normal levels as botnets moved to other hosting providers [8].

2.2 Datasets

rza relies on two primary data sources: a large passive DNS database and a malware database that ties malicious binaries to the domain names they query during execution.

Passive DNS: A passive DNS (pDNS) database stores historic mappings between domain names and IP addresses based on successful resolutions seen on a live network over time. pDNS databases allow us to reconstruct the historical structure of DNS-based infrastructure based on how it was used by clients. Our pDNS is constructed from real-world DNS resolutions seen in a large North American ISP, collected by Damballa, Inc. beginning on January 1, 2011. The data is approximately 27 terabytes compressed. This allows us to identify the related historic domain names (RHDN) for a given IP, namely all domains that resolved to that IP in the past. Also, pDNS allows us to find the related historic IP addresses (RHIP) for a given domain name, i.e., all the IPs to which the domain resolved in the past. Furthermore, the RHIP/RHDNs can be limited to domain-to-IP mappings that occurred during a particular time frame of interest, thus allowing us to focus on the crucial days before and after a takedown took place.




We provide the following abstract functions to more clearly explain how the data are accessed and processed in the context of rza:

• RHIP(domain, start_date, end_date): returns all IP addresses historically related to the domain argument over the period between the desired start and end dates. For example, RHIP(foo.com, 2012/01/01, 2012/01/05) would return the set of all IP addresses foo.com successfully resolved to between January 1st, 2012 and January 5th, 2012, inclusive.

• RHDN(IP, start_date, end_date): similarly, RHDN returns all domains historically related to the IP argument over the period between the start and end dates.

• Volume(domain and/or IP, date): the total successful lookup volume to the argument domain, IP, or domain and IP tuple on the argument date.

• Population(domain, date): the total number of distinct clients that queried the argument domain, or domain and IP tuple, on the argument date. Population can only be computed for domains that are queried after November 14, 2012 due to data availability.
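To make the four abstract functions concrete, the following is a minimal sketch over an in-memory list of resolutions. The record layout (`Resolution`), the `PassiveDNS` class, and the toy data are assumptions made for illustration only; they do not describe how the Damballa or Farsight databases are actually stored or queried.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record layout: one row per successful resolution seen.
Resolution = namedtuple("Resolution", "domain ip client day")


class PassiveDNS:
    """Toy stand-in for a pDNS database (illustrative only)."""

    def __init__(self, records):
        self.records = list(records)

    def rhip(self, domain, start, end):
        """IPs the domain resolved to between start and end, inclusive."""
        return {r.ip for r in self.records
                if r.domain == domain and start <= r.day <= end}

    def rhdn(self, ip, start, end):
        """Domains that resolved to the IP between start and end, inclusive."""
        return {r.domain for r in self.records
                if r.ip == ip and start <= r.day <= end}

    def volume(self, day, domain=None, ip=None):
        """Successful lookups on a day for a domain, an IP, or the pair."""
        return sum(1 for r in self.records
                   if r.day == day
                   and (domain is None or r.domain == domain)
                   and (ip is None or r.ip == ip))

    def population(self, domain, day):
        """Distinct clients that queried the domain on the given day."""
        return len({r.client for r in self.records
                    if r.domain == domain and r.day == day})


pdns = PassiveDNS([
    Resolution("foo.com", "192.0.2.1", "c1", date(2012, 1, 1)),
    Resolution("foo.com", "192.0.2.2", "c2", date(2012, 1, 3)),
    Resolution("foo.com", "192.0.2.2", "c2", date(2012, 1, 3)),
    Resolution("bar.com", "192.0.2.2", "c3", date(2012, 1, 4)),
    Resolution("foo.com", "192.0.2.9", "c1", date(2012, 2, 1)),
])
print(pdns.rhip("foo.com", date(2012, 1, 1), date(2012, 1, 5)))
```

A real deployment would answer these queries from an indexed store rather than by scanning a list; the scan is used here only to keep the semantics of each function visible.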

It is important to note that our use of private pDNS data was dictated mainly by convenience and cost issues. To demonstrate that rza can properly function using different sources of passive DNS data, we obtained temporary access to the Farsight passive DNS database [7], which is available to other researchers and offers an arguably more global perspective.

Malware Domains: We also make use of a separate malware database that contains mappings between a malware sample’s MD5 sum and binary and the domain names and IP addresses it has queried during dynamic malware analysis. Each entry in the database is a 4-tuple that includes the MD5 of the malware sample, the queried domain name, the resolved IP address, and the date and time of the analysis. These data are collected from a combination of internal malware analysis output from Damballa as well as the output from a commercial malware feed. The commercial feed goes back to January 2007 and, when combined with Damballa’s internal data, contains roughly 505 million malware execution runs and is approximately 200 gigabytes compressed.

3 RZA SYSTEM

In this section, we detail the internals of rza, our takedown analysis and recommendation system.

3.1 Overview

Figure 1 shows the overall process implemented by rza. Given a set of known seed botnet domains $D_S$, rza can be asked to generate either a “Postmortem Report” or a “Takedown Recommendation”.

In the “Postmortem Report” mode, the input domains represent the domains known to have been targeted by a historic takedown. This produces a report that shows the effectiveness of the takedown of the domain names (Figure 1, step 5a) with respect to the expanded infrastructure rza identifies.

In the “Takedown Recommendation” mode, the input domains represent the currently known malicious domains used for C&C infrastructure. Furthermore, the takedown recommendation engine explores possible network resources that may be used by the botnet as a C&C backup mechanism, and suggests any additional measures that must be taken after the primary C&C is disabled to fully eliminate the threat (Figure 1, step 5b).

At a high level, the processing steps executed by rza are similar when producing both the “Postmortem Report” and “Takedown Recommendation”, despite the difference in inputs and the meaning of the results. The steps are:

1. Expand the initial domain seed set $D_S$ using the pDNS database to identify other domains that are likely related to the botnet’s C&C infrastructure. Intuitively, domains are cheap but IP addresses are relatively more expensive. By identifying additional domains that resolve to the same hosts as malicious domains, we can identify other potentially malicious domains related to the botnet.

2. Identify the subset of the expanded domains that are queried by known malware samples. If a domain both points to a host known to facilitate a C&C and is also used by known malware, it increases the likelihood of that domain itself being malicious as well.

3. Identify the subset of the expanded domains with low domain name reputation. Similar to the intuition of Step 2, a domain that points to a known malicious host and also has low domain reputation is more likely to itself be malicious.

4. Analyze the malware samples identified in Step 2. In addition to straightforward dynamic malware analysis, we trick executing malware samples into believing that their primary C&C infrastructure is unavailable using a custom malware analysis system [20] to extract additional C&C domain names. Intuitively, domains used by malware related to the infrastructure we are studying are likely to be related and malicious. Furthermore, we use the results of the analysis to identify malware contingency plans that would allow the botnet to continue to function after its primary C&C infrastructure has been disabled (e.g., a DGA-based or P2P C&C).

5. Output either the “Postmortem Report” or “Takedown Recommendation” depending on the mode of operation selected at the beginning.

The guiding principle we follow with rza is to push our understanding of malicious C&C infrastructure towards completeness. Only once we have fully enumerated the C&C infrastructure can we successfully disable it. We can begin to enumerate C&Cs from the network level by identifying historic relationships between domain names and hosts using pDNS evidence, and from the host level by interrogating malware samples. Since the pDNS may contain additional domains not necessarily related to the botnet in question, we identify subsets of domains so we can focus our investigative efforts on those that are most likely to be malicious and not inundate ourselves with information. Each subset serves a different purpose: the low reputation subset holds the domain names from the network level that are most likely to be malicious. The subset of domains queried by malware represents a reasonable baseline to expect from prior takedowns, as much of this information is readily available to the security community. The subset gleaned from malware analysis contains the domains from the host level that are the most likely to be malicious. We can use these sets to measure the effectiveness of past takedowns and recommend domains for future takedowns.

[Figure 1: rza pipeline diagram. Inputs: seed domains, pDNS, and the malware DB (MD5s). Stages: (1) Infrastructure Enumeration, producing enumerated domains; (2) Domain & MD5 Association, producing malware-related domains; (3) Domain Reputation, producing low reputation domains; (4) Malware Interrogation, producing interrogated domains and the malware backup plan; (5a) Postmortem Report; (5b) Takedown Recommendation.]

Fig. 1: Overview of rza.

In the remainder of this section we describe each of these high-level tasks in detail, and discuss how they work together to suggest a takedown response.

3.2 Infrastructure Enumeration

Botnets often make use of the DNS to increase the reliability of their C&C infrastructure, for example using domain name fluxing or simply replacing retired or blacklisted domains with new domains. This cycling of domains, however, leaves a trail in the pDNS database that can be used to enumerate the infrastructure. For example, consider a malware sample $m$ that on day $t_1$ uses domain $d_1$ as its primary C&C domain, but on day $t_2$ switches to domain $d_2$ to evade the blacklisting of $d_1$. Assume $d_1$ and $d_2$ resolve to the same IP address. Analysis during either $t_1$ or $t_2$ yields only one of the possible domains, but the relationship between $d_1$ and $d_2$ can be identified in a pDNS database because both resolved to the same IP address.

Using the passive DNS database and the seed domain set $D_S$, we compute the enumerated infrastructure domain set $D_e$ using Algorithm 1. First, the related historic IPs (RHIP) of $D_S$ are retrieved and known sinkhole, parking, and private IP addresses are removed. The related historic domain names (RHDN) for the remaining IPs are retrieved, and any benign domain names are removed, yielding the enumerated infrastructure of $D_S$: $D_e$. The relationships retrieved from the pDNS database are restricted to a range of dates to ignore historic relationships that are no longer relevant. This date range is customizable and was empirically chosen.

Input: $D_S$, startdate, enddate: seed domain set and bounding dates
Output: $D_e$: enumerated domain set

$I_b \leftarrow$ set of known sinkhole, private, parking IPs
$W_d \leftarrow$ set of Alexa top 10,000 domain names
$I \leftarrow RHIP(D_S, startdate, enddate)$
$I \leftarrow I \setminus I_b$
$D_e \leftarrow RHDN(I, startdate, enddate)$
$D_e \leftarrow D_e \setminus W_d$
return $D_e$

Algorithm 1: Infrastructure enumeration procedure.

To understand why we filter out benign domains, consider an attacker that, in an attempt to mislead our analysis, temporarily has their malicious domains resolve into benign IP space (e.g., Google’s) or uses a popular hosting provider (e.g., Amazon AWS). If either of these occurs, the $D_e$ domain set may include unrelated, benign domain names. To handle this, we filter domains if they are a member, or are a subdomain of a member, of the set of the Alexa top 10,000 domain names. These domains are unlikely to be persistently malicious and should not be considered for takedown. IP addresses that are non-informative (private, sinkhole, etc.) are also removed, as the domains that resolve to them are unlikely to be related. For example, malware domains sometimes point to private IP addresses (e.g., 127.0.0.1) when they are not in use, which if not removed would link otherwise unrelated domain names. We use the Alexa top 10,000 when performing malware interrogation (see Section 4), and for consistency we use it here as well. In future work we intend to explore the effect of using smaller and larger whitelists on the generated sets and their accuracy.
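The enumeration and filtering steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the rza implementation: `rhip` and `rhdn` are assumed to be callables already bound to the date range of interest, the list of non-routable networks is a hypothetical stand-in for the "private" part of $I_b$, and all domains and addresses are toy data.

```python
import ipaddress

# Assumed stand-in for "private" address space; sinkhole and parking IPs
# must be supplied separately via blocked_ips.
NON_ROUTABLE = [ipaddress.ip_network(n) for n in
                ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")]


def is_whitelisted(domain, whitelist):
    """True if domain is a whitelist member or a subdomain of a member."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def is_non_informative(ip, blocked_ips):
    """True for private/loopback addresses and known sinkhole/parking IPs."""
    addr = ipaddress.ip_address(ip)
    return ip in blocked_ips or any(addr in net for net in NON_ROUTABLE)


def enumerate_infrastructure(seeds, rhip, rhdn, blocked_ips, whitelist):
    """Follow Algorithm 1: seeds -> IPs -> filtered IPs -> domains -> filtered."""
    ips = set()
    for domain in seeds:
        ips |= rhip(domain)
    ips = {ip for ip in ips if not is_non_informative(ip, blocked_ips)}
    domains = set()
    for ip in ips:
        domains |= rhdn(ip)
    return {d for d in domains if not is_whitelisted(d, whitelist)}


# Toy pDNS relationships (hypothetical).
rhip_map = {"c2.evil.example": {"198.51.100.7", "127.0.0.1"}}
rhdn_map = {
    "198.51.100.7": {"c2.evil.example", "backup.evil.example", "cdn.google.com"},
    "127.0.0.1": {"unrelated.example"},
}
d_e = enumerate_infrastructure(
    {"c2.evil.example"},
    lambda d: rhip_map.get(d, set()),
    lambda ip: rhdn_map.get(ip, set()),
    blocked_ips=set(),
    whitelist={"google.com"},
)
print(sorted(d_e))
```

Matching on label suffixes rather than raw string suffixes keeps a name such as `notgoogle.com` from being mistaken for a subdomain of `google.com`.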

3.3 Categorizing the Expanded Infrastructure

Not all domains identified during the infrastructure enumeration process are guaranteed to be malicious, but we can identify subsets that are more likely to be malicious. For example, a domain that resolves to an IP address in a virtual web hosting provider is likely to have many benign and unrelated domains that resolve to the same infrastructure as well. To account for this, we focus on domains with known (often public) malware associations, and domains that have low domain name reputation.

Using the passive DNS, we expand the initial seed domain set, $D_S$, into the expanded set $D_e$. Next, we identify $D_m \subseteq D_e$ and $D_r \subseteq D_e$, the subsets of domain names in $D_e$ with known malware associations and low domain name reputation, respectively. Malware associations are retrieved from our domain name to malware MD5 database and are commonly available in the security community [24]. To determine if a domain name has low reputation, we use a system similar in spirit to [3], [4] which scores domain reputation between 0.0 and 1.0, where 1.0 denotes a low reputation (i.e., likely malicious) domain name. Any domain with a reputation score > 0.5 is considered malicious and is added to $D_r$. Unlike $D_r$ and $D_m$, the set $D_i$ is not necessarily a subset of $D_e$. Any domains that are used by malware during malware interrogation are added to $D_i$. These domains expand our coverage as they may unearth domain names that were not previously included in $D_e$. During our postmortem analysis, we compare these sets to the domains that were actually involved in the takedown ($D_S$).
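As a minimal sketch of this categorization, the subsets $D_m$ and $D_r$ can be derived from $D_e$ with two lookups. The dictionary-based `malware_db` and `reputation` inputs are hypothetical stand-ins for the malware database and the reputation system, and treating unscored domains as 0.0 is an assumption of this sketch.

```python
def categorize(enumerated, malware_db, reputation, threshold=0.5):
    """Split the enumerated set D_e into D_m and D_r.

    malware_db maps a domain to the set of MD5s that queried it;
    reputation maps a domain to a score in [0.0, 1.0], where 1.0 means
    likely malicious. Unscored domains default to 0.0 (an assumption).
    """
    d_m = {d for d in enumerated if malware_db.get(d)}
    d_r = {d for d in enumerated if reputation.get(d, 0.0) > threshold}
    return d_m, d_r


# Toy inputs (hypothetical).
d_e = {"a.example", "b.example", "c.example"}
malware_db = {"a.example": {"md5-1"}, "c.example": set()}
reputation = {"a.example": 0.9, "b.example": 0.5, "c.example": 0.7}
d_m, d_r = categorize(d_e, malware_db, reputation)
print(sorted(d_m), sorted(d_r))
```

Because the comparison is strict, a domain scored at exactly 0.5 is not added to $D_r$, matching the "> 0.5" rule in the text.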

Figure 2 shows a Venn diagram representation of a possible configuration of enumerated infrastructure sets. All sets, excluding $D_i$, are subsets of $D_e$. $D_i$ is the most likely to include domains outside of the scope of $D_e$, but suffers the most from the problem of completeness as it relies on dynamic malware analysis.

For the postmortems presented in Section 5, the set $D_i$ is not computed due to time limitations. This set is most important to compute when performing a takedown recommendation. Since the focus of this paper is on additional postmortem studies, we do not compute the $D_i$ set. For completeness, however, we retain the description of how it is computed and how it is used to perform a takedown recommendation.

[Figure 2: Venn diagram. $D_e$: enumerated domains; $D_S$: seed domains; $D_m$: malware-related domains; $D_r$: low reputation domains; $D_i$: malware interrogation domains.]

Fig. 2: Venn diagram of identified infrastructure sets.

3.4 Takedown Recommendation Engine

Using the four aforementioned techniques, we can run our takedown protocol as shown by the decision tree in Figure 3. Suppose we are interested in taking down a hypothetical botnet where the current known infrastructure is $D_S$ = {01.hans.gruber.com}. After enumerating the infrastructure, we identify the additional domain name 02.hans.gruber.com that resolves to the same IP as the 01 child domain. We identify and retrieve the malware samples that have queried the 01... and 02... domain names and interrogate them. We identify an additional domain name, 03.hans.gruber.com, when the first two domain names fail to resolve. Since we identified a finite number of new domain names, we re-run the process with the expanded set of three domain names, and this time the malware analysis yields no behavioral changes from what we have already identified. In the event a DGA or a P2P backup scheme is present, the DGA must be reverse-engineered or the P2P network must be subverted as described in [22], respectively, after disabling the main C&C infrastructure.

The question remains which sets of domains should be revoked or sinkholed in order to terminate the botnet’s C&C infrastructure, which ultimately must be decided by human operators. In the case where eliminating the botnet is more important than any possible collateral damage that may be incurred, the set of domains in $D_e \cup D_i$ should be targeted, which we consider to be the “nuclear” option. This contains any domain name associated with the C&C infrastructure as well as domains queried by the related malware. In other scenarios, however, this may incur too much collateral damage. We recommend revoking $D_r \cup D_i$ instead in these cases, as these domains are very likely to be malicious. These decisions should be made by threat researchers based on the potential risks associated with deactivating these domain names. Another, less extreme option is to simply block these domains at the network’s egress point. This allows enterprise-sized networks to protect themselves while lessening the negative impact incurred by collateral damage.

Ground truth for C&C infrastructure is difficult to come by, which makes evaluating true positives and false positives exceedingly difficult. To roughly estimate this, we present the precision and recall of each set against the “correct” set of $D_r \cup D_i$. If we assume that domains flagged as low reputation or used by malware known to be affiliated with a given botnet are malicious, we can use this union to roughly correspond to ground truth. In our case, the precision of a set $D$ is the number of domain names $d$ with $d \in D \wedge d \in D_r \cup D_i$ divided by the size of $D$, i.e., $|D \cap (D_r \cup D_i)| / |D|$, and the recall is the same number of domain names divided by the size of the “correct” set, i.e., $|D \cap (D_r \cup D_i)| / |D_r \cup D_i|$.

3.5 Use of Other Sources of pDNS Data

For both financial and analytical convenience, we ran our experiments using Damballa’s internal passive DNS database. To show that our results are not tied to private data and can be replicated by other researchers, we run a subset of our experiments using Farsight’s passive DNS database [7]. While the database is not exactly public, it is generally available to practicing researchers and professionals in the security community (possibly for a fee). As pDNS data becomes more popular, we expect the number of these databases to increase and become more easily accessible by researchers.
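Whichever pDNS source produces the domain sets, the precision and recall estimate from Section 3.4 reduces to plain set operations. The sketch below is illustrative; the function name and the choice to return 0.0 for empty sets are assumptions, and the domains are toy data.

```python
def precision_recall(candidate, d_r, d_i):
    """Precision and recall of a candidate set against D_r union D_i."""
    correct = d_r | d_i
    hits = len(candidate & correct)
    # Returning 0.0 for an empty denominator is a convention of this sketch.
    precision = hits / len(candidate) if candidate else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Toy sets (hypothetical): two of four candidates are in the "correct" set.
candidate = {"a.example", "b.example", "c.example", "d.example"}
d_r = {"a.example", "b.example"}
d_i = {"e.example"}
precision, recall = precision_recall(candidate, d_r, d_i)
print(precision, recall)
```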




[Figure 3: decision tree. Starting from Input: {$D_S$}, the stages are Enumerate Infrastructure, Interrogate Malware, and Classify Malware Behavior. Outcomes: No Behavioral Changes: 1.) Revoke D. Finite Domains/IPs: re-run with Input: {$D_e \cup D_i$}. DGA: 1.) Reverse engineer DGA, 2.) TLD cooperation, 3.) Revoke D. P2P: 1.) Counter P2P, 2.) Revoke D.]

Fig. 3: Takedown recommendation engine shown as a decision tree. D in this case represents either $D_r \cup D_i$, which only targets C&C domains that are very likely to be malicious, or $D_e \cup D_i$, the “nuclear” option that should only be used when the threat of the botnet outweighs potential collateral damage.
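The decision tree can be read as a loop that re-runs enumeration and interrogation until the malware shows no new finite set of domains. The sketch below illustrates that reading only: `Behavior`, `recommend`, the `max_rounds` guard, and the two toy callbacks are hypothetical names, and the final choice between $D_r \cup D_i$ and $D_e \cup D_i$ is left to the operator, as in the text.

```python
from enum import Enum


class Behavior(Enum):
    NO_CHANGE = "no behavioral changes"
    FINITE = "finite domains/IPs"
    DGA = "DGA"
    P2P = "P2P"


# Follow-up actions per classified behavior, taken from Figure 3.
ACTIONS = {
    Behavior.NO_CHANGE: ["revoke D"],
    Behavior.DGA: ["reverse engineer DGA", "TLD cooperation", "revoke D"],
    Behavior.P2P: ["counter P2P", "revoke D"],
}


def recommend(seeds, enumerate_fn, interrogate_fn, max_rounds=10):
    """Iterate enumeration and interrogation until a terminal behavior."""
    current = set(seeds)
    for _ in range(max_rounds):
        d_e = enumerate_fn(current)
        behavior, d_i = interrogate_fn(d_e)
        if behavior is Behavior.FINITE:
            current = d_e | d_i  # re-run with the expanded input set
            continue
        return ACTIONS[behavior], d_e, d_i
    raise RuntimeError("no terminal behavior reached within max_rounds")


# Toy callbacks mirroring the hans.gruber.com example (hypothetical data).
def toy_enumerate(domains):
    return set(domains) | {"02.hans.gruber.com"}


def toy_interrogate(d_e):
    if "03.hans.gruber.com" in d_e:
        return Behavior.NO_CHANGE, set()
    return Behavior.FINITE, {"03.hans.gruber.com"}


steps, d_e, d_i = recommend({"01.hans.gruber.com"}, toy_enumerate, toy_interrogate)
print(steps, sorted(d_e))
```

The `max_rounds` guard is an addition of this sketch so that a misbehaving callback cannot loop forever.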

Using Farsight’s pDNS database and rza’s process outlined in this section, we generate the $D_e$, $D_m$, and $D_r$ domain sets of one postmortem takedown and five current botnets. As before, we compute the respective TIR values of each set. We chose the five botnets with the largest C&C infrastructure that Damballa began tracking in April 2013. Our results from the Farsight dataset are presented in Section 5.6.

4 MALWARE INTERROGATION

Understanding how malware behaves when its primary infrastructure is disabled is critical to performing a successful takedown. Not only does this reveal additional network assets used in the botnet’s infrastructure, it can also reveal sophisticated backup plans, such as DGA- or P2P-based C&C schemes, that must be carefully handled to ensure a successful takedown. In this section, we describe how we play games with malware and how we design and evaluate our heuristics for interrogating malware samples for contingency plans.

4.1 Playing Games with Malware

Malware uses the same network protocols that benign software uses when performing malicious activity. Despite the fact that many network protocols exist, nearly all communication on the Internet follows one of two patterns:

1. A transport layer (e.g., TCP and UDP) connection is made to an IP address directly, or

2. A DNS query is made for a domain name (e.g., google.com) and a connection to the returned IP address is made as in #1.

Higher-level protocols leverage these two use cases for nearly all communication. If we can assume malware relies on these two patterns for contacting its C&C servers and performing its malicious activities, these are the patterns we must target during analysis.

We define a network game to be a set of rules that determine when to inject “false network information” into the communication between a running malware executable and the Internet. More specifically, false information is a forged network packet. Consider the running malware sample m in Figure 4(a). Sample m first performs a DNS query to determine the IP address of its C&C server located at foo.com. The returned IP address, a.b.c.d, is then used to connect to the C&C and the malware has successfully “phoned home”. Sample m could also bypass DNS entirely if it were to hardcode the IP address of its C&C and communicate with it directly, as we see in Figure 4(c). This gives us two opportunities to play games with sample m as shown in Figures 4(b,d): we can say the domain name resolution of foo.com was unsuccessful (b) or the direct connection to IP address a.b.c.d was unsuccessful (d). At this point, sample m has four possible courses of action:

1. Retry the same domain name or IP address,
2. Remain dormant to evade dynamic analysis and try again later,
3. Give up, or
4. Try a previously unused domain name or IP address.

In (b) and (d), we see the malware samples taking action #4 and querying a previously unseen domain name (bar.com) and IP address (e.f.g.h), respectively. Action #2 is a common problem in dynamic malware analysis systems in general and is further discussed in Section 4.4.
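The two games in Figures 4(b) and 4(d) can be sketched as a function that forges a response when the target is on a deny list. This is an illustrative model only: the `Response` record, the `make_game` helper, and the deny-list rule are assumptions of this sketch and do not describe the packet-level mechanics of the analysis system in [20].

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    kind: str    # "dns" or "tcp"
    target: str  # queried domain name or destination IP address
    result: str  # resolved IP, "NXDOMAIN", "SYN-ACK", or "RST"


def make_game(denied):
    """Build a game that forges failures for the denied names and IPs."""
    def game(resp):
        if resp.target in denied:
            forged = "NXDOMAIN" if resp.kind == "dns" else "RST"
            # Only the outcome changes; kind and target stay identical.
            return replace(resp, result=forged)
        return resp
    return game


# Pretend the primary C&C (foo.com / a.b.c.d) is unavailable.
game = make_game({"foo.com", "a.b.c.d"})
dns_forged = game(Response("dns", "foo.com", "a.b.c.d"))
tcp_forged = game(Response("tcp", "a.b.c.d", "SYN-ACK"))
untouched = game(Response("dns", "bar.com", "e.f.g.h"))
print(dns_forged.result, tcp_forged.result, untouched.result)
```

Responses for targets outside the deny list pass through unchanged, so any newly contacted domain or IP address (action #4) remains observable.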

4.1.1 Notation

Stated more formally, let h be a machine infected with a malware sample m that is currently executing in our analysis system running game $G_{name}$. $G_{name}$ is a packet transformation function called name. Given a packet p, $G_{name}(p)$ represents $G_{name}$ gaming p and its value is either the original packet, or some altered packet p′ that changes the intent of p. The implementation details of $G_{name}$ determine when to return p or p′. For example, p could contain the resolved IP address of a queried domain name d, whereas p′ says d does not exist. In all other ways, such as type of packet and source and destination IP addresses, p and p′ are identical. As h communicates with the outside world, it sends question packets, $q_i$, in the form of domain name queries and requests to initiate a TCP connection and receives response packets, $r_j$, in the form of domain name resolutions and initiated TCP connections¹. False information is provided to the host h by delivering $G_{name}(r_j)$ in lieu of $r_j$. A sample set for m, $G_{name,m}$, represents the set of unique domain names and IP addresses queried by m while running under $G_{name}$. The functions D and I operate on sample sets and return the subset of unique domain names or the subset of unique IP addresses, respectively. Given a set of malware sample MD5s, M, a game set, $G^M_{name}$, represents the subset of samples that were “successfully gamed” by $G_{name}$. A game is considered successful if it forces a malware sample to query more network information than under a run without a game present. We formally define this in the following section. The described notation is summarized in Table 1 and will be used throughout the remainder of the paper.

1. More accurately, a TCP response packet is a TCP SYN-ACK packet as part of the TCP connection handshake.

[Figure 4: message sequence diagrams among the malware sample (m or m′), GZA, the DNS resolver, C&C1 (IP: a.b.c.d), and C&C2 (IP: e.f.g.h). (a) Q: foo.com IP? R: a.b.c.d, then Q: TCP SYN. (b) Q: foo.com IP? R: a.b.c.d replaced by R: NXDOMAIN, then Q: bar.com IP? (c) Q: TCP SYN, R: TCP SYN-ACK. (d) Q: TCP SYN, R: TCP SYN-ACK replaced by R: TCP RST, then Q: TCP SYN.]

Fig. 4: Malware samples m (a-b) and m′ (c-d) initiating a connection with the C&C server. m connects by first performing a DNS query to determine the IP address of its C&C server followed by initiating a TCP connection. Sample m′ connects directly to the C&C using a hard-coded IP address. Examples (a) and (c) connect without intervention by a game, while (b) and (d) have false information (denoted by boxes) injected.

$G_{name}$      A game called name.
$G_{name}(p)$   The result of $G_{name}$’s transformation on packet p.
$G_{name,m}$    The set of network information, i.e., unique IP addresses and domain names contacted, generated when malware sample m is gamed by $G_{name}$.
$G^M_{name}$    The subset of malware samples from M that were successfully gamed by $G_{name}$.
D(s), I(s)      Given a sample set, s, return the subset of unique domain names or IP addresses in s, respectively.

TABLE 1: Notation for describing games and the sets they generate.

4.1.2 Designing Games for Interrogating Malware

Crafting games without a priori knowledge of malware network behavior is difficult. Furthermore, a successful game for sample m may be unsuccessful for sample m′. By using generic games “en masse”, we improve our chances of successfully gaming malware during analysis. We design a suite of games to coerce a given malware sample into showing its alternative plan during analysis. We apply all games to a malware sample to improve the likelihood of success. Each game focuses either on DNS or TCP response packets in an attempt to harvest additional C&C domain names or IP addresses, respectively. For a DNS response packet $p_d$, $p'_d$ is a modified response packet that declares the queried domain name does not exist, i.e., a DNS rcode of NXDOMAIN. For a TCP response packet $p_t$, $p'_t$ is a modified response packet that terminates the 3-way TCP handshake, i.e., a TCP-RESET packet. In this paper, we choose to focus on DNS/TCP packets as they are the predominant protocols used to establish and sustain C&C communication; however, our approach is general and can be adapted to other protocols used less commonly in C&C communication. The design of an individual game is based on anecdotal evidence of how malware samples, in general, communicate. We design seven games to perform our analysis of alternative plan behavior in malware:

$G_{null}$: To provide a baseline to compare the effectiveness of future games, this game allows response packets to reach its host without modification. In other words, $G_{null}$ is the identity function.

Note that this does not mean malware communication is allowed to run completely unfettered. We perform standard precautionary measures to prevent malicious activity from harming external systems. However, these measures are not considered part of our network games, but simply good practice when analyzing potentially malicious binaries.

$G_{dnsw}$: A popular domain name, like google.com, is unlikely to operate as a C&C server for a botnet. Therefore, DNS queries on popular domain names are unlikely to be concealing additional malicious network information. For a DNS response packet $p_d$, $G_{dnsw}(p_d)$ returns $p_d$ if the domain being queried is whitelisted and $p'_d$ otherwise. Our whitelist is comprised of the top 1000 Alexa domain names [2]. $G_{dnsw}$ is successful for m iff $|G_{dnsw,m}| > |D(G_{null,m})|$.

$G_{tcpw}$: An IP address that resides in a known benign network is also unlikely to function as a C&C, much like a popular domain name. For a TCP response packet $p_t$, $G_{tcpw}(p_t)$ returns $p_t$ if the IP being queried is whitelisted and $p'_t$ otherwise. Our whitelist is the dnswl IP-based whitelist [6]. $G_{tcpw}$ is successful for m iff $|G_{tcpw,m}| > |I(G_{null,m})|$.
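The three games above can be sketched as packet transformation functions. The following is a minimal illustration, not the GZA implementation: the `Response` record, the `status` strings, and all function names are assumptions introduced here, and real games would operate on raw DNS and TCP packets. The success checks mirror the inequalities as they are stated in the text.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class Response:
    """Simplified response packet r_j (illustrative model, not a wire format)."""
    kind: str      # "dns" or "tcp"
    endpoint: str  # queried domain name (DNS) or destination IP address (TCP)
    status: str    # "NOERROR" / "NXDOMAIN" for DNS, "SYN-ACK" / "RST" for TCP


Game = Callable[[Response], Response]


def g_null(p: Response) -> Response:
    """G_null: the identity function, used as the baseline."""
    return p


def make_g_dnsw(whitelist: FrozenSet[str]) -> Game:
    """G_dnsw: return p'_d (NXDOMAIN) for every non-whitelisted domain."""
    def game(p: Response) -> Response:
        if p.kind != "dns" or p.endpoint in whitelist:
            return p
        return replace(p, status="NXDOMAIN")
    return game


def make_g_tcpw(whitelist: FrozenSet[str]) -> Game:
    """G_tcpw: return p'_t (TCP reset) for every non-whitelisted IP address."""
    def game(p: Response) -> Response:
        if p.kind != "tcp" or p.endpoint in whitelist:
            return p
        return replace(p, status="RST")
    return game


def _is_ip(endpoint: str) -> bool:
    try:
        ipaddress.ip_address(endpoint)
        return True
    except ValueError:
        return False


def D(sample_set: Set[str]) -> Set[str]:
    """Subset of unique domain names in a sample set."""
    return {e for e in sample_set if not _is_ip(e)}


def I(sample_set: Set[str]) -> Set[str]:
    """Subset of unique IP addresses in a sample set."""
    return {e for e in sample_set if _is_ip(e)}


def dnsw_successful(g_dnsw_m: Set[str], g_null_m: Set[str]) -> bool:
    """Success test for G_dnsw as written in the text."""
    return len(g_dnsw_m) > len(D(g_null_m))


def tcpw_successful(g_tcpw_m: Set[str], g_null_m: Set[str]) -> bool:
    """Success test for G_tcpw as written in the text."""
    return len(g_tcpw_m) > len(I(g_null_m))
```

Modeling each game as a closure over its whitelist keeps the packet type fixed while the decision rule varies, which matches the description of a game as a function from a packet to either the same packet or its altered counterpart.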

4.2 Methodology

We can interrogate a single malware sample under different environmental conditions to learn additional domains it may use to reach its C&C, as well as any contingency plans for C&C infrastructure failure. We identify the set of malware samples M that communicate with domains in $D_e$ for interrogation. To accomplish this, we can use our existing system that studies malware’s behavior under primary C&C failure [20] to automatically determine malware backup plans. We run an individual malware sample under five execution scenarios, extract the network endpoints the malware sample used to “phone home”, and based on the differences observed during executions, we identify likely backup plans.




Behaviorally, most malware when presented with unavailable centralized infrastructure resort to one of the following backup plans:

1. The malware simply retries connecting to hardcoded domains and/or IP addresses.

2. The malware attempts to connect to a finite set of additional domains and/or additional IP addresses.

3. The malware attempts to connect to an “infinite” set of domains and/or IPs. This occurs when a malware uses a DGA- or P2P-based backup system.

We can isolate and detect these behaviors by running each sample and applying various packet manipulation scenarios to simulate infrastructure takedown. As a control, we manipulate none of the packets during execution. To show that a domain name has been revoked, we rewrite all DNS response packets that resolve non-whitelisted domain names to say the domain no longer exists (NXDomain). We run a sample under this scenario twice for durations t and 2t. To feign IP address takedowns, we interrupt TCP streams with TCP reset (RST) packets when the destination is a non-whitelisted IP address. We also run this scenario for durations t and 2t. Intuitively, if the number of endpoints (domains or IPs) remains consistent across all runs, the malware sample does not include a contingency plan for C&C failure. If the number of endpoints is greater when the DNS or TCP rewriting is enabled, but remains similar between the two runs with different durations, we expect the malware contains a finite set of additional endpoints as a backup mechanism. However, if we see many more endpoints in the 2t duration run than in the t run, this suggests the malware is capable of constantly generating additional candidate domains or IPs to connect to, which indicates DGA or P2P behavior, respectively. In the event that the primary C&C infrastructure is already disabled as we would expect in the postmortem studies, the interrogation results still hold. If the botnet employs a backup DGA/P2P mechanism, we will still detect this as the t and 2t duration runs will still differ. The system may misclassify a sample as having no backup plan if its infrastructure is already disabled, but this is unlikely to affect rza’s ability to function properly. Consider a sample m that has a finite number of backup domains, but all of the primary domains have already expired and return NXDomain. The control run and DNS rewriting run will be identical and the sample will be misclassified as having no backup behavior; however, we will still identify all the backup domains so the results will still hold.
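The intuition above can be sketched as a small decision rule over the endpoint counts of the five runs. This is only an illustration of the stated intuition: the function name, its parameters, the order in which the checks are applied, and the `growth` threshold of 1.5 are assumptions made here, since the tuned heuristics themselves are not given in the text. The returned labels follow those used in Table 3.

```python
def classify_backup_plan(
    control_domains: int,
    dns_t: int,
    dns_2t: int,
    control_ips: int,
    tcp_t: int,
    tcp_2t: int,
    growth: float = 1.5,  # hypothetical cutoff for "many more endpoints in the 2t run"
) -> str:
    """Classify a sample's contingency plan from distinct-endpoint counts.

    control_*: endpoints seen in the unmodified control run.
    dns_t / dns_2t: domains seen under DNS rewriting for durations t and 2t.
    tcp_t / tcp_2t: IPs seen under TCP reset injection for durations t and 2t.
    """
    # Endpoints keep growing with run time: an "infinite" backup set.
    if dns_2t > control_domains and dns_2t >= growth * dns_t:
        return "dga"
    if tcp_2t > control_ips and tcp_2t >= growth * tcp_t:
        return "p2p"
    # More endpoints than the control, but stable across durations: finite set.
    if dns_t > control_domains:
        return "finitedomain"
    if tcp_t > control_ips:
        return "finiteip"
    # Counts consistent across all runs: no contingency plan observed.
    return "none"
```

For instance, a sample that contacts 2 domains in the control run, 40 under DNS rewriting for duration t, and 85 for duration 2t would be labeled `dga` by this sketch, while one that contacts 6 domains in both rewriting runs would be labeled `finitedomain`.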

We empirically design heuristics using the above intuition and by analyzing 591 malware samples from 10 malware families with known contingency plans and tailoring our rules to perform the identification. Of the samples analyzed, 433 had no contingency plan, 55 used a DGA, 81 used P2P communications, and 22 employed a finite set of backup domains. None of the analyzed malware used a finite number of additional IP addresses.

4.3 Results

We interrogated 591 malware samples from 10 malware families shown in Table 2. The families have known contingency plans, which we can use to tune our heuristic rules to perform the identification. Of the samples analyzed, 433 had no contingency plan, 55 used a DGA, 81 used P2P communications, and 22 employed a finite set of backup domains. None of the analyzed malware used a finite number of additional IP addresses. Our heuristics correctly classified 97% of the samples’ contingency plans. A confusion matrix of the results is shown in Table 3.

Malware Family   Count
Expiro.Z         100
Conficker        100
Murofet          97
TDSS/TDL4        92
ZeroAccess       82
zbot             25
Vilsel           25
Onlinegames      25
Fakealert        25
Boonana          20

TABLE 2: Malware family training set breakdown for malware interrogation.

               dga   finitedomain   finiteip   none   p2p
dga            53    1              0          1      0
finitedomain   0     21             0          1      0
finiteip       0     0              0          0      0
none           4     2              1          426    0
p2p            1     1              1          1      77

TABLE 3: Confusion matrix for malware interrogation.

This shows that with very simple heuristics one can correctly identify backup behaviors that may spoil an otherwise perfect takedown.
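The reported accuracy can be recomputed directly from Table 3. The snippet below is a small check, assuming that rows are the known labels and columns the predicted ones; this orientation is an inference from the row totals (55, 22, 0, 433, 81), which match the per-class counts given in the text.

```python
LABELS = ["dga", "finitedomain", "finiteip", "none", "p2p"]

# Table 3, one list per row, columns in the same order as LABELS.
CONFUSION = [
    [53, 1, 0, 1, 0],
    [0, 21, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [4, 2, 1, 426, 0],
    [1, 1, 1, 1, 77],
]


def row_totals(matrix):
    """Number of samples per row label."""
    return [sum(row) for row in matrix]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e., correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

The diagonal sums to 577 of 591 samples, or about 97.6%, consistent with the 97% figure reported above.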

4.4 Evasion

Attackers are always attempting to evade newly created defenses. The most obvious ways to evade malware interrogation are through timing attacks, peer-to-peer validation of network resource connectivity, communicating with different protocols, or by evading dynamic analysis entirely with excessive timeouts prior to performing malicious behavior. We discuss these evasion techniques and present methods to address these shortcomings. Since our games use RFC-compliant network responses, malware is unable to determine if it is being gamed at the host-level and subsequently must use the network in clever ways to determine its execution environment.

Dynamic malware analysis systems generally execute malware for a fixed period of time, usually around five minutes per sample. Malware can remain dormant until this time passes to evade detection and analysis. Prior work addresses this limitation by finding these trigger-based behaviors and generating inputs to satisfy the triggers at runtime. This limitation applies to all dynamic analysis systems in general and is orthogonal to the problem we are trying to solve of increasing the network information an executing sample attempts to connect to.

Overhead incurred during usermode packet generation could enable a clever malware author to determine if they are being gamed or not. As a performance improvement, malware games will only route packets relevant to the game in question. For example, when $G_{dnsw}$ is being played, iptables will only route UDP packets with a port of 53 destined for a VM. If DNS packets take abnormally long, while packets of other types are unaffected, this could alert a malware sample that it is being analyzed. Simply routing all packets through its game would apply this overhead uniformly across all packets, removing the signal.

Peer-to-peer (P2P) evasion is when a malware sample verifies the results of a DNS or TCP request by asking another infected machine to perform an identical request. If a sample, m, cannot resolve a domain name d, but fellow infected hosts can resolve d successfully, m has reason to believe it is being run under our system. Communicating this information, however, requires the network. This forces m to succumb to gameplay one way or another; gaming of its initial C&C communication or gaming of verification queries to its peers. By focusing on the building blocks of network communication, we force all network activity to be gamed.

To perform a DNS query, a malware sample could query an HTTP-based DNS tool², bypassing the DNS protocol entirely. Furthermore, it could directly connect to a C&C using a non-gamed protocol, such as UDP. These problems are easily addressed by running aggregate games and adding additional protocols. Querying an HTTP-based DNS lookup tool still requires some network activity so running DNS and TCP games simultaneously would prevent this lookup from succeeding. If an attacker uses another protocol, such as UDP, it is easy to write a new game that targets this new behavior. As malware adapts to the presence of network games, malware analysts can keep pace with malware authors without too much effort.

5 POSTMORTEM STUDIES

In this section, we describe how we use rza to evaluate historical takedowns. We introduce the takedowns we study and describe the measurements we use to understand the effectiveness of the takedown. We end the section with our experimental results on the postmortem studies.

5.1 Postmortem Analysis

For our postmortems, we study the 3322.org NS takedown that targeted the Nitol botnet [15] (aka Operation b70), the Citadel botnet takedown [16], and the No-IP takedown [17]. We chose these takedowns because they are both recent and high profile. For each takedown, we collect the domains described in the temporary restraining orders (TRO) and use these as our seed domains ($D_S$).

Measuring Takedown Improvement: Prior studies of botnet takedowns relied on secondary measurements, such as global spam volumes, to determine the success of a takedown. Instead, we directly measure the successful domain name resolutions to the identified infrastructure to proxy for the victim population. By comparing the lookup volume to the seed domains ($D_S$) with the lookup volume to the sets of domains identified by rza, we can determine if a takedown was successful and what domains it missed. For example, if all domain sets are equivalent, their lookup volumes will be identical and the takedown would be considered successful.

2. http://www.kloth.net/services/nslookup.php

More formally, for each takedown, t, and its collected seed domains, $D^t_S$, we generate the enumerated infrastructure sets $D^t_e$, $D^t_m$, $D^t_r$ and $D^t_i$ using rza. $D^t_e$ is generated using only successful DNS resolutions that were issued during the seven days before the takedown of t was performed according to the court documents³. This allows us to compare what was actually disabled and/or sinkholed during the takedown with what rza would have recommended.

For a period of 14 days surrounding the takedown, we plot the successful aggregate daily lookup volume to each of the previously identified sets. To quantify the gains in takedown effectiveness, we calculate the takedown improvement ratio as defined by Equation 1.

$$\mathrm{TIR}(D_1, D_2) = \frac{\mathrm{MDLV}(D_1)}{\mathrm{MDLV}(D_2)} \qquad (1)$$

Where $D_1$ and $D_2$ are two domain name sets and MDLV is a function on domain name sets that computes the median daily successful lookup volume. We use the median, rather than the mean, since we are interested in preserving long-term lookup volume trends, which are not captured by outliers. If $\mathrm{TIR}(D^t_m, D^t_S) > 1$, this means the subset of $D_e$ of malware-related domain names $D^t_m$ had a stronger lookup volume and accounts for domain names missed by the takedown domains $D^t_S$. Conversely, if the TIR $\le 1$, the takedown deactivated related malware domains already and was successful. We also identify malware backup behaviors.

Estimating Risk: To provide a different perspective, we also quantify the potential risk of collateral damage, or the negative effect of mistakenly taking down benign domains. Ideally, we would represent this by the number of distinct clients that would be denied access to benign services; however, we can once again turn to the lookup volumes to proxy for this.

If we assume all infected botnet hosts behave identically, the aggregate lookup volume on a given day is proportional to the number of infected clients. At most, a single lookup corresponds to a distinct client reaching that domain; however, due to DNS caching effects, differences in malware variant and human behaviors, and network address translation (NAT), this is likely an overestimation of the actual client population. We assume that these behaviors are consistent with respect to queries towards a given botnet.

We quantify the potential risk of collateral damage for a takedown as the difference in the median lookup volume between an enumerated set and the initial seed domain set as defined by Equation 2.

$$\mathrm{Risk}(D_1, D_2) = \mathrm{MDLV}(D_1) - \mathrm{MDLV}(D_2) \qquad (2)$$

Using similar notation as seen in Equation 1. Intuitively, the difference between these two quantities is proportional to the number of individuals that would be inconvenienced by this takedown if all the domains in $D_1$ that are not in $D_2$ are not malicious. This provides an upper bound on the potential risk involved. The “nuclear option” of taking down all the domains in $D_e$, or sinkholing all domains that resolve to hosts known to provide C&C for a botnet, is the only way to ensure the C&C communication line is severed; however, this should be weighed against the potential risks.

3. These are September 11th, 2012; June 5th, 2013 and June 30th, 2014 for the 3322.org, Citadel and No-IP takedowns, respectively.

An analyst wishing to perform a takedown can use the risk values to weigh whether to employ the “nuclear” option or the more reserved options as described in Section 3.4. In future work, we hope to improve the risk measure in two ways. First, we can correlate the risk value with the identified true and false positive rates during a real, or simulated, takedown. Furthermore, we wish to more accurately estimate the true population of visitors to infrastructure, malicious or otherwise. This can further help analysts by allowing them to weigh the likelihood of maliciousness against the population that would be affected by a takedown.
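Equations 1 and 2 can be sketched in a few lines. The data layout used here is an assumption for illustration: a mapping from each domain name to its list of successful daily lookup counts over the observation window, with MDLV taken as the median over days of the summed lookups of a set. The function names are likewise hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List


def mdlv(daily_lookups: Dict[str, List[int]], domains: Iterable[str]) -> float:
    """Median daily successful lookup volume aggregated over a domain set."""
    domains = list(domains)
    n_days = len(daily_lookups[domains[0]])
    # Aggregate volume per day across the whole set, then take the median.
    totals = [sum(daily_lookups[d][day] for d in domains) for day in range(n_days)]
    return median(totals)


def tir(daily_lookups: Dict[str, List[int]], d1: Iterable[str], d2: Iterable[str]) -> float:
    """Takedown improvement ratio, Equation 1."""
    return mdlv(daily_lookups, d1) / mdlv(daily_lookups, d2)


def risk(daily_lookups: Dict[str, List[int]], d1: Iterable[str], d2: Iterable[str]) -> float:
    """Potential collateral damage, Equation 2."""
    return mdlv(daily_lookups, d1) - mdlv(daily_lookups, d2)
```

With a toy window of three days, seed set {a.example} at 10, 12, and 11 lookups and an enumerated set that adds b.example (100, 90, 110) and c.example (5, 5, 5), the medians are 11 and 115, so the TIR is about 10.45 and the risk is 104 lookups.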

5.2 Improved Postmortem Analysis

In our previous takedown study [19], we were limited by using daily lookup volumes as a proxy for victims communicating with active malicious infrastructure. Clearly this is a limitation, but was due to the nature of the datasets we had access to. Fortunately, new data has become available to us and we can now compute the population of distinct clients that query a given domain name or resource record (i.e., domain and IP tuple) on a specific date. We use this new value to compute improved postmortem takedown measures and compare them to the previously defined ones. We now define TIR+ and Risk+, the improved analogues for TIR and Risk, respectively, which use population counts rather than lookup volume counts.

TIR+: For a period of 14 days surrounding the takedown, we plot the successful aggregate daily population count to each of the previously identified sets. To quantify the gains in takedown effectiveness, we calculate the improved takedown improvement ratio as defined by Equation 3.

$$\mathrm{TIR}^{+}(D_1, D_2) = \frac{\mathrm{MDP}(D_1)}{\mathrm{MDP}(D_2)} \qquad (3)$$

Where $D_1$ and $D_2$ are two domain name sets and MDP is a function on domain name sets that computes the median daily client population. We use the median, rather than the mean, since we are interested in preserving long-term population trends, which are not captured by outliers. If $\mathrm{TIR}^{+}(D^t_m, D^t_S) > 1$, this means the subset of $D_e$ of malware-related domain names $D^t_m$ had a larger client population and accounts for domain names missed by the takedown domains $D^t_S$. Conversely, if the TIR+ $\le 1$, the takedown deactivated related malware domains already and was successful.

Risk+: We quantify the potential risk of collateral damage for a takedown as the difference in the median client population between an enumerated set and the initial seed domain set as defined by Equation 4.

$$\mathrm{Risk}^{+}(D_1, D_2) = \mathrm{MDP}(D_1) - \mathrm{MDP}(D_2) \qquad (4)$$

Using similar notation as seen in Equation 3. Intuitively, the difference between these two quantities is proportional to the number of individuals that would be inconvenienced by this takedown if all the domains in $D_1$ that are not in $D_2$ are not malicious. This provides an upper bound on the potential risk involved. The “nuclear option” of taking down all the domains in $D_e$, or sinkholing all domains that resolve to hosts known to provide C&C for a botnet, is the only way to ensure the C&C communication line is severed; however, this should be weighed against the potential risks.

TIR+ and Risk+ can only be computed for takedowns that occur after November 14, 2012 due to data availability.

For each of the following takedown postmortem analyses, the dashed red line on each plot indicates the date the takedown was performed according to the court proceedings. Each line plot represents either the aggregate daily lookup volume or aggregate daily population count to a subset of domains that are either directed to a sinkhole or contained within the enumerated infrastructure sets generated by rza. In all cases the $D_e$ lookup volume represents an upper bound of malicious lookups.
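The population-based measures of Equations 3 and 4 differ from the volume-based ones only in what is counted per day. The sketch below assumes, for illustration, that each domain maps to a list of per-day sets of client identifiers, and that the daily population of a domain set is the number of distinct clients querying any domain in it. That union is an assumption; the text does not state how per-domain populations are combined. All names are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List, Set


def mdp(daily_clients: Dict[str, List[Set[str]]], domains: Iterable[str]) -> float:
    """Median daily client population of a domain set (distinct clients per day)."""
    domains = list(domains)
    n_days = len(daily_clients[domains[0]])
    sizes = [
        # A client querying several domains in the set is counted once per day.
        len(set().union(*(daily_clients[d][day] for d in domains)))
        for day in range(n_days)
    ]
    return median(sizes)


def tir_plus(daily_clients, d1: Iterable[str], d2: Iterable[str]) -> float:
    """Improved takedown improvement ratio, Equation 3."""
    return mdp(daily_clients, d1) / mdp(daily_clients, d2)


def risk_plus(daily_clients, d1: Iterable[str], d2: Iterable[str]) -> float:
    """Improved risk measure, Equation 4."""
    return mdp(daily_clients, d1) - mdp(daily_clients, d2)
```

Because a client that issues many lookups is counted only once per day, these values are not inflated by repeated queries in the way lookup volumes can be.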

5.3 3322.org

The 3322.org takedown represents an extreme case where rza would have improved a takedown’s effectiveness. This takedown was accomplished by transferring the entire 3322.org Name Server’s (NS) authority to Microsoft and domains deemed malicious resolved to a set of known sinkhole IP addresses. The daily volume plot for 3322.org is shown in Figure 5. Unlike the Citadel takedown, domains were sunk on the day of the takedown and were limited to *.3322.org domain names. Unfortunately, this only accounted for a fraction of the lookups to domains with known malware associations, $D_m$, and domains with low reputation, $D_r$, that resolved to hosts known to support malicious activity. We notice a drop in lookups to $D_m$ and $D_r$ when the takedown is performed, showing that most of the domains targeted by the takedown were likely malicious; however, the lookups to remaining infrastructure identified by rza are still frequent. All enumerated sets have TIR values greater than one. This agreement suggests that malicious domains were almost certainly missed during the 3322.org takedown effort. Of the 10,135 malware samples we analyzed, none of them had a P2P- or DGA-based contingency plan.

This case shows the importance of using multiple sources to determine related malicious infrastructure before performing a takedown. Simply identifying domains with known malware associations offers a substantial improvement on the effectiveness of the takedown. Further, the similarity between the $D_m$ and $D_r$ trends shows most of the domains overlap between the two, which only further bolsters the likelihood that they are indeed malicious. To make matters worse, all the domains that were not sinkholed were given enterprise-level domain name resolution services, despite the high probability they were involved in malicious activities. The computed TIR values for the 3322.org takedown are shown in Table 4. Unlike the previous two postmortems, rza identified numerous additional malicious domains that were left undisturbed by the takedown on 3322.org.

For the $D_e$ and $D_m$ sets, we have precision and recall of 0.06/0.95 and 0.38/0.03, respectively. These results further reinforce the need to include domain reputation as a measure in rza. Simply relying on passive DNS (for $D_e$) and malware associations (for $D_m$) overestimates and underestimates the malicious domain names, respectively.

Sets                  TIR value   Risk
$D_m$, $D_{mssink}$   13.821      409,593.5
$D_r$, $D_{mssink}$   18.956      573,627.5
$D_e$, $D_{mssink}$   654.940     20,890,774

TABLE 4: TIR and Risk values for 3322.org takedown.

Fig. 5: 3322.org aggregate daily lookup volume (log-scale).

5.4 Citadel

The Citadel takedown is an interesting case, where damage from “friendly fire” occurred and likely malicious domains remained active. First, we see in Figure 6 that another sinkhole was present for some of the Citadel infrastructure (othersink), which has been confirmed by the sinkhole owners [1]. Confusingly, however, we do not see a decrease during the takedown, but eventually an increase in lookups. This is likely due to an increase in lookups as the sinkholes do not reply with proper commands, causing bots to re-issue lookups. Next, we notice a drop in successful lookups for the Citadel domains claimed to have been taken down by Microsoft, but note there are still many successful lookups to non-sinkholed Citadel domains. Furthermore, we see very little impact to the $D_m$, $D_r$, and $D_e$ sets after the takedown, indicating many malicious domains not targeted in the takedown are still successfully resolving. In Figure 7 we show the aggregate active clients for each subset. Importantly, we see that they are strictly less than the lookup volume and all roughly follow the same trajectory, indicating that the daily lookup volume is an effective proxy for the unique client counts. We also see that the domain names claimed to have been taken down (citadel) are not all actually redirected to the sinkhole (citadel-mssink).

TIR and Risk values are shown in Table 5 and TIR+ and Risk+ are shown in Table 6. For $D_m$ in both we see values similar to $D_m$ in the 3322.org takedown case, but the TIR and Risk values for $D_r$ and $D_e$ for Citadel are much higher. While we see a similar sharp increase for 3322.org, note the TIR+ and Risk+ values for Citadel increase much less rapidly. This shows that for small sets, TIR and Risk values reasonably reflect reality, i.e., lookup volume remains a reasonable proxy. However, in large sets using the exact population as we do with TIR+ and Risk+ offers much more reasonable assessments for takedown improvement and risk evaluation.

Fig. 6: Citadel aggregate daily lookup volume (log-scale).

Fig. 7: Citadel aggregate daily active clients (log-scale).

This suggests that for small, malware-based sets using the simple TIR value is sufficient, but with more complicatedly derived sets using the actual population is much more powerful. It should be noted that our population is biased (and smaller than the volume set), but still fairly representative of North America, which is the population the takedowns were meant to protect.

Sets                  TIR value   Risk
$D_m$, $D_{mssink}$   25.821      892,349
$D_r$, $D_{mssink}$   81.703      2,900,934
$D_e$, $D_{mssink}$   1,134.503   40,744,905

TABLE 5: TIR and Risk values for Citadel takedown.

Sets                  TIR+ value   Risk+
$D_m$, $D_{mssink}$   23.354       948,610
$D_r$, $D_{mssink}$   29.482       1,208,680
$D_e$, $D_{mssink}$   35.780       1,475,908

TABLE 6: TIR+ and Risk+ values for Citadel takedown.

5.5 No-IP

The No-IP takedown generated a lot of press and was heavily condemned by the No-IP DNS provider [21]. Not only did Microsoft not contact No-IP about the malicious domains in question, but the takedown was performed in a similar manner to the aforementioned 3322.org takedown. Instead of taking down the No-IP nameservers, the zones used by the dynamic DNS names were taken over by Microsoft. Malicious domains were routed to the sinkhole, while the others were supposed to be routed faithfully. As in previous cases, there were many malicious domain names that were not sinkholed, as evidenced by the volume and client population counts in Figures 8 and 9. One interesting point of note is the population counts are actually higher than the volumes. While this may seem counter-intuitive, it is an explainable artifact of our collection process. Volume counts are usually done above the recursive and the effects of caching cannot be easily measured. Population counts are done below the recursive and can register a lookup even if it is cached at the local recursive. This means that lookups to No-IP domains were more heavily concentrated and likely had longer TTLs than those in the Citadel case. This makes sense because No-IP is very popular even for benign hosting, which tends to have longer TTLs than malicious domains.

TIR and Risk values are shown in Table 7 and TIR+ and Risk+ are shown in Table 8. The results are presented differently because of the nature of the takedown. These values are computed with respect to $D_{mssink}$ as well as $D_{noip}$, which are No-IP domains sinkholed and No-IP domains affected (in any way) by the takedown, respectively. The first thing to notice is the TIR/Risk values are much larger than the TIR+/Risk+ values. In the other takedowns, we were restricted to specific fully qualified domain names, but in this case all domains under the zones were taken down. In these cases, volume is probably too relaxed of a proxy for population and exact population should be used if possible. Finally, we also include the risk in taking down all of No-IP rather than just the domains Microsoft meant to take down, i.e., those that appear in “mssink.” In this case, we see at least 566 thousand people were negatively impacted by the takedown actions.
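As a consistency check on the tabulated values, Equation 2 implies that the risks of the same enumerated set measured against two different baselines must differ by exactly the risk between those baselines. The derivation below uses only the definition of Risk and the entries of Tables 7 and 8; the same identity holds for Risk+ by Equation 4, with MDP in place of MDLV.

```latex
\begin{align*}
\mathrm{Risk}(D_m, D_{mssink}) - \mathrm{Risk}(D_m, D_{noip})
  &= \bigl[\mathrm{MDLV}(D_m) - \mathrm{MDLV}(D_{mssink})\bigr]
   - \bigl[\mathrm{MDLV}(D_m) - \mathrm{MDLV}(D_{noip})\bigr] \\
  &= \mathrm{MDLV}(D_{noip}) - \mathrm{MDLV}(D_{mssink})
   = \mathrm{Risk}(D_{noip}, D_{mssink}) \\[4pt]
\text{Table 7 (Risk):}\quad 120{,}847{,}924 - 120{,}840{,}015 &= 7{,}909 \\
\text{Table 8 (Risk}^{+}\text{):}\quad 6{,}188{,}102 - 5{,}621{,}290 &= 566{,}812
\end{align*}
```

Both differences equal the last row of the corresponding table, and the second is the figure of roughly 566 thousand affected clients quoted above.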

Fig. 8: No-IP aggregate daily lookup volume (log-scale).

Fig. 9: No-IP aggregate daily active clients (log-scale).

5.6 RZA with Farsight pDNS Data

We replicated part of our evaluation using only Farsight data. Specifically, we generated the $D_e$, $D_m$, and $D_r$ domain sets and computed the respective TIR values of Zeus and five of the current botnets with the most domains we tracked in our paper. $D_i$ sets were excluded due to time limitations. Our results from the Farsight dataset are shown in Table 9 and are largely consistent and show that the process employed by RZA can be done with other sources of pDNS data. The important detail to glean is that the process rza uses is independent of our private dataset and can be performed using public sources of passive DNS data. Due to regional variations, the TIR values are unlikely to be identical between the two datasets; however, the process and generated sets are the important factors.

Sets                       TIR value   Risk
$D_m$, $D_{mssink}$        2,537.85    120,847,924
$D_r$, $D_{mssink}$        3,379.05    160,920,164
$D_e$, $D_{mssink}$        5,994.491   285,511,928
$D_m$, $D_{noip}$          2,176.494   120,840,015
$D_r$, $D_{noip}$          2,897.919   160,912,255
$D_e$, $D_{noip}$          4,140.956   285,504,019
$D_{noip}$, $D_{mssink}$   1.166       7,909

TABLE 7: TIR and Risk values for No-IP takedown.

Sets                       TIR+ value   Risk+
$D_m$, $D_{mssink}$        1146.945     6,188,102
$D_r$, $D_{mssink}$        1305.379     7,043,646
$D_e$, $D_{mssink}$        3021.707     16,311,820
$D_m$, $D_{noip}$          10.824       5,621,290
$D_r$, $D_{noip}$          12.319       6,476,834
$D_e$, $D_{noip}$          28.516       15,745,008
$D_{noip}$, $D_{mssink}$   105.965      566,812

TABLE 8: TIR+ and Risk+ values for No-IP takedown.

Takedown   $D_e$   $D_m$   $D_r$
Zeus       4.843   0.000   1.014
#1         1.108   1.012   1.082
#2         0.969   0.969   0.459
#3         0.787   0.787   0.718
#4         0.680   0.680   0.613
#5         1.944   1.451   1.122

TABLE 9: Farsight-computed TIR values.

5.7 Running Time

To show the benefit of rza even against clever attackers, we show the expected running time from start to finish in Equation 5. We show that the running time is short enough that pace can be kept with evading attackers. We show the time it took to perform the postmortem takedowns described earlier in this section. In each case, the running time is sufficiently short that attackers will not have substantial time to continue evading a takedown, which will likely be successful in the long run.

The total running time for performing a full takedown postmortem or a takedown recommendation analysis is T as defined by:

$$T = T_{IE} + T_{MI} \qquad (5)$$

where $T_{MI}$ is the time to perform malware interrogation on the botnet and $T_{IE}$ is the time it takes to perform infrastructure enumeration. Each sub-equation is defined below:

$$T_{IE} = 2\alpha P \qquad (6)$$

Equation 6 represents the time to fully enumerate the infrastructure of a botnet’s criminal network. P represents the time to perform a full pass of the passive DNS database to identify related infrastructure and $\alpha$ represents the number of passes that must be performed until the infrastructure converges. Two passes over the passive DNS database must be made in order to enumerate infrastructure; once to identify related historic IP addresses and a second time to identify related historic domain names. Recall that Figure 3 shows the process to perform a takedown recommendation. If additional infrastructure is identified through malware interrogation, an additional pass over the passive DNS is needed to continue to expand our knowledge of the infrastructure.

Takedown   $\alpha$   m        T (in hours)
3322.org   1          12,745   9.41
Citadel    1          4,861    4.02
No-IP      1          18,913   13.63

TABLE 10: Timings for end-to-end run of postmortem studies.

Performing a pass over the passive DNS database can be done with either bulk or individual requests. In a bulk request, the running time is O(1) with respect to the number of domains, but a single bulk request runs on the order of minutes. With individual requests, the running time is O(n) in the number of domain names that must be queried, but a single request finishes in seconds. For ease of presentation, we only consider bulk requests, but in an operational environment individual requests would be made wherever they save time. As we show, however, most infrastructure can be enumerated in a single pass.
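The trade-off between the two request modes can be made concrete with a small break-even calculation. The sketch below is illustrative only: it takes the bulk pass time to be the 21-minute value of P used in the evaluation, and it assumes a per-lookup time of 2 seconds, a figure the text gives only as "seconds". The function name and both default values are assumptions.

```python
def cheaper_request_mode(n_domains, bulk_minutes=21.0, seconds_per_lookup=2.0):
    """Pick the faster way to make one pass for n_domains domains.

    bulk_minutes: constant cost of a bulk request (assumed equal to P).
    seconds_per_lookup: assumed cost of one individual request.
    Returns (mode, minutes) for the cheaper option.
    """
    individual_minutes = n_domains * seconds_per_lookup / 60.0
    if individual_minutes < bulk_minutes:
        return "individual", individual_minutes
    return "bulk", bulk_minutes


for n in (100, 630, 5000):
    print(n, cheaper_request_mode(n))
```

Under these assumed numbers, individual requests win below 630 domains and the constant-time bulk request wins from there on.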

T_{MI} = \frac{m \times 7t}{M}    (7)

Equation 7 is the time to fully interrogate the malware of the botnet to extract any additional network endpoints that must be disabled, as well as any potential backup C&C behavior included in the botnet’s infrastructure. This is limited by the number of machines we have available for performing malware analysis, M, the duration of the control malware analysis run, t, and the number of malware samples related to the botnet’s infrastructure, m. Recall that each malware sample must be run five times, two of which for duration 2t, in order to understand the malware’s backup plan (see Section 4); three runs of duration t plus two of duration 2t give the factor of 7t per sample.

Bound and Free Variables: Throughout the evaluation of the systems, the following variables were bound:

• M = 512 virtual machines for malware interrogation.

• t = 3 minutes for the control malware execution timeout. Note that malware may terminate earlier of its own accord, but will run for at most three minutes.

• P = 21 minutes to make a single, full pass over the passive DNS database.

This leaves two free variables to compute for each postmortem or takedown recommendation: α and m, the number of runs of the system until convergence is achieved and the number of malware samples to be analyzed, respectively.

The timing information for the takedown postmortems is summarized in Table 10. In short, we see that full analyses can be done on the order of hours and can likely keep pace with agile botnet infrastructure.
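As a sanity check, the running-time model of Equations 5 through 7 can be evaluated directly with the bound variables and the per-takedown values of α and m from Table 10. The short sketch below does this; the function names are ours and the model simply follows the equations as stated, treating the malware runs as evenly divisible across the M machines.

```python
M = 512    # virtual machines available for malware interrogation
T_CTRL = 3.0   # t: control malware execution timeout, in minutes
P = 21.0   # minutes for one full pass over the passive DNS database


def enumeration_minutes(alpha):
    """Equation 6: two passive DNS passes per enumeration run."""
    return 2 * alpha * P


def interrogation_minutes(m):
    """Equation 7: each of m samples costs 7t, spread over M machines."""
    return m * 7 * T_CTRL / M


def total_hours(alpha, m):
    """Equation 5, converted from minutes to hours."""
    return (enumeration_minutes(alpha) + interrogation_minutes(m)) / 60.0


# (takedown, alpha, m) as listed in Table 10.
for name, alpha, m in [("3322.org", 1, 12745),
                       ("Citadel", 1, 4861),
                       ("No-IP", 1, 18913)]:
    print(f"{name:<9} {total_hours(alpha, m):.2f} hours")
```

With these inputs the model yields 9.41, 4.02, and 13.63 hours, matching the T column of Table 10. In every case the 42 minutes of enumeration is small next to the interrogation term, so m dominates the total.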

6 DISCUSSION

In this section we discuss the limitations of rza, as well as policy implications of our research and how botnet takedowns should be performed at a policy, rather than technical, level.

6.1 Limitations

rza is a useful tool for understanding and assisting botnet takedowns, but it is not a panacea. In particular, rza can only assist in a limited sense against non-DNS-based botnet C&Cs or DNS C&Cs with non-overlapping infrastructure.

rza is focused on understanding and improving takedowns of traditional DNS-based C&C infrastructure, so it is inherently limited in its ability to do so for non-DNS botnets and C&Cs that tunnel traffic over the DNS [26]. However, some of the techniques could still be used in other cases. First, we have shown that rza can help identify advanced C&Cs, such as DGA- or P2P-based protocols. Second, the techniques described, particularly those for malware interrogation, could be used to assist in analyzing botnets with direct IP C&C servers. rza can even help with advanced hybrid peer-to-peer botnets [25] by detecting the presence of P2P activity, and further help if operators choose to augment such infrastructure with more agility using domain names.

Furthermore, DNS botnet infrastructure could be organized such that it would evade identification during our infrastructure enumeration process. For example, if the C&C domain names never share IP infrastructure, the enumeration procedure described in Section 3.2 would fail to identify the additional network assets to disable. That being said, this defeats the purpose of using the DNS for agility and is unlikely to occur in the extreme case. This is why rza also performs malware analysis, allowing it to identify cases where passive analysis is insufficient.

6.2 Policy

Takedowns are currently performed in an ad-hoc manner with little oversight, which makes it difficult for the security community at large to assist by contributing intelligence. Furthermore, there is no standard policy for enacting a takedown at the DNS level, forcing companies to coordinate with multiple registrars, pay for expensive court proceedings, or both to disable botnets. Measures for handling domain name issues already exist, however, in the form of trademark dispute procedures.

Our postmortem studies illustrated several drawbacks to the current ad-hoc manner in which takedowns are performed, namely: a lack of coordination, little to no oversight, and an environment that discourages collaboration. Without an effective form of coordination, we will continue to see instances where two or more security companies, with good intentions, step on each other’s toes, as we saw in the Citadel and No-IP takedown cases. We also saw oversight issues in the Citadel takedown, where domains were clearly being sinkholed before the date presented in the court order. Yet another, but more subtle, oversight issue deals with the method of instigating these takedowns: court orders. Each of the court orders for the presented takedowns was filed under seal, meaning it is not open to the public and requires either the claimant to release the record at their discretion, or other legal action to unseal the record.

Even more worrisome is language explicitly allowing further unverified action. In the 3322.org takedown temporary restraining order it was specified that “the authoritative name server ... [is] to respond to requests for the IP addresses of the sub-domains of 3322.org may respond to requests for the IP address of any domain listed in Appendix A⁴ or later determined to be associated with malware activity...” [15] (emphasis ours). While an authoritative name server takeover technically grants this ability, if the purpose of the court order is to prevent collateral damage or unlawful takedown, this clause effectively negates any future protection. It also suggests that the full scope of the threat was not clear at the time of the takedown by specifically permitting further cleanup actions.

Trademark and intellectual property interests were involved very early on during the formation of the Internet Corporation for Assigned Names and Numbers (ICANN), which is responsible for coordinating, among other critical Internet infrastructure, the DNS. Through the World Intellectual Property Organization (WIPO), trademark interests were arguing for procedures to protect trademarks in the DNS as early as December 1998 [18], and successfully forced ICANN to require a dispute resolution procedure dubbed the “Uniform Domain Name Dispute Resolution Policy” or UDRP.

UDRP is an ICANN policy that specifies independent arbitrators to oversee the process of dispute resolution. These “[i]ndependent arbitrators make a decision quickly and (relative to courts) inexpensively” [18], and the policy is built into the accreditation contracts of registrars. The UDRP [9] requires three conditions to be met to file a complaint:

i. Your domain name is identical or confusingly similar to a trademark or service mark in which the complainant has rights; and

ii. You have no rights or legitimate interests in respect of the domain name; and

iii. Your domain name has been registered and is being used in bad faith.

In its first year, UDRP successfully “handled over 2,500 cases involving nearly 4,000 names” [18] and has expanded since. In fact, ICANN is introducing the Uniform Rapid Suspension System (URS) [10] as a more expedited form of the UDRP and is requiring new generic top-level domains (gTLDs) to follow URS in their contracts.

We suggest a similar procedure ought to be available to provide the security community a point of coordination and a formal process to follow when performing takedowns. It would reduce exorbitant fees paid to courts, would likely be faster, and would mandate oversight from arbitrators. The procedure could be applied to future TLDs as a test, much like URS. Automated systems like rza could play an invaluable role in this process by reducing the burden on human operators and further expediting the takedown process.

7 CONCLUSION

We presented rza, a takedown analysis system that performs postmortem analyses of past takedowns. We have shown that rza would be useful both in helping to expedite the takedown process and in ensuring that future takedowns are more complete. We presented a more accurate estimation of the takedown’s improvement and risk, TIR+ and Risk+, which also shows that our earlier intuition of using volume as a proxy for population was well founded. Finally, we performed postmortem analyses of the Citadel and No-IP takedowns. Both were unsuccessful and help show the continuing importance of having a tool like rza to understand historic takedowns.

4. Appendix A of the cited restraining order.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their helpful comments and insightful questions.

This material is based in part upon work supported by the US Department of Commerce (DOC) under grant no. 2106DEK. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the DOC.

REFERENCES

[1] Abuse.ch. Collateral Damage: Microsoft Hits Security Researchers along with Citadel, 2013. https://www.abuse.ch/?p=5362.

[2] Alexa. Top sites. http://www.alexa.com/topsites, (Retrieved) March 2011.

[3] M. Antonakakis, R. Perdisci, D. Dagon, and W. Lee. Building a Dynamic Reputation System for DNS. In Proceedings of the USENIX Security Symposium, 2010.

[4] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis. In Network and Distributed System Security Symposium (NDSS), 2011.

[5] Conficker Working Group. Conficker Working Group: Lessons Learned, 2011. http://www.confickerworkinggroup.org/wiki/uploads/Conficker_Working_Group_Lessons_Learned_17_June_2010_final.pdf.

[6] dnswl. DNS whitelist - protect against false positives. http://www.dnswl.org, (Retrieved) March 2011.

[7] Farsight Security, Inc. SIE/Farsight Security’s DNSDB, 2013. https://www.dnsdb.info/.

[8] M. Harris. Spammers recovering from McColo shutdown, 2009. http://www.techradar.com/news/internet/spammers-recovering-from-mccolo-shutdown-591118.

[9] Internet Corporation for Assigned Names and Numbers. Uniform Domain Name Dispute Resolution Policy. Technical report, 1999.

[10] Internet Corporation for Assigned Names and Numbers. Uniform Rapid Suspension System. Technical report, 2012.

[11] B. Krebs. Major Source of Online Scams and Spams Knocked Offline, 2008. http://voices.washingtonpost.com/securityfix/2008/11/major_source_of_online_scams_a.html.

[12] B. Krebs. Spam Volumes Drop by Two-Thirds After Firm Goes Offline, 2008. http://voices.washingtonpost.com/securityfix/2008/11/spam_volumes_drop_by_23_after.html.

[13] B. Krebs. Mariposa Botnet Authors May Avoid Jail Time, 2010. http://krebsonsecurity.com/2010/03/mariposa-botnet-authors-may-avoid-jail-time/.

[14] R. McMillan. After takedown, botnet-linked ISP Troyak resurfaces, 2010. http://www.computerworld.com/s/article/9169118/After_takedown_botnet_linked_ISP_Troyak_resurfaces.

[15] Microsoft Corporation. Microsoft Corporation v. Peng Yong et al. 2012. Virginia Eastern District Court.

[16] Microsoft Corporation. Microsoft Corporation v. John Does 1-82. 2013. United States District Court for the Western District of North Carolina Charlotte Division.

[17] Microsoft Corporation. Microsoft Corporation v. Naser Al Mutairi, an individual; Mohamed Benabdellah, an individual; Vitalwerks Internet Solutions, LLC, d/b/a NO-IP.com; and Does 1-500. 2014. United States District Court District of Nevada.

[18] M. Mueller. Ruling the root. The MIT Press, 2004.

[19] Y. Nadji, M. Antonakakis, R. Perdisci, D. Dagon, and W. Lee. Beheading Hydras: Performing Effective Botnet Takedowns. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS), 2013.

[20] Y. Nadji, M. Antonakakis, R. Perdisci, and W. Lee. Understanding the prevalence and use of alternative plans in malware with network games. In Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC ’11), 2011.

[21] Natalie Goguen. No-IP’s Formal Statement on Microsoft Takedown. http://www.noip.com/blog/2014/06/30/ips-formal-statement-microsoft-takedown/, 2014.

[22] C. Rossow, D. Andriesse, and T. Werner. P2PWNED: Modeling and Evaluating the Resilience of Peer-to-Peer Botnets. In Proceedings of the 34th IEEE Symposium on Security and Privacy (S&P 2013), 2013.

[23] U.S. Attorney’s Office - Southern District of New York. Manhattan U.S. Attorney Charges Seven Individuals for Engineering Sophisticated Internet Fraud Scheme, 2011. http://www.fbi.gov/newyork/press-releases/2011/manhattan-u.s.-attorney-charges-seven-individuals-for-engineering-sophisticated-internet-fraud-scheme-that-infected-millions-of-computers-worldwide-and-manipulated-internet-advertising-business.

[24] VirusTotal. VirusTotal Intelligence. https://www.virustotal.com/en/documentation/private-api/.

[25] P. Wang, S. Sparks, and C. C. Zou. An Advanced Hybrid Peer-to-Peer Botnet. Dependable and Secure Computing, IEEE Transactions on, 7(2):113–127, 2010.

[26] K. Xu, P. Butler, S. Saha, and D. Yao. DNS for Massive-Scale Command and Control. Dependable and Secure Computing, IEEE Transactions on, 10(3):143–153, 2013.

Yacin Nadji Dr. Yacin Nadji is a Post-Doctoral researcher at the Georgia Institute of Technology under the supervision of Dr. Manos Antonakakis. He received his Ph.D. in Computer Science from the Georgia Institute of Technology as a member of the Georgia Tech Information Security Center and was co-advised by Drs. Manos Antonakakis and Wenke Lee. He has also worked as a Research Contractor at Damballa, Inc.

Roberto Perdisci Dr. Roberto Perdisci is an Associate Professor in the Computer Science department at the University of Georgia, an Adjunct Assistant Professor in the Georgia Tech School of Computer Science, and a faculty member of the UGA Institute for Artificial Intelligence. Before joining UGA he was Post-Doctoral Fellow at the College of Computing of the Georgia Institute of Technology, working under the supervision of Prof. Wenke Lee. He also worked as Principal Scientist at Damballa, Inc., and prior to joining Damballa he was Research Scholar at the Georgia Tech Information Security Center and PhD candidate at the University of Cagliari, Italy with the Pattern Recognition and Applications Group.

Manos Antonakakis Dr. Manos Antonakakis is an assistant professor in the School of Electrical and Computer Engineering (ECE), and adjunct faculty in the College of Computing (CoC), at the Georgia Institute of Technology. Before joining the ECE faculty, Professor Antonakakis held the Chief Scientist role at Damballa, where he was responsible for advanced research projects, university collaborations, and technology transfer efforts. He currently serves as the co-chair of the Academic Committee for the Messaging Anti-Abuse Working Group (MAAWG).
