cs 5410 - computer and network security: malware and botnets · 2015-12-09 · final posters...

Southeastern Security for Enterprise and Infrastructure (SENSEI) Center

CS 5410 - Computer and Network Security: Malware and Botnets

Professor Kevin Butler

Fall 2015


Final Posters

• Final posters are your chance to show me and your peers the excellent work you’ve done this semester.

• An opportunity!

• What should be included in a good poster?

• I suggest arranging areas much like you would if you were writing a full paper for the class.

• You are going to need to show results (e.g., graphs, tables, etc.)

• In addition to presenting them, all posters will be turned in (as a single PDF per group), as will all code.

• Practice your elevator pitch!

2


Story

3


Malware

• Software with “malicious intentions” is generally categorized as malware.

• First proposed in 1949 in John von Neumann’s “Theory of self-reproducing automata”

• A theoretical treatise on code that could reproduce itself.

• Countless real examples have followed:

• The Morris Worm (1988), Michelangelo Virus (1991), Code Red Worm (2001), SQL Slammer (2003), Zeus Trojan (2007)

4


Evolution of Malware

• Malware is generally classified into these categories:

• Virus - generally included as part of an executable file; requires some assistance to infect.

• Worm - similar to a virus, but able to self-propagate.

• Trojan - infected software that generally does not spread.

• These are not “hard and fast” rules.

5


Ransomware

• New twist on malware: extort the user by encrypting all of their files and demanding a ransom

• Helpful: telephone support for getting your credit card details

6

TOR hidden service


Detection and Evasion

• Malware is most often detected statically:

• MD5/SHA256 hashes are commonly used in commercial AVs

• Tactics to evade such detection have become commonplace:

• Encrypted Malware: Virus is encrypted, and each instance is encrypted with a different key.

• Polymorphic Malware: Encrypted, but the decryption routine is modified in every instance.

• Metamorphic Malware: Everything is entirely rewritten.

• Where does the arms race go from here?
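
As a rough illustration of why hash-based signatures are brittle, the sketch below flags a sample only when its SHA-256 digest appears in a blocklist. The payload bytes, the `KNOWN_BAD` set, and the helper names are hypothetical, and a one-byte XOR stands in for per-instance encryption; real AV engines combine many more techniques.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def xor_encode(payload: bytes, key: int) -> bytes:
    """Toy stand-in for per-instance encryption: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in payload)


# Harmless placeholder bytes standing in for a malware body (assumption).
payload = b"placeholder payload, not real malware"

# Two "instances" of the same payload, each encoded with a different key.
instance_a = xor_encode(payload, 0x21)
instance_b = xor_encode(payload, 0x42)

# Hypothetical signature database that has only ever seen instance A.
KNOWN_BAD = {sha256_hex(instance_a)}


def is_flagged(sample: bytes) -> bool:
    """Static detection by exact hash match."""
    return sha256_hex(sample) in KNOWN_BAD


flag_a = is_flagged(instance_a)  # the known instance matches its own hash
flag_b = is_flagged(instance_b)  # same payload, different key, different hash
```

Here `flag_a` is true and `flag_b` is false even though both instances decode to the same payload, which is the gap that encrypted, polymorphic, and metamorphic malware exploit.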

7


Botnets

• A botnet is a network of software robots (bots) run on compromised machines which are administered by command and control (C&C) networks.

‣ Bot master - the owner/controller of a botnet

• What is the advantage to this approach over the others?

8


Infection

• Worms, Trojan horses, backdoors, browser bugs, etc...

• Note: the software on these systems is updated

• Bot theft: bot controllers penetrate/"steal" bots.

9


Statistics (controversial)

• The actual number of bots, the size of the botnets, and their activity are highly controversial.

• As of 2012: millions of bots

• 1/4 of hosts are now part of botnets

• Growing fast (many more bots)

• Assertion: botnets are getting smaller(?!?)

• When they become large, they are more likely to be noticed and targeted for takedown.

10


Botnet Architecture

• An army of compromised hosts (“bots”) coordinated via a command and control center (C&C).

“A botnet is comparable to compulsory military service for windows boxes” -- Bjorn Stromberg

11


Typical Botnet

12

1. Compromise

2. Download

3. DNS Lookup

4. Join

5. Command


IRC

• 1988 - one-to-many or many-to-many chat (for BBS)

• Client/server -- TCP Port 6667

• Used to report on 1991 Soviet coup attempt

• Channels (sometimes password protected) are used to communicate between parties.

• Invisible mode (no list, not known)

• Invite only (must be invited to participate)

• Botnets rarely rely on IRC anymore.

• Many ISPs block IRC these days.

13

[Figure: diagram of five IRC servers]


P2P Botnets

• Bots that rely on centralized communications mechanisms such as IRC are generally easy to attack.

• Single point of failure for the bad guys...

• Increasingly, botnets have turned to P2P-based architectures to avoid such weaknesses.

• e.g., Slapper, Phatbot, Conficker

• What are the challenges for a botmaster relying on a P2P architecture?

14


P2P Botnets

• What advantages do defenders have in this situation?

• How do communication patterns compare to IRC bots?

• How do you tell between “legitimate” P2P traffic and that associated with bots?

15


Wireless/Mobile

• Mobile devices offer new avenues for botnets.

• With the ability to communicate over multiple (5) interfaces, how does a provider defend against such multi-homed botnets?

• How does this change the game in terms of communications strategies for botmasters?

16



Campaign: DDoS

• Distributed Denial of Service (DDoS)

• With hundreds of thousands of malicious devices under their control, a botmaster can unleash massive torrents of traffic at a target.

• Examples: Unknown vs Estonia, Russia/Georgia, Anonymous vs Scientology, Unknown vs CNN, Unknown vs ...

• What’s the advantage of doing this from a botnet?

17


Stuxnet?

• What was Stuxnet?

• Classification?

• What was its goal?

• How did it try to do this?

• How was it delivered?

• Was it effective?

18


How are researchers learning?

• Honeypots are often used to attract, observe and eventually “dissect” bots.

• A number of recent efforts in this space have actually hijacked active botnets.

• Large portions of these networks have been monitored:

• ... to learn about the targets of the botnet (and their success in exploiting them).

• ... to learn about weaknesses in their architecture to use as a means of potentially interfering with the botnet.

• ... to figure out whether deployed defenses are helping at all.

19


Campaign: Spam

• Spam: Unsolicited mass emailing, generally attempting to advertise a product (legitimate or otherwise).

• In the past, has been as high as 90+% of email by volume.

• Approximately 72% in 2014.

• This is an economic problem... why?

• Botnets are an excellent platform for spam campaigns.

• Massive bandwidth for sending messages

• Many locations for hosting infrastructure.

20


Spamalytics

• Very little was previously known about the conversion rate of spam.

• Why not?

• Methodology: Hijack a botnet, watch what happens.

• Good methodology?

• Issues?

21


Spamalytics (cont)

• What was learned?

• What can we do in terms of defense?

• Click Trajectories: End-to-End Analysis of the Spam Value Chain, K. Levchenko, et al., Proceedings of the IEEE Symposium on Security and Privacy, May 2011

22

[Plot: Number of Email Targets (x-axis, log scale) vs. Number of Responders (y-axis, log scale); points labeled by country code]

Figure 10: Volume of e-mail targeting (x-axis) vs. responses (y-axis) for the most prominent country-code TLDs. The x and y axes correspond to Stages A and D in the pipeline (Figure 6), respectively.

x-axis greater than zero), while others appear insensitive to blacklisting (those lying on the diagonal). Since points lie predominantly below the diagonal, we see that either blacklisting or some other effect related to sustained spamming activity (e.g., learning content signatures) diminishes the delivery rate seen at most domains. Delisting followed by relisting may account for some of the spread of points seen here; those few points above the diagonal may simply be due to statistical fluctuations. Finally, the cloud of points to the upper right indicates a large number of domains that are not much targeted individually, but collectively comprise a significant population that appears to employ no effective anti-spam measures.

7. CONVERSION ANALYSIS

We now turn to a preliminary look at possible factors influencing response to spam. For the present, we confine our analysis to coarse-grained effects.

We start by mapping the geographic distribution of the hosts that “convert” on the spam campaigns we monitored. Figure 9 maps the locations of the 541 hosts that execute the emulated self-propagation program, and the 28 hosts that visit the purchase page of the emulated pharmacy site. The map shows that users around the world respond to spam.

Figure 10 looks at differences in response rates among nations as determined by prevalent country-code e-mail domain TLDs. To allow the inclusion of generic TLDs such as .com, for each e-mail address we consider it a member of the country hosting its mail server; we remove domains that resolve to multiple countries, categorizing them as “international” domains. The x-axis shows the volume of e-mail (log-scaled) targeting a given country, while the y-axis gives the number of responses recorded at our Web servers (also log-scaled), corresponding to Stages A and D in the pipeline (Figure 6), respectively. The solid line reflects a response rate of 10^-4 and the dashed line a rate of 10^-3. Not surprisingly, we see that the spam campaigns target e-mail addresses in the United States substantially more than any other country. Further, India, France and the United States dominate responses. In terms of response rates, however, India, Pakistan, and Bulgaria have the highest response rates than any other countries (furthest away from the diagonal). The United States, although a dominant target and responder, has the lowest resulting response rate of any country, followed by Japan and Taiwan.

[Plot: Response Rate for Self-prop Email (x-axis) vs. Response Rate for Pharmacy Email (y-axis); points labeled by country code]

Figure 11: Response rates (stage D in the pipeline) by TLD for executable download (x-axis) vs. pharmacy visits (y-axis).

However, the countries with predominant response rates do not appear to reflect a heightened interest in users from those countries in the specific spam offerings. Figure 11 plots the rates for the most prominent countries responding to self-propagation vs. pharmacy spams. The median ratio between these two rates is 0.38 (diagonal line). We see that India and Pakistan in fact exhibit almost exactly this ratio (upper-right corner), and Bulgaria is not far from it. Indeed, only a few TLDs exhibit significantly different ratios, including the US and France, the two countries other than India with a high number of responders; users in the US respond to the self-propagation spam substantially more than pharmaceutical spam, and vice-versa with users in France. These results suggest that, for the most part, per-country differences in response rate are due to structural causes (quality of spam filtering, general user anti-spam education) rather than differing degrees of cultural or national interest in the particular promises or products conveyed by the spam.

8. CONCLUSIONS

This paper describes what we believe is the first large-scale quantitative study of spam conversion. We developed a methodology that uses botnet infiltration to indirectly instrument spam e-mails such that user clicks on these messages are taken to replica Web sites under our control. Using this methodology we instrumented almost 500 million spam messages, comprising three major campaigns, and quantitatively characterized both the delivery process and the conversion rate.

We would be the first to admit that these results represent a single data point and are not necessarily representative of spam as a

Bank Name | BIN | Country | Affiliate Programs
Azerigazbank | 404610 | Azerbaijan | GlvMd, RxPrm, PhEx, Stmul, RxPnr, WldPh
B&N | 425175 | Russia | ASR
B&S Card Service | 490763 | Germany | MaxGm
Borgun Hf | 423262 | Iceland | Trust
Canadian Imperial Bank of Commerce | 452551 | Canada | WldPh
Cartu Bank | 478765 | Georgia | DrgRev
DnB Nord (Pirma) | 492175 | Latvia | Eva, OLPh, USHC
Latvia Savings | 490849 | Latvia | EuSft, OEM, WchSh, Royal, SftSl
Latvijas Pasta Banka | 489431 | Latvia | SftSl
St. Kitts & Nevis Anguilla National Bank | 427852 | St. Kitts & Nevis | DmdRp, VgREX, Dstn, Luxry, SwsRp, OneRp
State Bank of Mauritius | 474140 | Mauritius | DrgRev
Visa Iceland | 450744 | Iceland | Staln
Wells Fargo | 449215 | USA | Green
Wirecard AG | 424500 | Germany | ClFr

Table V: Merchant banks authorizing or settling transactions for spam-advertised purchases, their Visa-assigned Bank Identification Number (BIN), their location, and the abbreviation used in Table IV for affiliate program and/or store brand.

programs. For 50% of the affiliate programs, their domains, name servers, and Web servers are distributed over just 8% or fewer of the registrars and ASes, respectively; and 80% of the affiliate programs have their infrastructure distributed over 20% or fewer of the registrars and ASes. Only a handful of programs, such as EvaPharmacy, Pharmacy Express, and RX Partners, have infrastructure distributed over a large percentage (50% or more) of registrars and ASes.

To summarize, there are a broad range of registrars and ISPs who are used to support spam-advertised sites, but there is only limited amounts of organized sharing and different programs appear to use different subsets of available resource providers.¹⁵

B. Realization

Next, we consider several aspects of the realization pipeline, including post-order communication, authorization and settlement of credit card transactions, and order fulfillment.

We first examined the hypothesis that realization infrastructure is the province of affiliate programs and not individual affiliates. Thus, we expect to see consistency in payment processing and fulfillment between different instances of the same affiliate program or store brand. Indeed, we found only two exceptions to this pattern and purchases from different sites appearing to represent the same affiliate program indeed make use of the same merchant bank and same pharmaceutical drop shipper.¹⁶ Moreover, key customer support features including the email templates and order number formats are consistent across brands belonging to the same program. This allowed us to further confirm our understanding that a range of otherwise distinct brands all belong to the same underlying affiliate program, including most of the replica brands: Ultimate Replica, Diamond Replicas, Distinction Replica, Luxury Replica, One Replica, Exquisite Replicas, Prestige Replicas, Aff. Accessories; most of the herbal brands: MaxGentleman, ManXtenz, Viagrow, Dr. Maxman, Stud Extreme, VigREX; and the pharmacy: US HealthCare.¹⁷

Having found strong evidence supporting the dominance of affiliate programs over free actors, we now turn to the question how much realization infrastructure is being shared across programs.

Payment: The sharing of payment infrastructure is substantial. Table V documents that, of the 76 purchases for which we received transaction information, there were only 13 distinct banks acting as Visa acquirers. Moreover, there is a significant concentration even among this small set of banks. In particular, most herbal and replica purchases cleared through the same bank in St. Kitts (a by-product of ZedCash’s dominance of this market, as per the previous discussion), while most pharmaceutical affiliate programs used two banks (in Azerbaijan and Latvia), and software was handled entirely by two banks (in Latvia and Russia).

Each payment transaction also includes a standardized “Merchant Category Code” (MCC) indicating the type of goods or services being offered [52]. Interestingly, most affiliate program transactions appear to be coded correctly.

¹⁵ We did find some evidence of clear inter-program sharing in the form of several large groups of DNS servers willing to authoritatively resolve collections of EvaPharmacy, Mailien and OEM Soft Store domains for which they were outside the DNS hierarchy (i.e., the name servers were never referred by the TLD). This overlap could reflect a particular affiliate advertising for multiple distinct programs and sharing resources internally or it could represent a shared service provider used by distinct affiliates.

¹⁶ In each of the exceptions, at least one order cleared through a different bank - perhaps because the affiliate program is interleaving payments across different banks, or (less likely) because the store “brand” has been stolen, although we are aware of such instances.

¹⁷ This program, currently called ZedCash, is only open by invitation and we had little visibility into its internal workings for this paper.

Supplier | Item | Origin | Affiliate Programs
Aracoma Drug | Orange bottle of tablets (pharma) | WV, USA | ClFr
Combitic Global Caplet Pvt. Ltd. | Blister-packed tablets (pharma) | Delhi, India | GlvMd
M.K. Choudhary | Blister-packed tablets (pharma) | Thane, India | OLPh
PPW | Blister-packed tablets (pharma) | Chennai, India | PhEx, Stmul, Trust, ClFr
K. Sekar | Blister-packed tablets (pharma) | Villupuram, India | WldPh
Rhine Inc. | Blister-packed tablets (pharma) | Thane, India | RxPrm, DrgRev
Supreme Suppliers | Blister-packed tablets (pharma) | Mumbai, India | Eva
Chen Hua | Small white plastic bottles (herbal) | Jiangmen, China | Stud
Etech Media Ltd | Novelty-sized supplement (herbal) | Christchurch, NZ | Staln
Herbal Health Fulfillment Warehouse | White plastic bottle (herbal) | MA, USA | Eva
MK Sales | White plastic bottle (herbal) | WA, USA | GlvMd
Riverton, Utah shipper | White plastic bottle (herbal) | UT, USA | DrMax, Grow
Guo Zhonglei | Foam-wrapped replica watch | Baoding, China | Dstn, UltRp

Table VI: List of product suppliers and associated affiliate programs and/or store brands.

For example, all of our software purchases (across all programs) were coded as 5734 (Computer Software Stores) and 85% of all pharmacy purchases (again across programs) were coded as 5912 (Drug Stores and Pharmacies). ZedCash transactions (replica and herbal) are an exception, being somewhat deceptive, and each was coded as 5969 (Direct Marketing - Other). The few other exceptions are either minor transpositions (e.g., 5921 instead of 5912), singleton instances in which a minor program uses a generic code (e.g., 5999, 8999) with a bank that we only observed in one transaction, and finally Greenline which is the sole pharmaceutical affiliate program that cleared transactions through a US Bank during our study (completely miscoded as 5732, Electronic Sales, across multiple purchases). The latter two cases suggest that some minor programs with less reliable payment relationships do try to hide the nature of their transactions, but generally speaking, category coding is correct. A key reason for this may be the substantial fines imposed by Visa on acquirers when miscoded merchant accounts are discovered “laundering” high-risk goods.

Finally, for two of the largest pharmacy programs, GlavMed and RX–Promotion, we also purchased from “canonical” instances of their sites advertised on their online support forums. We verified that they use the same bank, order number format, and email template as the spam-advertised instances. This evidence undermines the claim, made by some programs, that spammers have stolen their templates and they do not allow spam-based advertising.

Fulfillment: Fulfillment for physical goods was sourced from 13 different suppliers (as determined by declared shipper and packaging), of which eight were again seen more than once (see Table VI). All pharmaceutical tablets shipped from India, except for one shipped from within the United States (from a minor program), while replicas shipped universally from China. While we received herbal supplement products from China and New Zealand, most (by volume) shipped from within the United States. This result is consistent with our expectation since, unlike the other goods, herbal products have weaker regulatory oversight and are less likely to counterfeit existing brands and trademarks. For pharmaceuticals, the style of blister packs, pill shapes, and lot numbers were all exclusive to an individual nominal sender and all lot numbers from each nominal sender were identical. Overall, we find that only modest levels of supplier sharing between pharmaceutical programs (e.g., Pharmacy Express, Stimul-cash, and Club-first all sourced a particular product from PPW in Chennai, while RX–Promotion and DrugRevenue both sourced the same drug from Rhine Inc. in Thane). This analysis is limited since we only ordered a small number of distinct products and we know (anecdotally) that pharmaceutical programs use a network of suppliers to cover different portions of their formulary.

We did not receive enough replicas to make a convincing analysis, but all ZedCash-originated replicas were low-quality and appear to be of identical origin. Finally, purchased software instances were bit-for-bit identical between sites of the same store brand and distinct across different affiliate programs (we found no malware in any of these images). In general, we did not identify any particularly clear bottleneck in fulfillment and we surmise that suppliers are likely to be plentiful.

C. Intervention analysis

Finally, we now reconsider these different resources in the spam monetization pipeline, but this time explicitly from the standpoint of the defender. In particular, for any given registered domain used in spam, the defender may choose to intervene by either blocking its advertising (e.g., filtering spam), disrupting its click support (e.g., takedowns for name servers of hosting sites), or interfering with the realization step (e.g., shutting down merchant accounts).¹⁸ But which of these interventions will have the most impact?

¹⁸ In each case, it is typically possible to employ either a “takedown” approach (removing the resource comprehensively) or cheaper “blacklisting” approach at more limited scope (disallowing access to the resource for a subset of users), but for simplicity we model the interventions in the takedown style.

[Plot panels: Registrar (left), AS serving Web/DNS (center), Bank (right); y-axis: % of spam. Labeled registrars: NauNet (RU), Beijing Innovative (CN), Bizcn.com (CN), China Springboard (CN), eNom (US). Labeled ASes: Chinanet (CN), Evolva (RO), VLineTelecom (UA). Labeled banks: Azerigazbank, Saint Kitts, DnB Nord, Latvia Savings, B + N, B + S, Wells Fargo, Visa Iceland, Wirecard, Borgun Hf, State Mauritius, Cartu Bank, Latvijas Pasta.]

Figure 5: Takedown effectiveness when considering domain registrars (left), DNS and Web hosters (center) and acquiring banks (right).
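
One way to read takedown-effectiveness curves like those in Figure 5 is to sort providers by how much spam depends on each and accumulate the share removed as they are taken down one by one. The sketch below does this for made-up counts; the provider names, the numbers, and the function name are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def takedown_curve(spam_by_provider: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (provider, cumulative % of spam removed), largest providers first."""
    total = sum(spam_by_provider.values())
    ranked = sorted(spam_by_provider.items(), key=lambda kv: kv[1], reverse=True)
    curve = []
    covered = 0
    for name, count in ranked:
        covered += count
        curve.append((name, 100.0 * covered / total))
    return curve


# Hypothetical spam volume attributed to each provider (assumption).
example = {"bank_a": 60, "bank_b": 25, "bank_c": 10, "bank_d": 5}
curve = takedown_curve(example)

for name, pct in curve:
    print(f"{name}: {pct:.0f}% of spam affected")
```

A curve that climbs steeply after only a few providers suggests a concentrated resource, which is the property the intervention analysis is looking for.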

or weeks). Even for so-called third-party accounts (whereby a payment processor acts as middleman and “fronts” for the merchant with both the bank and Visa/Mastercard) we have been unable to locate providers willing to provide operating accounts in less than five days, and such providers have significant account “holdbacks” that they reclaim when there are problems.²¹ Thus, unlike the other resources in the spam value chain, we believe payment infrastructure has far fewer alternatives and far higher switching cost.

Indeed, our subsequent measurements bear this out. For four months after our study we continued to place orders through the major affiliate programs. Many continued to use the same banks four months later (e.g., all replica and herbal products sold through ZedCash, all pharmaceuticals from Online Pharmacy and all software from Auth. Soft. Resellers). Moreover, while many programs did change (typically in January or February 2011), they still stayed within same set of banks we identified earlier. For example, transactions with EvaPharmacy, Greenline, and OEM Soft Store have started clearing through B&N Bank in Russia, while Royal Software, EuroSoft and Soft Sales, have rotated through two different Latvian Banks and B&S Card Service of Germany. Indeed, the only new bank appearing in our follow-on purchases is Bank Standard (a private commercial bank in Azerbaijan, BIN 412939); RX–Promotion, GlavMed, and Mailien (a.k.a. Pharmacy Express) all appear to have moved to this bank (from Azerigazbank) on or around January 25th. Finally, one order placed with DrugRevenue failed due to insufficient funds, and was promptly retried through two different banks (but again, from the same set). This suggests that while cooperating third-party payment processors may be able to route transactions through merchant accounts at difference banks, the set of banks currently available for such activities is quite modest.

D. Policy options

There are two potential approaches for intervening at the payment tier of the value chain. One is to directly engage the merchant banks and pressure them to stop doing business with such merchants (similar to Legitscript’s role with registrars [25], [28]). However, this approach is likely to be slow - very likely slower than the time to acquire new banking facilities. Moreover, due to incongruities in intellectual property protection, it is not even clear that the sale of such goods is illegal in the countries in which such banks are located. Indeed, a sentiment often expressed in the spammer community, which resonates in many such countries, is that the goods they advertise address a real need in the West, and efforts to criminalize their actions are motivated primarily by Western market protectionism.

However, since spam is ultimately supported by Western money, it is perhaps more feasible to address this problem in the West as well. To wit, if U.S. issuing banks (i.e., banks that provide credit cards to U.S. consumers) were to refuse to settle certain transactions (e.g., card-not-present transactions for a subset of Merchant Category Codes) with the banks identified as supporting spam-advertised goods, then the underlying enterprise would be dramatically demonetized. Furthermore, it appears plausible that such a “financial blacklist” could be updated very quickly (driven by modest numbers of undercover buys, as in our study) and far more rapidly than the turn-around time to acquire new banking resources - a rare asymmetry favoring the anti-spam community. Furthermore, for a subset of spam-advertised goods (regulated pharmaceuticals, brand replica products, and pirated software) there is a legal basis for enforcing such a policy.²² While we suspect that the political challenges for

²¹ To get a sense of the kinds of institutions we examined, consider this advertisement of one typical provider: “We have ready-made shell companies already incorporated, immediately available.”

²² Herbal products, being largely unregulated, are a more complex issue.

http://cseweb.ucsd.edu/~savage/papers/Oakland11.pdf

http://cseweb.ucsd.edu/~klevchen/


Campaign: Click Fraud

• Click fraud generates revenue (or cost) by automatically clicking on paid-advertising links, without any user desire or interest.

• Who are the adversaries here and what are they after?

• Publisher (revenue)

• Competitor (cost)

• Why are botnets used as part of these campaigns?

23


So What Do We Do?

• Given the magnitude of this problem, how do we fight it?

• We have an area and a problem... Think about solution and methodology!

• There are two places from which we can try to combat bots:

• Local Network

• At or above the ISP level

24


BotHunter: IDS Dialog Correlation

• Simple Approach: Why not just use an IDS looking for a single signature?

• Detection need not be based on a single event.

• Knowing something about the structure of communication can potentially help us find our bot.

• So how do they do it?

25


Circle of Life

• Bots follow a very regular pattern: Scan, Infect, “Egg” Download, Communicate (C&C), Action.

• Why does this reduce false positives?

26

[Diagram: Attacker, Victim, and C&C Server exchanging five numbered steps. Labels: DCERPC exploit (port 135); egg download; victim opens backdoor on port 17509; IRC connection (port 6668); outbound scanning on TCP 2745, 135, 1025, 445, 3127, 6129, 5000. TCP connections: 2745/Beagle; 135, 1025/DCOM1,2; 445/NetBIOS; 3127/MyDoom; 6129/Dameware; 139/NetBIOS; 5000/UPNP.]

Figure 1: Phatbot Dialog Summary

[Diagram: events E1: Inbound Scan, E2: Inbound Infection, E3: Egg Download, E4: C&C Communications, E5: Outbound Scan. Edge labels: A-to-V, V-to-A, V-to-C, V-to-* (Type I and Type II paths).]

Figure 2: Bot Infection Dialog Model

loads and instantiates a full malicious binary instance of the bot (E3). Once the full binary instance of the bot is retrieved and executed, our model accommodates two potential dialog paths, which Rajab et al. [33] refer to as the bot Type I versus Type II split. Under Type II bots, the infected host proceeds to C&C server coordination (E4) before attempting self-propagation. Under a Type I bot, the infected host immediately moves to outbound scanning and attack propagation (E5), representing a classic worm infection.

We assume that bot dialog sequence analysis must be robust to the absence of some dialog events, must allow for multiple contributing candidates for each of the various dialog phases, and must not require strict sequencing on the order in which outbound dialog is conducted. Furthermore, in practice we have observed that for Type II infections, time delays between the initial infection events (E1 and E2) and subsequent outbound dialog events (E3, E4, and E5) can be significant - on the order of several hours. Furthermore, our model must be robust to failed E1 and E2 detections, possibly due to insufficient IDS fidelity or due to malware infections that occur through avenues other than direct remote exploit.

One approach to addressing the challenges of sequence order and event omission is to use a weighted event threshold system that captures the minimum necessary and sufficient sparse sequences of events under which bot profile declarations can be triggered. For example, one can define a weighting and threshold scheme for the appearance of each event such that a minimum set of event combinations is required before bot detection. In our case, we assert that bot infection declaration requires a minimum of

Condition 1: Evidence of local host infection (E2), AND evidence of outward bot coordination or attack propagation (E3-E5); or

Condition 2: At least two distinct signs of outward bot coordination or attack propagation (E3-E5).

In our description of the BotHunter correlation engine in Section 4, we discuss a weighted event threshold scheme that enforces the above minimum requirement for bot declaration.
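
The two minimum conditions above can be sketched as a plain boolean check over the set of dialog events seen for one host. This is only an illustration of the stated minimum requirement, not BotHunter's actual weighted threshold scheme; the function name and the set-of-strings representation are assumptions.

```python
from typing import Set

# Outward bot coordination or attack propagation events (E3-E5).
OUTBOUND = {"E3", "E4", "E5"}


def declares_bot(observed: Set[str]) -> bool:
    """Return True if the observed dialog events meet Condition 1 or Condition 2."""
    outbound_seen = observed & OUTBOUND
    # Condition 1: inbound infection (E2) AND at least one outbound event.
    condition_1 = "E2" in observed and len(outbound_seen) >= 1
    # Condition 2: at least two distinct outbound events.
    condition_2 = len(outbound_seen) >= 2
    return condition_1 or condition_2


print(declares_bot({"E1", "E2"}))   # scan and infection only: no declaration
print(declares_bot({"E2", "E4"}))   # infection plus C&C: Condition 1
print(declares_bot({"E3", "E5"}))   # egg download plus outbound scan: Condition 2
```

Note that an inbound scan (E1) contributes nothing on its own here, which is one reason a lone scan alert does not produce a bot declaration.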

4 BotHunter: System Design

We now turn our attention to the design of a passive monitoring system capable of recognizing the bidirectional warning signs of local host infections, and correlating this evidence against our dialog infection model. Our system, referred to as BotHunter, is composed of a trio of IDS components that monitor in- and out-bound traffic flows, coupled with our dialog correlation engine that produces consolidated pictures of successful bot infections. We envision BotHunter to be located at the boundary of a network, providing it a vantage point to observe the network communication flows that occur between the network’s internal hosts and the Internet. Figure 3 illustrates the components within the BotHunter package.

Our IDS detection capabilities are composed on top of the open source release of Snort [35]. We take full advantage of Snort’s signature engine, incorporating an extensive set of malware-specific signatures that we developed internally or compiled from the highly active Snort community (e.g., [10] among other sources). The signature engine enables us to produce dialog warnings for inbound exploit usage, egg downloading, and C&C patterns, as discussed in Section 4.1.3. In addition, we have developed two custom plugins that complement the Snort signature engine’s ability to produce certain dialog warnings. Note that we refer to the various IDS alarms as dialog warnings because we do not intend the individual alerts to be processed by administrators in search of bot or worm activity. Rather, we use the alerts produced by our sensors as input to drive a bot dialog correlation analysis, the results of which are intended to capture and report the actors and evidence trail of a complete bot infection sequence.


Simple Bayesian Calculation

• Just as an intuition...

• What is the probability of a false positive in a system?

• P(I|A) = 0.001

• If we rely upon multiple independent indicators that are correlated in time:

• P(I|A) * P(I|A)’ * P(I|A)’’ * P(I|A)’’’ * ... * P(I|A)’n

• We can reduce the number of false positives by not simply looking for single events.
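
To make the intuition concrete, the sketch below multiplies the per-indicator false-positive probability across n indicators. It assumes the indicators are fully independent and that each one has the same 0.001 rate as the single-indicator figure on the slide; both are simplifying assumptions, and the function name is hypothetical.

```python
def combined_false_positive(per_indicator: float, n: int) -> float:
    """Probability that n independent indicators all fire falsely at once."""
    return per_indicator ** n


# False-positive probability when requiring 1, 2, or 3 correlated indicators.
rates = {n: combined_false_positive(0.001, n) for n in (1, 2, 3)}

for n, rate in rates.items():
    print(f"{n} indicator(s): {rate:.0e}")
```

Under these assumptions each additional required indicator cuts the false-positive probability by three orders of magnitude; real indicators are rarely fully independent, so the actual reduction would be smaller.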

27


Components

• BotHunter relies on SCADE and SLADE

• Inbound and outbound traffic scanning for phases 1 and 5

• Find suspicious payloads in intervening phases.

• Deployments:

• Georgia Tech - four month deployment

• SRI - one month deployment

28


Results

• True Positives:

• Deploy 10 bots in a virtual network (Phatbot, RxBot, GTBot)

• Overlay it with GT traffic.

• False Positives:

• GT - Less than 1 per month

• SRI - 1 in a single month

• Assumptions? Weaknesses?

29


From the Network - DNS

• Is this enough?

• What about all the networks that don’t deploy BotHunter?

• What about going after DNS instead?

30

3. DNS Lookup


From DNS - (Fast) Flux

• A botnet with a single IP address is easy to shut down.

• In response, many bots use Dynamic DNS and quickly move their hosting infrastructure between many IP addresses.

• What can be done now?

31
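
One heuristic a defender might try against fast flux is to look for domains that resolve to many distinct IP addresses with short TTLs. The sketch below applies that idea to a list of observed DNS answers. The thresholds (`min_ips`, `max_ttl`), the record format, and the sample data are all assumptions chosen for illustration; legitimate CDNs can show similar patterns, so this is a starting signal rather than a verdict.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def flux_suspects(
    records: Iterable[Tuple[str, str, int]],
    min_ips: int = 5,
    max_ttl: int = 300,
) -> List[str]:
    """Flag domains with at least min_ips distinct IPs, all with TTL <= max_ttl.

    Each record is (domain, ip_address, ttl_seconds).
    """
    ips = defaultdict(set)
    ttls = defaultdict(list)
    for domain, ip, ttl in records:
        ips[domain].add(ip)
        ttls[domain].append(ttl)
    return sorted(
        d for d in ips
        if len(ips[d]) >= min_ips and max(ttls[d]) <= max_ttl
    )


# Made-up observations using documentation-only domains and address ranges.
observations = [("flux.example", f"203.0.113.{i}", 60) for i in range(1, 9)]
observations += [("stable.example", "192.0.2.10", 86400)] * 3

suspects = flux_suspects(observations)
print(suspects)
```

With this sample data only `flux.example` is flagged, since it maps to eight addresses with 60-second TTLs while `stable.example` keeps one address with a day-long TTL.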


Domain Generation Algorithms

• To prevent takedown, bots can change the C&C domain they speak to each day.

• Ok, great. How do we coordinate this?

• HMAC(k,currentdomain) + .com/.org/.net

• Is random good enough?
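
The chained construction on this slide, where the next domain is derived from the current one with a keyed HMAC, can be sketched as follows. The key, the seed domain, the label length, the choice of SHA-256, and the way a TLD is picked are all assumptions made for illustration. The point for defenders is that anyone who recovers k can compute the same sequence ahead of time and block or sinkhole those domains.

```python
import hashlib
import hmac
from typing import List

TLDS = (".com", ".org", ".net")


def next_domain(key: bytes, current_domain: str, label_len: int = 12) -> str:
    """Derive the next domain as HMAC(k, current domain) plus a TLD."""
    digest = hmac.new(key, current_domain.encode("ascii"), hashlib.sha256).hexdigest()
    label = digest[:label_len]                      # hex characters form the label
    tld = TLDS[int(digest[-2:], 16) % len(TLDS)]    # last byte selects the TLD
    return label + tld


def domain_sequence(key: bytes, seed: str, days: int) -> List[str]:
    """Chain next_domain() to get one domain per day, starting from seed."""
    domains = []
    current = seed
    for _ in range(days):
        current = next_domain(key, current)
        domains.append(current)
    return domains


# Hypothetical shared key and seed; both sides must agree on these.
seq = domain_sequence(b"hypothetical-shared-key", "seed.example", 3)
print(seq)
```

Because the output depends only on the key and the previous domain, the sequence is deterministic for the bot and botmaster but looks random to an observer without k.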

32


Summary

• Botnets represent the current pinnacle of malware evolution.

• They can be reprogrammed infinitely! This makes them incredibly valuable for many kinds of attacks.

• Where are they not valuable?

• Techniques to identify and shut them down vary:

• Organization: Detect the life-cycle.

• ISP: Watch for DNS use, try and determine DGAs.

33
