Chasing web-based malware
Marco Cova [email protected]
Who am I?
• Lecturer in Computer Security at the University of Birmingham, UK
• Member of the founding team of Lastline, Inc.
• Research interests:
  – Malware analysis
  – Vulnerability analysis
WEB MALWARE
Web-based malware
[Diagram: a page requested with GET / contains an <iframe> and evil.js; the malicious code launches an exploit]
Social Engineering
Not really LinkedIn
Social Malware
Blackhat SEO
Watering Hole Attacks
• Sometimes it is difficult to exploit the target of an attack directly
  – Instead, compromise a site that is likely to be visited by the target
• Council on Foreign Relations → government officials
• Unaligned Chinese news site → Chinese dissidents
• iPhone dev web site → developers at Apple, Facebook, Twitter, etc.
• National Journal web site → political insiders in Washington
CHASING WEB MALWARE
Oracles, Filters, Seeders, Anti-Evasion
Oracle
• Essentially, a classification algorithm for web content
  – Input: web page
  – Output: classification (malicious or benign)
• In practice, it is useful to extract and provide users with evidence to support the classification
  – Exploit detection
  – Deobfuscation results
  – Anything that helps forensics, really
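The oracle's contract (a page in, a label plus supporting evidence out) can be pinned down as a small interface. This is a minimal sketch, assuming a Python setting; `Verdict`, `Oracle`, and the toy `MarkerOracle` are hypothetical names, and the marker matching is only a placeholder for the dynamic analysis a real oracle performs.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Verdict:
    """What an oracle returns: a label plus evidence supporting it."""
    malicious: bool
    evidence: List[str] = field(default_factory=list)


class Oracle(Protocol):
    """Any object that maps a web page to a Verdict."""

    def classify(self, page: str) -> Verdict: ...


class MarkerOracle:
    """Toy oracle (hypothetical): flags pages containing known-bad markers.

    A real oracle would load the page in an instrumented browser; this
    stand-in only illustrates the input/output contract.
    """

    def __init__(self, markers: List[str]) -> None:
        self.markers = markers

    def classify(self, page: str) -> Verdict:
        hits = [m for m in self.markers if m in page]
        return Verdict(
            malicious=bool(hits),
            evidence=[f"matched marker: {m}" for m in hits],
        )


oracle: Oracle = MarkerOracle(["evil.js"])
verdict = oracle.classify('<iframe src="http://example.test/evil.js"></iframe>')
print(verdict.malicious, verdict.evidence)
```

Returning evidence alongside the label is what lets the same interface serve both detection and forensics.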
Oracle approaches
• Nowadays, most oracles are dynamic analysis systems
  – We care about the behavior of a sample/web page/document
• Run a sample/visit a web page inside an instrumented environment and monitor its behavior
• Bypasses the obfuscation/feasibility concerns associated with static analysis
• Opens up a lot of interesting challenges related to transparency and evasion
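The idea of running a sample inside an instrumented environment can be illustrated with a toy monitor that wraps the APIs a sample is allowed to call and logs every invocation. It is a sketch under stated assumptions: the API names (`eval`, `create_object`, `set_location`) and the Python "sample" are hypothetical stand-ins for a browser and a malicious script; a real system hooks a JavaScript engine and its plugins instead.

```python
from typing import Any, Callable, Dict, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]


class Monitor:
    """Records every call a sample makes through the hooked APIs."""

    def __init__(self) -> None:
        self.log: List[Call] = []

    def hook(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        def wrapper(*args: Any) -> Any:
            self.log.append((name, args))  # observe the call...
            return fn(*args)               # ...then let it proceed unchanged
        return wrapper


def run_instrumented(sample: Callable[[Dict[str, Callable[..., Any]]], None],
                     apis: Dict[str, Callable[..., Any]]) -> List[Call]:
    """Run `sample` against hooked copies of `apis` and return the call log."""
    monitor = Monitor()
    hooked = {name: monitor.hook(name, fn) for name, fn in apis.items()}
    sample(hooked)
    return monitor.log


# Hypothetical "browser" APIs; a real honeyclient instruments a JavaScript engine.
APIS: Dict[str, Callable[..., Any]] = {
    "eval": lambda code: None,
    "create_object": lambda clsid: None,
    "set_location": lambda url: None,
}


def sample(env: Dict[str, Callable[..., Any]]) -> None:
    """A toy 'drive-by': dynamic code, plugin instantiation, redirection."""
    env["eval"]("payload()")
    env["create_object"]("SomeActiveX.Control")
    env["set_location"]("http://exploit.example/")


for name, args in run_instrumented(sample, APIS):
    print(name, args)
```

Because the monitor only observes and then forwards each call, the sample's behavior is unchanged, which is the transparency property the last bullet refers to.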
Wepawet
• Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
• http://wepawet.cs.ucsb.edu
• By the numbers:
  – Number of unique IPs that submitted to Wepawet: 141,463
  – Number of pages visited and analyzed by Wepawet: 67,424,459
  – Number of pages identified as malicious: 2,239,335
Wepawet Features
• Exploit preparation
  – Number of bytes allocated (heap spraying)
  – Number of likely shellcode strings
• Exploit attempt
  – Number of instantiated plugins and ActiveX controls
  – Values of attributes and parameters in method calls
  – Sequences of method calls
• Redirections and cloaking
  – Number and target of redirections
  – Browser personality- and history-based differences
• Obfuscation
  – String definitions/uses
  – Number of dynamic code executions
  – Length of dynamically-executed code
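Several of the features above are simple aggregates over a log of monitored events. The sketch assumes a hypothetical event format of `(kind, value)` pairs and covers only a subset of the features; the shellcode heuristic (length and share of non-printable characters) and its thresholds are illustrative assumptions, not Wepawet's actual detector, and the step that scores these features is not shown.

```python
import string
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical event format: ("alloc", n_bytes), ("string", s), ("plugin", name),
# ("redirect", url), ("dyn_exec", code)
Event = Tuple[str, Union[int, str]]

PRINTABLE = set(string.printable)


def looks_like_shellcode(s: str, min_len: int = 32, min_ratio: float = 0.3) -> bool:
    """Hypothetical heuristic: long strings dominated by non-printable characters."""
    if len(s) < min_len:
        return False
    nonprint = sum(1 for ch in s if ch not in PRINTABLE)
    return nonprint / len(s) >= min_ratio


def extract_features(events: List[Event]) -> Dict[str, int]:
    kinds = Counter(kind for kind, _ in events)
    dyn_code = [str(v) for k, v in events if k == "dyn_exec"]
    return {
        # Exploit preparation
        "bytes_allocated": sum(int(v) for k, v in events if k == "alloc"),
        "shellcode_strings": sum(
            1 for k, v in events if k == "string" and looks_like_shellcode(str(v))
        ),
        # Exploit attempt
        "plugins_instantiated": kinds["plugin"],
        # Redirections and cloaking
        "redirections": kinds["redirect"],
        # Obfuscation
        "dynamic_code_executions": len(dyn_code),
        "max_dynamic_code_length": max((len(c) for c in dyn_code), default=0),
    }


events: List[Event] = [
    ("alloc", 1_000_000),
    ("string", "\x90" * 64),  # NOP-sled-like string
    ("plugin", "SomeActiveX.Control"),
    ("dyn_exec", "var x = unescape('%u9090');"),
]
print(extract_features(events))
```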
Filter
• If everything goes well, after a while we will have more samples/pages than we can analyze in depth with our oracle
• Analysis time ranges from a few seconds to a couple of minutes
  – The oracle actually runs the sample
  – Sometimes multiple times (anti-evasion techniques)
• Challenge: how do we scale?
Static filtering
• Quick identification of drive-by-download web pages
  – Each web page is deemed likely benign or likely malicious
• Basis for the classification is a set of static features
• Necessarily more imprecise than the oracle
  – We only worry about not having false negatives
  – Very tolerant of false positives (consequence: more work for our oracle)
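The policy of avoiding false negatives while tolerating false positives can be expressed as a threshold choice: take the highest threshold that still flags every known-malicious training page, and accept whatever false-positive rate that produces. This is a sketch of the stated policy only; the scores stand in for the output of some filter model and are made up, and Prophiler's actual training procedure may differ.

```python
from typing import List, Tuple


def fn_free_threshold(scores: List[float], labels: List[bool]) -> float:
    """Highest threshold that still flags every known-malicious training page."""
    malicious_scores = [s for s, bad in zip(scores, labels) if bad]
    if not malicious_scores:
        raise ValueError("need at least one malicious example")
    return min(malicious_scores)


def rates(scores: List[float], labels: List[bool], threshold: float) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) at `threshold`."""
    flagged = [s >= threshold for s in scores]
    mal = [f for f, bad in zip(flagged, labels) if bad]
    ben = [f for f, bad in zip(flagged, labels) if not bad]
    fnr = mal.count(False) / len(mal) if mal else 0.0
    fpr = ben.count(True) / len(ben) if ben else 0.0
    return fnr, fpr


# Made-up filter scores for five labeled training pages (True = malicious).
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
labels = [True, True, False, False, False]
t = fn_free_threshold(scores, labels)
print(t, rates(scores, labels, t))
```

Every page scoring at or above the threshold goes to the oracle, so the benign page scoring 0.5 becomes extra oracle work rather than a missed detection.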
Prophiler
• Filter for malicious web pages
• Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages, Davide Canali, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the International World Wide Web Conference (WWW), 2011
Static features
• We define three classes of features (77 in total)
  – HTML (19); source: web page content
  – JavaScript (25); source: web page content
  – URL and host-based (33); source: page URL and URLs included in the content
• One machine learning model for each feature class
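One way to organize one model per feature class is to score each class separately and forward a page to the oracle if any model fires. The any-model-fires rule, the feature names, the thresholds, and the linear scoring lambdas are all assumptions for illustration; trained classifiers would take their place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class ClassModel:
    feature_class: str                   # "html", "js" or "url"
    score: Callable[[Features], float]   # stand-in for a trained classifier
    threshold: float


def is_suspicious(page: Dict[str, Features], models: List[ClassModel]) -> bool:
    # Assumption: a page is forwarded to the oracle if ANY per-class model
    # fires, which keeps false negatives low at the cost of more false positives.
    return any(m.score(page[m.feature_class]) >= m.threshold for m in models)


# Hypothetical feature names and thresholds.
models = [
    ClassModel("html", lambda f: f["iframes"] + f["hidden_elements"], 2.0),
    ClassModel("js", lambda f: f["eval_calls"], 3.0),
    ClassModel("url", lambda f: f["ip_as_hostname"], 1.0),
]
page = {
    "html": {"iframes": 1, "hidden_elements": 1},
    "js": {"eval_calls": 0},
    "url": {"ip_as_hostname": 0},
}
print(is_suspicious(page, models))
```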
Example features
HTML features
• iframe tags, hidden elements, elements with a small area, script elements, embed and object tags, scripts with a wrong filename extension, out-of-place elements, included URLs, scripting content percentage, whitespace percentage, meta refresh tags, double HTML documents, …
Matches
<div style="display:none"> <iframe src="http://biozavr.ru:8080/index.php" width=104 height=251 > </iframe></div>
<body><div id="DivID"> <script src='a2.jpg'></script> <script src='b.jpg'></script> <script src='url.jpg'></script> <script src='c.jpg'></script> <script src='d.jpg'></script> <script src='e.jpg'></script> <script src='f.jpg'></script></body>
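A few of the HTML features, applied to matches like the ones above, can be computed with Python's standard `html.parser`. This is a minimal sketch: the small-area cutoff (30×30 pixels) and the list of image extensions used to spot scripts with a wrong filename extension are hypothetical choices, not Prophiler's.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical: script sources with these extensions count as "wrong extension".
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")
SMALL_AREA = 30 * 30  # hypothetical pixel-area cutoff for "small" elements


class HtmlFeatures(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.counts = {"iframes": 0, "hidden_elements": 0,
                       "small_elements": 0, "scripts_wrong_ext": 0}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        a = {k: (v or "") for k, v in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.counts["hidden_elements"] += 1
        if tag == "iframe":
            self.counts["iframes"] += 1
        w, h = a.get("width", ""), a.get("height", "")
        if w.isdigit() and h.isdigit() and int(w) * int(h) < SMALL_AREA:
            self.counts["small_elements"] += 1
        if tag == "script" and a.get("src", "").lower().endswith(IMAGE_EXTS):
            self.counts["scripts_wrong_ext"] += 1


def html_features(page: str) -> Dict[str, int]:
    parser = HtmlFeatures()
    parser.feed(page)
    parser.close()
    return parser.counts


print(html_features('<div style="display:none"><iframe '
                    'src="http://biozavr.ru:8080/index.php" width=104 height=251>'
                    '</iframe></div>'))
```

Checking for image extensions, rather than for "anything but .js", avoids flagging legitimate scripts served from URLs such as `.php` endpoints.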
Evaluation
• Large-scale evaluation of Prophiler
• 60 days of crawling + analysis
• 18,939,908 unlabeled pages
• 14.3% of pages flagged as suspicious and submitted to Wepawet (13.7% false positives)
• 85.7% load reduction on Wepawet = saving more than 400 days of analysis!
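The evaluation numbers can be cross-checked with a little arithmetic: skipping 85.7% of 18,939,908 pages and saving more than 400 days implies an average oracle time of at least about 2.1 seconds per skipped page, which sits at the low end of the oracle analysis times mentioned for Wepawet. The per-page figure is derived here, not reported on the slides.

```python
PAGES = 18_939_908          # unlabeled pages crawled in the evaluation
FLAGGED_FRACTION = 0.143    # share forwarded by Prophiler to Wepawet
DAYS_SAVED = 400            # lower bound claimed on the slide

skipped = PAGES * (1 - FLAGGED_FRACTION)   # pages the oracle never has to see
seconds_saved = DAYS_SAVED * 24 * 3600
breakeven = seconds_saved / skipped        # implied average seconds per page

print(f"pages not sent to the oracle: {skipped:,.0f}")
print(f"implied average oracle time per page: {breakeven:.2f} s")
```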
Smart crawler
• How do we seed our oracle + filter?
• Obvious idea: crawling
  – Problem: the toxicity (fraction of crawled pages that are malicious) of regular crawling is pretty low
  – Observation: crawling is only as good as the initial seeds
• Challenge: can we find better seeds?
EvilSeed
• Guided search approach to increase toxicity of pages that are crawled
• Inputs: malicious web pages found in the past
• Output: set of (more likely malicious) web pages
• EVILSEED: A Guided Approach to Finding Malicious Web Pages, Luca Invernizzi, Stefano Benvenuti, Paolo Milani, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the IEEE Symposium on Security and Privacy, 2012
Gadgets
• Links gadget (malware hub)
• Content dorks gadget
• SEO gadget
• Domain registration gadget
• DNS queries gadget
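The links gadget can be sketched as a search for "malware hubs": pages that link to several known-malicious pages, whose other outgoing links become new candidates. The sketch assumes the link graph is already available as a dictionary (in practice it would come from a search engine or crawler), and `links_gadget` and `min_hits` are hypothetical names; the other gadgets are not shown.

```python
from typing import Dict, Set


def links_gadget(link_graph: Dict[str, Set[str]],
                 known_malicious: Set[str],
                 min_hits: int = 2) -> Set[str]:
    """Return candidate seeds: other links found on likely malware hubs.

    A page counts as a hub if it links to at least `min_hits` pages that are
    already known to be malicious (hypothetical criterion).
    """
    candidates: Set[str] = set()
    for page, targets in link_graph.items():
        if len(targets & known_malicious) >= min_hits:
            candidates |= targets - known_malicious  # only new URLs are useful
    return candidates


graph = {
    "http://hub.example/":  {"http://m1.example/", "http://m2.example/", "http://x.example/"},
    "http://blog.example/": {"http://m1.example/", "http://y.example/"},
    "http://news.example/": {"http://z.example/"},
}
known = {"http://m1.example/", "http://m2.example/"}
print(links_gadget(graph, known))
```

Raising `min_hits` trades fewer candidates for a higher expected toxicity among them.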
Anti-evasion
• At this point of the story, the bad guys will actively try to evade your system
• Lots of effort goes into designing evasion techniques
  – Analysis environment detection
  – User detection
  – Stalling
• Challenge: how do we detect if we are being evaded?
Revolver
• Assumption: attackers are likely to take existing malicious samples/web pages and enhance them to add evasive code
• Idea: detect similar samples that are classified differently by the oracle
• Revolver: An Automated Approach to the Detection of Evasive Web-based Malware, A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
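The core of the idea (find pairs of similar scripts that the oracle labels differently) can be sketched over sequences of normalized AST node types. As a stand-in we use Jaccard similarity over node-type trigrams with a hypothetical 0.6 cutoff; Revolver's own similarity computation is not reproduced here, and the node sequences are made up.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple


def ngrams(seq: Sequence[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams of AST node types."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def candidate_pairs(benign: Dict[str, List[str]],
                    malicious: Dict[str, List[str]],
                    threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """(benign_id, malicious_id, similarity) for similar scripts labeled differently."""
    out = []
    for (b_id, b_seq), (m_id, m_seq) in product(benign.items(), malicious.items()):
        sim = jaccard(ngrams(b_seq), ngrams(m_seq))
        if sim >= threshold:
            out.append((b_id, m_id, sim))
    return sorted(out, key=lambda t: -t[2])


# Hypothetical node sequences: b1 is m1 with an extra check (IF) prepended.
malicious = {"m1": ["VAR", "STR", "CALL", "ASSIGN", "EVAL", "CALL"]}
benign = {
    "b1": ["IF", "VAR", "STR", "CALL", "ASSIGN", "EVAL", "CALL"],
    "b2": ["FUNC", "RETURN", "NUM"],
}
print(candidate_pairs(benign, malicious))
```

A pair such as (b1, m1) is a candidate evasion: the small added check may be what made the oracle see benign behavior.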
Revolver
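[Diagram: Revolver pipeline. Web pages are classified by the oracle and parsed into ASTs (normalized, e.g. IF / VAR <= NUM); a similarity computation over pairs {bi, mj} yields candidate pairs, which are categorized as JavaScript infections, evasions, data-dependency, and malicious evolution]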
[Diagram: the complete system, connecting a public portal, a crawler, EvilSeed (terms extractor, feature extractor), Prophiler (benign vs. possibly malicious pages), honeyclients in the cloud running Wepawet and Anubis, and Revolver; outputs are malicious and benign pages, exploit sites, and C&C sites, which feed threat intel and blocking. Example URLs on the slide: http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it]
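The components in the deck form a feedback loop: seeds drive the crawler, the static filter drops likely-benign pages, the oracle confirms the rest, and confirmed malicious pages become the input for finding new seeds. A minimal sketch, assuming each stage is a plain callable; `chase` and the toy graph are hypothetical, with the callables standing in for the crawler, Prophiler, Wepawet, and EvilSeed respectively.

```python
from typing import Callable, Set


def chase(seeds: Set[str],
          crawl: Callable[[Set[str]], Set[str]],          # crawler
          is_suspicious: Callable[[str], bool],           # cheap static filter
          is_malicious: Callable[[str], bool],            # expensive oracle
          expand_seeds: Callable[[Set[str]], Set[str]],   # guided seed search
          rounds: int = 2) -> Set[str]:
    found: Set[str] = set()
    for _ in range(rounds):
        pages = crawl(seeds)
        suspicious = {p for p in pages if is_suspicious(p)}
        found |= {p for p in suspicious if is_malicious(p)}
        seeds = expand_seeds(found)  # known-bad pages seed the next round
    return found


# Toy link graph and ground truth (hypothetical).
graph = {"a": {"b"}, "b": {"c"}, "c": set()}
bad = {"b", "c"}


def neighbors(urls: Set[str]) -> Set[str]:
    out = set(urls)
    for u in urls:
        out |= graph.get(u, set())
    return out


found = chase({"a"}, crawl=neighbors, is_suspicious=lambda u: u != "a",
              is_malicious=lambda u: u in bad, expand_seeds=neighbors)
print(sorted(found))
```

In the toy run, the first round only reaches "b"; seeding the second round from it is what uncovers "c".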
Challenges
• Evasions
  – Detection
  – Bypass (when possible)
• Targeted attacks
• Defense/offense imbalance