Malevolution: The Evolution of Evasive Malware
Giovanni Vigna
Department of Computer Science
University of California Santa Barbara
http://www.cs.ucsb.edu/~vigna
Lastline, Inc.
http://www.lastline.com
Well, I had it all planned out….
Until this guy came out with his story!
Malware can take many forms…
Who Is He?
• One of the top security researchers in Europe
  – Hire him!
• Came to Berlin’s airport
• Guy told him he was in the right taxi line
• ‘Hey you don’t have a display with the money’
  – Do not worry: The German government is creating a taxi-tracking program based on GPS so that no taxi driver needs a billing device: awesome!!!
  – Nick: GPS?!? Tracking!?! No money!?! Awesome!!!!
• Scam cost Nick 200 EUR (normal charge would be 30)
The Taxi
Cyberattack (R)Evolution
[Figure: $$ damage (hundreds, thousands, hundreds of thousands, millions, billions) plotted against time, rising from cybervandalism (#@!) through cybercrime ($$$) to targeted attacks and cyberwarfare (!!!)]
Cyberattack (R)Evolution
Targeted attacks are mainstream news. Every week, new breaches are reported. In the last few months alone …
Nobody Is Safe…
Drive-by-download Attack
[Figure: drive-by-download attack. Sites shown: www.badware.com, www.semilegit.com, www.grayhat.com, www.evilbastard.com, www.bank.com. Stolen items shown: ID/Password; Personal Data, Docs]
POST /update?id=5','<iframe>..')--
<iframe src="http://semilegit.com" height="0" width="0"></iframe>
Arms Race(s)
[Figure: two parallel arms races]
• Binaries: malicious binary vs. signature-based anti-virus; obfuscated, polymorphic malicious binary vs. behavior-based anti-malware sandbox; evasive malicious binary
• JavaScript: malicious JavaScript vs. signature-based web gateways; obfuscated, polymorphic malicious JavaScript vs. behavior-based anti-malware honeyclient; evasive malicious JavaScript
An Evasion Framework
[Figure: a Producer sends an Artifact (with its Provenance) toward a Consumer. The Analysis System labels/blocks it, using known malicious and known benign artifacts and provenance; the Target System executes/displays it; the Consumer activates it]
An Evasion Framework
Attack                Analysis System   Target System   Consumer
SPAM                  X                 N/A             N/A
Phishing              X                 N/A             X
Social Engineering    N/A               N/A             X
Malware Installs      N/A (*)           N/A             X
Malicious Documents   X                 X               X
Malicious Web Pages   X                 X               N/A
Malicious Binaries    X                 N/A             N/A
(*) First downloader
PBKAC: Make the user smarter
• Evasion of the user’s good judgment
• (SPAM: please don’t go!)
• PHISHING: educate about provenance
• MALWARE INSTALLS: educate about Fake AV, codecs
  – The “Can I haz kittens?” problem
• MALICIOUS DOC: don’t open (good luck with that)
  – Anything with “budget”, “salary”, etc. WILL BE OPENED
Harden The Target
• Evasion of the mechanisms to limit/control execution
• Windows 2023 Ultimate Edition will be able to identify things that just should not be executed
• MS Office Professional 56.2 will actually prevent documents from executing arbitrary code
• Internet Explorer 23 will detect memory corruption attacks
Analysis Systems
• Evasion of detection/labeling
• Determine if an artifact is malicious based on previous history
• Leverage both static and dynamic analysis
• Additional information can be leveraged if other components need to be evaded as well
Evading Static Analysis
• Static analysis techniques can be evaded by making the (relevant) code unavailable
  – Packing
  – Delayed inclusion of code
• Static analysis techniques can be evaded by exploiting differences in the parsing capabilities of the target system vs. analysis system
  – Parsing the executable (target is OS)
  – Parsing the document (target is office application)
Evading Static Analysis
Source: Binary-Code Obfuscations in Prevalent Packer Tools, Tech Report, University of Wisconsin, 2012
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by fingerprinting the environment (and not executing)
  – Detection of modified environment (instrumented libs)
  – Detection of specific HW/SW configurations
    • Devices
    • Users
    • File names
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by exploiting differences in the execution capabilities of the target system vs. analysis system
  – Semantics (virtualization/emulation introduces differences)
  – Speed (dynamic systems are usually slower)
  – Available resources (analysis has a finite, limited time)
    • Sleeping
    • Stalling loops
  – User activity monitoring
Evading Dynamic Analysis
• Dynamic evasion – stalling loops
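To see why a finite analysis budget matters, the toy model below treats a sample as a stream of events and the analysis system as an observer that stops after a fixed number of steps. This is only an illustrative sketch: `stallingSample`, `analyze`, the event names, and the step counts are all hypothetical, and real sandboxes measure time rather than abstract steps.

```javascript
// Toy model: a "sample" yields one event per execution step.
// It burns steps in a stalling loop before doing anything observable.
function* stallingSample(stallSteps) {
  for (let i = 0; i < stallSteps; i++) {
    yield "compute"; // useless work that only consumes the analysis budget
  }
  yield "network:connect"; // the behavior an analyst would want to see
}

// Toy analysis system: records non-trivial events until the budget runs out.
function analyze(sample, stepBudget) {
  const observed = [];
  let steps = 0;
  for (const event of sample) {
    if (steps >= stepBudget) break; // analysis has a finite, limited time
    steps++;
    if (event !== "compute") observed.push(event);
  }
  return observed;
}

const shortRun = analyze(stallingSample(1000), 100);
const longRun = analyze(stallingSample(1000), 2000);
console.log(shortRun.length, longRun.length);
```

With the small budget the interesting event is never reached, so the run looks benign; only the larger budget exposes it.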
Combating Evasion
• Static analysis
  – Use availability and parsing failures as a signal for detection
    • Benign software is packed
    • Benign software is obfuscated
    • Artifacts are often generated in a benign, wrong way
  – Modify the sample to make it harmless
    • Normalize
    • Remove functionality that cannot be analyzed
    • Might break functionality
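One common way to turn "the code is unavailable" into a signal is byte entropy, since packed or encrypted payloads tend to look close to random. The sketch below is an illustration rather than anything the slides prescribe: `shannonEntropy`, `looksPacked`, and the 7.0 bits-per-byte threshold are assumptions. Because benign software is packed too, a flag like this is a hint, not a verdict.

```javascript
// Shannon entropy of a byte sequence, in bits per byte (0 to 8).
function shannonEntropy(bytes) {
  if (bytes.length === 0) return 0;
  const counts = new Array(256).fill(0);
  for (const b of bytes) counts[b] += 1;
  let entropy = 0;
  for (const c of counts) {
    if (c === 0) continue;
    const p = c / bytes.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical threshold: an assumption for this sketch, not a calibrated value.
const PACKED_ENTROPY_THRESHOLD = 7.0;

function looksPacked(bytes) {
  return shannonEntropy(bytes) >= PACKED_ENTROPY_THRESHOLD;
}

// Every byte value equally often: maximal entropy.
const uniformBytes = Uint8Array.from({ length: 4096 }, (_, i) => i % 256);
// A single repeated byte: minimal entropy.
const repetitiveBytes = new Uint8Array(4096).fill(0x41);

console.log(looksPacked(uniformBytes), looksPacked(repetitiveBytes));
```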
Combating Evasion
• Dynamic analysis
  – Reduce differences between analysis and target environment
    • Run on bare metal
    • Exploit hardware-supported virtualization
    • Use out-of-the-VM instrumentation
  – Detect environment checks
    • Identify conditional execution based on triggers
    • Return non-static information about the environment
  – Modify the sample to make it run
    • Multipath execution
Combating Evasion
• Exploit the characteristics of multiple evasions
  – Phishing pages need to evade detection by the analysis system AND by the user
    • If the page does not look like the impersonated organization the attack will fail
  – Malicious documents need to evade detection by the analysis system, the target platform, AND the user
    • If the attachment does not look interesting it will not be activated
Why Do I Care?
[Figure: detection pipeline. Components: Crawler, EvilSeed, Terms Extractor, Prophiler, Feature Extractor, Wepawet, Honeyclients, Anubis, Cloud, Public Portal. Data shown: example URLs (http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it), Benign Pages, Possibly Malicious Pages, Malicious Pages, Exploit Site, C&C Site, Threat Intel, Block]
A Few Stats
• ANUBIS
– Number of unique IPs that submitted to Anubis: 433,290
– Number of files analyzed by Anubis: 59,199,463 (unique files: 45,730,419)
– Registered users: 25,404
• WEPAWET
– Number of unique IPs that submitted to Wepawet: 141,463
– Number of pages visited and analyzed by Wepawet: 67,424,459
– Number of pages identified as malicious: 2,239,335
An Example: Detecting Split Personalities
• Detect when a malware sample exhibits multiple personalities
• Signature-based techniques are impractical
• Behavior-based techniques seem more promising...
  – Different behaviors are reliable indicators for split personalities
The Idea
• Definition: Two systems are execution equivalent if all programs start with the same initial state and receive exactly the same inputs
  – “Initial state” means same OS components, memory and registers are initialized with the same values
  – “Same inputs” means the access to disk, network, registry, time, and IPC returns the same values
• Hypothesis: When a program is executed in two execution-equivalent systems, it should exhibit the same behavior
  – “Same behavior” is output and sequence of system calls
Split Personalities
• A program that has different behavior on two execution-equivalent systems implies that:
  – Some instruction yielded some observable effects
  – The program used (intentionally or not) these effects to follow a different execution path
  – This is likely the consequence of an attack based on CPU semantics or timing
• The hard part is providing exactly the same inputs…
  – Efficient Detection of Split Personalities in Malware
    • Davide Balzarotti, Marco Cova, Christoph Karlberger, Christopher Kruegel, Engin Kirda, Giovanni Vigna, in Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2010.
The Approach: Log and Replay
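[Figure: on the Reference System (Windows), a Log Driver records the system calls of the (malware) sample in a syscall log; on the Analysis System (Windows), a Replay Driver feeds that log to the same sample; a behavior mismatch indicates a split personality]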
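The log-and-replay idea can be sketched in a few lines. This is a toy model under stated assumptions, not the kernel drivers from the talk: the "system calls" are plain functions, and `makeLogDriver`, `makeReplayDriver`, `evasiveSample`, `benignSample`, and the `onEmulator` flag (standing in for a CPU-semantics or timing difference that is not a system-call input) are all hypothetical names.

```javascript
// Log driver: forwards each call to the real implementation and records the result.
function makeLogDriver(syscalls) {
  const log = [];
  const call = (name, ...args) => {
    const result = syscalls[name](...args);
    log.push({ name, result });
    return result;
  };
  return { call, log };
}

// Replay driver: answers each call from the log instead of the local system,
// and records the sequence of calls the sample actually makes.
function makeReplayDriver(log) {
  const trace = [];
  const call = (name) => {
    const entry = log[trace.length];
    trace.push(name);
    return entry && entry.name === name ? entry.result : undefined;
  };
  return { call, trace };
}

// Toy samples. The evasive one changes behavior based on something that is
// NOT a system-call input, so replaying identical inputs cannot hide it.
function evasiveSample(call, onEmulator) {
  call("getTime");
  if (!onEmulator) call("writeFile", "dropped.bin");
}
function benignSample(call) {
  call("getTime");
  call("writeFile", "report.txt");
}

// Run on the reference system with logging, replay on the analysis system,
// and compare the two system-call sequences.
function hasSplitPersonality(sample) {
  const syscalls = { getTime: () => 1000, writeFile: () => true };
  const reference = makeLogDriver(syscalls);
  sample(reference.call, false);
  const analysis = makeReplayDriver(reference.log);
  sample(analysis.call, true);
  const referenceTrace = reference.log.map((entry) => entry.name);
  return referenceTrace.join(",") !== analysis.trace.join(",");
}

console.log(hasSplitPersonality(evasiveSample), hasSplitPersonality(benignSample));
```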
Some Caveats
• Not everything can be replayed
  – Some operations have results that must be consistent with the internal state of the operating system
    • Memory allocation
  – Some operations use handles that were created by pass-through system calls
• The definition of “same behavior” needs to be relaxed to tolerate small, temporary deviations
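One way to relax "same behavior" is to compare the two system-call traces by edit distance and accept them as equivalent when the distance is small relative to the trace length. The slides do not say how the relaxation is done, so treat this as an assumed approach: `editDistance`, `sameBehavior`, and the 0.2 tolerance are hypothetical choices for illustration.

```javascript
// Levenshtein edit distance between two sequences of system-call names.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Two traces count as the "same behavior" if they differ in at most
// `tolerance` of the longer trace (an assumed, uncalibrated threshold).
function sameBehavior(traceA, traceB, tolerance = 0.2) {
  const longest = Math.max(traceA.length, traceB.length);
  if (longest === 0) return true;
  return editDistance(traceA, traceB) / longest <= tolerance;
}

const referenceTrace = ["open", "read", "read", "write", "close"];
const jitteredTrace = ["open", "read", "write", "close"]; // one call missing
const divergedTrace = ["open", "read"];                   // stops early

console.log(sameBehavior(referenceTrace, jitteredTrace));
console.log(sameBehavior(referenceTrace, divergedTrace));
```

A small deviation (one dropped call out of five) is tolerated, while a trace that stops early is reported as different behavior.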
Results
An Example: Wepawet and Revolver
• State-of-the-art in honeyclients
  – High-interaction honeyclients visit web pages and record modifications to the underlying system (file system, registry, processes)
  – Unexpected changes are attributed to attacks
• Limitations
  – Defenders need to know in advance the components that will be targeted by attacks
  – Configuration can be complex and incomplete
    • Some of the vulnerable components are incompatible with each other
  – Limited explanatory power
Wepawet
• Characterizes the behavior of the browser as it visits web pages
  – Monitors events that occur during visit
  – Characterizes properties of these events with features
  – Uses statistical models to determine if feature values are normal or anomalous
• In the training phase, learns the characteristics of benign pages
• In the detection phase, flags as suspicious pages that result in anomalous behavior
  – Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
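The train-on-benign, flag-the-anomalous loop can be illustrated with a very simple statistical model. The slides do not specify which models Wepawet uses, so the per-feature mean and standard deviation below are a stand-in; `train`, `anomalousFeatures`, the feature names, and the z-score limit of 3 are assumptions made for this sketch.

```javascript
// Training phase: learn the mean and standard deviation of each feature
// from pages known to be benign.
function train(benignSamples) {
  const model = {};
  for (const feature of Object.keys(benignSamples[0])) {
    const values = benignSamples.map((sample) => sample[feature]);
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    const variance =
      values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
    model[feature] = { mean, std: Math.sqrt(variance) };
  }
  return model;
}

// Detection phase: return the features whose value lies more than `maxZ`
// standard deviations from the benign mean (maxZ is an assumed threshold).
function anomalousFeatures(model, sample, maxZ = 3) {
  return Object.keys(model).filter((feature) => {
    const { mean, std } = model[feature];
    const z = Math.abs(sample[feature] - mean) / (std || 1); // avoid dividing by 0
    return z > maxZ;
  });
}

const benignPages = [
  { bytesAllocated: 1000, dynamicCodeExecutions: 0 },
  { bytesAllocated: 2000, dynamicCodeExecutions: 1 },
  { bytesAllocated: 1500, dynamicCodeExecutions: 2 },
  { bytesAllocated: 1200, dynamicCodeExecutions: 1 },
];
const model = train(benignPages);

const ordinaryPage = { bytesAllocated: 1300, dynamicCodeExecutions: 1 };
const sprayingPage = { bytesAllocated: 50000000, dynamicCodeExecutions: 40 };

console.log(anomalousFeatures(model, ordinaryPage));
console.log(anomalousFeatures(model, sprayingPage));
```

Returning the list of anomalous features, rather than a single yes/no, keeps some of the explanatory power that high-interaction honeyclients lack.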
Wepawet Features
• Exploit preparation
  – Number of bytes allocated (heap spraying)
  – Number of likely shellcode strings
• Exploit attempt
  – Number of instantiated plugins and ActiveX controls
  – Values of attributes and parameters in method calls
  – Sequences of method calls
• Redirections and cloaking
  – Number and target of redirections
  – Browser personality- and history-based differences
• Obfuscation
  – String definitions/uses
  – Number of dynamic code executions
  – Length of dynamically-executed code
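A few of these features can be computed by folding over the events recorded during a page visit. The event format below is invented for illustration (the slides do not describe Wepawet's internal representation), so `extractFeatures`, the event `type` values, and the field names are all hypothetical; summing the length of dynamically executed code is likewise one possible reading of that feature.

```javascript
// Fold a list of hypothetical browser events into a feature vector.
function extractFeatures(events) {
  const features = {
    bytesAllocated: 0,        // exploit preparation (heap spraying)
    instantiatedControls: 0,  // exploit attempt
    redirections: 0,          // redirections and cloaking
    dynamicCodeExecutions: 0, // obfuscation
    dynamicCodeLength: 0,     // obfuscation
  };
  for (const event of events) {
    switch (event.type) {
      case "alloc":
        features.bytesAllocated += event.bytes;
        break;
      case "instantiate":
        features.instantiatedControls += 1;
        break;
      case "redirect":
        features.redirections += 1;
        break;
      case "eval":
        features.dynamicCodeExecutions += 1;
        features.dynamicCodeLength += event.code.length;
        break;
      default:
        break; // events this sketch does not model are ignored
    }
  }
  return features;
}

const visitEvents = [
  { type: "redirect", target: "http://semilegit.com" },
  { type: "alloc", bytes: 1048576 },
  { type: "alloc", bytes: 1048576 },
  { type: "eval", code: "var a = 1;" },
  { type: "instantiate", name: "SomeControl" },
];

console.log(extractFeatures(visitEvents));
```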
Wepawet Extensions
• PDF analyzer
  – Analyzes the JavaScript within PDF documents
• Flash component analyzer
  – Uses execution tracing to identify both malicious behavior and other network endpoints
• Java Applet analyzer
  – Uses execution tracing to identify known exploits
• Shellcode analyzer
  – Uses emulation to extract URLs pointing to additional malware
0-day Detection
• “Aurora” attack
• 0-day exploit against IE6
• Use-after-free vulnerability
• Successfully compromised Google and other companies
• Posted to Wepawet before having been made public
• Soon after incorporated into Metasploit
Practical Impact
• Routinely used for take-down requests and further analysis
• Used to generate blacklist of malicious sites
Impact on Attackers
Revolver: Detecting Evasions in Web-based Malware
• Providing an oracle available to the public has drawbacks
  – Malware can be tested before deployment
• Exploitation of discrepancies leads to failed detection
  – Revolver: An Automated Approach to the Detection of Evasive Web-based Malware, A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
Evasion: Scope Handling
function foo() {
  ... // W6Kh6V5E4 is filled with non-alphanumeric data
  Bm2v5BSJE = "";
  W6Kh6V5E4 = W6Kh6V5E4.replace(/\W/g, Bm2v5BSJE);
  ... // W6Kh6V5E4 now contains valid JavaScript
}

function foo() {
  ...
  var enryA = mxNEN + F7B07;
  F7B07 = eval;
  {}
  enryA = F7B07('enryA.rep' + 'lace(/\\W/g,CxFHg)');
  ...
}
Evasion: Interpreter Idioms
OlhG='evil_code'
wTGB4=eval
wTGB4(OlhG)

OlhG='evil_code'
wTGB4="this"["eval"] // Only works in Adobe’s JS
wTGB4(OlhG)
Evasion: Exception Paths
function deobfuscate() {
  ... // Define variable xorkey
      // and compute its value
  for (...) {
    ... // XOR decryption with xorkey
  }
  eval(deobfuscated_string);
}
try { eval('deobfuscate();') }
catch (e) { alert('err'); }

function deobfuscate() {
  try {
    ... // is variable xorkey defined?
  }
  catch (e) { xorkey = 0; }
  ... // Compute value of xorkey
  VhplKO8 += 1; // throws exception first time
  for (...) {
    ... // XOR decryption with xorkey
  }
  eval(deobfuscated_string);
}
try { eval('deobfuscate();') } // 1st call
catch (e) {
  // Variable VhplKO8 is not defined
  try {
    VhplKO8 = 0; // define variable
    eval('deobfuscate();'); // 2nd call
  }
  catch (e) { alert('err'); }
}
Evasion: Liberal Configuration
var nop="%uyt9yt2yt9yt2";
var nop=(nop.replace(/yt/g,""));
var sc0="%ud5db%uc9c9%u87cd...";
var sc1="%"+"yutianu"+"ByutianD"+ ...;
var sc1=(sc1.replace(/yutian/g,""));
var sc2="%"+"u"+"54"+"FF"+"%u"+"BE"+...+"A"+"8"+"E"+"E";
var sc2=(sc2.replace(/yutian/g,""));
var sc=unescape(nop+sc0+sc1+sc2);

try {
  new ActiveXObject("yutian");
} catch (e) {
  var nop="%uyt9yt2yt9yt2";
  var nop=(nop.replace(/yt/g,""));
  var sc0="%ud5db%uc9c9%u87cd...";
  var sc1="%"+"yutianu"+"ByutianD"+ ...;
  var sc1=(sc1.replace(/yutian/g,""));
  var sc2="%"+"u"+"54"+"FF"+"%u"+"BE"+...+"A"+"8"+"E"+"E";
  var sc2=(sc2.replace(/yutian/g,""));
  var sc=unescape(nop+sc0+sc1+sc2);
}
Detecting Evasion: Challenges
• Code is obfuscated
• Code is generated on-the-fly
• Code might probe for arcane versions of a browser
• Not all code changes are relevant
Revolver
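[Figure: Revolver pipeline. Pages from the Web are classified by the Oracle; their scripts are turned into ASTs (e.g., IF, VAR <= NUM, …); similarity computation yields candidate pairs {bi, mj}, which are classified as malicious evolution, data-dependency, JavaScript infections, or evasions]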
Optimizations
• The comparison step requires determining the edit distance between every pair of n benign scripts and m malicious scripts (n × m comparisons, which is usually infeasible)
• We eliminate duplicate ASTs
• We compute sequence summaries, which are vectors with the frequencies of the possible 88 operations
• We extract the k nearest-neighbor sequence summaries and apply the similarity computation to the associated ASTs
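The summary-then-neighbors optimization can be sketched as follows. The six-entry `NODE_TYPES` vocabulary is a stand-in for the 88 operations (which the slides do not list), and the Euclidean distance, the normalization by sequence length, and the names `sequenceSummary`, `distance`, and `kNearest` are assumptions for this illustration.

```javascript
// Hypothetical, tiny vocabulary standing in for the 88 AST operations.
const NODE_TYPES = ["VAR", "ASSIGN", "CALL", "IF", "FOR", "TRY"];

// Sequence summary: relative frequency of each node type in the AST sequence.
function sequenceSummary(nodeSequence) {
  const counts = new Array(NODE_TYPES.length).fill(0);
  for (const node of nodeSequence) {
    const index = NODE_TYPES.indexOf(node);
    if (index >= 0) counts[index] += 1;
  }
  const total = nodeSequence.length || 1; // avoid dividing by 0
  return counts.map((count) => count / total);
}

// Euclidean distance between two summaries (an assumed metric).
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// Indices of the k candidates whose summaries are closest to the query.
// Only these would go on to the expensive AST similarity computation.
function kNearest(querySequence, candidateSequences, k) {
  const query = sequenceSummary(querySequence);
  return candidateSequences
    .map((seq, index) => ({ index, dist: distance(query, sequenceSummary(seq)) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((candidate) => candidate.index);
}

const knownScript = ["VAR", "ASSIGN", "CALL", "CALL"];
const candidates = [
  ["VAR", "ASSIGN", "CALL", "CALL"],       // identical structure
  ["IF", "VAR", "ASSIGN", "CALL", "CALL"], // same script with an added check
  ["FOR", "FOR", "TRY", "IF"],             // unrelated script
];

console.log(kNearest(knownScript, candidates, 2));
```

Comparing short fixed-length vectors is cheap, so the costly tree comparison only runs on the few pairs that survive this filter.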
Classification
• Data-dependency: categorizes script differences that are associated with transforming data into code
  – The same packer usually produces different code: if the generating code is the same and the generated code is very different, do not flag as evasion
• Injection: categorizes script differences that are due to addition of code to a previously-benign script
  – Site gets compromised and attacker adds code to well-known JavaScript libraries (e.g., jQuery)
• Evasion: categorizes script differences that are mostly composed of control-flow nodes added to the previously-malicious script
  – Control-flow decisions are made to avoid executing the malicious functionality
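The evasion rule ("mostly composed of control-flow nodes") lends itself to a small check over the nodes that were added between two script versions. This covers only that one rule and is a guess at how it might be expressed: the `CONTROL_FLOW` set, the function names, and the 0.5 cut-off for "mostly" are assumptions, not Revolver's actual classifier.

```javascript
// Node types treated as control flow in this sketch (an assumed list).
const CONTROL_FLOW = new Set(["IF", "TRY", "CATCH", "FOR", "WHILE", "SWITCH"]);

// Fraction of the added AST nodes that are control-flow nodes.
function controlFlowShare(addedNodes) {
  if (addedNodes.length === 0) return 0;
  const controlFlowCount = addedNodes.filter((node) => CONTROL_FLOW.has(node)).length;
  return controlFlowCount / addedNodes.length;
}

// "Mostly control flow" is read here as strictly more than half.
function looksLikeEvasion(addedNodes, minShare = 0.5) {
  return controlFlowShare(addedNodes) > minShare;
}

// A try/catch wrapper added around an unchanged payload.
const wrapperDiff = ["TRY", "CATCH", "IF", "CALL"];
// Mostly new statements, as in an injection into a library.
const injectionDiff = ["CALL", "ASSIGN", "VAR", "IF"];

console.log(looksLikeEvasion(wrapperDiff), looksLikeEvasion(injectionDiff));
```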
Evaluation: Evasion
• Collected 6,468,623 pages, of which 265,692 malicious
• Extracted 20,732,766 benign scripts, and 186,032 malicious scripts
• Derived 705,472 unique ASTs and 55,701 malicious ASTs
• For each benign AST, found ~70 malicious neighbors
• Computed 208K candidate pairs
  – 6,996 Injections (701 classes)
  – 101,039 Data dependencies (475 classes)
  – 4,147 Evasions (155 classes)
  – 2,490 Evolutions (273 classes)
Limitations
• If we only see the evasive version of the code, we cannot detect it (and identify the evasion)
• This approach can only operate on client-side evasion
• If an evasion is performed before unpacking/eval-ing of code, similarity to other malicious code cannot be computed
  – However, the attacker has to “expose” their evasion technique, instead of hiding it in the malicious code
http://revolver.cs.ucsb.edu
• Revolver is a service accessible to the public
  – You need to be vetted to access the service
• We would like to make the evasion of the anti-evasion system harder
• Please sign up and let us know what you think!
Conclusions
• Malicious code is in continuous evolution
• Evasion of dynamic analysis-based detection has become prevalent
  – Humans cannot keep up
• Next steps in the arms race:
  – Automatic detection of evasion attempts in binaries
    • Possibly without re-execution
  – Automatic detection of evasion attempts in web-malware
    • See revolver.cs.ucsb.edu
  – Automated evasion remediation
Questions?
EvilSeed
• Challenge: Find the needle in the haystack
• Approach: Search the web in a smart way
• The goal of EvilSeed is to generate a URL input stream with “high toxicity”
• EvilSeed starts with a set of malicious web pages and uses “gadgets” to find likely additional malicious web pages
  – Links gadget
  – Content dork gadget
  – Popular terms gadget
  – SEO gadget
  – DNS queries gadget
• Some level of random crawling is still necessary to find completely new malicious web pages
Prophiler
• Quick identification of possible drive-by-download web pages
  – Each web page is deemed benign or possibly malicious
  – Detection models derived through supervised machine learning
• System as filter between a crawler and a more costly (and more precise) dynamic analysis system
  – The filter can allow high FP rates, as they are later discarded by the dynamic analysis system
Learning Approach
• 77 static features are extracted from each URL and web page
  – HTML (19): web page content
  – JavaScript (25): web page code
  – URL and host-based (33): URL and URLs included in the content, taking into account host characteristics (WHOIS, DNS)
• Supervised machine learning
  – Learning: the system is fed with a labeled dataset
    • Both known malicious and benign samples
  – A model is generated by the system
  – 10-fold cross validation is used to evaluate the effectiveness of each model
  – The models can then be used for detection
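For readers unfamiliar with 10-fold cross validation, the sketch below shows only the data-splitting step: the labeled dataset is cut into k folds, and each fold serves once as the test set while the other folds form the training set. `kFoldSplits` is a hypothetical helper; it assigns samples round-robin without shuffling or stratifying by label, which a real evaluation would normally add.

```javascript
// Split `samples` into k folds and return k (train, test) pairs.
// Samples are assigned round-robin; no shuffling or stratification is done.
function kFoldSplits(samples, k = 10) {
  const folds = Array.from({ length: k }, () => []);
  samples.forEach((sample, i) => folds[i % k].push(sample));
  return folds.map((test, i) => ({
    test,
    train: folds.filter((_, j) => j !== i).flat(),
  }));
}

// 100 placeholder samples standing in for labeled pages.
const labeledPages = Array.from({ length: 100 }, (_, i) => i);
const splits = kFoldSplits(labeledPages, 10);

console.log(splits.length, splits[0].train.length, splits[0].test.length);
```

Each model would be trained on `train` and scored on `test` for every split, and the k scores averaged to estimate its effectiveness.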
Anubis and Wepawet
• Web pages and binary components need to be analyzed
  – To identify their nature (malicious, benign)
  – To identify their relationships with other components (e.g., C&C sites, distribution sites, malware components)
• Anubis: Binary program analyzer
  – Available at http://anubis.cs.ucsb.edu
• Wepawet: Web page analyzer
  – Available at http://wepawet.cs.ucsb.edu