Advanced Anti-Deobfuscation

Bjorn De Sutter

ISSISP 2017 – Paris


About me

• Research domain: system software
  • compilers, binary rewriting tools, whole program optimization (binary & Java), virtualization, run-time environments
• Improve programmer productivity by means of automation
• Apply tools for different applications
  • obfuscation, diversity, mitigating side channels and fault injection, ...
  • protect against exploitation of vulnerabilities (multi-variant execution)
  • generating code for accelerators
• Also worked/spent time at
• Interrupts enabled

About me

[Figure: the ASPIRE Framework. A Decision Support System and a Software Protection Tool Chain make up the software protection tool flow, which turns the SafeNet, Gemalto, and Nagravision use cases into protected versions. Protection categories shown: Data Hiding, Algorithm Hiding, Anti-Tampering, Remote Attestation, Renewability.]

http://www.aspire-fp7.eu

Lecture Overview

1. Basic Attacks
  • attacks on what?
  • basic attack tools & techniques
2. Defenses
  • anti-anything
3. Advanced Automated Attacks
  • generic deobfuscation
  • symbolic execution
4. Defenses
  • anti-even-more

What is being attacked?

| Asset category | Security requirements | Examples of threats |
|---|---|---|
| Private data (keys, credentials, tokens, private info) | Confidentiality, privacy, integrity | Impersonation, illegitimate authorization; leaking sensitive data; forging licenses |
| Public data (keys, service info) | Integrity | Forging licenses |
| Unique data (tokens, keys, used IDs) | Confidentiality, integrity | Impersonation; service disruption, illegitimate access |
| Global data (crypto & app bootstrap keys) | Confidentiality, integrity | Build emulators; circumvent authentication verification |
| Traceable data/code (watermarks, fingerprints, traceable keys) | Non-repudiation | Make identification impossible |
| Code (algorithms, protocols, security libs) | Confidentiality | Reverse engineering |
| Application execution (license checks & limitations, authentication & integrity verification, protocols) | Execution correctness, integrity | Circumvent security features (DRM); out-of-context use, violating license terms |

What is being attacked?

[Figure: an ASSET surrounded by layered protections, PROTECTION 1 to PROTECTION 8, plus ADDITIONAL CODE.]

1. Attackers aim for assets; layered protections are only obstacles.
2. Attackers need to find assets (by iteratively zooming in).
3. Attackers need tools & techniques to build a program representation, to analyze, and to extract features.
4. Attackers iteratively build a strategy based on experience and on confirmed and revised assumptions, incl. on the path of least resistance.
5. Attackers can undo, circumvent, or overcome protections with or without tampering with the code.

Basic Attack Techniques

• Static attack steps: without executing the code
  • symbolic information
  • graph representations of program
• Dynamic attack steps: observing execution
  • all kinds of hooks
  • start and intervene at interfaces
  • observe features and patterns of program execution (traces)
• Hybrid attack steps: combination of both
  • e.g.: build graphs of (unpacked) code observed during execution

Disassemblers - 1

• IDA Pro
• Binary Ninja
• angr
• Far from perfect
  • incomplete disassembly
  • incorrect graphs (control flow, call graphs)
• Flexible and interactive
  • linear sweep, recursive descent, heuristic and manual disassembly
  • GUI
  • code annotation
  • plug-ins and scripts


Disassemblers - 2

• Static & hybrid attacks
• Rely on many underlying assumptions
• Library detection
  • F.L.I.R.T
• Diffing tools
  • BinDiff
• Custom tools
  • detect patterns
  • undo obfuscations
  • data flow analysis
• Supports code editing
• Interfaces with (remote) debuggers


Disassemblers - 3

• Decompiler

Debuggers - 1

• GDB
• OllyDbg
• Scriptable
• Support tampering
  • alter processor state (incl. program counter)
  • alter memory contents
  • alter code
  • used for out-of-context execution


Debuggers - 2

• Used for program understanding
• Used for zooming in on relevant code
  • Continuous iterative refinement of scripts
• Low overhead with hardware breakpoints
• High overhead with software breakpoints
  • Requires tampering

Emulation & Instrumentation

• QEMU
• Pin
• Valgrind
• DynInst
• ltrace
• Used to collect traces
  • To identify patterns and points of interest
• Used like a debugger
  • Iterative refinement of scripts
  • But not interactive

Software Tampering

• Editing the binary
• Alter running process state (CPU, memory)
• Intervene at interfaces
  • system calls
  • library calls
  • network activities
  • ...
• Custom binaries to invoke library APIs
• Aforementioned tools
• Cheat Engine
  • all kinds of reverse engineering aids (pointer chaining)

Pointer chaining

[Figure, built up over four slides. Labels: stack, play(), struct game, struct player, bool visible.]

*(*(ESP(play())-0x16)+0x4)+0x28
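The pointer-chain expression on the slide can be read as a sequence of dereferences. The following is a minimal sketch over a simulated memory held in a Python dict. Only the offsets -0x16, +0x4, and +0x28 come from the slide; the concrete addresses, the helper name `resolve_chain`, and the reading that the stack slot points to struct game, which in turn points to struct player, are assumptions made for illustration.

```python
def resolve_chain(memory, base, offsets):
    """Follow a pointer chain: dereference after every offset except the last."""
    address = base
    for offset in offsets[:-1]:
        address = memory[address + offset]   # read the pointer stored there
    return address + offsets[-1]             # last offset gives the field address


# Hypothetical addresses, chosen only for this example.
ESP_PLAY = 0x7FFF0100   # stack pointer inside play()
GAME = 0x00500000       # struct game
PLAYER = 0x00600000     # struct player

memory = {
    ESP_PLAY - 0x16: GAME,   # stack slot assumed to hold a pointer to struct game
    GAME + 0x4: PLAYER,      # field assumed to hold a pointer to struct player
    PLAYER + 0x28: 0,        # bool visible
}

# *(*(ESP(play()) - 0x16) + 0x4) + 0x28
visible_addr = resolve_chain(memory, ESP_PLAY, [-0x16, 0x4, 0x28])
memory[visible_addr] = 1     # tampering step: overwrite the flag
```

Because the chain starts from a location the attacker can find again on every run, it keeps working even when the heap addresses of the structs change between runs.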

Lecture Overview

1. Basic Attacks
  • attacks on what?
  • basic attack tools & techniques
2. Defenses
  • anti-anything
3. Advanced Automated Attacks
  • generic deobfuscation
  • symbolic execution
4. Defenses
  • anti-even-more

Anti-tampering

• Code guards (code integrity)
  • hashes over code regions
• State inspection
  • check for existing invariants
  • inject additional invariants
  • for data integrity and control flow integrity
• Basic control flow integrity
  • check return addresses
  • check stack frames
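A code guard in the sense of "hashes over code regions" can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any particular tool: the byte string stands in for a code region, SHA-256 is an arbitrary choice of hash, and the names `region_digest` and `guard_ok` are hypothetical.

```python
import hashlib


def region_digest(image: bytes, start: int, end: int) -> bytes:
    """Hash the bytes of one code region [start, end)."""
    return hashlib.sha256(image[start:end]).digest()


def guard_ok(image: bytes, start: int, end: int, expected: bytes) -> bool:
    """Code guard: recompute the hash at run time and compare with the stored value."""
    return region_digest(image, start, end) == expected


# Hypothetical bytes standing in for a protected function body.
image = bytes.fromhex("5589e583ec10c745fc2a0000008b45fcc9c3")
expected = region_digest(image, 0, len(image))   # computed at protection time

patched = bytearray(image)
patched[9] ^= 0xFF                               # attacker alters one byte
```

A real guard would read the process's own code pages and would itself need protection, for example by having guards check each other.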

Remote attestation

Excerpt from ASPIRE D1.04 – Reference Architecture v2.1 (PUBLIC, page 48 of 99):

... actions to change the behaviour of the application. These actions may vary according to the policy configured for the application.

Conceptually, these components cooperate as depicted in the workflow diagram in Figure 17, and further detailed in the steps following the figure.

[Figure 17 – Anti-tamper components. Labels: Original Application logic, Attestator, Verifier, Delay Component (Update Functions, Delay Data Structures, Query Functions), Reaction; numbered steps 1 to 5.]

Seq# / Operation description

1. The attestator routine is invoked.

   Details: An attestator routine is invoked that returns the attestation report in the form of some data. It should be noted that the attestator can be a single monolithic routine, but it can also consist of a collection of smaller routines that are invoked one after the other to iteratively compute an attestation report. In the latter case, the routines can actually also be in-lined into the program and hence be indistinguishable from the application code.

2. A verifier is invoked.

   Details: In step 2, which will typically be executed immediately after step 1, the attestation report is verified. This can occur locally, but the verification can also be offloaded (completely or partially) onto a secure server. By offloading the verification to a server beyond the reach of an attacker, the attacker cannot learn how to fabricate correct responses by studying the verification routine. In practice, the attestator and verifier can also be combined into one routine. Alternatively, their invocation can be pulled apart to some degree, in order to hide the dependency between them. However, there always has to remain a guarantee that whenever the attestator routine is invoked, so is the verifier routine, and vice versa.

3. Update the tamper detection status.

   Details: In step 3, which typically will follow immediately after step 2, the result of verification (the verdict) is used to encode the tamper detection status of the application. Based on the verdict, an update function is invoked that alters the delay data structures to encode that some form of tampering was or was not detected.

   It is important to note here that the update functions can also be invoked from random places in the original program, as long as those random invocations do not alter the information regarding detected tampering encoded in the data structures. This is important because such random updates will give the delay data structures the appearance of being integral data ...

Slide annotations:

• attestators:
  • code guards
  • timing
  • data integrity
  • control flow integrity
• verification:
  • local vs. remote
  • prevent replay attacks
• reaction:
  • abort
  • corruption
  • notify server (block player)
  • graceful degradation
  • lower quality
• delay reaction:
  • attacker sees symptom
  • hide relation with cause!
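The "prevent replay attacks" point for remote verification can be sketched with a fresh nonce per attestation request. This is a minimal illustration, not the ASPIRE protocol: the shared key, the use of HMAC-SHA256, and the names `attest` and `Verifier` are all assumptions introduced for the example.

```python
import hashlib
import hmac
import os

SHARED_KEY = b"hypothetical-shared-key"   # assumption: provisioned out of band


def attest(code: bytes, nonce: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Attestator: bind the hash of the code to the verifier's fresh nonce."""
    code_hash = hashlib.sha256(code).digest()
    return hmac.new(key, nonce + code_hash, hashlib.sha256).digest()


class Verifier:
    """Remote verifier that knows the expected code and rejects reused nonces."""

    def __init__(self, expected_code: bytes, key: bytes = SHARED_KEY):
        self.expected_code = expected_code
        self.key = key
        self.pending = set()

    def challenge(self) -> bytes:
        nonce = os.urandom(16)
        self.pending.add(nonce)
        return nonce

    def verify(self, nonce: bytes, report: bytes) -> bool:
        if nonce not in self.pending:      # unknown or already used: replay
            return False
        self.pending.discard(nonce)
        expected = attest(self.expected_code, nonce, self.key)
        return hmac.compare_digest(report, expected)
```

A recorded report is useless for a later challenge because the nonce differs. In a real deployment the key held by the client would itself be an asset that needs hiding.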

Anti-disassembly

• Hide code
  • packers, virtualization, download code on demand, self-modifying code
• Junk bytes
• Indirect control flow transfers
• Jumps into middle of instructions
• Code layout randomization
• Overlapping instructions
• Exploit known heuristics
  • continuation points
  • patterns for function prologues, epilogues, calls, ...

Often, wrong information is worse than no information.

Anti-disassembly examples

Example 1

  Original:
    0x123a: jmp 0xabca
            ...
    0xabca: addl #44,eax

  After obfuscation:
    0x123a: call 0xabca
            ...
    0xabca: pop ebx
            addl #44,eax

Example 2

  Original:
    0x123a: call 0xabca
            ...
    0xabca: ...
            ret

  After obfuscation:
    0x123a: push *(0xc000)
            jmp 0xabca
            pop eax
            ...
    0xabca: ...
            jmp *(esp)
    0xc000: 0x12424

Anti-decompilation

Exploit semantic gap between source code and assembly code or bytecode

• strip unnecessary symbol information
• rename identifiers (I, l, L, 1)
• goto spaghetti
• disobey constructor conventions
• disobey exception handling conventions

Anti-decompilation example

Original code:

  pre();
  try {
    might_throw_exception();
  } catch (Exception e) {
    handle_exception();
  }
  post();

Transformed code (a control flow graph in the original figure):

  pre(); flag = 1
  if (flag)
    then: flag = 0; might_throw_exception();   (on exception: back to the if; otherwise fall through to post())
    else: handle_exception();                   (fall through to post())
  post();

Figure 3.8: Example of combining try blocks with their catch blocks [BH07]. In the transformed code fragment, the try and catch block both start at the same instruction (in green).

The try-catch construct in the original code is replaced by an if-construct that guides control either to the code in the try-block or to the catch-block, based on the value of a flag. The first instruction of the then block resets the flag, such that if further code in the then block throws an exception and the if-statement is re-evaluated, control flows through to the code in the else block where the exception handler is located. Rewriting the code in this way allows both the try-block and the catch-block to start at the same instruction.
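The flag-driven control flow described above can be simulated at source level. The sketch below is only an illustration in Python, which has no goto and no bytecode-level overlap of try and catch ranges: a `while` loop stands in for the shared entry instruction, and the function names `original` and `transformed` are hypothetical. It also assumes the handler itself does not throw.

```python
def original(might_throw, log):
    log.append("pre")
    try:
        might_throw()
    except Exception:
        log.append("handle_exception")
    log.append("post")
    return log


def transformed(might_throw, log):
    log.append("pre")
    flag = True
    while True:                  # stands in for the shared entry instruction
        try:
            if flag:
                flag = False     # first instruction of the then block resets the flag
                might_throw()
            else:
                log.append("handle_exception")
            break                # fall through to post()
        except Exception:
            continue             # re-evaluate the if, now with the flag cleared
    log.append("post")
    return log
```

Both versions produce the same sequence of calls whether or not the guarded code throws, which is the property the transformation has to preserve.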

3.4.5 Indirecting if instructions

The indirecting if instructions transformation takes advantage of the fact that goto statements are not valid in Java source code. This transformation looks for different if-statements and indirects them by means of goto instructions. These goto instructions are then wrapped in a try-block so they cannot be removed without statically proving that an exception can never be thrown within the try-block. Being unable to remove the goto statements, most decompilers will struggle in generating correct Java source code.

Original:

  ...
  if_<cond_a> target_a
  ...
  if_<cond_b> target_b
  ...
  if_<cond_c> target_c
  ...

Transformed:

  ...
  if_<cond_a> target_goto_a
  ...
  if_<cond_b> target_goto_b
  ...
  if_<cond_c> target_goto_c
  ...

  try {
    target_goto_a: goto target_a
    target_goto_b: goto target_b
    target_goto_c: goto target_c
  } catch (Exception e) {}

Figure 3.9: Pseudo-code example of indirecting if instructions. Each if instruction's target is directed to a goto which transfers control to the if instruction's original target.

An example of the indirecting if instructions transformation is given in Figure 3.9. This figure indicates how each if-statement on the left is indirected by means of a goto statement. As shown in the figure, it is possible to wrap different goto statements in the same try-block.


[BH07] Batchelder, Michael, and Laurie Hendren. "Obfuscating Java: the most pain for the least gain." In Compiler Construction, pp. 96-110. Springer Berlin Heidelberg, 2007.

Anti-debugging

• Option 1: check environment for presence of debugger
• Option 2: prevent debugger from attaching
  • OS & hardware support at most one debugger per process
  • occupy one seat with custom "debugger" process
  • make control & data flow dependent on custom debugger
  • anti-debugging by means of self-debugging
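Option 1 can be illustrated on Linux, where the kernel reports the PID of an attached tracer in the `TracerPid` field of `/proc/<pid>/status`. The sketch below is one possible environment check, not the technique used in the lecture's tooling; the function names are hypothetical, and the check returns no verdict on systems without `/proc`.

```python
def tracer_pid_from_status(status_text: str) -> int:
    """Return the TracerPid value from /proc/<pid>/status text (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return 0


def debugger_attached(status_path: str = "/proc/self/status") -> bool:
    """True if the Linux kernel reports a tracer for this process."""
    try:
        with open(status_path, encoding="ascii", errors="replace") as f:
            return tracer_pid_from_status(f.read()) != 0
    except OSError:
        return False   # not Linux, or /proc unavailable: no verdict
```

Such a check is easy to locate and patch out, which is one motivation for Option 2: a process that already has a tracer attached cannot be attached to by a second debugger.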

Self-Debugging

[Figure, built up over four slides: the program consists of function 1, function 2, function 3, and a mini debugger. At run time two copies exist, process 1045 as the debuggee and process 3721 as the debugger. The last slide adds the labels function 2a and function 2b.]

Anti-emulation

• Emulators are buggy – incomplete
• Virtual environments are not real
  • Joanna Rutkowska
  • Blue Pill
  • Red Pill

Lecture Overview

1. Basic Attacks
  • attacks on what?
  • basic attack tools & techniques
2. Defenses
  • anti-anything
3. Advanced Automated Attacks
  • generic deobfuscation
  • symbolic execution
4. Defenses
  • anti-even-more

Generic Deobfuscation (Yadegari et al., IEEE S&P 2015)

• no obfuscation-specific assumptions
• treat programs as input-to-output transformations
• use semantics-preserving transformations to simplify execution traces
• dynamic analysis to handle runtime unpacking

[Figure: pipeline from input program to control flow graph. Stages: taint analysis (bit-level), which maps the flow of values from input to output; semantics-preserving transformations/simplifications; control flow reconstruction, which reconstructs the logic of the simplified computation.]

Generic Deobfuscation (Yadegari et al., IEEE S&P 2015)

[Figure: two execution traces running from input through unpack steps to output. Annotations: instructions "tainted" as propagating values from input to output; input-to-output computation (further simplified); used to construct control flow graph.]

Generic Deobfuscation (Yadegari et al., IEEE S&P 2015)

• Quasi-invariant locations: locations that have the same value at each use.
• Their transformations:
  • Arithmetic simplification
    • adaptation of constant folding to execution traces
    • consider quasi-invariant locations as constants
    • controlled to avoid over-simplification
  • Control simplification
    • e.g., convert indirect jump through a quasi-invariant location into a direct jump
  • Data movement simplification
    • use pattern-driven rules to identify and simplify data movement
  • Dead code elimination
    • need to consider implicit destinations, e.g., condition code flags
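The notion of a quasi-invariant location and the control simplification built on it can be sketched on a toy trace. This is not the implementation from Yadegari et al.; the trace format (a list of `(location, value)` reads), the instruction tuples, and both function names are assumptions made for the example.

```python
from collections import defaultdict


def quasi_invariant_locations(trace):
    """Return {location: value} for locations that hold the same value at every use.

    `trace` is a list of (location, value) pairs, one per read in the execution trace.
    """
    seen = defaultdict(set)
    for location, value in trace:
        seen[location].add(value)
    return {loc: next(iter(vals)) for loc, vals in seen.items() if len(vals) == 1}


def simplify_indirect_jumps(instructions, invariants):
    """Rewrite ('jmp_indirect', loc) into ('jmp', target) when loc is quasi-invariant."""
    simplified = []
    for op, arg in instructions:
        if op == "jmp_indirect" and arg in invariants:
            simplified.append(("jmp", invariants[arg]))
        else:
            simplified.append((op, arg))
    return simplified
```

A location that changes value between uses, such as a register used as a loop counter, is left alone, which is what keeps the simplification from folding away real computation.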

Generic Deobfuscation (Yadegari et al., IEEE S&P 2015)

[Figure: three control flow graphs: original, obfuscated with Themida (cropped), deobfuscated.]

Symbolic Execution

Effective because most obfuscations implement semantics that do not involve input.

Lecture Overview

1. Basic Attacks
  • attacks on what?
  • basic attack tools & techniques
2. Defenses
  • anti-anything
3. Advanced Automated Attacks
  • generic deobfuscation
  • symbolic execution
4. Defenses
  • anti-even-more

Anti-taint analysis

• tainting all data with artificial computations
• hiding data dependencies through covert channels
  • time
  • system state
  • anything not normally checked by analysis
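Hiding a data dependency from a taint tracker can be illustrated with a toy example. The slide names time and system state as channels; the sketch below instead uses branch outcomes, on the assumption that the analysis tracks only explicit data flow and ignores control dependencies. The `Tainted` class and `copy_via_branches` are hypothetical names for this illustration.

```python
class Tainted:
    """Toy value that propagates a taint flag through data operations only."""

    def __init__(self, value, tainted=True):
        self.value = value
        self.tainted = tainted

    def __and__(self, other):
        return Tainted(self.value & int(other), self.tainted)

    def __bool__(self):
        return self.value != 0   # the branch outcome carries no taint label


def copy_via_branches(secret, bits=8):
    """Rebuild `secret` bit by bit using only control dependencies."""
    result = 0                   # plain int: the toy tracker sees no taint here
    for i in range(bits):
        if secret & (1 << i):    # tainted condition, untainted assignment
            result |= 1 << i
    return result
```

The returned value equals the secret, yet it is an ordinary integer with no taint attached, so an analysis that follows only tainted values would lose the input-to-output flow at this point.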

Obfuscations with varying, input-dependent behavior (Banescu et al., ACSAC 2016)

Listing 2: Range divider with 2 branches

  if (x > 42)
    z = x + y + w;
  else
    z = (((x ^ y) + ((x & y) << 1)) | w) +
        (((x ^ y) + ((x & y) << 1)) & w);
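Both branches of Listing 2 compute the same value, because a sum can be written as XOR plus shifted carries, and also as OR plus AND. The sketch below checks this in Python under the assumption of 32-bit unsigned arithmetic, which the listing does not state; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF   # assumption: 32-bit unsigned arithmetic


def plain(x, y, w):
    return (x + y + w) & MASK


def obfuscated(x, y, w):
    s = ((x ^ y) + ((x & y) << 1)) & MASK   # equals x + y (XOR plus carries)
    return ((s | w) + (s & w)) & MASK       # equals s + w (OR plus AND)


def range_divider(x, y, w):
    """Pick a branch based on the input; every branch yields the same result."""
    return plain(x, y, w) if x > 42 else obfuscated(x, y, w)
```

Since both paths are feasible and depend on x, a symbolic execution engine has to explore each of them, unlike with an opaque predicate whose false branch can be discarded.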

Listing 3: Program with loop

  unsigned char *str = argv[1];
  unsigned int hash = 0;
  for (int i = 0; i < strlen(str); str++, i++) {
    hash = (hash << 7) ^ (*str);
  }
  if (hash == 809267) printf("win\n");

code characteristics of the original program. EncL and ISubare not effective against symbolic execution. AddO and BCFhave a mild effect on the slowdown. EncA slows down sym-bolic execution due to the larger size of SMT queries. Flatand CFF slow down symbolic execution due to the largernumber of SMT queries issued to the solver. Virt slows downsymbolic execution due to the high number of fetch-decode-dispatch instructions added. The results also indicate inwhich order obfuscation transformations should be appliedto increase effectiveness. None of these obfuscation trans-formations add input-dependent paths to the programs.

In § 3.3 we have symbolically executed another set ofprograms obfuscated with 5 representative transformationsfrom Tigress used in § 3.2. Transformations from Obfusca-tor LLVM were not used in § 3.3 because they had similareffects as their corresponding transformations from Tigress.In this experiment we compared the performance of differentsymbolic execution engines to find test cases that lead to acertain (difficult to reach) path in the obfuscated programs.We observed that KLEE is most effective followed by angr.

We do not know if these results generalize for all possibleprograms. However, we believe that they have some degreeof generality due to the heterogeneity of our datasets andthe intuitive explanations we provide in our observations.

4. PROPOSED OBFUSCATION

The proposed obfuscation transformations are inspired by observation 10 in Experiment 3 from § 3.2, i.e. branch instructions added by control-flow obfuscation transformations do not depend on program inputs. Therefore, we propose making branch instructions added by control-flow obfuscation transformations dependent on program inputs to increase the slowdown of symbolic execution. Making such branch instructions input dependent may or may not be effective for existing obfuscation transformations. For instance, making the opaque predicates p added by Tigress input dependent causes KLEE to issue a query on the branch condition corresponding to p. However, the SMT solver always determines that p or ¬p is unsatisfiable and does not analyze the code in infeasible paths. In the following we propose two obfuscation transformations which introduce feasible paths in the original program.

4.1 Range Dividers

Our first proposal is an obfuscation transformation called range divider. Range dividers are branch conditions that can be inserted at an arbitrary position inside a basic block, such that they divide the input range into multiple sets. In contrast to opaque predicates, range divider predicates may have multiple branches, any of which could be true or false depending on program input. This will cause a symbolic execution engine to explore all branches of a range divider. In order to preserve the functionality property of an obfuscator, we use equivalent instruction sequences in all branches of a range divider predicate, as illustrated in Listing 2. To prevent compiler optimizations from removing range divider predicates, due to equivalent code in their branches, we employ software diversity (on the code of every branch) via different obfuscation configurations, e.g. in Listing 2 we used EncodeArithmetic on the else branch. We have experimented with all optimization levels of LLVM clang and none of them remove range divider predicates if their branches are obfuscated. On the downside, range dividers increase the size of the program proportionally to the total number of branches.

Listing 4: Program from Listing 3 obfuscated with range divider
    unsigned char *str = argv[1];
    unsigned int hash = 0;
    for (int i = 0; i < strlen(str); str++, i++) {
        char chr = *str;
        if (chr > 42) {
            hash = (hash << 7) ^ chr;
        } else {
            hash = (hash * 128) ^ chr;
        }
    }
    if (hash == 809267) printf("win\n");

Listing 5: Program from Listing 3 obfuscated with maximum number of branches of range divider
    unsigned char *str = argv[1];
    unsigned int hash = 0;
    for (int i = 0; i < strlen(str); str++, i++) {
        char chr = *str;
        switch (chr) {
        case 1: hash = (hash << 7) ^ chr;
                break;
        case 2: // obfuscated version of case 1
                break;
        ...
        default: // obfuscated version of case 1
                break;
        }
    }
    if (hash == 809267) printf("win\n");
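The claim that both branches of Listing 2 compute the same value can be checked with a small harness. The following is only a sketch: the function names are hypothetical, and the 32-bit unsigned type is an assumption, since Listing 2 does not state the types of x, y, w and z.

```c
#include <assert.h>
#include <stdint.h>

/* Plain branch of Listing 2: z = x + y + w.
 * Assumption: unsigned 32-bit operands, so overflow wraps modulo 2^32. */
static uint32_t plain_sum(uint32_t x, uint32_t y, uint32_t w) {
    return x + y + w;
}

/* Encoded branch of Listing 2.
 * (x ^ y) + ((x & y) << 1) equals x + y: XOR is the carry-less sum and
 * (x & y) << 1 adds the carries back.
 * (a | w) + (a & w) equals a + w, since a | w = (a ^ w) + (a & w). */
static uint32_t encoded_sum(uint32_t x, uint32_t y, uint32_t w) {
    uint32_t a = (x ^ y) + ((x & y) << 1);
    return (a | w) + (a & w);
}

/* Range divider: the branch taken depends on x, the result does not. */
static uint32_t range_divided_sum(uint32_t x, uint32_t y, uint32_t w) {
    if (x > 42)
        return plain_sum(x, y, w);
    return encoded_sum(x, y, w);
}

/* Hypothetical self-check: compares both branches for all x, y, w < limit. */
static void check_branches_agree(uint32_t limit) {
    for (uint32_t x = 0; x < limit; x++)
        for (uint32_t y = 0; y < limit; y++)
            for (uint32_t w = 0; w < limit; w++)
                assert(plain_sum(x, y, w) == encoded_sum(x, y, w));
}
```

Calling `check_branches_agree(64)` covers inputs on both sides of the threshold 42. An exhaustive check over small values is not a proof; the comments give the bitwise identities that make the two branches equal for all inputs.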

The effectiveness of a range divider predicate against symbolic execution depends on: (1) the number of branches of the predicate, denoted $\rho$, and (2) the number of times the predicate is executed, denoted $\tau$. More specifically, its number of paths increases according to the function $\rho^{\tau}$. For example, consider the program from Listing 3, which computes a value (hash) based on its first argument (argv[1]) and outputs "win" on the standard output if this value is equal to 809267. It has an execution tree with 2 × strlen(argv[1]) paths. The program in Listing 4 is obtained by obfuscating the program from Listing 3 using a range divider predicate with $\rho = 2$ branches. The resulting program has $2^{\text{strlen(argv[1])}}$ paths, because the predicate is executed $\tau$ = strlen(argv[1]) times. We can further increase the number of paths by adding more branches to the range divider predicate from Listing 4. However, the number of possible branches is upper-bounded by the cardinality of the type of the variable used in the range divider. In the example from Listing 4 the maximum number of branches is 256 because variable chr is of type char. Therefore, the maximum number of branches in this example is achieved by a switch-statement
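To get a feel for how quickly the path count grows, the growth function from the paragraph above can be evaluated directly. This is a minimal sketch; the helper names and the choice to saturate at `UINT64_MAX` on overflow are assumptions made for illustration, not something taken from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Path count attributed to the unobfuscated loop of Listing 3:
 * 2 * tau, where tau is the number of loop iterations. */
static uint64_t baseline_paths(uint64_t tau) {
    return 2 * tau;
}

/* Path count with a range divider: rho^tau, where rho is the number of
 * branches and tau the number of times the predicate is executed.
 * Saturates at UINT64_MAX instead of overflowing. */
static uint64_t range_divider_paths(uint64_t rho, unsigned tau) {
    assert(rho >= 1); /* a predicate needs at least one branch */
    uint64_t paths = 1;
    for (unsigned i = 0; i < tau; i++) {
        if (paths > UINT64_MAX / rho)
            return UINT64_MAX; /* next multiplication would overflow */
        paths *= rho;
    }
    return paths;
}
```

For an 8-character argument, `baseline_paths(8)` gives 16, `range_divider_paths(2, 8)` gives 256 for the two-branch divider of Listing 4, and `range_divider_paths(256, 8)` already saturates, because 256^8 = 2^64 no longer fits in 64 bits.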


1. RANGE DIVIDER


Obfuscations with varying, input-dependent behavior (Banescu et al., ACSAC 2016)


2. INPUT INVARIANTS

1. inject extra inputs into programs
2. let correct execution depend on invariant properties of those inputs

for example: feed program extra key to decrypt bytecode

(how to get these to the user???)
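The bytecode example can be sketched with a toy stack machine whose instructions are only meaningful after decoding with an extra input. Everything here is hypothetical: the opcodes, the one-byte XOR "key" (a stand-in, not real encryption) and the function name are assumptions for illustration, and the sketch says nothing about how the key reaches the user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes of a toy stack machine. */
enum { OP_HALT = 0x10, OP_PUSH = 0x11, OP_ADD = 0x12, OP_MUL = 0x13 };

/* PUSH 6; PUSH 7; MUL; HALT, with every byte XOR-ed with the key 0x5A. */
static const uint8_t ENCRYPTED_CODE[] = { 0x4B, 0x5C, 0x4B, 0x5D, 0x49, 0x4A };

/* Decodes each byte with the extra input `key` while interpreting it.
 * Returns 0 and stores the top of the stack in *result on HALT,
 * or -1 if the decoded bytecode is malformed. */
static int run_encrypted(const uint8_t *code, size_t len, uint8_t key,
                         uint32_t *result) {
    assert(code != NULL && result != NULL);
    uint32_t stack[16];
    size_t sp = 0, pc = 0;
    while (pc < len) {
        uint8_t op = (uint8_t)(code[pc++] ^ key);
        switch (op) {
        case OP_PUSH: /* the next byte is an encoded immediate */
            if (pc >= len || sp >= 16) return -1;
            stack[sp++] = (uint8_t)(code[pc++] ^ key);
            break;
        case OP_ADD:
        case OP_MUL: /* pop two operands, push the result */
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] = (op == OP_ADD) ? stack[sp - 1] + stack[sp]
                                           : stack[sp - 1] * stack[sp];
            break;
        case OP_HALT:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        default: /* wrong key or corrupted code */
            return -1;
        }
    }
    return -1; /* ran off the end without HALT */
}
```

With the key 0x5A the program decodes to 6 × 7 and stores 42 in `*result`; with 0x00 the first byte decodes to an unknown opcode and the run fails. A symbolic execution engine that treats the key as one more symbolic input has to reason about the decoding of every instruction, which is the effect the slide is after.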