
Post on 27-Dec-2015


Detecting Software Theft via System Call Based Birthmarks

Xinran Wang, Yoon-Chan Jhi, Sencun Zhu, Peng Liu

ACSAC 2009

OUTLINE
Introduction and Related Work
System Call Based Birthmarks
System Design and Implementation
Evaluation
Discussion and Conclusion

Software Theft (or plagiarism)
Reuse someone else’s code
◦ Even only a small part of the original program
Obfuscation techniques
◦ Different compilers
◦ Different compiler optimization levels
◦ SandMark

Defender
Software watermark
◦ Theoretically, any watermark can be removed
Software birthmark
◦ A unique characteristic that a program inherently possesses

Defender (Cont.)
Requirements
◦ R1: Resiliency to obfuscation techniques
◦ R2: Capability to detect theft of components
◦ R3: Scalability to large-scale programs
◦ R4: Applicability to binary executables
◦ R5: Independence from platforms

Related Work
Software Birthmark
◦ Static source code based birthmark
◦ Static executable code based birthmark
◦ Dynamic whole program path (WPP) based birthmark
◦ Dynamic API based birthmark
Clone Detection
◦ String-based, AST-based, Token-based and PDG-based
None of these approaches satisfies all requirements

System Call Based Birthmarks
Behavior based birthmarks
◦ Unique behaviors in features and implementation details
SCSSB (System Call Short Sequence Birthmark)
IDSCSB (Input Dependant System Call Subsequence Birthmark)

SCSSB (System Call Short Sequence Birthmark)
Definition 1: (System Call Trace)
Definition 2: (System Call Sequence Set)
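The two definitions above appear here without their formulas. The following is a minimal sketch of one plausible reading, assuming the trace is the ordered list of system call names recorded when program p runs on input I, and that S(p, I, k) is the set of all contiguous length-k windows of that trace. The function name and the sample trace are hypothetical and not taken from the slides.

```python
def sequence_set(trace, k):
    """Return the set of all contiguous length-k windows of a system call trace.

    Assumption: this mirrors S(p, I, k), where `trace` is the trace of
    program p on input I. The slides do not preserve the exact definition.
    """
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


# Hypothetical trace, for illustration only.
example_trace = ["open", "read", "read", "write", "close"]
example_windows = sequence_set(example_trace, 3)
```

Using a set rather than a list means repeated windows (for example from loops) count once, so the size of the result depends on how varied the trace is, not on how long it runs.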

SCSSB (System Call Short Sequence Birthmark)
Definition 3: (SCSSB: System Call Short Sequence Birthmark)
SCSSB(p, I, k) is a subset of set S(p, I, k) that satisfies

SCSSB (System Call Short Sequence Birthmark)
Definition 4: (Containment) The containment of A in B is defined as:
Here A is the birthmark of a plaintiff program or its component, and B is the birthmark of a suspect program.
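The containment formula itself is not preserved on this slide. As a hedged illustration, the sketch below assumes the usual set-containment measure |A ∩ B| / |A|, with birthmarks represented as Python sets of system call sequences. The function names and the 0.9 threshold are placeholders chosen for the example, not values from the slides.

```python
def containment(a, b):
    """Fraction of the plaintiff birthmark `a` that also occurs in suspect birthmark `b`.

    Assumption: containment(A, B) = |A intersect B| / |A|.
    """
    if not a:
        raise ValueError("plaintiff birthmark is empty")
    return len(a & b) / len(a)


def looks_copied(a, b, threshold=0.9):
    """Flag the suspect when containment reaches a user-chosen threshold (hypothetical value)."""
    return containment(a, b) >= threshold


# Hypothetical birthmarks, for illustration only.
plaintiff = {("open", "read"), ("read", "write"), ("write", "close"), ("close", "exit")}
suspect = {("read", "write"), ("write", "close"), ("close", "exit"), ("socket", "bind")}
score = containment(plaintiff, suspect)
```

The measure is asymmetric: a suspect that embeds the plaintiff component inside a much larger program can still score highly, because only the size of A appears in the denominator.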

System Design and Implementation
System Call Tracer
System Call Abstraction
Birthmark Generator
Input Dependant System Call Subsequence Birthmarks

System Call Tracer
The simplest way
◦ strace
With thread identifier
◦ SATracer based on Valgrind
Prepare a list of all subroutines of the component in SATracer
◦ The list is automatically generated by Elsa
SATracer checks the execution stack of the running thread when a system call is invoked
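For the strace route, a trace can be written to a file with a command such as `strace -f -o trace.txt ./program`. The sketch below shows one way to turn such output into structured records. It assumes the common one-line format `[pid] name(args) = return_value`, and it simply skips anything else (signal notices, exit notices, unfinished or resumed calls). The regular expression and field names are illustrative and are not part of SATracer.

```python
import re

# Matches lines such as: 1234 open("/etc/passwd", O_RDONLY) = 3
# The leading process id is optional; the return value may be decimal, hex, or "?".
LINE = re.compile(r"^(?:(\d+)\s+)?(\w+)\((.*)\)\s+=\s+(0x[0-9a-f]+|-?\d+|\?)")


def parse_line(line):
    """Parse one strace output line into a dict, or return None if it is not a completed call."""
    match = LINE.match(line)
    if match is None:
        return None
    pid, name, args, ret = match.groups()
    if ret == "?":
        value = None
    elif ret.startswith("0x"):
        value = int(ret, 16)
    else:
        value = int(ret)
    return {"pid": int(pid) if pid else None, "name": name, "args": args, "ret": value}
```

Keeping the process id in each record is what later allows calls from different threads or processes to be separated into their own traces.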

System Call Abstraction
Ignore the system calls that do not represent the behavior characteristic
◦ brk, mmap
Consider aliases or multiple versions of a system call as the same
◦ Ex: fstat(int fd, struct stat *sb) and stat(const char *path, struct stat *sb)
Ignore failed system calls
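The three abstraction rules above can be sketched as a single filtering pass. The ignore list and alias table below contain only the examples named on the slide; a real tool would presumably carry longer tables. Treating a negative return value as failure is an assumption made for this sketch, and the function name is hypothetical.

```python
# Only the examples named on the slide; any further entries would be assumptions.
IGNORED_CALLS = {"brk", "mmap"}
ALIASES = {"fstat": "stat"}


def abstract_calls(calls):
    """Normalize a trace given as (name, return_value) pairs.

    Drops failed calls (assumed here to be those with a negative return value),
    drops calls that do not characterize behavior, and maps aliases to one name.
    """
    result = []
    for name, ret in calls:
        if ret is not None and ret < 0:
            continue  # failed system call
        if name in IGNORED_CALLS:
            continue  # memory management noise
        result.append(ALIASES.get(name, name))
    return result


# Hypothetical raw trace, for illustration only.
raw = [("brk", 0), ("open", 3), ("fstat", 0), ("read", -1),
       ("stat", 0), ("mmap", 140), ("close", 0)]
cleaned = abstract_calls(raw)
```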

Birthmark Generator
Remove those loading-environment-dependent system calls
◦ Run multiple times with the same input
Remove the (noisy) system calls
◦ Establish a database of common system call short sequences
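The slide does not say how the repeated runs are combined. One simple reading, sketched below, is to keep only the sequences that appear in every run with the same input and then subtract a database of sequences that are common to many unrelated programs. Both function names and the sample data are hypothetical.

```python
def stable_sequences(runs):
    """Keep only the sequences observed in every run with the same input.

    Assumption: sequences that vary between runs depend on the loading
    environment rather than on the program itself.
    """
    sets = [set(run) for run in runs]
    if not sets:
        return set()
    return set.intersection(*sets)


def remove_common(sequences, common_db):
    """Drop sequences listed in a database of common (noisy) short sequences."""
    return set(sequences) - set(common_db)


# Hypothetical data, for illustration only.
run_1 = {("open", "read"), ("read", "close"), ("getpid", "open")}
run_2 = {("open", "read"), ("read", "close"), ("time", "open")}
common_db = {("read", "close")}
birthmark = remove_common(stable_sequences([run_1, run_2]), common_db)
```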

Input Dependant System Call Subsequence Birthmarks
Definition 7: (IDSCSB: Input Dependant System Call Subsequence Birthmark)
Containment:

Input Dependant System Call Subsequence Birthmarks
“file id” and “process id” are ignored
Large parameters are hashed with MD5
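The parameter handling above can be illustrated with a short sketch. It assumes each system call's parameters are available as a name-to-value mapping, that "file id" and "process id" correspond to keys named `fd` and `pid`, and that "large" means longer than a fixed byte count. The slide states none of these details, so the key names and the 32-byte cutoff are hypothetical.

```python
import hashlib

IGNORED_PARAMS = {"fd", "pid"}  # hypothetical key names for "file id" and "process id"
LARGE_PARAM_BYTES = 32          # hypothetical cutoff; the slide does not give one


def normalize_params(params):
    """Drop run-specific identifiers and replace large values with their MD5 digest."""
    normalized = {}
    for key, value in params.items():
        if key in IGNORED_PARAMS:
            continue
        data = value if isinstance(value, bytes) else str(value).encode()
        if len(data) > LARGE_PARAM_BYTES:
            normalized[key] = hashlib.md5(data).hexdigest()
        else:
            normalized[key] = value
    return normalized


# Hypothetical write() call parameters, for illustration only.
sample = normalize_params({"fd": 3, "path": "/tmp/a", "buf": b"x" * 100})
```

Hashing keeps the comparison cost per parameter constant while still letting two calls with identical large buffers compare as equal.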

Evaluation
SCSSB and IDSCSB:
◦ Against some advanced obfuscation techniques and 15 real-world large applications
SandMark implements 39 byte code obfuscators
x86 Linux executable
GCJ 4.1.2

Evaluation (Cont.)
Programs
◦ bzip2.c, gzip.c and oggenc.c
Impact of Compiler Optimization Levels
◦ Five optimization switches (-O0, -O1, -O2, -O3 and -Os) of GCC (e.g., bzip2-O0, bzip2-O3, etc.)
Impact of Different Compilers
◦ GCC, TCC and Watcom (e.g., bzip2-gcc, bzip2-tcc)

SCSSB Experiment I (JLex and JFlex)

SCSSB Experiment I (Cont.)
JLex and JFlex

SCSSB Experiment I (Cont.)
Containment scores
◦ JLex
CO: 87.9% DO: 85.2%
◦ JFlex
CO: 96% DO: 96%

SCSSB Experiment II (Gecko)
Gecko: Layout engine used in all Mozilla software and its derivatives

SCSSB Experiment II (Cont.)

IDSCSB Experiment I (JLex and JFlex)
The containment scores between original and obfuscated JLex are all 100%
The scores between JLex and obfuscated JFlex are less than 46%
The scores between JLex/JFlex and other programs are no more than 7%

IDSCSB Experiment II (Gecko)

Discussion
Counterattacks
◦ System call injection attack
◦ System call reordering attack
Limitations
◦ If the program does not involve any system calls…
◦ Need unique system call behaviors
◦ The detection result of our tool depends on the threshold a user defines

Conclusion
A novel type of birthmarks
Resilient to code obfuscation by SandMark, a state-of-the-art obfuscator
The first birthmark that:
◦ Detects software component theft
◦ Scales to detect large-scale software theft
