taintpipe: pipelined symbolic taint analysis - faculty pipelined symbolic taint analysis jiang ming,...

TaintPipe: Pipelined Symbolic Taint Analysis

Jiang Ming, Dinghao Wu, Gaoyao Xiao, Jun Wang, and Peng LiuCollege of Information Sciences and Technology

The Pennsylvania State University{jum310, dwu, gzx102, jow5222, pliu}@ist.psu.edu

Abstract

Taint analysis has a wide variety of compelling applica-tions in security tasks, from software attack detection todata lifetime analysis. Static taint analysis propagatestaint values following all possible paths with no needfor concrete execution, but is generally less accurate thandynamic analysis. Unfortunately, the high performancepenalty incurred by dynamic taint analyses makes its de-ployment impractical in production systems. To amelio-rate this performance bottleneck, recent research effortsaim to decouple data flow tracking logic from programexecution. We continue this line of research in this paperand propose pipelined symbolic taint analysis, a noveltechnique for parallelizing and pipelining taint analy-sis to take advantage of ubiquitous multi-core platforms.We have developed a prototype system called TaintPipe.TaintPipe performs very lightweight runtime logging toproduce compact control flow profiles, and spawns mul-tiple threads as different stages of a pipeline to carryout symbolic taint analysis in parallel. Our experimentsshow that TaintPipe imposes low overhead on applica-tion runtime performance and accelerates taint analysissignificantly. Compared to a state-of-the-art inlined dy-namic data flow tracking tool, TaintPipe achieves 2.38times speedup for taint analysis on SPEC 2006 and 2.43times for a set of common utilities, respectively. In ad-dition, we demonstrate the strength of TaintPipe such asnatural support of multi-tag taint analysis with severalsecurity applications.

1 Introduction

Taint analysis is a kind of program analysis that trackssome selected data of interest (taint seeds), e.g., dataoriginated from untrusted sources, propagates themalong program execution paths according to a cus-tomized policy (taint propagation policy), and thenchecks the taint status at certain critical location (taint

sinks). It has been shown to be effective in dealingwith a wide range of security problems, including soft-ware attack prevention [25, 40], information flow control[45, 34], data leak detection [49], and malware analy-sis [43], to name a few.

Static taint analysis [1, 36, 28] (STA) is performedprior to execution and therefore it has no impact on run-time performance. STA has the advantage of consider-ing multiple execution paths, but at the cost of poten-tial imprecision. For example, STA may result in eitherunder-tainting or over-tainting [32] when merging resultsat control flow confluence points. Dynamic taint analysis(DTA) [25, 13, 27], in contrast, propagates taint as a pro-gram executes, which is more accurate than static taintanalysis since it only considers the actual path taken atrun time. However, the high runtime overhead imposedby dynamic taint propagation has severely limited itsadoption in production systems. The slowdown incurredby conventional dynamic taint analysis tools [25, 13] caneasily go beyond 30X times. Even with the state-of-the-art DTA tool based on Pin [20], typically it still intro-duces more than 6X slowdown.

The crux of the performance penalty comes fromthe strict coupling of program execution and data flowtracking logic. The original program instructions min-gle with the taint tracking instructions, and usually ittakes 6–8 extra instructions to propagate a taint tag inshadow memory [11]. In addition, the frequent “con-text switches” between the original program executionand its corresponding taint propagation lead to registerspilling and data cache pollution, which add further pres-sure to runtime performance. The proliferation of multi-core systems has inspired researchers to decouple tainttracking logic onto spare cores in order to improve per-formance [24, 31, 26, 15, 17, 9]. Previous work canbe classified into two categories. The first category ishardware-assisted approaches. For example, Speck [26]needs OS level support for speculative execution androllback. Ruwase et al. [31] employ a customized hard-

ware for logging a program trace and delivering it toother idle cores for inspection. Nagarajan et al. [24] uti-lize a hardware first-in first-out buffer to speed up com-munication between cores. Although they can achieve anappealing performance, the requirement of special hard-ware prevents them from being adopted using commod-ity hardware.

The second category is software-only methods thatwork with binary executables on commodity multi-corehardware [15, 17, 9]. These software-only solutions relyon dynamic binary instrumentation (DBI) to decoupledynamic taint analysis from program execution. The pro-gram execution and parallelized taint analysis have tobe properly synchronized to transfer the runtime valuesthat are necessary for taint analysis. Although these ap-proaches look promising, they fail to achieve expectedperformance gains due to the large amounts of commu-nication data and frequent synchronizations between theoriginal program execution thread (or process) and itscorresponding taint analysis thread (or process). Recentwork ShadowReplica [17] creates a secondary shadowthread from primary application thread to run DTA inparallel. ShadowReplica conducts an offline optimiza-tion to generate optimized DTA logic code, which re-duces the amount of information that needs to be com-municated, and thus dramatically improves the perfor-mance. However, as we will show later, the performanceimprovement achieved by this “primary & secondary”thread model is fixed and cannot be improved furtherwhen more cores are available. Furthermore, in many se-curity related tasks (e.g., binary de-obfuscation and mal-ware analysis), precise static analysis for the offline opti-mization needed by ShadowReplica may not be feasible.

In this paper, we exploit another style of parallelism,namely pipelining. We propose a novel technique, calledTaintPipe, for parallel data flow tracking using pipelinedsymbolic taint analysis. In principle, TaintPipe fallswithin the second category of taint decoupling work clas-sified above. Essentially, in TaintPipe, threads form mul-tiple pipeline stages, working in parallel. The executionthread of an instrumented application acts as the sourceof pipeline, which records information needed for taintpipelining, including the control flow data and the con-crete execution states when the taint seeds are first intro-duced. To further reduce the online logging overhead, weadopt a compact profile format and an N-way bufferingthread pool. The application thread continues executingand filling in free buffers, while multiple worker threadsconsume full buffers asynchronously. When each loggeddata buffer becomes full, an inlined call-back functionwill be invoked to initialize a taint analysis engine, whichconducts taint analysis on a segment of straight-line codeconcurrently with other worker threads. Symbolic mem-ory access addresses are determined by resolving indirect

control transfer targets and approximating the ranges ofthe symbolic memory indices.

To overcome the challenge of propagating taint tagsin a segment without knowing the incoming taint state,TaintPipe performs segmented symbolic taint analysis.That is, the taint analysis engine assigned to each seg-ment calculates taint states symbolically. When a con-crete taint state arrives, TaintPipe then updates the re-lated taint states by replacing the relevant symbolic tainttags with their correct values. We call this symbolictaint state resolution. According to the segment order,TaintPipe sequentially computes the final taint state forevery segment, communicates to the next segment, andperforms the actual taint checks. Optimizations such asfunction summary and taint basic block cache offer en-hanced performance improvements. Moreover, differ-ent from previous DTA tools, supporting bit-level andmulti-tag taint analysis are straightforward for TaintPipe.TaintPipe does not require redesign of the structure ofshadow memory; instead, each taint tag can be naturallyrepresented as a symbolic variable and propagated withnegligible additional overhead.

We have developed a prototype of TaintPipe, apipelined taint analysis tool that decouples program ex-ecution and taint logic, and parallelizes taint analysis onstraight-line code segments. Our implementation is builton top of Pin [23], for the pipelining framework, andBAP [5], for symbolic taint analysis. We have evalu-ated TaintPipe with a variety of applications such as theSPEC CINT2006 benchmarks, a set of common utilities,a list of recent real-life software vulnerabilities, malware,and cryptography functions. The experiments show thatTaintPipe imposes low overhead on application runtimeperformance. Compared with a state-of-the-art inlineddynamic taint analysis tool, TaintPipe achieves overall2.38 times speedup on SPEC CINT2006, and 2.43 timeson a set of common utility programs, respectively. Theefficacy experiments indicate that TaintPipe is effectivein detecting a wide range of real-life software vulnera-bilities, analyzing malicious programs, and speeding upcryptography function detection with multi-tag propa-gation. Such experimental evidence demonstrates thatTaintPipe has potential to be employed by various appli-cations in production systems. The contributions of thispaper are summarized as follows:

• We propose a novel approach, TaintPipe, to effi-ciently decouple conventional inlined dynamic taintanalysis by pipelining symbolic taint analysis onsegments of straight-line code.• Unlike previous taint decoupling work, which suf-

fers from frequent communication and synchroniza-tion, we demonstrate that with very lightweight run-time value logging, TaintPipe rivals conventional in-lined dynamic taint analysis in precision.

• Our approach does not require any specific hard-ware support or offline preprocessing, so TaintPipeis able to work on commodity hardware instantly.• TaintPipe is naturally a multi-tag taint analysis

method. We demonstrate this capability by detect-ing cryptography functions in binary with little ad-ditional overhead.

The remainder of the paper is organized as fol-lows. Section 2 provides background information and anoverview of our approach. Section 3 and Section 4 de-scribe the details of the system design, online logging,and pipelined segmented symbolic taint analysis. Wepresent the evaluation and application of our approachin Section 5. We discuss a few limitations in Section 6.We then present related work in Section 7 and concludeour paper in Section 8.

2 Background

In this section, we discuss the background and contextinformation of the problem that TaintPipe seeks to solve.We start by comparing TaintPipe with the conventionalinlined taint analysis approaches, and we then presentthe differences between the previous “primary & sec-ondary” taint decoupling model and the pipelined decou-pling style in TaintPipe.

2.1 Inlined Analysis vs. TaintPipeFigure 1 (“Inlined DTA”) illustrates a typical dynamictaint analysis mechanism based on dynamic binary in-strumentation (DBI), in which the original program codeand taint tracking logic code are tightly coupled. Es-pecially, when dynamic taint analysis runs on the samecore, they compete for the CPU cycles, registers, andcache space, leading to significant performance slow-down. For example, “context switch” happens fre-quently between the original program instructions andtaint tracking instructions due to the starvation of CPUregisters. This means there will be a couple of instruc-tions, mostly inserted per program instruction, to saveand restore those register values to and from memory.At the same time, taint tracking instructions themselves(e.g., shadow memory mapping) are already complicatedenough. One taint shadow memory lookup operationnormally needs 6–8 extra instructions [11].

Our approach, analogous to the hardware pipelin-ing, decouples taint logic code to multiple spare cores.Figure 1 (“TaintPipe”) depicts TaintPipe’s framework,which consists of two concurrently running parts: 1) theinstrumented application thread performing lightweightonline logging and acting as the source of the pipeline;2) multiple worker threads as different stages of the

pipeline to perform symbolic taint analysis. Each hor-izontal bar with gray color indicates a working thread.We start online logging when the predefined taint seedsare introduced to the application. The collected profileis passed to a worker thread. Each worker thread con-structs a straight-line code segment and then performstaint analysis in parallel. In principle, fully parallelizingdynamic taint analysis is challenging because there arestrong serial data dependencies between the taint logiccode and application code [31]. To address this prob-lem, we propose segmented symbolic taint analysis in-side each worker thread whenever the explicit taint in-formation is not available, in which the taint state is sym-bolically calculated. The symbolic taint state will be up-dated later when the concrete data arrive. In addition tothe control flow profile, the explicit execution state whenthe taint seeds are introduced is recorded as well. Thepurpose is to reduce the number of fresh symbolic taintvariables.

We use a motivating example to introduce the ideaof segmented symbolic taint analysis. Figure 2 showsan example for symbolic taint analysis on a straight-line code segment, which is a simplified code snippet ofthe libtiff buffer overflow vulnerability (CVE-2013-4231). Assume when a worker thread starts taint anal-ysis on this code segment (Figure 2(a)), no taint statefor the input data (“size” and “num” in our case) is de-fined. Instead of waiting for the explicit information, wetreat the unknown values as taint symbols (symbol1 for“size” and symbol2 for “num”, respectively) and sum-marize the net effect of taint propagation in the segment.The symbolic taint states are shown in Figure 2(b). Whenthe explicit taint states are available, we resolve the sym-bolic taint states by replacing the taint symbols with theirreal taint tags or concrete values (Figure 2(c)). After that,we continue to perform concrete taint analysis like con-ventional DTA. Note that here we show pseudo-code forease of understanding, while TaintPipe works on binarycode.

Compared with inlined DTA, the application threadunder TaintPipe is mainly instrumented with control flowprofile logging code, which is quite lightweight. There-fore, TaintPipe results in much lower application runtimeoverhead. On the other hand, the execution of taint logiccode is decoupled to multiple pipeline stages running inparallel. The accumulated effect of TaintPipe’s pipelineleads to a substantial speedup on taint analysis.

2.2 “Primary & Secondary” Model

Some recent work [15, 17, 9] offloads taint logic codefrom the application (primary) thread to another shadow(secondary) thread and runs them on separate cores. Atthe same time, the primary thread communicates with

Taint seeds &

execution state

DBIConcrete taint

analysisApplication

Control flow

profiling

Resolving symbolic

taint state

Application speedup

Taint speedup

Inlined DTA

TaintPipe

Time

Threads

Symbolic taint

analysis

Figure 1: Inlined dynamic taint analysis vs. TaintPipe.

size = getc(infile);

A = -1;

B = size + 1;

C = (1 << size) - 1;

D = num & C;

(a) Code segment (b) Symbolic taint state (c) Resolving symbolic taint state

A

B

C

D

Output

0

symbol1 + 1

(1 << symbol1) – 1

symbol2 & ((1 << symbol1) – 1)

Taint state

A

B

C

D

Output

0

tag1 + 1

(1 << tag1) – 1

(1 << tag1) – 1

Taint state

Figure 2: An example of symbolic taint analysis on a code segment: (a) code segment; (b) symbolic taint states, theinput value size and num are labeled as symbol1 and symbol2, respectively; (c) resolving symbolic taint states whensize is tainted as tag1 and num is a constant value (num = 0xffffffff).

the secondary thread to convey the necessary informa-tion (e.g., the addresses of memory operations and con-trol transfer targets) for performing taint analysis. How-ever, this model suffers from frequent communicationbetween the primary and secondary thread. In principle,every memory address that is loaded or stored has to belogged and transferred. Due to the frequent synchroniza-tion with the primary thread and the extra instructionsto access shadow memory, taint logic execution in thesecondary thread is typically slower than the applicationexecution. As a result, the delay for each taint operationcould be accumulated, leading to an delay proportionalto the original execution. ShadowReplica [17] partiallyaddresses this drawback by performing advanced offlinestatic optimizations on the taint logic code to reduce theruntime overhead. However, in many security analysisscenarios, precise static analysis and optimizations overtaint logic code are not feasible, e.g., reverse engineer-ing and malware forensics. In such cases, program staticfeatures such as control flow graphs are possibly obfus-cated.

In TaintPipe, we record compact control flow infor-mation to reconstruct straight-line code, in which all the

targets of direct and indirect jumps have been resolved.However, we do not record or transfer the addresses ofmemory operations. Our key observation is that mostaddresses of memory operations can be inferred fromthe straight-line code. For example, if a basic block isended with an indirect jump instruction jmp eax, wecan quickly know the value of eax from the straight-linecode. In this way, all the other memory indirect accesscalculated through eax (before it is updated) can be de-termined. For instance, we can infer the memory loadaddress for the instruction: mov ebx, [4*eax + 16].Even when the index of a memory lookup is a symbol,with the taint states and path predicates of the straight-line code, we can often narrow down the symbolic mem-ory addresses to a small range in most cases.

Since TaintPipe’s data communication is lightweight,TaintPipe can achieve nearly constant delay givenenough number of worker threads. The upper limit num-ber of worker threads is also bounded, which equalsroughly the ratio of the taint analysis execution time overthe application thread execution time for each segment.

Due to TaintPipe’s pipelining design, it is possible thatTaintPipe may detect an attack some time after the real

attack has happened. However, this trade-off does notprevent TaintPipe from practically supporting a broad va-riety of security applications, such as attack forensic in-vestigation and post-fact intrusion detection, which donot require strict runtime security enforcement. It isworth noting that different from ShadowReplica, Taint-Pipe does not depend on extensive static analysis to re-duce data communication. Therefore, TaintPipe has awider range of applications in speeding up analyzing ob-fuscated binaries, as static analysis of obfuscated bina-ries is of great challenge.

3 Design

3.1 Architecture

Figure 3 illustrates the architecture of TaintPipe. Wehave built the pipelining framework on top of a dynamicbinary instrumentation tool, enabling TaintPipe to workwith unmodified program binaries. The steps followedby TaintPipe for pipelining taint analysis are:

1. TaintPipe takes in a binary along with the taint seedsas input. The instrumented application thread startsexecution with lightweight online logging for con-trol flow and other information (Section 4.1.1).

2. Then the instrumented program is executed togetherwith a multithreaded logging tool to efficiently de-liver the logged data to memory (Section 4.1.2).

3. When the profile buffer becomes full, a taint analy-sis engine will be invoked for online pipelined taintanalysis (Section 4.2.1).

4. The generated log data are then used to constructstraight-line code, which helps to solve many pre-cision loss problems in static taint analysis. Inthis stage, we generate a segment of executed codeblocks for each logged data buffer. The memoryaddresses that are accessed through indirect jumptargets are also resolved (Section 4.2.2).

5. The taint analysis engine will further translatestraight-line code to taint operations, which avoidprecision loss and support both multi-tag and bit-level taint analysis (Section 4.2.3).

6. With the constructed taint operations, TaintPipe per-forms pipelined symbolic taint analysis. When athread finishes taint analysis with an explicit taintstate, it synchronizes with its following thread to re-solve the symbolic taint state (Section 4.2.4).

3.2 Segmented Symbolic Taint AnalysisIn this section, we analyze symbolic taint analysis froma theoretical point of view to justify the correctness ofour pipelining scheme. In order to formalize segmentedsymbolic taint analysis, we use the following notations:

1. Let σ denote a taint state, which maps variables totheir taint tags.

2. Let A (σ ,S) denote a symbolic taint analysis A ona straight-line code segment S, with an initial taintstate σ . We use Aσ (S) for convenience.

Note that the straight-line code segment S has no con-trol transfer statement. Conceptually, S only contains onetype of statements, namely assignment statements. Ofcourse, from the implementation point of view, there maybe other types of statements, but they can all be regardedas assignment statements. For example, as we will showin Section 4.2.3, our taint operations contain assignmentoperations, laundering operations, and arithmetic opera-tions. The latter two operations can be derived from taintassignment operations.

Based on the semantics of assignment statements, wedefine symbolic taint analysis for an assignment state-ment as follows:

Aσ (x := e) = σ [x 7→ et ] (1)

where et denotes the taint tag of e, and [·] is the taint stateupdate operator. If x is a new variable, the taint stateσ is extended with a new mapping from x to its taint.If x occurs in the taint state σ , for the variables in thedomain of σ whose symbolic taint expressions dependon x, their symbolic taint expressions will be updated orrecomputed with the new taint value of x.

Assume σ1 = Aσ (i1) for a statement i, then the sym-bolic taint analysis for two sequential statements i1; i2 is:

Aσ (i1; i2) = Aσ1(i2) (2)

Assume straight-line code segment S1 = (i1;S′1). Wecan then deduce the symbolic taint analysis on two se-quential segments S1;S2 as follows:

Aσ (S1;S2)

=Aσ ((i1;S′1);S2)

=Aσ (i1;(S′1;S2))

= · · ·=AAσ (S1)(S2)

(3)

That is, given Aσ (S1) = σ1 and Aε(S2) = σ2, where ε isan empty taint state, Eq. 3 leads to:

Aσ (S1;S2) = σ2[σ1] (4)

Instrumented Program

ExecutionLogged

Control FlowMultithreaded Logging Tool Straight-line

Code Construciton

Pipelined Symbolic Taint

Analysis

Figure 3: Architecture.

Here, we misuse the taint state update operator [·] andapply it to a taint state map, instead of a single taint vari-able update. With Eq. 4, we can perform segmented taintanalysis in parallel or in a pipeline style. For two seg-ments S1;S2, assume the starting taint state is σ0. Westart two threads, one compute Aσ0(S1) and the othercomputes Aε(S2), where ε is an empty taint state. As-sume the result of the first thread analysis is σ1 and theresult of the second is σ2. The symbolic taint analysis ofS1;S2 is σ2[σ1], that is, the right hand side of Eq. 4. Eq. 4forms the foundation of our segmented taint analysis in apipeline style.

4 Implementation

To demonstrate the efficacy of TaintPipe, we have devel-oped a prototype on top of the dynamic binary instru-mentation framework Pin [23] (version 2.12) and the bi-nary analysis platform BAP [5] (version 0.8). The on-line logging and pipelining framework are implementedas Pin tools, using about 3,100 lines of C/C++ code. Thetaint operation constructors are built on BAP IL (inter-mediate language). TaintPipe’s taint analysis engine isbased on BAP’s symbolic execution module, using about4,400 lines of Ocaml and running concurrently with Pintools. We utilize Ocaml’s functor polymorphism so thattaint states can be instantiated in either concrete or sym-bolic style. All of the functionality implemented in taintanalysis engine are wrapped as function calls. To sup-port communication between Pin tools and taint analysisengine, we develop a lightweight RPC interface so thateach worker thread can directly call Ocaml code. Thesaving and loading of the taint cache lookup table is im-plemented using the Ocaml Marshal API, which encodesIL expressions as sequences of compact bytes and thenstores them in a disk file.

Dynamic binary instrumentation tools tend to inlinecompact and branch-less code to the final translatedcode. For the code with conditional branches, DBI emitsa function call instead, which introduces additional over-head. Therefore, we carefully design our instrumenta-tion code to favor DBI’s code inlining. To fully reduceonline logging overhead, we also utilize Pin-specificoptimizations. We leverage Pin’s fast buffering APIsfor efficient data buffering. For example, the inlinedINS InsertFillBuffer() writes the control flow pro-

file directly to the given buffer; the callback functionregistered in PIN DefineTraceBuffer() processes thebuffer when it becomes full or thread exits. Besides,we force Pin to use the fastcall x86 calling conventionto avoid emitting stack-based parameter loading instruc-tions (i.e., push and pop). Currently Pin-tools do not sup-port the Pthreads library. Thus we employ Pin ThreadAPI to spawn multiple worker threads. We also im-plement a counting semaphore based on Pin’s lockingprimitives to assist thread synchronization. Addition-ally, TaintPipe can be extended to support multithreadedapplications with no difficulty by assigning one taintpipeline for each application thread.

4.1 LoggingTaintPipe’s pipeline stages consist of multiple threads.The thread of instrumented application (producer) servesas the source of pipeline, and a number of Pin inter-nal threads act as worker threads to perform symbolictaint analysis on the data collected from the applica-tion thread. Note that unlike application threads, workerthreads are not JITed and therefore execute natively. Oneof the major drawbacks of previous dynamic taint anal-ysis decoupling approaches is the large amount of in-formation collected in the application thread and thehigh overhead of communication between the applica-tion thread and analysis thread. To address these chal-lenges, TaintPipe performs lightweight online logging torecord information required for pipelined taint analysis.The logged data comprise control flow profile and theconcrete execution state when taint seeds are first intro-duced, which is the starting point of our pipelined taintanalysis. The initial execution state, consisting of con-crete context of registers and memory, (e.g., CR0∼CR4,EFLAGS and addresses of initial taint seeds), is used toreduce the number of fresh symbolic taint variables.

We take major two steps to reduce the applicationthread slowdown: First, we adopt a compact profilestructure so that the profile buffer contains logged dataas much as possible, and it is quite simple to recoverthe entry address of each basic block as well. Second,we apply the “one producer, multiple consumers” modeland N-way buffering scheme to process full buffers asyn-chronously, which allows application to continue execu-tion while pipelined taint analysis works in parallel. Wewill discuss each step in the following sub-sections.

4.1.1 Lightweight Online Logging

Besides the initial execution state when the taint seedsare introduced, TaintPipe collects control flow informa-tion, which is represented as a sequence of basic blocksexecuted. Conceptually, we can use a single bit to recordthe direction of conditional jump [29], which leads toa much more compact profile. However, reconstructionstraight-line code from 1 bit profile is more complicatedto make it fit for offline analysis. Zhao et al. [47] pro-posed Detailed Execution Profile (DEP), a 2-byte profilestructure to represent 4-byte basic block address on x86-32 machine. In DEP, a 4-byte address is divided intotwo parts: H-tag for the 2 high bytes and L-tag for 2low bytes. If two successive basic blocks have the sameH-tag, only L-tag of each basic block enters the profilebuffer; otherwise a special tag 0x0000 followed by thenew H-tag will be logged into the buffer.

We extend DEP’s scheme to support REP-prefix in-structions. A number of x86 instructions related to stringoperations (e.g., MOVS, LODS) with REP-prefix are exe-cuted repeatedly until the counter register (ecx) countsdown to 0. Dynamic binary instrumentation tools [23, 4]normally treat a REP-prefixed instruction as an implicitloop and generate a single instruction basic block in eachiteration. In our evaluation, there are several cases thatunrolling such REP-prefix instructions would be a perfor-mance bottleneck. We address this problem by addingadditional escape tags to represent such implicit loops.Figure 4 presents an example of the control flow profilewe adopted. The left part shows a segment of straight-line code containing 1028 basic blocks, and 1024 outof them are due to REP-prefixed instruction repetitions.Our profile (the right side of Figure 4) encodes such casewith two consecutive escape tags (0xffff), followed bythe number of iterations (0x0400).

We note that it is usually unnecessary to turn on thelogging all the time. For example, when applicationstarts executing, many functions are only used duringloading. At that time, no sensitive taint seed is intro-duced. Therefore we perform on-demand logging torecord control flow profile when necessary. As applica-tion starts running, we only instrument limited functionsto inspect the various input channels that taint could beintroduced into the application (taint seeds). Such taintseeds include standard input, files and network I/O. Be-sides, users can customize other values as taint seeds,such as function return values or parameters. When thepre-defined taint seeds are triggered, we turn on the con-trol flow profile logging. At the same time, we save thecurrent execution state to be used in the pipelined taintanalysis. Many well-known library functions have ex-plicit semantics, which facilitates us to selectively turnoff logging inside these functions and propagate taint

0x804fffa

BB_2

0x8050004

BB_3

0x8050016

BB_4

rep movsd

0xfffa

0x0000

0x0805

0x8050030

BB_1028

0x0004

0x0016

0xffff

0xffff

0x0400

4-byte tag

profile

0x8050028

rep movsd

0x804fff0

BB_1

0xfff0

...

rep movsd

0x0030

2-byte tag

profile

Repetition

Count: 1024

Figure 4: An example of 2-byte tag profile.

correspondingly at function level. We will discuss thisissue further in Section 4.2.3.

4.1.2 N-way Buffering Scheme

Since TaintPipe’s online logging is lightweight, appli-cation (producer) thread’s execution speed is typicallyfaster than the processing speed of worker threads.To mitigate this bottleneck, we employed “one pro-ducer, multiple consumers” model and N-way bufferingscheme [46]. At the center of our design is a thread pool,which is subdivided into n linked buffers, and the pro-ducer thread and multiple worker threads work on differ-ent buffers simultaneously. More specifically, when theinstrumented application thread starts running, we firstallocate n linked empty buffers (n> 1). At the same time,n Pin internal threads (worker threads) are spawned.Each worker thread is bound to one buffer and communi-cates with the application thread via semaphores. Whena buffer becomes full, the application thread will releasethe full buffer to its corresponding worker thread andthen continue to fill in the next available empty buffer.Given a full profile buffers, a worker thread will send it toa taint analysis engine to perform concrete/symbolic taintanalysis in parallel. After that, the worker thread will re-lease the profile buffer back to the application thread andwait for processing the next full buffer.

It is apparent that the availability of unused worker

threads and the size of profile buffer will affect over-all performance of TaintPipe (both application executiontime and pipelined taint analysis) significantly. In Sec-tion 5.1, we will conduct a series of experiments to findthe optimal values for these two factors.

4.2 Symbolic Taint Analysis4.2.1 Taint Analysis Engine

When the application thread releases a full profile buffer,a worker thread is waked up to capture the profile bufferand then communicates with a taint analysis engine forpipelined taint analysis. The taint analysis engine willfirst convert the control flow profile to a segment ofstraight-line intermediate language (IL) code and thentranslates the IL code to even simpler taint logic oper-ations. The translations are cached for efficiency at taintbasic block level. The key components of taint analysisengine are illustrated in Figure 5.

The core of TaintPipe’s taint analysis engine is an ab-stract taint analysis processor, which simulates a segmentof taint operations and updates the taint states accord-ingly. The taint state structure contains two contexts: vir-tual registers keeping track of symbolic taint tags for reg-ister, and taint symbolic memory for symbolic taint tagsin memory. The taint symbolic memory design is like thetwo-level page table structure and each page of memoryconsists of symbolic taint formulas rather than concretevalues. After the initialization of the symbolic taint in-puts, the engines perform taint analysis either concretelyor symbolically in a pipeline style.

4.2.2 Straight-line Code Construction

Given the control flow profiles, recovering each basicblock’s H-tag and L-tag is quite straightforward. A basicblock’s entry address is the concatenation of its corre-sponding H-tag and L-tag [47]. The taint analysis engineshould only execute the instructions required for taintpropagation. Otherwise, the work thread may run muchslower than the application thread. On the other side, dueto the cumbersome x86 ISA, precisely propagating taintfor the complex x86 instructions is an arduous work, es-pecially for some instructions with side effect of condi-tional taint (e.g., CMOV). To achieve these two goals, wefirst extract the x86 instructions sequence from the ap-plication binary and then lift them to BIL [5], a RISC-like intermediate language. Since we know exactly theexecution sequence, the sequence is a straight-line code.We have removed all the direct and indirect control trans-fer instructions and substituted them with control transfertarget assertion statements.

After resolving an indirect control transfer, we go onestep further to determine all the memory operation ad-

Figure 6: A path predicate constrains symbolic memoryaccess within the boundary of 7 < i < 10.

dresses which depend on this indirect control transfer tar-get. For example, after we know the target of jmp eax,we continue to trace the use-def chain of eax for eachmemory load or store operation whose address is calcu-lated through this eax. With the initial execution state(containing addresses of taint seeds) and indirect controltransfer target resolving, we are able to decide most ofthe memory operation addresses.

For some applications such as word processing, a sym-bolic taint input may be used as a memory lookup index.Without any constraint, a symbolic memory index couldpoint to any memory cell. Inspired by the index-basedmemory model proposed by Cha et al. [8], we attempt tonarrow down the symbolic memory accesses to a smallrange with symbolic taint states and path predicates. Wefirst leverage value set analysis [2] to limit the range ofa symbolic memory access and then refine the range byquerying a constraint solver. The path predicate alongthe straight-line code usually limits the scope of sym-bolic memory access. Figure 6 shows such an examplewhere the path predicate restricts the symbolic memoryindex i within a range such that 7 < i < 10. When propa-gating a taint tag to the memory cell referenced by i, weconservatively taint all the possible memory slots, thatis, A[8] and A[9] in Figure 6 will be tainted as tag1. InSection 5.3, we will demonstrate that our symbolic mem-ory index solution only introduces marginal side effects.

4.2.3 Taint Operation Generation

Based on BIL statements, we construct taint operations.Taint operations inside a basic block are formed as “taintbasic block” [37], which are cached for efficiency. Tomake the best of cache effect, we merge the basic blockswith only one predecessor and one successor. Since BILexplicitly reveals the side effect of intricate x86 instruc-tions, it is easy to perform intra-block optimizations toget rid of redundant taint operations. Therefore, our taint

Control flow

profileTaint analysis engine

Taint operations constructor

Taint seeds +

abstract stateAbstract taint

analysis processor

Taint state

Taint symbolic memory

Virtual registers

Worker threads

Taint basic

block cache

Figure 5: The structure of taint analysis engine.

operations are simple and accurate. Currently our taintoperations mainly consist of three types of operations fortracking taint data:

1. Assignment operations: The operations in this cate-gory are involved in copying values between reg-isters and registers/memory. We simply assignthe taint tag of source operand to the destinationoperand.

2. Laundering operations: The operations are used toclean the taint tag of the destination operand. Forexample, xor eax,eax will clean the taint result ofeax. We identify all laundering operations in taintbasic blocks and substitute them with assignmentoperations.

3. Arithmetic and logic operations: This category ofoperations are the most difficult to handle. We emu-late arithmetic and logic operations on the taint tagsto capture their real semantics.

Figure 7 presents an example of taint operations for abasic block. TaintPipe’s symbolic taint operations out-perform conventional DTA approaches in three ways.First and foremost, multi-tag taint analysis is straight-forward for TaintPipe. Each symbolic variable can nat-urally represent a taint tag (see Line 1 and Line 2 inFigure 7). Second, previous DTA tools mostly adopteda “short circuiting” method to handle arithmetic opera-tions, that is, the destination is tainted if at least one ofthe source operands is tainted regardless of the real se-mantics. However, in many scenarios, it will lead toprecision loss. Check the code at Line 4 and 5 in Fig-ure 7, value d will always be zero since b is the nega-tion of a. Unfortunately, some previous work may la-bel d as tainted incorrectly [41]. Third, different fromrelated work [20, 17], TaintPipe supports bit-level taintfor EFLAGS register, representing whether a bit of theEFLAGS is tainted or not due to side effects. Recentwork has demonstrated the value of bit-level taint in bi-nary code de-obfuscation [42].

int a, b, c, d;

1: a = read ();

2: c = read ();

3: c = c xor c;

4: b = ~ a;

5: d = a & b;

(a) a basic block (b) a taint basic block

1: Taint (a) = tag1;

2: Taint (c) = tag2;

3: Taint (c) = 0;

4: Taint (b) = ~ tag1;

5: Taint (d) = 0;

Figure 7: Example: taint operations.

Table 1: Function summary.Category Function

No tainting strcmp, strncmp, memcmp, strlen, strchr,strstr, strpbrk, strcspn, qsort, rand,

time, clock, ctimeFunction level strcat, strncat, strcpy, strncpy, memcpy,

memmove, strtok, atoi, itoa, abstolower, toupper

Another major optimization we adopt is so called“function summary”. As many well-known library func-tions have explicit semantics (e.g., atoi, strlen), we gener-ate a summary of each function and propagate taint cor-respondingly at function level. Table 1 lists two typesof function summary TaintPipe supports: 1) Functionswithin “no tainting” category do not have any side effecton taint state. We can safely turn off logging when exe-cuting them. 2) Some functions do propagate taint froman input parameter to output. We still turn off loggingand update taint state correspondingly when these func-tions return.

4.2.4 Symbolic Taint State Resolution

In TaintPipe’s pipeline framework, a worker thread mayperform taint analysis concretely or symbolically in par-allel. When a worker thread completes taint analysis withconcrete taint tags, the final taint state it maintains is de-terministic. Then it synchronizes with the subsequentworker thread to resolve the symbolic taint state main-

tained by the latter. Taint states are allocated in a sharedmemory area so that multiple threads can access themeasily. Basically, given the concrete taint state at the be-ginning of a code segment, we replace a symbolic tainttag with the appropriate starting value (either a taint tagor a concrete value). We further update all the symbolicformula containing that symbolic taint tag. For example,the logic AND formula in Figure 2(b) will be simplifiedto a single taint tag. After that, the subsequent threadswitches to the concrete taint analysis and continue pro-cessing the left segment code.

In this order, the taint states of each segment will beresolved and updated one by one. The defined taint pol-icy (e.g., a function return value should not be tainted)is checked along the concrete taint analysis as well. Atainted sink is identified if it contains a symbolic for-mula; multiple tags are determined by counting the num-ber of different symbols in the formula. Note that a pre-vious hardware-assisted approach [31] utilized a separate“master” processor to update each segment’s taint statussequentially. However, as pointed out by the paper, whenthere are more than a few “worker” processors, the mas-ter processor will become the bottleneck. Our approachamortizes the workload of the master processor to eachworker thread.

5 Experimental Evaluation

We conducted experiments with several goals in mind.First, we wanted to choose optimal values for two factorsthat may affect TaintPipe’s performance, namely controlflow profile buffer size and the number of worker threads.Then we studied overall runtime overhead when runningTaintPipe on the SPEC2006 int benchmarks and a num-ber of common utilities. We also compared TaintPipewith a highly optimized inline dynamic taint analysistool. At the same time, we wanted to make sure TaintPipeis effective in speeding up various security analysis tasksand can compete with conventional inlined dynamic taintanalysis in precision. To this end, we demonstrated threecompelling applications: 1) detecting software attacks;2) tracking information flow in obfuscated malicious pro-grams; and 3) identifying cryptography functions withmulti-tag propagation.

5.1 Experiment Setup

Our experiment platform contains two Intel Xeon E5-2690 processors, 128GB of memory and a 250GB solidstate drive, running Ubuntu12.04. Each processor isequipped with 8 2.9GHz cores, 16 hyper threads and20MB L3 cache. The performance data reported in thissection are all mean values with 5 repetitions.

2 4 6 8 1 0 1 2 1 41 6

1 82 0

48

1 21 62 02 42 83 23 64 0

81 6

3 26 4

Slowd

ow (n

ormaliz

ed ru

ntime

)

P r o f i l e b u f f e r s i z e ( M B )

# W o r k e r t h r e a d s

Figure 8: Optimal buffer size and number of workerthreads.

TaintPipe’s performance is affected by the size of theprofile buffers and the number of worker threads. Gener-ally, the more worker threads and larger profile buffers,the less possibility that an application is suspended towait for a free buffer. On the other hand, our taint anal-ysis engine has to take longer time to process largersegment code. We conducted a series of tests with theSPEC CPU2006 int benchmarks, under different settingsof these two variables. We dynamically adjust the num-ber of worker threads from 2 to 20 (2, 4, 6, 8, 12, 16 and20), and profile size from 8MB to 64MB (8MB, 16MB,32MB and 64MB). Figure 8 displays the experimentalresults. Roughly, as number of worker threads and buffersize increase, the application slowdown reduces. Thatis mainly because large buffer sizes allow applicationthread continue to fill up and worker threads spend lesstime on synchronization. After a certain point (buffersize≥ 32MB and number of worker threads > 16), over-head increases slightly. Two factors prevent TaintPipefrom achieving more speedup. First, taint analysis en-gine slows down when processing large code segment.Second, more worker threads introduce larger communi-cation latency when resolving symbolic taint states. Ac-cording to the results, we set the two factors as their op-timal values (32MB buffer size and 16 worker threads),which will be used in the following experiments.

5.2 PerformanceTo evaluate the performance gains achieved by pipelin-ing taint logic, we compared TaintPipe with a state-of-the-art tool, libdft [20], which performs inlined dynamictaint analysis based on Pin (“libdft” bar). In addition, wedeveloped a simple tool to measure the slowdown im-

0123456789

1 01 11 21 31 41 51 61 71 81 92 0

a v e r a g ex a l a n c b m m k

a s t a ro m n e t p p

h 2 6 4 r e fl i b q u a n t u m

s j e n gh m m e r

g o b m km c fg c cb z i p 2

p e r l b e n c h

Slowd

own (

norm

alized

runti

me)

n u l l p i n l i b d f t T a i n t P i p e - a p p l i c a t i o n T a i n t P i p e - o v e r a l l

Figure 9: Slowdown on SPEC CPU2006.

posed exclusively by TaintPipe. It runs a program underPin without any form of analysis (“nullpin” bar). The“TaintPipe - application” bar represents the running timeof instrumented application thread alone, and “TaintPipe- overall” corresponds to the overall overhead when boththe application thread and pipelined worker threads arerunning. The major reason we reported “TaintPipe -application” and “TaintPipe - overall” time separatelyis to show the two improvements, namely “Applicationspeedup” and “Taint speedup” (see Figure 1). Since theapplication thread typically runs faster than the workerthreads, the “TaintPipe - overall” time is actually dom-inated by the worker threads. Therefore, usually the“TaintPipe - overall” time represents the relative timespent by worker threads as well. The times reported inthis section are all normalized to native execution, thatis, application running time without dynamic binary in-strumentation.

SPEC CPU2006. Figure 9 shows the normalized ex-ecution times when running the SPEC CPU2006 intbenchmark suite under TaintPipe. On average, the in-strumented application thread enforces a 2.60X slow-down to native execution, while the overall slowdown ofTaintPipe is 4.14X. If we take Pin’s environment run-time overhead (“nullpin” bar) as the baseline, we can seeTaintPipe imposes 2.67X slowdown (“TaintPipe - over-all” / “nullpin”) and libdft introduces 6.4X slowdown—this number is coincident to the observation that prop-agating a taint tag normally requires extra 6–8 instruc-tions [30, 11]. In summary, TaintPipe outperforms in-lined dynamic taint analysis drastically: 2.38X fasterthan the inlined dynamic taint analysis, and 3.79X fasterin terms of application execution. In the best case

0123456789

1 01 11 21 31 4

t a ra r c h i v e

b z i p 2d e c o m p r e s s

s c p1 G b p s

t a ru n t a r

g z i pc o m p r e s s

b z i p 2c o m p r e s s

a v e r a g e g z i pd e c o m p r e s s

n u l l p i n l i b d f t T a i n t P i p e - a p p l i c a t i o n T a i n t P i p e - o v e r a l l

Slowd

own (

norm

alized

runti

me)

Figure 10: Slowdown on common Linux utilities.

(h264ref ), the application speedup under TaintPipe ex-ceeds 4.18X.

Utilities. We also evaluated TaintPipe on four commonLinux utilities, which were not chosen randomly. Thesefour utilities represent three kinds of workloads: I/Obounded (tar), CPU bounded (bzip2 and gzip), andthe case in-between (scp). We applied tar to archiveand extract the GNU Core utilities package (version8.13) (∼50MB), then we employed bzip2 and gzip tocompress and decompress the archive file. Finally weutilized scp to copy the archive file over a 1Gbps link.As shown in Figure 10, TaintPipe reduced slowdown ofdynamic taint analysis from 7.88X to 3.24X, by a factorof 2.43 on average.

Effects of Optimizations. In this experiment, wequantify the effects of taint logic optimizations we pre-sented in Section 4.2, which are paramount for optimizedTaintPipe performance. Figure 11 shows the impact ofthese optimizations when applied cumulatively on SPECCPU2006 and the set of common utilities. The “un-opt” bar approximates an un-optimized TaintPipe, whichdoes not adopt any optimization method. The “O1” barindicates the optimization of function summary, reduc-ing application slowdown notably by 26.6% for SPECCPU2006 and 25.0% for the common utilities. The “O2”bar captures the effect of taint basic block cache, leadingto a further reduction by 19.0% and 22.9% for SPEC andutilities, respectively. Intra-block optimizations, denotedby “O3”, offer further improvement, 12.0% with SPECand 11.6% with the utilities).

Table 2: Tested software vulnerabilities.

Program Vulnerability CVE ID # Taint Byteslibdft Temu TaintPipe

Nginx Validation Bypass CVE-2013-4547 45 45 45Micro httpd Validation Bypass CVE-2014-4927 80 85 80Tiny Server Validation Bypass CVE-2012-1783 125 126 125Regcomp Validation Bypass CVE-2010-4052 1,124 1,148 1,180Libpng Denial Of Service CVE-2014-0333 72 72 72Gzip Integer Underflow CVE-2010-0001 94 112 96Grep Integer Overflows CVE-2012-5667 608 682 653

Coreutils Buffer Overflow CVE-2013-0221 252 260 256Libtiff Buffer Overflow CVE-2013-4231 268 286 290

WaveSurfer Buffer Overflow CVE-2012-6303 384 418 406Boa Information Leak CVE-2009-4496 164 164 164

Thttpd Information Leak CVE-2009-4491 328 328 328

0

1

2

3

4

5

6

7

8

9

Slowd

own (

norm

alized

runti

me)

U n o p t O 1 O 2 O 3

S P E C C P U 2 0 0 6 C o m m o n u t i l i t i e s

Figure 11: The impact of optimizations to speed upTaintPipe when applied cumulatively: O1 (function sum-mary), O2 (O1 + taint basic block cache), O3 (O2 + intra-block optimizations).

5.3 Security Applications

Software Attack Detection. One important applica-tion of taint analysis is to define taint policies, and en-sure they are not violated during taint propagation. Wetested TaintPipe with 12 recent software exploits listedin Table 2, which covers a wide range of real-life soft-ware vulnerabilities. For example, the vulnerabilities innginx, micro httpd, and tiny server allow remoteattackers to bypass input validation and crash the pro-gram. The libtiff buffer overflow vulnerability leadsto an out of bounds loop limit via a malformed gif image.Both boa and thttpd write data to a log file without san-itizing non-printable characters, which may be exploitedto execute arbitrary commands. Since we have detailed

Table 3: Malware samples and taint graphs.

Sample TypeTaint Graph Control Flow

Node # Edge # ObfuscationSvat Virus 90 62RST Virus 154 82Agent Rootkit 624 402 XKeyLogger Trojan 554 368 XSubsevux Backdoor 1648 764 XTsunami Backdoor 734 534 XKeitan Backdoor 618 482 XFireback Backdoor 1038 620 X

vulnerability reports, we can easily mark the locations oftaint sinks in the straight-line code and set correspondingtaint policies.

In our evaluation, TaintPipe did not generate any falsepositives and successfully identified taint policy viola-tions while incurring only small overhead. At the sametime, we evaluated the accuracy of TaintPipe. To thisend, we counted the total number of tainted bytes in thetaint state when taint analysis hit the taint sinks. Col-umn 4 ∼ 6 of Table 2 show the number of taint byteswhen running libdft, Temu [44] and TaintPipe, respec-tively. Compared with the inlined dynamic taint anal-ysis tools (libdft and Temu), TaintPipe’s symbolic taintanalysis achieves almost the same results in 8 cases andintroduces only a few additional taint bytes in the other4 cases. We attribute this to our conservative approachto handling of symbolic memory indices. The evalu-ation data show that TaintPipe does not result in over-tainting [32] and rivals the inlined dynamic taint analysisat the same level of precision.

Table 4: Cryptographic function detection time.Algorithm TaintPipe (s) Temu (s)

TEA 3.8 (<1.1X) 15.2 (2.2X)AES-CBC 12.3 (1.2X) 125.6 (3.8X)Blowfish 4.5 (<1.1X) 21.4 (2.5X)

MD5 7.4 (<1.1X) 35.1 (2.6X)SHA-1 8.8 (1.1X) 40.2 (3.3X)

Generating Taint Graphs for Malware. We ran 8malware samples collected from VX Heavens1 withTaintPipe.2 Similar to Panorama [43], we tracked infor-mation flow and generated a taint graph for each sam-ple. In a taint graph, nodes represent taint seeds or in-structions operating on taint data, and a directed edgeindicates an explicit data flow dependency between twonodes. Taint graph faithfully describes intrinsic mali-cious intents, which can be used as malware specificationto detect suspicious samples [12]. The statistics of ourtesting results are presented in Table 3. It is worth notingthat 6 out of 8 malware samples are applied with vari-ous control flow obfuscation methods (the fifth column),such as opaque predicates, control flow flattening, obfus-cated control transfer targets, and call stack tampering.As a result, the control flow graphs are heavily cluttered.For example, malware samples Keitan and Fireback

have a relatively high ratio of indirect jumps (e.g., jmpeax). Typically it is hard to precisely infer the destina-tion of an indirect jump statically. Thus, the taint logicoptimization methods that rely on accurate control flowgraph [17, 18] will fail. In contrast, our approach doesnot rely on control flow graph and therefore we analyzedthese obfuscated malware samples smoothly.

Cryptography Function Detection. Malware authorsoften use cryptography algorithms to encrypt maliciouscode, sensitive data, and communication. Detectingcryptography functions facilitates malware analysis andforensics. Recent work explored the avalanche effectto quickly identify possible cryptography functions byobserving the input-output dependency with multi-tagtaint analysis. That is, each byte in the encrypted mes-sage is dependent on almost all bytes of input data orkey [7, 21, 48]. However, multi-tag dynamic taint anal-ysis normally has to sacrifice more shadow memory andimposes much higher runtime overhead than single-tagdynamic taint analysis. Recall that multi-tag propagationis handled transparently in TaintPipe. In this experiment,we applied TaintPipe to detect such avalanche effects inbinary code. We utilized the test case suite of Crypto++

1http://vxheaven.org2All these 8 samples are not packed. To analyze packed binaries, we

can start TaintPipe when the unpacking procedure arrives at the originalentry point.

library3 and tested 5 cryptography algorithms. Each byteof the plain messages was labeled as a different taint tag.We compared TaintPipe with Temu [44], which supportsmultiple byte-to-byte taint propagation as well.4 The de-tection time is shown in Table 4. We also reported the ra-tio of multi-tag’s running time to single-tag’s. The resultsshow that TaintPipe is able to detect cryptographic func-tions with little additional overhead (less than 1.1X onaverage), while Temu’s multi-tag propagation imposes asignificant slowdown (2.9X to single-tag propagation onaverage).

6 Discussions and Limitations

Since TaintPipe’s pipelining design leads to an asyn-chronous taint check, TaintPipe may detect a violationof taint policy after the real attack happens. One possiblesolution is to provide synchronous policy enforcement atcritical points (e.g., indirect jump and system call sites).In that case, we can explicitly suspend the applicationthread, and wait for the worker threads to complete. Ourcurrent design spawns worker threads in the same pro-cess of running both Pin and the application. In the fu-ture, we plan to replace the worker threads with differentprocesses to increase isolation.

As TaintPipe may perform symbolic taint analysiswhen explicit taint states are not available, TaintPipe ex-hibits similar limitations as symbolic execution of bi-naries. Recent work MAYHEM [8] proposes an ad-vanced index-based memory model to deal with sym-bolic memory index. We plan to extend our symbolicmemory index handling in the future. TaintPipe recoversthe straight-line code by logging basic block entry ad-dress. However, with malicious self-modifying code, theentry address may not uniquely identify a code block. Toaddress this issue, we can augment TaintPipe by loggingthe real executed instructions at the expense of runtimeperformance overhead.

Our focus is to demonstrate the feasibility of pipelinedsymbolic taint analysis. We have not fully optimizedthe symbolic taint analysis part which we believe canbe greatly improved in terms of performance based onour current prototype. As our taint analysis engine simu-lates the semantics of taint operations, the speed of taintanalysis is slow. One future direction is to execute con-crete taint analysis natively like micro execution [16] andswitch to the interpretation-style when performing sym-bolic taint analysis. Currently TaintPipe requires largeshare memory to reduce communication overhead be-tween different pipeline stages. Therefore, our approachis more suitable for large servers with sufficient memory.

3http://www.cryptopp.com/4libdft does not support multi-tag taint analysis.

7 Related Work

In this section we first present previous work on staticand dynamic taint analysis. Our work is a hybrid ofthese two analyses. Then we introduce previous effortson taint logic code optimization, which benefits our taintoperation generation. Finally, we describe recent workon decoupling taint tracking logic from original programexecution, which is the closest to TaintPipe’s method.

Static and Dynamic Taint Analysis. Since static taintanalysis (STA) is performed prior to execution by con-sidering all possible execution paths, it does not affectapplication runtime performance. STA has been appliedto data lifetime analysis for Android applications [1], ex-ploit code detection [36], and binary vulnerability test-ing [28]. Dynamic taint analysis (DTA) is more pre-cise than static taint analysis as it only propagates taintfollowing the real path taken at run time. DTA hasbeen widely used in various security applications, includ-ing data flow policy enforcement [25, 40, 27], revers-ing protocol data structures [33, 38, 6], malware anal-ysis [39] and Android security [14]. However, an in-trinsic limitation of DTA is its significant performanceslowdown. Schwartz et al. [32] formally defined the op-erational semantics for DTA and forward symbolic exe-cution (FSE). Our approach is in fact a hybrid of thesetechniques. Worker thread conducts concrete taint anal-ysis (like DTA) whenever explicit taint information isavailable; otherwise symbolic taint analysis (like STAand FSE) is performed.

Taint Logic Optimization. Taint logic code, decidingwhether and how to propagate taint, require additional in-structions and “context switches”. Frequently executingtaint logic code incurs substantial overhead. Minemu [3]achieved a decent runtime performance at the cost of sac-rificing memory space to speed up shadow memory ac-cess. Moreover, Minemu utilized spare SSE registers toalleviate the pressure of general register spilling. As aresult, Minemu only worked on 32-bit program. Tain-tEraser [49] developed function summaries for Windowsprograms to propagate taint at function level. Libdft [20]introduced two guidelines to facilitate DBI’s code inlin-ing: 1) tag propagation code should have no branch; 2)shadow memory updates should be accomplished witha single assignment. Ruwase et al. [30] applied com-piler optimization techniques to eliminate redundant taintlogic code in hot paths. Jee et al. [19] proposed TaintFlow Algebra to summarize the semantics of taint logicfor basic blocks. All these efforts to generate optimizedtaint logic code are orthogonal and complementary toTaintPipe.

Decoupling Dynamic Taint Analysis. A numberof researchers have considered the high performancepenalty imposed by inlined dynamic taint analysis. Theyproposed various solutions to decouple taint trackinglogic from application under examination [24, 31, 26, 15,17, 9], which are close in spirit to our proposed approach.Speck [26] forked multiple taint analysis processes fromapplication execution to spare cores by means of specula-tive execution, and utilized record/replay to synchronizetaint analysis processes. Speck required OS level supportfor speculative execution and rollback. Speck’s approachsacrifices processing power to achieve acceleration. Sim-ilar to TaintPipe’s segmented symbolic taint analysis,Ruwase et al. [31] proposed symbolic inheritance track-ing to parallelize dynamic taint analysis. TaintPipe dif-fers from Ruwase et al.’s approach in three ways: 1)Their approach was built on top of a log-based archi-tecture [10] for efficient communication with idle cores,while TaintPipe works on commodity multi-core hard-ware directly. 2) To achieve better parallelization, theyadopted a relaxed taint propagation policy to set a bi-nary operation as untainted, while TaintPipe performsfull-fledged taint propagation so that we provide strongersecurity guarantees. 3) They used a separate “master”processor to update each segment’s taint status sequen-tially, while TaintPipe resolves symbolic taint states be-tween two consecutive segments. Our approach couldachieve better performance when there are more than afew “worker” processors.

Software-only approaches [15, 17, 9] are the most re-lated to TaintPipe. They decouple dynamic taint anal-ysis to a shadow thread by logging the runtime valuesthat are needed for taint analysis. However, as we havepointed out, these methods [15, 9] may suffer from highoverhead of frequent communication between the appli-cation thread and shadow thread. Recent work Shad-owReplica [17] ameliorates this drawback by adoptingfine-grained offline optimizations to remove redundanttaint logic code. In principle, it is possible to removeredundant taint logic by means of static offline optimiza-tions. Unfortunately, even static disassembly of strippedbinaries is still a challenge [22, 35]. Therefore, theassumption by ShadowReplica that an accurate controlflow graph can be constructed may not be feasible in cer-tain scenarios, such as analyzing control flow obfuscatedsoftware. We take a different angle to address this issuewith lightweight runtime information logging and seg-mented symbolic taint analysis. We demonstrate the ca-pability of TaintPipe in speeding up obfuscated binaryanalysis, which ShadowReplica may not be able to han-dle. Furthermore, ShadowReplica does not support bit-level and multi-tag taint analysis, while TaintPipe han-dles them naturally.

8 Conclusion

We have presented TaintPipe, a novel tool for pipelin-ing dynamic taint analysis with segmented symbolic taintanalysis. Different from previous parallelization work ontaint analysis, TaintPipe uses a pipeline style that relieson straight-line code with very few runtime values, en-abling lightweight online logging and much lower run-time overhead. We have evaluated TaintPipe on a num-ber of benign and malicious programs. The results showthat TaintPipe rivals conventional inlined dynamic taintanalysis in precision, but with a much lower online ex-ecution slowdown. The performance experiments indi-cate that TaintPipe can speed up dynamic taint analysisby 2.43 times on a set of common utilities and 2.38 timeson SPEC2006, respectively. Such experimental evidencedemonstrates that TaintPipe is both efficient and effectiveto be applied in real production environments.

9 Acknowledgments

We thank the Usenix Security anonymous reviewers andNiels Provos for their valuable feedback. This researchwas supported in part by the National Science Foun-dation (NSF) grants CNS-1223710 and CCF-1320605,and the Office of Naval Research (ONR) grant N00014-13-1-0175. Liu was also partially supported by AROW911NF-09-1-0525.

References[1] ARZT, S., RASTHOFER, S., FRITZ, C., BODDEN, E., BARTEL,

A., KLEIN, J., LE TRAON, Y., OCTEAU, D., AND MCDANIEL,P. FlowDroid: Precise context, flow, field, object-sensitive andlifecycle-aware taint analysis for android apps. In Proceedings ofthe 35th ACM SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI’14) (2014).

[2] BALAKRISHNAN, G., AND REPS, T. WYSINWYX: What YouSee Is Not What You eXecute. ACM transactions on program-ming languages and systems 32, 6 (2010).

[3] BOSMAN, E., SLOWINSKA, A., AND BOS, H. Minemu: Theworld’s fastest taint tracker. In Proceedings of the 14th Inter-national Symposium on Recent Advances in Intrusion Detection(RAID’11) (2011).

[4] BRUENING, D., GARNETT, T., AND AMARASINGHE, S. An in-frastructure for adaptive dynamic optimization. In Proceedingsof the 2003 international symposium on code generation and op-timization (CGO’03) (2003).

[5] BRUMLEY, D., JAGER, I., AVGERINOS, T., AND SCHWARTZ,E. J. BAP: A binary analysis platform. In Proceedings ofthe 23rd international conference on computer aided verification(CAV’11) (2011).

[6] CABALLERO, J., POOSANKAM, P., KREIBICH, C., AND SONG,D. Dispatcher: Enabling active botnet infiltration using automaticprotocol reverse-engineering. In Proceedings of the 16th ACMConference on Computer and Communication Security (CCS’09)(2009).

[7] CABALLERO, J., POOSANKAM, P., MCCAMANT, S., BABI C,D., AND SONG, D. Input generation via decomposition andre-stitching: Finding bugs in malware. In Proceedings of the17th ACM Conference on Computer and Communications Secu-rity (CCS’10) (2010).

[8] CHA, S. K., AVGERINOS, T., REBERT, A., AND BRUMLEY, D.Unleashing mayhem on binary code. In Proceedings of the 2012IEEE Symposium on Security and Privacy (2012).

[9] CHABBI, M., PERIYANAYAGAM, S., ANDREWS, G., AND DE-BRAY, S. Efficient dynamic taint analysis using multicore ma-chines. Tech. rep., The University of Arizona, May 2007.

[10] CHEN, S., GIBBONS, P. B., KOZUCH, M., AND MOWRY, T. C.Log-based architectures: Using multicore to help software be-have correctly. ACM SIGOPS Operating Systems Review 45, 1(2011), 84–91.

[11] CHENG, W., ZHAO, Q., YU, B., AND HIROSHIGE, S. Taint-Trace: Efficient flow tracing with dynamic binary rewriting. InProceedings of the 11th IEEE Symposium on Computers andCommunications (ISCC’06) (2006).

[12] CHRISTODORESCU, M., JHA, S., AND KRUEGEL, C. Miningspecifications of malicious behavior. In Proceedings of the 6thJoint Meeting of the European Software Engineering Conferenceand the ACM SIGSOFT Symposium on The Foundations of Soft-ware Engineering (ESEC-FSE’07) (2007).

[13] CLAUSE, J., LI, W. P., AND ORSO, A. Dytan: A generic dy-namic taint analysis framework. In Proceedings of the ACM SIG-SOFT International Symposium on Software Testing and Analysis(ISSTA 2007) (2007).

[14] ENCK, W., GILBERT, P., CHUN, B.-G., COX, L. P., JUNG,J., MCDANIEL, P., AND SHETH, A. N. TaintDroid: Aninformation-flow tracking system for realtime privacy monitoringon smartphones. In Proceedings of the 2010 USENIX Symposiumon Operating Systems Design and Implementation (OSDI’10),(2010).

[15] ERMOLINSKIY, A., KATTI, S., SHENKER, S., FOWLER, L. L.,AND MCCAULEY, M. Towards practical taint tracking. Tech.rep., EECS Department, University of California, Berkeley, Jun2010.

[16] GODEFROID, P. Micro execution. In Proceedings of the 36thInternational Conference on Software Engineering (ICSE’14)(2014).

[17] JEE, K., KEMERLIS, V. P., KEROMYTIS, A. D., AND POR-TOKALIDIS, G. ShadowReplica: Efficient parallelization of dy-namic data flow tracking. In Proceedings of the 2013 ACMSIGSAC conference on Computer & communications security(CCS’13) (2013).

[18] JEE, K., PORTOKALIDIS, G., KEMERLIS, V. P., GHOSH, S.,AUGUST, D. I., AND KEROMYTIS, A. D. A general approachfor efficiently accelerating software-based dynamic data flowtracking on commodity hardware. In Proceedings of the 19thInternet Society (ISOC) Symposium on Network and DistributedSystem Security (NDSS) (2012).

[19] JEE, K., PORTOKALIDIS, G., KEMERLIS, V. P., GHOSH, S.,AUGUST, D. I., AND KEROMYTIS, A. D. A general approachfor efficiently accelerating software-based dynamic data flowtracking on commodity hardware. In Proceedings of the 2012Network and Distributed System Security Symposium (NDSS’12)(2012).

[20] KEMERLIS, V. P., PORTOKALIDIS, G., JEE, K., ANDKEROMYTIS, A. D. libdft: Practical dynamic data flow track-ing for commodity systems. In Proceedings of the 8th ACMSIGPLAN/SIGOPS International Conference on Virtual Execu-tion Environments (VEE’12) (2012).

[21] LI, X., WANG, X., AND CHANG, W. CipherXRay: Exposingcryptographic operations and transient secrets from monitored bi-nary execution. IEEE Transactions on Dependable and SecureComputing 11, 2 (2014).

[22] LINN, C., AND DEBRAY, S. Obfuscation of executable code toimprove resistance to static disassembly. In Proceedings of the10th ACM Conference on Computer and Communications Secu-rity (CCS’03) (2003).

[23] LUK, C.-K., COHN, R., MUTH, R., PATIL, H., KLAUSER, A.,LOWNEY, G., WALLACE, S., REDDI, V. J., AND HAZELWOOD,K. Pin: building customized program analysis tools with dy-namic instrumentation. In Proceedings of the 2005 ACM SIG-PLAN conference on Programming language design and imple-mentation (PLDI’05) (2005).

[24] NAGARAJAN, V., KIM, H.-S., WU, Y., AND GUPTA, R. Dy-namic information flow tracking on multicores. In Proceedings ofthe 2008 Workshop on Interaction between Compilers and Com-puter Architectures (2008).

[25] NEWSOME, J., AND SONG, D. Dynamic taint analysis for auto-matic detection, analysis, and signature generation of exploits oncommodity software. In Proceedings of the 2005 Network andDistributed System Security Symposium (NDSS’05) (2005).

[26] NIGHTINGALE, E. B., PEEK, D., CHEN, P. M., AND FLINN,J. Parallelizing security checks on commodity hardware. InProceedings of the 13th International Conference on Architec-tural Support for Programming Languages and Operating Sys-tems (ASPLOS’08) (2008).

[27] QIN, F., WANG, C., LI, Z., SEOP KIM, H., ZHOU, Y., ANDWU, Y. LIFT: A low-overhead practical information flow track-ing system for detecting security attacks. In Proceedings of the39th Annual IEEE/ACM International Symposium on Microar-chitecture (MICRO’06) (2006).

[28] RAWAT, S., MOUNIER, L., AND POTET, M.-L. Static taint-analysis on binary executables. http://stator.imag.fr/

w/images/2/21/Laurent_Mounier_2013-01-28.pdf, Oc-tober 2011.

[29] RENIERIS, M., RAMAPRASAD, S., AND REISS, S. P. Arith-metic program paths. In Proceedings of the 10th European soft-ware engineering conference held jointly with 13th ACM SIG-SOFT international symposium on Foundations of software engi-neering (ESEC/FSE-13) (2005).

[30] RUWASE, O., CHEN, S., GIBBONS, P. B., AND MOWRY, T. C.Decoupled lifeguards: Enabling path optimizations for onlinecorrectness checking tools. In Proceedings of the ACM SIGPLAN2010 Conference on Programming Language Design and Imple-mentation (PLDI’10) (2010).

[31] RUWASE, O., GIBBONS, P. B., MOWRY, T. C., RAMACHAN-DRAN, V., CHEN, S., KOZUCH, M., AND RYAN, M. Paralleliz-ing dynamic information flow tracking lifeguards. In Proceed-ings of the 20th ACM Symposium on Parallelism in Algorithmsand Architectures (SPAA’08) (2008).

[32] SCHWARTZ, E. J., AVGERINOS, T., AND BRUMLEY, D. All youever wanted to know about dynamic taint analysis and forwardsymbolic execution (but might have been afraid to ask). In Pro-ceedings of the 2010 IEEE Symposium on Security and Privacy(2010).

[33] SLOWINSKA, A., STANCESCU, T., AND BOS, H. Howard: Adynamic excavator for reverse engineering data structures. InProceedings of the 2011 Network and Distributed System Secu-rity Symposium (NDSS’11) (2011).

[34] VACHHARAJANI, N., BRIDGES, M. J., CHANG, J., RANGAN,R., OTTONI, G., BLOME, J. A., REIS, G. A., VACHHARAJANI,M., AND AUGUST, D. I. RIFLE: An architectural framework

for user-centric information-flow security. In Proceedings of the37th Annual IEEE/ACM International Symposium on Microar-chitecture (MICRO’37) (2004).

[35] WANG, S., WANG, P., AND WU, D. Reassembleable disassem-bling. In Proceedings of the 24th USENIX Security Symposium(2015), USENIX Association.

[36] WANG, X., JHI, Y.-C., ZHU, S., AND LIU, P. STILL: Ex-ploit code detection via static taint and initialization analyses. InProceedings of the 24th Annual Computer Security ApplicationsConference (ACSAC’08) (2008).

[37] WHELAN, R., LEEK, T., AND KAELI, D. Architecture-independent dynamic information flow tracking. In Proceedingsof the 22nd international conference on Compiler Construction(CC’13) (2013).

[38] WONDRACEK, G., COMPARETTI, P. M., KRUEGEL, C., ANDKIRDA, E. Automatic network protocol analysis. In Proceed-ings of the 15th Annual Network and Distributed System SecuritySymposium (NDSS’08) (2008).

[39] XU, M., MALYUGIN, V., SHELDON, J., VENKITACHALAM,G., AND WEISSMAN, B. ReTrace: Collecting execution tracewith virtual machine deterministic replay. In Proceedings ofthe 2007 Workshop on Modeling, Benchmarking and Simulation(2007).

[40] XU, W., BHATKAR, S., AND SEKAR, R. Taint-enhanced pol-icy enforcement: A practical approach to defeat a wide range ofattacks. In Proceedings of the 15th Conference on USENIX Secu-rity Symposium (USENIX’06) (2006).

[41] YADEGARI, B., AND DEBRAY, S. Bit-level taint analysis. InProceedings of the 14th IEEE International Working Conferenceon Source Code Analysis and Manipulation (2014).

[42] YADEGARI, B., JOHANNESMEYER, B., WHITELY, B., ANDDEBRAY, S. A generic approach to automatic deobfuscation ofexecutable code. In Proceedings of the 36th IEEE Symposium onSecurity and Privacy (2015).

[43] YIN, H., AMD M. EGELE, D. S., KRUEGEL, C., AND KIRDA,E. Panorama: Capturing system-wide information flow for mal-ware detection and analysis. In ACM Conference on Computerand Communications Security (CCS’07) (2007).

[44] YIN, H., AND SONG, D. TEMU: Binary code analysisvia whole-system layered annotative execution. Tech. Rep.UCB/EECS-2010-3, EECS Department, University of California,Berkeley, Jan 2010.

[45] YIP, A., WANG, X., ZELDOVICH, N., AND KAASHOEK, M. F.Improving application security with data flow assertions. In Pro-ceedings of the ACM SIGOPS 22nd symposium on Operating sys-tems principles (2009), ACM, pp. 291–304.

[46] ZHAO, Q., CUTCUTACHE, I., AND WONG, W.-F. PiPA:Pipelined profiling and analysis on multi-core systems. In Pro-ceedings of the 2008 International Symposium on Code Genera-tion and Optimization (CGO’08) (2009).

[47] ZHAO, Q., SIM, J. E., RUDOLPH, L., AND WONG, W.-F. DEP:Detailed execution profile. In Proceedings of the 15th Inter-national Conference on Parallel Architectures and CompilationTechniques (PACT’06) (2006).

[48] ZHAO, R., GU, D., LI, J., AND ZHANG, Y. Automatic detectionand analysis of encrypted messages in malware. In Proceedingsof the 9th China International Conference on Information Secu-rity and Cryptology (INSCRYPT’13) (2013).

[49] ZHU, D. Y., JUNG, J., SONG, D., KOHNO, T., AND WETHER-ALL, D. TaintEraser: Protecting sensitive data leaks usingapplication-level taint tracking. ACM SIGOPS Operating Sys-tems Review 45 (January 2011), 142–154.

taintpipe: pipelined symbolic taint analysis - faculty pipelined symbolic taint analysis jiang ming,...

Documents