[ieee 2011 third international conference on communications and mobile computing (cmc) - qingdao,...

4
Efficient Taint Analysis with Taint Behavior Summary Ruoyu Zhang, Shan Huang, Zhengwei Qi Software Engineering School of Shanghai Jiao Tong University SE of SJTU Shanghai, China Email: {holmeszry, babalaka, qizhwei}@sjtu.edu.cn Abstract—Software security has drawn much attention re- cently. As an effective approach to detect software vulnerabili- ties and improve software security, dynamic taint analysis has been frequently researched in the last few years. In this paper, we implement a dynamic taint tracking system LTTS. In order to address the efficiency problem which exists in many of the current taint tracking systems, we also propose a taint behavior summary mechanism to optimize the system. According to our experiments, LTTS has achieved relative high efficiency compared to the existing techniques. Keywords-taint analysis; dynamic binary analysis; I. I NTRODUCTION In the recent decades, there has been growing concerns with software security which is threatened by the software vulnerabilities. By IETF, software vulnerability is defined as “a flaw or weakness in a system’s design, implementation, or operation and management that could be exploited to violate the system’s security policy”. For instance, attacker can exploit the buff overflow vulnerability by causing an overflow on the stack, and overwrite the content reserved by the system. The attacker is able to control the procedure by changing the return address using the stack overflow. There have been many researches on the detection of the software vulnerabilities[1][2]. The approaches can be generally categorized as static methods or dynamic ones. In most cases, static analysis is applied on the source code, and the dynamic approach aims at the binary. Since the source code of many software is hard to acquire, dynamic analysis is widely used in the software vulnerability detection. Dynamic taint analysis is an effective method in the dynamic analysis for software vulnerability detection. The approach is based on the concept that any variable which can be influenced by an outside user is a potential threat to the program and should be monitored. In the beginning, certain memory blocks and registers which load outside data are marked as the source of tainting. During the execution of the program, the taint analyzer traces the taint propagation by marking every memory address and register influenced by the tainted data. There has been much effort on the dynamic taint analysis[3], most of which suffer from high overhead. Dy- tan[4] is a generic dynamic taint analysis framework, which provides a customizable mechanism for the taint tracing, and its overhead is 50 times. Panorama[5] and LIFT [6] are also presented for the same purpose and slow down the target program by 20 times and 3.6 times respectively. The overhead is an important parameter for the runtime method, and the defects in efficiency have greatly limited the usage of this technique. According to the common observation, large part of the binary codes in software are loaded from system library such as kernel32.dll, USER32.dll and ntdll.dll. The behavior of these modules is predictable, so it’s not necessary to check every instruction in them. We propose an solution to optimize the taint analysis technique by summarizing the taint effects of the library functions and idioms. The experiment result shows great reduction of the runtime overhead. In this paper, we implement a systematic dynamic binary analysis based approach to apply taint analysis for software vulnerability detection. Aiming at the defect of low efficien- cy in current techniques, we improve our system with taint behavior summary, which will be described in detail in the next section. Our contributions include: Implement a system for dynamic taint analysis We implement LTTS, a low-overhead taint tracing system for binary code. The system marks the taint source, monitors the propagation of the tainted data, and generates alarms when the tainted data are involved in suspicious behavior. Present a mechanism to optimize the efficiency of taint analysis To reduce the overhead of the taint tracing techniques, this paper proposes a method of taint behavior summa- ry. According to our experiments, it significantly lowers the runtime overhead of the system. II. SYSTEM DESIGN In this section, the design of the whole system will be introduced in the first place. In addition, we also describe the details of our approach including the implementation of the taint analyzer and the mechanism of taint behavior summary. 2011 Third International Conference on Communications and Mobile Computing 978-0-7695-4357-4/11 $26.00 © 2011 IEEE DOI 10.1109/CMC.2011.76 11

Upload: zhengwei

Post on 28-Mar-2017

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: [IEEE 2011 Third International Conference on Communications and Mobile Computing (CMC) - Qingdao, China (2011.04.18-2011.04.20)] 2011 Third International Conference on Communications

Efficient Taint Analysis with Taint Behavior Summary

Ruoyu Zhang, Shan Huang, Zhengwei QiSoftware Engineering School of Shanghai Jiao Tong University

SE of SJTUShanghai, China

Email: {holmeszry, babalaka, qizhwei}@sjtu.edu.cn

Abstract—Software security has drawn much attention re-cently. As an effective approach to detect software vulnerabili-ties and improve software security, dynamic taint analysis hasbeen frequently researched in the last few years. In this paper,we implement a dynamic taint tracking system LTTS. In orderto address the efficiency problem which exists in many of thecurrent taint tracking systems, we also propose a taint behaviorsummary mechanism to optimize the system. According toour experiments, LTTS has achieved relative high efficiencycompared to the existing techniques.

Keywords-taint analysis; dynamic binary analysis;

I. INTRODUCTION

In the recent decades, there has been growing concernswith software security which is threatened by the softwarevulnerabilities. By IETF, software vulnerability is defined as“a flaw or weakness in a system’s design, implementation,or operation and management that could be exploited toviolate the system’s security policy”. For instance, attackercan exploit the buff overflow vulnerability by causing anoverflow on the stack, and overwrite the content reservedby the system. The attacker is able to control the procedureby changing the return address using the stack overflow.

There have been many researches on the detection ofthe software vulnerabilities[1][2]. The approaches can begenerally categorized as static methods or dynamic ones. Inmost cases, static analysis is applied on the source code, andthe dynamic approach aims at the binary. Since the sourcecode of many software is hard to acquire, dynamic analysisis widely used in the software vulnerability detection.

Dynamic taint analysis is an effective method in thedynamic analysis for software vulnerability detection. Theapproach is based on the concept that any variable whichcan be influenced by an outside user is a potential threatto the program and should be monitored. In the beginning,certain memory blocks and registers which load outside dataare marked as the source of tainting. During the execution ofthe program, the taint analyzer traces the taint propagationby marking every memory address and register influencedby the tainted data.

There has been much effort on the dynamic taintanalysis[3], most of which suffer from high overhead. Dy-tan[4] is a generic dynamic taint analysis framework, whichprovides a customizable mechanism for the taint tracing,

and its overhead is 50 times. Panorama[5] and LIFT[6] arealso presented for the same purpose and slow down thetarget program by 20 times and 3.6 times respectively. Theoverhead is an important parameter for the runtime method,and the defects in efficiency have greatly limited the usageof this technique.

According to the common observation, large part of thebinary codes in software are loaded from system librarysuch as kernel32.dll, USER32.dll and ntdll.dll. The behaviorof these modules is predictable, so it’s not necessary tocheck every instruction in them. We propose an solutionto optimize the taint analysis technique by summarizingthe taint effects of the library functions and idioms. Theexperiment result shows great reduction of the runtimeoverhead.

In this paper, we implement a systematic dynamic binaryanalysis based approach to apply taint analysis for softwarevulnerability detection. Aiming at the defect of low efficien-cy in current techniques, we improve our system with taintbehavior summary, which will be described in detail in thenext section.

Our contributions include:• Implement a system for dynamic taint analysis

We implement LTTS, a low-overhead taint tracingsystem for binary code. The system marks the taintsource, monitors the propagation of the tainted data,and generates alarms when the tainted data are involvedin suspicious behavior.

• Present a mechanism to optimize the efficiency of taintanalysisTo reduce the overhead of the taint tracing techniques,this paper proposes a method of taint behavior summa-ry. According to our experiments, it significantly lowersthe runtime overhead of the system.

II. SYSTEM DESIGN

In this section, the design of the whole system will beintroduced in the first place. In addition, we also describethe details of our approach including the implementationof the taint analyzer and the mechanism of taint behaviorsummary.

2011 Third International Conference on Communications and Mobile Computing

978-0-7695-4357-4/11 $26.00 © 2011 IEEE

DOI 10.1109/CMC.2011.76

11

Page 2: [IEEE 2011 Third International Conference on Communications and Mobile Computing (CMC) - Qingdao, China (2011.04.18-2011.04.20)] 2011 Third International Conference on Communications

LTTS

Report

Binary

Stream

Dynamic

Binary

Monitor

Taint Tracker

Taint Source

Recognizer

Function

Recognizer

Function

Analyzer

Malicious

Behavior

Asserter

Taint Summary

Database

Optimize Initialize

Figure 1. System overview.

A. System Overview

LTTS is built on the binary instrument platform DynamoR-IO[7], a runtime code manipulation system that support-s code transformations on target program. The platformprovides efficient instrumentation of binary by using codecaching technology. Thanks to the facilitation of DynamoR-IO, LTTS realizes the taint analysis with high efficiency.

Figure 1 provides an overview of LTTS. The systemmonitors the binary code stream with the binary instrumentplatform DynamoRIO, when the input of the outside user hasbeen loaded into the target program, the taint source recog-nizer will detect the tainted memory blocks and initializethe taint tracking procedure. During the taint tracking, themalicious behavior asserter checks the propagation of thetainted data, and generates the report.

The main part of the taint behavior summary is the func-tion call analysis. The general conception of this techniqueis to predict the taint propagation without concrete tainttracking in identifiable functions. The function recognizeris implemented to detect the calls to the libraries, and thenthe function analyzer is able to provide the taint behaviorinformation according to the taint summary database.

B. Taint Source

Our approach is customized on the definition of the taintsource. By default, LTTS marks all data which is introducedto the target program by the outside users as untrustful one.For example, since the malicious users can exploit bufferoverflow vulnerabilities by manipulating the input file forthe software, our system marks the location which loads theuser file as a source of taint. When a file is read to the buffer,the memory blocks of the buffer are registered in LTTS, andmonitored by the system.

C. Taint Tracking

After a tainted data is identified, the taint analysis engineis initialized to monitor each binary instruction which refersto the tainted data in order to track the behavior of thetaint propagation. In order to determine the effect of theinstructions, LTTS categorizes them into two sorts: (1)Instructions which can propagate the tainted data (LOAD,POP, PUSH etc.), (2) Instructions which do not affect thepropagation of the tainted data(NOP, JMP, etc.).

It’s worthy of noticing that there are special cases usedas idioms whose functions do not depend on the inputs. Forexample, “XOR EAX, EAX” always sets EAX to be zeroregardless of whether the original value in eax is tainted.LTTS recognizes these idioms, and makes the taint trackingmore efficient.

D. Malicious Behavior Assertion

Since the behavior of the tainted data is monitored, itis achievable for LTTS to detect the exploit on the soft-ware vulnerability. When processing the instructions, ourinstruction analyzer applies runtime analysis on it. In ourapproach, the system checks whether the tainted data is usedas a control transfer target, such as a jump target, returnaddress, or function pointer. According to our observation,many attacks attempt to overwrite one of these in order tointerfere with the control flow of the victim program. Infact, the tainted data is very rarely used as a control transfertarget in normal program, and the developers are not likelyto handle the control of their codes into data which comesfrom the user[4].

E. Mechanism of Taint Behavior Summary

In the previous part of this section, we provide the ap-proach of the baseline system. The system also suffers fromhigh runtime overhead. In order to address this problem, wepropose an approach to improve the efficiency by summarizethe taint behavior of certain part of the code, and determinethe taint propagation without processing every instruction.Our experiments show that this mechanism greatly reducesthe overhead of LTTS.

API summary is a key part of the approach. Because mostof the APIs in the system library can be determined withoutactually analyzing in every instance, we exam the code of

12

Page 3: [IEEE 2011 Third International Conference on Communications and Mobile Computing (CMC) - Qingdao, China (2011.04.18-2011.04.20)] 2011 Third International Conference on Communications

TS'TSTRn ! " (1)

TsrcTS'TS)TRSrc,Dst,Dst(TR nn !"#$%&'()* (2)

}T{TS'TS

Dst)TRSrc,Dst,Dst(TR

Dst

nn

!

"#$%&"' (( (3)

Figure 2. The formulas of the taint summary algorithm. TR stands for thetaint relation in the target function, which contains taint vectors in the formof < Dst, Src >. TS is the set of the tainted data in the system scale,which records all the locations which are tainted, and should be monitored.

these APIs in the first place, gather the information of theirtaint propagation. By this method, when the apis are called,LTTS doesn’t have to analyze every instruction to track thetaint data.

According to our design, the code which acts with certainfunction can also be summarized, and referred as idioms. Forexample, the binary codes which initialize the stack for thefunctions are quite similar, and the behaviors of these idiomsare predictable, thus LTTS can summarize them and hencereduce the overhead of the system.

The formulas provide the general principles of the taintanalysis summary. (1) When a function doesn’t do anythingin the taint propagation, then the taint status of the programdoesn’t change. (2) When a function can eliminate thetainted data, then the taint status of the program will removethe source parameter from the set of tainted data. (3) When afunction can propagate the tainted data, then the destinationof the tainting should be marked as tainted and brought intothe monitoring of the system.

III. EVALUATION

Because this paper has proposed a mechanism to optimizethe existing taint tracking techniques, the evaluation onthe efficiency of LTTS is provided. We also apply thesystem on the software vulnerability detection to evaluateits effectiveness.

A. Efficiency

This subsection provides the performance of LTTS andthe optimizations are described in the last section. Theexperiments is based on the SPEC CINT2006 benchmarkson Windows. The results of the experiments have shown thatthe taint behavior summary mechanism has greatly improvedthe efficiency of the system.

Figure 3 shows normalized execution time (the ratio ofour time to native execution time) of LTTS. Because of thefacilitation of the binary instrument platform DynamoRIOwhich also dynamically optimizes the execution of the targetprogram, LTTS achieve the low overhead of 3.14 timeson average without the taint behavior summary. When thesummary is introduced to the system, the time cost reduces

Table ISOFTWARE VULNERABILITIES DETECTION OF LTTS.

Target Exploitable Maybe exploitableMS Word 2003 0 1MS Word 2007 0 0

MS PowerPoint 2003 0 1MS PowerPoint 2007 0 0

Kingsoft wps 1 12justsystem ichitaro 3 1

hangul HWP 3 42Total 7 57

approximately by half, and the average runtime overhead is1.41 times that of the native execution.

It can be observed from figure 3 that the performanceof LTTS differs among the target programs. One reasonis the structural distinction between them. The size of theexe module of the program is an important factor in theperformance of our system. Since the library functions canbe summarized and the analyzing time spent on them isrelatively low. It’s also worth noticing that we haven’tsummarized the system library entirely. When the programcalls to the functions which haven’t been included in ourtaint behavior database, the overhead will increase.

B. Effectiveness

As a dynamic approach, LTTS is implemented at runtimeto detect the existing threaten to the target system. We ap-plied our approach on 7 real-world word processors(Kingsoftwps, justsystem ichitaro and hangul HWP are the word textprocessors for Chinese, Korean and Japanese respectively).

The target programs are running under the monitor ofLTTS. The results are presented in Table I. In general, 7exploitable vulnerabilities have been found, and another 57ones maybe threats.

IV. RELATED WORK

Binary analysis is an intuitive and effective way in pro-gram analysis, especially in the cases when the source codeof the target programs are not acquirable.

In order to monitor the behavior of the running programfor analysis, several tools are provided based on binary in-strument. We build our tools on the base of DynamoRIO[7].The platform presents a mechanism which dynamicallyremoves much of the interpreted overhead from languageimplementations and has achieved high efficiency. There arealso several other tools which can facilitate the dynamicbinary analysis. Pin[8] and Valgrind[9] also provide greatfacilitation on this field of application.

Dynamic taint analysis is very common in runtime bugdetection. Panorama[5] monitors the behavior of the targetcode, and it’s designed to detect integer bugs in malware.Minos[10] proposes hardware extension to perform runtimedata flow integrity check to detect attacks. The authors

13

Page 4: [IEEE 2011 Third International Conference on Communications and Mobile Computing (CMC) - Qingdao, China (2011.04.18-2011.04.20)] 2011 Third International Conference on Communications

Figure 3. The performance of LTTS. The measurement is normalized with “native” execution of the benchmarks in the SPEC CINT2006. “LTTS” and“LTTS NOT TBS” presents the execution time with and without taint behavior summary mechanism respectively.

perform checks at the whole-system level by modifyinghardwares and the OS. In contrast, LTTS doesn’t requiremodifications of hardwares or the OS. Furthermore, the tech-niques described in this paper can be applied in many typesof attack detections which are not addressed in Minos, suchas format string overflow and security-sensitive variablesoverwrite.

V. CONCLUSION

In this paper, we present a taint tracking system LTTS, andpropose a mechanism of taint behavior summary to optimizethe system in efficiency. The experiments have shown thatour approach greatly reduces the runtime overhead of thedynamic taint analysis. In the future, we will apply symbolicexecution in the field of dynamic taint analysis, and broadenthe application of the system.

REFERENCES

[1] Tielei Wang and Tao Wei and Zhiqiang Lin and Wei Zou,IntScope: Automatically Detecting Integer Overflow Vulnera-bility in X86 Binary Using Symbolic Execution. Network andDistributed System Security Symposium (NDSS), 2009.

[2] Dawn Xiaodong Song and David Brumley and Heng Yin andJuan Caballero and Ivan Jager and Min Gyung Kang andZhenkai Liang and James Newsome and Pongsin Poosankamand Prateek Saxena BitBlaze: A New Approach to ComputerSecurity via Binary Analysis. International Conference onInformation Systems Security (ICISS), 2008.

[3] Omer Tripp and Marco Pistoia and Stephen J. Fink and ManuSridharan and Omri Weisman, TAJ: effective taint analysisof web applications. Programming Language Design andImplementation (PLDI), 2009.

[4] James A. Clause and Wanchun Li and Alessandro Orso, Dytan:a generic dynamic taint analysis framework. Proceedingsof the 2007 international symposium on Software testing andanalysis (ISSTA), 2007.

[5] Heng Yin and Dawn Xiaodong Song and Manuel Egele andChristopher Kruegel and, Engin Kirda Panorama: capturingsystem-wide information flow for malware detection and anal-ysis. ACM Conference on Computer and CommunicationsSecurity (CCS), 2007.

[6] Feng Qin and Cheng Wang and Zhenmin Li and Ho-SeopKim and Yuanyuan Zhou and Youfeng Wu, LIFT: A Low-Overhead Practical Information Flow Tracking System forDetecting Security Attacks. Proceedings of the 39th Annu-al IEEE/ACM International Symposium on Microarchitecture(MICRO), 2006.

[7] Derek Bruening and Timothy Garnett and Saman AmarasingheAn infrastructure for adaptive dynamic optimization. In In-ternational Symposium on Code Generation and Optimization(CGO), 2003.

[8] Chi-Keung Luk and Robert Cohn and Robert Muth and HarishPatil and Artur Klauser and Geoff Lowney and Steven Wallaceand Vijay Janapa Reddi and Kim Hazelwood, Pin: Buildingcustomized program analysis tools with dynamic instrumenta-tion. PLDI, 2005.

[9] Nicholas Nethercote and Julian Seward, Valgrind: a frameworkfor heavyweight dynamic binary instrumentation. PLDI,2007.

[10] Jedidiah R. Crandall and Frederic T. Chong, Minos: Archi-tectural support for software security through control dataintegrity. International Symposium on Microarchitecture,2004.

14