Transcript
Page 1: Precise Dynamic Data Flow Tracking with Proximal Gradients · 2020-04-30 · Huffman decoding index in gzip decompression. taint source: x taint sink: y // x is a 4 byte int int x

Gabriel Ryan1, Abhishek Shah1, Dongdong She1, Koustubha Bhat2, Suman Jana1

1Columbia University 2Vrije [email protected], [email protected], [email protected], [email protected], [email protected]

References:[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature 521.7553 (2015): 436.[2] Selvaraju, Ramprasaath R., et al. "Grad-cam: Visual explanations from deep networks via gradient-based localization." Proceedings of the IEEE International Conference on Computer Vision. 2017.[3] Parikh, Neal, and Stephen Boyd. "Proximal algorithms." Foundations and Trends® in Optimization 1.3 (2014): 127-239.

Gradient provides more precise and meaningful information about programbehavior than taint based dataflow tracking.

Gradient can be used toguide searches over large input spaces.

Evaluation:- Achieved up to 39% better dataflow precision than

state of the art method (Taint Analysis) on 7 programs.

- Overhead is comparable to Taint Analysis on average (<5%).- Example application: Dataflow guided fuzzing.

- Increase of 57% edge coverage on 2 programs

Figure 1. Gradient avoids approximation errorsand identifies where x has the greatest effect on y.

Precise Dynamic Data Flow Tracking with Proximal Gradients

Problem: Programs are not Differentiable

Solution: Nonsmooth Optimization

Programs contain nonsmooth operations like Bitwise And,making it impossible to compute gradients normally.

Proximal Gradients are an application of nonsmoothoptimization methods that base the gradient on the localminimum of a function.3

int f(int x) return x & 4;

x=6

Local gradient is flat!

x=6 x=6

Proximal Minimization Proximal Gradient

Proximal Operator finds nearby minimum

Proximal Gradient is computed from nearby minimum

inpu

t by

te

decoding byte

Gradient Analysis identifies which input bytes are used to look up each Huffman decoding index in gzipdecompression.

taint source: xtaint sink: y// x is a 4 byte intint x = 0x12345678;for (int i=0; i<6; i++) y[i] = x; x = x<<8;

Taint to y from x

Gradient of y wrt. x

y1 1 1 1 1 1

1 256 2^16 2^24 0 0gradient

input[0]

input[1]

Error!

y = 2 * x

y_shad = alloc_shadow() y_grad = gradient(2 * x)

Application Memory

Gradient Table

Application Code

Shadow MemoryInstrumentation Code

Implementation:- Implemented as LLVM Sanitizer, based on DataFlowSanitizer- Each address has associated shadow memory with label- Each operation is instrumented to propagate gradient

Top Related