jsai: a static analysis platform for javascriptbenh/research/papers/kashyap14jsai.pdf ·...

11
JSAI: A Static Analysis Platform for JavaScript Vineeth Kashyap Kyle Dewey Ethan A. Kuefner John Wagner Kevin Gibbons John Sarracino ? Ben Wiedermann ? Ben Hardekopf University of California Santa Barbara, USA ? Harvey Mudd College, USA ABSTRACT JavaScript is used everywhere from the browser to the server, including desktops and mobile devices. However, the cur- rent state of the art in JavaScript static analysis lags far be- hind that of other languages such as C and Java. Our goal is to help remedy this lack. We describe JSAI, a formally specified, robust abstract interpreter for JavaScript. JSAI uses novel abstract domains to compute a reduced prod- uct of type inference, pointer analysis, control-flow analysis, string analysis, and integer and boolean constant propaga- tion. Part of JSAI’s novelty is user-configurable analysis sen- sitivity, i.e., context-, path-, and heap-sensitivity. JSAI is designed to be provably sound with respect to a specific con- crete semantics for JavaScript, which has been extensively tested against a commercial JavaScript implementation. We provide a comprehensive evaluation of JSAI’s perfor- mance and precision using an extensive benchmark suite, including real-world JavaScript applications, machine gener- ated JavaScript code via Emscripten, and browser addons. We use JSAI’s configurability to evaluate a large number of analysis sensitivities (some well-known, some novel) and observe some surprising results that go against common wis- dom. These results highlight the usefulness of a configurable analysis platform such as JSAI. Categories and Subject Descriptors F.3.2 [Semantics of Programming Languages]: Pro- gram analysis General Terms Languages, Algorithms, Verification Keywords JavaScript Analysis, Abstract Interpretation 1. INTRODUCTION JavaScript is pervasive. While it began as a client-side webpage scripting language, JavaScript has grown hugely in Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. FSE ’14, November 16–22, 2014, Hong Kong, China Copyright 2014 ACM 978-1-4503-3056-5/14/11 ...$15.00. scope and popularity and is used to extend the functional- ity of web browsers via browser addons, to develop desktop applications (e.g., for Windows 8 [1]) and server-side appli- cations (e.g., using Node.js [2]), and to develop mobile phone applications (e.g., for Firefox OS [3]). JavaScript’s growing prominence means that secure, correct, maintainable, and fast JavaScript code is becoming ever more critical. Static analysis traditionally plays a large role in providing these characteristics: it can be used for security auditing, error- checking, debugging, optimization, program understanding, refactoring, and more. However, JavaScript’s inherently dy- namic nature and many unintuitive quirks cause great diffi- culty for static analysis. Our goal is to overcome these difficulties and provide a formally specified, well-tested static analysis platform for JavaScript, immediately useful for many client analyses such as those listed above. In fact, we have used JSAI in previous work to build a security auditing tool for browser addons [35] and to experiment with strategies to improve analysis preci- sion [36]. We have also used JSAI to build a static program slicing [48] client and to build a novel abstract slicing [49] client. These are only a few examples of JSAI’s usefulness. Several important characteristics distinguish JSAI from existing JavaScript static analyses (which are discussed fur- ther in Section 2): JSAI is formally specified. We base our analysis on formally specified concrete and abstract JavaScript se- mantics. The two semantics are connected using ab- stract interpretation; we have soundness proof sketches for our most novel and interesting abstract analysis domain. JSAI handles JavaScript as specified by the ECMA 3 standard [22] (sans eval and family), and various language extensions such as Typed Arrays [4]. JSAI’s concrete semantics have been extensively tested against an existing commercial JavaScript engine, and the JSAI abstract semantics have been extensively tested against the concrete semantics for soundness. JSAI’s analysis sensitivity (i.e., path-, context-, and heap-sensitivity) are user-configurable independently from the rest of the analysis. This means that JSAI al- lows arbitrary sensitivities as defined by the user rather than only allowing a small set of baked-in choices, and that the sensitivity can be set independently from the rest of the analysis or any client analyses. JSAI’s contributions include complete formalisms for con- crete and abstract semantics for JavaScript along with im- plementations of concrete and abstract interpreters based

Upload: others

Post on 15-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

JSAI: A Static Analysis Platform for JavaScript

Vineeth Kashyap† Kyle Dewey† Ethan A. Kuefner†

John Wagner† Kevin Gibbons† John Sarracino?

Ben Wiedermann? Ben Hardekopf†† University of California Santa Barbara, USA ? Harvey Mudd College, USA

ABSTRACTJavaScript is used everywhere from the browser to the server,including desktops and mobile devices. However, the cur-rent state of the art in JavaScript static analysis lags far be-hind that of other languages such as C and Java. Our goalis to help remedy this lack. We describe JSAI, a formallyspecified, robust abstract interpreter for JavaScript. JSAIuses novel abstract domains to compute a reduced prod-uct of type inference, pointer analysis, control-flow analysis,string analysis, and integer and boolean constant propaga-tion. Part of JSAI’s novelty is user-configurable analysis sen-sitivity, i.e., context-, path-, and heap-sensitivity. JSAI isdesigned to be provably sound with respect to a specific con-crete semantics for JavaScript, which has been extensivelytested against a commercial JavaScript implementation.

We provide a comprehensive evaluation of JSAI’s perfor-mance and precision using an extensive benchmark suite,including real-world JavaScript applications, machine gener-ated JavaScript code via Emscripten, and browser addons.We use JSAI’s configurability to evaluate a large numberof analysis sensitivities (some well-known, some novel) andobserve some surprising results that go against common wis-dom. These results highlight the usefulness of a configurableanalysis platform such as JSAI.

Categories and Subject DescriptorsF.3.2 [Semantics of Programming Languages]: Pro-gram analysis

General TermsLanguages, Algorithms, Verification

KeywordsJavaScript Analysis, Abstract Interpretation

1. INTRODUCTIONJavaScript is pervasive. While it began as a client-side

webpage scripting language, JavaScript has grown hugely in

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.FSE ’14, November 16–22, 2014, Hong Kong, ChinaCopyright 2014 ACM 978-1-4503-3056-5/14/11 ...$15.00.

scope and popularity and is used to extend the functional-ity of web browsers via browser addons, to develop desktopapplications (e.g., for Windows 8 [1]) and server-side appli-cations (e.g., using Node.js [2]), and to develop mobile phoneapplications (e.g., for Firefox OS [3]). JavaScript’s growingprominence means that secure, correct, maintainable, andfast JavaScript code is becoming ever more critical. Staticanalysis traditionally plays a large role in providing thesecharacteristics: it can be used for security auditing, error-checking, debugging, optimization, program understanding,refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive quirks cause great diffi-culty for static analysis.

Our goal is to overcome these difficulties and provide aformally specified, well-tested static analysis platform forJavaScript, immediately useful for many client analyses suchas those listed above. In fact, we have used JSAI in previouswork to build a security auditing tool for browser addons [35]and to experiment with strategies to improve analysis preci-sion [36]. We have also used JSAI to build a static programslicing [48] client and to build a novel abstract slicing [49]client. These are only a few examples of JSAI’s usefulness.

Several important characteristics distinguish JSAI fromexisting JavaScript static analyses (which are discussed fur-ther in Section 2):

• JSAI is formally specified. We base our analysis onformally specified concrete and abstract JavaScript se-mantics. The two semantics are connected using ab-stract interpretation; we have soundness proof sketchesfor our most novel and interesting abstract analysisdomain. JSAI handles JavaScript as specified by theECMA 3 standard [22] (sans eval and family), andvarious language extensions such as Typed Arrays [4].

• JSAI’s concrete semantics have been extensively testedagainst an existing commercial JavaScript engine, andthe JSAI abstract semantics have been extensively testedagainst the concrete semantics for soundness.

• JSAI’s analysis sensitivity (i.e., path-, context-, andheap-sensitivity) are user-configurable independentlyfrom the rest of the analysis. This means that JSAI al-lows arbitrary sensitivities as defined by the user ratherthan only allowing a small set of baked-in choices, andthat the sensitivity can be set independently from therest of the analysis or any client analyses.

JSAI’s contributions include complete formalisms for con-crete and abstract semantics for JavaScript along with im-plementations of concrete and abstract interpreters based

Page 2: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

on these semantics. While concrete semantics for JavaScripthave been proposed before, ours is the first designed specif-ically for abstract interpretation. Our abstract semantics isthe first formal abstract semantics for JavaScript in the liter-ature. The abstract interpreter implementation is the firstavailable static analyzer for JavaScript that provides easyconfigurability as a design goal. All these contributions areavailable freely for download as supplementary materials1.JSAI provides a solid foundation on which to build multipleclient analyses for JavaScript. The specific contributions ofthis paper are:

• The design of a JavaScript intermediate language andconcrete semantics intended specifically for abstractinterpretation (Section 3.1).

• The design of an abstract semantics that enables con-figurable, sound abstract interpretation for JavaScript(Section 3.2). This abstract semantics represents areduced product of type inference, pointer analysis,control-flow analysis, string analysis, and number andboolean constant propagation.

• Novel abstract string and object domains for JavaScriptanalysis (Section 3.3).

• A discussion of JSAI’s configurable analysis sensitivity,including two novel context sensitivities for JavaScript(Section 4).

• An evaluation of JSAI’s performance and precision onthe most comprehensive suite of benchmarks for Java-Script static analysis that we are aware of, includ-ing browser addons, machine-generated programs viaEmscripten [5], and open-source JavaScript programs(Section 5). We showcase JSAI’s configurability byevaluating a large number of context-sensitivities, andpoint out novel insights from the results.

We preface these contributions with a discussion of relatedwork (Section 2) and conclude with plans for future work(Section 6).

2. RELATED WORKIn this section we discuss existing static analyses and hy-

brid static/dynamic analyses for JavaScript and discuss pre-vious efforts to formally specify JavaScript semantics.

JavaScript Analyses. The current state-of-the-art staticanalyses for JavaScript usually take one of two approaches:(1) an unsound2 dataflow analysis-based approach usingbaked-in abstractions and analysis sensitivities [17, 26, 31],or (2) a formally-specified type system requiring annota-tions to existing code, proven sound with respect to a specificJavaScript formal semantics but restricted to a small subsetof the full JavaScript language [45, 30, 19, 28]. No existingJavaScript analyses are formally specified, implemented us-ing an executable abstract semantics, tested against a formalconcrete semantics, or target configurable sensitivity.

1http://www.cs.ucsb.edu/~pllab, under Downloads.2Most examples of this approach are intentionally unsoundas a design decision, in order to handle the many difficul-ties raised by JavaScript analysis. Unsound analysis can beuseful in some circumstances, but for many purposes (e.g.,security auditing) soundness is a key requirement.

The closest related work to JSAI is the JavaScript staticanalyzer TAJS by Jensen et al [32, 33, 34]. While TAJS isintended to be a sound analysis of the entire JavaScript lan-guage (sans dynamic code injection), it does not possess anyof the characteristics of JSAI described in Section 1. TheTAJS analysis is not formally specified and the TAJS papershave insufficient information to reproduce the analysis; alsothe analysis implementation is not well documented, mak-ing it difficult to build client analyses or modify the mainTAJS analysis. In the process of formally specifying JSAI,we uncovered several previously unknown soundness bugsin TAJS that were confirmed by the TAJS authors. Thisserves to highlight the importance and usefulness of formalspecification.

Various previous works [15, 45, 31, 39, 25, 44, 24] pro-pose different subsets of the JavaScript language and pro-vide analyses for that subset. These analyses range fromtype inference, to pointer analysis, to numeric range andkind analysis. None of these handle the full complexities ofJavaScript. Several intentionally unsound analyses [40, 17,6, 47, 23] have been proposed, while other works [31, 26] takea best-effort approach to soundness, without any assurancethat the analysis is actually sound. None of these effortsattempt to formally specify the analysis they implement.

Several type systems [45, 30, 28, 19] have been proposedto retrofit JavaScript (or subsets thereof) with static types.Guha et. al. [28] propose a novel combination of type sys-tems and flow analysis. Chugh et. al. [19] propose a flow-sensitive refinement type system designed to allow typing ofcommon JavaScript idioms. These type systems require pro-grammer annotations and cannot be used as-is on real-worldJavaScript programs.

Combinations of static analysis with dynamic checks [25,20] have also been proposed. These systems statically ana-lyze a subset of JavaScript under certain assumptions anduse runtime checks to enforce these assumptions. Schaferet al. [42] use a dynamic analysis to determine informationthat can be leveraged to scale static analysis for JavaScript.These ideas are complementary to and can supplement ourpurely static techniques.

JavaScript Formalisms. None of the previous work onstatic analysis of JavaScript have formally specified the anal-ysis. However, there has been previous work on providingJavaScript with a formal concrete semantics. Maffeis et.al [41] give a structural smallstep operational semantics di-rectly to the full JavaScript language (omitting a few con-structs). Lee et. al [38] propose SAFE, a semantic frame-work that provides structural bigstep operational semanticsto JavaScript, based directly on the ECMAScript specifica-tion. Due to their size and complexity, neither of these se-mantic formulations are suitable for direct translation intoan abstract interpreter.

Guha et. al [27] propose a core calculus approach to pro-vide semantics to JavaScript—they provide a desugarer fromJavaScript to a core calculus called λJS , which has a small-step structural operational semantics. Their intention wasto provide a minimal core calculus that would ease provingsoundness for type systems, thus placing all the complexityin the desugarer. However, their core calculus is too low-level to perform a precise and scalable static analysis (forexample, some of the semantic structure that is critical fora precise analysis is lost, and their desugaring causes a large

Page 3: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

code bloat—more than 200× on average). We also use thecore calculus approach; however, our own intermediate lan-guage, called notJS, is designed to be in a sweet-spot thatfavors static analysis (for example, the code bloat due toour translation is between 6− 8× on average). In addition,we use an abstract machine-based semantics rather than astructural semantics, which (as described later) is the primeenabler for configurable analysis sensitivity.

Configurable Sensitivity. Bravenboer and Smaragdakisintroduce the DOOP framework [18] that performs flow-insensitive points-to analysis for Java programs using a declar-ative specification in Datalog. Several context-sensitive ver-sions [43, 37] of the points-to analysis are expressible in thisframework as modular variations of a common code base.Their framework would require significant changes to en-able flow-sensitive analysis (especially for a language likeJavaScript, which requires an extensive analysis to com-pute a sound SSA form) like ours, and they cannot expressarbitrary analysis sensitivities (including path sensitivities)modularly the way that JSAI can.

3. JSAI DESIGNWe break our discussion of the JSAI design into three

main components: (1) the design of an intermediate repre-sentation (IR) for JavaScript programs, called notJS, alongwith its concrete semantics; (2) the design of an abstract se-mantics for notJS that yields the reduced product of a num-ber of essential sub-analyses and also enables configurableanalysis; and (3) the design of novel abstract domains forJavaScript analysis. We conclude with a discussion of vari-ous options for handling dynamic code injection.

The intent of this section is to discuss the design decisionsthat went into JSAI, rather than giving a comprehensive de-scription of the various formalisms (e.g., the translation fromJavaScript to notJS, the concrete semantics of notJS, andthe abstract semantics of notJS). All of these formalisms,along with their implementations, are available in the sup-plementary materials.

3.1 Designing the notJS IRJavaScript’s many idiosyncrasies and quirky behaviors mo-

tivate the use of formal specifications for both the concreteJavaScript semantics and our abstract analysis semantics.Our approach is to define an intermediate language callednotJS, along with a formally-specified translation from Java-Script to notJS. We then give notJS a formal concrete se-mantics upon which we base our abstract interpreter.3

Figure 1 shows the abstract syntax of notJS, which wascarefully designed with the ultimate goal of making abstractinterpretation simple, precise, and efficient. The IR containsliteral expressions for numeric, boolean values and for undef

and null. Object values are expressed with the new con-struct, and function values are expressed with the newfun

construct. The IR directly supports exceptions via throw

and try-catch-fin; it supports other non-local control flow(e.g., JavaScript’s return, break, and continue) via the jump

construct. The IR supports two forms of loops: while andfor. The for construct corresponds to JavaScript’s reflectivefor..in statement, which allows the programmer to iterate

3Guha et al [27] use a similar approach, but our IR designand formal semantics are quite different. See Section 2 fora discussion of the differences between our two approaches.

over the fields of an object. A method takes exactly twoarguments: self and args, referring to the this object andarguments object; all variants of JavaScript method calls canbe translated to this form. The toobj, tobool, tostr, tonum andisprim constructs are the explicit analogues of JavaScript’simplicit conversions. JavaScript’s builtin objects (e.g,. Math)and methods (e.g., isNaN) are properties of the global objectthat is constructed prior to a program’s execution, thus theyare not a part of the IR syntax.

n ∈ Num b ∈ Bool str ∈ String x ∈ Variable ` ∈ Label

s ∈ Stmt ::= ~si | if e s1 s2 | while e s | x := e | e1.e2 := e3

| x := e1(e2, e3) | x := toobj e | x := del e1.e2

| x := newfun m n | x := new e1(e2) | throw e

| try-catch-fin s1 x s2 s3 | ` s | jump ` e | for x e s

e ∈ Exp ::= n | b | str | undef | null | x | m | e1 ⊕ e2 | �ed ∈ Decl ::= decl −−−−→xi = ei in s

m ∈ Meth ::= (self, args)⇒ d | (self, args)⇒ s

⊕ ∈ BinOp ::= + | − | × | ÷ | % | << | >> | >>> | <| ≤ | & | ′|′ | Y | and | or | ++ | ≺ | �| ≈ | ≡ | . | instanceof | in

� ∈ UnOp ::= − | ∼ | ¬ | typeof | isprim | tobool

| tostr | tonum

Figure 1: The abstract syntax of notJS provides canonical con-structs that simplify JavaScript’s behavior. The vector notationrepresents (by abuse of notation) an ordered sequence of unspec-ified length n, where i ranges from 0 to n− 1.

Note that our intermediate language is not based on acontrol-flow graph but rather on an abstract syntax tree(AST), further distinguishing it from existing JavaScriptanalyses. JavaScript’s higher-order functions, implicit ex-ceptions, and implicit type conversions (that can executearbitrary user-defined code) make a program’s control-flowextremely difficult to precisely characterize without exten-sive analysis of the very kind we are using the intermedi-ate language to carry out. Other JavaScript analyses thatdo use a flow-graph approach start by approximating thecontrol-flow and then fill in more control-flow informationin an ad-hoc manner as the analysis progresses; this leadsto both imprecision and unsoundness (for example, one ofthe soundness bugs we discovered in TAJS was directly dueto this issue). JSAI uses the smallstep abstract machine se-mantics to determine control-flow during the analysis itselfin a sound manner.

An important design decision we made is to carefully sep-arate the language into pure expressions (e ∈ Exp) thatare guaranteed to terminate without throwing an exception,and impure statements (s ∈ Stmt) that do not have theseguarantees. This decision directly impacts the formal se-mantics and implementation of notJS by reducing the sizeof the formal semantics4 and the corresponding code to one-third of the previous size compared to a version withoutthis separation, and vastly simplifying them. This is thefirst IR for JavaScript we are aware of that makes this de-sign choice—it is a more radical choice than might first beapparent, because JavaScript’s implicit conversions make it

4Specifically, the number of semantic continuations andtransition rules.

Page 4: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

difficult to enforce this separation without careful thought.Other design decisions of note include making JavaScript’simplicit conversions (which are complex and difficult to rea-son about, involving multiple steps and alternatives depend-ing on the current state of the program) explicit in notJS(the constructs toobj, isprim, tobool, tostr, tonum are used forthis); leaving certain JavaScript constructs unlowered to al-low for a more precise abstract semantics (e.g., the for..inloop, which we leave mostly intact as for x e s); and simpli-fying method calls to make the implicit this parameter andarguments object explicit; this is often, but not always, theaddress of a method’s receiver object, and its value can benon-intuitive, while arguments provides a form of reflectionproviding access to a method’s arguments.

Given the notJS abstract syntax, we need to design a for-mal concrete semantics that (together with the translationto notJS) captures JavaScript behavior. We have two maincriteria: (1) the semantics should be specified in a man-ner that can be directly converted into an implementation,allowing us to test its behavior against actual JavaScriptimplementations; (2) looking ahead to the abstract versionof the semantics (which defines our analysis), the seman-tics should be specified in a manner that allows for config-urable sensitivity. These requirements lead us to specify thenotJS semantics as an abstract machine-based smallstep op-erational semantics. One can think of this semantics as aninfinite state transition system, wherein we formally definea notion of state and a set of transition rules that connectstates. The semantics is implemented by turning the statedefinition into a data structure (e.g., a Scala class) and thetransition rules into functions that transform a given stateinto the next state. The concrete interpreter starts with aninitial state (containing the start of the program and all ofthe builtin JavaScript methods and objects), and continuallycomputes the next state until the program finishes.

We omit further details of the concrete semantics for spaceand because they have much in common with the abstractsemantics described in the next section. The main differencebetween the two is that the abstract state employs sets inplaces where the concrete state employs singletons, and theabstract transition rules are nondeterministic whereas theconcrete rules are deterministic. Both of these differencesare because the abstract semantics over-approximates theconcrete semantics.

Testing the Semantics. We tested the translation tonotJS, the notJS semantics, and implementations thereofby comparing the resulting program execution behavior withthat of a commercial JavaScript engine, SpiderMonkey [7].We first manually constructed a test suite of over 243 pro-grams that were either hand-crafted to exercise various partsof the semantics, or taken from existing JavaScript programsused to test commercial JavaScript implementations. Wethen added over one million randomly generated JavaScriptprograms to the test suite. We ran all of the programs inthe test suite on SpiderMonkey and on our concrete inter-preter, and we verified that they produce identical output.Because the ECMA specification is informal we can nevercompletely guarantee that the notJS semantics is equivalentto the spec, but we can do as well as other JavaScript imple-mentations, which also use testing to establish conformancewith the ECMA specification.

3.2 Designing the Abstract Semantics

The JavaScript static analysis is defined as an abstractsemantics for notJS that over-approximates the notJS con-crete semantics. The analysis is implemented by computingthe set of all abstract states reachable from a given initialstate by following the abstract transition rules. The analysiscontains some special machinery that provides configurablesensitivity. We illustrate our approach via a worklist algo-rithm that ties these concepts together:

Algorithm 1 The JSAI worklist algorithm

1: put the initial abstract state ς0 on the worklist2: initialize map partition : Trace → State] to empty3: repeat4: remove an abstract state ς from the worklist5: for all abstract states ς ′ in next_states(ς) do6: if partition does not contain trace(ς ′) then7: partition(trace(ς ′)) = ς ′

8: put ς ′ on worklist9: else

10: ςold = partition(trace(ς ′))11: ςnew = ςold t ς ′12: if ςnew 6= ςold then13: partition(trace(ς ′)) = ςnew14: put ςnew on worklist15: end if16: end if17: end for18: until worklist is empty

The static analysis performed by this worklist algorithm isdetermined by the definitions of the abstract semantic statesς ∈ State], the abstract transition rules5 next_states ∈State] → P(State]), and the knob that configures the anal-ysis sensitivity trace(ς).

Abstract Semantic Domains. Figure 2 shows our def-inition of an abstract state for notJS. An abstract state ςconsists of a term that is either a notJS statement or anabstract value that is the result of evaluating a statement;an environment that maps variables to (sets of) addresses; astore mapping addresses to either abstract values, abstractobjects, or sets of continuations (to enforce computability forabstract semantics that use semantic continuations, as perVan Horn and Might [46]); and finally a continuation stackthat represents the remaining computations to perform—one can think of this component as analogous to a runtimestack that remembers computations that should completedonce the current computation is finished.

Abstract values are either exception/jump values (EValue],JValue]), used to handle non-local control-flow, or base val-ues (BValue]), used to represent JavaScript values. Basevalues are a tuple of abstract numbers, booleans, strings,addresses, null, and undefined; each of these componentsis a lattice. Base values are defined as tuples because theanalysis over-approximates the concrete semantics, and thuscannot constrain values to be only a single type at a time.These value tuples yield a type inference analysis: any com-ponent of this tuple that is a lattice ⊥ represents a type thatthis value cannot contain. Base values do not include func-tion closures, because functions in JavaScript are actually

5Omitted for space; available in supplementary materials.

Page 5: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

n ∈ Num]

str ∈ String]

a ∈ Address] � ∈ UnOp

] ⊕ ∈ BinOp]

ς ∈ State]= Term

] × Env] × Store

] ×Kont]

t ∈ Term]= Decl + Stmt + Value

]

ρ ∈ Env]= Variable → P(Address

])

σ ∈ Store]= Address

] → (BValue]+ Object

]+ P(Kont

]))

bv ∈ BValue]= Num

] × P(Bool)× String] × P(Address

])×

P({null})× P({undef})

o ∈ Object]= (String

] → BValue])× P(String)×

(String → (BValue]+ Class + P(Closure

])))

c ∈ Class = {function, array, string, boolean, number, date,

error, regexp, arguments, object, . . .}

clo ∈ Closure]= Env

] ×Meth

ev ∈ EValue]::= exc bv

jv ∈ JValue]::= jmp ` bv

v ∈ Value]= BValue

]+ EValue

]+ JValue

]

κ ∈ Kont]::= haltK | seqK ~si κ | whileK e s κ | lblK ` κ

| forK−→stri x s κ | retK x ρ κ ctor | retK x ρ κ call

| tryK x s s κ | catchK s κ | finK v κ | addrK a

Figure 2: Abstract semantic domains for notJS.

objects. Instead, we define a class of abstract objects thatcorrespond to functions and that contain a set of closuresthat are used when that object is called as a function. Wedescribe our novel abstract object domain in more detail inSection 3.3.

Each component of the tuple also represents an individualanalysis: the abstract number domain determines a num-ber analysis, the abstract string domain determines a stringanalysis, the abstract addresses domain determines a pointeranalysis, etc. Composing the individual analyses representedby the components of the value tuple is not a trivial task;a simple cartesian product of these domains (which corre-sponds to running each analysis independently, without us-ing information from the other analyses) would be impreciseto the point of being useless. Instead, we specify a reducedproduct [21] of the individual analyses, which means thatwe define the semantics so that each individual domain cantake advantage of the other domains’ information to improvetheir results. The abstract number and string domains areintentionally unspecified in the semantics; they are config-urable. We discuss our specific implementations of the ab-stract string domain in Section 3.3.

Together, all of these abstract domains define a set ofsimultaneous analyses: control-flow analysis (for each call-site, which methods may be called), pointer analysis (foreach object reference, which objects may be accessed), typeinference (for each value, can it be a number, a boolean,a string, null, undef, or a particular class of object), andextended versions of boolean, number, and string constantpropagation (for each boolean, number and string value, isit a known constant value). These analyses combine to givedetailed control- and data-flow information forming a fun-damental analysis that can be used by many possible clients(e.g., error detection, program slicing, secure informationflow, etc).

Current State ς Next State ς′

1 〈s ::~si, ρ, σ, κ〉 〈s, ρ, σ, seqK ~si κ〉2 〈bv , ρ, σ, seqK s ::~si κ〉 〈s, ρ, σ, seqK ~si κ〉3 〈bv , ρ, σ, seqK ε κ〉 〈bv , ρ, σ, κ〉4 〈if e s1 s2, ρ, σ, κ〉 〈s1, ρ, σ, κ〉 if true ∈ πb(JeK)5 〈if e s1 s2, ρ, σ, κ〉 〈s2, ρ, σ, κ〉 if false ∈ πb(JeK)

Figure 3: A small subset of the abstract semantics rules for JSAI.Each smallstep rule describes a transition relation from one ab-stract state ς to the next state ς′. The phrase πb(JeK) means toevaluate expression e to an abstract base value, then project outits boolean component.

Abstract Transition Rules. Figure 3 describes a smallsubset of the abstract transition rules to give their flavor. Tocompute next_states(ς), the components of ς are matchedagainst the premises of the rules to find which rule(s) arerelevant; that rule then describes the next state (if multiplerules apply, then there will be multiple next states). Therules 1, 2 and 3 deal with sequences of statements. Rule 1says that if the state’s term is a sequence, then pick the firststatement in the sequence to be the next state’s term; thentake the rest of the sequence and put it in a seqK continua-tion for the next state, pushing it on top of the continuationstack. Rule 2 says that if the state’s term is a base value(and hence we have completed the evaluation of a state-ment), take the next statement from the seqK continuationand make it the term for the next state. Rule 3 says that ifthere are no more statements in the sequence, pop the seqK

continuation off of the continuation stack. The rules 4 and 5deal with conditionals. Rule 4 says that if the guard expres-sion evaluates to an abstract value that over-approximatestrue, make the true branch statement the term for the nextstate; rule 5 is similar except it takes the false branch. Notethat these rules are nondeterministic, in that the same statecan match both rules.

Configurable Sensitivity. To enable configurable sensi-tivity, we build on the insights of Hardekopf et al [29]. Weextend the abstract state to include an additional compo-nent from a Trace abstract domain. The worklist algorithmuses the trace function to map each abstract state to itstrace, and joins together all reachable abstract states thatmap to the same trace (see lines 10–11 of Algorithm 1). Thedefinition of Trace is left to the analysis designer; differentdefinitions yield different sensitivities. For example, sup-pose Trace is defined as the set of program points, and anindividual state’s trace is the current program point. Thenour worklist algorithm computes a flow-sensitive, context-insensitive analysis: all states at the same program point arejoined together, yielding one state per program point. Sup-pose we redefine Trace to be sequences of program points,and an individual state’s trace to be the last k call-sites.Then our worklist algorithm computes a flow-sensitive, k-CFA context-sensitive analysis. Arbitrary sensitivities (in-cluding path-sensitivity and property simulation) can be de-fined in this manner solely by redefining Trace, withoutaffecting the worklist algorithm or the abstract transitionrules. We explore a number of possibilities in Section 5.

3.3 Novel Abstract DomainsJSAI allows configurable abstract number and string do-

Page 6: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

>

SNotSpl SNotNum

SNum SNotNumNorSpl SSpl

"1" · · · "2" · · · "foo" "bar"· · · "valueOf" · · ·

⊥Figure 4: Our default string abstract domain, further explainedin Section 3.3.

mains, but we also provide default domains based on ourexperience with JavaScript analysis. We motivate and de-scribe our default abstract string domain here. We also de-scribe our novel abstract object domain, which is an integralpart of the JSAI abstract semantics.

Abstract Strings. Our initial abstract string domain String]

was an extended string constant domain. The elements wereeither constant strings, or strings that are definitely num-bers, or strings that are definitely not numbers, or > (acompletely unknown string). This string domain is simi-lar to the one used by TAJS [33], and it is motivated bythe precision gained while analyzing arrays: arrays are justobjects where array indices are represented with numericstring properties such as "0", "1", etc, but they also havenon-numeric properties like "length". However, this initialstring domain was inadequate.

In particular, we discovered a need to express that a stringis not contained within a given hard-coded set of strings.Consider the property lookup x := obj[y], where y is a vari-able that resolves to an unknown string. Because the stringis unknown, the analysis is forced to assign to x not onlythe lattice join of all values contained in obj, but also thelattice join of all the values contained in all prototypes ofobj, due to the rules of prototype-based inheritance. Almostall object prototype chains terminate in one of the builtinobjects contained in the global object (Object.prototype,Array.prototype, etc); these builtin objects contain the builtinvalues and methods. Thus, all of these builtin values andmethods are returned for any object property access basedon an unknown string, polluting the analysis. One possibleway to mitigate this problem is to use an expensive domainthat can express arbitrary complements (i.e., express thata string is not contained in some arbitrary set of strings).Instead, we extend the string domain to separate out specialstrings (valueOf, toString etc, fixed ahead of time) from therest; these special strings are drawn from property namesof builtin values and methods. We can thus express that astring has an unknown value that is not one of the specialvalues. This is a practical solution that improves precisionat minimal cost.

The new abstract string domain depicted in Figure 4 (thatseparates unknown strings into numeric, non-numeric andspecial strings) was simple to implement due to JSAI’s con-figurable architecture; it did not require changes to any otherparts of the implementation despite the pervasive use ofstrings in all aspects of JavaScript semantics.

Abstract Objects. We highlight the abstract domain Object]

given in Figure 2 as a novel contribution. Previous JavaScript

analyses model abstract objects as a tuple containing (1) amap from property names to values; and (2) a list of defi-nitely present properties (necessary because property namesare just strings, and objects can be modified using unknownstrings as property names). However, according to the ECMAstandard objects can be of different classes, such as func-tions, arrays, dates, regexps, etc. While these are all objectsand share many similarities, there are semantic differencesbetween objects of different classes. For example, the length

property of array objects has semantic significance: assign-ing a value to length can implicitly add or delete propertiesto the array object, and certain values cannot be assigned tolength without raising a runtime exception. Non-array ob-jects can also have a length field, but assigning to that fieldwill have no other effect. The object’s class dictates thesemantics of property enumerate, update, and delete oper-ations on an object. Thus, the analysis must track whatclasses an abstract object may belong to in order to accu-rately model these semantic differences. If abstract objectscan belong to arbitrary sets of classes, this tracking andmodeling becomes complex, error-prone, and inefficient.

Our innovation is to add a map as the third componentof abstract objects that contains class-specific values. Thiscomponent also records which class an abstract object be-longs to. Finally, the semantics is designed so that any givenabstract object must belong to exactly one class. This is en-forced by assigning abstract addresses to objects based notjust on their static allocation site and context, but also onthe constructor used to create the object (which determinesits class). The resulting abstract semantics is much simpler,more efficient, and precise.

4. SHOWCASING CONFIGURABILITYAnalysis sensitivity (path-, context-, and heap-sensitivity)

hsa a significant impact on the usefulness and practicality ofthe analysis. The sensitivity represents a tradeoff betweenprecision and performance: the more sensitive the analysisis the more precise it can be, but also the more costly it canbe. The “sweet-spot” in this tradeoff varies from analysisto analysis and from program to program. JSAI allows theuser to easily specify different sensitivities in a modular way,separately from the rest of the analysis.

A particularly important dimension of sensitivity is context-sensitivity : how the (potentially infinite) possible methodcall instances are partitioned and merged into a finite num-ber of abstract instances. The current state of the art forJavaScript static analysis has explored only a few possiblecontext-sensitivity strategies, all of which are baked into theanalysis and difficult to change, with no real basis for choos-ing these over other possible strategies.

We take advantage of JSAI’s configurability to define andevaluate a much larger selection of context-sensitivities thanhas ever been evaluated before in a single paper. Becauseof JSAI’s design, specifying each sensitivity takes only 5–20lines of code; previous analysis implementations would haveto hard-code each sensitivity from scratch. The JSAI anal-ysis designer specifies a sensitivity by instantiating a par-ticular instance of Trace; all abstract states with the sametrace will be merged together. For context-sensitivity, wedefine Trace to include some notion of the calling context,so that states in the same context are merged while statesin different contexts are kept separate.

We implement six main context-sensitivity strategies, each

Page 7: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

parameterized in various ways, yielding a total of 56 dif-ferent forms of context-sensitivity. All of our sensitivitiesare flow-sensitive (JavaScript’s dynamic nature means thatflow-insensitive analyses tend to have terrible precision). Weempirically evaluate all of these strategies in Section 5; herewe define the six main strategies. Four of the six strate-gies are known in the literature, while two are novel to thispaper. The novel strategies are based on two hypothesesabout context definitions that might provide a good balancebetween precision and performance. Our empirical evalua-tion demonstrates that these hypotheses are false, i.e., theydo not provide any substantial benefit. We include themhere not as examples of good sensitivities to use, but ratherto demonstrate that JSAI makes it easy to formulate andtest hypotheses about analysis strategies—each novel strat-egy took only 15–20 minutes to implement. The strategieswe defined are as follows, where the first four are known andthe last two are novel:

Context-insensitive. All calls to a given method are merged.We define the context component of Trace to be a unit value,so that all contexts are the same.

Stack-CFA. Contexts are distinguished by the list of call-sites on the call-stack. This strategy is k-limited to ensurethere are only a finite number of possible contexts. We definethe Trace component to contain the top k call-sites.

Acyclic-CFA. Contexts are distinguished the same as Stack-CFA, but instead of k-limiting we collapse recursive call cy-cles. We define Trace to contain all call-sites on the call-stack, except that cycles are collapsed.

Object-sensitive. Contexts are distinguished by a listof addresses corresponding to the chain of receiver objects(corresponding to full-object-sensitivity in Smaragdakis etal. [43]). We define Trace to contain this information (k-limited to ensure finite contexts).

Signature-CFA. Type information is important for dy-namically typed languages, so intuitively it seems that typeinformation would make good contexts. We hypothesizethat defining Trace to record the types of a call’s argumentswould be a good context-sensitivity, so that all k-limited callpaths with the same types of arguments would be merged.

Mixed-CFA. Object-sensitivity uses the address of the re-ceiver object. However, in JavaScript the receiver objectis often the global object created at the beginning of theprogram execution. Intuitively, it seems this would meanthat object sensitivity might merge many calls that shouldbe kept separate. We hypothesized that it might be benefi-cial to define Trace as a modified object-sensitive strategy—when object-sensitivity would use the address of the globalobject, this strategy uses the current call-site instead.

5. EVALUATIONIn this section we evaluate JSAI’s precision and perfor-

mance for a range of context-sensitivities as described inSection 4, for a total of 56 distinct sensitivities. We run eachsensitivity on 28 benchmarks collected from four differentapplication domains and analyze the results, yielding sur-prising observations about context-sensitivity and JavaScript.We also briefly evaluate JSAI as compared to TAJS [33], themost comparable existing JavaScript analysis.

5.1 Implementation and MethodologyWe implement JSAI using Scala version 2.10. We pro-

vide a model for the DOM, event handling loop (handledas non-deterministic execution of event-handling functions),and other native APIs used in our benchmarks. The baselineanalysis sensitivity we evaluate is fs (flow-sensitive, context-insensitive); all of the other evaluated sensitivities are moreprecise than fs. The other sensitivities are: k.h-stack, h-acyclic,k.h-obj, k.h-sig, and k.h-mixed, where k is the context depthfor k-limiting and h is the heap-sensitivity (i.e., the contextdepth used to distinguish abstract addresses). The parame-ters k and h vary from 1 to 5 and h ≤ k.

We use a comprehensive benchmark suite to evaluate thesensitivities. Most prior work on JavaScript static analysishas been evaluated only on the standard SunSpider [8] andV8 [9] benchmarks, with a few micro-benchmarks thrownin. We evaluate JSAI on these standard benchmarks, butwe also include real-world representatives from a diverse setof JavaScript application domains. We choose seven rep-resentative programs from each domain, for a total of 28programs. We partition the programs into four categories,described below. For each category, we provide the meansize of the benchmarks in the suite (expressed as numberof AST nodes generated by the Rhino parser [10]) and themean translator blowup (i.e., the factor by which the num-ber of AST nodes increases when translating from JavaScriptto notJS). The benchmark names are shown in the graphspresented below; the benchmark suite is included in the sup-plementary material.

The benchmark categories are: standard: seven of thelargest, most complex benchmarks from SunSpider [8] andV8 [9] (mean size: 2858 nodes; mean blowup: 8×); addon:seven Firefox browser addons selected from the official Mozillaaddon repository [11] (mean size: 2597 nodes; mean blowup:6×); generated: seven programs from the Emscripten LLVMtest suite, which translates LLVM bitcode to JavaScript [5](mean size: 38211 nodes; mean blowup: 7×); and finallyopensrc: seven real-world JavaScript programs taken fromopen source JavaScript frameworks and their test suites [12,13] (mean size: 8784 nodes; mean blowup: 6.4×).

Our goal is to evaluate the precision and performance ofJSAI instantiated with several forms of context sensitiv-ity. However, the different sensitivities yield differing setsof function contexts and abstract addresses, making a faircomparison difficult. Therefore, rather than statistical mea-surements (such as address-set size or closure-set size), wechoose a client-based precision metric based on a error re-porting client. This metric is a proxy for the precision ofthe analysis.

Our precision metric reports the number of static programlocations (i.e., AST nodes) that might throw exceptions,based on the analysis’ ability to precisely track types. Java-Script throws a TypeError exception when a program at-tempts to call a non-function or when a program tries to ac-cess, update, or delete a property of null or undef. JavaScriptthrows a RangeError exception when a program attempts toupdate the length property of an array to contain a valuethat is not an unsigned 32-bit integer. Fewer errors indicatea more precise analysis.

The performance metric we use is execution time of theanalysis. To gather data on execution time, we run eachexperimental configuration 11 times, discard the first result,then report the median of the remaining 10 trials. We set a

Page 8: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

time limit of 30 minutes for each run, reporting a timeoutif execution time exceeds that threshold. We run all ex-periments on Amazon Web Services [14] (AWS), using M1XLarge instances; each experiment is run on an indepen-dent AWS instance. These instances have 15GB memoryand 8 ECUs, where each ECU is equivalent CPU capacityof a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

We run all 56 analyses on each of the 28 benchmarks, fora total of 1,568 trials (multiplied by an additional 10 execu-tions for each analysis/benchmark pair for the timing data).For reasons of space, we present only highlights of these re-sults. In some cases, we present illustrative examples; theomitted results show similar behavior. In other cases, wedeliberately cherry-pick, to highlight contrasts. We are ex-plicit about our approach in each case.

(a) addon benchmarks

tryagainodesk_job_wat…less_spam_ple…live_pagerankcoffee_pods_d…chesspinpoints

(b) generated benchmarks

fastallubenchmarkfourinarowahasgefahashtestfannkuch

(c) opensrc benchmarks

rsalinq_aggregateaeslinq_enumerablelinq_functionallinq_actionlinq_dictionary

(d) standard benchmarks

crypto-sha1richardssplay3d-cubeaccess-nbody3d-raytracecryptobench

1.0

-sta

ck 5

.4-s

tack

4-a

cycl

ic 1

.0-o

bj 5

.4-o

bj 1

.0-s

ig 5

.4-s

ig 1

.0-m

ixed

5.4

-mix

ed fs

Figure 6: A heat map to showcase the performance characteris-tics of different sensitivities across the benchmark categories. Theabove figure is a two-dimensional map of blocks; rows correspondto benchmarks, and columns correspond to analysis run with aparticular sensitivity. The color in a block indicates a sensitiv-ities’ relative performance on the corresponding benchmark, ascompared to fastest sensitivity on that benchmark. Darker col-ors represent better performance. Completely blackened blocksindicate that the corresponding sensitivity has the fastest anal-ysis time on that benchmark, while completely whitened blocksindicate that the corresponding sensitivity does not time out, buthas a relative slowdown of at least 2×. The remaining colors areof evenly decreasing contrast from black to white, representing aslowdown between 1× to 2×. The red grid pattern on a blockindicates a timeout.

5.2 ObservationsFor each main sensitivity strategy, we present the data

for two trials: the least precise sensitivity in that strategy,and the most precise sensitivity in that strategy. This set ofanalyses is: fs, 1.0-stack, 5.4-stack, 4-acyclic, 1.0-obj, 5.4-obj,1.0-sig, 5.4-sig, 1.0-mixed, 5.4-mixed.

Figures 5 and 6 contain performance results, and Figure 7contains the precision results. The results are partitioned bybenchmark category to show the effect of each analysis sen-sitivity on benchmarks in that category. The performancegraphs in Figure 5 plot the median execution time in mil-liseconds, on a log scale, giving a sense of actual time takenby the various sensitivity strategies. Lower bars are better;timeouts extend above the top of the graph.

We provide an alternate visualization of the performancedata through Figure 6 to easily depict how the sensitivitiesperform relative to each other. Figure 6 is heat map that laysout blocks in two dimensions—rows represent benchmarksand columns represent analyses with different sensitivities.Each block represents relative performance as a color: darkerblocks correspond to faster execution time of a sensitivitycompared to other sensitivities on the same benchmark. Acompletely blackened block corresponds to the fastest sensi-tivity on that benchmark, a whitened block corresponds toa sensitivity that has ≥ 2× slowdown relative to the fastestsensitivity, and the remaining colors evenly correspond toslowdowns in between. Blocks with the red grid pattern in-dicate a timeout. A visual cue is that columns with darkerblocks correspond to better-performing sensitivities, and arow with blocks that have very similar colors indicates abenchmark on which performance is unaffected by varyingsensitivities.

Figure 7 provides a similar heat map (with similar visualcues) for visualizing relative precisions of various sensitiv-ity strategies on our benchmarks. The final column in thisheat map provides the number of errors reported by the fs

strategy on a particular benchmark, while the rest of thecolumns provide the percentage reduction (relative to fs) inthe number of reported errors due to a corresponding sensi-tivity strategy. The various blocks (except the ones in thefinal column) are color coded in addition to providing per-centage reduction numbers: darker is better precision (thatis, more reduction in number of reported errors). Timeoutsare indicated using a red grid pattern.

Breaking the Glass Ceiling. One startling observationis that highly sensitive variants (i.e., sensitivity strategieswith high k and h parameters) can be far better than theirless-sensitive counterparts, providing improved precision ata much cheaper cost (see Figure 8). For example, on linq_-dictionary, 5.4-stack is the most precise and most efficientanalysis. By contrast, the 3.2-stack analysis yields the sameresult at a three-fold increase in cost, while the 1.0-stack

analysis is even more expensive and less precise. We seesimilar behavior for the sgefa benchmark, where 5.4-stack isan order of magnitude faster than 1.0-stack and delivers thesame results. This behavior violates the common wisdomthat values of k and h above 1 or 2 are intractably expensive.

This behavior is certainly not universal,6 but it is intrigu-ing. Analysis designers often try to scale up their context-sensitivity (in terms of k and h) linearly, and they stop whenit becomes intractable. However, our experiments suggestthat pushing past this local barrier may yield much better

6For example, linq_aggregate times out on all analyses withk > 1.

Page 9: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

1.0 stack 5.4 stack 4. acyclic 1.0 obj 5.4 obj 1.0 sig 5.4 sig 1.0 mixed 5.4 mixed fs

1,000

10,000

100,000

1,000,000

crypto-sha1richards

splay3d-cube

access-nbody

3d-raytrace

cryptobench

1,000

10,000

tryagain

odesk_job_wa

t...

less_spam_ple...

live_pagerank

coffee_pods_d...

chess

pinpoints

Exe

cutio

n Ti

me

(ms)

(a) addon benchmarks

1,000

10,000

100,000

1,000,000

fasta

llubenchmark

fourinarow aha sge

fahashtest

fannkuch

Exe

cutio

n Ti

me

(ms)

(b) generated benchmarks

100

1,000

10,000

100,000

1,000,000

rsa

linq_aggregate aes

linq_enumerable

linq_functional

linq_action

linq_dictionary

Exe

cutio

n Ti

me

(ms)

(c) opensrc benchmarks

1,000

10,000

100,000

1,000,000

crypto-sha1

richards

splay

3d-cube

access-nbody

3d-raytrace

cryptobench

Exe

cutio

n Ti

me

(ms)

(d) standard benchmarksFigure 5: Performance characteristics of different sensitivities across the benchmark categories. The x-axis gives the benchmark names.The y-axis (log scale) gives for each benchmark, the time taken by the analysis (in milliseconds) when run under 10 different sensitivities.Lower bars mean better performance. Timeout (30 minutes) bars are flush with the top of the graph.

results.

Callstring vs Object Sensitivity. In general, we findthat callstring-based sensitivity (i.e., k.h-stack and h-acyclic)is more precise than object sensitivity (i.e., k.h-obj). This re-sult is unintuitive, since JavaScript heavily relies on objectsand object sensitivity was specifically designed for object-oriented languages such as Java. Throughout the bench-marks, the most precise and efficient analyses are the onesthat employ stack-based k-CFA. Part of the reason for thistrend is that 25% of the benchmarks are machine-generatedJavaScript versions of procedural code, whose structure yieldsmore benefits to callstring-based context-sensitivity. Evenamong the handwritten open-source benchmarks, however,this trend holds. For example, several forms of callstring sen-sitivity are more efficient and provide more precise results forthe open-source benchmarks than object-sensitivity, whichoften times out.

Benefits of Context Sensitivity. When it comes to pureprecision, we find that more context sensitivity sometimesincreases precision and sometimes has no effect. The open-source benchmarks demonstrate quite a bit of variance forthe precision metric. A context-sensitive analysis almost al-ways finds fewer errors (i.e., fewer false positives) than acontext-insensitive analysis, and increasing the sensitivityin a particular family leads to precision gains. For example,5.4-stack gives the most precise error report for linq_enumerable,and it is an order of magnitude more precise than a context-insensitive analysis. On the other hand, the addon domainhas very little variance for the precision metric, which isperhaps due to shorter call sequence lengths in this domain.In such domains, it might be wise to focus on performance,rather than increasing precision.

Summary. Perhaps the most sweeping claim we can makefrom the data is that there is no clear winner across allbenchmarks, in terms of JavaScript context-sensitivity. Thisstate of affairs is not a surprise: the application domainsfor JavaScript are so rich and varied that finding a silverbullet for precision and performance is unlikely. However,it is likely that—within an application domain, e.g., auto-matically generated JavaScript code—one form of context-sensitivity could emerge a clear winner. The benefit of JSAIis that it is easy to experiment with the control flow sen-sitivity of an analysis. The base analysis has already beenspecified, the analysis designer need only instantiate andevaluate multiple instances of the analysis in a modular wayto tune analysis-sensitivity, without having to worry aboutthe analysis soundness.

5.3 Discussion: JSAI vs. TAJSJensen et al.’s Type Analysis for JavaScript [33, 34] (TAJS)

stands as the only published static analysis for JavaScriptwhose intention is to soundly analyze the entire JavaScriptlanguage. JSAI has several features that TAJS does not,including configurable sensitivity, a formalized abstract se-mantics, and novel abstract domains, but TAJS is a valuablecontribution that has been put to good use. An interestingquestion is how JSAI compares to TAJS in terms of precisionand performance.

The TAJS implementation (in Java) has matured over aperiod of five years, it has been heavily optimized, and it ispublicly available. Ideally, we could directly compare TAJSto JSAI with respect to precision and performance, but theyare dissimilar enough that they are effectively noncompara-ble. For one, TAJS has known soundness bugs that canartificially decrease its set of reported type errors. Also,

Page 10: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

(a) addon benchmarks

tryagain 0% 0% 0% 0% 0% 0% 0% 0% 0% 16odesk_job_wat… 0% 0% 0% 0% 0% 0% 0% 0% 0% 18less_spam_ple… 77% 77% 77% 13% 13% 0% 65% 16% 16% 62live_pagerank 15% 15% 15% 0% 0% 0% 0% 15% 15% 13coffee_pods_d… 0% 0% 0% 0% 0% 0% 0% 0% 0% 5chess 17% 17% 17% 8% 8% 0% 8% 8% 8% 24pinpoints 2% 2% 2% 0% 0% 0% 0% 4% 4% 54

(b) generated benchmarks

fasta 92% 94% 94% 17% 17% 0% 17% 92% 92% 36llubenchmark 99% 99% 99% 0% 0% 21% 99% 99% 287fourinarow 88% 92% 92% 0% 0% 0% 0% 88% 88% 24aha 67% 70% 70% 0% 0% 7% 7% 67% 67% 27sgefa 99% 99% 99% 0% 0% 21% 99% 99% 287hashtest 91% 94% 94% 17% 17% 0% 17% 91% 91% 35fannkuch 91% 94% 94% 18% 18% 0% 18% 91% 91% 33

(c) opensrc benchmarks

rsa 29% 32% 32% 0% 0% 0% 6% 9% 9% 34linq_aggregate 88% 1% 2% 267aes 0% 0% 0% 0% 0% 0% 0% 0% 0% 4linq_enumerable 95% 99% 99% 2% 2% 7% 88% 0% 374linq_functional 73% 89% 1% 12% 0% 0% 335linq_action 92% 93% 96% 9% 10% 66% 75% 90% 90% 169linq_dictionary 81% 85% 84% 1% 3% 1% 5% 73% 73% 376

(d) standard benchmarks

crypto-sha1 0% 0% 0% 0% 0% 0% 0% 0% 0% 0richards 0% 26% 26% 2% 2% 0% 0% 12% 12% 42splay 0% 0% 0% 0% 0% 0% 0% 0% 0% 303d-cube 8% 8% 8% 0% 0% 0% 4% 8% 8% 53access-nbody 0% 0% 0% 0% 0% 0% 0% 0% 0% 63d-raytrace 29% 29% 29% 6% 6% 0% 8% 27% 27% 48cryptobench 27% 76% 76% 6% 14% 21% 63% 10% 28% 329

1.0

-sta

ck 5

.4-s

tack

4-a

cycl

ic 1

.0-o

bj 5

.4-o

bj 1

.0-s

ig 5

.4-s

ig 1

.0-m

ixed

5.4

-mix

ed fs

Figure 7: A heat map to showcase the precision characteristics(based on number of reported runtime errors) of different sensitiv-ities across the benchmark categories. The above figure is a two-dimensional map of blocks; rows correspond to benchmarks, andcolumns corresponds to analysis run with a particular sensitiv-ity. The rightmost column corresponds to the context insensitiveanalysis fs, and the blocks in this column give the number of errorsreported by the analysis under fs (which is an upper bound onthe number of errors reported across any sensitivity). The color(which ranges evenly from black to white) in the remaining blocksindicate the percentage reduction in number of errors reported bythe analysis under the corresponding sensitivity, compared to fson the same benchmark. Darker colors represent more reductionin errors reported, and hence better precision. In addition to thecolors, the percentage reduction in errors is also given inside theblocks (higher percentage reduction indicates better precision).The red grid pattern on a block indicates a timeout.

5.4-stack

1.0-stack 5.4-obj

1.0-obj

fs

Figure 8: Precision vs. performance of various sensitivities, onthe opensrc linq_dictionary benchmark. Interestingly, 5.4-stack(the most sensitive Stack-CFA analysis) is not only tractable, itexhibits the best performance and the best precision.

TAJS does not implement some of the APIs required byour benchmark suite, and so it can only run on a subsetof the benchmarks. On the flip side, TAJS is more ma-ture than JSAI, it has a more precise implementation of thecore JavaScript APIs, and it contains a number of precisionand performance optimizations (e.g., the recency heap ab-straction [16] and lazy propagation [34]) that JSAI does notcurrently implement.

Nevertheless, we can perform a qualitative“ballpark”com-parison, to demonstrate that JSAI is roughly comparablein terms of precision and performance. For the subset ofour benchmarks on which both JSAI and TAJS execute, wecatalogue the number of errors that each tool reports andrecord the time it took for each tool to do so. We find thatJSAI analysis time is 0.3× to 1.8× that of TAJS. In termsof precision, JSAI reports from nine fewer type errors to104 more type errors, compared to TAJS. Many of the extratype errors that JSAI reports are RangeErrors, which TAJSdoes not report due to one of the unsoundness bugs we un-covered. Excluding RangeErrors, JSAI reports at most 20more errors than TAJS in the worst case.

6. CONCLUSIONWe have described the design of JSAI, a configurable,

sound, and efficient abstract interpreter for JavaScript. JSAI’sdesign is novel in a number of respects which make it standout from all previous JavaScript analyzers. We have pro-vided a comprehensive evaluation that demonstrates JSAI’susefulness. The JSAI implementation and formalisms arefreely available as a supplement, and we believe that JSAIwill provide a useful platform for people building JavaScriptanalyses.

Acknowledgements. This work was supported by NSFCCF-1319060 and CCF-1117165.

7. REFERENCES[1] http://www.drdobbs.com/windows/

microsofts-javascript-move/240012790.

[2] http://nodejs.org/.

[3] http://www.mozilla.org/en-US/firefox/os/.

[4] http://www.khronos.org/registry/typedarray/

specs/latest/.

[5] http://www.emscripten.org/.

[6] http://doctorjs.org/.

[7] https://developer.mozilla.org/en-US/docs/

SpiderMonkey.

[8] http:

//www.webkit.org/perf/sunspider/sunspider.html.

[9] http://v8.googlecode.com/svn/data/benchmarks/

v7/run.html.

[10] https://developer.mozilla.org/en-US/docs/Rhino.

[11] https://addons.mozilla.org/en-US/firefox/.

[12] http://linqjs.codeplex.com/.

[13] http://www.defensivejs.com/.

[14] http://aws.amazon.com/.

[15] C. Anderson, P. Giannini, and S. Drossopoulou.Towards type inference for javascript. In Europeanconference on Object-oriented programming, 2005.

Page 11: JSAI: A Static Analysis Platform for JavaScriptbenh/research/papers/kashyap14jsai.pdf · refactoring, and more. However, JavaScript’s inherently dy-namic nature and many unintuitive

[16] G. Balakrishnan and T. Reps. Recency-abstraction forheap-allocated storage. In International conference onStatic Analysis, 2006.

[17] S. Bandhakavi, N. Tiku, W. Pittman, S. T. King,P. Madhusudan, and M. Winslett. Vetting browserextensions for security vulnerabilities with vex.Commun. ACM, 54(9), Sept. 2011.

[18] M. Bravenboer and Y. Smaragdakis. Strictlydeclarative specification of sophisticated points-toanalyses. In ACM International Conference on ObjectOriented Programming Systems Languages andApplications. ACM, 2009.

[19] R. Chugh, D. Herman, and R. Jhala. Dependent typesfor javascript. In International Conference on ObjectOriented Programming Systems Languages andApplications, 2012.

[20] R. Chugh, J. A. Meister, R. Jhala, and S. Lerner.Staged information flow for javascript. In ACMSIGPLAN Conference on Programming LanguagesDesign and Implementation, 2009.

[21] P. Cousot and R. Cousot. Systematic design ofprogram analysis frameworks. In ACM Symposium onPrinciples of Programming Languages, 1979.

[22] ECMA. ECMA-262: ECMAScript LanguageSpecification. Third edition, Dec. 1999.

[23] A. Feldthaus, M. Schafer, M. Sridharan, J. Dolby, andF. Tip. Efficient construction of approximate callgraphs for javascript ide services. In InternationalConference on Software Engineering. IEEE Press,2013.

[24] P. A. Gardner, S. Maffeis, and G. D. Smith. Towards aprogram logic for javascript. In ACM Symposium onPrinciples of programming languages, 2012.

[25] S. Guarnieri and B. Livshits. Gatekeeper: mostlystatic enforcement of security and reliability policiesfor javascript code. In Conference on USENIXsecurity symposium, 2009.

[26] A. Guha, S. Krishnamurthi, and T. Jim. Using staticanalysis for Ajax intrusion detection. In World WideWeb Conference, 2009.

[27] A. Guha, C. Saftoiu, and S. Krishnamurthi. Theessence of javascript. In European conference onObject-oriented programming, 2010.

[28] A. Guha, C. Saftoiu, and S. Krishnamurthi. Typinglocal control and state using flow analysis. InEuropean conference on Programming languages andsystems, 2011.

[29] B. Hardekopf, B. Wiedermann, B. Churchill, andV. Kashyap. Widening for control-flow. InInternational Conference on Verification, ModelChecking, and Abstract Interpretation, 2014.

[30] P. Heidegger and P. Thiemann. Recency types foranalyzing scripting languages. European conference onObject-oriented programming, 2010.

[31] D. Jang and K.-M. Choe. Points-to analysis forjavascript. In Symposium on Applied Computing, 2009.

[32] S. H. Jensen, P. A. Jonsson, and A. Møller.Remedying the Eval that Men Do. In InternationalSymposium on Software Testing and Analysis, 2012.

[33] S. H. Jensen, A. Møller, and P. Thiemann. TypeAnalysis for Javascript. In International Symposiumon Static Analysis, 2009.

[34] S. H. Jensen, A. Møller, and P. Thiemann.Interprocedural Analysis with Lazy Propagation. InInternational Symposium on Static Analysis, 2010.

[35] V. Kashyap and B. Hardekopf. Security signatureinference for javascript-based browser addons. InSymposium on Code Generation and Optimization,2014.

[36] V. Kashyap, J. Sarracino, J. Wagner, B. Wiedermann,and B. Hardekopf. Type refinement for static analysisof javascript. In Symposium on Dynamic Languages,2013.

[37] G. Kastrinis and Y. Smaragdakis. Hybridcontext-sensitivity for points-to analysis. In ACMSIGPLAN Conference on Programming LanguagesDesign and Implementation. ACM, 2013.

[38] H. Lee, S. Won, J. Jin, J. Cho, and S. Ryu. Safe:Formal specification and implementation of a scalableanalysis framework for ecmascript. In InternationalWorkshop on Foundations of Object-OrientedLanguages, 2012.

[39] F. Logozzo and H. Venter. Rata: Rapid Atomic TypeAnalysis by Abstract Interpretation – Application toJavascript Optimization. In Joint EuropeanConference on Theory and Practice of Software,International Conference on Compiler Construction,2010.

[40] M. Madsen, B. Livshits, and M. Fanning. Practicalstatic analysis of JavaScript applications in thepresence of frameworks and libraries. In ACMSymposium on the Foundations of SoftwareEngineering, Aug. 2013.

[41] S. Maffeis, J. C. Mitchell, and A. Taly. An operationalsemantics for javascript. In Asian Symposium onProgramming Languages and Systems, 2008.

[42] M. Schafer, M. Sridharan, J. Dolby, and F. Tip.Dynamic determinacy analysis. In ACM SIGPLANConference on Programming Languages Design andImplementation. ACM, 2013.

[43] Y. Smaragdakis, M. Bravenboer, and O. Lhotak. Pickyour contexts well: understanding object-sensitivity.In ACM Symposium on Principles of programminglanguages, 2011.

[44] A. Taly, U. Erlingsson, J. C. Mitchell, M. S. Miller,and J. Nagra. Automated analysis of security-criticaljavascript apis. In IEEE Symposium on Security andPrivacy, 2011.

[45] P. Thiemann. Towards a Type System for AnalyzingJavascript Programs. In European Conference onProgramming Languages and Systems, 2005.

[46] D. Van Horn and M. Might. Abstracting abstractmachines. In International Conference on FunctionalProgramming, 2010.

[47] D. Vardoulakis. CFA2: Pushdown Flow Analysis forHigher-Order Languages. PhD thesis, NortheasternUniversity, 2012.

[48] M. Weiser. Program slicing. In InternationalConference on Software Engineering. IEEE Press,1981.

[49] D. Zanardini. The semantics of abstract programslicing. In IEEE International Working Conference onSource Code Analysis and Manipulation, 2008.