powerpoint presentation€¦ · title: powerpoint presentation author: leeswimming created date:...

51
1 Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer Suyoung Lee, HyungSeok Han, Sang Kil Cha, Sooel Son Graduate School of Information Security (GSIS) , KAIST

Upload: others

Post on 11-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

1

Montage: A Neural Network Language

Model-Guided JavaScript Engine Fuzzer

Suyoung Lee, HyungSeok Han, Sang Kil Cha, Sooel Son

Graduate School of Information Security (GSIS) , KAIST

Page 2: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

2

Popularity of Web Browsers

Source: https://www.statista.com/statistics/543218/worldwide-internet-users-by-browser/

4 billion users

Page 3: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

3

Vulnerable Web Browsers

47% 24%

21%

5%

Office

Browser

Android

Java

https://www.theguardian.com/technology/2017/jan/10/browser-autofill-used-to-steal-personal-details-in-new-phising-attack-chrome-safari

https://searchsecurity.techtarget.com/news/252458522/MarioNet-attack-exploits-HTML5-to-create-botnets

https://cointelegraph.com/news/recent-firefoxs-zero-day-flaw-was-used-in-attacks-against-coinbases-employees

https://www.tomsguide.com/us/ios-malvertising-barrage,news-29880.html

▪ Most exploited applications in 2018▪ Browser-based cyber threats

https://www.kaspersky.com/about/press-releases/2018_microsoft-office-exploits

Page 4: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

4

Chrome

V8

JS Engine

JS Engine Vulnerabilities

https://leeswimming.com

JS engine vulnerabilities pose critical security threat!

# id

uid=0(root) gid=0(root) groups=0(root)

Page 5: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

5

JS Engine Fuzzing

5

▪ JS Engines

V8 ChakraCore

SpiderMonkey

JavaScriptCore

▪ Fuzzing (Fuzz Testing)

- An automated software testing that involve providing invalid or unexpected input to a

program under testing (PUT).

ChakraCore (PUT)

JS Engine

Fuzzer

JS TESTCODE

Page 6: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

6

How Can We (Fuzzer) Generate Test Input?

6

Proof of Concept (PoC) for CVE-2017-8586

ChakraCore (PUT)

CRASH!

Page 7: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

7

Previous Work

Abstract Syntax Tree (AST)

Page 8: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

8

Previous Work

1. Mutation-based fuzzers

2. Generation-based fuzzers

- LangFuzz, IFuzzer, and GramFuzz

- Combining AST subtrees extracted from JS seeds

- jsfunfuzz

- Applying JS grammar rules from scratch

Abstract Syntax Tree (AST)

Page 9: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

9

Previous Work

1. Mutation-based fuzzers

2. Generation-based fuzzers

- LangFuzz, IFuzzer, and GramFuzz

- Combining AST subtrees extracted from JS seeds

- jsfunfuzz

- Applying JS grammar rules from scratch

Abstract Syntax Tree (AST)

Page 10: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

10

▪ LangFuzz, IFuzzer, and GramFuzz

- They combine AST subtrees of seed JS tests

Subtree 1 Subtree 2

Generated code

Previous Work: Mutation-based JS fuzzers

Page 11: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

11

Which combination is more likely to trigger JS engine bugs?

Relationship between Building Blocks

Current AST A set of applicable AST subtrees

None of the existing fuzzers consider their relationships!

Page 12: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

12

Are there any similar patterns between bug-triggering JS code?

Motivational Question

Current AST A set of applicable AST subtrees

Page 13: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

13

Study on JS Engine Vulnerabilities

• Analyzed patches of 50 CVEs assigned to ChakraCore

CVE-2017-0071

CVE-2017-0141

CVE-2017-0196

CVE-2018-0953

...

18% of patches revisedGlobOpt.cpp

14% of patches revised JavascriptArray.cpp

Patches of 50 CVEs

Page 14: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

14

Study on JS Engine Vulnerabilities

• Analyzed patches of 50 CVEs assigned to ChakraCore

CVE-2017-0071

CVE-2017-0141

CVE-2017-0196

CVE-2018-0953

...

18% are related toglobal optimization

14% are related toJavaScript array

Patches of 50 CVEs

Page 15: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

15

Study on JS Engine Vulnerabilities

• Compared AST subtrees from two sets

67 PoCs triggeringChakraCore CVEs

At August, 2016

JS Test 1

JS Test 2

JS Test 2038

...

2038 JS tests from ChakraCore repo

After August, 2016

CVE-2016-3247

CVE-2016-7203

CVE-2018-0980

...

Over 95% subtrees from PoCs exist in regression tests

Page 16: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

16

1. To leverage the functionality of JS regression tests

Our Goal

2. To learn the relationship of AST subtrees

- Modeling the relationship between AST subtrees

- Mutation based approach

Page 17: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

17

Our Building Block – Fragments

const v0 = 0;

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program VarDeclaration

decl

VarDeclaratorconst

kind

value

Literal

0

id init

VarDeclarator

Identifier Literal

name

Identifier

v0

A fragment is a subtree of depth 1

Page 18: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

18

Montage Overview

JS

Preprocessing

TrainedNNLM

AST Mutation

A Sequence of Fragments

Training

JS

Page 19: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Preprocessing – Normalization

• Normalizing IDs

v0

id init

name value

VarDeclarator

Identifier Literal

0cnt v0

id init

name value

VarDeclarator

Identifier Literal

0idx

19

- To decrease the # of unique fragments, rename IDs

Page 20: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Preprocessing – Normalization

• Normalizing IDs

v0

id init

name value

VarDeclarator

Identifier Literal

0v0 v0

id init

name value

VarDeclarator

Identifier Literal

0v0

20

- To decrease the # of unique fragments, rename IDs

Page 21: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• Fragmentation

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program

21

Normalized AST A sequence of fragments

Preprocessing – Fragmentation

Page 22: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• Fragmentation

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program VarDeclaration

decl

VarDeclaratorconst

kind

22

Normalized AST A sequence of fragments

Preprocessing – Fragmentation

Page 23: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• Fragmentation

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program VarDeclaration

decl

VarDeclaratorconst

kind

id init

VarDeclarator

Identifier Literal

23

Normalized AST A sequence of fragments

Preprocessing – Fragmentation

Page 24: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• Fragmentation

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program VarDeclaration

decl

VarDeclaratorconst

kind

id init

VarDeclarator

Identifier Literal

name

Identifier

v0

24

Normalized AST A sequence of fragments

Preprocessing – Fragmentation

Page 25: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• Fragmentation

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kindVarDeclaration

body

Program VarDeclaration

decl

VarDeclaratorconst

kind

value

Literal

0

id init

VarDeclarator

Identifier Literal

name

Identifier

v0

25

Normalized AST A sequence of fragments

Preprocessing – Fragmentation

Page 26: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

VarDeclaration

body

decl

id init

name value

Program

VarDeclarator

Identifier

v0

Literal

0

const

kind

26

Normalized AST

A sequence of fragments

Preprocessing – Fragmentation

• Fragmentation

Montage captures the global compositionalrelationships between fragments

Page 27: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Training Objectives

LSTM modelA sequence of

preceding fragmentsProbability distribution of

next fragment

27

1. Given a sequence, predict the distribution of next fragments

Page 28: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Training Objectives

28

2. Prioritize the fragments that have a correct type

A given sequence of preceding fragments

FunctionDecl

body

Program FunctionDecl

IdentifierIdentifier BlockStmt

idparams

body

Page 29: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Training Objectives

29

2. Prioritize the fragments that have a correct type

To be syntactically correct, the root of the next fragment should be Identifier!

A given sequence of preceding fragments

FunctionDecl

body

Program FunctionDecl

IdentifierIdentifier BlockStmt

idparams

body

Page 30: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

30

VarDeclaration

body

decl

id init

name operator

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

kind

Seed AST

Identifier

v0

name

+

value

Literal

1

left right

const v1 = v0 + 1;

Page 31: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

31

VarDeclaration

body

decl

init

operator

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

Remove a subtree

Identifier

v0

name

+

value

Literal

1

left right

id

name

kind

Page 32: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

32

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

id

name

kind

Page 33: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

33

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

id

name

kind

TrainedLSTM model

Input the sequence of 4 fragmentsrepresenting the current AST

Page 34: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

34

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

id

name

kind

TrainedLSTM model

The probability distribution ofthe next fragment

0.76 0.12 0.001

Page 35: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

35

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

BinaryExpr

const

id

name

kind

TrainedLSTM model

0.76 0.12 0.001

Randomly select onefrom Top K fragments

Page 36: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

36

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

const

id

name

kind

TrainedLSTM model

operator

Identifier / Identifier

left right

BinaryExpr

Page 37: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

AST Mutation

37

name

v1

name

v0

const v1 = v1 / v0;

Repeat until all leaf nodes become terminal symbols

VarDeclaration

body

decl

init

Program

VarDeclarator

Identifier

v1

const

id

name

kind

operator

BinaryExpr

Identifier / Identifier

left right

TrainedLSTM model

Page 38: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Resolving Reference Errors

38

Code generated from the previous step

foo();

var str = ‘Hello World’;

function foo () {var obj = Object();num = 10;b.toUpperCase(); // reference error

}

Page 39: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Resolving Reference Errors

39

foo();

var str = ‘Hello World’;

function foo () {var obj = Object();num = 10;b.toUpperCase();

} Identifier map

global scope:str => stringfoo => functionnum => number

foo:obj => object

Page 40: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Resolving Reference Errors

40

If possible, statically infer the type of undeclared identifiers!

foo();

var str = ‘Hello World’;

function foo () {var obj = Object();num = 10;b.toUpperCase();

} Identifier map

global scope:str => stringfoo => functionnum => number

foo:obj => object

Page 41: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Resolving Reference Errors

41

If possible, statically infer the type of undeclared identifiers!

foo();

var str = ‘Hello World’;

function foo () {var obj = Object();num = 10;b.toUpperCase();

} Identifier map

global scope:str => stringfoo => functionnum => number

foo:obj => object

b is a string

Page 42: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Resolving Reference Errors

42

Replace b with a declared identifier str

foo();

var str = ‘Hello World’;

function foo () {var obj = Object();num = 10;str.toUpperCase();

} Identifier map

global scope:str => stringfoo => functionnum => number

foo:obj => object

Page 43: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Experiment Setup

43

January, 2017Dataset

February, 2017 ChakraCore 1.4.1

Dataset Unpatched Bugs

• Ran fuzzers against ChakraCore 1.4.1

• JS code testing unpatched bugs are not in our dataset!

• Collected 33.5K unique JS files

- Regression tests from repository of four major JS engines and Test262

- PoCs of known CVEs

Page 44: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Comparison to State-of-the-art Fuzzers

• For each fuzzer, ran 5 trials of a 72 hours-long fuzzing campaign

44

- CodeAlchemist: A state-of-the-art semantics-aware JS fuzzer, NDSS’19

- IFuzzer: An evolutionary JS fuzzer, ESORICS’16

- jsfunfuzz: A JS fuzzer developed by Mozilla

Metric Build# of Unique Crashes (Known CVEs)

Montage CodeAlchemist jsfunfuzz IFuzzer

MedianRelease 23 (7) 15 (4) 27 (3) 4 (1)

Debug 49 (12) 26 (6) 27 (4) 6 (1)

Page 45: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Comparison to State-of-the-art Fuzzers

• The # of found unique crashes (known CVEs)

45

Page 46: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Effect of Language Models

46

1. A random fragment selection w/o model: The baseline of Montage

2. A char/token-level RNN: A prevalent neural language model

• For each approach, ran 5 trials of a 72 hours-long fuzzing campaign

3. A Markov model: A simple language model

Metric Build# of Unique Crashes (Known CVEs)

Montage Random ch/token RNN Markov

MedianRelease 23 (7) 12 (3) 1 (0) 19 (6)

Debug 49 (12) 31 (7) 3 (0) 44 (11)

Page 47: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• The # of appended fragments to compose a new subtree

47

15 52

90

Effect of the LSTM model

Page 48: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

• For each approach, ran 5 trials of a 72 hours-long fuzzing campaign

48

Effect of Resolving Reference Errors

Metric Build

# of Unique Crashes (Known CVEs)

Montage Montage w/o resolving step

Median

Release 23 (7) 12 (4)

Debug 49 (12) 41 (9)

The resolving step helps to find more bugs!

Montage still finds many bugs without the resolving step!

Page 49: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

Finding Real-World Bugs

• We ran Montage on the four major JS engines for 1.5 months

49

- Found 37 previous bugs in total.

➢ 34 bugs including two CVEs from ChakraCore 1.11.7

➢ One bug from V8 7.4.0 (beta)

➢ Two bugs including one CVE from JSC 2.23.3

Page 50: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

50

- Conducted systematic studies on JS engine vulnerabilities

- Proposed the first NNLM-guided JS engine fuzzing tool

- Found 37 real-world bugs from the latest JS engines

Conclusion

Page 51: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: leeswimming Created Date: 7/7/2020 10:21:26 PM

51

Question?