Will You Still Compile Me Tomorrow? Static Cross-Version Compiler Validation Chris Hawblitzel, Shuvendu K. Lahiri (Microsoft Research) Kshama Pawar, Hammad Hashmi, Sedar Gokbulut, Lakshan Fernando, Dave Detlefs, Scott Wadsworth (Microsoft CLR Test Team)


Page 1

Will You Still Compile Me Tomorrow?
Static Cross-Version Compiler Validation

Chris Hawblitzel, Shuvendu K. Lahiri (Microsoft Research)

Kshama Pawar, Hammad Hashmi, Sedar Gokbulut, Lakshan Fernando, Dave Detlefs, Scott Wadsworth (Microsoft CLR Test Team)

Page 2

Finding compiler bugs

• Testing: the compiler turns the source program into assembly code, which is run on a test input to produce an output
    + high automation
    - limited coverage
• Validation: an automated theorem prover checks the assembly code that the compiler produced for one source program
    + covers all inputs
    - false alarms
• Verification: an interactive theorem prover is used to verify the compiler itself
    + covers all programs
    - not automated

Page 3

Cross-version validation

• Source program → Compiler version 4.0 → Assembly code
• Source program → Compiler version 4.5 → Assembly code
• An automated theorem prover compares the two assembly outputs

Assembly code produced by the two compiler versions:

    mov EAX, EDX
    and EAX, 255
    push EAX
    mov EDX, 0x100000
    call WriteInternalFlag2
    ret

    push ESI
    mov ESI, EDX
    and ESI, 255
    push ESI
    mov EDX, 0x100000
    call WriteInternalFlag2
    pop ESI
    ret

Compare similar code: fewer false alarms
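To make the comparison concrete, here is a minimal Python sketch that models the two listings above and checks that they agree on a set of sample inputs. It is a random differential check, not a proof, so it only illustrates what "equivalent" means here; the tool described in these slides uses a theorem prover instead. The function names, the choice of observable values (the argument pushed for the call, EDX at the call, and ESI on return), and the assumption that the callee removes its own stack argument are all assumptions of this sketch.

```python
import random

MASK32 = 0xFFFFFFFF


def run_first_listing(edx, esi):
    """Model of the listing that uses EAX as its scratch register."""
    eax = edx                    # mov EAX, EDX
    eax &= 255                   # and EAX, 255
    stack_arg = eax              # push EAX
    edx = 0x100000               # mov EDX, 0x100000
    at_call = (stack_arg, edx)   # call WriteInternalFlag2 (treated as opaque)
    return at_call, esi          # ret; ESI was never touched


def run_second_listing(edx, esi):
    """Model of the listing that saves and reuses ESI."""
    saved_esi = esi              # push ESI
    esi = edx                    # mov ESI, EDX
    esi &= 255                   # and ESI, 255
    stack_arg = esi              # push ESI
    edx = 0x100000               # mov EDX, 0x100000
    at_call = (stack_arg, edx)   # call WriteInternalFlag2 (treated as opaque)
    esi = saved_esi              # pop ESI (assumes the callee popped its argument)
    return at_call, esi          # ret


def find_mismatches(trials=10_000, seed=0):
    """Return the sampled (edx, esi) inputs on which the two models disagree."""
    rng = random.Random(seed)
    inputs = [(0, 0), (255, 1), (256, 2), (MASK32, MASK32)]
    inputs += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(trials)]
    return [
        (edx, esi)
        for edx, esi in inputs
        if run_first_listing(edx, esi) != run_second_listing(edx, esi)
    ]
```

Calling `find_mismatches()` returns an empty list when the two models agree on every sampled input. Because the call is opaque and sees the same arguments in both models, its return value is not part of the comparison.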

Page 4

Validation across various dimensions

[Figure: assembly code outputs arranged along two dimensions. Versions: v1, v2, v3, v4. Configurations: x86, x86+optimizations, ARM, ARM+optimizations.]

Page 5

Tools: SymDiff, Boogie, Z3

• Source program → Compiler version 4.0 → Assembly code → Boogie program
• Source program → Compiler version 4.5 → Assembly code → Boogie program
• Both Boogie programs → SymDiff equivalence verifier → Combined Boogie program
• Combined Boogie program → Boogie program verifier → Verification condition
• Verification condition → Z3 automated theorem prover

Example translation from assembly code to Boogie:

    ...push ESI...

    ...Mem := Store4(...esi...); esp := SUB(esp, imm(4)); ...
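The translation of `push ESI` into a memory store followed by a stack-pointer update can be sketched in Python as a function from one machine state to the next. This is only an illustration: the slide elides the arguments of `Store4`, so the store address `esp - 4`, the little-endian byte layout, and the helper names `store4`, `load4`, and `push` are assumptions based on ordinary x86 `push` semantics rather than the tool's actual encoding.

```python
MASK32 = 0xFFFFFFFF


def store4(mem, addr, value):
    """Return a copy of mem with a 32-bit value stored little-endian at addr."""
    new_mem = dict(mem)
    for k in range(4):
        new_mem[(addr + k) & MASK32] = (value >> (8 * k)) & 0xFF
    return new_mem


def load4(mem, addr):
    """Read a 32-bit little-endian value from mem (missing bytes read as 0)."""
    return sum(mem.get((addr + k) & MASK32, 0) << (8 * k) for k in range(4))


def push(state, reg):
    """Model `push reg`: store the register below esp, then decrement esp by 4."""
    esp = state["esp"]
    mem = store4(state["mem"], (esp - 4) & MASK32, state[reg])  # Mem := Store4(...)
    esp = (esp - 4) & MASK32                                    # esp := SUB(esp, imm(4))
    return {**state, "mem": mem, "esp": esp}
```

For example, pushing `esi = 0xDEADBEEF` from a state with `esp = 0x1000` yields a state with `esp = 0xFFC` in which `load4` at `0xFFC` reads back `0xDEADBEEF`. The input state is left unmodified, which mirrors how each assignment in the Boogie program defines a new value.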

Page 6

Encoding assembly language

• Encode one method at a time
    • calls are uninterpreted
    • inlining not yet supported
• Our encoding is not entirely sound
    • mathematical integers vs. 32-bit vectors
        • Z3 supports both, but reasoning about integers is faster
    • non-aliasing assumptions
        • disjoint regions for stack, heap, static data
• Floating point, switch tables, etc.
• Complex instructions
    • rep stosb: ∀ i. edx ≤ i < edx+ecx ⇒ Mem[i] == al
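Two of the points above can be illustrated with a short Python sketch. The first pair of functions shows why modelling registers as mathematical integers is not entirely sound: the property `x + 1 > x` holds for every integer but fails under 32-bit wraparound. The second pair models a block fill as a loop and checks the quantified postcondition from the slide. All function names are hypothetical. The fill takes its base address as a plain parameter (the slide writes `edx`, while on x86 the destination register of `rep stosb` is EDI) and assumes an ascending fill.

```python
MASK32 = 0xFFFFFFFF


def increment_grows_int(x):
    """x + 1 > x over mathematical integers: always true."""
    return x + 1 > x


def increment_grows_bv32(x):
    """x + 1 > x over unsigned 32-bit vectors: false when x + 1 wraps to 0."""
    return ((x + 1) & MASK32) > (x & MASK32)


def rep_stosb(mem, base, ecx, al):
    """Return a copy of mem with ecx bytes starting at base set to al."""
    new_mem = dict(mem)
    for i in range(base, base + ecx):
        new_mem[i] = al & 0xFF
    return new_mem


def stosb_postcondition(mem, base, ecx, al):
    """Check: for all i, base <= i < base + ecx implies mem[i] == al."""
    return all(mem.get(i) == (al & 0xFF) for i in range(base, base + ecx))
```

With `x = 0xFFFFFFFF`, `increment_grows_int` is true while `increment_grows_bv32` is false, so a proof over integers can accept a pair of programs that differ on real hardware. Expressing the fill as a single quantified fact, rather than unrolling the loop, gives the prover one formula to work with per instruction.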

Page 7

Month-to-month results (ARM)

Percentage of method bodies in each category:

             month2    month3    month4,5   month6    month7    AVG
Identical    37.8125   69        71.0625    69.0625   94.5333   68.2942
Equivalent   57.9375   24.375    19.1875    19.375     1.8667   24.5483
Different     1.75      1.625     3.3125     7         1.5333    3.0442
Time-Out      1.375     3.8125    3.8125     2.6875    0.2667    2.3908
Missing       1.125     1.1875    2.625      1.875     1.8       1.7225
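As a consistency check on the month-to-month figures, the short Python sketch below recomputes the AVG column as the plain mean of the five monthly values and confirms that each month's categories add up to 100% of method bodies. The numbers are copied from the slide. Treating AVG as an unweighted mean is an assumption of this sketch, which the arithmetic bears out.

```python
# Percentages of method bodies per category: month2, month3, month4,5, month6, month7.
MONTHLY = {
    "Identical":  [37.8125, 69.0, 71.0625, 69.0625, 94.5333333333333],
    "Equivalent": [57.9375, 24.375, 19.1875, 19.375, 1.86666666666667],
    "Different":  [1.75, 1.625, 3.3125, 7.0, 1.53333333333333],
    "Time-Out":   [1.375, 3.8125, 3.8125, 2.6875, 0.266666666666667],
    "Missing":    [1.125, 1.1875, 2.625, 1.875, 1.8],
}


def row_averages(table):
    """Unweighted mean of each category across the months."""
    return {name: sum(values) / len(values) for name, values in table.items()}


def column_totals(table):
    """Sum of all categories for each month; each should be close to 100."""
    return [sum(column) for column in zip(*table.values())]
```

`row_averages(MONTHLY)["Identical"]` comes out at about 68.294, matching the slide's AVG entry, and every entry of `column_totals(MONTHLY)` is 100 up to rounding.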

Page 8

Cross-architecture, optimization

Percentage of method bodies in each category:

             x86 opt vs. unopt   ARM opt vs. unopt   x86 vs. ARM   MDIL vs. JIT
Identical     0                   0                   0             0
Equivalent   77                  73.8462             62.75         65.2297
Different    19                  18.7692             29            19.9583
TimeOut       2                   1.6923              4.6875        1.3332
Missing       1.8889              5.4615              3.5625       13.4788

Page 9

Fault injection (ARM)

Percentage of method bodies in each category:

                month 3,4   month 5   month 6   month 7   AVG
Equiv-unsound    3.75        2.5       2.5       3.3333    3.0208
Equiv-correct    5.625       7.5       8.75     10         7.96875
Different       86.875      86.25     83.125    81.3333   84.3958
TimeOut          0.625       0.6667    0.7143    0         0.5015
Missing          3.125       1.875     3.125     3.3333    2.8646

Page 10

Counterexample traces

• Helps user find where program execution diverged
• Used by automated root cause analysis
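One simple way to picture "where program execution diverged" is to line up the values observed along the two executions of a counterexample and report the first step at which they differ. The Python sketch below does only that. The trace representation (a list of observed values per step) and the function name are hypothetical, and nothing here is meant to describe how the tool itself processes traces.

```python
def first_divergence(trace_a, trace_b):
    """Return the index of the first step where the traces differ, or None."""
    for index, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return index
    if len(trace_a) != len(trace_b):
        # One trace is a strict prefix of the other: they diverge where it ends.
        return min(len(trace_a), len(trace_b))
    return None
```

For two traces that agree on their first two steps and differ on the third, the function returns index 2, which is where a reader would start looking for the root cause.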

Page 11

Root cause analysis

Page 12

Bucketing

• Based on root cause analysis
• Users write bucket descriptions

Page 13

Conclusions

• Some statistics:
    • methods analyzed: > 500,000
    • new bugs found: 12
    • false alarm rate, month-to-month versions: 2.2%
    • false alarm rate, opt vs. unopt, ARM vs. x86: > 20%
    • speed: 13 seconds per method

• Sources of false alarms:
    • aliasing, run-time system calls, embedded addresses, ...

• Counterexample traces, root cause analysis essential