Harnessing computing power for security: BitBlaze, WebBlaze, and real-time spam URL filtering
  Devdatta Akhawe, Domagoj Babic, Adam Barth, Juan Caballero, Chris Grier, Steve Hanna, Justin Ma, Lorenzo Martignoni, Stephen McCamant, Feng Mao, James Newsome, Vern Paxson, Prateek Saxena, Dawn Song, and Kurt Thomas
  {smcc,dawnsong}@cs.berkeley.edu
  University of California, Berkeley
Computer security: bad news
  More powerful computers seem to just increase our exposure to security threats
Defense is challenging
  Software inevitably has bugs
  Attackers now have real incentives
    Financial, national interest, ...
  Increasing sophistication and scale of attacks
  We need a new generation of defense techniques
    Move beyond symptom-based and heuristic approaches
The BitBlaze approach
  Use program semantics, focus on root causes
  Build a unified binary analysis platform for security
    Leverage advances in program analysis, instrumentation, etc.
  Apply it to solve real-world security problems
    I’ll discuss just a few examples
BitBlaze core components
Outline
  Core technique: symbolic reasoning
  Binary-level bug-finding
  Binary-level influence measurement
  Real-time URL spam filtering
  Strings and JavaScript vulnerabilities
Basic idea
  Choose some of the state (e.g., program or function input) to be symbolic: introduce variables for their values
  Computations on symbolic state produce formulas rather than concrete (e.g., integer) values
  Construct queries with these formulas, solve to answer questions about possible program behavior
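A minimal sketch of the idea above, assuming a hypothetical four-node expression type (`Expr`, `expr_eval`, `expr_print`, and `digit_formula` are illustrative names, not BitBlaze's intermediate language). With `i` marked symbolic, the computation `'0' + (i & 0xf)` is stored as a formula tree instead of being evaluated, and a concrete value appears only when something is substituted for the variable.

```c
#include <stdio.h>

/* Hypothetical miniature expression language with one symbolic variable. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_AND } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                /* used by EXPR_CONST only */
    const struct Expr *lhs;   /* used by EXPR_ADD and EXPR_AND */
    const struct Expr *rhs;
} Expr;

/* Substitute a concrete value for the symbolic variable and evaluate. */
int expr_eval(const Expr *e, int var_value) {
    switch (e->kind) {
    case EXPR_CONST: return e->value;
    case EXPR_VAR:   return var_value;
    case EXPR_ADD:   return expr_eval(e->lhs, var_value) + expr_eval(e->rhs, var_value);
    case EXPR_AND:   return expr_eval(e->lhs, var_value) & expr_eval(e->rhs, var_value);
    }
    return 0; /* not reached for well-formed expressions */
}

/* Print the formula in infix form, naming the symbolic variable "i". */
void expr_print(const Expr *e, FILE *out) {
    switch (e->kind) {
    case EXPR_CONST: fprintf(out, "%d", e->value); break;
    case EXPR_VAR:   fputs("i", out); break;
    case EXPR_ADD:
    case EXPR_AND:
        fputc('(', out);
        expr_print(e->lhs, out);
        fputs(e->kind == EXPR_ADD ? " + " : " & ", out);
        expr_print(e->rhs, out);
        fputc(')', out);
        break;
    }
}

/* The formula for '0' + (i & 0xf), assuming ASCII ('0' == 48). */
static const Expr sym_i    = { EXPR_VAR,   0,  NULL, NULL };
static const Expr const_15 = { EXPR_CONST, 15, NULL, NULL };
static const Expr const_48 = { EXPR_CONST, 48, NULL, NULL };
static const Expr low_bits = { EXPR_AND,   0,  &sym_i, &const_15 };
const Expr digit_formula   = { EXPR_ADD,   0,  &const_48, &low_bits };
```

Calling `expr_print(&digit_formula, stdout)` prints `(48 + (i & 15))`; a real tool would hand such a formula to a solver rather than evaluate it point by point.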
Why symbolic reasoning?
  + Precise: formulas can capture exact program behavior without approximation
  + Complete solver (i.e., decision procedure): will always produce a correct solution without human help
  + Flexibility: formulas independent of particular form of query
Why not symbolic reasoning?
  - Precise, but often not complete: don’t prove that a given behavior can never happen
  - Complete solver, but solution not guaranteed within reasonable space/time
  - Flexibility, but may be less efficient than a more specialized approach
Possible approaches
Applications
  Vulnerability signatures [Oakland’06, CSF’07]
  Protocol replay [CCS’06]
  Deviation discovery [USENIX’07]
  Patch-based exploit generation [Oakland’08]
  Modeling content sniffing [Oakland’09]
  Influence measurement [PLAS’09]
  Loop-extended SE [ISSTA’09]
  Protocol-level exploration [RAID’09]
  Kernel API exploration
  Decomposing crypto functions [CCS’10]
  Fixing under-tainting [NDSS’11]
  Protocol-model assisted SE [USENIX’11]
  JavaScript SE [Oakland’10]
  Static-guided test generation [ISSTA’11]
  Emulator verification [submitted]
Challenges of binary symbolic reasoning
  Instruction set complexity
    Rewrite to simpler intermediate language
  Variable-size memory accesses
    Lazy conversion with mixed-granularity storage
  No type distinction between integers and pointers
    Analyze symbolic expression structure
Setting: vulnerability finding
  Find exploitable bugs in software, before the bad guys do
  Many bugs found by independent researchers, without benefit of source code
  Example vulnerability type: buffer overflow
  Incorrect or missing bounds check allows malicious input to overwrite other sensitive state
  Despite extensive research, and some progress in practice, still a major bug category in C/C++ programs
Static analysis
  Widely used at source-code level
  Can be sound (report all potential problems), at cost of false positives (imprecision)
  Challenge 1: more difficult at binary level
    Soundness/precision tradeoff less favorable
  Challenge 2: developers have a low tolerance for false positives
    Won’t use a tool that wastes their time
Combined static/dynamic approach
  Before static analysis, use dynamic traces to help where static binary analysis has trouble (e.g., indirect control flow)
  Design and optimize static analysis for binary-level challenges (e.g., variable identification, overlapping memory accesses)
  After static analysis, prioritize true positives by searching for test cases with symbolic execution
Key challenge: guiding the search
  Increase the chances that the paths we explore will lead to a bug
    Path must reach the code location of the bug
    Program state at that location must trigger the bug
  Combination of two approaches:
    1. Data-flow slice and control-flow distance to direct paths toward a potential bug
    2. Explore patterns of loop body paths to cover cases likely to overflow
Guidance toward a bug
Sub-problem: control-flow distance
  An interprocedural control-flow graph has nodes for statements, and edges between statements and for calls and returns
  However, we can’t use a regular graph distance measure (Dijkstra’s algorithm), because of call and return matching
    Exclude: f calls g, g returns to h
  Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns and calls
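A rough sketch of the first phase only, under assumptions that are not from the talk: the `Node` and `Function` layouts, the cost model (one unit per node, plus the callee's entry-to-exit distance at a call node), and the requirement that functions are listed callees-first with no recursion are all hypothetical. The second phase (unmatched returns and calls) is not shown. Because a call is charged its callee's summary and control always continues at the matching return site, a path such as "f calls g, g returns to h" can never be counted.

```c
#include <limits.h>

#define MAX_NODES   16
#define NO_NODE     (-1)
#define NO_CALL     (-1)
#define UNREACHABLE INT_MAX

typedef struct {
    int succ[2];   /* intraprocedural successors, NO_NODE if absent */
    int callee;    /* index of the function called here, NO_CALL if none */
} Node;

typedef struct {
    int entry, exit;   /* node indices of the entry and exit */
    int first, last;   /* inclusive range of node indices in this function */
} Function;

/* Phase 1: shortest entry-to-exit distance of every function.
   Assumes funcs[] is ordered bottom-up (callees before callers),
   there is no recursion, and all node indices are below MAX_NODES. */
void entry_to_exit(const Node *nodes, const Function *funcs, int nfuncs,
                   int *summary) {
    for (int f = 0; f < nfuncs; f++) {
        int dist[MAX_NODES];
        int count = funcs[f].last - funcs[f].first + 1;

        for (int n = funcs[f].first; n <= funcs[f].last; n++)
            dist[n] = UNREACHABLE;
        dist[funcs[f].entry] = 0;

        /* Bellman-Ford style relaxation; count - 1 rounds suffice. */
        for (int round = 1; round < count; round++) {
            for (int n = funcs[f].first; n <= funcs[f].last; n++) {
                if (dist[n] == UNREACHABLE)
                    continue;
                int cost = 1;
                if (nodes[n].callee != NO_CALL) {
                    int callee_cost = summary[nodes[n].callee];
                    if (callee_cost == UNREACHABLE)
                        continue;   /* callee never reaches its exit */
                    cost += callee_cost;
                }
                for (int k = 0; k < 2; k++) {
                    int m = nodes[n].succ[k];
                    if (m != NO_NODE && dist[n] + cost < dist[m])
                        dist[m] = dist[n] + cost;
                }
            }
        }
        summary[f] = dist[funcs[f].exit];
    }
}
```

For a three-node straight-line callee the summary is 2; a caller that reaches it through one call node on a three-node path gets 1 + (1 + 2) = 4.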
Guidance results
                    Unguided              Guided
  Benchmark       Paths    Time (s)    Paths    Time (s)
  BIND/b4             1         1.9        1         1.8
  Sendmail/s5         3        19.0        3        22.9
  BIND/b1            54         2.8       20         3.6
  BIND/b2           137        13.3       72        25.1
  BIND/b3             9         1.6        4         2.6
  Sendmail/s2        16         2.9        9        97.0
  Sendmail/s7        56         6.9        1         1.9
  WU-FTPD/f1        309         8.1       11         1.1
  WU-FTPD/f2       1455        65.8       11         1.4
  WU-FTPD/f3        143        60.0       18        11.4
  Sendmail/s5       T/O   > 21600.0      332       200.4
  Sendmail/s6       T/O   > 21600.0       86        11.3
  Sendmail/s1       T/O   > 21600.0     7297      7474.4
  Sendmail/s3       T/O   > 21600.0      T/O   > 21600.0
Due and undue influence
  How much influence should network inputs have on a program?
    For instance, on an indirect jump target
  Some influence → select a legal behavior
  Too much influence → control flow hijacking attack
High and low influence examples
    /* High influence: the untrusted input is used directly as the call target */
    void (*func_ptr)(void);
    func_ptr = untrusted_input();
    (*func_ptr)();

    /* Low influence: the untrusted input only selects among legal targets */
    void (*func_ptr)(void);
    switch (untrusted_input()) {
    case CMD_OPEN: func_ptr = &open_file; break;
    case CMD_READ: func_ptr = &read_file; break;
    default:       func_ptr = &error;
    }
    (*func_ptr)();
Channel capacity as influence
  For a given variable, how many values can an attacker produce?
  Influence = $\log_2(\#\,\text{values})$
  Special case of channel capacity from information theory
Scalability and precision
  Want to analyze large (e.g., commercial) software
  Want results with no error
  Our goal: improved trade-off points between these ideals
Problem statement
  Given:
    A deterministic program with designated inputs
    An output variable
  Question: how many values of the output are possible, given different inputs?
Program to formula example
    /* Convert low 4 bits of integer to hex */
    char tohex(int i) {
        int low = i & 0xf;
        char v;
        if (low < 10)
            v = '0' + low;
        else
            v = 'a' + (low - 10);
        return v;
    }
  Dynamic: $(i \mathbin{\&} 15) < 10 \land (v = 48 + (i \mathbin{\&} 15))$
  Static: $((i \mathbin{\&} 15) < 10 \land (v = 48 + (i \mathbin{\&} 15))) \lor ((i \mathbin{\&} 15) \ge 10 \land (v = 97 + (i \mathbin{\&} 15) - 10))$
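As a sanity check on the two formulas, the sketch below encodes them as C predicates over a pair (i, v) and compares the static one against `tohex` by brute force. The names `static_formula`, `dynamic_formula`, and `formula_matches_tohex` are illustrative, and the constants 48 and 97 assume ASCII. A real tool would pass the formula to a solver instead of enumerating.

```c
#include <stdbool.h>

/* Convert low 4 bits of integer to hex (the function from the slide). */
char tohex(int i) {
    int low = i & 0xf;
    char v;
    if (low < 10)
        v = '0' + low;
    else
        v = 'a' + (low - 10);
    return v;
}

/* Dynamic formula: describes only the executed path where low < 10. */
bool dynamic_formula(int i, int v) {
    return ((i & 15) < 10) && (v == 48 + (i & 15));
}

/* Static formula: the disjunction of both paths through the function. */
bool static_formula(int i, int v) {
    bool digit_path  = ((i & 15) <  10) && (v == 48 + (i & 15));
    bool letter_path = ((i & 15) >= 10) && (v == 97 + (i & 15) - 10);
    return digit_path || letter_path;
}

/* True if, for every i in [0, limit), the static formula holds for
   exactly one v in [0, 128), namely v == tohex(i). */
bool formula_matches_tohex(int limit) {
    for (int i = 0; i < limit; i++)
        for (int v = 0; v < 128; v++)
            if (static_formula(i, v) != (v == tohex(i)))
                return false;
    return true;
}
```

The dynamic formula accepts (3, '3') but rejects (12, 'c'), because the letter path was never executed in the trace it was built from; the static formula accepts both.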
Query techniques
  Point-by-point exhaustion
  Range exclusion
  Random output sampling
  Probabilistic model counting
Point-by-point exhaustion
  Is $v = f(i)$ satisfiable?
  Suppose it is, by $v_1 = f(i_1)$
  Is $v = f(i) \land v \neq v_1$ satisfiable?
  ...
  We repeat up to at most $2^6 = 64$ distinct outputs, so every bound up to 6 bits is exact
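The loop above can be sketched as follows, with a brute-force search over a small input domain standing in for the solver. The names `query_new_output`, `exhaust_points`, `low_nibble`, and `identity` are hypothetical, and a real implementation would issue each query to a decision procedure instead of enumerating inputs.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_POINTS 64   /* 2^6, the cutoff mentioned on the slide */

typedef unsigned (*func_t)(unsigned);

/* Two example programs. */
unsigned low_nibble(unsigned i) { return i & 0xfu; }
unsigned identity(unsigned i)   { return i; }

/* Stand-in for the query "is v = f(i) satisfiable with v different from
   every output found so far?", answered by trying inputs 0 .. domain-1. */
static bool query_new_output(func_t f, unsigned domain,
                             const unsigned *found, size_t nfound,
                             unsigned *out) {
    for (unsigned i = 0; i < domain; i++) {
        unsigned v = f(i);
        bool seen = false;
        for (size_t k = 0; k < nfound; k++)
            if (found[k] == v) { seen = true; break; }
        if (!seen) { *out = v; return true; }
    }
    return false;
}

/* Returns the number of distinct outputs found, at most MAX_POINTS.
   *exact is set to true when the final query was unsatisfiable, i.e.
   the returned count is the true number of outputs. */
size_t exhaust_points(func_t f, unsigned domain, bool *exact) {
    unsigned found[MAX_POINTS];
    unsigned v;
    size_t n = 0;

    while (n < MAX_POINTS && query_new_output(f, domain, found, n, &v))
        found[n++] = v;

    /* Below the cutoff the loop stopped on an unsatisfiable query;
       at the cutoff one more query tells whether a 65th output exists. */
    *exact = (n < MAX_POINTS) || !query_new_output(f, domain, found, n, &v);
    return n;
}
```

For `low_nibble` the loop stops after 16 outputs with an exact answer (4 bits); for `identity` over 1000 inputs it stops at the cutoff and only yields a lower bound.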
Range exclusion
  Is $v = f(i) \land (a \le v \le b)$ satisfiable?
  If not, a whole range is excluded
  If so, can subdivide
  We also use this with binary search to find the minimum and maximum outputs
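A small sketch of the binary-search use of range queries, again with brute force standing in for the solver. `range_sat`, `min_output`, `max_output`, and the example program `scaled` are hypothetical names, and the input domain must contain at least one input.

```c
#include <limits.h>
#include <stdbool.h>

typedef unsigned (*func_t)(unsigned);

/* Example program: 16 outputs spread between 100 and 145. */
unsigned scaled(unsigned i) { return 100u + 3u * (i & 0xfu); }

/* Stand-in for the query "is v = f(i) and a <= v <= b satisfiable?",
   answered by trying inputs 0 .. domain-1. */
static bool range_sat(func_t f, unsigned domain, unsigned a, unsigned b) {
    for (unsigned i = 0; i < domain; i++) {
        unsigned v = f(i);
        if (a <= v && v <= b)
            return true;
    }
    return false;
}

/* Smallest output. Invariant: some output lies in [lo, hi]. */
unsigned min_output(func_t f, unsigned domain) {
    unsigned lo = 0, hi = UINT_MAX;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (range_sat(f, domain, lo, mid))
            hi = mid;        /* an output exists in the lower half */
        else
            lo = mid + 1;    /* the whole range [lo, mid] is excluded */
    }
    return lo;
}

/* Largest output. Invariant: some output lies in [lo, hi]. */
unsigned max_output(func_t f, unsigned domain) {
    unsigned lo = 0, hi = UINT_MAX;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2 + 1;   /* upper midpoint */
        if (range_sat(f, domain, mid, hi))
            lo = mid;        /* an output exists in the upper half */
        else
            hi = mid - 1;    /* the whole range [mid, hi] is excluded */
    }
    return lo;
}
```

Each search needs at most 32 queries for a 32-bit output. The span max − min + 1 (46 for `scaled`) then bounds the number of outputs from above, even though only 16 of those values actually occur.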
Random output sampling
  Pick $v_r$ at random, and check if $v_r = f(i)$ is satisfiable
  By default, our tool uses 20 samples, and computes a 95% confidence interval
Probabilistic model counting
  Use XOR streamlining [GSS06] to probabilistically reduce #SAT to SAT
  Analogy: counting audience members
  Random parity constraints over enough bits are effectively independent
  Perform repeated experiments with different numbers of constraints
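To illustrate the experiment, the sketch below applies random parity constraints to an explicit list of feasible outputs, a stand-in for adding XOR clauses to a SAT instance. The types and functions (`ParityConstraint`, `sat_with_constraints`, `sat_fraction`) and the xorshift generator are assumptions made for this example, not the tool's interface. Each random constraint is expected to rule out about half of the outputs, so the number of constraints at which the query stops being reliably satisfiable approximates $\log_2$ of the output count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONSTRAINTS 32

/* One parity (XOR) constraint: the XOR of the value bits selected by
   mask must equal bit. */
typedef struct {
    uint32_t mask;
    unsigned bit;
} ParityConstraint;

/* Small deterministic xorshift generator, so runs are repeatable. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_random(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Parity of x: 1 if an odd number of bits are set, else 0. */
unsigned parity(uint32_t x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Stand-in for a SAT query: does any feasible output satisfy all k
   constraints? */
bool sat_with_constraints(const uint32_t *outputs, size_t n,
                          const ParityConstraint *cs, size_t k) {
    for (size_t i = 0; i < n; i++) {
        bool ok = true;
        for (size_t j = 0; j < k && ok; j++)
            ok = parity(outputs[i] & cs[j].mask) == cs[j].bit;
        if (ok)
            return true;
    }
    return false;
}

/* Fraction of trials in which k fresh random constraints leave the
   query satisfiable; returns -1.0 on invalid arguments. */
double sat_fraction(const uint32_t *outputs, size_t n, size_t k, int trials) {
    ParityConstraint cs[MAX_CONSTRAINTS];
    int sat = 0;

    if (k > MAX_CONSTRAINTS || trials <= 0)
        return -1.0;
    for (int t = 0; t < trials; t++) {
        for (size_t j = 0; j < k; j++) {
            cs[j].mask = next_random();
            cs[j].bit = next_random() & 1u;
        }
        if (sat_with_constraints(outputs, n, cs, k))
            sat++;
    }
    return (double)sat / trials;
}
```

Sweeping `k` upward and recording `sat_fraction` gives a curve that starts at 1 and falls toward 0 as constraints are added.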
Probabilistic model counting
  Choose # of constraints so that $p(\mathrm{SAT}) \approx 0.5$
  [Plot: p(SAT), from 0 to 1, against # parity constraints added, from 0 to 35]
Identity function
  v = i
  Low    High   Sample         #SAT   Actual
  6.04   32.0   [31.8, 32.0]   32.0   32
tohex
    sprintf(&v, "%x", i & 0xf)
  Static:
    Low    High   Sample   #SAT   Actual
    4.00   4.00   N/A      N/A    4
  Dynamic:
    Low    High   Sample   #SAT   Actual
    3.32   3.32   N/A      N/A    $\log_2 10$
    2.58   2.58   N/A      N/A    $\log_2 6$
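The numbers in these rows can be reproduced by counting distinct outputs and taking the base-2 logarithm. The sketch below uses the branch-based `tohex` shown earlier in the talk as a stand-in for the `sprintf` version; `count_outputs` and `log2_approx` are hypothetical helpers, and the logarithm is computed by repeated squaring so that the example needs no math library.

```c
#include <stdbool.h>

/* Convert low 4 bits of integer to hex. */
char tohex(int i) {
    int low = i & 0xf;
    char v;
    if (low < 10)
        v = '0' + low;
    else
        v = 'a' + (low - 10);
    return v;
}

/* Number of distinct outputs of tohex for inputs first .. last. */
unsigned count_outputs(int first, int last) {
    bool seen[256] = { false };
    unsigned count = 0;
    for (int i = first; i <= last; i++) {
        unsigned char v = (unsigned char)tohex(i);
        if (!seen[v]) {
            seen[v] = true;
            count++;
        }
    }
    return count;
}

/* Base-2 logarithm for x >= 1: integer part by halving, then 30
   fractional bits by repeated squaring. */
double log2_approx(double x) {
    double result = 0.0;
    double frac = 0.5;
    while (x >= 2.0) {
        x /= 2.0;
        result += 1.0;
    }
    for (int k = 0; k < 30; k++) {
        x *= x;
        if (x >= 2.0) {
            x /= 2.0;
            result += frac;
        }
        frac /= 2.0;
    }
    return result;
}
```

All 16 low-nibble inputs give 16 outputs, hence 4 bits for the static formula; the digit path alone (inputs 0 to 9) gives 10 outputs and the letter path (inputs 10 to 15) gives 6, matching the two dynamic rows.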
Mix and duplicate
  $f(x \mathbin{\|} y) = (x \oplus y) \mathbin{\|} (x \oplus y)$
  f(0x00000042) = 0x00420042
  f(0x02461111) = 0x13571357
  f(0xcafebebe) = 0x74407440
  Low    High   Sample        #SAT   Actual
  6.04   32.0   [0.0, 28.6]   15.8   16
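Read against the three examples, the function XORs the high and low 16-bit halves of its input and writes the result into both halves of the output. A direct C rendering, with the name `mix_and_duplicate` chosen here for illustration:

```c
#include <stdint.h>

/* XOR the two 16-bit halves of the input, then duplicate the result
   into both halves of the output. */
uint32_t mix_and_duplicate(uint32_t input) {
    uint32_t x = input >> 16;        /* high 16 bits */
    uint32_t y = input & 0xffffu;    /* low 16 bits */
    uint32_t mixed = x ^ y;
    return (mixed << 16) | mixed;
}
```

Since the output is determined by a 16-bit value, at most $2^{16}$ outputs are possible, which agrees with the "Actual" figure of 16 bits.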
Results summary
  Goal: distinguish attacks from false positives
![Page 62: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/62.jpg)
Confirming attacks
Vulnerable Windows and Linux binaries
Real attacks all have high influence, at least 26 bits
Program High Sample #SAT Value Set
RPC DCOM 32.0 [31.8, 32.0] 30.4
SQL Server 30.9 [26.7, 28.3] 26.6
ATPhttpd 32.0 [31.8, 32.0] 31.0
![Page 63: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/63.jpg)
Reveal false positives
Examples cause taint analysis warnings
Measured influence exactly, less than 5 bits
Program Low High Value Set
RPC %esp 3.81 3.81
Samba func. ptr 3.32 3.32
![Page 64: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/64.jpg)
Directions for improving solving
Further targeted query strategies
  E.g., two-bit patterns [Meng & Smith, PLAS’11]
Refined strategy for choosing number of parity constraints
Interface with off-the-shelf #SAT solvers
Question: how to restrict counting to output bits?
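To make the role of parity constraints concrete, here is a minimal sketch of counting with random XOR constraints. Each constraint is expected to cut the solution set roughly in half, so the survivors are scaled back up by 2^k. It assumes the solution set can be enumerated directly, whereas a real tool would ask a solver for the survivors. The function name and parameters are hypothetical.

```python
import random

def estimate_count(solutions, n_bits, k, rng):
    """Estimate len(solutions) from those that satisfy k random parity constraints."""
    # Each constraint: XOR of a random subset of bits must equal a random bit.
    constraints = [(rng.getrandbits(n_bits), rng.getrandbits(1)) for _ in range(k)]

    def parity(value):
        return bin(value).count("1") & 1

    survivors = [s for s in solutions
                 if all(parity(s & mask) == bit for mask, bit in constraints)]
    return len(survivors) * (2 ** k)

rng = random.Random(0)
solutions = list(range(1000))
exact = estimate_count(solutions, n_bits=10, k=0, rng=rng)   # no constraints: exact count
approx = estimate_count(solutions, n_bits=10, k=3, rng=rng)  # noisy estimate
```

Choosing k is the trade-off the slide refers to: too few constraints leave too many survivors to enumerate, too many leave none.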
![Page 65: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/65.jpg)
Outline
Core technique: symbolic reasoning
Binary-level bug-finding
Binary-level influence measurement
Real-time URL spam filtering
Strings and JavaScript vulnerabilities
![Page 66: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/66.jpg)
Motivation
Social Networks (Facebook, Twitter)
Web Mail (Gmail, Live Mail)
Blogs, Services (Blogger, Yelp)
Spam
![Page 67: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/67.jpg)
Motivation
• Existing solutions:
  – Blacklists
  – Service-specific, account heuristics
• Develop new spam filter service:
  – Filter spam: scams, phishing, malware
  – Real-time, fine-grained, generalizable
![Page 68: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/68.jpg)
Overview
• Our system – Monarch:
  – Accepts millions of URLs from web service
  – Crawls, labels each URL in real-time
• Spam Classification
  – Decision based on URL content, page behavior, hosting
  – Large-scale; distributed collection, classification
• Implemented as a cloud service
![Page 69: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/69.jpg)
Monarch in Action
Social Network
Spam Account
URL
![Page 70: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/70.jpg)
Monarch in Action
Monarch
Social Network
Spam Account
URL
![Page 71: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/71.jpg)
Monarch in Action
Monarch
Social Network
3. Fetch Content
Spam URL Content
Spam Account
URL
![Page 73: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/73.jpg)
Monarch in Action
Monarch
Social Network
Message Recipients
3. Fetch Content
Spam URL Content
Spam Account
URL
![Page 74: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/74.jpg)
Challenges
Accuracy
Real-Time
Scalability
Tolerant to Feature Evolution
![Page 75: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/75.jpg)
Outline
• Architecture
• Results & Performance
• Limitations
• Conclusion
![Page 76: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/76.jpg)
System Architecture
![Page 80: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/80.jpg)
URL Aggregation

| Source | Sample Size |
|---|---|
| Spam email URLs | 1.25 million |
| Blacklisted Twitter URLs | 567,000 |
| Non-spam Twitter URLs | 9 million |

Collection period: 9/8/2010 – 10/29/2010
![Page 81: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/81.jpg)
Feature Collection
• High Fidelity Browser
• Navigation
  – Lexical features of URLs (length, subdomains)
  – Obfuscation (directory operations, nested encoding)
• Hosting
  – IP/ASN
  – A, NS, MX records
  – Country, city if available
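As a rough illustration of the lexical and obfuscation features listed above, the sketch below derives a few of them from a URL string with the standard library. The feature names and the subdomain heuristic (everything beyond the last two host labels) are assumptions for illustration, not Monarch's actual feature definitions.

```python
from urllib.parse import urlsplit, unquote

def lexical_features(url):
    """Extract simple lexical and obfuscation features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    once = unquote(parts.path)
    return {
        "url_length": len(url),
        # Naive heuristic: assumes the registered domain is two labels long.
        "num_subdomains": max(len(labels) - 2, 0),
        # Directory operations such as /../ or /./ left in the path.
        "has_dot_segments": "/../" in parts.path or "/./" in parts.path,
        # Nested encoding: decoding once more still changes the path.
        "nested_encoding": once != unquote(once),
    }

example_url = "http://a.b.example.com/x/../y%2541"
features = lexical_features(example_url)
```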
![Page 82: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/82.jpg)
Feature Collection
• Content
  – Common HTML templates, keywords
  – Search engine optimization
  – Content of request, response headers
• Behavior
  – Prevent navigating away
  – Pop-up windows
  – Plugin, JavaScript redirects
![Page 83: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/83.jpg)
Classification
• Distributed Logistic Regression
  – Data overload for single machine
• L1-regularization
  – Reduces feature space, over-fitting
  – 50 million features → 100,000 features
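The feature-pruning effect of L1-regularization can be seen on a toy problem. The sketch below is a single-machine illustration using proximal gradient descent with soft-thresholding. That optimizer is a choice made here for brevity, not necessarily what Monarch's distributed trainer uses, and the data set is synthetic.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, clamp small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def train_l1_logistic(samples, labels, l1_penalty=0.1, step=0.5, iterations=2000):
    n_features = len(samples[0])
    weights = [0.0] * n_features
    for _ in range(iterations):
        gradient = [0.0] * n_features
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi / len(samples)
        # Gradient step on the logistic loss, then shrink each weight.
        weights = [soft_threshold(w - step * g, step * l1_penalty)
                   for w, g in zip(weights, gradient)]
    return weights

# Toy data: feature 0 determines the label; features 1 and 2 are irrelevant.
samples = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
labels = [1 if a > 0 else 0 for a, _, _ in samples]
weights = train_l1_logistic(samples, labels)
```

The irrelevant features end with weights of exactly zero, so they can be dropped from the model. The slide describes the same kind of reduction, at a far larger scale.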
![Page 85: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/85.jpg)
Implementation
• System implemented as a cloud service on Amazon EC2
  – Aggregation: 1 machine
  – Feature Collection: 20 machines
    • Firefox, extension + modified source
  – Classification & Feature Extraction: 50 machines
    • Hadoop - Spark, Mesos
• Straightforward to scale the architecture
![Page 86: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/86.jpg)
Result Overview
• High-level summary:
  – Performance
  – Overall accuracy
  – Highlight important features
  – Feature evolution
  – Spam independence between services
![Page 87: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/87.jpg)
Performance
• Rate: 638,000 URLs/day
  – Cost: $1,600/mo
• Process time: 5.54 sec
  – Network delay: 5.46 sec
• Can scale to 15 million URLs/day
  – Estimated $22,000/mo
![Page 88: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/88.jpg)
Measuring Accuracy
• Dataset: 12 million URLs (<2 million spam)
  – Sample 500K spam (half tweets, half email)
  – Sample 500K non-spam
• Training, Testing
  – 5-fold validation
  – Vary training folds non-spam:spam ratio
  – Test fold equal parts spam, non-spam
![Page 89: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/89.jpg)
Overall Accuracy
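| Training Ratio | Accuracy | False Positive Rate | False Negative Rate |
|---|---|---|---|
| 1:1 | 94% | 4.23% | 7.5% |
| 4:1 | 91% | 0.87% | 17.6% |
| 10:1 | 87% | 0.29% | 26.5% |

False positive: non-spam labeled as spam
False negative: spam labeled as non-spam
Accuracy: correctly labeled samples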
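Because each test fold holds equal parts spam and non-spam, the accuracy column follows from the two error rates. The short check below assumes exactly that balance. The function name is illustrative.

```python
def balanced_accuracy(false_positive_rate, false_negative_rate):
    """Accuracy on a test set that is half spam, half non-spam."""
    return 1.0 - (false_positive_rate + false_negative_rate) / 2.0

# (false positive rate, false negative rate) for each training ratio in the table.
rates = {"1:1": (0.0423, 0.075), "4:1": (0.0087, 0.176), "10:1": (0.0029, 0.265)}
accuracy_percent = {ratio: round(100 * balanced_accuracy(fp, fn))
                    for ratio, (fp, fn) in rates.items()}
```

Raising the non-spam share in training trades false positives for false negatives, and on a balanced test set the net effect is lower overall accuracy.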
![Page 91: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/91.jpg)
Error by Feature
[Figure: chart by feature of error and false positive rate; vertical axis Error (%) from 0 to 50, where Error = 1 - Accuracy]
![Page 94: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/94.jpg)
Feature Evolution – Retraining Required
[Figure: accuracy (%) from 86 to 98 over 12-Sep to 24-Sep, with retraining and without retraining]
![Page 95: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/95.jpg)
Spam Independence
• Unexpected result: Twitter, email spam qualitatively different

| Training Set | Testing Set | Accuracy | False Negatives |
|---|---|---|---|
| Twitter | Twitter | 94% | 22% |
| Twitter | Email | 81% | 88% |
| Email | Twitter | 80% | 99% |
| Email | Email | 99% | 4% |
![Page 97: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/97.jpg)
Distinct Email, Twitter Features
![Page 98: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/98.jpg)
Email Features Shorter Lived
![Page 99: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/99.jpg)
Limitations
• Adversarial Machine Learning
  – We provide oracle to spammers
  – Can adversaries tweak content until passing?
• Time-based Evasion
  – Change content after URL submitted for verification
• Crawler Fingerprinting
  – Identify IP space of Monarch, fingerprint Monarch browser client
  – Dual-personality DNS, page behavior
![Page 100: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/100.jpg)
Outline
Core technique: symbolic reasoning
Binary-level bug-finding
Binary-level influence measurement
Real-time URL spam filtering
Strings and JavaScript vulnerabilities
![Page 101: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/101.jpg)
Example attack: gadget overwrite
![Page 102: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/102.jpg)
Example attack: explanation
Cross-site scripting can exist entirely in client-side JavaScript
Unsanitized data passed to HTML creation (document.write) or eval
In the example, a malicious link injects code into the TVGuide gadget, turning it into a phishing vector
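The source/sink pattern can be shown in a few lines. The sketch below is a Python analogue of the JavaScript situation, not the gadget's actual code: `render_unsafe` plays the role of passing attacker-controlled text straight to `document.write`, and `render_escaped` shows one possible sanitization. All names and the payload are hypothetical.

```python
import html

def render_unsafe(channel):
    # Sink receives attacker-controlled text unchanged.
    return "<div>Now showing: " + channel + "</div>"

def render_escaped(channel):
    # Same sink, but markup characters are escaped first.
    return "<div>Now showing: " + html.escape(channel) + "</div>"

# Hypothetical value taken from a malicious link's URL parameter.
payload = "<script>stealCredentials()</script>"
unsafe_page = render_unsafe(payload)
escaped_page = render_escaped(payload)
```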
![Page 103: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/103.jpg)
What’s new here?
Source/sink problem, somewhat like SQL injection or server-side XSS, but:
  JS code takes many kinds of inputs as unstructured strings, requiring custom parsing
  Sanitization is not standardized, and often application-specific
→ More difficult challenges for string reasoning
![Page 104: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/104.jpg)
Exploration overview
Two kinds of exploration:
  Event space: GUI actions such as clicking check-boxes or links
  Value space: contents of form, message, and URL inputs
Explore new program paths
Check whether sanitization is sufficient (compare to attack grammar)
![Page 105: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/105.jpg)
Kudzu system overview
![Page 106: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/106.jpg)
Usage of string operations
Substr / Substring / CharAt / CharCodeAt: 5%
IndexOf / LastIndexOf / Strlen: 78%
Replace / EncodeURI / DecodeURI: 8%
Match / Test / Split: 1%
Concat: 8%
![Page 107: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/107.jpg)
Expressiveness
Regular expression membership
Arbitrary concatenation (word equations)
String length function
Can also mix in boolean and (31-bit) integer constraints
![Page 108: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/108.jpg)
Nested architecture
![Page 109: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/109.jpg)
Approach overview
Flatten concatenations to a linear sequence
Abstract to length constraints
For each length assignment:
  Expand regexps (HAMPI code)
  Combine in single bitvector query
  bitvector SAT ⇒ string SAT
Exhausted lengths ⇒ string UNSAT
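The control structure above (pick a length assignment, solve the fixed-length problem, report UNSAT once lengths are exhausted) can be sketched with brute force standing in for the bitvector query. This is only an illustration of the loop, not Kaluza's implementation. The constraints (`x` matches `a+`, `x + y` matches `(ab)*`, total length fixed) and the two-letter alphabet are made up for the example.

```python
import itertools
import re

ALPHABET = "ab"

def strings_of_length(n):
    return ("".join(chars) for chars in itertools.product(ALPHABET, repeat=n))

def solve(total_len):
    """Find x, y with x in a+, x+y in (ab)*, and len(x)+len(y) == total_len."""
    # Outer loop: enumerate length assignments consistent with the length constraint.
    for len_x in range(total_len + 1):
        len_y = total_len - len_x
        # Inner search: brute force stands in for the fixed-length bitvector query.
        for x in strings_of_length(len_x):
            if not re.fullmatch(r"a+", x):
                continue
            for y in strings_of_length(len_y):
                if re.fullmatch(r"(ab)*", x + y):
                    return x, y      # fixed-length SAT implies string SAT
    return None                      # all length assignments exhausted: UNSAT

sat_result = solve(4)     # ("a", "bab")
unsat_result = solve(3)   # None: (ab)* has no odd-length member
```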
![Page 110: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/110.jpg)
Approach details
Real JavaScript “regexes” are more complex than textbook ones
Regexp lengths ⇒ ultimately periodic set
Translate replace with fixed number of occurrences
![Page 111: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/111.jpg)
Kaluza performance results
[Figure: solve time for SAT cases and UNSAT cases; one axis runs from 0 to 250, the other is a log scale with ticks from 0.05 to 50]
![Page 112: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/112.jpg)
Overall results
Tested 5 AJAX applications and 13 iGoogle gadgets (all live)
Event and value space exploration both contribute to coverage
  But some code and events not yet covered
Found vulnerabilities in 11 apps, including 2 missed by our previous taint-directed fuzzer
![Page 113: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/113.jpg)
Summary, and for more info
Symbolic exploration and reasoning enable a wide variety of security tools
Many security problems need lots of computing, but are naturally parallelizable
http://bitblaze.cs.berkeley.edu/
http://bit.ly/muahjS
![Page 114: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/114.jpg)
Thank you
![Page 115: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/115.jpg)
Backup slides
![Page 116: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/116.jpg)
Web browser content sniffing
An HTTP response contains a content type header
  E.g., text/html or image/png
But sometimes (~1%) the content type is missing or invalid
Thus browsers sometimes attempt to sniff (guess) the type from the content or URL
![Page 117: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/117.jpg)
When content sniffing goes bad
Content type matters because it affects privilege
  Some types of content (HTML, Flash) can contain code
An unexpected upgrade can allow an untrusted user to inject JavaScript
  I.e., a kind of cross-site scripting (XSS)
Usually a mismatch between the browser and another filter
![Page 118: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/118.jpg)
HotCRP attack example
Conference site allows authors to upload PostScript papers
What if the site accepts this file as PS, but the reviewer’s browser considers it HTML?
%!PS-Adobe
%%Creator: <script>submitReview("A+");
...
Your paper gets accepted :-)
![Page 119: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/119.jpg)
Modeling content sniffing
To understand such attacks, we want a formal model of the sniffer’s behavior
  E.g., $M_H(c) = \text{true}$ if the file contents $c$ are sniffed as HTML
Boolean combinations correspond to possible mismatch attacks
  $M_P^1(c) \wedge M_H^2(c)$
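A mismatch attack is an input that satisfies both predicates at once. The sketch below uses two deliberately simplified, hypothetical predicates (a filter that accepts anything starting with `%!` as PostScript, and a sniffer that treats early `<script` or `<html` as HTML). They are not the extracted IE 7 or Safari models, but the conjunction has the same shape as the formula above.

```python
def filter_accepts_postscript(content: bytes) -> bool:
    # Hypothetical upload filter: PostScript files start with "%!".
    return content.startswith(b"%!")

def browser_sniffs_html(content: bytes) -> bool:
    # Hypothetical sniffer: HTML if a tell-tale tag appears near the start.
    head = content[:256].lower()
    return b"<script" in head or b"<html" in head

def is_mismatch_attack(content: bytes) -> bool:
    # Conjunction of the two models: accepted as PS and rendered as HTML.
    return filter_accepts_postscript(content) and browser_sniffs_html(content)

malicious_upload = b'%!PS-Adobe\n%%Creator: <script>submitReview("A+");\n'
benign_upload = b"%!PS-Adobe\n%%Creator: some typesetting tool\n"
```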
![Page 120: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/120.jpg)
Model extraction
The content-sniffing strategies of closed-source browsers are often un- or under-documented
  We look at IE 7, Safari 3.1
Extract from the binary using white-box exploration (symbolic execution)
Model is a disjunction of path conditions from accepting paths
![Page 121: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/121.jpg)
Abstracting string functions
Sniffing code makes heavy use of string routines
Reason about their semantics, not their implementation
+ Summarize multiple paths
+ Skip implementation details
+ Take advantage of specialized solvers (future)
![Page 122: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/122.jpg)
Translating string functions
1. Recognize over 100 binary-level functions (mostly documented)
2. Canonicalize to 14 semantic classes
3. Express in terms of a core constraint language
4. Reduce core constraints to STP bitvectors
![Page 123: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/123.jpg)
Exploration advantage of strings
Block coverage for Safari:
[Figure: number of blocks covered (90 to 200) versus time in seconds (0 to 20000), with one curve for strings and one for bytes]
![Page 124: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/124.jpg)
Summary of attacks found
Tool finds attacks to upgrade 6 content types each in IE and Safari to HTML.
  But which pass a common server-side filter
  Wikipedia has a more complex filter, but it can also be bypassed
Automatically generated PS → HTML example:
%!t?HPTw\nOtKoCglD<HeadswssssRsD
![Page 125: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/125.jpg)
Happy ending: safe sniffing
Our models can be used to create matching server-side filters
We propose client-side design principles for safe sniffing
  Avoid privilege escalation
  Prefix-disjoint signatures
Adopted by IE 8 (partial), Chrome, and HTML 5
![Page 126: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/126.jpg)
Guidance (2): loop patterns
Observation: triggering a loop-related bug often requires a specific execution pattern within a loop

    char buf[20], *p = buf;
    int mode = 0;
    while (p >= buf) {
        switch (read_char()) {
        case 'a':
            if (mode == 1) {
                *p++ = 'x'; mode = 0;   // path 1: write and advance
            } else {
                                        // path 2: nothing to do
            }
            break;
        case 'b':
            mode = !mode; break;        // path 3: toggle mode
        default:
            p = max(buf, p - 1);        // path 4: step back, never below buf
        }
    }
![Page 127: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/127.jpg)
Guidance (2): loop patterns
Approach: cover a variety of patterns during symbolic execution
  Don’t try to find the right pattern statically
Statically number paths through a loop body
Try patterns in inverse relation to their length
Interleave use of patterns with discovering which paths are feasible
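To illustrate pattern-driven search, the sketch below simulates the example loop in Python and tries short repeating input patterns before longer ones. It is a concrete-input simplification rather than symbolic execution. It assumes the default branch steps the pointer back by one (never below the buffer start), and the repeat count of 25 is an arbitrary choice.

```python
import itertools

BUF_SIZE = 20

def overflows(inputs):
    """Simulate the loop on an input string; True if a write lands past the buffer."""
    p, mode = 0, 0
    for ch in inputs:
        if ch == "a":
            if mode == 1:            # path 1: write and advance
                if p >= BUF_SIZE:
                    return True
                p, mode = p + 1, 0
            # otherwise path 2: nothing happens
        elif ch == "b":              # path 3: toggle mode
            mode = 1 - mode
        else:                        # path 4: step back, not below the start
            p = max(0, p - 1)
    return False

def find_overflowing_pattern(max_len=3, repeats=25):
    """Try repeating patterns, shortest first; return the first that overflows."""
    for length in range(1, max_len + 1):
        for pattern in itertools.product("abc", repeat=length):
            if overflows("".join(pattern) * repeats):
                return "".join(pattern)
    return None

first_pattern = find_overflowing_pattern()
```

No single repeated path triggers the bug. Only an alternation of the toggle and the write does, which is why covering patterns rather than individual paths matters here.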
![Page 128: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/128.jpg)
Guidance (2): loop patterns
![Page 129: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/129.jpg)
What do our formulas look like?
The key theory is fixed-size bit-vectors, representing machine integers
  Exact treatment of overflow, signs, etc. important for binaries
Could use arrays for general memory, lookup tables, but usually don’t
  Instead, fix memory layout to be concrete (or unconstrained symbolic)
Usually easy to solve, whether SAT or UNSAT
![Page 130: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/130.jpg)
Solver performance
For easy formulas, mundane changes matter (sample of 84355 formulas, not a general tool comparison)
[Figure: solver performance comparison chart]
![Page 131: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/131.jpg)
Checked/bounded copy
v = i & 0x0f
v = 0; if (i < 16) v = i
| Low | High | Sample | #SAT | Actual |
|---|---|---|---|---|
| 4.00 | 4.00 | N/A | N/A | 4 |
![Page 132: Harnessing computing power for security: BitBlaze ... · Instead, new two-phase distance algorithm that first computes entry-to-exit distances bottom up, then adds unmatched returns](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c7aa56c23c963c83b8755/html5/thumbnails/132.jpg)
Multiplication and division
v = i * 2
| Low | High | Sample | #SAT | Actual |
|---|---|---|---|---|
| 6.58 | 32.0 | [30.4, 31.6] | 31.5 | 31 |
v = i / 2
| Low | High | Sample | #SAT | Actual |
|---|---|---|---|---|
| 6.58 | 31.0 | [30.8, 31.0] | 31.7 | 31 |
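The "Actual" values for these small examples can be reproduced by exhaustive counting on a narrower word. The sketch below assumes unsigned arithmetic and uses a 16-bit width so the enumeration stays cheap. Doubling and halving each lose exactly one bit at any width, which gives 31 bits for the 32-bit case, while masking with `0x0f` leaves 4 bits.

```python
import math

def influence_bits(func, width):
    """log2 of the number of distinct outputs over all width-bit unsigned inputs."""
    mask = (1 << width) - 1
    outputs = {func(i) & mask for i in range(1 << width)}
    return math.log2(len(outputs))

WIDTH = 16
doubled = influence_bits(lambda i: i * 2, WIDTH)    # wraps modulo 2**WIDTH
halved = influence_bits(lambda i: i // 2, WIDTH)
masked = influence_bits(lambda i: i & 0x0F, WIDTH)
```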