Boosting Verification by Automatic Tuning of Decision Procedures
Domagoj Babić
joint work with Frank Hutter, Holger H. Hoos, and Alan J. Hu (University of British Columbia)
Decision procedures
• Core technology for formal reasoning
• Trend towards completely automated verification
  – Scalability is problematic
  – Better (more scalable) decision procedures needed
  – Possible direction: application-specific tuning
[diagram] formula → Decision procedure → SAT (solution) / UNSAT
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Performance of Decision Procedures
• Heuristics
• Learning (avoiding repeating redundant work)
• Algorithms
Heuristics and search parameters
• The brain of every decision procedure
  – Determine performance
• Numerous heuristics:
  – Learning, clause database cleanup, variable/phase decision, ...
• Numerous parameters:
  – Restart period, variable decay, priority increment, ...
• Significantly influence the performance
• Parameters/heuristics perform differently on different benchmarks
Spear bit-vector decision procedure: parameter space
• Spear 1.9:
  – 4 heuristics × 22 optimization functions
  – 2 heuristics × 3 optimization functions
  – 12 double
  – 4 unsigned
  – 4 bool
  = 26 parameters total
• Large number of combinations:
  – After limiting the range of double & unsigned parameters
  – After discretization of double parameters: 3.78×10^18
  – After exploiting dependencies: 8.34×10^17 combinations
  – Finding a good combination – hard!
Goal
• Find a good combination of parameters (and heuristics):
  – Optimize for different problem sets (minimizing the average runtime; stated formally below)
• Avoid time-consuming manual optimization
• Learn from the found parameter sets
  – Apply that knowledge to the design of decision procedures
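Stated as an optimization problem (a restatement of the goal above; the symbols Θ, I, and t are introduced here for illustration and are not from the slides):

\[
\theta^{*} \in \operatorname*{arg\,min}_{\theta \in \Theta} \; \frac{1}{|I|} \sum_{i \in I} t(\theta, i)
\]

where Θ is the (discretized) space of parameter/heuristic configurations, I is the set of training instances, and t(θ, i) is the solver's runtime with configuration θ on instance i.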
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Manual optimization
• Standard way of finding parameter sets
• Developers pick a small set of easy benchmarks (hard benchmarks = slow development cycle)
  – Hard to achieve robustness
  – Easy to over-fit (to small and specific benchmarks)
• Spear manual tuning:
  – Approximately one week of tedious work
When to give up manual optimization?
• Depends mainly on sensitivity of the decision procedure to parameter modifications
• Decision procedures for NP-hard problems are extremely sensitive to parameter modifications
  – Changes of 1-2 orders of magnitude in performance are usual
  – Sometimes up to 4 orders of magnitude
Sensitivity Example
• Example: same instance, same parameters, same machine, same solver
  – Spear compiled with 80-bit floating-point precision: 0.34 [s]
  – Spear compiled with 64-bit floating-point precision: times out after 6000 [s]
  – First ~55000 decisions equal, one mismatch, next ~100 equal, then complete divergence
• Manual optimization for NP-hard problems is ineffective.
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Automatic tuning
• Loop until happy (with the found parameters); a minimal sketch follows below:
  – Perturb the existing set of parameters
  – Perform hill-climbing:
    • Modify one parameter at a time
    • Keep the modification if it is an improvement
    • Stop when a local optimum is found
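A minimal Python sketch of this loop (an illustration of iterated local search with one-exchange hill-climbing, not Spear's or the actual tuner's implementation; the parameter domains and run_solver are hypothetical placeholders):

```python
import random

# Hypothetical discretized parameter space (illustration only, not Spear's
# actual parameters): name -> list of allowed values.
PARAM_DOMAINS = {
    "restart_period":  [100, 500, 1000, 5000],
    "variable_decay":  [0.90, 0.95, 0.99],
    "phase_selection": ["always_false", "less_watched", "random"],
}

def run_solver(config, instance):
    """Placeholder: launch the solver on `instance` with `config` and return
    its runtime in seconds (e.g., via subprocess); not implemented here."""
    raise NotImplementedError

def avg_runtime(config, instances):
    """Objective being minimized: average runtime over the training instances."""
    return sum(run_solver(config, i) for i in instances) / len(instances)

def hill_climb(config, instances):
    """Modify one parameter at a time, keep the change if it improves the
    average runtime, and stop once no single change helps (local optimum)."""
    best, best_cost = dict(config), avg_runtime(config, instances)
    improved = True
    while improved:
        improved = False
        for name, domain in PARAM_DOMAINS.items():
            for value in domain:
                if value == best[name]:
                    continue
                cand = dict(best, **{name: value})
                cost = avg_runtime(cand, instances)
                if cost < best_cost:
                    best, best_cost, improved = cand, cost, True
    return best, best_cost

def iterated_local_search(instances, rounds=20):
    """Loop until happy: perturb the incumbent configuration, hill-climb,
    and keep the result only if it improves on the incumbent."""
    incumbent = {n: random.choice(d) for n, d in PARAM_DOMAINS.items()}
    incumbent, inc_cost = hill_climb(incumbent, instances)
    for _ in range(rounds):
        perturbed = dict(incumbent)
        for name in random.sample(sorted(PARAM_DOMAINS), 2):  # perturb two parameters
            perturbed[name] = random.choice(PARAM_DOMAINS[name])
        cand, cand_cost = hill_climb(perturbed, instances)
        if cand_cost < inc_cost:
            incumbent, inc_cost = cand, cand_cost
    return incumbent
```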
Implementation: FocusedILS [Hutter, Hoos, Stützle ’07]
• Used for Spear tuning
• Adaptively chooses training instances (idea sketched below)
  – Quickly discard poor parameter settings
  – Evaluate better ones more thoroughly
• Any scalar metric can be optimized
  – Runtime, precision, number of false positives, ...
• Can optimize median, average, ...
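The exact FocusedILS rules are in the cited paper; the sketch below only illustrates the idea of discarding poor settings cheaply and spending more evaluations on promising ones (run_solver and the 2x cutoff are assumptions, not the actual rule):

```python
def challenger_beats_incumbent(challenger, incumbent, instances, run_solver):
    """Evaluate the challenger on more and more training instances; give up
    early once its accumulated runtime is clearly worse than the incumbent's
    on the same instances (illustration only, not the actual FocusedILS rule)."""
    chal_total = inc_total = 0.0
    for instance in instances:
        chal_total += run_solver(challenger, instance)
        inc_total += run_solver(incumbent, instance)
        if chal_total > 2.0 * inc_total:   # clearly worse: discard cheaply
            return False
    return chal_total < inc_total          # survived all instances: compare totals
```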
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Experimental Setup - Benchmarks
• 2 experiments:
  – General-purpose tuning (Spear v0.9)
    • Industrial instances from previous SAT competitions
  – Application-specific tuning (Spear v1.8)
    • Bounded model checking instances (BMC)
    • Calysto software checking instances
• Machines
  – Cluster of 55 dual 3.2 GHz Intel Xeon PCs with 2 GB RAM
• Benchmark sets divided
  – Training & test, disjoint
  – Test timeout: 10 hrs
Tuning 1: General-purpose optimization
• Training
  – Timeout: 10 sec
  – Risky, but no experimental evidence of over-fitting
  – 3 days of computation on the cluster
• Very heterogeneous training set
  – Industrial instances from previous competitions
• 21% geometric-mean speedup on the industrial test set over the manual settings (metric sketched below)
• ~3X on bounded model checking
• ~78X on Calysto software checking
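For reference, a geometric-mean speedup is the geometric mean of the per-instance speedups; a minimal sketch with made-up runtimes (the actual per-instance runtimes are not listed on the slides):

```python
import math

def geometric_mean_speedup(baseline_times, tuned_times):
    """Geometric mean of per-instance speedups baseline/tuned;
    a value of 1.21 corresponds to a 21% geometric-mean speedup."""
    ratios = [b / t for b, t in zip(baseline_times, tuned_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Made-up runtimes in seconds, for illustration only:
print(geometric_mean_speedup([3.2, 10.0, 0.8], [2.5, 7.5, 0.9]))
```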
Tuning 1: Bounded model checking instances [plot omitted]
Tuning 1: Calysto instances [plot omitted]
Tuning 2: Application-specific optimization
• Training
  – Timeout: 300 sec
  – Bounded model checking optimization: 2 days on the cluster
  – Calysto instances: 3 days on the cluster
• Homogeneous training set
• Speedups over SAT competition settings:
  – ~2X on BMC
  – ~20X on SWV
• Speedups over manual settings:
  – ~4.5X on BMC
  – ~500X on SWV
Tuning 2: Bounded model checking instances [plot omitted; ~4.5X speedup]
Tuning 2: Calysto instances [plot omitted; ~500X speedup]
Overall Results

Solver                         | BMC #(solved) | BMC avg. runtime (solved) [s] | SWV #(solved) | SWV avg. runtime (solved) [s]
Minisat                        | 289/377       | 360.9                         | 302/302       | 161.3
Spear manual                   | 287/377       | 340.8                         | 298/302       | 787.1
Spear SAT comp.                | 287/377       | 223.4                         | 302/302       | 35.9
Spear auto-tuned app-specific  | 291/377       | 113.7                         | 302/302       | 1.5
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Software verification parameters
– Greedy activity-based heuristic
  • Probably helps focus on the most frequently used sub-expressions
– Aggressive restarts
  • Probably standard heuristics and initial ordering do not work well for SWV problems
– Phase selection: always false
  • Probably related to the checked property (NULL pointer dereference)
– No randomness
  • Spear & Calysto highly optimized
Bounded model checking parameters
– Less aggressive activity heuristic
– Infrequent restarts
  • Probably the initial ordering (as encoded) works well
– Phase selection: less watched clauses
  • Minimizes the amount of work
– Small amount of randomness helps
  • 5% random variable and phase decisions
– Simulated annealing works well (schedule sketched below)
  • Decrease randomness by 30% after each restart
  • Focuses the solver on hard chunks of the design
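A minimal sketch of the annealing schedule described above (illustration only, not Spear's code; the number of restarts shown is arbitrary):

```python
def randomness_schedule(initial=0.05, decay=0.30, restarts=10):
    """Start with 5% random variable/phase decisions and decrease the
    randomness by 30% after each restart, gradually focusing the search."""
    p = initial
    for r in range(restarts):
        yield r, p            # probability of a random decision during restart r
        p *= (1.0 - decay)    # decrease randomness by 30% after each restart

for restart, p in randomness_schedule():
    print(f"restart {restart}: random-decision probability = {p:.4f}")
```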
Outline
• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work
Future Work
• Per-instance tuning (machine-learning-based techniques)
• Analysis of the relative importance of parameters
  – Simplify the solver
• Tons of data, little analysis done so far; correlations between parameters and statistics could reveal important dependencies (a small sketch follows below)
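One way such a correlation analysis could be started (a sketch assuming parameter settings and solver statistics are available as numeric arrays; the data below is made up for illustration):

```python
import numpy as np

# Hypothetical data: one row per tuning run; columns are parameter values
# (e.g., variable decay, restart period) and observed statistics (e.g.,
# runtime, number of restarts).
params = np.array([[0.95, 100], [0.90, 500], [0.99, 1000], [0.95, 5000]])
stats  = np.array([[12.3, 4], [8.1, 9], [15.7, 3], [6.2, 17]])

# Pearson correlation between each parameter and each statistic.
for i in range(params.shape[1]):
    for j in range(stats.shape[1]):
        r = np.corrcoef(params[:, i], stats[:, j])[0, 1]
        print(f"param {i} vs stat {j}: r = {r:+.2f}")
```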
Take-away messages
• Automatic tuning effective
  – Especially application-specific
• Avoids time-consuming manual tuning
• Sensitivity to parameter modifications
  – Few benchmarks = inconclusive results?