
HordeSat: A Massively Parallel Portfolio SAT Solver

Tomáš Balyo, Peter Sanders, Carsten Sinz*

Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Abstract. A simple yet successful approach to parallel satisfiability (SAT) solving is to run several different (a portfolio of) SAT solvers on the input problem at the same time until one solver finds a solution. The SAT solvers in the portfolio can be instances of a single solver with different configuration settings. Additionally the solvers can exchange information, usually in the form of clauses. In this paper we investigate whether this approach is applicable in the case of massively parallel SAT solving. Our solver is intended to run on clusters with thousands of processors, hence the name HordeSat. HordeSat is a fully distributed portfolio-based SAT solver with a modular design that allows it to use any SAT solver that implements a given interface. HordeSat has a decentralized design and features hierarchical parallelism with interleaved communication and search. We experimentally evaluated it using all the benchmark problems from the application tracks of the 2011 and 2014 International SAT Competitions. The experiments demonstrate that HordeSat is scalable up to hundreds or even thousands of processors, achieving significant speedups especially for hard instances.

1 Introduction

Boolean satisfiability (SAT) is one of the most important problems of theoretical computer science, with many practical applications in which SAT solvers are used in the background as high-performance reasoning engines. These applications include automated planning and scheduling [21], formal verification [22], and automated theorem proving [10]. In the last decades the performance of state-of-the-art SAT solvers has increased dramatically thanks to the invention of advanced heuristics [25], preprocessing and inprocessing techniques [19], and data structures that allow efficient implementation of search space pruning [25].

The next natural step in the development of SAT solvers was parallelization. A very common approach to designing a parallel SAT solver is to run several instances of a sequential SAT solver with different settings (or several different SAT solvers) on the same problem in parallel. If any of the solvers succeeds in finding a solution, all the solvers are terminated. The solvers also exchange information, mainly in the form of learned clauses. This approach is referred to

* This research was partially supported by DFG project SA 933/11-1.


as portfolio-based parallel SAT solving and was first used in the SAT solver ManySat [14]. However, so far it has not been clear whether this approach can scale to a large number of processors.

Another approach is to run several search procedures in parallel and ensure that they work on disjoint regions of the search space. This explicit search space partitioning has been used mainly in solvers designed to run on large parallel systems such as clusters or grids of computers [9].

In this paper we describe HordeSat – a scalable portfolio-based SAT solver – and evaluate it experimentally. Using efficient yet thrifty clause exchange and advanced diversification methods, we are able to keep the search spaces largely disjoint without explicitly splitting them. Another important feature of HordeSat is its modular design, which allows it to be independent of any concrete search engine. HordeSat uses SAT solvers as black boxes, communicating with them via a minimalistic interface.

Experiments using benchmarks from the application tracks of the 2011 and 2014 SAT Competitions [3] show that HordeSat can outperform state-of-the-art parallel SAT solvers on multiprocessor machines and is scalable on computer clusters with thousands of processors. Indeed, we even observe superlinear average speedup for difficult instances.

2 Preliminaries

A Boolean variable is a variable with two possible values, True and False. By a literal of a Boolean variable x we mean either x or ¬x (a positive or negative literal). A clause is a disjunction (OR) of literals. A conjunctive normal form (CNF) formula is a conjunction (AND) of clauses. A clause can also be interpreted as a set of literals and a formula as a set of clauses. A truth assignment φ of a formula F assigns a truth value to its variables. The assignment φ satisfies a positive (negative) literal if it assigns the value True (False) to its variable, and φ satisfies a clause if it satisfies any of its literals. Finally, φ satisfies a CNF formula if it satisfies all of its clauses. A formula F is said to be satisfiable if there is a truth assignment φ that satisfies F. Such an assignment is called a satisfying assignment. The satisfiability problem (SAT) is to find a satisfying assignment of a given CNF formula or determine that it is unsatisfiable.
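For example, the CNF formula F = (x ∨ ¬y) ∧ (¬x ∨ z) is satisfiable: any assignment φ with φ(x) = True and φ(z) = True satisfies both clauses regardless of the value assigned to y.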

Conflict-Driven Clause Learning. Most current complete state-of-the-art SAT solvers are based on the conflict-driven clause learning (CDCL) algorithm [23]. In this paper we will use CDCL solvers only as black boxes and therefore provide only a very coarse-grained description; for a detailed discussion of CDCL refer to [5]. In Figure 1 we give pseudo-code for CDCL. The algorithm performs a depth-first search of the space of partial truth assignments (assignDecisionLiteral; backtrack unassigns variables) interleaved with search space pruning in the form of unit propagation (doUnitPropagation) and learning new clauses when the search reaches a conflict state (analyzeConflict, addLearnedClause). If a conflict cannot be resolved by backtracking, then the


CDCL (CNF formula F)
    while not all variables assigned do
        assignDecisionLiteral
        doUnitPropagation
        if conflict detected then
            analyzeConflict
            addLearnedClause
            backtrack or return UNSAT
    return SAT

Fig. 1. Pseudo-code of the conflict-driven clause learning (CDCL) algorithm.

formula is unsatisfiable. If all the variables are assigned and no conflict is detected, then the formula is satisfiable.

3 Related Work

In this section we give a brief description of previous parallel SAT solving approaches. A much more detailed listing and description of existing parallel solvers can be found in recently published overview papers such as [15,24].

Parallel CDCL – Pure Portfolios. The simplest approach is to run CDCL several times on the same problem in parallel with different parameter settings while exchanging learned clauses. If there is no explicit search space partitioning, then this approach is referred to as the pure portfolio algorithm. The first parallel portfolio SAT solver was ManySat [14]. The winner of the parallel track of the latest (2014) SAT Competition, Plingeling [4], is also of this kind.

The motivation behind the portfolio approach is that the performance of CDCL is heavily influenced by a large number of different settings and parameters of the search, such as the heuristic used to select a decision literal. Numerous heuristics can be used in this step [25], but none of them dominates all the others on every problem instance. Decision heuristics are only one of the many settings that strongly influence the performance of CDCL solvers. All of these settings can be considered when diversifying the portfolio; for an example see ManySat [14]. Automatic configuration of SAT solvers in order to ensure that the solvers in a portfolio are diverse has also been studied [30].

Exchanging learned clauses grants an additional boost of performance. It is an important mechanism to reduce duplicate work, i.e., parallel searches working on the same part of the search space. A clause learned from a conflict by one CDCL instance and distributed to all the other CDCL instances will prevent them from doing the same work again in the future.

A central problem related to clause sharing is to decide how many and which clauses should be exchanged. Exchanging all the learned clauses is infeasible, especially in the case of large-scale parallelism. A simple solution is to distribute only the clauses that satisfy some conditions. The conditions are usually related


to the length of the clauses and/or their glue value [1]. An interesting technique called "lazy clause exchange" was introduced in a recent paper [2]. We leave the adaptation of this technique to future work, however, since it would make the design of our solver less modular. Most of the existing pure portfolio SAT solvers are designed to run on single multiprocessor computers. An exception is CL-SDSAT [17], which is designed for solving very difficult instances on loosely connected grid middleware. It is hard to quantify whether this approach yields significant speedups, since the sequential computation times involved would be huge.

Parallel CDCL – Partitioning the Search Space Explicitly. The classical approach to parallelizing SAT solving is to split the search space between the search engines such that no overlap is possible. This is usually done by starting each solver with a different fixed partial assignment. If a solver discovers that its partial assignment cannot be extended into a solution, it receives a new assignment. Numerous techniques have been presented for managing the search space splitting, based on ideas such as guiding paths [9], work stealing [20], and generating sufficiently many tasks [11]. Similarly to the portfolio approach, the solvers exchange clauses.

Most of the previous SAT solvers designed for computer clusters or grids use explicit search space partitioning. Examples of such solvers are GridSAT [9], PMSat [11], GradSat [8], c-sat [26], ZetaSat [6] and SatCiety [28]. Experimentally comparing HordeSat with those solvers is problematic, since they are not easily available online or are implemented for special environments using non-standard middleware. Nevertheless, we can draw some conclusions from the experimental sections of the related publications.

Older grid solvers such as GradSat [8], PMSat [11], SatCiety [28], ZetaSat [6] and c-sat [26] are evaluated only on small clusters (up to 64 processors) using small sets of older benchmarks, which are easily solved by current state-of-the-art sequential solvers; it is therefore impossible to tell how well they scale to a large number of processors and current benchmarks. The solver GridSAT [9] was run on a large heterogeneous grid of computers containing hundreds of nodes for several days and is reported to have solved several (at that time) unsolved problems. Nevertheless, most of those problems can now be solved by sequential solvers in a few minutes, and speedup results are not reported. A recent grid-based solving method called Part-Tree-Learn [16] is compared to Plingeling and is reported to solve fewer instances than Plingeling, despite the fact that in this comparison the number of processors available to Plingeling was slightly smaller [16].

To design a successful explicit-partitioning parallel solver, complex load balancing issues must be solved. Additionally, explicit partitioning clearly brings runtime and space overhead. If the main motivation of explicit partitioning is to ensure that the search spaces explored by the solvers have no overlap, then we believe that the extra work does not pay off and that frequent clause sharing is enough


to approximate the desired behavior.¹ Moreover, in [18] the authors argue that plain partitioning approaches can increase the expected runtime compared to pure portfolio systems. They prove that under reasonable assumptions there is always a distribution that results in an increased expected runtime unless the process of constructing partitions is ideal.

¹ According to our experiments, only 2–6% of the clauses are learned simultaneously by different solvers in a pure portfolio, which is an indication that the overlap of search spaces is relatively small.

4 Design Decisions

In this section we provide an overview of the high-level design decisions made when designing our portfolio-based SAT solver HordeSat.

Modular Design. Rather than committing to any particular SAT solver, we design an interface that is universal and can be efficiently implemented by current state-of-the-art SAT solvers. This results in a more general implementation and the possibility to easily add new SAT solvers to our portfolio.

Decentralization. All the nodes in our parallel system are equivalent. There is no leader or central node that manages the search or the communication. The decentralized design allows more scalability and also simplifies the algorithm.

Overlapping Search and Communication. The search and the clause exchange procedures run in different (hardware) threads in parallel. The system is implemented in such a way that the search procedure never waits for any shared resources, at the expense of losing some of the shared clauses.

Hierarchical Parallelization. HordeSat is designed to run on clusters of computers (nodes) with multiple processor cores, i.e., we have two levels of parallelization. The first level uses the shared memory model to communicate between solvers running on the same node, and the second level relies on message passing between the nodes of a cluster.

The details and implementation of these points are discussed below.

5 Black Box for Portfolios

Our goal is to develop a general parallel portfolio solver based on existing state-of-the-art sequential CDCL solvers without committing to any particular solver. To achieve this we define a C++ interface that is used to access the solvers in the portfolio. New SAT solvers can thus be easily added just by implementing this interface. By core solver we will mean a SAT solver implementing the interface.

In this section we describe the essential methods of the interface. All the methods are required to be implemented in a thread-safe way, i.e., safe execution by multiple threads at the same time must be guaranteed. We first describe the basic methods, which allow us to solve formulas and interrupt the solver.



void addClause(vector<int> clause): This method is used to load the initial formula that is to be solved. The clauses are represented as lists of literals, which are represented as integers in the usual way. All the clauses must be considered by the solver at the next call of solve.

SatResult solve(): This method starts the search for the solution of the formula specified by the addClause calls. The return value is one of SatResult = {SAT, UNSAT, UNKNOWN}. The result UNKNOWN is returned when the solver is interrupted by calling setSolverInterrupt().

void setSolverInterrupt(): Posts a request to the core solver instance to interrupt the search as soon as possible. If the method solve has been called, it will return UNKNOWN. Subsequent calls of solve on this instance must return UNKNOWN until the method unsetSolverInterrupt is called.

void unsetSolverInterrupt(): Removes the request to interrupt the search.

Using these four methods, a simple portfolio can be built. When using several instances of the same deterministic SAT solver, some diversification can be achieved by adding the clauses in a different order to each solver.
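As an illustration, a minimal portfolio driver built only on these four methods might look as follows. The class name ISolver, the driver function, and the clause-order rotation are our illustrative assumptions, not HordeSat's actual code:

    #include <atomic>
    #include <memory>
    #include <thread>
    #include <vector>

    enum SatResult { SAT, UNSAT, UNKNOWN };

    // The four basic methods described above, as an abstract C++ class.
    struct ISolver {
      virtual void addClause(std::vector<int> clause) = 0;
      virtual SatResult solve() = 0;
      virtual void setSolverInterrupt() = 0;
      virtual void unsetSolverInterrupt() = 0;
      virtual ~ISolver() {}
    };

    // Run all given solvers on the same formula until one of them succeeds.
    SatResult runPortfolio(std::vector<std::unique_ptr<ISolver>>& solvers,
                           const std::vector<std::vector<int>>& formula) {
      if (formula.empty()) return UNKNOWN;
      size_t n = formula.size();
      for (size_t i = 0; i < solvers.size(); i++)
        // Diversify identical deterministic solvers by rotating clause
        // order: solver i reads the formula starting at clause i.
        for (size_t j = 0; j < n; j++)
          solvers[i]->addClause(formula[(i + j) % n]);

      std::atomic<SatResult> result{UNKNOWN};
      std::vector<std::thread> threads;
      for (auto& s : solvers) {
        ISolver* sp = s.get();
        threads.emplace_back([&result, &solvers, sp] {
          SatResult r = sp->solve();
          if (r != UNKNOWN) {    // first definite answer wins;
            result.store(r);     // all other solvers are interrupted
            for (auto& other : solvers) other->setSolverInterrupt();
          }
        });
      }
      for (auto& t : threads) t.join();
      return result.load();
    }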

More options for diversification are made possible via the following two methods. A good way of diversifying is to set default phase values for the variables of the formula, i.e., truth values to be tried first. These are then used by the core solver when selecting decision literals. In general many solver settings can be changed to achieve diversification. Since these may be different for each core solver, we define a general method for diversification which the core solver can implement in its own specific way.

void setPhase(int var, bool phase): This method is used to set a default phase of a variable. The solver is allowed to ignore these suggestions.

void diversify(int rank, int size): This method tells the core solver to diversify its settings. The specifics of diversification are left to the solver. The provided parameters can be used by the solver to determine how many solvers are working on this problem (size) and which one of those is this solver (rank). A trivial implementation of this method could set the pseudo-random number generator seed of the core solver to rank.

The final three methods of the interface deal with clause sharing. The solvers can produce and accept clauses. Not all the learned clauses are shared: each core solver is expected to initially offer only a limited number of clauses which it considers most worthy of sharing. The solver should increase the number of exported clauses when the method increaseClauseProduction is called. This can be implemented by relaxing the constraints on the learned clauses selected for exporting.

void addLearnedClause(vector<int> clause): This method is used to add learned clauses received from other solvers of the portfolio. The core solver can decide when and whether the clauses added using this method are actually considered during the search.

void setLearnedClauseCallback(LCCallback* callback): This method is used to set a callback class that will process the clauses shared by this solver.


To export a clause, the core solver calls the void write(vector<int> clause) method of the LCCallback class. Each clause exported by this method must be a logical consequence of the clauses added using addClause or addLearnedClause.

void increaseClauseProduction(): Informs the solver that more learned clauses should be shared. This could mean, for example, that learned clauses of bigger size or higher glue value [1] will be shared.
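Putting all of the above together, the complete interface can be sketched as the following abstract C++ class. The method names and signatures are taken from the text, while the class names are our assumption; the actual HordeSat header may differ:

    #include <vector>

    enum SatResult { SAT, UNSAT, UNKNOWN };

    // Callback through which a core solver exports its learned clauses.
    struct LCCallback {
      virtual void write(std::vector<int> clause) = 0;
      virtual ~LCCallback() {}
    };

    // The interface every core solver implements; all methods must be
    // thread safe.
    struct PortfolioSolverInterface {
      // Solving and interruption.
      virtual void addClause(std::vector<int> clause) = 0;
      virtual SatResult solve() = 0;
      virtual void setSolverInterrupt() = 0;
      virtual void unsetSolverInterrupt() = 0;
      // Diversification.
      virtual void setPhase(int var, bool phase) = 0;
      virtual void diversify(int rank, int size) = 0;
      // Clause sharing.
      virtual void addLearnedClause(std::vector<int> clause) = 0;
      virtual void setLearnedClauseCallback(LCCallback* callback) = 0;
      virtual void increaseClauseProduction() = 0;
      virtual ~PortfolioSolverInterface() {}
    };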

The interface is designed to closely match current CDCL SAT solvers, but any kind of SAT solver can be used. For example, a local search SAT solver could implement the interface by ignoring the calls to the clause sharing methods.

For our experiments we implemented the interface by writing binding code for MiniSat [29] and Lingeling [4]. In the latter case no modifications to the solver were required and the binding code only uses the incremental interface of Lingeling. As for MiniSat, the code has been slightly modified to support the three clause sharing methods.

6 The Portfolio Algorithm

In this section we describe the main algorithm used in HordeSat. As already mentioned in Section 4, we use two levels of parallelization. HordeSat can be viewed as a multithreaded program that communicates using messages with other instances of the same program. The communication is implemented using the Message Passing Interface (MPI) [12]. Each MPI process runs the same multithreaded program and takes care of the following tasks:

– Read the formula and add its clauses to each core solver using addClause.
– Ensure diversification of the core solvers with respect to the other processes.
– Start the core solvers using solve, with one fresh thread for each core solver.
– Ensure that if one of the core solvers solves the problem, all the other core solvers and processes are notified and stopped. This is done by using setSolverInterrupt for each core solver and sending a message to all the participating processes.
– Collect the exported clauses from the core solvers, filter duplicates, and send them to the other processes. Accept the exported clauses of the other processes, filter them, and distribute them to the core solvers.

The tasks of reading the input formula, diversification, and solver starting are performed once after the start of the process. The communication of termination and of clause exchange is performed periodically in rounds until a solution is found. The main thread sleeps between these rounds for a given amount of time specified as a parameter of the solver (usually around 1 second). The threads running the core solvers work uninterrupted during the whole time of the search.

6.1 Diversification

Since we can only access the core solvers via the interface defined above, our only tools for diversification are setting phases using the setPhase method and calling the solver-specific diversify method.


The setPhase method allows us to partition the search space in a semi-explicit fashion. An explicit splitting of the search space into disjoint subspaces is usually done by imposing phase restrictions instead of just recommending them. The explicit approach is used in parallel solvers utilizing guiding paths [9] and dynamic work stealing [20].

We have implemented and tested the following diversification procedures based on literal phase recommendations (a code sketch follows the list).

– Random. Each variable gets a random phase recommendation for each core solver. Note that this is different from selecting a random phase each time a decision is made for a variable in the CDCL procedure.

– Sparse. Each variable gets a random phase recommendation on exactly one of the core solvers in the entire portfolio. For the other solvers no phase recommendation is made for the given variable.

– Sparse Random. For each core solver, each variable gets a random phase recommendation with probability 1/#solvers, where #solvers is the total number of core solvers in the portfolio.
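A minimal sketch of the three schemes, assuming the interface from Section 5. The deterministic hash used to pick the single owner in the Sparse scheme is our simplification – it lets every process agree on a variable's owner without any communication – and HordeSat's actual implementation may do this differently:

    #include <random>

    // 'solver' is this core solver's global index in 0..numSolvers-1;
    // PortfolioSolverInterface is the interface sketched in Section 5.
    void recommendPhases(PortfolioSolverInterface& s, int solver,
                         int numSolvers, int numVars, int scheme,
                         std::mt19937& rng) {
      std::bernoulli_distribution coin(0.5);
      std::bernoulli_distribution pick(1.0 / numSolvers);
      for (int var = 1; var <= numVars; var++) {
        if (scheme == 0) {         // Random: a phase for every variable
          s.setPhase(var, coin(rng));
        } else if (scheme == 1) {  // Sparse: exactly one owner per variable
          unsigned owner =
              (2654435761u * (unsigned)var) % (unsigned)numSolvers;
          if ((int)owner == solver) s.setPhase(var, coin(rng));
        } else {                   // Sparse Random: prob. 1/numSolvers
          if (pick(rng)) s.setPhase(var, coin(rng));
        }
      }
    }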

Each of these can be used in conjunction with the diversify method, whose behavior is defined by the core solvers. As already mentioned, we use Lingeling and MiniSat as core solvers. In the case of MiniSat, we implemented the diversify method by only setting the random seed. For Lingeling we copied the diversification algorithm from Plingeling [4], which is the multi-threaded version of Lingeling based on the portfolio approach and the winner of the parallel application track of the 2014 SAT Competition [3]. In this algorithm 16 different parameters of Lingeling are used for diversification.

6.2 Clause Sharing

Clause sharing in our portfolio happens periodically in rounds. In each round a fixed-size message (1500 integers in our implementation) containing the literals of the shared clauses is exchanged by all the MPI processes in an all-to-all fashion. This is implemented using the MPI_Allgather [12] collective communication routine defined by the MPI standard.

Each process prepares the message by collecting the learned clauses from its core solvers. The clauses are filtered to remove duplicates. The fixed-size message buffer is filled up with the clauses, shorter clauses being preferred; clauses that do not fit are discarded. If the buffer is not filled to its full capacity, then one of the core solvers of the process is requested to increase its clause production by calling the increaseClauseProduction method.
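As a sketch of one sharing round: BUFFER_SIZE matches the 1500 integers mentioned above, while serializeClauses and importClauses stand in for HordeSat's actual buffer encoding, which we do not reproduce here:

    #include <mpi.h>
    #include <vector>

    const int BUFFER_SIZE = 1500;  // integers per process and round

    // Placeholders for the actual clause (de)serialization.
    void serializeClauses(std::vector<int>& outBuf);
    void importClauses(const int* buf, int len);

    void sharingRound(MPI_Comm comm) {
      int rank, size;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &size);

      // Fixed-size, zero-padded message with this process's best clauses.
      std::vector<int> outBuf(BUFFER_SIZE, 0);
      serializeClauses(outBuf);

      // All-to-all exchange: afterwards every process holds every buffer.
      std::vector<int> inBuf((size_t)BUFFER_SIZE * size);
      MPI_Allgather(outBuf.data(), BUFFER_SIZE, MPI_INT,
                    inBuf.data(), BUFFER_SIZE, MPI_INT, comm);

      // Filter and distribute the clauses of all other processes; our own
      // clauses are already known locally and can be skipped.
      for (int p = 0; p < size; p++)
        if (p != rank)
          importClauses(inBuf.data() + (size_t)p * BUFFER_SIZE, BUFFER_SIZE);
    }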

The detection of duplicate clauses is implemented using Bloom filters [7]. A Bloom filter is a space-efficient probabilistic set data structure that allows false-positive matches, which in our case means that some clauses might be considered duplicates even if they are not. The usage of Bloom filters requires a set of hash functions that map clauses to integers. We use the following


hash function, which ensures that permuting the literals of a clause does not change its hash value:

    H_i(C) = ⊕_{ℓ∈C} ℓ · primes[abs(ℓ · i) mod |primes|]

where i > 0 is a parameter we are free to choose, C is a clause, ⊕ denotes bitwise exclusive-or, and primes is an array of large prime numbers. Literals are interpreted as integers in the usual way, i.e., x_j as j and ¬x_j as −j.
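Written out in C++, the hash function reads as follows; the concrete prime values are an arbitrary illustrative choice:

    #include <cstdlib>
    #include <vector>

    // An arbitrary illustrative table of large primes.
    static const unsigned primes[] = {1000000007u, 1000000009u,
                                      998244353u, 2147483647u};
    static const long long NUM_PRIMES = sizeof(primes) / sizeof(primes[0]);

    // H_i(C): since XOR is commutative and associative, permuting the
    // literals of the clause does not change the result.
    unsigned clauseHash(const std::vector<int>& clause, int i) {
      unsigned h = 0;
      for (int lit : clause)
        h ^= (unsigned)lit * primes[std::llabs(1LL * lit * i) % NUM_PRIMES];
      return h;
    }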

Each MPI process maintains one Bloom filter g_x for each of its core solvers x and an additional global one, g. When a core solver x exports a learned clause C, the following steps are taken (a code sketch follows the list):

– Clause C is added to g_x.
– If C ∉ g, then C is added to g as well as into a data structure e for export.
– If several core solvers concurrently try to access e, only one will succeed and the new clauses of the other core solvers are ignored. This way we avoid contention at the shared resource e, at the price of ignoring some clauses.
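The third step – only one solver succeeds, the rest drop their clauses rather than wait – maps naturally onto a non-blocking try-lock. A sketch, where the Bloom filter type and the export buffer are our assumed simplifications:

    #include <mutex>
    #include <vector>

    // Minimal assumed Bloom filter interface (hash functions as above).
    struct BloomFilter {
      bool contains(const std::vector<int>& clause) const;
      void insert(const std::vector<int>& clause);
    };

    struct ExportBuffer {
      BloomFilter g;                     // the global filter g
      std::vector<std::vector<int>> e;   // the export data structure e
      std::mutex lock;
    };

    // Called when core solver x (with local filter gx) exports clause C.
    void onClauseExport(BloomFilter& gx, ExportBuffer& shared,
                        const std::vector<int>& C) {
      gx.insert(C);                      // step 1: C is added to gx
      if (shared.lock.try_lock()) {      // step 3: never wait; on contention
        if (!shared.g.contains(C)) {     //         the clause is dropped
          shared.g.insert(C);            // step 2: C added to g and to e
          shared.e.push_back(C);
        }
        shared.lock.unlock();
      }
    }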

After the global exchange of learned clauses, the incoming clauses need to be filtered for duplicates and distributed to the core solvers. The first task is done using the global Bloom filter g. For the second task we utilize the thread-local filters g_x to ensure that each core solver receives only new clauses.

All the Bloom filters are periodically reset, which allows the repeated sharing of clauses after some time. Our initial experiments showed that this approach is more beneficial than maintaining a strict "no duplicate clauses allowed" policy.

Overall, there are three reasons why a clause offered by a core solver can get discarded. First, it was a duplicate, or was wrongly considered a duplicate due to the probabilistic nature of Bloom filters. Second, another core solver was adding its clause to the data structure for global export at the same time. Third, it did not fit into the fixed-size message sent to the other MPI processes. Although important learned clauses might get lost, we believe that this relaxed approach is still beneficial, since it allows a simpler and more efficient implementation of clause sharing.

7 Experimental Evaluation

To examine our portfolio-based parallel SAT solver HordeSat we performed experiments with two kinds of benchmarks. We used the benchmark formulas from the application tracks of the 2011 and 2014 SAT Competitions [3] (545 instances)² and randomly generated 3-SAT formulas (200 satisfiable and 200 unsatisfiable instances). The random formulas have 250–440 variables and 4.25 times as many clauses, which corresponds to the phase transition of 3-SAT problems [27].

² Originally we only used the 2014 instances. A reviewer suggested trying the 2011 instances as well, conjecturing that they would be harder to parallelize. Surprisingly, the opposite turned out to be true.


[Figure 2: two cactus plots of time in seconds vs. number of solved problems, one for satisfiable and one for unsatisfiable instances, each with four curves: No Diversification & No Sharing, Only Sharing, Only Diversification, and Diversification and Sharing.]

Fig. 2. The influence of diversification and clause sharing on the performance of HordeSat using Lingeling (16 processes with 1 thread each) on random 3-SAT problems.

The experiments were run on a cluster allowing us to reserve up to 128 nodes. Each node has two octa-core Intel Xeon E5-2670 (Sandy Bridge) processors clocked at 2.6 GHz and 64 GB of main memory. Each node therefore has 16 cores, and the total number of available cores is 2048. The nodes communicate using an InfiniBand 4X QDR interconnect and run the SUSE Linux Enterprise Server 11 (x86_64, patch level 3) operating system. HordeSat was compiled using g++ (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] with the "-O3" flag.

If not stated otherwise, we use the following parameters: the sleep time between clause sharing rounds is 1 second, and the default diversification algorithm is the combination of "sparse random" and the native diversification of the core solver. The current version supports two core solvers – Lingeling and MiniSat. The default is Lingeling, which is used in all the experiments presented below; it is also possible to use a combination of Lingeling and MiniSat.


Using only Lingeling gives by far the best results on these benchmarks. The time limit per instance is 1 000 seconds for parallel solvers and 50 000 seconds for the sequential solver Lingeling. Detailed results of all the presented experiments, as well as the source code of HordeSat and all the used benchmark problems, can be found at http://baldur.iti.kit.edu/hordesat.

7.1 Clause Sharing and Diversification

We investigated the individual influence of clause sharing and diversification on the performance of our portfolio. For the application benchmarks we obtained the unsurprising result that both diversification and clause sharing are highly beneficial for satisfiable as well as unsatisfiable instances. However, for random 3-SAT problems the results are more interesting.

Looking at the cactus plots in Figure 2, we can observe that clause sharing is essential for unsatisfiable instances while insignificant, and even slightly detrimental, for satisfiable problems. On the other hand, diversification has only a small benefit for unsatisfiable instances. This observation is related to the more general question of intensification vs. diversification in parallel SAT solving [13].

For the experiments presented in Figure 2 we used sparse diversification combined with the diversify method, which in this case copies the behavior of Plingeling. It is important to note that some diversification arises from the non-deterministic nature of Lingeling, even when we do not invoke it explicitly using the setPhase or diversify methods.

7.2 Scaling on Application Benchmarks

In parallel processing one usually wants good scalability, in the sense that the speedup over the best sequential algorithm grows near-linearly with the number of processors. Measuring scalability in a reliable and meaningful way is difficult for SAT solving, since running times are highly nondeterministic. Hence we need careful experiments on a large benchmark set chosen in an unbiased way; we therefore use the application benchmarks of the 2011 and 2014 SAT Competitions. Our sequential reference is Lingeling, which won the most recent (2014) competition. We ran experiments using 1, 2, 4, ..., 512 processes with four threads each, where each cluster node runs 4 processes. The results are summarized in Figure 3 using cactus plots. We can observe that increased parallelism is always beneficial for the 2011 benchmarks. For the full benchmark set the benefits beyond 32 nodes are not apparent.

A cactus plot does not easily show whether the additional performance is a reasonable return on the invested hardware resources. Therefore Table 1 summarizes that information in several ways in order to quantify the overall scalability of HordeSat on the union of the 2011 and 2014 benchmarks. We compute speedups for all the instances solved by the parallel solver. For instances not solved by Lingeling within its time limit T = 50 000s, we generously assume that it would solve them if given T + ε seconds, and use the runtime of T for the speedup calculation. Column 4 gives the average of these values. We observe


[Figure 3: two cactus plots of time in seconds vs. number of solved problems, comparing Lingeling against HordeSat configurations 1x4x4 through 128x4x4, on the 2011 instances and on the union of the 2011 and 2014 instances.]

Fig. 3. The impact of doubling the number of processors on the runtime and the number of solved problems for the 2011 instances and for the union of the 2011 and 2014 application instances. The labels represent (#nodes)x(#processes/node)x(#threads/process).

considerable superlinear speedups on average for all the configurations tried. However, this average is not a very robust measure, since it is highly dependent on a few very large speedups that might be just luck. Column 5 shows the total speedup, which is the sum of sequential runtimes divided by the sum of parallel runtimes, and Column 6 contains the median speedup.

Nevertheless, these figures treat HordeSat unfairly, since most instances are actually too easy to justify investing a lot of hardware. Indeed, in parallel computing it is usual to analyze the performance on many processors using weak scaling, where the amount of work per instance is increased proportionally to the number of processors. Therefore, in Columns 7–9 we restrict ourselves to those instances where Lingeling needs at least 10p seconds, where p is the number of core solvers used by HordeSat. The average speedup


Core     Parallel Both   |  Speedup All           |  Speedup Big            |
Solvers  Solved   Solved |  Avg.   Tot.    Med.   |  Avg.   Tot.    Med.    |  CBS
1x4x4    385      363    |  303    25.01   3.08   |  524    26.83   4.92    |  5.86
2x4x4    421      392    |  310    30.38   4.35   |  609    33.71   9.55    |  22.44
4x4x4    447      405    |  323    41.30   5.78   |  766    49.68   16.92   |  68.90
8x4x4    466      420    |  317    50.48   7.81   |  801    60.38   32.55   |  102.27
16x4x4   480      425    |  330    65.27   9.42   |  1006   85.23   63.75   |  134.37
32x4x4   481      427    |  399    83.68   11.45  |  1763   167.13  162.22  |  209.07
64x4x4   476      421    |  377    104.01  13.78  |  2138   295.76  540.89  |  230.37
128x4x4  476      421    |  407    109.34  13.05  |  2607   352.16  867.00  |  216.69
pling8   372      357    |  44     18.61   3.11   |  67     19.20   4.12    |  4.77
pling16  400      377    |  347    24.83   3.53   |  586    26.18   5.89    |  7.34
1x8x1    373      358    |  53     19.57   3.13   |  81     20.42   4.36    |  4.79
1x16x1   400      376    |  325    27.78   4.06   |  548    30.30   6.98    |  7.34

Table 1. HordeSat configurations (#nodes)x(#processes/node)x(#threads/process) compared to Plingeling with a given number of threads. The second column is the number of instances solved by the parallel solver, the third the number of instances solved by both Lingeling and the parallel solver. The following six columns contain the average, total, and median speedups, either over all instances solved by the parallel solver or only over the big instances (those Lingeling solves only after 10·(#threads) seconds). The last column contains the "count based speedup" values defined in Subsection 7.2.

gets considerably larger, as does the total speedup, especially for the large configurations. The median speedup also increases but remains slightly sublinear. Figure 4 shows the distribution of speedups for these instances.

Another way to measure speedup robustly is to compare the times needed to solve a given number of instances. Let T1 (Tp) denote the per-instance time limit of the sequential (parallel) solver (50 000s (1 000s) in our case). Let n1 (np) denote the number of instances solved by the sequential (parallel) solver within time T1 (Tp). If n1 ≥ np (n1 < np), let T′1 (T′p) denote the smallest time limit for the sequential (parallel) solver such that it solves np (n1) instances within the time limit T′1 (T′p). We define the count based speedup (CBS) as

    CBS = T1/T′p  if n1 < np,   and   CBS = T′1/Tp  otherwise.
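As a hypothetical illustration with made-up numbers: if the parallel solver solves np = 450 instances within its limit Tp = 1 000s while the sequential solver solves only n1 = 430 within T1 = 50 000s, then n1 < np; if the parallel solver already solves 430 instances within T′p = 250s, we get CBS = 50 000/250 = 200.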

The CBS scales almost linearly up to 512 cores and stagnates afterwards. We are not sure whether this indicates a scalability limit of HordeSat or rather reflects a lack of sufficiently difficult instances – in our collection there are only 65 eligible instances.


[Figure 4: per-instance speedups on a logarithmic scale (0.1 to 100 000) over the problems, one curve per configuration from 2x4x4 to 128x4x4.]

Fig. 4. Distribution of speedups on the "big instances" – the data corresponding to Columns 7–9 of Table 1.

[Figure 5: cactus plot of time in seconds vs. number of solved problems for Lingeling (1 thread), Plingeling (8 threads), HordeSat 1x8x1 (8 threads), Plingeling (16 threads), and HordeSat 1x16x1 (16 threads).]

Fig. 5. Comparison of HordeSat and Plingeling with Lingeling on the 2011 and 2014 SAT Competition benchmarks.

7.3 Comparison with Plingeling

The parallel SAT solver most similar to our portfolio is the state-of-the-art solver Plingeling [4], the winner of the parallel track of the 2014 SAT Competition. Both solvers are portfolio-based, both use Lingeling, and they even share some diversification code. The main differences are in the clause sharing algorithms and in the fact that Plingeling does not run on clusters, only on single computers. For this reason we can compare the two solvers only on a single node. The results of this comparison on the benchmark problems of the 2011 and 2014 SAT Competitions are displayed in Figure 5; speedup values are given in Table 1.


Both solvers significantly outperform Lingeling. The performance of HordeSat and Plingeling is almost indistinguishable when running with 8 cores, while on 16 cores HordeSat gets slightly ahead of Plingeling.

8 Conclusion

HordeSat has the potential to reduce solution times of difficult yet solvable SAT instances from hours to minutes using hundreds of cores on commodity clusters. This may open up new interactive applications of SAT solving. We find it surprising that this was achieved using a relatively simple, portfolio-based approach that is independent of the underlying core solver. In particular, this makes it likely that HordeSat can track future progress of sequential SAT solvers.

The SAT solver that works best with HordeSat on application benchmarks is Lingeling. Plingeling is another parallel portfolio solver based on Lingeling, and it is also the winner of the parallel track of the most recent (2014) SAT Competition. Comparing the performance of HordeSat and Plingeling reveals that HordeSat is almost indistinguishable from Plingeling when running with 8 cores and slightly outperforms it when running with 16 cores. This demonstrates that there is still room for the improvement of shared-memory-based parallel portfolio solvers.

Our experiments on a cluster with up to 2048 processor cores show that HordeSat is scalable in highly parallel environments. We observed superlinear and nearly linear scaling in several measures such as average, total, and median speedups, particularly on hard instances. In each case increasing the number of available cores resulted in significantly reduced runtimes.

8.1 Future Work

An important next step is to work on the scalability of HordeSat for 1024 cores and beyond. This will certainly involve more adaptive clause exchange strategies. Even for single-node configurations, low-level performance improvements on modern machines with dozens of cores seem possible. We would also like to investigate what benefits can be gained from a tighter integration of core solvers by extending the interface. Including other kinds of (not necessarily CDCL-based) core solvers might also bring improvements.

When considering massively parallel SAT solving, we probably have to move to even more difficult instances to make it meaningful. If this also means larger instances, memory consumption may become an issue when running many instances of a SAT solver on a many-core machine. Here it might be interesting to explore opportunities for sharing data structures between multiple SAT solvers, or to decompose problems into smaller subproblems by recognizing their structure.

Acknowledgment. We would like to thank Armin Biere for fruitful discussions about the usage of the Lingeling API in a parallel setting.


References

1. Audemard, G., Simon, L.: Predicting learnt clauses quality in modern SAT solvers. In: IJCAI. vol. 9, pp. 399–404 (2009)
2. Audemard, G., Simon, L.: Lazy clause exchange policy for parallel SAT solvers. In: Theory and Applications of Satisfiability Testing – SAT 2014, pp. 197–205. Springer (2014)
3. Belov, A., Diepold, D., Heule, M.J., Järvisalo, M.: SAT Competition 2014 (2014)
4. Biere, A.: Lingeling, Plingeling and Treengeling entering the SAT Competition 2013. In: Balint, A., Belov, A., Heule, M.J.H., Järvisalo, M. (eds.) Proceedings of SAT Competition 2013. Department of Computer Science Series of Publications B, vol. B-2013-1, pp. 51–52. University of Helsinki (2013)
5. Biere, A., Heule, M., van Maaren, H., Walsh, T.: Conflict-driven clause learning SAT solvers. Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, pp. 131–153 (2009)
6. Blochinger, W., Westje, W., Küchlin, W., Wedeniwski, S.: ZetaSat – Boolean satisfiability solving on desktop grids. In: Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on, vol. 2, pp. 1079–1086. IEEE (2005)
7. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13(7), 422–426 (1970)
8. Chrabakh, W., Wolski, R.: GradSat: a parallel SAT solver for the grid. In: Proceedings of IEEE SC03 (2003)
9. Chrabakh, W., Wolski, R.: GridSAT: a Chaff-based distributed SAT solver for the grid. In: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, p. 37. ACM (2003)
10. Flanagan, C., Joshi, R., Ou, X., Saxe, J.B.: Theorem proving using lazy proof explication. In: Computer Aided Verification, pp. 355–367. Springer (2003)
11. Gil, L., Flores, P., Silveira, L.M.: PMSat: a parallel version of MiniSAT. Journal on Satisfiability, Boolean Modeling and Computation 6, 71–98 (2008)
12. Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22(6), 789–828 (1996)
13. Guo, L., Hamadi, Y., Jabbour, S., Sais, L.: Diversification and intensification in parallel SAT solving. In: Cohen, D. (ed.) Principles and Practice of Constraint Programming – CP 2010. Lecture Notes in Computer Science, vol. 6308, pp. 252–265. Springer (2010)
14. Hamadi, Y., Jabbour, S., Sais, L.: ManySAT: a parallel SAT solver. Journal on Satisfiability, Boolean Modeling and Computation 6, 245–262 (2008)
15. Hölldobler, S., Manthey, N., Nguyen, V., Stecklina, J., Steinke, P.: A short overview on modern parallel SAT solvers. In: Proceedings of the International Conference on Advanced Computer Science and Information Systems, pp. 201–206 (2011)
16. Hyvärinen, A.E., Junttila, T., Niemelä, I.: Grid-based SAT solving with iterative partitioning and clause learning. In: Principles and Practice of Constraint Programming – CP 2011, pp. 385–399. Springer (2011)
17. Hyvärinen, A.E., Junttila, T., Niemelä, I.: Incorporating clause learning in grid-based randomized SAT solving. Journal on Satisfiability, Boolean Modeling and Computation 6, 223–244 (2014)
18. Hyvärinen, A.E., Manthey, N.: Designing scalable parallel SAT solvers. In: Theory and Applications of Satisfiability Testing – SAT 2012, pp. 214–227. Springer (2012)
19. Järvisalo, M., Heule, M.J.H., Biere, A.: Inprocessing rules. In: Proceedings of IJCAR 2012. LNCS, vol. 7364, pp. 355–370. Springer (2012)
20. Jurkowiak, B., Li, C.M., Utard, G.: A parallelization scheme based on work stealing for a class of SAT solvers. Journal of Automated Reasoning 34(1), 73–101 (2005)
21. Kautz, H.A., Selman, B.: Planning as satisfiability. In: ECAI. vol. 92, pp. 359–363 (1992)
22. Kuehlmann, A., Paruthi, V., Krohm, F., Ganai, M.K.: Robust Boolean reasoning for equivalence checking and functional property verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 21(12) (2002)
23. Marques-Silva, J.P., Sakallah, K.A.: GRASP: a search algorithm for propositional satisfiability. IEEE Transactions on Computers 48(5), 506–521 (1999)
24. Martins, R., Manquinho, V., Lynce, I.: An overview of parallel SAT solving. Constraints 17(3), 304–347 (2012)
25. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: engineering an efficient SAT solver. In: Proceedings of the 38th Annual Design Automation Conference, pp. 530–535. ACM (2001)
26. Ohmura, K., Ueda, K.: c-sat: a parallel SAT solver for clusters. In: Theory and Applications of Satisfiability Testing – SAT 2009, pp. 524–537. Springer (2009)
27. Parkes, A.J.: Clustering at the phase transition. In: Proc. of the 14th Nat. Conf. on AI, pp. 340–345. AAAI Press / The MIT Press (1997)
28. Schulz, S., Blochinger, W.: Parallel SAT solving on peer-to-peer desktop grids. Journal of Grid Computing 8(3), 443–471 (2010)
29. Sörensson, N., Eén, N.: MiniSat v1.13 – a SAT solver with conflict-clause minimization. SAT 2005 (2005)
30. Xu, L., Hoos, H., Leyton-Brown, K.: Hydra: automatically configuring algorithms for portfolio-based selection. AAAI Conference on Artificial Intelligence (2010)
