OpenJNY
May 04, 2018

# Handbook of Knowledge Representation - Chapter 2: Satisfiability Solvers


## Transcript

1. Handbook of Knowledge Representation
Chapter 2: Satisfiability Solvers
Junya Yamaguchi
May 2, 2018
Tokyo Institute of Technology, Inoue Lab

2. SAT Problem Example
• An assignment is a function ϕ : V → {0, 1}.
F = (x ∨ y ∨ z) ∧ (¬x ∨ ¬y) ∧ (¬y ∨ ¬z) ∧ (¬z ∨ ¬x)
Here x is a literal, (x ∨ y ∨ z) is a clause, and F as a whole is a CNF formula. The assignment ϕ(x, y, z) = (0, 0, 1) yields F = 1.
• A formula is satisfiable (SAT) iff there exists an assignment that evaluates it to TRUE; such an assignment is called a model.
• Otherwise, we say the formula is unsatisfiable (UNSAT).
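
Evaluating a CNF formula under a complete assignment is straightforward. Below is a minimal Python sketch of the example above; the list-of-signed-integers encoding and the name `evaluate` are my own, not the chapter's:

```python
# A CNF formula as a list of clauses; each clause is a list of literals,
# where literal +v means variable v and -v means its negation.
F = [[1, 2, 3], [-1, -2], [-2, -3], [-3, -1]]  # x = 1, y = 2, z = 3

def evaluate(formula, assignment):
    """Return True iff every clause has at least one literal evaluating to 1."""
    return all(
        any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in formula
    )

# The assignment phi(x, y, z) = (0, 0, 1) from the slide is a model of F.
phi = {1: False, 2: False, 3: True}
print(evaluate(F, phi))  # True
```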

3. The Success of SAT Solvers
• Although SAT is famous for its exponential worst-case complexity, SAT solvers have been successful in many areas:
• software/hardware verification;
• automatic test pattern generation;
• planning;
• scheduling.
• There is a yearly competition of SAT solvers (the SAT Competition).
• It has produced a large number of excellent SAT solvers.
• Modern SAT solvers can even solve very hard instances consisting of huge numbers of variables and constraints.

4. SAT and Knowledge Representation
• SAT now has many applications, but its origins lie in knowledge representation research.
• The main line of work was on the tradeoff between the expressiveness of a knowledge representation and its computational complexity (covered in Chapter 3).
• The hope: find a representation language that is extremely elegant and highly expressive while keeping polynomial worst-case complexity.
• In the early 1990s, two papers challenged this hope:
• one showed that, apart from certain special instances, most random SAT problems are very easy to solve;
• the other showed that even those special hard instances can be solved easily using local search techniques.

5. Are We Too Obsessed with Worst-Case Complexity?
• SAT solvers succeed on real-world problems with millions of variables.
• For typical SAT problems and realistic NP-complete problems, there exist general-purpose methods that solve them efficiently.
• We should not be overly intimidated by worst-case complexity.
• If SAT solvers keep improving at this pace, we should be able to handle even richer knowledge representation languages.
• NO: representations restricted to algorithms with polynomial worst-case complexity.
• YES: representations within the range that SAT solvers can handle.

6. Structure of the Chapter
1. The main techniques used in the following two kinds of solvers:
• complete SAT solvers;
• incomplete SAT solvers.
2. The effectiveness of these techniques on practical SAT encodings.
3. Prospects for the future of SAT solvers.

7. Definitions
• A propositional (or Boolean) formula is a logical formula defined over a set of variables.
• Each variable takes one of the values {FALSE, TRUE}.
• For convenience, these values are often written as {0, 1}.
• A complete truth assignment¹ for a set of variables V is a mapping σ : V → {0, 1}.
• In particular, a complete assignment that evaluates a propositional formula to 1 is called a satisfying assignment or a model.
• The SAT problems handled by SAT solvers are restricted to propositional formulas in a special form called CNF (conjunctive normal form).
¹ A complete assignment is often simply called an "assignment", but in these slides "assignment" refers to a partial assignment, to distinguish it from a complete assignment.

8. The Satisfiability Problem
Boolean satisfiability testing (SAT) problem
Input: a propositional formula in CNF (a CNF formula) F
Question: does F have a model?

CNF formula: a conjunction of clauses, F = C1 ∧ C2 ∧ · · · ∧ Cm, also written F = {Ci}, i = 1, . . . , m
Clause: a disjunction of literals, C = l1 ∨ l2 ∨ · · · ∨ ln, also written C = {li}, i = 1, . . . , n
Literal: a variable x or its negation ¬x
Variable: x ∈ {0, 1}

9. Definitions
• The number of literals in a clause is called the size of the clause.
• e.g. (x ∨ ¬y ∨ z) has size 3.
• A clause of size 0 is called the empty clause, a clause of size 1 a unit clause, and a clause of size 2 a binary clause.
• A CNF formula in which every clause has size exactly k gives the k-SAT problem.
• 2-SAT is solvable in polynomial time.
• 3-SAT and above is NP-complete.
(x1 ∨ x2 ∨ x3) ∧ (x4 ∨ x5 ∨ x6) ∧ (x7 ∨ x8 ∨ x9)

10. Definitions
• A partial assignment is a complete assignment to a subset of the variables.
• For a partial assignment ρ to a CNF formula F, the formula obtained by substituting ρ is called the simplified formula, written F|ρ:
• delete every clause in which at least one literal evaluates to 1;
• delete every literal that evaluates to 0.
From now on, input propositional formulas are implicitly assumed to be CNF formulas; in most cases this assumption is harmless.
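
These two rules translate directly into code. A sketch using the same clause encoding as above (`simplify` is my name for it, not the chapter's):

```python
def simplify(formula, rho):
    """Compute F|rho: drop satisfied clauses, delete falsified literals."""
    result = []
    for clause in formula:
        # Drop the clause if some literal evaluates to 1 under rho.
        if any(abs(l) in rho and rho[abs(l)] == (l > 0) for l in clause):
            continue
        # Otherwise keep only the literals not evaluated by rho to 0.
        result.append([l for l in clause if abs(l) not in rho])
    return result

F = [[1, 2, 3], [-1, -2], [-2, -3], [-3, -1]]
print(simplify(F, {1: True}))  # [[-2], [-2, -3], [-3]]
```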

11. 2.2 SAT Solver Technology - Complete Methods

12. Complete Methods
• A complete SAT solver, given an input formula F, either produces a satisfying assignment for F or proves that F is unsatisfiable.
• Recent complete methods remain variants of the DPLL procedure:
• introduced in the early 1960s;
• prunes the search space based on falsified clauses;
• performs a backtrack search in the space of partial truth assignments.
• Main improvements to DPLL:
• smart branch selection heuristics;
• extensions such as clause learning and randomized restarts;
• well-crafted data structures such as lazy implementations and watched literals.

13. 2.2.1 The DPLL Procedure

14. DPLL Procedure

15. DPLL Procedure
• Repeatedly select an unassigned literal l:
• the step that chooses l is called the branching step;
• setting l to TRUE or FALSE is called a decision;
• the decision level refers to the recursion depth at that stage.
• Recursively search for a satisfying assignment for F|l and F|¬l.
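
As a compact illustration of this recursion, here is a bare-bones DPLL sketch in Python; it uses naive branching, omits unit propagation and all the heuristics discussed later, and every name in it is my own:

```python
def condition(formula, lit):
    """Compute F|lit: drop clauses containing lit, delete -lit elsewhere."""
    return [[x for x in c if x != -lit] for c in formula if lit not in c]

def dpll(formula, assignment=()):
    """Return a tuple of literals satisfying formula, or None if UNSAT."""
    if not formula:                  # no clauses left: satisfied
        return assignment
    if any(not c for c in formula):  # empty clause: falsified, backtrack
        return None
    lit = formula[0][0]              # naive branching step: pick any literal
    # A decision: try lit = TRUE, and on failure try lit = FALSE.
    return (dpll(condition(formula, lit), assignment + (lit,))
            or dpll(condition(formula, -lit), assignment + (-lit,)))

# The formula from slide 2: returns (1, -2, -3), i.e. x = 1, y = 0, z = 0.
print(dpll([[1, 2, 3], [-1, -2], [-2, -3], [-3, -1]]))
```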


19. DPLL Procedure

20. DPLL Procedure – Unit Propagation
• Initial state:
F = (x) ∧ (¬x ∨ y) ∧ (¬x ∨ ¬y ∨ z), ρ = ∅
• There is a unit clause (x), so execute unit propagation:
F|ρ = (y) ∧ (¬y ∨ z), ρ = {x}
• Now another unit clause (y) appears; simplify the formula further:
F|ρ = (z), ρ = {x, y}
• Execute unit propagation in the same way:
F|ρ = ∅ (every clause is satisfied), ρ = {x, y, z}
• UnitPropagate() ends since there are no unit clauses left.
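
This fixpoint loop is short in code. A sketch under the same encoding as before (`unit_propagate` is a hypothetical name, not the chapter's pseudocode):

```python
def unit_propagate(formula):
    """Repeatedly assert the literal of a unit clause and simplify F."""
    rho = []
    while True:
        unit = next((c[0] for c in formula if len(c) == 1), None)
        if unit is None:
            return formula, rho
        rho.append(unit)
        # F|unit: satisfied clauses vanish, falsified literals are deleted.
        formula = [[l for l in c if l != -unit]
                   for c in formula if unit not in c]

F = [[1], [-1, 2], [-1, -2, 3]]  # (x) ∧ (¬x ∨ y) ∧ (¬x ∨ ¬y ∨ z)
print(unit_propagate(F))  # ([], [1, 2, 3]): empty formula, rho = {x, y, z}
```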

21. Key Features of Modern DPLL-Based SAT Solvers
The key features, covered one by one on the following slides: variable selection heuristics, clause learning, the watched literals scheme, conflict-directed backjumping, fast backjumping, assignment stack shrinking, conflict clause minimization, and randomized restarts.

Variable selection heuristic (a.k.a. decision strategy):
• MOMS and BOHM maximize a moderately complex function of the current variable and clause state.
• DLIS selects and fixes the literal occurring most frequently in the yet-unsatisfied clauses.
• VSIDS chooses a literal based on its weight, which periodically decays but is boosted whenever a clause containing it is used in deriving a conflict.

22. Key Features – Clause Learning
• Plays a critical role in the success of modern complete SAT solvers.
• The idea is to:
• cache "causes of conflict" as learned clauses;
• use this information to prune the search in a different part of the search space encountered later.

23. Key Features – The Watched Literals Scheme
• An implementation technique to accelerate unit propagation, introduced in zChaff.
• The key idea is to maintain and "watch" two special literals in each not-yet-satisfied clause (U = unassigned, 0 = false):
• {U, U}: not yet satisfied
• {0, U}: unit clause
• {0, 0}: empty clause
• Combines very well with clause learning.

24. Key Features – Conflict-Directed Backjumping
• Plain backtracking just returns to the previous branching point.
• Backjumping repeats the backtrack safely as long as possible.

25. Key Features – Fast Backjumping
• Lets a solver jump directly to a lower decision level d when:
• even one branch leads to a conflict involving only variables at levels d or lower;
• for completeness, level d is not marked as unsatisfiable.
• While conflict-directed backjumping is always beneficial, fast backjumping may not be.

26. Key Features – Assignment Stack Shrinking
• A technique to learn smaller and more pertinent learned clauses.
• When a conflict occurs because of a clause C′ and the size of the learned clause C exceeds a certain threshold length:
• the solver backtracks to almost the highest decision level of the literals in C;
• it then starts assigning the unassigned literals of C′ to FALSE until a new conflict is encountered.

27. Key Features – Conflict Clause Minimization
• The idea is to reduce the size of a learned conflict clause C by repeatedly identifying and removing any literals of C that are implied to be FALSE when the rest of the literals in C are set to FALSE.
• This relies on the resolution rule: from a ∨ b and ¬a ∨ c, derive b ∨ c.

28. Key Features – Randomized Restarts
• Allows a SAT solver to arbitrarily stop the search and restart its branching process from decision level zero.
• Most current SAT solvers employ aggressive restart strategies, sometimes restarting after as few as 20 to 50 backtracks.

29. Clause Learning and Iterative DPLL

38. Clause Learning
• The idea of clause learning came from Artificial Intelligence:
• it seeks to improve backtrack search algorithms by generating explanations for failure points;
• for general constraint satisfaction problems, these explanations are called "conflicts" or "no-goods".
• Early clause learning could only obtain clauses of limited use, but a series of studies has since made it successful.

39. Implication Graph
The implication graph G at a given stage of DPLL is a directed acyclic graph with edges labeled with sets of clauses.

40. How to Build the Implication Graph
1. Create a node for each decision literal, labeled with that literal.
2. If there is a unit clause C when the current nodes are regarded as a partial assignment,
C = l1 ∨ · · · ∨ lk ∨ l, where every li is assigned FALSE:
• add a node l (if not already in G);
• add edges (¬li, l) for all i = 1, . . . , k (if not already in G);
• these edges are labeled with C to record the cause of the implication.
3. Repeat step 2 until no such clause is found.
4. Add a special "conflict" node Λ̄. For any variable x such that both x and ¬x are in G, add edges (x, Λ̄) and (¬x, Λ̄).
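
One way to realize this construction is unit propagation that records, for each implied literal, its antecedent clause. The sketch below (hypothetical names, including the "conflict" sentinel node) follows the same literal encoding as the earlier examples:

```python
def implication_graph(formula, decisions):
    """Run unit propagation from the decision literals, recording for each
    implied literal its antecedent clause and the corresponding edges."""
    assigned = {l: None for l in decisions}   # literal -> antecedent clause
    edges = []
    changed = True
    while changed:
        changed = False
        for clause in formula:
            if any(l in assigned for l in clause):
                continue                      # clause already satisfied
            free = [l for l in clause if -l not in assigned]
            if len(free) == 1:                # unit clause: l is implied
                l = free[0]
                assigned[l] = clause
                edges += [(-x, l) for x in clause if x != l]
                changed = True
            elif not free:                    # all literals false: conflict
                edges += [(-x, "conflict") for x in clause]
                return assigned, edges
    return assigned, edges

# Decide x = 1 in (¬x ∨ y) ∧ (¬y ∨ z) ∧ (¬x ∨ ¬z): y and z get implied,
# then the last clause is falsified, yielding edges into the conflict node.
print(implication_graph([[-1, 2], [-2, 3], [-1, -3]], [1]))
```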

41. Conflict on the Implication Graph
• An implication graph may not contain a conflict at all, or it may contain many conflict variables and several ways of deriving any single literal.

42. Conflict Graph
A conflict graph H = (V_H, E_H) is a subgraph of the implication graph with the following properties:
• H contains Λ̄ and exactly one conflict variable.
• Every l ∈ V_H has a path to Λ̄.
• Every l ∈ V_H \ {Λ̄} satisfies one of the following:
1. l is a decision literal, or
2. there exist ¬l1, . . . , ¬lk ∈ V_H such that (¬li, l) ∈ E_H for every i and (l1 ∨ · · · ∨ lk ∨ l) is a known clause.


44. Reason/Conflict Side

45. Learning Scheme – Relsat
• Relsat uses the cut whose conflict side consists of all implied variables at the current decision level.
• The learned clause (¬a ∨ ¬b) has exactly one variable from the current decision level (i.e. b).
• After learning it and backtracking until b is unassigned (to decision level 2), the value of b is flipped to ¬b.


47. Unique Implication Points (UIPs)
• A UIP (unique implication point) is a node at the latest decision level such that:
• every path from the latest decision variable to the conflict literal must go through it.
• Intuitively, a UIP is a single reason at the current decision level.
• There can be more than one UIP, but never zero:
• the decision variable itself is the obvious UIP.

48. Application of UIPs
• Learning schemes based on UIPs:
• Relsat uses the decision variable as the UIP;
• Grasp and zChaff use FirstUIP, the UIP closest to the conflict variable;
• Grasp also uses all UIPs to learn multiple clauses;
• kUIP generalizes the UIP to the k most recent decision levels.
• However, experiments showed that 1UIP is quite robust and outperforms all other schemes considered on most benchmarks.

49. A Proof Complexity Perspective

50. Propositional Proof Complexity
• Propositional proof complexity is the study of proofs of validity of mathematical statements expressed in Boolean form.
• A propositional proof system is an algorithm A:
Input: a propositional statement S and a purported proof π
Returns: rejection / acceptance
• The crucial property of A is that:
• for every invalid S, A rejects the pair (S, π) for all π;
• for every valid S, A accepts the pair (S, π) for some π.

51. Resolution
• Resolution is a very simple proof system with only one rule: from a ∨ b and ¬a ∨ c, derive b ∨ c (resolving on a).
• Repeated application of this rule can derive the empty clause iff the initial formula is unsatisfiable:
• such a derivation serves as a proof of unsatisfiability of the formula.
• Several restricted variants of resolution arise from implementation restrictions.
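
On the clause encoding used earlier, the rule is a one-liner; a sketch with my own function name:

```python
def resolve(c1, c2, var):
    """Resolve clause c1 (containing var) with c2 (containing -var)."""
    assert var in c1 and -var in c2
    # The resolvent is everything else, with duplicates removed.
    return sorted(set(c1) - {var} | set(c2) - {-var})

print(resolve([1, 2], [-1, 3], 1))  # [2, 3]: from (a∨b), (¬a∨c) derive (b∨c)
```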

52. General Resolution
¹ Paul Beame (University of Washington), Satisfiability and Unsatisfiability: Proof Complexity and Algorithms. http://slideplayer.com/slide/4978670/

53. Tree-like Resolution
¹ Paul Beame (University of Washington), Satisfiability and Unsatisfiability: Proof Complexity and Algorithms. http://slideplayer.com/slide/4978670/

54. Variations of Resolution
• Tree-like resolution uses each non-empty derived clause exactly once in the proof and is equivalent to an optimal DPLL procedure.
• Regular resolution allows any variable to be resolved upon at most once along any "path" in the proof from an initial clause to Λ, allowing (restricted) reuse of derived clauses.
• While these and other refinements are sound and complete as proof systems, they differ vastly in efficiency.

55. SAT Solvers as Proof Systems
• Most of today's complete SAT solvers implement a subset of the resolution proof system.
• Until recently, it was not clear:
• where they fit in the proof system hierarchy;
• how they compare to refinements of resolution, e.g. regular resolution.
• Clause learning and random restarts were considered to be important.
• Despite overwhelming empirical evidence, for many years not much was known about the ultimate strengths and weaknesses of the two.

56. FirstNewCut
• Beame, Kautz, and Sabharwal answered several of these questions in a formal proof complexity framework.
• They characterized clause learning as a proof system called CL, and proposed a new learning scheme called FirstNewCut.
• FirstNewCut can provide exponentially shorter proofs than refinements of general resolution satisfying a natural self-reduction property:
• these include regular and ordered resolution;
• these are already known to be much stronger than the ordinary DPLL procedure.
• They also showed that a slight variant of clause learning with unlimited restarts is as powerful as general resolution itself.

57. Relation Between SAT/UNSAT Proof Complexity
• The interest is in families of unsatisfiable formulas, because only proofs of unsatisfiability can be large:
• minimum proofs of satisfiability are O(n).
• However, in practice, many formulas are satisfiable.
• Achlioptas, Beame, and Molloy have shown that negative proof complexity results for unsatisfiable formulas can be used to derive run-time lower bounds for specific inference algorithms, especially DPLL, running on satisfiable formulas as well.
• The key observation in their work is that, before hitting a satisfying assignment, an algorithm is very likely to explore a large unsatisfiable part of the search space caused by uninformative initial assignments.

58. Branching Heuristics for SAT/UNSAT Proofs
• Proof complexity does not capture everything we intuitively mean by the power of a reasoning system, because it says nothing about how difficult it is to find shortest proofs.
• However, it is a good notion with which to begin our analysis, because the size of proofs provides a lower bound on the running time of any implementation of the system.

| | Negative | Positive |
| --- | --- | --- |
| Proof size | Must be large | Small proofs exist |
| Perfect branching | Hopeless | Possible |

59. DPLL and Tree-like Resolution
For a CNF formula F, the size of the smallest DPLL refutation of F is equal to the size of the smallest tree-like resolution refutation of F.
• The best achievable performance is the same for both.
• Since the proof system CL is exponentially stronger than tree-like resolution, clause learning is very effective.

60. Natural Refinements / Properness
• C_S(F): the length of a shortest refutation of a formula F under a proof system S.
• Tree-like, regular, linear, positive, negative, semantic, and ordered resolution are natural refinements.
• Tree-like, regular, and ordered resolution are exponentially proper.

61. The Power of CL
• (pros) There are formulas on which CL is more effective than some exponentially proper refinements of resolution.
• (cons) CL may not be able to simulate all regular resolution proofs!

62. CL-- with Non-redundant Learning vs. RES
• CL--: CL allowed to branch on a literal whose value is already assigned.
• Such a branch can:
• lead to an immediate conflict;
• allow the solver to learn a key conflict clause that would otherwise not have been learned.
• A clause learning scheme is called non-redundant if, on a conflict, it always learns a clause not already known.
• Most practical clause learning schemes are non-redundant.

63. Symmetry Breaking

64. Symmetry
• Symmetry in real-world problems:
• in FPGA routing, all available wires or channels used for connecting two switch boxes are equivalent;
• in circuit modeling, all gates of the same "type" are interchangeable;
• in planning, all identical boxes that need to be moved from city A to city B are equivalent;
• in multi-processor scheduling, all available processors are equivalent;
• in cache coherency protocols in distributed computing, all available identical caches are equivalent.
• A key property of such objects is that when selecting k of them, we can choose any k without loss of generality.
• We would like to handle such without-loss-of-generality reasoning automatically.

65. Semantic Meaning
• A CNF formula consists of constraints over different kinds of variables that typically represent tuples of these high-level objects.
• During the problem modeling phase, we could have a Boolean variable z_{w,c} that is TRUE iff the first end of wire w is attached to connector c.
• When the formula is converted into DIMACS format, the semantic meaning of the variables is discarded.

66. Symmetry Breaking Predicates (SBPs)
• Symmetry breaking predicates (SBPs) are a way to deal with this problem:
• symmetries can be broken by adding lexicographic ordering constraints, called "lex-constraints";
• these constraints remove all redundant solutions except the lexically-first ones.
• The idea is to identify the set of variable permutations {πi} under which the formula is unchanged:
• if σ is a satisfying assignment, then each πi(σ) is also a satisfying assignment, so it suffices to keep only the lexically-first one.

67. Symmetry in SAT – Graph Isomorphism
• Shatter by Aloul et al. improves the number of lex-constraints from O(n²) to O(n), where n is the number of variables.
• It uses graph isomorphism techniques to generate SBPs.
• Since there is no known polynomial-time algorithm for graph isomorphism, the number of SBPs needed to break all symmetries can be exponential.
• Shatter handles this by discarding "large" symmetries.

68. Symmetry in SAT – Non-CNF Formulations
• Using non-CNF formulations, known as pseudo-Boolean inequalities, is another approach to the symmetry problem:
• PBS by Aloul et al.;
• pbChaff by Dixon et al.;
• Galena by Chai and Kuehlmann.
• Instead of the resolution proof system, these use the cutting planes proof system:
• it is difficult to implement in full, so pseudo-Boolean solvers often implement only a subset of it.

69. 2.3 SAT Solver Technology - Incomplete Methods

70. Incomplete SAT Solvers
• What about incomplete methods?
• They can provide a model of a satisfiable formula, but are not guaranteed to find one.
• They can say nothing about unsatisfiability.
• Incomplete methods significantly outperform DPLL-based methods on some problems.
• They are generally based on stochastic local search, while typical complete methods are based on exhaustive branching and backtracking search.

71. Local Search for Satisfiability
• The original impetus for local search on satisfiability problems was its successful application to finding solutions of the N-queens problem.
• Researchers regarded N-queens as an easy problem.
• They felt that local search would fail in practice for SAT.

72. GSAT
• GSAT is based on a randomized local search technique:
• initialize the truth assignment randomly;
• then greedily flip the assignment of the variable that leads to the greatest decrease in the total number of unsatisfied clauses;
• finish the search if a model is found, otherwise keep flipping.
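
A compact sketch of this loop (hypothetical names; ties in the greedy step are broken arbitrarily here, whereas GSAT breaks them at random):

```python
import random

def num_unsat(formula, assign):
    """Count the clauses with no literal evaluating to 1 under assign."""
    return sum(not any(assign[abs(l)] == (l > 0) for l in c) for c in formula)

def gsat(formula, max_flips=1000, max_tries=10):
    variables = {abs(l) for c in formula for l in c}
    for _ in range(max_tries):
        # Initialize the truth assignment randomly.
        assign = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if num_unsat(formula, assign) == 0:
                return assign  # model found
            def after_flip(v):
                assign[v] = not assign[v]
                score = num_unsat(formula, assign)
                assign[v] = not assign[v]
                return score
            # Greedily flip the variable giving the fewest UNSAT clauses.
            best = min(variables, key=after_flip)
            assign[best] = not assign[best]
    return None  # gave up: this says nothing about unsatisfiability

print(gsat([[1, 2, 3], [-1, -2], [-2, -3], [-3, -1]]))
```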

73. GSAT
• GSAT outperformed DPLL-based solvers on various SAT problems:
• from randomly generated problems to SAT encodings of graph coloring problems.
• Most of its time is spent searching plateaus:
• plateaus are regions where the possible sideways moves barely change the number of unsatisfied clauses.

74. Walksat

75. Walksat
• Walksat is a refinement of GSAT, employing the random walk moves of a standard Metropolis search:
• the purpose is to increase randomness;
• it always selects the variable to flip from unsatisfied clauses;
• if some variable can be flipped without breaking any satisfied clause, pick it;
• otherwise, follow a p-greedy strategy.
• This idea is inspired by the O(n²) randomized algorithm for 2-SAT:
• it guarantees that one can reach a model by selecting variables only from unsatisfied clauses.
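
A sketch of the flip-selection logic described above (the names and the exact noise handling are my own simplifications of the Walksat idea):

```python
import random

def walksat(formula, p=0.5, max_flips=100_000):
    """Sketch of Walksat: flip variables chosen from unsatisfied clauses."""
    variables = {abs(l) for c in formula for l in c}
    assign = {v: random.random() < 0.5 for v in variables}
    sat = lambda c: any(assign[abs(l)] == (l > 0) for l in c)

    def break_count(v):
        """# of currently satisfied clauses falsified by flipping v."""
        sat_before = [sat(c) for c in formula]
        assign[v] = not assign[v]
        broken = sum(1 for c, s in zip(formula, sat_before) if s and not sat(c))
        assign[v] = not assign[v]
        return broken

    for _ in range(max_flips):
        unsat = [c for c in formula if not sat(c)]
        if not unsat:
            return assign                     # model found
        clause = random.choice(unsat)         # always pick an UNSAT clause
        vs = [abs(l) for l in clause]
        free = [v for v in vs if break_count(v) == 0]
        if free:
            flip = random.choice(free)        # a flip that breaks nothing
        elif random.random() < p:
            flip = random.choice(vs)          # random walk move
        else:
            flip = min(vs, key=break_count)   # greedy move: break the least
        assign[flip] = not assign[flip]
    return None  # gave up; says nothing about unsatisfiability

print(walksat([[1, 2, 3], [-1, -2], [-2, -3], [-3, -1]]))
```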

76. Walksat vs. GSAT
http://ai.cs.unibas.ch/_files/teaching/fs16/ai/slides/ai32.pdf

77. Phase Transition Phenomenon

78. Clause-to-Variable Ratio
• Random k-SAT with n variables and m clauses:
• for each clause, select k distinct variables uniformly at random;
• negate each variable with probability 0.5.
• The clause-to-variable ratio is α = m/n.
• The difficulty of random k-SAT problems is characterized by α:
• for random 3-SAT, run-times peak around the threshold α ≈ 4.26.
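
Generating such instances takes only a few lines; a sketch (the function name is mine):

```python
import random

def random_ksat(n, m, k=3):
    """Generate a random k-SAT instance with n variables and m clauses."""
    formula = []
    for _ in range(m):
        # Choose k distinct variables uniformly at random...
        vs = random.sample(range(1, n + 1), k)
        # ...and negate each independently with probability 0.5.
        formula.append([v if random.random() < 0.5 else -v for v in vs])
    return formula

# Near the 3-SAT phase transition: alpha = m/n ≈ 4.26.
n = 100
print(random_ksat(n, int(4.26 * n))[:3])  # first three clauses
```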

79. Phase Transition
• Left: the run-time.
• Right: the fraction of unsatisfiable problems.
• As n grows, the phase transition becomes sharper and sharper.

80. 2.3.2 Survey Propagation

81. Survey Propagation
• Survey propagation (SP) is a newer algorithm for solving hard combinatorial problems.
• It has shown good results on random 3-SAT instances with one million variables.
• SP has a connection to belief propagation (BP):
• BP is a method to compute marginal probabilities on graphical models (factor graphs);
• both use an iterative process of local "message" updates.
• Unfortunately, the success of SP is limited to random SAT instances.

82. Survey Propagation
http://slideplayer.com/slide/4559400/

83. 2.4 Runtime Variance and Problem Structure

84. Exceptionally Hard Problems
• The performance of backtrack-style methods can vary greatly with:
• the way the next variable to branch on is selected;
• the order in which the possible values are assigned to the variable.
• There are exceptionally hard problems:
• they defy the "easy-hard-easy" pattern;
• they occur in the under-constrained region.
• However, subsequent research showed that such instances are not inherently difficult:
• simply renaming the variables or using a different search heuristic can make such problems easy to solve;
• the "hardness" lies in the combination of the instance with the search method, rather than in the instances per se²;
• this is why the median is used to measure the "hardness" of problems.
² Adverb: by or in itself or themselves; intrinsically.

85. Fat- and Heavy-Tailed Behavior
• Runtime distributions of search methods provide a better characterization:
• quite often, complete backtrack search methods exhibit fat- and heavy-tailed behavior;
• the distributions look like lognormal or Weibull distributions.
• Such distributions can be observed:
• when running a deterministic backtracking procedure on a distribution of random instances;
• in repeated runs of a randomized backtracking procedure on a single instance.

86. Runtime of CSP
Summarizing CSP Hardness with Continuous Probability Distributions. Proceedings of the National Conference on Artificial Intelligence, 327–333.

87. Heavy Tails
• Fat-tailedness is measured by the kurtosis µ₄/µ₂², where µ₂ and µ₄ are the 2nd and 4th central moments respectively:
• a distribution with a high central peak and long tails generally has large kurtosis.
• A heavy-tailed distribution is "heavier" than a fat-tailed one:
• some of its moments are infinite, e.g. the mean or the variance.
• DPLL-style complete backtrack search methods have been shown to exhibit heavy-tailed behavior:
• they solve some instances extremely quickly, while taking extremely long on others.
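
A quick numerical check of this definition, as a NumPy sketch (the distributions chosen are merely illustrative):

```python
import numpy as np

def kurtosis(x):
    """mu_4 / mu_2^2: the 4th central moment over the squared 2nd moment."""
    d = x - x.mean()
    return (d**4).mean() / ((d**2).mean() ** 2)

rng = np.random.default_rng(0)
print(kurtosis(rng.normal(size=100_000)))     # ≈ 3 for a normal distribution
print(kurtosis(rng.lognormal(size=100_000)))  # >> 3: a fat-tailed distribution
```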

88. 2.4.2 Backdoors

89. Backdoors
• An explanation of heavy-tailed behavior comes from backdoor variables:
• variables which, when set, give us a polynomially solvable subproblem.
• This explains how a backtrack search method can get "lucky" on certain runs:
• a run is lucky when backdoor variables are identified early in the search and set the right way.
• The definition of a backdoor depends on a particular algorithm, referred to as the sub-solver:
• it solves a tractable sub-case of the general constraint satisfaction problem.

90. Sub-Solvers and Backdoors
• Intuitively, a backdoor is a set of variables such that, when they are set correctly, the sub-solver can solve the remaining problem.
• A stronger notion, the strong backdoor, covers both satisfiable and unsatisfiable problems.

91. Backdoor Detection Complexity
• Nishimura et al. studied the detection of backdoor sets for sub-solvers that solve Horn or 2-CNF formulas, both of which are linear-time classes:
• they proved that the detection of such a strong backdoor set is fixed-parameter tractable;
• while the detection of a weak backdoor set is not.
• Dilkina et al. proved that the complexity of backdoor detection jumps from NP to NP-hard/coNP-hard when certain obvious inconsistency checks are added to the underlying class.

92. Backdoor Size
• The backdoor size of random formulas is large:
• for random 3-SAT near the phase transition, about 30% of the variables are backdoor variables;
• this is why DPLL-based search algorithms cannot solve these problems.
• Structured (real-world) problems have very few backdoor variables.

93. 2.4.3 Restarts

94. Restart Strategy
• Another way to exploit heavy-tailed behavior is to add restarts to backtracking.
• Gomes et al. proposed randomized rapid restarts (RRR) to take advantage of heavy-tailed behavior and boost the efficiency of complete backtrack search.
• They also showed that a restart strategy with a fixed cutoff eliminates heavy-tailed behavior, giving a runtime distribution with finite moments.

95. Luby's Restarts
• Luby et al. showed that:
• when the runtime distribution is known, the optimal restart policy uses a fixed cutoff;
• when there is no a priori knowledge of the distribution, a universal strategy minimizes the expected cost:
1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, . . .
• Theoretically the universal strategy is within a log factor of the optimal one, but it converges too slowly in practice.
• State-of-the-art SAT solvers instead restart with a default cutoff value that is increased linearly.
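
The Luby sequence has a neat recursive structure: each block is two copies of the previous block followed by the next power of two. A sketch (the name `luby` is hypothetical):

```python
def luby(i):
    """i-th term (1-indexed) of the Luby restart sequence: 1,1,2,1,1,2,4,..."""
    k = 1
    while (1 << k) - 1 < i:      # find k with 2^(k-1) - 1 < i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:        # i = 2^k - 1: the sequence emits 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # otherwise recurse into the prefix

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```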

96. Bayesian Framework
• Horvitz et al. introduced a Bayesian framework for learning predictive models of randomized backtrack solvers in a "knowledgeless" setting.
• Kautz et al. extended it, enabling restart policies to use information from real-time observations about a solver:
• they also derived the optimal policy for dynamic restarts;
• it can be implemented using dynamic programming.