Online generation via offline selection: low-dimensional linear cuts from the QP SDP relaxation

Radu Baltean-Lugojan, Ruth Misener
Computational Optimisation Group, Department of Computing

Pierre Bonami, Andrea Tramontani
IBM Research, CPLEX Team

https://www.dropbox.com/s/sfpiy9godzqo2t3/preprint.pdf?dl=0

2018/03/07
Baltean-Lugojan, Misener et al. Online cuts by offline selection 2018/03/07 1 / 23
Deterministically solving non-convex QP

(QP)   min_x   x^T Q x + c^T x
       s.t.    Ax ≤ b,
               x ∈ [0, 1]^N
Prior work: branch & cut with relaxations
- RLT: McCormick + extensions, e.g. triangle inequalities, Bonami et al. [2016]
- SDP/SOCP/convex: Dong [2016], Saxena et al. [2011], Buchheim and Wiegele [2013], Zheng et al. [2011], Bao et al. [2011], Anstreicher [2009], Chen and Burer [2012]
- LP relaxations of the SDP (typically high-dimensional): e.g. Qualizza et al. [2012], Sherali and Fraticelli [2002]
- Edge-concave: Bao et al. [2009], Misener and Floudas [2012]

This work: offline (learned) selection of low-dimensional LP cuts from the SDP relaxation
- Develop strong low-dimensional linear cuts online;
- select cuts offline via a neural-net estimator trained "a priori";
- cheaply outer-approximate the SDP, especially in combination with other low-dimensional cuts (RLT, triangle, edge-concave, Boolean quadric polytope).
SDP & RLT relaxations (Anstreicher, J Glob Optim, 2009)

min_x   x^T Q x + c^T x
s.t.    Ax ≤ b,  x ∈ [0, 1]^N

⇒ lift with X = xx^T:

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,  X = xx^T,
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N}

⇒ relax:

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,
        X ⪰ xx^T                        (SDP relaxation)
        X_ii ≤ x_i
        X_ij − x_i − x_j ≥ −1
        X_ij − x_i ≤ 0                  (RLT relaxation)
        X_ij − x_j ≤ 0
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N},  X_ij = X_ji,

where Q • X is the matrix inner product Q • X = Σ_{i,j=1}^N Q_ij · X_ij,
SDP ≡ semidefinite programming,
RLT ≡ reformulation linearisation technique.
SDP & RLT relaxations: Point packing (Anstreicher, J Glob Optim, 2009)

[figure-only slide; the point-packing illustration is not recoverable from the transcript]
Schur complement & decomposition for SDP relaxations

Schur complement
SDP relaxation: X = xx^T ⇒ relax ⇒ X ⪰ xx^T.
By the Schur complement,

X ⪰ xx^T   ⟺   [ 1   x^T ]
                [ x   X   ]  ⪰ 0.

Key idea: consider smaller subsets
We have X = xx^T, so X_S = x_S x_S^T for all S ⊂ {1, …, N}; moreover every principal submatrix of a positive semidefinite matrix is itself positive semidefinite, e.g.

[ 1    x_1   x_2   x_3   x_4  ]
[ x_1  X_11  X_12  X_13  X_14 ]
[ x_2  X_21  X_22  X_23  X_24 ]  ⪰ 0   ⟹
[ x_3  X_31  X_32  X_33  X_34 ]
[ x_4  X_41  X_42  X_43  X_44 ]

[ 1    x_1   X_12  X_13 ]'s 4×4 analogue:

[ 1    x_1   x_2   x_3  ]
[ x_1  X_11  X_12  X_13 ]
[ x_2  X_21  X_22  X_23 ]  ⪰ 0.
[ x_3  X_31  X_32  X_33 ]
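Both facts, the Schur-complement characterisation and the inheritance of positive semidefiniteness by principal submatrices, can be sanity-checked numerically. A minimal NumPy sketch (the helper names are ours, not from the talk):

```python
import numpy as np

def augmented(x, X):
    """Build the Schur-complement matrix [[1, x^T], [x, X]]."""
    n = len(x)
    M = np.empty((n + 1, n + 1))
    M[0, 0] = 1.0
    M[0, 1:] = x
    M[1:, 0] = x
    M[1:, 1:] = X
    return M

def is_psd(M, tol=1e-9):
    """Positive semidefinite iff the smallest eigenvalue is (near) nonnegative."""
    return float(np.linalg.eigvalsh(M).min()) >= -tol

rng = np.random.default_rng(0)
x = rng.uniform(size=4)
X = np.outer(x, x)                     # exact lifting X = x x^T

assert is_psd(augmented(x, X))         # X >= x x^T  <=>  [[1, x^T], [x, X]] >= 0

S = [0, 1, 2]                          # restrict to a subset S of indices
assert is_psd(augmented(x[S], X[np.ix_(S, S)]))   # PSD-ness is inherited on S
```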
Sum-additive objective decomposition

Recall (power set): if a finite set S has |S| = n elements, then S has 2^n subsets.

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,  X = xx^T,
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N}

⟹

min_{x,X}   Σ_{S ∈ Pn} Q'_S • X_S + c^T x
s.t.    Ax ≤ b,  X = xx^T,
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N}
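The equality Q • X = Σ_{S∈Pn} Q'_S • X_S requires rescaling the submatrices, since each entry of Q appears in many subsets: a diagonal entry (i, i) lies in C(N−1, n−1) of the n-subsets and an off-diagonal pair (i, j) in C(N−2, n−2). A NumPy sketch of one valid choice of Q'_S (the slide does not spell out the weights, so this scaling is our reconstruction):

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(0)
N, n = 6, 3
Q = rng.standard_normal((N, N))
Q = (Q + Q.T) / 2                      # symmetric objective matrix
x = rng.uniform(size=N)
X = np.outer(x, x)                     # lifted variable X = x x^T

# Each diagonal entry appears in C(N-1, n-1) subsets, each off-diagonal
# pair in C(N-2, n-2); dividing by these counts makes the sum exact.
scale = np.full((n, n), 1.0 / math.comb(N - 2, n - 2))
np.fill_diagonal(scale, 1.0 / math.comb(N - 1, n - 1))

total = 0.0
for S in itertools.combinations(range(N), n):
    idx = np.ix_(S, S)
    QS_prime = Q[idx] * scale          # Q'_S: rescaled principal submatrix
    total += float(np.sum(QS_prime * X[idx]))

assert abs(total - float(np.sum(Q * X))) < 1e-9   # Q • X recovered exactly
```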
Initial RLT relaxation

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,  X = xx^T,
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N}

⇒ relax:

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,
        X_ij − x_i − x_j ≥ −1   ∀ i, j
        X_ij − x_i ≤ 0          ∀ i, j
        X_ij − x_j ≤ 0          ∀ i, j
        X_ij = X_ji             ∀ i, j
        x ∈ [0, 1]^N,  X ∈ [0, 1]^{N×N}

⇒ LP solution (x̄, X̄)

Cutting plane motivation
- x̄ is feasible in the space of the original QP;
- for nonconvex QP, X̄ may be infeasible in the original QP (X̄ ≠ x̄x̄^T);
- LP solvers are easier to use than SDP solvers;
- as in MIP & SAT, we may want low-dimensional cutting planes.
A decomposition of the SDP relaxation for QP

(1) QP SDP relaxation:

min_{x,X}   Q • X + c^T x
s.t.    Ax ≤ b,
        [ 1   x^T ]
        [ x   X   ]  ⪰ 0,
        x ∈ [0, 1]^N,  X_ii ≤ x_i ∀ i

(2) QP SDP relaxation at a given point x̄:

min_X   f(X | x̄) = Q • X
s.t.    [ 1   x̄^T ]
        [ x̄   X   ]  ⪰ 0,   X_ii ≤ x̄_i ∀ i

(3) Relaxed QP SDP relaxation at given x̄:

min_X   Σ_{S ∈ Pn} f_S(X_S | x̄_S)
s.t.    [ 1     x̄_S^T ]
        [ x̄_S   X_S   ]  ⪰ 0  ∀ S ∈ Pn,   X_ii ≤ x̄_i ∀ i,

where Pn = {S ⊂ {1, …, N} : |S| = n ≤ N} (*) and
f(X | x̄) = Q • X = Σ_{S ∈ Pn} Q_S • X_S = Σ_{S ∈ Pn} f_S(X_S | x̄_S).

(4) nD-SDP sub-problems (parameters: n, S, Q_S, x̄_S), ∀ S ∈ Pn:

f*_S(X*_S | x̄_S) = min_{X_S}   Q_S • X_S
s.t.    [ 1     x̄_S^T ]
        [ x̄_S   X_S   ]  ⪰ 0,   X_ii ≤ x̄_i ∀ i ∈ S

(*) We take n = 3, 4, 5 in our experiments.
Cut selection from nD-SDP sub-problems at given x̄

Generate outer-approximating hyperplanes for each nD-SDP:

∀ S ∈ Pn:   f*_S(X*_S | x̄_S) = min_{X_S}   Q_S • X_S,
s.t.    [ 1     x̄_S^T ]
        [ x̄_S   X_S   ]  ⪰ 0,   X_ii ≤ x̄_i ∀ i ∈ S

(given n and S, parametric in Q_S, x̄_S)

Combinatorial explosion! The number of sub-problems is (N choose n), so we need a quick, near-optimal selection of a few sub-problems from which to generate hyperplanes.

Selection of nD-SDP sub-problems to cut
Assume a current solution (X̄, x̄), with sub-problem objective f_S(X̄_S, x̄_S). Order/select the top subsets S (to cut off (X̄, x̄) via (X̄_S, x̄_S)) by the estimated objective improvement on X̄:

ObjImp_X̄(S) = f̂*_S(Q_S, x̄_S) − f_S(X̄_S, x̄_S) = f̂*_S(Q_S, x̄_S) − Q_S • X̄_S
              ( ≈ f*_S(X*_S | x̄_S) − f_S(X̄_S, x̄_S) ),

where f̂*_S(Q_S, x̄_S) is a fast estimator of f*_S(X*_S | x̄_S).
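The selection rule can be sketched as follows; `estimator` is a stand-in for the trained neural-net estimator f̂*_S, and its exact interface is our assumption, not something specified in the talk:

```python
import itertools

import numpy as np

def obj_imp(estimator, Q, X, x, S):
    """Estimated bound improvement ObjImp_X(S) = fhat*_S(Q_S, x_S) - Q_S . X_S."""
    idx = np.ix_(S, S)
    return estimator(Q[idx], x[list(S)]) - float(np.sum(Q[idx] * X[idx]))

def select_subproblems(estimator, Q, X, x, n, k):
    """Rank all n-subsets by estimated improvement and keep the top k."""
    N = Q.shape[0]
    return sorted(itertools.combinations(range(N), n),
                  key=lambda S: obj_imp(estimator, Q, X, x, S),
                  reverse=True)[:k]    # largest predicted improvement first
```

This still scores all (N choose n) candidates, but each score is one cheap estimator evaluation instead of an SDP solve, which is the point of the offline-trained estimator.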
Generating cutting hyperplanes at given (X̄_S, x̄_S) for one nD-SDP sub-problem

Generating hyperplanes
- Could generate a separating hyperplane tangent to the SDP cone.
- In practice, generate cuts from negative eigenvalues, Qualizza et al. [2012]: for a unit eigenvector v_k of the current augmented matrix with eigenvalue λ_k < 0,

v_k^T [ 1     x̄_S^T ]
      [ x̄_S   X̄_S   ] v_k = λ_k v_k^T v_k = λ_k < 0,

so the linear inequality v_k^T [1, x^T; x, X] v_k ≥ 0 is valid for the SDP cone and cuts off the current point.
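A NumPy sketch of this construction (function names ours): each eigenvector v with λ < 0 yields a cut that is linear in (x_S, X_S) and violated by exactly λ at the current point.

```python
import numpy as np

def augmented(x, X):
    """Build [[1, x^T], [x, X]] for the current sub-problem point."""
    n = len(x)
    M = np.empty((n + 1, n + 1))
    M[0, 0] = 1.0
    M[0, 1:] = x
    M[1:, 0] = x
    M[1:, 1:] = X
    return M

def eigen_cuts(xS, XS, tol=1e-8):
    """Return cuts (c0, a, B) meaning c0 + a.x_S + B.X_S >= 0, one per
    negative eigenvalue of the augmented matrix (Qualizza et al. style)."""
    lam, V = np.linalg.eigh(augmented(xS, XS))
    cuts = []
    for k in np.where(lam < -tol)[0]:
        v0, v1 = V[0, k], V[1:, k]
        # v^T M v = v0^2 + 2 v0 (v1 . x) + v1^T X v1, linear in (x, X)
        cuts.append((v0 ** 2, 2.0 * v0 * v1, np.outer(v1, v1)))
    return cuts

# Example: X = 0 with nonzero x violates X >= x x^T, so cuts exist
xS = np.array([0.9, 0.8, 0.7])
XS = np.zeros((3, 3))
cuts = eigen_cuts(xS, XS)
assert cuts                            # the PSD constraint is violated here
for c0, a, B in cuts:
    assert c0 + a @ xS + np.sum(B * XS) < 0   # each cut cuts off the point
```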
Data for learning the estimator f̂*_S(Q_S, x_S)

An estimator is only as good as the data it is trained on: it is critical that the sample space {Q_S, x_S} is uniform in the important features for any learner to generalise well.

Features: x_S (positioning); eigenvalues {λ_i} of Q_S (positive definiteness).

Data sampling
- Uniform x_S ∈ [0, 1]^n;
- Uniform Q_S elements: uniform eigenvalues {λ_i} combined with a random orthonormal eigenbasis.
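A sketch of the sampling scheme in NumPy; the spectrum range [−1, 1] and the QR-based random orthonormal basis are our assumptions for illustration, not parameters stated on the slide:

```python
import numpy as np

def sample_instance(n, rng):
    """Draw one training point (Q_S, x_S): x_S uniform on [0,1]^n, and Q_S
    built from uniform eigenvalues plus a random orthonormal eigenbasis."""
    xS = rng.uniform(size=n)
    lam = rng.uniform(-1.0, 1.0, size=n)              # uniform spectrum (assumed range)
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal basis
    QS = V @ np.diag(lam) @ V.T                       # symmetric, spectrum = lam
    return QS, xS

rng = np.random.default_rng(0)
QS, xS = sample_instance(3, rng)
assert np.allclose(QS, QS.T)                               # symmetric by construction
assert np.all(np.abs(np.linalg.eigvalsh(QS)) <= 1 + 1e-9)  # spectrum in [-1, 1]
```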
Neural network as estimator f̂*_S(Q_S, x_S)

Inputs Q_S, x_S → neural network → output f̂*_S(Q_S, x_S)

Why neural nets?
- f*_S is a nonlinear regression mapping (a collection of convex surfaces);
- neural nets (NNs) perform regression via trained hidden layers, with no need to specify a model;
- a flexible model + lots of well-sampled data ≈ low variance and low bias.

Architecture (for the 3-5D cases)
- 3-4 hidden layers with 50-64 neurons each;
- tanh activation (well scaled with our data) in the hidden layers;
- trained by 5-fold cross-validation on 1M data points with scaled conjugate gradient;
- early stopping on low gradient (10^−5).
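A minimal NumPy sketch of the forward pass for the 3D architecture (9 inputs: the 6 distinct entries of the symmetric Q_S plus the 3 entries of x_S; 3 tanh hidden layers of 50 neurons; linear scalar output). The weights here are random placeholders, not the trained estimator:

```python
import numpy as np

def init_params(sizes, rng):
    """Random (weight, bias) pairs for a fully connected net."""
    return [(rng.standard_normal((m, k)) / np.sqrt(k), np.zeros(m))
            for k, m in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, z):
    """tanh hidden layers, linear scalar output: an estimate of f*_S."""
    h = np.asarray(z, dtype=float)
    for W, b in params[:-1]:
        h = np.tanh(W @ h + b)
    W, b = params[-1]
    return float(W @ h + b)

rng = np.random.default_rng(0)
params = init_params([9, 50, 50, 50, 1], rng)   # 3D case from the slides

QS = rng.standard_normal((3, 3)); QS = (QS + QS.T) / 2
xS = rng.uniform(size=3)
features = np.concatenate([QS[np.triu_indices(3)], xS])   # 6 + 3 = 9 inputs
estimate = mlp_forward(params, features)                  # scalar estimate of f*_S
```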
Engineering: non-linear activation function

Non-linear activation function in the hidden layers, hyperbolic tangent (tanh) vs. rectified linear unit (ReLU):
- tanh is faster to train here;
- tanh has bounded output in [−1, 1];
- tanh has a significantly positive derivative on the domain [−4, 4];
- tanh is symmetric around 0.
Are the domain and co-domain bounds okay?

Lemma. If all eigenvalues of a symmetric matrix M are bounded within [−m, m], then every element of M is bounded within [−m, m].

Proof. Let M ∈ R^{n×n} be symmetric with eigenvalues λ_i and orthonormal eigenvectors v_i, i ∈ {1, …, n}, and let v_ij denote the j-th element of v_i. Then the element M_ij in the i-th row and j-th column satisfies

|M_ij| = | Σ_{k=1}^n v_ki v_kj λ_k |
       ≤ Σ_{k=1}^n |v_ki v_kj| · |λ_k|
       ≤ Σ_{k=1}^n ((v_ki^2 + v_kj^2)/2) · |λ_k|
       ≤ Σ_{k=1}^n ((v_ki^2 + v_kj^2)/2) · m = m,

using |ab| ≤ (a^2 + b^2)/2 and Σ_k v_ki^2 = Σ_k v_kj^2 = 1 (orthonormality of the eigenbasis).
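The lemma (for symmetric matrices) is easy to sanity-check numerically:

```python
import numpy as np

# Empirical check: for symmetric M with spectrum in [-m, m], every entry
# of M also lies in [-m, m].
rng = np.random.default_rng(0)
for _ in range(200):
    A = rng.standard_normal((5, 5))
    M = (A + A.T) / 2                            # random symmetric test matrix
    m = np.max(np.abs(np.linalg.eigvalsh(M)))    # spectral bound m
    assert np.max(np.abs(M)) <= m + 1e-9         # entrywise bound holds
```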
Neural network training (3D case): results on a 0.5M test set

3D-SDP trained NN (9-input layer + 3 hidden layers × 50 neurons)
Neural network training (4D case): results on a 0.5M test set

4D-SDP trained NN (14-input layer + 3 hidden layers × 64 neurons)
Neural network training (5D case): results on a 0.5M test set

5D-SDP trained NN (20-input layer + 4 hidden layers × 64 neurons)
Cut selection in practice

BoxQP instance spar020-100-1: 4 rounds of cuts, n = 3, with the 100 sub-problems S per round selected by ObjImp_X̄(S) (black lines).

Better bound from few cuts: after each round, the overall bound keeps improving as ObjImp_X̄(S) decreases across S.
Cut selection in practice (continued)

BoxQP instance spar020-100-1: 4 rounds of cuts, n = 3, with the 100 sub-problems S per round selected by ObjImp_X̄(S) (black lines).

Limits (NN error): after a few rounds, once ObjImp_X̄(S) shrinks to the level of the NN error, we see
- incorrect selection of S where the true ObjImp_X̄(S) < 0;
- missed selection of S where the true ObjImp_X̄(S) > 0.
Results for different problem sizes/densities (n = 3)
Conclusion

Pluses
- Offline cut selection;
- good bounds with few low-dimensional linear cuts;
- SDP-based linear cuts integrate easily with other cut classes in branch & cut.

Minuses
- Weaker bounds than full SDP or convex-based relaxations;
- best complemented by other linear cutting planes (e.g. RLT-based);
- limited to low-dimensional cuts.
References I

Anstreicher. Semidefinite programming versus the reformulation-linearization technique for nonconvex quadratically constrained quadratic programming. J Glob Optim, 43(2-3):471, 2009.

Bao, Sahinidis, and Tawarmalani. Multiterm polyhedral relaxations for nonconvex, quadratically constrained quadratic programs. Optim Met Softw, 24(4-5):485, 2009.

Bao, Sahinidis, and Tawarmalani. Semidefinite relaxations for quadratically constrained quadratic programming: A review and comparisons. Math Prog, 129(1):129, 2011.

Bonami, Gunluk, and Linderoth. Solving box-constrained nonconvex quadratic programs. 2016.

Buchheim and Wiegele. Semidefinite relaxations for non-convex quadratic mixed-integer programming. Math Prog, 141(1-2):435, 2013.

Chen and Burer. Globally solving nonconvex quadratic programming problems via completely positive programming. Math Prog Comput, 4(1):33, 2012.
References II

Dong. Relaxing nonconvex quadratic functions by multiple adaptive diagonal perturbations. SIAM J Optim, 26(3):1962, 2016.

Misener and Floudas. Global optimization of mixed integer quadratically constrained quadratic programs (MIQCQP) through piecewise linear and edge-concave relaxations. Math Prog B, 136:155, 2012.

Qualizza, Belotti, and Margot. Linear programming relaxations of quadratically constrained quadratic programs. In Lee and Leyffer, editors, Mixed Integer Nonlinear Programming, page 407. 2012.

Saxena, Bonami, and Lee. Convex relaxations of non-convex mixed integer quadratically constrained programs: Projected formulations. Math Prog, 130:359, 2011.

Sherali and Fraticelli. Enhancing RLT relaxations via a new class of semidefinite cuts. J Glob Optim, 22(1-4):233, 2002.

Zheng, Sun, and Li. Convex relaxations for nonconvex quadratically constrained quadratic programming: matrix cone decomposition and polyhedral approximation. Math Prog, 129(2):301, 2011.