
OPTIMAL CONTROL OF PARABOLIC EQUATIONS – A SPECTRAL CALCULUS BASED APPROACH

LUKA GRUBISIC∗, MARTIN LAZAR† , IVICA NAKIC‡ , AND MARTIN TAUTENHAHN§

Abstract. In this paper we consider a constrained parabolic optimal control problem. The cost functional is quadratic and it combines the distance of the trajectory of the system from the desired evolution profile together with the cost of a control. The constraint is given by a term measuring the distance between the final state and the desired state towards which the solution should be steered. The control enters the system through the initial condition. We present a geometric analysis of this problem and provide a closed-form expression for the solution. This approach allows us to present the sensitivity analysis of this problem based on the resolvent estimates for the generator of the system. The numerical implementation is performed by exploring efficient rational Krylov approximation techniques that allow us to approximate a complex function of an operator by a series of linear problems. Our method does not depend on the actual choice of discretization. The main approximation task is to construct an efficient rational approximation of a generalized exponential function. It is well known that this class of functions allows exponentially convergent rational approximations, which, combined with the sensitivity analysis of the closed-form solution, allows us to present a robust numerical method. Several case studies are presented to illustrate our results.

Key words. optimal control, parabolic equations, convex optimization, Krylov spaces, functions of operators, spectral calculus

AMS subject classifications. 49N05, 49K20, 49M41, 65F60

1. Introduction. In this paper we consider an optimal control problem for a general linear parabolic equation governed by a self-adjoint operator on an abstract Hilbert space. The task consists in identifying a control (entering the system through the initial condition) that minimizes a given cost functional, while steering the final state at time T > 0 close to the given target. The functional consists of the control norm and an additional term penalizing the distance of the state from the desired trajectory.

This can be considered as an inverse problem (of initial source identification) for parabolic equations from the optimal control viewpoint. It is an important, but also numerically challenging issue due to the dissipative nature of such equations. It has been addressed by different methods, some including optimization and optimal control techniques [10, 19, 18, 6].

Optimal control problems with control in the initial conditions are less investigated than distributed or boundary control problems. The latter contain controls acting along the whole time interval [0, T]. Such a setting is not the subject of this paper, but we refer the interested reader to [25], which contains a clear and detailed exposition of the topic.

Our problem can be treated by exploring the Fenchel-Rockafellar duality for convex optimisation (cf. [21, Section 3.6]). If the cost functional consists of the control cost only, the problem is reduced to the classical minimal norm control problem, which can be treated by the Hilbert uniqueness method. In the seminal work [5] this approach is used to transform the boundary control problems into identification problems for initial data of the adjoint heat equation, better suited to numerical methods than the original problems. A more recent paper [12] generalizes this method by considering a cost functional including the state in addition to the control. In order to explore efficient optimization methods, in both papers the authors approximate an original problem with a constraint on the final state by an unconstrained one containing a penalisation term. The solution is then obtained by letting the penalisation constant blow up. Similar techniques are applied in [4, 13]. However, this approach does not provide an a priori estimate on the deviation of the final state from the given target.

∗University of Zagreb, Department of Mathematics, Croatia ([email protected], https://www.pmf.unizg.hr/math/luka.grubisic).

†University of Dubrovnik, Department of Electrical Engineering and Computing, Croatia ([email protected], http://www.martin-lazar.from.hr).

‡University of Zagreb, Department of Mathematics, Croatia ([email protected], https://web.math.pmf.unizg.hr/~nakic).

§Universität Leipzig, Fakultät für Mathematik und Informatik, Germany (martin.tautenhahn@uni-leipzig.de, https://home.uni-leipzig.de/mtau/).

arXiv:2109.13783v1 [math.OC] 28 Sep 2021

In order to numerically recover the control minimising the functional of interest, most authors use finite difference and/or finite element discretisation and employ some iterative scheme (e.g. conjugate gradient), usually including the dual problem. Classical convex optimization techniques in Hilbert spaces (e.g. [1, 21]) also provide iterative methods that can be applied to our problem. Of course, these iterative techniques come with a significant computational cost, which increases with the system dimension.

In this paper we propose a different approach based on the spectral calculus for self-adjoint operators and a geometrical representation of the problem. First, we obtain a closed-form expression for the control solution as a function of the self-adjoint operator governing the dynamics of the system. This expression is almost explicit, up to a scalar factor ensuring that the deviation of the final state from the given target is within the prescribed tolerance. Once the equation for this scalar unknown is solved, the method provides a direct, one-shot formula for the solution. Its numerical computation is achieved by exploring efficient rational Krylov approximation techniques for resolvents from [3], by which one constructs a rational approximant of the aforementioned function of the operator.

The proposed method and the obtained formula are given in the abstract Hilbert space framework and can be applied to optimal control problems for a large class of linear parabolic PDEs for which there exist efficient resolvent approximation algorithms. To illustrate our methods we treat optimization problems for 1D and 2D heat equations.

Our approach is an extension of the result from [17], where the authors explore the spectral representation of the solution by eigenfunctions of the operator governing the system dynamics. This eventually leads to an explicit expression (up to a scalar factor) of the optimal final state and the optimal control. The obtained formulae are spectrally decoupled, meaning that the n-th Fourier coefficient is fully expressed by the corresponding coefficients of the given data: the final target and the desired trajectory. However, the practical implementation of the algorithm is constrained by the availability of the spectral decomposition of the operator. For general PDE operators with variable coefficients and/or acting on irregular domains the decomposition is in general not available or hard to construct. Also, this construction requires costly computations that can exceed the gain provided by the efficiency of the obtained formula.

On the other hand, the method proposed in this paper is applicable to more complex settings. It allows one to treat efficiently PDEs with variable coefficients that are defined on complicated domains. In addition, it is robust with respect to small perturbations of both the system and the cost functional, and we provide estimates on the deviation of the original solution from the perturbed one. This is quite important in applications, as in practice the models of interest are often not completely determined and are subject to unknown or uncertain parameters, either of deterministic or stochastic nature. Furthermore, an expansion of a state in eigenfunctions typically converges rather slowly except in very specific cases. In comparison, representations of solutions using Krylov subspaces are much more efficient in the number of required terms.

The paper is organised as follows. In the next section we formulate the problem and state the main result (Theorem 2.3). In Section 3 we present the sensitivity analysis which justifies a finite-dimensional approximation of the problem. In Section 4 we present the rational function approximation theory and discuss the stability of the finite element approximation of the problem. Further, we discuss the relationship between numerical rational function calculus as realized by the rkfit algorithm [3] and the approximation problem for the generalized exponential functions which appear as central for the study of the concrete numerical examples. In Section 5 we present 1D and 2D numerical examples which are outside the scope of the original eigendecomposition method from [17]. Within the concluding remarks we discuss the efficiency of the introduced method, open perspectives and comparison to other approaches.

2. Setting of the problem and characterisation of the solution. Let $H$ be a Hilbert space and let $A$ be an upper-bounded self-adjoint operator in $H$ with an upper bound $\kappa$, i.e. $\max\sigma(A) \le \kappa$. We denote by $(S_t)_{t\ge 0}$ the semigroup generated by $A$. For $f \in L^2((0,\infty);H)$ and $u \in H$ we consider the Cauchy problem

$$
\begin{aligned}
y'(t) &= Ay(t) + f(t), \quad t > 0,\\
y(0) &= u.
\end{aligned}
\tag{2.1}
$$

Note that the mild solution of (2.1) is given by

$$
y(t) = S_t u + \int_0^t S_\tau f(t-\tau)\,d\tau, \quad t \ge 0,
$$

and is an element of $L^2((0,\infty);H)$, see e.g. [9]. We say that the system (2.1) is controllable to a target state $y^* \in H$ in time $T > 0$ if there is $u \in H$ such that

$$
S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau = y^*.
$$

We say that the system (2.1) is approximately controllable in time $T > 0$ if for all $y^* \in H$ and all $\varepsilon > 0$ there exists $u \in H$ such that

$$
\Bigl\| S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau - y^* \Bigr\| \le \varepsilon.
\tag{2.2}
$$
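The mild solution formula can be illustrated in a finite-dimensional setting. The following is a minimal sketch, assuming that $A$ is a diagonal matrix with eigenvalues `lam` (so that $S_t$ acts as entrywise multiplication by $e^{t\lambda}$) and that the forcing `f` is constant in time; the function name `mild_solution` and the sample data are hypothetical and not part of the paper.

```python
import numpy as np

def mild_solution(lam, u, f, t):
    """Evaluate y(t) = S_t u + int_0^t S_tau f dtau for A = diag(lam).

    Assumption of this sketch: f is constant in time, so the Duhamel
    integral has the closed form (exp(t*lam) - 1) / lam entrywise.
    """
    lam = np.asarray(lam, dtype=float)
    semigroup = np.exp(t * lam)                 # S_t as a diagonal multiplier
    small = np.abs(lam) < 1e-12                 # guard the removable singularity at lam = 0
    safe = np.where(small, 1.0, lam)
    duhamel = np.where(small, t, np.expm1(t * safe) / safe)
    return semigroup * u + duhamel * f

# Hypothetical data: five modes with eigenvalues -k^2, k = 1..5.
lam = -(np.arange(1, 6) ** 2).astype(float)
u = np.ones(5)
f = np.linspace(0.0, 1.0, 5)
y_T = mild_solution(lam, u, f, 0.5)             # state at time T = 0.5
```

Because the high modes are damped by the factor $e^{T\lambda}$, reaching a prescribed final state exactly requires a very large initial datum in those modes, which is one way to see why only approximate controllability is asked for in (2.2).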

Remark 2.1. Let us note that for any $T > 0$ the operator $S_T$ is injective with dense range. For the reader's convenience, we present a short proof. For the injectivity, assume that there exists $0 \ne x \in H$ such that $S_T x = 0$. The semigroup property immediately implies that $S_t x = 0$ for all $t > T$. Let $0 < t < T$ be arbitrary. Then there exists $k \in \mathbb{N}$ such that $2^k t > T$ and hence $S_{2^k t}x = 0$. Since $S_t$ is a non-negative operator, we have $0 = \langle S_{2^k t}x, x\rangle = \|S_{2^{k-1}t}x\|^2$, hence $S_{2^{k-1}t}x = 0$. Now by induction it follows that $S_t x = 0$. Since $(S_t)_{t\ge 0}$ is a $C_0$-semigroup, we obtain $0 = \lim_{t\to 0} S_t x = x$, a contradiction. Note that this also implies that $\operatorname{Ran}(S_T)$ is a dense subspace of $H$, as $S_T$ is a self-adjoint operator.

Since the range of $S_T$ is dense in $H$, the system (2.1) is indeed approximately controllable in any time $T > 0$. In the class of initial values satisfying (2.2) for a given target $y^* \in H$ and time $T > 0$, we are looking for those with minimal cost. More precisely, for $\varepsilon, T > 0$ and $y^* \in H$ we introduce the problem

$$
\min_{u\in H}\Bigl\{ J(u) : \Bigl\| S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau - y^* \Bigr\| \le \varepsilon \Bigr\},
\tag{2.3}
$$

where

$$
J(u) = \frac{\alpha}{2}\|u\|^2 + \frac{1}{2}\int_0^T \beta(t)\Bigl\| S_t u + \int_0^t S_\tau f(t-\tau)\,d\tau - w(t) \Bigr\|^2 dt,
$$

$\alpha > 0$ and $\beta \in L^\infty((0,T);[0,\infty))$ are weights of the cost, and $w \in L^2((0,T);H)$ is the target trajectory.

Of course, the notation used can be somewhat simplified by substituting $S_t u + \int_0^t S_\tau f(t-\tau)\,d\tau$ with $y(t)$, but we want to keep the formulation that explicitly shows the dependence of the problem on the given data $f$, $y^*$ and $w$, and on the unknown control $u$.

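For $\varepsilon > 0$ and $x \in H$ we denote by $B_\varepsilon(x) = \{y \in H : \|y - x\| \le \varepsilon\}$ the closed ball of radius $\varepsilon$ and center $x$. Our problem (2.3) can be restated as

$$
\min_{u\in H}\ J(u) + I_{B_\varepsilon(y^*)}\Bigl( S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau \Bigr),
\tag{2.4}
$$

where $I_{B_\varepsilon(y^*)}$ is the corresponding indicator function defined as

$$
I_{B_\varepsilon(y^*)}(y) =
\begin{cases}
0 & \text{if } y \in B_\varepsilon(y^*),\\
+\infty & \text{else.}
\end{cases}
$$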

Since the function $u \mapsto J(u) + I_{B_\varepsilon(y^*)}\bigl(S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau\bigr)$ is proper, strongly convex and lower-semicontinuous, problem (2.3) has a unique solution, which we denote by $u^{\mathrm{opt}}$ (see, for instance, [21, Corollary 2.20]). Moreover, we define

$$
u^{\min} = \arg\min_{u\in H} J(u)
$$

as the solution to the corresponding unconstrained problem, while by $y^{\min} = S_T u^{\min} + \int_0^T S_\tau f(T-\tau)\,d\tau$ and $y^{\mathrm{opt}} = S_T u^{\mathrm{opt}} + \int_0^T S_\tau f(T-\tau)\,d\tau$ we denote the corresponding final states obtained from $u^{\min}$ and $u^{\mathrm{opt}}$, respectively.

Remark 2.2. Regarding the results we use from [21], we note that they are stated in the case of real Hilbert spaces only. However, they carry over to the complex case by realifying the Hilbert space $H$ and taking the real part of the inner product instead of the (complex) inner product.

The problem (2.4) has the form of a composite optimization problem [21]. That is to say, the target functional is a sum of a quadratic function and a "simple" function, e.g. a composition with an indicator function. Such problems, when posed in the correct abstract setting, are typically solved by methods based on proximal operators. Instead, we use the spectral calculus to explicitly construct an operator theoretic representation of the trajectories, cf. Remark 2.8 and Section 4. Interestingly, the structure of the abstract composite optimization problem is still preserved in the solution formula. We will comment on this explicitly in Remark 2.7 after the statement of the main theorem.

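We define $y^{*,\mathrm{hom}} = y^* - \int_0^T S_\tau f(T-\tau)\,d\tau$ and $w^{\mathrm{hom}} = w - \int_0^{\cdot} S_\tau f(\cdot-\tau)\,d\tau$. Then our problem (2.3) can be written as

$$
\min_{u\in H}\bigl\{ J(u) : \|S_T u - y^{*,\mathrm{hom}}\| \le \varepsilon \bigr\},
\tag{2.5}
$$

where

$$
J(u) = \frac{\alpha}{2}\|u\|^2 + \frac{1}{2}\int_0^T \beta(t)\|S_t u - w^{\mathrm{hom}}(t)\|^2\,dt.
$$

Note that $S_t u$ is the solution of the corresponding homogeneous Cauchy problem with $f = 0$. If $\varepsilon \ge \|y^{\min} - y^*\|$, it follows that the solution of (2.5) (and hence also of (2.3)) satisfies $u^{\mathrm{opt}} = u^{\min}$. The following theorem covers the non-trivial case $0 < \varepsilon < \|y^{\min} - y^*\|$ as well.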

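Theorem 2.3. Let $T, \varepsilon > 0$ and $y^* \in H$. Then the optimal initial state $u^{\mathrm{opt}}$ is given by

$$
u^{\mathrm{opt}} = (\mu_\varepsilon S_{2T} + \Psi)^{-1}(\mu_\varepsilon S_T y^{*,\mathrm{hom}} + \psi),
\tag{2.6}
$$

where

$$
\Psi = \alpha\,\mathrm{Id} + \int_0^T \beta(t) S_{2t}\,dt, \qquad \psi = \int_0^T \beta(t) S_t w^{\mathrm{hom}}(t)\,dt,
$$

and $\mu_\varepsilon \ge 0$ is the unique solution of $\Phi(\mu) = \varepsilon$ if $\varepsilon < \|y^{\min} - y^*\| = \|\Psi^{-1}S_T\psi - y^{*,\mathrm{hom}}\|$, and zero otherwise. Here $\Phi\colon [0,\infty) \to [0,\infty)$ is the function defined by

$$
\Phi(\mu) = \bigl\| y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}(\mu S_{2T} y^{*,\mathrm{hom}} + S_T\psi) \bigr\|.
\tag{2.7}
$$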

Remark 2.4. Since $\Psi$ is positive definite, we indeed have that $\Psi$ and $\mu S_{2T} + \Psi$, $\mu \ge 0$, are invertible. Moreover, for the functional $J$ we have $\nabla J(u) = \Psi u - \psi$. As $u^{\min}$ is its global minimizer, it immediately follows that $u^{\min} = \Psi^{-1}\psi$, and thus $\|\Psi^{-1}S_T\psi - y^{*,\mathrm{hom}}\| = \|y^{\min} - y^*\|$.
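The gradient formula quoted in Remark 2.4 can be checked by a short computation. The following sketch assumes the real inner product (cf. Remark 2.2) and uses only the self-adjointness of $S_t$ and the semigroup property $S_t S_t = S_{2t}$; it is meant as an illustration of the step, not as part of the original argument.

```latex
\begin{aligned}
J(u+h) - J(u)
  &= \alpha\langle u, h\rangle
     + \int_0^T \beta(t)\,\langle S_t u - w^{\mathrm{hom}}(t),\, S_t h\rangle\,dt
     + O(\|h\|^2) \\
  &= \Bigl\langle \alpha u + \int_0^T \beta(t)\,S_{2t}u\,dt
     - \int_0^T \beta(t)\,S_t w^{\mathrm{hom}}(t)\,dt,\ h \Bigr\rangle
     + O(\|h\|^2) \\
  &= \langle \Psi u - \psi,\ h\rangle + O(\|h\|^2).
\end{aligned}
```

Setting the gradient to zero gives $\Psi u^{\min} = \psi$, which is the relation used repeatedly in what follows.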

Remark 2.5. For $\varepsilon < \|y^{\min} - y^*\|$ we obtain from (2.6) and (2.7)

$$
\Phi(\mu_\varepsilon) = \|y^* - y^{\mathrm{opt}}\| = \varepsilon.
$$

In other words, if the global minimizer $u^{\min}$ of the unconstrained problem does not drive the system to the target ball $B_\varepsilon(y^*)$, then the optimal final state lies on the boundary of this ball, cf. Lemma 2.9. This is in accordance with previous results on similar problems (e.g. [17, Proposition 2.1] and [6, Theorem 2.4]) that provide the same characterisation of the optimal solution.

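Remark 2.6. Let $\varphi(\mu) = y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}(\mu S_{2T} y^{*,\mathrm{hom}} + S_T\psi)$, hence $\Phi(\mu) = \|\varphi(\mu)\|$. Then $\varphi(\mu) = y^{*,\mathrm{hom}} - x$, where $x$ is the solution of the equation

$$
(\mu S_{2T} + \Psi)x = \mu S_{2T} y^{*,\mathrm{hom}} + S_T\psi,
\tag{2.8}
$$

hence the calculation of $\Phi(\mu)$ reduces to solving a linear equation. Note also that the optimal initial state $u^{\mathrm{opt}}$ is the solution of the equation

$$
(\mu_\varepsilon S_{2T} + \Psi)x = \mu_\varepsilon S_T y^{*,\mathrm{hom}} + \psi.
\tag{2.9}
$$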
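To make the two-step procedure of Theorem 2.3 and Remark 2.6 concrete (first solve the scalar equation $\Phi(\mu) = \varepsilon$, then solve the linear system (2.9)), here is a minimal finite-dimensional sketch. It rests on several simplifying assumptions that are not in the paper: $A$ is a diagonal matrix with eigenvalues `lam`, the weight $\beta$ is a constant, $f = 0$, and the target trajectory `w0` is constant in time, so that $\Psi$ and $\psi$ are available in closed form. The root of $\Phi(\mu) = \varepsilon$ is found by plain bisection, which is justified by the monotonicity of $\Phi$ (Lemma 2.10); the paper itself relies on rational Krylov techniques instead. The names `int_exp` and `optimal_control` are hypothetical.

```python
import numpy as np

def int_exp(lam, T):
    """Entrywise int_0^T exp(t*lam) dt, with the limit T where lam == 0."""
    lam = np.asarray(lam, dtype=float)
    small = np.abs(lam) < 1e-12
    safe = np.where(small, 1.0, lam)
    return np.where(small, T, np.expm1(T * safe) / safe)

def optimal_control(lam, y_target, w0, T, alpha, beta, eps, tol=1e-12):
    """Sketch of formula (2.6) for A = diag(lam), f = 0, constant beta and w0."""
    s1 = np.exp(T * lam)                          # S_T
    s2 = s1 ** 2                                  # S_{2T}
    Psi = alpha + beta * int_exp(2.0 * lam, T)    # alpha*Id + int beta*S_{2t} dt
    psi = beta * int_exp(lam, T) * w0             # int beta*S_t*w0 dt

    def Phi(mu):
        x = (mu * s2 * y_target + s1 * psi) / (mu * s2 + Psi)   # solves (2.8)
        return np.linalg.norm(y_target - x)

    if Phi(0.0) <= eps:
        mu = 0.0                                  # unconstrained minimizer is admissible
    else:
        lo, hi = 0.0, 1.0
        while Phi(hi) > eps:                      # bracket the root; Phi is decreasing
            lo, hi = hi, 2.0 * hi
        while hi - lo > tol * hi:                 # bisection on Phi(mu) = eps
            mid = 0.5 * (lo + hi)
            if Phi(mid) > eps:
                lo = mid
            else:
                hi = mid
        mu = hi
    u_opt = (mu * s1 * y_target + psi) / (mu * s2 + Psi)        # solves (2.9)
    return u_opt, mu

# Hypothetical data: four stable modes, zero target trajectory, tolerance 0.1.
lam = np.array([-0.5, -1.0, -2.0, -3.0])
y_star = np.ones(4)
u_opt, mu_eps = optimal_control(lam, y_star, np.zeros(4), T=1.0,
                                alpha=1.0, beta=1.0, eps=0.1)
```

In this example the unconstrained minimizer is $u^{\min} = 0$, which misses the target ball, so the multiplier `mu_eps` is positive and the final state `np.exp(lam) * u_opt` lands on the boundary of the ball, in line with Remark 2.5.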

Remark 2.7. In order to give some geometrical intuition for the constrained optimization problem that we solve in the Hilbert space setting, let us observe a formal similarity between the result of Theorem 2.3 and the known finite-dimensional result [2]. Let $\lambda > 0$ and let $A \in \mathbb{R}^{m\times n}$ be of full rank. Then for a given $x \in \mathbb{R}^n$

$$
u_x = \arg\min_{u\in\mathbb{R}^n}\ \lambda\|Au\| + \frac{1}{2}\|u - x\|^2
$$

is given by the formula

$$
u_x =
\begin{cases}
x - A^*(AA^*)^{-1}Ax, & \text{if } \|(AA^*)^{-1}Ax\| \le \lambda,\\
x - A^*(AA^* + \alpha^*\,\mathrm{Id})^{-1}Ax, & \text{if } \|(AA^*)^{-1}Ax\| > \lambda.
\end{cases}
$$

Here $\alpha^*$ is the unique positive root of the decreasing function

$$
\varphi(\alpha) = \|(AA^* + \alpha\,\mathrm{Id})^{-1}Ax\|^2 - \lambda^2.
$$

The function $\varphi$ takes the role of $\Phi$ from (2.7).
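The finite-dimensional formula of Remark 2.7 can be sketched in a few lines. The code below assumes that $A$ has full row rank (so that $AA^*$ is invertible) and locates the root $\alpha^*$ by bisection, which is one possible choice and not necessarily the one used in [2]; the function name `prox_norm_A` and the test matrix are hypothetical.

```python
import numpy as np

def prox_norm_A(A, x, lam, tol=1e-12):
    """Minimizer of lam*||A u|| + 0.5*||u - x||^2, following Remark 2.7.

    Assumption of this sketch: A has full row rank, so A A^T is invertible.
    """
    G = A @ A.T
    Ax = A @ x
    if np.linalg.norm(np.linalg.solve(G, Ax)) <= lam:
        return x - A.T @ np.linalg.solve(G, Ax)   # minimizer satisfies A u = 0
    eye = np.eye(G.shape[0])

    def phi(a):
        return np.linalg.norm(np.linalg.solve(G + a * eye, Ax)) ** 2 - lam ** 2

    lo, hi = 0.0, 1.0
    while phi(hi) > 0.0:                          # phi is decreasing, phi(0) > 0
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:                     # bisection for the root alpha*
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return x - A.T @ np.linalg.solve(G + hi * eye, Ax)

# Hypothetical data.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
x = np.array([3.0, 4.0, 5.0])
u_x = prox_norm_A(A, x, lam=0.5)
```

In the second branch one can verify the first-order condition $\lambda A^*Au/\|Au\| + u - x = 0$ directly, since $\|Au_x\| = \alpha^*\lambda$ at the root of $\varphi$.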

Remark 2.8. To calculate $\Psi$ we will use the fact that it can be written as a function of $A$ by using

$$
\int_0^T \beta(t) S_{2t}\,dt = \int_{-\infty}^{\infty}\int_0^T \beta(t)\exp(-2t\lambda)\,dt\,d(E(\lambda)) = \beta_0(A),
$$

where $\beta_0$ is the function given by $\beta_0(\lambda) = \int_0^T \beta(t)\exp(-2t\lambda)\,dt$ and $E$ is the spectral measure of $A$.

However, such an approach does not work directly with the other term entering the formula for the solution (2.6). Namely, it is not possible to find a nice closed formula for $\psi$ except in special situations. But we can always find a good approximant for $\psi$. Let $\tilde{w}(t) = \sum_{i=1}^N w_i \chi_{[t_{i-1},t_i]}(t)$ be an approximation of $w$, where $0 = t_0 < t_1 < \dots < t_N = T$, $w_i \in H$, $i = 1,\dots,N$, and $\chi_S$ is the characteristic function of the set $S$. Then

$$
\tilde{\psi} = \sum_{i=1}^N \beta_i(A) w_i, \quad\text{where}\quad \beta_i(\lambda) = \int_{t_{i-1}}^{t_i} \beta(t)\exp(-t\lambda)\,dt,
$$

is an approximation of $\psi$.

If the function $\beta$ is such that we cannot explicitly calculate $\beta_i$, $i = 0,\dots,N$, we can still find appropriate approximations of $\Psi$ and $\psi$ by finding appropriate approximations of $\beta_i$, $i = 0,\dots,N$.

Before proving Theorem 2.3 we provide two auxiliary results.

Lemma 2.9. Let $\varepsilon > 0$ and $\|y^{\min} - y^*\| > \varepsilon$. Then the optimal final state satisfies $\|y^* - y^{\mathrm{opt}}\| = \varepsilon$, i.e. $y^{\mathrm{opt}}$ lies on the boundary of the target ball.

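Proof. Let us suppose the contrary, that $y^{\mathrm{opt}}$ lies in the interior of $B_\varepsilon(y^*)$. Then there exists $\eta > 0$ such that $B_\eta(y^{\mathrm{opt}}) \subset B_\varepsilon(y^*)$ and then, by continuity of $S_T$, a $\delta > 0$ such that $S_T B_\delta(u^{\mathrm{opt}}) + \int_0^T S_\tau f(T-\tau)\,d\tau \subset B_\eta(y^{\mathrm{opt}}) \subset B_\varepsilon(y^*)$. In particular, every $u \in B_\delta(u^{\mathrm{opt}})$ is a feasible control for the problem (2.3). As $u^{\mathrm{opt}}$ is the solution of the same problem, it holds that $J(u^{\mathrm{opt}}) \le J(u)$ for every $u \in B_\delta(u^{\mathrm{opt}})$. But, by the convexity of $J$, a local minimizer is also global. Then $u^{\mathrm{opt}}$ is a solution of the unconstrained problem, which contradicts the assumption $\varepsilon < \|y^{\min} - y^*\|$.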

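Lemma 2.10. The function $\Phi$ has the following properties:
(a) If $0 < \|y^* - y^{\min}\|$, then $\Phi$ is a strictly decreasing function,
(b) $\lim_{\mu\to\infty}\Phi(\mu) = 0$,
(c) $\Phi(0) = \|y^* - y^{\min}\|$.

Proof. Note that the function $\Phi$ can be rewritten as

$$
\Phi(\mu) = \|(\mu S_{2T} + \Psi)^{-1}(\Psi y^{*,\mathrm{hom}} - S_T\psi)\|.
$$

For the derivative of $\mu \mapsto \Phi^2(\mu)$ we have

$$
\begin{aligned}
(\Phi^2)'(\mu) &= -2\bigl\langle S_{2T}(\mu S_{2T} + \Psi)^{-2}(\Psi y^{*,\mathrm{hom}} - S_T\psi),\ (\mu S_{2T} + \Psi)^{-1}(\Psi y^{*,\mathrm{hom}} - S_T\psi)\bigr\rangle\\
&= -2\bigl\| S_T(\mu S_{2T} + \Psi)^{-3/2}(\Psi y^{*,\mathrm{hom}} - S_T\psi)\bigr\|^2 \le 0.
\end{aligned}
$$

As $(\Phi^2)' = 2\Phi\Phi'$ and $\Phi$ is a nonnegative function, it follows that $\Phi'(\mu) \le 0$ for all $\mu > 0$, and we have strict negativity if $\Psi y^{*,\mathrm{hom}} \ne S_T\psi$. Recall that $\Psi u^{\min} = \psi$ (cf. Remark 2.4), hence $\Psi y^{\min} = S_T\psi$. Now $\Psi y^{*,\mathrm{hom}} = S_T\psi$ would imply $\Psi y^{\min} = \Psi y^*$, a contradiction with the assumption and the invertibility of $\Psi$.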

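We now prove (b). First note that

$$
y - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}y = y - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y = (\mu + S_{2T}^{-1}\Psi)^{-1}S_{2T}^{-1}\Psi y
$$

for all $y \in \operatorname{Dom}(S_{2T}^{-1}\Psi)$, which implies that $\lim_{\mu\to\infty}\bigl(y - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y\bigr) = 0$ for all $y \in \operatorname{Dom}(S_{2T}^{-1}\Psi)$. Let $(y_n)$ be a sequence from $\operatorname{Dom}(S_{2T}^{-1}\Psi)$ which converges to $y^{*,\mathrm{hom}}$. Let $n \in \mathbb{N}$ be arbitrary. Using $\|(\mu + S_{2T}^{-1}\Psi)^{-1}\| = \operatorname{dist}(\mu, -\sigma(S_{2T}^{-1}\Psi))^{-1} \le \mu^{-1}$ it follows that

$$
\begin{aligned}
&\lim_{\mu\to\infty}\|y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}y^{*,\mathrm{hom}}\|\\
&\quad= \lim_{\mu\to\infty}\|y^{*,\mathrm{hom}} - y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}(y^{*,\mathrm{hom}} - y_n) + y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y_n\|\\
&\quad\le 2\|y^{*,\mathrm{hom}} - y_n\| + \lim_{\mu\to\infty}\|y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y_n\| = 2\|y^{*,\mathrm{hom}} - y_n\|,
\end{aligned}
$$

and taking the limit $n \to \infty$ we obtain

$$
\lim_{\mu\to\infty}\|y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}y^{*,\mathrm{hom}}\| = 0.
$$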

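Hence to prove (b) we only have to show

$$
\lim_{\mu\to\infty}\|(\mu S_{2T} + \Psi)^{-1}S_T\psi\| = \lim_{\mu\to\infty}\|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi\| = 0.
\tag{2.10}
$$

Since $\operatorname{Ran}(S_T)$ is dense in $H$ (cf. Remark 2.1), there exists a sequence $(\psi_m)_{m\in\mathbb{N}}$ in $\operatorname{Ran}(S_T)$ such that $\lim_{m\to\infty}\psi_m = \psi$, and let $v_m \in H$ be such that $\psi_m = S_T v_m$. Then $\lim_{\mu\to\infty}S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi_m = \lim_{\mu\to\infty}(\mu + S_{2T}^{-1}\Psi)^{-1}v_m = 0$ for all $m \in \mathbb{N}$, and for $\psi$ we have the following estimate

$$
\|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi\| \le \|S_T(\mu S_{2T} + \Psi)^{-1}\|\,\|\psi - \psi_m\| + \|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi_m\|.
\tag{2.11}
$$

By differentiating one can show that the mapping $[0,\infty) \ni \mu \mapsto \|(\mu S_{2T} + \Psi)^{-1}x\|^2$ is a decreasing function for all $x \in H$, so in particular $[0,\infty) \ni \mu \mapsto \|S_T(\mu S_{2T} + \Psi)^{-1}\|$ is a bounded function. Thus we can pass to the limit in (2.11), from which we obtain (2.10).

Finally, (c) follows from $\Phi(0) = \|y^{*,\mathrm{hom}} - \Psi^{-1}S_T\psi\| = \|y^* - y^{\min}\|$.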

Now we are ready to prove Theorem 2.3.

Proof of Theorem 2.3. Note that the case $\varepsilon \ge \|y^{\min} - y^*\|$ is covered trivially. Indeed, by choosing $\mu_\varepsilon = 0$ in (2.6) we obtain $u^{\mathrm{opt}} = \Psi^{-1}\psi = u^{\min}$. By the assumption on $\varepsilon$, the unconstrained minimizer $u^{\min}$ is admissible, and clearly optimal.

For the rest of the proof we consider the case $0 < \varepsilon < \|y^{\min} - y^*\|$. We fix $T, \varepsilon > 0$ and $y^* \in H$.

Based on Lemma 2.9, our problem (2.3) (or its equivalent form (2.5)) can be restated as

$$
\min_{u\in H}\bigl\{ J(u) : \|S_T u - y^{*,\mathrm{hom}}\| = \varepsilon \bigr\},
$$

whose associated Lagrange functional reads

$$
L(u,\mu) = J(u) + \frac{\mu}{2}\bigl(\|S_T u - y^{*,\mathrm{hom}}\|^2 - \varepsilon^2\bigr).
$$

Its (global) minimizer corresponds to the unique solution of our problem. As $J$ is a differentiable function, so is the Lagrangian $L$. Thus it achieves the minimum value at the point $(u^{\mathrm{opt}}, \mu_\varepsilon)$ satisfying

$$
\nabla_{u,\mu}L(u^{\mathrm{opt}}, \mu_\varepsilon) = 0.
$$

By using the relation $\nabla J(u) = \Psi u - \psi$ we get

$$
\nabla_u L(u^{\mathrm{opt}}, \mu_\varepsilon) = \Psi u^{\mathrm{opt}} - \psi + \mu_\varepsilon(S_{2T}u^{\mathrm{opt}} - S_T y^{*,\mathrm{hom}}) = 0,
$$

which directly leads to the formula (2.6). In order to determine the optimal value of the Lagrange multiplier, we use Lemma 2.9, which provides $\|S_T u^{\mathrm{opt}} - y^{*,\mathrm{hom}}\| = \varepsilon$. Plugging in the expression for the optimal control $u^{\mathrm{opt}}$ we obtain

$$
\Phi(\mu_\varepsilon) = \bigl\| y^{*,\mathrm{hom}} - (\mu_\varepsilon S_{2T} + \Psi)^{-1}(\mu_\varepsilon S_{2T}y^{*,\mathrm{hom}} + S_T\psi) \bigr\| = \varepsilon.
$$

Lemma 2.10 and the assumption $\varepsilon < \|y^{\min} - y^*\|$ ensure the existence of a unique value $\mu_\varepsilon$ satisfying the last relation, which completes the proof.

For the rest of this section we provide a geometric interpretation of the final optimal state. For $x \in H$ and $W \subset H$ closed and convex, we denote by $\Pi_W(x)$ the unique projection of $x$ onto the set $W$.

For $c \in \mathbb{R}$ we denote by $\Gamma_c(g)$ the $c$-sublevel set of a function $g\colon \operatorname{Dom}(g) \subset H \to \mathbb{R}$ defined by

$$
\Gamma_c(g) = \{y \in \operatorname{Dom}(g) : g(y) \le c\}.
$$

Note that the sublevel set of a convex function is convex, see, e.g., Remark 2.10 in [21]. In particular, we have the following result.

Lemma 2.11. Let $g\colon H \to \mathbb{R}$ be a differentiable convex function, $c > \inf g$, and $y_1 \in H \setminus \Gamma_c(g)$. Denote by $y_0 \in \Gamma_c(g)$ the unique projection of $y_1$ onto $\Gamma_c(g)$. Then there is $\gamma > 0$ such that

$$
y_1 - y_0 = \gamma\nabla g(y_0).
$$

Proof. First note that the projection $y_0$ is the solution of the constrained minimisation problem

$$
\min_{y\in\Gamma_c(g)}\Bigl\{ \frac{1}{2}\|y - y_1\|^2 : y \in \partial\Gamma_c(g) \Bigr\},
$$

whose Lagrangian is given by the relation

$$
L(y,\lambda) = \frac{1}{2}\|y - y_1\|^2 + \lambda(g(y) - c).
$$

Since $g$ is a differentiable, real-valued function, the associated Lagrangian is differentiable as well. Therefore its unique minimizer $(y_0, \gamma)$ satisfies $\nabla L(y_0, \gamma) = 0$, which is exactly the formula appearing in the statement of the lemma.

The sign of $\gamma$ results from the fact that $g(y_1) > g(y_0)$ (as $y_1 \in H \setminus \Gamma_c(g)$).
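A simple worked instance may help to fix ideas. Take, as an illustrative assumption that is not part of the paper, $g(y) = \|y\|^2$ and $c = r^2$ with $r > 0$, so that $\Gamma_c(g)$ is the closed ball of radius $r$ centred at the origin, and let $\|y_1\| > r$.

```latex
\begin{aligned}
y_0 &= \Pi_{\Gamma_c(g)}(y_1) = r\,\frac{y_1}{\|y_1\|},
  \qquad \nabla g(y_0) = 2y_0 = 2r\,\frac{y_1}{\|y_1\|},\\
y_1 - y_0 &= \bigl(\|y_1\| - r\bigr)\frac{y_1}{\|y_1\|}
  = \gamma\,\nabla g(y_0),
  \qquad \gamma = \frac{\|y_1\| - r}{2r} > 0.
\end{aligned}
```

The displacement from the projection to the outside point is therefore a positive multiple of the gradient at the projection, as the lemma states.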

We introduce the functional $h\colon \operatorname{Ran}(S_T) \to \mathbb{R}$, defined by $h(y) = J(S_T^{-1}y)$. For each $c \in \mathbb{R}$ we define the corresponding sublevel set

$$
W_c := \{S_T u : u \in H \text{ with } J(u) \le c\} = \Gamma_c(h).
$$

With the notation $c^{\min} = J(u^{\min})$ we have that $W_c = \emptyset$ if and only if $c < c^{\min}$. In order to show that the sets $W_c$ are closed we will use the fact that the weak and strong closures of a convex set agree. If $c < c^{\min}$, the set $W_c$ is empty and thus closed. Assume now $c \ge c^{\min}$ and let $(y_n)_{n\in\mathbb{N}}$ be a (strongly) convergent sequence in $W_c \subset H$. We denote by $y = \lim_{n\to\infty}y_n$ its limit. If we show that $y \in W_c$, i.e. $h(y) \le c$, we are done. For $n \in \mathbb{N}$ let $x_n \in H$ be such that $y_n = S_T x_n$. As $J(x_n) \le c$ for all $n \in \mathbb{N}$, it follows that $(x_n)_{n\in\mathbb{N}}$ is a bounded sequence. Hence there exists a weakly convergent subsequence, still denoted by $(x_n)_{n\in\mathbb{N}}$. We denote by $x = \operatorname{w-lim}_{n\to\infty}x_n$ its weak limit. For all $z \in H$ we have $\langle S_T x_n - y, z\rangle = \langle x_n, S_T z\rangle - \langle y, z\rangle \to \langle x, S_T z\rangle - \langle y, z\rangle$. Hence $S_T x = y$. From the continuity of the function $J$ and $h(y_n) \le c$ for each $n \in \mathbb{N}$ we conclude

$$
c \ge \operatorname{w-lim}_{n\to\infty}h(y_n) = J(\operatorname{w-lim}_{n\to\infty}x_n) = J(x) = h(y).
$$

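If we formally differentiate the function $h$, for the gradient of $h$ we obtain

$$
\begin{aligned}
\nabla h(y) &= S_{2T}^{-1}\Bigl(\alpha y + \int_0^T \beta(t)S_{2t}y\,dt\Bigr) - S_T^{-1}\int_0^T \beta(t)S_t w^{\mathrm{hom}}(t)\,dt\\
&= S_{2T}^{-1}\Psi y - S_T^{-1}\psi.
\end{aligned}
\tag{2.12}
$$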

Below we comment on how to tackle the case where $h$ is not differentiable in $H$. Denoting $c^{\mathrm{opt}} = J(u^{\mathrm{opt}})$, from Lemma 2.9 it follows that the optimal final state belongs to $W_{c^{\mathrm{opt}}} \cap \partial B(y^*, \varepsilon)$. In particular, it equals the projection $\Pi_{W_{c^{\mathrm{opt}}}}(y^*)$ of the target state onto the sublevel set $W_{c^{\mathrm{opt}}}$ (Figure 1).

By putting $c = c^{\mathrm{opt}}$ and $y_1 = y^*$ into Lemma 2.11, we obtain that there exists $\gamma^{\mathrm{opt}} > 0$ such that

$$
y^* - y^{\mathrm{opt}} = \gamma^{\mathrm{opt}}\nabla h(y^{\mathrm{opt}}).
$$

By acting with $S_T$ on the last equality, taking into account that $y^{\mathrm{opt}} = S_T u^{\mathrm{opt}} + \int_0^T S_\tau f(T-\tau)\,d\tau$ and using (2.12), we obtain

$$
S_T y^{*,\mathrm{hom}} - S_{2T}u^{\mathrm{opt}} = \gamma^{\mathrm{opt}}(\Psi u^{\mathrm{opt}} - \psi),
$$

which leads us again to the solution formula (2.6) with $\mu_\varepsilon = 1/\gamma^{\mathrm{opt}}$.

One can bypass the problem of non-differentiability of $h$ by taking a sequence of approximate functionals $h_n(y) = J(e^{-TA_n}y)$ for $n$ large enough, where $A_n$ are the Yosida approximations of the operator $A$. Then $h_n\colon H \to \mathbb{R}$ are differentiable functions which converge to $h$. Instead of (2.5), for all $n$ large enough we study the problem

$$
\min_{y\in H}\{h_n(y) : \|y - y^*\| \le \varepsilon\}.
\tag{2.13}
$$

Then one can prove the corresponding version of Lemma 2.9 and use Lemma 2.11 to show that for all $n$ large enough we have

$$
e^{TA_n}y^* - e^{2TA_n}u_n^{\mathrm{opt}} = \gamma_n^{\mathrm{opt}}(\Psi u_n^{\mathrm{opt}} - \psi),
$$

where $u_n^{\mathrm{opt}}$ is the unique solution of (2.13). Finally, by proving $\lim_{n\to\infty}u_n^{\mathrm{opt}} = u^{\mathrm{opt}}$ and $\lim_{n\to\infty}\gamma_n^{\mathrm{opt}} = 1/\mu_\varepsilon$, one recovers (2.6). We skip the details.

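Figure 1: Illustration of the optimal final state. It equals the projection of the target state onto the sublevel set $W_{c^{\mathrm{opt}}}$. (Labels in the figure: $W_{c^{\mathrm{opt}}}$, $y^*$, $y^{\mathrm{opt}}$, $\varepsilon$, $y^{\min}$.)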

3. Sensitivity analysis. In this section we show that the solution of (2.3) is stable in the sense that if the parameters $\alpha$, $f$, $\beta$, $w$, $y^*$ and $A$ are perturbed by a small perturbation, then the solution of the perturbed problem is also a small perturbation of the solution of the unperturbed problem. Let $0 < \nu < 1$ and

(i) $\delta\alpha < \nu$ such that $\alpha + \delta\alpha > 0$,
(ii) $\delta f \in L^2((0,\infty);H)$ such that $\|\delta f\|_{L^2((0,\infty);H)} < \nu$,
(iii) $\delta\beta \in L^\infty((0,T);\mathbb{R})$ such that $\|\delta\beta\|_{L^\infty((0,T);\mathbb{R})} < \nu$ and $\beta + \delta\beta \in L^\infty((0,T);[0,\infty))$,
(iv) $\delta w \in L^2((0,T);H)$ such that $\|\delta w\|_{L^2((0,T);H)} < \nu$,
(v) $\delta y^* \in H$ such that $\|\delta y^*\| < \nu$.
(vi) For the perturbation $\delta A$ of the operator $A$ we assume that $\delta A$ is a symmetric linear operator in $H$, $A + \delta A$ is an upper-bounded self-adjoint operator in $H$, and there exist $\zeta > \max\{\max\sigma(A), \max\sigma(A + \delta A)\}$ and $R > 0$ such that for all $s \in \mathbb{R}$ we have

$$
\|(\zeta + is - A - \delta A)^{-1} - (\zeta + is - A)^{-1}\| < \nu
\begin{cases}
1, & |s| \le R,\\
s^{-2}, & |s| > R.
\end{cases}
\tag{3.1}
$$

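We denote by $(S_t^\delta)_{t\ge 0}$ the semigroup generated by $A + \delta A$, and introduce the shorthand notation $\alpha_\delta = \alpha + \delta\alpha$, $f_\delta = f + \delta f$, $\beta_\delta = \beta + \delta\beta$, $w_\delta = w + \delta w$ and $y^*_\delta = y^* + \delta y^*$. We will use the estimate (3.1) to obtain an upper bound for the perturbation of the semigroup, i.e. for $\|S_t^\delta - S_t\|$. Now we introduce the perturbed problem

$$
\min_{u\in H}\Bigl\{ J_\delta(u) : \Bigl\| S_T^\delta u + \int_0^T S_\tau^\delta f_\delta(T-\tau)\,d\tau - y^*_\delta \Bigr\| \le \varepsilon \Bigr\},
\tag{3.2}
$$

where

$$
J_\delta(u) = \frac{\alpha_\delta}{2}\|u\|^2 + \frac{1}{2}\int_0^T \beta_\delta(t)\Bigl\| S_t^\delta u + \int_0^t S_\tau^\delta f_\delta(t-\tau)\,d\tau - w_\delta(t) \Bigr\|^2 dt.
$$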

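Let $y^{*,\mathrm{hom}}_\delta = y^*_\delta - \int_0^T S_\tau^\delta f_\delta(T-\tau)\,d\tau$, $w^{\mathrm{hom}}_\delta = w_\delta - \int_0^{\cdot} S_\tau^\delta f_\delta(\cdot-\tau)\,d\tau$, $\delta y^{*,\mathrm{hom}} = y^{*,\mathrm{hom}}_\delta - y^{*,\mathrm{hom}}$ and $\delta w^{\mathrm{hom}} = w^{\mathrm{hom}}_\delta - w^{\mathrm{hom}}$. We denote the unique solution of the perturbed problem (3.2) by $u^{\mathrm{opt}}_\delta$, and recall that $u^{\mathrm{opt}}$ is the unique solution of the unperturbed problem (2.3).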

Theorem 3.1. Under the above assumptions we have

$$
\|u^{\mathrm{opt}}_\delta - u^{\mathrm{opt}}\| < C\nu
$$

for $\nu$ small enough, where $C$ is a constant that does not depend on $\nu$.

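Remark 3.2. Let us discuss certain situations where assumption (3.1) is satisfied. Here we write $A_\delta = A + \delta A$.

(i) Let $\|\delta A\| < \nu$, $\zeta = \max\{\max\sigma(A), \max\sigma(A + \delta A)\} + 1$ and $R = 1$. Then, by the second resolvent identity and $\|(z - T)^{-1}\| = 1/\operatorname{dist}(z, \sigma(T))$ for self-adjoint $T$ and $z \in \rho(T)$, we conclude for all $s \in \mathbb{R}$

$$
\begin{aligned}
\|(\zeta + is - A_\delta)^{-1} - (\zeta + is - A)^{-1}\| &= \|(\zeta + is - A_\delta)^{-1}\,\delta A\,(\zeta + is - A)^{-1}\|\\
&< \frac{\nu}{\operatorname{dist}(\zeta + is, \sigma(A_\delta))\cdot\operatorname{dist}(\zeta + is, \sigma(A))}
\le \frac{\nu}{1 + s^2}.
\end{aligned}
$$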

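Hence, (3.1) is satisfied if $\delta A$ is a bounded operator with norm smaller than $\nu$.

(ii) Let $\delta A$ be relatively bounded with respect to $A$ with a relative bound smaller than $\nu$, i.e. $\|\delta A x\| \le a\|x\| + b\|Ax\|$ for all $x \in \operatorname{Dom}(A)$ with $0 \le b < \nu$ and a nonnegative constant $a$. Let $R = 1$. Then $A + \delta A$ is an upper-bounded self-adjoint operator, see e.g. [15, Theorem V.4.3, Theorem V.4.11]. Let $\zeta > \max\sigma(A) =: \kappa$ be arbitrary. Then we have $\|A(\zeta - A)^{-1}\| = \int_{-\infty}^{\kappa}\frac{|\lambda|}{\zeta - \lambda}\,d\|E(\lambda)\|$, hence for all $\delta > 0$ and all $\zeta > \frac{2+\delta}{1+\delta}\kappa$ we have $\|A(\zeta - A)^{-1}\| < 1 + \delta$. Let $\delta = (\nu - b)/(a + b)$ and let $\zeta > \frac{2+\delta}{1+\delta}\kappa$ be such that $\|(\zeta - A)^{-1}\| < \delta$. Let $x \in \operatorname{Dom}(A)$ be arbitrary and let $y = (\zeta - A)x$. Then

$$
\|\delta A x\| \le a\|(\zeta - A)^{-1}y\| + b\|A(\zeta - A)^{-1}y\| < \nu\|(\zeta - A)x\|.
$$

Let $\chi \ge \zeta$ be arbitrary. Then from $\langle x + (\chi - \zeta)(\zeta - A)^{-1}x, x\rangle \ge 1$ for all $x \in H$ such that $\|x\| = 1$ it follows that $\|(I + (\chi - \zeta)(\zeta - A)^{-1})^{-1}\| \le 1$ and hence

$$
\|\delta A(\chi - A)^{-1}\| = \|\delta A(\zeta - A)^{-1}(I + (\chi - \zeta)(\zeta - A)^{-1})^{-1}\| < \nu.
$$

This implies $\chi \notin \sigma(A + \delta A)$, i.e. $\max\sigma(A + \delta A) < \zeta$. By the second resolvent identity we have for all $s \in \mathbb{R}$

$$
\begin{aligned}
\|(\zeta + is - A_\delta)^{-1} - (\zeta + is - A)^{-1}\| &= \|(\zeta + is - A_\delta)^{-1}\,\delta A\,(\zeta + is - A)^{-1}\|\\
&\le \|(\zeta + is - A_\delta)^{-1}\|\,\|\delta A(\zeta + is - A)^{-1}\|\\
&= \operatorname{dist}(\zeta + is, \sigma(A_\delta))^{-1}\|\delta A(\zeta + is - A)^{-1}\|.
\end{aligned}
$$

For all $s \ne 0$ we have

$$
\begin{aligned}
\|\delta A(\zeta + is - A)^{-1}\| &= \bigl\|\delta A\bigl((I + is(\zeta - A)^{-1})(\zeta - A)\bigr)^{-1}\bigr\|\\
&= \bigl\|\delta A(\zeta - A)^{-1}\bigl(I + is(\zeta - A)^{-1}\bigr)^{-1}\bigr\|\\
&< \frac{\nu}{s}\Bigl\|\Bigl(\frac{1}{is}I + (\zeta - A)^{-1}\Bigr)^{-1}\Bigr\|\\
&= \frac{\nu}{s}\operatorname{dist}\Bigl(-\frac{1}{is}, (\zeta - \sigma(A))^{-1}\Bigr)^{-1}
\le \frac{\nu}{\sqrt{s^2 + 1}},
\end{aligned}
$$

but note that the final estimate also holds for $s = 0$. Hence we finally obtain for all $s \in \mathbb{R}$ and $\xi = \zeta + 1$

$$
\|(\xi + is - A_\delta)^{-1} - (\xi + is - A)^{-1}\| < \frac{\nu}{1 + s^2},
$$

and hence the perturbation $\delta A$ satisfies the assumption (vi).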
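The bound of case (i) is easy to check numerically for matrices. The following sketch assumes a randomly generated symmetric negative semidefinite matrix `A` and a small symmetric perturbation `dA`; all names and sizes are hypothetical and chosen only for illustration. It compares the resolvent difference along the line $\zeta + is$ with the bound $\|\delta A\|/(1 + s^2)$.

```python
import numpy as np

def resolvent_gap(A, dA, zeta, s):
    """Spectral norm of (zeta + i s - A - dA)^{-1} - (zeta + i s - A)^{-1}."""
    n = A.shape[0]
    z = (zeta + 1j * s) * np.eye(n)
    R_pert = np.linalg.inv(z - A - dA)
    R = np.linalg.inv(z - A)
    return np.linalg.norm(R_pert - R, 2)

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = -(B @ B.T)                       # symmetric, spectrum in (-inf, 0]
E = rng.standard_normal((n, n))
dA = 1e-3 * (E + E.T)                # small bounded symmetric perturbation
dA_norm = np.linalg.norm(dA, 2)

# zeta as in Remark 3.2 (i): one unit to the right of both spectra.
zeta = max(np.linalg.eigvalsh(A).max(), np.linalg.eigvalsh(A + dA).max()) + 1.0

s_grid = np.linspace(-20.0, 20.0, 81)
gaps = np.array([resolvent_gap(A, dA, zeta, s) for s in s_grid])
bounds = dA_norm / (1.0 + s_grid ** 2)
```

Since both spectra lie at distance at least one to the left of $\zeta$, each resolvent norm is at most $(1 + s^2)^{-1/2}$, which explains the decay of `bounds` in $s$ that makes the contour integrals in the proof of Theorem 3.1 converge.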

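Proof of Theorem 3.1. We first estimate the perturbation bound for $\Psi$. We define $\Xi := \{\zeta + is : s \in \mathbb{R}\}$, where $\zeta$ is the value from the assumption (vi). Then $\Xi$ is in the resolvent sets of both $A$ and $A + \delta A$. Using the spectral calculus for generators of $C_0$-semigroups (see, for example, [9]), we obtain

$$
\begin{aligned}
\Psi &= \alpha I + \frac{1}{2\pi i}\int_0^T \beta(t)\int_\Xi e^{2t\lambda}(\lambda - A)^{-1}\,d\lambda\,dt,\\
\Psi + \delta\Psi &= (\alpha + \delta\alpha)I + \frac{1}{2\pi i}\int_0^T (\beta(t) + \delta\beta(t))\int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt.
\end{aligned}
$$

Hence, using the Fubini theorem and the resolvent formula, we obtain

$$
\begin{aligned}
\delta\Psi = \delta\alpha I &+ \frac{1}{2\pi i}\int_\Xi (\lambda - A - \delta A)^{-1}\,\delta A\,(\lambda - A)^{-1}\int_0^T \beta(t)e^{2t\lambda}\,dt\,d\lambda\\
&+ \frac{1}{2\pi i}\int_0^T \delta\beta(t)\int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt.
\end{aligned}
\tag{3.3}
$$

From $\bigl|\int_0^T \beta(t)e^{2t\lambda}\,dt\bigr| \le \|\beta\|\frac{1}{2\zeta}(e^{2T\zeta} - 1)$ for $\lambda \in \Xi$, the norm of the second term of $\delta\Psi$ can be estimated from above by

$$
\begin{aligned}
&\frac{e^{2T\zeta} - 1}{4\zeta\pi}\|\beta\|\int_\Xi \|(\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\|\,d\lambda\\
&\quad= \frac{e^{2T\zeta} - 1}{4\zeta\pi}\|\beta\|\Bigl(\int_{\zeta - iR}^{\zeta + iR} + \int_{\Xi\setminus[\zeta - iR,\,\zeta + iR]}\Bigr)\|(\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\|\,d\lambda\\
&\quad< \frac{\nu(e^{2T\zeta} - 1)}{2\zeta\pi}\|\beta\|(R + R^{-1}),
\end{aligned}
$$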

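where in the last inequality we have used (3.1). From

$$
\begin{aligned}
\Bigl\|\int_0^T \delta\beta(t)\int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt\Bigr\|
&\le \int_0^T |\delta\beta(t)|\,\Bigl\|\int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\Bigr\|\,dt\\
&< \nu\int_0^T \|S_{2t}^\delta\|\,dt \le \nu\int_0^T\int_{-\infty}^{\zeta}|e^{2t\lambda}|\,d\|E_{A+\delta A}(\lambda)\|\,dt \le \nu\,\frac{e^{2T\zeta} - 1}{2\zeta},
\end{aligned}
$$

we obtain that the third term in (3.3) has an upper bound $\nu(e^{2T\zeta} - 1)/(4\zeta\pi)$. Hence we obtain

$$
\|\delta\Psi\| < \nu\Bigl(1 + \frac{e^{2T\zeta} - 1}{4\zeta\pi}\bigl(2\|\beta\|(R + R^{-1}) + 1\bigr)\Bigr).
$$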

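To obtain an upper bound for $\|\delta\psi\|$, we use the same steps as for $\delta\Psi$, but pulling out the $L^2$-functions using Hölder's inequality. For $t \ge 0$ we define the operator function

$$
D(t) = \frac{1}{2\pi i}\int_\Xi e^{t\lambda}\bigl((\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\bigr)\,d\lambda.
$$

Note that $D(t) = S_t^\delta - S_t$, hence $D(t)$ is actually the perturbation of $S_t$. We calculate

$$
\|D(t)\| < \nu\pi^{-1}e^{\zeta t}(R + R^{-1}) \quad\text{for all } t \ge 0.
$$

We first estimate

$$
\delta w^{\mathrm{hom}} = \delta w - \int_0^{\cdot} D(\tau)f(\tau)\,d\tau - \frac{1}{2\pi i}\int_0^{\cdot}\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\delta f(\tau)\,d\lambda\,d\tau.
$$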

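By using Hölder's inequality we estimate

$$
\begin{aligned}
\|\delta w^{\mathrm{hom}}(t)\| &\le \|\delta w(t)\| + \Bigl\|\int_0^t D(\tau)f(\tau)\,d\tau\Bigr\| + \frac{1}{2\pi}\Bigl\|\int_0^t\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\delta f(\tau)\,d\lambda\,d\tau\Bigr\|\\
&< \|\delta w(t)\| + \frac{\nu}{\pi}(R + R^{-1})\|f\|\sqrt{\frac{e^{2t\zeta} - 1}{2\zeta}} + \frac{\nu}{2\pi}\sqrt{\frac{e^{2t\zeta} - 1}{2\zeta}}.
\end{aligned}
$$

This implies

$$
\|\delta w^{\mathrm{hom}}\| < \sqrt{2}\,\nu\Bigl(1 + \frac{1}{8\zeta\pi^2}\bigl(2(R + R^{-1} + 1)\bigr)^2\|f\|\Bigl(\frac{e^{2T\zeta} - 1}{2\zeta} - T\Bigr)\Bigr)^{1/2}.
$$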

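Now we are in a position to estimate $\delta\psi$. Since

$$
\begin{aligned}
\delta\psi ={}& \int_0^T \beta(t)D(T+t)w^{\mathrm{hom}}(t)\,dt + \int_0^T \beta(t)D(T+t)\,\delta w^{\mathrm{hom}}(t)\,dt\\
&+ \int_0^T \delta\beta(t)D(T+t)w^{\mathrm{hom}}(t)\,dt + \int_0^T \delta\beta(t)D(T+t)\,\delta w^{\mathrm{hom}}(t)\,dt\\
&+ \frac{1}{2\pi i}\int_0^T \beta(t)\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\delta w^{\mathrm{hom}}(t)\,d\lambda\,dt\\
&+ \frac{1}{2\pi i}\int_0^T \delta\beta(t)\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}w^{\mathrm{hom}}(t)\,d\lambda\,dt\\
&+ \frac{1}{2\pi i}\int_0^T \delta\beta(t)\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\delta w^{\mathrm{hom}}(t)\,d\lambda\,dt,
\end{aligned}
$$

we can again estimate $\|\delta\psi\|$ using the techniques from above and obtain

$$
\|\delta\psi\| \le C\nu,
$$

where $C$ is a constant which does not depend on $\nu$ and which may change from line to line. Similarly we obtain

$$
\|\delta y^{*,\mathrm{hom}}\| \le C\nu.
$$

Hence we have obtained that for $\nu < 1$, each of $\|\delta\Psi\|$, $\|\delta\psi\|$ and $\|\delta y^{*,\mathrm{hom}}\|$ has an upper bound of the form $C\nu$. We have also proved $\|D(t)\| < C(t)\nu$.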

As the solution is given in terms of linear systems (2.8) and (2.9), to prove the claimof the theorem it is sufficient to show that the solutions of these systems are stable underperturbations. First note that for a chosen µ the operator on the left hand side of (2.8) and(2.9) is bounded and strictly positive and that the same holds for the perturbed right handside. Moreover, from the estimates obtained above, we see that the perturbation of the lefthand side of (2.8) is given by

µD(2T ) + δΨ

and the perturbation of the right hand side of (2.8) is given by

\[
\mu S_{2T}\,\delta y^{*,\mathrm{hom}} + \mu D(2T)\bigl(y^{*,\mathrm{hom}} + \delta y^{*,\mathrm{hom}}\bigr) + S_T\,\delta\psi + D(T)(\psi + \delta\psi).
\]

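Hence the norms of the perturbations of both the left and the right hand side of (2.8) are smaller than Cν if ν is small enough. This allows us to apply the standard perturbation theoretic results for the solutions of linear systems ([20], see also [7, Proposition 4.2]) and conclude that the perturbed system (2.8) has the solution x + δx with δx satisfying ‖δx‖ < Cν. Let $\Phi^{\delta}$ be the perturbed function Φ, $\delta\Phi^{\delta} = \Phi^{\delta} - \Phi$, and let $\mu^{\delta}_{\varepsilon}$ be the solution of the equation $\Phi^{\delta}(\mu) = \varepsilon$. Then $\Phi^{\delta}(\mu^{\delta}_{\varepsilon}) = \|y^{*,\mathrm{hom}} + \delta y^{*,\mathrm{hom}} - x_{\varepsilon} - \delta x_{\varepsilon}\|$, where $x_{\varepsilon} + \delta x_{\varepsilon}$ is the solution of the perturbed system (2.8) with $\mu = \mu^{\delta}_{\varepsilon}$. Using the obtained bounds on the perturbations, it follows that $|\delta\Phi^{\delta}(\mu^{\delta}_{\varepsilon})| < C\nu$. Hence

\[
|\Phi(\mu^{\delta}_{\varepsilon}) - \Phi(\mu_{\varepsilon})| = |\Phi^{\delta}(\mu^{\delta}_{\varepsilon}) - \delta\Phi^{\delta}(\mu^{\delta}_{\varepsilon}) - \varepsilon| = |\delta\Phi^{\delta}(\mu^{\delta}_{\varepsilon})| < C\nu.
\]

Since Φ is a continuous and monotone function, it follows that $\Phi^{-1}$ is continuous, hence we obtain $|\delta\mu_{\varepsilon}| < C\nu$.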


4. Approximations of semigroups and related operator functions. In this section we review rational approximation methods for a semigroup $S_t$ whose generator A is a self-adjoint operator with upper bound κ ≤ 0. Our approach to constructing numerical approximations can be applied to non-stable systems (those for which κ > 0) as well, but such systems are not included among our examples. We just note that in the case of a non-stable system the estimates include a multiplicative constant which grows exponentially with κ.

We say that the function r is a type (n, m) rational function, where n and m are nonnegative integers, if there are polynomials p and q of degrees at most n and m, respectively, such that r = p/q. Here the degrees n and m need not be optimal. Given a rational function r we have

\[
(v, S_t v) - (v, r(A)v) = \int_{-\infty}^{\kappa}\bigl(e^{t\lambda} - r(\lambda)\bigr)\,d(E(\lambda)v, v), \tag{4.1}
\]

where E(·) denotes the spectral measure of the self-adjoint operator A [22, 15]. The support of the spectral measure of any of the operators tA, for t > 0, is contained in (−∞, 0], and computing similarly as in (4.1) we obtain the estimate

\[
\|g(A)v - r(A)v\| \le \|g - r\|_{L^{\infty}(-\infty,0]}\,\|v\|, \tag{4.2}
\]

where the function g is measurable with respect to the spectral measure of A. When considering numerical efficiency, the key question is whether there exists a rational approximation of g with n small.
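Estimate (4.2) can be observed directly on a finite-dimensional model. The following sketch is an illustration under our own assumptions: A is replaced by a scaled second-difference matrix, g(λ) = e^{tλ}, and r is the type (0, 1) rational function of one implicit Euler step, chosen only because it is simple. The sup-norm of g − r is estimated on a sample grid, not computed exactly.

```python
import numpy as np

n = 30
t = 0.01
# Stand-in self-adjoint operator with spectrum in (-inf, 0].
A = -(n + 1) ** 2 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

def g(lam):
    """Symbol of the semigroup, e^{t*lambda}."""
    return np.exp(t * lam)

def r(lam):
    """Type (0,1) rational function of one implicit Euler step."""
    return 1.0 / (1.0 - t * lam)

v = np.ones(n)
w, V = np.linalg.eigh(A)
gA_v = V @ (g(w) * (V.T @ v))                   # g(A)v through the spectral decomposition
rA_v = np.linalg.solve(np.eye(n) - t * A, v)    # r(A)v costs one linear solve

lhs = np.linalg.norm(gA_v - rA_v)

# Sup-norm of g - r on (-inf, 0], estimated on samples s = t*lambda.
s = -np.logspace(-4, 4, 4001)
sup_err = np.max(np.abs(np.exp(s) - 1.0 / (1.0 - s)))
rhs = sup_err * np.linalg.norm(v)
print(lhs, rhs)
```

The left hand side stays below the right hand side regardless of the matrix used, since the bound depends only on the scalar functions g and r.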

For the case in which $g(z) = e^{-z}$, it is known, see [8, 24], that for each n there exists a unique type (n, n) rational function $r^*_n$ which minimizes $\|g - \cdot\|_{L^{\infty}(-\infty,0]}$. Furthermore, there exists a constant C > 0, independent of n, such that $r^*_n$ verifies

\[
\|g - r^*_n\|_{L^{\infty}(-\infty,0]} = \min\bigl\{\|g - r\|_{L^{\infty}(-\infty,0]} : r \text{ is a type } (n,n) \text{ rational function}\bigr\} \le \frac{C}{H^n} \le \frac{C}{9.28903^n}. \tag{4.3}
\]

The number H is known under the name of Halphen constant, see [23]. The rational function $r^*_n$ is the unique minimizer of $\|g - r\|_{L^{\infty}(-\infty,0]}$ among (n, n) rational functions. Further, $r^*_n$ does not have zero-pole pairs appearing on the negative real axis.

For the error analysis of approximations of semigroups it is particularly convenient if the rational function is representable in the partial fractions form. For constants $r_0$ and $r_i$, $\zeta_i$, i = 1, ..., d, the expression

\[
r(z) = r_0 + \frac{r_1}{z - \zeta_1} + \cdots + \frac{r_d}{z - \zeta_d}
\]

is a partial fractions expansion of the rational function r. It has been shown that for $g(z) = e^{-z}$ one can construct, see [24], a partial fractions expansion of the type (n, n) rational function $r_n$ such that $\|g - r_n\|_{L^{\infty}(-\infty,0]} \le C\,3.2^{-n}$. The constant C > 0 is independent of n. The poles $\zeta_i$ lie on a hyperbola in the complex plane, and the weights are defined by applying the n-point quadrature rule to the Cauchy integral representation of the exponential function with this hyperbola as a contour.
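The quadrature construction of poles and weights can be sketched in a few lines. The code below applies the midpoint rule to the Cauchy integral of the exponential along a hyperbola, written here for $e^{x}$ with x ≤ 0. The numerical contour parameters (2.246, 1.1721, 0.3443) are assumed values of the kind reported for hyperbolic contours in the literature on Talbot quadratures and should be checked against [24]; the function names are ours.

```python
import numpy as np

def hyperbola_poles_weights(N):
    """Poles z_k and weights c_k with e^x ~ sum_k c_k / (z_k - x) for x <= 0."""
    # Midpoints of N equal subintervals of (-pi, pi).
    theta = -np.pi + (2 * np.arange(1, N + 1) - 1) * np.pi / N
    u = 1.1721 - 0.3443j * theta
    z = 2.246 * N * (1.0 - np.sin(u))          # points on the hyperbola (assumed parameters)
    dz = 2.246 * N * 0.3443j * np.cos(u)       # derivative z'(theta)
    c = -1j / N * np.exp(z) * dz               # midpoint rule applied to the Cauchy integral
    return z, c

def r_exp(x, z, c):
    """Evaluate the pole-residue approximation of e^x at real points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Poles come in conjugate pairs, so the sum is real up to rounding.
    return np.real(np.sum(c / (z - x[:, None]), axis=1))

z, c = hyperbola_poles_weights(32)
x = np.linspace(-50.0, 0.0, 201)
max_err = np.max(np.abs(r_exp(x, z, c) - np.exp(x)))
print(max_err)
```

Applying the same poles and weights to an operator replaces each term $c_k/(z_k - x)$ by one resolvent solve, which is the property exploited in the sequel.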


4.1. Rational function fitting. To approximate solutions of the constrained parabolic control problem, we will need rational approximations of slightly more general functions. Let us first note that the optimal approximation result (4.3) can be extended, see [23], in a slightly modified form to the class of perturbed exponential functions g which can be represented as

\[
g(x) = u_0(x) + u_1(x)e^{ax},
\]

where $u_0$ and $u_1 \neq 0$ are arbitrary rational functions and a < 0. According to [23, Theorem 1], for any n ∈ N and a chosen but fixed integer k such that n − k ≥ 0 there exists a unique rational function $r^*_{n,n+k}$ such that

\[
r^*_{n,n+k} = \arg\min\bigl\{\|g - r_{n,n+k}\|_{L^{\infty}(-\infty,0]} : r_{n,n+k} \text{ is a rational function of type } (n, n+k)\bigr\}
\]

and $\|g - r^*_{n,n+k}\|_{L^{\infty}(-\infty,0]} \le C\,9.28903^{-n}$. The results of [8, 23] are, however, existential. A way to construct a rational approximation satisfying (4.3) is to transform the interval (−1, 1] to (−∞, 0] and then apply the contour integration technique to the transformed problem, see [24]. This can be achieved by the Möbius transformation m(z) = 9(z − 1)/(z + 1), see [24]. The inverse transformation to m is given by the formula $m^{-1}(z) = -(z + 9)/(z - 9)$ and it maps (−∞, 0] to (−1, 1]. Then the function to approximate is $g(z) = e^{m(z)} : (-1, 1] \to \mathbb{R}$, and the rational function which approximates $e^{z}$ is obtained by composing the rational approximant of g with the inverse Möbius transformation. We first loop a finite contour around the interval (−1, 1], and the points of this contour are then mapped by the Möbius transformation onto a curve looping around the infinite interval (−∞, 0].
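The change of variables can be verified with a few lines of code. The sketch below only checks the algebra of the two maps stated above (the function names `mobius` and `mobius_inv` are ours): m sends (−1, 1] to (−∞, 0], the second map inverts it, and composing $e^{m(\cdot)}$ with the inverse recovers the exponential on the negative half-line.

```python
import math

def mobius(z):
    """m(z) = 9(z - 1)/(z + 1), mapping (-1, 1] onto (-inf, 0]."""
    return 9.0 * (z - 1.0) / (z + 1.0)

def mobius_inv(w):
    """Inverse map -(w + 9)/(w - 9), mapping (-inf, 0] onto (-1, 1]."""
    return -(w + 9.0) / (w - 9.0)

samples = [-0.999, -0.5, 0.0, 0.5, 1.0]
images = [mobius(z) for z in samples]
round_trip = [mobius_inv(w) for w in images]

# g(z) = exp(m(z)) on (-1, 1], composed with the inverse map, gives exp on (-inf, 0].
x = -3.7
composed = math.exp(mobius(mobius_inv(x)))
print(images, round_trip, composed)
```

A rational approximant of g on the bounded interval therefore yields, after composition with the inverse map, a rational approximant of the exponential of the same type, because a Möbius transformation preserves the degrees of numerator and denominator.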

Let now $g_1$ and $g_2$ be perturbed exponential functions. We are interested in finding type (n, n) rational approximations of functions of the form $g_1 + g_2$, $g_1 g_2$, $g_1/g_2$ and $g_i \circ m$. Combining rational approximations $r_i$ of $g_i$ is a natural first idea. However, the rational functions $r = r_1 + r_2$, $r = r_1 r_2$ or $r = r_1/r_2$ will in general be of a different (component-wise larger) type.

We can, however, use an approximation approach to truncate the type of the product, sum or quotient r of two rational functions of the type (n, n) to a rational function $\tilde r$ of the type (n, n) which, for a given tol > 0 and an interval [a, b], satisfies the estimate $\|r - \tilde r\|_{L^2[a,b]} \le \mathrm{tol}\,\|r\|_{L^2[a,b]}$.

To this end we use the award-winning rkfit algorithm from [3]. This is the rational Krylov function fitting algorithm, which implements the calculus of rational functions by working with a representation of a rational function as a transfer function of a pencil of Hessenberg matrices. It performs all of the aforementioned operations (addition, division, multiplication and composition with a Möbius transformation) stably using only floating point arithmetic. According to [3], given a tolerance tol and the perturbed exponential function $g(x) = u_0(x) + u_1(x)e^{ax}$, a < 0, such that $g \in L^2(-\infty, 0]$, the rkfit algorithm produces a rational function

\[
r_{RK}(x) = r_0 + \frac{r_1}{x - \zeta_1} + \cdots + \frac{r_d}{x - \zeta_d}, \tag{4.4}
\]

in the pole-residue form, such that

\[
\|r_{RK} - g\|_{L^{\infty}(-\infty,0]} \le \mathrm{tol}\,\|g\|_{L^2(-\infty,0]}. \tag{4.5}
\]

We can now construct the operator $r_{RK}(A) := r_0 I + \sum_{i=1}^{d} r_i (A - \zeta_i)^{-1}$ such that

\[
\|g(A) - r_{RK}(A)\|_{L(H)} \le \mathrm{tol}\,\|g\|_{L^2(-\infty,0]}.
\]
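Once a rational function is available in pole-residue form, its action on a vector reduces to one shifted linear solve per pole. The following sketch shows this for a small symmetric matrix. The poles and residues are hypothetical placeholders chosen for illustration, not output of rkfit, and the helper name `apply_pole_residue` is ours. The result is compared against the evaluation of the same scalar function on the spectrum.

```python
import numpy as np

def apply_pole_residue(A, v, r0, residues, poles):
    """Return r(A)v for r(x) = r0 + sum_i residues[i] / (x - poles[i])."""
    n = A.shape[0]
    result = r0 * v.astype(complex)
    for r_i, zeta_i in zip(residues, poles):
        # One shifted solve (A - zeta_i I) y = v per pole.
        result = result + r_i * np.linalg.solve(A - zeta_i * np.eye(n), v)
    return result

n = 15
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # stand-in operator
v = np.linspace(1.0, 2.0, n)

# Hypothetical pole-residue data: a conjugate pair and one real pole, all off the spectrum.
r0 = 0.1
poles = np.array([1.0 + 2.0j, 1.0 - 2.0j, 3.0])
residues = np.array([0.5 - 0.1j, 0.5 + 0.1j, -0.2])

rA_v = apply_pole_residue(A, v, r0, residues, poles)

# Reference: evaluate the scalar function on the eigenvalues of A.
w, V = np.linalg.eigh(A)
r_on_spectrum = r0 + sum(r_i / (w - z_i) for r_i, z_i in zip(residues, poles))
reference = V @ (r_on_spectrum * (V.T @ v))
print(np.max(np.abs(rA_v - reference)))
```

The shifted solves are mutually independent, so in a large-scale setting they can be carried out in parallel, each with whatever linear solver suits the discretization.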


4.2. Galerkin resolvent estimates. Computing the action of a function of an operator on a vector, for example $g(A)v = S_t v$, involves two steps. First, we approximate the function g by a rational function r on an interval containing the spectrum of the self-adjoint operator A. Second, we sample the resolvent $(z - A)^{-1}v$ at the poles of the rational function r.

In what follows we will restrict our considerations to the operator of the divergence type posed in a compact polygonal domain Ω ⊂ R². Many statements are algebraic in nature and hold in a more general setting. However, the interpolation results for piecewise polynomial functions and the regularity results for the domain of the operator are specific to the aforementioned class of operators.

We approximate the action of the resolvent by selecting a finite dimensional subspace $V_h \subset \mathrm{Dom}(A^{1/2})$ and then forming the Galerkin projection of A onto $V_h$. According to [16, Section 5], the Galerkin projection $A_h : V_h \to V_h$ is given by the formula

\[
A_h = (A^{1/2}P_h)^{*}(A^{1/2}P_h),
\]

where $P_h$ is the orthogonal projection onto $V_h$. Let $V_h$ be the space of continuous functions on Ω that are piecewise linear with respect to a given triangular tessellation of Ω. The resolvent estimate for A using the Galerkin projection $A_h$ reads (see e.g. [14] for technical details)

\[
\|(z - A)^{-1}v - (z - A_h)^{-1}v\|_{L^2(\Omega)} \le C h^{2\nu}\|v\|_{L^2(\Omega)}, \tag{4.6}
\]

for $h < h_0$ and $v \in V_h$. Here ν > 0 is a parameter depending on the regularity of the functions in Dom(A), h is the maximal diameter of a triangle in the chosen tessellation of Ω, and $h_0$ denotes the minimal level of refinement from which the estimate holds. Note that the constants C and $h_0$ depend on z in an explicit way but do not depend on v, see [14]. We will, however, need this estimate solely for the at most d poles $\zeta_i$, i = 1, ..., d, of the rational function $r_{RK}$ from (4.4), and so

\[
\|r_{RK}(A)v - r_{RK}(A_h)v\|_{L^2(\Omega)} \le d\,C\,h^{2\nu}\|v\|_{L^2(\Omega)}.
\]

Finally, let $g(x) = u_0(x) + u_1(x)e^{ax}$ be the perturbed exponential function. Based on (4.2), for a given rational function $r_{RK}$ and $v \in V_h$ we have the estimate

\[
\begin{aligned}
\|g(A)v - r_{RK}(A_h)v\|_{L^2(\Omega)} &\le \|g(A)v - r_{RK}(A)v\|_{L^2(\Omega)} + \|r_{RK}(A)v - r_{RK}(A_h)v\|_{L^2(\Omega)} \\
&\le \|g - r_{RK}\|_{L^{\infty}(-\infty,0]}\|v\| + d\,C\,h^{2\nu}\|v\|_{L^2(\Omega)}.
\end{aligned}
\]

By choosing suitable $r_{RK}$ and h, the last estimate ensures a good approximation of g(A)v based on a finite dimensional approximation of the operator A.
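The mesh dependence in (4.6) can be observed on a one-dimensional model problem. The sketch below is a simplified stand-in for the setting above, under our own assumptions: A is the second derivative on (0, π) with Dirichlet conditions, the discretization is the lumped-mass P1 scheme on a uniform mesh (which coincides with the standard three-point stencil), the shift is z = 1, and v = sin x, for which the exact resolvent is known in closed form. The error is measured in a discrete $L^2$ norm at the nodes.

```python
import numpy as np

def resolvent_error(n, z=1.0):
    """Discrete L2 error of (z - A_h)^{-1} v versus (z - A)^{-1} v, A = d^2/dx^2 on (0, pi)."""
    h = np.pi / n
    x = h * np.arange(1, n)                      # interior nodes
    # Lumped-mass P1 discretization of d^2/dx^2 with Dirichlet boundary conditions.
    A_h = -(2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h**2
    v = np.sin(x)
    u_h = np.linalg.solve(z * np.eye(n - 1) - A_h, v)
    u_exact = np.sin(x) / (z + 1.0)              # since A sin = -sin
    return np.sqrt(h * np.sum((u_h - u_exact) ** 2))

e_coarse = resolvent_error(20)
e_fine = resolvent_error(40)
order = np.log2(e_coarse / e_fine)
print(e_coarse, e_fine, order)
```

For this smooth model problem the observed order is close to 2, which corresponds to ν = 1 in (4.6); reduced regularity of Dom(A) would show up as a smaller observed order.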

5. Numerical examples. In this section we consider several constrained optimization problems in 1D and 2D. The problems are academic and are primarily chosen to test the efficiency of the developed approach. We compare our results with those obtained by other, already existing methods where such a comparison is possible. We also report the timings as a means of getting an intuition of the efficiency of the implementation. The timings are reported for a workstation running an Intel Core i5 8600K at 3.60 GHz with 24 GB of DDR4 RAM.

In all examples we take the weight function of the form $\beta = \chi_{[T/3,2T/3]}$, while the desired trajectory w is assumed to be time independent. This implies that we want the optimal state to be close to w for times t between T/3 and 2T/3, while no desired trajectory is prescribed outside this interval. With this setting and under the additional assumption that the operator A is strictly negative, the operator Ψ and the vector ψ from the main theorem can be computed explicitly as

\[
\Psi = \alpha I + \frac{1}{2}A^{-1}S_{2T/3}(I - S_{2T/3}), \qquad \psi = A^{-1}S_{T/3}(I - S_{T/3})w.
\]

We can now use spectral calculus to represent, for example, the operator Ψ as

\[
\Psi = \alpha I + \int_{\mathbb{R}} e^{\lambda 2T/3} g(\lambda)\,dE(\lambda) = \alpha I + \int_{\mathbb{R}} e^{\lambda 2T/3}\bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)\,dE(\lambda).
\]

The function $\lambda \mapsto 1/\lambda - e^{\lambda 2T/3}/\lambda$ is a perturbed exponential function, for which the rational approximation theory holds (there exists a rational approximation of small degree). We can equally use rational approximation theory to compute the vector ψ. In the numerical procedure the first step is to determine $\mu_{\varepsilon}$, the solution to the equation Φ(µ) = ε (cf. (2.7)). Taking into account the properties of Φ (given in Lemma 2.10), the equation has a unique solution for every ε ∈ (0, Φ(0)). Any root finding algorithm based only on function evaluation can be used to robustly approximate the root $\mu_{\varepsilon}$. We use the Brent method as implemented in Matlab's procedure fzero. We keep the convergence criterion for the root finding procedure below the discretization error of the finite element approximation. According to the resolvent analysis, the error in the approximation by a rational function is a lower order perturbation of the system, as compared to the discretization error.
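The root-finding step can be sketched on a diagonal toy model in which Φ is available in closed form. Everything in the block below is an assumption made for illustration: the eigenvalues `lam`, the positive symbol `psi` standing in for Ψ, the data `y`, and the formula for Φ built from $(\mu S_{2T} + \Psi)^{-1}\mu S_{2T}$ acting on y. Plain bisection stands in for Brent's method, since both use function values only; a library implementation of Brent's method (for example `scipy.optimize.brentq`) could be substituted.

```python
import math

T, alpha = 0.01, 1e-4
lam = [-(k ** 2) for k in range(1, 21)]                       # toy spectrum of A
s2T = [math.exp(2.0 * T * l) for l in lam]                    # symbol of S_{2T}
psi = [alpha + 1.0 / (1.0 + k ** 2) for k in range(1, 21)]    # hypothetical symbol of Psi
y = [1.0 / k for k in range(1, 21)]                           # toy target coefficients

def Phi(mu):
    """Norm of y - (mu*S_2T + Psi)^{-1} mu*S_2T y in the diagonal toy model."""
    return math.sqrt(sum((p * yk / (mu * s + p)) ** 2
                         for p, s, yk in zip(psi, s2T, y)))

def solve_mu(eps, tol=1e-12):
    """Bisection for Phi(mu) = eps; Phi is continuous and decreasing on [0, inf)."""
    lo, hi = 0.0, 1.0
    while Phi(hi) > eps:          # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if Phi(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.5 * Phi(0.0)
mu_eps = solve_mu(eps)
print(mu_eps, Phi(mu_eps), eps)
```

Each evaluation of Φ in the actual method costs a fixed number of resolvent solves, so the total work is proportional to the number of iterations of the root finder.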

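The value of the function Φ(µ) is computed by using the rational approximation and the spectral calculus

\[
\begin{aligned}
(\mu S_{2T} + \Psi)^{-1}\mu S_{2T}\,y^{*,\mathrm{hom}} &= \int_{-\infty}^{0}\frac{\mu e^{2T\lambda}}{\mu e^{2T\lambda} + \alpha + \bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)}\,dE(\lambda)\,y^{*,\mathrm{hom}} \\
&\approx r_0\,y^{*,\mathrm{hom}} + \sum_{i=1}^{d} r_i(\zeta_i - A)^{-1}y^{*,\mathrm{hom}}.
\end{aligned}
\]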

The function

\[
g(\lambda) = \frac{\mu e^{2T\lambda}}{\mu e^{2T\lambda} + \alpha + \bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)}
\]

is approximated on (−∞, 0] using the rational function r with 18 pole-residue pairs. The approximation r satisfies (4.5) with tol = 10⁻¹⁵, which is the default for rkfit. The function is a quotient of perturbed exponential functions, for which we know that there is a high quality, low degree rational approximation. We could compute a rational approximation of g as a quotient of rational approximations; however, this rational function could have, in the worst case, double the degree of the best rational approximation of the numerator and denominator. Instead, as discussed in Section 4.1, we choose to approximate the function g directly, as a means of keeping the degree of the approximating rational function lower. Note that these approximations are independent of A and hold on (−∞, 0].

Once the equation for µε is solved, the optimal initial control uopt follows by (2.6). For its computation we use the same procedure as for the calculation of Φ(µ).


5.1. 1D heat equation. As the first test of the proposed method we consider the heat equation (with variable coefficient) on Ω = [0, π] accompanied by homogeneous Dirichlet boundary conditions. The operator A is taken of the form

\[
A = -\partial_x\bigl((1 + a\chi_{[\gamma,\pi]})\partial_x\bigr)
\]

with γ = 2.2. The parameter γ determines the contact point of two materials with different diffusivity coefficients. We consider two cases:

1. a = 0, with A being the isotropic Laplace operator;
2. a = −0.8, resulting in a discontinuity of the diffusion coefficient at the point γ.

The operator A is discretized by conforming linear finite elements with h = 1/20, and we use the lumped mass discretization in order to be able to utilize the optimized rkfit library.

Besides the function β determined in the beginning of this section, for this example we propose

• α = 10⁻⁴,
• final time T = 0.01,
• desired trajectory ω = χ[π/5,2π/5],
• final target y∗ = χ[3π/5,4π/5],
• f = 0 (homogeneous equation).

The choice of w stimulates the state trajectory to be concentrated on the left part of the domain during the central time period, while at the final time the target y∗ requires it to be supported at the right hand side, at least for small values of the tolerance ε.

For the isotropic case a = 0, the above setting coincides with Example 4.1 from [17]. In this way we are able to compare our results with those obtained by a different method based on spectral decomposition of the Laplace operator.

The example is performed for three values of the final tolerance

ε = [0.2, 0.5, 0.9]Φ(0),

depicted in Figure 2, together with the corresponding values µε (solutions to equation (2.7)) and the graph of the function Φ (in log-log scale). The figure confirms the properties of the function Φ provided by Lemma 2.10. In particular, its initial value Φ(0) coincides with

‖ymin − y∗‖ = 1.0374,

where ymin is the optimal final state of the unconstrained problem. The corresponding initial value umin, which is just the minimizer of the functional J, can also be obtained by standard methods of convex analysis.

Remark 5.1. Note that a gradient method for computing the minimizer of J requires a solution of the forward problem for the parabolic equation. A basic step of any implicit method for the solution of a parabolic equation is the evaluation of a resolvent-like function. The convergence of a gradient method with Nesterov's acceleration is at best 1/n², where n is the number of forward problem solves. On the other hand, our method is based on utilizing functional calculus and the best rational function approximations of operator functions. In consequence, these operator functions can be approximated with a method that converges at the rate of at least 9⁻ⁿ, where n is the number of resolvent evaluations. This is a very crude comparison. However, under the very modest assumption that we need at least one evaluation of the resolvent function per forward problem solve, we clearly see a potential advantage of the rational function approach. This is the reason why such methods are becoming methods of choice for the solution of parabolic problems and also for numerically inverting the Laplace transform in the case of the solution of inverse problems, see [23].
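The crude comparison of the two rates can be made concrete by counting how many evaluations each rate needs to reach a given accuracy. The sketch below assumes that both error bounds hold with constant equal to one, which is a simplification made only for illustration; the function name is ours.

```python
def evaluations_needed(tol):
    """Smallest n with 1/n^2 <= tol (gradient rate) and with 9^(-n) <= tol (rational rate)."""
    n_gradient = 1
    while 1.0 / (n_gradient * n_gradient) > tol:
        n_gradient += 1
    n_rational = 1
    while 9.0 ** (-n_rational) > tol:
        n_rational += 1
    return n_gradient, n_rational

n_gradient, n_rational = evaluations_needed(1e-8)
print(n_gradient, n_rational)
```

Under these assumptions an accuracy of 10⁻⁸ requires 10000 forward solves at the algebraic rate but only 9 resolvent evaluations at the geometric rate.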

The elapsed time to produce the plot, which included sampling Φ in 350 points, was 12.97 seconds, and it took 0.36 seconds to compute Ψ(0) alone.

(a) Unisotropic diffusion (b) Isotropic diffusion

Figure 2: Function Φ, the chosen values of ε and corresponding µε.

The results in the isotropic case for the prescribed values of ε are presented in Figure 3. When ε is small the initial mass is concentrated on the support of the target y∗, in order to steer the system close to it at the final time. Conversely, for large values of ε, the initial control is concentrated on the left. In this way the solution stays close to the desired trajectory ω in the middle part of the time interval during which the distributed cost β is active. Finally, the intermediate value of ε is a trade-off between the optimisation of the cost functional J and the requirement to hit the final target with the given tolerance.

Besides agreeing with the intuition, the results completely coincide with those obtained in [17, Example 4.1]. This provides the first confirmation of the method proposed in this article. Furthermore, we note that for increased tolerance ε the optimal control resembles the solution of the unconstrained problem umin (Figure 4), where the latter is just a minimizer of the functional J. This is expected, as for large values of ε the solution is less affected by the prescribed target y∗, while the unconstrained problem is completely independent of it.

The results for the discontinuous diffusion and for the same range of the final tolerance are presented in Figure 5. In their main features, the results coincide with those obtained in the case of the constant diffusion coefficient. The novelty is the broken symmetry of the solution in the right part of the domain, where the discontinuity occurs. As a consequence, the center of the initial mass is slightly shifted rightward, where diffusion processes are slower. This is logical, having in mind that in this region the initial mass can better approximate the characteristic function (of the support of y∗) during a longer period of time, as a small diffusion rate will not modify its form significantly.

Figure 3: Example 5.1, isotropic case. The initial control u = y(0) (left), the computed solution at time t = T/2 compared with the desired trajectory ω (middle), and the optimal final state at t = T compared with the target y∗ (right) for three different values of the tolerance ε.

5.2. 2D heat equation on irregular domain. In the next example we repeat the same calculation in the 2D setting. To this end we use the Dirichlet Laplace operator defined on the L-shaped domain $\Omega = [-1,1]^2 \setminus ([-1,0]\times[0,1])$. We use Lagrange P1 elements (i.e. we approximate using piecewise linear and continuous functions). We have used a shape regular mesh with h = 1/30, also with the lumped mass discretization of the semigroup, and we choose T = 1/20. Due to the reentrant corner of the L-shaped domain we have a loss of regularity of the functions in Dom(A), and so the resolvent estimate (4.6) holds with ν, 0 < ν < 1. In the case of H²-regular solutions we would have ν = 1.

For the target data we choose

• $\omega(x) = \chi_{\|x - x_0\|_1 \le 0.2}$,
• $y^*(x) = e^{-20\|x - x_1\|_2} + e^{-20\|x - x_2\|_2} + e^{-30\|x - x_3\|_2}$,

with x0 = (−0.5, −0.5), x1 = (0.5, 0.5), x2 = (0.6, 0.1) and x3 = (0.8, 0.4) (Figure 6). The other parameters are the same as in the previous example.

The results for the three values of the final tolerance ε = [0.1, 0.5, 0.9]Φ(0) are displayed in Figure 7. We show snapshots of the solutions at t = 0, T/2, T.

The first row depicts the evolution of the state for small tolerance ε. The initial control steers the system close to the prescribed target y∗ (cf. Figure 6) at the final time, while there is no coincidence with ω in the intermediate period. For the tolerance 0.5 of the range of Φ, equal importance is assigned both to ω and to the final state. The largest tolerance allows the solution to optimize the given cost functional almost independently of the prescribed target y∗.

Figure 4: Example 5.1, unisotropic case. The plot of umin and y∗.

Essentially, the results exhibit the same behavior as those obtained in the previous examples. The elapsed time for computing Ψ(0) (the unconstrained problem) is 0.9828 seconds. This demonstrates the efficiency and flexibility of the method in 2D and, in particular, in the case of irregular domains. We chose to report the timing for Ψ(0) as an example, since in this case it is possible to compute the value of Ψ with other methods, such as those based on gradient optimization.

6. Conclusion. In this paper we have constructed and implemented a numerical algorithm for a constrained optimal control problem. The problem consists of identifying an initial datum that minimizes a given cost functional and steers the system at the final time within a prescribed distance from the target. The algorithm results in an (almost explicit) formula for the solution, expressed in terms of the operator governing the system. The formula itself was derived previously (cf. [17]), but its implementation was based on spectral decomposition, which requires knowledge or construction of eigenfunctions of the operator.

The main novelty of this article is twofold. Firstly, we provide a complete quantified sensitivity analysis of the solution with respect to all the data entering the problem. In particular, it implies a good approximation of the solution in cases where the operator or the external source are not completely determined. Secondly, for the numerical implementation we explore efficient Krylov subspace techniques that allow us to approximate a complex function of an operator by a series of linear problems. We provide a priori estimates for the approximation that are sensitive neither to any particular spatial discretization nor to a matrix representation of the operator A. The theoretical results are confirmed by numerical examples. The first and simplest example coincides with the one analysed in [17], and the results obtained with the two approaches are in complete agreement. The subsequent, more complex examples confirm the good performance of the algorithm in the case of operators with variable coefficients and acting on irregular domains.

The proposed approach can be generalised to other optimal control problems. The first step in this direction would be to consider a distributed control problem, i.e. one in which a control enters the equation through a non-homogeneous term and is active along the entire time frame. This would also allow for boundary control problems which, by using the classical Fattorini's approach [11], can be expressed as distributed ones. The second generalisation would consider different norms that enter into the cost functional. In particular, it would be tempting to include L¹-terms in the cost, since these introduce sparsity into the control. Of course, such a generalisation requires a more subtle theoretical analysis, as the cost functional is not differentiable in this case. This approach would also make it possible to consider different non-smooth, convex functionals.

Figure 5: Example 5.1, discontinuous diffusion. The initial control u = y(0) (left), the computed solution at time t = T/2 compared with the desired trajectory ω (middle), and the optimal final state compared with the target y∗ (right) for three different values of the tolerance ε.

Acknowledgment. The work of L.G. has been supported by Hrvatska Zaklada za Znanost (Croatian Science Foundation) under the grant IP-2019-04-6268 - Randomized low rank algorithms and applications to parameter dependent problems. The work of M.L. and I.N. has been supported by Hrvatska Zaklada za Znanost (Croatian Science Foundation) under the grant IP-2016-06-2468 - Control of Dynamical Systems.

REFERENCES

[1] H. Bauschke and P. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in Mathematics book series, Springer, Cham, 2nd ed., 2017, https://doi.org/10.1007/978-3-319-48311-5.

[2] A. Beck, First-Order Methods in Optimization, Society for Industrial and Applied Mathematics, Philadelphia, 2017, https://doi.org/10.1137/1.9781611974997.

[3] M. Berljafa and S. Güttel, The RKFIT algorithm for nonlinear rational approximation, SIAM J. Sci. Comput., 39 (2017), pp. A2049–A2071, https://doi.org/10.1137/15M1025426.

[4] F. Boyer, On the penalised HUM approach and its applications to the numerical approximation of null-controls for parabolic problems, in CANUM 2012, 41e Congrès National d'Analyse Numérique, L. Chupin and A. Münch, eds., vol. 41 of ESAIM Proc., EDP Sciences, Les Ulis, 2013, pp. 15–58, https://doi.org/10.1051/proc/201341002.

Figure 6: Example 5.2. The prescribed final target y∗.

[5] C. Carthel, R. Glowinski, and J. Lions, On exact and approximate boundary controllabilities for the heat equation: a numerical approach, J. Optim. Theory Appl., 82 (1994), pp. 429–484, https://doi.org/10.1007/BF02192213.

[6] E. Casas, B. Vexler, and E. Zuazua, Sparse initial data identification for parabolic PDE and its finite element approximations, Math. Control Relat. Fields, 5 (2015), pp. 377–399, https://doi.org/10.3934/mcrf.2015.5.377.

[7] G. Chen and Y. Xue, Perturbation analysis for the operator equation Tx = b in Banach spaces, J. Math. Anal. Appl., 212 (1997), pp. 107–125.

[8] W. Cody, G. Meinardus, and R. Varga, Chebyshev rational approximations to e^{-x} in [0,∞) and applications to heat-conduction problems, J. Approx. Theory, 2 (1969), pp. 50–65, https://doi.org/10.1016/0021-9045(69)90030-6.

[9] K. Engel and R. Nagel, One-Parameter Semigroups for Linear Evolution Equations, vol. 194 of Graduate Texts in Mathematics, Springer, New York, 2000, https://doi.org/10.1007/b97696.

[10] C. Fabre, J. Puel, and E. Zuazua, On the density of the range of the semigroup for semilinear heat equations, in Control and Optimal Design of Distributed Parameter Systems, J. Lagnese, D. Russell, and L. White, eds., vol. 70 of The IMA Volumes in Mathematics and its Applications, Springer, New York, 1995, pp. 73–91, https://doi.org/10.1007/978-1-4613-8460-1_4.

[11] H. O. Fattorini, Boundary control systems, SIAM Journal on Control, 6 (1968), pp. 349–385, https://doi.org/10.1137/0306025.

[12] E. Fernández-Cara and A. Münch, Numerical exact controllability of the 1D heat equation: duality and Carleman weights, J. Optim. Theory Appl., 163 (2014), pp. 253–285, https://doi.org/10.1007/s10957-013-0517-z.

[13] E. Fernández-Cara and E. Zuazua, The cost of approximate controllability for heat equations: the linear case, Adv. Differential Equations, 5 (2000), pp. 465–514, https://doi.org/ade/1356651338.

[14] J. Gopalakrishnan, L. Grubišić, and J. Ovall, Spectral discretization errors in filtered subspace iteration, Math. Comp., 89 (2020), pp. 203–228, https://doi.org/10.1090/mcom/3483.

[15] T. Kato, Perturbation theory for linear operators, vol. 132 of Classics in Mathematics, Springer, Berlin, 1995, https://doi.org/10.1007/978-3-642-66282-9.

[16] I. Lasiecka and R. Triggiani, Control theory for partial differential equations: continuous and approximation theories, I Abstract parabolic systems, vol. 74 of Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 2000, https://doi.org/10.1017/CBO9781107340848.

[17] M. Lazar, C. Molinari, and J. Peypouquet, Optimal control of parabolic equations by spectral decomposition, Optimization, 66 (2017), pp. 1359–1381.

[18] Y. Li, S. Osher, and R. Tsai, Heat source identification based on l1 constrained minimization, Inverse Probl. Imaging, 8 (2014), pp. 199–221, https://doi.org/10.3934/ipi.2014.8.199.

[19] D. Meidner and B. Vexler, Adaptive space-time finite element methods for parabolic optimization problems, SIAM J. Control Optim., 46 (2007), pp. 116–142, https://doi.org/10.1137/060648994.

[20] M. Nashed, Perturbations and approximations for generalized inverses and linear operator equations, in Generalized inverses and applications, M. Nashed, ed., Academic Press, New York, 1976, pp. 325–396, https://doi.org/10.1016/B978-0-12-514250-2.50013-5.

Figure 7: Example 5.2. The initial control u = y(0) (left), the computed solution at time t = T/2 compared with the desired trajectory ω (middle), and the optimal final state (right) at t = T for three different values of the tolerance ε. The red dashed circle marks the constraint ω on the trajectory.

[21] J. Peypouquet, Convex optimization in normed spaces: theory, methods and examples, SpringerBriefs in Optimization, Springer, Cham, 2015, https://doi.org/10.1007/978-3-319-13710-0.

[22] M. Reed and B. Simon, Methods of modern mathematical physics. III Scattering theory, Academic Press, San Diego, 1979.

[23] H. Stahl and T. Schmelzer, An extension of the '1/9'-problem, J. Comput. Appl. Math., 233 (2009), pp. 821–834, https://doi.org/10.1016/j.cam.2009.02.084.

[24] L. Trefethen, J. Weideman, and T. Schmelzer, Talbot quadratures and rational approximations, BIT, 46 (2006), pp. 653–670, https://doi.org/10.1007/s10543-006-0077-9.

[25] F. Tröltzsch, Optimal control of partial differential equations: theory, methods and applications, vol. 112 of Graduate Studies in Mathematics, American Mathematical Society, Providence, 2010, https://doi.org/10.1090/gsm/112.
