
MIXED GRADIENT-TIKHONOV METHODS FOR SOLVING NON-LINEAR ILL-POSED PROBLEMS IN BANACH SPACES

FÁBIO MARGOTTI

Abstract. Tikhonov regularization is a very useful and widely used method for finding stable solutions of ill-posed problems. A good choice of the penalization functional, as well as a careful selection of the topologies of the involved spaces, is fundamental to the quality of the reconstructions. These choices can be combined with some a priori information about the solution in order to preserve desired characteristics, like sparsity constraints for example.

To prove convergence and stability properties of this method, one usually has to assume that a minimizer of the Tikhonov functional is known. In practical situations, however, the exact computation of a minimizer is very difficult, and even finding an approximation can be a very challenging and expensive task if the involved spaces have poor convexity or smoothness properties.

In this paper we propose a method to attenuate this gap between theory and practice, applying a gradient-like method to a Tikhonov functional in order to approximate a minimizer. Using only available information, we explicitly calculate a maximal step-size which ensures a monotonically decreasing error. The resulting algorithm performs only finitely many steps and terminates using the discrepancy principle. In particular, the knowledge of a minimizer, or even its existence, does not need to be assumed.

Under standard assumptions, we prove convergence and stability results in relatively general Banach spaces, and subsequently test the performance of the method numerically, reconstructing conductivities with sparsely located inclusions under different kinds of noise in 2D Electrical Impedance Tomography.

1. Introduction

We intend to find a solution of the non-linear and ill-posed inverse problem

(1) F(x) = y,

where F operates between Banach spaces X and Y, that is, F : D(F) ⊂ X → Y, with D(F) denoting the domain of definition of F. We assume to have full knowledge of this operator, but only an approximation y^δ of y satisfying

(2) ‖y − y^δ‖ ≤ δ

is available. The noise level δ > 0 is assumed to be known as well.

In the last few years, a large variety of methods, such as gradient-based and Newton-like methods, have been employed to solve the above inverse problem in relatively general Banach spaces [21, 13, 15, 19]. Another popular and widely used method to stably approximate a solution of (1) is Tikhonov regularization. This method consists in

Date: January 8, 2016.

The author acknowledges support by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Instituto Nacional de Matemática Pura e Aplicada (IMPA).


finding (in case it exists) a minimizer x_α of the Tikhonov functional

(3) T_α(x) := (1/r)‖F(x) − y^δ‖^r + αΦ(x), α > 0,

where r > 1 and Φ : X → ℝ_0^+ is a subdifferentiable functional. Choosing the regularization parameter α depending on the noise level δ, the challenge is now proving the regularization property:

x_{α(δ)} → x^+ as δ → 0,

where x^+ ∈ X is a solution of (1). A vast literature concerning Tikhonov regularization methods is available [23, 2, 10, 14]. Probably the biggest difficulty in this approach is the necessity of solving an optimization problem. Finding a minimizer of (3) exactly is almost impossible in practical situations. Even finding only an approximate minimizer of (3) is not a trivial task, and it becomes a very difficult and expensive problem (in terms of computational effort) if the involved Banach spaces have poor smoothness or convexity properties, see e.g. [3, 12].

An alternative for stably solving (1), but exempt from the obligation of solving an optimization problem, can be obtained by employing a gradient method. If a functional T : X → ℝ is at least Gâteaux-differentiable (in short, G-differentiable) at x and X is a Hilbert space, then the opposite of its G-derivative, −∇T(x), is a descent direction for T from x, which means that T(x − λ∇T(x)) < T(x) provided λ > 0 is small enough. Motivated by this result, the gradient methods in Hilbert spaces are defined by the iteration

x_{n+1} = x_n − λ_n ∇T(x_n),

where λ_n > 0 and T is the residual functional

(4) T(x) := (1/2)‖F(x) − y^δ‖².

Consequently, ‖F(x_{n+1}) − y^δ‖ < ‖F(x_n) − y^δ‖ for λ_n small enough. The Fréchet derivative (in short, F-derivative) of T exists whenever F is F-differentiable. In this case, the chain rule yields

∇T(x_n) = F′(x_n)*(F(x_n) − y^δ).

It is well known that for ill-posed problems, the iteration needs to be terminated in time in order to avoid error amplification. A usual stopping criterion is the so-called discrepancy principle: stop at the first iteration n = N(δ) satisfying

(5) ‖y^δ − F(x_{N(δ)})‖ ≤ τδ,

with τ > 1 being a pre-defined constant. The regularization property now reads:

(6) x_{N(δ)} → x^+ as δ → 0.

The choice of the step-size λ_n is a crucial factor for the success of this kind of method. Different choices of λ_n result in different gradient methods. Some classical examples are the Landweber method (constant step-size), the Steepest Descent method (the step-size minimizes the residual functional (4)) and the Minimal Error method (the step-size minimizes the error functional E(x) := ‖x − x^+‖²); see [20] for more details.
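As an illustration, the following minimal Python sketch runs such a gradient iteration for a toy non-linear problem in a finite-dimensional Hilbert-space setting, stopping by the discrepancy principle (5). The operator F, its derivative and all parameter values here are illustrative choices, not taken from the paper.

    import numpy as np

    def gradient_method(F, dF, y_delta, delta, x0, lam=0.1, tau=2.0, max_iter=10000):
        # Gradient (Landweber-type) iteration for (1/2)||F(x) - y^delta||^2,
        # terminated by the discrepancy principle (5).
        x = x0.copy()
        for n in range(max_iter):
            residual = F(x) - y_delta
            if np.linalg.norm(residual) <= tau * delta:   # discrepancy principle
                break
            x = x - lam * dF(x).T @ residual   # x_{n+1} = x_n - lam * F'(x_n)^* (F(x_n) - y^delta)
        return x

    # Toy example: F(x) = x**3 componentwise (illustrative only).
    F = lambda x: x**3
    dF = lambda x: np.diag(3 * x**2)                    # Jacobian; dF(x).T is the adjoint
    x_true = np.array([1.0, 0.5])
    delta = 1e-3
    y_delta = F(x_true) + delta * np.array([1.0, 0.0])  # noisy data with ||noise|| = delta
    x_rec = gradient_method(F, dF, y_delta, delta, np.array([0.8, 0.8]))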

Mixed methods are defined by applying a gradient-like method to the Tikhonov functional (3) instead of applying it to the residual functional (4). The resulting iteration in Hilbert spaces is given by

x_{n+1} = x_n − λ_n ∇T_α(x_n),


with λ_n > 0. Of course, this iteration is well-defined only if T_α is at least G-differentiable. Although the sequence generated by this iteration in some sense approximates a minimizer of T_α (T_α(x_{n+1}) < T_α(x_n) for λ_n small enough), we do not intend to find this minimizer exactly. Instead, the iteration is terminated using the discrepancy principle (5). The regularization property reads exactly as in (6) in this situation.

In this paper we propose a mixed method in Banach spaces, combining a gradient-like method with a Tikhonov functional. Using only available information, we explicitly compute a maximal step-size λ_max = λ_max(n, α) such that λ_n ∈ (0, λ_max] implies the new method has a monotonically decreasing error. In order to have a non-empty interval for selecting λ_n, we choose either a small constant (in n) regularization parameter α = α(δ), or allow it to vary during the iteration, i.e., α = α_n(δ). The first variation is similar to Tikhonov-like methods, because the last iterate x_{N(δ)} in some sense approximates a minimizer x_{α(δ)} of (3); however, it is not necessary to find or even assume the existence of a minimizer of (3). The second strategy, in turn, results in a method similar to gradient-like methods, but with the significant advantage of having an extra stability term, which is automatically adjusted, gathering information on the current iterate and accordingly conferring more stability to the whole iteration process. In both cases, an appropriate penalization functional can be utilized for incorporating a priori information about the exact solution and preserving desired features in the reconstructions. The analysis includes, as particular cases, non-linear versions of the gradient methods: Landweber, Steepest Descent and Minimal Error.

This work is structured as follows: the next section collects some necessary results from [8, 7, 22] concerning the geometry of Banach spaces. In Section 3, we present the mixed method, and in Section 4, a complete convergence analysis is carried out. Finally, we test the efficiency of our method on the non-linear and highly ill-posed inverse problem of 2D Electrical Impedance Tomography in Section 5.

2. A little about the geometry of Banach spaces

In this section we collect needed facts about the geometry of Banach spaces. For proofs and more details we refer to the book of Cioranescu [8] and to [7, 22]. The notation ≲, which we will use in the following sections, has the following meaning: a(x) ≲ b(x) if and only if there exists a positive constant C, independent of x, such that a(x) ≤ C b(x) for all x; analogously, a(x) ≳ b(x) means b(x) ≲ a(x). Further, we assume w.l.o.g. that X is a real Banach space, see [8, Remark 3.2].

The modulus of smoothness of the Banach space X is defined as

ρ_X(τ) := (1/2) sup{‖x + τy‖ + ‖x − τy‖ − 2 : ‖x‖ = ‖y‖ = 1}, τ ≥ 0,

and the modulus of convexity as

δ_X(ε) := inf{1 − ‖x + y‖/2 : ‖x‖ = ‖y‖ = 1, ‖x − y‖ ≥ ε}, 0 ≤ ε ≤ 2.

The space X is called uniformly smooth if lim_{τ→0+} ρ_X(τ)/τ = 0, and uniformly convex if δ_X(ε) > 0 for all 0 < ε ≤ 2. Let s > 1 be fixed. We call X s-smooth if ρ_X(τ) ≲ τ^s for all τ ≥ 0, and we call it s-convex if δ_X(ε) ≳ ε^s for all 0 ≤ ε ≤ 2. X uniformly smooth or uniformly convex implies X reflexive. Further, X is s-smooth (resp. uniformly smooth) iff X* is s*-convex (resp. uniformly convex) and vice-versa, with 1/s + 1/s* = 1.


Fix p > 1 and define the norm functional as

(7) f(x) := (1/p)‖x‖^p, x ∈ X.

The Banach space X is called strictly convex (resp. locally uniformly convex) iff f is strictly convex (resp. locally uniformly strictly convex¹), and it is called smooth (resp. locally uniformly smooth) iff f is G-differentiable (resp. F-differentiable) in X. The following implications hold:

s-smoothness ⇒ uniform smoothness ⇒ locally uniform smoothness ⇒ smoothness,

s-convexity ⇒ uniform convexity ⇒ locally uniform convexity ⇒ strict convexity,

and none of the reciprocals is true. Hilbert spaces are 2-smooth and 2-convex. Important examples are the Lebesgue spaces L^p(Ω), the Sobolev spaces W^{n,p}(Ω) and the space of p-summable sequences ℓ^p(ℝ). They are² p∧2-smooth and p∨2-convex for 1 < p < ∞. For p = 1 or p = ∞ they are neither smooth nor strictly convex.

¹See the definition of a locally uniformly strictly convex function in [8, Def. 2.10 ii), Ch. II].
²a ∧ b := min{a, b} and a ∨ b := max{a, b}.

A continuous and strictly increasing function ϕ : ℝ_0^+ → ℝ_0^+ such that ϕ(0) = 0 and lim_{t→∞} ϕ(t) = ∞ is called a gauge function. The duality mapping associated with the gauge function ϕ is the point-to-set mapping J_ϕ : X → 2^{X*} defined by

J_ϕ(x) := {x* ∈ X* : ⟨x*, x⟩ = ‖x*‖ ‖x‖ and ‖x*‖ = ϕ(‖x‖)},

where ⟨·,·⟩ : X* × X → ℝ is the duality pairing. The duality mapping associated with the gauge function t ↦ t^{p−1} has the special notation J_p:

J_p(x) := {x* ∈ X* : ⟨x*, x⟩ = ‖x*‖ ‖x‖ and ‖x*‖ = ‖x‖^{p−1}}.

The mapping J_2 is called the normalized duality mapping and, as a consequence of the Riesz Representation Theorem, it is the identity operator in Hilbert spaces. Furthermore, J_2 is linear only in Hilbert spaces. A selection j_ϕ : X → X* of the duality mapping J_ϕ is a function which satisfies j_ϕ(x) ∈ J_ϕ(x) for all x ∈ X. From the definition, it follows immediately that each selection j_p of the duality mapping J_p satisfies the inner-product-like properties

⟨j_p(x), y⟩ ≤ ‖x‖^{p−1} ‖y‖ and ⟨j_p(x), x⟩ = ‖x‖^p.

The very important Asplund Theorem guarantees that if ϕ is a gauge function, then the functional

(8) ψ(x) := ∫_0^{‖x‖} ϕ(s) ds

is convex and its subdifferential coincides with the duality mapping:

(9) J_ϕ(x) = ∂ψ(x).

In particular, J_p(x) = ∂f(x), with f from (7). Thus, X smooth implies that J_p : X → X* is single-valued, and in this case J_p = ∇f is the G-derivative of f. Further, X locally uniformly smooth implies the F-differentiability of f, and consequently J_p is continuous. Analogous results hold true if J_p is replaced by J_ϕ. If X is additionally locally uniformly convex, then X is reflexive and J_ϕ is invertible, with a continuous inverse given by J_ϕ^{−1} = J*_{ϕ^{−1}}. In particular,

(10) J_p^{−1} = J*_{p*} : X* → X** ≅ X


holds true in a locally uniformly smooth and locally uniformly convex Banach space.
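To make J_p concrete: in the sequence space ℓ^p (or ℝ^N with the p-norm), J_p acts componentwise as x ↦ |x|^{p−1} sgn(x), in accordance with formula (43) used in Section 5. The following small Python sketch (an illustration, not part of the paper) verifies numerically the defining properties ⟨J_p(x), x⟩ = ‖x‖^p and ‖J_p(x)‖_{p*} = ‖x‖^{p−1}, as well as the inversion formula (10), J_{p*}(J_p(x)) = x.

    import numpy as np

    def J(x, p):
        # Duality mapping of R^N with the p-norm, acting componentwise.
        return np.abs(x)**(p - 1) * np.sign(x)

    p = 1.5
    ps = p / (p - 1)                   # conjugate exponent, 1/p + 1/p* = 1
    x = np.array([1.0, -2.0, 0.5])
    xs = J(x, p)

    norm = np.linalg.norm(x, p)
    assert np.isclose(np.dot(xs, x), norm**p)                 # <J_p(x), x> = ||x||^p
    assert np.isclose(np.linalg.norm(xs, ps), norm**(p - 1))  # ||J_p(x)||_{p*} = ||x||^{p-1}
    assert np.allclose(J(xs, ps), x)                          # J_{p*} inverts J_p, cf. (10)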

Suppose now that X is a smooth Banach space and ϕ is a fixed gauge function. Then J_ϕ is single-valued and we define the Bregman distance ∆_ϕ : X × X → [0,∞) as

∆_ϕ(x, y) := ψ(x) − ψ(y) − ⟨J_ϕ(y), x − y⟩,

where ψ is the function from (8). From Asplund's Theorem (9), it follows that ∆_ϕ(x, y) ≥ 0. Of course, x = y implies ∆_ϕ(x, y) = 0. Using the particular gauge function ϕ(t) = t^{p−1}, the Bregman distance becomes

∆_p(x, y) = (1/p)‖x‖^p − (1/p)‖y‖^p − ⟨J_p(y), x − y⟩
          = (1/p)‖x‖^p − ⟨J_p(y), x⟩ + (1/p*)‖J_p(y)‖^{p*}.

This equality mimics the polarization identity

(11) (1/2)‖x − y‖² = (1/2)‖x‖² − ⟨y, x⟩ + (1/2)‖y‖²,

which holds true in Hilbert spaces, where also ∆_2(x, y) = (1/2)‖x − y‖² holds.
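The next Python snippet (again only an illustration, in the finite-dimensional ℓ^p setting above) evaluates ∆_p and confirms that, for p = 2 with the Euclidean norm, it reduces to (1/2)‖x − y‖², as stated after (11).

    import numpy as np

    def bregman(x, y, p):
        # Bregman distance Delta_p(x, y) induced by f(x) = (1/p)||x||_p^p.
        Jy = np.abs(y)**(p - 1) * np.sign(y)       # J_p(y), componentwise
        np_x = np.linalg.norm(x, p)**p
        np_y = np.linalg.norm(y, p)**p
        return np_x / p - np_y / p - np.dot(Jy, x - y)

    x = np.array([1.0, 2.0])
    y = np.array([0.5, -1.0])
    assert np.isclose(bregman(x, y, 2), 0.5 * np.linalg.norm(x - y)**2)  # Delta_2 = (1/2)||x-y||^2
    print(bregman(x, y, 1.5))   # a genuinely non-symmetric distance for p != 2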

If X is strictly convex, the function f in (7) is strictly convex, which in turn implies the strict convexity of ∆_p in its first variable, and consequently ∆_p(x, y) = 0 iff x = y. The Bregman distance is not a metric, because it does not necessarily satisfy the symmetry property, for instance. But if X is reflexive, the symmetry-like property ∆_p(x, y) = ∆_{p*}(J_p(y), J_p(x)) can be shown. A straightforward calculation shows the important Three Points Identity:

(12) ∆_p(z, y) − ∆_p(z, x) = ∆_p(x, y) + ⟨J_p(y) − J_p(x), x − z⟩,

for all x, y, z ∈ X. Moreover,

∆_p(x, y) ≥ (1/p)‖x‖^p + (1/p*)‖y‖^p − ‖y‖^{p−1}‖x‖.

Now, if (x_n)_{n∈ℕ} ⊂ X is an arbitrary sequence and x ∈ X is a fixed vector, then the inequality ∆_p(x, x_n) ≤ ρ implies

‖x_n‖^{p−1} ((1/p*)‖x_n‖ − ‖x‖) ≤ ρ.

Considering now the cases (1/p*)‖x_n‖ − ‖x‖ ≤ (1/(2p*))‖x_n‖ and (1/p*)‖x_n‖ − ‖x‖ > (1/(2p*))‖x_n‖, one can prove the implication

(13) ∆_p(x, x_n) ≤ ρ ⇒ ‖x_n‖ ≤ 2p* (‖x‖ ∨ ρ^{1/p}).

Therefore, (x_n)_{n∈ℕ} is bounded whenever ∆_p(x, x_n) is bounded. A similar result can be proven if ∆_p(x_n, x) ≤ ρ.
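For completeness, the hinted case analysis can be carried out as follows (a short sketch; the constant 2p* is not optimal). In the first case, (1/p*)‖x_n‖ − ‖x‖ ≤ (1/(2p*))‖x_n‖ rearranges directly to ‖x_n‖ ≤ 2p*‖x‖. In the second case, the displayed inequality yields

(1/(2p*))‖x_n‖^p < ‖x_n‖^{p−1} ((1/p*)‖x_n‖ − ‖x‖) ≤ ρ,

hence ‖x_n‖ < (2p* ρ)^{1/p} ≤ 2p* ρ^{1/p}, where the last step uses 2p* > 1 and p > 1. Taking the maximum of the two bounds gives exactly (13).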

If the duality mapping is single-valued and continuous (this is the case in a locally uniformly smooth Banach space), then the continuity is handed down to both arguments of the Bregman distance ∆_p.

The important Xu–Roach Theorems [25] imply that in an s-convex Banach space there exists, for each p ≤ s, a positive constant K_{p,s} such that

K_{p,s} (‖y‖ ∨ ‖x − y‖)^{p−s} ‖x − y‖^s ≤ ∆_p(x, y)


for all x, y ∈ X. In particular, if the sequence (x_n)_{n∈ℕ} ⊂ X is bounded (or p = s), then there exists a constant C > 0 such that

(14) ‖x_n − x_m‖^s ≤ C ∆_p(x_n, x_m).

Similarly, in an s-smooth Banach space it holds, for each p ≥ s,

(15) ∆_p(x, y) ≤ C_{p,s} (‖y‖^{p−s}‖x − y‖^s ∨ ‖x − y‖^p)

for all x, y ∈ X, where C_{p,s} > 0 is a constant.

3. Mixed Gradient-Tikhonov methods

To properly introduce the ideas, some requirements on the inverse problem F(x) = y are necessary:

Assumption 1. (a) There exists a solution x^+ ∈ X of equation (1) and a number ρ > 0 such that

B_ρ(x^+, ∆_p) := {v ∈ X : ∆_p(x^+, v) < ρ} ⊂ D(F).

(b) The function F is continuously F-differentiable in B_ρ(x^+, ∆_p) and its derivative satisfies

‖F′(v)‖ ≤ M for all v ∈ B_ρ(x^+, ∆_p),

where M > 0 is a constant.

(c) (Tangential Cone Condition (TCC)): There exists a constant 0 ≤ η < 1 such that

‖F(v) − F(w) − F′(w)(v − w)‖ ≤ η ‖F(v) − F(w)‖

for all v, w ∈ B_ρ(x^+, ∆_p).

Assuming Y is a locally uniformly smooth Banach space, the norm functional f(y) := (1/r)‖y‖_Y^r is F-differentiable and its F-derivative coincides with the duality mapping: ∇f(y) = J_r(y). Further, J_r : Y → Y* is single-valued and continuous, and since F is F-differentiable too, an application of the chain rule yields

∇((1/r)‖F(x) − y^δ‖^r) = F′(x)* J_r(F(x) − y^δ).

However, in a general Banach space Y, the duality mapping J_r is not necessarily single-valued, and the above identity does not automatically hold true. For this reason, we define the point-to-set mapping dT_α : B_ρ(x^+, ∆_p) → 2^{X*} as

(16) dT_α(x) := {w* + αz* : w* ∈ F′(x)* J_r(F(x) − y^δ) and z* ∈ ∂Φ(x)} ⊂ X*,

where Φ : X → ℝ_0^+ is a subdifferentiable functional. Observe that, if Y is locally uniformly smooth and Φ is G-differentiable (resp. F-differentiable) at x, then the Tikhonov functional T_α in (3) is G-differentiable (resp. F-differentiable) at x as well, and the set dT_α(x) is a singleton. In this case, the unique element of dT_α(x) is the G-derivative (resp. F-derivative) of T_α at x, i.e., dT_α(x) = {∇T_α(x)}.

The plan now is to apply a gradient-like method to T_α in order to approximate a minimizer (in case it exists): assume X is locally uniformly smooth and s-convex with p ≤ s, and define the iteration³

(17) J_p(x_{n+1}) := J_p(x_n) − λ_n Ψ_n, Ψ_n ∈ dT_α(x_n),


with λ_n > 0. Assuming now that x_n is well-defined and belongs to B_ρ(x^+, ∆_p) (⊂ D(F)), and adopting the notations

A_n := F′(x_n), b_n^δ := y^δ − F(x_n) and e_n := x^+ − x_n,

it follows from the Three Points Identity (12), with f(λ_n) := ∆_p(x^+, x_{n+1}) − ∆_p(x^+, x_n), that

(18) f(λ_n) = ∆_p(x_n, x_{n+1}) + ⟨J_p(x_{n+1}) − J_p(x_n), x_n − x^+⟩
            = ∆_{p*}(J_p(x_{n+1}), J_p(x_n)) − λ_n ⟨A_n* j_r(b_n^δ), e_n⟩ + λ_n α ⟨z_n*, e_n⟩,

with z_n* ∈ ∂Φ(x_n). Our aspiration now is to calculate a positive number λ_max = λ_max(α, n) such that the inequalities 0 < λ_n ≤ λ_max imply that the rightmost expression in (18) is negative, resulting in the monotonicity of the error:

∆_p(x^+, x_{n+1}) < ∆_p(x^+, x_n),

and consequently x_{n+1} ∈ B_ρ(x^+, ∆_p) ⊂ D(F).

³J_p is single-valued and invertible in this situation, see (10).

To estimate the first term in the rightmost expression of (18), we make use of the s-convexity of X, which is equivalent to the s*-smoothness of X*. As x_n ∈ B_ρ(x^+, ∆_p), it follows from (13) that ‖x_n‖ ≤ C_{ρ,x^+}, with

(19) C_{ρ,x^+} := 2p* (‖x^+‖ ∨ ρ^{1/p}).

Since p* ≥ s*, inequality (15) together with definition (17) implies that

∆_{p*}(J_p(x_{n+1}), J_p(x_n)) ≤ C_{p*,s*} (‖J_p(x_n)‖^{p*−s*} ‖J_p(x_{n+1}) − J_p(x_n)‖^{s*} ∨ ‖J_p(x_{n+1}) − J_p(x_n)‖^{p*})
                              ≤ C_{p*,s*} (C_{ρ,x^+}^{p−s*(p−1)} λ_n^{s*} ‖Ψ_n‖^{s*} ∨ λ_n^{p*} ‖Ψ_n‖^{p*}).

The discrepancy principle (5) ensures that δ < (1/τ)‖b_n^δ‖ for n = 0, ..., N(δ) − 1. Then from the TCC and (2) we obtain

‖b_n^δ − A_n e_n‖ = ‖y^δ − F(x_n) − F′(x_n)(x^+ − x_n)‖
                 ≤ δ + ‖F(x^+) − F(x_n) − F′(x_n)(x^+ − x_n)‖
                 ≤ δ + η ‖F(x^+) − F(x_n)‖
                 ≤ δ + η (‖y^δ − F(x_n)‖ + δ)
                 ≤ (η + (1 + η)/τ) ‖b_n^δ‖.

Therefore, the second term in the rightmost expression of (18) can be estimated by

−λ_n ⟨A_n* j_r(b_n^δ), e_n⟩ = λ_n [⟨j_r(b_n^δ), b_n^δ − A_n e_n⟩ − ⟨j_r(b_n^δ), b_n^δ⟩]
                            ≤ λ_n (‖b_n^δ‖^{r−1} ‖b_n^δ − A_n e_n‖ − ‖b_n^δ‖^r) ≤ −λ_n C_0 ‖b_n^δ‖^r,

with

(20) 0 < C_0 ≤ 1 − η − (1 + η)/τ,

which is well-defined provided τ > (1 + η)/(1 − η). Finally, if a bound for Φ(x^+) is known, i.e., if a number C_1 > 0 satisfying Φ(x^+) ≤ C_1 is available, it follows from the definition of the subdifferential that the third term in the rightmost expression of (18) can be bounded by

(21) λ_n α ⟨z_n*, e_n⟩ ≤ λ_n α (Φ(x^+) − Φ(x_n)) ≤ λ_n α C_1.


Putting everything together, we arrive at f(λ_n) ≤ g(λ_n), with

g(λ_n) := λ_n [C_{p*,s*} (C_{ρ,x^+}^{p−s*(p−1)} λ_n^{s*−1} ‖Ψ_n‖^{s*} ∨ λ_n^{p*−1} ‖Ψ_n‖^{p*}) − C_0 ‖b_n^δ‖^r + α C_1].

Observe that g(λ_n) < 0 if λ_n ∈ (0, λ_max], where the calculable bound λ_max is given by⁴

(22) λ_max := C_2 ([C_0 ‖b_n^δ‖^r − α C_1]^{s−1} / ‖Ψ_n‖^s ∧ [C_0 ‖b_n^δ‖^r − α C_1]^{p−1} / ‖Ψ_n‖^p),

with 0 < C_2 < (C_{p*,s*}^{1−s} C_{ρ,x^+}^{p−s} ∧ C_{p*,s*}^{1−p}).

In order to have a positive λ_max, we arrive at two possibilities:

• Variation 1: α = α(δ) is fixed during the whole iteration. In this case,

λ_max > 0 ⇔ ‖b_n^δ‖^r > (C_1/C_0) α(δ).

Observe that if

(23) 0 ≤ α(δ) ≤ C_3 δ^r,

with 0 < C_3 < C_0/(C_1 τ^r), then

(C_1/C_0) α(δ) < (τδ)^r < ‖b_n^δ‖^r

as long as the discrepancy principle (5) is not satisfied. Therefore λ_max remains positive and it is possible to choose λ_n ∈ (0, λ_max].

• Variation 2: α = α_n(δ) is possibly updated during the iteration. In this situation,

λ_max > 0 ⇔ 0 ≤ α_n(δ) < (C_0/C_1) ‖b_n^δ‖^r.

Moreover, choosing

(24) 0 ≤ α_n ≤ C_4 ‖b_n^δ‖^r,

with 0 < C_4 < C_0/C_1, it follows that

(25) C_0 ‖b_n^δ‖^r − α_n C_1 ≥ C_0 ‖b_n^δ‖^r − C_1 C_4 ‖b_n^δ‖^r = (C_0 − C_1 C_4) ‖b_n^δ‖^r > 0.

See Algorithm 1 for an implementation in pseudocode.

Due to the discrepancy principle (5), the strategy (23) for choosing the regularization parameter α is a particular case of (24). Hence, it suffices to carry out the convergence analysis only for the second variation.

⁴A simple calculation could replace λ_max with the larger bound

λ_{max,2} = [C_0‖b_n^δ‖^r − α(C_1 − Φ(x_n))]^{s−1} / (C_{p*,s*}^{s−1} C_{ρ,x^+}^{s−p} ‖Ψ_n‖^s) ∧ [C_0‖b_n^δ‖^r − α(C_1 − Φ(x_n))]^{p−1} / (C_{p*,s*}^{p−1} ‖Ψ_n‖^p),

but since this expression is more complicated, we keep the first one for ease of presentation.


Algorithm 1 Mixed Method

Input: F; Φ; y^δ; δ; η; C_{p*,s*}; C_1; C_{ρ,x^+}; x_0;
Output: x_N with ‖y^δ − F(x_N)‖ ≤ τδ;

Choose τ > (1+η)/(1−η); 0 < C_0 ≤ 1 − η − (1+η)/τ; 0 < C_2 < (C_{p*,s*}^{1−s} C_{ρ,x^+}^{p−s} ∧ C_{p*,s*}^{1−p}); 0 < C_4 < C_0/C_1;
n := 0;
while ‖y^δ − F(x_n)‖ > τδ do
  Choose Ψ_n ∈ F′(x_n)* J_r(F(x_n) − y^δ) + α ∂Φ(x_n), with 0 ≤ α ≤ C_4 ‖y^δ − F(x_n)‖^r;
  λ_max := C_2 ([C_0‖y^δ − F(x_n)‖^r − αC_1]^{s−1} / ‖Ψ_n‖^s ∧ [C_0‖y^δ − F(x_n)‖^r − αC_1]^{p−1} / ‖Ψ_n‖^p);
  Choose 0 < λ_n ≤ λ_max;
  x_{n+1} := J*_{p*}(J_p(x_n) − λ_n Ψ_n);
  n := n + 1;
end while
x_N := x_n;
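As a minimal illustration of Algorithm 1, the Python sketch below runs the iteration in the concrete setting X = (ℝ^N, ‖·‖_p) with Y Hilbert (so r = 2 and J_r is the identity) and Φ(x) = (1/p)‖x‖_p^p, so that ∂Φ = J_p. All operators and constants (F, dF, C_0, ..., C_4) are placeholder inputs, not values prescribed by the paper.

    import numpy as np

    def Jmap(x, p):
        return np.abs(x)**(p - 1) * np.sign(x)   # duality mapping of (R^N, ||.||_p)

    def mixed_method(F, dF, y_delta, delta, x0, p, s,
                     tau, C0, C1, C2, C4, r=2.0, max_iter=100000):
        ps = p / (p - 1)
        x = x0.copy()
        n = 0
        while np.linalg.norm(F(x) - y_delta) > tau * delta and n < max_iter:
            res = F(x) - y_delta
            nres = np.linalg.norm(res)
            alpha = C4 * nres**r                          # rule (24)
            Psi = dF(x).T @ res + alpha * Jmap(x, p)      # element of dT_alpha(x), cf. (16)
            nPsi = np.linalg.norm(Psi, ps)                # dual-space (p*) norm
            D = C0 * nres**r - alpha * C1                 # positive by (25)
            lam_max = C2 * min(D**(s - 1) / nPsi**s, D**(p - 1) / nPsi**p)   # (22)
            lam = lam_max                                 # any choice in (0, lam_max]
            x = Jmap(Jmap(x, p) - lam * Psi, ps)          # (17): x_{n+1} = J_{p*}(J_p(x_n) - lam Psi_n)
            n += 1
        return x, n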

4. Convergence Analysis

We formalize the results of the last section in the next theorem. Note that, in principle, N(δ) = ∞ is possible, which means that termination of Algorithm 1 is not guaranteed yet.

Theorem 2. Let X be locally uniformly smooth and s-convex for some s ≥ p > 1. Fix r > 1 and suppose that Assumption 1 holds true. If x_0 ∈ B_ρ(x^+, ∆_p), τ > (1 + η)/(1 − η) in (5) and λ_n ∈ (0, λ_max], with λ_max defined in (22), then

• all the iterates are well-defined and belong to B_ρ(x^+, ∆_p) as long as the discrepancy principle (5) is not satisfied;
• the iterates have the monotonically decreasing error behavior

(26) ∆_p(x^+, x_{n+1}) < ∆_p(x^+, x_n)

for n = 0, ..., N(δ) − 1;
• the inequality

(27) λ_n ‖b_n^δ‖^r ≲ ∆_p(x^+, x_n) − ∆_p(x^+, x_{n+1})

holds for n = 0, ..., N(δ) − 1;
• the generated sequence is uniformly bounded:

(28) ‖x_n‖ ≤ C_{ρ,x^+} for all δ > 0 and n = 0, ..., N(δ),

where C_{ρ,x^+} > 0 is defined in (19).

Proof. We employ an inductive argument: suppose that for some n < N(δ), all the iterates x_0, ..., x_n are well-defined and belong to B_ρ(x^+, ∆_p). Since x_n ∈ B_ρ(x^+, ∆_p), τ > (1 + η)/(1 − η) and λ_n ≤ λ_max, the reasoning in the last section shows that

∆_p(x^+, x_{n+1}) − ∆_p(x^+, x_n) ≤ λ_n C_{p*,s*} (C_{ρ,x^+}^{p−s*(p−1)} λ_max^{s*−1} ‖Ψ_n‖^{s*} ∨ λ_max^{p*−1} ‖Ψ_n‖^{p*}) − λ_n (C_0 ‖b_n^δ‖^r − α_n C_1)
≤ −λ_n (1 − C_2) [C_0 ‖b_n^δ‖^r − α_n C_1] ≲ −λ_n ‖b_n^δ‖^r,

where the last step uses (25). Thus, (27) holds true. The result also implies (26), and consequently

∆_p(x^+, x_{n+1}) < ∆_p(x^+, x_n) < ρ.

Therefore x_{n+1} ∈ B_ρ(x^+, ∆_p) ⊂ D(F) and the induction proof is complete.

In order to prove that Algorithm 1 terminates, we first prove that λ_max ≥ λ_min > 0, with

(29) λ_min := c ‖b_n^δ‖^t, t ≥ −r,

where c > 0 is a constant independent of n and δ, and then restrict λ_n to the interval [λ_min, λ_max]. We start by assuming that the subdifferentiable functional Φ verifies the following assumption:

Assumption 3. The sequence (z_n*)_{n∈ℕ} ⊂ X* is uniformly bounded in n and δ whenever (x_n)_{n∈ℕ} ⊂ X is uniformly bounded in the same variables, where z_n* ∈ ∂Φ(x_n) for each n ∈ ℕ.

Many usual choices of Φ satisfy the above assumption. For instance:

• Φ(x) := ∫_0^{‖x‖} ϕ(s) ds, where ϕ is a gauge function, see Section 2. In this case, due to Asplund's Theorem (9), ∂Φ(x) = J_ϕ(x), and therefore

z_n* ∈ ∂Φ(x_n) ⇒ ‖z_n*‖ = ϕ(‖x_n‖).

Since ϕ is continuous, the desired result follows. For the particular choice ϕ(t) = t^{p−1} we obtain the well-known and widely used penalization functional Φ(x) = (1/p)‖x‖^p. Of course, the choice Φ(x) = (1/p)‖x − x_0‖^p is also permitted.

• Φ(x) := ∆_p(x, x̄), where x̄ ∈ X is a fixed vector. Then ∂Φ(x) = {J_p(x) − J_p(x̄)} and ‖z_n*‖ ≤ ‖x_n‖^{p−1} + ‖x̄‖^{p−1}. Observe that, if x̄ = x_0, then the upper bound C_1 in (21) can be chosen equal to ρ, because Φ(x^+) = ∆_p(x^+, x_0) < ρ.

• Φ(x) := ‖x‖ is also possible because, see [8, Prop. 3.4, Ch. I],

∂‖x‖ = {z* ∈ X* : ⟨z*, x⟩ = ‖x‖ and ‖z*‖ = 1}, x ≠ 0.

Therefore, ‖z_n*‖ ≤ 1 for all n ∈ ℕ.

• Φ(x) := Φ_n(x) = ∆_p(x, x_{n−1}). Here, ∂Φ_n(x) = {J_p(x) − J_p(x_{n−1})}. Thus, ‖z_n*‖ ≤ ‖x_n‖^{p−1} + ‖x_{n−1}‖^{p−1}. Observe that in this case,

Φ_n(x^+) = ∆_p(x^+, x_{n−1}) ≤ ∆_p(x^+, x_0) < ρ

for all n ∈ ℕ, and therefore C_1 = ρ can be chosen again.

Remark 4. The penalization functional Φ has a strong influence on the quality of the reconstructed solution. A suitable selection of Φ can help to preserve desired characteristics of the solution, whereas a bad choice of this functional may destroy these features.

If the sought-for solution has a sparse representation in a fixed basis, for example, this characteristic is not completely damaged during the iteration, provided a suitable penalization functional Φ is chosen. Sparsity constraints are preserved, for example, if the functionals Φ(x) = (1/p)‖x − x_0‖^p or Φ(x) = ∆_p(x, x_0), for p ≈ 1, are used.


In these cases, Φ(x) has a small value for those vectors x whose representation of x − x_0 in the fixed basis is sparse. This means that these functionals penalize vectors with a non-sparse representation more severely and, accordingly, give priority to sparse reconstructions, see [9, 12] and the numerical experiments in Section 5.

From (22) and (25) it follows that

(30) λ_max ≳ ‖b_n^δ‖^{r(s−1)} / ‖Ψ_n‖^s ∧ ‖b_n^δ‖^{r(p−1)} / ‖Ψ_n‖^p.

The uniform bound (28) together with Assumption 3 implies that the sequences (z_n*)_{0≤n≤N(δ)} are uniformly bounded in n and δ. Moreover, from the inequality α ≲ ‖b_n^δ‖^r and Assumption 1(b), it follows that

(31) ‖Ψ_n‖ ≤ M ‖b_n^δ‖^{r−1} + α ‖z_n*‖ ≲ ‖b_n^δ‖^{r−1} + ‖b_n^δ‖^r ≲ ‖b_n^δ‖^{r−1} ∨ ‖b_n^δ‖^r.

In view of (30), this implies that

(32) λ_max ≳ ‖b_n^δ‖^{s−r} ∧ ‖b_n^δ‖^{−r} ∧ ‖b_n^δ‖^{p−r},

which proves that there exist c > 0, independent of n and δ, and t ≥ −r in (29), such that 0 < λ_min ≤ λ_max.

We are now well-prepared to prove the termination of Algorithm 1.

Theorem 5 (Termination). Assume all the hypotheses of Theorem 2 and assume additionally that Assumption 3 holds true, as well as λ_n ∈ [λ_min, λ_max], where λ_max is defined in (22) and λ_min in (29). Then Algorithm 1 terminates.

Proof. From (32) it follows that the interval [λ_min, λ_max] is non-empty provided the constant c in (29) is small enough. Since λ_n ≥ λ_min ≳ ‖b_n^δ‖^t, with t ≥ −r, it follows from (5) that for any ℓ ≤ N(δ),

(τδ)^{t+r} ℓ ≤ Σ_{n=0}^{ℓ−1} ‖b_n^δ‖^{t+r} ≲ Σ_{n=0}^{ℓ−1} λ_n ‖b_n^δ‖^r ≲ Σ_{n=0}^{ℓ−1} (∆_p(x^+, x_n) − ∆_p(x^+, x_{n+1})) = ∆_p(x^+, x_0) − ∆_p(x^+, x_ℓ) ≤ ∆_p(x^+, x_0) < ∞,

where the second ≲ uses (27). This is only possible if ℓ < ∞, and consequently N(δ) < ∞.

Weak convergence follows from (28) and the reflexivity of X. More precisely, let (δ_j)_{j∈ℕ} be a zero-sequence and assume all the hypotheses of the last theorem. If the operator F is weakly closed, then each subsequence of (x_{N(δ_j)})_{j∈ℕ} has itself a subsequence which converges weakly to a solution of (1). If this solution is unique, then x_{N(δ)} ⇀ x^+ as δ → 0, see e.g. [17, Cor. 3.5].

Remark 6. The reasoning in the last theorem shows that the sequences of the residuals (b_n^δ)_{0≤n≤N(δ)} are uniformly bounded. Thus, from (32),

(33) λ_max ≳ ‖b_n^δ‖^{s−r}.

Additionally, inequality (31) guarantees that the sequences (Ψ_n)_{0≤n≤N(δ)} are uniformly bounded too. Hence, in view of (25), the definition of λ_max in (22) could be replaced by the much simpler expression

(34) λ̄_max := C_5 ‖b_n^δ‖^{r(s−1)} / ‖Ψ_n‖^s,


with C_5 > 0 being a small constant. In this case, λ̄_max ≤ λ_max and the property

0 < λ_n ≤ λ̄_max ⇒ ∆_p(x^+, x_{n+1}) < ∆_p(x^+, x_n)

would still hold true.

Some obvious possibilities for choosing a step-size satisfying λ_n ∈ [λ_min, λ_max] are: λ_n = c ‖b_n^δ‖^{s−r} for a small constant c > 0, λ_n = λ_max and λ_n = λ̄_max, see (33) and (34). Further, λ_n can be randomly chosen in the interval [λ_min, λ_max]. Due to (33), we are allowed to change the definition of λ_min in (29) to

(35) λ_min := c ‖b_n^δ‖^t, t > −r,

and we still have λ_max ≥ λ_min provided c > 0 is small enough. This more restrictive definition of λ_min will be important for proving that F(x_n) → y in the noiseless case, see (40).

Remark 7. If α ≡ 0 is chosen, definition (17) becomes

J_p(x_{n+1}) := J_p(x_n) − λ_n F′(x_n)* j_r(F(x_n) − y^δ),

which is a gradient-like method applied directly to the non-linear equation (1). In this case, the step-size (34) is transformed into

λ_ME = C_5 ‖b_n^δ‖^{r(s−1)} / ‖A_n* J_r(b_n^δ)‖^s,

and the resulting gradient method is a variation of the Minimal Error method, see [20] and the first section of this paper. Further,

‖A_n* J_r(b_n^δ)‖^{p* r*(s−1)} = ⟨J*_{p*}(A_n* J_r(b_n^δ)), A_n* J_r(b_n^δ)⟩^{r*(s−1)} ≤ ‖A_n J*_{p*}(A_n* J_r(b_n^δ))‖^{r*(s−1)} ‖b_n^δ‖^{r(s−1)},

and therefore the step-size

(36) λ_SD := C_5 ‖A_n* J_r(b_n^δ)‖^{p* r*(s−1) − s} / ‖A_n J*_{p*}(A_n* J_r(b_n^δ))‖^{r*(s−1)}

satisfies λ_SD ≤ λ_ME ≤ λ_max. Moreover, using Assumption 1(b) and the boundedness of the residual sequence, it follows that

(37) λ_SD ≥ C_5 ‖A_n* J_r(b_n^δ)‖^{r*(s−1) − s} / M^{r*(s−1)} ≥ C_6

whenever s ≤ r, where C_6 > 0 is a small constant. Thus λ_SD ∈ [λ_min, λ_max]. Observe that the step-size (36) reduces to

λ_SD := ‖A_n* b_n^δ‖² / ‖A_n A_n* b_n^δ‖²

in Hilbert spaces whenever C_5 = 1 and p = s = r = 2. This means that this variation of the Steepest Descent method in Banach spaces is also included as a particular case in our convergence analysis⁵. Finally, inequality (37) implies that the Landweber method


satisfies λ_LW ∈ [λ_min, λ_max] with the constant step-size λ_LW = C_6, provided s ≤ r. In this case, the inequalities

0 < C_6 ‖b_n^δ‖^0 = λ_min = λ_LW ≤ λ_SD ≤ λ_ME = λ̄_max ≤ λ_max

hold.

⁵In Hilbert spaces, C_5 ≤ 2(1 − η − (1+η)/τ). Thus C_5 = 1 can be chosen provided η < 1/2 and τ ≥ 2(1+η)/(1−2η), see again [20].
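To make the comparison concrete, in the Hilbert-space case p = s = r = 2 the three step-size rules discussed above read as follows in Python (an illustrative sketch; A stands for the linearization A_n = F′(x_n) as a matrix, res for the residual b_n^δ, and C5, C6 are placeholder constants):

    import numpy as np

    def step_sizes(A, res, C5=1.0, C6=0.01):
        # Hilbert-space case p = s = r = 2, so J_r and J_p are identities.
        w = A.T @ res                                         # A_n^* b_n^delta
        lam_LW = C6                                           # Landweber: constant step-size
        lam_SD = C5 * np.dot(w, w) / np.dot(A @ w, A @ w)     # (36): ||A*b||^2 / ||AA*b||^2
        lam_ME = C5 * np.dot(res, res) / np.dot(w, w)         # (34): ||b||^2 / ||A*b||^2
        return lam_LW, lam_SD, lam_ME   # with suitable constants, lam_LW <= lam_SD <= lam_ME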

4.1. The noiseless case. In this subsection and in the next one, we must clearly distinguish between the noisy (δ > 0) and the noise-free (δ = 0) cases. Therefore, a notation with a superscript δ will be adopted whenever data is contaminated with noise: x_n^δ, b_n^δ, A_n^δ, etc.; in contrast, x_n, b_n, A_n, etc. always originate from exact data. The initial iterate is chosen independently of δ: x_0^δ = x_0.

Algorithm 1 is well-defined in the noiseless situation if the inequalities in (20) are replaced with 0 < C_0 ≤ 1 − η, and in this case all the results of Theorem 2 still hold true. Notice that, although in the noise-free case the stopping criterion (5) is well-defined and reads ‖b_n‖ = 0, in which case x_n is a solution of (1), Algorithm 1 generally does not stop, but generates a sequence which converges strongly to a solution of (1), as we will prove in Theorem 8 below.

Theorem 8 (Convergence without noise). Let X be locally uniformly smooth and s-convex for some s ≥ p > 1. Fix r > 1 and suppose that Assumptions 1 and 3 hold true. If δ = 0, x_0 ∈ B_ρ(x^+, ∆_p) and λ_n ∈ [λ_min, λ_max], where λ_min and λ_max are defined in (35) and (22) respectively, then Algorithm 1 either terminates with a solution of (1) or generates a sequence which converges to a solution of this equation. In particular, if the solution of (1) is unique in B_ρ(x^+, ∆_p), then x_n → x^+ as n → ∞.

Proof. If Algorithm 1 stops after n ∈ ℕ iterations, then the current iterate is a solution of (1) because ‖y − F(x_n)‖ = ‖b_n‖ = 0. Otherwise, (x_n)_{n∈ℕ} is a Cauchy sequence, as we will prove now. Let m, l ∈ ℕ with m ≤ l and choose z ∈ {m, ..., l} such that

(38) ‖b_z‖ ≤ ‖b_n‖ for all n ∈ {m, ..., l}.

As X is s-convex and the sequence (x_n)_{n∈ℕ} is bounded, see (28), it follows from (14) that

‖x_m − x_l‖^s ≤ 2^{s−1} (‖x_m − x_z‖^s + ‖x_z − x_l‖^s) ≲ ∆_p(x_z, x_m) + ∆_p(x_z, x_l).

The Three Points Identity (12) now implies that

(39) ‖x_m − x_l‖^s ≲ β_{m,z} + β_{l,z} + f(z, m, l),

with

β_{m,z} := ∆_p(x^+, x_m) − ∆_p(x^+, x_z)

and

f(z, m, l) := |⟨J_p(x_z) − J_p(x_m), x_z − x^+⟩| + |⟨J_p(x_z) − J_p(x_l), x_z − x^+⟩|.

By the monotonicity (26), we conclude that ∆_p(x^+, x_n) → γ ≥ 0 as n → ∞. Thus, β_{m,z} and β_{l,z} converge to zero as m → ∞ (which forces z → ∞ and l → ∞). Further, using z_n* ∈ ∂Φ(x_n), it follows from definition (17) that

f(z, m, l) ≤ Σ_{n=m}^{l−1} |⟨J_p(x_{n+1}) − J_p(x_n), x_z − x^+⟩| ≤ Σ_{n=m}^{l−1} λ_n |⟨A_n* J_r(b_n) + α z_n*, x_z − x^+⟩|


≤ Σ_{n=m}^{l−1} λ_n (‖b_n‖^{r−1} ‖A_n(x_z − x^+)‖ + α ‖z_n*‖ ‖x_z − x^+‖).

As the sequence (x_n)_{n∈ℕ} is bounded, it follows from Assumption 3 that (z_n*)_{n∈ℕ} is bounded too, and since 0 ≤ α ≲ ‖b_n‖^r,

α ‖z_n*‖ ‖x_z − x^+‖ ≲ ‖b_n‖^r.

From Assumption 1(c),

‖A_n(x_z − x^+)‖ ≤ ‖A_n(x^+ − x_n)‖ + ‖A_n(x_z − x_n)‖
≤ ‖b_n‖ + ‖b_n − A_n(x^+ − x_n)‖ + ‖F(x_z) − F(x_n)‖ + ‖F(x_z) − F(x_n) − F′(x_n)(x_z − x_n)‖
≤ (η + 1)(‖b_n‖ + ‖F(x_z) − F(x_n)‖)
≤ (η + 1)(2‖b_n‖ + ‖b_z‖) ≤ 3(η + 1)‖b_n‖,

where the last step uses (38). Putting everything together, we arrive at

f(z, m, l) ≲ Σ_{n=m}^{l−1} λ_n ‖b_n‖^r ≲ Σ_{n=m}^{l−1} (∆_p(x^+, x_n) − ∆_p(x^+, x_{n+1})) = β_{m,l},

where the second ≲ uses (27). Together with (39), this implies that (x_n)_{n∈ℕ} is a Cauchy sequence. Since X is complete, it converges to some x_∞ ∈ X. Now, as λ_n ≥ λ_min,

(40) Σ_{n=0}^{∞} ‖b_n‖^{r+t} ≲ Σ_{n=0}^{∞} λ_n ‖b_n‖^r < ∞,

by (27). Since r > −t, it follows that ‖y − F(x_n)‖ = ‖b_n‖ → 0 as n → ∞, and since F is continuous, we have y = F(x_∞). If (1) has only one solution in B_ρ(x^+, ∆_p), then x_∞ = x^+.

4.2. Regularization Property. Our next step is proving a stability property, which together with the convergence result of Theorem 8 will ensure the regularization property of Theorem 11 below. To this end, we require the following assumption:

Assumption 9. Let (δ_j)_{j∈ℕ} be a zero-sequence and n̄ := lim sup_{j→∞} N(δ_j) (which can possibly be ∞). Then, for every n ≤ n̄,

lim_{j→∞} λ_n^{δ_j} =: λ_n ∈ ℝ and lim_{j→∞} α_n^{δ_j} =: α_n ∈ ℝ.

Of course, the above assumption is true, for instance, if λ_n^δ and α_n^δ depend continuously on δ. This assumption, combined with appropriate restrictions on the data space Y and on the functional Φ, ensures stability, as the next lemma shows.

Lemma 10. Assume all the hypotheses of Theorem 5 and fix a zero-sequence (δ_j)_{j∈ℕ}. Assume additionally that Assumptions 3 and 9 hold true, Y is locally uniformly smooth and Φ is continuously G-differentiable in B_ρ(x^+, ∆_p). Then the mapping dT_α defined in (16) is single-valued, λ_n ∈ [λ_min, λ_max] and α_n ∈ [0, C_4‖b_n‖^r] (see Assumption 9), and for each n ≤ n̄ it holds that

x_n^{δ_j} → x_n as j → ∞,


where the noise-free sequence (x_n)_{n∈ℕ} is the one associated with the step-size sequence (λ_n)_{n∈ℕ} and the regularization sequence (α_n)_{n∈ℕ}.

Proof. We prove the statement by induction. Assume that for a fixed number n ∈ ℕ satisfying n < n̄, it holds that

x_n^{δ_j} → x_n as j → ∞.

From (2) it follows that y^{δ_j} → y as j → ∞, and since F is a continuous function, ‖b_n^{δ_j}‖ → ‖b_n‖ as j → ∞. Consequently, λ_min^{δ_j} → λ_min, and since α_n^{δ_j} converges to α_n as j → ∞, it follows from 0 ≤ α_n^{δ_j} ≤ C_4‖b_n^{δ_j}‖^r that α_n ∈ [0, C_4‖b_n‖^r] and, accordingly, λ_max^{δ_j} → λ_max. Therefore the inequality λ_min^{δ_j} ≤ λ_n^{δ_j} ≤ λ_max^{δ_j} implies that λ_n ∈ [λ_min, λ_max], which proves that x_{n+1} is well-defined. Now, the locally uniform smoothness of Y implies that the duality mapping J_r is single-valued and continuous, and since F′ is continuous too,

F′(x_n^{δ_j})* J_r(F(x_n^{δ_j}) − y^{δ_j}) → F′(x_n)* J_r(F(x_n) − y)

as j → ∞. Since Φ is continuously G-differentiable, ∂Φ coincides with the G-derivative ∇Φ and is therefore single-valued and continuous. Then ∇Φ(x_n^{δ_j}) → ∇Φ(x_n), and

Ψ_n^{δ_j} = dT_α(x_n^{δ_j}) = F′(x_n^{δ_j})* J_r(F(x_n^{δ_j}) − y^{δ_j}) + α_n^{δ_j} ∇Φ(x_n^{δ_j})

converges to

Ψ_n := dT_α(x_n) = F′(x_n)* J_r(F(x_n) − y) + α_n ∇Φ(x_n)

as j → ∞. Finally, the continuity of J*_{p*} implies that

x_{n+1}^{δ_j} = J*_{p*}(J_p(x_n^{δ_j}) − λ_n^{δ_j} Ψ_n^{δ_j}) → J*_{p*}(J_p(x_n) − λ_n Ψ_n) = x_{n+1}

as j → ∞.

From Theorem 8 and Lemma 10 we obtain the regularization property.

Theorem 11 (Regularization Property). Assume all the hypotheses of the last lemma and fix a zero-sequence (δ_j)_{j∈ℕ}. Then the sequence (x_{N(δ_j)}^{δ_j})_{j∈ℕ} converges to a solution of (1) in B_ρ(x^+, ∆_p). In particular, if there is a unique solution of this equation in B_ρ(x^+, ∆_p), then

lim_{j→∞} x_{N(δ_j)}^{δ_j} = x^+.

Proof. Assume first that the sequence (N(δ_j))_{j∈ℕ} is bounded. In this case, n̄ < ∞, and since (N(δ_j))_{j∈ℕ} ⊂ ℕ, we conclude that there exists a number J ∈ ℕ such that N(δ_j) ≤ n̄ for all j ≥ J. Thus, the sequence (x_{N(δ_j)}^{δ_j})_{j≥J} splits into at most n̄ subsequences of the form (x_n^{δ_{j_k}})_{k∈ℕ}, where N(δ_{j_k}) = n ≤ n̄. From Lemma 10, x_n^{δ_{j_k}} → x_n as k → ∞, and therefore

lim_{k→∞} x_{N(δ_{j_k})}^{δ_{j_k}} = lim_{k→∞} x_n^{δ_{j_k}} = x_n.

Now, x_n is a solution of (1) because, from (2) and (5),

‖y − F(x_n)‖ = lim_{k→∞} ‖y − F(x_n^{δ_{j_k}})‖


≤ lim_{k→∞} (‖y − y^{δ_{j_k}}‖ + ‖y^{δ_{j_k}} − F(x_{N(δ_{j_k})}^{δ_{j_k}})‖) ≤ lim_{k→∞} (1 + τ) δ_{j_k} = 0.

But, xn+k = xn for all k ∈ N whenever xn is a solution of (1) , because in this case

‖bn‖ = 0 and consequently, λmax = 0 in (22) . Thus, all the subsequences of(xδjN(δj)

)j∈N

converge to the same solution.Suppose now that N (δj) → ∞ as j → ∞ and let ε > 0 be given. As the Bregman

distance is a continuous function in both arguments, there exists γ = γ (ε) > 0 suchthat

(41) ∆p

(x, xδjn

)<εs

Cwhenever

∥∥x− xδjn ∥∥ < γ,

where C > 0 is the constant from (14). From Theorem 8, there exists M ∈ N and asolution x∞ of (1) such that

‖x∞ − xn‖ <γ

2for all n ≥M.

As N (δj)→∞, there exists J1 ∈ N such that N (δj) ≥M for all j ≥ J1. Finally, fromLemma 10, there exists J2 ≥ J1 such that∥∥∥xM − xδjM∥∥∥ < γ

2for all j ≥ J2.

Thus, for any j ≥ J2,∥∥∥x∞ − xδjM∥∥∥ ≤ ‖x∞ − xM‖+∥∥∥xM − xδjM∥∥∥ < γ

and consequently,∥∥∥x∞ − xδjN(δj)

∥∥∥s (14)

≤ C∆p

(x∞, x

δjN(δj)

) (26)

≤ C∆p

(x∞, x

δjM

) (41)< εs

5. Numerical Experiments

In this section we test our method on the non-linear and severely ill-posed inverse problem of EIT (Electrical Impedance Tomography), introduced by Calderón [5]. We focus only on the main ideas and refer the interested reader to the survey article [4] and the closely related work [6] for more details.

Let Ω ⊂ ℝ² be a bounded and simply connected Lipschitz domain. Applying electric currents g : ∂Ω → ℝ on its boundary⁶ and reading the resulting voltages f : ∂Ω → ℝ on its boundary as well, we aim to reconstruct the electric conductivity γ : Ω → ℝ in the whole of Ω.

⁶The symbol ∂ was adopted in the previous sections to denote the subdifferential of a convex function. In this section, however, we also use it to represent the boundary of a set. Its meaning will be made clear by the context.

Once the electric current g ∈ L²_♦(∂Ω) := {v ∈ L²(∂Ω) : ∫_∂Ω v = 0} is fixed, there exists a unique electric potential u ∈ H¹_♦(Ω) := {v ∈ H¹(Ω) : ∫_∂Ω v = 0} satisfying the variational equation

(42) ∫_Ω γ ∇u·∇ϕ = ∫_∂Ω g ϕ for all ϕ ∈ H¹_♦(Ω),


provided γ ∈ L^∞_+(Ω) := {v ∈ L^∞(Ω) : v ≥ C a.e.}, where C is a positive constant. Further, since u ∈ H¹_♦(Ω), its trace belongs to L²_♦(∂Ω) and it is just the voltage on the boundary: f = u|_∂Ω.⁷

⁷The conditions ∫_∂Ω g = 0 and ∫_∂Ω f = 0 are sufficient to prove existence and uniqueness of solutions, respectively. They are physically interpreted as the law of conservation of charge and the grounding of the potential.

For each fixed conductivity γ ∈ L^∞_+(Ω), we define a bounded linear operator Λ_γ : L²_♦(∂Ω) → L²_♦(∂Ω), g ↦ f, which we call the Neumann-to-Dirichlet map (in short, NtD). The forward operator associated with EIT is now defined by F : L^∞_+(Ω) ⊂ L^∞(Ω) → L(L²_♦(∂Ω), L²_♦(∂Ω)),

F(γ) := Λ_γ.

Finding γ in the above equation for a given Λ_γ is the EIT inverse problem we want to solve. Astala and Päivärinta proved in [1] the uniqueness of solutions of this inverse problem provided Λ_γ operates between H^{−1/2}_♦(∂Ω) and H^{1/2}_♦(∂Ω).

In practical situations, the NtD map is not completely available. Only partial data can be observed, and the best one can do is to apply d ∈ ℕ currents g_j ∈ L²_♦(∂Ω), j = 1, ..., d, and then record the resulting voltages f_j = Λ_γ g_j. We thus fix the vector G := (g_1, ..., g_d) ∈ (L²_♦(∂Ω))^d and introduce the operator F_G : L^∞_+(Ω) ⊂ L^∞(Ω) → (L²_♦(∂Ω))^d, γ ↦ (Λ_γ g_1, ..., Λ_γ g_d), which is F-differentiable⁸, see e.g. [16].

⁸Equipped with an inner product defined in a very natural way, induced by the inner product in L²(∂Ω), the space (L²_♦(∂Ω))^d is a Hilbert space.

Since an analytical solution of (42) is not available in general, the inverse problem needs to be solved with the help of a computer. For this reason, we construct a Delaunay triangulation of Ω, T = {T_i : i = 1, ..., M}, with M = 1684 triangles (see the third picture in Figure 1 below), and approximate γ by piecewise constant conductivities: define the finite-dimensional space V := span{χ_{T_1}, ..., χ_{T_M}} ⊂ L^∞(Ω) and search for conductivities in V, which means that our reconstructions always have the form Σ_{i=1}^M θ_i χ_{T_i}, with (θ_1, ..., θ_M) ∈ ℝ^M. Unfortunately, the Banach space L^∞(Ω) is not regular enough to be included in the convergence analysis of the last section (it is not locally uniformly smooth, for instance). We thus fix p ∈ (1, ∞) and equip the space V with the L^p-norm: V_p := (V, ‖·‖_{L^p(Ω)}). The spaces L^p(Ω), 1 < p < ∞, are p∧2-smooth and p∨2-convex, and the duality mapping J_p : L^p(Ω) → L^{p*}(Ω) is computed pointwise by

(43) J_p(f) = |f|^{p−1} sgn(f).

With this new framework, our forward operator reads

(44) F_G : V_+ ⊂ V_p → (L²(∂Ω))^d, γ ↦ (Λ_γ g_1, ..., Λ_γ g_d),

where V_+ := L^∞_+(Ω) ∩ V. Since all norms are equivalent in finite-dimensional spaces, F_G remains F-differentiable. We stress the fact that this restriction of the solution space X is reasonable, because the necessity of using a computer always forces the introduction of a finite-dimensional space. Moreover, it is possible to recover only finitely many degrees of freedom of the conductivity from finitely many measurements.
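For piecewise constant conductivities, this duality mapping is cheap to evaluate: representing f = Σ_i θ_i χ_{T_i} by its coefficient vector, (43) acts coefficientwise. A small illustrative Python sketch (the triangle areas here are arbitrary sample values, not data from the paper):

    import numpy as np

    def Jp_piecewise_constant(theta, p):
        # Pointwise duality mapping (43): a piecewise constant function is mapped
        # to the piecewise constant function with coefficients |theta_i|^(p-1) sgn(theta_i).
        return np.abs(theta)**(p - 1) * np.sign(theta)

    def lp_norm_piecewise_constant(theta, areas, p):
        # ||f||_{L^p}^p = sum_i |theta_i|^p |T_i| for piecewise constants.
        return (np.sum(np.abs(theta)**p * areas))**(1.0 / p)

    theta = np.array([0.1, 1.0, -0.3])
    areas = np.array([0.2, 0.5, 0.3])    # sample triangle areas |T_i|
    print(Jp_piecewise_constant(theta, 1.01))
    print(lp_norm_piecewise_constant(theta, areas, 1.01))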

The above framework guarantees that the forward operator F_G in (44) satisfies the tangential cone condition, Assumption 1(c), at least in a small ball around a solution, see [16, Theorem 3.4]. Further, its F-derivative, F′_G : int(V_+) → L(V_p, (L²(∂Ω))^d), satisfies


F′_G(γ)h = (w_1|_∂Ω, ..., w_d|_∂Ω), where w_j ∈ H¹_♦(Ω) is the unique solution of

(45) ∫_Ω γ ∇w_j·∇ϕ = −∫_Ω h ∇u_j·∇ϕ for all ϕ ∈ H¹_♦(Ω),

with u_j solving (42) for g = g_j.

Intending to compute the adjoint operator F′_G(γ)* : (L²(∂Ω))^d → V_{p*}, we fix the vectors z := (z_1, ..., z_d) ∈ (L²(∂Ω))^d and h ∈ V_p and apply the following procedure: for each j = 1, ..., d, let u_j and ψ_{z_j} be the unique solutions of (42) for g = g_j and g = z_j, respectively. Then,

⟨F′_G(γ)*z, h⟩ = ⟨z, F′_G(γ)h⟩ = Σ_{j=1}^d ⟨z_j, w_j|_∂Ω⟩ = Σ_{j=1}^d ∫_∂Ω z_j w_j|_∂Ω = Σ_{j=1}^d ∫_Ω γ ∇ψ_{z_j}·∇w_j
= Σ_{j=1}^d ∫_Ω γ ∇w_j·∇ψ_{z_j} = −Σ_{j=1}^d ∫_Ω h ∇u_j·∇ψ_{z_j} = −⟨Σ_{j=1}^d ∇u_j·∇ψ_{z_j}, h⟩,

where the fourth equality uses (42) and the sixth uses (45). Hence,

(46) F′_G(γ)*z = −Σ_{j=1}^d ∇u_j·∇ψ_{z_j}.
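In code, (46) means the adjoint costs d additional forward solves (one per datum z_j), after which only gradients are multiplied. The Python sketch below only outlines this structure; solve_neumann and gradient_on_triangles are hypothetical helpers standing in for a FEM solver of (42) and for the elementwise gradient of a FEM solution, and are not part of the paper.

    import numpy as np

    def adjoint_apply(gamma, z_list, g_list, solve_neumann, gradient_on_triangles):
        # Evaluates F'_G(gamma)^* z via formula (46):
        #   F'_G(gamma)^* z = - sum_j grad(u_j) . grad(psi_{z_j}),
        # where u_j solves (42) with current g_j and psi_{z_j} solves (42) with current z_j.
        # solve_neumann(gamma, g) -> FEM solution of (42)    [hypothetical helper]
        # gradient_on_triangles(u) -> array of shape (M, 2)  [hypothetical helper]
        result = None
        for g_j, z_j in zip(g_list, z_list):
            grad_u = gradient_on_triangles(solve_neumann(gamma, g_j))
            grad_psi = gradient_on_triangles(solve_neumann(gamma, z_j))
            term = np.sum(grad_u * grad_psi, axis=1)   # dot product per triangle
            result = term if result is None else result + term
        return -result   # one value per triangle, i.e. an element of V_{p*}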

In our tests, the set Ω is defined as the unit square (0, 1) × (0, 1) and we supply the current set G with d = 12 independent currents: identifying the faces of Ω with the numbers m = 0, 1, 2, 3, we apply the currents

g_{3m+k}(x) = cos(2kπx) on the face m, and g_{3m+k}(x) = 0 elsewhere on ∂Ω,

for k = 1, 2, 3. The exact solution γ^+ consists of a constant background conductivity γ̄ ≡ 0.1 and an inclusion B ⊂ Ω with conductivity 1. Denoting the centroid of T_i by ξ_i, we approximate γ^+ in the space V_p by

γ^+ = γ̄ + Σ_{i=1}^M σ_i χ_{T_i}, with σ_i = 0.9 if ξ_i ∈ B and σ_i = 0 otherwise.

Figure 1 illustrates, in the images on the left and in the middle, two examples of γ^+, where B is a ball and a four-squares inclusion, respectively. The rightmost picture in this figure shows the triangulation T used to reconstruct the conductivities in our experiments.

The data

(47) y := (Λ_{γ^+} g_1, ..., Λ_{γ^+} g_d)

corresponding to the exact solution γ^+ have been computed using the finite element method (FEM). The problems (42) and (45) have been solved by FEM as well, but using a much coarser discretization mesh than the one used to generate the data. Further, we assume the background conductivity is known and start with the initial iterate γ_0 = γ̄ ≡ 0.1.


Figure 1. Left and middle: two examples of searched-for conductivities with sparsely distributed inclusions. Right: the triangulation T used to reconstruct the conductivities.

Observe that in this case the vector γ^+ − γ_0 = Σ_{i=1}^M σ_i χ_{T_i} has a sparse representation in the fixed basis of V_p, because the sought coefficients σ_i are non-zero only in B.⁹ The relative L^p-error of the n-th iterate γ_n,

E_n^p := 100 ‖γ_n − γ^+‖_{L^p(Ω)} / ‖γ^+‖_{L^p(Ω)},

is utilized to compare the quality of the reconstructions.

In order to avoid undesirable instability effects, which can arise from an unfavorable selection of the geometry of the mesh, we propose employing a strategy using a weight function ω : Ω → ℝ to define the weighted space L^p_ω(Ω) := {f : Ω → ℝ : ∫_Ω |f|^p ω < ∞}. The main results have been collected from [24] and [18, Subsection 5.1.2]; here we only outline the most important ideas. Note that ‖f‖_{L^p_ω(Ω)} := ‖f ω^{1/p}‖_{L^p(Ω)} defines a norm in L^p_ω(Ω) if ω > 0, and this norm becomes equivalent to the regular L^p-norm whenever ω_min ≤ ω ≤ ω_max, with ω_min and ω_max being positive constants. In this case, the spaces L^p(Ω) and L^p_ω(Ω) become isomorphic and L^p_ω(Ω) inherits all the properties of L^p(Ω). In particular, L^p_ω(Ω) is a p∨2-convex (thus s = p∨2 in Theorem 2) and p∧2-smooth Banach space, (L^p_ω)* = L^{p*}_ω, and the duality mapping J_p : L^p_ω(Ω) → L^{p*}_ω(Ω) is exactly the same as in (43). Following ideas from [24], we further choose

ω := Σ_{i=1}^M β_i χ_{T_i} with β_i := ‖F′_G(γ_0) χ_{T_i}‖_{(L²(∂Ω))^d} / |T_i|,

where |T_i| is the area of the triangle T_i. Finally, we alter the topology of the space V_p, changing the norm from L^p(Ω) to L^p_ω(Ω),¹⁰ i.e., from now on, V_p = (V, ‖·‖_{L^p_ω(Ω)}).
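A sketch of this weighting in Python (illustrative only; d_columns is assumed to be a list of vectors of discretized boundary data representing F′_G(γ_0)χ_{T_i}, whose Euclidean norm stands in for the (L²(∂Ω))^d-norm):

    import numpy as np

    def mesh_weights(areas, d_columns):
        # beta_i = ||F'_G(gamma_0) chi_{T_i}|| / |T_i|, one weight per triangle.
        return np.array([np.linalg.norm(col) for col in d_columns]) / areas

    def weighted_lp_norm(theta, areas, weights, p):
        # ||f||_{L^p_omega} = ||f * omega^(1/p)||_{L^p} for piecewise constants,
        # i.e. (sum_i |theta_i|^p * beta_i * |T_i|)^(1/p).
        return (np.sum(np.abs(theta)**p * weights * areas))**(1.0 / p)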

For our first experiment, we observe the exact conductivity γ^+ shown in the leftmost picture of Figure 1. As Y = (L²(∂Ω))^d is a Hilbert space, the normalized duality mapping is the identity operator; as a result, we fix r = 2 and consequently J_r(f) = f for all f ∈ Y. Because γ^+ − γ_0 has a sparse representation in the basis {χ_{T_1}, ..., χ_{T_M}}, we choose the penalization functional Φ(γ) = (1/p)‖γ − γ_0‖^p, see Remark 4.

⁹Jin, Khan and Maass proposed a different method to solve the EIT problem in a similar situation, see [11].

¹⁰The introduction of the weight function ω slightly changes the evaluation of the adjoint operator in (46). From [18], we see that F′_G(γ)*z = −(1/ω) Σ_{j=1}^d ∇u_j·∇ψ_{z_j}.


[Figure 2 consists of two panels, p = 2 (left) and p = 1.01 (right), each plotting the iteration error (vertical axis, 0.4 to 0.8) against the iteration index (0 to 5000).]

Figure 2. Experiments performed without addition of artificial noise. Relative iteration errors E_n^2 (upper curves) and E_n^1 (lower curves) versus the iteration index n, in different spaces and with different regularization parameters α. Red line: α = 0; blue line: α = 0.1; black line: α = 0.9.

Since the value of Φ(γ^+) is known in our experiments, but unknown in practical situations, this quantity has been overestimated with an error of 50%, that is, we fix C_1 = 1.5 Φ(γ^+), which we consider a reasonable guess. Next, we compare the reconstructions with respect to the different spaces X = V_2 (p = 2) and X = V_{1.01} (p = 1.01). At the same time, we test the convergence results from Theorem 8 and examine the effect of the regularization parameter α_n. From the polarization identity (11), it follows that the constant C_{p*,s*} in (15) can be set to 1/2 in Hilbert spaces, and since C_2 < C_{p*,s*}^{1−s}, we fix C_2 = 1.9 for the case p = 2. As the constant C_2 depends on both C_{p*,s*} and C_{ρ,x^+} in the space V_{1.01}, it is much harder to estimate. We try C_2 = 0.001 for the case p = 1.01, which worked satisfactorily in our experiments. Further, the estimate η = 0.5 has been utilized and the remaining parameters of Algorithm 1 have been chosen as C_0 = 1 − η, λ_n = λ_max and

α_n := α (C_0/C_1) ‖y − F_G(γ_n)‖^r,

with α ∈ [0, 1). The purpose of the parameter α is to regulate the level on which the effect of the penalization functional Φ is incorporated into the reconstructions. The larger α is, the bigger is the influence of Φ on the iterates. In view of (22), however, one can see that large values of α lead to small step-sizes, which can possibly slow down the convergence of the method. Figure 2 confronts the errors E_n^1 and E_n^2 with the iteration index n for n = 0, ..., 5000, using three different regularization levels: α = 0 (red lines), α = 0.1 (blue lines) and α = 0.9 (black lines), where no artificial noise is added (δ = 0).¹¹

It is clear that different values of α do not affect the quality of the reconstructions in the Hilbert space V_2 very strongly. Moreover, the sequence (E_n^1)_{n∈ℕ} does not decrease in this space, which means that the errors in the L¹-norm are not improved. However, this is not the case in V_{1.01}, where α has an important impact on the reconstructions. Note that in this Banach space, the choice α = 0 produces a fast decrease of the sequence (E_n^2)_{n∈ℕ}, although large errors in the L¹-norm are observed after a small initial improvement. On the other hand, both sequences exhibit a monotonically decreasing behavior over a relatively long period if the value of α is increased.

¹¹Of course, some sources of noise, such as discretization noise and finite computer precision, cannot be avoided and are always present.

Complementary to Figure 2, we display in Figure 3 a linear interpolation of the iterate n = 5000, using the following configurations:


[Left: E_N^2 = 61.40%, E_N^1 = 42.64%. Middle: E_N^2 = 49.99%, E_N^1 = 50.08%. Right: E_N^2 = 67.57%, E_N^1 = 34.10%.]

Figure 3. Linear interpolation of the iterate γ_5000 in the noiseless situation, comparing different spaces and regularization levels. Left: p = 2 and α = 0. Middle: p = 1.01 and α = 0. Right: p = 1.01 and α = 0.9.

leftmost picture: p = 2 and α = 0 (this configuration corresponds to the red line in the left picture of Figure 2); middle picture: p = 1.01 and α = 0 (the red line in the right picture of Figure 2); and rightmost picture: p = 1.01 and α = 0.9 (the black line in the right picture of Figure 2). The pictures are in different scales and, below each of them, the L¹ and L² relative errors E_N^1 and E_N^2 for N = 5000 are shown. As expected, the Hilbert space V_2 has produced, in the first picture, an oversmoothed reconstruction with a very oscillatory background. The picture displayed in the middle exhibits a thin and high inclusion, as predicted for reconstructions in Banach spaces V_p with p close to 1; however, the use of a very small regularization parameter (α = 0) prevents the effect of the penalization functional Φ from being assimilated by the iterates, leading to a large oscillation in the background and a consequently large error in the L¹-norm. Finally, in the rightmost picture, the large value of α fully incorporates the effect of the penalization functional Φ into the reconstruction, which, combined with the small value of p, promotes a sparse reconstruction of γ_n − γ_0. As a result, a small error in the L¹-norm is observed and a very stable background is produced. The price to be paid is a large error in the L²-norm and a slow convergence, due to the resulting small step-sizes, see (22).

Aiming the examination of the performance of our method when data is corrupted bynoise, we contaminate the simulated data y in (47) with artificially generated randomnoise, with a relative noise level δ > 0,

(48) $y^\delta = y + \delta \cdot \mathrm{noi} \cdot \|y\|_{(L^2(\partial\Omega))^d}$,

where $\mathrm{noi}$ is a uniformly distributed random variable such that $\|\mathrm{noi}\|_{(L^2(\partial\Omega))^d} = 1$.
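In an implementation, this amounts to drawing a random vector, normalizing it in the discrete L2 norm and scaling it by δ‖y‖. The following sketch illustrates the construction under hypothetical assumptions: the data y is a coefficient vector and w holds the quadrature weights of the discrete L2 norm.

```python
import numpy as np

def l2_norm(v, w):
    """Discrete L^2 norm induced by quadrature weights w."""
    return np.sqrt((w * v**2).sum())

def add_uniform_noise(y, w, delta, rng):
    """Data perturbation as in (48): y^delta = y + delta * noi * ||y||,
    with noi uniformly distributed and normalized to ||noi|| = 1."""
    noi = rng.uniform(-1.0, 1.0, size=y.shape)
    noi /= l2_norm(noi, w)                  # enforce unit L^2 norm
    return y + delta * noi * l2_norm(y, w)

# Hypothetical usage with delta = 0.1%:
# y_delta = add_uniform_noise(y, w, delta=1e-3, rng=np.random.default_rng(0))
```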

Next, we set τ = 1.2 (1 + η)/(1 − η), α = 0.5 and δ = 0.1%, and fix all the remaining parameters as in the last experiment, except for C0, which needs to be replaced by

$$C_0 = 1 - \eta - \frac{1+\eta}{\tau} \approx 0.17\,(1-\eta).$$
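The numerical value follows directly from the chosen τ: substituting τ = 1.2 (1 + η)/(1 − η) gives

$$\frac{1+\eta}{\tau} = \frac{1-\eta}{1.2}, \qquad \text{hence} \qquad C_0 = (1-\eta)\Bigl(1 - \frac{1}{1.2}\Bigr) = \frac{1-\eta}{6} \approx 0.17\,(1-\eta).$$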

Finally, we try an alternative step-size setting λn = λmax with C5 = 0.3C2 in (34). In our first experiment with noisy data, we investigate the influence of different spaces on the reconstructions and therefore compare again the spaces X = Vp for p = 2 and p = 1.01. The results are exhibited in Figure 4, where each row represents a different sought-for conductivity. The first column displays the exact solutions γ+, while the second and the third show the reconstructions for p = 2 and p = 1.01, respectively. After the third column, a colorbar related to the reconstructions is displayed on each row (the exact solutions are in a different color scale).


Figure 4. Comparison of reconstructions in different spaces with the same noise level δ = 0.1%. First column: sought-for conductivities ("Exact"). Second column: reconstructions in the Hilbert space V2. Third column: reconstructions in the Banach space V1.01.

In all cases, the inclusions were located, but a clear improvement is perceived when the Hilbert space V2 is replaced by the more suitable Banach space V1.01. Observe that the values of the conductivities are much closer to the real ones if the space X = V1.01 is considered. Moreover, the shapes of the inclusions are more realistic in this space (see the last row).

In the last experiment, we test the efficiency of Algorithm 1 when dealing with impulsive noise. In contrast to uniformly distributed noise, this kind of perturbation has a very sparse distribution, and we can take advantage of this fact by equipping the data-space Y with a different norm. In Figure 5 we present, side by side, the two different kinds of noise used in our experiments. The picture on the left shows uniformly distributed noise and the one on the right exhibits an example of impulsive noise, namely uniform noise superimposed with outliers¹², present in about 5% of the points. Both types of noise are scaled in order to have the same L2-norm.

¹² Outliers are highly inconsistent data points and may arise from procedural measurement errors.
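An impulsive perturbation of this type can be generated, for instance, as in the following sketch. All names are hypothetical and the outlier amplitude is an arbitrary illustrative choice; as in (48), the perturbation is rescaled to a prescribed relative L2 norm.

```python
import numpy as np

def add_impulsive_noise(y, w, delta, rng, outlier_fraction=0.05, outlier_scale=10.0):
    """Uniform noise superimposed with sparse outliers (about 5% of the points),
    rescaled so that the perturbation has the prescribed relative L^2 norm."""
    noi = rng.uniform(-1.0, 1.0, size=y.shape)
    mask = rng.random(y.shape) < outlier_fraction            # sparse outlier positions
    noi[mask] += outlier_scale * rng.choice([-1.0, 1.0], size=int(mask.sum()))
    noi /= np.sqrt((w * noi**2).sum())                       # unit discrete L^2 norm
    return y + delta * noi * np.sqrt((w * y**2).sum())
```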


Figure 5. Different kinds of noise. Left: uniformly distributed noise. Right: impulsive noise. Both noises have the same L2-norm.

Although the L2-norm is the same for both noises shown in Figure 5, the L1.01-norm of the second kind of noise amounts to only 25% of that of the first one, which means that, in this norm, the data corrupted by impulsive noise furnishes much more information and is more precise than the data contaminated with uniform noise. For this reason, we change the norm in the data-space Y to (Lr(∂Ω))d for some r > 1 close to 1 and consider the new operator

$$F_G \colon V_+ \subset V_p \to (L^r(\partial\Omega))^d, \qquad 1 < r \le 2.$$
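The change of norm is harmless because of the standard continuous embedding of L2(∂Ω) into Lr(∂Ω) on the bounded boundary ∂Ω, which follows from Hölder's inequality:

$$\|w\|_{L^r(\partial\Omega)} \le |\partial\Omega|^{\frac{1}{r}-\frac{1}{2}}\, \|w\|_{L^2(\partial\Omega)}, \qquad 1 < r \le 2,$$

applied componentwise in the product space $(L^r(\partial\Omega))^d$.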

As $\|w\|_{(L^r(\partial\Omega))^d} \lesssim \|w\|_{(L^2(\partial\Omega))^d}$ for every w ∈ Y, the new operator $F_G$ is still F-differentiable and its derivative remains the same. To perform our experiment under impulsive noise, we have chosen the exact solution γ+ with a four-squares inclusion, displayed in the middle of Figure 1. The noises have been scaled such that the resulting relative L2-noise in (48) is δ = 0.3% in both cases. The results are illustrated in Figure 6, where the first row shows the reconstructions when uniform noise is considered and the second one displays the results when the data is contaminated with impulsive noise. The first and second columns represent the Hilbert space Y = (L2(∂Ω))d and the Banach space Y = (L1.01(∂Ω))d, respectively. Below each picture, the number of performed iterations until termination, N = N(δ), and the relative error in the L2-norm, $E_N^2$, are highlighted. As is promptly seen, the reconstructions do not show a relevant variation when uniform noise is considered, but under the effect of impulsive noise, the space Y = (L1.01(∂Ω))d provides a much better reconstruction, showing a smaller L2-error and exhibiting inclusions with more precise locations, values and shapes. However, a disadvantage becomes evident: a much larger number of iterations needs to be performed in this situation.

Figure 6. Reconstructed conductivities obtained under the use of uniform and impulsive noise. The columns represent different spaces (left: r = 2; right: r = 1.01) and the rows different types of noise. Uniform noise: N = 23 with $E_N^2 = 83.39\%$ for r = 2, and N = 24 with $E_N^2 = 83.31\%$ for r = 1.01. Impulsive noise: N = 35 with $E_N^2 = 83.53\%$ for r = 2, and N = 1902 with $E_N^2 = 76.82\%$ for r = 1.01.

References

[1] Kari Astala and Lassi Päivärinta. Calderón's inverse conductivity problem in the plane. Ann. of Math. (2), 163(1):265–299, 2006.

[2] A. B. Bakushinskiĭ. On a convergence problem of the iterative-regularized Gauss-Newton method. Zh. Vychisl. Mat. i Mat. Fiz., 32(9):1503–1509, 1992.

[3] Thomas Bonesky, Kamil S. Kazimierski, Peter Maass, Frank Schöpfer, and Thomas Schuster. Minimization of Tikhonov functionals in Banach spaces. Abstr. Appl. Anal., Art. ID 192679, 19 pp., 2008.

[4] Liliana Borcea. Electrical impedance tomography. Inverse Problems, 18(6):R99–R136, 2002.



[5] Alberto P. Calderón. On an inverse boundary value problem. In Seminar on Numerical Analysis and its Applications to Continuum Physics (Rio de Janeiro, 1980), pages 65–73. Soc. Brasil. Mat., Rio de Janeiro, 1980.

[6] Margaret Cheney, David Isaacson, and Jonathan C. Newell. Electrical impedance tomography. SIAM Rev., 41(1):85–101, 1999.

[7] Charles Chidume. Geometric properties of Banach spaces and nonlinear iterations, volume 1965 of Lecture Notes in Mathematics. Springer-Verlag London, Ltd., London, 2009.

[8] Ioana Cioranescu. Geometry of Banach spaces, duality mappings and nonlinear problems, volume 62 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1990.

[9] Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57(11):1413–1457, 2004.

[10] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.

[11] Bangti Jin, Taufiquar Khan, and Peter Maass. A reconstruction algorithm for electrical impedance tomography based on sparsity regularization. Internat. J. Numer. Methods Engrg., 89(3):337–353, 2012.

[12] Bangti Jin and Peter Maass. Sparsity regularization for parameter identification problems. Inverse Problems, 28(12):123001, 70, 2012.

[13] Qinian Jin. Inexact Newton-Landweber iteration for solving nonlinear inverse problems in Banach spaces. Inverse Problems, 28(6):065002, 15, 2012.

[14] Barbara Kaltenbacher, Andreas Neubauer, and Otmar Scherzer. Iterative regularization methods for nonlinear ill-posed problems, volume 6 of Radon Series on Computational and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2008.

[15] Barbara Kaltenbacher and Ivan Tomba. Convergence rates for an iteratively regularized Newton-Landweber iteration in Banach space. Inverse Problems, 29(2):025010, 18, 2013.


[16] Armin Lechleiter and Andreas Rieder. Newton regularizations for impedance tomography: convergence by local injectivity. Inverse Problems, 24(6):065009, 18, 2008.

[17] Armin Lechleiter and Andreas Rieder. Towards a general convergence theory for inexact Newton regularizations. Numer. Math., 114(3):521–548, 2010.

[18] Fábio Margotti. On inexact Newton methods for inverse problems in Banach spaces. PhD Thesis, Karlsruher Institut für Technologie, Karlsruhe, 2015.

[19] Fábio Margotti and Andreas Rieder. An inexact Newton regularization in Banach spaces based on the nonstationary iterated Tikhonov method. Journal of Inverse and Ill-Posed Problems, 23(4):373–392, 2014.

[20] A. Neubauer and O. Scherzer. A convergence rate result for a steepest descent method and a minimal error method for the solution of nonlinear ill-posed problems. Z. Anal. Anwendungen, 14(2):369–377, 1995.

[21] F. Schöpfer, A. K. Louis, and T. Schuster. Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse Problems, 22(1):311–329, 2006.

[22] Thomas Schuster, Barbara Kaltenbacher, Bernd Hofmann, and Kamil S. Kazimierski. Regularization methods in Banach spaces, volume 10 of Radon Series on Computational and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2012.

[23] A. N. Tikhonov. On the solution of incorrectly put problems and the regularisation method. In Outlines Joint Sympos. Partial Differential Equations (Novosibirsk, 1963), pages 261–265. Acad. Sci. USSR Siberian Branch, Moscow, 1963.

[24] Robert Winkler and Andreas Rieder. Model-aware Newton-type inversion scheme for electrical impedance tomography. Inverse Problems, 31(4):045009, 2015.

[25] Zong Ben Xu and G. F. Roach. Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces. J. Math. Anal. Appl., 157(1):189–210, 1991.

Department of Mathematics, Federal University of Santa Catarina
E-mail address: [email protected]