University of Waterloo, AMATH 353 (links.uwaterloo.ca/amath353docs/set10.pdf)
Lecture 28
Solution of Heat Equation via Fourier Transforms and Convolution Theorem
Relevant sections of text: 10.4.2, 10.4.3
In the previous lecture, we derived the unique solution to the heat/diffusion equation on R,
∂u/∂t = k ∂²u/∂x²,   −∞ < x < ∞,   (1)
with initial condition
u(x, 0) = f(x). (2)
The result was
u(x, t) = ∫_{−∞}^{∞} f(s) ht(x − s) ds,   t > 0,   (3)
where the “heat kernel” function ht(x) is given by
ht(x) = (1/√(4πkt)) e^{−x²/(4kt)},   t > 0.   (4)
This is a fundamental result – it states that the solution u(x, t) is the spatial convolution of the
functions f(x) and ht(x).
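As a quick numerical illustration (not part of the original notes), the convolution formula (3) can be checked against a case where the integral has a closed form. The box-shaped initial profile below is our own choice of test function; for it, the convolution reduces to a difference of error functions. A minimal Python sketch:

```python
import numpy as np
from math import erf

k, t = 1.0, 0.5
s = np.linspace(-20.0, 20.0, 40001)   # quadrature grid for the s-integral
ds = s[1] - s[0]

def f(s):
    # hypothetical initial profile: temperature 1 on [-1, 1], 0 elsewhere
    return np.where(np.abs(s) <= 1.0, 1.0, 0.0)

def h(x):
    # heat kernel, Eq. (4): h_t(x) = exp(-x^2/(4kt)) / sqrt(4 pi k t)
    return np.exp(-x**2 / (4.0*k*t)) / np.sqrt(4.0*np.pi*k*t)

def u(x):
    # Eq. (3): u(x, t) = integral of f(s) h_t(x - s) ds, done by Riemann sum
    return ds * np.sum(f(s) * h(x - s))

def u_exact(x):
    # for this box profile the convolution reduces to a difference of erf's
    a = np.sqrt(4.0*k*t)
    return 0.5*(erf((x + 1.0)/a) - erf((x - 1.0)/a))

for x in [0.0, 0.7, 2.0]:
    assert abs(u(x) - u_exact(x)) < 1e-2
print("convolution formula agrees with the closed-form erf solution")
```

The grid extent and spacing here are arbitrary choices; they only need to be wide enough that the Gaussian tail of the kernel is negligible at the endpoints.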
The derivation given in the last lecture may seem a little contrived, since we took the integral form
of the solution u(x, t) derived in a previous lecture and then performed some manipulations inside the
integral.
In this lecture, we provide another derivation, in terms of a convolution theorem for Fourier
transforms. Starting with the heat equation in (1), we take Fourier transforms of both sides, i.e.,
F(∂u/∂t) = k F(∂²u/∂x²),   (5)
where we have acknowledged the linearity of the Fourier transform in moving the constant k out of
the transform. It now remains to make sense of these Fourier transforms. You may recall that in the
case of Laplace transforms (LTs), the LT of a derivative of a function could be related to the LT of
the function.
First, let us recall the definition of the FT of a function u(x, t):
F(u) = U(ω, t) = (1/2π) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx.   (6)
This implies that
F(∂u/∂t) = (1/2π) ∫_{−∞}^{∞} (∂u/∂t)(x, t) e^{iωx} dx.   (7)
The only t-dependence in the integral on the right is in the integrand u(x, t). As a result, we may
write
F(∂u/∂t) = (∂/∂t)[ (1/2π) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx ] = (∂/∂t) U(ω, t).   (8)
In other words, the FT of the partial time-derivative of u(x, t) is simply the partial time-derivative of U(ω, t).
In order to make sense of the right hand side of Eq. (5), we should first examine the FT of ∂u/∂x,
i.e.,
F(∂u/∂x) = (1/2π) ∫_{−∞}^{∞} (∂u/∂x)(x, t) e^{iωx} dx.   (9)
In this case, of course, we may not pull the partial derivative ∂/∂x out of the integral, since the variable
x is involved in the integration procedure. This integral looks like it is begging for an integration by
parts, so let’s try it: We’ll set f = e^{iωx} and dg = (∂u/∂x) dx to yield
(1/2π) ∫_{−∞}^{∞} (∂u/∂x)(x, t) e^{iωx} dx = (1/2π) e^{iωx} u(x, t) |_{−∞}^{∞} − (iω/2π) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx.   (10)
Recall that the boundary conditions for the heat equation on the infinite interval were
u(x, t) → 0 as x → ±∞. (11)
As a result, the contributions to the first term vanish and we are left with the integral. But the
integral is simply a multiple of the Fourier transform of u, i.e.,
F(∂u/∂x) = −iω F(u) = −iω U(ω, t).   (12)
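This derivative rule is easy to verify numerically. The sketch below is our own check, not from the notes: it approximates the transform integral (6) by a Riemann sum for a Gaussian test function and confirms Eq. (12) at a few frequencies.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
u = np.exp(-x**2)             # a smooth test function vanishing at +-infinity
du = -2.0*x*np.exp(-x**2)     # its exact derivative

def ft(vals, w):
    # Eq. (6) convention: F(g)(w) = (1/2pi) * integral g(x) e^{i w x} dx
    return dx * np.sum(vals * np.exp(1j*w*x)) / (2.0*np.pi)

for w in [0.5, 1.0, 3.0]:
    # Eq. (12): F(du/dx) = -i w F(u)
    assert abs(ft(du, w) - (-1j*w*ft(u, w))) < 1e-6
print("F(du/dx) = -i w F(u) checked numerically")
```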
We may iterate this result to obtain the FT of ∂2u/∂x2:
F(∂²u/∂x²) = F( (∂/∂x)(∂u/∂x) ) = −iω F(∂u/∂x) = (−iω)² F(u) = −ω² F(u).   (13)
We may now substitute the results of these calculations into Eq. (5) to give
∂U(ω, t)/∂t = −kω² U(ω, t).   (14)
Taking FTs of both sides of the heat equation converts a PDE involving partial derivatives in both x and t into an equation involving only a derivative in t, with ω entering as a parameter. This means that we can solve Eq. (14) as we would an ordinary differential equation in the independent variable t – in essence, we may treat ω as a constant.
The above equation may be solved in the same way as we solve a first-order linear ODE in t.
(Actually it is also separable.) We write it as
∂U(ω, t)/∂t + kω² U(ω, t) = 0,   (15)
and note that the integrating factor is I(t) = e^{kω²t}, to give
(∂/∂t)[ e^{kω²t} U(ω, t) ] = 0.   (16)
Integrating (partially) with respect to t yields
e^{kω²t} U(ω, t) = c,   (17)
or
U(ω, t) = c e^{−kω²t},   (18)
where c is a “constant” with respect to partial differentiation by t. This means that c can be a function
of ω. As such, we’ll write
U(ω, t) = c(ω) e^{−kω²t}.   (19)
You can differentiate this expression partially with respect to t to check that it satisfies Eq. (14).
Now notice that at time t = 0,
U(ω, 0) = c(ω). (20)
In other words, c(ω) is the FT of the function u(x, 0). But this is the initial temperature distribution
f(x)! In other words
c(ω) = (1/2π) ∫_{−∞}^{∞} f(x) e^{iωx} dx = F(ω),   (21)
where we have written F (ω) to represent the FT of f(x). Therefore, Eq. (19) may be rewritten as
U(ω, t) = F(ω) e^{−kω²t}.   (22)
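Eq. (22) is the basis of a practical "spectral" solver: transform the initial data, multiply by e^{−kω²t}, transform back. The sketch below (our own illustration, not from the notes) does this with numpy's FFT for a Gaussian initial profile, whose exact evolution is known. The grid size, interval and test function are arbitrary choices; numpy's FFT uses the opposite sign convention in the exponent to (6), but the multiplier e^{−kω²t} is even in ω, so the evolved solution is unaffected.

```python
import numpy as np

k, t = 1.0, 0.25
N, L = 4096, 40.0
x = np.linspace(-L/2, L/2, N, endpoint=False)   # periodic grid standing in for R
f = np.exp(-x**2)                               # Gaussian initial data (for checking)

# Eq. (22): multiply the transform of f by exp(-k w^2 t), then invert
w = 2.0*np.pi*np.fft.fftfreq(N, d=L/N)
u = np.real(np.fft.ifft(np.fft.fft(f) * np.exp(-k*w**2*t)))

# exact heat-equation evolution of this particular Gaussian initial profile
u_exact = np.exp(-x**2/(1.0 + 4.0*k*t)) / np.sqrt(1.0 + 4.0*k*t)
assert np.max(np.abs(u - u_exact)) < 1e-10
print("FFT solution matches the exact Gaussian evolution")
```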
Now, it seems that we have almost solved our heat equation problem. All we have to do is to take
the inverse FT of both sides to retrieve u(x, t). But how do we take the inverse FT of the right-hand
side? The function e^{−kω²t} is not a complex exponential in x, e.g., e^{iωx}, so we cannot use a shift theorem. On the other hand, if we can regard this Gaussian as a Fourier transform, then the
right-hand side becomes a product of Fourier transforms, i.e.,
U(ω, t) = F(ω) G(ω, t),   G(ω, t) = e^{−kω²t}.   (23)
You may recall that there is a convolution theorem for products of Laplace transforms – there is also
a convolution theorem for Fourier transforms:
Convolution Theorem for Fourier Transforms: Let F(f) = F and F(g) = G. Then, assuming
that all of the integrals in the equation below exist,
F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds = (1/2π)(f ∗ g)(x).   (24)
Or, equivalently,
F( (1/2π) f ∗ g ) = F(ω)G(ω).   (25)
Proof: By definition
F⁻¹(FG) = ∫_{−∞}^{∞} F(ω)G(ω) e^{−iωx} dω.   (26)
We replace F(ω) by its definition as an integral over the dummy variable s, i.e.,
F(ω) = (1/2π) ∫_{−∞}^{∞} f(s) e^{iωs} ds,   (27)
so that
F⁻¹(FG) = ∫_{−∞}^{∞} [ (1/2π) ∫_{−∞}^{∞} f(s) e^{iωs} ds ] G(ω) e^{−iωx} dω.   (28)
We now reverse the order of integration. (Theoretically, this should involve a little care, since the
intervals of integration are infinite. We assume that the functions in the integrand are suitably well-
behaved, i.e., piecewise continuous, L1-integrable and then invoke “Fubini’s Theorem.”) The result
is
F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) [ ∫_{−∞}^{∞} G(ω) e^{−iω(x−s)} dω ] ds.   (29)
But, by definition, the integral in the square brackets is the inverse Fourier transform of G, i.e., the
function g, but evaluated at x− s, because of the term “x− s” in the complex exponential. The result
is
F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds,   (30)
which proves the theorem. Note the similarity – at least in idea – to the convolution theorem for
Laplace transforms. In that case, the convolution between two functions was defined slightly differently.
As well, there is an additional factor of 1/2π which arises from the asymmetric appearance of this
factor in the FT but not in the inverse FT.
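The theorem can also be checked numerically. The sketch below (our own check, with two Gaussian test functions chosen for convenience) tabulates (1/2π)(f ∗ g) by direct quadrature and compares its transform, in the convention of Eq. (6), with the product F(ω)G(ω) of Eq. (25).

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
f = np.exp(-x**2)          # two rapidly decaying test functions
g = np.exp(-x**2/2.0)

def ft(vals, w):
    # Eq. (6): F(h)(w) = (1/2pi) * integral h(x) e^{i w x} dx
    return dx * np.sum(vals * np.exp(1j*w*x)) / (2.0*np.pi)

# tabulate (1/2pi)(f*g)(x) = (1/2pi) * integral f(s) g(x - s) ds on the grid
fg = np.array([dx*np.sum(f*np.exp(-(xi - x)**2/2.0)) for xi in x]) / (2.0*np.pi)

for w in [0.0, 1.0, 2.5]:
    # Eq. (25): F((1/2pi) f*g) = F(w) G(w)
    assert abs(ft(fg, w) - ft(f, w)*ft(g, w)) < 1e-6
print("convolution theorem verified for two Gaussians")
```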
We now wish to apply the above convolution theorem to Eq. (22). This, of course, means that we
must be able to produce the inverse FT of the Gaussian G(ω, t) = e^{−kω²t}. But we did this a couple of
lectures ago – it is the function
gt(x) = √(π/(kt)) e^{−x²/(4kt)}.   (31)
From the Convolution Theorem for FTs, then, we have
u(x, t) = (1/2π) ∫_{−∞}^{∞} f(s) gt(x − s) ds
        = ∫_{−∞}^{∞} f(s) (1/√(4πkt)) e^{−(x−s)²/(4kt)} ds
        = ∫_{−∞}^{∞} f(s) ht(x − s) ds,   (32)
which is precisely the result obtained in the previous lecture. Note that the functions gt(x) and ht(x)
are related by the factor 1/2π, i.e.,
ht(x) = (1/(2π)) gt(x).   (33)
Lecture 29
Fourier Transforms (cont’d)
Let’s review our results to date regarding the heat equation on the (infinite) real line R:
∂u/∂t = k ∂²u/∂x²,   −∞ < x < ∞,   (34)
with initial condition
u(x, 0) = f(x), f ∈ L2(R). (35)
The condition that f ∈ L2(R) implies zero-endpoint conditions, i.e.,
f(x) → 0 as x → ±∞. (36)
By taking Fourier transforms of both sides of Eq. (34), we arrive at the following equation of evolution
of the Fourier transform U(ω, t) of u(x, t):
(∂U/∂t)(ω, t) = −kω² U(ω, t),   (37)
with initial condition
U(ω, 0) = F (ω) = F(f). (38)
Since there are no partial derivatives with respect to ω in this equation, we may solve it as we would
an ordinary differential equation, in fact, a first order, linear (as well as separable) differential equation
with respect to the independent variable t. The solution was easily found to be
U(ω, t) = F(ω) e^{−kω²t}.   (39)
We may now obtain the solution u(x, t) to the heat equation by taking inverse Fourier transforms
(IFTs) of both sides of this equation. The IFT of the LHS is u(x, t). The RHS of (39) is a product of
functions, namely, F(ω) and G(ω) = e^{−kω²t}. From the Convolution Theorem for FTs, the IFT of this
product is a convolution of the IFTs of F and G, i.e.,
u(x, t) = (1/2π) ∫_{−∞}^{∞} f(s) gt(x − s) ds,   (40)
where
gt(x) = F⁻¹[G(ω)] = √(π/(kt)) e^{−x²/(4kt)}.   (41)
The final result is
u(x, t) = ∫_{−∞}^{∞} f(s) ht(x − s) ds,   (42)
where
ht(x) = (1/√(4πkt)) e^{−x²/(4kt)}   (43)
is the heat kernel. The heat kernel ht(x) is a Gaussian function that “spreads out” in time: its standard deviation is given by σ(t) = √(2kt). The convolution in Eq. (42) represents a weighted averaging of
the function f(s) about the point x – as time increases, this weighted averaging includes more points
away from x, which accounts for the smoothing of the temperature distribution u(x, t).
You are already familiar with such smoothing from our study of the heat/diffusion equation in
1D on a finite interval, but here is an illustration of the qualitative aspects of such smoothing. In the
figure below, we show the graph of a temperature distribution that exhibits a concentration of heat
energy around the position x = a. At a time t2 > t1, this heat energy distribution has been somewhat
smoothened – some of the heat in the region of concentration has been transported to points farther
away from x = a:
[Figure: the initial profile u(x, t1) and the more spread-out profile u(x, t2), t2 > t1.]
In the figure below, we illustrate how a discontinuity in the temperature function gets smoothened
into a continuous distribution. At time t = 0, there is a discontinuity in the temperature function
u(x, 0) = f(x) at x = a. Recall that we can obtain all future distributions u(x, t) by convolving f(x)
with the Gaussian heat kernel. For some representative time t > 0, but not too large, three such
Gaussian kernels are shown at points x1 < a, x2 = a and x3 > a. Since t is not too large, the Gaussian
kernels are not too wide. A convolution of u(x, 0) with these kernels at x1 and x3 will not change the
distribution too significantly, since u(x, 0) is constant over most of the region where these kernels have
significant values. However, the convolution of u(x, 0) with the Gaussian kernel centered at the point
of discontinuity x = x2 will involve values of u(x, 0) to the left of the discontinuity as well as values of
u(x, 0) to the right. This will lead to a significant averaging of the temperature values, as shown in
the figure. At a time t2 > t1, the averaging of u(x, 0) with even wider Gaussian kernels will produce
even greater smoothing. We shall return to this phenomenon in a later section.
[Figure: initial data u(x, 0) with a jump at x = a; Gaussian kernels centred at x1 < a, x2 = a and x3 > a; and the smoothed profiles u(x, t) at times t1 > 0 and t2 > t1.]
Let’s now return to the evolution of the Fourier transform function U(ω, t) in Eq. (39). From this
equation, we see that
U(0, t) = F(0) for all t > 0, and U(ω, t) → 0 as t → ∞ for all ω ≠ 0.   (44)
This means that in the limit t → ∞, U(ω, t) approaches the zero function – it is nonzero only at the
point ω = 0. Let’s call this limiting function U∞(ω), i.e.,
U∞(ω) = F(0) for ω = 0, and U∞(ω) = 0 for ω ≠ 0.   (45)
Then
U(ω, t) → U∞(ω) as t → ∞ for all ω ∈ R. (46)
The inverse Fourier transform of this limiting function, which we’ll call u∞(x), is, by definition, given
by
u∞(x) = ∫_{−∞}^{∞} U∞(ω) e^{−iωx} dω = 0.   (47)
(The possibly nonzero value of U∞(0) does not contribute to the integral.) Taking inverse Fourier
transforms of each side of Eq. (46), we obtain the result
u(x, t) → 0 as t → ∞ for all x ∈ R. (48)
The fact that we start with a nonzero initial temperature distribution u(x, 0) = f(x) which evolves
to the zero solution on the real line might be somewhat bothersome from the viewpoint of conservation
of energy. Where did all of that energy go? We claim that it is still there, i.e., on the line, but that
it has dispersed over the entire infinite line.
To see this, let’s start by computing the total thermal energy associated with the temperature
distribution u(x, 0) = f(x) for x ∈ R. We’ll assume that the real line represents a homogeneous rod of
constant cross-sectional area A and lineal density ρ. Furthermore, we assume that the thermal energy
density function is given by
e(x, t) = cρ[u(x, t) − u0]. (49)
And for simplicity, we again assume that u0 = 0.
The initial total thermal energy of the infinite rod is then given by
E(0) = cρ ∫_{−∞}^{∞} u(x, 0) A dx = cρA ∫_{−∞}^{∞} f(x) dx.   (50)
We assume that this integral is finite. The thermal energy E(t) at any time t > 0 is then given by
E(t) = cρA ∫_{−∞}^{∞} u(x, t) dx.   (51)
Let’s now compute the rate of change E′(t):
E′(t) = cρA (d/dt) ∫_{−∞}^{∞} u(x, t) dx
      = cρA ∫_{−∞}^{∞} (∂u/∂t)(x, t) dx        (by the Leibniz rule)
      = cρA ∫_{−∞}^{∞} k (∂²u/∂x²)(x, t) dx     (since u is a solution to the heat equation)
      = ckρA [ (∂u/∂x)(x, t) ]_{x→−∞}^{x→∞}     (Fundamental Theorem of Calculus II)
      = 0.   (52)
The final line comes from the fact that the partial derivative ∂u/∂x must go to zero as x → ±∞ since
u → 0 as x → ±∞.
Since E′(t) = 0 for all t > 0, it follows that
E(t) = E(0), t ≥ 0. (53)
Even though u(x, t) → 0 as t → ∞, the total energy remains constant – it simply becomes spread out
over the entire infinite real line R.
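This conservation also shows up in a discrete setting. The sketch below (our own illustration; the grid, time step and Gaussian initial profile are arbitrary choices) evolves the solution with an explicit finite-difference scheme on a wide interval standing in for the real line, and checks that the discrete energy integral stays fixed while the peak temperature decays.

```python
import numpy as np

# explicit scheme u_j <- u_j + s (u_{j+1} - 2 u_j + u_{j-1}), s = k dt/dx^2
k, dx, dt = 1.0, 0.05, 0.001
s = k*dt/dx**2                      # s = 0.4 < 1/2: stable in 1D
x = np.arange(-20.0, 20.0 + dx, dx)
u = np.exp(-x**2)                   # initial data, effectively zero at the ends

E0 = dx*np.sum(u)                   # total (scaled) thermal energy, as in Eq. (50)
for n in range(2000):
    u[1:-1] += s*(u[2:] - 2.0*u[1:-1] + u[:-2])

# Eq. (53): the energy integral is unchanged, even though u itself decays
assert abs(dx*np.sum(u) - E0) < 1e-8
assert np.max(u) < 1.0              # the peak temperature has dropped
print("E(t) = E(0); deviation:", abs(dx*np.sum(u) - E0))
```

The interior update telescopes, so the discrete sum can only change through the (negligible) boundary values – the discrete analogue of the vanishing flux term in (52).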
Without loss of generality, we assume that the initial temperature function f(x) is nonnegative
for all x ∈ R and u(x, t) ≥ 0 for all x ∈ R and t ≥ 0. Then if we strip away the constant multiplicative
factor cρA from the above result, we have
∫_{−∞}^{∞} u(x, t) dx = ∫_{−∞}^{∞} |u(x, t)| dx = ∫_{−∞}^{∞} |f(x)| dx.   (54)
This last line defines the so-called “L1 norm” of f, denoted ‖f‖₁. It comes from the definition of the space of integrable functions, L1(R) (as opposed to the space of square-integrable functions, L2(R)):
L1(R) = { f : R → R | ‖f‖₁ = ∫_{−∞}^{∞} |f(x)| dx < ∞ }.   (55)
In other words, the L1 norm of the temperature function u(x, t) remains constant in time, even though
u(x, t) → 0.
On the other hand, we now show that the L2 norm of u(x, t) goes to zero as t → ∞, i.e.,
‖u(x, t)‖₂ = [ ∫_{−∞}^{∞} |u(x, t)|² dx ]^{1/2} → 0 as t → ∞.   (56)
To show this, let’s compute the time derivative of the squared L2 norm of u:
(d/dt)‖u‖₂² = (d/dt) ∫_{−∞}^{∞} u(x, t)² dx
            = ∫_{−∞}^{∞} (∂/∂t)[u(x, t)²] dx        (Leibniz rule)
            = 2 ∫_{−∞}^{∞} u (∂u/∂t) dx
            = 2k ∫_{−∞}^{∞} u (∂²u/∂x²) dx          (u is a solution of the heat equation).   (57)
We now integrate by parts, with f = u and g′ = ∂²u/∂x² (so that g = ∂u/∂x):
(d/dt)‖u‖₂² = 2k [ u (∂u/∂x) ]_{x→−∞}^{x→∞} − 2k ∫_{−∞}^{∞} (∂u/∂x)² dx.   (58)
The first term on the RHS vanishes because u → 0 as x → ±∞. The integral on the RHS is nonnegative
since the integrand is nonnegative. The only way that the integral can be zero is if ∂u/∂x = 0 for all x ∈ R, in which case u(x, t) must be constant. The only constant function that is in the space L2(R)
is the zero function. So if we exclude the case that u = 0, we have
(d/dt)‖u(x, t)‖₂² < 0,   t > 0.   (59)
This implies that ‖u(x, t)‖₂ is decreasing in time. Unfortunately, this result does not
imply that u(x, t) → 0 as t → ∞: it doesn’t prevent ‖u(x, t)‖2 from approaching a nonzero constant
asymptotically. In a little while, with the help of Parseval’s identity, we’ll be able to show that,
indeed, the L2 norm of u does go to zero. For the moment, let us simply accept this result.
The above results – that as t → ∞:
1. u(x, t) → 0 for all x ∈ R,
2. ‖u‖₁ = ‖f‖₁, constant,
3. ‖u‖₂ → 0,
may be somewhat difficult to reconcile. Let’s consider a rather simple example that also demonstrates
these properties. For any L > 0, define the following function,
fL(x) = 1/(2L) for −L ≤ x ≤ L, and 0 otherwise.   (60)
The graph of this function is sketched below.
[Figure: graph of y = fL(x), a rectangle of height 1/(2L) over the interval [−L, L].]
We are concerned with the behaviour of fL(x) as L → ∞. From the figure above, it is clear that the nonzero part of the graph of fL gets wider, but its height also approaches zero. First of all, note that
‖fL‖₁ = ∫_{−∞}^{∞} |fL(x)| dx = ∫_{−L}^{L} (1/(2L)) dx = 1.   (61)
This, of course, implies that
‖fL‖₁ → 1 as L → ∞.   (62)
This is a simple consequence of the fact that the area enclosed by the graph of fL(x) and the x-axis
is constant. (Yes, the function was cleverly contrived to exhibit this behaviour.)
Now consider
‖fL‖₂² = ∫_{−∞}^{∞} |fL(x)|² dx = ∫_{−L}^{L} (1/(2L))² dx = (1/(4L²)) · 2L = 1/(2L).   (63)
This implies that
‖fL‖₂ = 1/√(2L) → 0 as L → ∞.   (64)
We have constructed a simple function that behaves qualitatively in a manner similar to the temper-
ature function u(x, t) discussed earlier.
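The norms (61)–(64) are easy to confirm by direct quadrature; the sketch below (our own check, not from the notes) does so for several values of L.

```python
import numpy as np

def norms(L, n=200001):
    # f_L = 1/(2L) on [-L, L]; L1 and L2 norms computed by Riemann sum
    x = np.linspace(-L, L, n)
    dx = x[1] - x[0]
    fL = np.full_like(x, 1.0/(2.0*L))
    n1 = dx*np.sum(np.abs(fL))          # approximates Eq. (61)
    n2 = np.sqrt(dx*np.sum(fL**2))      # approximates Eqs. (63)-(64)
    return n1, n2

for L in [1.0, 10.0, 100.0]:
    n1, n2 = norms(L)
    assert abs(n1 - 1.0) < 1e-3                  # L1 norm stays equal to 1
    assert abs(n2 - 1.0/np.sqrt(2.0*L)) < 1e-3   # L2 norm -> 0 as L grows
print("L1 norm constant; L2 norm decays like 1/sqrt(2L)")
```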
Smoothing produced by the heat/diffusion equation and some applications in signal
and image processing
We now return to the idea of smoothing under evolution of the heat/diffusion equation, in particular,
the smoothing of a discontinuity, as illustrated earlier. The graphs are reproduced below.
[Figure (reproduced from above): initial data u(x, 0) with a jump at x = a and the smoothed profiles u(x, t) at times t1 > 0 and t2 > t1.]
We now move away from the idea of the heat or diffusion equation modelling the behaviour of
temperatures or concentrations. Instead, we consider the function u(x, t) as representing a time-
varying signal. And in the case of two spatial dimensions, we may consider the function u(x, y, t) as
representing the time evolution of an image. Researchers in signal and image processing noted the
smoothing behaviour of the heat/diffusion equation quite some time ago and asked whether it could
be exploited to accomplish desired tasks with signals and images. We outline some of these ideas very
briefly below for the case of images, since the results are quite dramatic visually.
First of all, images generally contain many points of discontinuity – these are the edges of the
image. In fact, edges are considered to define an image to a very large degree, since the boundaries
of any objects in the image produce edges. In what follows, we shall let the function u(x, y, t) denote
the evolution of a (non-negative) image function under the heat/diffusion equation,
∂u/∂t = k∇²u = k[ ∂²u/∂x² + ∂²u/∂y² ],   u(x, y, 0) = f(x, y).   (65)
Here we shall not worry about the domain of definition of the image. In practice, of course, image
functions are defined on a finite set, e.g., the rectangular region [a, b] × [c, d] ⊂ R2.
Some additional notes on the representation of images
Digitized images are defined over a finite set of points in a rectangular domain. Therefore,
they are essentially m × n arrays of greyscale or colour values – the values that are used
to assign brightness values to pixels on a computer screen. In what follows, we may
consider such matrices to define images that are piecewise constant over a region of R2.
We shall also be looking only at black-and-white (BW) images. Mathematically, we may
consider the range of greyscale values of a BW image function to be an interval [A,B], with
A representing black and B representing white. In mathematical analysis, one usually
assumes that [A, B] = [0, 1]. As you probably know, the practical storage of digitized images also involves a quantization of the greyscale values into discrete values. For so-called “8-bit-per-pixel” BW images, where each pixel may assume one of 2⁸ = 256 discrete values (8 bits of computer memory are used to store the greyscale value at each pixel), the greyscale values run from 0 (black) to 255 (white). This greyscale range, [A, B] = [0, 255], is used for the analysis and display of the images below.
From our earlier discussions, we expect that edges of the input image f(x, y) will become more
and more smoothened as time increases. The result will be an increasingly blurred image, as we show
in Figure 1 below. The top image in the figure is the input image f(x, y). The bottom row shows
the image u(x, y, t) at two later times. The solutions u(x, y, t) were computed by means of a 2D finite-difference scheme using a forward time difference and centred differences for the Laplacian. It assumes the following form,
u_{ij}^{(n+1)} = u_{ij}^{(n)} + s[ u_{i−1,j}^{(n)} + u_{i+1,j}^{(n)} + u_{i,j−1}^{(n)} + u_{i,j+1}^{(n)} − 4u_{ij}^{(n)} ],   s = kΔt/(Δx)².   (66)
(For details, you may consult the text by Haberman, Chapter 6, p. 253.) This scheme is numerically
stable for s < 0.25: The value s = 0.1 was used to compute the images in the figure.
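A minimal implementation of scheme (66) is sketched below; the 64 × 64 “image”, a bright square on a dark background, is our own toy example rather than the photograph used in the figure. The run shows the edge contrast collapsing, while the discrete maximum principle (pixel values staying within [0, 255] for s < 1/4) holds.

```python
import numpy as np

def blur_step(u, s):
    # one sweep of scheme (66); boundary pixels are simply left unchanged here
    v = u.copy()
    v[1:-1, 1:-1] = u[1:-1, 1:-1] + s*(u[:-2, 1:-1] + u[2:, 1:-1]
                                       + u[1:-1, :-2] + u[1:-1, 2:]
                                       - 4.0*u[1:-1, 1:-1])
    return v

# toy "image": a bright square on a dark background (a stand-in for f(x, y))
img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0

u = img.copy()
for n in range(100):
    u = blur_step(u, s=0.1)        # s = 0.1 < 1/4, as in the notes

# the sharp jump across the left edge of the square has been smoothed out
jump_before = img[31, 24] - img[31, 23]      # 255
jump_after = u[31, 24] - u[31, 23]
assert jump_after < 0.5*jump_before
assert np.min(u) >= -1e-9 and np.max(u) <= 255.0 + 1e-9
print("edge jump before:", jump_before, "after:", round(jump_after, 1))
```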
Heat/diffusion equation and “deblurring”
You may well question the utility of blurring an image: Why would one wish to degrade an image in
this way? We’ll actually provide an answer very shortly but, for the moment, let’s make use of the
blurring result in the following way: If we know that an image is blurred by the heat/diffusion equation
as time increases, i.e., as time proceeds forward, then perhaps a blurry image can be deblurred by
letting it evolve as time proceeds backward. The problem of “image deblurring” is an important one,
e.g., acquiring the license plate numbers of cars that are travelling at a very high speed.
If everything proceeded properly, could a blurred edge possibly be restored by such a procedure? In theory, the answer is “yes”, provided that the blurred image is known for all (continuous)
Figure 1. Image blurring produced by the heat/diffusion equation. Top: 2288 × 1712 pixel (8 bits=256
grayscale levels/pixel) image as initial data function f(x, y) = u(x, y, 0) for heat/diffusion equation on R2.
Bottom: Evolution of image function u(x, y, t) under discrete 2D finite-difference scheme (see main text). Left:
After n = 100 iterations. Right: After n = 500 iterations.
values of x and y. In practice, however, the answer is “generally no”, since we know only discrete, sam-
pled values of the image. In addition, running the heat/diffusion equation backwards is an unstable
process. Instead of the exponential damping that we saw very early in the course, i.e., an eigensolution
in one dimension, un(x, t) = φn(x)hn(t), evolving as follows,
un(x, t) = φn(x) e^{−k(nπ/L)²t},   (67)
we encounter exponential increase: replacing t with −t yields
un(x, t) = φn(x) e^{k(nπ/L)²t}.   (68)
As such, any inaccuracies in the function will be amplified. As a result, numerical procedures associated
with running the heat/diffusion equation backwards are generally unstable.
To investigate this effect, the blurred image obtained after 100 iterations of the first experiment
was used as the initial data for a heat/diffusion equation that was run backwards in time. This may
be done by changing ∆t to −∆t in the finite difference scheme, implying that s is replaced by −s.
The result is the following “backward scheme,”
u_{ij}^{(n−1)} = u_{ij}^{(n)} − s[ u_{i−1,j}^{(n)} + u_{i+1,j}^{(n)} + u_{i,j−1}^{(n)} + u_{i,j+1}^{(n)} − 4u_{ij}^{(n)} ].   (69)
The first blurred image of the previous experiment (lower left image of Figure 1) was used as input
into the above backward-time scheme. It is shown at the top of Figure 2. After five iterations, the
image at the lower left of Figure 2 is produced. Some deblurring of the edges has been accomplished.
(This may not be visible if the image is printed on paper, since the printing process itself introduces
a degree of blurring.) After another five iterations, additional deblurring is achieved at some edges
but at the expense of some severe degradation at other regions of the image. Note that much of the
degradation occurs at smoother regions of the image, i.e., where spatially neighbouring values of the
image function are closer to each other. This degradation is an illustration of the numerical instability
of the backward-time procedure.
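The instability is easy to reproduce. In the sketch below (a toy stand-in for the experiment in the text, with our own image and noise level), an image is blurred forward for 20 steps, perturbed very slightly, and then run backward with scheme (69). The usual von Neumann analysis gives the highest spatial frequencies an amplification of up to (1 + 8s) per backward step, so the tiny perturbation overwhelms the reconstruction.

```python
import numpy as np

def step(u, s):
    # scheme (66); passing a negative s gives the backward scheme (69)
    v = u.copy()
    v[1:-1, 1:-1] = u[1:-1, 1:-1] + s*(u[:-2, 1:-1] + u[2:, 1:-1]
                                       + u[1:-1, :-2] + u[1:-1, 2:]
                                       - 4.0*u[1:-1, 1:-1])
    return v

rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0

u = img.copy()
for n in range(20):                 # blur forward 20 steps
    u = step(u, 0.1)
u_meas = u + rng.normal(0.0, 0.1, u.shape)   # tiny measurement/rounding error

back = u_meas.copy()
for n in range(20):                 # try to undo the blur by running backwards
    back = step(back, -0.1)

# the amplified perturbation swamps the reconstruction
err = np.max(np.abs(back - img))
assert err > 100.0
print("max reconstruction error:", round(err, 1))
```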
Image denoising under the heat/diffusion equation
We now return to the smoothing effect of the heat/diffusion equation and ask whether or not it could
be useful. The answer is “yes” – it may be useful in the denoising of signals and/or images. In many
applications, signals and images are degraded by noise in a variety of possible situations, e.g., (i) atmospheric disturbances, particularly in the case of images of the earth obtained from satellites or astronomical images obtained from telescopes; (ii) the channels over which such signals are transmitted are noisy. These are part of the overall problem of signal/image degradation, which may include both
Figure 2. Attempts to deblur images by running the heat/diffusion equation backwards. Top: Blurred image u_{ij}^{(100)} from the previous experiment, used as input into the “backward” heat/diffusion equation scheme, with s = 0.1.
Bottom left: Result after n = 5 iterations. Some deblurring has been achieved. Bottom right: Result
after n = 10 iterations. Some additional deblurring but at the expense of degradation in some regions due to
numerical instabilities.
blurring as well as noise. The removal of such degradations, which is almost always only partial, is
known as signal/image enhancement.
A noisy signal may look something like the sketch at the left of Figure 3 below. Recalling that
the heat/diffusion equation causes blurring, one might imagine that the blurring of a noisy signal may
produce some denoising, as sketched at the right of Figure 3. This is, of course, a very simplistic
idea, but it does provide the starting point for a number of signal/image denoising methods.
A noisy signal (left) and its denoised counterpart (right).
In Figure 4 below, we illustrate this idea as applied to image denoising. The top left image is our
original, “noiseless” image u. Some noise was added to this image to produce the noisy image ū at the top right. Very simply,
ū(i, j) = u(i, j) + n(i, j),   (70)
where n(i, j) ∈ R was chosen randomly according to the normal distribution N(0, σ), i.e., zero mean and standard deviation σ. In this case, σ = 20 was used. The above equation is usually written more generally as follows,
ū = u + N(0, σ).   (71)
For reference purposes, the (discrete) L2 error between ū and u was computed as follows,
‖ū − u‖₂ = [ (1/512²) Σ_{i,j=1}^{512} (ū(i, j) − u(i, j))² ]^{1/2} = 20.03.   (72)
This is the “root mean squared error” (RMSE) between the discrete functions ū and u. (First compute the average of the squared differences of the greyscale values over all pixels, then take the square root.) As expected, it is close to the standard deviation σ of the added noise: the average magnitude of the error between ū(i, j) and u(i, j) should be the σ-value of the noise added.
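This claim – that the RMSE of the noisy image is close to σ – is easy to confirm; the sketch below (our own check) uses a random array as a stand-in for the actual test image.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 255.0, size=(512, 512))   # stand-in for the 512x512 test image
sigma = 20.0
u_bar = u + rng.normal(0.0, sigma, u.shape)    # Eq. (70): add N(0, sigma) noise

# Eq. (72): root mean squared error over all pixels
rmse = np.sqrt(np.mean((u_bar - u)**2))
assert abs(rmse - sigma) < 0.5                 # RMSE is close to the noise sigma
print("RMSE =", round(float(rmse), 2))
```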
The noisy image ū was then used as the input image for the diffusion equation, more specifically, the 2D finite-difference scheme used earlier, i.e.,
u_{ij}^{(n+1)} = u_{ij}^{(n)} + s[ u_{i−1,j}^{(n)} + u_{i+1,j}^{(n)} + u_{i,j−1}^{(n)} + u_{i,j+1}^{(n)} − 4u_{ij}^{(n)} ],   (73)
with s = 0.1.
After five iterations (lower left), we see that some denoising has been produced, but at the expense
of blurring, particularly at the edges. The L2 distance between this denoised/blurred image and the
original noiseless image u is computed to be
‖u5 − u‖₂ = 16.30.   (74)
We see that, from the viewpoint of L2 distance, the denoised image u5 is “closer” to the noiseless image u than the noisy image ū is. This is a good sign – we would hope that the denoising procedure would produce an image that is closer to u. But more on this later.
After another five iterations, as expected, there is further denoising but accompanied by additional
blurring. The L2 distance between this image and u is computed to be
‖u10 − u‖₂ = 18.23.   (75)
Note that the L2 distance of this image is larger than that of u5 – in other words, we have done worse.
One explanation is that the increased blurring of the diffusion equation has degraded the image farther
away from u than the denoising has improved it.
We now step back and ask: Which of the above results is “better” or “best”? In the L2 sense, the
lower left result is better since its L2 error (i.e., distance to u) is smaller. But is it “better” visually?
Quite often, a result that is better in terms of L2 error is poorer visually. And are the denoised images visually “better” than the noisy image itself? You will recall that some people in class had the opinion that the noisy image ū actually looked better than any of the denoised/blurred results. This
illustrates an important point about image processing – the L2 distance, although easy to work with,
is not necessarily the best indicator of visual quality. Psychologically, our minds are sometimes more
tolerant of noise than degradation in the edges – particularly in the form of blurring – that define an
image.
Image denoising using “anisotropic diffusion”
We’re not totally done with the idea of using the heat/diffusion equation to remove noise by means
of blurring. Once upon a time, someone got the idea of employing a “smarter” form of diffusion –
one which would perform blurring of images but which would leave their edges relatively intact. We
could do this by making the diffusion parameter k to be sensitive to edges – when working in the
vicinity of an edge, we restrict the diffusion so that the edges are not degraded. As we mentioned
earlier, edges represent discontinuities – places where the magnitudes of the gradients become quite
Figure 4. Image denoising using the heat/diffusion equation. Top left: 512 × 512 pixel (8 bits = 256 grayscale levels/pixel) San Francisco test image u. Top right: Noisy image, ū = u + N(0, σ) (test image plus zero-mean Gaussian noise, σ = 20), which serves as the initial data function for the heat/diffusion equation on R². L2 error of noisy image: ‖ū − u‖₂ = 20.03. Bottom: Evolution under the discrete 2D finite-difference scheme (forward time difference). Left: After n = 5 iterations, some denoising along with some blurring, ‖u5 − u‖₂ = 16.30. Right: After n = 10 iterations, some additional denoising with additional blurring, ‖u10 − u‖₂ = 18.23.
large. (Technically, in the continuous domain, the gradients would be undefined. But we are working
with finite differences, so the gradients will be defined, but large in magnitude.)
This implies that the diffusion parameter k would depend upon the position (x, y). But this is
only part of the process – since k would be sensitive to the gradient ∇u(x, y) of the image, it would,
in fact, be dependent upon the image function u(x, y) itself!
One way of accomplishing this selective diffusion, i.e., slower diffusion at edges, is to let k(x, y) be inversely proportional to some power of the gradient magnitude, e.g.,
k = k(‖∇u‖) = C‖∇u‖^{−α},   α > 0.   (76)
The resulting diffusion equation,
∂u/∂t = k(‖∇u‖) ∇²u,   (77)
would be a nonlinear diffusion equation, since k is now dependent upon u, and it multiplies the
Laplacian of u. And since the diffusion process is no longer constant throughout the region, it is
no longer homogeneous but nonhomogeneous or anisotropic. As such, Eq. (77) is often called the
anisotropic diffusion equation.
To illustrate this process, we have considered a very simple example, where
k(‖∇u‖) = ‖∇u‖^{−1/2}.   (78)
Some results are presented in Figure 5. This simple anisotropic scheme works well to preserve edges, therefore producing better denoising of the noisy image used in the previous experiment. The denoised image u20 is better not only in terms of L2 distance but also from the perspective of visual quality, since its edges are better preserved.
Needless to say, a great deal of research has been done on nonlinear, anisotropic diffusion and its
applications to signal and image processing.
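For concreteness, here is a heavily simplified sketch of such a scheme. The regularization eps and the cap on the local diffusivity are our own additions (the notes do not specify how the singular k at small gradients is handled numerically), and the test “image” is a single vertical edge plus noise rather than the photograph in Figure 5.

```python
import numpy as np

def aniso_step(u, s, eps=1.0, kmax=1.0):
    # one explicit step of u_t = k(|grad u|) Laplacian(u) with k = |grad u|^(-1/2);
    # eps avoids division by zero and kmax keeps s*k below the stability limit
    # (both are our own regularizations, not specified in the notes)
    gy, gx = np.gradient(u)
    k = np.minimum((np.sqrt(gx**2 + gy**2) + eps)**-0.5, kmax)
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0*u[1:-1, 1:-1])
    v = u.copy()
    v[1:-1, 1:-1] = u[1:-1, 1:-1] + s*k[1:-1, 1:-1]*lap
    return v

rng = np.random.default_rng(2)
img = np.zeros((64, 64))
img[:, 32:] = 200.0                       # a single vertical edge
noisy = img + rng.normal(0.0, 20.0, img.shape)

u = noisy.copy()
for n in range(20):
    u = aniso_step(u, s=0.1)

# flat regions get smoothed (small gradient -> larger k), the edge much less
rmse = lambda a, b: np.sqrt(np.mean((a - b)**2))
assert rmse(u, img) < rmse(noisy, img)
print("RMSE noisy:", round(rmse(noisy, img), 2), "denoised:", round(rmse(u, img), 2))
```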
Figure 5. Image denoising and edge preservation via “anisotropic diffusion.” Top: Noisy image, ū = u + N(0, σ) (test image plus zero-mean Gaussian noise, σ = 20). L2 error of noisy image: ‖ū − u‖₂ = 20.03. Bottom left: Denoising with the (isotropic) heat/diffusion equation, u_t = k∇²u, reported earlier. Finite-difference scheme, s = 0.1, n = 5 iterations. L2 error of denoised image: ‖u5 − u‖₂ = 16.30. Bottom right: Denoising with the anisotropic heat equation. Finite-difference scheme, s = 0.1, k(‖∇u‖) = ‖∇u‖^{−1/2}, n = 20 iterations. There is denoising but much less blurring around edges. L2 error: ‖u20 − u‖₂ = 15.08. Not only is the result from anisotropic diffusion better in the L2 sense, but it is also better visually, since edges have been better preserved.
Lecture 30
Fourier Transforms (cont’d)
At this point, we stop to examine some of the mathematical properties of the Fourier transform –
such details were skipped over in previous lectures so that we could quickly arrive at some operational
results on how to solve PDEs with FTs.
Let us recall the basic formula for the Fourier transform, F(ω), of a function f : R → R:

F(ω) = (1/2π) ∫_{−∞}^{∞} f(x) e^{iωx} dx, ω ∈ R. (79)
First of all, the most obvious aspect of F (ω) is that it is complex-valued, i.e., F : R → C. This
follows from the Euler formula for the complex exponential - we won’t go into any particulars since
they are not important to the present discussion.
Let us consider the FT in Eq. (79) as defining a mapping of the function f(x) to the function
F(ω). Note that both functions, f and F, are defined over the real line R – it is conventional that “x”
(space variable) or “t” (time variable) is used to denote the argument of f and “ω” (frequency) the
argument of F.
So we’ll consider the Fourier transform as a mapping
F : f → F, or F(f) = F. (80)
And we also have encountered the inverse mapping,
F−1 : F → f, or F−1(F ) = f. (81)
The first thing that we can say about F (and its inverse) is that it is linear, i.e., if F1 = F(f1)
and F2 = F(f2), then
F(c1f1 + c2f2) = c1F1 + c2F2. (82)
This is easily verified using the definition in Eq. (79).
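The linearity property is also easy to check numerically. In the Python sketch below, the integral in Eq. (79) is approximated by a Riemann sum over a truncated grid; the test functions f1, f2, the constants, and the grid are our own choices.

```python
import numpy as np

# approximate F(ω) = (1/2π) ∫ f(x) e^{iωx} dx by a Riemann sum on [-50, 50]
x = np.linspace(-50.0, 50.0, 200001)
dx = x[1] - x[0]

def ft(f, w):
    return np.sum(f(x) * np.exp(1j * w * x)) * dx / (2 * np.pi)

f1 = lambda y: np.exp(-y**2)          # rapidly decaying test functions (our choice)
f2 = lambda y: np.exp(-np.abs(y))
c1, c2 = 2.0, -3.0
w = 1.3

lhs = ft(lambda y: c1 * f1(y) + c2 * f2(y), w)   # F(c1 f1 + c2 f2)
rhs = c1 * ft(f1, w) + c2 * ft(f2, w)            # c1 F1 + c2 F2
print(abs(lhs - rhs))                            # essentially machine zero
```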
However, we would like to make more mathematical sense of this mapping. For example, do all
functions f(x) have Fourier transforms? Or are there some for which the Fourier transform in (79) is
undefined? As an example, consider the function f(x) = 1 for x ∈ R. From Eq. (79), its FT is given
by
F(ω) = (1/2π) ∫_{−∞}^{∞} e^{iωx} dx, ω ∈ R. (83)
Does this integral make sense? Let’s consider the truncated integral

F_b(ω) = (1/2π) ∫_{−b}^{b} e^{iωx} dx = (1/(2πiω)) [e^{iωb} − e^{−iωb}] = sin(ωb)/(πω). (84)
For each value of ω, the limit of F_b(ω) as b → ∞ does not exist – F_b(ω) simply oscillates with b.
Therefore the FT, as an improper integral, does not exist.
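A quick numerical look at Eq. (84) makes the failure concrete. In this Python sketch we sample F_b(ω) at ω = 1 along b = (n + 1/2)π, where sin(b) alternates between ±1; the choice of sampling points is ours.

```python
import numpy as np

# F_b(1) = sin(b)/π keeps oscillating between +1/π and -1/π as b grows,
# so lim_{b→∞} F_b(1) does not exist
vals = [np.sin((n + 0.5) * np.pi) / np.pi for n in range(6)]
print(vals)   # alternates: 1/π, -1/π, 1/π, -1/π, ...
```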
This simple example indicates that we’ll have to be careful to specify what kinds of functions are
being mapped to what other functions. In other words, what spaces of functions are being mapped to
what other corresponding spaces?
It is not too hard to show that if f belongs to the space L1(R) introduced earlier, that is, the set
of functions f : R → R for which

∫_{−∞}^{∞} |f(x)| dx < ∞, (85)
then the integral in (79) is well-defined, i.e., is finite. But since we are now encountering complex-
valued functions F (ω), it makes sense to extend our definition of these function spaces. We’ll let
L1(R) now denote the set of integrable complex-valued functions, i.e.,
L1(R) = { f : R → C : ∫_{−∞}^{∞} |f(x)| dx < ∞ }, (86)
where |f(x)| denotes the modulus of the complex number f(x). Furthermore, we let the above integral
define the L1-norm of the function f :
‖f‖_1 = ∫_{−∞}^{∞} |f(x)| dx, (87)
which is finite.
The extension of our space of functions to complex-valued ones is not a big deal: It merely states
that we can take Fourier transforms of complex-valued functions. In applications, we normally deal
with real-valued functions, but sometimes it’s convenient to extend the treatment to cover complex-
valued ones. After all, when we take the inverse Fourier transform, it will often be the inverse FT of
a complex-valued function.
So let’s now assume that f ∈ L1(R) and look at its Fourier transform. In particular, we’ll look
at the truncated integral:
F_b(ω) = (1/2π) ∫_{−b}^{b} f(x) e^{iωx} dx. (88)
Noting that |eiωx| ≤ 1, we examine the modulus of this truncated integral:
|F_b(ω)| = | (1/2π) ∫_{−b}^{b} f(x) e^{iωx} dx |
         ≤ (1/2π) ∫_{−b}^{b} |f(x) e^{iωx}| dx
         ≤ (1/2π) ∫_{−b}^{b} |f(x)| dx. (89)
But now note that, using the bound above,

|F(ω)| = lim_{b→∞} |F_b(ω)| ≤ lim_{b→∞} (1/2π) ∫_{−b}^{b} |f(x)| dx = (1/2π) ‖f‖_1 < ∞. (90)
In other words, the Fourier transform F (ω) is well-defined for all ω ∈ R.
Unfortunately, we cannot go further: It is not necessarily true that if f ∈ L1(R), then its Fourier
transform F (ω) is an L1 function. This, however, is not a stumbling block in this course.
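The bound |F(ω)| ≤ (1/2π)‖f‖_1 from Eq. (90) can itself be checked numerically. In this Python sketch, the test function f(x) = e^{−|x|} (with ‖f‖_1 = 2) and the truncated grids are our own choices.

```python
import numpy as np

# check |F(ω)| ≤ (1/2π)‖f‖_1 for f(x) = e^{-|x|}, whose L1 norm is 2
x = np.linspace(-30.0, 30.0, 30001)
dx = x[1] - x[0]
f = np.exp(-np.abs(x))

bound = np.sum(np.abs(f)) * dx / (2 * np.pi)     # (1/2π)‖f‖_1 ≈ 1/π
ws = np.linspace(-10.0, 10.0, 101)
F = np.array([np.sum(f * np.exp(1j * w * x)) * dx for w in ws]) / (2 * np.pi)
maxF = np.max(np.abs(F))
print(maxF <= bound + 1e-9)                      # True: the bound holds at every ω
```

Here the bound is actually attained at ω = 0, since f ≥ 0 implies F(0) = (1/2π)‖f‖_1.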
In many applications, it is desirable to work in another space of integrable functions, namely the
square-integrable functions L2. We shall modify the definition given earlier in this course to include
complex-valued functions, i.e.,
L2(R) = { f : R → C : ∫_{−∞}^{∞} |f(x)|² dx < ∞ }. (91)
As in the case of real-valued functions, this space is an inner product space: The inner product of two
functions f and g is given by
〈f, g〉 = ∫_{−∞}^{∞} f(x)* g(x) dx, (92)
where f(x)∗ denotes the complex conjugate of f(x). The L2-norm of a function f ∈ L2(R) is then
given by
‖f‖_2 = 〈f, f〉^{1/2} = [ ∫_{−∞}^{∞} f(x)* f(x) dx ]^{1/2} = [ ∫_{−∞}^{∞} |f(x)|² dx ]^{1/2}. (93)
We’ll show below that the Fourier transform F maps L2(R) functions to L2(R) functions, i.e.,
F : L2(R) → L2(R). (94)
This is important in quantum mechanics. First of all, the space L2(R) (if we simply limit ourselves
to one-dimensional systems on R for the moment) is natural since we wish the (complex-valued)
wavefunction Ψ(x) to be square-integrable. In fact, one requires that Ψ satisfy the normalization
condition,

∫_{−∞}^{∞} Ψ(x)* Ψ(x) dx = 1. (95)
As those who have studied quantum mechanics know, the probability of finding the particle in the
interval (x, x + dx) is given by |Ψ(x)|² dx. Since the probability of finding the particle somewhere in
space, i.e., over the real line R, is unity, we have the above normalization condition.
And as those who have studied quantum mechanics may also have seen, the Fourier transform
F (ω) is related to the so-called momentum representation of the wavefunction Ψ(x).
Note that, roughly speaking, a function f(x) belonging to L1(R) or L2(R) must decay to 0 as
x → ±∞; for sufficiently well-behaved (e.g., uniformly continuous) functions this is a genuine necessary condition.
Parseval’s Identity
Relevant section of text: 10.4.3
We now derive an important result that relates the L2-norms of a function f(x) and its Fourier
transform F (ω).
Let f, g ∈ L2(R) with Fourier transforms, F,G ∈ L2(R), respectively. Furthermore, suppose that
H(ω) = F (ω)G(ω). (96)
From the Convolution Theorem for FTs, it follows that h, the inverse FT of H, is given by
h = (1/2π) f ∗ g = F^{−1}(FG). (97)
From the definition of the FT and its inverse, this implies that
(1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds = ∫_{−∞}^{∞} F(ω) G(ω) e^{−iωx} dω. (98)
This relation is true for all x ∈ R, including x = 0, so that
(1/2π) ∫_{−∞}^{∞} f(s) g(−s) ds = ∫_{−∞}^{∞} F(ω) G(ω) dω. (99)
Now for a function f(x), define the function g(x) such that
g(−x) = f(x)∗. (100)
Then the LHS of Eq. (99) becomes

(1/2π) ∫_{−∞}^{∞} f(s) f(s)* ds. (101)
It now remains to make sense of the RHS of Eq. (99). We need to determine what the Fourier
transform G(ω) is. By definition,
G(ω) = (1/2π) ∫_{−∞}^{∞} g(s) e^{iωs} ds = (1/2π) ∫_{−∞}^{∞} f(−s)* e^{iωs} ds. (102)
Now make the change of variable u = −s, du = −ds, etc., so that the above integral becomes
G(ω) = −(1/2π) ∫_{∞}^{−∞} f(u)* e^{−iωu} du
     = (1/2π) ∫_{−∞}^{∞} f(u)* e^{−iωu} du
     = [ (1/2π) ∫_{−∞}^{∞} f(u) e^{iωu} du ]*
     = F(ω)*. (103)
We now substitute the results of (101) and (103) into (99) to obtain the final result,
(1/2π) ∫_{−∞}^{∞} f(s) f(s)* ds = ∫_{−∞}^{∞} F(ω) F(ω)* dω. (104)
This is Parseval’s identity. It is the continuous analogue of the (discrete) Parseval equality relating
the L2-norm of a function f to the sum of squares of its Fourier coefficients with respect to
an orthonormal basis {φ_n}. The above result can be expressed in terms of the L2 norms of f and F:

(1/2π) ‖f‖_2² = ‖F‖_2². (105)
In signal processing language, up to the factor 1/(2π), the “energy” of the signal f(x) is equal to the
energy of its Fourier transform F (ω). The appearance of the factor 1/(2π) is due to its appearance
in the definition of the Fourier transform and not in the inverse. In other, “symmetric” definitions,
there is no factor appearing in Parseval’s identity.
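As a sanity check, Eq. (105) can be verified numerically. In this Python sketch we use f(x) = e^{−|x|}, whose FT under the present convention is F(ω) = 1/(π(1 + ω²)); the test function and the quadrature grids are our own choices.

```python
import numpy as np

# Parseval:  (1/2π)‖f‖_2² = ‖F‖_2²  (Eq. 105), checked for f(x) = e^{-|x|},
# whose FT under this convention is F(ω) = 1/(π(1 + ω²))
x = np.linspace(-40.0, 40.0, 80001)
dx = x[1] - x[0]
f = np.exp(-np.abs(x))
lhs = np.sum(np.abs(f)**2) * dx / (2 * np.pi)    # (1/2π)‖f‖_2²

w = np.linspace(-200.0, 200.0, 400001)
dw = w[1] - w[0]
F = 1.0 / (np.pi * (1.0 + w**2))
rhs = np.sum(np.abs(F)**2) * dw                  # ‖F‖_2²
print(lhs, rhs)                                  # both ≈ 1/(2π) ≈ 0.1592
```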
Some final comments:
1. Parseval’s identity tells us that the Fourier transform F is a mapping from L2(R) to L2(R).
From Eq. (105), it follows that if f ∈ L2(R), i.e., ‖f‖2 < ∞, then F is also in L2(R), i.e.,
‖F‖2 < ∞.
2. This result now allows us to conclude that the L2 norm of the solution u(x, t) to the heat equation
goes to zero, i.e.,
‖u(x, t)‖2 → 0, as t → ∞. (106)
Remember that we showed that ‖u(x, t)‖_2 was decreasing in time, but we couldn’t show that it
was decreasing to zero. We may now use Parseval’s identity to establish this result. Recall
that the time evolution of the Fourier transform of u(x, t) is given by
U(ω, t) = F(ω) e^{−kω²t}, (107)
where F = F(f) is the FT of the initial temperature distribution.
The squared L2 norm of U(ω, t) is given by

‖U(ω, t)‖_2² = 〈U(ω, t), U(ω, t)〉 = ∫_{−∞}^{∞} |F(ω)|² e^{−2kω²t} dω. (108)
We now claim that the function F (ω), the Fourier transform of f(x), is bounded for all ω ∈ R.
The proof of this statement follows in the same way that we showed that F (ω) was bounded in
the case that f(x) is an L1 function.
Now define
M = max_{ω∈R} |F(ω)|. (109)
The integral in the second line of Eq. (108) may then be bounded as follows,
∫_{−∞}^{∞} |F(ω)|² e^{−2kω²t} dω ≤ M² ∫_{−∞}^{∞} e^{−2kω²t} dω. (110)
The integral on the RHS may be evaluated with the help of the following result that we have
used earlier in our discussion of Gaussian functions,
∫_{−∞}^{∞} e^{−x²} dx = √π. (111)
We make the change of variable x = ω√(2kt), so that dx = √(2kt) dω. With just a little work, we
find that

∫_{−∞}^{∞} e^{−2kω²t} dω = √(π/(2kt)). (112)
Putting all the results together, we obtain

‖U(ω, t)‖_2² ≤ M² √(π/(2kt)) → 0 as t → ∞. (113)
From Parseval’s identity,

(1/2π) ‖u(x, t)‖_2² = ‖U(ω, t)‖_2², (114)
it follows that
‖u(x, t)‖2 → 0 as t → ∞, (115)
thus proving the desired result.
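The chain of bounds above is easy to reproduce numerically. In this Python sketch, the transform F(ω) = e^{−ω²} of the initial data is a stand-in of our own choosing; we evaluate the integral in Eq. (108) for increasing t and compare it with the bound of Eq. (113).

```python
import numpy as np

# ‖U(·,t)‖_2² = ∫ |F(ω)|² e^{-2kω²t} dω  versus the bound M²√(π/(2kt)) of Eq. (113)
k = 1.0
w = np.linspace(-50.0, 50.0, 100001)
dw = w[1] - w[0]
F = np.exp(-w**2)                    # stand-in FT of some initial data (our assumption)
M = np.max(np.abs(F))

norms, bounds = [], []
for t in [1.0, 10.0, 100.0]:
    norms.append(np.sum(np.abs(F)**2 * np.exp(-2.0 * k * w**2 * t)) * dw)
    bounds.append(M**2 * np.sqrt(np.pi / (2.0 * k * t)))
    print(t, norms[-1], bounds[-1])

# the squared norm decreases with t and always sits below the bound
```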