
Imperial College London

Department of Physics

Mathematics for the Theory of Materials: Lectures 1–13

Dr Arash Mostofi

Comments and corrections to a.mostofi@imperial.ac.uk

Lecture notes may be found on Blackboard (http://bb.imperial.ac.uk)


Contents

1 Vector Spaces and Tensors
    1.1 Learning Outcomes
    1.2 Further Reading
    1.3 Introduction
    1.4 Vector Spaces
        1.4.1 Introduction
        1.4.2 Definition of a Vector Space
        1.4.3 Linear Independence
        1.4.4 Basis Vectors
        1.4.5 The Inner (or Scalar) Product and Orthogonality
    1.5 Matrices
        1.5.1 Notation*
        1.5.2 Basic Matrix Algebra*
        1.5.3 Suffix Notation and Summation Convention
        1.5.4 Special Types of Matrix*
        1.5.5 Levi-Civita Symbol (Alternating Tensor)
        1.5.6 Determinants*
        1.5.7 Linear Equations and Inverse Matrices*
        1.5.8 Eigenvalues and Eigenvectors*
        1.5.9 Normal Modes*
    1.6 Scalars, Vectors and Tensors
    1.7 Transformations
        1.7.1 Revision: Transformation of Vectors
        1.7.2 Transformation of Rank-Two Tensors
        1.7.3 Definition of a Tensor
        1.7.4 Tensors vs Matrices
    1.8 Diagonalisation of Rank-Two Tensors
        1.8.1 Representation Quadric
        1.8.2 Stationary Property of Eigenvalues
    1.9 Tensor Properties of Crystals
        1.9.1 Matter Tensors
        1.9.2 Neumann's Principle
        1.9.3 Thermal Conductivity as an Example
        1.9.4 Steady-State Heat Flow in Crystals
    1.10 Tensor Differential Operators
        1.10.1 Tensor Fields


        1.10.2 Suffix Notation for ∇
        1.10.3 ∇ as a Rank-One Tensor

2 Green Functions
    2.1 Learning Outcomes
    2.2 Further Reading
    2.3 Introduction
    2.4 Variation of Parameters
        2.4.1 Homogeneous Initial Conditions
        2.4.2 Inhomogeneous Initial Conditions
        2.4.3 Homogeneous Two-Point Boundary Conditions
    2.5 Dirac Delta-Function
        2.5.1 Properties of δ(x)
        2.5.2 δ(x) as the Limit of a Top-Hat Function
        2.5.3 δ(x) as the Limit of a Sinc Function
        2.5.4 δ(x) as the Continuous Analogue of δij
    2.6 Green Functions

3 Hilbert Spaces
    3.1 Learning Outcomes
    3.2 Further Reading
    3.3 Introduction
    3.4 Hilbert Spaces
        3.4.1 Definition
        3.4.2 Properties
    3.5 Sturm-Liouville Theory
        3.5.1 Self-Adjoint Differential Operators
        3.5.2 Eigenfunctions and Eigenvalues
        3.5.3 Eigenfunction Expansions
        3.5.4 Eigenfunction Expansion of Green Function
        3.5.5 Eigenfunction Expansions for Solving ODEs
    3.6 Legendre Polynomials
    3.7 Spherical Harmonics

4 Integral Transforms
    4.1 Learning Outcomes
    4.2 Further Reading
    4.3 Introduction
    4.4 Fourier Series*
        4.4.1 The Fourier Basis*
        4.4.2 Fourier Coefficients*
        4.4.3 Symmetry*
        4.4.4 Discontinuous Functions*
        4.4.5 Parseval's Theorem and Bessel's Inequality*
    4.5 Fourier Transforms
        4.5.1 Definition and Notation
        4.5.2 Dirac Delta-Function


        4.5.3 Properties of the Fourier Transform
        4.5.4 Examples of Fourier Transforms
        4.5.5 Bandwidth Theorem
    4.6 Diffraction Through an Aperture
        4.6.1 Parseval's Theorem for Fourier Transforms
        4.6.2 Convolution Theorem for Fourier Transforms
        4.6.3 Double Slit Diffraction
        4.6.4 Differential Equations
    4.7 Laplace Transforms
        4.7.1 Properties
        4.7.2 Examples
        4.7.3 Solving Differential Equations
        4.7.4 Convolution Theorem for Laplace Transforms


Chapter 1

Vector Spaces and Tensors

1.1 Learning Outcomes

To understand the following concepts and be able to apply them to solving problems: vector spaces and their properties; the basic properties and rules of tensor algebra and their transformation laws; suffix notation and Einstein summation convention; the Kronecker and Levi-Civita symbols; principal axes and diagonalisation of symmetric tensors; the application of tensors to the properties of crystals; Neumann's principle; tensor fields; the representation and use of grad, div and curl in suffix notation.

1.2 Further Reading

You are encouraged to read beyond these lecture notes. This has a number of benefits: it allows you to explore material that goes beyond the core concepts included here, and it encourages you to develop your skills in assimilating academic literature. Seeing a subject presented in different ways by different authors can enhance your understanding of it.

1. Riley, Hobson & Bence, Mathematical Methods for Physics and Engineering, 3rd Ed., Chapter 8: Matrices and Vector Spaces; ibid. Chapter 9: Normal Modes; ibid. Chapter 26: Tensors, Sections 1–9 and 12;

2. Lovett, Tensor Properties of Crystals. A nice little book covering crystallography (Ch. 1), tensor algebra (Ch. 2), and applications to materials (Ch. 3–7).

3. Nye, Physical Properties of Crystals. Similar to Lovett, but at a much more advanced level.


2 TSM-CDT: Mathematics: Arash Mostofi

1.3 Introduction

There are properties of materials that do not depend on the direction in which they are measured. Take, for instance, density: it is the ratio of the mass of an object to its volume, both scalar quantities and, hence, both independent of direction, which means the density itself is a scalar and independent of direction.

There are other properties of crystals that do depend on direction, and this can often be the reason why a material may be scientifically interesting or technologically useful. Electrical conductivity σ, for instance, is the ratio of the resultant current density J to the applied electric field E. Both J and E are quantities which possess a direction as well as a magnitude (i.e., they are vectors). For the same crystal, the result of a measurement of σ will depend on the direction in which E is applied and J is measured. This means that a single value of σ is not enough to specify the properties of the crystal. In fact, σ is a second-rank tensor, which means that it is a quantity that can be represented as a matrix with two suffixes σij, i, j ∈ {1, 2, 3}, and hence nine components.¹

It turns out that many crystal properties can be thought of as tensors: elasticity, thermal conductivity, permittivity, magnetic susceptibility, thermal expansivity and many more besides.

This chapter is about the mathematics of such quantities, how to manipulate them, and how to understand what they mean. But first we must review some concepts that you should be familiar with: vectors, vector spaces, matrices, eigenvalues and eigenvectors.

It is worth noting that tensors are not just important in the study of materials. For example, Einstein's theory of general relativity, which describes the force of gravity as a manifestation of the curvature of spacetime caused by the presence of matter and energy, may be expressed mathematically as a tensor equation.

1.4 Vector Spaces

1.4.1 Introduction

You should be familiar with the concept of a vector² to describe quantities that have both a magnitude and a direction in three-dimensional space. Those of you who have studied the theory of Special Relativity will have come across 4-vectors, vectors in four-dimensional space-time. It may not surprise you to learn that the concept of a vector can be extended to an arbitrary number of dimensions, and the "space" in which these vectors live can refer to more abstract quantities than, say, one would usually associate with the word "space".

¹Since the crystal will have some point-group symmetry, not all of these components will be independent.
²The word means carrier, from the Latin vehere, to carry.


1.4.2 Definition of a Vector Space

A set of vectors satisfying the following relations constitutes a linear vector space:

1. Closed³ under addition, both commutative and associative:

u + v = v + u (u + v) + w = u + (v + w)

2. Closed under multiplication by a scalar⁴, both distributive and associative:

a(u + v) = au + av,    a(bu) = (ab)u

3. There exists a null vector 0:

v + 0 = v

4. Multiplication by the unit scalar leaves a vector unaltered:

1× v = v

5. Each vector v has a corresponding negative vector −v :

v + (−v) = 0

This last property enables subtraction by defining u − v ≡ u + (−v).

1.4.3 Linear Independence

A set of n non-zero vectors ui is said to be linearly independent if

a1u1 + a2u2 + · · · + anun = 0  ⇒  ai = 0 ∀ i,

which implies that no vector in the set can be expressed as a linear combination of all the others.

Conversely, if a set of scalars {ai}, at least one of which is non-zero, can be chosen such that the sum on the left-hand side is the null vector, then the set of vectors {ui} is said to be linearly dependent.

Linear independence can be used to define the dimensionality of a vector space V, which is said to be of dimension n if there exists a set of n linearly independent vectors in V, but all sets of n + 1 vectors are linearly dependent.
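This criterion translates directly into a numerical test: stacking the vectors as the columns of a matrix, the set is linearly independent precisely when that matrix has full column rank. A short illustrative sketch in Python with NumPy (a tool used here purely for illustration, not one these notes assume):

```python
import numpy as np

# Three vectors in R^3, stacked as the columns of a matrix
u1 = np.array([1.0, 0.0, 1.0])
u2 = np.array([0.0, 2.0, 1.0])
u3 = np.array([1.0, 2.0, 2.0])  # u3 = u1 + u2, so the set is linearly dependent

U = np.column_stack([u1, u2, u3])

# {u1, u2, u3} is linearly independent iff U has full column rank,
# i.e. iff a1*u1 + a2*u2 + a3*u3 = 0 forces a1 = a2 = a3 = 0
independent = np.linalg.matrix_rank(U) == U.shape[1]
```

Here the rank is 2 rather than 3, exposing the dependence u3 = u1 + u2.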

³Closure of a set of elements under a given operation implies that the result of the operation is also an element of the set.

⁴A scalar is a quantity with magnitude only. The word is derived from the Latin scala, which means ladder.


1.4.4 Basis Vectors

Any set of n linearly independent vectors {ui} in an n-dimensional vector space V is a basis for V. Any vector v in V can be represented as a linear combination of the basis vectors:

v = a1u1 + a2u2 + · · · + anun,

where the set of scalars {ai} is unique to the given vector v.

Examples that you will be most familiar with are:

• The three Cartesian basis vectors in three-dimensional Euclidean space (E³). Any vector v ∈ E³ can be written uniquely as

v = v1e(1) + v2e(2) + v3e(3)

where v1, v2, v3 are the components of v in the basis {e(1), e(2), e(3)}.

• The two-dimensional representation of the set of complex numbers (ℂ) in terms of the basis {1, i}. Any complex number z ∈ ℂ may be expressed uniquely as

z = x + iy

where x and y are the components of z in the basis {1, i}.

1.4.5 The Inner (or Scalar) Product and Orthogonality

For every pair of vectors u and v in V, we define the inner product ⟨u|v⟩ ∈ ℂ. The inner product can be thought of as the generalisation of the familiar scalar (or dot) product of three-vectors, u · v = uv cos θ, where θ is the angle between u and v. We will use u · v and ⟨u|v⟩ interchangeably.

The inner product has the following properties:

1. ⟨u|v⟩ = ⟨v|u⟩* ⇒ ⟨v|v⟩ ∈ ℝ

2. ⟨u|av1 + bv2⟩ = a⟨u|v1⟩ + b⟨u|v2⟩

3. ||v||² ≡ ⟨v|v⟩ ≥ 0

4. ||v|| = 0 ⇒ v = 0

(1) and (2) imply that

⟨av1 + bv2|u⟩ = a*⟨v1|u⟩ + b*⟨v2|u⟩.

The norm of a vector is defined as ||v|| ≡ ⟨v|v⟩^(1/2) and is analogous to the length |v| of a three-vector. We will often use |v| and ||v|| interchangeably.


Two vectors are said to be orthogonal if their inner product is zero. If they each also have unit norm, they are said to be orthonormal. The Cartesian basis in three dimensions is an example of an orthonormal basis, since

e(i) · e(j) = { 1   i = j
             { 0   i ≠ j
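For vectors in ℂⁿ, the inner product ⟨u|v⟩ = ∑i ui* vi satisfies all four properties above. A quick numerical check (Python/NumPy used purely for illustration; np.vdot conjugates its first argument, matching this convention):

```python
import numpy as np

u = np.array([1.0 + 2.0j, 0.5 + 0.0j, -1.0j])
v = np.array([2.0 + 0.0j, 1.0 - 1.0j, 3.0 + 0.0j])

# np.vdot conjugates its first argument, matching <u|v> = sum_i ui* vi
uv = np.vdot(u, v)
vu = np.vdot(v, u)

conjugate_symmetry = np.isclose(uv, np.conj(vu))  # property 1: <u|v> = <v|u>*
norm_sq = np.vdot(v, v)                           # <v|v> is real and non-negative
```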

1.5 Matrices

This section is mainly revision and included to refresh the memory. Knowledge of the subsections marked with an asterisk is assumed, and they will therefore be skipped over rapidly in lectures.

1.5.1 Notation*

Matrices are arrays of numbers, or elements, of the form

( 3 1 )      ( 5 9 )      ( 5 8 9 )
( 4 1 )      ( 2 6 )      ( 7 9 3 )
             ( 5 3 )      ( 2 3 8 )

that obey a special algebra, i.e., a special set of rules for their addition, multiplication, etc. Lower-case aij is used to denote the element in the ith row and jth column of the matrix A. You will sometimes see the elements of a matrix denoted as [B]ij, which is equivalent to writing bij.

A subscript used in this way is known as an index (plural indices) or, equivalently, a suffix (plural suffixes). Suffix notation is often a very handy way of doing matrix algebra. We shall return to this later.

1.5.2 Basic Matrix Algebra*

The following is a list of the properties of matrix algebra.

1. Equality. For two matrices A and B to be equal they must be of the same size, i.e., have the same number of rows and columns, and all their elements must be equal:

A = B ⇔ aij = bij,  1 ≤ i ≤ m, 1 ≤ j ≤ n

2. Multiplication by a scalar. Multiplying a matrix A by a scalar λ results in all of the elements aij of A being multiplied by λ:

B = λA ⇔ bij = λaij,  1 ≤ i ≤ m, 1 ≤ j ≤ n


3. Addition and subtraction. Only matrices of the same size may be added to or subtracted from one another.

C = A± B ⇔ cij = aij ± bij 1 ≤ i ≤ m, 1 ≤ j ≤ n

4. Matrix addition is commutative and associative:

A + B = B + A, A + (B + C) = (A + B) + C

5. Matrix multiplication. The product C = AB exists if and only if the number of columns of A is equal to the number of rows of B. The matrix product of an m × p matrix A and a p × n matrix B is defined as

C = AB ⇔ cij = ∑k aikbkj,  1 ≤ i ≤ m, 1 ≤ j ≤ n,

where the sum runs over k = 1, ..., p.

6. Matrix multiplication is associative and distributive, but in general not commutative:

(AB)C = A(BC),    (A + B)C = AC + BC,    AB ≠ BA (in general)
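These algebraic rules are easy to confirm numerically for randomly chosen matrices; an illustrative sketch in Python/NumPy (again, a tool chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))

assoc = np.allclose((A @ B) @ C, A @ (B @ C))      # (AB)C = A(BC)
distrib = np.allclose((A + B) @ C, A @ C + B @ C)  # (A + B)C = AC + BC
commutes = np.allclose(A @ B, B @ A)               # AB = BA? generally not
```

For generic matrices the commutation test fails, which is exactly the point: matrix multiplication is associative and distributive but not commutative.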

1.5.3 Suffix Notation and Summation Convention

Suffix notation is a powerful tool for the manipulation of tensors. Although we will be mainly concerned with three dimensions, i.e., all suffixes take values 1, 2 or 3, the notation is much more general and may equally well be applied in N dimensions.

When using suffix notation, a shorthand that is often used involves omitting the summation symbol when there is a repeated suffix. For example, the expression for the elements of C = AB is

cij = [AB]ij = ∑k aikbkj    (1.1)

and this may be written as

cij = aikbkj    (1.2)

where it is implicitly assumed that there is a summation over the repeated index k. This shorthand is known as the Einstein summation convention.

There are three basic rules to suffix notation:

1. In any one term of an expression, suffixes may appear only once, twice or not at all.

2. A suffix that appears only once on one side of an expression must also appear once on the other side. It is called a free suffix or free index.

3. A suffix that appears twice is summed over. It is called a dummy suffix or dummy index.


The scalar (dot) product of two vectors may, for example, be expressed as

a · b = a1b1 + a2b2 + a3b3 = aibi

A matrix-vector multiplication, for example,

y = Mx = ( m11 m12 m13 ) ( x1 )   ( m11x1 + m12x2 + m13x3 )
         ( m21 m22 m23 ) ( x2 ) = ( m21x1 + m22x2 + m23x3 )
         ( m31 m32 m33 ) ( x3 )   ( m31x1 + m32x2 + m33x3 )

may be expressed in suffix notation as

yi = Mikxk,

where i is the free suffix and k is the dummy suffix. Dummy suffixes are so-called because it does not matter what symbol we use to represent them, as long as they always occur in pairs and as long as we do not use a symbol that already appears elsewhere in the expression. Hence,

ai = Mijbj = Miξbξ = Mi•b•.

Free suffixes must match on both sides of the equation, and indeed in every term of the equation. Hence,

ai = Mijbj − Qipcp    and    aζ = Mζjbj − Qζkck

are equivalent (and valid) expressions, whereas

ai = Mkjbj + Qipcp

is an incorrect use of suffix notation: the left-hand side has a free suffix i, which is not matched in the first term on the right-hand side (which has a free suffix k).

Some more examples are given below:

• y = Ax + z ⇔ yi = aijxj + zi

• C = AB ⇔ cij = aikbkj

• D = CT ⇔ dij = cji

• C = ATB ⇔ cij = akibkj

• tr[A] = aii

• tr[ABC] = aipbpqcqi

The action of replacing two different free suffixes by a single dummy suffix (which is then summed over) is known as contraction. For example, cij represents the ijth element of matrix C. Contracting the two free suffixes gives cii (or, equivalently, cjj, or indeed any other choice of symbol for the dummy suffix), which is then just tr[C].
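Suffix notation maps almost literally onto NumPy's einsum, where the subscript string plays the role of the free and dummy suffixes, and repeated letters are summed over. An illustrative sketch covering the examples above:

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
B = np.arange(9.0, 18.0).reshape(3, 3)
x = np.array([1.0, 2.0, 3.0])

y = np.einsum('ik,k->i', A, x)      # yi = aik xk   (k is the dummy suffix)
C = np.einsum('ik,kj->ij', A, B)    # cij = aik bkj, i.e. C = AB
trA = np.einsum('ii->', A)          # contraction aii, i.e. tr[A]
D = np.einsum('ki,kj->ij', A, B)    # dij = aki bkj, i.e. D = A^T B
```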


1.5.4 Special Types of Matrix*

1. Transpose. The transpose of a matrix A is obtained by swapping its rows and columns and is denoted with a superscript "T":

[AT]ij = [A]ji.

The transpose of a product is equal to the product of the transposes in reverse order:

(AB)T = BTAT,

and, of course, (AT)T = A.

2. Hermitian conjugate. The Hermitian conjugate of a matrix is denoted by a superscript dagger (†) and is defined as

M† ≡ (M*)T ≡ (MT)*

where "*" represents complex conjugation. For real matrices, the operations of Hermitian conjugation and transposition are identical.

3. Column and row matrices. A column matrix has m rows and 1 column. A three-vector u may be thought of as a column matrix:

u = ( ux )
    ( uy )
    ( uz )

A row matrix has 1 row and n columns, e.g., uT is a row matrix:

uT = ( ux uy uz ).

It is worth noting that the inner product for vectors may be written as a matrix product:

a · b = a†b = (b†a)*

4. Square matrix. If A is a square matrix, then it has the same number of rows as columns. The trace of a square n × n matrix is the sum of its diagonal elements and is denoted

tr(A) ≡ ∑i aii ≡ aii

The trace has a cyclic property, namely that

tr(ABC) = tr(BCA) = tr(CAB)
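The cyclic property of the trace is easily confirmed numerically for arbitrary matrices; a small Python/NumPy sketch for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# tr(ABC) = tr(BCA) = tr(CAB): cyclic permutations leave the trace unchanged
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
```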

5. Diagonal matrix. If Λ is diagonal, then λij = 0 if i ≠ j.


6. Unit matrix. The n × n unit matrix, also known as the identity matrix, is conventionally denoted by I(n), or just I or 1, and is a special case of a diagonal matrix in which the diagonal elements are all equal to 1:

I = ( 1 0 0 · · · )
    ( 0 1 0 · · · )
    ( 0 0 1 · · · )
    ( :  :  :  ·. )

The elements of I are usually denoted by δij, where

δij = { 1   i = j
      { 0   i ≠ j

δij is called the Kronecker delta⁵. The main properties of the Kronecker delta are:

• Symmetric: δij = δji

• Trace: δii = n (in n dimensions)

• "Sifting": xi = δijxj

7. Inverse matrix. The inverse matrix A−1 of A satisfies

A−1A = AA−1 = I

The inverse of a product is equal to the product of the inverses in reverse order

(AB)−1 = B−1A−1

8. Tridiagonal matrix. A matrix in which only the main diagonal and the diagonals immediately above and below it contain non-zero entries. These matrices appear often in the numerical solution of physical problems.

9. Upper (lower) triangular matrix. A matrix whose non-zero elements are confined to the main diagonal and the elements above (below) it. The determinant of such a matrix is equal to the product of the diagonal elements, and the inverse of an upper (lower) triangular matrix is also upper (lower) triangular.

10. Symmetric and anti-symmetric matrices. If S is symmetric, then ST = S or, alternatively, sij = sji. If A is anti-symmetric, then AT = −A or, alternatively, aij = −aji.

• All anti-symmetric matrices have zeros on the diagonal.

• Every square matrix may be expressed as the sum of a symmetric and an anti-symmetricmatrix.

⁵Named after the German mathematician and logician Leopold Kronecker (1823–1891). He believed that "God made the integers; all else is the work of man".


11. Orthogonal matrix. If R is orthogonal, then it satisfies

RRT = RTR = I ⇔ R−1 = RT

It is worth noting that the n columns of any orthogonal matrix form an orthonormal basis of n-dimensional vectors. Why?
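A 2 × 2 rotation matrix gives a concrete example: its columns are orthonormal, which is exactly why RRT = I. An illustrative Python/NumPy sketch:

```python
import numpy as np

theta = 0.3  # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# R R^T = R^T R = I, i.e. R^{-1} = R^T
is_orthogonal = (np.allclose(R @ R.T, np.eye(2))
                 and np.allclose(R.T @ R, np.eye(2)))

# The Gram matrix of the columns is the identity: the columns are orthonormal
gram = R.T @ R
```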

12. Hermitian and anti-Hermitian matrices. These may be thought of as the complex generalisations of symmetric and anti-symmetric matrices. If H is Hermitian, then H† = H or, alternatively, hij = h*ji. If A is anti-Hermitian, then A† = −A or, alternatively, aij = −a*ji.

• All anti-Hermitian matrices have zeros on the diagonal.

• Every complex square matrix may be expressed as the sum of a Hermitian and ananti-Hermitian matrix.

13. Unitary matrix. These may be thought of as complex generalisations of orthogonal matri-ces. U is unitary if

UU† = U†U = I ⇔ U−1 = U†

1.5.5 Levi-Civita Symbol (Alternating Tensor)

The Levi-Civita symbol or alternating tensor, is a rank-three isotropic tensor, and is defined as

εijk =

1 (i , j , k) = (1, 2, 3), (2, 3, 1), (3, 1, 2)−1 (i , j , k) = (1, 3, 2), (2, 1, 3), (3, 2, 1)

0 otherwise

In three dimensions, each suffix can take the value 1, 2, or 3, therefore εijk has 33 = 27 elements.From the definition, it can be seen that 21 of these are zero (whenever any two suffixes are equal)and the remaining six are ±1, depending on whether (i , j , k) is a cyclic or anti-cyclic permutationof (1, 2, 3). In other words,

• ε123 = ε231 = ε312 = 1

• ε132 = ε213 = ε321 = −1

• εijk = 0 if i = j or j = k or i = k

From the definition follow the cyclic property εijk = εjki = εkij and the antisymmetry εijk = −εjik, etc.

The alternating tensor can be used to write vector (cross) products in suffix notation:

c = a× b ⇔ ci = εijkajbk .


A useful identity involving the contraction of two alternating tensors is

εijkεklm = δilδjm − δimδjl .

For example, we can use it to prove the vector identity

a× (b × c) = (a · c)b − (a · b)c

as follows

[a× (b × c)]i = εijkaj [b × c]k

= εijkajεklmblcm

= (δilδjm − δimδjl)ajblcm

= (ajcj)bi − (ajbj)ci

= (a · c)[b]i − (a · b)[c]i

Note the compactness of the derivation as compared to working in Cartesian coordinates and writing out the vector products explicitly.
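Both the cross-product formula and the ε–δ contraction identity can be verified by brute force over all index values. A Python/NumPy sketch (illustrative only; the tensor eps below is built directly from the definition, with 0-based indices):

```python
import numpy as np

# Build eps[i,j,k] from the definition (0-based indices)
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0   # cyclic permutations of (1,2,3)
    eps[i, k, j] = -1.0  # anti-cyclic permutations

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([7.0, 8.0, 9.0])

# ci = eps_ijk aj bk reproduces the vector (cross) product
cross = np.einsum('ijk,j,k->i', eps, a, b)

# a x (b x c) = (a.c) b - (a.b) c
lhs = np.cross(a, np.cross(b, c))
rhs = np.dot(a, c) * b - np.dot(a, b) * c

# eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl
delta = np.eye(3)
identity_holds = np.allclose(
    np.einsum('ijk,klm->ijlm', eps, eps),
    np.einsum('il,jm->ijlm', delta, delta)
    - np.einsum('im,jl->ijlm', delta, delta))
```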

1.5.6 Determinants*

Associated with each square matrix A there is a scalar quantity called the determinant that is

denoteddet A ≡ |A|

For a 2× 2 matrix, it is given by det A = a11a22 − a12a21, and for a 3× 3 matrix it is

det A ≡ | a11 a12 a13 |
        | a21 a22 a23 |
        | a31 a32 a33 |

      = a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |
            | a32 a33 |       | a31 a33 |       | a31 a32 |

The alternating tensor may be used to express the determinant of a 3 × 3 matrix A:

det A = εijka1ia2ja3k

The general form for the determinant of an n × n matrix is given by

det A = ∑k a1k (−1)^(k−1) det A(1k),

where the sum runs over k = 1, ..., n and A(1k) is the (n − 1) × (n − 1) matrix obtained by deleting the first row and kth column of A. This formula can be applied recursively until the problem is reduced to just a sum over determinants of 2 × 2 matrices.

Determinants have the following properties:


1. In the general formula above, the determinant is expanded over the first row of elements, but we may do so over any row or column, as long as we get the signs of the terms correct. E.g., over the ith row we have

det A = ∑k aik (−1)^(k−i) det A(ik)

and over the kth column we have

det A = ∑i aik (−1)^(k−i) det A(ik).

det A(ik) is known as the minor of aik, and (−1)^(k−i) det A(ik) is known as the cofactor of aik. Cofactors are just minors multiplied by an appropriate sign, (−1)^(k−i).

2. det A = 0 if

• all the elements of one row (or column) are zero

• any two rows (or two columns) are identical

• any two rows (or two columns) are proportional

3. If two rows (or two columns) of a matrix are interchanged, then the sign of the determinant changes.

4. If each element in one row (or column) of a matrix is multiplied by a scalar λ, then the determinant is also multiplied by λ.

5. For an n × n matrix, det(λA) = λ^n det A

6. The determinant is unchanged if a multiple of a particular row (or column) is added to the corresponding elements of another row (or column).

7. Interchanging all the columns and rows does not affect the value of the determinant, i.e., det A = det AT

8. The determinant of a product is the product of the determinants, i.e., det(AB) = det A det B
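The recursive expansion over the first row translates directly into code. A minimal Python/NumPy sketch (illustrative; np.linalg.det serves only as an independent check):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by recursive cofactor expansion over the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for k in range(n):  # 0-based k, so (-1)**k matches (-1)^(k-1) for 1-based k
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)  # drop row 1, col k
        total += (-1) ** k * A[0, k] * det_cofactor(minor)
    return total

M = np.array([[5.0, 8.0, 9.0],
              [7.0, 9.0, 3.0],
              [2.0, 3.0, 8.0]])
```

Note that this O(n!) expansion is useful for understanding, not for computation; practical routines use LU factorisation instead.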

1.5.7 Linear Equations and Inverse Matrices*

An example of a system of linear equations is

a11x1 + a12x2 = y1

a21x1 + a22x2 = y2⇔ aijxj = yi ⇔ Ax = y

If y = 0, then the equations are said to be homogeneous. Otherwise they are known as inhomo-geneous. Knowledge of the inverse of A enables solution of the equations. If and inverse exists,

then it is unique.


The general expression for the inverse of an n × n matrix A is

   [A⁻¹]ij = (−1)^(i−j) det A^(ji) / det A ≡ [adj A]ij / det A

where A^(ji) is the matrix A with the jth row and ith column removed. Notice the interchange between i and j. The matrix adj A in the numerator is known as the adjoint of A and is obtained by replacing each element of A with its cofactor and taking the transpose of the result.

If det A = 0, then A⁻¹ does not exist and A is said to be singular. If a matrix A is singular, then the vectors composed of the rows (or columns) of A must be linearly dependent. In other words, one row (or column) can be written as a linear combination of the others. If a matrix is non-singular, then its rows (or columns) are linearly independent and A⁻¹ exists.

Consider the homogeneous linear equations given by

   Ax = 0 ⇔ aij xj = 0

1. If A is non-singular, then det A ≠ 0, A⁻¹ exists, and there is one unique solution x = 0.

2. If A is singular, then det A = 0 and A⁻¹ does not exist. In addition to the trivial solution x = 0, there is then a whole family of non-trivial solutions.

Now consider the inhomogeneous equation

Ax = y ⇔ aijxj = yi

1. If A is non-singular, then A−1 exists and there is a unique solution x = A−1y .

2. If A is singular, then det A = 0 and A−1 does not exist. There are two possibilities, either

• there is a family of solutions, or

• the equations are inconsistent and there is no solution
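These cases are easy to demonstrate numerically; here is a brief NumPy sketch with matrices invented for illustration:

```python
import numpy as np

# Non-singular case: det A != 0, so Ax = y has the unique solution x = A^{-1} y.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])
x = np.linalg.solve(A, y)
assert np.allclose(A @ x, y)

# Singular case: the rows are proportional, so det A = 0 and no inverse exists.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
assert np.isclose(np.linalg.det(S), 0.0)

# The homogeneous equations Sx = 0 then have a family of non-trivial
# solutions: any multiple of (2, -1) works.
assert np.allclose(S @ np.array([2.0, -1.0]), 0.0)
```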

1.5.8 Eigenvalues and Eigenvectors*

An eigenvalue equation takes the form

aijxj = λxi


where aij are the components of a rank-two tensor, and x is an eigenvector with corresponding eigenvalue λ. For an n × n symmetric tensor there are n mutually orthogonal eigenvectors with corresponding (not necessarily distinct) real eigenvalues.

In the context of material properties, eigenvectors are also known as principal directions and eigenvalues as principal values.

Rearranging the eigenvalue equation gives

(aij − λδij)xj = 0

which has non-trivial solutions (x ≠ 0) if

det(A− λI) = 0

This condition, when expanded out, gives an nth-order polynomial in λ, known as the characteristic equation, whose n roots λ(i), i ∈ {1, ..., n}, are the n eigenvalues of A. Substituting each eigenvalue back into the original eigenvalue equation enables the eigenvector x(i) corresponding to each eigenvalue λ(i) to be found.

Three important properties of symmetric tensors are that:

1. They are always diagonalisable

2. Their eigenvalues are real

3. Their eigenvectors are orthogonal

For proofs see, e.g., Riley, Hobson and Bence.
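All three properties can be checked numerically. The sketch below builds an arbitrary symmetric matrix (my own example) and verifies them with NumPy's `eigh`, which is designed for symmetric/Hermitian matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M + M.T                      # an arbitrary symmetric matrix

evals, V = np.linalg.eigh(A)     # columns of V are the eigenvectors

# 2. The eigenvalues are real (eigh returns real values for symmetric input).
assert np.all(np.isreal(evals))

# 3. The eigenvectors are orthogonal (indeed orthonormal): V^T V = I.
assert np.allclose(V.T @ V, np.eye(3))

# 1. A is diagonalisable: V^T A V is diagonal with the eigenvalues on it.
assert np.allclose(V.T @ A @ V, np.diag(evals))
```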

1.5.9 Normal Modes*

Consider the arrangement of masses and springs shown in Fig. 1.1. The force F exerted by a spring of spring-constant k that is extended from equilibrium by a displacement x is given by Hooke's law, F = −kx (the minus sign indicating a restoring force). Newton's second law tells us that the acceleration of an object of mass m is related to the net force F that acts upon it by F = mẍ. Considering the net force on each mass, the equations of motion for the system shown in Fig. 1.1 are

   mẍ₁ = −k₁x₁ + k₂(x₂ − x₁)
   mẍ₂ = −k₂(x₂ − x₁)

This pair of coupled second-order differential equations may be written in matrix form as

   m ( ẍ₁ ) = ( −(k₁+k₂)   k₂ ) ( x₁ )
     ( ẍ₂ )   (     k₂    −k₂ ) ( x₂ )

Figure 1.1: Two equal masses m and springs with spring-constants k₁ and k₂. Displacements from equilibrium are denoted by x₁ and x₂.

Physical intuition tells us that when the system is set in motion, the masses are going to undergo some sort of oscillation. So we try to find special solutions such that both masses oscillate with the same, pure frequency ω, i.e., solutions that look like

   xk(t) = αk e^(iωt)

where α is a constant vector. Such solutions are called normal modes. Differentiating twice we find

   ẍ = (iω)² α e^(iωt) = −ω²x,

which we substitute back into the coupled differential equations above to obtain

   −mω² ( x₁ ) = ( −(k₁+k₂)   k₂ ) ( x₁ )
        ( x₂ )   (     k₂    −k₂ ) ( x₂ )

which we may rearrange to give

   (1/m) ( k₁+k₂  −k₂ ) ( x₁ ) = ω² ( x₁ )
         ( −k₂     k₂ ) ( x₂ )      ( x₂ ).

It may be seen that these equations are of the general form of an eigenvalue⁶ equation:

   Ax = λx

where λ = ω² and

   A = (1/m) ( k₁+k₂  −k₂ )
             ( −k₂     k₂ ).

The eigenvectors are in some sense very special, because applying the matrix A to an eigenvector does not change its direction – the result λx is simply x scaled by the eigenvalue λ.

6 Eigen is a German word meaning 'own' or 'characteristic'. You can see why: in the case of the masses on springs, if the masses all oscillate with the same frequency ω, then that is an eigenfrequency of the system, related to the eigenvalue λ by λ = ω².
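The eigenfrequencies can be computed directly. The sketch below uses hypothetical values m = k₁ = k₂ = 1 (chosen for illustration, not from the notes); the eigenvalues of A give ω²:

```python
import numpy as np

m, k1, k2 = 1.0, 1.0, 1.0        # illustrative values only

A = (1.0 / m) * np.array([[k1 + k2, -k2],
                          [-k2,      k2]])
omega2, modes = np.linalg.eigh(A)  # A is symmetric for these values
omega = np.sqrt(omega2)            # the two eigenfrequencies

# For m = k1 = k2 = 1 the characteristic equation is lam^2 - 3 lam + 1 = 0,
# so the eigenvalues are (3 -/+ sqrt(5)) / 2.
expected = np.sort([(3 - np.sqrt(5)) / 2, (3 + np.sqrt(5)) / 2])
assert np.allclose(omega2, expected)
```

The columns of `modes` are the corresponding normal-mode displacement patterns: in one mode the masses move together, in the other they move in opposition.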


1.6 Scalars, Vectors and Tensors

Scalars have magnitude and are independent of any direction. Examples of scalar physical quantities include density, temperature, mass and volume. As we shall see, scalars are also rank-zero tensors.

Vectors have magnitude and direction. Physical examples include velocity, force, temperature gradient, electric field and current density. Given a set of basis vectors e, any vector v may be described in terms of its components vi in that basis

   v = ∑_{i=1}^{3} vi e(i) ≡ ( v₁ )
                             ( v₂ )
                             ( v₃ ).

Vectors are examples of rank-one tensors.

Rank-two tensors are an extension of the concept of a vector. Consider the following example. For the special case of an isotropic⁷ crystal, no matter what direction in which an electric field E is applied, the resulting current density J will be in the same direction as the field and given by

   J = σiso E

or, in suffix notation,

   Ji = σiso Ei = σiso δij Ej

where the scalar σiso is the conductivity of the isotropic crystal⁸. In other words, J and E are parallel. In general, however, electrical conductivity in most crystals is not isotropic and J and E will not be parallel. The general relationship between J and E is given by

   J = σE ⇔ Ji = σij Ej.

σ is the electrical conductivity tensor, and is an example of a rank-two tensor. Thus, if I were to apply an electric field of the form E = (0, 1, 0)ᵀ, the resulting current density would be J = (σ₁₂, σ₂₂, σ₃₂)ᵀ, i.e., not only does current flow in the direction e(2) in which the field was applied, but also along the transverse directions e(1) and e(3).

A tensor is a mathematical representation of a physical quantity. The rank of a tensor is equal to the number of its free suffixes. Examples of rank-two tensors:

7 Isotropic means identical in all directions.
8 Cubic crystals are an example for which the conductivity is isotropic.


Property                   Equation
Permittivity               Di = εij Ej
Dielectric susceptibility  Pi = ε₀ χᵉij Ej
Permeability               Bi = μij Hj
Magnetic susceptibility    Mi = χᵐij Hj
Electrical conductivity    Ji = σij Ej
Thermal conductivity       hi = −κij ∇jT
Thermal expansion          εij = αij ΔT
Stress                     Fi = σij Aj
Strain                     ui − (u0)i = εij xj + ωij xj
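The conductivity example above can be made concrete numerically. The sketch below applies a field along e(2) to a hypothetical anisotropic conductivity tensor (values invented for illustration) and shows that current flows transversely as well:

```python
import numpy as np

# Illustrative anisotropic (symmetric) conductivity tensor -- not real data.
sigma = np.array([[3.0, 0.5, 0.0],
                  [0.5, 2.0, 0.1],
                  [0.0, 0.1, 1.0]])

E = np.array([0.0, 1.0, 0.0])   # unit field along e(2)
J = sigma @ E                   # J_i = sigma_ij E_j

# J picks out the second column of sigma: (sigma_12, sigma_22, sigma_32).
assert np.allclose(J, sigma[:, 1])

# Current flows along the transverse directions e(1) and e(3) too.
assert J[0] != 0.0 and J[2] != 0.0
```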

1.7 Transformations

When we talk about transformations, what we mean is a change of basis from one orthogonal set e to another e′.

Measurement of a physical property (such as polarisation, conductivity, susceptibility, etc.) is made with respect to a particular basis. When we change basis, the physical property itself does not change, but the way in which we represent it does, i.e., the components of the tensor that we use to describe the physical property change.

1.7.1 Revision: Transformation of Vectors

Consider a physical property that has magnitude and direction and hence may be represented by a vector (i.e., a rank-one tensor). This property can be described by its components in the basis e, or by its components in the rotated basis e′,

   p = p₁e(1) + p₂e(2) + p₃e(3) = p′₁e′(1) + p′₂e′(2) + p′₃e′(3)

as shown in Fig. 1.2. If we denote the components with respect to the basis e by the vector p and the components with respect to the basis e′ by p′, i.e.,

   p = (p₁, p₂, p₃)ᵀ and p′ = (p′₁, p′₂, p′₃)ᵀ,

then the two sets of components are related by an orthogonal transformation⁹, which in the case shown in Fig. 1.2, is described by

p′1 = p1 cos θ + p2 sin θ

p′2 = −p1 sin θ + p2 cos θ

p′3 = p3

9 Recall that an orthogonal matrix is one for which RᵀR = RRᵀ = I.

Figure 1.2: The vector p may be described by its components in the basis e or e′. In the case drawn, e(3) = e′(3) points out of the page.

or, in suffix notation,

   p′ = Lp ⇔ p′i = lij pj    (1.3)

where in this particular example

   L = (  cos θ   sin θ   0 )
       ( −sin θ   cos θ   0 )
       (    0       0     1 )

Eqn. 1.3 is the transformation law for a rank-one tensor. The matrix L defines the transformation between two coordinate systems and is orthogonal:

   lki lkj = lik ljk = δij

In general, the transformation matrix L is given by

   lij = e′(i) · e(j)    (1.4)

It is worth noting that the elements in the ith row of L are the direction cosines (Fig. 1.3) of e′(i) with respect to the basis e.

Exercise: The polarisation of a crystal is measured in the basis e to be P = (2, 0, 1)ᵀ (in SI units¹⁰). Find the polarisation P′ as measured in a different basis e′ where

   e(1) = (1, 0, 0)ᵀ   e(2) = (0, 1, 0)ᵀ   e(3) = (0, 0, 1)ᵀ

10 The SI unit of polarisation is C m⁻².

Figure 1.3: Revision: the direction cosines of vector r are cos α, cos β and cos γ.

and

   e′(1) = (1/√3)(1, 1, 1)ᵀ   e′(2) = (1/√2)(−1, 0, 1)ᵀ   e′(3) = (1/√6)(1, −2, 1)ᵀ.
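A quick numerical check of this exercise: by Eqn. 1.4 the rows of L are the new basis vectors e′(i) expressed in the old basis, and then P′ = LP (Eqn. 1.3):

```python
import numpy as np

P = np.array([2.0, 0.0, 1.0])

s3, s2, s6 = np.sqrt(3), np.sqrt(2), np.sqrt(6)
L = np.array([[ 1/s3, 1/s3, 1/s3],   # rows are e'(1), e'(2), e'(3)
              [-1/s2, 0.0,  1/s2],
              [ 1/s6, -2/s6, 1/s6]])

assert np.allclose(L @ L.T, np.eye(3))   # L is orthogonal

P_new = L @ P                            # p'_i = l_ij p_j

# The length of P is invariant under the orthogonal transformation.
assert np.isclose(np.linalg.norm(P_new), np.linalg.norm(P))
```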

1.7.2 Transformation of Rank-Two Tensors

An example of a rank-two tensor is the conductivity σ of a crystal, which connects the applied electric field E and the resulting current density J:

   Ji = σij Ej    (1.5)

As mentioned before, these quantities have a physical reality. Only their representation changes, depending on the choice of basis. So, what happens to the above relation when we change our basis from e to e′? There is no reason to expect that the physical laws of Nature should undergo any change, therefore we would expect that the current, conductivity and electric field, as measured in the new basis, would obey the same relationship, i.e.,

J ′i = σ′ikE ′k , (1.6)

where, as we saw in the previous section, the transformation law for rank-one tensors is such that

J ′i = lipJp and E ′k = lkqEq.

Substituting these expressions into Eqn. 1.6 we have

   J′i = σ′ik E′k
   lip Jp = σ′ik lkq Eq
   lij lip Jp = lij σ′ik lkq Eq
   δjp Jp = lij lkq σ′ik Eq
   Jj = lij lkq σ′ik Eq


Hence, comparing with Eqn. 1.5,

   lij lkq σ′ik = σjq
   lpj lij lrq lkq σ′ik = lpj lrq σjq
   δpi δrk σ′ik = lpj lrq σjq
   σ′pr = lpj lrq σjq

which provides us with the transformation law for a rank-two tensor:

   A′ij = lip ljq Apq ⇔ A′ = L A Lᵀ    (1.7)

This equation gives us the components A′ij of a rank-two tensor in a basis e′, given its components Aij in the basis e. In other words, it tells us how the components of the tensor transform from one basis to another.

Exercise: The conductivity of a crystal is measured in the basis e to be

   σ = ( 1 0 0 )
       ( 0 1 0 )
       ( 0 0 2 )

(in SI units¹¹). Find the conductivity σ′ as measured in a different basis e′ where

   e(1) = (1, 0, 0)ᵀ   e(2) = (0, 1, 0)ᵀ   e(3) = (0, 0, 1)ᵀ

and

   e′(1) = (1/√3)(1, 1, 1)ᵀ   e′(2) = (1/√2)(−1, 0, 1)ᵀ   e′(3) = (1/√6)(1, −2, 1)ᵀ.

There are a number of checks one can use to help verify that you've transformed your tensor correctly:

1. tr A′ = tr A (why?)

2. det A′ = det A (why?)

3. Symmetry is preserved by the transformation (why?)

11 The SI unit of conductivity is (Ω m)⁻¹.
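The exercise above can be carried out numerically with Eqn. 1.7, σ′ = LσLᵀ, and the three checks verified along the way:

```python
import numpy as np

sigma = np.diag([1.0, 1.0, 2.0])

s3, s2, s6 = np.sqrt(3), np.sqrt(2), np.sqrt(6)
L = np.array([[ 1/s3, 1/s3, 1/s3],   # rows are e'(1), e'(2), e'(3)
              [-1/s2, 0.0,  1/s2],
              [ 1/s6, -2/s6, 1/s6]])

sigma_new = L @ sigma @ L.T           # sigma'_ij = l_ip l_jq sigma_pq

assert np.isclose(np.trace(sigma_new), np.trace(sigma))            # check 1
assert np.isclose(np.linalg.det(sigma_new), np.linalg.det(sigma))  # check 2
assert np.allclose(sigma_new, sigma_new.T)                         # check 3
```

The trace and determinant are invariant because L is orthogonal, and symmetry is preserved because transposing σ′ = LσLᵀ reproduces it when σ is symmetric.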


1.7.3 Definition of a Tensor

Above we discussed and derived the transformation of scalars (rank-zero tensors), vectors (rank-one tensors) and rank-two tensors. Scalars are, of course, invariant with respect to transformations – for example, when we measure the temperature of a sample, the result does not depend on our choice of axes:

T ′ = T

The components of a vector transform according to Eqn. 1.3:

T ′i = lipTp

and the components of rank-two tensors transform according to Eqn. 1.7:

T ′ij = lip ljqTpq

We can keep going and write down the transformation law for a tensor of any rank in the same way:

   T′ijk... = lip ljq lkr ... Tpqr...    (1.8)

Eqn. 1.8 actually serves as the definition of a tensor:

A tensor is a physical quantity whose components transform according to the transformation law of Eq. 1.8.

The power and usefulness of suffix notation becomes more apparent when we start dealing with tensors of rank greater than two. Examples of such higher-rank tensors are given below.

Property               Equation           Notes
Piezoelectric modulus  Pi = dijk σjk      Polarisation P, stress σ
Elastic compliance     εij = sijkl σkl    Strain ε, stress σ
Elastic stiffness      σij = cijkl εkl    as above


1.7.4 Tensors vs Matrices

As we have seen, the components of a rank-two tensor can be represented as an array of numbers [Tij], much in the same way the transformation matrix [lij] is represented as an array of numbers. However, this is where the analogy ends, for these quantities are fundamentally very different. [Tij] is the representation of a physical quantity in a particular basis e; [lij] is rather an array of coefficients that tells us how one basis e is related to another e′. It makes sense to talk about transforming [Tij] from one basis to another; it makes no such sense for [lij]. This is why [Tij] is a tensor, while [lij] is not.

1.8 Diagonalisation of Rank-Two Tensors

We have seen that the transformation law for a rank-two tensor is given by Eq. 1.7. In general, a rank-two tensor has nine components. If the tensor is symmetric, as is usually the case, the number of independent components is reduced to six. A question worth asking is "Is there a basis in which the representation of the tensor is actually diagonal?", i.e., is there a special basis in which there are just three non-zero components along the diagonal? The answer is yes! Any symmetric rank-two tensor can be diagonalised by a suitable rotation of basis, and the basis in which the tensor is diagonal is the basis of eigenvectors (the principal directions).

Exercise: The components of a tensor in the basis e are given by

   T = ( 2 1 0 )
       ( 1 2 0 )
       ( 0 0 4 ).

Find the principal values λ and principal directions e′. Draw the orientation of the principal directions relative to the basis e.

The normalised principal directions are simply the basis vectors e′ of the new basis in which the representation of the tensor is diagonal, i.e., the transformation matrix is given by lij = e′(i) · e(j). The transformed tensor is then given by the usual transformation law, and is diagonal (i.e., all off-diagonal components are zero), with the eigenvalues λ(i) along the diagonal:

   T′ = L T Lᵀ = ( λ₁  0   0  )
                 ( 0   λ₂  0  )
                 ( 0   0   λ₃ )

Exercise: For the tensor T in the previous exercise, calculate the transformation matrix L and hence show that T′ is diagonal with the eigenvalues along the diagonal. What is the angle of rotation in this transformation? Verify that the determinant and trace are invariant with respect to the transformation.
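One way to carry out this exercise numerically (the analytic answer can be read off: the eigenvalues of the 2 × 2 block are 1 and 3, plus the isolated 4):

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 4.0]])

evals, V = np.linalg.eigh(T)
L = V.T                      # rows of L are the normalised principal directions

T_new = L @ T @ L.T          # Eq. 1.7

assert np.allclose(evals, [1.0, 3.0, 4.0])
assert np.allclose(T_new, np.diag(evals))

# Trace and determinant are invariant under the transformation.
assert np.isclose(np.trace(T_new), np.trace(T))
assert np.isclose(np.linalg.det(T_new), np.linalg.det(T))
```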


1.8.1 Representation Quadric

The representation quadric of a rank-two tensor is given by the equation

   xi Tij xj = c ⇔ xᵀTx = c,

where c is a constant. When expanded out, this gives us a general quadratic form

   t₁₁x₁² + t₂₂x₂² + t₃₃x₃² + 2t₁₂x₁x₂ + 2t₂₃x₂x₃ + 2t₁₃x₁x₃ = c,

where we have assumed that the tensor is symmetric, tij = tji. The equation above is difficult to visualise in three dimensions, so let us consider the plane x₃ = 0, in which case it becomes

   t₁₁x₁² + t₂₂x₂² + 2t₁₂x₁x₂ = c.

The great benefit of tensor diagonalisation becomes clear when we notice that in the basis in which the tensor is diagonal, the representation quadric also becomes diagonal. Performing a rotation of basis we find

   xᵀTx = c
   xᵀ(LᵀL)T(LᵀL)x = c
   (xᵀLᵀ)(L T Lᵀ)(Lx) = c
   x′ᵀT′x′ = c

where we have used LᵀL = I and the transformation laws for rank-one and rank-two tensors

   x′i = lip xp and t′ij = lip ljq tpq.

If L is chosen such that T′ is diagonal, with principal values λ′(i), then, in the new basis, the cross-terms in the quadratic form disappear:

   x′ᵀT′x′ = λ′(1) x′₁² + λ′(2) x′₂² + λ′(3) x′₃² = c.

Whereas in the original basis, the quadratic form may have been difficult to visualise (due to the x₁x₂ cross-term), in the basis of eigenvectors, in which the quadratic form is diagonal, it is much easier.

In Fig. 1.4 we plot the representation quadric for the tensor

   T = ( 5 1 0 )
       ( 1 3 0 )
       ( 0 0 1 )

in the plane x₃ = 0.

Exercise: Write down the expression for the quadratic form. Find the eigenvalues and eigenvectors of T. Express the quadratic form in the basis of eigenvectors. Draw the eigenvectors on Fig. 1.4 and label the lengths of the axes of the ellipse.

Things to note:

Figure 1.4: Quadric surface xi tij xj = 1 in the plane x₃ = 0.

1. The symmetry axes are aligned along the principal directions, which is why they are called the principal axes.

2. The ith principal axis of the quadric surface has length √(c/λi), where λi is the corresponding principal value.

3. Depending on the signs of the principal values, the quadric surface may be an ellipsoid or a hyperboloid (or other interesting shapes).
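For the tensor plotted in Fig. 1.4 (with c = 1) these points can be checked numerically. The sketch below works with the x₁-x₂ block of the tensor; its in-plane principal values are 4 ± √2, and the semi-axes of the ellipse have lengths √(c/λi):

```python
import numpy as np

T2 = np.array([[5.0, 1.0],
               [1.0, 3.0]])          # the x1-x2 block of the tensor

lam, vecs = np.linalg.eigh(T2)       # in-plane principal values and directions
axis_lengths = np.sqrt(1.0 / lam)    # semi-axis lengths for c = 1

# trace = 8, det = 14  =>  eigenvalues are 4 -/+ sqrt(2).
assert np.allclose(np.sort(lam), [4 - np.sqrt(2), 4 + np.sqrt(2)])

# Both principal values are positive, so the in-plane section is an ellipse.
assert np.all(lam > 0)
```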

1.8.2 Stationary Property of Eigenvalues

The squared distance from the origin to the quadric surface, normalised to the scale of the surface, is given by

   d²(x) = xᵀx / c = xᵀx / (xᵀTx),

and from what was discussed above about the principal axes being aligned with the symmetry axes of the surface, it must be the case that the principal axes are the directions for which d² is stationary. Conversely, we can show that the inverse of this quantity, 1/d²(x), which we will denote by λ(x), is stationary when x is in the direction of a principal axis:

   δ(λ(x)) = 2(δxᵀ)Tx / (xᵀx) − [xᵀTx / (xᵀx)²] · 2(δxᵀ)x
           = (2/(xᵀx)) [ (δxᵀ)Tx − (xᵀTx/(xᵀx)) (δxᵀ)x ]
           = (2/(xᵀx)) (δxᵀ) [ Tx − λ(x)x ],

where, in the first step, we have used the fact that xᵀT(δx) = (δx)ᵀTx for symmetric T. Now we see that λ is stationary, i.e., δλ(x) = 0, when Tx = λ(x)x. In other words, when x is aligned with a principal axis. It is also worth noting that the eigenvalues of T are the values of λ(x) at its stationary points.

To summarise, the eigenvalues of a Hermitian matrix H are the values of the function

   λ(x) = xᵀHx / (xᵀx)

at its stationary points, which occur when x is aligned with an eigenvector of H. This is the basis of the Rayleigh-Ritz method¹² for estimating eigenvalues and the quotient in the equation above is often called the Rayleigh quotient.
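The stationary property is easy to observe numerically: at an eigenvector the Rayleigh quotient returns the eigenvalue, and perturbing the eigenvector changes the quotient only to second order. A small sketch with an illustrative symmetric matrix:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # illustrative symmetric matrix

def rayleigh(x):
    """Rayleigh quotient lambda(x) = x^T H x / x^T x."""
    return (x @ H @ x) / (x @ x)

evals, V = np.linalg.eigh(H)         # eigenvalues 1 and 3
v = V[:, 1]                          # eigenvector for the larger eigenvalue

# At an eigenvector, the quotient equals the eigenvalue exactly.
assert np.isclose(rayleigh(v), evals[1])

# Perturb by eps: the error in lambda is O(eps^2), not O(eps).
eps = 1e-4
err = abs(rayleigh(v + eps * np.array([1.0, 0.0])) - evals[1])
assert err < 10 * eps**2
```

This second-order insensitivity is exactly why the Rayleigh quotient gives good eigenvalue estimates from rough trial vectors.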

1.9 Tensor Properties of Crystals

1.9.1 Matter Tensors

Many tensors are a measure of the properties of a crystal, such as electrical and thermal conductivity, [σij] and [κij], electric and magnetic susceptibility, [χᵉij] and [χᵐij], etc. Such tensors are known as matter tensors.

Most rank-two matter tensors are symmetric. Symmetric rank-two tensors, as we saw earlier, have six independent components when expressed in some arbitrary basis. The number of independent components is further reduced due to the inherent symmetry of the crystal.

1.9.2 Neumann’s Principle

It is a fundamental postulate of crystal theory that matter tensors must conform to the point group symmetry of the crystal. This is the basis of Neumann's Principle¹³, which states that

12 See Riley, Hobson and Bence, Ch. 9.3 for more details.
13 Franz Ernst Neumann (1798–1895), German mineralogist, physicist and mathematician.


The symmetry elements of any physical property of a crystal must include the symmetry elements of the point group of the crystal.

Neumann's Principle enables us to simplify matter tensors if we know the point group symmetry of the crystal¹⁴. For example, it tells us that symmetric rank-two matter tensors must take a specific form for each of the seven distinct crystal systems when the tensor is represented in the conventional Cartesian co-ordinate basis for that crystal system. These forms are given in Table 1.1, and the conventional Cartesian co-ordinate axes xi for each crystal system are given in Table 1.2.

The representation quadric, discussed earlier, represents a rank-two matter tensor completely. When it is expressed in the basis of the crystal axes, the equation of the representation quadric has as many independent coefficients as there are independent elements in the matter tensor, and its symmetry is the symmetry of the crystal property it describes. The quadric can be thought of as being an object whose orientation is fixed with respect to the crystal axes.

For example, a crystal with cubic symmetry is characterised by having four three-fold symmetry axes, also known as triad axes, that are arranged like the body diagonals of a cube. By Neumann's Principle, the representation quadric of a cubic crystal must also have four triad axes and the only way in which this can be achieved is if the quadric surface is spherical. As such, all the principal values are equal and the associated matter tensor is isotropic and takes the same form no matter which basis it is represented in.

At the other extreme, a triclinic crystal has no symmetry axes and therefore there are no restrictions on the representation quadric, possibly other than a centre of symmetry which, as we're only considering symmetric tensors, it already has. Therefore the number of independent elements of the associated tensor is equal to the number of independent elements possessed by any symmetric tensor, i.e., six. These can be thought of as being three for specifying the dimensions of the quadric and three for specifying its orientation with respect to the crystal axes.

1.9.3 Thermal Conductivity as an Example

You are familiar with the concept of heat flow in a one-dimensional solid

   hx = −k ∂T/∂x

where hx is the quantity of heat per unit time per unit area of cross-section flowing in the x-direction due to a temperature gradient ∂T/∂x in that direction, and k is the thermal conductivity¹⁵ of the solid.

14 Note that the physical property need not have the same symmetries as the crystal point group, but must include them. Often matter tensors have more symmetries than the point group.
15 The SI units of thermal conductivity are W m⁻¹ K⁻¹.

Optical Class   Crystal System   N   Tensor†

Isotropic       Cubic            1   ( s 0 0 )
                                     ( 0 s 0 )
                                     ( 0 0 s )

Uniaxial        Tetragonal       2   ( s₁ 0  0  )
                Hexagonal            ( 0  s₁ 0  )
                Trigonal             ( 0  0  s₂ )

Biaxial         Orthorhombic     3   ( s₁ 0  0  )
                                     ( 0  s₂ 0  )
                                     ( 0  0  s₃ )

                Monoclinic       4   ( s₁₁ 0   s₁₃ )
                                     ( 0   s₂₂ 0   )
                                     ( s₁₃ 0   s₃₃ )

                Triclinic        6   ( s₁₁ s₁₂ s₁₃ )
                                     ( s₁₂ s₂₂ s₂₃ )
                                     ( s₁₃ s₂₃ s₃₃ )

Table 1.1: Symmetric rank-two matter tensors in each of the seven crystal classes. N refers to the number of independent elements. †Referred to conventional axes for the crystal system.

In a three-dimensional, isotropic solid, this equation generalises to

   h = −k∇T ⇒ hi = −k ∂T/∂xi

but we now know that only cubic crystals are isotropic, therefore, in general, the form of the equation is

   hi = −kij ∂T/∂xj    (1.9)

where [kij] is the thermal conductivity tensor, a rank-two tensor. The physical meaning of kij is clear from the equation above: consider setting up a unit temperature gradient in the x₁ direction

   ∇T = (1, 0, 0)ᵀ ⇒ h = −(k₁₁, k₂₁, k₃₁)ᵀ,

i.e., the heat flow along the temperature gradient is −k₁₁, while in the two transverse directions it is −k₂₁ and −k₃₁.


Crystal system                    Conventional axes

Cubic, Orthorhombic               x₁, x₂ and x₃ along a, b and c, respectively
Tetragonal, Hexagonal, Trigonal   x₁ along a, x₃ along c
Monoclinic                        x₂ along b
Triclinic                         No fixed relation to crystallographic directions

Table 1.2: The conventional Cartesian co-ordinate axes xi for each of the seven distinct crystal systems. a, b and c are the lattice vectors of the unit cell of the crystal.

Expressing Eq. 1.9 in the basis of principal axes x′i gives, e.g., for an orthorhombic crystal

   h′i = −κ(i) ∂T/∂x′i

where κ(i) are the principal values of k and h′i are the components of the heat flow in the basis of principal axes.

Experimentally, it is found that k is symmetric¹⁶, and that its principal values κ(i) are always positive. As such, its quadric surface

   κ(1) x′₁² + κ(2) x′₂² + κ(3) x′₃² = 1

is always ellipsoidal.

Inverting Eq. 1.9 gives

   ∂T/∂xi = −rij hj,    (1.10)

where the quantity [rij] is the thermal resistivity tensor. In matrix form, r is the inverse of k:

   r = k⁻¹

This does not mean that rij = (kij)⁻¹! However, it does imply that the principal values ρ(i) of the resistivity tensor are related to the principal values of the conductivity tensor by ρ(i) = 1/κ(i).

16 There are also good theoretical grounds for this being the case. The interested reader should look up references to Onsager's Principle.
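The distinction between the inverse tensor and the element-wise reciprocal is worth seeing numerically. With a hypothetical symmetric conductivity tensor (values invented for illustration):

```python
import numpy as np

# Illustrative symmetric thermal conductivity tensor -- not real data.
k = np.array([[3.0, 0.5, 0.0],
              [0.5, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

r = np.linalg.inv(k)                 # the resistivity tensor, r = k^{-1}

# r_11 is NOT simply 1 / k_11: matrix inversion mixes the components.
assert not np.isclose(r[0, 0], 1.0 / k[0, 0])

# But the principal values ARE reciprocals of each other.
kappa = np.linalg.eigvalsh(k)
rho = np.linalg.eigvalsh(r)
assert np.allclose(np.sort(rho), np.sort(1.0 / kappa))
```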


1.9.4 Steady-State Heat Flow in Crystals

The phrase "steady-state" means that the heat flow and temperature at any point in the system do not vary with time. We will consider two special cases of steady-state heat flow.

Heat Flow in a Thin, Wide Sample

The geometry of this situation is shown in Fig. 1.5. A thin, wide sample of a crystal is held between conducting plates that maintain a steady temperature on either side of the crystal. Due to the geometry of the sample, the isotherms (lines of constant temperature) must lie parallel to the surfaces of the crystal (aside from effects at the ends). The temperature gradient −∇T must be perpendicular to the isotherms and must therefore be along the x₁ direction, as shown in the diagram, i.e.,

   ∂T/∂x₂ = ∂T/∂x₃ = 0.

The heat flow h is then given by Eq. 1.9, which reduces to

   hi = −k1i ∂T/∂x₁,  i ∈ {1, 2, 3}.

In general, the direction of heat flow will be different to that of the temperature gradient. In fact, the rate of heat flow across the sample (which is what might be measured in an experiment) is not |h|, but rather h₁.

Heat Flow in a Long, Narrow Sample

Consider a long, narrow sample held between two conductors that maintain a constant temperature difference, as shown in Fig. 1.6. If the walls of the rod are insulated, or the sample is held in vacuum, then the direction of heat flow must be parallel to the axis of the rod (again, there will always be end effects). As a result, the temperature gradient will, in general, be at an angle to the direction of heat flow and will be given by Eq. 1.10. Since h₂ = h₃ = 0, we have

   ∂T/∂xi = −r1i h₁.

The temperature gradient along the axis of the rod (which is what might be measured in an experiment) is therefore given by −r₁₁h₁.
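The two geometries can be contrasted numerically with a hypothetical conductivity tensor (illustrative values): in the thin, wide sample ∇T is fixed along x₁ and h = −k∇T; in the long, narrow sample h is fixed along x₁ and ∇T = −r h:

```python
import numpy as np

# Illustrative symmetric conductivity tensor -- not real data.
k = np.array([[3.0, 0.5, 0.0],
              [0.5, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.linalg.inv(k)                 # resistivity tensor

# Thin, wide sample: unit gradient along x1; heat also flows transversely.
gradT = np.array([1.0, 0.0, 0.0])
h = -k @ gradT
assert np.allclose(h, [-3.0, -0.5, 0.0])

# Long, narrow sample: h along x1; the gradient tilts away from the rod axis.
h = np.array([1.0, 0.0, 0.0])
gradT = -r @ h
assert not np.isclose(gradT[1], 0.0)   # transverse component of grad T
```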

1.10 Tensor Differential Operators

1.10.1 Tensor Fields

A scalar field φ(x) is a quantity that has a magnitude at every point x in space, e.g., the temperature T(x) inside this room. A vector field is a quantity that has a magnitude and direction at


Figure 1.5: Heat flow in a thin, wide sample.

Figure 1.6: Heat flow in a long, narrow sample.

every point in space, e.g., the velocity v(x) of the air flow in this room. Similarly, one can extend this concept to tensor fields of higher rank. An example is the stress field [σij(x)] in a crystal, which can depend on position.


When dealing with tensor fields, whatever the rank of the tensor, it is important to know how the field varies with position. In other words, one needs to know derivatives of the field with respect to position, which brings us onto the topic of the gradient operator ∇.

1.10.2 Suffix Notation for ∇

The definitions of grad, div and curl in Cartesian co-ordinates may be expressed using suffix notation:

   [∇φ]i = ∂iφ
   ∇ · F = ∂iFi
   [∇× F]i = εijk ∂jFk

where we have used the convenient shorthand ∂i ≡ ∂/∂xi.

1.10.3 ∇ as a Rank-One Tensor

In fact, it turns out that ∂i is a rank-one tensor as it can be shown to satisfy the transformation law for rank-one tensors,

   ∂′i = lip ∂p

where ∂′i ≡ ∂/∂x′i.

Proof: For any quantity T, which may be a scalar, vector or tensor field, from the chain rule

   ∂′i T = (∂p T) ∂xp/∂x′i.

[xi] is a rank-one tensor, therefore

   xp = ljp x′j

   ⇒ ∂xp/∂x′i = ∂/∂x′i (ljp x′j) = ljp ∂x′j/∂x′i = ljp δji = lip

Hence, substituting into our expression above we obtain ∂′i T = lip ∂p T. Since T is arbitrary, ∂′i = lip ∂p, as required.

An important point to note is that previous tensors that we have come across in this course, when written in suffix notation, were just numbers whose order in an expression could be changed without affecting the result. For example

   ai bij cj dk = dk bij ai cj = cj bij dk ai etc.


The same is not true for ∂i: it is a differential operator and, by convention, acts on everything appearing to its right. Therefore, for vector fields u(x) and v(x), for instance,

∂iuivj = ∂i (uivj) = ui (∂ivj) + (∂iui )vj

where we have used the product rule to expand the derivative of the product uivj.

• Proof that ∇ · (φu) = u · ∇φ + φ∇ · u:

∇ · (φu) = ∂i(φui)
= ui(∂iφ) + φ(∂iui)  (using the product rule)
= u · ∇φ + φ∇ · u.

• Proof that ∇× (u × v) = (∇ · v)u − (∇ · u)v + (v · ∇)u − (u · ∇)v :

[∇× (u × v)]i = εijk∂j(εklmulvm) (using [a× b]i = εijkajbk)

= εijkεklm∂j(ulvm) (since εklm is a constant)

= (δilδjm − δimδjl)∂j(ulvm)

= ∂j(uivj)− ∂j(ujvi )

= ui∂jvj + vj∂jui − uj∂jvi − vi∂juj (using product rule)

= [(∇ · v)u]i + [(v · ∇)u]i − [(u · ∇)v ]i − [(∇ · u)v ]i
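The contracted-epsilon identity εijkεklm = δilδjm − δimδjl used in the third line of the proof above can be verified exhaustively; a quick check of my own (not part of the notes):

```python
# Exhaustive check of the contracted-epsilon identity
# eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl.

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (j - i) * (k - i) * (k - j) // 2

def delta(i, j):
    return 1 if i == j else 0

ok = all(
    sum(eps(i, j, k) * eps(k, l, m) for k in range(3))
    == delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
    for i in range(3) for j in range(3) for l in range(3) for m in range(3)
)
print(ok)  # True
```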


Chapter 2

Green Functions

2.1 Learning Outcomes

To understand and be able to use the methods of variation of parameters and Green functions to solve problems involving inhomogeneous linear differential equations.

2.2 Further Reading

1. Riley, Hobson & Bence, Mathematical Methods for Physics and Engineering, 3rd Ed., Chapter 15: Higher-Order Differential Equations, Sections 2.4 and 2.5.

2. Arfken & Weber, Mathematical Methods for Physicists, 4th (Intl.) Ed., Chapter 9: Sturm-Liouville Theory–Orthogonal Functions, Section 5.

2.3 Introduction

Green functions1 are an invaluable tool for the solution of inhomogeneous differential equations, and also have applications in other, unrelated branches of mathematics and physics, such as quantum field theory and statistical field theory.

We begin by considering the general, second-order linear ordinary differential equation (ODE):

L[y] ≡ [ d²/dx² + p(x) d/dx + q(x) ] y(x) = f(x),   (2.1)

subject to certain boundary conditions that must be specified. These may be fixed-point boundary conditions, e.g., y(0) = y(L) = α, or they may be initial-value boundary conditions, e.g., y(0) = y′(0) = α. If α = 0, the boundary or initial conditions are said to be homogeneous; otherwise they are termed inhomogeneous. Similarly, the differential equation itself is called homogeneous if f(x) = 0 and inhomogeneous otherwise.

1Named after George Green (1793–1841), a British mathematician and physicist, a Nottinghamshire miller who was entirely self-taught.

2.4 Variation of Parameters

You should recall from elementary courses on solving ODEs that the general solution to Eqn. (2.1) is of the form

y(x) = y0(x) + ay1(x) + by2(x), (2.2)

where y1(x) and y2(x) are solutions of the homogeneous equation L[y] = 0 and y0(x) is called the particular integral, which is any (non-trivial) solution that satisfies L[y] = f(x) subject to the specified boundary conditions. Finding the solutions to the homogeneous equation is usually the easier part. To find the particular integral, the method of variation of parameters may be used, in which we make an ansatz2 for the particular integral:

y0(x) = u(x)y1(x) + v(x)y2(x). (2.3)

If u(x) and v(x) were constants, then y0(x) would be, by the principle of superposition, just a solution of the homogeneous equation. Therefore, we will vary these parameters, subject to the constraint

u′(x)y1(x) + v ′(x)y2(x) = 0, (2.4)

such that y0(x) satisfies the inhomogeneous equation. Taking the first and second derivatives of Eqn. (2.3), making use of Eqn. (2.4) to simplify things, gives

y′0 = uy′1 + u′y1 + vy′2 + v′y2 = uy′1 + vy′2,
y″0 = uy″1 + u′y′1 + vy″2 + v′y′2.

Substituting these expressions into the differential equation, after some rearrangement, gives

L[y0] ≡ y″0 + py′0 + qy0 = u′y′1 + v′y′2.

Hence, y0(x) is a particular solution to the differential equation if

u′(x)y′1(x) + v′(x)y′2(x) = f(x).   (2.5)

Eqns. (2.4) and (2.5) may be solved simultaneously to give

u′(x) = −f(x)y2(x)/W(x)  and  v′(x) = f(x)y1(x)/W(x),   (2.6)

where

W(x) ≡ det[ y1, y2 ; y′1, y′2 ] = y1y′2 − y2y′1   (2.7)

2An educated guess which will later turn out to have been a good one. From the German word meaning onset or starting point.


is known as the Wronskian3 of y1 and y2. It can be shown that if y1 and y2 are linearly independent solutions to the homogeneous equation, then W ≠ 0.

Eqn. (2.6) may be solved to obtain u(x) and v(x), and hence y0(x). It is important to make sure that y(x) satisfies the boundary or initial conditions that are appropriate to the problem. Examples are given below.

2.4.1 Homogeneous Initial Conditions

Integrating Eqn. (2.6) gives

u(x) = −∫_α^x [ f(x′)y2(x′)/W(x′) ] dx′  and  v(x) = ∫_α^x [ f(x′)y1(x′)/W(x′) ] dx′,

where the lower limit in each integral is chosen such that the expression for the particular integral

y0(x) = ∫_α^x [ (y1(x′)y2(x) − y2(x′)y1(x)) / W(x′) ] f(x′) dx′

satisfies the homogeneous initial conditions y0(α) = y′0(α) = 0. The solution to the ODE that satisfies the initial conditions is then just y(x) = y0(x), which can be expressed as

y(x) = ∫_α^∞ G(x, x′) f(x′) dx′,

where we have defined the Green function

G(x, x′) = { 0,                                       x′ > x,
           { [y1(x′)y2(x) − y2(x′)y1(x)] / W(x′),     x′ < x.
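A hedged worked example (operator of my choosing, not from the notes): for L[y] = y″ + y we have y1 = cos x, y2 = sin x and W = 1, so the Green function above becomes G(x, x′) = cos(x′)sin(x) − sin(x′)cos(x) = sin(x − x′) for x′ < x. A simple quadrature then reproduces the exact solution:

```python
import math

# Green function for y'' + y with homogeneous initial conditions at alpha = 0:
# y1 = cos, y2 = sin, W = 1, so G(x, x') = sin(x - x') for x' < x, else 0.

def G(x, xp):
    return math.sin(x - xp) if xp < x else 0.0

def solve(f, x, n=2000):
    """y(x) = int_0^x G(x, x') f(x') dx' by the trapezoidal rule."""
    h = x / n
    vals = [G(x, k * h) * f(k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# For f = 1 the exact solution of y'' + y = 1, y(0) = y'(0) = 0 is 1 - cos(x).
x = 1.3
print(abs(solve(lambda t: 1.0, x) - (1 - math.cos(x))) < 1e-6)  # True
```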

2.4.2 Inhomogeneous Initial Conditions

Consider more general initial conditions of the form y(α) = c1, y′(α) = c2. We can convert the problem into a homogeneous one by defining

y(x) = ỹ(x) + g(x),

where g(x) is any function chosen to satisfy the inhomogeneous initial conditions g(α) = c1, g′(α) = c2. Then it can be seen that

L[ỹ] = f̃(x),

where

f̃ ≡ f − g″ − pg′ − qg,

and ỹ(x) now satisfies the homogeneous initial conditions ỹ(α) = ỹ′(α) = 0.

3Named after the Polish mathematician Jozef Maria Hoene-Wronski (1778–1853).


2.4.3 Homogeneous Two-Point Boundary Conditions

An example of homogeneous two-point boundary conditions is y(α) = y(β) = 0. A solution to Eqn. (2.1) that satisfies y(α) = 0 is

y(x) = ∫_α^x [ (y1(x′)y2(x) − y2(x′)y1(x)) / W(x′) ] f(x′) dx′ + k [y1(α)y2(x) − y1(x)y2(α)],

where k is an arbitrary constant. Setting y(β) = 0 gives

k [y1(α)y2(β) − y1(β)y2(α)] = −∫_α^β [ (y1(x′)y2(β) − y2(x′)y1(β)) / W(x′) ] f(x′) dx′,

which enables the constant k to be determined if and only if neither y1(x) nor y2(x) vanishes at both α and β4. To simplify the algebra, it is expedient to choose y1 and y2 such that

y1(α) = y2(β) = 0,

which implies that

k = −[1/y2(α)] ∫_α^β [ y2(x′)f(x′) / W(x′) ] dx′,

which may be substituted into the solution to give

y(x) = ∫_α^x [ (y1(x′)y2(x) − y2(x′)y1(x)) / W(x′) ] f(x′) dx′ + ∫_α^β [ y2(x′)y1(x) / W(x′) ] f(x′) dx′

     = ∫_α^x [ y1(x′)y2(x) / W(x′) ] f(x′) dx′ + ∫_x^β [ y2(x′)y1(x) / W(x′) ] f(x′) dx′   (2.8)

     ≡ ∫_α^β G(x, x′) f(x′) dx′,   (2.9)

where we have defined the Green function

G(x, x′) = { y1(x′)y2(x)/W(x′),     α ≤ x′ < x,
           { y1(x)y2(x′)/W(x′),     x < x′ ≤ β,   (2.10)

and the solution satisfies the required boundary conditions.
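A worked sketch of Eqn. (2.10) with an operator of my own choosing (not from the notes): for L[y] = y″ on [0, 1] with y(0) = y(1) = 0, take y1(x) = x (zero at α = 0) and y2(x) = x − 1 (zero at β = 1), so W = y1y′2 − y2y′1 = 1 and G(x, x′) = x′(x − 1) for x′ < x, x(x′ − 1) for x′ > x:

```python
import math

# Green function of y'' = f on [0, 1] with y(0) = y(1) = 0, from Eqn. (2.10)
# with y1 = x, y2 = x - 1, W = 1.

def G(x, xp):
    return xp * (x - 1) if xp <= x else x * (xp - 1)

def solve(f, x, n=2000):
    """y(x) = int_0^1 G(x, x') f(x') dx' by the trapezoidal rule."""
    h = 1.0 / n
    vals = [G(x, k * h) * f(k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# For f = 1 the exact solution of y'' = 1, y(0) = y(1) = 0 is x(x - 1)/2.
x = 0.3
print(abs(solve(lambda t: 1.0, x) - x * (x - 1) / 2) < 1e-9)  # True
```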

Let us consider the properties of the Green function G (x , x ′).

1. Continuity at x = x′:

[G(x, x′)]_{x=x′} ≡ lim_{ε→0} [G(x, x′)]_{x=x′−ε}^{x=x′+ε} = lim_{ε→0} [ y1(x′)y2(x′+ε)/W(x′) − y1(x′−ε)y2(x′)/W(x′) ] = 0

4In other words, a solution to the inhomogeneous differential equation exists if and only if there is no solution to the homogeneous differential equation that satisfies both boundary conditions.


2. Unit discontinuity of ∂G/∂x at x = x′:

[∂G/∂x]_{x=x′} ≡ lim_{ε→0} [ y1(x′)y′2(x′+ε)/W(x′) − y′1(x′−ε)y2(x′)/W(x′) ] = W(x′)/W(x′) = 1

We derived above the form of the Green function for the differential operator of Eqn. (2.1) using the method of variation of parameters. Below we consider Green functions via a more formal approach and derive the same results, but first we have to meet the Dirac delta-function.

2.5 Dirac Delta-Function

The Dirac delta-function5 δ(x − x′) has the fundamental, defining property

∫_a^b g(x′) δ(x − x′) dx′ = g(x)  for x ∈ [a, b].   (2.11)

What this shows is that, under integration, the delta-function "picks out" the value of g(x) at the point on which the delta-function is centred.

It may be thought of as a "spike", centred on x = x′, with zero width, infinite height and unit area underneath6:

δ(x − x′) = { ∞,   x = x′,
            { 0,   otherwise.

2.5.1 Properties of δ(x)

• δ(x) = δ(−x) is an even function

• δ∗(x) = δ(x) is a real function

• The area under δ(x) is 1:

∫_{−η}^{η} δ(x) dx = 1  ∀ η > 0

• The Fourier transform of δ(x) is a constant:

δ̃(k) = (1/√(2π)) ∫_{−∞}^{∞} δ(x) e^{−ikx} dx = (1/√(2π)) e^{−ikx}|_{x=0} = 1/√(2π)

5Paul Adrien Maurice Dirac (1902–1984), British theoretical physicist. Dirac shared the Nobel Prize in Physics (1933) with Erwin Schrödinger for his work on quantum theory.

6Technically, this definition is only heuristic, as δ(x) cannot really be called a function; rather, it is a distribution, which is defined only under integration, as in Eqn. 2.11.


2.5.2 δ(x) as the Limit of a Top-Hat Function

δ(x) may be thought of as the limit of a top-hat function of width ε and height 1/ε as ε → 0:

δ(x) = lim_{ε→0} δε(x)

where

δε(x) = { ε⁻¹,   |x| < ε/2,
        { 0,     otherwise.

δε(x) is sketched for different values of ε in Fig. 2.1.

Figure 2.1: δε(x) for ε ∈ {1, 0.5, 0.2, 0.1}.

Using Fig. 2.2 as a guide, consider the limit

lim_{ε→0} ∫_{x−ε/2}^{x+ε/2} f(x′) δε(x − x′) dx′ = lim_{ε→0} ∫_{x−ε/2}^{x+ε/2} f(x′) (1/ε) dx′

= lim_{ε→0} f(a) ∫_{x−ε/2}^{x+ε/2} (1/ε) dx′,    a ∈ [x − ε/2, x + ε/2]

= lim_{ε→0} f(a) [x′/ε]_{x−ε/2}^{x+ε/2} = f(x),

since a → x as ε → 0. Hence

⇒ ∫ f(x′) δ(x − x′) dx′ = f(x),

which is exactly Eqn. 2.11.


Figure 2.2: Generic sketch of f(x) and δε(x).
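The limiting argument above can be checked numerically (a sketch of my own, not from the notes): with the top-hat δε, the integral is just the average of f over a window of width ε, which tends to f(x) for smooth f:

```python
import math

# Numerical illustration: int f(x') delta_eps(x - x') dx' with the top-hat
# delta_eps is the average of f over a window of width eps around x.

def sift(f, x, eps, n=4000):
    """Trapezoidal estimate of int_{x-eps/2}^{x+eps/2} f(x')/eps dx'."""
    a = x - eps / 2
    h = eps / n
    vals = [f(a + k * h) for k in range(n + 1)]
    return (h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))) / eps

for eps in (0.5, 0.1, 0.01):
    print(eps, abs(sift(math.cos, 0.3, eps) - math.cos(0.3)))  # error shrinks
```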

2.5.3 δ(x) as the Limit of a Sinc Function

Consider the integral:

δε(x) =1

∫ 1/ε

−1/εeikx dk .

We can integrate the expression for δε(x):

δε(x) = (1/2π) [ e^{ikx}/(ix) ]_{k=−1/ε}^{1/ε} = (1/επ) sin(x/ε)/(x/ε)

⇒ δε(x) = sinc(x/ε)/(επ),

where sinc(x) ≡ sin(x)/x. δε(x) is plotted in Fig. 2.3 for different values of ε.

Show that7

f(x) = lim_{ε→0} ∫ f(x′) δε(x − x′) dx′.   (2.12)

7You may find the Riemann–Lebesgue lemma helpful: lim_{λ→±∞} ∫ g(t) sin(λt) dt = 0.


Figure 2.3: δε(x) = sinc(x/ε)/(επ) for ε ∈ {1, 0.2, 0.1}.
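A rough numerical check of Eqn. (2.12) (my own construction, not part of the notes): the sinc kernel also sifts. A rapidly decaying f, here a Gaussian, keeps the truncated tails of the oscillatory integrand negligible:

```python
import math

# delta_eps(x) = sinc(x/eps)/(eps*pi) = sin(x/eps)/(pi*x); convolving a
# smooth, decaying f with it approximates f(x) for small eps.

def delta_eps(t, eps):
    if t == 0.0:
        return 1.0 / (math.pi * eps)          # limit of sin(t/eps)/(pi t)
    return math.sin(t / eps) / (math.pi * t)

def sift(f, x, eps, a=-8.0, b=8.0, n=8000):
    h = (b - a) / n
    vals = [f(a + k * h) * delta_eps(x - (a + k * h), eps) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

f = lambda t: math.exp(-t * t)
print(abs(sift(f, 0.5, eps=0.05) - f(0.5)))  # small
```

Unlike the top-hat, the sinc kernel is not localised, so the convergence relies on the oscillatory cancellation expressed by the Riemann–Lebesgue lemma.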


2.5.4 δ(x) as the Continuous Analogue of δij

The Dirac delta-function can also be thought of as the continuous analogue of the discrete Kronecker delta δij, which has a property analogous to Eqn. 2.11, but for discrete variables:

Σ_j δij gj = gi.

2.6 Green Functions

Consider a second-order differential operator of the form

L = d²/dx² + p(x) d/dx + q(x),

and the associated inhomogeneous equation

L[y(x)] = f (x),

subject to, e.g., two-point boundary conditions y(α) = y(β) = 0. The Green function G(x, x′) is defined by

L[G (x , x ′)] = δ(x − x ′),

where δ(x) is the Dirac delta-function. G(x, x′) is subject to the same boundary conditions as y(x), namely G(α, x′) = G(β, x′) = 0. The solution to the original inhomogeneous equation is then given by

y(x) = ∫_α^β G(x, x′) f(x′) dx′,

which is easily verified by operating on both sides with L:

L[y] = ∫_α^β L[G(x, x′)] f(x′) dx′
     = ∫_α^β δ(x − x′) f(x′) dx′
     = f(x).

For x ≠ x′, δ(x − x′) = 0, therefore

L[G(x, x′)] = 0,   x ≠ x′,

i.e., the Green function satisfies the homogeneous equation, with solutions y1(x) and y2(x). Therefore, by the principle of superposition, we may write

G(x, x′) = { A(x′)y1(x) + B(x′)y2(x),   α < x < x′,
           { C(x′)y1(x) + D(x′)y2(x),   x′ < x < β.

To find the parameters A(x′), B(x′), C(x′) and D(x′), we use the boundary conditions on G(x, x′), the continuity of G(x, x′), and the unit discontinuity of ∂G/∂x at x = x′:


1. Boundary conditions: G (α, x ′) = G (β, x ′) = 0.

A(x ′)y1(α) + B(x ′)y2(α) = 0 (2.13)

C (x ′)y1(β) + D(x ′)y2(β) = 0 (2.14)

2. Continuity of G and unit discontinuity of ∂G/∂x:

( y1   y2  ) ( C − A )   =   ( 0 )
( y′1  y′2 ) ( D − B )       ( 1 ),   (2.15)

which, if W ≠ 0, may be inverted to give

C − A = −y2(x′)/W(x′)  and  D − B = y1(x′)/W(x′).   (2.16)

We can simplify the algebra considerably by choosing y1(x) and y2(x) such that

y1(α) = y2(β) = 0.

It then follows from Eqns. (2.13) and (2.14) that

B(x′) = C(x′) = 0.

Hence, from Eqns. (2.16) we have

A(x′) = y2(x′)/W(x′)  and  D(x′) = y1(x′)/W(x′),

giving us the final result

G(x, x′) = { y1(x′)y2(x)/W(x′),     α ≤ x′ < x,
           { y1(x)y2(x′)/W(x′),     x < x′ ≤ β,

which is the same result derived using the variation-of-parameters method (Eqn. (2.10)). Hence the final solution that satisfies the differential equation and the boundary conditions y(α) = y(β) = 0 is given, as before, by

y(x) = ∫_α^β G(x, x′) f(x′) dx′   (2.17)

     = ∫_α^x [ y1(x′)y2(x)/W(x′) ] f(x′) dx′ + ∫_x^β [ y2(x′)y1(x)/W(x′) ] f(x′) dx′.   (2.18)
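The defining property L[G] = δ(x − x′) can also be seen on a computer (my own discretisation, not from the notes): take L = d²/dx² on [0, 1] with G(0, x′) = G(1, x′) = 0, replace the second derivative by finite differences and δ(x − x_j) by a column of height 1/h at node j; solving the resulting tridiagonal system recovers the analytic G(x, x′) = x′(x − 1) for x′ < x and x(x′ − 1) for x′ > x:

```python
# Discrete Green function of d^2/dx^2 on [0, 1] with homogeneous boundary
# conditions, via the Thomas algorithm for the tridiagonal system.

def green_column(n, j):
    """Solve (g[i-1] - 2 g[i] + g[i+1]) / h^2 = delta_ij / h, i = 0..n-1."""
    h = 1.0 / (n + 1)
    sub = [1.0 / h ** 2] * n       # sub-diagonal
    diag = [-2.0 / h ** 2] * n     # main diagonal
    sup = [1.0 / h ** 2] * n       # super-diagonal
    rhs = [0.0] * n
    rhs[j] = 1.0 / h               # discrete delta centred on node j
    for i in range(1, n):          # forward elimination
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    g = [0.0] * n
    g[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        g[i] = (rhs[i] - sup[i] * g[i + 1]) / diag[i]
    return g

g = green_column(199, 99)          # h = 1/200, node 99 sits at x = 0.5
print(round(g[99], 6))             # analytic G(0.5, 0.5) = 0.5*(0.5 - 1) = -0.25
```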


Chapter 3

Hilbert Spaces

3.1 Learning Outcomes

To understand and be able to use the following concepts in solving problems: Hilbert spaces, linear dependence of functions, orthogonality of functions; Sturm–Liouville theory, self-adjoint differential operators, eigenfunctions and eigenvalues; eigenfunction expansions and completeness.

3.2 Further Reading

1. Riley, Hobson & Bence, Mathematical Methods for Physics and Engineering, 3rd Ed., Chapter 17: Eigenfunction Methods for Differential Equations; Chapter 18: Special Functions.

2. Arfken & Weber, Mathematical Methods for Physicists, 4th (Intl.) Ed., Chapter 9: Sturm-Liouville Theory–Orthogonal Functions.

3.3 Introduction

By analogy with the finite-dimensional spaces of vectors that were discussed in Ch. 1, we may think of functions as objects in an infinite-dimensional space, known as a Hilbert space1. All of the characteristics of vector spaces carry over to the concept of Hilbert spaces, including the important idea of orthogonality. We will see a deep connection between Hermitian matrices and a particular class of differential operators known as self-adjoint operators. The eigenfunctions of self-adjoint operators, much like the eigenvectors of Hermitian matrices, turn out to be orthogonal, and these orthogonal functions, much like orthogonal vectors, may be used as a complete basis set for the Hilbert space. In this chapter we will introduce everything rather pedagogically, and for the case of problems in one spatial dimension only. The concepts generalise to higher numbers of dimensions in a straightforward way, as you will see later in the course.

1Named after German mathematician David Hilbert (1862–1943).


3.4 Hilbert Spaces

3.4.1 Definition

A set of functions {u(x), v(x), w(x), ...} on the interval x ∈ [a, b], satisfying the same conditions outlined for finite-dimensional vectors in Section 1.4.2, constitutes an (infinite-dimensional) Hilbert space.

3.4.2 Properties

In the same way that a set of linearly independent vectors can be used to uniquely express an arbitrary vector in the vector space, any (reasonable) function f(x) on the interval x ∈ [a, b] may be expressed in terms of a set of linearly independent basis functions {yi(x)}:

f(x) = Σ_{i=1}^∞ ai yi(x).

The inner product on an interval x ∈ [a, b] is defined as

〈f|g〉 = ∫_a^b f*(x) w(x) g(x) dx,

where w(x) is a real-valued weight function2 that is positive on the interval x ∈ [a, b]. Functions f(x) and g(x) are said to be orthogonal with respect to w(x) on the interval x ∈ [a, b] if 〈f|g〉 = 0. The norm of f(x) is given by ||f|| = 〈f|f〉^{1/2}, and f(x) may be normalised in the usual way: f̂(x) = f(x)/||f||.

A set of functions {yi} is called orthogonal if

〈yi|yj〉 = 0  for i ≠ j,

and orthonormal if

〈yi|yj〉 = δij.

Given a set of linearly independent functions {fn(x)}, one may construct an orthonormal set of functions {gn(x)} by the procedure of Gram–Schmidt orthogonalisation, which is explored further in the problem sets.
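A hedged sketch of Gram–Schmidt orthonormalisation (my own example): applied to {1, x, x²} on [−1, 1] with w(x) = 1, working with grid-sampled functions and a trapezoidal-rule inner product, it produces functions proportional to the Legendre polynomials P0, P1, P2 met later in this chapter:

```python
import math

# Gram-Schmidt on sampled functions with <f|g> = int_{-1}^{1} f g dx.

N = 2000
H = 2.0 / N
XS = [-1.0 + k * H for k in range(N + 1)]

def inner(f, g):
    """Trapezoidal-rule inner product of two sampled functions."""
    vals = [fi * gi for fi, gi in zip(f, g)]
    return H * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def gram_schmidt(fs):
    gs = []
    for f in fs:
        for g in gs:                            # remove projections so far
            c = inner(g, f)
            f = [fi - c * gi for fi, gi in zip(f, g)]
        norm = math.sqrt(inner(f, f))
        gs.append([fi / norm for fi in f])
    return gs

g0, g1, g2 = gram_schmidt([[1.0] * (N + 1), XS, [x * x for x in XS]])
print(abs(inner(g0, g1)) < 1e-9, abs(inner(g0, g2)) < 1e-9)  # True True
# g2 is proportional to P2(x) = (3x^2 - 1)/2: value ratio at x = 1 vs x = 0
print(round(g2[-1] / g2[N // 2], 2))  # -2.0
```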

3.5 Sturm-Liouville Theory

Sturm–Liouville theory3 is the theory of inhomogeneous equations of the form Ly = f for second-order differential operators that are self-adjoint. In particular, it is concerned with f = λy, where y(x) is called an eigenfunction of L with associated eigenvalue λ.

2We will turn our attention to its origin in the next section.
3Named after Jacques Charles François Sturm (1803–1855) and Joseph Liouville (1809–1882).


3.5.1 Self-Adjoint Differential Operators

Consider the differential operator

L = −d/dx [ ρ(x) d/dx ] + σ(x),   (3.1)

where ρ(x) and σ(x) are defined on x ∈ [a, b] and we require that ρ(x) > 0 on x ∈ [a, b]. Such an operator is said to be in self-adjoint form.

Furthermore, consider the inner product

〈u|Lv〉 = ∫_a^b u*(x) Lv(x) dx

= −∫_a^b u*(x) d/dx[ ρ(x) dv/dx ] dx + ∫_a^b u*(x)σ(x)v(x) dx

= −[ u*(x)ρ(x)v′(x) ]_a^b + ∫_a^b u*′(x)ρ(x)v′(x) dx + ∫_a^b u*(x)σ(x)v(x) dx

= [ −u*ρv′ + u*′ρv ]_a^b − ∫_a^b v(x) d/dx[ ρ(x) du*/dx ] dx + ∫_a^b u*(x)σ(x)v(x) dx

= ∫_a^b v(x) Lu*(x) dx + [ ρ(u*′v − u*v′) ]_a^b

= 〈v|Lu〉* + [ ρ(u*′v − u*v′) ]_a^b.

Therefore, if the boundary term

[ ρ(u*′v − u*v′) ]_a^b = 0,   (3.2)

then 〈u|Lv〉 = 〈v|Lu〉*, and L is said to be self-adjoint and is analogous to a Hermitian matrix.

Any second-order linear differential operator can be put into self-adjoint form. Consider the most general operator

L̃ = −d/dx [ a(x) d/dx ] − b(x) d/dx + c(x),   (3.3)

where we require a(x) ≠ 0 for x ∈ [a, b]. If L̃ is multiplied by a function w(x) > 0,

wL̃ = −w d/dx [ a d/dx ] − bw d/dx + cw

    = −d/dx [ aw d/dx ] + (aw′ − bw) d/dx + cw,

which is in self-adjoint form (with ρ = aw and σ = cw) if we choose w(x) such that

aw′ = bw  ⇒  w(x) = exp ∫_α^x [ b(x′)/a(x′) ] dx′,

where we have chosen w(α) = 1.


Self-adjoint, or Hermitian, operators take a very special role in the theory of quantum mechanics. According to the Copenhagen interpretation, any physical quantum mechanical observable, such as position, energy or momentum, is an eigenvalue of a corresponding self-adjoint operator. The fact that self-adjoint operators have a discrete eigenspectrum {λn} is the way in which quantisation is built in to quantum mechanics, since observables can only take these discrete values4.

3.5.2 Eigenfunctions and Eigenvalues

Consider the eigenvalue equation

L̃y = λy,

where L̃ is a general second-order linear differential operator as given in Eqn. (3.3). As shown above, we may define a self-adjoint operator L = wL̃ by means of a suitable weight function w(x):

Ly = λwy.   (3.4)

In general, the solutions to Eqn. (3.4) form a discrete, yet infinite, set with eigenvalues {λn} and eigenfunctions {yn(x)} for n ∈ {1, 2, 3, ...}. Analogously to the eigenvalues and eigenvectors of Hermitian matrices, the eigenvalues of a self-adjoint operator are real, and the eigenfunctions corresponding to distinct eigenvalues are orthogonal. The proof is as follows (note that the weight function is implicit in the definition of the inner product):

Lyi = λi yi                        Lyj = λj yj

〈yj|Lyi〉 = λi 〈yj|yi〉            〈yi|Lyj〉 = λj 〈yi|yj〉

〈yj|Lyi〉* = λi* 〈yj|yi〉*   ⇒    〈yi|Lyj〉 = λi* 〈yi|yj〉

Comparing the final expression in each of the columns above, we find

(λi* − λj) 〈yi|yj〉 = 0.

1. For i = j we have

(λ∗i − λi )||yi ||2 = 0,

so, if we have non-trivial eigenfunctions, then λ∗i = λi , i.e., the eigenvalues are real.

2. For i ≠ j we have

(λi − λj) 〈yi|yj〉 = 0,

so, if we are considering distinct eigenvalues, then 〈yi|yj〉 = 0, i.e., the eigenfunctions are orthogonal.

4For more details see, e.g., P. A. M. Dirac, The Principles of Quantum Mechanics.


3.5.3 Eigenfunction Expansions

If {yn} is a complete set of orthonormal eigenfunctions of a self-adjoint operator, then any function with the same boundary conditions can be expressed as an eigenfunction expansion:

f(x) = Σ_{n=1}^∞ an yn(x).   (3.5)

The coefficients an can be found by orthogonality of the eigenfunctions:

〈ym|f〉 = Σ_{n=1}^∞ an 〈ym|yn〉 = Σ_{n=1}^∞ an δmn = am.

Substituting into the expansion we find

f(x) = Σ_{n=1}^∞ 〈yn|f〉 yn(x)

     = Σ_{n=1}^∞ [ ∫_a^b w(x′) yn*(x′) f(x′) dx′ ] yn(x)

     = ∫_a^b dx′ f(x′) [ w(x′) Σ_{n=1}^∞ yn(x) yn*(x′) ],

which gives us the completeness relation

w(x ′)∞∑n=1

yn(x)y∗n (x ′) = δ(x − x ′), (3.6)

which defines what we mean by a complete set of functions.

In practice, particularly in numerical methods, the infinite sum in Eqn. (3.5) must be approximated by a finite sum

f(x) ≈ Σ_{n=1}^N an yn(x).

Defining the error to be

εN ≡ || f(x) − Σ_{n=1}^N an yn(x) ||²

   = ||f(x)||² − Σ_{n=1}^N [ an* 〈yn|f〉 + an 〈f|yn〉 ] + Σ_{n=1}^N |an|²,

we see that the error is minimised when

δεN/δan* = 0  ⇒  an − 〈yn|f〉 = 0  ⇒  an = 〈yn|f〉,


and the value of the error at this point is

εN = ||f||² − Σ_{n=1}^N |an|².

Using the fact that εN ≥ 0, we obtain Bessel's inequality:

||f||² ≥ Σ_{n=1}^N |an|²,   (3.7)

which, in the limit N → ∞, becomes an equality and gives Parseval's theorem:

||f||² = Σ_{n=1}^∞ |an|².   (3.8)
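A numerical illustration (my own example, not from the notes): L = −d²/dx² on [0, π] with y(0) = y(π) = 0 has orthonormal eigenfunctions yn(x) = √(2/π) sin(nx) with w = 1; expanding f(x) = x(π − x), the partial sums of |an|² obey Bessel's inequality at every N and rapidly approach ||f||²:

```python
import math

# Fourier-sine expansion of f(x) = x(pi - x) on [0, pi]; check Bessel/Parseval.

def inner(f, g, m=4000):
    """<f|g> = int_0^pi f g dx by the trapezoidal rule (w = 1)."""
    h = math.pi / m
    vals = [f(k * h) * g(k * h) for k in range(m + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

f = lambda x: x * (math.pi - x)
norm2 = inner(f, f)                            # ||f||^2 = pi^5/30
coeffs = [inner(lambda x, n=n: math.sqrt(2 / math.pi) * math.sin(n * x), f)
          for n in range(1, 8)]

partial = 0.0
for a in coeffs:
    partial += a * a
    assert partial <= norm2 + 1e-6             # Bessel's inequality at each N
print(norm2 - partial)                         # already tiny: a_n ~ 1/n^3
```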

3.5.4 Eigenfunction Expansion of Green Function

If {yn} are a set of orthonormal eigenfunctions of L with corresponding eigenvalues {λn}, then the Green function for L is given by

G(x, x′) = Σ_{n=1}^∞ yn(x) yn*(x′) / λn.   (3.9)

G(x, x′) satisfies the appropriate boundary conditions, because yn(x) does, and it satisfies the defining property of a Green function:

L[G(x, x′)] = Σ_{n=1}^∞ L[yn(x)] yn*(x′) / λn

            = Σ_{n=1}^∞ w(x) yn(x) yn*(x′)

            = [ w(x)/w(x′) ] [ w(x′) Σ_{n=1}^∞ yn(x) yn*(x′) ]

            = δ(x − x′),

where we have used the completeness relation of Eqn. (3.6).

When λn = 0 for any n, the Green function does not exist, which is another way of phrasing what we discovered earlier: if a solution to the homogeneous problem Ly = 0 exists that satisfies both boundary conditions, then the inhomogeneous problem Ly = f has no solution; λn = 0 implies that yn is exactly such a solution to the homogeneous problem.


3.5.5 Eigenfunction Expansions for Solving ODEs

As an example, consider the differential equation

Ly(x) − νy(x) = f(x),

subject to some boundary conditions, and where L is a self-adjoint operator with weight function w(x) = 1, eigenvalues {λn}, and corresponding eigenfunctions {yn}, subject to the same boundary conditions.

We may expand both the solution y(x) and the inhomogeneous term f(x) in terms of the complete set of eigenfunctions of L:

y(x) = Σ_{n=1}^∞ an yn(x),    f(x) = Σ_{n=1}^∞ fn yn(x),

where the coefficients an and fn are to be determined. Substituting into the original equation, we find

Σ_{n=1}^∞ an [ Lyn(x) − νyn(x) ] = Σ_{n=1}^∞ fn yn(x)

Σ_{n=1}^∞ an (λn − ν) yn(x) = Σ_{n=1}^∞ fn yn(x).

Orthogonality of the eigenfunctions implies that

an = fn / (λn − ν),

so that the solution is given by

y(x) = Σ_{n=1}^∞ [ fn / (λn − ν) ] yn(x),

where, from the expansion for f (x), fn = 〈yn|f 〉.

Substituting fn into y(x) we find

y(x) = Σ_{n=1}^∞ [ 〈yn|f〉 / (λn − ν) ] yn(x)

     = ∫_a^b [ Σ_{n=1}^∞ yn(x) yn*(x′) / (λn − ν) ] f(x′) dx′

     = ∫_a^b G(x, x′) f(x′) dx′,

hence we may identify the Green function of the problem as

G(x, x′) = Σ_{n=1}^∞ yn(x) yn*(x′) / (λn − ν).
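A worked check of the eigenfunction-expansion solution (my own example): take L = −d²/dx² on [0, π] with y(0) = y(π) = 0, so yn = √(2/π) sin(nx), λn = n², w = 1. For f(x) = sin(2x) and ν = 0.5 only f2 survives, and the series collapses to y(x) = sin(2x)/(λ2 − ν):

```python
import math

# Solve L y - nu y = f by eigenfunction expansion, L = -d^2/dx^2 on [0, pi].

def inner(f, g, m=4000):
    h = math.pi / m
    vals = [f(k * h) * g(k * h) for k in range(m + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

nu = 0.5
f = lambda x: math.sin(2 * x)

def y(x, nmax=10):
    total = 0.0
    for n in range(1, nmax + 1):
        yn = lambda t, n=n: math.sqrt(2 / math.pi) * math.sin(n * t)
        fn = inner(yn, f)                      # f_n = <y_n|f>
        total += fn / (n * n - nu) * yn(x)     # a_n = f_n / (lambda_n - nu)
    return total

# Exact solution of -y'' - nu*y = sin(2x) is sin(2x)/(4 - 0.5).
print(abs(y(0.7) - math.sin(1.4) / 3.5) < 1e-6)  # True
```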


Note that if ν = λn for any n, then there is no solution: taking the inner product of the differential equation (with ν = λn) with yn,

〈yn|Ly − λn y〉 = 〈yn|f〉
〈yn|Ly〉 − λn 〈yn|y〉 = 〈yn|f〉
〈y|Lyn〉* − λn 〈yn|y〉 = 〈yn|f〉
λn 〈y|yn〉* − λn 〈yn|y〉 = 〈yn|f〉
0 = 〈yn|f〉 = fn.

3.6 Legendre Polynomials

Legendre’s equation5

(1− x2)y ′′ − 2xy ′ + l(l + 1)y = 0, (3.10)

arises in a number of contexts in science, for example in the solution of Laplace's equation in spherical co-ordinates. Eqn. (3.10) may be put easily into the form of a self-adjoint eigenvalue problem with ρ(x) = (1 − x²), σ(x) = 0, w(x) = 1 and λ = l(l + 1):

−d/dx [ (1 − x²) dy/dx ] = l(l + 1) y(x).   (3.11)

If the boundary conditions are chosen such that y(x) is finite at x = ±1, then ρ(x)y(x) = 0 at x = ±1, so Eqn. (3.2) is satisfied and L is self-adjoint on the interval x ∈ [−1, 1]. It may be shown that this condition can be met only when l ∈ {0, 1, 2, ...}6. The resulting eigenfunctions are known as Legendre polynomials and are usually denoted Pl(x). Since they are the eigenfunctions of a self-adjoint operator, it follows that polynomials associated with different values of l must be orthogonal on the interval x ∈ [−1, 1]:

〈Pm|Pn〉 = ∫_{−1}^1 Pm(x) Pn(x) dx = [ 2/(2m + 1) ] δmn.   (3.12)

For m ≠ n, we may prove this directly. Pm(x) satisfies Eqn. (3.11), so if we multiply by Pn(x) and integrate we obtain

∫_{−1}^1 Pn [ (1 − x²) P′m ]′ dx + ∫_{−1}^1 Pn m(m + 1) Pm dx = 0.

Integrating the first term on the left-hand side once by parts, we obtain

−∫_{−1}^1 P′n (1 − x²) P′m dx + ∫_{−1}^1 Pn m(m + 1) Pm dx = 0.

Repeating the same analysis but with m and n switched around, we find

−∫_{−1}^1 P′m (1 − x²) P′n dx + ∫_{−1}^1 Pm n(n + 1) Pn dx = 0,

5Named after French mathematician Adrien-Marie Legendre (1752–1833).
6One may show this by obtaining the series solution to the Legendre equation.


which, when subtracted from the previous equation, gives

[ n(n + 1) − m(m + 1) ] ∫_{−1}^1 Pn Pm dx = 0,

which, for m ≠ n, gives the orthogonality relation of Eqn. (3.12)7.

The first few Legendre polynomials are given by

P0(x) = 1,    P1(x) = x,
P2(x) = (3x² − 1)/2,    P3(x) = (5x³ − 3x)/2,

and in general

Pl(x) = [ 1/(2^l l!) ] d^l/dx^l (x² − 1)^l.
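The orthogonality relation Eqn. (3.12) is easy to verify numerically for the explicit polynomials listed above (a quick check of my own, not part of the notes):

```python
# Check int_{-1}^{1} P_m P_n dx = 2/(2m+1) delta_mn for P0..P3.

P = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: 0.5 * (3 * x * x - 1),
    lambda x: 0.5 * (5 * x ** 3 - 3 * x),
]

def inner(f, g, n=4000):
    """Trapezoidal rule on [-1, 1]."""
    h = 2.0 / n
    vals = [f(-1 + k * h) * g(-1 + k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

for m in range(4):
    for n in range(4):
        expected = 2.0 / (2 * m + 1) if m == n else 0.0
        assert abs(inner(P[m], P[n]) - expected) < 1e-5
print("orthogonality of P0..P3 verified")
```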

3.7 Spherical Harmonics

Laplace’s equation in spherical co-ordinates is given by

∇2f =1

r 2

∂r

(r 2∂f

∂r

)+

1

r 2 sin θ

∂θ

(sin θ

∂f

∂θ

)+

1

r 2 sin2 θ

∂2f

∂φ2= 0.

If we take f(r, θ, φ) = r^l Θ(θ)Φ(φ) as an ansatz8, where l ∈ {0, 1, 2, ...}, Laplace's equation becomes

l(l + 1)Θ(θ)Φ(φ) + [Φ(φ)/sin θ] d/dθ ( sin θ dΘ/dθ ) + [Θ(θ)/sin²θ] d²Φ/dφ² = 0

⇒ l(l + 1) sin²θ + [sin θ/Θ(θ)] d/dθ ( sin θ dΘ/dθ ) + [1/Φ(φ)] d²Φ/dφ² = 0.

The first two terms on the left-hand side above are purely functions of θ, while the remaining term is purely a function of φ. For non-trivial solutions, therefore, the θ-dependent and φ-dependent parts must separately be constant, i.e.,

[sin θ/Θ(θ)] d/dθ ( sin θ dΘ/dθ ) + l(l + 1) sin²θ = m²,

[1/Φ(φ)] d²Φ/dφ² = −m²,

for some constant m. Since f(r, θ, φ) = f(r, θ, φ + 2π) must be periodic, it can be seen from the second equation that m ∈ Z and

Φ(φ) = c e^{imφ}.

7For m = n see, e.g., Riley, Hobson & Bence.
8This is the technique of separation of variables, where we have "guessed" the right answer for the radial part of f(r) to save time!


Substituting u = cos θ into the first equation, after some algebra we find that

(1 − u²) d²Θ/du² − 2u dΘ/du + l(l + 1)Θ(u) = [ m²/(1 − u²) ] Θ(u)

⇒ −[ (1 − u²) Θ′ ]′ = [ l(l + 1) − m²/(1 − u²) ] Θ,   (3.13)

which is called the associated Legendre equation. It has non-singular solutions when l is an integer and 0 ≤ m ≤ l. These solutions are called the associated Legendre polynomials P^m_l(u). From Eqn. (3.13), it can be seen that P^{−m}_l(u) must be proportional to P^m_l(u). A common convention for the constant of proportionality is

P^{−m}_l(u) = (−1)^m [ (l − m)!/(l + m)! ] P^m_l(u).   (3.14)

When m = 0, the associated Legendre polynomials reduce to the Legendre polynomials, i.e., P^0_l(u) = Pl(u). Eqn. (3.13) is in self-adjoint form with ρ = (1 − u²), σ = m²/(1 − u²), λ = l(l + 1) and w = 1. Over the interval u ∈ [−1, 1], the associated Legendre polynomials form an orthogonal basis set for different values of l:

∫_{−1}^1 P^m_l(u) P^m_{l′}(u) du = [ 2/(2l + 1) ] [ (l + m)!/(l − m)! ] δ_{ll′}.

Furthermore, since we could also have defined the Sturm–Liouville problem with σ = −l(l + 1), λ = −m² and w = 1/(1 − u²), a second orthogonality condition must also be obeyed, for different values of m:

∫_{−1}^1 [ P^m_l(u) P^{m′}_l(u)/(1 − u²) ] du = [ (l + m)!/(m(l − m)!) ] δ_{mm′},   m, m′ ≠ 0.

The first few associated polynomials (for m 6= 0) are

P11 (u) = (1− u2)1/2, P1

2 (u) = 3u(1− u2)1/2,

p22(u) = 3(1− u2), P1

3 (u) = 32 (5u2 − 1)(1− u2)1/2.
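These polynomials can be checked numerically. SciPy's `scipy.special.lpmv` evaluates the associated Legendre functions, but note that it includes the Condon–Shortley phase $(-1)^m$, so for odd $m$ it returns the negative of the convention used above. A sketch, with arbitrary test values:

```python
import numpy as np
from scipy.special import lpmv

u = 0.3
# m = 0 reduces to the ordinary Legendre polynomial: P_2(u) = (3u^2 - 1)/2
P2 = lpmv(0, 2, u)
# P_2^2(u) = 3(1 - u^2): even m, so no sign-convention ambiguity
P22 = lpmv(2, 2, u)
# odd m: lpmv carries the Condon-Shortley phase, lpmv(1, 1, u) = -(1 - u^2)^(1/2)
P11 = lpmv(1, 1, u)

# orthogonality: the integral of [P_2^1]^2 over [-1, 1] should equal
# 2/(2l+1) * (l+m)!/(l-m)! = (2/5) * 6 = 2.4; the integrand is a degree-4
# polynomial, so 50-point Gauss-Legendre quadrature is exact
x, w = np.polynomial.legendre.leggauss(50)
norm = np.sum(w * lpmv(1, 2, x)**2)
```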

Returning now to Laplace's equation, we denote the angular part of $f(r,\theta,\phi)$ by

\[
Y_l^m(\theta,\phi) \equiv \Theta(\theta)\,\Phi(\phi) = (-1)^m \left[\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}\right]^{1/2} P_l^m(\cos\theta)\,e^{im\phi},
\]

where $Y_l^m(\theta,\phi)$ is known as a spherical harmonic. Using Eqn. (3.13), it can be shown that spherical harmonics satisfy

\[
\nabla^2 Y_l^m(\theta,\phi) = -l(l+1)\,Y_l^m(\theta,\phi)
\]

on the unit sphere $r = 1$. Using Eqn. (3.14) we find that

\[
Y_l^{-m}(\theta,\phi) = (-1)^m\,Y_l^{m*}(\theta,\phi).
\]

Since the spherical harmonics are the eigenfunctions of the self-adjoint operator $\nabla^2$, they form a complete, orthogonal set on the unit sphere:

\[
\int_{\theta=0}^{\pi}\!\int_{\phi=0}^{2\pi} Y_{l'}^{m'*}\,Y_l^m\,d\Omega = \delta_{ll'}\,\delta_{mm'},
\]


where $d\Omega = \sin\theta\,d\theta\,d\phi$. The first few spherical harmonics are listed below:

\[
Y_0^0 = \frac{1}{\sqrt{4\pi}}, \qquad Y_1^0 = \sqrt{\frac{3}{4\pi}}\,\cos\theta,
\]
\[
Y_1^{\pm 1} = \mp\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{\pm i\phi}, \qquad Y_2^0 = \sqrt{\frac{5}{16\pi}}\,(3\cos^2\theta - 1),
\]
\[
Y_2^{\pm 1} = \mp\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{\pm i\phi}, \qquad Y_2^{\pm 2} = \sqrt{\frac{15}{32\pi}}\,\sin^2\theta\,e^{\pm 2i\phi}.
\]
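The orthonormality of these functions is easy to verify numerically. The sketch below builds $Y_l^m$ from `scipy.special.lpmv` (whose built-in Condon–Shortley phase supplies the $(-1)^m$ prefactor of the definition above) and integrates over the unit sphere by quadrature; the grid sizes are arbitrary choices:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv
from scipy.integrate import simpson

def Y(l, m, theta, phi):
    # lpmv includes the Condon-Shortley phase, matching the (-1)^m prefactor
    N = np.sqrt((2*l + 1) / (4*np.pi) * factorial(l - m) / factorial(l + m))
    return N * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

theta = np.linspace(0, np.pi, 401)
phi = np.linspace(0, 2*np.pi, 401)
T, P = np.meshgrid(theta, phi, indexing="ij")

def bracket(f, g):
    # <f|g> on the unit sphere; dOmega = sin(theta) dtheta dphi
    integrand = np.conj(f) * g * np.sin(T)
    return simpson(simpson(integrand, x=phi, axis=1), x=theta)

norm = bracket(Y(1, 1, T, P), Y(1, 1, T, P))      # expect 1
overlap = bracket(Y(2, 0, T, P), Y(1, 0, T, P))   # expect 0
```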

Completeness implies that any (reasonable) angular function may be expanded in terms of the set of spherical harmonics:

\[
f(\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}\,Y_l^m(\theta,\phi),
\]

where the coefficients $a_{lm}$ may be found using orthogonality:

\[
a_{lm} = \int Y_l^{m*}(\theta,\phi)\,f(\theta,\phi)\,d\Omega.
\]

One final relation that we should mention is the spherical harmonic addition theorem: for two arbitrary directions in space defined by the angles (in spherical polar co-ordinates) $(\theta,\phi)$ and $(\theta',\phi')$, separated by an angle $\psi$,

\[
P_l(\cos\psi) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l} Y_l^m(\theta,\phi)\,Y_l^{m*}(\theta',\phi').
\]

This result is very useful, for example, in scattering theory.
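The addition theorem can be tested numerically for a small $l$; the sketch below uses the $Y_l^m$ built from `scipy.special.lpmv` as before (the two directions are arbitrary test values):

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, eval_legendre

def Y(l, m, theta, phi):
    if m < 0:
        # use Y_l^{-m} = (-1)^m Y_l^{m*} to avoid negative orders in lpmv
        return (-1)**(-m) * np.conj(Y(l, -m, theta, phi))
    N = np.sqrt((2*l + 1) / (4*np.pi) * factorial(l - m) / factorial(l + m))
    return N * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

l = 2
th1, ph1 = 0.7, 1.1     # direction (theta, phi)
th2, ph2 = 2.0, 2.9     # direction (theta', phi')
# cos(psi) from the spherical law of cosines
cospsi = np.cos(th1)*np.cos(th2) + np.sin(th1)*np.sin(th2)*np.cos(ph1 - ph2)

lhs = eval_legendre(l, cospsi)
rhs = 4*np.pi/(2*l + 1) * sum(Y(l, m, th1, ph1) * np.conj(Y(l, m, th2, ph2))
                              for m in range(-l, l + 1))
```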


Chapter 4

Integral Transforms

4.1 Learning Outcomes

To be able to represent periodic functions in terms of a Fourier series of sine and cosine functions. The case of discontinuous functions. Parseval's theorem for Fourier series and its relation to power spectra. To understand the relationship between Fourier series and Fourier transforms. To know the definition and fundamental properties of the Fourier transform. The Parseval and convolution theorems for Fourier transforms. Application of Fourier methods to diffraction and differential equations. Laplace transforms: definition, properties, examples; their application to solving differential equations; inverse transform by inspection; convolution theorem.

4.2 Further Reading

1. Riley, Hobson & Bence, Mathematical Methods for Physics and Engineering, 3rd Ed., Chapter 12: Fourier Series; Chapter 13: Integral Transforms.

2. Arfken & Weber, Mathematical Methods for Physicists, 4th Ed., Chapter 14: Fourier Series; Chapter 15: Integral Transforms.

3. Boas, Mathematical Methods in the Physical Sciences, 2nd Ed., Chapter 7: Fourier Series; Chapter 15: Integral Transforms.

4.3 Introduction

As we saw in the previous chapter, the eigenfunctions $y_n(x)$ of a self-adjoint operator form a complete, orthogonal set of basis functions for the Hilbert space which they span, and can therefore be used to expand any (reasonable) function $f(x)$ in that space:

\[
f(x) = \sum_n a_n\,y_n(x).
\]


In this chapter we focus first on one particular set of eigenfunctions that form the Fourier¹ basis. If we consider expansions in this basis on a periodic interval, the expansion is known as a Fourier series². On an infinite interval, it is known as a Fourier transform, which is an example of an integral transform. Towards the end of this chapter we will look at another example of an integral transform, namely the Laplace transform.

The methods we will cover in this chapter have an important place in science and engineering, with applications in signal processing and electronics, solving differential equations, image processing and image enhancement, diffraction and interferometry, and much more besides.

4.4 Fourier Series*

4.4.1 The Fourier Basis*

Consider the set of functions $s_n(x) = \sin(n\pi x/L)$, $n \in \{1, 2, \ldots\}$, and $c_n(x) = \cos(n\pi x/L)$, $n \in \{0, 1, \ldots\}$. They have a common period of $2L$, e.g., $s_n(x+2L) = s_n(x)$, and, as they are the eigenfunctions of the Sturm–Liouville problem

\[
\mathcal{L}y(x) = -\frac{d^2y}{dx^2} = \left(\frac{n\pi}{L}\right)^2 y(x),
\]

subject to periodic boundary conditions $y(x+2L) = y(x)$, they form a mutually orthogonal set over any interval of length $2L$:

\[
\langle s_m|s_n\rangle = \langle c_m|c_n\rangle = L\,\delta_{mn} \quad (m, n \neq 0), \qquad
\langle c_0|c_0\rangle = 2L, \qquad
\langle c_m|s_n\rangle = 0 \;\;\forall\, m, n. \tag{4.1}
\]

We may also prove these orthogonality relations directly. For example, consider the symmetric interval $[-L, L]$. For $m \neq n$,

\begin{align*}
\langle s_m|s_n\rangle &= \int_{-L}^{L} \sin\!\left(\frac{m\pi x}{L}\right)\sin\!\left(\frac{n\pi x}{L}\right) dx \\
&= \frac{1}{2}\int_{-L}^{L} \left[\cos\!\left(\frac{(m-n)\pi x}{L}\right) - \cos\!\left(\frac{(m+n)\pi x}{L}\right)\right] dx \\
&= \frac{1}{2}\left[\frac{L}{\pi(m-n)}\sin\!\left(\frac{(m-n)\pi x}{L}\right) - \frac{L}{\pi(m+n)}\sin\!\left(\frac{(m+n)\pi x}{L}\right)\right]_{-L}^{L} \\
&= 0, \quad \text{since } \sin p\pi = 0 \text{ for any integer } p.
\end{align*}

¹Jean Baptiste Joseph Fourier (1768–1830), French mathematician and physicist. Fourier also discovered the greenhouse effect.
²This part should be mainly revision and is included only for completeness.


And for the case of $m = n$, we have

\begin{align*}
\langle s_n|s_n\rangle &= \int_{-L}^{L} \sin^2\!\left(\frac{n\pi x}{L}\right) dx \\
&= \frac{1}{2}\int_{-L}^{L} \left[1 - \cos\!\left(\frac{2n\pi x}{L}\right)\right] dx \\
&= \frac{1}{2}\left[x - \frac{L}{2n\pi}\sin\!\left(\frac{2n\pi x}{L}\right)\right]_{-L}^{L} = \frac{1}{2}\cdot 2L = L.
\end{align*}

A similar analysis may be done for the remaining conditions³.

The set of functions comprising $s_n(x)$, $n \in \{1, 2, \ldots\}$, and $c_n(x)$, $n \in \{0, 1, \ldots\}$, forms what is known as the Fourier basis, an orthogonal basis for the Hilbert space of functions that are periodic (in this case with period $2L$).

This means that, in the same way that any vector may be expressed as a linear combination of basis vectors, any periodic function $f(x) = f(x+2L)$ may be represented as a linear combination of sines and cosines with the same periodicity,

\[
f(x) = \sum_{n=0}^{\infty} a_n \cos\!\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{\infty} b_n \sin\!\left(\frac{n\pi x}{L}\right). \tag{4.2}
\]

The expression above is known as a Fourier series, and Fourier analysis is all about finding theexpansion coefficients an and bn.

As an example, consider a triangular wave $f(t)$, periodic on the interval $t \in [0, T]$:

\[
f(t) =
\begin{cases}
4t/T & t \in [0, T/4] \\
2(1 - 2t/T) & t \in [T/4, 3T/4] \\
4(t/T - 1) & t \in [3T/4, T]
\end{cases}
\]

Its Fourier series is given by⁴

\[
f(t) = \sum_{n=1}^{\infty} \frac{8\,(-1)^{n+1}}{(2n-1)^2\pi^2}\,\sin\!\left(\frac{2(2n-1)\pi t}{T}\right). \tag{4.3}
\]

The first few terms are

\[
f(t) = \frac{8}{\pi^2}\left(\sin\!\left(\frac{2\pi t}{T}\right) - \frac{1}{9}\sin\!\left(\frac{6\pi t}{T}\right) + \frac{1}{25}\sin\!\left(\frac{10\pi t}{T}\right) - \frac{1}{49}\sin\!\left(\frac{14\pi t}{T}\right) + \ldots\right).
\]

Figure 4.1 shows that by including more and more terms in the Fourier series, successively better approximations to $f(t)$ are obtained. The coefficients of the expansion in Eqn. 4.3 are known as the Fourier coefficients and can be found by orthogonality.
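The convergence visible in Fig. 4.1 can be reproduced directly by summing Eqn. (4.3); the sketch below compares truncated sums against the exact wave, taking $T = 1$:

```python
import numpy as np

T = 1.0

def triangle(t):
    # the piecewise triangular wave defined above, extended periodically
    t = np.mod(t, T)
    return np.where(t <= T/4, 4*t/T,
                    np.where(t <= 3*T/4, 2*(1 - 2*t/T), 4*(t/T - 1)))

def partial_sum(t, N):
    # first N terms of the Fourier series, Eqn. (4.3)
    total = np.zeros_like(t, dtype=float)
    for n in range(1, N + 1):
        total += (8*(-1)**(n + 1) / ((2*n - 1)**2 * np.pi**2)
                  * np.sin(2*(2*n - 1)*np.pi*t/T))
    return total

t = np.linspace(0, 1, 1001)
err4 = np.max(np.abs(partial_sum(t, 4) - triangle(t)))
err50 = np.max(np.abs(partial_sum(t, 50) - triangle(t)))
```

Because the coefficients fall off as $n^{-2}$, the worst-case error shrinks quickly as terms are added.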

³Similar orthogonality relations exist for the complex Fourier basis functions $e^{in\pi x/L}$, $n \in \{-\infty, \ldots, \infty\}$.
⁴We shall see in due course how to calculate Fourier series explicitly.


Figure 4.1: Triangular wave $f(t)$ (with $T = 1$) along with its Fourier series representations consisting of 1, 2, 3, and 4 terms.

4.4.2 Fourier Coefficients*

First, let us re-express the Fourier series of Eqn. (4.2) as

\[
f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos\!\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{\infty} b_n \sin\!\left(\frac{n\pi x}{L}\right). \tag{4.4}
\]

We may use the orthogonality relations of Eqn. (4.1) to find the coefficients $a_n$ and $b_n$:

\[
\langle f|s_m\rangle = L\,b_m, \qquad
\langle f|c_m\rangle = L\,a_m \;\;(m \neq 0), \qquad
\langle f|c_0\rangle = 2L\,a_0. \tag{4.5}
\]
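As a sanity check, these projections can be evaluated numerically for a test function; the sketch below recovers the known coefficients of $f(x) = x^2$ on $[-1,1]$, whose Fourier series is the standard result $\tfrac{1}{3} + \sum_n \tfrac{4(-1)^n}{n^2\pi^2}\cos(n\pi x)$:

```python
import numpy as np
from scipy.integrate import quad

L = 1.0
f = lambda x: x**2                      # even test function on [-L, L]

# a_0 = <f|c_0>/(2L) and a_n = <f|c_n>/L, from Eqn. (4.5)
a0 = quad(f, -L, L)[0] / (2*L)
def a(n):
    return quad(lambda x: f(x) * np.cos(n*np.pi*x/L), -L, L)[0] / L

computed = [a(n) for n in range(1, 5)]
expected = [4*(-1)**n / (n*np.pi)**2 for n in range(1, 5)]
```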

4.4.3 Symmetry*

A great deal of effort can be saved in calculating the integrals of Eqn. (4.5) if one considers the symmetry of $f(x)$ in advance. Note that $\cos(x) = \cos(-x)$ is an even function, while $\sin(x) = -\sin(-x)$ is an odd function. Then it follows that

• if $f(x)$ is odd, i.e., $f(x) = -f(-x)$, then $a_n = 0$ $\forall\, n$ and only sine terms are present in the Fourier series,


• and if $f(x)$ is even, i.e., $f(x) = f(-x)$, then $b_n = 0$ $\forall\, n$ and only cosine terms are present in the Fourier series.

4.4.4 Discontinuous Functions*

Consider the periodic sawtooth function given by

\[
f(x) = x, \quad x \in [-1, 1], \qquad f(x) = f(x+2).
\]

This function is odd, therefore $a_n = 0$ for all $n$. Calculating the remaining coefficients we find the Fourier series to be

\[
f(x) = \sum_{n=1}^{\infty} \frac{2}{n\pi}\,(-1)^{n+1}\sin(n\pi x). \tag{4.6}
\]

Writing out the first few terms explicitly:

\[
f(x) = \frac{2}{\pi}\left[\sin(\pi x) - \frac{1}{2}\sin(2\pi x) + \frac{1}{3}\sin(3\pi x) - \frac{1}{4}\sin(4\pi x) + \ldots\right]
\]

Figure 4.2 shows $f(x)$ and Fourier series approximations to $f(x)$ with successively more terms.

Figure 4.2: Sawtooth function $f(x)$ along with its Fourier series representations consisting of 1, 3, 6, and 20 terms.

This example highlights an interesting feature: why does the Fourier series for the sawtooth function converge more slowly than that for the triangular wave, and what is going on with the apparent overshoot of the Fourier series for the sawtooth function at $x = \pm 1$?


The sawtooth function $f(x)$ is discontinuous at $x = \pm 1$. Notice how the value of the Fourier series at these points takes the arithmetic mean value of $f(x)$ on either side of the discontinuity. This is a general result: if $x = x_0$ is a point of discontinuity of $f(x)$, then the value of the Fourier series $F(x)$ at $x = x_0$ is

\[
F(x_0) = \tfrac{1}{2}\left[f(x_0^+) + f(x_0^-)\right].
\]

Compare the Fourier series for the triangular wave (Eqn. 4.3 and Fig. 4.1) and the sawtooth function (Eqn. 4.6 and Fig. 4.2). What is noticeable is that for the former the Fourier coefficients decrease as $n^{-2}$, while for the latter they decrease more slowly, as $n^{-1}$. In other words, the Fourier series for the sawtooth function converges more slowly, as the importance of successive terms does not decrease as fast. The difference is due to the fact that the sawtooth function is discontinuous, while the triangular wave is continuous and has a discontinuous first derivative. In general, if $f^{(k-1)}(x)$ is continuous but $f^{(k)}(x)$ is not, then the Fourier coefficients will decrease as $n^{-(k+1)}$.

Take a closer look at Fig. 4.2. Let us denote the Fourier series truncated at the $N$th term by $F_N(x)$:

\[
F_N(x) = \sum_{n=0}^{N} a_n \cos(k_n x) + \sum_{n=1}^{N} b_n \sin(k_n x).
\]

What is clear is that as more terms of the Fourier series are included, a better representation of the function $f(x)$ is achieved, in the sense that for a given value of $x$,

\[
|F_N(x) - f(x)| \to 0 \quad \text{as} \quad N \to \infty.
\]

However, if we zoom in on the discontinuity in Fig. 4.2, as shown in Fig. 4.3, we see that there is a significant overshoot of $F_N(x)$. Furthermore, as we add more terms to the series, although the position of the overshoot moves further towards the discontinuity, its magnitude remains finite. This is called the Gibbs phenomenon and it means that there is always an error in representing a discontinuous function⁵. By adding more terms the size of the region over which there is an overshoot may be decreased, but the overshoot will always remain.
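The Gibbs overshoot is easy to observe numerically: the peak of the truncated sawtooth series near $x = 1$ stays well above the function value however many terms are kept (the grid choices below are arbitrary):

```python
import numpy as np

def F_N(x, N):
    # truncated Fourier series of the sawtooth, Eqn. (4.6)
    n = np.arange(1, N + 1)
    coeffs = 2/(n*np.pi) * (-1)**(n + 1)
    return np.sum(coeffs * np.sin(np.outer(x, n)*np.pi), axis=1)

x = np.linspace(0.8, 1.0, 2001)
# the peak overshoot persists as N grows; it tends to (2/pi) Si(pi) ~ 1.179,
# about 9% of the jump above the limiting value 1
peaks = [np.max(F_N(x, N)) for N in (20, 100, 500)]
```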

4.4.5 Parseval’s Theorem and Bessel’s Inequality*

Eqns. (3.7) and (3.8) apply to Fourier series and may be easily derived using the orthogonality relations:

\[
\|f\|^2 \geq a_0^2 + \frac{1}{2}\sum_{n=1}^{N}\left[|a_n|^2 + |b_n|^2\right], \tag{4.7}
\]
\[
\|f\|^2 = a_0^2 + \frac{1}{2}\sum_{n=1}^{\infty}\left[|a_n|^2 + |b_n|^2\right], \tag{4.8}
\]

where $\|f\|^2$ denotes the mean-square value of $f(x)$ over one period.

In order to understand the physical meaning of Parseval's theorem, consider the $n$th harmonic in the Fourier series of $f(x)$:

\[
h_n(x) = a_n \cos\!\left(\frac{n\pi x}{L}\right) + b_n \sin\!\left(\frac{n\pi x}{L}\right).
\]

⁵This phenomenon is not unique to Fourier series and is true of other basis function expansions.


Figure 4.3: Sawtooth function $f(x)$ along with its Fourier series representations consisting of 3, 6, and 20 terms.

Using a suitable trigonometric identity, this may be expressed as

\[
h_n(x) = \gamma_n \sin\!\left(\frac{n\pi x}{L} + \varphi_n\right),
\]

where $\gamma_n = \sqrt{a_n^2 + b_n^2}$ and $\tan(\varphi_n) = a_n/b_n$. Comparing with Parseval's theorem, Eqn. 4.8, it can be seen that each harmonic in the Fourier series contributes independently to the mean-square value of $f(x)$. In many applications in physics, for example where $f(x)$ represents a time-varying current in an electrical circuit, or an electromagnetic wave, the power density (energy per unit length transmitted per unit time) is proportional to the mean-square value of $f(x)$, given by Eqn. 4.8, and the set of coefficients $\{a_0^2, \tfrac{1}{2}\gamma_n^2\}$ is known as the power spectrum of $f(x)$.
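Parseval's theorem can be checked numerically for the sawtooth of Eqn. (4.6), for which only the $b_n$ survive; here $\|f\|^2$ is taken as the mean-square value over one period:

```python
import numpy as np
from scipy.integrate import quad

L = 1.0
# mean-square value of f(x) = x over [-1, 1]: (1/2L) * integral of x^2 = 1/3
lhs = quad(lambda x: x**2, -L, L)[0] / (2*L)

# right-hand side of Eqn. (4.8) with a_n = 0, b_n = 2(-1)^(n+1)/(n pi)
n = np.arange(1, 100001)
rhs = 0.5 * np.sum((2/(n*np.pi))**2)
```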

4.5 Fourier Transforms

An entirely equivalent representation to sines and cosines is provided by a basis of complex exponential functions (also known as plane waves):

\[
e^{\pm i k_n x} = \cos(k_n x) \pm i \sin(k_n x).
\]

Substituting this Euler⁶ relation into the expression for $f(x)$, Eqn. (4.2), gives

\[
f(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i k_n x},
\]

⁶Leonhard Paul Euler (1707–1783), Swiss mathematician and physicist who made many pioneering contributions.


where the (complex) coefficient cn is related to an and bn.

Consider the function to have periodicity $L$, i.e., $f(x) = f(x+L)$. Then we require $k_n L = 2n\pi \Rightarrow k_n = 2n\pi/L$, i.e., we have a sum of plane waves whose wavevectors⁷ $k_n$ are separated by $\Delta k = 2\pi/L$. Since $L\,\Delta k/(2\pi) = 1$, we may write

\[
f(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i k_n x}\,\frac{L\,\Delta k}{2\pi}.
\]

If we now let $L \to \infty$⁸, the spacing of wavevectors (and hence frequencies) approaches zero, $\Delta k \to 0$: the discrete set of $k_n$ becomes a continuous variable $k$, and the infinite sum becomes an integral over $k$:

\[
f(x) \;\longrightarrow\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde{f}(k)\,e^{ikx}\,dk,
\]

where we have defined $\tilde{f}(k) \equiv \lim_{\Delta k \to 0} c_n L/\sqrt{2\pi}$. This is the definition of the Fourier transform, which may be thought of as the generalisation of Fourier series to the case of non-periodic functions.

4.5.1 Definition and Notation

Given $f(x)$ such that $\int_{-\infty}^{\infty}|f(x)|\,dx < \infty$, its Fourier transform $\tilde{f}(k)$ is defined as

\[
\tilde{f}(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx. \tag{4.9}
\]

The inverse Fourier transform is defined as

\[
f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde{f}(k)\,e^{ikx}\,dk. \tag{4.10}
\]

A varied set of notation is used to represent Fourier transforms, the most common of which are given below:

\[
\tilde{f} \equiv F \equiv \mathcal{F}[f] \equiv \mathrm{FT}[f] \qquad \text{and} \qquad f \equiv \mathcal{F}^{-1}[\tilde{f}] \equiv \mathrm{FT}^{-1}[\tilde{f}].
\]

For the Fourier transform of a function $f(x)$ to be well-defined, a number of conditions must be fulfilled:

• $f(x)$ is piecewise continuous

• $f(x)$ is differentiable

• $f(x)$ is absolutely integrable: $\int_{-\infty}^{\infty}|f(x)|\,dx < \infty$

⁷The wavevector $k$ is related to the wavelength $\lambda$ and frequency $f$ by $k = 2\pi/\lambda = 2\pi f/c$, where $c$ is the wave speed.
⁸In other words, the function $f(x)$ is no longer periodic on any finite interval.


Fourier transforms are defined such that if one starts with $f(x)$ and performs a forward transform followed by an inverse transform on the result, then one ends up with $f(x)$ again:

\[
f(x) \;\overset{\mathcal{F}}{\longrightarrow}\; \tilde{f}(k) \;\overset{\mathcal{F}^{-1}}{\longrightarrow}\; f(x),
\]

which provides a definition of the Dirac delta-function.

4.5.2 Dirac Delta-Function

Taking Eqn. 4.10 as our starting point and substituting $\tilde{f}(k)$ from Eqn. 4.9 into it, we find

\begin{align*}
f(x) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde{f}(k)\,e^{ikx}\,dk \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x')\,e^{-ikx'}\,dx'\right] e^{ikx}\,dk \\
&= \int_{-\infty}^{\infty} f(x')\,\underbrace{\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(x-x')}\,dk\right]}_{\delta(x-x')}\,dx'
\end{align*}
\[
\Rightarrow\quad f(x) = \int_{-\infty}^{\infty} f(x')\,\delta(x-x')\,dx', \tag{4.11}
\]

where we have defined the Dirac delta-function

\[
\delta(x-x') = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(x-x')}\,dk. \tag{4.12}
\]

4.5.3 Properties of the Fourier Transform

Usually $f(x) = f^*(x)$ is a real function, and we shall assume this to be the case. $\tilde{f}(k)$ is, however, in general complex:

\[
\tilde{f}^*(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f^*(x)\left(e^{-ikx}\right)^* dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{ikx}\,dx \neq \tilde{f}(k).
\]

If $f(x)$ has definite symmetry, then $\tilde{f}(k)$ may be either purely real or purely imaginary:

• If $f(x) = f(-x)$ is an even function $\Rightarrow$ $\tilde{f}^*(k) = \tilde{f}(k)$ is purely real.

• If $f(x) = -f(-x)$ is an odd function $\Rightarrow$ $\tilde{f}^*(k) = -\tilde{f}(k)$ is purely imaginary.

Further useful properties include⁹

⁹Proofs are left as an exercise.


• Differentiation.
\[
\mathcal{F}\left[f^{(n)}(x)\right] = (ik)^n\,\tilde{f}(k).
\]
In particular,
\[
\mathcal{F}\left[\frac{df}{dx}\right] = ik\,\tilde{f}(k), \qquad
\mathcal{F}\left[\frac{d^2f}{dx^2}\right] = -k^2\,\tilde{f}(k).
\]

• Multiplication by $x$.
\[
\mathcal{F}[x f(x)] = i\,\frac{d\tilde{f}(k)}{dk}.
\]

• Rigid shift of co-ordinate.
\[
\mathcal{F}[f(x-a)] = e^{-ika}\,\tilde{f}(k).
\]
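These properties can be spot-checked numerically; the sketch below verifies the differentiation and shift rules for a Gaussian, whose transform $\mathcal{F}[e^{-x^2/2}] = e^{-k^2/2}$ is a standard result (the values of $k$ and $a$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

sqrt2pi = np.sqrt(2*np.pi)

def ft(f, k):
    # numerical Fourier transform with the 1/sqrt(2 pi) convention of Eqn. (4.9)
    re, _ = quad(lambda x: f(x)*np.cos(k*x), -np.inf, np.inf)
    im, _ = quad(lambda x: -f(x)*np.sin(k*x), -np.inf, np.inf)
    return (re + 1j*im)/sqrt2pi

f = lambda x: np.exp(-x**2/2)        # Gaussian test function
df = lambda x: -x*np.exp(-x**2/2)    # its derivative
k, a = 1.3, 0.6

# differentiation: F[df/dx] = ik F[f]
lhs_diff, rhs_diff = ft(df, k), 1j*k*ft(f, k)
# rigid shift: F[f(x - a)] = exp(-ika) F[f]
lhs_shift = ft(lambda x: f(x - a), k)
rhs_shift = np.exp(-1j*k*a)*ft(f, k)
```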

4.5.4 Examples of Fourier Transforms

1. Square wave

\[
f(x) =
\begin{cases}
1 & x \in [0, 1] \\
-1 & x \in [-1, 0] \\
0 & |x| > 1
\end{cases}
\]

\begin{align*}
\tilde{f}(k) &= \frac{1}{\sqrt{2\pi}}\left[-\int_{-1}^{0} e^{-ikx}\,dx + \int_{0}^{1} e^{-ikx}\,dx\right] \\
&= \frac{1}{\sqrt{2\pi}}\left\{-\left[\frac{-1}{ik}e^{-ikx}\right]_{x=-1}^{0} + \left[\frac{-1}{ik}e^{-ikx}\right]_{x=0}^{1}\right\} \\
&= \frac{1}{\sqrt{2\pi}}\,\frac{2}{ik}\,(1 - \cos k) = \frac{1}{\sqrt{2\pi}}\cdot\frac{4}{ik}\,\sin^2\!\left(\frac{k}{2}\right).
\end{align*}

Note that since $f(x)$ is odd, $\tilde{f}(k)$ is imaginary.

2. Top-hat function

\[
f(x) =
\begin{cases}
1 & |x| < a \\
0 & |x| \geq a
\end{cases}
\]

\begin{align*}
\tilde{f}(k) &= \frac{1}{\sqrt{2\pi}}\int_{-a}^{a} e^{-ikx}\,dx
= \frac{1}{\sqrt{2\pi}}\left[-\frac{1}{ik}e^{-ikx}\right]_{x=-a}^{a}
= \sqrt{\frac{2}{\pi}}\,\frac{\sin(ka)}{k}.
\end{align*}

Note that since $f(x)$ is even, $\tilde{f}(k)$ is real.
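Both transforms can be verified by direct numerical integration of Eqn. (4.9); the sketch below evaluates the real and imaginary parts separately at an arbitrary $k$, taking $a = 1$ for the top-hat:

```python
import numpy as np
from scipy.integrate import quad

sqrt2pi = np.sqrt(2*np.pi)

def ft(f, k, lo, hi):
    # numerical Fourier transform with the 1/sqrt(2 pi) convention
    re, _ = quad(lambda x: f(x)*np.cos(k*x), lo, hi)
    im, _ = quad(lambda x: -f(x)*np.sin(k*x), lo, hi)
    return (re + 1j*im)/sqrt2pi

k = 1.7
tophat = ft(lambda x: 1.0, k, -1, 1)            # top-hat with a = 1
square = ft(lambda x: np.sign(x), k, -1, 1)     # square wave

expected_tophat = np.sqrt(2/np.pi)*np.sin(k)/k      # real, as f is even
expected_square = 4*np.sin(k/2)**2/(1j*k*sqrt2pi)   # imaginary, as f is odd
```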


4.5.5 Bandwidth Theorem

The last example is a good demonstration of the bandwidth theorem, or uncertainty principle: if a function $f(x)$ has a spread $\Delta x$ in $x$-space, then its spread $\Delta k$ in $k$-space will be such that

\[
\Delta x \sim \frac{1}{\Delta k}.
\]

The mathematics here is analogous to that behind the Heisenberg Uncertainty Principle in Quantum Mechanics, which states that the position $x$ and momentum $p$ of a quantum mechanical particle, such as an electron, cannot both be known simultaneously to arbitrary accuracy:

\[
\Delta x \geq \frac{h}{\Delta p},
\]

where $h$ is Planck's constant. Another implication is for, e.g., laser pulses: the shorter the pulse in time, the larger the spread $\Delta\omega$ of frequencies of the harmonics that constitute it (see example on question sheet).

4.6 Diffraction Through an Aperture

An application of Fourier transforms is in determining diffraction patterns resulting from light incident on an aperture. (The analysis is the same for diffraction of X-rays through crystals.) It can be shown (though we do not have time to do so here) that the intensity $I(k)$ of light observed in the diffraction pattern is just the square of the Fourier transform of the aperture function $h(x)$:

\[
I(k) = \left|\mathcal{F}[h(x)]\right|^2,
\]

where, referring to Fig. 4.4, $k = 2\pi\sin\theta/\lambda$, $\lambda$ is the wavelength of the incident light and $\theta$ is the angle to the observation point on the screen. For small values of $\theta$ we have $\sin\theta \approx \tan\theta = X/D$, hence $k \approx 2\pi X/\lambda D$, i.e., $k$ is proportional to the displacement of the observation point on the screen.

Consider diffraction through a single slit of width $a$ as an example. The aperture function is given by a top-hat:

\[
h(x) =
\begin{cases}
1 & x \in [-\tfrac{a}{2}, \tfrac{a}{2}] \\
0 & \text{otherwise}
\end{cases}
\]

i.e., $h = 1$ corresponds to transparent parts of the aperture, and $h = 0$ to opaque parts. As we showed before, the Fourier transform of a top-hat function is given by a sinc function:

\[
\tilde{h}(k) = \sqrt{\frac{2}{\pi}}\,\frac{\sin(ka/2)}{k} = \frac{a\,\mathrm{sinc}(ka/2)}{\sqrt{2\pi}}.
\]

Hence the intensity is given by

\[
I(k) = \frac{a^2\,\mathrm{sinc}^2(ka/2)}{2\pi}.
\]

The aperture function and intensity are plotted in Figs. 4.5 and 4.6.


Figure 4.4: Geometry for Fraunhofer diffraction: an incident wave of wavelength $\lambda$ passes through the aperture and is observed on a screen a distance $D$ away, at displacement $X$ and angle $\theta$.

4.6.1 Parseval’s Theorem for Fourier Transforms

Parseval's theorem for Fourier transforms states that

\[
\int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |\tilde{f}(k)|^2\,dk. \tag{4.13}
\]

Proof:

\begin{align*}
\int_{-\infty}^{\infty} |f(x)|^2\,dx &= \int_{-\infty}^{\infty} f(x)\,f^*(x)\,dx \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}\tilde{f}(k)\,e^{ikx}\,dk\right]\cdot\left[\int_{-\infty}^{\infty}\tilde{f}^*(k')\,e^{-ik'x}\,dk'\right] dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \tilde{f}(k)\,\tilde{f}^*(k')\,\underbrace{\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i(k-k')x}\,dx\right]}_{\delta(k-k')}\,dk\,dk' \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \tilde{f}(k)\,\tilde{f}^*(k')\,\delta(k-k')\,dk\,dk' \\
&= \int_{-\infty}^{\infty} \tilde{f}(k)\,\tilde{f}^*(k)\,dk = \int_{-\infty}^{\infty}|\tilde{f}(k)|^2\,dk.
\end{align*}

The above proof may be easily generalised for two different functions $f(x)$ and $g(x)$:

\[
\int_{-\infty}^{\infty} f(x)\,g^*(x)\,dx = \int_{-\infty}^{\infty} \tilde{f}(k)\,\tilde{g}^*(k)\,dk. \tag{4.14}
\]
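Eqn. (4.13) can be verified numerically for a function with a known transform; the sketch below uses $f(x) = e^{-|x|}$, for which $\tilde{f}(k) = \sqrt{2/\pi}/(1+k^2)$ (a standard result), exploiting evenness to integrate over half the line:

```python
import numpy as np
from scipy.integrate import quad

# left-hand side: integral of |f|^2 = integral of exp(-2|x|) = 1
lhs = 2*quad(lambda x: np.exp(-2*x), 0, np.inf)[0]

# right-hand side: integral of |f_tilde|^2 = (2/pi) integral of (1+k^2)^-2 = 1
rhs = 2*quad(lambda k: (2/np.pi)/(1 + k**2)**2, 0, np.inf)[0]

# spot-check the transform itself at one k by direct integration
k0 = 0.9
num = 2*quad(lambda x: np.exp(-x)*np.cos(k0*x), 0, np.inf)[0]
fhat = num/np.sqrt(2*np.pi)
```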


Figure 4.5: Aperture function for single slit of width $a$.

Figure 4.6: Diffraction pattern for single slit of width $a$: $I(k) = \dfrac{a^2}{2\pi}\,\mathrm{sinc}^2(ka/2)$.


4.6.2 Convolution Theorem for Fourier Transforms

The convolution of two functions $f(x)$ and $g(x)$ is defined as

\[
f * g \equiv \int_{y=-\infty}^{\infty} f(y)\,g(x-y)\,dy, \tag{4.15}
\]

from which it is easy to show that $f * g = g * f$: in the above equation, let $\xi = x - y \Rightarrow y = x - \xi$, $dy = -d\xi$:

\begin{align*}
f * g &= \int_{\xi=\infty}^{-\infty} f(x-\xi)\,g(\xi)\,(-d\xi) = \int_{-\infty}^{\infty} g(\xi)\,f(x-\xi)\,d\xi \\
\Rightarrow\quad f*g &= g*f.
\end{align*}

The convolution theorem may be stated in two equivalent forms. Given two functions $f$ and $g$:

1. the Fourier transform of their convolution is directly proportional to the product of their Fourier transforms:
\[
\mathcal{F}[f*g] = \sqrt{2\pi}\,\tilde{f}\,\tilde{g}; \tag{4.16}
\]

2. the Fourier transform of their product is directly proportional to the convolution of their Fourier transforms:
\[
\mathcal{F}[fg] = \frac{1}{\sqrt{2\pi}}\,\tilde{f}*\tilde{g}. \tag{4.17}
\]

The proofs are very similar, so we shall treat one here; the other is left as an exercise on the examples sheet.

Proof:

\begin{align*}
\mathcal{F}[f*g] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} [f*g](x)\,e^{-ikx}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(y)\,g(x-y)\,dy\right] e^{-ikx}\,dx \\
&= \int_{-\infty}^{\infty} f(y)\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(x-y)\,e^{-ikx}\,dx\right] dy && \text{(interchanging order of integration)} \\
&= \int_{-\infty}^{\infty} f(y)\,e^{-iky}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(x-y)\,e^{-ik(x-y)}\,dx\right] dy && (\times\text{ by } e^{-iky}e^{iky} = 1) \\
&= \int_{-\infty}^{\infty} f(y)\,e^{-iky}\underbrace{\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(\xi)\,e^{-ik\xi}\,d\xi\right]}_{\tilde{g}(k)} dy && \text{(substituting } \xi = x-y\text{)} \\
&= \underbrace{\left[\int_{-\infty}^{\infty} f(y)\,e^{-iky}\,dy\right]}_{\sqrt{2\pi}\,\tilde{f}(k)}\,\tilde{g}(k) = \sqrt{2\pi}\,\tilde{f}(k)\,\tilde{g}(k).
\end{align*}
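The result can be checked numerically with two Gaussians, for which everything is well-behaved; the convolution is evaluated by quadrature and its transform compared against $\sqrt{2\pi}\,\tilde{f}\tilde{g}$, using the standard transform $\mathcal{F}[e^{-x^2/2}] = e^{-k^2/2}$ (the value of $k$ is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

sqrt2pi = np.sqrt(2*np.pi)
g = lambda x: np.exp(-x**2/2)       # take f = g, a Gaussian
ghat = lambda k: np.exp(-k**2/2)    # its known Fourier transform

def conv(x):
    # (g * g)(x) by direct quadrature
    return quad(lambda y: g(y)*g(x - y), -np.inf, np.inf)[0]

k = 0.8
# F[g * g](k): the convolution is real and even, so a cosine transform
# suffices; the integrand is negligible beyond |x| = 20
re, _ = quad(lambda x: conv(x)*np.cos(k*x), -20, 20, limit=200)
lhs = re/sqrt2pi
rhs = sqrt2pi * ghat(k)**2
```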


Among many other things, the convolution theorem may be used to calculate diffraction patterns of light incident on complicated apertures. In the next section, we will consider the case of Young's double slit experiment.

4.6.3 Double Slit Diffraction

Earlier it was discussed how, in a Fraunhofer diffraction experiment, the amplitude of the signal at the detection screen is given by the Fourier transform of the aperture function $h(x)$. The intensity $I(k)$ that is observed is then just the square of this amplitude:

\[
I(k) = \left|\tilde{h}(k)\right|^2,
\]

where, with reference to Fig. 4.4, $k = 2\pi\sin\theta/\lambda$, $\lambda$ is the wavelength of the incident wave and $\theta$ is the angle of observation. For small values of $\theta$ we have $\sin\theta \approx \tan\theta = X/D$, hence $k \approx 2\pi X/\lambda D$, i.e., $k$ is proportional to the displacement of the observation point on the screen.

We saw earlier that for a single slit of width $a$, the aperture function of which is described by a top-hat function, the intensity observed is given by

\[
I(k) = \frac{a^2}{2\pi}\,\mathrm{sinc}^2(ka/2).
\]

The aperture function and intensity are plotted in Figs. 4.5 and 4.6.

The aperture function for a double slit is shown in Fig. 4.7. Here the slit separation is $d$ and the slit width is $a$. We could go ahead and evaluate the Fourier transform of the aperture function $h(x)$ for the double slit directly, but there is a more clever approach which uses the convolution theorem. We note that $h(x)$ is actually the convolution of a double Dirac delta-function (with separation $d$) with a top-hat function of width $a$, as shown in Fig. 4.8:

\[
h(x) = (f * g)(x).
\]

The intensity at the screen is given by

\[
I(k) = \left|\tilde{h}(k)\right|^2 = \left|\mathcal{F}[f*g]\right|^2,
\]

and from the convolution theorem we know that

\[
\mathcal{F}[f*g] = \sqrt{2\pi}\,\mathcal{F}[f]\,\mathcal{F}[g].
\]

Hence, we can obtain the intensity from the Fourier transforms of $f(x)$ and $g(x)$. $g(x)$ is a top-hat, and as we've seen before, it is easy to show that

\[
\mathcal{F}[g] = \frac{a}{\sqrt{2\pi}}\,\mathrm{sinc}(ka/2).
\]

For the double Dirac delta-function, we have

\[
f(x) = \delta(x - d/2) + \delta(x + d/2),
\]


Figure 4.7: Aperture function for double slit: slit width $a$, separation $d$.

Figure 4.8: Convolution theorem for double slit: $h(x) = f(x) * g(x)$.


Figure 4.9: Diffraction pattern for double slit.

hence

\begin{align*}
\mathcal{F}[f] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\delta(x - d/2) + \delta(x + d/2)\right] e^{-ikx}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\left[e^{-ikd/2} + e^{ikd/2}\right] = \sqrt{\frac{2}{\pi}}\,\cos(kd/2).
\end{align*}

Putting it all together, we see that the intensity on the screen is given by

\[
I(k) = \frac{2a^2}{\pi}\,\mathrm{sinc}^2(ka/2)\,\cos^2(kd/2),
\]

which is plotted in Fig. 4.9.

What is seen is a broad envelope, which comes from the $\mathrm{sinc}(ka/2)$ factor, and the denser interference fringes that come from the $\cos(kd/2)$ factor. Note that $d > a$, which is why the frequency of the fringes is higher than that of the envelope function (a manifestation of the bandwidth theorem that was discussed earlier).
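The closed form can be compared against a brute-force transform of the aperture. Since $h(x)$ is even (two slits of width $a$ centred at $\pm d/2$), $\tilde{h}$ reduces to a cosine integral over one slit, doubled; the slit dimensions below are arbitrary test values:

```python
import numpy as np
from scipy.integrate import quad

a, d = 1.0, 3.0                     # slit width and separation, d > a

def h_hat(k):
    # h is even, so the transform is twice the cosine integral over one slit
    re, _ = quad(lambda x: np.cos(k*x), d/2 - a/2, d/2 + a/2)
    return 2*re/np.sqrt(2*np.pi)

def formula(k):
    # I(k) = (2 a^2/pi) sinc^2(ka/2) cos^2(kd/2); np.sinc(x) = sin(pi x)/(pi x)
    return 2*a**2/np.pi * np.sinc(k*a/(2*np.pi))**2 * np.cos(k*d/2)**2

ks = [0.5, 1.0, 2.3]
numeric = [h_hat(k)**2 for k in ks]
analytic = [formula(k) for k in ks]
```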


4.6.4 Differential Equations

One important use of Fourier transforms is in solving differential equations. A Fourier transform may be used to transform a differential equation into an algebraic equation, since

\[
\mathcal{F}\left[\frac{\partial^n f}{\partial x^n}\right] = (ik)^n\,\mathcal{F}[f].
\]

Example: Heat Diffusion Equation

Consider an infinite, one-dimensional bar which has an initial temperature distribution given by $\theta(x, t=0) = \delta(x)$. This is clearly a somewhat unphysical situation, but it is not such a bad approximation to a situation in which a lot of heat is initially concentrated at the mid-point of the rod.

The flow of heat is determined by the diffusion equation

\[
D\,\frac{\partial^2\theta}{\partial x^2} = \frac{\partial\theta}{\partial t},
\]

which is second-order in $x$ and first-order in $t$; $D$ is the diffusion coefficient. The boundary conditions on this problem are that $\theta(\pm\infty, t) = 0$.

Consider what happens if we Fourier transform both sides of this equation with respect to the variable $x$:

\begin{align*}
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} D\,\frac{\partial^2\theta}{\partial x^2}\,e^{-ikx}\,dx &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{\partial\theta}{\partial t}\,e^{-ikx}\,dx \\
D\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{\partial^2\theta}{\partial x^2}\,e^{-ikx}\,dx &= \frac{\partial}{\partial t}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \theta(x,t)\,e^{-ikx}\,dx\right] \\
D\,\mathcal{F}\left[\frac{\partial^2\theta(x,t)}{\partial x^2}\right] &= \frac{\partial}{\partial t}\,\mathcal{F}[\theta(x,t)] \\
-Dk^2\,\tilde{\theta}(k,t) &= \frac{\partial\tilde{\theta}(k,t)}{\partial t}.
\end{align*}

So the differential equation in $x$ is replaced by an algebraic equation in $k$. We still have a differential equation in $t$, which is first-order and easy to solve:

\[
\tilde{\theta}(k,t) = A(k)\,e^{-Dk^2 t},
\]

where $A(k) = \tilde{\theta}(k, t=0)$, a constant of integration, is just the Fourier transform of the initial condition $\theta(x, t=0) = \delta(x)$, which we can calculate:

\[
A(k) = \tilde{\theta}(k, t=0) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(x)\,e^{-ikx}\,dx = \frac{1}{\sqrt{2\pi}}.
\]

Hence,

\[
\tilde{\theta}(k,t) = \frac{1}{\sqrt{2\pi}}\,e^{-Dk^2 t}.
\]


So, we have Fourier transformed the original differential equation from "$x$-space" to "$k$-space", solved it in $k$-space, and applied the initial condition in $k$-space. All that remains to be done is to inverse Fourier transform $\tilde{\theta}(k,t)$ back in order to obtain the solution in $x$-space.

In practice, this is typically the most difficult part of the process (there's no such thing as a free lunch!), but for the example we have chosen it turns out to be not too bad because, through completing the square and changing the variable of integration, we are able to reduce the problem to a Gaussian integral of known value¹⁰:

\begin{align*}
\theta(x,t) &= \mathcal{F}^{-1}[\tilde{\theta}(k,t)] \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-Dk^2 t}\,e^{ikx}\,dk \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-Dt(k - ix/2Dt)^2 - x^2/4Dt}\,dk && \text{(combining exponents and completing the square)} \\
&= \frac{1}{2\pi}\,e^{-x^2/4Dt}\int_{-\infty}^{\infty} e^{-Dt\,q^2}\,dq && \text{(changing variable to } q = k - ix/2Dt \Rightarrow dq = dk\text{)} \\
&= \frac{1}{2\pi}\sqrt{\frac{\pi}{Dt}}\,e^{-x^2/4Dt} && \text{(using the standard result for a Gaussian integral)}
\end{align*}

Hence the final result¹¹:

\[
\theta(x,t) = \frac{1}{2\sqrt{\pi Dt}}\,e^{-x^2/4Dt}.
\]

Fig. 4.10 shows the temperature distribution for different times $t$. Is the behaviour as you would expect?

Figure 4.10: $\theta(x,t)$ for three different values of $t$.
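It is straightforward to confirm symbolically that this solution satisfies the diffusion equation and conserves the total (unit) amount of heat; a sketch with SymPy:

```python
import sympy as sp

x = sp.symbols('x', real=True)
t, D = sp.symbols('t D', positive=True)
theta = sp.exp(-x**2/(4*D*t)) / (2*sp.sqrt(sp.pi*D*t))

# residual of the diffusion equation D theta_xx - theta_t: should vanish
residual = sp.simplify(D*sp.diff(theta, x, 2) - sp.diff(theta, t))

# total heat: the integral of theta over all x is 1 for every t > 0
total = sp.integrate(theta, (x, -sp.oo, sp.oo))
```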

¹⁰$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\pi/\alpha}$.
¹¹The observant will have noticed that, in the fourth line of the derivation above, I should also have changed the limits on the integral to $q = \pm\infty - ix/2Dt$, thus taking the contour of the integral off the real-$q$ axis. It turns out that I'm allowed to shift the contour back to the real axis since the integrand has no singularities anywhere. This is the subject of the fields of mathematics known as complex analysis and contour integration, which have many uses in physics and engineering, and which you will meet later in the course.


4.7 Laplace Transforms

A question arises as to what to do if a function $f(x)$ does not tend to zero as $x \to \pm\infty$, or if $f(x)$ does not exist for $x < x_0$, in which cases the Fourier transform of $f(x)$ is not well-defined. Indeed, often we are interested in following the evolution of a function $f(t)$ given an initial condition at $t = 0$. The Laplace transform may be used for this purpose and is defined as

\[
\mathcal{L}[f(t)] \equiv \bar{f}(s) = \int_{0}^{\infty} e^{-st}\,f(t)\,dt, \tag{4.18}
\]

where $s = \sigma + i\omega$ is in general a complex variable and $\sigma > 0$ is required for convergence of the integral.

4.7.1 Properties

• Linearity: $\mathcal{L}[a f_1(t) + b f_2(t)] = a\,\mathcal{L}[f_1(t)] + b\,\mathcal{L}[f_2(t)]$

• $\mathcal{L}\left[\dfrac{df}{dt}\right] = s\bar{f}(s) - f(0)$

• More generally, $\mathcal{L}[f^{(n)}(t)] = s^n\bar{f}(s) - s^{n-1}f(0) - \ldots - f^{(n-1)}(0)$

• $\mathcal{L}[t^n f(t)] = (-1)^n\,\dfrac{d^n\bar{f}(s)}{ds^n}$

4.7.2 Examples

• $\mathcal{L}[1] = \dfrac{1}{s}$

• $\mathcal{L}[e^{-at}] = \dfrac{1}{s+a}$

• $\mathcal{L}[\cos\omega t] = \dfrac{s}{s^2+\omega^2}$

• $\mathcal{L}[\sin\omega t] = \dfrac{\omega}{s^2+\omega^2}$

• $\mathcal{L}[t^n] = \dfrac{n!}{s^{n+1}}$

• $\mathcal{L}[t e^{-at}] = \dfrac{1}{(s+a)^2}$

• $\mathcal{L}[f(at)] = \dfrac{1}{|a|}\,\bar{f}(s/a)$

• $\mathcal{L}[e^{at} f(t)] = \bar{f}(s-a)$

• $\mathcal{L}\left[\displaystyle\int_0^t f(t')\,dt'\right] = \dfrac{1}{s}\,\bar{f}(s)$

Proofs of the above will be given in the lectures.
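Several of these entries can be confirmed with SymPy's `laplace_transform`, which uses the same definition as Eqn. (4.18):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
a, w = sp.symbols('a omega', positive=True)

L = lambda expr: sp.laplace_transform(expr, t, s, noconds=True)

# (transform computed by SymPy, expected result from the table above)
checks = [
    (L(sp.S.One),        1/s),
    (L(sp.exp(-a*t)),    1/(s + a)),
    (L(sp.cos(w*t)),     s/(s**2 + w**2)),
    (L(sp.sin(w*t)),     w/(s**2 + w**2)),
    (L(t**3),            6/s**4),
    (L(t*sp.exp(-a*t)),  1/(s + a)**2),
]
```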


4.7.3 Solving Differential Equations

As can be seen from the examples above, Laplace transforms convert derivatives and integrals in the $t$-domain to algebraic forms in the $s$-domain. As a result, they are of use in solving differential equations.

Example

\[
\ddot{f} + 5\dot{f} + 6f = 1; \qquad f(0) = \dot{f}(0) = 0.
\]

Applying a Laplace transform to the entire equation we find

\[
\mathcal{L}[\ddot{f} + 5\dot{f} + 6f] = \mathcal{L}[1]
\]
\[
s^2\bar{f} - sf(0) - \dot{f}(0) + 5\left[s\bar{f} - f(0)\right] + 6\bar{f} = \frac{1}{s}
\]
\[
(s^2 + 5s + 6)\bar{f} = \frac{1}{s}.
\]

On rearrangement, we obtain

\[
\bar{f} = \frac{1}{s(s+2)(s+3)} \equiv \frac{A}{s} + \frac{B}{s+2} + \frac{C}{s+3},
\]

where, using partial fractions, we can show that $A = \frac{1}{6}$, $B = -\frac{1}{2}$ and $C = \frac{1}{3}$. The inverse transform can now be done by inspection, using the results derived earlier, to give

\[
f(t) = \frac{1}{6} - \frac{1}{2}e^{-2t} + \frac{1}{3}e^{-3t}.
\]
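The same answer can be checked symbolically, both by solving the ODE directly and by repeating the partial-fraction step. A SymPy sketch (my own verification, not part of the original notes):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.Function('f')

# Solve f'' + 5f' + 6f = 1 with f(0) = f'(0) = 0 directly
ode = sp.Eq(f(t).diff(t, 2) + 5 * f(t).diff(t) + 6 * f(t), 1)
sol = sp.dsolve(ode, f(t), ics={f(0): 0, f(t).diff(t).subs(t, 0): 0})

expected = sp.Rational(1, 6) - sp.exp(-2 * t) / 2 + sp.exp(-3 * t) / 3
assert sp.simplify(sol.rhs - expected) == 0

# Partial-fraction step: A = 1/6, B = -1/2, C = 1/3
pf = sp.apart(1 / (s * (s + 2) * (s + 3)), s)
assert sp.simplify(pf - (sp.Rational(1, 6) / s
                         - sp.Rational(1, 2) / (s + 2)
                         + sp.Rational(1, 3) / (s + 3))) == 0
```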

4.7.4 Convolution Theorem for Laplace Transforms

As we saw earlier, a convolution of two functions $f_1(t)$ and $f_2(t)$ is defined as

\[
f_1 * f_2 = \int_{-\infty}^{\infty} f_1(t')f_2(t-t')\,\mathrm{d}t'.
\]

If $f_1$ and $f_2$ vanish for $t < 0$, then

\[
f_1 * f_2 = \int_0^t f_1(t')f_2(t-t')\,\mathrm{d}t'.
\]

Defining $g(t) \equiv f_1 * f_2$, the convolution theorem for Laplace transforms states that

\[
\bar{g}(s) = \bar{f}_1(s)\bar{f}_2(s).
\]

The proof is as follows:

\[
\bar{g}(s) = \int_0^\infty \mathrm{d}t\, e^{-st}\left[\int_0^t f_1(t')f_2(t-t')\,\mathrm{d}t'\right]
= \int_0^\infty \mathrm{d}t \int_0^t \mathrm{d}t'\, f_1(t')f_2(t-t')e^{-st}
\]


Figure 4.11: Integration area for proof of convolution theorem for Laplace transforms.

The integrals over $t$ and $t'$ cover the area in the $t$–$t'$ plane shown in Fig. 4.11. As can be seen, therefore, the integrals can equivalently be written

\[
\bar{g}(s) = \int_0^\infty \mathrm{d}t'\left[\int_{t'}^\infty \mathrm{d}t\, f_2(t-t')e^{-st}\right]f_1(t')
\]

Substituting $t'' = t - t'$ in the term in square brackets gives

\[
\bar{g}(s) = \int_0^\infty \mathrm{d}t'\left[\int_0^\infty \mathrm{d}t''\, f_2(t'')e^{-s(t''+t')}\right]f_1(t')
= \int_0^\infty \mathrm{d}t'\left[\int_0^\infty \mathrm{d}t''\, f_2(t'')e^{-st''}\right]e^{-st'}f_1(t')
\]
\[
= \bar{f}_2(s)\int_0^\infty \mathrm{d}t'\, e^{-st'}f_1(t')
= \bar{f}_2(s)\bar{f}_1(s),
\]

as required.
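The theorem can also be illustrated numerically. The sketch below (with my own choice of causal test functions, $f_1 = e^{-t}$ and $f_2 = e^{-2t}$) builds $g = f_1 * f_2$ by quadrature and compares $\bar{g}(s)$ with the product $\bar{f}_1(s)\bar{f}_2(s)$:

```python
import numpy as np
from scipy.integrate import quad

# Causal test functions (an arbitrary choice for this check)
f1 = lambda t: np.exp(-t)        # f1-bar(s) = 1/(s + 1)
f2 = lambda t: np.exp(-2 * t)    # f2-bar(s) = 1/(s + 2)

def conv(t):
    """g(t) = (f1 * f2)(t) = integral from 0 to t of f1(t') f2(t - t') dt'."""
    val, _ = quad(lambda tp: f1(tp) * f2(t - tp), 0, t)
    return val

def laplace(f, s):
    # Truncated upper limit: the integrand has decayed to negligible size by t = 50
    val, _ = quad(lambda t: np.exp(-s * t) * f(t), 0, 50)
    return val

s = 1.5
g_bar = laplace(conv, s)
product = (1 / (s + 1)) * (1 / (s + 2))   # f1-bar(s) * f2-bar(s)
assert abs(g_bar - product) < 1e-6
```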

Example: Evaluate the integral $J = \int_0^1 y^n(1-y)^m\,\mathrm{d}y$.

Consider

\[
g(t) \equiv t^n * t^m = \int_0^t \tau^n(t-\tau)^m\,\mathrm{d}\tau. \qquad (4.19)
\]

Substituting $\tau = ut$ and $\mathrm{d}\tau = t\,\mathrm{d}u$ (using $u$ for the substitution variable to avoid a clash with the transform variable $s$) gives

\[
g(t) = \int_0^1 u^n t^n (t - tu)^m\, t\,\mathrm{d}u
= t^{n+m+1}\int_0^1 u^n(1-u)^m\,\mathrm{d}u
= t^{n+m+1}J.
\]

Taking the Laplace transform of this result gives

\[
\bar{g}(s) = \frac{(n+m+1)!}{s^{n+m+2}}J,
\]


while using the convolution theorem on Eqn. 4.19 gives

\[
\bar{g}(s) = \mathcal{L}[t^n]\,\mathcal{L}[t^m] = \frac{n!\,m!}{s^{n+m+2}}.
\]

Then, on equating the last two expressions we have derived for $\bar{g}(s)$, we obtain the final result

\[
J = \frac{n!\,m!}{(n+m+1)!}.
\]
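A quick numerical check of this result, for the arbitrary choice $n = 3$, $m = 4$ (where $n!\,m!/(n+m+1)! = 1/280$):

```python
from math import factorial
from scipy.integrate import quad

n, m = 3, 4  # arbitrary non-negative integers for the check

# Direct numerical evaluation of J = integral from 0 to 1 of y^n (1-y)^m dy
J_numeric, _ = quad(lambda y: y**n * (1 - y)**m, 0, 1)

# Closed form derived above via the convolution theorem
J_exact = factorial(n) * factorial(m) / factorial(n + m + 1)

assert abs(J_numeric - J_exact) < 1e-12   # both equal 1/280 here
```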