
MATHEMATICS FOR ENGINEERS & SCIENTISTS (MATH1551)

Introduction

Mathematics for Engineers & Scientists covers topics useful to a wide variety of Engineers and Scientists. In this half of the course we’ll be covering 3 areas of mathematics, each of which is highly relevant to your honours degree courses. A brief outline of the term’s work is as follows:

Functions of several variables

• differentiation of functions of several variables,
• the chain rule,
• grad, div and curl,
• applications to surfaces.

Solutions of ordinary differential equations

• various first order ODEs,
• second order linear ODEs.

Linear Algebra

• solving simultaneous equations with matrices,
• Gaussian Elimination and LU factorisation,
• eigenvalues and eigenvectors,
• iterative methods - Jacobi, Gauss-Seidel and the SOR methods.

These topics will be reasonably independent and, as we go along, we will try to motivate and give physical interpretations and applications of each.


1. Functions of Several Variables

Last term you covered differentiation and integration of functions of a single variable. Life isn’t usually that simple, and most physical quantities depend on several parameters. In this first chapter, we’ll discuss how to differentiate functions of several variables, and how this helps to analyse their behaviour.

1.1. Partial Differentiation.

We’ll begin with an example illustrating the general idea.

Example 1.1. Consider a cone with height $h$ cm and base circle of radius $r$ cm. It has volume

\[ V(r, h) = \tfrac{1}{3}\pi r^2 h \ \text{cm}^3. \]

If the height remains constant but $r$ changes, find the rate of change of $V$ relative to $r$.

[Figure: a cone of height $h$ and base radius $r$.]

Here, although $V(r, h)$ is a function of 2 variables $r$ and $h$, we keep $h$ constant and vary $r$. The rate of change is obtained by just differentiating with respect to $r$:

\[ \frac{\partial V}{\partial r} = \tfrac{2}{3}\pi r h. \]

Note that we use the symbol $\partial$ rather than $d$. This is conventional notation to remind us that we are only letting $r$ vary.

Similarly, we can say what the rate of change is if the radius is fixed and the height changes:

\[ \frac{\partial V}{\partial h} = \tfrac{1}{3}\pi r^2. \]

This is the idea of partial differentiation. Given a function of several variables, e.g. $f(x, y, z)$, the partial derivative of $f$ with respect to $x$ is obtained by treating all variables in the formula for $f$, except $x$, as constants and differentiating with respect to $x$. Write $\frac{\partial f}{\partial x}$ for this. Similarly for $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$, etc.

Example 1.2. Suppose we are given a function $f(x, y, z) = e^{y^2}\cos(xyz)$ (so $f$ could be measuring a scalar quantity in a solid, e.g. temperature, electric charge, density, distribution of impurity, etc.). If $y$ and $z$ stay fixed, find the rate of change of $f$ relative to $x$.

As above, we treat $y$ and $z$ as constants and just differentiate with respect to $x$. We get

\[ \frac{\partial f}{\partial x} = -yz\,e^{y^2}\sin(xyz). \]

Similarly (applying the product and chain rules), we find

\[ \frac{\partial f}{\partial y} = 2y\,e^{y^2}\cos(xyz) - xz\,e^{y^2}\sin(xyz) \quad\text{and}\quad \frac{\partial f}{\partial z} = -xy\,e^{y^2}\sin(xyz). \]
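The three formulas in Example 1.2 can be sanity-checked numerically. The sketch below is not part of the original notes: it assumes a central-difference approximation with a hand-picked step size, and the helper name `partial` and the test point are purely illustrative.

```python
import math

def f(x, y, z):
    # Example 1.2: f(x, y, z) = exp(y^2) * cos(xyz)
    return math.exp(y * y) * math.cos(x * y * z)

def partial(func, point, index, step=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to the coordinate at position `index` (hypothetical helper)."""
    up = list(point)
    down = list(point)
    up[index] += step
    down[index] -= step
    return (func(*up) - func(*down)) / (2 * step)

def exact_partials(x, y, z):
    # The three formulas derived in Example 1.2.
    e = math.exp(y * y)
    s = math.sin(x * y * z)
    c = math.cos(x * y * z)
    return (-y * z * e * s, 2 * y * e * c - x * z * e * s, -x * y * e * s)

point = (0.7, 0.5, 1.2)  # arbitrary test point, chosen for illustration
numeric = tuple(partial(f, point, i) for i in range(3))
exact = exact_partials(*point)
for n, e in zip(numeric, exact):
    print(f"numeric {n:+.6f}   exact {e:+.6f}")
```

Holding the other coordinates fixed while nudging one of them is exactly the "treat the other variables as constants" idea, which is why a one-dimensional difference quotient suffices here.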

This process can be repeated to give higher derivatives as in the single variable case:

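Example 1.3. Let $f(x, y) = \cos y + \sin(xy)$. Then

\[ \frac{\partial f}{\partial x} = y\cos(xy), \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = -y^2\sin(xy), \]

\[ \frac{\partial f}{\partial y} = -\sin y + x\cos(xy), \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = -\cos y - x^2\sin(xy). \]

We can also find mixed derivatives; in this case, these are

\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \cos(xy) - xy\sin(xy) \]

and

\[ \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \cos(xy) - xy\sin(xy). \]

Notice that these are equal - this happens for all “reasonable functions”; one has

\[ \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \]

for all of the functions we meet in this course.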

1.2. Chain Rule.

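Recall the Chain Rule for functions of one variable: if $f(x)$ is a function of $x$ and if $x$ is a function $x(t)$ of $t$, then

\[ \frac{df}{dt} = \frac{df}{dx}\frac{dx}{dt}. \]

A similar thing happens for functions of several variables. For a function of two variables $f(x, y)$ where $x = x(t)$, $y = y(t)$, we have

\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \]

A similar formula is true for $f(x, y, z)$:

\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}. \]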

Let’s return to the cone example from earlier:

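Example 1.4. Suppose we have a cone with height $h$ cm, base circle of radius $r$ cm and volume

\[ V(r, h) = \tfrac{1}{3}\pi r^2 h \ \text{cm}^3 \]

where the height and radius vary with time according to $r(t) = 2 + \sin t$ and $h(t) = 2 + \cos t$ (the values used in the working below). How does the volume $V$ vary with time?

[Figure: a cone of height $h$ and base radius $r$.]

We will answer this in two ways.

Method 1. Substitute the values of $h$ and $r$ into the expression for $V$ and differentiate directly:

\[ V(r, h) = \tfrac{1}{3}\pi r^2 h \]
\[ \implies V(t) = \tfrac{1}{3}\pi (2 + \sin t)^2 (2 + \cos t) \]
\[ \implies \frac{dV}{dt} = \tfrac{1}{3}\pi\left( 2\cos t\,(2 + \sin t)(2 + \cos t) - (2 + \sin t)^2 \sin t \right). \]

Method 2. Using the Chain Rule given above, with $\frac{dr}{dt} = \cos t$ and $\frac{dh}{dt} = -\sin t$:

\[ \frac{dV}{dt} = \frac{\partial V}{\partial r}\frac{dr}{dt} + \frac{\partial V}{\partial h}\frac{dh}{dt} \]
\[ = \tfrac{2}{3}\pi r h \cos t - \tfrac{1}{3}\pi r^2 \sin t \]
\[ = \tfrac{2}{3}\pi (2 + \sin t)(2 + \cos t)\cos t - \tfrac{1}{3}\pi (2 + \sin t)^2 \sin t \]

which is the same expression as in Method 1.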
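As a rough numerical cross-check of Example 1.4, the sketch below compares the Chain Rule expression with a finite-difference derivative of $V(t)$. It assumes $r(t) = 2 + \sin t$ and $h(t) = 2 + \cos t$, as read off from the working; the function names and step size are illustrative choices, not part of the notes.

```python
import math

def r(t):
    return 2 + math.sin(t)   # radius, as used in the working of Example 1.4

def h(t):
    return 2 + math.cos(t)   # height, as used in the working of Example 1.4

def volume(t):
    return math.pi * r(t) ** 2 * h(t) / 3

def dV_chain(t):
    # Method 2: dV/dt = V_r * dr/dt + V_h * dh/dt
    dV_dr = 2 * math.pi * r(t) * h(t) / 3
    dV_dh = math.pi * r(t) ** 2 / 3
    return dV_dr * math.cos(t) + dV_dh * (-math.sin(t))

def dV_numeric(t, step=1e-6):
    # Method 1 in spirit: differentiate V(t) directly, here by central differences.
    return (volume(t + step) - volume(t - step)) / (2 * step)

for t in (0.0, 0.5, 2.0):
    print(f"t={t}: chain rule {dV_chain(t):+.6f}, numeric {dV_numeric(t):+.6f}")
```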


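There is also a Chain Rule of the following form: for $f(x, y)$ where $x = x(u, v)$, $y = y(u, v)$, we have

\[ \frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}, \]

\[ \frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}. \]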

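Example 1.5. Let $f(x, y) = \sin(xy)$. If $x = 2u + 2v$, $y = 2u - 2v$, then we can evaluate $\frac{\partial f}{\partial u}$ and $\frac{\partial f}{\partial v}$ in two ways as in the previous example.

Method 1. Substitute the values of $x$ and $y$ into the expression for $f$:

\[ f(x, y) = \sin(xy) = \sin\big(4(u + v)(u - v)\big) = \sin\big(4(u^2 - v^2)\big) \]

and so differentiating gives

\[ \frac{\partial f}{\partial u} = 8u\cos\big(4(u^2 - v^2)\big), \qquad \frac{\partial f}{\partial v} = -8v\cos\big(4(u^2 - v^2)\big). \]

Method 2. Using the Chain Rule given above,

\[ \frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} \]
\[ = 2y\cos(xy) + 2x\cos(xy) \]
\[ = 2(2u - 2v + 2u + 2v)\cos\big(4(u^2 - v^2)\big) \]
\[ = 8u\cos\big(4(u^2 - v^2)\big), \]

the same expression as in Method 1. Likewise,

\[ \frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} \]
\[ = 2y\cos(xy) - 2x\cos(xy) \]
\[ = 2\big(2u - 2v - (2u + 2v)\big)\cos\big(4(u^2 - v^2)\big) \]
\[ = -8v\cos\big(4(u^2 - v^2)\big), \]

which is the same expression as in Method 1.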

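Example 1.6. Let $f(x, y, z) = x + 2y + z^2$ where

\[ x(r, s) = \frac{r}{s}, \qquad y(r, s) = r^2 + \log|s|, \qquad z(r, s) = 2r \]

for $s \neq 0$. Find $\frac{\partial f}{\partial r}$.

By the Chain Rule,

\[ \frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial r} \]
\[ = 1 \times \frac{1}{s} + 2 \times 2r + 2z \times 2 \]
\[ = \frac{1}{s} + 4r + 8r \]
\[ = \frac{1}{s} + 12r. \]

Exercise. Use the same method to show

\[ \frac{\partial f}{\partial s} = \frac{2}{s} - \frac{r}{s^2}. \]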
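A quick way to check both the worked answer and the exercise in Example 1.6 is to compose the functions and difference numerically. This is only a sketch under the assumption that a central difference with step `1e-6` is accurate enough at the chosen point; the names `f_of_rs`, `df_dr_exact` and `df_ds_exact` are illustrative.

```python
import math

def f_of_rs(r, s):
    # Compose f(x, y, z) = x + 2y + z^2 with x = r/s, y = r^2 + log|s|, z = 2r.
    x = r / s
    y = r * r + math.log(abs(s))
    z = 2 * r
    return x + 2 * y + z * z

def df_dr_exact(r, s):
    return 1 / s + 12 * r        # result of Example 1.6

def df_ds_exact(r, s):
    return 2 / s - r / s ** 2    # result stated in the exercise

def central(func, r, s, wrt, step=1e-6):
    # Central difference in r (wrt == "r") or in s (wrt == "s").
    if wrt == "r":
        return (func(r + step, s) - func(r - step, s)) / (2 * step)
    return (func(r, s + step) - func(r, s - step)) / (2 * step)

r0, s0 = 1.5, 2.0  # arbitrary test point with s0 != 0
print(central(f_of_rs, r0, s0, "r"), df_dr_exact(r0, s0))
print(central(f_of_rs, r0, s0, "s"), df_ds_exact(r0, s0))
```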


1.3. Higher derivatives.

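Example 1.7. Let $f(x, y)$ be a function of $x, y$, and let $x = ve^{u}$, $y = ve^{-u}$. Show that

\[ \frac{\partial f}{\partial u} = ve^{u}\frac{\partial f}{\partial x} - ve^{-u}\frac{\partial f}{\partial y} \]

and find similar expressions for $\frac{\partial f}{\partial v}$ and $\frac{\partial}{\partial u}\left(\frac{\partial f}{\partial x}\right)$.

By the chain rule,

\[ \frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} = ve^{u}\frac{\partial f}{\partial x} - ve^{-u}\frac{\partial f}{\partial y} \tag{1.1} \]

\[ \frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} = e^{u}\frac{\partial f}{\partial x} + e^{-u}\frac{\partial f}{\partial y} \tag{1.2} \]

Notice that the formula (1.1) can apply to any function of $x$ and $y$, not just $f(x, y)$. Thus we can replace $f$ by $\frac{\partial f}{\partial x}$ in (1.1) to get

\[ \frac{\partial}{\partial u}\left(\frac{\partial f}{\partial x}\right) = ve^{u}\frac{\partial^2 f}{\partial x^2} - ve^{-u}\frac{\partial^2 f}{\partial y\,\partial x} \tag{1.3} \]

Exercise: show similarly that

\[ \frac{\partial}{\partial u}\left(\frac{\partial f}{\partial y}\right) = ve^{u}\frac{\partial^2 f}{\partial x\,\partial y} - ve^{-u}\frac{\partial^2 f}{\partial y^2} \tag{1.4} \]

Using these formulae, we can say some more. We calculate, using the product rule and then (1.3), (1.4) and the equality of the mixed derivatives,

\[ \frac{\partial^2 f}{\partial u\,\partial v} = \frac{\partial}{\partial u}\left(\frac{\partial f}{\partial v}\right) = \frac{\partial}{\partial u}\left(e^{u}\frac{\partial f}{\partial x} + e^{-u}\frac{\partial f}{\partial y}\right) \]
\[ = e^{u}\frac{\partial f}{\partial x} + e^{u}\frac{\partial}{\partial u}\left(\frac{\partial f}{\partial x}\right) - e^{-u}\frac{\partial f}{\partial y} + e^{-u}\frac{\partial}{\partial u}\left(\frac{\partial f}{\partial y}\right) \]
\[ = e^{u}\frac{\partial f}{\partial x} + e^{u}\left(ve^{u}\frac{\partial^2 f}{\partial x^2} - ve^{-u}\frac{\partial^2 f}{\partial y\,\partial x}\right) - e^{-u}\frac{\partial f}{\partial y} + e^{-u}\left(ve^{u}\frac{\partial^2 f}{\partial x\,\partial y} - ve^{-u}\frac{\partial^2 f}{\partial y^2}\right) \]
\[ = e^{u}\frac{\partial f}{\partial x} - e^{-u}\frac{\partial f}{\partial y} + ve^{2u}\frac{\partial^2 f}{\partial x^2} - ve^{-2u}\frac{\partial^2 f}{\partial y^2}. \]

Thus by (1.1) above we have

\[ v\frac{\partial^2 f}{\partial u\,\partial v} = ve^{u}\frac{\partial f}{\partial x} - ve^{-u}\frac{\partial f}{\partial y} + v^2e^{2u}\frac{\partial^2 f}{\partial x^2} - v^2e^{-2u}\frac{\partial^2 f}{\partial y^2} \]
\[ = \frac{\partial f}{\partial u} + v^2e^{2u}\frac{\partial^2 f}{\partial x^2} - v^2e^{-2u}\frac{\partial^2 f}{\partial y^2} \]

and so

\[ v\frac{\partial^2 f}{\partial u\,\partial v} - \frac{\partial f}{\partial u} = v^2e^{2u}\frac{\partial^2 f}{\partial x^2} - v^2e^{-2u}\frac{\partial^2 f}{\partial y^2} = x^2\frac{\partial^2 f}{\partial x^2} - y^2\frac{\partial^2 f}{\partial y^2}. \]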

This type of calculation can be very useful, for instance, in studying partial differential equations. Think of it as changing variables from $(x, y)$ to $(u, v)$ and expressing a particular differential “object” in the new variables. As in single variable calculus, a well-chosen change of variables can make solving a problem much easier (see e.g. the questions involving the Laplace equation on the question sheet).


1.4. Directional derivatives and the gradient of a function.

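The partial derivative $\frac{\partial f}{\partial x}$ at a point $\mathbf{p}$ gives the rate of change of $f$ when we move from $\mathbf{p}$ in the direction parallel to the $x$-axis. Similar statements hold for $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$. This leads us to ask: what is the rate of change of $f$ when we move in a general direction from a point?

Suppose we are given a point and a direction vector

\[ \mathbf{p} = (p_1, p_2, p_3) = p_1\mathbf{i} + p_2\mathbf{j} + p_3\mathbf{k} \quad\text{and}\quad \mathbf{v} = (v_1, v_2, v_3) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}. \]

Furthermore, for now, assume that $\mathbf{v}$ has unit length so that $|\mathbf{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2} = 1$. From the first term, we know that the line through $\mathbf{p}$ and in direction $\mathbf{v}$ is given by

\[ \mathbf{r}(t) = \mathbf{p} + t\mathbf{v}. \]

Here, $t$ is a real parameter - if we think of this as time, then the point $\mathbf{r}(t)$ moves along the line starting at $\mathbf{p} = \mathbf{r}(0)$ and at unit velocity $\mathbf{v} = \frac{d\mathbf{r}}{dt}$. In cartesian coordinates,

\[ \mathbf{r}(t) = (x(t), y(t), z(t)) = (p_1 + tv_1,\ p_2 + tv_2,\ p_3 + tv_3) \]

so if $f(x, y, z)$ is a scalar function, the rate of change of $f$ along $\mathbf{r}(t)$ is

\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} \]
\[ = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) \cdot \left(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right) \]
\[ = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) \cdot (v_1, v_2, v_3) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) \cdot \mathbf{v}. \]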

This leads us to introduce the following concept.

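Definition 1.8. The gradient vector field of $f(x, y, z)$ is the vector

\[ \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}. \]

Note $\nabla f$ is also sometimes written grad $f$.

Thus, the rate of change of $f(x, y, z)$ at $\mathbf{p}$ in unit direction $\mathbf{v}$ is given by

\[ \left.\frac{df}{dt}\right|_{t=0} = \nabla f(\mathbf{p}) \cdot \mathbf{v}. \]

If $\mathbf{v}$ is not a unit vector, then we can easily divide it by its length $|\mathbf{v}|$, giving the general definition:

Definition 1.9. The directional derivative of $f$ at $\mathbf{p}$ in the direction $\mathbf{v}$ is $\nabla f(\mathbf{p}) \cdot \dfrac{\mathbf{v}}{|\mathbf{v}|}$.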

Example 1.10. If $f(x, y, z) = x^3 - xy^2 - z$, find the directional derivative of $f$ at $(1, 1, 0)$ in the direction $2\mathbf{i} - 3\mathbf{j} + 6\mathbf{k}$.

We first need to calculate the gradient of $f$:

\[ \nabla f = (3x^2 - y^2)\mathbf{i} - 2xy\,\mathbf{j} - \mathbf{k} \]

and so

\[ \nabla f(1, 1, 0) = 2\mathbf{i} - 2\mathbf{j} - \mathbf{k}. \]

Hence the required directional derivative is

\[ (2\mathbf{i} - 2\mathbf{j} - \mathbf{k}) \cdot (2\mathbf{i} - 3\mathbf{j} + 6\mathbf{k})\,\frac{1}{\sqrt{4 + 9 + 36}} = \frac{4 + 6 - 6}{\sqrt{49}} = \frac{4}{7}. \]

For example, suppose the temperature distribution in space is given by $f(x, y, z) = x^3 - xy^2 - z$ degrees Centigrade. At a certain instant in time, a probe is at the point $(1, 1, 0)$ and travelling with speed 1 cm/sec in the direction $2\mathbf{i} - 3\mathbf{j} + 6\mathbf{k}$. Then the above calculation shows that the rate of change of temperature at that instant measured by the probe is $\frac{4}{7}$ °C/sec.
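Definition 1.9 translates directly into a few lines of code. The sketch below reproduces the value $4/7$ from Example 1.10 and, as an independent check, differences $f$ along the probe's path; the function names are illustrative and the finite-difference step is an assumed, hand-picked value.

```python
import math

def f(x, y, z):
    return x ** 3 - x * y ** 2 - z      # temperature field of Example 1.10

def grad_f(x, y, z):
    # Gradient computed in Example 1.10.
    return (3 * x * x - y * y, -2 * x * y, -1.0)

def directional_derivative(grad, direction):
    # Definition 1.9: grad . v / |v|
    length = math.sqrt(sum(c * c for c in direction))
    return sum(g * c for g, c in zip(grad, direction)) / length

p = (1.0, 1.0, 0.0)
v = (2.0, -3.0, 6.0)
value = directional_derivative(grad_f(*p), v)

# Independent check: move a small distance t along the unit vector v/|v|.
unit = tuple(c / 7.0 for c in v)         # |v| = sqrt(4 + 9 + 36) = 7
def along(t):
    return f(*(pi + t * ui for pi, ui in zip(p, unit)))
step = 1e-6
numeric = (along(step) - along(-step)) / (2 * step)

print(value, numeric)
```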


We stress that $f(x, y, z)$ is a scalar-valued function and $\nabla f$ is a vector-valued function.

All of the above works in any number of dimensions. For instance, consider the following 2-dimensional example:

Example 1.11. The height of a slope above sea level is $f(x, y) = 2y^2 + x^2$ (in some unit of distance) at coordinates $(x, y)$. Find the rate of change of height (i.e. the rate of ascent) when starting at $(1, 2)$ and moving at unit speed towards the south-east.

We need to calculate the directional derivative of $f$ at $(1, 2)$ in the direction $\mathbf{i} - \mathbf{j}$. Now

\[ \nabla f = (2x, 4y) \]

so evaluating this at the point gives

\[ \nabla f(1, 2) = 2\mathbf{i} + 8\mathbf{j}. \]

Hence the required directional derivative is

\[ \frac{(2\mathbf{i} + 8\mathbf{j}) \cdot (\mathbf{i} - \mathbf{j})}{\sqrt{1 + 1}} = -\frac{6}{\sqrt{2}} \]

and so the rate of change of height is $-6/\sqrt{2} \simeq -4.24$.

Natural question: Given a scalar function $f(x, y, z)$ and a point $\mathbf{p}$, in which direction from $\mathbf{p}$ does $f$ have the largest directional derivative, and what is this largest value? For instance, in the previous example, in which direction is the steepest ascent, and how steep is it?

Recall that the directional derivative of $f$ at $\mathbf{p}$ in the direction $\mathbf{v}$ is

\[ \nabla f(\mathbf{p}) \cdot \frac{\mathbf{v}}{|\mathbf{v}|} = |\nabla f(\mathbf{p})| \cos\theta \]

where $\theta$ is the angle between $\nabla f(\mathbf{p})$ and $\mathbf{v}$. This clearly has maximum value when $\cos\theta = 1$, so we need $\theta = 0$ and hence $\mathbf{v}$ is parallel to $\nabla f(\mathbf{p})$. Thus, the steepest increase occurs when we move off in the direction $\nabla f(\mathbf{p})$, and the maximum value is $|\nabla f(\mathbf{p})|$.

Example 1.12. If $f(x, y) = -\cos(xy)$, in what direction should one travel from $\left(\frac{\pi}{2}, 1\right)$ in order to maximise the rate of change of $f$?

We calculate

\[ \nabla f(x, y) = (y\sin(xy),\ x\sin(xy)) \]

and

\[ \nabla f\left(\frac{\pi}{2}, 1\right) = \mathbf{i} + \frac{\pi}{2}\mathbf{j}. \]

So we have to travel in the direction $\mathbf{i} + \frac{\pi}{2}\mathbf{j}$ and the rate of change is

\[ \left|\mathbf{i} + \frac{\pi}{2}\mathbf{j}\right| = \sqrt{1 + \frac{\pi^2}{4}}. \]

1.5. Tangent plane to a surface.

First, we give a brief reminder about planes, lines and surfaces in 3-dimensional space.

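Planes. The equation of the plane through the point $\mathbf{p} = (a, b, c)$ and normal to $\mathbf{n} = n_1\mathbf{i} + n_2\mathbf{j} + n_3\mathbf{k}$ is given by

\[ (\mathbf{x} - \mathbf{p}) \cdot \mathbf{n} = 0. \]

Putting in the coordinates $\mathbf{x} = (x, y, z)$ turns this into the familiar form

\[ n_1x + n_2y + n_3z = C \]

where $C = n_1a + n_2b + n_3c$. For example, the equation of the plane through $(2, -1, 5)$ which is normal to $4\mathbf{i} + 3\mathbf{j} - \mathbf{k}$ is

\[ 4x + 3y - z = 4 \times 2 + 3(-1) - 5 = 0. \]

Lines. The equation of the line through $\mathbf{p} = (a, b, c)$ in the direction $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ is, in parametric form, given by

\[ \mathbf{x} = \mathbf{p} + \lambda\mathbf{v} \]

where $\lambda$ is an arbitrary parameter. Putting in the coordinates $\mathbf{x} = (x, y, z)$ makes this

\[ (x, y, z) = (a, b, c) + \lambda(v_1, v_2, v_3) = (a + \lambda v_1,\ b + \lambda v_2,\ c + \lambda v_3). \]

Rearranging by solving for $\lambda$ gives the alternative form (valid when $v_1, v_2, v_3$ are all nonzero):

\[ \lambda = \frac{x - a}{v_1} = \frac{y - b}{v_2} = \frac{z - c}{v_3}. \]

For example, the equation of the line through $(2, -1, 5)$ in the direction $4\mathbf{i} + 3\mathbf{j} - \mathbf{k}$ is

\[ \mathbf{x} = (2, -1, 5) + \lambda(4, 3, -1) \]

or equivalently,

\[ \frac{x - 2}{4} = \frac{y + 1}{3} = \frac{z - 5}{-1}. \]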

Surfaces. We will consider surfaces with equations of the form $f(x, y, z) = C$ where $C$ is a constant. These are sometimes called level surfaces of the function $f$. You are probably familiar with a number of surfaces written in this form:

Example 1.13. The surface of the sphere of radius $r$ and centred at the origin has equation

\[ x^2 + y^2 + z^2 = r^2. \]

Example 1.14. The cylinder of radius $r$ centred along the $z$-axis has equation

\[ x^2 + y^2 = r^2. \]

Example 1.15. The paraboloid obtained by rotating a parabola about the $z$-axis has equation

\[ x^2 + y^2 - z = 0. \]


A surface has a tangent plane and normal line at each point. We will next see how to write down the equations of this plane and line.

Let $S$ be a surface with equation $f(x, y, z) = C$ and let $\mathbf{r}(t) = (x(t), y(t), z(t))$ be a curve on $S$. Then $f(\mathbf{r}(t)) = C$ for all $t$, so differentiating with respect to $t$ using the chain rule gives

\[ 0 = \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = \nabla f \cdot \frac{d\mathbf{r}}{dt}. \]

But the tangent plane to $S$ at $\mathbf{p}$ is made up of all vectors tangent to curves on $S$ through $\mathbf{p}$, so it follows that

Proposition 1.16. Let $S$ be a surface with equation $f(x, y, z) = C$ and let $\mathbf{p}$ be a point of $S$. Then $\nabla f(\mathbf{p})$ is the normal vector to $S$ at $\mathbf{p}$.
Equivalently, the tangent plane to $S$ at $\mathbf{p}$ is the plane through $\mathbf{p}$ which is perpendicular to $\nabla f(\mathbf{p})$.

Example 1.17. A sheet of metal is shaped to make a surface with equation $f(x, y, z) = x^2 + y^2 + yz = 0$. A length of metal rod is to be welded to the sheet at two points with one end at $(2, -1, 5)$ meeting the sheet at right angles. Find the other point of the sheet at which the metal rod should be attached and the tangent plane to the surface at $(2, -1, 5)$.

First check that $(2, -1, 5)$ is on the surface: $f(2, -1, 5) = 4 + 1 - 5 = 0$.

Now $\nabla f = 2x\,\mathbf{i} + (2y + z)\,\mathbf{j} + y\,\mathbf{k}$ and so

\[ \nabla f(2, -1, 5) = 4\mathbf{i} + 3\mathbf{j} - \mathbf{k}. \]

Also, the normal line is the line through $(2, -1, 5)$ in the direction $4\mathbf{i} + 3\mathbf{j} - \mathbf{k}$, and so has equation

\[ \frac{x - 2}{4} = \frac{y + 1}{3} = \frac{z - 5}{-1} \]

or, in parametric form,

\[ (x, y, z) = (2, -1, 5) + \lambda(4, 3, -1) = (2 + 4\lambda,\ -1 + 3\lambda,\ 5 - \lambda). \]

We need to find the values of $\lambda$ for which these points lie on $S$, i.e. for which

\[ (2 + 4\lambda)^2 + (-1 + 3\lambda)^2 + (-1 + 3\lambda)(5 - \lambda) = 0, \]

that is, $22\lambda^2 + 26\lambda = 0$. This has two solutions: $\lambda = 0$, which corresponds to the point $(2, -1, 5)$, and $\lambda = -\frac{13}{11}$, which corresponds to the other intersection point, namely

\[ \left(2 - \frac{52}{11},\ -1 - \frac{39}{11},\ 5 + \frac{13}{11}\right) = \left(-\frac{30}{11},\ -\frac{50}{11},\ \frac{68}{11}\right). \]

Furthermore, the tangent plane through $(2, -1, 5)$ is perpendicular to $\nabla f(2, -1, 5) = 4\mathbf{i} + 3\mathbf{j} - \mathbf{k}$ and so has equation

\[ 4x + 3y - z = 4 \times 2 + 3(-1) - 5 = 0. \]

Actually, the surface $x^2 + y^2 + yz = 0$ is a cone.

[Figure: the cone $x^2 + y^2 + yz = 0$.]
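The arithmetic in Example 1.17 can be confirmed exactly with rational numbers. The following is a small sketch using Python's `fractions` module; the helper name `point_on_normal` is an illustrative choice, not something from the notes.

```python
from fractions import Fraction

def f(x, y, z):
    return x * x + y * y + y * z          # surface of Example 1.17 is f = 0

p = (2, -1, 5)                             # given point on the surface
n = (4, 3, -1)                             # gradient of f at p, the normal direction

def point_on_normal(lam):
    # Parametric form of the normal line: p + lam * n
    return tuple(pi + lam * ni for pi, ni in zip(p, n))

# Roots of 22*lam^2 + 26*lam = 0 are lam = 0 and lam = -13/11.
lam = Fraction(-13, 11)
q = point_on_normal(lam)

print(q)                                   # second point where the rod meets the sheet
print(f(*q))                               # exactly zero if q lies on the surface
print(4 * p[0] + 3 * p[1] - p[2])          # right-hand side C of the tangent plane
```

Exact fractions are used instead of floats so that "lies on the surface" can be tested with equality rather than a tolerance.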


1.6. Div and Curl.

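Recall that for a general function $f(\mathbf{x}) = f(x, y, z)$, we have the gradient

\[ \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \mathbf{i}\frac{\partial f}{\partial x} + \mathbf{j}\frac{\partial f}{\partial y} + \mathbf{k}\frac{\partial f}{\partial z}. \]

We can think of

\[ \nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z} \]

as a vector in its own right - a kind of hybrid of vector and differentiation which operates on the scalar-valued function $f$ to give the vector-valued function $\nabla f$. We will now see two other important ways of combining $\nabla$ with functions. As well as scalar-valued functions of vectors (often called scalar fields) such as $f(\mathbf{x}) = f(x, y, z)$ above, we can also consider vector-valued functions of vectors:

\[ \mathbf{A}(\mathbf{x}) = (A_1(x, y, z),\ A_2(x, y, z),\ A_3(x, y, z)). \]

Such functions (often called vector fields) are extremely important in many areas of Engineering and Science. For instance, consider a gravitational field; at every point $\mathbf{x} = (x, y, z)$ in space, there is a gravitational force. This is a vector with each component being a function of $x$, $y$ and $z$.

Now, we can combine two vectors $\mathbf{a} = (a_1, a_2, a_3)$ and $\mathbf{b} = (b_1, b_2, b_3)$ using the dot product:

\[ \mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + a_3b_3. \]

This leads us to make the following definition.

Definition 1.18. Given a vector-valued function $\mathbf{A}(\mathbf{x}) = (A_1(x, y, z), A_2(x, y, z), A_3(x, y, z))$, we define its divergence to be

\[ \operatorname{div}\mathbf{A} = \nabla \cdot \mathbf{A} = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) \cdot (A_1, A_2, A_3) = \frac{\partial A_1}{\partial x} + \frac{\partial A_2}{\partial y} + \frac{\partial A_3}{\partial z}. \]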

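Example 1.19.

(i) Let $\mathbf{A}(\mathbf{x}) = (x^2 + yz,\ xyz^2,\ y^2 - z^2)$. Then $\operatorname{div}\mathbf{A}$ is

\[ \nabla \cdot \mathbf{A} = \frac{\partial}{\partial x}(x^2 + yz) + \frac{\partial}{\partial y}(xyz^2) + \frac{\partial}{\partial z}(y^2 - z^2) = 2x + xz^2 - 2z. \]

(ii) Let $\mathbf{A}(\mathbf{x}) = (yz\sin x,\ x + z^2\cos^2 y,\ xyz - \tan z)$. Then $\operatorname{div}\mathbf{A}$ is

\[ \nabla \cdot \mathbf{A} = \frac{\partial}{\partial x}(yz\sin x) + \frac{\partial}{\partial y}(x + z^2\cos^2 y) + \frac{\partial}{\partial z}(xyz - \tan z) \]
\[ = yz\cos x - 2z^2\cos y\sin y + xy - \sec^2 z. \]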

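Another way to combine two vectors $\mathbf{a} = (a_1, a_2, a_3)$ and $\mathbf{b} = (b_1, b_2, b_3)$ is via the cross-product:

\[ \mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2b_3 - a_3b_2)\,\mathbf{i} + (a_3b_1 - a_1b_3)\,\mathbf{j} + (a_1b_2 - a_2b_1)\,\mathbf{k}. \]

Definition 1.20. Given a vector-valued function $\mathbf{A}(\mathbf{x}) = (A_1(x, y, z), A_2(x, y, z), A_3(x, y, z))$, we define its curl to be

\[ \operatorname{curl}\mathbf{A} = \nabla \times \mathbf{A} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ A_1 & A_2 & A_3 \end{vmatrix} = \left(\frac{\partial A_3}{\partial y} - \frac{\partial A_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial A_1}{\partial z} - \frac{\partial A_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial A_2}{\partial x} - \frac{\partial A_1}{\partial y}\right)\mathbf{k}. \]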


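Example 1.21.

(i) Let $\mathbf{A}(\mathbf{x}) = (x^2 + yz,\ xyz^2,\ y^2 - z^2)$. Then $\operatorname{curl}\mathbf{A}$ is

\[ \nabla \times \mathbf{A} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ x^2 + yz & xyz^2 & y^2 - z^2 \end{vmatrix} \]
\[ = \left(\frac{\partial(y^2 - z^2)}{\partial y} - \frac{\partial(xyz^2)}{\partial z}\right)\mathbf{i} + \left(\frac{\partial(x^2 + yz)}{\partial z} - \frac{\partial(y^2 - z^2)}{\partial x}\right)\mathbf{j} + \left(\frac{\partial(xyz^2)}{\partial x} - \frac{\partial(x^2 + yz)}{\partial y}\right)\mathbf{k} \]
\[ = (2y - 2xyz)\,\mathbf{i} + y\,\mathbf{j} + (yz^2 - z)\,\mathbf{k} \]
\[ = (2y - 2xyz,\ y,\ yz^2 - z). \]

(ii) Let $\mathbf{A}(\mathbf{x}) = (yz\sin x,\ x + z^2\cos^2 y,\ xyz - \tan z)$. Then $\operatorname{curl}\mathbf{A}$ is

\[ \nabla \times \mathbf{A} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ yz\sin x & x + z^2\cos^2 y & xyz - \tan z \end{vmatrix} \]
\[ = \left(\frac{\partial(xyz - \tan z)}{\partial y} - \frac{\partial(x + z^2\cos^2 y)}{\partial z}\right)\mathbf{i} + \left(\frac{\partial(yz\sin x)}{\partial z} - \frac{\partial(xyz - \tan z)}{\partial x}\right)\mathbf{j} + \left(\frac{\partial(x + z^2\cos^2 y)}{\partial x} - \frac{\partial(yz\sin x)}{\partial y}\right)\mathbf{k} \]
\[ = (xz - 2z\cos^2 y)\,\mathbf{i} + (y\sin x - yz)\,\mathbf{j} + (1 - z\sin x)\,\mathbf{k} \]
\[ = (xz - 2z\cos^2 y,\ y\sin x - yz,\ 1 - z\sin x). \]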
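Definitions 1.18 and 1.20 can be turned into a rough numerical check of part (i) of Examples 1.19 and 1.21. The sketch below assumes central differences with a fixed step; the helper `d` and the evaluation point are illustrative choices rather than anything from the notes.

```python
def A(x, y, z):
    # Vector field from Examples 1.19(i) and 1.21(i).
    return (x * x + y * z, x * y * z * z, y * y - z * z)

def d(field, comp, var, p, step=1e-5):
    """Central-difference estimate of the partial derivative of component
    `comp` of `field` with respect to coordinate `var` at the point p."""
    up = list(p)
    down = list(p)
    up[var] += step
    down[var] -= step
    return (field(*up)[comp] - field(*down)[comp]) / (2 * step)

def divergence(field, p):
    # dA1/dx + dA2/dy + dA3/dz
    return sum(d(field, i, i, p) for i in range(3))

def curl(field, p):
    return (
        d(field, 2, 1, p) - d(field, 1, 2, p),   # dA3/dy - dA2/dz
        d(field, 0, 2, p) - d(field, 2, 0, p),   # dA1/dz - dA3/dx
        d(field, 1, 0, p) - d(field, 0, 1, p),   # dA2/dx - dA1/dy
    )

p = (1.0, 2.0, 3.0)   # arbitrary evaluation point
x, y, z = p
print(divergence(A, p), 2 * x + x * z * z - 2 * z)
print(curl(A, p), (2 * y - 2 * x * y * z, y, y * z * z - z))
```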

Summarising how our three vector calculus operators work:

- grad takes scalar-valued functions to vector-valued functions,
- div takes vector-valued functions to scalar-valued functions,
- curl takes vector-valued functions to vector-valued functions.

We don’t have the time to go into the many applications of these new objects but will end this section here by indicating why they’re so useful. In very vague terms:

- as we have seen, the gradient grad $f$ measures how a scalar field $f(\mathbf{x})$ increases,
- the divergence div $\mathbf{A}$ measures how a vector field $\mathbf{A}(\mathbf{x})$ expands or contracts,
- the curl curl $\mathbf{A}$ measures how a vector field $\mathbf{A}(\mathbf{x})$ rotates.

Understanding these is vital, for instance, in the study of Electromagnetism, Fluid Dynamics, Kinematics, General Relativity,... See e.g. the Wikipedia pages on div, grad and curl for more details.

1.7. Critical points - local maxima and minima.

Recall that for a function $f(x)$ of one variable, the points at which $\frac{df}{dx} = 0$ are called critical points or stationary points. The tangent to the graph at these points is horizontal.

There is a simple test for local maxima and minima:

• If $\frac{df}{dx} = 0$ and $\frac{d^2f}{dx^2} < 0$ then it’s a local maximum.
• If $\frac{df}{dx} = 0$ and $\frac{d^2f}{dx^2} > 0$ then it’s a local minimum.
• If $\frac{df}{dx} = 0$ and $\frac{d^2f}{dx^2} = 0$ then the test is inconclusive.

For example,

• $f(x) = -x^2$ has a local maximum at $x = 0$ since $f'(0) = 0$ and $f''(0) = -2 < 0$.
• $f(x) = x^2$ has a local minimum at $x = 0$ since $f'(0) = 0$ and $f''(0) = 2 > 0$.
• $f(x) = -x^4$, $x^4$, $x^3$ at $x = 0$ have a local maximum, local minimum, and point of inflexion respectively (in each case $f'(0) = f''(0) = 0$, so the test itself is inconclusive).


We’ll now extend this to a function $f(x, y)$ of two variables. The graph of such a function is the surface $z = f(x, y)$ sitting above the $xy$ plane.

Analogously to the case of functions of one variable, points at which $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$ are called critical points. The tangent plane to the graph at these points is horizontal. This time, there are three types of critical points, illustrated in the following simple examples.

Example 1.22. The surface $z = x^2 + y^2$ is a paraboloid of revolution and the function $f(x, y) = x^2 + y^2$ has a local minimum at $(x, y) = (0, 0)$.

Example 1.23. The surface $z = 1 - x^2 - y^2$ is also a paraboloid of revolution and the function $f(x, y) = 1 - x^2 - y^2$ has a local maximum at $(x, y) = (0, 0)$.

Example 1.24. The surface $z = y^2 - x^2$ is a hyperbolic paraboloid and the function $f(x, y) = y^2 - x^2$ has a saddle point at $(x, y) = (0, 0)$.

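The test for local maxima, minima and saddle points is as follows. Suppose $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$ at a point, and write

\[ D = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial y\,\partial x}\right)^2 \]

evaluated at that point.

(1) If $D > 0$ and $\frac{\partial^2 f}{\partial x^2} < 0$ then it’s a local maximum.

(2) If $D > 0$ and $\frac{\partial^2 f}{\partial x^2} > 0$ then it’s a local minimum.

(3) If $D < 0$ then it’s a saddle point.

(4) If $D = 0$ then the test is inconclusive.

The proof of this test uses Taylor’s Theorem for functions of two variables; see e.g. the book by Stroud if you want to read a proof.

Remark. In (1) and (2) above, we have

\[ \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial y\,\partial x}\right)^2 > 0 \quad\text{and so}\quad \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} > \left(\frac{\partial^2 f}{\partial y\,\partial x}\right)^2 \geq 0. \]

Thus in each case, $\frac{\partial^2 f}{\partial y^2}$ and $\frac{\partial^2 f}{\partial x^2}$ are either both positive or both negative.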

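Example 1.25. Find all the critical points of the function

\[ f(x, y) = 2x^3 - x^2y + \tfrac{1}{2}y^2 \]

and, if possible, classify each critical point as a local maximum, minimum, or saddle point.

We calculate

\[ \frac{\partial f}{\partial x} = 6x^2 - 2xy \quad\text{and}\quad \frac{\partial f}{\partial y} = -x^2 + y. \]

So for critical points we need

\[ 6x^2 - 2xy = 0 \ \text{and}\ -x^2 + y = 0 \]
\[ \iff 2x(3x - y) = 0 \ \text{and}\ y = x^2 \]
\[ \iff (x = 0 \ \text{or}\ y = 3x) \ \text{and}\ y = x^2 \]
\[ \iff (x = 0 \ \text{and}\ y = 0) \ \text{or}\ (y = x^2 = 3x \ \text{with}\ x \neq 0) \]
\[ \iff (x, y) = (0, 0) \ \text{or}\ (x, y) = (3, 9). \]

Also, we have

\[ \frac{\partial^2 f}{\partial x^2} = 12x - 2y, \qquad \frac{\partial^2 f}{\partial y^2} = 1 \qquad\text{and}\qquad \frac{\partial^2 f}{\partial y\,\partial x} = -2x. \]

We can now apply the test at $(0, 0)$:

\[ \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial y\,\partial x}\right)^2 = 0 \times 1 - 0^2 = 0 \]

so the test is inconclusive here.

Applying the test at $(3, 9)$:

\[ \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial y\,\partial x}\right)^2 = (36 - 18) \times 1 - (-6)^2 = 18 - 36 = -18 < 0 \]

so $(3, 9)$ is a saddle point.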
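The four cases of the test are mechanical enough to write as a small function. The sketch below applies it to the two critical points of Example 1.25; the names `classify` and `second_derivatives` are illustrative, and the second derivatives are typed in by hand from the example rather than computed automatically.

```python
def classify(fxx, fyy, fxy):
    """Second derivative test at a critical point, given the values of the
    second partial derivatives there."""
    D = fxx * fyy - fxy ** 2
    if D > 0:
        # D > 0 forces fxx != 0, so exactly one of these two cases applies.
        return "local maximum" if fxx < 0 else "local minimum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

def second_derivatives(x, y):
    # From Example 1.25: f_xx = 12x - 2y, f_yy = 1, f_yx = -2x.
    return (12 * x - 2 * y, 1, -2 * x)

for point in [(0, 0), (3, 9)]:
    print(point, classify(*second_derivatives(*point)))
```

With integer inputs the comparison `D == 0` is exact; with floating-point derivatives one would normally compare against a small tolerance instead.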

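Notation. When working with partial derivatives, the following notation is often used:

\[ f_x = \frac{\partial f}{\partial x}, \quad f_{xx} = \frac{\partial^2 f}{\partial x^2}, \quad f_y = \frac{\partial f}{\partial y}, \quad f_{yy} = \frac{\partial^2 f}{\partial y^2}, \quad f_{xy} = \frac{\partial^2 f}{\partial x\,\partial y}, \quad f_{yx} = \frac{\partial^2 f}{\partial y\,\partial x}, \quad\text{etc.} \]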

Example 1.26. Find all the critical points of the function $f(x, y) = y^2 + \cos x$ and classify each as a local maximum, local minimum, or saddle point.

We calculate the first derivatives to be

\[ f_x = -\sin x \quad\text{and}\quad f_y = 2y. \]

These are both zero when $x = n\pi$ and $y = 0$, where $n$ is any integer, and so the critical points are $(n\pi, 0)$. Also, the second derivatives are

\[ f_{xx} = -\cos x, \quad f_{yy} = 2, \quad\text{and}\quad f_{xy} = 0. \]

Notice that $\cos n\pi = (-1)^n$ so, at $(n\pi, 0)$,

\[ f_{xx} = (-1)^{n+1} \quad\text{and}\quad f_{xx}f_{yy} - f_{xy}^2 = 2(-1)^{n+1}. \]

When $n$ is odd, $f_{xx}f_{yy} - f_{xy}^2 > 0$ and $f_{xx} > 0$ so by the test, $(n\pi, 0)$ is then a local minimum.
When $n$ is even, $f_{xx}f_{yy} - f_{xy}^2 < 0$ so by the test, $(n\pi, 0)$ is then a saddle point.


2. Differential Equations

2.1. Background.

An ordinary differential equation (O.D.E.) is an equation connecting a function and some of its derivatives. These are extremely common across the sciences.

Example 2.1. (These involve time so we use the variable $t$ here.)

(i) Falling stone: the velocity $v(t)$ of a stone of mass $m$ falling under gravity satisfies the differential equation

\[ m\frac{dv}{dt} = mg - \lambda v^2 \]

where $\lambda$ is a constant measuring air resistance.

(ii) Radioactivity: radiation given out is proportional to the amount $y(t)$ of radioactive substance present at time $t$, i.e.

\[ \frac{dy}{dt} = -ky \]

where $k$ is a positive constant determined by the half-life.

(iii) Electrical circuits: an LCR circuit consists of an inductor ($L$ henries), a capacitor ($C$ farads), a resistor ($R$ ohms) and a voltage source ($V(t)$ volts) which varies with time $t$. The current $I(t)$ at time $t$ satisfies a differential equation:

\[ L\frac{d^2I}{dt^2} + R\frac{dI}{dt} + \frac{I}{C} = \frac{dV}{dt}. \]

[Figure: LCR circuit with voltage source $V(t)$, inductor $L$, capacitor $C$ and resistor $R$.]

There are of course many more examples governed by O.D.E.s. For example, Newton’s law of cooling, forced oscillations of a damped spring, bending moments, missile pursuit, cables of a suspension bridge,...

General and particular solutions

We start by looking at a very easy O.D.E.

\[ \frac{d^2y}{dx^2} = 0. \]

This is an O.D.E. of order 2 (the order of an O.D.E. is the order of the highest derivative that occurs in it).

Clearly $y = x$ is a solution, but it is not the only one. For example, $y = 3x + 1$ is also a solution; in fact, by integrating twice we see that the solutions are given by

\[ y = Ax + B \]

for any real constants $A, B$. There are two constants here because we need to integrate twice to solve the O.D.E. In general, in solving an O.D.E. of order $n$ we will get $n$ arbitrary constants in the general solution. Then a particular solution is determined by $n$ conditions which fix these constants. For instance, in the example above, $y = Ax + B$ is the general solution, and to find the particular solution with $y(0) = 1$, $y'(0) = 2$ we need to find $A, B$ such that

\[ 1 = y(0) = A \times 0 + B, \qquad 2 = y'(0) = A. \]

Hence $A = 2$, $B = 1$ and the required particular solution is $y = 2x + 1$.

Of course, the above O.D.E. is very simple. Unfortunately, there is no one method for solving O.D.E.s - this should be reminiscent of how there is no single method for calculating all integrals. Indeed, some can only be attacked by numerical methods. However, because they are so important and so common in the sciences, many special methods have been developed and we take the following

General Approach:

(1) Classify the type and write the O.D.E. in a standard form.
(2) Apply a suitable method to solve the O.D.E.


2.2. First order O.D.E.s.

These involve only first derivatives, so the general solution involves one constant, and hence just one condition is needed to determine a particular solution.

Separable O.D.E.s
This method works for an O.D.E. of the form

\[ \frac{dy}{dx} = f(x)g(y). \]

Method of solution
Cross-multiply and integrate to obtain

\[ \int \frac{dy}{g(y)} = \int f(x)\,dx. \]

Solving such an O.D.E. is only as hard as finding these two integrals.

Example 2.2. Solve

\[ \frac{dy}{dx} = (1 + x)\left(\frac{1 + y^2}{2y}\right), \qquad y(0) = 2. \]

We just cross-multiply and integrate:

\[ \int \frac{2y}{1 + y^2}\,dy = \int (1 + x)\,dx \]
\[ \implies \log(1 + y^2) = x + x^2/2 + A \]
\[ \implies 1 + y^2 = e^{x + x^2/2 + A} = e^{x + x^2/2}e^{A} = Be^{x + x^2/2} \quad\text{where } B = e^{A} \]
\[ \implies y = \pm\sqrt{Be^{x + x^2/2} - 1}. \]

This is the general solution and using the given initial condition,

\[ 2 = y(0) = \pm\sqrt{B - 1} \]

we obtain $B = 5$ and the required solution:

\[ y = \pm\sqrt{5e^{x + x^2/2} - 1}. \]

But wait: it seems there are two solutions? However, in order to obtain $y(0) = 2$, we must only take the positive square root.
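One way to gain confidence in the answer to Example 2.2 is to substitute it back into the O.D.E. numerically. The sketch below does this with a central-difference derivative; the step size and sample points are assumed, illustrative values, and the function names are not from the notes.

```python
import math

def y(x):
    # Particular solution of Example 2.2 (positive square root).
    return math.sqrt(5 * math.exp(x + x * x / 2) - 1)

def rhs(x, y_value):
    # Right-hand side of the O.D.E.: (1 + x)(1 + y^2) / (2y)
    return (1 + x) * (1 + y_value ** 2) / (2 * y_value)

def dy_numeric(x, step=1e-6):
    return (y(x + step) - y(x - step)) / (2 * step)

print("y(0) =", y(0.0))                    # initial condition
for x in (0.0, 0.5, 1.0):
    residual = dy_numeric(x) - rhs(x, y(x))
    print(f"x={x}: dy/dx - rhs = {residual:.2e}")
```

Taking the negative root instead would still satisfy the differential equation but would give $y(0) = -2$, which is why the initial condition singles out the positive branch.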

We aren’t always as lucky as in that example - it was obvious the equation was separable. Sometimes we have to work a bit harder to get it in the correct form.

Example 2.3. Solve

\[ 2y\frac{dy}{dx} - x - y^2 = 1 + xy^2. \]

Rewrite this as

\[ \frac{dy}{dx} = \frac{1}{2y}\left(1 + xy^2 + x + y^2\right) = \frac{1}{2y}\left(1 + x + (1 + x)y^2\right) = \frac{1}{2y}(1 + x)(1 + y^2) = (1 + x)\left(\frac{1 + y^2}{2y}\right). \]

This is now the same O.D.E. as in the previous example.


Homogeneous O.D.E.s
This method works for an O.D.E. of the form

\[ \frac{dy}{dx} = f\left(\frac{y}{x}\right). \]

Method of solution
Make the substitution $u = \frac{y}{x}$. Then applying the product rule to $y = ux$ gives

\[ \frac{dy}{dx} = u + \frac{du}{dx}x \]

so the O.D.E. becomes

\[ u + \frac{du}{dx}x = f(u). \]

This can be rewritten

\[ \frac{du}{dx} = \frac{1}{x}\left(f(u) - u\right) \]

which is an example of a separable O.D.E. and we can now apply the previous method to find $u$ and hence $y = ux$.

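Example 2.4. Solve (i.e. find the general solution of)

\[ \frac{dy}{dx} = \left(\frac{y}{x}\right)^2 + \frac{y}{x} + 1. \]

Put $u = \frac{y}{x}$ so that $\frac{dy}{dx} = u + \frac{du}{dx}x$. Then the O.D.E. becomes

\[ u + \frac{du}{dx}x = u^2 + u + 1 \]
\[ \implies \frac{du}{dx} = \frac{u^2 + 1}{x} \]
\[ \implies \int \frac{du}{u^2 + 1} = \int \frac{dx}{x} \]
\[ \implies \arctan u = \log|x| + A \]
\[ \implies u = \tan(\log|x| + A). \]

Hence the general solution is

\[ y = x\tan(\log|x| + A). \]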

Again, it is not always obvious that an O.D.E. is homogeneous.

Example 2.5. Solve

\[ x^2\frac{dy}{dx} = y^2 + xy + x^2. \]

Dividing both sides by $x^2$ gives the O.D.E. in the previous example.

If we are given an O.D.E. in the form $\frac{dy}{dx} = g(x, y)$, then we can generally tell if it is homogeneous by checking that

\[ g(\lambda x, \lambda y) = g(x, y) \quad\text{for all nonzero constants } \lambda. \]

This says that $g(x, y)$ really only depends on the ratio of $y$ and $x$, so can be written as $f(y/x)$.


Linear O.D.E.s
This method works for an O.D.E. of the form

\[ \frac{dy}{dx} + f(x)y = g(x). \]

Method of solution
We define a new function, called the integrating factor, by

\[ h(x) = e^{\int f(x)\,dx}. \]

By using the chain rule, one sees that this satisfies $\frac{dh}{dx} = f(x)h(x)$. Now multiply both sides of the O.D.E. by $h(x)$, obtaining

\[ h(x)\frac{dy}{dx} + h(x)f(x)y = h(x)g(x) \]
\[ \implies h(x)\frac{dy}{dx} + \frac{dh}{dx}y = h(x)g(x) \]
\[ \implies \frac{d}{dx}\left(h(x)y\right) = h(x)g(x). \]

We can now integrate both sides to get

\[ h(x)y = \int h(x)g(x)\,dx + A \]

and so we obtain the general solution

\[ y = \frac{1}{h(x)}\left(\int h(x)g(x)\,dx + A\right). \]

Example 2.6. Solve

\[ e^{-x}\frac{dy}{dx} + 2xe^{-x}y = 2x + 1, \qquad y(0) = 3. \]

We first write it in the standard form

\[ \frac{dy}{dx} + 2xy = e^{x}(2x + 1). \]

The integrating factor is

\[ h(x) = e^{\int 2x\,dx} = e^{x^2}. \]

Now multiply both sides by $e^{x^2}$ to obtain

\[ e^{x^2}\frac{dy}{dx} + e^{x^2}2xy = e^{x + x^2}(2x + 1) \]
\[ \implies \frac{d}{dx}\left(e^{x^2}y\right) = e^{x + x^2}(2x + 1) \]
\[ \implies e^{x^2}y = \int e^{x + x^2}(2x + 1)\,dx = e^{x + x^2} + A \]
\[ \implies y = e^{x} + Ae^{-x^2}. \]

To find $A$, note $3 = y(0) = e^0 + Ae^0 = 1 + A$ and so $A = 2$. Hence the required solution is $y = e^{x} + 2e^{-x^2}$.
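As noted earlier in these notes, some O.D.E.s can only be attacked numerically, so it is instructive to compare a numerical solution with the closed form found in Example 2.6. The sketch below uses the classical fourth-order Runge-Kutta scheme, which is a standard technique but is not covered in this section; the step count and function names are assumptions made for illustration.

```python
import math

def slope(x, y):
    # Standard form of Example 2.6 solved for y': y' = e^x (2x + 1) - 2xy
    return math.exp(x) * (2 * x + 1) - 2 * x * y

def rk4(f, x0, y0, x_end, steps):
    """Integrate y' = f(x, y) from x0 to x_end with classical RK4."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def closed_form(x):
    # Solution from Example 2.6: y = e^x + 2 e^{-x^2}
    return math.exp(x) + 2 * math.exp(-x * x)

numeric_y1 = rk4(slope, 0.0, 3.0, 1.0, 1000)   # start from y(0) = 3
print(numeric_y1, closed_form(1.0))
```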

Some common integrating factors

(1) For $\frac{dy}{dx} + \frac{y}{x} = g(x)$, the integrating factor is

\[ h(x) = e^{\int \frac{1}{x}\,dx} = e^{\log|x|} = |x|. \]

Note: The O.D.E. holds for $x > 0$, in which case $|x| = x$, or for $x < 0$, in which case $|x| = -x$.

(2) For $\frac{dy}{dx} - \frac{y}{x} = g(x)$, the integrating factor is

\[ h(x) = e^{-\int \frac{1}{x}\,dx} = e^{-\log|x|} = \frac{1}{|x|}. \]

(3) For $\frac{dy}{dx} + \frac{2y}{x} = g(x)$, the integrating factor is

\[ h(x) = e^{\int \frac{2}{x}\,dx} = e^{2\log|x|} = x^2. \]


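Exact O.D.E.s
This method works for an O.D.E. of the form

\[ Q(x, y)\frac{dy}{dx} + P(x, y) = 0 \]

where the 2-variable functions $P$ and $Q$ satisfy the following exactness condition:

\[ \frac{\partial Q}{\partial x} = \frac{\partial P}{\partial y}. \]

Method of solution
First find a function $f(x, y)$ such that

\[ \frac{\partial f}{\partial x} = P \quad\text{and}\quad \frac{\partial f}{\partial y} = Q. \]

Then the general solution $y(x)$ is given implicitly by the equation

\[ f(x, y) = A \]

where $A$ is an arbitrary constant.

To see why this works, notice that by the Chain Rule and the two defining properties of $f$,

\[ \frac{d}{dx}\big(f(x, y(x))\big) = \frac{\partial f}{\partial x}\frac{dx}{dx} + \frac{\partial f}{\partial y}\frac{dy}{dx} = P + Q\frac{dy}{dx}. \]

However, $f(x, y(x)) = A$ is constant, so this derivative is zero and so the O.D.E. is satisfied. (The exactness condition is what allows such an $f$ to exist, since it says that the mixed derivatives $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ and $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ agree.)

To find $f(x, y)$ in practice, we essentially integrate $P$ and $Q$, as demonstrated in the next example.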

Example 2.7. Solve the O.D.E.

\[ (6xy + y)\frac{dy}{dx} + 2x + 3y^2 = 0. \]

Here

\[ Q(x, y) = 6xy + y \quad\text{and}\quad P(x, y) = 2x + 3y^2 \]

so

\[ \frac{\partial Q}{\partial x} = 6y \quad\text{and}\quad \frac{\partial P}{\partial y} = 6y. \]

Since these are equal, the equation is exact and so we try to find $f(x, y)$ satisfying

\[ \frac{\partial f}{\partial x} = P = 2x + 3y^2 \quad\text{and}\quad \frac{\partial f}{\partial y} = Q = 6xy + y. \]

Integrate the first of these with respect to $x$ to get

\[ f(x, y) = \int (2x + 3y^2)\,dx = x^2 + 3xy^2 + g(y). \]

Notice here that we are treating $y$ as a constant, hence the “constant of integration” $g(y)$ is actually a function of $y$. Now differentiate this partially with respect to $y$ to get

\[ \frac{\partial f}{\partial y} = 6xy + g'(y). \]

But we want this to equal $Q = 6xy + y$. Hence we can take $g'(y) = y$, i.e. $g(y) = \tfrac{1}{2}y^2$, and so

\[ f(x, y) = x^2 + 3xy^2 + \tfrac{1}{2}y^2 = x^2 + y^2\left(3x + \tfrac{1}{2}\right). \]

Thus the solutions of the original O.D.E. are given by

\[ x^2 + y^2\left(3x + \tfrac{1}{2}\right) = A. \]

Generally, leaving the solution in the implicit form $f(x, y) = A$ is fine. However, in this particular case, we can rearrange to find $y$ in terms of $x$:

\[ y = \pm\sqrt{\frac{2A - 2x^2}{6x + 1}}. \]
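Both claims in Example 2.7 (that $f$ has the right partial derivatives, and that the explicit formula really lies on a level curve of $f$) can be spot-checked numerically. The sketch below is illustrative only: the value $A = 3$, the sample points and the helper names are arbitrary assumptions.

```python
import math

def f(x, y):
    return x * x + 3 * x * y * y + 0.5 * y * y   # potential found in Example 2.7

def P(x, y):
    return 2 * x + 3 * y * y

def Q(x, y):
    return 6 * x * y + y

def y_explicit(x, A):
    # Positive branch of the rearranged solution; needs (2A - 2x^2)/(6x + 1) >= 0.
    return math.sqrt((2 * A - 2 * x * x) / (6 * x + 1))

step = 1e-6
x0, y0 = 0.8, 1.3   # arbitrary point for checking the partial derivatives
fx = (f(x0 + step, y0) - f(x0 - step, y0)) / (2 * step)
fy = (f(x0, y0 + step) - f(x0, y0 - step)) / (2 * step)
print(fx, P(x0, y0))
print(fy, Q(x0, y0))

A = 3.0             # arbitrary constant labelling one solution curve
for x in (0.0, 0.5, 1.0):
    print(x, f(x, y_explicit(x, A)))   # should stay equal to A along the curve
```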


2.3. Second order linear constant coefficient O.D.E.s: homogeneous case.

A second order linear constant coefficient O.D.E. is of the form

ay′′ + by′ + cy = r(x)

where a, b, c are constants and r(x) is a function of x. In this section, we’ll consider the homogeneouscase, meaning r(x) = 0. (Note this is a different meaning of “homogeneous” to the one we saw earlier.)

Hence we are solving

ay′′ + by′ + cy = 0.

An important general fact about such O.D.E.s is the following:

Principle of Superposition: If $y_1(x)$, $y_2(x)$ are solutions, then so is $Ay_1(x) + By_2(x)$, where $A$, $B$ are constants.

Since the O.D.E. is second order, the general solution will involve two arbitrary constants and will be of the form

\[ y(x) = Ay_1(x) + By_2(x) \]

where $y_1(x)$, $y_2(x)$ are two linearly independent solutions, i.e. neither is a constant multiple of the other.

Method of solution

Form the auxiliary equation (sometimes called the characteristic equation)

\[ a\lambda^2 + b\lambda + c = 0 \]

with roots

\[ \lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]

Case 1: $b^2 > 4ac$. In this case the auxiliary equation has two distinct real roots $\lambda_1$, $\lambda_2$ say. The general solution is then

\[ y(x) = Ae^{\lambda_1 x} + Be^{\lambda_2 x}. \]

Case 2: $b^2 = 4ac$. In this case the auxiliary equation has just one (repeated) real root $\lambda_1$, say. The general solution is then

\[ y(x) = (Ax + B)e^{\lambda_1 x}. \]

Case 3: $b^2 < 4ac$. In this case the auxiliary equation has two distinct complex roots, say

\[ \lambda_1, \lambda_2 = p \pm iq = \frac{-b \pm i\sqrt{4ac - b^2}}{2a}. \]

The general solution is then

\[ y(x) = e^{px}\left(A\cos qx + B\sin qx\right). \]

Before seeing some examples, we will briefly explain why these are the solutions in each case.

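Check for Case 1

First let $y(x) = e^{\lambda_1 x}$. Then $y' = \lambda_1 e^{\lambda_1 x}$ and $y'' = \lambda_1^2 e^{\lambda_1 x}$ and so

\[ ay'' + by' + cy = a\lambda_1^2 e^{\lambda_1 x} + b\lambda_1 e^{\lambda_1 x} + ce^{\lambda_1 x} = e^{\lambda_1 x}\left(a\lambda_1^2 + b\lambda_1 + c\right) = 0 \]

because $\lambda_1$ is a root of the auxiliary equation. Similarly, $e^{\lambda_2 x}$ is a solution and the Principle of Superposition then gives us the general solution.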

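Check for Case 2

In exactly the same way as above, $y(x) = e^{\lambda_1 x}$ is a solution. So now consider $y(x) = xe^{\lambda_1 x}$. Then

\[ y' = x\lambda_1 e^{\lambda_1 x} + e^{\lambda_1 x} \quad\text{and}\quad y'' = x\lambda_1^2 e^{\lambda_1 x} + \lambda_1 e^{\lambda_1 x} + \lambda_1 e^{\lambda_1 x} = \left(x\lambda_1^2 + 2\lambda_1\right)e^{\lambda_1 x} \]

hence

\[ ay'' + by' + cy = a\left(x\lambda_1^2 + 2\lambda_1\right)e^{\lambda_1 x} + b\left(x\lambda_1 + 1\right)e^{\lambda_1 x} + cxe^{\lambda_1 x} = xe^{\lambda_1 x}\left(a\lambda_1^2 + b\lambda_1 + c\right) + \left(2a\lambda_1 + b\right)e^{\lambda_1 x}. \]

Now $a\lambda_1^2 + b\lambda_1 + c = 0$ since $\lambda_1$ is a root of the auxiliary equation. Also, since $b^2 = 4ac$ in this case,

\[ \lambda_1 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = -\frac{b}{2a} \]

and so $2a\lambda_1 + b = 0$ as well. Hence $y(x) = xe^{\lambda_1 x}$ is a solution to the O.D.E. and the Principle of Superposition then gives us the general solution.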

Check for Case 3

We could do this in a similar way - just substitute $y(x) = e^{px}\cos qx$ and $y(x) = e^{px}\sin qx$ into the O.D.E. and check it works. Alternatively, we could say the following: notice that just as in the first case, $e^{\lambda_1 x}$ and $e^{\lambda_2 x}$ are solutions. However,

\[ e^{\lambda_1 x} = e^{px}e^{iqx} = e^{px}\cos qx + ie^{px}\sin qx \quad\text{and}\quad e^{\lambda_2 x} = e^{px}e^{-iqx} = e^{px}\cos qx - ie^{px}\sin qx \]

and so any linear combination of $e^{\lambda_1 x}$ and $e^{\lambda_2 x}$ is also a linear combination of $e^{px}\cos qx$ and $e^{px}\sin qx$.
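The three cases can be summarised in a short sketch that inspects the sign of $b^2 - 4ac$ and returns the corresponding form of the general solution as a string. The function name `general_solution` and the output format are assumptions made for illustration; comparing a floating-point discriminant with zero is only reliable for exactly representable coefficients such as small integers.

```python
import math


def general_solution(a, b, c):
    """Describe the general solution of a y'' + b y' + c y = 0 (a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:
        # Case 1: two distinct real roots.
        r = math.sqrt(disc)
        l1, l2 = (-b + r) / (2 * a), (-b - r) / (2 * a)
        return f"y = A*exp({l1:g}x) + B*exp({l2:g}x)"
    if disc == 0:
        # Case 2: one repeated real root.
        l1 = -b / (2 * a)
        return f"y = (A*x + B)*exp({l1:g}x)"
    # Case 3: complex roots p +/- iq.
    p = -b / (2 * a)
    q = math.sqrt(-disc) / (2 * abs(a))
    return f"y = exp({p:g}x)*(A*cos({q:g}x) + B*sin({q:g}x))"


print(general_solution(2, 3, -2))
print(general_solution(1, 4, 4))
print(general_solution(1, 1, 1))
```

Taking $q > 0$ in Case 3 loses nothing, since changing the sign of $q$ only changes the sign of the arbitrary constant $B$.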

Example 2.8. Solve

\[ 2y'' + 3y' - 2y = 0. \]

The auxiliary equation

\[ 2\lambda^2 + 3\lambda - 2 = 0 \quad\text{i.e.}\quad (\lambda + 2)(2\lambda - 1) = 0 \]

has roots $\lambda = \tfrac{1}{2}$ and $-2$ so the general solution is

\[ y = Ae^{x/2} + Be^{-2x}. \]

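Example 2.9. Solve

\[ y'' + y' + y = 0 \quad\text{where}\quad y(0) = 1,\ y'(0) = \tfrac{1}{2}. \]

The auxiliary equation $\lambda^2 + \lambda + 1 = 0$ has roots

\[ \lambda = \frac{-1 \pm \sqrt{1 - 4}}{2} = -\frac{1}{2} \pm i\frac{\sqrt{3}}{2} \]

so the general solution is

\[ y(x) = e^{-x/2}\left(A\cos\frac{\sqrt{3}}{2}x + B\sin\frac{\sqrt{3}}{2}x\right). \]

We also need to find $A$ and $B$ to satisfy the initial conditions. Differentiating gives

\[ y'(x) = -\frac{1}{2}e^{-x/2}\left(A\cos\frac{\sqrt{3}}{2}x + B\sin\frac{\sqrt{3}}{2}x\right) + e^{-x/2}\left(-\frac{\sqrt{3}}{2}A\sin\frac{\sqrt{3}}{2}x + \frac{\sqrt{3}}{2}B\cos\frac{\sqrt{3}}{2}x\right). \]

Hence

\[ 1 = y(0) = A \quad\text{and}\quad \frac{1}{2} = y'(0) = -\frac{1}{2}A + \frac{\sqrt{3}}{2}B \]

so $A = 1$, $B = \frac{2}{\sqrt{3}}$. The required solution is

\[ y(x) = e^{-x/2}\left(\cos\frac{\sqrt{3}}{2}x + \frac{2}{\sqrt{3}}\sin\frac{\sqrt{3}}{2}x\right). \]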
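One way to gain confidence in a solution like the one in Example 2.9 is to substitute it back numerically. The sketch below approximates $y'$ and $y''$ by finite differences and checks the initial conditions and the residual $y'' + y' + y$ at a few points; the helper names, step size and sample points are assumptions chosen for illustration.

```python
import math


def y(x):
    """Solution obtained in Example 2.9."""
    w = math.sqrt(3) / 2
    return math.exp(-x / 2) * (math.cos(w * x) + (2 / math.sqrt(3)) * math.sin(w * x))


def d1(func, x, h=1e-4):
    """Central-difference estimate of the first derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)


def d2(func, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)


def residual(x):
    """Left-hand side y'' + y' + y, which should be close to zero."""
    return d2(y, x) + d1(y, x) + y(x)


print(y(0.0), d1(y, 0.0))
print(max(abs(residual(x)) for x in (0.0, 0.5, 1.0, 2.0, 5.0)))
```

The residual is not exactly zero because of the finite-difference approximation, so it should only be compared against a small tolerance.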


Example 2.10. Solve

y′′ + 4y′ + 4y = 0.

The auxiliary equation $\lambda^2 + 4\lambda + 4 = (\lambda + 2)^2 = 0$ has a repeated root $\lambda = -2$ so the general solution is

\[ y = (Ax + B)e^{-2x}. \]

2.4. Applications: LCR circuits and damped oscillations.

Recall the example of an LCR circuit from earlier. If we assume that the input voltage $V(t)$ is constant then $dV/dt = 0$ and the current $I(t)$ satisfies

\[ L\frac{d^2I}{dt^2} + R\frac{dI}{dt} + \frac{I}{C} = 0. \]

Equations of this type appear in many different situations; for instance, an equivalent mathematical model appears in the motion of a shock absorber, as follows.

[Figure: LCR circuit diagram with components labelled V, L, C and R.]

Suppose we have a mass $m$ attached to a spring and connected to a piston which resists the motion of the mass. Then the position $y(t)$ of the mass at time $t$ satisfies

\[ m\frac{d^2y}{dt^2} + c\frac{dy}{dt} + ky = 0 \]

where $k$ is the modulus of the spring and $c$ is the damping constant of the piston.

We will now use the results of the last section to describe qualitatively the types of behaviour that such systems can exhibit. In particular, for the damped spring system we introduce the following positive real parameters:

\[ \omega_0 = \sqrt{\frac{k}{m}} \quad\text{and}\quad \zeta = \frac{c}{2\sqrt{mk}}. \]

Here $\omega_0$ is called the natural frequency and $\zeta$ is the damping ratio. The differential equation can then be written as

\[ \frac{d^2y}{dt^2} + 2\zeta\omega_0\frac{dy}{dt} + \omega_0^2 y = 0. \]

Applying the method from the last section, the auxiliary equation is

\[ \lambda^2 + 2\zeta\omega_0\lambda + \omega_0^2 = 0 \]

which has solutions

\[ \lambda_1, \lambda_2 = \frac{-2\zeta\omega_0 \pm \sqrt{4\zeta^2\omega_0^2 - 4\omega_0^2}}{2} = -\zeta\omega_0 \pm \omega_0\sqrt{\zeta^2 - 1}. \]

The three different cases thus correspond to $\zeta > 1$, $\zeta = 1$ and $\zeta < 1$ respectively.


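Overdamping ($\zeta > 1$)

In this case, the two roots $\lambda_1 = -\zeta\omega_0 + \omega_0\sqrt{\zeta^2 - 1}$ and $\lambda_2 = -\zeta\omega_0 - \omega_0\sqrt{\zeta^2 - 1}$ are both real and the general solution is given by

\[ y(t) = Ae^{\lambda_1 t} + Be^{\lambda_2 t}. \]

Notice that $\lambda_2 < \lambda_1 < 0$ since the parameters $\omega_0$, $\zeta$ are positive and $\zeta = \sqrt{\zeta^2} > \sqrt{\zeta^2 - 1}$. Hence $y(t)$ is a sum of two decaying exponential terms and the motion rapidly decreases to zero. The following graph displays a few possible solutions with $y(0)$ fixed and various $y'(0)$.

[Figure: plot of $y(t)$ against $t$ for the overdamped case.]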

Critical damping ($\zeta = 1$)

In this case, the two roots are equal to $\lambda = -\zeta\omega_0$ and the general solution is given by

\[ y(t) = (At + B)e^{\lambda t}. \]

Since $\lambda < 0$, the exponential term is decaying and the motion rapidly decreases to zero similarly to the previous case. The following graph displays a few possible solutions.

[Figure: plot of $y(t)$ against $t$ for the critically damped case.]

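Underdamping ($\zeta < 1$)

In this case, the two roots $\lambda_1 = -\zeta\omega_0 + \omega_0\sqrt{\zeta^2 - 1}$ and $\lambda_2 = -\zeta\omega_0 - \omega_0\sqrt{\zeta^2 - 1}$ are complex conjugates. In fact, if we write $\omega = \omega_0\sqrt{1 - \zeta^2}$, then these two roots are

\[ \lambda_1, \lambda_2 = -\zeta\omega_0 \pm i\omega. \]

Hence the general solution is given by

\[ y(t) = e^{-\zeta\omega_0 t}\left(A\cos\omega t + B\sin\omega t\right). \]

This time, the solution consists of a decaying exponential term multiplied by a trigonometric term which oscillates with angular frequency $\omega$, i.e. with period $2\pi/\omega$.

[Figure: plot of $y(t)$ against $t$ for the underdamped case.]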
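The classification into the three damping regimes depends only on $\zeta$, so it is easy to sketch in code. The function names `classify_damping` and `damped_period` below are illustrative assumptions; the formulas for $\omega_0$, $\zeta$ and $\omega = \omega_0\sqrt{1 - \zeta^2}$ are the ones given above, and the period of the trigonometric factor is taken to be $2\pi/\omega$.

```python
import math


def classify_damping(m, c, k):
    """Return (omega0, zeta, regime) for m y'' + c y' + k y = 0 with m, c, k > 0."""
    omega0 = math.sqrt(k / m)
    zeta = c / (2 * math.sqrt(m * k))
    if zeta > 1:
        regime = "overdamped"
    elif zeta == 1:
        # Exact equality is only meaningful for exactly representable inputs.
        regime = "critically damped"
    else:
        regime = "underdamped"
    return omega0, zeta, regime


def damped_period(m, c, k):
    """Period 2*pi/omega of the oscillating factor in the underdamped case."""
    omega0, zeta, regime = classify_damping(m, c, k)
    if regime != "underdamped":
        raise ValueError("no oscillation unless zeta < 1")
    return 2 * math.pi / (omega0 * math.sqrt(1 - zeta ** 2))


print(classify_damping(1, 3, 1))
print(classify_damping(1, 2, 1))
print(classify_damping(1, 1, 1), damped_period(1, 1, 1))
```

In practice a measured $\zeta$ is never exactly 1, so the critically damped case is best thought of as the boundary between the other two.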


2.5. Second order linear O.D.E.s: non-homogeneous case.

We’ll now consider non-homogeneous second order linear O.D.E.s. These are of the form

ay′′ + by′ + cy = r(x)

for some function $r(x)$ which is not zero. Notice now that the Principle of Superposition doesn’t work since if $y_1(x)$ and $y_2(x)$ are solutions, then $y = y_1 + y_2$ satisfies

\[ ay'' + by' + cy = a(y_1'' + y_2'') + b(y_1' + y_2') + c(y_1 + y_2) = 2r(x). \]

However, something similar does help: let $y_c(x)$ be the general solution of the corresponding homogeneous equation

\[ ay'' + by' + cy = 0. \]

We call $y_c(x)$ the complementary function of the O.D.E., which we can find using the methods of the previous sections.

Proposition 2.11. Suppose $y_p(x)$ is any solution of $ay'' + by' + cy = r(x)$. Then $y(x) = y_c(x) + y_p(x)$ is the general solution of $ay'' + by' + cy = r(x)$.

Proof. We have

\[ \begin{aligned} ay'' + by' + cy &= a(y_c'' + y_p'') + b(y_c' + y_p') + c(y_c + y_p) \\ &= (ay_c'' + by_c' + cy_c) + (ay_p'' + by_p' + cy_p) \\ &= 0 + r(x). \end{aligned} \]

Also, this solution $y = y_c + y_p$ will have two arbitrary coefficients coming from the complementary function $y_c(x)$.

Thus to solve the non-homogeneous O.D.E., we now just need to find a particular solution $y_p(x)$. When $r(x)$ is of a familiar form, one way to do this is the Method of Undetermined Coefficients. This involves looking at $r(x)$ and making an educated guess as to the shape of $y_p(x)$, knowing that the derivatives of $y_p(x)$ have to somehow add up to give $r(x)$. For instance, consider the following table.

\[ \begin{array}{ll} r(x) & y_p(x) \\ \hline ke^{\omega x} & Ce^{\omega x} \\ k_0 + k_1 x + \cdots + k_n x^n & C_0 + C_1 x + \cdots + C_n x^n \\ k\cos\omega x & C\cos\omega x + D\sin\omega x \\ k\sin\omega x & C\cos\omega x + D\sin\omega x \end{array} \]

Basic Rule: If $r(x)$ is one of the functions in the first column of the table, choose the corresponding function $y_p(x)$ in the second column and find the undetermined coefficients by substituting $y_p(x)$ and its derivatives into the non-homogeneous O.D.E.

Sum Rule: If $r(x)$ is a sum of functions in the first column of the table, then choose for $y_p(x)$ the sum of the corresponding functions in the second column.

Modification Rule: If $r(x)$ is a solution of the corresponding homogeneous equation $ay'' + by' + cy = 0$ then multiply your choice of $y_p(x)$ by $x$ (or by $x^2$ if this solution corresponds to a double root of the auxiliary equation).

We’ll now demonstrate with a series of examples.

Example 2.12. Solve the equation

\[ y'' + 3y' + 2y = 4x^2 + 1. \]

We first find the complementary function $y_c(x)$, i.e. the general solution of the corresponding homogeneous O.D.E.

\[ y'' + 3y' + 2y = 0. \]

This has auxiliary equation $\lambda^2 + 3\lambda + 2 = 0$, i.e. $(\lambda + 2)(\lambda + 1) = 0$, so the complementary function is

\[ y_c = Ae^{-2x} + Be^{-x}. \]

We now look for a particular solution; the table suggests trying

\[ y_p = c_2 x^2 + c_1 x + c_0 \]

for some constants $c_0$, $c_1$ and $c_2$ to be determined. Now

\[ y_p' = 2c_2 x + c_1 \quad\text{and}\quad y_p'' = 2c_2 \]

so

\[ \begin{aligned} y_p'' + 3y_p' + 2y_p &= 2c_2 + 3(c_1 + 2c_2 x) + 2(c_0 + c_1 x + c_2 x^2) \\ &= 2c_2 x^2 + (6c_2 + 2c_1)x + (2c_2 + 3c_1 + 2c_0) = 4x^2 + 1. \end{aligned} \]

This is an identity between polynomials so we can compare the coefficients:

\[ \begin{array}{lll} \text{coefficients of } x^2: & 2c_2 = 4 & \text{so } c_2 = 2 \\ \text{coefficients of } x: & 6c_2 + 2c_1 = 0 & \text{so } c_1 = -6 \\ \text{constant coefficients:} & 2c_2 + 3c_1 + 2c_0 = 1 & \text{so } c_0 = 15/2 \end{array} \]

Hence the general solution of the O.D.E. is $y = y_c + y_p$, i.e.

\[ y = \underbrace{Ae^{-2x} + Be^{-x}}_{\text{complementary function}} + \underbrace{2x^2 - 6x + \tfrac{15}{2}}_{\text{particular solution}} \]

Note that there are two arbitrary constants, as expected.
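The coefficient comparison in Example 2.12 is a small triangular system, which can be solved from the top power downwards. The sketch below does this for a general quadratic right-hand side $k_2x^2 + k_1x + k_0$, assuming $c \neq 0$ so that no modification rule is needed; the function name and argument order are assumptions for illustration.

```python
from fractions import Fraction


def quadratic_particular(a, b, c, k2, k1, k0):
    """Coefficients (c2, c1, c0) of y_p = c2 x^2 + c1 x + c0 for
    a y'' + b y' + c y = k2 x^2 + k1 x + k0, assuming c != 0."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    # Compare coefficients of x^2:  c*c2 = k2
    c2 = Fraction(k2) / c
    # Compare coefficients of x:    2*b*c2 + c*c1 = k1
    c1 = (Fraction(k1) - 2 * b * c2) / c
    # Compare constant terms:       2*a*c2 + b*c1 + c*c0 = k0
    c0 = (Fraction(k0) - 2 * a * c2 - b * c1) / c
    return c2, c1, c0


# Example 2.12: y'' + 3y' + 2y = 4x^2 + 1
print(quadratic_particular(1, 3, 2, 4, 0, 1))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.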

Example 2.13. Solve

\[ y'' + 2y' + 3y = 6\cos 3x \quad\text{given that}\quad y(0) = \tfrac{1}{2},\ y'(0) = \tfrac{1}{2} - \sqrt{2}. \]

The corresponding homogeneous O.D.E. $y'' + 2y' + 3y = 0$ has auxiliary equation

\[ \lambda^2 + 2\lambda + 3 = 0 \]

which has roots

\[ \frac{-2 \pm \sqrt{4 - 12}}{2} = -1 \pm i\sqrt{2} \]

so the complementary function is

\[ y_c = e^{-x}\left(A\cos\sqrt{2}x + B\sin\sqrt{2}x\right). \]

The method says that we should try for a particular solution of the form

\[ y_p = C\cos 3x + D\sin 3x. \]

Then

\[ y_p' = -3C\sin 3x + 3D\cos 3x \quad\text{and}\quad y_p'' = -9C\cos 3x - 9D\sin 3x. \]

Substituting into the O.D.E. gives

\[ \begin{aligned} y_p'' + 2y_p' + 3y_p &= -9C\cos 3x - 9D\sin 3x + 2(-3C\sin 3x + 3D\cos 3x) + 3(C\cos 3x + D\sin 3x) \\ &= (6D - 6C)\cos 3x + (-6D - 6C)\sin 3x = 6\cos 3x. \end{aligned} \]

We can now compare the coefficients of $\cos 3x$ and $\sin 3x$ to get

\[ \begin{array}{ll} \text{coeffs of } \cos 3x: & -6C + 6D = 6 \\ \text{coeffs of } \sin 3x: & -6C - 6D = 0 \end{array} \]

So $D = \tfrac{1}{2}$, $C = -\tfrac{1}{2}$ and the general solution $y = y_c + y_p$ is

\[ y = \underbrace{e^{-x}\left(A\cos\sqrt{2}x + B\sin\sqrt{2}x\right)}_{\text{complementary function}} \underbrace{{} - \tfrac{1}{2}\cos 3x + \tfrac{1}{2}\sin 3x}_{\text{particular integral}}. \]


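In this example, we also have some initial conditions; we use these to find $A$ and $B$. Now

\[ y(0) = A - \tfrac{1}{2} = \tfrac{1}{2} \quad\text{so}\quad A = 1. \]

Also,

\[ y' = -e^{-x}\left(A\cos\sqrt{2}x + B\sin\sqrt{2}x\right) + e^{-x}\left(-A\sqrt{2}\sin\sqrt{2}x + B\sqrt{2}\cos\sqrt{2}x\right) + \tfrac{3}{2}\sin 3x + \tfrac{3}{2}\cos 3x \]

and

\[ y'(0) = -A + B\sqrt{2} + \tfrac{3}{2} = \tfrac{1}{2} - \sqrt{2} \quad\text{so}\quad B = -1. \]

The required solution is thus

\[ y = e^{-x}\left(\cos\sqrt{2}x - \sin\sqrt{2}x\right) - \tfrac{1}{2}\cos 3x + \tfrac{1}{2}\sin 3x. \]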

Remark 2.14. When there are initial conditions as in this last example, it is important to get the steps in the right order. To solve

\[ ay'' + by' + cy = r(x) \]

the general procedure is

(1) Find the complementary function $y_c(x)$, i.e. the general solution of the corresponding homogeneous linear O.D.E. (this involves constants $A$, $B$).
(2) Find a particular solution $y_p(x)$ (this has no arbitrary constants).
(3) Find the general solution by adding $y_c(x)$ to $y_p(x)$.
(4) Find $A$ and $B$ using the initial conditions (if required).

Do not do step (4) straight after step (1). The initial conditions apply to the non-homogeneous equation, not the corresponding homogeneous one.

Example 2.15. Solve

\[ 3y'' - 2y' - y = e^x \quad\text{given that}\quad y(0) = 0,\ y'(0) = 1. \]

The corresponding homogeneous equation $3y'' - 2y' - y = 0$ has auxiliary equation

\[ 3\lambda^2 - 2\lambda - 1 = 0 \quad\text{i.e.}\quad (3\lambda + 1)(\lambda - 1) = 0 \]

which has roots $\lambda = -\tfrac{1}{3}$, $1$. Hence the complementary function is

\[ y_c(x) = Ae^{-x/3} + Be^x. \]

Normally we should look for a particular solution of the form $y_p(x) = ce^x$ but here $e^x$ is already a solution of the homogeneous equation (it is $y_c(x)$ with $A = 0$, $B = 1$). Thus we use the modification rule and try

\[ y_p(x) = cxe^x \]

instead. In that case,

\[ y_p' = ce^x + cxe^x \quad\text{and}\quad y_p'' = 2ce^x + cxe^x \]

so substituting into the O.D.E. gives

\[ 3y_p'' - 2y_p' - y_p = 3(2ce^x + cxe^x) - 2(ce^x + cxe^x) - cxe^x = 4ce^x = e^x. \]

Hence $c = \tfrac{1}{4}$ and the general solution is

\[ y = \underbrace{Ae^{-x/3} + Be^x}_{\text{complementary function}} + \underbrace{\tfrac{1}{4}xe^x}_{\text{particular integral}}. \]

Now we use the initial conditions to find $A$ and $B$. Firstly,

\[ y(0) = A + B = 0. \]

Also,

\[ y' = -\frac{A}{3}e^{-x/3} + Be^x + \frac{1}{4}e^x + \frac{1}{4}xe^x \]

so

\[ y'(0) = -\frac{A}{3} + B + \frac{1}{4} = 1. \]

Solving these two equations gives $A = -\tfrac{9}{16}$, $B = \tfrac{9}{16}$ and the required solution is

\[ y = -\frac{9}{16}e^{-x/3} + \frac{9}{16}e^x + \frac{1}{4}xe^x. \]

Example 2.16. Solve

\[ y'' + 2y' + y = e^{-x}. \]

The corresponding homogeneous equation $y'' + 2y' + y = 0$ has auxiliary equation

\[ \lambda^2 + 2\lambda + 1 = 0 \quad\text{i.e.}\quad (\lambda + 1)^2 = 0 \]

which has equal roots $\lambda = -1$, $-1$. The complementary function is thus

\[ y_c(x) = (A + Bx)e^{-x}. \]

Normally we should look for a particular solution of the form $y_p(x) = ce^{-x}$ but here $e^{-x}$ is a solution of the homogeneous equation (it’s $y_c(x)$ with $A = 1$, $B = 0$), and so is $xe^{-x}$ (it’s $y_c(x)$ with $A = 0$, $B = 1$). So the modification rule says we should look for a particular solution of the form

\[ y_p(x) = cx^2e^{-x}. \]

Then

\[ y_p' = 2cxe^{-x} - cx^2e^{-x} \quad\text{and}\quad y_p'' = 2ce^{-x} - 2cxe^{-x} - 2cxe^{-x} + cx^2e^{-x} = 2ce^{-x} - 4cxe^{-x} + cx^2e^{-x}. \]

Substituting into the O.D.E. gives

\[ y_p'' + 2y_p' + y_p = \left(2ce^{-x} - 4cxe^{-x} + cx^2e^{-x}\right) + 2\left(2cxe^{-x} - cx^2e^{-x}\right) + cx^2e^{-x} = 2ce^{-x} = e^{-x}. \]

So $c = \tfrac{1}{2}$ and the general solution is

\[ y = \underbrace{(A + Bx)e^{-x}}_{\text{complementary function}} + \underbrace{\tfrac{1}{2}x^2e^{-x}}_{\text{particular integral}}. \]

Everything we have done applies similarly to linear constant coefficient O.D.E.s of higher order, as we’ll now demonstrate.

Example 2.17. Solve

\[ y^{(4)} - y = x \]

subject to the initial conditions

\[ y(0) = 3, \quad y'(0) = 1, \quad y''(0) = -1, \quad y'''(0) = 4. \]

[Notice this 4th order equation has 4 initial conditions.]

The auxiliary equation of the corresponding homogeneous O.D.E. is $\lambda^4 - 1 = 0$. This has roots $\pm 1$, $\pm i$ so the complementary function is

\[ y_c(x) = Ae^x + Be^{-x} + C\cos x + D\sin x. \]

Now look for a particular solution of the form

\[ y_p(x) = k_0 + k_1 x. \]

Substituting gives

\[ y_p'''' - y_p = -k_0 - k_1 x = x \]

and so $k_0 = 0$, $k_1 = -1$. Thus the general solution is

\[ y(x) = Ae^x + Be^{-x} + C\cos x + D\sin x - x. \]

We also have

\[ \begin{aligned} y'(x) &= Ae^x - Be^{-x} - C\sin x + D\cos x - 1 \\ y''(x) &= Ae^x + Be^{-x} - C\cos x - D\sin x \\ y'''(x) &= Ae^x - Be^{-x} + C\sin x - D\cos x \end{aligned} \]

and so applying the initial conditions leads to the simultaneous equations

\[ \begin{aligned} A + B + C &= 3 \\ A - B + D &= 2 \\ A + B - C &= -1 \\ A - B - D &= 4 \end{aligned} \]

In the next chapter we will see a methodical way of solving sets of equations like these. For now, however, we’ll just use ad hoc reasoning:

• Adding the first and third equations and dividing by 2 gives $A + B = 1$.
• Adding the second and fourth equations and dividing by 2 gives $A - B = 3$.
• Hence $A = 2$, $B = -1$ and we also find $C = 2$, $D = -1$.

Thus the required solution is

\[ y(x) = 2e^x - e^{-x} + 2\cos x - \sin x - x. \]

Example 2.18. Solve

\[ \frac{d^4y}{dx^4} - 2\frac{d^2y}{dx^2} + y = e^{3x}. \]

The auxiliary equation of the corresponding homogeneous O.D.E. is

\[ \begin{aligned} \lambda^4 - 2\lambda^2 + 1 &= 0 \\ \left(\lambda^2 - 1\right)^2 &= 0 \\ (\lambda - 1)^2(\lambda + 1)^2 &= 0 \end{aligned} \]

The roots of this are $1, 1, -1, -1$, both being repeated roots. So the complementary function is

\[ y_c(x) = (A + Bx)e^x + (C + Dx)e^{-x}. \]

Now look for a particular integral of the form $y_p(x) = ke^{3x}$. Then

\[ y_p'''' - 2y_p'' + y_p = 81ke^{3x} - 18ke^{3x} + ke^{3x} = 64ke^{3x} = e^{3x}. \]

Thus $k = \tfrac{1}{64}$ and the general solution is

\[ y(x) = (A + Bx)e^x + (C + Dx)e^{-x} + \tfrac{1}{64}e^{3x}. \]


3. Linear Algebra

Linear algebra concerns linear mappings between vector spaces. This is motivated by the study of systems of linear equations such as

\[ \begin{aligned} a_{11}x + a_{12}y + a_{13}z &= b_1 \\ a_{21}x + a_{22}y + a_{23}z &= b_2 \\ a_{31}x + a_{32}y + a_{33}z &= b_3 \end{aligned} \]

where $a_{ij}$ and $b_i$ are given real numbers and we would like to find $x$, $y$ and $z$. Solving this problem is relatively easy - just multiply, add and subtract the three equations in suitable ways to find $x$, $y$ and $z$. However, applications of such equations appearing throughout engineering and sciences often involve many more than 3 variables. For instance, computer models in aircraft design describing the airflow over the aircraft might be given by 2,000,000 simultaneous linear equations in 2,000,000 variables. Thus it is necessary to develop a systematic approach.

3.1. Matrices.

Whilst this may be familiar to some, we first define matrices and their basic operations.

An $m \times n$ matrix is a rectangular array of components (these can be real numbers, complex numbers or other mathematical objects such as polynomials) with $m$ rows and $n$ columns:

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \]

We have already met some of these; a 3-dimensional (column) vector can be considered as a $3 \times 1$ matrix:

\[ \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \]

and similarly, a 3-dimensional row vector $\mathbf{a} = (a_1, a_2, a_3)$ can be considered as a $1 \times 3$ matrix.

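Addition

As with vectors, we can add two matrices in an obvious manner - add the components individually to make the matrix sum.

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix} \]

Notice that for this to make sense, the two matrices must be of the same size.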

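Scalar multiplication

Again, as with vectors, we can multiply a matrix by a scalar - just do it component-wise, e.g.

\[ \lambda \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1n} \\ \lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda a_{m1} & \lambda a_{m2} & \cdots & \lambda a_{mn} \end{pmatrix} \]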


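Matrix multiplication

We can also multiply two matrices in the following way:

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1r} \\ b_{21} & b_{22} & \cdots & b_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nr} \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mr} \end{pmatrix} \]

where

\[ c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}. \]

Rather than memorising this complicated looking formula, it’s much easier to just remember the method: multiply rows of the first matrix by columns of the second. For instance, consider the case where the first matrix is $1 \times 3$ and the second is $3 \times 1$:

\[ \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_1b_1 + a_2b_2 + a_3b_3 \end{pmatrix} \]

This is just the $1 \times 1$ matrix consisting of the dot product $\mathbf{a} \cdot \mathbf{b}$ of the vectors $\mathbf{a} = (a_1, a_2, a_3)$ and $\mathbf{b} = (b_1, b_2, b_3)$.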

More generally, the $(i, j)$-th component of $AB$ is the dot product of the $i$th row of $A$ with the $j$th column of $B$. Notice that this forces the length of rows of $A$ to be equal to the length of columns of $B$. In particular, we can multiply an $m \times n$ matrix by a $p \times r$ matrix precisely when $n = p$, and in that case, the resulting product is an $m \times r$ matrix.

For instance, we can multiply

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f & g \\ h & i & j \end{pmatrix} = \begin{pmatrix} ae + bh & af + bi & ag + bj \\ ce + dh & cf + di & cg + dj \end{pmatrix} \]

but multiplying in the opposite order doesn’t make sense.

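Even when $AB$ and $BA$ do both exist, it is important to note that they are not necessarily the same. For example,

\[ \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 6 & 9 \end{pmatrix} \quad\text{but}\quad \begin{pmatrix} 0 & 1 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 7 & 8 \end{pmatrix} \]

One very special matrix that we need some notation for is the identity matrix

\[ I = I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \]

This is the square $n \times n$ matrix with 1’s down the diagonal and 0’s everywhere else. It should be clear that for any matrix $A$ (of suitable size so that the products exist), we have $IA = A$ and $AI = A$.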
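The row-by-column rule, the failure of commutativity and the identity property can all be checked with a few lines of code. This is a minimal sketch using plain lists of rows; the function names `matmul` and `identity` are assumptions for illustration, not a reference to any particular library.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x r matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("row length of A must equal the number of rows of B")
    r = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(len(A))]


def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 0], [1, 2]]
B = [[0, 1], [3, 4]]
print(matmul(A, B))  # [[0, 1], [6, 9]]
print(matmul(B, A))  # [[1, 2], [7, 8]]
print(matmul(identity(2), A) == A and matmul(A, identity(2)) == A)  # True
```

Attempting `matmul` with a $2 \times 3$ matrix on the left and a $2 \times 2$ matrix on the right raises an error, matching the remark that such a product doesn't make sense.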

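A further operation on matrices that often comes up is transposition. The transpose $A^T$ of a matrix $A$ is formed by swapping the rows and columns:

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix} \]

In particular, this turns column vectors into row vectors and vice versa.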

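Matrices provide a very convenient setting for simultaneous equations such as the ones at the start of the chapter. The equations

\[ \begin{aligned} a_{11}x + a_{12}y + a_{13}z &= b_1 \\ a_{21}x + a_{22}y + a_{23}z &= b_2 \\ a_{31}x + a_{32}y + a_{33}z &= b_3 \end{aligned} \]

can be written in the form

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \]

or, more compactly,

\[ A\mathbf{x} = \mathbf{b}. \]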

3.2. Gaussian Elimination.

When solving simultaneous equations, e.g. finding the $x$, $y$ and $z$ satisfying

\[ \begin{aligned} a_{11}x + a_{12}y + a_{13}z &= b_1 \\ a_{21}x + a_{22}y + a_{23}z &= b_2 \\ a_{31}x + a_{32}y + a_{33}z &= b_3 \end{aligned} \]

there are essentially three things that we do: swap equations, multiply an equation by a non-zero constant, add a multiple of one equation to another. Notice that none of these three things alter the set of solutions. Gaussian elimination (sometimes called row reduction) is a methodical way of applying this fact to manipulate the equations into a form from which we can read off the solutions.

We first write down the augmented matrix

\[ \left(\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right) \]

We then attempt to apply three types of elementary row operations

• ($R_i \leftrightarrow R_j$): Swap rows $i$ and $j$,
• ($R_i \to cR_i$): Multiply the $i$th row by a non-zero constant $c$,
• ($R_i \to R_i + cR_j$): Add $c$ times the $j$th row to the $i$th row,

in some order until the augmented matrix is in the form

\[ \left(\begin{array}{ccc|c} 1 & \times & \times & \times \\ 0 & 1 & \times & \times \\ 0 & 0 & 1 & \times \end{array}\right) \]

(sometimes called upper triangular form). We can then read off the solution quickly.

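Example 3.1. Solve the system of equations

\[ \begin{aligned} 2x + 5y - 4z &= -13 \\ x + y + 2z &= 7 \\ x + 2y - z &= -3 \end{aligned} \]

We write down the augmented matrix and apply row operations:

\[ \left(\begin{array}{ccc|c} 2 & 5 & -4 & -13 \\ 1 & 1 & 2 & 7 \\ 1 & 2 & -1 & -3 \end{array}\right) \xrightarrow{R_1 \leftrightarrow R_3} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 1 & 1 & 2 & 7 \\ 2 & 5 & -4 & -13 \end{array}\right) \xrightarrow{R_2 \to R_2 - R_1} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & -1 & 3 & 10 \\ 2 & 5 & -4 & -13 \end{array}\right) \]

\[ \xrightarrow{R_3 \to R_3 - 2R_1} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & -1 & 3 & 10 \\ 0 & 1 & -2 & -7 \end{array}\right) \xrightarrow{R_2 \to -R_2} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & 1 & -3 & -10 \\ 0 & 1 & -2 & -7 \end{array}\right) \xrightarrow{R_3 \to R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & 1 & -3 & -10 \\ 0 & 0 & 1 & 3 \end{array}\right) \]

Notice that this means

\[ \begin{aligned} x + 2y - z &= -3 \\ y - 3z &= -10 \\ z &= 3 \end{aligned} \]

This can be solved by backward substitution: the third equation says $z = 3$, from the second equation we get $y = -10 + 3z = -1$ and the first equation tells us $x = -3 - 2y + z = 2$.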
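The procedure of Example 3.1 (reduce to upper triangular form with 1's on the diagonal, then substitute backwards) can be sketched as follows. This version assumes a square system with a unique solution, picks the first available non-zero pivot rather than making the hand-friendly choices used above, and works with exact fractions; the function name `gauss_solve` is an assumption for illustration.

```python
from fractions import Fraction


def gauss_solve(A, b):
    """Solve A x = b for a square system with a unique solution."""
    n = len(A)
    # Augmented matrix with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Ri <-> Rj: bring a row with a non-zero entry in this column into place.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Ri -> c Ri: scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Ri -> Ri + c Rj: clear the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col]
            M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    # Backward substitution on the upper triangular form.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x


solution = gauss_solve([[2, 5, -4], [1, 1, 2], [1, 2, -1]], [-13, 7, -3])
print([str(v) for v in solution])  # ['2', '-1', '3']
```

Different pivot choices lead through different intermediate matrices but, for a system with a unique solution, to the same answer.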


Remark 3.2. Notice there were lots of different choices we could have made at each step. You can think of the process as a game (like e.g. Sudoku) where you make moves in various orders. Some moves may be preferable for simplicity, e.g. avoiding fractions if you can, but no matter which choices you make, you always end up at the same place. Indeed, later in the course, we’ll see how some choices are better for computer implementations since they reduce rounding errors.

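Remark 3.3. It’s also sometimes possible to take the row reduction process further so that the augmented matrix is in the form

\[ \left(\begin{array}{ccc|c} 1 & 0 & 0 & \times \\ 0 & 1 & 0 & \times \\ 0 & 0 & 1 & \times \end{array}\right) \]

Then it’s really easy to write down the solution - it’s just the final column. For instance, continuing the last example in this way:

\[ \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & 1 & -3 & -10 \\ 0 & 0 & 1 & 3 \end{array}\right) \xrightarrow{R_2 \to R_2 + 3R_3} \left(\begin{array}{ccc|c} 1 & 2 & -1 & -3 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{array}\right) \xrightarrow{R_1 \to R_1 + R_3} \left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{array}\right) \xrightarrow{R_1 \to R_1 - 2R_2} \left(\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{array}\right) \]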


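Example 3.4. Solve the system of equations

\[ \begin{aligned} x + 2y + 3z &= 1 \\ 3x + 2y + z &= 2 \\ x + 5y + 5z &= 3 \end{aligned} \]

We write down the augmented matrix and apply row operations:

\[ \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 2 & 1 & 2 \\ 1 & 5 & 5 & 3 \end{array}\right) \xrightarrow[R_3 \to R_3 - R_1]{R_2 \to R_2 - 3R_1} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & -4 & -8 & -1 \\ 0 & 3 & 2 & 2 \end{array}\right) \xrightarrow{R_2 \to -R_2/4} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1/4 \\ 0 & 3 & 2 & 2 \end{array}\right) \]

Notice here, we did two operations in one go. This is okay as long as you don’t operate on the same row twice in a step. (If you’re not sure, just do one operation at a time.) Continuing,

\[ \xrightarrow{R_3 \to R_3 - 3R_2} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1/4 \\ 0 & 0 & -4 & 5/4 \end{array}\right) \xrightarrow{R_3 \to -R_3/4} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1/4 \\ 0 & 0 & 1 & -5/16 \end{array}\right) \]

We can now use backward substitution:

\[ \begin{aligned} z &= -5/16 \\ y + 2z = 1/4 &\implies y = 1/4 + 10/16 = 7/8 \\ x + 2y + 3z = 1 &\implies x = 1 - 7/4 + 15/16 = 3/16 \end{aligned} \]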


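Example 3.5. Solve the system of equations

\[ \begin{aligned} x + 2y + 3z &= 1 \\ 4x + 5y + 6z &= 1 \\ 7x + 8y + 9z &= -5 \end{aligned} \]

We write down the augmented matrix and apply row operations:

\[ \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \\ 7 & 8 & 9 & -5 \end{array}\right) \xrightarrow[R_3 \to R_3 - 7R_1]{R_2 \to R_2 - 4R_1} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & -3 & -6 & -3 \\ 0 & -6 & -12 & -12 \end{array}\right) \xrightarrow[R_3 \to -R_3/6]{R_2 \to -R_2/3} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 2 \end{array}\right) \xrightarrow{R_3 \to R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array}\right) \]

Something has gone wrong! There’s no way we are going to get this into the required form. However, if we look at the equations we’ve been left with,

\[ \begin{aligned} x + 2y + 3z &= 1 \\ y + 2z &= 1 \\ 0 &= 1 \end{aligned} \]

it doesn’t matter what $x$, $y$ and $z$ we take, there is no way that $0 = 1$. There is no solution - the equations are inconsistent.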


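Example 3.6. Solve the system of equations

\[ \begin{aligned} x + 2y + 3z &= 1 \\ 4x + 5y + 6z &= 1 \\ 7x + 8y + 9z &= 1 \end{aligned} \]

Note: this is the same as the last example except that the $-5$ in the last equation is now $1$. Thus, doing the same row operations gives

\[ \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \\ 7 & 8 & 9 & 1 \end{array}\right) \xrightarrow[R_3 \to R_3 - 7R_1]{R_2 \to R_2 - 4R_1} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & -3 & -6 & -3 \\ 0 & -6 & -12 & -6 \end{array}\right) \xrightarrow[R_3 \to -R_3/6]{R_2 \to -R_2/3} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 1 \end{array}\right) \xrightarrow{R_3 \to R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right) \]

This time,

\[ \begin{aligned} x + 2y + 3z &= 1 \\ y + 2z &= 1 \\ 0 &= 0 \end{aligned} \]

so there are only really two equations here. In fact, if we let $z = \lambda$, then

\[ \begin{aligned} z &= \lambda \\ y &= 1 - 2\lambda \\ x &= 1 - 2y - 3z = 1 - 2(1 - 2\lambda) - 3\lambda = -1 + \lambda \end{aligned} \quad\implies\quad \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + \lambda \\ 1 - 2\lambda \\ \lambda \end{pmatrix} \]

which is the parametric equation of a line. There are infinitely many solutions (and furthermore, we’ve found them all).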

Gaussian elimination doesn’t require that there are the same number of variables as equations:


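Example 3.7. Solve the system of equations

\[ \begin{aligned} x + 2y + 3z &= 1 \\ 4x + 5y + 6z &= 1 \end{aligned} \]

This is just the first two equations from the previous example.

\[ \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \end{array}\right) \xrightarrow{R_2 \to R_2 - 4R_1} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & -3 & -6 & -3 \end{array}\right) \xrightarrow{R_2 \to -R_2/3} \left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \end{array}\right) \]

This corresponds to the two equations

\[ \begin{aligned} x + 2y + 3z &= 1 \\ y + 2z &= 1 \end{aligned} \]

As in the previous example, setting $z = \lambda$ gives the same result:

\[ \begin{aligned} z &= \lambda \\ y &= 1 - 2\lambda \\ x &= 1 - 2y - 3z = 1 - 2(1 - 2\lambda) - 3\lambda = -1 + \lambda \end{aligned} \quad\implies\quad \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + \lambda \\ 1 - 2\lambda \\ \lambda \end{pmatrix} \]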

Remark 3.8. We see with these examples that there are three possibilities:

• If Gaussian elimination produces an augmented matrix of the form

\[ \left(\begin{array}{ccccc|c} 1 & \times & \times & \cdots & \times & \times \\ 0 & 1 & \times & \cdots & \times & \times \\ 0 & 0 & 1 & \cdots & \times & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \times \end{array}\right) \]

then we can use backward substitution to find a unique solution. In this case, there must be the same number of equations as variables.
• If we end up with a row that looks like $(0\ 0\ \cdots\ 0 \mid \times)$ where the $\times$ is non-zero, then the equations are inconsistent and there are no solutions.
• Otherwise, there are infinitely many solutions. By parametrising, we can write these in the form of a line (or a plane or a hyperplane...)
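The three possibilities in Remark 3.8 can be detected mechanically by row-reducing the augmented matrix and inspecting the result. The sketch below is one way to do this under the assumption of exact rational arithmetic; the function name `classify_system` and its return strings are illustrative choices.

```python
from fractions import Fraction


def classify_system(A, b):
    """Return 'unique', 'none' or 'infinite' for the linear system A x = b."""
    rows, cols = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    rank = 0  # number of pivots found so far
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it corresponds to a free parameter
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, rows):
            factor = M[r][col] / M[rank][col]
            M[r] = [v - factor * w for v, w in zip(M[r], M[rank])]
        rank += 1
        if rank == rows:
            break
    # A row (0 0 ... 0 | non-zero) means the equations are inconsistent.
    for row in M:
        if all(v == 0 for v in row[:cols]) and row[cols] != 0:
            return "none"
    return "unique" if rank == cols else "infinite"


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(classify_system(A, [1, 1, -5]))  # none
print(classify_system(A, [1, 1, 1]))   # infinite
print(classify_system([[2, 5, -4], [1, 1, 2], [1, 2, -1]], [-13, 7, -3]))  # unique
```

With floating-point entries the test `!= 0` would need to be replaced by a tolerance, which is one reason exact fractions are used here.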

We can also think about this geometrically. For the $3 \times 3$ cases we did above, each equation of the form

\[ ax + by + cz = d \]

is a plane in $\mathbb{R}^3$. Since we are looking for common solutions to three of these, we are really finding the points where 3 planes intersect. Now consider how this can happen:

• They intersect in a unique point.
• They don’t intersect at all, e.g. one plane could be parallel to another.
• They intersect in infinitely many points, e.g. in a line.

In the next section, we’ll see a quicker way of telling whether there is a unique solution or not, using the determinant.

3.3. Determinants and Inverses.

[Warning: the results in this section apply to square matrices only - determinants and inverses of non-square matrices don’t exist.]

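Suppose we wish to solve two simultaneous equations

\[ \begin{aligned} ax + by &= u \\ cx + dy &= v \end{aligned} \]

In matrix form, this means: find $\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}$ so that $A\mathbf{x} = \mathbf{u}$ where

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad \mathbf{u} = \begin{pmatrix} u \\ v \end{pmatrix} \]

are given. If we multiply the first equation by $d$, the second by $b$ and subtract the second from the first we get

\[ (ad - bc)x = du - bv. \]

Similarly, we can find $(ad - bc)y = -cu + av$ and so

\[ (ad - bc)\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} du - bv \\ -cu + av \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \]

Hence, as long as $ad - bc \neq 0$, there is a unique solution

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \]

If $ad = bc$, then one can show (e.g. via Gaussian elimination as in the last section) that there is not a unique solution, i.e. no solutions or infinitely many solutions. The number $ad - bc$ is the determinant of $A$:

\[ \det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]

Furthermore, when $\det(A) \neq 0$, the matrix $A$ has an inverse, i.e. a matrix $B$ with $AB = BA = I$, namely

\[ A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

When $\det(A) = 0$, there is no such matrix - then $A$ is not invertible.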


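Similar things hold for larger square matrices, though the formulas are more complicated. For instance, you may recall the $3 \times 3$ determinant:

\[ \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg) \]

Notice that the first $2 \times 2$ determinant here is what’s left when you remove the row and column containing $a$ in the $3 \times 3$ determinant. Similarly for $b$ and $c$ and then we form the alternating sum as shown.

Similarly, we can define the $4 \times 4$ determinants in terms of 4 smaller ones:

\[ \begin{vmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{vmatrix} = a\begin{vmatrix} f & g & h \\ j & k & l \\ n & o & p \end{vmatrix} - b\begin{vmatrix} e & g & h \\ i & k & l \\ m & o & p \end{vmatrix} + c\begin{vmatrix} e & f & h \\ i & j & l \\ m & n & p \end{vmatrix} - d\begin{vmatrix} e & f & g \\ i & j & k \\ m & n & o \end{vmatrix} \]

Hopefully you can see the general pattern.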

Rather than work directly with cumbersome formulas such as these, we can use something like Gaussian elimination to simplify calculations.

Facts about determinants:

• Swapping rows $i$ and $j$ ($R_i \leftrightarrow R_j$) multiplies the determinant by $-1$.
• Multiplying the $i$th row by a constant $c$ ($R_i \to cR_i$) multiplies the determinant by $c$.
• Adding $c$ times the $j$th row to the $i$th row ($R_i \to R_i + cR_j$) doesn’t affect the determinant.
• Transposition doesn’t affect the determinant: $\det(A^T) = \det(A)$. As a result, the three previous facts apply to column operations in just the same way.

In particular, we see that if there is a row of zeroes, then the determinant is zero and if two rows are the same then again the determinant is zero. Also, we can often simplify calculations by using row/column operations to obtain a row/column with only one non-zero entry. For instance,

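Example 3.9. Find the determinant of the matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \]

We have

\[ \det(A) = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} \overset{R_2 - 4R_1,\ R_3 - 7R_1}{=} \begin{vmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{vmatrix} \overset{\text{transpose}}{=} \begin{vmatrix} 1 & 0 & 0 \\ 2 & -3 & -6 \\ 3 & -6 & -12 \end{vmatrix} = \begin{vmatrix} -3 & -6 \\ -6 & -12 \end{vmatrix} = (-3)(-12) - (-6)(-6) = 0. \]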
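The facts about determinants listed above suggest a simple algorithm: reduce the matrix to upper triangular form using only row swaps and operations of the form $R_i \to R_i + cR_j$, keep track of the sign changes from swaps, and multiply the diagonal entries. The following is a sketch of that idea with exact fractions; the function name `det` is an assumption for illustration.

```python
from fractions import Fraction


def det(A):
    """Determinant of a square matrix via row operations."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    sign = 1
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # The triangular form would have a zero on the diagonal.
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign  # a row swap multiplies the determinant by -1
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Ri -> Ri - factor * Rcol leaves the determinant unchanged.
            M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
        result *= M[col][col]
    return sign * result


print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
print(det([[1, 2], [3, 4]]))                   # -2
```

For large matrices this takes far fewer arithmetic operations than expanding along rows as in the $3 \times 3$ and $4 \times 4$ formulas.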

We can also use Gaussian elimination (i.e. row operations) to find the inverse of a square matrix.

Definition 3.10. The inverse of a square matrix $A$ is a matrix $A^{-1}$ such that

\[ AA^{-1} = A^{-1}A = I. \]

The determinant tells us exactly when such an inverse exists by the following important theorem (which we state without proof).

Theorem 3.11. A square matrix $A$ is invertible if and only if $\det(A) \neq 0$.

To apply Gaussian elimination, we use row operations to transform a ‘big’ augmented matrix:

Method: transform $\left(\, A \mid I \,\right)$ into the form $\left(\, I \mid B \,\right)$.

Then $B = A^{-1}$ is the inverse we are looking for. If the process fails, e.g. by obtaining a row of zeros, then the inverse doesn’t exist (and hence $\det(A) = 0$).


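Example 3.12. Find the inverse of the matrix

\[ A = \begin{pmatrix} 2 & 5 & -4 \\ 1 & 1 & 2 \\ 1 & 2 & -1 \end{pmatrix} \]

We write down the ‘big’ augmented matrix and apply row operations:

\[ \left(\begin{array}{ccc|ccc} 2 & 5 & -4 & 1 & 0 & 0 \\ 1 & 1 & 2 & 0 & 1 & 0 \\ 1 & 2 & -1 & 0 & 0 & 1 \end{array}\right) \xrightarrow{R_1 \leftrightarrow R_3} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 0 & 0 & 1 \\ 1 & 1 & 2 & 0 & 1 & 0 \\ 2 & 5 & -4 & 1 & 0 & 0 \end{array}\right) \xrightarrow[R_3 \to R_3 - 2R_1]{R_2 \to R_2 - R_1} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 0 & 0 & 1 \\ 0 & -1 & 3 & 0 & 1 & -1 \\ 0 & 1 & -2 & 1 & 0 & -2 \end{array}\right) \]

\[ \xrightarrow{R_2 \to -R_2} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 0 & 0 & 1 \\ 0 & 1 & -3 & 0 & -1 & 1 \\ 0 & 1 & -2 & 1 & 0 & -2 \end{array}\right) \xrightarrow{R_3 \to R_3 - R_2} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 0 & 0 & 1 \\ 0 & 1 & -3 & 0 & -1 & 1 \\ 0 & 0 & 1 & 1 & 1 & -3 \end{array}\right) \]

\[ \xrightarrow[R_1 \to R_1 + R_3]{R_2 \to R_2 + 3R_3} \left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 1 & 1 & -2 \\ 0 & 1 & 0 & 3 & 2 & -8 \\ 0 & 0 & 1 & 1 & 1 & -3 \end{array}\right) \xrightarrow{R_1 \to R_1 - 2R_2} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -5 & -3 & 14 \\ 0 & 1 & 0 & 3 & 2 & -8 \\ 0 & 0 & 1 & 1 & 1 & -3 \end{array}\right) \]

Hence the inverse is

\[ A^{-1} = \begin{pmatrix} -5 & -3 & 14 \\ 3 & 2 & -8 \\ 1 & 1 & -3 \end{pmatrix} \]

You can easily check that $AA^{-1} = A^{-1}A = I$.

Once we have the inverse, it makes solving simultaneous equations simple, e.g. recall the first example from this chapter: solve the system of equations

\[ \begin{aligned} 2x + 5y - 4z &= -13 \\ x + y + 2z &= 7 \\ x + 2y - z &= -3 \end{aligned} \]

In matrix form, this says $A\mathbf{x} = \mathbf{b}$ with the above matrix $A$ and $\mathbf{b} = \begin{pmatrix} -13 \\ 7 \\ -3 \end{pmatrix}$. Then,

\[ A\mathbf{x} = \mathbf{b} \implies \mathbf{x} = A^{-1}\mathbf{b} = \begin{pmatrix} -5 & -3 & 14 \\ 3 & 2 & -8 \\ 1 & 1 & -3 \end{pmatrix} \begin{pmatrix} -13 \\ 7 \\ -3 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}. \]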
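The method of transforming $(A \mid I)$ into $(I \mid B)$ translates directly into code. The sketch below is one possible implementation with exact fractions; it chooses pivots automatically rather than following the hand calculation step by step, and the function name `inverse` is an assumption for illustration.

```python
from fractions import Fraction


def inverse(A):
    """Inverse of a square matrix by row-reducing (A | I) to (I | B)."""
    n = len(A)
    # Build the 'big' augmented matrix (A | I).
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear every other entry in this column, above and below the pivot.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    # The right-hand block is now the inverse.
    return [row[n:] for row in M]


A_inv = inverse([[2, 5, -4], [1, 1, 2], [1, 2, -1]])
print([[str(v) for v in row] for row in A_inv])
```

Multiplying the result by $\mathbf{b} = (-13, 7, -3)^T$ reproduces the solution $(2, -1, 3)^T$ found above.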

3.4. LU decomposition.

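Sometimes, we may have to solve sets of simultaneous equations $A\mathbf{x} = \mathbf{b}$ with the same square matrix $A$ and a number of different $\mathbf{b}$. For very large matrices, computing the inverse $A^{-1}$ is very slow. There is a more efficient method, based on the idea of factorising $A = LU$ as a product of a lower triangular matrix $L$ and an upper triangular matrix $U$:

\[ L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \times & 1 & 0 & \cdots & 0 \\ \times & \times & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \times & \times & \times & \cdots & 1 \end{pmatrix} \quad\text{and}\quad U = \begin{pmatrix} \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \cdots & \times \\ 0 & 0 & \times & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \times \end{pmatrix} \]

If we can do this, then we will be able to find $\mathbf{x}$ quickly using backward substitution and forward substitution.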


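The main idea is that the row operations on $A$ used in Gaussian elimination actually correspond to multiplying $A$ by some very simple matrices. For instance, let’s transform the following matrix $A$ into an upper triangular matrix $U$.

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 6 & 10 \\ -3 & 0 & 6 \end{pmatrix} \]

\[ R_2 \to R_2 - 2R_1: \qquad EA = \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{E} \underbrace{\begin{pmatrix} 1 & 2 & 3 \\ 2 & 6 & 10 \\ -3 & 0 & 6 \end{pmatrix}}_{A} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ -3 & 0 & 6 \end{pmatrix} \]

\[ R_3 \to R_3 + 3R_1: \qquad FEA = \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}}_{F} \underbrace{\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ -3 & 0 & 6 \end{pmatrix}}_{EA} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 6 & 15 \end{pmatrix} \]

\[ R_3 \to R_3 - 3R_2: \qquad U = GFEA = \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}}_{G} \underbrace{\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 6 & 15 \end{pmatrix}}_{FEA} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{pmatrix} \]

Generally, the elementary row operation $R_i \to R_i + cR_j$ corresponds to multiplying (on the left) by the elementary matrix which has 1’s on the diagonal, $c$ in the $(i, j)$th position and 0’s everywhere else.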

We now have

GFEA = U =⇒ FEA = G−1U=⇒ EA = F−1G−1U=⇒ A = E−1F−1G−1U

(Note here we are being careful to multiply matrices in the correct way, remembering matrix multipli-cation is not commutative.)

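It’s easy to find the inverses of $E$, $F$ and $G$: just change the sign of the non-zero off-diagonal entry, e.g.

\[
E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad\text{since}\quad
EE^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]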

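Hence $A = LU$ where

\[
\begin{aligned}
L = E^{-1}F^{-1}G^{-1}
&= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 3 & 1 \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & 3 & 1 \end{pmatrix}
\end{aligned}
\]

which is lower triangular as required.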
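The product above, and the warning about the order of multiplication, can be checked with a short sketch. The variable names (`E_inv`, `wrong_order` and so on) are illustrative only.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 3], [2, 6, 10], [-3, 0, 6]]
U = [[1, 2, 3], [0, 2, 4], [0, 0, 3]]

# Inverses of E, F, G: flip the sign of the off-diagonal entry.
E_inv = [[1, 0, 0], [2, 1, 0], [0, 0, 1]]
F_inv = [[1, 0, 0], [0, 1, 0], [-3, 0, 1]]
G_inv = [[1, 0, 0], [0, 1, 0], [0, 3, 1]]

L = matmul(E_inv, matmul(F_inv, G_inv))
print(L)                     # [[1, 0, 0], [2, 1, 0], [-3, 3, 1]]
print(matmul(L, U) == A)     # True

# Multiplying the same three matrices in the opposite order gives a
# different matrix, so the order really does matter.
wrong_order = matmul(G_inv, matmul(F_inv, E_inv))
print(wrong_order)           # [[1, 0, 0], [2, 1, 0], [3, 3, 1]]
```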


Note: In an LU decomposition, the lower triangular matrix $L$ has 1’s along the diagonal but $U$ doesn’t need to.

Once we have the factorisation A = LU , we can quickly solve Ax = b in the following way:

• Use forward substitution to find $u$ satisfying $Lu = b$,
• Use backward substitution to find $x$ satisfying $Ux = u$.

Then Ax = LUx = Lu = b and so x is the required solution.

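Example 3.13. Use LU decomposition to solve

\[
\begin{aligned}
x + 2y + 3z &= 5 \\
2x + 6y + 10z &= 16 \\
-3x + 6z &= 9
\end{aligned}
\]

The matrix of coefficients is the matrix $A$ we worked with above, so we already have the LU decomposition:

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 6 & 10 \\ -3 & 0 & 6 \end{pmatrix}
= LU = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{pmatrix}
\]

Now solve $Lu = b$, i.e.

\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & 3 & 1 \end{pmatrix}
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= \begin{pmatrix} 5 \\ 16 \\ 9 \end{pmatrix}
\]

by forward substitution:

\[
\begin{aligned}
u &= 5 \\
2u + v &= 16 \\
-3u + 3v + w &= 9
\end{aligned}
\implies
\begin{aligned}
u &= 5 \\
v &= 16 - 2u = 6 \\
w &= 9 + 3u - 3v = 6
\end{aligned}
\]

Finally, solve $Ux = u$, i.e.

\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 5 \\ 6 \\ 6 \end{pmatrix}
\]

by backward substitution:

\[
\begin{aligned}
x + 2y + 3z &= 5 \\
2y + 4z &= 6 \\
3z &= 6
\end{aligned}
\implies
\begin{aligned}
x &= 5 - 2y - 3z = 1 \\
y &= \frac{6 - 4z}{2} = -1 \\
z &= 2
\end{aligned}
\]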
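The two substitution steps of Example 3.13 can be sketched in code as follows. This is a minimal illustration rather than a robust solver: it assumes the diagonal entries of $L$ and $U$ are non-zero, uses exact fractions to avoid rounding, and the function names are our own.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L u = b for a lower triangular L, working from the top row down."""
    n = len(b)
    u = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][k] * u[k] for k in range(i))
        u[i] = Fraction(b[i] - known) / L[i][i]
    return u

def backward_substitution(U, u):
    """Solve U x = u for an upper triangular U, working from the bottom row up."""
    n = len(u)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = Fraction(u[i] - known) / U[i][i]
    return x

L = [[1, 0, 0], [2, 1, 0], [-3, 3, 1]]
U = [[1, 2, 3], [0, 2, 4], [0, 0, 3]]
b = [5, 16, 9]

u = forward_substitution(L, b)
x = backward_substitution(U, u)
print(u == [5, 6, 6], x == [1, -1, 2])   # True True
```

For a new right-hand side $b$, only these two cheap steps need repeating; the factorisation $A = LU$ is reused.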

Notice that it’s possible to write down the matrix $L$ as we perform the row operations - when we do $R_i \to R_i + cR_j$, we just put $-c$ in the corresponding $(i, j)$th position of $L$. We’ll demonstrate this with a $4 \times 4$ matrix.

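Example 3.14. Find the LU decomposition of the matrix

\[
A = \begin{pmatrix} 2 & -3 & 4 & 1 \\ 4 & -3 & 10 & -2 \\ 6 & -15 & 7 & 13 \\ -2 & 9 & 3 & -17 \end{pmatrix}
\]

We fill in $L$ as we go; $\times$ marks an entry that has not yet been determined.

\[
A = \begin{pmatrix} 2 & -3 & 4 & 1 \\ 4 & -3 & 10 & -2 \\ 6 & -15 & 7 & 13 \\ -2 & 9 & 3 & -17 \end{pmatrix}
\xrightarrow{\substack{R_2 \to R_2 - 2R_1 \\ R_3 \to R_3 - 3R_1 \\ R_4 \to R_4 + R_1}}
\begin{pmatrix} 2 & -3 & 4 & 1 \\ 0 & 3 & 2 & -4 \\ 0 & -6 & -5 & 10 \\ 0 & 6 & 7 & -16 \end{pmatrix},
\qquad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & \times & 1 & 0 \\ -1 & \times & \times & 1 \end{pmatrix}
\]

\[
\xrightarrow{\substack{R_3 \to R_3 + 2R_2 \\ R_4 \to R_4 - 2R_2}}
\begin{pmatrix} 2 & -3 & 4 & 1 \\ 0 & 3 & 2 & -4 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & 3 & -8 \end{pmatrix},
\qquad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & -2 & 1 & 0 \\ -1 & 2 & \times & 1 \end{pmatrix}
\]

\[
\xrightarrow{R_4 \to R_4 + 3R_3}
U = \begin{pmatrix} 2 & -3 & 4 & 1 \\ 0 & 3 & 2 & -4 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & 0 & -2 \end{pmatrix},
\qquad
L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & -2 & 1 & 0 \\ -1 & 2 & -3 & 1 \end{pmatrix}
\]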
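The bookkeeping in Example 3.14 can be written as a short routine. The sketch below is one possible implementation, assuming that no zero pivot is met along the way (so no row swaps are needed); the name `lu_decompose` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def lu_decompose(A):
    """Return (L, U) with A = L U, using only operations R_i -> R_i + c R_j.

    Assumes every pivot is non-zero; raises ValueError otherwise.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        if U[j][j] == 0:
            raise ValueError("zero pivot: row swaps would be needed")
        for i in range(j + 1, n):
            c = -U[i][j] / U[j][j]                     # R_i -> R_i + c R_j
            U[i] = [a + c * p for a, p in zip(U[i], U[j])]
            L[i][j] = -c                               # put -c in position (i, j) of L
    return L, U

A = [[2, -3, 4, 1],
     [4, -3, 10, -2],
     [6, -15, 7, 13],
     [-2, 9, 3, -17]]

L, U = lu_decompose(A)
print(L == [[1, 0, 0, 0], [2, 1, 0, 0], [3, -2, 1, 0], [-1, 2, -3, 1]])   # True
print(U == [[2, -3, 4, 1], [0, 3, 2, -4], [0, 0, -1, 2], [0, 0, 0, -2]])  # True
```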

3.5. Strictly Diagonally Dominant and Positive Definite matrices.

In the previous section, we didn’t consider whether or not it is actually possible to factorise $A = LU$. In general, answering this is complicated. However, there are two special types of matrices for which LU decompositions always exist - strictly diagonally dominant and positive definite matrices.

Definition 3.15. Let $A$ be the $n \times n$ matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

We say $A$ is strictly diagonally dominant (s.d.d. for short) if for each $1 \le i \le n$,

\[
|a_{ii}| > |a_{i1}| + \cdots + |a_{i,i-1}| + |a_{i,i+1}| + \cdots + |a_{in}|
\]

In other words, in each row, the diagonal element is strictly larger in absolute value than the sum of the absolute values of the rest of the row.

Theorem 3.16. Let $A$ be a strictly diagonally dominant $n \times n$ matrix. Then $\det(A) \neq 0$ and $A$ has a (unique) LU decomposition.

Example 3.17. Consider the matrices

\[
A = \begin{pmatrix} -4 & 1 & 2 \\ 1 & -3 & 1 \\ -3 & 0 & 6 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 2 & -1 & 1 \\ -1 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}
\]

Then $A$ is s.d.d. since $|-4| > |1| + |2|$, $|-3| > |1| + |1|$ and $|6| > |-3| + |0|$.
However, $B$ isn’t s.d.d. since, for example in the third row, $|2| \le |1| + |2|$.

We can conclude that $A$ has an LU decomposition (though we can say nothing about $B$ from this).
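The row-by-row test in Definition 3.15 is easy to automate. The sketch below is a direct transcription of the definition; the function name is our own choice.

```python
def is_strictly_diagonally_dominant(M):
    """Check |m_ii| > sum of |m_ij| over j != i, for every row i."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(M)
    )

A = [[-4, 1, 2], [1, -3, 1], [-3, 0, 6]]
B = [[2, -1, 1], [-1, 5, 2], [1, 2, 2]]

print(is_strictly_diagonally_dominant(A))   # True
print(is_strictly_diagonally_dominant(B))   # False
```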

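Definition 3.18. Let $A$ be the $n \times n$ matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

We say $A$ is positive definite if it is symmetric, i.e. $a_{ij} = a_{ji}$ for all $i, j$, and

\[
x^T A x > 0 \quad\text{for all vectors } x \neq 0.
\]

Theorem 3.19. Let $A$ be a positive definite $n \times n$ matrix. Then $\det(A) \neq 0$ and $A$ has a (unique) LU decomposition.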

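Example 3.20. Consider the matrices

\[
A = \begin{pmatrix} -4 & 1 & 2 \\ 1 & -3 & 1 \\ -3 & 0 & 6 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 2 & -1 & 1 \\ -1 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}
\]

The matrix $A$ isn’t positive definite since it isn’t even symmetric. On the other hand, $B$ is symmetric and also,

\[
\begin{aligned}
x^T B x &= \begin{pmatrix} x & y & z \end{pmatrix}
\begin{pmatrix} 2 & -1 & 1 \\ -1 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x & y & z \end{pmatrix}
\begin{pmatrix} 2x - y + z \\ -x + 5y + 2z \\ x + 2y + 2z \end{pmatrix} \\
&= x(2x - y + z) + y(-x + 5y + 2z) + z(x + 2y + 2z) \\
&= 2x^2 + 5y^2 + 2z^2 - 2xy + 4yz + 2zx \\
&= (x - y)^2 + x^2 + 4y^2 + 2z^2 + 4yz + 2zx \\
&= (x - y)^2 + (x + z)^2 + 4y^2 + z^2 + 4yz \\
&= (x - y)^2 + (x + z)^2 + (2y + z)^2
\end{aligned}
\]

This is a sum of squares so is non-negative. Furthermore, it’s equal to zero only when

\[
\begin{aligned}
x - y &= 0 \\
x + z &= 0 \\
2y + z &= 0
\end{aligned}
\implies x = y = z = 0.
\]

Hence $B$ is positive definite.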
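The algebra in this example can be spot-checked numerically. The sketch below compares $x^T B x$ with the sum of squares at every integer point in a small cube; this is only a sanity check on sample points, not a proof, and the function names are illustrative.

```python
from itertools import product

B = [[2, -1, 1], [-1, 5, 2], [1, 2, 2]]

def quadratic_form(M, v):
    """Return v^T M v for a square matrix M and a vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def sum_of_squares(x, y, z):
    """The completed-square form derived in Example 3.20."""
    return (x - y) ** 2 + (x + z) ** 2 + (2 * y + z) ** 2

# All integer points (x, y, z) with each coordinate between -3 and 3.
grid = list(product(range(-3, 4), repeat=3))

print(all(quadratic_form(B, v) == sum_of_squares(*v) for v in grid))   # True
print(all(quadratic_form(B, v) > 0 for v in grid if v != (0, 0, 0)))   # True
```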

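This example gives a way of showing a matrix is positive definite. An alternative way which is often easier is to apply the following:

Theorem 3.21. A symmetric matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

is positive definite if and only if each of the determinants (called the leading principal minors of $A$)

\[
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kk}
\end{vmatrix} > 0
\quad\text{for } 1 \le k \le n.
\]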


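Example 3.22. For the matrix

\[
B = \begin{pmatrix} 2 & -1 & 1 \\ -1 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix},
\]

the leading principal minors are

\[
2 > 0, \qquad
\begin{vmatrix} 2 & -1 \\ -1 & 5 \end{vmatrix} = 2(5) - (-1)(-1) = 9 > 0
\]

and

\[
\begin{aligned}
\begin{vmatrix} 2 & -1 & 1 \\ -1 & 5 & 2 \\ 1 & 2 & 2 \end{vmatrix}
&= 2 \begin{vmatrix} 5 & 2 \\ 2 & 2 \end{vmatrix}
- (-1) \begin{vmatrix} -1 & 2 \\ 1 & 2 \end{vmatrix}
+ \begin{vmatrix} -1 & 5 \\ 1 & 2 \end{vmatrix} \\
&= 2(10 - 4) + (-2 - 2) + (-2 - 5) = 1 > 0
\end{aligned}
\]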
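The determinant test can be sketched in code as follows. Determinants are computed by cofactor expansion along the first row, as in the calculation above; this is fine for small matrices but far too slow for large ones. The function names are illustrative.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Delete row 1 and column j+1 to get the smaller matrix.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_minors(M):
    """Determinants of the top-left k x k blocks, for k = 1, ..., n."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

def is_positive_definite(M):
    """Symmetric, and every top-left k x k determinant is positive."""
    n = len(M)
    symmetric = all(M[i][j] == M[j][i] for i in range(n) for j in range(n))
    return symmetric and all(d > 0 for d in leading_minors(M))

A = [[-4, 1, 2], [1, -3, 1], [-3, 0, 6]]
B = [[2, -1, 1], [-1, 5, 2], [1, 2, 2]]

print(leading_minors(B))          # [2, 9, 1]
print(is_positive_definite(B))    # True
print(is_positive_definite(A))    # False (A is not symmetric)
```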

3.6. PTLU decomposition.

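There are some matrices for which there is no LU decomposition. For instance, consider the matrix

\[
A = \begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 1 \\ 2 & 6 & 4 \end{pmatrix}
\]

Suppose we could factorise $A = LU$, e.g.

\[
A = \begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 1 \\ 2 & 6 & 4 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix}
\begin{pmatrix} d & e & f \\ 0 & g & h \\ 0 & 0 & i \end{pmatrix}
= \begin{pmatrix} d & \times & \times \\ ad & \times & \times \\ bd & \times & \times \end{pmatrix}
\]

Comparing first columns, we must have $d = 0$, but then we can’t also have $ad = 1$, a contradiction.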

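In constructing LU decompositions we only used row operations of type $R_i \to R_i + cR_j$. We can extend the process by including swapping rows $R_i \leftrightarrow R_j$. This operation corresponds to multiplication by another very simple type of matrix $P_{ij}$ - this is obtained by swapping the $i$th and $j$th rows of the identity matrix. For instance, to swap rows 1 and 2 in the matrix $A$ above:

\[
A = \begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 1 \\ 2 & 6 & 4 \end{pmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \\ 2 & 6 & 4 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 1 \\ 2 & 6 & 4 \end{pmatrix}
= P_{12}A
\]

We can now try to make $P_{12}A$ upper triangular, starting with the row operation $R_3 \to R_3 - 2R_1$:

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \\ 2 & 6 & 4 \end{pmatrix}
\xrightarrow{R_3 \to R_3 - 2R_1}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \\ 0 & 2 & 2 \end{pmatrix}
\]

We can make this upper triangular by swapping rows $R_2 \leftrightarrow R_3$, which corresponds to multiplying by another permutation matrix $P_{23}$:

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \\ 0 & 2 & 2 \end{pmatrix}
\xrightarrow{R_2 \leftrightarrow R_3}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \\ 0 & 2 & 2 \end{pmatrix}
\]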

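What we find is that the matrix $PA$, where

\[
P = P_{23}P_{12}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\]

is a matrix with an LU decomposition. Indeed,

\[
PA = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 1 \\ 2 & 6 & 4 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 4 \\ 0 & 0 & 3 \end{pmatrix}
\xrightarrow{R_2 \to R_2 - 2R_1}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}
\]

and so $PA = LU$ where

\[
P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\quad
L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}
\]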

The matrix $P$ is called a permutation matrix - it has a single 1 in each row and in each column and 0’s everywhere else. These have the very useful property that to find their inverses, we just take the transpose: $P^{-1} = P^T$. In particular, that means that our decomposition above says

\[
A = P^{-1}LU = P^TLU.
\]
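The property $P^{-1} = P^T$ can be checked for the permutation matrix of this section with a short sketch. The helper names (`swap_matrix`, `transpose`, `matmul`) are illustrative, and rows are numbered from 1 to match $P_{ij}$.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def swap_matrix(n, i, j):
    """P_ij: the n x n identity matrix with rows i and j swapped."""
    P = [[int(r == s) for s in range(n)] for r in range(n)]
    P[i - 1], P[j - 1] = P[j - 1], P[i - 1]
    return P

def transpose(M):
    """Rows become columns."""
    return [list(col) for col in zip(*M)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[0, 0, 3], [1, 2, 1], [2, 6, 4]]

P = matmul(swap_matrix(3, 2, 3), swap_matrix(3, 1, 2))   # P = P23 P12
print(P)                                     # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(matmul(P, transpose(P)) == identity)   # True
print(matmul(P, A))                          # [[1, 2, 1], [2, 6, 4], [0, 0, 3]]
```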

In general, to find a $P^TLU$ decomposition of $A$, one performs Gaussian elimination (i.e. applies row operations to make $A$ upper triangular), whilst keeping track of which row swaps one makes - this gives the permutation matrix $P$ - and then one finds the LU decomposition of $PA$.
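The procedure just described can be sketched as a single routine. The version below is one possible implementation under stated assumptions: it swaps rows only when a pivot is zero (taking the first suitable row below), carries the already recorded entries of $L$ along with each swap, and uses exact fractions. The name `ptlu_decompose` is hypothetical. Practical numerical codes usually choose swaps differently, to limit rounding error.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def ptlu_decompose(A):
    """Return (P, L, U) with P A = L U, swapping rows only at zero pivots."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]    # multipliers only, for now
    perm = list(range(n))        # perm[r] = original row now sitting in row r
    for j in range(n - 1):
        if U[j][j] == 0:
            below = [r for r in range(j + 1, n) if U[r][j] != 0]
            if not below:
                continue         # nothing to eliminate in this column
            r = below[0]
            U[j], U[r] = U[r], U[j]
            L[j], L[r] = L[r], L[j]              # keep multipliers with their rows
            perm[j], perm[r] = perm[r], perm[j]
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]                # R_i -> R_i - m R_j
            U[i] = [a - m * p for a, p in zip(U[i], U[j])]
            L[i][j] = m
    for i in range(n):
        L[i][i] = Fraction(1)                    # 1's on the diagonal of L
    P = [[int(perm[r] == c) for c in range(n)] for r in range(n)]
    return P, L, U

A = [[0, 0, 3], [1, 2, 1], [2, 6, 4]]
P, L, U = ptlu_decompose(A)
print(P)                                  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(L == [[1, 0, 0], [2, 1, 0], [0, 0, 1]])   # True
print(U == [[1, 2, 1], [0, 2, 2], [0, 0, 3]])   # True
print(matmul(P, A) == matmul(L, U))       # True
```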
