user.mendelu.czuser.mendelu.cz/marik/frvs/book/master.pdf · contents chapter i. precalculus 4 1....

Mathematics I., II.

Robert Marık

DPT. OF MATHEMATICS

FACULTY OF FORESTRY

MENDEL UNIVERSITY BRNO

E-mail address: [email protected]: www.mendelu.cz/user/marik

Contents

Chapter I. Precalculus 41. Analytical geometry of the line on the plane 42. Function 53. Basic properties of functions 9

Chapter II. Calculus 171. Limit, continuity 172. Theorems on continuous functions 273. Derivative 294. Mathematical applications of derivatives 355. Extremal problems 376. Concavity 427. Behavior of the function near infinity, asymptotes 438. Investigation of a function, curve sketching 459. Implicit differentiation 57

Chapter III. Multivariable Calculus 591. Basic concepts 592. Continuity 593. Derivative 604. Extremal problems 625. Implicit functions 67

Chapter IV. Basic Numerical Methods 681. Algebraic equations – general theory 682. Basic numerical methods for algebraic equations 723. Taylor polynomial 754. Lagrange interpolation formula 765. Least squares method 77

Chapter V. Linear Algebra 821. Algebraic linear space 822. Matrix 843. Rank of a matrix 874. Inverse matrix 905. The determinant 926. System of linear equations 957. Concluding remarks 106

Chapter VI. Integral Calculus 1071. Indefinite integral (antiderivative) 107

2

CONTENTS 3

2. Basic methods for integration 1093. Advanced methods for integration 1154. Riemann integral 1225. General theory of the Riemann integral 1246. Evaluation of the Riemann integral 1257. Applications in geometry 1278. Riemann integral on the line and half-line 128

Chapter VII. Differential Equations 1301. Motivation 1302. General ordinary differential equations 1313. ODE with separated variables 1324. First order linear ODE 1365. Second order linear ODE 140

Chapter VIII. Difference Equations 1461. Difference calculus 1462. General difference equations 1463. Linear difference equation 148

Appendix A. Graphs of basic elementary functions 152

Appendix B. Differentiation and integration formulas 155

Appendix C. Slope fields 156

CHAPTER I

Precalculus

1. Analytical geometry of the line on the plane

Let k,q∈R be real numbers. A set of the points in the plane with coordinates[x,y] which satisfy

y = kx+q (1.1)

is a line which goes through the point[0,q]. The quantityk is aslopeof this line and equation(1.1) is aslope-interceptequation of the line.If two points[x1,y1] and[x2,y2] are the points on this line, then

y1 = kx1+q and y2 = kx2 +q.

Subtracting we get

y2−y1 = k(x2−x1)

and

k =y2−y1

x2−x1.

Hence the slope is the ratio of the vertical change(y2− y1) on the line and the correspondinghorizontal change(x2−x1) on this line. This ratio is constant along the whole line.An equation of the line through the point[x1,y1] with slopek is

y−y1 = k(x−x1). (1.2)

This equation is called apoint-slopeequation of the line.Equation of any line which is not parallel to they-axis can be written in a slope-intercept and apoint-slope form. An equation of the vertical line (parallel to they-axis) trough the point[x1,y1]is

x = x1

and this equation cannot be written neither in the slope-intercept nor in the point-slope form,since the slope of the vertical line is undefined.

Example 1.1. Given pointsA = [1,3], B = [3,−2], find equations of the linesp, q, r defined asfollows:

(i) the line p goes through the pointsA andB,(ii) the lineq is a horizontal line through the pointB,

(iii) the line r is a vertical line throughA.Solution:We draw the picture.

4

2. FUNCTION 5

A

B

r

q

p

The equation of the lineq is y = −2, the slope of this line isk = 0.The equation of the liner is x = 1. The slope of the liner is undefined.To find the equation of the linep we establish the slope of this line.

k =−2−33−1

= −52.

The equation of this line is

y−3 = −52(x−1)

and from here

y =112− 5

2x.

2. Function

Definition (function). Let A andB be nonempty sets of real numbers.

Let f be a rule which associates each elementx of the setA with exactly one elementy of thesetB. The rule f is said to be afunction defined onA. We write f : A→ B. If f associatesxwith y, we writey = f (x).

The variablex is customary called anindependent variableandy adependent variable.

The setA is called adomainof the functionf and denoted byDom( f ).

The setB is atarget set. The subset of all that elementsy of the setB which are generated by theelements fromDom( f ) is called animage(or range) of the functionf and denoted byIm( f ).

Remark 2.1. In plain words, the functionf is a rule which associates a real numberx from thedomain with another, exactly defined, numbery from the image of the functionf . This rule canbe expressed in the form

“y = formula-containing-x ” ,

e.g.y =sin(x2+1)

x. Such a function is said to be in itsexplicit form.

The rule which defines the functionf can be also expressed in the form

“ formula-containing-variables-x-and-y = 0”,

e.g. x−y− lny = 0. As a particular example, the value of this function atx = 2 is the (unique)solution of the equationy+ lny = 2. Such a function is said to be in itsimplicit form.Roughly speaking, the explicit form is “effective” and the implicit form “ineffective” for calcu-lating the values of the function.

2. FUNCTION 6

Remark 2.2 (terminology). If the point a belongs toDom( f ), we say that the functionf isdefined at the pointa or that the valuef (a) is well-defined. Usually, a function is given by aformula, without specifying the domain. In this case the domain is understood to be the set of allnumbers for which the formula makes any sense. Such a domain is called anatural domainofthe function (see Remark 2.8 below).

Definition (graph). Let f be a function.

A graphof the function f is the set of all of the points in the plane[x,y] ∈ R2 with property

y = f (x).

If 0 ∈ Dom( f ), then the valuef (0) is well-defined and the point point[0, f (0)] on the graph iscalled any-interceptof the graph.

The numberx∈ Dom( f ) which satisfiesf (x) = 0 is called aroot or azeroof the functionf .

If the numberx0 is a zero of the functionf , then the point[x0,0] on the graph is called anx-interceptof the graph.

Remark 2.3. Any function possesses at most oney-intercept and may have an arbitrary numberof x-intercepts (including no or infinitely many intercepts). These intercepts are intersections ofthe graph with axes.

Remark 2.4. Concerning the functionf , we formulate the following important problems:(i) Givenx∈ Dom( f ), find y which satisfiesy = f (x).

(ii) Given y∈ Im( f ), find all of x which satisfyy = f (x).(iii) Given a valuec∈ R, solve the inequalitiesf (x) > c, f (x) ≥ c, f (x) ≤ c and f (x) < c.

If the graph of the functionf is known, these problems can be very easily solved in a graphicalway. (Explain and illustrate on some particular examples.)

Definition (composite function). Let f : A→ B andg : B→C be functions. Under acompositefunction g◦ f (read “g of f ”) we understand the function defined for everyx∈ A by the relation

(g◦ f )(x) = g( f (x)).

The functionf is said to be aninside functionandganoutside functionof the composite functiong◦ f .

Example 2.1.For f (x) = x2 +1 andg(x) = x+2 find (g◦ f ) and( f ◦g).Solution:

(g◦ f )(x) = g( f (x)) = g(x2 +1) = (x2 +1)+2 = x2 +3

( f ◦g)(x) = f (g(x)) = f (x+2) = (x+2)2+1 = x2 +4x+5

Remark 2.5. Schematically we can illustrate the composite function as follows.

x f(x) g( f (x))

f g

g◦ fRemark 2.6 (basic elementary functions, elementary functions). Several functions which arecommon in applications have special names. This concerns especially the following functions:

2. FUNCTION 7

(i) Power function y= xα , α ∈ R

(ii) Exponential function y= ax, a∈ R+ \{1}

(iii) Logarithmic function y= logax, a∈ R+ \{1}

(iv) Trigonometric functions y= sinx, y = cosx, y = tgx, y = cotgx(v) Inverse trigonometric functions y= arcsinx, y = arccosx, y = arctgx, y = arccotgx

The functions in the preceeding list are calledbasic elementary functions. Values of these func-tions can be computed on most of the mini-calculators. Graphs of these functions can be foundon the page 152.A sum of constant multiples of power functions with a nonnegative integer exponent is called apolynomial, e.g.y= 2x3−4x+1 is a polynomial. The highest power in the polynomial is calledadegreeof the polynomial. (For more detailed discussion about polynomials see page 68.)

The quotient of two polynomials is called arational function, e.g. y =x

x3−2x+1is a rational

function.The function f which can be represented by a single formulay = f (x), where the expressionon the right-hand side is made up of basic elementary functions and constants by means of afinite number of the operations addition, subtraction, multiplication, division and composition iscalled anelementary function. In the exercises we will almost exclusively work with elementaryfunctions, since this is sufficient to describe most of the phenomena in scientific and economicalapplications. For example the function

f : y =e1−cosx

sin2xln

(

x+x2 +arctgx−4)

is (rather complicated) elementary function.Another class of non-elementary functions with wide applications in real world problems isa class ofpiecewise-defined functions. This class consists from the functions which are notelementary, but the domain of these functions can be dividedinto a finite number of subintervalssuch that the function is elementary in each of these subintervals. For example

f : y =

sinx for x≤ 0

x2 +3x−4 for 0< x≤ 10

ex for x > 10

is a piecewise-defined function consisting from three elementary functions.Finaly, the infinite series

y = x+x3

1 ·3 +x5

2 ·3 +x7

3 ·2 ·1 ·3+x9

4 ·3 ·2 ·1 ·3+x11

5 ·4 ·3 ·2 ·1 ·3+ · · ·

is not an elementary function

Remark 2.7. In calculus we are interested especially in the compositionof the basic elementary

functions. Hencey = ln(x+1), y = arcsin√

x andy = sinx2

are treated as composite functions.

On the other hand, the basic elementary functions likey = ex andy = tgx are not compositefunctions.

Remark 2.8 (natural domain of the composite function). To establish a natural domain of acomposite function we usually have to solve a system of (in general nonlinear) inequalities.When writing these inequalities we take care about the functions which yield a restriction on its

2. FUNCTION 8

domain, i.e. which are not defined for all real numbers. We summarize the basic elementaryfunctions with domain different fromR into the following table.

The function . . . is well-defined if . . .f (x)g(x)

g(x) 6= 0 quotient

ln f (x) f (x) > 0 logarithm2k√

f (x), k∈ N f (x) ≥ 0 even rootarcsinf (x) −1≤ f (x) ≤ 1arccosf (x) −1≤ f (x) ≤ 1

Example 2.2.Find a natural domain of the functionf : y = 2+xarcsin4x

.

Solution:Restrictions from the arcsin function and from the quotientgive

−1≤ 4x≤ 1 and x 6= 0.

Let us solve the inequality−1≤ 4x

first. The equation

−1 =4x

yields x = −4. The functiony =4x

is not defined forx = 0. We mark the pointsx = −4 and

x = 0 on the real axis. This divides a real axis into several subintervals. Each subinterval has theproperty that the inequality is either true for allx or false for allx from this subinterval (for anexplanation see Algorithm 2.1, page 28). Therefore it is sufficient to consider a test point in eachof the subintervals , e.g.x = −5, x = −1 andx = 1. We put these values ofx into the inequalityand obtain

x = −5 : −1≤−45

⇒ true,

x = −1 : −1≤−4 ⇒ false,

x = 1 : −1≤ 4 ⇒ true.

The following simple scheme illustrates the situation.!−4

# ◦0

!Hence the solution of−1≤ 4

xis x∈ (−∞,−4]∪ (0,∞).

Now let us solve the inequality4x≤ 1. The solution of

4x

= 1 is x = 4 and the functiony =4x

is

undefined forx= 0. We divide the real axis by the pointsx= 0 andx= 4 into three subintervals.From each subinterval we choose a test number and put this number into the inequality. Thisprocess reveals the subintervals where the inequality holds and the subintervals where it fails.

x = −1 : −4≤ 1 ⇒ true,

x = 1 : 4≤ 1 ⇒ false,

x = 5 :45≤ 1 ⇒ true.

3. BASIC PROPERTIES OF FUNCTIONS 9! ◦0

#4

!Hence the solution of

4x≤ 1 isx∈ (−∞,0)∪ [4,∞).

The intersection of the sets(−∞,−4]∪ (0,∞) and(−∞,0)∪ [4,∞) gives the domain

Dom( f ) = (−∞,4]∪ [4,∞).

Remark 2.9 (shifted and resized graphs of basic elementary functions). Suppose that a graphof the functiony = f (x) is known (this concerns especially the cases when the function f is abasic elementary function, see page 152). Letc denotes a positive quantity. The following tableenables to sketch graphs of several functions which have “almost” the same graph asf has.

To obtain the graph of . . . shift the graph ofy = f (x) . . .y = f (x)+c c units upwardy = f (x)−c c units downwardy = f (x+c) c units to the lefty = f (x−c) c units to the right

The following table describes functions with graph which can be constituted from the graph ofthe functiony = f (x) by a reflection.

To obtain the graph of . . . reflect the graph ofy = f (x) . . .y = − f (x) about thex-axisy = f (−x) about they-axis

The following table describes functions with graph which can be constituted from the graph ofthe functiony = f (x) by a resizing. Leta > 1 be a real number.

To obtain the graph of . . . do this with the graph ofy = f (x) . . .y = a f(x) stretcha-times vertically

y =1a

f (x) restricta-times vertically

y = f

(

1a

x

)

stretcha-times horizontally

y = f (ax) restricta-times horizontallyBy stretching the graph of the function vertically we understand the process which multiplies they-coordinate of each point on the graph by the numbera. As a result, each point on the graphwill have a-times greater distance from the horizontal axis. In a similar way we understand theremaining operations.

3. Basic properties of functions

Definition (odd and even function). Let f be a function with domainDom( f ). Let for everyx∈ Dom( f ) also−x∈ Dom( f ).

(i) The function f is said to beeven, if f (−x) = f (x) holds for everyx∈ Dom( f ).(ii) The function f is said to beodd, if f (−x) = − f (x) holds for everyx∈ Dom( f ).

Arrangement: The function is said to have a parity if it is either odd or even.

Remark 3.1(graphical consequence). The graph of an even function is symmetric about the linex = 0 (about they-axis). The graph of an odd function is symmetric about the origin.

3. BASIC PROPERTIES OF FUNCTIONS 10

The parity of a rational function can be determined very easily, without computing anything, wecan simply use the following theorem. On the other hand, parity of a general function must bechecked using the definition.

Theorem 3.1. (i) A polynomial function is odd (even) if and only if it containstermswith an odd (even) exponent only.

(ii) A rational function is odd if and only if it is a quotient of an odd and an even polyno-mial (in an arbitrary order).

(iii) A rational function is even if and only if it is a quotient of two even or of two oddpolynomials.

Example 3.1.The following functions are even:f (x) = x4−6, g(x) =x3 +x

2x5−3x, h(x) =

x4−6x2 +1

.

The following functions are odd:f (x) = x3−6x7, g(x) =x3−x2x4−3

, h(x) =x6−3x3−x

.

The following functions are neither odd nor even:f (x) = x4 +x2−x, g(x) =x3−x

2x4−3x, y = ex.

Definition (boundedness). Let f be a function andM ⊆ Dom( f ) be a subset of the domain ofthe functionf .

(i) The function f is said to bebounded from belowon the setM if there exists a realnumbera with the propertya≤ f (x) for all x∈ M.

(ii) The function f is said to bebounded from aboveon the setM if there exists a realnumbera with the propertya≥ f (x) for all x∈ M.

(iii) The function f is said to beboundedon the setM if it is bounded from both belowand above on the setM.

If the setM is not specified, we supposeM = Dom( f ).

Remark 3.2. The fact whether a function is bounded (bounded from below, bounded from above)can be easily recognized from the graph of this functions. (Explain! You can discuss the graphson the page 152.)

Example 3.2. • The functionsy= x2, y = ex, y= e−x, y= 2x and in general every expo-nential function are bounded from below.

• The functionsy = sinx, y = cosx, y = arctgx are bounded functions (both from belowand above).

Motivation. In the following paragraphs we will be interested in the fact, whether an applicationof the function f on both sides of an equality yields an equivalent equality. We start with anobvious implication which holds for every well-defined function f .

x1 = x2 ⇒ f (x1) = f (x2) (3.1)

The equationsx1 = x2 and f (x1) = f (x2) are equivalent, if the implication in (3.1) can be re-versed. It is clear that this implicationcannotbe reversed if two mutually different valuesx1 6= x2in the domain off may generate the same value of the functionf . A property which excludesthis possibility is introduced in the following definition.


Definition (one-to-one function). Let f be a function and letM ⊆ Dom( f ) be a subset of thedomain of the functionf .

The functionf is said to beone-to-one functionon the setM if there exists no pair of differentelementsx1, x2 of the setM which are associated with the same elementy of the imageIm( f ).


Remark 3.3 (one-to-one functions). Mathematically formulated, for the one-to-one functionsEthe following implication holds

f (x1) = f (x2) ⇒ x1 = x2. (3.2)

This implication means that if the functionf is one-to-one, then we can remove (or more pre-cisely un-apply) this function from both sides of the relation f (x1) = f (x2) and conclude theequivalent relationx1 = x2. This property can be used when solving nonlinear equations, sincethis is an exact description of the fact that the equationsx1 = x2 and f (x1) = f (x2) are equivalent.

Example 3.3. • The functionf : y= x2 is not one-to-one, since(−3)2 = 32, but−3 6= 3.• The functionf : y= x2 is one-to-one on the set[0,∞). Really, giveny∈ Im( f ) = [0,∞)

there exists exactly onex∈ [0,∞) with propertyx2 = y. Thisx is given by the formulax =

√y.

• The functiony = x2 is one-to-one on the set(−∞,0] as well. Really, giveny∈ Im( f )there exists exactly onex∈ (−∞,0] with propertyx2 = y. Thisx is given by the formulax = −√

y. However, this function is not one-to-one on(−∞,∞), as has been describedin the preceding point.

• The functiony =1x

is one-to-one on its domainR \ {0}. Thex which corresponds to

any particular nonzeroy can be calculated from the relationx =1y

.

• The functiony = sinx is not one-to-one, since sin0= sinπ and 06= π. However, the

functiony= sinx is one-to-one on the set[

−π2

,π2

]

as a quick examination of the graph

(see page 152, if necessary) shows. (Try to find another interval on which the functiony = sinx is one-to-one.)

Remark 3.4 (a graphical consequence). It is easy to see that a graphical criterion for a functionEto be one-to-one is that every horizontal line crossing the graph of the function must meet itin at most one point. (Explain and show on convenient graphs.Determine which of the basicelementary function are one-to-one on its domain. For the basic elementary functions which arenot one-to-one on the whole domain determine the intervals where they are one-to-one.)

Definition (inverse function). Let f : A → B be an one-to-one function. The rule whichassociates everyx with the numbery satisfying f (y) = x defines a function onB (really, foreveryx∈ Im( f ) ⊆ B there exists only oney satisfying f (y) = x). This function is said to be aninverse functionto the functionf . The inverse function to the functionf is denoted byf−1.

Remark 3.5. Do not confuse the inverse and the reciprocal value1

f (x)of the functionf (x). The

notation is the same, the meaning is different.


Remark 3.6 (a graphical consequence). The graph of the functionf and the graph of its inversef−1 are mutually symmetric about the liney = x.

Algorithm 3.1. An analytic formula for the inverse function can be calculated in the followingsteps.

(i) We formally interchange variablesx andy in the relationy= f (x). We obtainx= f (y).This relation defines the inverse function implicitly.

(ii) We check that for everyx there exists uniquey which satisfiesx = f (y). This provesthat f is one-to-one.

(iii) We solve the equationx = f (y) for y. This gives an explicit analytic formula for theinverse functiony = f−1(x).

The second step is usually combined with the third step: the solution y of the equation which issolved in the third step must be unique. From the definition and also from this algorithm it is

clear that(

f−1)−1equalsf .

Example 3.4. (i) The functiony = x2 defined on[0,∞) is one-to-one andy =√

x is itsinverse.

(ii) The functiony = x2 defined on(−∞,0] is one-to-one andy = −√

x is its inverse.(iii) The functiony = x2 is not one-to-one onR and it has no inverse. Really, the equation

x2 = 4 has two solutions onR and the definiteness in the definition of inverse functionis lost.

(iv) The functiony =1x

is self-invertible — the inverse function is againy =1x

. The same

is true also fory = −x.

Example 3.5(calculation of the inverse function). Find the inverse to the functiony =2x−1

x.

Solution:The first step (an interchange of the variables) gives

x =2y−1

y

and hence

xy= 2y−1

1 = (2−x)y

Isolatingy we obtain an explicit formula for the inverse function in theform

y =1

2−x.

Example 3.6(calculation of the inverse function). Let us find the inverse to the functiony =x+ex. The function is one-to-one (it follows e.g. from the monotonicity, as we will see later).Interchange of the variables gives

x = y+ey.

This equation cannot be solved fory. We keep the inverse function in its implicit form.

Remark 3.7. The inverse function can be considered as a tool which restores information whichis lost after application of the functionf . If an application of the functionf to a number (orquantity or variable)x gives the resulty, then for knowny we obtain the originalx by application


of an inverse function toy. (“If the function f is considered to be a kind of poison, then itsinversef−1 restores an influence off , like antidote.”)

x y

f

f−1

Remark 3.8. The inverse function to some of the basic elementary function is sometimes simplya name of another elementary function. Table 1 summarizes the basic elementary function andtheir inverses. The property to be an inverse function is mutual. If, for example, the arcsin(·)function is inverse to the sin(·) function, then also the sin(·) function is an inverse to the arcsin(·)function.

The functiony = f (x) The inverse functiony = f−1(x)

y =√

x y = x2, x > 0y = x2, x > 0 y =

√x

y = ex y = lnxy = lnx y = ex

y = ax y = logaxy = sinx, x∈ [−π/2,π/2] y = arcsinxy = cosx, x∈ [0,π] y = arccosxy = tgx, x∈ [−π/2,π/2] y = arctgx

TABLE 1. Inverse functions to the basic elementary functions

Remark 3.9(How to write a number as a result of a given operation?). It is clear that the compo-Esition of the function and its inverse is identity function.Really, given a value ofx, an applicationof the functionf changes this value and application of the inverse function restores this informa-tion and returns againx. A mathematical formulation of this idea is

f ( f−1(x)) = x and f−1( f (x)) = x

for everyx for which these relations have any sense. This suggests a possibility to write a givennumber as a result of a given function. For example the number1 can be written in each of thefollowing (equivalent) forms

1 = lne1 = log551 = 6log61 = sin(arcsin1) = arctg(tg1) = (√

1)2.

Example 3.7.Write the functionx2 as a logarithm.

Solution: x2 = ln(

ex2)

Example 3.8.Write the functionx for x∈ [−1,1] as a sine function.Solution: x= sin(arcsinx)

Remark 3.10(nonlinear equations). The inverse function allows an alternative interpretationin Eterms of solutions of nonlinear equations: If there exists an inversef−1 of the functionf and ifthe inverse function is defined in somex, then the nonlinear equation with the unknowny

f (y) = x (3.3)


has exactly one solution. This solution is given by the formula

y = f−1(x). (3.4)

An alternative method how to obtain (3.4) from (3.3) is to write the right-hand side of (3.3) inthe form f ( f−1(x)) and conclude (3.4) via implication (3.2).

Example 3.9(nonlinear equation). Solve the equation

e2

x−1 = 2. (3.5)

Solution: Since the inverse function to the exponential function,ex, is the natural logarithmicfunction, lnx, it follows (see the scheme)

2x−1

= ln2.2

x−12

exp(·)

ln(·)(3.6)

From here it is now easy to find

x =2

ln2+1.

An equivalent method how to solve this problem is to to write the number 2 on the right-handside of (3.5) in the “exponential form”eln2. This gives

e2

x−1 = eln2.

Now we conclude (3.6) by (3.2) and evaluatex.

Motivation. Now we will distinguish the classes of functions which preserve or reverse inequal-ities. This yields the following classes of functions.

Definition (monotonicity). Let f be a function andM ⊆ Dom( f ) be a subset of its domain.

(i) The function f is said to beincreasingon the setM if for every pairx1,x2 ∈ M withthe propertyx1 < x2 the relationf (x1) < f (x2) holds.

(ii) The function f is said to bedecreasingon the setM if for every pairx1,x2 ∈ M withthe propertyx1 < x2 the relationf (x1) > f (x2) holds.

(iii) The function f is said to be(strictly) monotoneon the setM if it is either increasingor decreasing onM.


Remark 3.11(to the monotonicity). If the setM in the preceding definition is an interval, thenthe monotonicity allows a clear geometrical interpretation on the graph of the function. However,the situation can be more difficult if the setM is not an interval. This phenomenon is illustratedby the following example.

Example 3.10.Consider the functiony =1x

. This function is not decreasing on its domain, as

the reader could guess from the graph (see page 152)! Really,the numbersx1 = −1 andx1 = 1satisfyx1 < x2, but f (x1) < f (x2), and the definition property of decreasing functions is broken.The correct conclusion is: The function is decreasing on each of the intervals(−∞,0) and(0,∞)and it isnot decreasing on its domainR\{0}.

An immediate consequence of the definition is the following theorem.


Theorem 3.2. If the function is strictly monotone on the set M, then it is one-to-one on M.

Remark 3.12(nonlinear inequalities). Suppose thatf is an increasing function. The definitionEof increasing function implies that the following two inequalities are equivalent.

a < b ⇐⇒ f (a) < f (b)

Really, one of these implications is included directly in the definition of increasing function andthe opposite one is an easy consequence of this definition. Moreover, the same is true also forinequality “≤” (Explain!). Hence we can apply or un-apply an arbitrary increasing function toan arbitrary inequality and the set of solutions does not change.The same can be done also with any decreasing function, but wehave to change the direction ofthe inequality sign.This approach does not concern all inequalities. For a method which is capable to solve moregeneral inequalities see Algorithm 2.1 on the page 28. For nonlinear inequalities which containsthe function with a known graph consult Remark 2.4 on the page6 – the graphical method isusually the fastest tool for solving simple inequalities and the reader is encouraged to use thegraphical method whenever it is possible.

Example 3.11(nonlinear inequality). Solve inequality

ln(x2−4x−4) > 0.

We write the right-hand side as a logarithm

ln(x2−4x−4) > ln1

and remove the logarithms

x2−4x−4 > 1.

From here we obtain

x2−4x−5 > 0,

(x−5)(x+1) > 0,

x∈ (−∞,−1)∪ (5,∞),

where the quadratic inequality can be solved either by graphical method or by solving equationx2−4x−5 = 0 and checking whether the inequality holds on the intervalsobtained by divisionof the real axis by the solution of this quadratic equation.

The next theorem shows that the inverse function (if it exists) preserves the type of monotonicity.

Theorem 3.3. Let f be a function. If the function f is increasing (decreasing, odd), then theinverse function f−1 has the same property.

Example 3.12. • The natural exponential functiony = ex and the natural logarithmy =lnx are mutually inverse functions. Both functions are increasing.

• The exponential functiony =(

12

)xand the corresponding logarithmic functiony =

log12

x are mutually inverse functions. Both functions are decreasing.

Remark 3.13. Even function cannot be one-to-one and it has no inverse.


Remark 3.14(summing up remark). Finally, we summarize all the properties of the functionsEintroduced in this chapter. We use the notation which is usefull when solving equations and (or)inequalities.

a = bf is one-to-one⇐⇒ f (a) = f (b)

a < bf is increasing⇐⇒ f (a) < f (b) a≤ b

f is increasing⇐⇒ f (a) ≤ f (b)

a < bf is decreasing⇐⇒ f (a) > f (b) a≤ b

f is decreasing⇐⇒ f (a) ≥ f (b)

If the function f is one-to-one, then for everyy∈ Im( f ) there exists a unique solutionx of theequation

f (x) = y

and this solution is given by the formulax = f−1(y).

CHAPTER II

Calculus

In this chapter we introduce mathematical methods which yield more detailed informations aboutthe function than the properties considered in the preceding chapter. We will investigate themanner in which quantities vary and whether they approach some specific values.We start with the concept of limit and continuity. This concept is one of the fundamental ideaswhich distinguishes calculus from other areas of mathematics. This concept (like the wholecalculus) is based on very natural ideas, but an exact mathematical description is somewhatcomplicated (for beginners sometimesvery complicated). Anyway, almost all theorems andresults can be illustrated by a simple picture. Always try toillustrate the described situation bysome drawing.We start, as usually, with an introduction to the concept of limits.

1. Limit, continuity

To formulate our forthcoming results in a simple way, we extend the set of real numbers by twoobjects called “infinity” and “minus infinity”. It is not sufficient to add these objects to the setR,but we have to extend the basic operations with real numbers also for infinity. This in containedin the following definition.

Definition (expanded set of real numbers). Underan expanded set of real numbersR∗ we

understand the setR of all real numbers enriched by the numbers±∞ in the following way: WesetR∗ = R∪{∞,−∞} and fora∈ R we set:

a+∞ = ∞, a−∞ = −∞, ∞+∞ = ∞, −∞−∞ = −∞

∞ ·∞ = −∞ · (−∞) = ∞, ∞ · (−∞) = −∞,a∞

=a−∞

= 0

−∞ < a < ∞, |±∞| = ∞.

Further, fora > 0 we set

a ·∞ = ∞ a · (−∞) = −∞,

and fora < 0 we set

a ·∞ = −∞ a · (−∞) = ∞.

Another operations we define with the commutativity of the operation “+” and “·”.

Example 1.1.According to the definition the following computations are valid.

∞+3 = ∞, −10−∞ = −∞∞ ·2 = ∞, ∞ · (−4) = −∞,

−∞.∞ = −∞, 1+6 ·∞ = ∞,

17

1. LIMIT, CONTINUITY 18

−17+∞

= 0, ∞+−∞−2

+0∞

= ∞.

Remark 1.1 (indeterminate forms). The operations “∞−∞”, “±∞ ·0” and “±∞±∞

” remain unde-

fined. Of course, the division by a zero remains undefined as well.

Definition (neighborhood). Under theneighborhoodof the pointa∈R we understand any openinterval which cotains the pointa, we writeN(a). Under thereduced(alsoring) neighborhoodof the pointa we understand the setN(a)\{a}, we writeN(a).

Under theneighborhood of the point∞ we understand the interval of the type(A,∞) and undertheneighborhood of the point−∞ the interval(−∞,A). Under the reduced neighborhood of thepoints±∞ we understand the same as under the neighborhood of these points.

Definition (limit of the function). Let a,L ∈ R∗ and f : R → R. Let the functionf be defined

in some reduced neighborhood of the pointa. We say that the functiony = f (x) approaches tothe limit L asx approaches toa if for any (arbitrary small) neighborhoodN(L) of the numberLthere exists reduced neighborhoodN(a) of the pointa such that for everyx∈ N(a) the relationf (x) ∈ N(L) holds.

We write

limx→a

f (x) = L (1.1)

or f (x) → L for x→ a.

Arrangement. Shortly we read (1.1) as “the limit off at a is L”.Motivation. From a geometric point of view it turns out to be interesting to distinguish the casesin which x approaches toa from the left and from the right. This gives a motivation for thefollowing definitions.

Definition (one-sided neighborhood). Under theright (left) neighborhoodof the pointa∈ R

we understand the interval of the type[a,b),(or (b,a], for left neighborhood), we writeN+(a)(N−(a)). Under thereduced right (left) neighborhoodof the pointa we understand the corre-sponding neighborhood wihtout the pointa, we writeN+(a), (N−(a))

a

neighborhood of the pointa, N(a)

reduced neighborhood of the pointa, N(a)

right neighborhood of the pointa, N+(a)

right reduced neighborhood of the pointa, N+(a)

FIGURE 1. Neighborhood of the pointa.


Definition (one-sided limit). If we replace in the definition of the limit the reduced neighbor-hood of the pointa by the reduced right neighborhood of the pointa, we obtain a definition ofthe limit from the right. We write lim

x→a+f (x) = L.

Similarly, we define also the limit from the left. In this casewe write limx→a−

f (x) = L.

Remark 1.2(shortened notation). Another (very short) notation for one-sided limits isf (a+) =L for the limit from the right andf (a−) = L for the limit from the left. For the two-sided limitwe can write f (a±). In several textbooks also the notationf (a+ 0), f (a− 0) and f (a± 0),respectively, is used.

Remark 1.3. The limit presents a basis of the so-called local concept in the investigation ofthe function. The valuef (a) never occurs in the definition of the limit lim

x→af (x) and it has no

influence neither on the existence nor on the value of this limit. The existence and value of thelimit in the pointa is affected by the values of the function in the nearest points only.The fact that in the limit lim

x→af (x) may exist even if the functionf is undefined ata can be

illustrated on the example of the functionsinx

x. This function is not defined forx = 0, but

limx→0

sinxx

= 1, as we will see later.

On the other hand, for an existence of the limit asx → a the function f (x) must be defined insome (possibly one-sided) ring neighborhood of the pointa. Hence it has no sense to try toevaluate limits like lim

x→1

√

1−3x2, and limx→0−

ln(x).

From the definition it is not clear whether for a functionf and a pointa the limit (1.1) exists ornot. Further, the definition states nothing about the fact, whether, if the limit exists, is the numberL satisfying (1.1) unique. Hence it is an open question, how many limits can a function have at apoint. This problem is solved in the following theorem.

Theorem 1.1(uniqueness of the limit). The function f possesses at the point a at most onelimit (or one-sided limit).

Theorem 1.2 (the relationship between the limit and the one-sided limits). The limit of the Efunction f at the point a∈ R exists if and only if both one-sided limits at the point a exist andare equal. More precisely: If the limits f(a−) and f(a+) exist and f(a−) = f (a+), then thelimit f (a±) exists as well and f(a±) = f (a−) = f (a+). If one of the one-sided limits does notexist or if f(a−) 6= f (a+), then the limit f(a±) does not exist.

Remark 1.4 (How to find the value of the limit on the graph?). The existence and the value ofthe limit can be estimated from the graph of the function.

• Consider a test point with thex-coordinate in some right neighborhood of the pointa.• Let us move the test point along the graph and let thex-coordinate of this point ap-

proache the pointa.• It may happen that they-component of this test point approaches some valueL. If such

a behavior really appears, thenL is a limit of the functionf at the pointa from theright.

• If the y-coordinates do not approach any real number, but either increase or decreasewithout any bound, then the value of the limit is+∞ or−∞, respectively.


Similarly, we can estimate the limit from the left. If the limits from the right and from the leftare equal, then the limit exists (see Theorem 1.2).Figure 2 illustrates this idea. The function graphed on the picture satisfies lim

x→a+f (x) = L. The

left-hand side limit exists as well, but it is different fromL (it is greater).

a xx

y

f (x)

L limit process

test point

target point

FIGURE 2. Limit process limx→a+

f (x) = L.

Example 1.2. The following limits can be evaluated by an inspection of thegraphs of basicelementary functions (see page 152).

limx→∞

arctgx =π2

, limx→0+

cotgx = ∞,

limx→0+

lnx = −∞, limx→∞

ex = ∞,

limx→−∞

ex = 0.

Remark 1.5 (“experimental estimation” of the limit). Let us translate the steps described inRemark 1.4 into a numerical experiment. As a result of this experiment we obtain a guess for thevalue of the limit.

• We consider a sequence of real numbers approaching to the numbera.• We evaluate the functionf in each of these points. We obtain a sequence of values of

the functionf .• We check, whether this sequence approaches a real number (orincrease or decrease

without any bound).• If the function f is “simple enough,”∗ the sequence of images converges to the value of

the limit. From this sequence we can usually get a good idea ofwhat the limit of f atthe pointa could be.

• Remember that this is only an estimate which works in many situations, but not ingeneral! This approach can sometimes lead to incorrect estimates! In each case youshould prefer some of the exact methods (shown in this chapter below) for calculationof the limits.

Example 1.3(numerical experiment). We estimate the value of the limit limx→0+

sinxx

by a numeri-

cal experiment. Let us evaluate the functionsinx

xin the sequence of values approaching to zero.

We obtain∗However, it is almost impossible to say, in general, what does it exactly mean “simple enough”.


x 0.5 0.2 0.1 0.01 0.005 0.00001sinx

x0.95885 0.99334 0.99833 0.999983 0.9999958 1

From this computation it seems to be reasonable to guess

limx→0+

sinxx

= 1.

However this table could be misleading in several respects.• First, because a mini-calculator rounds-off answers, it appears that sin(x)/x actually

equals 1 ifx is within 0.00001 of 0. This is false. A closer investigation of the functionsin(x)/x shows that this function isneverequal to 1.

• Second, although the table indicates that sin(x)/x gets closer and closer to 1 asx ap-proaches 0, we cannot be absolutely sure of this fact. The function sin(x)/x coulddeviate from 1 in some manner for values ofx that are closer to 0 than those listed inthe table.

Although a calculator may help give us a good idea whether a limit exists or not, it cannot beused for the proofs. It is necessary to have available an exact mathematical theory of limits. Wedevelop such a theory in this chapter.

Motivation. As we said before, in general, the value of the functionf at a given pointa has noinfluence neither to the existence nor to the value of the limit at this point. Actually this isnot thecase concerning many functions used in the mathematical description of real–world problems.In most cases the value of the limit is exactlythe sameas the value of the function (provided thefunction is well-defined at the point under consideration).The name for a class of functions withthis (nice) property is revealed in following definition.

Definition (continuity at a point). Let f be a function defined at the pointa∈ R.

The functionf is said to becontinuousat the pointa if limx→a

f (x) = f (a).

The functionf is said to becontinuous from the rightat the pointa if limx→a+

f (x) = f (a).

The functionf is said to becontinuous from the leftat the pointa if limx→a−

f (x) = f (a).

Remark 1.6. According to the definition, the function is continuous at the pointa if• f (a) exists,• lim

x→af (x) exists as finite number,

• f (a) = limx→a

f (x) holds.

If the function is defined in some (at least one-sided) reduced neighborhood of the pointa butat least one of the three conditions above is broken, the point x = a is said to be a point ofdiscontinuityof the functionf . For various types of discontinuity (jump discontinuity, removablediscontinuity and infinite limit) see Figure 7 on the page 31.


Definition (continuity on an interval). The function is said to becontinuous on the open interval(a,b) if it is continuous at every point of this interval.

The function is said to becontinuous on the closed interval[a,b] if it is continuous on(a,b),continuous from the right at the pointa and continuous from the left at the pointb.

Notation. The class of all functions continuous on the intervalI will be denoted byC(I). IfI = (a,b) or I = [a,b], then we write shortlyC((a,b)) or C([a,b]), respectively.

Remark 1.7. The following theorem introduces an important property of elementary functions.Recall that an elementary function is every function which can be defined in the explicit form bya single formula established from a finite number of basic elementary functions (see Remark 2.6on the page 6).

Theorem 1.3(continuity of elementary functions). Every elementary function is continuous onEits domain.

Remark 1.8 (practical). Consider an elementary functionf (x) and the limit of this function for Ex→ a. According to the preceding theorem, we try to substitutea for x into f (x) and calculatef (a) first. If this is possible, i.e. if the numbera is in the domain off , then we obtain the limit.If this fails, i.e. if f (a) is not defined, we must look for another method. Hence, when speakingabout elementary functions, the concept of limit givesnothing new concerning the points fromthe domain. This concept remains interesting only for the points, which are not in the naturaldomain of the function, but the function is defined in some ring neighborhood of these points.

Example 1.4(limit by substitution). The functiony=ex ln(x)x2−1

is continuous on(0,1) and(1,∞).

Hence

limx→2

ex ln(x)x2−1

=e2 ln2

3.

In a similar way we can find limits at every point from the domain of this function. The evaluationof the limit at the pointsx = 0, x = 1 andx = ∞ is not so easy. The evaluation of the limit inarbitrary negative number has no sense, since the function is defined for positive numbers only,not equal to 1.

The following two definitions connect the concept of limits with a clear geometric consequenceon the graph of the function.

Definition (vertical asymptote). Let f be a function andx0 a real number. The vertical linex = x0 is said to be avertical asymptote to the graph of the function fif at lest one of theone-sided limits of the functionf at the pointx0 exists and it is not a finite number.

Example 1.5.The functiony =1x

has a vertical asymptotex = 0.

Example 1.6. The functiony = lnx has the vertical asymptotex = 0. This is asymptote onlyfrom the right since the limit lim

x→0−lnx has no sense.

Example 1.7.The functiony= xex has no vertical asymptote since the limit at every real numberis finite (equal to the value of the function).


Definition (horizontal asymptote). The liney = L is said to be ahorizontal asymptote to thegraph of the function f(x) at +∞ if the limit of the function f at+∞ exists and

limx→∞

f (x) = L

holds. In a similar way we define thehorizontal asymptote at−∞.

horizontal asymptote at+∞ve

rtic

alas

ymp

tote

FIGURE 3. Horizontal and vertical asymptote.

Example 1.8. The exponential functiony = ex has an asymptotey = 0 at the point−∞ and hasno horizontal asymptote at the point+∞.

Example 1.9.The functiony = arctgx has an asymptotey =π2

at+∞ andy = −π2

at−∞.

Example 1.10.The functiony=1x

has the same horizontal asymptote in both±∞, this asymptote

is y = 0.

Example 1.11.The functiony = x2 has no horizontal asymptote.

Theorem 1.4 (algebra of limits). Let a∈ R∗, f ,g : R → R. The following relations holdE

whenever the limits on the right exist and the formula on the right is well-defined.

limx→a

( f (x)±g(x)) = limx→a

f (x)± limx→a

g(x) (1.2)

limx→a

( f (x) ·g(x)) = limx→a

f (x) · limx→a

g(x) (1.3)

limx→a

f (x)g(x)

=limx→a f (x)limx→ag(x)

(1.4)

The same holds for one sided limits as well.

Example 1.12(application of the preceding theorem). Using the preceding theorem we can findthe following limits.

(i) limx→∞

(arctgx+arccotgx) =π2

+0 =π2

(ii) limx→0−

1x

cosx = −∞.1 = −∞


(iii) limx→∞

1xex =

1∞.∞

=1∞

= 0

Example 1.13(examples not covered by the preceding theorem). The rules from Theorem 1.4

cannotbe used for the limit limx→0+

(

1x

+ lnx

)

since the substitutionx= 0 yields an indeterminate

form “∞−∞”.

Similarly, Theorem 1.4 fails for the limits limx→∞

(sin2x+ cos2x) or limx→∞

1x

cos2x since the limits

limx→∞

sin2x and limx→∞

cos2x do not exist. This, however, gives no conclusion concerningthe ex-

istence or nonexistence of the original limits! We must seekanother method for evaluation ofthese limits.

The following theorem shows that evaluation of the limit of acomposite function is, in somesense, similar to the calculation of the values of the function. We find the limit of the insidefunction first and we use the result as an argument of the outside function.

Theorem 1.5(limit of the composite function with continuous component). Let limx→a

f (x) = b

and g(x) be a function continuous at b. Thenlimx→a

g( f (x)) = g(b), i.e.

limx→a

g( f (x)) = g(limx→a

f (x)).

The same holds for one-sided limits as well.

Example 1.14(application of the preceding theorem).(i) lim

x→∞cos(e−x) = cos0= 1

(ii) limx→−∞

earctgx = e−π/2

Theorem 1.5 cannot be used if the outside function is not defined at the limit of the inside func-tion. This situation concerns, for example, the limits withlogarithmic outside function and theinside function approaching to zero or infinity since neither “ln0”, nor “ln ∞” are well-defined.In such cases we usually can use the following theorem.

Theorem 1.6(limit of the composite function). Let limx→a

f (x) = b, limy→b

g(y) = L and let there

exist a ring neighborhood of the point x= a such that f(x) 6= b for all x in this neighborhood.Thenlim

x→ag( f (x)) = L.

Example 1.15(application of the preceding theorem).

(i) limx→0+

ln1x

= ” ln ∞” = ∞

(ii) limx→−∞

arctg(e−x) = ”arctg∞” =π2

(iii) limx→0+

ln(sinx) = ” ln(0+)” = −∞

The following theorem is applicable (among others) when evaluating limits of rational functionsin the point of discontinuity (a zero of the denominator), provided the numerator is not equal tozero at this point∗.

∗If both numerator and denominator equal zero at the pointa, we can cancel the fraction by the linear factor(x−a) and consider this new limit.


Theorem 1.7(limit of the type ”L0

”) . Let a∈ R∗, lim

x→ag(x) = 0 and lim

x→af (x) = L ∈ R

∗ \ {0}.

Suppose that there exists a ring neighborhood of the point a such that the function g(x) doesnot change its sign in this neighborhood. Then

limx→a

f (x)g(x)

=

{

+∞ if g(x) and L have common sign,

−∞ if g(x) and L have an opposite sign,

in the neighborhood under consideration. The same holds forone sided limits as well.

Remark 1.9(practical). Usually, when using Theorem 1.7 we investigate the corresponding one-sided limits first. From the mutual relationship of these one-sided limits we conclude whetherthe two sided limit exists or not (Theorem 1.2).Roughly speaking, Theorem 1.7 states the following: If the denominator of the fraction does notchange its sign infinitely many times in a neighborhood of thepoint a, then the one-sided limits

of the type “L0

” are infinite, The sign of the result can be established by the“usual” rules – the

quotient of two expressions with the same sign is positive and the quotient of two expressionswith opposite signs is negative. We will indicate by the symbol “+0” the fact that the functiontends to zero, but it is positive in some ring neighborhood. With this arrangement we can calcu-

late as follows:2

+0= ∞,

−∞+0

= −∞. Similarly, “−0” will indicate that the function is negative

and tends to zero. With this arrangement we can write e.g.−2−0

= ∞.

Example 1.16(application of the preceding theorem).

(i) limx→2+

1−xx2−4

=−1+0

= −∞

(ii) limx→2−

1−xx2−4

=−1−0

= ∞

(iii) limx→2

1−xx2−4

does not exist, since both one-sided limits are not equal.

(iv) limx→∞

1

1−e1x

=1

1−e1∞

=1

1−e+0 =1−0

= −∞

(v) limx→0+

1

1−e1x

=1

1−e1/+0=

11−e∞ =

11−∞

=1−∞

= 0

The following theorem presents a very fast method for calculating limits of polynomials andrational functions at±∞. We can find these limits almost without writing any computations.

Theorem 1.8(limit of the polynomial or of the rational functions at±∞). It holds

limx→±∞

(a0xn +a1xn−1 + · · ·+an−1x+an) = limx→±∞

a0xn,

limx→±∞

a0xn +a1xn−1 + · · ·+an−1x+an

b0xm+b1xm−1 + · · ·+bm−1x+bm= lim

x→±∞

a0

b0xn−m.

Remark 1.10. The preceding theorem is applicable also in the cases when the rules for algebraof limits give undefined expression∞−∞ for polynomials or

∞∞

for rational functions!

Example 1.17(application of preceding theorem).


(i) limx→∞

(6x3−2x+1) = limx→∞

6x3 = 6.(∞)3 = ∞,

(ii) limx→−∞

(3x5−2x2 +2) = 3.(−∞)5 = −∞,

(iii) limx→∞

x3−22x3−4x2 +1

= limx→∞

12

=12

,

(iv) limx→−∞

x5−22x3−4x2 +1

= limx→−∞

12

x2 = ∞.

Remark 1.11(Evaluation of the limits). The decision tree on Figure 4 can be used to find-outwhich theorem should be used to evaluate a limit. Since the picture refers to some tools notyet revealed to the reader, the reader is encouraged to studythis diagram after an explanation ofl’Hospital’s rule in Theorem 4.1 on the page 36.Remark that this tree covers most of the limits, but not all.

• First, till now we didn’t consider the power expression like0∞, 00, 1∞ and others.• Second, there are limits like “∞−∞” for which the conversion into l’Hospital’s rule is

very cumbersome and other techniques are more handy to evaluate these limits. Thisconcerns, among others, the limits like lim

x→∞(√

x+1−√

x).

• Third, there are limits of the type00

or∞∞

for which the l’Hospital’s rule fails, since

the limit on the right-hand side of (4.2) does not exit or is more complicated than theoriginal limit. In such a cases the reader can try a numericalexperiment (see Remark1.5 and Example 1.3) to estimate the value of the limit or ask an expert.

Evaluate limx→a

f (x).

Substitutex = a into f (x).Is f (a) well-defined? You have the result.

Is x = ±∞ and is f either apolynomial or a rational function?

Use Theorem 1.8.Consider the leading term(-s) only.

The substitution gives . . . .(follow the corresponding line).

Use the l’Hospital’s rule.Consider the new limit and work

from the beginning.

Convert, using convenientalgebraic modifications, into either

00

, or∞∞

.

Consider the one-sided limits firstand use Theorem 1.7. From themutual relationship between the

one-sided limits deduce the(non-)existence and value of the

limit.

Yes

No

Yes

No 00

or∞∞

∞−∞0 ·∞

nonzero0

FIGURE 4. Limits.

2. THEOREMS ON CONTINUOUS FUNCTIONS 27

2. Theorems on continuous functions

Motivation. As we have seen before, the definition of continuity claims that a continuous func-tion has the same value and the limit at every point from its domain. A natural question arises:

How does the definition of continuity from page 21 correspondto the intuitiveidea about the continuous functions, that the function is continuous if thegraph of this function can be sketched as a single curve without breaks, jumpsand holes?

Surprisingly, this highly expected correspondence is onlypartial! The Czech mathematicianBernard Bolzano found the function which is continuous but this functioncannot be graphedby any graphical tool since it has no tangent at every point.∗ This rather extraordinary exampleshows that the class of continuous functions is more comprehensive than the class of functionswhose graphs are “continuous curves” in an intuitive meaning. However, some of the mostimportant properties of “continuous curves in the plane” are preserved for continuous functions.

Theorem 2.1(Weierstrass). Let f be a function defined and continuous on[a,b]. Then the Efunction f is bounded and takes on an absolute maximum and an absolute minimum on theinterval [a,b], i.e. there exist numbers x1,x2 ∈ [a,b] such that f(x1) ≤ f (x) ≤ f (x2) for allx∈ [a,b].

Theorem 2.2(Bolzano, the 1st Bolzano’s theorem). Let f be a function defined and continuousEon [a,b]. If f (a). f (b) < 0 holds (i.e. the values f(a) and f(b) have different signs), then thereexists a zero of the function f on the interval(a,b), i.e. there exists c∈ (a,b) such that f(c) = 0.

Theorem 2.3(Bolzano, the 2nd Bolzano’s theorem). Let f be a function defined and continuousEon [a,b]. Let m and M be absolute minimum and absolute maximum of the funciton f on theinterval [a,b], respectively. Then for every y0 between m and M there exists at least one x0 withproperty f(x0) = y0.

Remark 2.1. All three preceding theorems have a simple geometric interpretation as Figure 5shows.

• The graph of the continuous function on the picture is bounded by the horizontal dot-dash lines.

• The function has an absolute minimum on the interval[a,b] at the boundary pointx= a and an absolute maximum at the pointx= M. The values of the function in theseextrema are shown on they-axis.

• The function changes its sign on the interval[a,b] since f (a) < 0 and f (b) > 0. Ac-cording to the 1st Bolzano’s theorem, there exists at least one zero of the function on(a,b). One of these zeros is denoted byc.

• The valuey0 lies between the maximal and the minimal value on they-axis. Accordingto the 2nd Bolzano’s theorem, there exists at least onex0 ∈ [a,b] with property f (x0) =y0. One of the points with this property is shown on the picture.

We see that the statements of these theorems are quite natural. The main contribution of Weier-strass and Bolzano is in the fact that these mathematicians recognized that these statements arenot trivial consequences of the definition of continuity andthat these statements need a proof.

∗A similar example of continuous function which cannot be graphed has been given few years after Bolzanoby K. Weierstrass. This example became to be more familliar to mathematicians, since Bolzano, an “enemy ofAustro–Hungarian Monarchy”, was not allowed to publish hisresults.

2. THEOREMS ON CONTINUOUS FUNCTIONS 28

a bx0

y0

cx

y

M

zero

upper bound

lower bound

absolute minimum

absolute maximum

FIGURE 5. A function continuous on[a,b].

Algorithm 2.1 (general nonlinear inequality). According to Bolzano’s first theorem, a functioncannot change its sign on the interval which contains neither a point of discontinuity nor a rootof the function. Hence when solving one of the inequalities

f (x) > 0, f (x) ≥ 0, f (x) ≤ 0, and f (x) < 0,

we can proceed in the following steps.(i) We find all of the solutions of the equationf (x) = 0. We mark these points on the real

axis.(ii) We find all points of discontinuity of the functionf (x). We mark these points on the

real axis.(iii) The real axis is divided into several subintervals now. The functionf preserves its sign

on each subinterval. We choose arbitrary (convenient) numberξ form each subinterval,evaluatef (ξ ) and mark the sign of this value to the subinterval. We do this step for allsubintervals.

(iv) Performing the preceding step for all subintervals, weassign the sign of the functionf (x) to each subinterval. Now it is clear wheref (x) > 0 holds and where the inequalityis opposite.

The same technique can be used for an inequality

f (x) ≥ g(x).

In steps (i) and (ii) we mark on the real axis all the solutionsof f (x) = g(x) and all the points ofdiscontinuity of the functionsf (x) andg(x) . In the step (iii) we choose a test number from eachsubinterval and check, whether the relationf (x) > g(x) holds or not. The same will be true forall numbers from the subinterval containing this test number.Remark that the only problem of this algorithm, when the procedure may crash, is in the casewhen we cannot find all of the solutions of the equalityf (x) = 0 or if we cannot find all of thepoints of discontinuity. Such a situation appears when the function f (x) is very complicated.

3. DERIVATIVE 29

In such an extreme case we probably cannot solve the originalinequality neither by Bolzanotheorem nor by any other technique.

Example 2.1.Solve

ln2x− lnx−6x−1

≥ 0

Solution:The left-hand side has a point of discontinuity atx = 1. We solve the equation

ln2x− lnx−6x−1

= 0,

which is equivalent to

ln2x− lnx−6 = 0.

Substitutiony = lnx converts this equation into a quadratic equation

y2−y−6 = 0

with solutionsy1 = 3 andy2 = −2. Hence the solutions of original equation are the solutions oflnx= 3 and lnx= −2, i.e.x1 = e3 andx2 = e−2. We mark the numberse−2, 1 ande3 on the realaxis (see the scheme below). The real axis is divided into three subintervals now. We choose thetest pointsξ1 = e−3, ξ2 = e−1, ξ3 = eandξ4 = e4. Evaluation of the function

f (x) =ln2x− lnx−6

x−1gives

f (ξ1) =(−3)2− (−3)−6

e−3−1≈−6.31< 0,

f (ξ2) =(−1)2− (−1)−6

e−1−1≈ 6.32> 0,

f (ξ3) =12−1−6

e−1≈−3.49< 0,

f (ξ4) =42−4−6

e4−1≈ 0.11> 0.

Hence for the signs of the functionf we have the following scheme./0

0

−−−

e−2

+++ ◦1

−−−

e3

+++

The solution of the inequality is the set[e−2,1)∪ [e3,∞).

3. Derivative

Definition (derivative at a point). Let f be a function and letx∈ Dom( f ). The functionf issaid to bedifferentiable at the point xif the finite limit

f ′(x) = limh→0

f (x+h)− f (x)h

(3.1)

exists. The value of this limit is called aderivative of the function f at the point x.

3. DERIVATIVE 30

Remark 3.1 (one-sided derivatives). A similar definition is used for one-sided derivatives fromthe right and from the left. In this case the corresponding one-sided limit is used in (3.1).

Remark 3.2. Remark that the limit (3.1) is an indeterminate form “00

” and, in general, it is not

easy to find the value of this limit.

Definition (derivative as a function). The function is said to bedifferentiable on an openinterval I if it is differentiable at every point of this interval. The function which assigns to eachpointx from the interval the valuef ′(x) of the derivative is said to be aderivative of the functionf on the interval Iand denoted byf ′.

Remark 3.3. Page 155 shows, how to write a derivative of any basic elementary function. Theseformulas are valid whenever the right-hand side is well-defined.

Definition (higher derivatives). Let f (x) be a function andf ′(x) be the derivative of thisfunction. Suppose that there exists derivative( f ′(x))′ of the functionf ′(x). Then this derivativeis said to be thesecond derivative of the function fand denotedf ′′(x)f ′′(x)f ′′(x). By n-times repetition ofthis process we obtain then-th derivativef (n)(x) of the functionf .

Remark 3.4 (geometric interpretation of the derivative). Consider a functionf and a pointx as E

tangent secant

x x+h

f (x)

f (x+h)

FIGURE 6. The derivative.

on the Figure 6. Further consider a secant to this graph whichintersects the graph at the points[x, f (x)] and[x+h, f (x+h)]. The slope of this line is

f (x+h)− f (x)h

. (3.2)

Now let us (in mind) fix the point[x, f (x)] and move the point[(x+h), f (x+h)] along the graphtowards to this fixed point. With the limit processh→ 0 these two points become identical andthe secant becomes to be a tangent to the graph of the functionf at the pointx. The slope of thistangent is the limit of the slopes of secants, i.e. the limit of (3.2). This limit is by definition andby (3.1) the derivative of the functionf at the pointx.

Remark 3.5(tangent). If the function f is differentiable at the pointa, then the point–slope formEof the equation of the tangent line in the pointa is

y = f ′(a)(x−a)+ f (a). (3.3)

3. DERIVATIVE 31

Example 3.1.The functiony= x3 has the derivativey′ = 3x2. Hence the slope-intercept equationof the tangent to the graph ofy = x3 in the point[2,8] is

y = 3 ·22(x−2)+8 = 12x−16.

Remark 3.6(points where no derivative exists). As stated hereinbefore, there exists a tangent (asa line with a finite slope) to the graph of the functionf if and only if there exists a derivative ofthe functionf in the pointx. The cases when the derivativedoes not existcover several differentsituations shown on the Figure 7.

rem

ovab

led

isco

ntin

uity

infin

itelim

it

jum

p

peak

vert

ical

tan

gen

t

FIGURE 7. The points of discontinuity and the points without derivative.

• There exists avertical tangent to the graph at a given point∗. In this case the limit (3.1)becomes to be infinite.

• There exist both tangent from the right and tangent from the left, and the slopes ofthese tangents are different. In this case the graph of the function f has a kind ofpeak(or corner) in the given point. In this case the limit (3.1) does not exist but thecorresponding one-sided limits exist.

• At least one of the one-sided limits in (3.1) does not exist. In this case the secants haveno limit position. At least one of the one-sided tangents does not exist. The behaviorof the functionf is very complicated near this point.

Remark 3.7 (practical interpretation of the derivative). Let the quantityx denotes time (in con-Evenient units) and suppose that the value of the quantityy changes in the time, i.e.y = y(x).Derivativey′(x) of the functiony in the pointx denotes the instant rate (velocity) of the changeof the functiony at the timex. As a practical example consider the following situation.Let the quantityy denotes the size of population of some species in some bounded area. In thiscase the derivativey′(x) denotes the rate of the change of the size of this population.This changeequals to the number of the individuals which are born in the momentx decreased by the amountof individuals which died in this moment (more precisely in the time interval which starts atgiven time and has unit length).

Theorem 3.1(relationship between the differentiability and continuity). Let f be a function Edifferentiable at the point x= a (on the interval I). Then f is continuous at the point x= a (onthe interval I).

∗Remember that a vertical line has no slope!

3. DERIVATIVE 32

Remark 3.8. The converse statement to the preceding theorem is not true,in general. Really, thefunctiony = |x| is continuous inR but it is not differentiable atx = 0 (the tangent in this pointfrom the left isy = −x and from the righty = x).

Notation. The set of all functions with continuous derivative on the interval I is denoted byC1(I). These functions are calledsmooth functions.

Remark 3.9. The derivatives of all basic elementary functions can be found in the tables ofderivatives. The following rules for algebra of derivatives allows to use these formula also whenevaluating derivatives of more complicated functions. Remark that we willnever use the def-inition to find the derivativeof any function, but we will prefer an application of the knownderivatives of basic elementary functions (see page 155) and the following Theorems 3.2 and3.3.

Theorem 3.2(algebra of derivatives). Let f , g be functions and c∈ R be a real constant. TheEfollowing relations hold

[c f(x)]′ = c f ′(x), the constant multiple rule

[ f (x)±g(x)]′ = f ′(x)±g′(x), the sum rule

[ f (x)g(x)]′ = f (x)g′(x)+ f ′(x)g(x), the product rule[ f (x)

g(x)

]′=

f ′(x)g(x)−g′(x) f (x)g2(x)

, the quotient rule

whenever the derivatives on the right-hand side exist and the expression on the right-hand sideis well defined.

Example 3.2(algebra of derivatives). Find f ′(x) for f (x) =xex

x+1.

Solution:We start with the quotient rule followed by the product rule,i.e.[

xex

x+1

]′=

(xex)′(x+1)− (x+1)′xex

(x+1)2

=(ex +xex)(x+1)−1xex

(x+1)2 =ex(x2+x+1)

(x+1)2 .

Remark 3.10(practical). The sum rule is much simpler than the product rule and the quotientrule. In the following examples the product and the quotientcan be rewritten as sums and thesum rule can be used. This is more handy than a blind application of the product rule or thequotient rule.

(i) [(x+1)(x−2)]′ = (x2−x−2)′ = 2x−1

(ii)

(

x3−x+14x

)′=

14(x2−1+x−1)′ =

14(2x−x−2).

Theorem 3.3(chain rule). Let f and g be differentiable functions. The relation E[ f (g(x))]′ = f ′(g(x))g′(x) (3.4)

holds whenever the right hand side is well defined.

Remark 3.11. The expressionf ′(g(x)) denotes the derivative of the functionf evaluated atg(x). We do not need to writef (x), g(x), f ′(x) andg′(x) every time we differentiate a composite

3. DERIVATIVE 33

function. Roughly speaking, what the chain rule says is thatwe differentiate from the “outsidein”. In other words, we first differentiate the “outside” function f (x) and keep the argument ofthe function f (·) intact, then multiply the result by the derivative of the “inside” functiong(x).With a little practice this becomes easy to do.

Example 3.3 (chain rule). Find the derivatives of the composite functionsy = lnsinx, y =ln(xsinx) andy = ln

(

xsin2(2x))

.(i) We use simply the chain rule withf (x) = lnx andg(x) = sinx. We obtain

y′ = (ln(sinx))′ =1

sinxcosx.

(ii) We use the chain rule followed by the product rule.

y′ = (ln(xsinx))′ =1

xsinx(sinx+xcosx).

(iii) We use the chain rule. The “inside” function is a product and we use the product rule.The second factor in the product is the composite of three functions – we use two timesthe chain rule.

y′ =(

ln(

xsin2(2x))

)′=

1

xsin2(2x)[1.sin2(2x)+x.2sin(2x)cos(2x)2]

Remark 3.12(some technical tricks). The most effective way how to find the derivative of the

functiony =1

(x2+1)5 is to writey = (x2+1)−5 and calculate

y′ = −5(x2 +1)−6(2x) = − 10x(x2 +1)6 .

The application of the quotient rule

y′ =0(x2+1)5−5(x2 +1)42x

(x2+1)10 = − 10x(x2 +1)6

is correct, but cumbersome.

In a similar way, the most effective way how to findy′ if y =2x+1√

3is to writey =

2√3

x+1√3

and calculate

y′ =2√3.1+0 =

2√3.

Example 3.4. Let us differentiate the following functions using convenient rules. We evaluateand simplify the derivative and add some comments concerning this computation below.

(i) y =x

x2 +1

y′ =(x)′ · (x2+1)−x· (x2+1)′

(x2+1)2 =(x2 +1)−x·2x

(x2 +1)2

=1−x2

(1+x2)2

We used simply the quotient rule.

3. DERIVATIVE 34

(ii) y = earctgx2

y′ = earctgx2 · (arctgx2)′ = earctgx2 · 11+(x2)2 · (x

2)′ =2xearctgx2

1+x4

Here we used the chain rule. It is necessary to use this rule two times since the insidefunction “arctgx2” is again a composite function (the inside component is notx butx2).

(iii) y = xln2x

y′ = (x)′ · ln2x+x· (ln2x)′ = 1 · ln2x+x·2lnx · (lnx)′

= ln2x+x·2lnx· 1x

= (2+ lnx) lnx

Here we have a product of the functions “x” and “ln2x”. Hence we use the product rulefirst. The latter function, “ln2x”, is a composite function and we use the chain rule todifferentiate this composite function.

(iv) y =

(

x−1x+1

)2

y′ = 2x−1x+1

(

x−1x+1

)′= 2

x−1x+1

· (x−1)′(x+1)− (x−1)(x+1)′

(x+1)2

= 2x−1x+1

· 1 · (x+1)− (x−1) ·1(x+1)2 = 2

x−1x+1

· 2(x+1)2

= 4x−1

(x+1)3

The function is a composite function. The outside function is a quadratic function andwe use the power rule to differentiate this function. The inside function is a quotientand we use the quotient rule. After an application of the quotient rule the differentiationis finished.

(v) y =√

x+1− ln(1+√

x+1)

y′ =1

2√

x+1· (1+0)− 1

1+√

x+1·(

0+1

2√

x+1· (1+0)

)

=1

2√

x+1

(

1− 1

1+√

x+1

)

=1

2√

x+1·

√x+1

1+√

x+1

=1

2(1+√

x+1)

We use the sum rule and differentiate the radical and the logarithm separately. Bothfunctions are composite functions. The logarithm is a composite function with a sumas its inside function. Hence when differentiating the inside function of the logarithm,we use the sum rule.

(vi) y =√

1−x·arcsin√

x

y′ = (√

1−x)′ ·arcsin√

x+√

1−x· (arcsin√

x)′

=1

2√

1−x· (1−x)′ ·arcsin

√x+

√1−x· 1

√

1− (√

x)2(√

x)′

4. MATHEMATICAL APPLICATIONS OF DERIVATIVES 35

= − 1

2√

1−xarcsin

√x+

√1−x

1√

1− (√

x)2

12√

x

=1

2√

x− arcsin

√x

2√

1−x(The comments to this calculation are left to the reader.)

Remark 3.13 (another notation). An equivalent notation for the derivative of the functiony =f (x) is

y′(x) = f ′(x) =dydx

. (3.5)

This notation is used especially in applications. In the applications the functions usually containseveral constants or parameters and derivative denoted by (3.5) shows which variable is differ-entiated and which symbol is considered as an independent variable. As a particular example, ifthe variableV given by the formulaV = πr2h is considered as a function of the variabler, thedifferentiation gives

dVdr

= 2πrh.

When the same variableV is considered as a function of the variableh we havedVdh

= πr2.

The chain rule written in this notation has the formdudx

=dudv

dvdx

whereu is a function ofv, v is a function ofx, and we differentiated the composite functionu(v(x)). This rule is easy to remember since the notation is similar to the multiplication offractions, where the term dv “cancels”.

4. Mathematical applications of derivatives

Remark 4.1 (linear approximation of a function). Let f be a function differentiable at the pointx = a. Then we can find the equation of its tangent in the pointa by using (3.3). From the graphis clear that this tangent is the best linear function which approximatesf (x) near the pointa.Hence we can write an approximate formula

f (x) ≈ f (a)+ f ′(a)(x−a) (4.1)

which approximates the functionf by a linear function. Remember that this approximation isusually convenient for the points very close to the pointa only. If this linear approximation isnot sufficient in a particular problem, we can approximate the function f by a higher degreepolynomial, as will be explained later (Taylor formula on page 75).

The derivative presents also an extremely powerful tool foran evaluation of the limits whichyield an indeterminate form. Namely, the following theorem(due to J. Bernoulli!) can be used.

4. MATHEMATICAL APPLICATIONS OF DERIVATIVES 36

Theorem 4.1(l’Hospital’s rule). Let a∈ R∗ and let f and g be functions defined and differen-

tiable in some ring neighborhood of the point a. Suppose thateither limx→a

f (x) = limx→a

g(x) = 0

or | limx→a

g(x)| = ∞ holds. Then

limx→a

f (x)g(x)

= limx→a

f ′(x)g′(x)

, (4.2)

holds if the limit on the right-hand side exists (finite or infinite). The same holds for one-sidedlimits as well.

Remark 4.2. The l’Hospital’s rule can be used for evaluation of the indeterminate forms00

and∞∞

and also for the limits of the fractions in which denominatorapproaches∞ and the limit of

numerator either does not exist or it is unknown. (If a finite limit of the numerator exists and ifthe limit of denominator is not finite, then the limit of the quotient is clearly 0, as Theorem 1.4shows.) Remark that if the limit on the right in the formula (4.2) does not exist, we cannot drawany conclusion about the original limit on the left of (4.2).This original limit may exist or mayfail to exist.

Remark 4.3. If the limit on the right-hand side of (4.2) is (after evaluation of the derivatives andsubstitutionx = a) still one of the indeterminate forms, we can use the l’Hospital’s rule again.Usually, after a finite number of consecutive applications of l’Hospital’s rule we obtain the limitwhich can be evaluated simply by substitution. In this case the original limit equals to the lastlimit. (Remember that before each application of l’Hospital’s rule we must show that the limit is

either00

oranything

±∞).

It may also happen that after an application of l’Hospital’srule we obtain either the same limitor worse and more difficult limit. In this case we must look foranother method for evaluation ofthe limit. However, this case is rather extraordinary.

Example 4.1.Find limx→∞

lnx√x

, limx→0

x−arctgxx3 and lim

x→0

xex +x−2ex +2x3 .

Solution:

limx→∞

lnx√x

=∞∞

l’H.r.= lim

x→∞

1x1

2√

x

=00

simpl.= 2 lim

x→∞

1√x

subst.= 0,

limx→0

x−arctgxx3 =

00

l’H.r.= lim

x→0

1− 11+x2

3x2 =00

simpl.= lim

x→0

13(1+x2)

subst.= −1

3,

limx→0

xex +x−2ex +2x3 =

00

l’H.r.= lim

x→0

ex +xex +1−2ex

3x2

=00

l’H.r.= lim

x→0

ex +ex +xex−2ex

6xsimpl.= lim

x→0

xex

6xsimpl.= lim

x→0

ex

6subst.=

16

5. EXTREMAL PROBLEMS 37

Remark 4.4(other indeterminate forms). The l’Hospital’s rule can be usually used also to find alimit of the form 0.(±∞) and∞−∞. In each of these cases we must convert the function insidethe limit into a form of a fraction since the l’Hospital’s rule (4.2) concerns fractions only.

• In the case of the indeterminate form 0.∞ we write one of the functions as a fractionand convert the product into one fraction.

• In the case of the indeterminate form∞−∞ we write both terms as fractions and sub-tract by converting into common denominator.

To write a function as a fraction we can use the following tricks.If this function appears

in the limit,try to write it this form.

tgx1

cotgxor

sinxcosx

cotgx1

tgxor

cosxsinx

ef (x) 1

e− f (x)

f (x)11

f (x)

The last possibility can be used for any functionf (x) but this approach yields usually the mostdifficult computation.

Example 4.2.Find limx→0+

xlnx, limx→∞

x2e−x2and lim

x→1(1−x) tg

πx2

.

Solution:

limx→0+

xlnx = 0×∞ = limx→0+

lnx1x

=∞∞

l’H.r.= lim

x→0+

1x

− 1x2

simpl.= lim

x→0+−x

subst.= 0,

limx→∞

x2e−x2= ∞×0 = lim

x→∞

x2

ex2 =∞∞

l’H.r.= lim

x→∞

2x

2xex2

simpl.= lim

x→∞

1

ex2

subst.= 0,

limx→1

(1−x) tgπx2

= 0×∞ simpl.= lim

x→1

1−x

cotgπx2

=00

l’H.r.= lim

x→1

−1

−π2

sin−2 πx2

subst.=

2π

.

For a summing-up discussion about limits study once more Remark 1.11 on the page 26.

5. Extremal problems

Remark 5.1(optimal control). The main problem of the optimal control is to find the conditionsunder which the quantity, we are interested in, is optimal insome sense. Thus, depending ofthe context, we would like to gain a maximal profit from producing or selling things, to gain aminimal costs for a production or a minimal amount of produced waste.

In this section we focus ourselves onto the problem to find thepoints in which the function takeson its extremal values. More precisely, we will use the following concept.


Definition (local extrema). Let f : R → R andx0 ∈ Dom( f ). The functionf is said to take onits local maximumat the pointx0 if there exists a neighborhoodN(x0) of the pointx0 such thatf (x0) ≥ f (x) for all x∈ N(x0).

The function f is said to take on itssharp local maximumat the pointx0 if there exists aneighborhoodN(x0) of the pointx0 such thatf (x0) > f (x) for x∈ N(x0)\{x0}.

If the opposite inequalities hold, then the functionf is said to take on itslocal minimumor sharplocal minimumat the pointx0.

A common word for local minimum and maximum is alocal extremum(pl. extrema). A com-mon word for the sharp local maximum and the sharp local minimum is asharp local extremum.

In plain words, the functionf takes on its sharp local maximum at the pointx = a if there is nogreater value thanf (a) in a neighborhood of the pointa (i.e. close to the pointx = a).There exists a close relationship between the existence of alocal extremum and monotonicity ofthe function, as the following theorem shows.

Theorem 5.1(sufficient conditions for (non-)existence of local extrema). Let f be a functiondefined and continuous in some neighborhood of x0.

• If the function f is increasing in some left-hand side neighborhood of the point x0 anddecreasing in some right-hand side neighborhood of the point x0, then the function ftakes on its sharp local maximum at the point x0.

• If the function f is decreasing in some left-hand side neighborhood of the point x0 andincreasing in some right-hand side neighborhood of the point x0, then the function ftakes on its sharp local minimum at the point x0.

• If the function f is either increasing or decreasing in some (two-sided) neighborhoodof the point x0, then there is no local extremum of the function f at x0.

Remark 5.2. Graphically the situation can be represented in the following scheme.ր

a

MAXցb

min րc

րd

MAXց

Remark 5.3 (absolute extremum). Consider a functionf defined and continuous on the closedEinterval[a,b]. According to Weierstrass Theorem 2.1, page 27 the functionf takes on its absolutemaximum and its absolute minimum on the interval[a,b]. It is clear (explain!) that this absoluteextremum can occur only at the local extremum or at one of the end points of the interval[a,b].(Consult also Figure 5, page 28.)

The following definition concerns points with a very close relationship to local extrema.

Definition (stationary point). The pointx0 is said to be astationary pointof the function f iff ′(x0) = 0.

Remark 5.4 (geometric interpretation). Tangent to the function in the stationary point is a hori-zontal line (explain!).

Example 5.1. • The functionsy = ex andy = lnx have no stationary point. Really, nei-

ther of the equationsex = 0 and1x

= 0 possesses a solution inR.


• The functionsy = x2 andy = x3 have its stationary point at the pointx = 0. Really,both equations 2x = 0 and 3x2 = 0 possess a unique solutionx = 0.

• The functiony =x2

x3 +1has the derivativey′ = −x(x3−2)

(x3+1)2 and the stationary points

are the solutions of equation

−x(x3−2)

(x3 +1)2 = 0,

i.e. x = 0 andx =3√

2.

Theorem 5.2(relationship between stationary point and local extremum). Let f be a function Edefined in x0. If the function f takes on a local extremum at x= x0, then the derivative of thefunction f at the point x0 either does not exist or equals zero and hence x= x0 is a stationarypoint of the function f .

Example 5.2.All the following observations are in harmony with Theorem 5.2. E• The pointx = 0 is a stationary point and a local minimum of the functiony = x2.• The pointx = 0 is a stationary point of the functiony = x3. However, this function

takes on no local extremum at this point.• Both functionsy = |x| and y =

3√

x2 take on its local minimum at the pointx = 0.However, none of them has a derivative atx = 0.

Remark 5.5 (strategy for establishing of local extrema). According to Theorem 5.2, a functionf cannot take on a local extremum at the point where the derivative of the functionf does notequal zero. Hence every local extremumx = x0 belongs to one of the following groups.

• The pointx = x0 is a stationary point of the functionf , i.e. f ′(x0) = 0• The pointx0 belongs toDom( f ) and the function is not differentiable atx0, i.e. f (x0)

is well-defined andf ′(x0) is undefined.Usually, there are only few points which satisfy one of thesetwo conditions. If we locate all ofthese points and all points of discontinuity, then we can findlocal extrema of the functionf aswell. To test a point with a zero or undefined derivative for anexistence of a local extremum, itis sufficient to establish the type of monotonicity on the left and on the right of this point and useTheorem 5.1. The type of monotonicity can be established from the following theorem.The statement of this theorem is quite natural. It states that a function and its tangent have thesame type monotonicity, both are increasing or both are decreasing. The monotonicity of the tan-gent can be established from its slope: negative slope meansdecreasing tangent and decreasingfunction, positive slope means increasing tangent and increasing function.

Theorem 5.3(relationship between derivative and monotonicity). Let f be a function. Supposethat f is differentiable on the open interval I.

• If f ′(x) > 0 on I, then the function f is increasing on I.• If f ′(x) < 0 on I, then the function f is decreasing on I.

The following algorithm summarizes the method which can be used to find local extrema of afunction.

Algorithm 5.1 (first derivative test). Let f be an elementary function. Local extrema of thefunction f can be found in the following steps.


(i) We find a natural domain of the functionf (i.e. the set of allx for which the functionis defined)

(ii) We find the derivativef ′(x) of the function f (x). We simplify the derivative as muchas possible. If possible, we factor the derivative into a product of several simpler ex-pressions.

(iii) We solve the equationf ′(x) = 0. If the derivative is factored into the product of severalsimple expressions, we use the fact that a product equals zero only if and only if atleast of the terms in the product equals zero. This yields thestationary points of thefunction.

(iv) We specify the domain of the derivativef ′(x). Usually it is the same as the domain ofthe original functionf (x). This gives the points of discontinuity of the derivative.

(v) Suppose that we found all stationary points and all points of discontinuity of the deriv-ative. We mark these points on the real line.

(vi) Now, the real line is divided into several subintervals. On each subinterval the deriva-tive preserves its sign (in harmony with Theorem 2.2).

(vii) We choose an arbitrary test number in each subinterval, sayξ in thei-th subinterval. Wefind the value of the derivative at this number,f ′(ξ ), and from the sign of the derivativewe specify whether the functionf is increasing or decreasing atξ and, consequently,on the whole subinterval. We record the type of monotonicityin a simple graphicalscheme and do this for all subintervals.

(viii) We find the points where the function is continuous andthe type of monotonicitychanges. At these points the functionf (x) takes on a local extremum.

(ix) To recognize the type of an extremum, we consider the type of monotonicity on theright and on the left.

Example 5.3.Find local extrema and intervals of monotonicity of the function f : y= 8x13 +x

43 .

Solution:The function is defined for allx∈ R. The derivative can be evaluated by the power ruleand the sum rule

y′ = 813

x−23 +

43

x13 =

43

x−23(2+x) =

43

2+x

x23

.

It is clear that the derivative equals zero atx=−2 and it is undefined atx= 0. These points dividethe real line into three subintervals. In these subintervals consider test pointsξ1 = −3, ξ2 = −1andξ3 = 1. Evaluating the derivative at the test points we find thatf ′(−3) < 0, f ′(−1) > 0 andf ′(1) > 0. Hence we obtain the scheme

ց−2

min ր0

ր

From the scheme it is clear that the function takes on a local minimum atx=−2 and it possessesno other local extremum. Intervals of monotonicity are recorded in the picture.

Example 5.4.Find local extrema and intervals of monotonicity of the function f : y = x2e−x.Solution: The function is defined for allx∈ R, i.e. Dom( f ) = R. Derivative of the function isevaluated by the product rule and the functione−x is differentiated by the chain rule

y′ =(

x2e−x)′ = 2xe−x−x2e−x = x(2−x)e−x.

The solutions of the equation

y′ = x(2−x)e−x = 0


arex1 = 0 andx2 = 2. There are no points of discontinuity of the derivative. The numbersx1,x2 divide the real line into three subintervals. In the subintervals we choose test pointsξ1 = −1,ξ2 = 1 andξ3 = 3. Evaluating the derivative in these test points we obtain

y′(ξ1) = y′(−1) = −1.3.e1 < 0,

y′(ξ2) = y′(1) = 1.1.e−1 > 0,

y′(ξ3) = y′(3) = 3.(−1).e−3 < 0.

The information about the monotonicity is recorded in the following picture.ց

0

min ր2

MAXց

The function takes on a local minimum at the pointx= 0 and a local maximum at the pointx= 2.

Example 5.5.Find local extrema and intervals of monotonicity of the function f : y =x2

lnx.

Solution: The function ln(x) is well-defined for allx > 0 and the quotient is well-defined iflnx 6= 0. HenceDom( f ) = R

+ \ {1} = (0,1)∪ (1,∞). The derivative of the functionf can beevaluated by the quotient rule

y′ =2xlnx−x2 1

x

ln2x=

x(2lnx−1)

ln2x.

The solutions of the equation

y′ =x(2lnx−1)

ln2x= 0

are the solutions of

x(2lnx−1) = 0.

One of the solutions seems to bex1 = 0, but this point is not in the domain of the functionf .Hence we can divide byx and get

2 lnx−1 = 0.

Isolating lnx we get lnx =12

and an application of the inverse function givesx2 = e1/2. The

derivative is not defined forx3 = 1. This point and the pointx1 = 0 are the points of disconti-nuity of the functionf (x). The numbersx1 = 0, x3 = 1 andx2 = e1/2 divide real axis into foursubintervals. However, the leftmost interval does not belong to the domain off and, in fact, we

consider only three subintervals(0,1), (1,e1/2) and(e1/2,∞). Consider the test pointsξ1 =12

,

ξ2 = e1/4 andξ3 = e. Evaluating the derivative in these test points we obtain

y′(ξ1) = y′(

12

)

=12

(

2ln 12 −1

)

ln2 12

< 0,

y′(ξ2) = y′(

e1/4)

=e1/4

(

214 −1

)

(14

)2 < 0,

y′(ξ3) = y′ (e) =e(2.1−1)

(1)2 > 0,

The intervals of monotonicity are recorded in the followingscheme.

6. CONCAVITY 42

/0 ◦0

ց ◦1

ցe1/2

min ր

The function takes on its local minimum at the pointx = e1/2 and it has no local maximum.

6. Concavity

Another property of functions which possesses a clear interpretation on the graph is concavity.

Definition (concavity). Let f be a function differentiable atx0.

The functionf is said to beconcave up(concave down) atx0 if there exists a ring neighborhoodN(x0) of the pointx0 such that for allx ∈ N(x0) the points on the graph of the functionf areabove (below) the tangent to the graph in the pointx0, i.e. if

f (x) > f (x0)+ f ′(x0)(x−x0)(

f (x) < f (x0)+ f ′(x0)(x−x0))

(6.1)

holds.

The function is said to beconcave up (concave down) on the interval Iif it has this property ineach point of the intervalI .

Example 6.1. • The natural exponential functiony = ex is concave up onR and its in-verse, the natural logarithmic functiony = lnx, is concave down onR+.

• The cubic functiony = x3 is concave down forx < 0 and concave up forx > 0.• The functiony = sinx is concave down on the interval(0,π).

Remark 6.1. Consider a functionf defined on the intervalI . Suppose, for brevity, that thefunction f is increasing onI .

• If the function f is concave up onI , then the rate of the increase of the functionf isincreasing; the increase of the functionf speeds up.

• If the function f is concave down onI , then the rate of the increase slows down.At the points where the type of concavity changes, the velocity of the increase is either highestor smallest comparing with the points in the neighborhood. From this reason, these points are ofhigh interest when we investigate the functionf and these points have a special name, introducedby the following definition.

Definition (point of inflection). The pointx0 in which the type of concavity changes is said tobe apoint of inflectionof the functionf .

Example 6.2. • The functiony = x3 has the point of inflection atx = 0.• The functiony = sinx has the points of inflection atx = kπ, wherek is an arbitrary

integer.

The concavity contains an information about the rate of change of the increase or decrease. Sincethe information about the increase or decrease is containedin the first derivative and the rate ofchange can be recorded also by the derivative, it is natural to expect that the concavity of thefunction is closely related with the second derivative. Thefollowing theorem introduces thisrelationship.

7. BEHAVIOR OF THE FUNCTION NEAR INFINITY, ASYMPTOTES 43

Theorem 6.1(relationship between the 2nd derivative and concavity). Let f be a function andf ′′ be the second derivative of the function f on the open interval I.

• If f ′′(x) > 0 on I, then the function f is concave up on I.• If f ′′(x) < 0 on I, then the function f is concave down on I.

The concavity can also indicate the type of local extremum ata stationary point. Roughly speak-ing, a concave-up function cannot take on a local maximum anda concave-down function cannottake on a local minimum.Theorem 6.2(2nd derivative test, concavity and local extrema). Let f be a function and x0 astationary point of this function.

• If f ′′(x0) > 0, then the function f has its local minimum at the point x0.• If f ′′(x0) < 0, then the function f has its local maximum at the point x0.• If f ′′(x0) = 0, then the2nd derivative test fails. A local extremum may or may not

occur. Both cases are possible.

To find the intervals on which the function is concave up/downwe use the same method as weused to find the intervals of monotonicity of the function. The only difference is that we workwith the second derivativef ′′(x) instead off ′(x) and we interpret the signs off ′′(x) in terms ofconcavity of f (x).

7. Behavior of the function near infinity, asymptotes

In the remaining part of this chapter we will investigate functions near the points+∞ and−∞.We will be interested in the fact, whether the graph approaches a line or not.

Definition (inclined asymptote). Let f be a function defined in some neighborhood of+∞. Theline y= kx+q is said to be aninclined asymptote at+∞ to the graph of the function y= f (x) if

limx→∞

|kx+q− f (x)| = 0

holds.

Similarly, if we consider the point−∞ instead of+∞, we obtain the definition of the inclinedasymptote at−∞.

Remark 7.1. As a special case of the preceding definition we obtain fork = 0 the horizontalasymptote, introduced on the page 23.

Theorem 7.1(inclined asymptote). Let f be a function defined in some neighborhood of thepoint+∞. The line y= kx+q is an inclined asymptote at+∞ to the graph of the function f(x)if and only if the following limits exist as finite numbers

k := limx→∞

f (x)x

and q:= limx→∞

( f (x)−kx). (7.1)

Similarly, if we consider−∞ instead of+∞, we obtain the asymptote at−∞.

The following theorem presents an equivalent definition of the inclined (or horizontal) asymptote.The meaning is the same – the vertical distance of the points on the graph and the points on theasymptotic line approaches zero.

7. BEHAVIOR OF THE FUNCTION NEAR INFINITY, ASYMPTOTES 44

Theorem 7.2.Let f be a function defined in some neighborhood of+∞ The line y= kx+q isan asymptote to the graph y= f (x) if and only if the function f can be written in the form

f (x) = kx+q+g(x),

where g(x) tends to zero as x tends to infinity, i.e. limx→∞g(x) = 0.A similar statement holds also for−∞.

Figure 8 illustrates this obvious statement.

y = f (x)y = kx+qg(x)

FIGURE 8. Inclined asymptote at+∞.

Example 7.1. • The long division of polynomials shows that the functionf (x) =x3

x2 +1can be written in an equivalent form

f (x) =x3

x2 +1= x− x

x2 +1.

The first part of this sum is a linear function and the second one tends to zero asxapproaches to±∞. Hence the liney = x is an asymptote to the graph of the functionfat both±∞.

• The functionf (x) = 2x+π −arctgx can be written in an equivalent form

f (x) =(

2x+π2

)

+(π

2−arctgx

)

.

The expression in the first parentheses is linear and the expression inside the second

parentheses tends to a zero asx approaches to∞, since limx→∞

arctgx=π2

. Hence the line

y = 2x+π2

is an asymptote to the graph of the functionf at +∞. In a similar way we

can show that the asymptote to the graph of the functionf at−∞ is y = 2x+32

π.

• Concerning the last two examples, an application of Theorem7.1 based on the evalua-tion of the limits (7.1) is also possible, but cumbersome.

In general, there exists no relationship between the asymptote at+∞ and−∞. As convenientexamples consider horizontal asymptote and Examples 1.8, 1.9 and 1.10, page 23. We must findboth asymptotes separately. Only in a very special case, thecase of rational functions, we canuse the following theorem.

Theorem 7.3(inclined asymptotes of rational functions). Inclined asymptotes to the graph ofa rational function at±∞ exist simultaneously (either both exist or both do not exist) and areequal.

8. INVESTIGATION OF A FUNCTION, CURVE SKETCHING 45

8. Investigation of a function, curve sketching

Now, we will be interested in the problem to find out the most important properties and somedetailed information about the function. Our main object isnot to transfer a complete tabulationof functional values to graph paper but to gain a good qualitative (and maybe also quantitative)understanding of the function shape with as little effort aspossible. To gain a rapid impressionof a function and to explore its main properties we employ various methods of the differentialcalculus.Usually we investigate a functionf in the following steps.

(i) We find the domain off , decide whetherf is odd, even or periodical.(ii) We find intercepts of the graph with axes and intervals where the value of the function

is positive and/or negative.(iii) We find the one-sided limits at the points of discontinuity and at±∞.(iv) We find and simplify the derivativef ′. Then we find intervals wheref is increasing

and/or decreasing and local extrema of the functionf .(v) We find and simplify the second derivativef ′′. Then we find the intervals wheref is

concave up and/or down and inflection points of the functionf .(vi) If the limits in +∞ and/or−∞ are not finite, we find inclined asymptotes in these points.

(vii) We sketch the graph of the functionf .When constructing the graph of given function we are interested less in drawings of high accuracythan in drawings which “show trends”. From the graph of the function it must be clear where thefunction is increasing and where decreasing, where there are breaks in the graph, where there arezeros and local extrema of the function,where it is concave up and down and what is its shapegenerally. An idea to locate a few points of the graph and connect them blindly is usually nothelpful.The fact that it is sometimes better not to keep the same scalefor all parts of the axes can bedemonstrated on the comparison of the hand-created pictures to the following examples and thegraphs generated by computer. (Try to generate the graphs onthe computer yourself. M$ Excelis not the best tool for this, but still can be used.)Especially, an investigation of the function from Example 8.1 based on the functional valuesonly gives not a good understanding about the shape of the function since it approaches anasymptote very slowly. On the computer you either cannot seelocal extrema, or you cannot seean asymptote. (Which one of these possibilities occurs depends on the interval forx, where thecomputer is asked to visualize the graph.)

Example 8.1. Investigate the functiony =ln2x

x.

Solution:We establish the domain of the function from the conditionsx > 0 (x is inside a loga-rithm) andx 6= 0 (x is in the denominator). This shows thatDom f = (0,∞).The function is not defined forx = 0 and hence the graph has noy-intercept.The equationy = 0 gives

ln2xx

= 0,

ln2x = 0,

lnx = 0,

x = 1,


and the graph has anx-intercept atx = 1.We find intervals on which the graph is below and above thex-axis. We divide thex-axis bythe numbersx = 0 (the point of discontinuity) andx = 1 (thex-intercept) into three subintervals.The leftmost subintervals does not belong to the domain of the function f and hence we focusout attention to the other two subintervals. Considering test numbers∗ ξ1 = e−1 andξ2 = e andevaluating the sign of the functionf in these test numbers we obtain

/0 ◦0

+++

1

+++

and the function is nonnegative for allx∈ Dom( f ).Let us investigate the limits of the functionf at zero (a point of discontinuity, i.e. a candidate fora vertical asymptote) and infinity.

limx→0+

ln2xx

=(−∞)2

+0=

∞+0

= ∞

limx→∞

ln2xx

=∞∞

= limx→∞

2lnx1x

1= 2 lim

x→∞

lnxx

= 2∞∞

= 2 limx→∞

1x

1= 2

1∞1

= 201

= 0

This shows that the graph has a vertical asymptotex= 0 and a horizontal asymptotey= 0 at+∞.Let us find the derivative of the function. We use the quotientrule and when differentiating thenumerator of the fraction we use the chain rule.

y′ =2ln(x)1

xx− ln2x

x2 =lnx(2− lnx)

x2

We try to find the intervals of monotonicity. To do this, we solvey′ = 0 first. Since the derivativeis in the form of quotient, it is sufficient to consider the numerator only†. This gives the equation

lnx(2− lnx) = 0

and hence either lnx = 0 or lnx = 2. From here we obtain two stationary pointsx1 = 1 andx2 = e2. Marking these two points and the point of discontinuityx = 0 we obtain a division ofthex-axis into subintervals. The leftmost subinterval does notbelong to the domain. Choosing atest point in each of the remaining subintervals (e.g.ξ1 = e−1, ξ2 = e1, ξ3 = e3) and evaluatingthe derivative in that point we obtain

/0 ◦0

ց1

min րe2

MAXց

This shows that the function takes on a local minimum at the point x = 1. The value of the

function in this local minimum isy = f (1) =ln21

1= 0 — it is exactly thex-intercept. Similarly,

the function has its local maximum in the point[x = e2,y = 4e−2].It is convenient to check that these results do not contradict to the previous results. Investigatingthe function, we obtain the following properties.

∗When looking for these test points remember that the logarithm of a real number can be simply evaluated if thisnumber is in an exponential form. Also remember that the function ex is an increasing function and ifc is a numberbetweena andb, then the numberec is a number betweenea andeb. Hence−1 < 0 < 1 impliese−1 < 1 < e.

†A quotient of two real numbers equals zero if and only if the numerator equals zero


• Since limx→0+

f (x) = ∞, the function is decreasing in some right ring neighborhoodof 0.

• Since the function is positive on its domain and has an horizontal asymptotey = 0, itdecreases to this asymptote.

• Since the functionf has anx-interceptx = 1 and is positive on both right and left ofthis intercept, it has a local minimum at this point.

All these facts must be in harmony with the scheme of monotonicity. And, in our computations,they really do.Let us proceed with the second derivative. To find the second derivative we use the first derivativein the form convenient for differentiation, namely

y′ =2lnx− ln2x

x2 ,

and differentiate. We obtain (by the quotient rule and the chain rule and after simplifications)

y′′ =

(

21x −2lnx1

x

)

x2−2x(

2lnx− ln2x)

x4

=2x−2xlnx−4xlnx+2xln2x

x4

= 21−3lnx+ ln2x

x3 .

To find the points for whichy′ = 0 it is sufficient to find zeros of the numerator. This yields

1−3lnx+ ln2x = 0.

Substituting lnx = t we obtain a quadratic equation

1−3t + t2 = 0.

An application of quadratic formula yields two solutionst1,2 =3±

√5

2. Hence the points with

vanishing second derivative arex3 = e3+

√5

2 andx4 = e3−

√5

2 . The point of discontinuity of thesecond derivative is the same as the point of discontinuity of the original function, i.e.x= 0. Wedivide the real axis by these points into four subintervals,omit the leftmost subinterval whichdoes not belong to the domain and establish the sign of the second derivative in convenient test

points (e.g.ξ1 = e0 = 1, ξ2 = e32 andξ3 = e3). We obtain

/0 ◦0

∪

e3−

√5

2

∩

e3+

√5

2

∪

We can again check the correspondence between these resultsand the previous results. Thefollowing facts are known from the first derivative and the limits of the functionf .

• The function is known to have a local minimum atx = 1 and hence it is concave up atthis point.

• The function is known to have a local maximum atx= e2 and hence it is concave downat this point.

• The function decreases to the horizontal asymptote and hence it is concave up in someneighborhood of infinity.

• The function approaches to infinity asx approaches to zero from the right. Hence it isconcave up in some right neighborhood of zero.


All these facts correspond well with the scheme which shows the intervals of concavity, as thereader can check immediately.Finally, we sketch the graph of the function. When marking the points in the Cartesian coordi-nates remember that

0 <3−

√5

2< 2 <

3+√

52

and hence

1 < e3−

√5

2 < e2 < e3+

√5

2 .

Really, the last set of inequalities is an easy consequence of the preceding inequalities and thefact that exponential function is an increasing function.

1e

3−√

52

e2e

3+√

52

FIGURE 9. The graph to Exercise 8.1

Example 8.2. Investigate the functionf : y =x2

x2−4.

Solution:The restrictionx2−4 6= 0 yields the domainDom( f ) = R\{±2}. The rational functioncontains even powers ofx only and hence it is an even function and the graph is symmetricalabout they-axis∗. As a consequence, all local extrema, points of inflection and asymptotes havesymmetrical counterparts.Thex-intercepts are the solutions of

x2

x2−4= 0,

i.e. x= 0. Thisx-intercept and the points of discontinuity divide thex-axis into four subintervals.Investigating the sign of the function in the test pointsξ1 = −3, ξ2 = −1, ξ3 = 1 andξ4 = 3 weobtain the scheme†

+++ ◦−2

−−−

0

−−− ◦2

+++

Let us investigate limits at the points of discontinuityx=±2. Substitution of these numbers intothe function shows that the one-sided limits in these pointsare not finite (Theorem 1.7 on the

∗See also Theorem 3.1, page 10†The same scheme can be obtained from the fact that the numerator of the fraction is always nonnegative and

hence the sign of this fraction is the same as the sign of the quadratic function in the denominator.


page 25 can be used). From the signs of the function on the subintervals we obtain (without anyadditional calculations)

limx→−2±

x2

x2−4= ∓∞, lim

x→2±

x2

x2−4= ±∞.

Further, by Theorem 1.8 on the page 25,

limx→±∞

x2

x2−4= lim

x→±∞

x2

x2 = limx→±∞

1 = 1

and the function has a horizontal asymptotey = 1 in both+∞ and−∞.Differentiating the function by the quotient rule and simplifying we obtain

y′ =2x(x2−4)−x22x

(x2−4)2

= −8x

(x2−4)2 .

The solution of the equationy′ = 0 is x = 0. The points of discontinuity of the derivative are thezeros of the denominator:x=±2. Evaluating the derivative∗ at the test pointsξ1 =−3, ξ2 =−1,ξ3 = 1 andξ4 = 3 we obtain

ր ◦−2

ր0

MAXց ◦2

ց

The function takes on its local maximum atx = 0. This was known already from the fact that thegraph has anx-intercept at this point and is negative on the left and on theright of this point.Differentiating the first derivative, taking out the repeating factor(x2−4) in the numerator andsimplifying we obtain

y′′ = −8(x2−4)2−x2(x2−4)2x

(x2−4)4

= −8(x2−4−4x2)(x2−4)

(x2−4)4

= 83x2+4

(x2−4)3 .

The numerator of the second derivative is always nonzero andhence the type of concavity maychange only at the points of discontinuity. We have the following scheme

∪ ◦−2

∩ ◦2

∪

We collect all informations about the function in the Figure8.2.

Example 8.3. Investigate the functionf : y =x

lnxSolution:The restrictions onx from the logarithmic function and from the quotient arex> 0 andlnx 6= 0. HenceDom( f ) = (0,∞)\{1} and the points of discontinuity arex = 0 andx = 1.The equation

xlnx

= 0

∗Remark that the denominator of the derivative is always nonnegative and it has no influence on the sign of thederivative. From this reason it is sufficient to substitute into the numerator only


−2 2


has no solution on the domain (the only “aspirant”x = 0 does not belong toDom( f )) and thefunction may change its sign in the points of discontinuity only. Considering all of the points ofdiscontinuity, splitting the domain into corresponding subintervals and substituting convenienttest points we obtain the following scheme.

/0 ◦0

−−− ◦1

+++

One sided limits at the points of discontinuity are

limx→0+

xlnx

=0−∞

= 0, limx→1±

xlnx

=1±0

= ±∞.

The limit at infinity is an indeterminate form and can be established by the l’Hospital’s rule asfollows.

limx→∞

xlnx

=∞∞

= limx→∞

11x

= limx→∞

x = ∞ (8.1)

Differentiating the functionf by the quotient rule we obtain

y′ =lnx−1

ln2x.

The zero point of the derivative is the solution of lnx = 1, i.e. x = e. This point and the pointsof discontinuity divide thex-axis into subintervals where the function preserves its monotonic-ity. Investigating the derivative in convenient test points from these subintervals we obtain thefollowing scheme with monotonicity.

/0 ◦0

ց ◦1

ցe

min ր

The function takes on a local minimum atx = e. The value of the function at this point is

y(e) =e

lne= e.

Differentiating f ′(x) we have

y′′ =

1x ln2x− (lnx−1)2lnx

1x

ln4x

= − lnx−2

xln3xwith the intervals of concavity as the following picture shows.


/0 ◦0

∩ ◦1

∪e2

∩

Finally we test whether the function possesses an asymptoteat infinity. We have

k = limx→∞

xlnx

x= lim

x→∞

1lnx

=1∞

= 0

q = limx→∞

xlnx

−0x = limx→∞

xlnx

= ∞

(see (8.1)). Hence the function has no asymptote at infinity.

1e e2


Example 8.4. Investigate the functionf : y =x3

2(x+1)2 .

Solution:The denominator of the quotient cannot equal zero, which is the only restriction onx.HenceDom( f ) = R\{−1}. The sign of the function is given by the following scheme.

−−− ◦−1

−−−

0

+++

One-sided limits at the point of discontinuity are

limx→−1±

x3

2(x+1)2 =−1+0

= −∞

and the limits at infinity are

limx→±∞

x3

2(x+1)2 = limx→±∞

x3

2x2 = limx→±∞

x2

= ±∞.

Differentiating we obtain

y′ =12

3x2(x+1)2−x32(x+1)

(x+1)4 =12

x2(x+1)[3(x+1)−2x](x+1)4

=x2(x+3)

2(x+1)3 =x3 +3x2

2(x+1)3

and henceր

−3

MAXց ◦−1

ր0

ր


The function has its local maximum at

[

x = −3,y = −278

]

.

Differentiating once more we have

y′′ =12

(3x2+6x)(x+1)3− (x3 +3x2)3(x+1)2

(x+1)6

=12

(3x2+6x)(x+1)−3(x3 +3x2)

(x+1)4

=3x

(x+1)4

and∩ ◦

−1∩

0∪

Since the degree of both polynomials in numerator and in denominator differ by one, there is aninclined asymptote at±∞. The coefficients from the slope-intercept equation of the asymptotey = kx+q are evaluated by the following computations (see Theorem 7.1 and Theorem 1.8).

k = limx→∞

f (x)x

= limx→∞

x2

2(x+1)2 = limx→∞

x2

2x2 =12,

q = limx→∞

f (x)−kx= limx→∞

x3

2(x+1)2 −12

x = limx→∞

x3− (x3 +2x2 +2x)2(x+1)2

= limx→∞

−2x2−2x2(x2 +1)

= limx→∞

−2x2

2x2 = −1.

Hence the liney =12

x−1 is an asymptote of the graph of the function at+∞. This line is also

an asymptote in−∞. Really, the reader can check that the same computations canbe extendedalso to the case of limits at−∞. (We can also use the statement of Theorem 7.3. According tothis theorem, asymptotes of every rational function at±∞ are equal.)

−3 −1


Example 8.5. Investigate the functionf : y =(x−1)2

x2+1.

Solution: There is no restriction on the domain, henceDom( f ) = R and the function has nopoints of discontinuity. The solution of the equation

(x−1)2

x2 +1= 0


is x = 1. The function is nonnegative for allx ∈ R. Really, the denominator of the quotient ispositive (since it is a sum of a square and a positive number) and the numerator is nonnegative(since it is a square).The limits at infinity are

limx→±∞

(x−1)2

x2 +1= lim

x→±∞

x2

x2 = 1

and the liney = 1 is a horizontal asymptote to the graph of the function at both±∞.Differentiating by the quotient rule we obtain

y′ =2(x−1)(x2+1)− (x−1)22x

(x2 +1)2 =2(x−1)(x2+1−x2 +x)

(x2+1)2

= 2(x−1)(x+1)

(x2 +1)2

andy′ = 0 for x1,2 = ±1. Hence the intervals of monotonicity areր

−1

MAXց1

min ր .

and the function takes on a local maximum[x = −1,y = 2] and a local minimum[x = 1,y = 0].Differentiating the derivative in a form convenient for differentiation we obtain

y′′ = 2

(

x2−1(x2+1)2

)′= 2

2x(x2 +1)2− (x2−1)2(x2 +1)2x(x2+1)4

= 4x(x2 +1)x2 +1−2(x2−1)

(x2 +1)4 = 4x3−x2

(x2+1)3 .

The equationy′′ = 0 possesses three roots:x1 = 0 andx2,3 =±√

3. For the intervals of concavitywe obtain the following scheme.

∪−√

3

∩0

∪ √3

∩

Hence all the zeros of the second derivative are the points ofinflection.Now, we can collect all informations and sketch the graph of the function.

−√

3 −1 1√

3


Example 8.6. Investigate the functionf : y =x3 +2

2x.

Solution: The restriction from the denominator of the quotient yieldsthe point of discontinuityx = 0 and domainDom( f ) = R \ {0}. The x-intercept can be obtained as a solution of the


equationy = 0: the solution isx = − 3√

2. From thex-intercept and the point of discontinuity wecan establish the signs of the function.

+++

− 3√

2

−−− ◦0

+++

The one-sided limits at the point of discontinuity are infinite (see Theorem 1.7) and the sign ofthe limits can be determined from the scheme with the signs ofthe function.

limx→0±

x3 +22x

= ±∞.


y′ =12

(

x2 +2x−1)′ =12

(

2x−2x−2) = x− 1x2 =

x3−1x2 .

The solution ofy′ = 0 isx = 1 and we have the following intervals of monotonicity.ց ◦

0

ց1

min ր

The function takes on its local minimum at

[

x = 1,y=32

]

The second derivative

y′′ = (x−x−2)′ = 1+2x−3 =x3 +2

x3

yields the intervals of concavity as follows.∪− 3√

2

∩ ◦0

∪

There are no asymptotes at±∞ since

k = limx→±∞

x3+22x

x= lim

x→±∞

x3 +22x2 = lim

x→±∞

x2

= ±∞.

Another approach how to establish the asymptotic behavior is based on the long division of thepolynomials in the numerator and the denominator. This longdivision shows

f (x) =x3 +2

2x=

12

x2 +1x.

The second term1x

approaches a zero forx numerically large and can be omitted in the limit

x→±∞. Hence the function has about the same asymptotic behavior as the quadratic function

y =12

x2.

Example 8.7. Investigate the functionf : y = e−x−e−2x.Solution:There are no restriction on the domain, henceDom( f ) = R and the function is contin-uous for allx∈ R. Thex-intercept(s) is (are)

e−x−e−2x = 0,

e−x = e−2x,

ex = 1,

x = ln1,


− 3√

2 1


x = 0

and the signs of the function are given by the following scheme−−−

0

+++


y′ = −e−x +2e−2x = e−2x(2−ex).

Hencey′ = 0 for x = ln2 and we obtain intervals of monotonicity as follows.ր

ln2

MAXց

The function takes on its local maximum at the pointx = ln2. The value of the function in this

point isy = eln2−1 −eln2−2= 2−1−2−2 =

12− 1

4=

14.


y′′ = e−x−4e−2x = e−2x(ex−4)

andy′′ = 0 if and only ifx = ln4. Hence∩

ln4∪

The function is decreasing and bounded below by thex-axis for largex. Hence it possesses ahorizontal asymptote at+∞. The limits of the function at±∞ are

limx→∞

e−x−e−2x = 0−0 = 0,

limx→−∞

e−x−e−2x = ∞−∞ = limx→−∞

1ex −

e−x

ex

= limx→−∞

1−e−x

ex =1−∞+0

= −∞.

Since the exponential function grows up very rapidly, it is natural to expect that the function hasno inclined asymptote at+∞.

Example 8.8. Investigate the functionf : y = (x+1)ex.Solution: The function is defined and continuous for allx ∈ R. Thex-intercept of the graph isx = −1 and hence we have

−−−

−1

+++


ln2 ln4



y′ = ex +(x+1)ex = (x+2)ex

and henceց

−2

min ր


y′′ = ex +(x+2)ex = (x+3)ex

and∩

−3∪

The function is decreasing and bounded above near−∞ (the function is bounded above by thex-axis, as follows from the scheme with the signs of the function). Hence it possesses a horizontalasymptote at−∞. Evaluating the limit at this point we have

limx→−∞

(x+1)ex = −∞.0 = limx→−∞

x+1e−x =

−∞∞

= limx→−∞

1−e−x =

1−∞

= 0

and the function has asymptotey = 0 at−∞. Evaluating the limit at+∞ we have

limx→∞

xex = ∞e∞ = ∞.

Since the exponential function grows up very rapidly, it is natural to expect that the function hasno inclined asymptote in+∞.

−3 −2 −1


9. IMPLICIT DIFFERENTIATION 57

9. Implicit differentiation

In the next paragraphs we introduce a technique which enables to find a derivative of the functiondefined implicitly, such as

y2 +xy+ lnx+ lny = 0. (9.1)

A development of an exact mathematical theory of the implicit functions exceeds our skills at thismoment. For example, at this stage we cannot ensure that for everyx from the domain (9.1) thereexists exactly one numbery which satisfies (9.1). From this reason we will show the techniqueon examples only.Suppose that a functiony = f (x) is well-defined by (9.1) andy0 = f (x0) for some real numbersx0, y0. (This means that the relation

y20 +x0y0+ lnx0 + lny0 = 0

is fulfilled.) To find f ′(x0) we proceed as follows.We differentiate relation (9.1). When differentiating we considery to be a function ofx and hencedydx

= y′. When differentiating the termxy we use the product rule and when differentiatingy2

and lny we use a chain rule because of the “inside” function isy. We obtain

2yy′ +y+xy′ +1x

+1y

y′ = 0.

Isolatingy′ we obtain

y′(

2y+x+1y

)

= −y− 1x,

y′2y2+xy+1

y2 = −xy+1x

,

y′ = − (xy+1)y2

x(2y2 +xy+1).

Now substitutingx = x0 andy = y0 we can evaluatey′(x0). We see that the derivative can befound explicitly in terms ofx andy.

Example 9.1.Given a function in its implicit form

xlny+x2 +y2 = 2, (9.2)

find the derivative at the pointx0 = 1, y0 = 1 and the tangent in this point.Solution:Differentiating (9.2) implicitly we have

lny+x1y

y′ +2x+2yy′ = 0.

Isolatingy′ we get

y′(

xy

+2y

)

= −2x− lny

y′x+2y2

y= −2x− lny

y′ = −(2x+ lny)yx+2y2

9. IMPLICIT DIFFERENTIATION 58

and hence

y′(1) = −(2+ ln1)

1+2= −2

3.

The tangent is (by formula (3.3))

y = 1− 23(x−1) = −2

3x+

53.

CHAPTER III

Multivariable Calculus

Only in the simplest problems the quantity, we are interested in, depends on one variable only.Usually the result depends on many, mutually independent, different input quantities. For exam-ple, the volume of a cylinder,V = πr2h depends on both height and radius of the cylinder. Thus,it is fruitful to study the functions of more independent variables.

1. Basic concepts

In the following we extend the main ideas of the differentialcalculus to multivariable functions.Roughly speaking, there is a principal difference between the calculus of the functions of oneand two variables, but there is no substantial difference between the calculus of functions of two,three orn variables. From this reason we will focus our attention to the functions of two variablesonly. A generalization of our results for the functions ofn variables is natural and easy.

Definition (function of two variables). The rule f which to each ordered pair of the realnumbers(x,y) from the given setA ⊆ R

2 assigns exactly one another real number is called afunction of two variablesdefined onA. We write f : A → R, particularly if A = R

2 we writef : R

2 → R.

• For the function of two variables we define adomainand animageof the function inthe same manner as for the function of one variable.

• Under agraphof the functionf of two variables we understand the set{(x,y,z) ∈ R3 :

(x,y) ∈ Dom( f ) andz= f (x,y)}. Usually the graph can be considered as a surface inthe 3-dimensional space.

• Let z0 be a real number. An intersection of the graph of the functionf (x,y) with thehorizontal planez= z0 is called alevel curve at the level z0. (Such a level curves youknow from the geography.)

2. Continuity

An exact theory of limits is somewhat more complicated for functions of two and more variablesthan theory presented in the preceding chapters. The main cause of this effect is the fact thatthere are many possibilities how we can approach a given point in the plane in contrast to onlytwo possibilities how we can approach given point on the realline.From this reason we will not develop so comprehensive theory, as we did for the one-variablefunctions. We focus our attention to the problems of continuity and extend the concept of deriva-tives.

59

3. DERIVATIVE 60

Definition (neighborhood inR2). Let X = (x,y) ∈ R2 be a point inR2 andε > 0 be a positive

real number. Under anε-neighborhood of the point Xwe understand the set

Nε(X) = {X0 = (x0,y0) ∈ R2 : ρ(X,X0) < ε},

whereρ(X,X0) is theEuclidean distanceof the pointsX andY defined in a usual way, by therelation

ρ(X,X0) =√

(x−x0)2+(y−y0)2

Under thereducedε-neighborhood of the point Xwe understand the set

Nε(X) = Nε(X)\{X}.

Definition (continuity). Let f be a function of two variables andX0 = (x0,y0) be a point fromits domain. The functionf is said to becontinuousat X0 = (x0,y0) if for every real numberε,ε > 0, there exists a neighborhoodN(X0) of the pointX0 such that| f (x,y)− f (x0,y0)| < ε foreveryX = (x,y) ∈ N(X0).

In harmony with the one-variable calculus we define the classof elementary functions of twovariables as a class of functions of two variables, which canbe expressed in the explicit formthrough basic elementary functions by means of a finite number of the operations addition, sub-traction, multiplication, division and composition. As a particular example, the function

z= arctg(x2+x+y)√

xy

+ ln(y2sin3x)

is an elementary function.An important property of the elementary functions is presented in the following theorem.

Theorem 2.1(continuity of elementary functions). Every elementary function is continuous onEits domain.

3. Derivative

The concept of the derivative of one-variable-functions has a natural extension for the functionsof two variables. However, there is an important difference. Having informations about thederivative of a function of one real variable we can obtain very detailed and deep results con-cerning this function∗. The derivative of a function of more variables does not contain so deepinformation about the function. Among others, the functionof two variables which has deriva-tives at given point can be discontinuous at this point! Sucha phenomenon cannot appear for thefunctions of one variable!

∗especially concerning the continuity and monotonicity

3. DERIVATIVE 61

Definition (partial derivative). Let f : R2 → R be a function of two variables. Suppose that a

finite limit

f ′x(x,y) = lim∆x→0

f (x+∆x,y)− f (x,y)∆x

.

exists. The value of this limit is called apartial derivative with respect to x of the function f atthe point(x,y).

The rule which associates every(x,y)∈ M with the value off ′x(x,y) is called apartial derivativeof the function f with respect to x.

Similarly, apartial derivative with respect to yis defined through the limit

f ′y(x,y) = lim∆y→0

f (x,y+∆y)− f (x,y)∆y

.

By differentiating the first derivatives we obtain higher derivatives.

Remark 3.1 (practical evaluation of the partial derivatives). According to the preceding defini-tion, the partial derivative with respect to a given variable is the “usual” derivative of the functionof one variable, where only the given variable remains underconsideration and the other variablesare considered to be constant (like parameters). In other words, we obtain the 1st partial deriv-ative of f with respect tox by differentiating f with respect tox while treatingy as a constant.Similarly, the 1st derivative off with respect toy can be obtained by holdingx as a constant anddifferentiating f with respect toy. When differentiating, the usual formulas for differentiationof basic elementary functions and the usual rules for differentiation (known from Theorem 3.2,page 32) serves as a main tool.

Remark 3.2 (a geometric consequence). If fx(x0,y0) exists (derivative with respect tox at thepoint(x0,y0)), it determines the slope of the line parallel to thexz-plane and tangent to the surfacez= f (x,y) at the point(x0,y0, f (x0,y0)). A similar statement holds for the derivative with respectto y.

Remark 3.3 (tangent plane, linear approximation). If the graph of the functionz = f (x,y) isconsidered as a surface inR

3, then the tangent plane to this surface in the point(x0,y0) is

z= f (x0,y0)+ f ′x(x0,y0)(x−x0)+ f ′y(x0,y0)(y−y0).

The tangent plane is under some additional conditions∗ the best linear approximation of thegraph. Hence the approximate formula

f (x,y) ≈ f (x0,y0)+ f ′x(x0,y0)(x−x0)+ f ′y(x0,y0)(y−y0)

holds for(x,y) close to(x0,y0).

Remark 3.4(partial derivatives and the rate of change). From a practical point of viewfx(x0,y0)gives the instantaneous rate of change off with respect tox when the remaining independentvariabley is fixed. Thus, wheny is fixed aty0 andx is close tox0, a small changeh of thevariablex will lead to the change approximatelyh fx(x0,y0) in f .

∗as continuity of the partial derivatives in some neighborhood of the point(x0,y0)


Remark 3.5 (another notation). Another notation for the derivative of the functionz= f (x,y)

with respect tox is fx, z′x, zx,∂ f∂x

,∂z∂x

and similarly for the derivative with respect toy, e.g.,z′y

or∂z∂y

. The second derivatives are denoted in one of the following way: z′′xx, f ′′yy, z′′xy, zxx, fyy, zxy,

∂ 2z∂x∂y

,∂ 2z∂y2 .

There exist two derivatives of the first order and four derivatives of the second order. The follow-ing theorem shows that the second derivatives are (under some additional assumptions) in factonly three, since the mixed partials derivatives are equal.

Theorem 3.1(Schwarz). If the partial derivatives f′′xy a f ′′yx exist and are continuous on theopen set M, then they equal on the set M, i.e.

f ′′xy(x,y) = f ′′yx(x,y)

hold for every(x,y) ∈ M.

The assumptions of the preceding theorem are satisfied in usual technical applications.

4. Extremal problems

Let f : R2 → R be defined and continuous in some neighborhood of the point(x0,y0). We

are interested in the problem, whether or not the values of the function at the point(x0,y0) aremaximal when comparing to the values of the function in “other points”, i.e. whether

f (x0,y0) > f (x,y) for (x,y) 6= (x0,y0), (4.1)

or

f (x0,y0) ≥ f (x,y) (4.2)

holds. First, we have to specify, what does exactly mean the phrase “other points”. Giving aspecific meaning to this phrase, we obtain three important types of extrema.

Definition (maxima of the functions of two variables).

• Suppose that there exists a neighborhoodN((x0,y0)) of the point(x0,y0) such that(4.1) holds for every(x,y) ∈ N((x0,y0)). Then the functionf is said to gain itssharplocal maximumat the point(x0,y0).

• Suppose that another function of two variablesg(x,y) is given andg(x0,y0) = 0. Ifthere exists a neighborhoodN((x0,y0)) of the point(x0,y0) such that inequality (4.1)holds for every(x,y) ∈ N((x0,y0)) which satisfies the relation

g(x,y) = 0, (4.3)

then functionf is said to gain itsconstrained sharp local maximum with respect to theconstraint(4.3) at the point(x0,y0).

• Suppose that the setM ⊆ Dom( f ) is given. If inequality (4.1) holds for every(x,y) ∈M, the functionf is said to gain itssharp global maximum of the function f over theset Mat the point(x0,y0).


Definition (other local extrema).

• If inequality (4.1) is replaced by (4.2) in the preceding definition, we omit the word“sharp”.

• If the direction of the inequality sign in (4.1) and (4.2) is changed in the precedingdefinition, we obtain a definition ofa local minimum, a constrained local minimumand a global minimumof the functionf .

x

y

zlocal maximum

constraint maximum

level curves

constraintg(x,y) = 0

FIGURE 1. Extrema of the functions of two variables on the graph

Remark 4.1. Even in the case when we speak about the sharp extrema is the word “sharp”sometimes omitted.

Theorem 4.1(Fermat). Let f(x,y) be a function of two variables. Suppose that the functionf (x,y) takes on a local extremum in the point(x0,y0). Then either at least one of the partialderivatives at the point(x0,y0) does not exist, or both partial derivatives at this point vanish,i.e.

f ′x(x0,y0) = 0 = f ′y(x0,y0). (4.4)

Remark 4.2. The significance of the preceding theorem is in the fact that usually only few pointsfrom the domain of the function have the property mentioned in the Fermat’s theorem. In mostof the points both partial derivatives exist and at least oneof them is nonzero. In these pointsthe local extremum cannot occur. (Compare this with the relationship between local extrema andstationary point for the functions of one variable.)The point with vanishing both partial derivatives is calledastationarypoint. We usually use thefollowing test to recognize whether a local extremum at the stationary point exists.


Theorem 4.2(2nd derivative test). Let f be a function of two variables and(x0,y0) a stationarypoint of this function, i.e. suppose that(4.4) holds. Further suppose that the function f hascontinuous derivatives up to the second order in some neighborhood of this stationary point.Denote by D the following determinanta

D(x0,y0) =

∣

∣

∣

∣

f ′′xx(x0,y0) f ′′xy(x0,y0)f ′′xy(x0,y0) f ′′yy(x0,y0)

∣

∣

∣

∣

= f ′′xx(x0,y0) f ′′yy(x0,y0)−[

f ′′xy(x0,y0)]2

.

One of the following cases occurs

• If D > 0 and f′′xx > 0, then the function f has a sharp local minimum at the point(x0,y0).

• If D > 0 and f′′xx < 0, then the function f has a sharp local maximum at the point(x0,y0).

• If D < 0, then there is no local extremum of the function f at the point(x0,y0).• If D = 0, then the test fails. Any of the preceding situations may occur.

acalled Hessian

Remark 4.3. Note that ifD > 0, then f ′′xx and f ′′yy are both nonzero and have the same sign.

Remark 4.4. All the presented results have direct extension from two variables into 3 ornvariables. Only an extension of the 2nd derivative test in Theorem 4.2 is somewhat more com-plicated.

Example 4.1.Find local extrema of the functionf : z= x4 +y4−4xy+30.Solution:We find the first derivatives:

f ′x =4x3−4y,

f ′y =4y3−4x,

and the system for stationary points is

4x3−4y = 0,

4y3−4x = 0.

From the first equation it followsy = x3 and substituting into the second equation we obtain

4(x3)3−4x = 0,

4x9−4x = 0,

x9−x = 0,

x(x8−1) = 0.

Hence thex-coordinates of the stationary points arex = 0 and a solution(s) ofx8−1 = 0. Fromherex8 = 1 andx=±1. The correspondingy-coordinate can be obtained from the relationy= x3.We get three stationary pointsS1 = [0,0], S2 = [1,1] andS3 = [−1,−1].We find the second derivatives

f ′′xx = 12x2,

f ′′xy = −4,

f ′′yy = 12y2.


For the second derivatives atS1 we havef ′′xx(S1) = f ′′yy(S1) = 0 andf ′′xy(S1) =−4. HenceD(S1) =

0 ·0− (−4)2 = −16 and there is no extremum atS1.In the pointS2 we have f ′′xx(S2) = f ′′yy(S2) = 12 and f ′′xy(S2) = −4. HenceD(S2) = 12.12−(−4)2 = 144−16> 0. Since f ′′xx(S2) = 12> 0 the function takes on its local minimum at thepointS2.In the pointS3 we havef ′′xx(S3) = f ′′yy(S3) = 12 andf ′′xy(S3) =−4. Hence the situation is the sameas inS2.

Example 4.2.Find local extrema of the functionf : z= yln(x2+y).Solution: We find the derivatives off . For derivative with respect tox we use the constantmultiple rule followed by the chain rule, for derivative with respect toy we use the product rulefollowed by the chain rule. We obtain

f ′x =2xy

x2+y,

f ′y = ln(x2 +y)+y

x2 +y.

The system for stationary points is2xy

x2 +y= 0,

ln(x2+y)+y

x2 +y= 0.

From the first equation we see that eitherx = 0 ory = 0. We will distinguish two cases.

(i) CASE I, y = 0. The second equation gives ln(x2) = 0 and hencex2 = 1 andx = ±1.This yields two stationary pointsS1 = [1,0] andS2 = [−1,0].

(ii) CASE II, x = 0. The second equation gives

lny+yy

= 0,

lny+1 = 0,

lny = −1,

y = e−1

and we obtain the third stationary pointS3 = [0,e−1].We find the second derivatives

f ′′xx =2y(x2 +y)−2xy2x

(x2 +y)2 ,

f ′′xy =2x(x2 +y)−2xy

(x2 +y)2 ,

f ′′yy =1

x2 +y+

x2 +y−y(x2 +y)2 .

Evaluating derivatives at the stationary points we obtain

D(S1) = 0 ·2−22 = −4 < 0,

D(S2) = 0 ·2− (−2)2 = −4 < 0,


D(S3) = 2 ·e−02 > 0.

Hence there is no extremum atS1 andS2 and there is a local minimum of the functionf at S3.

Now we will focus our attention to the problem to find constrained extrema. If one of the vari-ables can be isolated from the constraint, we can convert theproblem to find a constrained ex-tremum of the function of two variables to the problem to find an extremum of the function ofone variable, as the following algorithm and example show.

Algorithm 4.1. The constrained extrema of the function of two variables canbe found in thefollowing steps.

(i) Consider a functionz= f (x,y) and a constraintg(x,y) = 0.(ii) Suppose that the constraint can be solved for one of the variablesx, y. Suppose, for

brevity, thatg(x,y) = 0 can be written in an equivalent formy= ψ(x), whereψ(x) is afunction of a single variablex.

(iii) We substitutey = ψ(x) into z= f (x,y) and obtainz= f (x,ψ(x)), i.e. we obtainz as afunction ofx.

(iv) Using the differential calculus of the functions of onereal variable (Algorithm 5.1,page 39) we find the extrema of the functionz= f (x,ψ(x)).

(v) If x0 is a local extremum from the previous step, we calculate fromthe equationy =ψ(x) the corresponding value of the second variabley. If y0 is such a value, then(x0,y0)is a constrained local extremum of the original functionz = f (x,y) with constraintg(x,y) = 0.

(vi) If the equation 0= g(x,y) cannot be solved fory, we can solve it forx, i.e., we canwrite x = ϕ(y) and work with a similar manner as in the preceding steps.

This method does not cover the cases in which the equation 0= g(x,y) cannot be solved neitherfor x nor fory. If the presented method fails, we can find the constrained extrema by the methodof Lagrange multipliers. The description of this method canbe found in the literature.

Example 4.3.Find the local extrema of the functionf : z= (xy)2 with constraintg : x−ey = 0.Solution:We solve the constraint forx. We obtainx = ey and substitute forx into z. This gives

z(y) = (eyy)2 = y2e2y

andz is a function of the variabley. The derivative gives

z′(y) = (y2e2y)′

= 2ye2y +2y2e2y = 2e2yy(1+y)

where′ is a derivative with respect toy. The stationary points of the functionz(y) = y2e2y arey = 0 andy = −1. The intervals of monotonicity are sketched in the following scheme.

ր−1

MAXց0

min ր

Hencey = −1 is a local maximum andy = 0 a local minimum of the functionz= y2e2y. Wefind for each of these values the correspondingx-coordinate from the constraintx = ey. Hencethe functionf gains a constrained local maximum in the point[e−1,−1] and a constrained localmaximum[1,0].An alternative approach in this problem is to solve the constraint for y, i.e. y = lnx, and investi-gate the functionz(x) = x2 ln2x. The results must be (of course) the same.

5. IMPLICIT FUNCTIONS 67

5. Implicit functions

The following theorem gives an effective criterion which can be used to recognize whether anequation containingx andy defines correctlyy as a function of the variablex.

Theorem 5.1.Let f be a continuous function of two variables. Suppose thatthere exists a point(x0,y0) ∈ R

2 such that

f (x0,y0) = 0.

Suppose that f has continuous partial derivatives in a neighborhood of(x0,y0) and supposethat f′y(x0,y0) 6= 0. Then the relation

f (x,y) = 0 (5.1)

defines in some neighborhood of the point(x0,y0) exactly one continuous function y= ϕ(x)which is differentiable at x0 and

ϕ ′(x0) = − f ′x(x0,y0)

f ′y(x0,y0).

Definition. Relation (5.1) is said to be animplicit formof the functionϕ(x) from the precedingtheorem.

Note that from the point of view of the functions of two variables the implicit function is anequation for a level curve at the level zero.

Example 5.1.Consider the unit circle

x2 +y2−1 = 0. (5.2)

The function f (x,y) = x2 +y2−1 has partial derivativef ′y(x,y) = 2y and fory 6= 0 the relation(5.2) defines implicitly a function.In this simple example we can also find the explicit form of this function. Solving fory we obtain

y = ±√

1−x2,

where the sign depends on the sign ofy. The positive sign corresponds to the upper semi-circleand the negative sign to the lower semi-circle.

CHAPTER IV

Basic Numerical Methods

1. Algebraic equations – general theory

In this section we focus our attention to one of the most important classes of elementary functions— to polynomials. We will be concerned especially in the problem to find zeros of polynomials.

Definition (polynomial). Let n be a nonnegative integer and

Pn(x) = a0xn+a1xn−1 +a2xn−2 + · · ·+an−2x2 +an−1x+an (1.1)

be ann-degree polynomial with real coefficientsa0, a1, . . .an, wherea0 6= 0 (i.e., the highestpower really appears in the polynomial).

The coefficienta0 is called aleading coefficient of the polynomial Pn(x) and the coefficientan is anabsolute term of the polynomial Pn(x). The terma0xn is called aleading term of thepolynomial Pn(x).

Example 1.1.The polynomial

y = 2x6 +x5−3x+10

is a 6-degree polynomial with the leading term 2x6 and the absolute term 10.

Remark 1.1 (simplest polynomials). • A zero-degree polynomial is a constant function.• A first-degree polynomial is alinear function. The graph of the linear function is a line

(remember that to draw a line on the paper it is sufficient jointwo different points ofthis line).

• A two-degree polynomial is aquadratic polynomial. The graph of the quadratic poly-nomial is aparabola.

• A three-degree polynomial is acubic polynomial. The graph of the cubic polynomialis a cubic parabola.

Definition (algebraic equation). Under ann-degree algebraic equationwe understand theequation of the formPn(x) = 0, i.e.

a0xn +a1xn−1 +a2xn−2 + · · ·+an−2x2 +an−1x+an = 0 (1.2)

An algebraic equation is sometimes called also a polynomialequation.

Definition (zero of an algebraic equation). Under azero of equation(1.2) (or azero of polyno-mial (1.1)) we understand the numberc which satisfiesPn(c) = 0, i.e., which substituted forxconverts (1.2) into a valid relation.

A zero of an algebraic equation is sometimes called also aroot of this equation or simply asolution.

68

1. ALGEBRAIC EQUATIONS – GENERAL THEORY 69

Remember that at this stage we can solve linear and quadraticequations.

Example 1.2.The numbersx = 1 andx = −2 are zeros of the polynomial

P(x) = x3+2x2−x−2. (1.3)

Really, a quick evaluation showsP(1) = 0 andP(−2) = 0. The numberx = 3 is not a zero ofthis polynomial, sinceP(3) = 40.

The solvability (at least in the field of complex numbers) of every algebraic equation is guaran-teed by the following theorem.

Theorem 1.1(the basic theorem of algebra). Every algebraic equation has a zero in the set ofcomplex numbers.

Later we will see that the number of zeros is actually equal ton, provided we count these zerosin a special way – together with multiplicity.The following theorem presents an equivalent formulation of the definition of zero. It defines azero in terms of polynomial division.

Theorem 1.2(Bezout). The number c is a zero of polynomial(1.1) (of equation(1.2)) if and Eonly if the linear polynomial(x−c) is a factor of this polynomial, i.e., if and only if there existsan (n−1) degree polynomial Qn−1(x) with property

Pn(x) = (x−c)Qn−1(x). (1.4)

Definition (linear factor corresponding to the zero of algebraic equation). If c is a zero ofpolynomial (1.1) (of equation (1.2)), then the linear polynomial (x− c) with an independentvariablex is said to be alinear factor of the polynomial(1.1)corresponding to the zero x= c.

Example 1.3.Polynomial (1.3) can be written in each of the following equivalent forms

y = (x−1)(x2+3x+2), y = (x+2)(x2−1), y = (x−1)(x+1)(x+2).

The reader can check these relations by multiplying the parentheses.

Remark 1.2(division by the linear factor). Theorem 1.2 states that ifc is a zero of a polynomial,then we can divide this polynomial by the linear factor(x−c) without any remainder. This divi-sion can be performed by the Horner’s scheme (see below) or bya long division of polynomials.

Remark 1.3 (Horner’s scheme). The Horner’s scheme is a numerical method for easy• calculation of the value of the polynomial in given numberx = a,• division of a polynomial by a linear polynomial(x−a).

Consider ann-degree polynomial

Pn(x) = a0xn +a1xn−1 +a2xn−2 + · · ·+an−2x2+an−1x+an

and a real numbera. Let us write the following schemea0 a1 a2 . . . an

a b0 b1 b2 . . . bn

The coefficients from this scheme arise as follows.(i) In the first line the coefficients of the polynomialPn(x) are given.

(ii) We write the numbera on the begin of the second line.


(iii) We copy the leading coefficienta0 into the first column of the table. We will refer tothis coefficient as tob0.

(iv) We calculate the next coefficientb1 asa.b0 + a1, i.e. we multiply the last number inthe second line bya and add the number which is diagonally above this last numberinthe second row.

(v) We do the same with the next coefficient, i.e.b2 = a.b1+a2.(vi) We proceed in this way up tobn which is below the last coefficient of the original

polynomialPn(x).Now denote byQn−1(x) the(n−1)-degree polynomial with coefficientsb0, b1, . . . ,bn−1, i.e.

Qn−1 = b0xn−1 +b1xn−2 +b2nn−3+ · · ·++bn−2x+bn−1

The following statements hold.(i) The last number in the second line of the Horner’s scheme is the value of the polyno-

mial Pn(x) for x = a, i.e.

Pn(a) = bn.

(ii) The numberPn(a) is the remainder after a division ofPn(x) by the linear polynomial(x−a) and the quotient of these two polynomials is a polynomial with the coefficientsfrom the second line of this scheme, i.e.Pn(x)x−a

= Qn−1(x)+Pn(a)

x−a.

(iii) Multiplying the last relation by the polynomial(x−a) we obtain

Pn(x) = (x−a)Qn−1(x)+Pn(a). (1.5)

If bn = 0, thenx = a is a zero of the polynomialPn(x). In this casePn(a) = 0 and (1.5) gives afactorization of the polynomialPn(x) which is known to exist from Theorem 1.2.

Example 1.4.Divide the polynomial

P(x) = x6+3x5−12x4−38x3 +21x2 +99x+54

(i) by the linear polynomial(x−1)(ii) by the linear polynomial(x+1)

and establish the valuesP(1) andP(−1).Solution:Forx = 1 we have by the Horner’s scheme

1 3 −12 −38 21 99 541 1 4 −8 −46 −25 74 128

and hence

P(x) = (x5+4x4−8x3−46x2−25x+74)(x−1)+128,

orP(x)x−1

= x5 +4x4−8x3−46x2−25x+74+128x−1

.

In a similar way, forx = −1 we get1 3 −12 −38 21 99 54

−1 1 2 −14 −24 45 54 0


and hence

P(x) = (x5+2x4−14x3−24x2 +45x+54)(x+1). (1.6)

The valuesP(1) andP(−1) are the last numbers in the second row of Horner’s scheme, henceP(1) = 128 andP(−1) = 0. The numberx = −1 is a zero of the polynomialP(x). The factori-sation from Theorem 1.2 is (1.6).

Motivation. If the numberc is a zero of the polynomial (1.2), it is sometimes important to know,whether this number is also a zero of the polynomialQn−1(x) from Theorem 1.2 or not. It turnsout to be reasonable to distinguish these two cases. This motivates the following definition.

Definition (multiplicity of zero). Let c be a zero of the polynomial (1.1). The zeroc is said tobe of themultiplicity k if there exists(n−k)-degree polynomialQn−k(x) such that,

Pn(x) = (x−c)kQn−k(x) and Qn−k(c) 6= 0. (1.7)

Remark 1.4. A zero of multiplicity one is called a simple zero, a zero of multiplicity two iscalled a double zero.

Theorem 1.3.Let x= c be a zero of the polynomial Pn(x) and let Qn−k(x) be polynomial fromthe relation(1.7). Then the polynomials Pn(x) and Qn−k(x) have common zeros, includingmultiplicity of these zeros, with exception of the zero x= c.

Remark 1.5 (multiplicity of zero). In other words, the multiplicity of a zerox = c states, how Emany times we can divide given polynomial by the linear factor (x−c) without remainder. Whyis this property and the preceding theorem so important? Suppose that we have to solve theequationPn(x) = 0 and we have found one of the roots of this equation, sayx= c. We can dividethe left hand side of the equation by the linear factor(x−c) and we obtain another polynomialand another equation, which is of degree(n− 1). Having in mind that the smaller degree ofpolynomial means simpler equation, we consider this new equation in the sequel.If the zerox= c is of multiplicity k (which is at least two), we can continue to even more simplerequation: we can divide the left hand side of the original equation by the linear factor(x−c) notonly once, butk-times! This gives an(n−k)-degree equation which is simpler than the original(n−1)-degree equation.When solving an algebraic equation remember:Whenever you find any zero of the polynomialP(x) (algebraic equation P(x) = 0), establish the multiplicity of this zero and divide the poly-nomial by the linear factor corresponding to this zero as many times as possible. This processproduces, in the case of a zero of multiplicity k, an(n−k)-degree polynomial Qn−k(x) (algebraicequation Qn−k(x) = 0). This polynomial has the same zeros, including its multiplicity, as thepolynomial P(x) has, with exception of the root x= c. Hence, looking for the zeros of the poly-nomial Qn−k(x), we find, as byproduct, zeros of polynomial P(x). Since the polynomial Qn−k(x)is simpler than the polynomial P(x), we use this method whenever it is possible.

The definition of multiplicity has an equivalent formulation in terms of derivatives, as the fol-lowing theorem shows. This equivalent formulation also suggests a method, which can be usedto define the multiplicity of zero also for non-polynomial functions∗.

∗The concept of division and factorization cannot be extended from the class of polynomial to the class of moregeneral functions.

2. BASIC NUMERICAL METHODS FOR ALGEBRAIC EQUATIONS 72

Theorem 1.4(multiplicity of zeros and the derivatives). The number c is a zero of the polyno-mial (1.1) (equation(1.2)) of the multiplicity k if and only if

Pn(c) = P′n(c) = P′′

n (c) = · · · = P(k−1)n (c) = 0

and

P(k)n (c) 6= 0,

i.e. if and only if c is a zero of the polynomial and all its derivatives up to the order(k−1)(including(k−1)) and it is not a zero of the derivative of the order k.

Remark 1.6 (a geometric consequence). The zeros of a polynomial are thex-intercepts of thegraph of this polynomial. In a neighbourhood of this zero thepolynomial either changes its signor not. Which of these two possibilities actually occurs depends on the multiplicity of this zero.More precisely, the following holds:

• The polynomial changes the sign at the zero of anoddmultiplicity. If the multiplicityis at least 3, the zero is also a stationary point of the polynomial (tangent in this pointis horizontal) and also a point of inflection.

• The polynomial does not change the sign at the zero of anevenmultiplicity. Hencethere is a local extremum at this point; a local maximum, if the sign of the polynomialis negative in a neighbourhood of the zero and a local minimum, if the sign of thepolynomial is positive.

Theorem 1.5(the number of zeros of polynomial). Every n-degree polynomial has exactly nzeros in the field of complex numbers, counted with multiplicity.

Arrangement. The phrase “counted with multiplicity” means, that a doublezero is counted astwo zeros and the zero of multiplicityk is counted ask zeros.Working in the class of real functions, we are frequently interested in real zeros only. Modifica-tion of the preceding theorem for real zeros is the following.

Theorem 1.6(the number of real zeros). Every n-degree polynomial has in the field of realEnumbers either exactly n zeros or n− l zeros, where l is an even number. Each zero is in thisamount counted with multiplicity.

2. Basic numerical methods for algebraic equations

The problem to find all zeros of a given polynomial is not solvable in general. From this reasonit is sometimes is necessary to use approximation methods. Some of the methods available forworking with polynomials are presented in the following paragraphs. Our aim is

• to find the interval containing all zeros of the polynomial,• to find the system of intervals with the property that each interval contains exactly one

zero (separation of zeros),• to approximate zeros of the polynomial with error not higherthan a given number.

Theorem 2.1(estimate for zeros of algebraic equation). Let xi (for i = 1..n) be zeros (real orcomplex) of equation(1.2). The following estimate holds for all of these zeros

|xi | < 1+A|a0|

, (2.1)

where A= max{|ai|, i = 1..n}.


The following simple rule gives an estimate for the number ofpositive zeros of a polynomial.

Theorem 2.2(estimate for the number of real zeros, Descartes). Let p be the number of positivezeros of the polynomial(1.1) and s the number of the changes of the sign in the sequence ofits coefficients a0, a1, a2, . . ., an, (the coefficients which are equal to zero are not consideredinthis sequence). Then either p= s or p< s and in the latter case the number s− p is and evennumber.

Example 2.1. The polynomialP(x) = x8−x5 +x3 +x2−x+1 has either 4 or 2 or no positivezero. These zeros (if there are at least two) are in the interval (0,2).

Example 2.2.The polynomialP(x) = 2x6−x3+4x2+x−6 has either 3 or 1 positive zero in theinterval(0,4).

Remark 2.1 (number of negative zeros – a modification of Descarte’s rule). To estimate thenumber of negative zeros we use the simple fact that

P(x) = P(−(−x)).

This can be read as follows: ifx = c is a negative zero of the polynomialP(x), then the numberx = −c is a positive zero of the polynomialP(x) = P(−x). Hence to find negative zeros of thepolynomialP(x), suffices to find the positive zeros ofP(x). Practically, we work in the followingsteps.

• We write the polynomialP(x) := P(−x) substituting−x for x in the polynomialP(x).This gives a new polynomial which differs form the polynomial P(x) i the signs of thecoefficients at the odd powers ofx.

• We establish the number of the sign changes in the sequence ofthe coefficients in thepolynomialP(x).

• The number of negative zeros of the polynomialP(x) is equal to the numbers of thesign changes from the preceding step or less by an even number.

Example 2.3.For polynomialP(x) = 2x6−x3+4x2+x−6 we haveP(x) = P(−x) = 2x6+x3+

4x2−x−6 and the polynomialP(x) has one negative zero. This zero is in the interval(−4,0).

We will study the polynomials (algebraic equations) with integer coefficients first. In this case itis easy to find all its integer zeros (if exists any).

Theorem 2.3(necessary condition for polynomial with integer coefficients). Suppose that allcoefficients of a given polynomial(1.1) are integers. Suppose that c∈ Z is an integer zero ofthis polynomial. Then the absolute term must be divisible bythe number c, i.e. c|an.

Remark 2.2(practical). From Theorem 2.3 it follows that it is sufficient to look for integer zerosof a polynomial with integer coefficients in the set of all integers factors of the terma0. Animportance of this idea is in the fact that the set of all factors of a0 contains only a finite amountof number. We can test in a finite time each of these candidateswhether it is actually a zero andwhat is a multiplicity of this zero. If we find a zeroc of multiplicity k, we dividek-times thepolynomial with linear factor(x− c) and consider the obtained new polynomial in the sequel.(For an explanation see Remark 1.5.)

Example 2.4(integer roots and factorisation). Solve in the field of integers the equation

x5 +x4−5x3−9x2−24x−36= 0

and factorize the left-hand side as much as possible.


Solution: The absolute term in the polynomial on the left is−36 and hence the only integerswhich can be solution of this equation are the numbers which divide the number−36, i.e.±1,±2, ±3, ±4, ±6, ±9, ±12, ±18 and±36. We will use the Horner’s scheme to recognise thezeros and their multiplicity. Some comments follow this computation.

1 1 −5 −9 −24 −361 1 2 −3 −12 −36 −72

−1 1 0 −5 −4 −20 −162 1 3 1 −7 −38 6= 0

−2 1 −1 −3 −3 −18 || 0 1)

−2 1 −3 3 −9 || 0 2)

−2 1 −5 13 −35 3)

3 1 0 3 || 0 4)

−3 1 −3 12

Remarks:1) x = −2 is a zero (at least simple). We test this zero for a possible multiplicity. Moreover, in

the following we will consider the integer factors of the number 18 only.2) x = −2 is a zero of the multiplicity at least two.3) x = −2 is a zero of multiplicity two and not three or more. Remark that this computation was

in fact not necessary. Really, in the following it is sufficient to consider the integer factors ofthe number−9 only. The number−2 is not such a factor.

4) x = 3 is a zero of multiplicity at least one. We obtained the line with nonnegative coefficients.According to Theorem 2.2, there is no positive zero of the remaining equation. Hence in thefollowing we consider the negative factors of the number 3 only.

Results:The zeros arex1,2 = −2 andx3 = 3. The factorisation of the left-hand side is

(x+2)2(x−3)(x2 +3) = 0.

From here it is clear that the remaining zeros are not real numbers, i.e.x4,5 6∈ R (since equationx2 +3 = 0 has no solution in the field of real numbers).

Arrangement. The numberc is said to be a zero of polynomial (1.1)with an error less thanε ifit differs from the actual zero at most byε, i.e. if the actual zero is in the interval(c− ε,c+ ε).It is clear that if the interval[a,b] contains∗ a zero of the polynomialP(x), then the centrec =a+b

2of this interval is a zero of the polynomial with error less than the half of the length of this

interval, i.e.ε =b−a

2.

The method of bisection. Given a polynomialP(x) and real numbersa,b ∈ R, suppose thatE

P(a)P(b) < 0 holds. Consider the pointc =b+a

2and the valueP(c) of the polynomial in this

point. One of the relations

P(a)P(c) < 0 or P(c)P(b) < 0 or P(c) = 0

holds. Omitting the last possibility (in this casex= c is an exact zero of the polynomialP(x)), wesee that one of the intervals(a,c) and(b,c) contains a zero of the polynomialP(x) When seeking

∗This is guaranteed especially ifP(a) andP(b) have different signs – see Theorem 2.2, page 27.

3. TAYLOR POLYNOMIAL 75

a zero, we can focus our attention to the appropriate left half or right half of the interval(a,b).Hence the localisation of the zero is somewhat better: the length of the interval with change ofsign is one half to the original length and hence the accuracyis two times better. Following thisidea we can, after a finite number of steps, obtain the value ofthe zero with an arbitrary precision.

Example 2.5(method of bisection). Solve the equation

P(x) = x3+x−1 = 0

with error less than 0.03.Solution: All zeros are in the interval(−2,2) (see (2.1)). There is one positive zero and nonegative zero (Descart’s rule of the signs). Evaluation of the polynomial in the pointsx = 0 andx = 1 shows the change of the sign of the polynomial in the interval (0,1) and the zero is in thisinterval. By bisection of this interval we obtain the following table.

a c =a+b

2b P(a) P(c) P(b) ε =

b−a2

0 0.5 1 − −0.37 +0.5 0.75 1 − +0.17 +0.5 0.625 0.75 − −0.13 +0.625 0.6875 0.75 − +0.01 + 0.620.625 0.6563 0.6875 − −0.06 + 0.03120.6563 0.6719 0.6875 0.0156

The zero isx = 0.67±0.02.

3. Taylor polynomial

Motivation. Let f be a real function with the following properties.• The valuef (x0) is known.• We have no effective method to evaluate the function at the other points, different from

x0.• The value of the firstn derivatives of the functionf at the pointx0 is known.

We state the following problem:Find an n-degree polynomial which approximates the functionf in the neighbourhood of the point x0 with the smallest possible error.The solution of this problem is introduced in the following definition.

Definition (Taylor polynomial). Let n∈ N be a positive integer andf be a function defined atx0 which has derivatives up to the ordern at x0. The polynomial

Tn(x) = f (x0)+f ′(x0)

1!(x−x0)+

f ′′(x0)

2!(x−x0)

2+ · · ·+ f (n)(x0

n!(x−x0)

n

is called ann-degree Taylor polynomial of the function f at x0.

The pointx0 is called acentreof this polynomial.

4. LAGRANGE INTERPOLATION FORMULA 76

Theorem 3.1(Taylor). Let f be a function defined in a neighborhood N(x0) of the point x0. Letf ′(x0), f ′′(x0), . . . , f(n)(x0) be the values of the first n derivatives of the function f at thepointx0. Suppose that the(n+1)st derivative is continuous in N(x0). Then for all x∈ N(x0)

f (x) = Tn(x)+Rn+1(x),

holds, where Tn(x) is the n-degree Taylor polynomial of the function f in the point x0 andRn+1(x) a remainder. The remainder can be written in the form

Rn+1(x) =f (n+1)(c)(n+1)!

(x−x0)n+1, (3.1)

where c is some number between x and x0.

Remark 3.1 (polynomial approximation). From the formula (3.1) it follows that the remainderEis small if

• (x−x0) is small, i.e.,x is close tox0,• n! is large, i.e.,n is large,• | f (n+1)(x)| is numerically small in the neighborhood ofx0

If these conditions are satisfied, we can write

f (x) ≈ Tn(x) (3.2)

in the neighborhood ofx0 and the error is small. The formula (3.2) presents the solution of theproblem stated in the motivation for this section — it is the best polynomial approximation ofthe functionf in a neighborhood of the pointx0.

4. Lagrange interpolation formula

Motivation. An n-degree polynomial is given uniquely by(n+1) independent conditions.∗ Thevalues of the polynomial at the given(n+1) mutually different points can play the role of suchconditions.It can be proved: Given a set of(n+ 1) pairs [xi ,yi ] of the real numbers, there exists a uniquen-degree polynomialP(x) which satisfiesP(xi) = yi for all xi .The main problem of this section is to find an analytic formulafor this polynomial. There areseveral possibilities how to solve this peroblem. One of these possibilities is introduced in thefollowing Theorem.

Theorem 4.1(Lagrange). Let us consider the set of(n+ 1) ordered pairs[xi,yi ], i ∈ N0, 0≤i ≤ n. Polynomial

L(x) = y0l0(x)+y1l1(x)+ · · ·+ynln(x) (4.1)

where

l i(x) =(x−x0)(x−x1) · · ·(x−xi−1)(x−xi+1) · · ·(x−xn)

(xi −x0)(xi −x1) · · ·(xi −xi−1)(xi −xi+1) · · ·(xi −xn)

satisfies L(xi) = yi for all i ∈ N0, 0≤ i ≤ n.

∗This follows from the fact that this polynomial contains(n+1) coefficients. As a particular case, two differentpoints define a line.

5. LEAST SQUARES METHOD 77

Definition (Lagrange polynomial). The polynomialL(x) from Theorem 4.1 is called aLa-grange polynomial.

Polynomialsl i(x) from Theorem 4.1 are calledsmall Lagrange polynomials.

Formula (4.1) is called aLagrange interpolation formula.

Example 4.1.Find a 3-degree polynomialP(x) which satisfiesP(1) = 3,P(2) =−2, P(−1) = 0andP(0) = 1.Solution:We write the values into the following table.

i 0 1 2 3xi 1 2 −1 0yi 3 −2 0 1

By the Lagrange interpolation formula (4.1) we have

P(x) = 3l0(x)+(−2)l1(x)+0l2(x)+1l3(x) = 3l0(x)−2l1(x)+ l3(x).

The small Lagrange polynomials are

l0(x) =(x−2)(x+1)x(1−2)(1+1)1

=x3−x2−2x

−2

= −12(x3−x2−2x),

l1(x) =(x−1)(x+1)x(2−1)(2+1)2

=x3−1

6

=16(x3−x),

l3(x) =(x−1)(x−2)(x+1)

(0−1)(0−2)(0+1)=

(x2−1)(x−2)

2

=12(x3−2x2−x+2).

Note that it is not necessary to write the polynomiall2(x), sincey2 = 0 andl2(x) is multiplied bya zero. The Lagrange interpolation formula gives

P(x) = −32(x3−x2−2x)−2

16(x3−x)+

12(x3−2x2−x−2)

= x3(

−32− 1

3+

12

)

+x2(

32−1

)

+x

(

3+13− 1

2

)

+1

= −43

x3 +12

x2+176

x+1.

It is easy to check that this polynomial has the desired properties.

5. Least squares method

Motivation. Let us consider a set ofn ordered pairs[xi ,yi] (i = 1..n). These pairs can be viewedas coordinates of the points in the plane. Our problem is to find an analytic formula for the liney = ax+ b which is the best linear approximation of the data[xi ,yi ], i.e. a line which fits thisdata file as accurately as possible. From a geometric point ofview, the graph of this line passes


close to the data points. Under the best linear approximation we understand the fact that thedifferences ofyi andaxi + b have to be as small as possible. Mathematically formulated,thecriterion of optimality is that the sum of the squares of the differences[yi − (axi +b)] is as smallas possible, i.e.

S(a,b) =n

∑i=1

[yi − (axi +b)]2 → local minimum.

Note the difference between the formulation of this problemand the problem solved by La-grange polynomial. In the problem of interpolation, solvedby Lagrange formula, the resultingpolynomial has to go through all the points.When fitting a data file by a line, the resulting function is simple (linear polynomial is one of thesimplest polynomials at all) but the graph doesnot pass through the data points. This approachcan be used especially if the data are obtained as results of some experiment or as results ofsome statistic method and the values[xi,yi ] are themselves inaccurately specified. In this casethe analytical formula

• can be used as a mathematical model for the data points,• minimizes some of the experimental or statistical errors.

s1

s2

s3s4

s5

x1 x2 x3 x4 x5

S(a,b) = (s21+s2

2 +s23+s2

4+s25) → minimum

FIGURE 1. Least squares method

Theorem 5.1 (least squares method). The line y= ax+ b is the best linear fit of the points[x1,y1], [x2,y2], . . . , [xn,yn] if and only if the coefficients a, b satisfy

a∑x2i +b∑xi = ∑xiyi (5.1)

a∑xi +bn= ∑yi

The sums in these formula are evaluated from i= 1 to i = n.

Remark 5.1 (practical). Given a set of points, we write the system (5.1) and solve thissystemwith respect toa andb. This gives the slopea and they-interceptb of the approximating linearfunctiony = ax+b.

Example 5.1.Find the best linear fit for the data file.xi 0 1 3 5 6yi 5 3 3 2 1

Solution:The points in the data file are[0,5], [1,3], [3,3], [5,2] and[6,1]. There are 5 points andhencen = 5. We do the computations necessary to find the coefficients from the system (5.1) inthe following table.


i xi yi x2i xiyi

1 0 5 0 02 1 3 1 33 3 3 9 94 5 2 25 105 6 1 36 6

∑ 15 14 71 28

By (5.1) we can write the linear system

71a+15b = 28,

15a+5b = 14.

The solution of this system isa = − 713

.= −0.538 andb =

28765

.= 4.415. The best linear fit for

the data file is

y = −0.538x+4.415.

The graph of this data file and the linear fit are on the Figure 2.

y = ax + b

1 2 3 4 5 6 x

1

2

3

4

5

y

FIGURE 2. The best linear fit

Remark 5.2(nonlinear data fit). In a similar way any curve of the typey= a f(x)+bg(x) can beused to fit the data file. Approach in this case is almost the same as for the linear data fit, only thesystem (5.1) is somewhat different. (This system can be derived using partial derivatives and thecalculus of the functions of two variables. We look for the valuesa andb for which the functionS(a,b) has a local minimum.)Several nonlinear functions can be simplified into linear functions by an algebraic modificationand after a suitable substitution. Some of the important examples are shown in Table 1.In a similar way even more formulas can be used for a data fit. For example, we can fit data bythe full quadratic function

y = ax2 +bx+c.

The mathematical description of this process is almost the same as fitting a line. The functionSis a function of three variablesS= S(a,b,c) and the calculus of the functions of three variablesis used. The conditions for a local extremum yield three linear equations which can be solved fora, b andc. For details consult the literature.

Example 5.2.Fit the data file by an exponential function.


To fit data files by . . . substitute . . . and use a linear fit . . . .

y =a

x+b X =

1x

y = aX+b

y = ax2+b, x > 0 X = x2 y = aX+b

y =√

ax+b Y = y2 Y = ax+b

y = beax, y,b > 0 Y = lny, B = lnb Y = ax+B

y = bxa, x,y,b > 0 Y = lny, B = lnb, X = lnx Y = aX+B

y = ax+b

x, x > 0 X = x2, Y = xy Y = aX+b

TABLE 1. Nonlinear fits

xi 0 1 3 5 6yi 45 20 5 2 1

Solution: The data file presents a rapidly decreasing relationship betweenx andy. The graphof this data file shows that the linear fitting will not producea good result. From this reason itseems to be better use the exponential functiony = beax as a mathematical model for this datafile. We start with the exponential function

y = beax

and apply a logarithmic function to both sides of this relation. This gives

lny = ln(beax) .

Simplifications on the right give

lny = lnb+ lneax

lny = lnb+ax.

SubstitutingY = lny andB = lnb we obtain a linear relationship

Y = ax+B,

as Table 1 shows. We will fit the data file[xi,Yi ] by the linear functionY = ax+B. We summarisethe necessary computations in the following table.

i xi yi Yi = lnyi x2i xiYi

1 0 45 3.807 0 02 1 20 2.996 1 2.9963 3 5 1.609 9 4.8284 5 2 0.693 25 3.4665 6 1 0 36 0

∑ 15 73 9.105 71 11.290

The system (5.1) takes the form

71a+15B = 11.290

15a+5B = 9.105


with solutionsa = −0.616 andB = 3.670. Hence lnb = B = 3.670 and employing an inversefunction to the logarithm we have

b = eB = e3.670 .= 39.253.

The best exponential fit is

y = 39.253·e−0.616x.

The graph of this exponential fit and the data file are shown on the Figure 3. On this figureyou can also see the best linear fit, as a dotted line. It is clear that this linear fit is much worsemathematical model for the data file, comparing with the exponential fit.

y = beax

1 2 3 4 5 6 x

10

20

30

40

50

y

FIGURE 3. Exponential data fit

CHAPTER V

Linear Algebra

The main object of our interest is to develop a set of tolls which allow to study systems of linearequations. Among others, we develop techniques which allow

• to describe the systems which possesse a solution,• to find the solution.

We start with extending the addition and multiplication of real numbers ton-tuples of real num-bers.

1. Algebraic linear space

Definition (algebraic linear space). The setRn of ordered n-tuples of real numbers(a1,a2, . . . ,an) with the operations addition of vectors and multiplicationof a vector by a realnumber defined for everyc∈ R and(a1,a2, . . . ,an),(b1,b2, . . . ,bn) ∈ R

n by the relations

(a1,a2, . . . ,an)+(b1,b2, . . . ,bn) = (a1+b1,a2+b2, . . . ,an+bn) (1.1)

c· (a1,a2, . . . ,an) = (c·a1,c·a2, . . . ,c·an) (1.2)

is called analgebraic linear space, or analgebraic vector space, shortly avector space.

Definition (vectors). Consider the vector spaceRn.

Elements of this space are called (algebraic) vectors. The fact that a variable is a vector will bedenoted by an arrow symbol over this variable:~a.

The numbersa1, . . . ,an are calledcomponents of the vector(a1,a2, . . . ,an).

The numbern is called adimensionof the spaceRn. A vector fromRn is called ann-vector.

Remark 1.1. According to the definition, the operations addition of vectors and multiplicationof a real number and a vector are defined as an addition or multiplication of the correspondingcomponents. From this reason these operations preserve their basic properties known from thealgebra of real numbers. Among others, the addition of vectors is associative, commutative anddistributive with respect to the multiplication by a real number.

Example 1.1(vector operations). Vectors~a = (1,2,3,4) and~b = (−2,3,1,0) are vectors fromthe spaceR4. Vector~c = (0,1,3,4,−1) is an element of the spaceR5. The following relationshold: 5~c = (0,5,15,20,−5) and~a+~b = (−1,5,4,4). The sum~a+~c is not defined.

Remark 1.2 (zero vector). The vector~o := (0,0, . . . ,0) is called azero vector. From the defini-tion of vector operations it follows thatt~o=~o,~o+~u =~u and 0~u=~o for an arbitrary vector~u andan arbitrary real numbert. Hence the zero vector has the same role for of addition of vectors likethe number zero for addition of real numbers.

82

1. ALGEBRAIC LINEAR SPACE 83

Remark 1.3(column vector). The components of the vector can be also rearranged into columns.In such a case we speak about acolumn vector, e.g.,

~v =

12−4

is a 3-dimensional column vector (or shortly, a column 3-vector).

Definition (linear combination). Let ~u1, ~u2, . . . ,~uk be a finite sequence of vectors fromRn.The vector~u which satisfies

~u = t1~u1+ t2~u2+ · · ·+ tk~uk, (1.3)

for some real numberst1, t2, . . . , tk is said to be alinear combination of vectors~u1, ~u2, . . . ,~uk.The numberst1, t2, . . . ,tk are said to becoefficients of this linear combination.

Remark 1.4. When speaking about the linear combination of vectors we suppose that all thesevectors are of the same dimension. In this case the right-hand side of equality (1.3) is well-defined.

Example 1.2(linear combination). Let~a= (1,4,2),~b= (−2,0,1) and~c= (1,−2,1). The vector

~d = 2~a+~b−3~c = (−3,14,2)

is a linear combination of vectors~a,~ba~c. (Clearly there exist infinitely many linear combinationsof these vectors, since the coefficients of the linear combination can take any real value.)For an evaluation of the linear combinations it is usually better to use the column vectors, e.g.

2

123

−3

10

−1

+2

45

−2

=

7145

.

In this notation we perform the operations separately alongthe “rows”.

Remark 1.5 (trivial linear combination). Suppose that all coefficients of a given linear combi-nation equal zero. Such a linear combination is called atrivial linear combination. In this casethe right-hand side of (1.3) gives the zero vector. Hence thezero vector can be always written asa linear combination of given vectors. Now we state an important question:

Is the trivial linear combination the unique linear combination with this prop-erty? This means: Given a set U of vectors, is there a possibility how toobtain the zero vector as a nontrivial linear combination ofvectors from U?

The answer is: For some vectorsyesand for someno. It turns out to be important to distinguishthese cases. This is a motivation for the following definition.

Definition (linear (in-)dependence of vectors). Vectors~u1, ~u2, . . . ,~uk are said to belinearlydependentiff there exists at least one nontrivial linear combinationof all these vectors whichyields the zero vector. More precisely, the vectors are linearly dependent if there exist realnumberst1, t2, . . . ,tk such that at least one of these numbers is nonzero and

~o = t1~u1+ t2~u2+ · · ·+ tk~uk (1.4)

holds. The vectors are said to belinearly independentif they are not linearly dependent.

2. MATRIX 84

Remark 1.6 (verbal formulation of the preceding definition). From the preceding definition itfollows: The vectors are linearly dependent if there are at least two different possibilities how towrite the zero vector as a linear combination of the given vectors – one of these possibilities isthe trivial linear combination, the second one is different.

Remark 1.7 (test for linear (in-)dependence). In some special cases it is very easy to recognizeEwhether vectors from a given set are linearly dependent or independent. More precisely, thefollowing statements are true:

• If the given set of vectors contains a vector which is a constant multiple of an anothervector from this set, then the vectors are linearly dependent.

• Two vectorsare linearly dependent if and only if one of these vectors is aconstantmultiple of the other one.

• If the given set of vectors contains more thann vectors (wheren is the dimension ofthe vector space), then the vectors are linearly dependent.

The test for linear dependence or independence is not so easyin the cases not covered by thepreceding rules. In a general situation we test the linear (in-)dependence via the rank of thematrix constituted from these vectors, as will be explainedlater (page 87).

Example 1.3(application of the preceding remark). Using Remark 1.7, we can easily resolvethe problem of linear dependence of the following vectors.

• Vectors~a = (1,2,5),~b = (0,1,0) and~c = (−2,−4,−10) are linearly dependent, since~c = −2~a.

• Vectors~d = (1,3) and~e= (−1,3) are linearly independent.• Vectors~t = (1,0,1),~u = (1,2,3),~v= (1,−2,3), ~w= (2,−1,1) are linearly dependent,

since number of the vectors (four) is greater than the dimension (three).• The problem of a linear (in-)dependence of the vectors~i = (1,2,3,1), ~j = (2,1,0,1)

and~k = (1,1,−3,2) is not so easy — no statement from Remark 1.7 applies. We putaway this problem for future, the concept of matrices and rank of a matrix can resolvethis problem, see Remark 3.1, page 88 below.

2. Matrix

Definition (matrix). A rectangular array

A =

a11 a12 a13 · · · a1na21 a22 a23 · · · a2n

......

. . ....

am1 am2 · · · · · · amn

whereai j ∈ R for i = 1..mand j = 1..n is called anm×n matrixor shortly amatrix.

The set of allm×n matrices will be denoted byRm×n. Shortly we writeA = (ai j )m

i=1n

j=1 orA = (ai j ).

An m×n matrix is called asquare matrixif m= n and arectangular matrixotherwise.

The elementsaii are called elements of themain diagonal.

2. MATRIX 85

Definition (transposed matrix). Let A = (ai j ) ∈ Rm×n be anm×n matrix. Then×m matrix

AT which is obtained from the matrixA by interchanging the rows and the columns is called atranspose matrix to the matrix A, i.e.,AT ∈ R

n×m and

AT = (a ji),

whereai j are the elements of the matrixA.

Example 2.1(transposed matrix). For

A =

1 −2 30 1 32 1 90 1 −2

we haveAT =

1 0 2 0−2 1 1 1

3 3 9 −2

.

Remark 2.1 (row and column operations). The rows of anm×n matrix can be viewed as anm-tuple ofn-vectors. Hence it is meaningful to speak about the linear combination of rows of agiven matrix, about the linear dependence or independence of the rows and so on. In the samesense we can speak about the row operations in the matrix, e.g., addition or subtraction of tworows or the multiplication of a row by a real number. Similarly, the matrix can be viewed as ann-tuple of columnm-vectors.

Definition (basic matrix operations). Let A = (ai j ), B = (bi j ) bem×n matrices. Under asumof the matrices A and Bwe understand them×n matrixC = (ci j ) with entriesci j = ai j +bi j forall i, j. We writeC = A+B.

Let A = (ai j ) be anm×n matrix andt ∈ R be a real number. Under aproduct of the number tand the matrix Awe understand them×n matrix D = (di j ) with entriesdi j = t.ai j for all i, j.We writeD = tA.

Remark 2.2 (matrix operations). As the definition states, the addition of matrices is defined asan addition of the corresponding components of the matrices. Since these components are realnumbers, the addition of the matrices preserves the basic properties of the operation “addition ofreal numbers”. Really, the addition of matrices is associative, commutative and distributive withrespect to the multiplication by real number.

Example 2.2. For A =

2 −1 23 1 −22 0 1

, B =

1 −2 10 1 32 4 1

andC =

2 4−1 2

3 1

we have

A+B =

3 −3 33 2 14 4 2

and the sumA+C is undefined.

2. MATRIX 86

Definition (matrix multiplication). Let A = (ai j ) be anm×n matrix andB = (bi j ) be ann× pmatrix. Under theproduct of the matrices A and B (in this order!)we understand them× pmatrixG = (gi j ) defined

gi j = ai1b1 j +ai2b2 j + · · ·+ainbn j

for everyi = 1..m, j = 1..p. In other words, a scalar product of the vector from thei-th row ofthe matrixA and the j-th column of the matrixB is at the positioni j in the matrixG is. Wewrite G = AB (in this order!).

Example 2.3.For the matrices from Example 2.2 we have

AC=

2 −1 23 1 −22 0 1

·

2 4−1 2

3 1

=

11 8−1 12

7 9

and the productCA is undefined. Hence the commutative lawAC= CA is broken for matrices!

The last computation shows that the matrix multiplication is not a commutative operation. Itis a natural question, whether it is fruitful to use such a “strange” operation, and whether thisoperation is applicable to solve real world problems. The following remark shows that the matrixmultiplication is closely related to the linear combination of rows or columns.

Remark 2.3 (relationship between the matrix multiplication and the linear combinations of thevectors). Consider the matrix product

A ·B :=

1 2 32 1 0

−1 3 2

·

2 1 0−1 2 1

3 2 4

=

9 11 143 4 11 9 11

=: C

Further consider the vectors formed by the columns of the matrix A and a linear combinationof these vectors with coefficients from the first column of thematrix B, i.e., consider the linearcombination

2

12

−1

−1

213

+3

302

.

An evaluation of this linear combination (do it yourself!) yields the first column of the matrixC. If the coefficients of the linear combination were the numbers from the second column of thematrixB, the resulting vector would be the second column of the matrix C, similarly also for thethird column. Hence this matrix multiplication can be considered as three linear combinationswritten in one equality.Similarly, the productA.B can be considered as a linear combinations of the rows of the matrixB, where the coefficients of the linear combinations are in therows of the matrixA (try this onthe shown example!).

In general, it is hard to study an operation for which either associative law or distributive law isbroken. Fortunately, both associative and distributive law hold for matrix multiplication. Whenworking with distributive law we have to be aware of the fact that the multiplication is not com-mutative and we must distinguish multiplication of parenthesis from the left and from the right.

3. RANK OF A MATRIX 87

Theorem 2.1(properties of the matrix multiplication). The matrix multiplication is associativeand distributive from both left and right, i.e., the following relations hold

A(BC) = (AB)C (the associative law)

A(B+C) = AB+AC (the left distributive law)

(B+C)A = BA+CA (the right distributive law)

whenever these operations have sense.

Another important problem when defining a new operation is the following:Is there a neutral element∗ with respect to this operation?

We reveal that there is a neutral element to the matrix multiplication – the identity matrix, whichis presented in the following definition and Theorem 2.2.

Definition (identity matrix). Under ann×n identity matrixwe understand then×n matrixwith the numbers 1 in the main diagonal and the numbers 0 outside this diagonal. Then×nidentity matrix is denoted byIn.

Example 2.4.The 3×3 identity matrix is the matrix

I3 =

1 0 00 1 00 0 1

.

Remark 2.4. In the following we omit the indexn from the identity matrixIn. This cannotproduce any misunderstanding since for a givenm×n matrix A the productAI is defined iffI isn×n identity matrix and the productIA is defined iffI is m×m matrix.

Theorem 2.2(properties of identity matrix). Let A be a matrix and I the identity matrix. ThenIA = A and AI= A whenever this product is defined.

3. Rank of a matrix

Motivation. An m×n matrix consists fromm·n real numbers and it is a rather complicated ob-ject. In this section we assign to each matrix a nonnegative integer, which includes an importantinformation about this matrix. Using this concept, we can completely resolve the problem oflinear dependence and independence of vectors†.

Definition (rank of a matrix). Let A be a matrix. Under therank of the matrix Awe understandthe maximal number of the linearly independent rows of the matrix A. The rank of the matrixAwill be denoted by rank(A).

∗A neutral element with respect to the multiplication is an elemente which satisfiese· a = a = a · e for alla. Thus, a neutral element whit respect to multiplication of real numbers is the number 1, a neutral element withrespect to addition of the real numbers is the number 0. A neutral element with respect to vector or matrix additionis a vector or matrix consisting from zeros only.

†Recall that Remark 1.7 on the page 84 does not cover all situations and can be used in a very special casesonly.


Remark 3.1(linear (in–)dependence of vectors). Let A be anm×n matrix. From the definition it Efollows that the rows of the matrixA are linearly independentn-vectors if and only if rank(A) =m. Below we present a procedure for an effective evaluation ofthe rank. Knowing the rank ofthe matrix, we can resolve the problem concerning a linear dependence or independence of therows (and columns, as follows from Theorem 3.4 on the page 89 below) of the matrix.

Definition (pivot, row echelon form). Let A be anm×n matrix. The first nonzero element ofeach row of the matrixA is said to be apivot of this row. The matrixA is said to be in therowechelon formif

• all zero rows (if exists any) are at the bottom of the matrix,• if two successive rows are non-zero, then the second row starts with more zeros than

the first one, i.e. the pivot of each row appears after the pivot of the preceding row.

Theorem 3.1 (rank of a matrix in the row echelon form). The rank of a matrix in the rowechelon form equals to the number of the nonzero rows of this matrix.

Example 3.1. The matrixA =

2 2 2 3 −1 50 0 1 0 0 30 0 0 −1 2 10 0 0 0 0 0

is in the row echelon form and

rank(A) = 3. The matrixB =

2 2 2 3 −1 50 0 1 0 0 30 0 3 −1 2 1

is not in the row echelon form.

Theorem 3.2(row operations preserving the rank). The following row operations preserve therank of matrices.

(i) Omitting a row which satisfies one of the following condition:• it contains only zeros, or• it equals to another row, or• it equals to a constant multiple of another row.

(ii) Multiplying any row by a nonzero real number.(iii) Interchanging the order of the rows in an arbitrary way.(iv) Keeping one row without any change and adding arbitrary multiples of this row to

arbitrary nonzero multiples of another rows.

Remark 3.2(practical). To find the rank of a matrixA, we look for another matrix which has thesame rank and is in the row echelon form. In this process we usethe operations from Theorem3.2. The fact that two matricesA andB have the same rank will be denoted asA∼ B. A naturalquestion is:

Is it allways possible to convert the matrix A via the operations from Theorem3.2 into a row echelon form?

More precisely, we are interested in the problem, whether there exists a finite sequence of matri-cesA1, . . . ,Ak with the following properties:

• A∼ A1 ∼ A2 ∼ ·· · ∼ Ak ∼ B• We can obtain each matrix in this sequence by a convenient application of the opera-

tions from Theorem 3.2 to the preceding matrix in this sequence.


• The matrixB is in the row echelon form.A positive answer to this question is contained in the following theorem.

Theorem 3.3. Any matrix can be after application of a finite number of row operations fromTheorem 3.2 transformed into its row echelon form.

Arrangement: The process of conversion a matrix by row operations preserving rank into itsrow-echelon form will be referred asrow-reduction.

Example 3.2.Row-reduce the matrixA =

2 9 5 6 −21 3 −1 2 13 5 −11 4 9

−2 −2 12 −1 −7

and find rank(A).

Solution:We use the elementa21 as a pivot.

A =

2 9 5 6 −21 3 −1 2 13 5 −11 4 9

−2 −2 12 −1 −7

We write the pivot row on the top and pivot on that element in the first column. We obtain

A∼

1 3 −1 2 10 −4 −8 −2 60 3 7 2 −40 4 10 3 −5

Now, we use an elementa22 as a pivot and do some row operations with the second row whichin result will decrease the numbers in the matrix. More precisely, we add the second row to thethird and fourth rows. We obtain the matrix witha32 = −1 and we use this element for pivotingin the second column. We divide the second row by the number 2 before pivoting.

A∼

1 3 −1 2 10 −4 −8 −2 60 −1 −1 0 20 0 2 1 1

∼

1 3 −1 2 10 −2 −4 −1 30 −1 −1 0 20 0 2 1 1

∼

1 3 −1 2 10 −1 −1 0 20 0 −2 −1 −10 0 2 1 1

The last row is a constant multiple of the preceding one and can be omitted. The matrixA isequivalent to the following row echelon matrix

A∼

1 3 −1 2 10 −1 −1 0 20 0 −2 −1 −1

.

Hence rank(A) = 3. The rows of the matrix are linearly dependent 5-vectors. There are threerows (however we do not know which ones) which are linearly independent.

The following theorem has an important consequence: all facts which concern the rank of thematrix and which are valid for rows of the matrix can be reformulated also for columns. This isa consequence of the following theorem.

Theorem 3.4.The rank of a matrix and the rank of its transpose are the same.

4. INVERSE MATRIX 90

4. Inverse matrix

Motivation: Another important problem associated with any operation possessing a neutral ele-ment is the existence or nonexistence of the inverse element∗. Let us formulate this problem formatrix multiplication of square matrices.

Given a square matrix A, find the square matrix B (if exists any) such thatboth products A·B and B·A equal to the identity matrix I.

First of all, we introduce simply a name for a solution of sucha problem.

Definition (invertible matrix, inverse matrix). Let A be ann×n square matrix. If there existsann×n matrixA−1 which satisfies the relations

A−1A = I = AA−1, (4.1)

then the matrixA is said to beinvertible. The matrixA−1 is said to be theinverse matrixto A.

Remark 4.1 (existence of an inverse matrix). Not every matrix is invertible. We will show(Theorem 7.1, page 106 below) that there exists a simple characterization of invertible matricesin terms of the rank of the matrix.

Hereinafter we present an efficient method for evaluation ofthe inverse matrix. The procedure isvery similar to the row-reduction, explained on the previous pages. First of all, let us introducean extension of the row echelon form.

Definition (reduced row echelon form). An m×n matrix A is said to be in thereduced rowechelon formif

• A is in the row echelon form,• pivots in all rows equal 1,• each of the pivots is the only nonzero number in its column.

Algorithm 4.1 (calculation of the inverse matrix). Given a squaren×n matrix A, the inverse EmatrixA−1 can be calculated in the following steps.

(i) We write the matrixA and then×n identity matrixIn together.(ii) We convert the matrixA into its reduced row-echelon form by row operations from

Theorem 3.2.(iii) We distinguish two mutually different cases.

• If the reduced row echelon form of the matrixA is not then×n identity matrix,thenA is not invertible.

• If the reduced row echelon form is then×n identity matrix, then the application ofall of the steps which convertA into its reduced row echelon form onto the identitymatrix yields the inverseA−1.

(iv) Remark that wecannotuse any of the column operations.

∗The inverse element with respect to multiplication of real numbers to the number 2 is12

, since 212

=12

2 = 1

and the number 1 is a neutral element with respect to the multiplication of real numbers. The number 0 possessesno inverse, since there is no realx which satisfies the relation 0x = 1.

The inverse element with respect to the addition to the number 3 is the number−3, since 3+(−3) = −3+3 = 0and the number 0 is a neutral element with respect to the addition of real numbers.

4. INVERSE MATRIX 91

Example 4.1.Given a matrixA =

5 −1 52 0 2

−3 −3 1

, find A−1.

Solution: We will start with the matrixA and the identity matrix. We will do all row operationson both matrices simultaneously, i.e., we will work with thematrix (A|I) as with a 3×6 matrix.

5 −1 5 1 0 02 0 2 0 1 0−3 −3 1 0 0 1

Usually it is better to use the numbers±1 as pivots. From this reason we “build up” the number1 in the first column. Particularly, we add the 2nd row to the 3rd row. After this step we havea31 = 1 and we use this element as a pivot, i.e., we place the row withthe pivot on the top and“clean” the 1st column.

5 −1 5 1 0 02 0 2 0 1 0−1 −3 3 0 1 1

∼

−1 −3 3 0 1 10 −16 20 1 5 50 −6 8 0 3 2

Further, we use the elementa32 to decrease the numbers in the matrix. More precisely, wemultiply the third row by−3 and add it to the second row.

−1 −3 3 0 1 10 2 −4 1 −4 −10 −6 8 0 3 2

Now we pivot on the elementa22 = 2. We multiply the second row by(−3) and the first row by(−2) and add. This yieldsa12 = 0. Similarly we obtaina32 = 0.

2 0 6 −3 1 10 2 −4 1 −4 −10 0 −4 3 −9 −1

Pivoting on the elementa33 = 0 gives the diagonal matrix.

4 0 0 3 −7 −10 2 0 −2 5 00 0 4 −3 9 1

Now we divide each row by its pivot and obtain the identity matrix on the left. This produces theinverse matrixA−1 on the right.

1 0 0 3/4 −7/4 −1/40 1 0 −1 5/2 00 0 1 −3/4 9/4 1/4

If a constant multiple of the identity matrix is on the left, then the corresponding multiple ofinverse matrix is on the right. Particularly, if the matrix 4I is on the left, then the matrix 4A−1 ison the right.

4 0 0 3 −7 −10 4 0 −4 10 00 0 4 −3 9 1

5. THE DETERMINANT 92

The inverse isA−1 =

3/4 −7/4 −1/4−1 5/2 0

−3/4 9/4 1/4

=14

3 −7 −1−4 10 0−3 9 1

5. The determinant

Definition (determinant). Let A be ann×n square matrix. Under adeterminant of the matrix Awe understand the real number detA which is assigned to the matrix by the following three–steprecursive algorithm.

(i) If the matrixA is 1×1 matrix, i.e. ifA = (a11), then detA = a11.(ii) Suppose that the determinant of(n−1)× (n−1) matrix is defined. Denote byMi j

the determinant of the(n−1)× (n−1) matrix which has arisen from the matrixA byomitting thei-th row and thej-th column. We definecofactor Ai j of the elementai j

in the matrixA as the productAi j = (−1)i+ jMi j .(iii) Finally, we define the determinant of the matrixA by the relation

detA = ai1Ai1+ai2Ai2+ · · ·+ainAin (5.1)

wherei ∈ {1,2, . . .n} is the index of arbitrary row.

Remark 5.1 (notation). Determinant of the matrixA is denoted also by|A|. If A = (ai j ), wewrite also|ai j | instead of|(ai j )|.Remark 5.2 (Is the definition correct?). The formula (5.1) is called anexpansion of the deter-minant along the i-th row. This formula allows to write the determinant of then×n matrix interms ofn determinants of(n−1)× (n−1) matrices. Each of these determinants can be writtenin terms of determinants of the(n−2)×(n−2) matrices and so on. We end after a finite numberof steps when we obtain determinants of 1×1 matrices. It should be noted that the indexi in theexpansion can be chosen arbitrary. The proof of this fact canbe found in the literature under thenameLaplace theorem. In this sense the expansion (5.1) is called theLaplace expansion of thedeterminant along the i-th row.

Many of the most important properties of matrices depend on the fact whether the determinant ofthe matrix equals zero or not. It is fruitful to distinguish these cases by the following definition.

Definition (regular and singular matrix). Let A be a square matrix. The matrixA is said to besingular if detA = 0 and it is said to beregular in the opposite case.

Remark 5.3 (determinants of the simplest matrices). For 2×2 matrix the definition of determi-Enant gives (we usei = 1):

∣

∣

∣

∣

a bc d

∣

∣

∣

∣

= a|d|−b|c|= ad−bc

and for 3×3 matrix∣

∣

∣

∣

∣

∣

a b ci j kx y z

∣

∣

∣

∣

∣

∣

= a

∣

∣

∣

∣

j ky z

∣

∣

∣

∣

−b

∣

∣

∣

∣

i kx z

∣

∣

∣

∣

+c

∣

∣

∣

∣

i jx y

∣

∣

∣

∣

= a jz+bkx+ciy− (c jx+biz+aky)


The formula for the determinant of 3×3 matrix can be remembered with the help of the followingscheme

a b ci j kx y za b ci j k

In this scheme we write once more the first and the second rows below the determinant. Thenwe multiply in this scheme each three elements in the main diagonal and below (solid lines inthe diagram). The products which arise from these elements are with the sign “+”. Next wemultiply along the auxiliary diagonala13-a22-a31 and below (dotted lines in the diagram) andforce change of the sign of these products (add minus sign in the front). Finally we sum up allthe products with correct signs.

Remark 5.4 (practical). The calculation of determinant from the definition is useless for de- Eterminants of higher orders, since we should calculate a lotof products. From this reason wedevelop other techniques for an evaluation of the determinant. The main idea of these techniquesis the following

(i) We will show that the Laplace expansion can be reformulated for columns as well asfor rows of the determinant.

(ii) We change slightly the matrix by the operations which donot change the value of thedeterminant. The new matrix will have at most one nonzero element in some row or insome column.

(iii) We write the Laplace expansion along this “almost zero” row or column. In this ex-pansion almost all of the cofactors are multiplied by zero and we need not to evaluatethem. Only one cofactor remains. We obtain one determinant of the ordern−1.

(iv) In the similar manner we work with the obtained determinant. We obtain determinantof the ordern−2. We end when we reach determinant of the order 1, 2 or 3. Thisdeterminant can be evaluated by the corresponding rule.

In the following theorems we present the row operations which either preserve the value of thedeterminant or change it in the known way. These operation are (slightly) different from thoseone, which preserve the rank! Read carefully Theorems 5.1 and 5.2 and note differences betweenthese theorems and Theorem 3.2!

Theorem 5.1(operations preserving the value of the determinant). The following operationspreserve the value of the determinant:

(i) Leaving one row (column) without any change and adding arbitrary multiples of thisrow (column) to the remaining rows (columns).

(ii) Transposition of the matrix.

Remark 5.5 (Laplace expansion for columns). Theorem 5.1 shows that changing rows of thedeterminant into columns (and vice versa) does not change the value of the determinant. Thus allstatements concerning the determinant and rows can be reformulated also for columns. Amongothers, the Laplace expansion along a column reads as follows: for an arbitrary column indexj ∈ {1,2, . . . ,n} we have

detA = a1 jA1 j +a2 jA2 j + · · ·+an jAn j,


whereAi j is the i j -th cofactor of the matrixA. This formula is called theLaplace expansionalong the j-th column.

Theorem 5.2(another operations with determinant). The following operations change the valueof the determinant in a known way:

(i) Interchange two rows (or two columns) of the determinant changes the sign of thedeterminant.

(ii) Dividing an arbitrary row (an arbitrary column) by a nonzeronumber a decreasesthe value of the determinant a-times.

Remark 5.6. According to the preceding theorem, the following holds.∣

∣

∣

∣

∣

∣

2 4 8−1 2 4

0 1 12

∣

∣

∣

∣

∣

∣

= 2

∣

∣

∣

∣

∣

∣

1 2 4−1 2 4

0 1 12

∣

∣

∣

∣

∣

∣

= 2.4.

∣

∣

∣

∣

∣

∣

1 2 1−1 2 1

0 1 3

∣

∣

∣

∣

∣

∣

.

Example 5.1.Evaluate

∣

∣

∣

∣

∣

∣

∣

∣

∣

2 1 1 1 11 3 1 1 11 1 4 1 11 1 1 5 11 1 1 1 6

∣

∣

∣

∣

∣

∣

∣

∣

∣

·

Solution:(For comments see the paragraph below the computation)∣

∣

∣

∣

∣

∣

∣

∣

∣

2 1 1 1 11 3 1 1 11 1 4 1 11 1 1 5 11 1 1 1 6

∣

∣

∣

∣

∣

∣

∣

∣

∣

=

∣

∣

∣

∣

∣

∣

∣

∣

∣

0 −5 −1 −1 −11 3 1 1 10 −2 3 0 00 −2 0 4 00 −2 0 0 5

∣

∣

∣

∣

∣

∣

∣

∣

∣

Lapl= 1·(−1)2+1 ·

∣

∣

∣

∣

∣

∣

∣

−5 −1 −1 −1−2 3 0 0−2 0 4 0−2 0 0 5

∣

∣

∣

∣

∣

∣

∣

=

= −1 · (−2) · (−1) ·

∣

∣

∣

∣

∣

∣

∣

5 1 1 1−2 3 0 0

1 0 −2 0−2 0 0 5

∣

∣

∣

∣

∣

∣

∣

= −2 ·

∣

∣

∣

∣

∣

∣

∣

5 1 1 1−2 3 0 011 2 0 2−2 0 0 5

∣

∣

∣

∣

∣

∣

∣

Lapl=

= −2 ·1 · (−1)1+3 ·

∣

∣

∣

∣

∣

∣

−2 3 011 2 2−2 0 5

∣

∣

∣

∣

∣

∣

= −2 · [−20−12− (165)] = 394

As a first step we use the elementa21 for pivoting in the first column. We have to be aware ofthe facts that we cannot change the order of rows∗ and we cannot multiply the non-pivot rows†.When the pivoting is finished, we use the Laplace expansion along the first column and obtain afourth-order determinant.Further, we take the number(−2) from the third and the number(−1) from the first row, inorder to decrease the numbers and remove the negative signs.In the resulting determinant, thefirst column is less handy for pivoting, since it contains a lot of nonzero elements. Pivoting inthe second, third or fourth column is much easier. The best column for pivoting seems to be thecolumn with smallest numbers, i.e, the third column. Pivoting on the elementa13 “cleans” thethird column and we can expand the determinant along this column.When we obtain the third order determinant, we evaluate by the rule from Remark 5.3, page 92.

∗Determinant would change the sign, according to Theorem 5.2.†Theorem 5.1 allows to multiply the row which remains in determinant, i.e., the pivot row only.

6. SYSTEM OF LINEAR EQUATIONS 95

6. System of linear equations

One of the most important applications of linear algebra is solving the system of linear equations.

Definition (system of linear equations). Under asystem of m linear equations in n unknownswe understand the system of equations

a11x1+a12x2 +a13x3+ · · ·+a1nxn = b1

a21x1+a22x2 +a23x3+ · · ·+a2nxn = b2

a31x1+a32x2 +a33x3+ · · ·+a3nxn = b3 (6.1)...

am1x1 +am2x2 +am3x3 + · · ·+amnxn = bm.

Variablesx1, x2, . . . ,xn are said to beunknowns. Real numbersai j are said to becoefficients ofthe left-hand sides of equations, real numbersb j coefficients of the right-hand sidesor constantterms of equations.

Under asolution of the system(6.1) we understand then-tuple of real numbers which, substitutedfor the unknowns, convert the equations into identities.

When solving (6.1) only the coefficients are important. Thismotivates the following definition,which allows to work with the coefficients only.

Definition (matrix of the system). Matrix

A =

a11 a12 a13 · · · a1na21 a22 a23 · · · a2n...

.... . .

...am1 am2 am3 · · · amn

(6.2)

is said to be amatrix of the system(6.1) (or acoefficients matrix). Matrix

A∗ =

a11 a12 a13 · · · a1n b1a21 a22 a23 · · · a2n b2...

.... . .

......

am1 am2 am3 · · · amn bm

(6.3)

is said to be anaugmented matrix of the system(6.1).

Remark 6.1 (vector notation of the system of linear equations). The system (6.1) can written inEan equivalent form of a vector equation. Really, denote

~a1 =

a11a21...

am1

, ~a2 =

a12a22...

am2

, ~a3 =

a13a23...

am3

, . . . ,~an =

a1na2n...

amn

, ~b =

b1b2...

bm

. (6.4)

Evaluating the following linear combination one can see that the vector equation

x1~a1+x2~a2+x3~a3+ · · ·+xn~an =~b (6.5)


is equivalent to (6.1). Now we see that the problem to find a solution of linear system (6.1)possesses the following equivalent formulation (we use thenotation introduced in (6.4)).

Write the vector~b as a linear combination of vectors~a1,~a2, . . . ,~an.Recall that if all the right-hand sides of the linear system equal zero, the problem of solvability(6.1) has a close connection to the problem of linear independence of vectors (see the definitionof the linear (in-)dependence on page 83).

Definition (homogeneous system). If

b1 = b2 = · · · = bm = 0

holds, then system (6.1) is said to behomogeneous.

Remark 6.2 (trivial solution). Every homogeneous system possesses a solution. Really, it isclear that then-tuple x1 = 0, x2 = 0, . . . , xn = 0 is a solution of an arbitrary homogeneoussystem. This solution is called atrivial solution.

Remark 6.3 (matrix notation). The linear combination in (6.5) can be written as a matrix prod-uct. This leads to the matrix equation

a11 a12 · · · a1na21 a22 · · · a2n...

.... . .

...am1 am2 · · · amn

x1x2...

xn

=

b1b2...

bm

. (6.6)

Denote byA the matrix (6.2) of the system, by~b the column vector of the right-hand sides andby~x the vector of unknowns , i.e.

~b =

b1b2...

bm

and ~x =

x1x2...

xn

.

Using this notation, the linear system can be written as the matrix equation

A~x =~b. (6.7)

This form is used frequently in the engineering computations for simplicity and brevity.

Theorem 6.1(Frobenius). System(6.1) has a solution if and only if the augmented matrix ofEthis system has the same rank as the matrix of this system, i.e. rank(A) = rank(A∗).

Algorithm 6.1 (solution of linear system). One of the simplest methods for solving system oflinear equations are the Gauss elimination and the Gauss–Jordan elimination. The Gauss elimi-nation consists in the following steps.

(i) We write the augmented matrix of the system. Thei-th column contains the coeffi-cients atxi and the last column contains the right-hand sides. The orderof the rows isarbitrary.

(ii) We convert the augmented matrix into its row echelon form. We use row operations∗

from Theorem 3.2.∗no operations on columns!


(iii) We rewrite back the augmented matrix in the row echelonform into a system of linearequations (in the original unknowns). The set of all solutions of this new system is thesame as the set of all solutions of the original system.

(iv) We start with the last equation. Three mutually different cases are possible(a) The last equation does not contain any unknown, i.e. it has the form 0= a, where

a is a nonzero number. In this case the system possesses no solution.(b) The last equation contains exactly one unknown. In this case we solve the equation

for this unknown and continue with the next step.(c) There arek unknowns in the last equation (k > 1). In this case we solve one arbi-

trary of these unknowns through the other(k−1) ones. These(k−1) unknownsare calledfree unknowns. The free unknowns can be considered as parameters andcan take any real values.

(v) We continue with the last but one equation. The unknowns which appeared in thepreceding steps are considered to be known already. Two cases can occur.(a) The equation contains one “new” unknown (i.e. all of the unknowns, with excep-

tion of one unknown, are free or known already). We solve the equation for thisunknown. The formula for this unknown may contain also the free unknowns.

(b) The equation contains at least two “new” unknowns. If there is l , l > 1, newunknowns, then we isolate one of them and the other(l −1) will be free.

(vi) We repeat the last step until we reach the first equation.At this stage the system issolved. We either calculate all of the unknowns (the system has unique solution) or wecalculate the non-free unknowns in terms of the free ones. These free variables can beconsidered as parameters and can take any real value. Hence,if at least one of the freevariables is present, then the system has infinitely many solutions.

Remark that the choice of the free unknowns is not unique and two equivalent sets of solutionscan be written in several, very different, forms.

Remark 6.4. The following three mutually different cases may occur: E(i) The system has no solution if and only if rank(A) 6= rank(A∗). This occurs if the last

line of the augmented matrix in the row echelon form corresponds to the equation 0= awherea is a real nonzero number. This equation clearly has no solution and hence thewhole system has no solution.

(ii) The system has exactly one solutions if and only if rank(A) = rank(A∗) = n.(iii) The system has infinitely many solutions if and only if rank(A) = rank(A∗) < n. In this

case the unknowns can be computed in terms of(n− rank(A)) independent parameters,or, in other words, in(n− rank(A)) free variables.

Example 6.1.Solve the linear system.

2 x1 + x2 − x3 + x4 = 1x1 − 3 x2 + 3 x3 − 4 x4 = 1

5 x1 + x2 − x3 + 2 x4 = −12 x1 − x2 + x3 − 3 x4 = 4

Solution:We will work by Gauss algorithm. We pivot on the elementa21 = 1 in the first column

2 1 −1 1 11 −3 3 −4 15 1 −1 2 −12 −1 1 −3 4

∼

1 −3 3 −4 10 7 −7 9 −10 16 −16 22 −60 5 −5 5 2


We divide the third row by the number 2. The numbers 7, 8 and 5 (see the next matrix below)are unhandy for pivoting. Therefore we take the second row from the third row and obtaina32 = 1. Further we pivot on this element.

1 −3 3 −4 10 7 −7 9 −10 8 −8 11 −30 5 −5 5 2

∼

1 −3 3 −4 10 7 −7 9 −10 1 −1 2 −20 5 −5 5 2

∼

1 −3 3 −4 10 1 −1 2 −20 0 0 −5 130 0 0 −5 12

Now we pivot ona34 = −5. This is comfortable, sincea44 is a constant multiple of this pivot.

1 −3 3 −4 10 1 −1 2 −20 0 0 −5 130 0 0 0 −1

Conclusion: The last equation in the obtained augmented matrix corresponds to the equation0 =−1 which has no solution. Hence the linear system has no solution. Really, rank(A) = 3 andrank(A∗) = 4. Since these ranks are different, the system is not solvable.

Example 6.2.Solve the linear system

3 x1 + 5 x2 − 11x3 + 4 x4 = 9x1 + 3 x2 − x3 + 2 x4 = 1

2 x1 + 9 x2 + 5 x3 + 6 x4 = −2− 2 x1 − 2 x2 + 12x3 − x4 = −7

Solution:We convert the augmented matrix into its row echelon form. Westart with pivoting ona21 = 1.

3 5 −11 4 91 3 −1 2 12 9 5 6 −2

−2 −2 12 −1 −7

∼

1 3 −1 2 10 −4 −8 −2 60 3 7 2 −40 4 10 3 −5

∼

1 3 −1 2 10 −2 −4 −1 30 3 7 2 −40 4 10 3 −5

Note that we divided the second row by the number 2. Now we pivot on the elementa22 = −2.We multiply the pivot row by 3, the second row by 2 and add. Further we multiply the pivot rowby 2 and add to the last row. We obtain the following matrix.

1 3 −1 2 10 2 4 1 −30 0 2 1 10 0 2 1 1

The last two rows are identical and one of the rows can be removed. The remaining matrix is inthe row echelon form.

1 3 −1 2 10 2 4 1 −30 0 2 1 1

Discussion:The ranks of the matricesA andA∗ are the same, rank(A) = rank(A∗) = 3. The setof the solutions contains 4−3 = 1 parameter (or one free variable).Evaluation of the unknowns:We write the last equation first.

2x3 +x4 = 1

We choosex3 to be a parametert and solve the equation forx4. We obtain

x3 = t,


x4 = 1−2t.

We write the second equation

2x2 +4x3+x4 = −3

and substitute forx3 andx4. We obtain

2x2 +4t +(1−2t) = −3

and from here

x2 = −2− t.

Finally, the first equation

x1 +3x2−x3 +2x4 = 1

after substitution forx2, x3 andx4 gives

x1 +3(−2− t)− (t)+2(1−2t) = 1

with solution

x1 = 5+8t.

Conclusion:The set of the solutions is a one-parametric family of real 4-tuples

x1 = 5+8t,

x2 = −2− t,

x3 = t,

x4 = 1−2t,

wheret is a real parameter.

Remark 6.5 (Gauss–Jordan elimination). The Gauss–Jordan elimination is a modification andextension of the Gauss elimination. This modification lies in the fact that we calculate there-duced row echelonform of the augmented matrix. The advantage of this approachis, that thelinear system corresponding to the reduced row echelon formis very simple: it allows to evaluatevery fast the unknowns corresponding to the pivots in terms of non-pivots (the non-pivots can beconsidered as free variables). Hence the back substitution, which needs an extra calculations inthe case of Gauss elimination, is trivial in the case of Gauss–Jordan elimination.A disadvantage of the Gauss–Jordan elimination lies in the fact, that this method may produceeither high numbers or fractions in the matrix. From this reason we usually use a combination ofboth methods. More precisely, we do the following:

• We choose a pivot in the first column and “clean” the first column. We obtain a matrixwhich has only one nonzero element in the first column.

• We choose a pivot in the next column and write the pivot row as the second.– If this pivot equals to±1, or if the other points in the second column are con-

stant multiples of the pivot, we clean the second column completely, following theGauss–Jordan elimination.

– If we decide to pivot on a different element than±1, we usually perform the piv-oting not in the whole column, but in the rows below the pivot row only, followingthe Gauss elimination.

• In a similar manner we continue up to the last pivot. We clean the whole column if thepivot is±1 and pivot only below the pivot in the opposite case.


In the following example we will use the Gauss–Jordan elimination. Using this procedure wecould obtain rather big numbers. We must work with these large numbers as carefully as possible.

Example 6.3.Solve by Gauss–Jordan elimination the linear system.

3 x1 + 4 x2 − x3 + 3 x4 − 3 x5 = 04 x1 + 2 x2 − x3 + 2 x4 + 3 x5 = 02 x1 − 3 x2 + x3 − 4 x4 + 8 x5 = 03 x1 − x2 − 2 x3 − x4 + x5 = 0

Solution: The system is homogeneous and it possesses always at least trivial solution. Let ussolve this system by the Gauss–Jordan elimination, as required.The numbers in the first column are not convenient for pivoting. In view of this fact, we take thefirst row from the second and last row. This produces the new pivot a21 = 1 and a zero ina41.After this step we pivot on the elementa21

3 4 −1 3 −3 04 2 −1 2 3 02 −3 1 −4 8 03 −1 −2 −1 1 0

∼

3 4 −1 3 −3 01 −2 0 −1 6 02 −3 1 −4 8 00 −5 −1 −4 4 0

∼

1 −2 0 −1 6 00 10 −1 6 −21 00 1 1 −2 −4 00 −5 −1 −4 4 0

Now we pivot ona32 = 1. We write this pivot row as the second. We modify the first rowaswell – we multiply the pivot row by the number 2 and add to the first row. This producesa12 = 0. Further, we clean the rest of the column as usual. When finishing, we divide the lastrow by the number 2.

1 0 2 −5 −2 00 1 1 −2 −4 00 0 −11 26 19 00 0 4 −14 −16 0

∼

1 0 2 −5 −2 00 1 1 −2 −4 00 0 −11 26 19 00 0 2 −7 −8 0

The third row is rather complicated. Therefore we multiply the last row by the number 5 andadd to the third row. This producesa33 =−1 (a new pivot) and smaller numbers in the third row.

1 0 2 −5 −2 00 1 1 −2 −4 00 0 −1 −9 −21 00 0 2 −7 −8 0

We pivot ona33 in the whole third column.

1 0 0 −23 −44 00 1 0 −11 −25 00 0 −1 −9 −21 00 0 0 1 2 0

∼

1 0 0 −23 −44 00 1 0 −11 −25 00 0 1 9 21 00 0 0 1 2 0

Finally we pivot ona44 = 1 (in the whole fourth column).

1 0 0 0 2 00 1 0 0 −3 00 0 1 0 3 00 0 0 1 2 0

Discussion:The system has one free variable (one parameter).Conclusion:The system has infinitely many solutions. The set of solutionis an one-parametricset which can be written in the form

x1 = −2t,

x2 = 3t,


x3 = −3t,

x4 = −2t,

x5 = t, t ∈ R.

In the following example we present a technique which allowsto change the order of columns inthe matrix.

Example 6.4.Solve the linear system by Gauss–Jordan elimination.

− 2 x1 + 2 x2 + 5 x4 − 5 x5 = 2x1 − x2 − 4 x4 + x5 = −1x1 + x2 − x3 + 5 x5 = −4x1 + 3 x2 − 2 x3 + x4 − 2 x5 = 1

Solution:We write the augmented matrix of the system and pivot ona31. (The elementsa21 anda41 also handy for this.)

−2 2 0 5 −5 21 −1 0 −4 1 −11 1 −1 0 5 −41 3 −2 1 −2 1

∼

1 1 −1 0 5 −40 4 −2 5 5 −60 −2 1 −4 −4 30 2 −1 1 −7 5

The numbers 4,−2 and 2 in the second column are handy for pivoting by the Gausselimination. However, let us continue by the Gauss–Jordan elimination. With this approach itwould be better∗ to have a number±1 as a pivot, since the elementa21 = 1 is not an integermultiple of other numbers in the second column. From this reason we place the third column tothe place of the second column and use a new pivota32 = 1 (which was originallya33). Wehave to keep the name of the unknowns. From this reason we write the name of the unknownson the top of the column.

x1 x3 x2 x4 x51 −1 1 0 5 −40 −2 4 5 5 −60 1 −2 −4 −4 30 −1 2 1 −7 5

∼

x1 x3 x2 x4 x51 0 −1 −4 1 −10 1 −2 −4 −4 30 0 0 −3 −3 00 0 0 −3 −11 8

We divide the third row by the number−3. This producesa34 = 1 and we use this element formpivoting in the column. In a similar way we pivot in the last column.

x1 x3 x2 x4 x51 0 −1 −4 1 −10 1 −2 −4 −4 30 0 0 1 1 00 0 0 −3 −11 8

∼

x1 x3 x2 x4 x51 0 −1 0 5 −10 1 −2 0 0 30 0 0 1 1 00 0 0 0 −8 8

∼

x1 x3 x2 x4 x51 0 −1 0 5 −10 1 −2 0 0 30 0 0 1 1 00 0 0 0 −1 1

∼

x1 x3 x2 x4 x51 0 −1 0 0 40 1 −2 0 0 30 0 0 1 0 10 0 0 0 1 −1

The matrix is in the reduced row echelon form.Discussion:The system has a solution. The set of all solutions is an one-parametric family of5-tuples.

∗better, but not necessary


Conclusion:There are infinitely many solutions. We can write the equations corresponding tothe rows of augmented matrix (take care on the correct unknowns, indicated on the top of eachcolumn) and convert the unknownx2 on the right-hand side. This produces a set of all solutionsin the following form:

x1 = 4+x2,

x3 = 3+2x2,

x4 = 1,

x5 = −1.

(6.8)

We can write this set of solutions in a parametric form

x1 = 4+ t,

x2 = t,

x3 = 3+2t,

x4 = 1,

x5 = −1,

wheret is a real parameter.

Remark 6.6 (solution in a basis). Let us mention one advanced topic from the terminology.Sometimes we have to find the solution of the linear system which contains the givenr-tuple ofunknowns expressed in terms of the othern− r unknowns (r is the rank of the matricesA andA∗). Suppose, for brevity, that we have to find the solution of linear system in 7 unknowns andrank(A) = rank(A∗) = 4 in the form in which the unknownsx1, x2, x4, x6 are to be expressedthroughx3, x5 andx7. In this case we say that we have to find the solution of the system in thebasis~a1,~a2,~a4,~a6, where~ai is the column vector formed by thei-th column of the matrixA (see(6.4)). If the desired form of the solution exists, we say that vectors~a1, ~a2, ~a4 and~a6 form thebasisof the solution. The variablesx1, x2, x4 andx6 are in this sense calledvariables of the basisand the other variables are calledfree variables. As a particular example, the solution (6.8) ofthe linear system from Example 6.4 is written in the basis[~a1,~a3,~a4,~a5], the free variable isx2and the other unknowns are basic variables.The simplest way how to solve the system in the given basis is the following.

(i) We rearrange the columns of the augmented matrix of the system in such an order, thatthe columns of the coefficients which correspond to the variables in the basis are firstand the coefficients of the free variables follow. The coefficients of the right-hand sidescome last. We must indicate to each column the name of the variable which belongs tothis column!

(ii) We convert augmented matrix into its row echelon or intoits reduced row echelonform. This form must have nonzero numbers in the main diagonal and zeros under thisdiagonal. (Otherwise the given basis does not exist).

(iii) We solve the basic variables by consecutive evaluation (in the case of Gauss algorithm)or directly (in the case of Gauss–Jordan algorithm). In thisstep remember that thecolumns of the system are rearranged and each column corresponds to the variableindicated at this column.

Another possibility is to preserve the order of the columns as is and convert the matrix into theform, which would be (reduced) row echelon after erasing thecolumns corresponding to freevariables. This is showed in the following example.



2 x1 + 2 x2 + 8 x4 = 5− x1 + x2 + x3 − 3 x4 = −2

4 x1 − 2 x2 − 3 x3 + 4 x4 = 4− 2 x1 + 2 x2 + 2 x3 = 1

and write the solution in the basis[~a1,~a3,~a4].Solution: We write an augmented matrix of the system and we convert thismatrix in the formwhich would be row echelon (or reduce row echelon) after erasing the second column (sincea2is not in the basis). The rank of the matrix must be three (there are three vector in the basis).We will pivot on the elementa21 = 1 in the first column.

2 2 0 8 5−1 1 1 −3 −2

4 −2 −3 4 4 −2 2 2 0 1

∼

−1 1 1 −3 −20 4 2 2 10 2 1 −8 −40 0 0 6 3

Since~a2 is not in the basis, we continue to the third row. We pivot ona33 and, followingGauss–Jordan algorithm, we pivot in the whole column. We obtain the following matrix.

−1 −1 0 5 20 2 1 −8 −40 0 0 18 90 0 0 2 1

∼

−1 −1 0 5 20 2 1 −8 −40 0 0 2 1

The third row is a constant multiple of the last row and can be removed. Further we pivot ona34 = 2. When putting a zero toa14 we multiply the first row by(−2), multiply the pivot rowby 5 and add. Putting a zero toa24 a child’s play. Finally we divide each row by its pivot.

2 2 0 0 10 2 1 0 00 0 0 2 1

∼

1 1 0 0 1/20 2 1 0 00 0 0 1 1/2

The rank of both matrix and augmented matrix is 3. The solution in the basis[~a1,~a3,~a4] is

x1 =12−x2,

x3 = −2x2,

x4 =12.

Note that the variablex4 is fixed and it does not contain a free variablex3. From this reason thevariablex4 must be always in the basis. Another possible basis for this system are[~a1,~a2,~a4] and[~a2,~a3,~a4].

Remark 6.7. Let us denote the columns of the matrix of the linear system inthe precedingexample by~a1, ~a2, ~a3 and~a4. Further let us denote (in harmony with the notation (6.4)) thevector of the right-hand sides of equations by~b. From the fact that the system has infinitelymany solutions it follows that there are infinitely many possibilities how to write vector~b as alinear combination of the vectors~a1, . . . ,~a4. Let us putx2 = 0 (free variable) in the solution of

the system. We obtainx1 =12

, x3 = 0 andx4 = −12

. Hence there is only one possibility how to

write the vector~b as a linear combination of vectors~a1,~a3 and~a4. This linear combination is12~a1+0~a3−

12~a4 =~b.


It can be proved that if the vector~b changes and if the system remains solvable, then for eachvector~b there exist unique values ofx1, x3 andx4 such that

x1~a1+x3~a3+x4~a4 =~b

holds. Such a group of vectors~a1,~a3 and~a4 is said to be, in theory of vector spaces, to be abasisand the values ofx1, x2 andx3 coordinatesof the vector~b in this basis.

The following theorem presents an alternative method for solving the linear system. We willpresent here the simplest version of this theorem, which is applicable for systems with uniquesolution only.

Theorem 6.2(Cramer). Let us consider a system of n linear equations in n unknowns. Supposethat the n×n matrix A of this linear system is regular, i.e. D= detA 6= 0. Then the system hasa unique solution.Consider an arbitrary integer i,1 ≤ i ≤ n. Replace the i-th column of the matrix A by thecoefficients of right-hand sides and denote by Di determinant of this new matrix.Then the unknown xi can be calculated from the formula

xi =Di

D. (6.9)

Formula (6.9) is called the Cramer’s rule. The importance ofthis formula lies in the fact thateach unknown can be found simply by evaluating two determinants, without calculating any ofthe other unknowns. If we are interested in one of the unknowns only, the Cramer’s rule may bethe most effective tool for the calculation.Remark that an extension of this rule to the systems with infinitely many solutions can be foundin the literature. If we have to find all of the unknowns, then Cramer’s rule is usually less effectivethan the Gauss or Gauss–Jordan elimination.


3 x1 + x2 + 4 x3 = 1x1 + x2 + x3 = 0x1 − x2 − x3 = 1

(6.10)

for the unknownx2 and then solve the linear system

3 x1 + x2 + 4 x3 = b1x1 + x2 + x3 = b2x1 − x2 − x3 = b3

(6.11)

for the unknownx2. Hereb1, b2 andb3 are arbitrary real numbers.Solution:Consider system (6.10). We will use the Cramer’s rule. Determinant of the matrix ofthe system (6.10) is

D =

∣

∣

∣

∣

∣

∣

3 1 41 1 11 −1 −1

∣

∣

∣

∣

∣

∣

=

∣

∣

∣

∣

∣

∣

4 0 32 0 01 −1 −1

∣

∣

∣

∣

∣

∣

= (−1)(−1)5∣

∣

∣

∣

4 32 0

∣

∣

∣

∣

= −6

and sinceD 6= 0 then Cramer’s rule can be used. For determinantD2 we obtain

D2 =

∣

∣

∣

∣

∣

∣

3 1 41 0 11 1 −1

∣

∣

∣

∣

∣

∣

= 4+1− (3−1) = 5−2 = 3.


Hence the unknownx2 in (6.10) equals

x2 =D2

D=

3−6

= −12.

Now consider the more general system (6.11). The determinant D is the same as for (6.10) andthe evaluation of the determinantD2 gives

D2 =

∣

∣

∣

∣

∣

∣

3 b1 41 b2 11 b3 −1

∣

∣

∣

∣

∣

∣

= −b1

∣

∣

∣

∣

1 11 −1

∣

∣

∣

∣

+b2

∣

∣

∣

∣

3 41 −1

∣

∣

∣

∣

−b3

∣

∣

∣

∣

3 41 1

∣

∣

∣

∣

= 2b1−7b2 +b3,

where the Laplace expansion along the 2nd column has been used. The unknownx2 in (6.11) isgiven by formula

x2 =D2

D= −2b1−7b2+b3

6.

The following theorem presents one of the important applications of the inverse matrix. Thistheorem can be derived from the matrix notation of the system(6.7) using a multiplication byA−1 (provided this inverse exists).

Theorem 6.3. Let us consider the system of linear equations(6.1) or (6.6) with coefficients Ematrix A and with column-vector~b as a vector of right-hand sides. Suppose that A is invertibleand A−1 is its inverse.Then the system possesses a unique solution~x = A−1 ·~b, where the dot indicates the matrixmultiplication.

This theorem is of great interest in the case when we have a family of the linear systems whichdiffer in the right-hand sides only.


5 x1 − x2 + 5 x3 = 12 x1 + 2 x3 = 2

− 3 x1 − 3 x2 + x3 = 3(6.12)

and then solve the system

5 x1 − x2 + 5 x3 = b12 x1 + 2 x3 = b2

− 3 x1 − 3 x2 + x3 = b3

. (6.13)

The numbersb1, b2 andb3 are arbitrary real numbers.Solution:Example 4.1 on the page 91 shows that the matrix

A =

5 −1 52 0 2

−3 −3 1

has the inverse

A−1 =14

3 −7 −1−4 10 0−3 9 1

.

7. CONCLUDING REMARKS 106

By Theorem 6.3, the solution of (6.12) is

x1x2x3

=14

3 −7 −1−4 10 0−3 9 1

·

123

=14

−141618

=

−7/24

9/2

,

i.e. x1 = −72

, x2 = 4 andx3 =92

. Similarly the solution of (6.13) is

x1x2x3

=14

3 −7 −1−4 10 0−3 9 1

·

b1b2b3

=14

3b1−7b2−b3−4b1+10b2

−3b1+9b2 +b3

.

7. Concluding remarks

There exists a close connection between all of the ideas concerning matrices. This connectioncan be seen, for example, from the following theorem.

Theorem 7.1(summing–up theorem). Let A be an n×n square matrix. The following state-Ements are equivalent:

(i) The rows of the matrix A are linearly independent vectors fromRn.

(ii) The columns of the matrix A are linearly independent vectorsfromRn.

(iii) rank(A) = n(iv) A is invertible, i.e. A−1 exists.(v) A is regular, i.e.detA 6= 0.

(vi) System of linear equations with the coefficients matrix A hasa unique solution foreach n-tuple of right hand-sides.

(vii) Homogeneous system of linear equations with coefficients matrix A has only trivialsolution.

(viii) Every vector fromRn can be expanded as a linear combination of vectors formed

by rows (columns) of the matrix A. This expansion is unique, up to the order of thevectors in the linear combination.

CHAPTER VI

Integral Calculus

Motivation. The main problem of integral calculus is to find the function with known derivative,i.e., we will be interested in the following problem.

Given a function f(x) defined on the interval I, find the function y(x) definedon I which satisfies

y′(x) = f (x) (0.1)

for every x∈ I.

1. Indefinite integral (antiderivative)

The following definition simply introduces the name for the solution of the problem introducedin the motivation above.

Definition (antiderivative). Let I be an open interval,f andF be functions defined onI . If

F ′(x) = f (x) holds for everyx∈ I , (1.1)

then the functionF is said to be anantiderivativeof the functionf on the intervalI .

Remark 1.1 (continuity of the antiderivative). Let F be an antiderivative of some functionf onthe intervalI . Then the functionF is continuous onI . This follows from differentiability of theantiderivative and from the fact that differentiability implies continuity.

Example 1.1. • The functiony= x2+1 has the antiderivativey=x3

3+x. Really,

(

x3

3+x

)′=

x2 +1.

• The same is true also for the functiony =x3

3+x−5. Really,

(

x3

3+x−5

)′= x2 +1.

Theorem 1.1(uniqueness of the antiderivative). The antiderivative of a function is unique, upto an additive constant factor.More precisely, the following holds

(i) If F is an antiderivative of the function f on the interval I, then the same holds also for thefunction G(x) = F(x)+c, where c∈ R is arbitrary real quantity, not depending on x.

(ii) If F and G are antiderivatives of the same function f on the interval I, they are different atmost in the additive constant factor on the interval I, i.e. there exists c∈ R such that

G(x) = F(x)+c for x∈ I.

107

1. INDEFINITE INTEGRAL (ANTIDERIVATIVE) 108

Definition (indefinite integral). Let f be a function defined onI . The set of all antiderivativesof the function f is said to be anindefinite integralof the function f on the intervalI . Theindefinite integral is denoted by

∫

f (x)dx.

The function which has an antiderivative (indefinite integral) on the intervalI is said to beintegrableon the intervalI .

Remark 1.2 (arrangement). Theorem 1.1 shows that there is only a slight difference betweenone of the antiderivatives and all antiderivatives. From this reason we frequently use the termindefinite integral as a synonym to the antiderivative.

Theorem 1.2(sufficient condition for integrability). Every function which is continuous on theinterval I is integrable on I.

Remark 1.3(philosophical). According to Theorem 1.2, every continuous function possesses anantiderivative.The problem to find this antiderivative is a reverse problem to the differentiation. Differentiationof every elementary function can be performed completely using the formula for derivatives ofbasic elementary functions and rules for differentiation (the sum, product, quotient, chain, andconstant multiple rule).Unfortunately, the problem to find the antiderivative is much more difficult in comparison withthe calculation of the derivative of the function∗. There are integrable elementary function forwhich the antiderivative is known to exist, but it is not an elementary function and hence it cannotbe expressed by means of the finite number of elementary functions and operations like addition,subtraction, multiplication, division and composition ofthese functions.For example the antiderivative of the Gaussian functione−x2

exists, but it is not an elementaryfunctions. This antiderivative appears in statistics and it is one of the most important nonelemen-tary functions.

Theorem 1.3(linearity of the indefinite integral). Let f , g be functions integrable on the inter-Eval I and c be a real number. The following relations hold on the interval I

∫

f (x)±g(x)dx =∫

f (x)dx±∫

g(x)dx,∫

c f(x)dx = c∫

f (x)dx.

Example 1.2.∫

(

2cosx+1x2 −4

√x+sin

x2

cosx2

)

dx

= 2∫

cosxdx+

∫

x−2 dx−4∫

x12 dx+

12

∫

sinxdx

∗This is a quite common phenomenon in mathematics. A computation may be easy, and reversing the computa-tion may be very difficult. We will see such a phenomenon again, in a motivation to the method of partial fractions,below.

2. BASIC METHODS FOR INTEGRATION 109

= 2sinx+x−1

−1−4

x32

32

+12(−cosx)

= 2sinx− 1x− 8

3x√

x− 12

cosx+C

Remark 1.4(technical). The preceding theorem shows, that the operations addition of the func-tions and the multiplication by the constant factor causes no problems in the indefinite integral.Unfortunately, the situation is very different in the casesof multiplication, division and com-position of two functions. If we can find the antiderivativesof the functionsf (x) andg(x), wehave no possibility, how to use these antiderivatives for construction of the antiderivative of theproduct f (x).g(x), quotient f (x)/g(x) or the composite functionf (g(x)). From this reason wewill consider some special cases separately. We can easily evaluate the integral of the compositefunction with linear inside function (Theorem 2.1 below) and the antiderivative of the quotient,which has the derivative of the denominator in the numerator(Theorem 2.2 below), as the fol-lowing paragraphs show.

2. Basic methods for integration

Theorem 2.1(special case of the composite function). Let f be an integrable function definedon the open interval I and let F be an antiderivative of this function. Then

∫

f (ax+b)dx =1a

F(ax+b)+C,

for every x for which ax+b∈ I, where C is any real number.

Example 2.1(application of the preceding theorem).∫

1x2 +6x+15

dx =∫

1x2 +2 ·x·3+32+15−32 dx

=∫

1(

x+3)2

+6dx =

1√6

arctgx+3√

6+C,

∫

1√2x2−3

dx =∫

1√

(√

2x)2−3dx =

1√2

ln

∣

∣

∣

∣

√2x+

√

(√

2x)2−3

∣

∣

∣

∣

+C

=1√2

ln∣

∣

∣

√2x+

√

2x2−3∣

∣

∣+C.

Theorem 2.2(integral of a special fraction). Let f be a differentiable function which has nozero on the open interval I and f′(x) be its derivative on I. Then

∫

f ′(x)f (x)

dx = ln | f (x)|+C, (2.1)

where C is arbitrary real number, holds on the interval I.

Example 2.2(application of the preceding theorem).∫

tgxdx =∫

sinxcosx

dx = −∫ −sinx

cosxdx = − ln |cosx|+C,

∫

x+2x2 +4x+8

dx =12

∫

2x+4x2 +4x+8

dx =12

ln(x2 +4x+8)+C.


The next paragraphs are devoted to the integration of rational functions. Recall that the rational

function is the functionR(x) =Pn(x)Qm(x)

, wherePn(x) is ann-degree polynomial andQm(x) is an

m-degree polynomial. We have to distinguish two different cases. This is the main idea of thefollowing definition.

Definition (proper and improper rational function). Let R(x) =Pn(x)Qm(x)

be a rational function.

The functionR(x) is said to beproper if n < m andimproperotherwise.

Theorem 2.3(decomposition of the improper rational function). Every rational function pos-sesses a decomposition into a sum of a polynomial and a properrational function.

Remark 2.1 (technical). The decomposition from the preceding theorem can be found bythelong division of the polynomials (the division with the remainder). Roughly speaking, for im-proper functions we can divide the numerator by the denominator and for the proper functionswe cannot. After this division the problem reduces somewhat– the polynomial can be integratedby basic formulas and hence it is sufficient to develop a method for integration of proper rationalfunctions only. For simplicity we will consider the rational functions without multiple complexroots only.

Motivation. Adding the fractions on the left we can easily verify the relation14

1x−1

+14

1x+1

− 12

xx2 +1

=x

x4−1.

Which of these sides of the last equality is simpler? If we areinterested in an evaluation of thefunction at some particularx, then the expression on the right is simpler, since it contains onlyone fraction. However, from the point of view of integrationthe expression on theleft is simplersince it can be integrated by basic formulas for integration.We know how to write the formula on the right from the formula on the left, we simply addfractions. Now we develop a method which enables us to write the formula on the left from theformula on the right (this direction is more difficult).


Theorem 2.4(decomposition of the proper rational function into partial fractions). Let R(x) = EPn(x)Qm(x)

be a proper rational function. Suppose that the polynomialsPn(x) and Qm(x) have no

common zeros and the polynomial Qm(x) has only real zeros (with arbitrary multiplicity) orcomplex zeros of the multiplicity one.

• Let us assign to each simple real root c of the polynomial Qm(x) the fraction

Ax−c

,

where A is some (not yet determined) real constant.• Let us assign to each real root c of the multiplicity k of the polynomial Qm(x) the

k-tuple of the fractions

A1

x−c,

A2

(x−c)2 , . . . ,Ak

(x−c)k ,

where Ai are some (not yet determined) real constants.• For each pair of the mutually conjugate complex roots of the polynomial Qm(x) the

factorization of the polynomial Qm(x) contains the factor of the type(x2 +Mx+N).Let us assign to this pair of complex roots the fraction

Bx+Cx2 +Mx+N

,

where B and C are some (not yet determined) real numbers.Then for a convenient particular choice of the not constantsA, Ai , B, C, . . . the function R(x)can be written as a sum of all of the fractions, considered above. This decomposition is uniqueup to the order in the sum.

Definition (partial fractions). The fractions presented in the preceding theorem are calledpartial fractions.

Example 2.3(expansion into partial fractions). In the following we write (with undeterminedcoefficients) expansions of several proper rational functions.

x2

(x−1)x(x+3)=

Ax−1

+Bx

+C

x+3,

xx3−1

=A

x−1+

Bx+Cx2 +x+1

,

3x−2(x−1)2x2 =

Ax−1

+B

(x−1)2 +Cx

+Dx2 ,

x2 +2x+1(x2 +1)(x+2)2 =

Ax+Bx2 +1

+C

x+2+

D(x+2)2 .

Remark 2.2 (integrals of partial fractions). Remember that for each partial fraction we alreadyhave a method available for integration of this fraction. Really: The following formulas areconsequences of the known integration formula and basic rules for integration

∫

1x−a

dx = ln |x−a|, (2.2)


∫

1(x−a)n dx =

11−n

(x−a)1−n, for n > 1, (2.3)

∫

1x2 +a2 dx =

1a

arctgxa, (2.4)

∫

1(x+m)2+a2 dx =

1a

arctgx+m

a. (2.5)

The following partial fraction can be expanded as a sum of thepreceding fractions. If the de-nominator contains quadratic polynomial of the typex2 +a2, we write

∫

Ax+Bx2 +a2 dx =

∫

A2

2xx2+a2 +B

1x2 +a2 dx. (2.6)

The first fraction is (2.1) and the second one (2.4). If the full quadratic polynomial appears in thedenominator, we use the scheme

∫

Ax+Bx2 +Mx+N

dx =

∫

α2x+M

x2+Mx+N+β

1x2+Mx+N

dx. (2.7)

The first fraction is (2.1) and the second one can be convertedinto (2.5) by completing squaresin the denominator. The numbersα andβ are suitable real numbers which are given uniquelyand can be in each particular case easily found.

We summarize the method of the integration of the rational function.

Algorithm 2.1. To find the integral of a given rational function we use the following steps.(i) To integrate the rational function, this function must beproper. In the opposite case we

use the long division of polynomials and write the improper rational function as a sumof a polynomial and aproperrational function. We integrate the polynomial separatelyand evaluate the integral of the remaining rational function in the following steps.

(ii) The proper rational function may be directly one of the partial fractions. In this case weintegrate using the corresponding basic rules and formulasand finish. In the oppositecase we continue by the following step.

(iii) We factor the denominator of the rational function into the product of mutually differentlinear factors, powers of linear factors and (or) quadraticfactors with complex roots.Such a factorization always exists for the function which has only real roots and simplecomplex roots, however we can find it only in several special cases. If we cannot factorthe denominator, most probably we are not able to find the integral.

(iv) Suppose that the factorization of the denominator is known. In this case we know all ofthe roots of the denominator with their multiplicity. We write the corresponding partialfraction (fractions) to each of the roots as follows.• If the expression(x−K) appears in the factorization, we add the fraction of the

typeA

x−K.

• If the expression(x−K)n appears in the factorization, we addn partial fractionsA

x−K,

B(x−K)2 , . . . ,

. . .

(x−K)n .

• If the expression(x2 + Mx+ N) appears in the denominator, we add the fractionAx+B

x2+Mx+N.


In this way we work with all factors of the denominator. Remember that all of theconstants in the numerators of the partial fraction are mutually independent. At theend of this step we have a decomposition of the rational function into partial fractions,which containsm undetermined constants (m is the degree of the polynomial in thedenominator).

(v) We multiply the relation which describes the decomposition into partial fractions bythe common denominator. We obtain the equality which contains polynomials on bothsides.

(vi) We multiply parentheses and sum up like powers of the variablex on both sides. Thecoefficients at the corresponding powers on the left and on the right must be equal.From this condition we write a system ofn linear equations.

(vii) We solve the linear system with respect to the unknowns. If our computation is free oferrors, then the system has a unique solution.

(viii) If the original rational function has points of discontinuity (points, where the denomi-nator equals zero), we can put these points into the relationobtained in step (v). Thisis an effective method for quick evaluation of some of the constants.

(ix) We put the computed values of constants into the formal decomposition and integrateeach of the partial fractions separately by an application of the corresponding rule orformula.

Remark 2.3 (multiple complex roots). We will not consider the case when the denominator ofthe fraction has complex roots of multiplicity at least 2, i.e. the cases in which the factorizationof the denominator contains the expression like(x2+Mx+N)k for k∈ N, k > 1. The method forintegration of these fractions can be found in the literature.

Example 2.4.Find I1 =

∫

x2 +1(x−1)(x+2)(x−2)

dx.

Solution:We write the decomposition into partial fractions in terms of indeterminate coefficients

x2 +1(x−1)(x+2)(x−2)

=A

x−1+

Bx+2

+C

x−2

and multiply by a common denominator(x−1)(x+2)(x−2). We obtain equality

x2 +1 = A(x+2)(x−2)+B(x−1)(x−2)+C(x−1)(x+2).

We substitute successivelyx = 1, x = −2 andx = 2, as step (viii) of the Algorithm 2.1 suggests.We obtain equations

2 = A3(−1),

5 = B(−3)(−4),

5 = 4C,

with solutionsA = −23

, B =512

andC =54

. Since we have all coefficients, we need not to do

anything more (steps (vi) and (vii) of Algorithm 2.1 need notto be used) and the expansion ofthe integrand is

x2 +1(x−1)(x+2)(x−2)

= −23

x−1+

512

x+2+

54

x−2.


An integration gives

I1 =− 23

ln |x−1|+ 512

ln |x+2|+ 54

ln |x−2|

=112

(

−8ln|x−1|+5ln|x+2|+15ln|x−2|)

=112

ln

∣

∣

∣

∣

(x+2)5(x−2)15

(x−1)8

∣

∣

∣

∣

+C.

Example 2.5.Find I2 =∫

x4−x+1x3 +x2 dx.

Solution:The rational function is not proper. We must divide the numerator first

x4−x+1x3 +x2 = x−1+

x2−x+1x3+x2 .

We continue with the proper rational fraction on the right. We factor the denominator and writethe expansion into partial fractions

x2−x+1x2(x+1)

=Ax2 +

Bx

+C

x+1.

We multiply the last relation by a common denominatorx2(x+1)

x2−x+1 = A(x+1)+Bx(x+1)+Cx2

Substitutionsx = 0 andx = −1 suggested in step (viii) of Algorithm 2.1 give equations

1 = A,

3 = C.

We have to find the value of the constantB. We expand the formula on the right hand side

x2−x+1 = Ax+A+Bx2 +Bx+Cx2

and from the coefficients at the like powers ofx we write equations

x2 : 1 = B+C,

x1 : −1 = A+B,

x0 : 1 = A.

From the first or the second equation we obtainB = −2. Hence we can substitute these valuesinto the expansion and integrate

I2 =∫

x−1+1x2 −

2x

+3

x+1dx

=x2

2−x− 1

x−2ln|x|+3ln|x+1|

=x3−2x2−2

2x+ ln

|x+1|3x2 +C.

Example 2.6.Find I3 =

∫

xx3−8

dx.

3. ADVANCED METHODS FOR INTEGRATION 115

Solution: Factorization of the denominator gives(x− 2)(x2 + 2x+ 4) and the expansion withundetermined coefficients is

x(x−2)(x2+2x+4)

=A

x−2+

Bx+Cx2+2x+4

.

Multiplication by the common denominator gives

x = A(x2 +2x+4)+(Bx+C)(x−2)

and after substitutionx = 2 we obtain 2= 12A, i.e. A =16

. Multiplying the parentheses on the

right we obtain

x = Ax2 +2Ax+4A+Bx2−2Bx+Cx−2C

which is equivalent to the system

x2 : 0 = A+B,

x1 : 1 = 2A−2B+C,

x0 : 0 = 4A−2C,

with solutionB = −16

andC =13

. Hence we obtain

I3 =∫

16

1x−2

+−1

6x+ 13

x2 +2x+4dx

=16

ln |x−2|− 16

∫

x−2x2 +2x+4

dx

=16

ln |x−2|− 16

∫ 12(2x+2)−1−2

x2 +2x+4dx

=16

ln |x−2|− 16

∫

12

2x+2x2 +2x+4

+−3

(x2+2x+1)+3dx

=212

ln |x−2|− 112

ln |x2 +2x+4|+ 36

∫

1(x+1)2+3

dx

=112

ln(x−2)2− 112

ln |x2+2x+4|+ 1

2√

3arctg

x+1√3

=112

ln(x−2)2

x2 +2x+4+

1

2√

3arctg

x+1√3

.

3. Advanced methods for integration

Theorem 3.1(special case of the product, integration by parts). Let u and v be functions differ-Eentiable on the interval I. The equality

∫

u(x)v′(x)dx = u(x)v(x)−∫

u′(x)v(x)dx, (3.1)

holds on I whenever the integral on the right-hand side exists.


Remark 3.1 (practical). Let P(x) be a polynomial. Typically we use the integration by parts inthe evaluation of the integrals like

∫

P(x)eαx+β dx,∫

P(x)sin(αx+β )dx,∫

P(x)cos(αx+β )dx,

and∫

P(x)arctgxdx,∫

P(x) lnmxdx,

whereα,β ∈ R andm∈ N. In the first set of the integrals we choose the functionsu andv suchthat the polynomial function is differentiated and the exponential (or trigonometric) functionis integrated. In the second set of the integrals we choose the functionsu andv such that thefunctions arctgx or lnx are differentiated and the polynomial is integrated.

Example 3.1(integration by parts).∫

(x2+1)e−xdxu = x2 +1 u′ = 2x

v′ = e−x v = −e−x= −(x2 +1)e−x +2

∫

xe−x dx

u = x u′ = 1

v′ = e−x v = −e−x= −(x2 +1)e−x +2

(

−xe−x +∫

e−xdx)

= −(x2 +1)e−x +2(−xe−x−e−x) = −e−x(x2 +2x+3)+C,

∫

xarctgxdxu = arctgx u′ =

11+x2

v′ = x v=x2

2

=x2

2arctgx− 1

2

∫

x2

1+x2 dx

=x2

2arctgx− 1

2

∫

1− 11+x2 dx =

x2

2arctgx− 1

2x+

12

arctgx+C.

Theorem 3.2(integral of a special case of the composite function, the 1st method of substitu-tion). Let f(t) be a function continuous on I,ϕ(x) be a function differentiable on J andϕ(J) = I Eholds. Then for x∈ J

∫

f (ϕ(x))ϕ ′(x)dx =∫

f (t)dt (3.2)

holds, where we substitute t= ϕ(x) on the right hand side.

Remark 3.2 (technical). With substitution (3.2) we writet instead ofϕ(x) and dt instead ofϕ ′(x)dx in the given integral. In other words, whenever we use substitution t = ϕ(x) , we sub-

stitute alsodt = ϕ ′(x)dx .

The 2nd method of substitution is a converse of the 1st methodof substitution.


Theorem 3.3(the 2nd method of substitution). Let f(x) be a function continuous on the openEinterval I, ϕ(t) be a differentiable function which has no stationary point on the interval J andϕ(J) = I. Then

∫

f (x)dx =

∫

f (ϕ(t))ϕ ′(t)dt (3.3)

holds on I, where we substitute t= ϕ−1(x), on the right-hand side. Hereϕ−1(x) denotes theinverse function to the functionϕ(x).

Remark 3.3. An existence of the inverse functionϕ−1 follows from the fact that the derivativeof the functionϕ has no zeros (there exists no stationary point). The expression on the right-handside seems to be more complicated than the integral of the left-hand side of (3.3). However, ineach particular case we perform such a substitution which makes the expression on the right-handside simple after some algebraic manipulations.

Remark 3.4 (technical). With substitution (3.3) we writeϕ(t) instead ofx andϕ ′(t)dt insteadof dx. In other words, when using substitutionx = ϕ(t) , we substitute alsodx = ϕ ′(t)dt .

Example 3.2(substitution suggested by the composite function).

∫

xe1−x2dx

1−x2 = t−2xdx = dt

xdx = −12

dt= −1

2

∫

et dt = −12

et = −12

e1−x2+C

Remark 3.5 (trigonometric substitution). Consider integral∫

R(sinx,cosx)dx whereR(·, ·) is

a rational function of two variables. This integral covers integrals of fractions with powers ofsinx and cosx. In this case itmaybe possible to take out from the numerator or the denominatorone of the trigonometric function (e.g. sinx). If after this step the function sinx appears in thefraction in even powers only, then we use the identity sin2x = 1− cos2x, convert the integral

into∫

R(cosx)sinxdx and continue with substitution cosx = t. If this fails, it may be possible

to take out from the integrand the function cosx and if in the fraction the function cosx remainsin even powers only, we can through the identity cos2x = 1−sin2x convert the integral into the

form∫

R(sinx)cosxdx which is ready for substitution sinx = t. If both possibilities fail, it is

possible to use the substitution tgx2

= t. This, however, exceeds the level of our course.

Example 3.3(trigonometric substitution, odd power in the numerator).∫

tg3xdx =∫

sin3xcos3x

dx =∫

sin2xcos3x

sinxdx =∫

1−cos2xcos3x

sinxdx =

cosx = t

−sinxdx = dt

sinxdx = −dt

=

∫

−1− t2

t3 dt =

∫

1t− t−3dt =

= ln |t|+ 12

t−2 = ln |cosx|+ 12cos2x

+C


Example 3.4(trigonometric substitution, odd power in the denominator).∫

1(2+cosx)sinx

dx =

∫

sinx

(2+cosx)sin2xdx

=∫

1(2+cosx)(1−cos2x)

sinxdx

cosx = t

sinxdx = −dt= −

∫

1(2+ t)(1− t2)

dt =

=

∫

1(2+ t)(1+ t)(t−1)

dt

=

∫

−12

11+ t

+16

1t−1

+13

12+ t

dt

= −12

ln |1+ t|+ 16

ln |t−1|+ 13

ln |2+ t|

= −12

ln(1+cosx)+16

ln(1−cosx)+13

ln(2+cosx)

=16

ln(1−cosx)(2+cosx)2

(1+cosx)3 +C

Remark 3.6 (irrational functions, substitution for elimination of the root). In the integral of

the form∫

R(

x,√

ax+b)

dx we use the substitution which removes the root,ax+b = t2 ,

or equivalentlyx =1a(t2− b) and t =

√ax+b (we supposet ≥ 0, assumptiont ≤ 0 andt =

−√

ax+b leads to the same result). If the termk√

ax+b appears inside the integral, we performthe substitutionax+b = tk which removes thek-th root.

Example 3.5(irrational functions).

∫

√3x+2−1

x+1dx

3x+2 = t2

3dx = 2t dt

dx =23

t dt

x =13(t2−2)

t =√

3x+2

=

∫

t −113(t2−2)+1

23

t dt

= 2∫

t −1t2+1

t dt = 2∫

t2− tt2+1

dt = 2∫

1+−t −1t2+1

dt

= 2∫

1− tt2+1

− 1t2+1

dt

= 2(

t − 12

ln |t2+1|−arctgt)

+C

= 2√

3x+2− ln |3x+3|−2arctg√

3x+2+C


Remark 3.7. Other frequently used substitutions are

∫

R(ex)dx

ex = t

x = ln t

dx =1t

dt

=∫

R(t)t

dt,

∫

R(lnx)1x

dxlnx = t

1x

dx = dt=

∫

R(t)dt.

The following examples present another types of substitutions. In this examples we suggestcorrect the substitution in the formulation of the problem.

Example 3.6.Assumingx > 0, substitutet =√

1−x2 in the integrals

I0 =

∫

√

1−x2dx, I1 =

∫

x√

1−x2dx,

I2 =∫

x2√

1−x2dx, I3 =∫

x3√

1−x2dx.

EvaluateI1 andI3. (Don’t evaluateI0 andI2)Solution:With the suggested substitutiont =

√

1−x2 we have

t2 = 1−x2

x2 = 1− t2

2xdx = −2t dt

xdx = −t dt

These relations enable to substitute in the integrals consisting from the termsx2,√

1−x2 andxdx, since

x2 = 1− t2,√

1−x2 = t, xdx = −t dt.

This concerns integralsI1 andI3. The substitution is not so comfortable for integralsI0 andI2.For these integrals we have to use also the relation

x =√

1− t2.

Note that after this substitution radicals appear in the integrals and we obtain difficult integrals.

I0 =

∫

1x

√

1−x2xdx =

∫

1√1− t2

t(−t)dt =

∫ −t2√

1− t2dt, (difficult)

I1 =∫

√

1−x2xdx =∫

t(−t dt) = −∫

t2dt = −t3

3= −1

3(1−x2)

√

1−x2 +C,

I2 =

∫

x√

1−x2xdx =

∫

√

1− t2t(−t)dt = −∫

t2√

1− t2dt, (difficult)

I3 =∫

x2√

1−x2xdx =∫

(1− t2)t(−t)dt =∫

t4− t2dt =t5

5− t3

3

=(1−x2)2

√1−x2

5− (1−x2)

√1−x2

3=

(1−x2)√

1−x2

15[3(1−x2)−5]


= −(1−x2)(2+3x2)√

1−x15

.

We have seen that the substitution is useless in the evaluation of the integralsI0 andI2.

Now we suggest another substitution which allows to integrate I0.

Example 3.7.EvaluateI0 =

∫

√

1−x2dx by substitutionx = sint.

Solution: With x = sint we have dx = cost dt. We will work on the interval where cost ispositive. The substitution gives

I0 =∫

√

1−sin2 t cost dt =∫ √

cos2 t cost dt =∫

cos2 t dt

=∫

1+cos2t2

dt =12

∫

(1+cos2t)dt

=12

(

t +12

sin2t

)

=12(t +sint cost) =

12(t +sint

√

1−sin2 t)

=12(arcsinx+x

√

1−x2)+C.

In the process of evaluation we used the trigonometric formulas

cos2x =1+cos2x

2,

12

sin2x = sinxcosx.

Example 3.8.Evaluate the integral∫

tg3xdx by substitution tgx = t.

Solution:With the suggested substitution we have

tgx = t

x = arctgt

dx =1

1+ t2 dt

and hence∫

tg3xdx =∫

t3 11+ t2 dt =

∫

t3

t2+1dt =

∫

t − tt2+1

dt

=t2

2− 1

2ln(t2+1) =

12

tg2x− 12

ln(1+ tg2x)+C

Note that we evaluated this integral by the substitution cosx = t hereinbefore, in Example 3.3.Both results seem to be different, but using convenient algebraic modifications you can show thatboth approaches give either the same primitive function, orthe results differ only in an additiveconstant of integration.

Remark that the substitutiont = tgx can be used for evaluation of the integral∫

tg4xdx, where

trigonometric substitution from Example 3.3 fails (both sinx and cosx are in an even power).

Example 3.9.Use the substitutiont = tgx in the integral∫

sinx+cosxsinx−cosx

dx. Do not evaluate the

integral.

Solution:With this substitution we havex = arctgt and dx =1

t2+1dt. We have to convert sinx

and cosx to tgx and then tot. To do this we consider the right triangle on Figure 1. From this


1

√

t2+1

x

t

FIGURE 1. Triangle

triangle and from the definition of trigonometric functionsit is clear that sinx =t√

t2+1and

cosx =1√

t2+1. We put the substitution in the integral

∫

sinx+cosxsinx−cosx

dx =

∫

t√t2+1

+1√

t2+1t√

t2+1− 1√

t2+1

1t2+1

dt =

∫

t +1(t−1)(t2+1)

dt.

The last integral can be evaluated using partial fractions.

Example 3.10.Use the substitutionx = tgt in the integral∫

1

(x2 +1)32

dx. Do not evaluate this

integral.

Solution:With x = tgt we have dx =1

cos2 tdt and hence this substitution gives

∫

1

(x2+1)32

dx =

∫

1

(1+ tg2 t)32

1cos2 t

dt =

∫

1(

1+sin2 tcos2 t

)

32

1cos2 t

dt

=

∫

1(

sin2+cos2 tcos2 t

)

32

1cos2 t

dt =

∫

1(

1cos2 t

)

32

1cos2 t

dt

=∫

cost dt.

The integral int is very easy now.

Example 3.11. Substitute

√

xx−1

= t in the integral∫

x+

√

xx−1

x+1dx. Do not evaluate the

integral.Solution:From the relation betweenx andt we have

xx−1

= t2

x = t2(x−1)

t2 = (t2−1)x

t2

t2−1= x

4. RIEMANN INTEGRAL 122

Differentiating the last expression we have

2t(t2−1)− t22t(t2−1)2 dt = dx

and hence

− 2t(t2−1)2 dt = dx.

Now we can employ the substitution:

∫

x+

√

xx−1

x+1dx = −

∫

t2

t2−1+ t

t2

t2−1+1

2t(t2−1)2 dt.

Simplification of this integral is left to the reader.

4. Riemann integral

Definition (partition of the interval). Let [a,b] be a closed interval. Under apartition of theinterval [a,b] we understand the finite sequenceD = {x0,x1, . . . ,xn} of points from the interval[a,b] with property

a = x0 < x1 < x2 < x3 < · · · < xn−1 < xn = b.

Under anorm of the partition Dwe understand the maximal of the distances of two consecutivepoints in the partition. The norm of the partitionD is denoted byν(D). Henceν(D) = max{xi −xi−1,1≤ i ≤ n}.

Definition (integral sum). Let [a,b] be a closed interval andf be a function defined andbounded on[a,b]. Let D be a partition of the interval[a,b]. Let R = {ξ1, . . . ,ξn} be a finitesequence of points from the interval[a,b] satisfyingxi−1 ≤ ξi ≤ xi for i = 1..n (i.e. R containsone arbitrary number from each of the subintervals formed bythe partitionD). The sum

σ( f ,D,R) =n

∑i=1

f (ξi)(xi −xi−1)

is said to be anintegral sum of the function fassociated to thepartition D and thechoice of thenumbersξi in R.

Remark 4.1 (geometric interpretation of the integral sum). Suppose, for simplicity, that theEfunction f is nonnegative on the interval(a,b). Each term of the integral sum presents an area ofthe rectangle with the basis formed by the subinterval in thepartition and the height is equal tothe value of the functionf evaluated at the corresponding pointξ from this interval. The integralsum is equal to the area of the set formed from all this rectangles.

4. RIEMANN INTEGRAL 123

a = x0 x1 x2 x3 xn−1 b = xn

. . .

ξ1 ξ2 ξ3 ξn

FIGURE 2. Integral sum.

Definition (Riemann integral). Let [a,b] be a closed interval andf be a function defined andbounded on the interval[a,b]. Let Dn be a sequence of partitions of the interval[a,b] whichsatisfiesν(D) → 0 for n→ ∞ andRn be a sequence of the corresponding choices of numbersξfrom this interval. The functionf is said to beintegrable in the sense of Riemann on the interval[a,b] if there exists a real numberI ∈ R with property

limn→∞

σ( f ,Dn,Rn) = I

for every sequenceDn, which satisfies limn→∞

ν(Dn) = 0 and for arbitrary particular choice of the

pointsξ in Rn. The numberI is said to be aRiemann integral of the function f on the interval[a,b]. We write

I =∫ b

af (x)dx.

Remark 4.2 (verbal formulation of the preceding definition). Suppose, for simplicity, that theEfunction f is continuous on the interval[a,b]. The definition of the Riemann integral can beexpressed as follows:

(i) We divide the interval[a,b] into subintervals. In thei-th subinterval we choose anarbitrary numberξi and evaluate the integral sum.

(ii) In the next step we choose a refinement of the partition. We consider a new partitionof the interval[a,b] with a less norm than is the norm of the partition considered in theprevious step. Again we choose numbersξi and compute the corresponding integralsum.

(iii) We consider finer and finer partition of the interval[a,b] and perform the computationdescribed in the preceding step. The integral sums approaches to the valueI , which isthe value of the integral.

The continuity of the functionf guarantees also the integrability of this function (i.e. the conver-gence of the limit process). Remember that in the discontinuous case the proof of the integrabilitymay be a very hard work.

5. GENERAL THEORY OF THE RIEMANN INTEGRAL 124

Definition (extension of Riemann integral). For a > b we define∫ b

af (x)dx = −

∫ a

bf (x)dx.

Further we define∫ a

af (x)dx = 0.

Remark 4.3 (geometric interpretation). From the definition of the Riemann integral it is clear

that if the functionf is nonnegative on the interval[a,b], then the value of the integral∫ b

af (x)dx

equals to the area of the region in the plane bounded from above by the graph of the functionf (x), from below by thex-axis, from the left by the vertical linex = a and from the right bythe vertical linex = b. This area is called anarea under the curve y= f (x) for x ∈ [a,b] , orcurvilinear trapezoid.

5. General theory of the Riemann integral

Theorem 5.1(sufficient conditions for integrability).(i) The function which is continuous on[a,b] is integrable on[a,b] (in the sense of Riemann).(ii) The function which is bounded on[a,b] and contains at most finite number of discontinuities

on this interval is integrable (in the sense of Riemann).(iii) The function which is monotone on[a,b] is integrable (in the sense of Riemann).

Theorem 5.2(linearity with respect to the function). Let f , g be functions integrable on[a,b]and c be a real number. The following relations are valid.

∫ b

a[ f (x)+g(x)]dx =

∫ b

af (x)dx+

∫ b

ag(x)dx,

∫ b

ac f(x)dx = c

∫ b

af (x)dx.

Theorem 5.3(additivity with respect to the domain of the integration). Let f be a functionintegrable on[a,b]. Let c∈ (a,b) be arbitrary. Then the functions f is integrable on bothintervals[a,c] and[c,b] and

∫ b

af (x)dx =

∫ c

af (x)dx+

∫ b

cf (x)dx

holds

Theorem 5.4(monotonicity with respect to the function). Let f and g be functions integrable on[a,b]

such that f(x) ≤ g(x) for x∈ (a,b). Then∫ b

af (x)dx≤

∫ b

ag(x)dx holds.

Remark 5.1 (integral of the nonnegative function). For f ≡ 0 we obtain from the precedingtheorem the following result: Integral of the nonnegative function is a nonnegative number.

Theorem 5.5(mean value theorem). Let f be continuous on the closed interval[a,b]. There

exists at least one numberµ ∈ [a,b] with the property f(µ)(b−a) =∫ b

af (x)dx.

Definition (mean value). The valuef (µ) of the function f in the pointµ from the precedingtheorem is said to be amean value of the function f on the interval[a,b].

6. EVALUATION OF THE RIEMANN INTEGRAL 125

Remark 5.2 (geometric interpretation). From a geometric point of view it is clear that the areaof the curvilinear trapezoid under the curvey= f (x) with basis[a,b] and the area of the rectanglewith height equal to the mean value and basis[a,b] are the same.

6. Evaluation of the Riemann integral

Usually the only effective method how to evaluate the Riemann integral is the following theorem.This theorem converts the problem to find the Riemann integral into the problem to find anantiderivativeF of the functionf . Under suitable regularity conditions (continuity) the Riemannintegral can be calculated through the values of the antiderivative at the pointsa andb.

Theorem 6.1(Newton–Leibniz). Let f(x) be integrable in the sense of Riemann on[a,b]. Let EF(x) be a function continuous on[a,b] which is an antiderivative of the function f on theinterval (a,b). Then

∫ b

af (x)dx = [F(x)]ba = F(b)−F(a)

holds.

Example 6.1.Find∫ 2

1xln2xdx.

Solution:Integrating by parts we have

∫

xln2xdxu = ln2x u′ =

2x

lnx

v′ = x v=x2

2

=x2

2ln2x−

∫

xlnxdx

u = lnx u′ =1x

v′ = x v=x2

2

=x2

2ln2x−

[

x2

2lnx−

∫

x2

dx

]

=x2

2ln2x−

[

x2

2lnx− x2

4

]

=x2

4(2ln2x−2lnx+1),

where the constant of integration can be chosen arbitrary, we usedC = 0. The antiderivative iscontinuous on(0,∞) and we can use the Newton–Leibniz formula

∫ 2

1xln2xdx =

[

x2

4

(

2ln2x−2lnx+1)

]2

1

=44

(

2ln22−2ln2+1)

− 14

(

2ln21−2ln1+1)

= 2ln22−2ln2+34.

Remark 6.1 (trapezoidal rule). Let f be a continuous function on[a,b]. Let us cut the intervalE

[a,b] into n subintervals of theequallengthh =b−a

n. Let us denotex0, x1, . . . ,xn the boundary

points of these subintervals and byy0, y1, . . . , yn the corresponding values of the functionf

6. EVALUATION OF THE RIEMANN INTEGRAL 126

in these points. (In this notationa = x0 andb = xn.) The main idea of this method is that weapproximate the area of the region under the curvey = f (x) on the interval[xi ,xi+1] by the areaof the trapezoid with corners[xi,0], [xi ,yi ], [xi+1,0] and[xi+1,yi+1]. The sum of the areas of thesetrapezoids is an approximation of the integral, more precisely, the following theorem holds:

∫ b

af (x)dx≈ h

2

(

y0+2y1 +2y2+ · · ·+2yn−1 +yn

)

.

This formula works also for the function which changes sign.The error in this formula is smallfor

• largen, i.e., the partition is sufficiently fine• smallb−a, i.e., the interval is not too long• small| f ′′(x)| on (a,b).

Algorithm 6.1 (approximation of definite integral by the trapezoidal rule). Let us summarize thesteps for an approximation of the definite integral by trapezoidal rule.

(i) Choose somen and calculate the lengthh =b−a

nof subintervals.

(ii) Calculate the points in the partition of the interval[a,b]: x0 = a, x1 = a+h, x2 = a+2h,x3 = a+3h, . . . ,xi = a+ ih, . . . ,xn−1 = a+(n−1)h, xn = b.

(iii) Calculate the values of the function in the points of the partition: y0 = f (x0), y1 =f (x1), . . . ,yi = f (xi), . . . ,yn = f (xn). . . . ,yn = f (xn).

(iv) Calculate the sumS= y0 +2y1 +2y2+ · · ·+2yi + · · ·+2yn−1 +yn

(v) An approximation for the integral∫ b

af (x)dx is the number

hS2

whereS is the sum

obtained in the preceding step. Ifn is sufficiently large, then the error is small.

Example 6.2.Find∫ 2

1

sinxx

dx.

Solution:We use the trapezoidal rule withn = 10. The limits of the integral area = 1 andb = 2.

The length of the intervals ish =b−a

n=

2−110

= 0.1. We record our computations into the

following table (we round to 6 decimal places).

i xi = a+hi yi =sinxi

xim myi

0 1 0.841471 1 0.8414711 1.1 0.810189 2 1.6203772 1.2 0.776699 2 1.5533983 1.3 0.741199 2 1.4823974 1.4 0.703893 2 1.4077855 1.5 0.664997 2 1.3299936 1.6 0.624734 2 1.2494677 1.7 0.583332 2 1.1666648 1.8 0.541026 2 1.0820539 1.9 0.498053 2 0.99610510 2 0.454649 1 0.454649

The sum of the numbers in the last column isS= 13.184361 and hence∫ 2

1

sinxx

dx≈ hS2

=S20

=

0.659218.

7. APPLICATIONS IN GEOMETRY 127

An “exact” calculation based in the antiderivative and Newton-Leibniz theorem cannot be used

in this example, since the functionsinx

xhas no antiderivative in the class of elementary functions.

An application of more advanced techniques shows that the value of the integral rounded to 15decimal places isI

.= 0.659329906435512. We see that the trapezoidal rule producedin our

computation an error on the 4th decimal place.

7. Applications in geometry

A Riemann integral of a nonnegative continuous function is equal to the area of the curvilinearEtrapezoid under the graph of the function, as Remark 4.3 on the page 124 states. This followsimmediately from the definition of the Riemann integral. Another geometrical applications arethe following.

• The areaS of the region in the plane bounded from above by the curvey = f (x) andfrom below by the curvey = g(x) (we supposef (x) ≥ g(x) on [a,b]) can be calculatedfrom the formula

S=

∫ b

a[ f (x)−g(x)]dx. (7.1)

The signs of the functionsf andg in this formula can be arbitrary.• The volumeV of the solid formed by a revolution of the curvilinear trapezoid under

the curvey = f (x) for x = [a,b] about thex-axis can be calculated from the formula

V = π∫ b

af 2(x)dx.

The functionf is supposed to be nonnegative on[a,b].

x

y

x

y

x

y

x

S=∫ b

af (x)dx V = π

∫ b

af 2(x)dx

• Many other volumes and areas can be calculated by cutting thegiven region (solid)into several pieces which satisfy the above conditions. We calculate the area (volume)of each piece separately and then add (or subtract, if necessary) these areas (volumes).The area of the shaded region on the following picture is given by (7.1). The volumeof the solid which is constituted by a revolution of this shaded region about thex-axis

(see the figure) isV = π∫ a

b

[

f 2(x)−g2(x)]

dx

8. RIEMANN INTEGRAL ON THE LINE AND HALF-LINE 128

x

y

ab

f (x)

g(x)

x

y

x

y

S=

∫ b

a[ f (x)−g(x)]dx V = π

∫ a

b

[

f 2(x)−g2(x)]

dx

• Let f be a continuously differentiable function. The lengthL of the graphy = f (x)between the pointsa andb can be calculated from the formula

L =

∫ b

a

√

1+[ f ′(x)]2dx. (7.2)

• There are many other geometric, technical and physical applications. For particularapplications in your area of interest consult the literature.

8. Riemann integral on the line and half-line

In the following we extend the concept of Riemann integral for integration on the unboundedintervals like[1,∞), (−∞,∞) and so on. This is handy especially because of applications instatistics.

Definition (improper integral). Let a be a real number andf be a function integrable in thesense of Riemann on the interval[a, t] for everyt > a. Under animproperintegral

I =

∫ ∞

af (x)dx (8.1)

we understand the limit

I = limt→∞

∫ t

af (x)dx,

if this limit exist as a finite number. In this case the integral is said beconvergent. If the limitdoes not exists or equals±∞, then the integral is said to bedivergent.

The integral∫ a

−∞f (x)dx is defined in a similar way as the limit lim

t→−∞

∫ a

tf (x)dx.

The integral∫ +∞

−∞f (x)dx is defined as the sum of two integrals

∫ c

−∞f (x)dx+

∫ ∞

cf (x)dx where

c ∈ R is any real number, provided both integrals are convergent.It can be shown that theparticular value ofc has no influence to the value of the resulting integral.

Example 8.1.Find I =

∫ ∞

1

1x(x2 +1)

dx.

8. RIEMANN INTEGRAL ON THE LINE AND HALF-LINE 129

Solution:Substituting the upper limit byu we obtain

I = limu→∞

∫ u

1

1x(x2 +1)

dx.

Decomposition into partial fractions gives∫

1x(x2 +1)

dx =∫

1x− x

x2 +1dx = lnx− 1

2ln(x2 +1).

The evaluation of the integral by Newton–Leibniz formula gives∫ u

1

1x(x2 +1)

dx = lnu− 12

ln(u2+1)+12

ln(2)

and hence

I = limu→∞

[

lnu− 12

ln(u2+1)+12

ln(2)]

.

We add the terms with logarithms and evaluate the limit as a limit of continuous function withcontinuous “outside” component

I =12

ln2+12

ln(

limu→∞

u2

u2+1

)

=12

ln2+12

ln1 =12

ln2.

Example 8.2.Find I =∫ ∞

2

1xlnx

dx.

Solution:We write

I = limu→∞

∫ u

2

1xlnx

dx.

The indefinite integral satisfies∫

1xlnx

dx =∫ 1

x

lnxdx = ln | lnx|

and hence

I =

∫ ∞

2

1xlnx

dx = limu→∞

∫ u

2

1xlnx

dx = limu→∞

[

ln | lnu|− ln | ln2|]

= ∞

and the integral diverges.

CHAPTER VII

Differential Equations

1. Motivation

The basic problem of the integral calculus is the following:Given a function f(x) defined on the interval I, find the function y(x) definedon I which satisfies

y′(x) = f (x) (1.1)

for every x∈ I.

The solution to this problem is known:

y(x) =

∫

f (x)dx+C, (1.2)

where∫

f (x)dx is an arbitrary antiderivative of the functionf andC is an arbitrary constant.

A slight modification of this problem is:Given a function f(x) defined on the interval I and given numbersα ∈ I andβ ∈ R, find the function y(x) defined on I which satisfies(1.1) for every x∈ Iand y(α) = β .

The solution is also known: We solve the problem (1.1) and obtain (1.2). In the relation (1.2)we substitute (after evaluation of the integral, of course)x = α andy = β . Then we solve theobtained equation forC and substitute the value ofC into (1.2). The obtained function is asolution of (1.1) and satisfiesy(α) = β .

Example 1.1.Find the functiony which satisfies

y′ = 2x, y(1) = 2.

Solution:Integration of the equation givesy(x) =

∫

2xdx= x2+C. From the conditiony(1) = 2

we substitutex = 1 andy = 2 in the expressiony = x2 +C. We obtain

2 = 12+C

and henceC = 1. The solution of the problem is the functiony(x) = x2 +1.

In the following we slightly reformulate this simple problem. We will suppose that both variablesx andy appear on the right-hand side of (1.1). This assumption makes the problem very difficult.First let us formulate this new problem.

130

2. GENERAL ORDINARY DIFFERENTIAL EQUATIONS 131

2. General ordinary differential equations

Definition (ordinary differential equation). Let ϕ(x,y) be a function of two variables. Theequation

y′ = ϕ(x,y), (2.1)

is called thefirst order ordinary differential equation, explicitly solved for y′, shortlyordinarydifferential equationor ODE.

Definition (solution of ODE). Under asolutionof the ODE on the intervalI we understandevery functiony = y(x) which has the following properties

• y(x) is differentiable onI ,• the functionϕ(x,y(x)) is defined for allx∈ I ,• the relationy′ = ϕ(x,y(x)) holds for allx∈ I .

Definition (initial value problem). Let α, β be real numbers,α ∈ I . The problem to find thefunctiony = y(x) which satisfies equation (2.1) and aninitial condition

y(α) = β (2.2)

is called aninitial value problem(shortlyIVP).

The solution of the IVP (2.1), (2.2) is called aparticular solution.

The graph of the particular solution is called anintegral curve.

Remark 2.1. According to the definition, the ordinary differential equation is a relationshipbetween an unknown functiony and its derivativey′.

Remark 2.2. Equation (2.1) is sometimes considered in equivalent formsdydx

= ϕ(x,y)

or

dy = ϕ(x,y)dx.

Remark 2.3 (geometric interpretation, slope field). Consider equation (2.1). This equation as-signs to each point[x0,y0] in the plane a direction with the slopeϕ(x0,y0). Any solution ofequation (2.1) has a graph which has the direction (slope of the tangent) required by (2.1) at eachpoint. Roughly speaking, the differential equation states:

If an integral curve goes through the point[x0,y0], it follows necessarily thedirectionϕ(x0,y0) at this point.

The simplest way how to visualize this idea is to draw a short mark at several points to indicatethe direction associated with that point. This can be done ina systematic way on the computer.As a result, we obtain aslope fieldof the equation. This slope field has the property that the linearelements in this field are tangent to the integral curve whichpasses through the same point. Forsome slope fields see Appendix C on the page 156.The initial condition (2.2) requires that the graph of the particular solution goes through the point[α,β ].

3. ODE WITH SEPARATED VARIABLES 132

Example 2.1.The solution of equationy′ = x2+xypassing through the point[2,1] has the slope22+2.1 = 6 in this point. Hence the line

y = 6(x−2)+1

is a tangent to this integral curve in[2,1].

Remark 2.4 (general solution). Usually it is possible to write all functions which satisfy an Eordinary differential equation by a single formula containing some a real constant, sayC, whichcan take arbitrary values, i.e., there exists a functiony = y(x) either in an explicit form

y = y(x,C),

or in an implicit form

Ψ(y,x,C) = 0,

which solves (2.1) for everyC∈R. Such a function is called ageneral solution. Every (or almostevery) particular solution can be obtained from the generalsolution by convenient choice of theconstantC. Compare the basic problem of the integral calculus mentioned at the begin of thischapter.

Example 2.2.The function

y = Ce−x, C∈ R, (2.3)

satisfiesy′ = Ce−x(−1) = −Ce−x = −y. Hence the function (2.3) is a general solution of theequation

y′ = −y.

The particular solution of this equation satisfying the initial conditiony(0) = 3 isy = 3e−x.

Remark 2.5. Sometimes we speak about a particular solution also in the case when no initialcondition is given. In this case we understand under the termparticular solution any (arbitrary,but only one) solution of the equation.

In the following we will consider some special cases of differential equations. In these cases wewill be able to find an analytic formula for the solution.

3. ODE with separated variables

In this section we will study one of the simplest differential equations.

Definition (ODE with separated variables). Let f andg be continuous functions. The equation

y′ = f (x)g(y) (3.1)

is said to be anordinary differential equation with separated variables.

Example 3.1.The equation

y′−x−y = 0

is not an equation with separated variables. Really, an equivalent form of this equation isy′ =x+y and the right-hand side cannot be factored into the formf (x)g(y). On teh other hand, theequation

e−xy′ +ex+yy = 0


is an equation with separated variables. Really, an equivalent form of this equation is

y′ =−ex+yy

e−x

which can be written as

y′ = −yey ·e2x.

This equation is like (3.1).

Remark 3.1 (existence and uniqueness of the solution). If g(β ) 6= 0, then the solution of theEinitial value problem (3.1), (2.2) exists and is unique in some neighborhood of the pointα.

Algorithm 3.1 (equation with separated variables). Let us consider equation (3.1).(i) If the algebraic equationg(y) = 0 has solutionsk1, k2, . . . , kn, then the constant func-

tionsy≡ k1, y≡ k2, . . . ,y≡ kn are solutions of this equation.∗ In the following we willcompute the non-constant solutions.

(ii) We work under conditiong(y) 6= 0. We writey′ in the form of the quotientdydx

:

dydx

= f (x)g(y) (3.2)

(iii) With the quotientdydx

we work like with the usual fraction. Using the usual algebraic

operations we separate the variables, i.e., we convert the equation into the form whichhas just one variable on each side of the equality sign. We obtaindy

g(y)= f (x)dx. (3.3)

(iv) We integrate equation (3.3):∫

dyg(y)

=

∫

f (x)dx (3.4)

There is an integral in the variabley on the left and an integral in the variablex on theright. We evaluate these integrals and consider also a constant of integration,C∈ R, inone of these integrals.

(v) The relation obtained after integration defines the general solution in its implicit form.Customarily we convert this solution into the explicit form, if it is possible.

(vi) Sometimes it may happen that the constant solutions obtained in the first step of this al-gorithm can be included in the general solution for some special choice of the constant.If this situation appears, we do this inclusion.

(vii) If an initial condition is given, we find the value of theconstantC for which this con-dition is satisfied. We use this value in the general solutionand obtain the particularsolution.

Example 3.2.Solve IVP

y2−1+yy′(x2−1) = 0, y(0) = 2.

∗In these solutions the uniqueness of the solution of the IVP may be broken. Ify ≡ ki is such a solution, it iscalled asingularsolution.


Solution:The equation can be written in the form

y2−1 = yy′(1−x2),

y′ =y2−1

y1

1−x2 .

The equation has separated variables and it is meaningful for x 6=±1 andy 6= 0. These conditionare guaranteed by the initial conditiony(0) = 2. We separate variables

dydx

=y2−1

y1

1−x2 ,

yy2−1

dy =1

1−x2 dx, integrate∫

ydyy2−1

=

∫

dx1−x2 , evaluate integrals

and obtain12

ln |y2−1| = 12

ln∣

∣

∣

1+x1−x

∣

∣

∣+c, c∈ R.

According to the initial condition we considery2−1 > 0 (since 22−1> 0) and1+x1−x

> 0 (since

1+01−0

> 0). We can remove the absolute values.

12

ln(y2−1) =12

ln1+x1−x

+c,

ln(y2−1) = ln1+x1−x

+2c,

ln(y2−1) = ln1+x1−x

+ lne2c, we add logarithms

ln(y2−1) = ln(1+x

1−xe2c

)

.

Since the function ln(·) is one-to-one we remove the logarithms

y2−1 =1+x1−x

e2c, rename the constant

y2 = 1+C1+x1−x

, C = e2c ∈ R+.

This is a solution in the implicit form. We substitutex = 0 andy = 2 from the initial condition

22 = 1+C1+01−0

and solve forC. We obtainC = 3. Hence

y2 = 1+31+x1−x

=4+2x1−x

is a solution of our initial value problem. An explicit form of this solution is

y =

√

4+2x1−x


since we considery > 0.

Example 3.3.Solve equation

3xy2y′ = (y3−1)(x3−1).

Solution:The equation is an equation with separated variables, sinceit can be written in the form

y′ =y3−13y2 · x3−1

x.

The equation is meaningful forx 6= 0 andy 6= 0. Fory = 1 the right-hand side equals zero andhence the constant functiony≡ 1 is a solution of this equation. We continue with separationofthe variables fory 6= 1

3y2

y3−1dy =

x3−1x

dx

and after integration we obtain

ln |y3−1| = x3

3− ln |x|+c, c∈ R.

This is a general solution of the equation. We simplify this equation and try to convert it into itsexplicit form. We write both sides as logarithms

ln |y3−1| = ln(

ex3/3 1|x| ec

)

and remove the logarithms

|y3−1| = ex3/3 1|x| ec.

By omitting the absolute values both sides of the equation can differ in the sign. We include thissign to the constant factorec

y3−1 = (±ec)ex3/3 1x

and rename the constantC = ±ec ∈ R\{0}. We obtain

y3−1 =Cx

ex3/3. (3.5)

For C = 0 this formula contains the constant solutiony = 1. Hence we can supposeC ∈ R

arbitrary. We keep the solution in the last form, but remember, that it can be solved fory andhence the solution can be converted into its explicit form.

Remark 3.2(solution of the IVP through definite integral). Particular solution of the initial valueproblem (3.1), (2.2) can be written in the form of the definiteintegral

∫ y

β

dtg(t)

=∫ x

αf (t)dt. (3.6)

Remark that we rename the variable in the integral. This is necessary, sincex andy are used forupper limits in the integrals.

4. FIRST ORDER LINEAR ODE 136

4. First order linear ODE

In the following we will consider that the functionϕ(x,y) in (2.1) 1is linear with respect to thevariabley. More precisely, we will study the following equation.

Definition (first order linear ODE). Let a, b be continuous function onI . The equation

y′ +a(x)y = b(x) (4.1)

is said to be thefirst order linear ordinary differential equation(shortlyLDE). If b(x) ≡ 0 onI ,then the equation is calledhomogeneousandnonhomogeneousotherwise.

Definition (associated homogeneous equation). Consider nonhomogeneous equation (4.1).Homogeneous equation

y′ +a(x)y = 0 (4.2)

with the left-hand side identical with (4.1) is said to be ahomogeneous equation associated withthe nonhomogeneous equation(4.1).

Example 4.1.Equations

y′−2yln(x) =sinx

xand y′ = y+x

are linear. Equations

y′−y2 = x2 and yy′ = x2

are not linear. The linearity is broken because of the presence of the squarey2 in the first equationand the productyy′ in the second one.

Remark 4.1(existence and uniqueness of the solution). Givenα ∈ I andβ ∈R, every IVP (4.1), E(2.2) possesses a unique solution defined on the intervalI .

Remark 4.2(trivial solution of homogeneous LDE). Homogeneous LDE possesses the constantEsolutiony(x) = 0 for an arbitrary coefficienta(x). This fact can be proved by a direct substitution.The solutiony(x) = 0 is called atrivial solution and can be obtained by the initial conditiony(α) = 0 for α arbitrary.

The form of the LDE is somewhat special. As a consequence of this property, the set of allsolutions is somewhat special as well. The following theorem holds.

Theorem 4.1(general solution of LDE). Consider nonhomogeneous LDE(4.1)and associatedhomogeneous LDE(4.2).

• If yp(x) is a particular solution of nonhomogeneous LDE and y0(x,C) is a generalsolution of the associated homogeneous LDE, then the function

y(x,C) = yp(x)+y0(x,C) (4.3)

is a general solution of nonhomogeneous LDE.• If yp(x) is a particular solution of nonhomogeneous LDE and y0p(x) a nontrivial

particular solution of the associated homogeneous LDE, then the function

y(x,C) = yp(x)+Cy0p(x) (4.4)

is a general solution of nonhomogeneous LDE.


According to the preceding theorem, it is sufficient to find only one particular solution of thenonhomogeneous equation and either a general solution of the associated homogeneous equationor at least one nontrivial solution of this equation. From these functions and from the precedingtheorem we can set up the general solution of the nonhomogeneous equation. Both of theseparticular problems will be solved in the next paragraphs.

4.1. Homogeneous LDE.Homogeneous LDE

y′ +a(x)y = 0. (4.5)

is an equation with separated variables. Reallydydx

= −a(x)y

and fory 6= 0 we havedyy

= −a(x)dx,

ln |y| = −∫

a(x)dx+c, c∈ R.

We convert the right-hand side into a logarithmic form

ln |y| = ln(

e−∫

a(x)dx ·ec)

and remove logarithms

y = Ce−∫

a(x)dx, C = ±ec ∈ R\{0}.ForC = 0 we obtain the trivial solutiony≡ 0 and hence the general solution is

y(x,C) = Ce−∫

a(x)dx, C∈ R. (4.6)

4.2. Nonhomogeneous equation – variation of the constant.If yp0(x) is a nontrivial par- Eticular solution of the homogeneous LDE, then the general solution of this equation isy(x,C) =Cyp0(x). We will look for the particular solution of the nonhomogeneous equation in the form

yp(x) = K(x)yp0(x), (4.7)

whereK(x) is a differentiable function. Since this function replacesthe constant in the formulafor general solution of the nonhomogeneous equation, this method is calledvariation of theconstant. We claim thatyp(x) is a solution of (4.1) and we have to findK(x). Differentiating(4.7) we obtain

y′p(x) = K′(x)yp0(x)+K(x)y′p0(x),

and substitution into (4.1) yields

K′(x)yp0(x)+K(x)y′p0(x)+a(x)K(x)yp0(x) = b(x),

and

K′(x)yp0(x)+K(x)[

y′p0(x)+a(x)yp0(x)]

= b(x).

Sinceyp0(x) is a solution of the homogeneous LDE, the expression in brackets vanishes andhence

K′(x)yp0(x) = b(x). (4.8)


We solve the last equation forK′(x) and after integration we obtainK(x). Substitution of thefunction K(x) into (4.7) gives a particular solution of the nonhomogeneous equation and byTheorem 4.1 we can write the general solution of the nonhomogeneous equation (4.1). If weconsider the constant of integration,C, when evaluating the functionK(x), then the formula (4.7)yields immediately the general solution of nonhomogeneousequation.Usually we perform the variation of constant for each particular equation. However, we can usethis method also for general functionsa(x) andb(x) in (4.1) which yield the following formula.

Theorem 4.2(explicit formula for the general solution of LDE). The function

y(x,C) = e−∫

a(x)dx[

∫

b(x)e∫

a(x)dx dx+C]

=

∫

b(x)e∫

a(x)dx dx+C

e∫

a(x)dx, C∈ R (4.9)

is a general solution of(4.1). In this notation each of the integrals denotes one of the an-tiderivatives, i.e. the constants of integration are not considered (= can be arbitrary, but fixed).

Algorithm 4.1. The nonhomogeneous first order linear differential equation can be solved in thefollowing steps.

(i) We write the associated homogeneous LDE, i.e., we replace the right-hand side ofequation (4.1) by zero.

(ii) We solve this homogeneous equation by separation of thevariables or by formula (4.6).We obtain a general solution of the homogeneous equation.

(iii) We look for a particular solution of nonhomogeneous equation in the form (4.7), i.e.,we replace the constant in the general solution of the associated homogeneous equationby the functionK(x).

(iv) We evaluate the derivativey′p0(x) and substitute this derivative and the expression (4.7)into nonhomogeneous equation.

(v) We solve the obtained equation forK′(x).(vi) We integrate the functionK′(x) and obtainK(x). The constant of integration can be

taken arbitrary, usually 0.(vii) We substitute the functionK(x) into relation (4.7) and obtain the particular solution of

the nonhomogeneous LDE.(viii) We use formula (4.3) and obtain the general solution of the nonhomogeneous equation.

(ix) If the IVP is solved, we determine the value of the constant C for which the initialcondition is satisfied and put this value in the general solution.

Example 4.2.Solve

y′ = 1+3ytgx. (4.10)

Solution:We write the equation in the form

y′−3ytgx = 1. (4.11)

It is a nonhomogeneous LDE. We write the associated homogeneous LDE first

y′−3ytgx = 0. (4.12)

Fory 6= 0 we separate variables

dyy

= 3tgxdx

and integrate

ln |y| = −3ln|cosx|+c c∈ R.


We remove logarithms and absolute value, rename the constant and include the trivial solutioninto the obtained formula. This yields the general solutionof the homogeneous equation in theform

yGH(x) = Ccos−3x, C∈ R.

We continue with nonhomogeneous equation. We look for the particular solution in the form

yPN(x) = K(x)cos−3x. (4.13)

Differentiation of this formula gives

y′PN(x) = K′(x)cos−3x+K(x)3cos−4xsinx

and substitution into (4.11) gives

K′(x)cos−3x+K(x)3cos−4xsinx−3K(x)cos−3xtgx = 1.

After simplification we obtain

K′(x)cos−3x = 1

(which can be equivalently obtained directly from (4.8)) and hence

K′(x) = cos3x.

We integrate this equation

K(x) =∫

cos3xdx =∫

(1−sin2x)cosxdx = sinx− sin3x3

,

where the substitution sinx = t has been used. We substitute the functionK(x) into (4.13) andobtain the particular solution of (4.10) in the form

yPN(x) =sinx

cos3x− sin3x

3cos3x.

We add this solution and the general solution of the associated homogeneous equation and obtainthe general solution of (4.10) in the form

y(x) = yGH(x)+yPN(x) =sinx

cos3x− sin3x

3cos3x+

Ccos3x

, C∈ R.

Example 4.3.Solve the equation

xy′ +y = xln(x+1). (4.14)

Solution:An equivalent form of equation (4.14) is

y′ +1x

y = ln(x+1). (4.15)

This shows that the equation is linear witha =1x

andb = ln(x+1). The solution exists on each

subinterval of the interval(−1,∞) not containing the number 0. The general solution of theassociated homogeneous equation

y′ +1x

y = 0

is given by the formula

yGH(x) = Ce−∫

a(x)dx = Ce− ln |x| =C|x| =

Kx

,

5. SECOND ORDER LINEAR ODE 140

whereK = ±C ∈ R is a new constant. (Using this constant we can remove the absolute value,which would be unpleasant in the forthcoming integration.)We look for the particular solution of nonhomogeneous equation (4.15) in the formyPN(x) =

K(x)1x

. The differentiation and substitution into (4.15) (or simply application of (4.8)) gives

K′(x)1x

= ln(x+1)

and hence

K′(x) = xln(x+1).

Integrating by parts we have

K(x) =x2

2ln(x+1)− x2

4+

x2− 1

2ln(x+1)

and substitution into the particular solution gives

yPN(x) = K(x)1x

=x2

ln(x+1)− x4

+12− 1

2xln(x+1)

The general solution of equation (4.14) is

y(x) = yPN(x)+yGH(x) =x2

ln(x+1)− x4

+12− 1

2xln(x+1)+

Cx, C∈ R

The equation is solved.

5. Second order linear ODE

Definition (second order linear differential equation). Let p, q and f be functions continuouson the intervalI . The equation

y′′+ p(x)y′ +q(x)y = f (x) (5.1)

is said to be asecond order linear differential equation. Under asolutionof this equation weunderstand every function which has the second derivative on the intervalI and satisfies (5.1)for everyx∈ I .

Definition (initial value problem). Let α ∈ I be a number from the intervalI and β , γ bearbitrary real numbers. The problem to find the solution of equation (5.1) which satisfiesinitialconditions

{

y(α) = βy′(α) = γ

(5.2)

is called aninitial value problemfor equation (5.1), shortly IVP.

The solution of the IVP is called aparticular solution.


Definition (special types of 2nd order LDE). Equation (5.1) is said to behomogeneousif f (x) =0 for all x∈ I andnonhomogeneousotherwise.

Equation (5.1) is said to be anequation with constant coefficientsif the functionsp(x) andq(x)are constant functions.

Definition (general solution). There exists a single formulay = y(x,C1,C2) such that everysolution of equation (5.1) can be obtained from this formulafor a convenient choice of the realnumbersC1, C2. This formula is called ageneral solutionof equation (5.1)

Remark 5.1 (existence and uniqueness of the solution). Every IVP for equation (5.1) has aEunique solution defined on the intervalI .

Remark 5.2 (trivial solution). The constant functiony(x) ≡ 0 is solution of the homogeneousequation

y′′ + p(x)y′ +q(x)y = 0,

as can be verified by direct substitution. This solution is called a trivial solutionof the homoge-neous equation.

Definition (associated homogeneous equation). Consider nonhomogeneous equation (5.1).Homogeneous equation

y′′+ p(x)y′ +q(x)y = 0. (5.3)

with the left-hand side identical with equation (5.1) is called ahomogeneous equation associatedto the nonhomogeneous equation(5.1).

Like in the case of the first order linear differential equation, there exists a relationship betweenthe solution of nonhomogeneous equation and the solution ofthe associated homogeneous equa-tions.First let us study the homogeneous equation (5.3).

5.1. Homogeneous equation.Concerning the second order linear differential equation,wewill show that the set of all solutions forms a kind of vector space. First let us introduce theconcept of linear independence of functions.

Definition (linear (in-)dependence of functions). Let y1 and y2 be functions defined onI .Suppose that neithery1 nory2 equal zero everywhere onI . The functionsy1 andy2 are said to belinearly dependenton the intervalI if there exists a numberk∈R such that eithery1(x) = ky2(x)or y2(x) = ky1(x) holds for allx∈ I . In the opposite case the functions are said to belinearlyindependent.

Theorem 5.1(general solution of homogeneous equation). Let y1 and y2 be linearly indepen-dent nontrivial solutions of the homogeneous equation(5.3). The function y defined by therelation

y(x,C1,C2) = C1y1(x)+C2y2(x), C1 ∈ R, C2 ∈ R (5.4)

is general solution of(5.3)on the interval I.


Definition (fundamental system of solutions). The pairy1 andy2 of the functions from thepreceding theorem is called afundamental system of the solutionsof equation (5.3).

Unfortunately, in general, we can find the fundamental system of solutions only in the case ofsecond order LDE with constant coefficients.Theorem 5.2(general solution of the equation with constant coefficients). Let us consider thehomogeneous second order ODE with constant coefficients

y′′ + py′ +qy= 0, p,q∈ R (5.5)

and the quadratic auxiliary equation

z2 + pz+q = 0 (5.6)

with unknown z.

• If z1,z2 ∈ R are mutually different real zeros of the auxiliary equation(5.6), we put

y1(x) = ez1x and y2(x) = ez2x.

• If z1 ∈ R is a double real zero of the auxiliary equation(5.6), we put

y1(x) = ez1x and y2(x) = xez1x.

• If z1,2 = α ± iβ 6∈ R are complex zeros of the auxiliary equation(5.6), we put

y1(x) = eαxcos(βx) and y2(x) = eαxsin(βx).

The functions y1 and y2 form a fundamental system of solutions of equation(5.5). A generalsolution of equation(5.5) is

y(x,C1,C2) = C1y1(x)+C2y2(x), C1 ∈ R, C2 ∈ R.

Example 5.1. • The auxiliary equation of the differential equation

y′′ +y′−6y = 0

is z2+z−6 = 0. This auxiliary equation possesses two real rootsz1 = 2 andz2 = −3.Hence the general solution of this equation isy = C1e2x +C2e−3x, whereC1,C2 arearbitrary real numbers.

• The auxiliary equation of the differential equation

y′′ +4y = 0

is z2+4 = 0. This equation possesses two complex conjugate zerosz1,2 = ±2i. Hencethe general solution of this equation isy = C1sin(2x)+C2cos(2x), whereC1,C2 arearbitrary real numbers.

5.2. Nonhomogeneous equation.The following theorem suggests a method how to find ageneral solution of the 2nd order LDE. This theorem shows that this general solution has a veryspecial form — we can write this solution from three special functions, as formula (5.7) shows.

Theorem 5.3(general solution of nonhomogeneous second order LDE). Let y1(x) and y2(x)be fundamental system of solutions of the homogeneous LDE(5.3) and yp(x) be an arbitraryparticular solution of the nonhomogeneous LDE(5.1). Then the function

y(x,C1,C2) = C1y1(x)+C2y2(x)+yp(x), C1 ∈ R, C2 ∈ R (5.7)

is a general solution of the nonhomogeneous LDE(5.1).


According to the preceding theorem, the situation for the second order LDE is almost the sameas in the case of the first order LDE: It is sufficient to solve∗ the associated homogeneous LDEand to find only one solution of the nonhomogeneous equation.Then we can use the precedingtheorem and obtain the general solution of the nonhomogeneous equation.We focus our attention to the equations with constant coefficients only, since only in this case wecan solve the associated homogeneous equation (Theorem 5.2). In this case it remains to find aparticular solutionyp(x)

Theorem 5.4(solution of the nonhomogeneous LDE with constant coefficients). Consider thesecond order LDE with constant coefficients

y′′ + py′ +qy= f (x). (5.8)

Let y1(x) and y2(x) be a fundamental system of solutions of the associated homogeneous equa-tion (5.5) from Theorem 5.2. Let A(x) and B(x) be differentiable functions with derivativesA′(x) and B′(x) which satisfy the system of equations

A′(x)y1(x)+B′(x)y2(x) = 0,

A′(x)y′1(x)+B′(x)y′2(x) = f (x).(5.9)

The function yp(x) defined by the relation

yp(x) = A(x)y1(x)+B(x)y2(x) (5.10)

is a particular solution of(5.8). The function

y(x) = C1y1(x)+C2y2(x)+yp(x), C1 ∈ R,C2 ∈ R,

is a general solution of(5.8).

Remark 5.3 (Cramer’s rule, wronskian). The linear system (5.9) possesses always a uniqueEsolutionA(x), B(x). We can solve it by an arbitrary method for solving the systemof linearequations. Note that in this case the coefficients and the unknowns in the system are not realnumbers but functions. However, everything what we said forthe systems with real coefficientsextends also for the system (5.9). Among others, the system can be solved by the Cramer’s rule.The determinant of the coefficient matrix is

W =

∣

∣

∣

∣

y1(x) y2(x)y′1(x) y′2(x)

∣

∣

∣

∣

.

This determinant is called awronskian of the functions y1(x) and y2(x) and it is always nonzero.With the auxiliary determinants

W1 =

∣

∣

∣

∣

0 y2(x)f (x) y′2(x)

∣

∣

∣

∣

, W2 =

∣

∣

∣

∣

y1(x) 0y′1(x) f (x)

∣

∣

∣

∣

,

we obtain the formula for unknownsA′(x) andB′(x) in the form

A′(x) =W1

W, B′(x) =

W2

W.

The Cramer’s rule is very effective especially when the fundamental system of the solutions isformed by the functions sinβx and cosβx.

Algorithm 5.1. Nonhomogeneous second order LDE with constant coefficientscan be solved inthe following steps.

∗= to find a general solution


(i) First, we consider the associated homogeneous LDE. We solve the auxiliary equationand find the fundamental system of solutions and the general solution of this homoge-neous equation by Theorem 5.2 .

(ii) We look for a particular solution of the nonhomogeneousequation in the form (5.10).We identify the right-hand sidef (x) and build up the system (5.9).

(iii) We solve (5.9) for the derivativesA′(x), B′(x). We integrate these derivatives and obtainA(x), B(x). We consider no constant of integration. These constants may be arbitrary,usually 0.

(iv) We put the functionsA(x), B(x) into (5.10) and obtain the particular solution of thenonhomogeneous equation. Finally, we obtain a general solution of this equation fromthe formula in Theorem 5.4.

(v) If the initial conditions are given, we calculate the derivativey′ of the general solutionand put the corresponding values forx, y andy′. We obtain the system of linear equa-tions with unknownsC1, C2. We solve this system for the unknowns and put the valuesof these unknowns into the general solution.

Example 5.2.Solvey′′−4y′ +4y = e−x.Solution:The associated homogeneous equation is

y′′−4y′+4y = 0

with auxiliary equation

z2−4z+4 = 0.

The auxiliary equation has double rootz1,2 = 2. In this case the fundamental system of solutionsis (see Theorem 5.2)y1(x) = e2x, y2(x) = xe2x. Let us seek a particular solution of the nonhomo-geneous equation in the formy = A(x)y1(x)+B(x)y2(x). We will use Theorem 5.4. Derivativesof the functions from the fundamental system arey′1(x) = 2e2x andy′2(x) = e2x(1+2x). Systemfor coefficientsA′, B′ (we omit the argumentx for brevity) takes the form

A′e2x +B′xe2x = 0,

2A′e2x +B′(1+2x)e2x = e−x.

Dividing both equations by the factore2x we obtain

A′+B′x = 0,

2A′+B′(1+2x) = e−3x.

Solution of this system isB′ = e−3x andA′ = −xe−3x. By integration we obtainA andB:

A(x) =

∫

A′dx =

∫

−xe−3x dx =13

xe−3x− 13

∫

e−3x dx =

=13

xe−3x +19

e−3x,

B(x) =∫

B′dx =∫

e−3x dx = −13

e−3x

and the particular solution of the nonhomogeneous equationis

yp(x) = A(x)y1(x)+B(x)y2(x) =(1

3xe−3x +

19

e−3x)

e2x− 13

e−3xxe2x =19

e−x.


General solution of the nonhomogeneous equation is

y = C1e2x +C2xe2x +19

e−x, C1,2 ∈ R.

CHAPTER VIII

Difference Equations

1. Difference calculus

In this chapter we will study sequences of real numbers. Recall that the sequence of real numbersis every functiona : N0 → R. The arguments of this function we write usually as indexes and wearrange the values in the “sequence”(an) = (a0,a1,a2,a3, . . . ,ai, . . .).

Remark 1.1 (important sequences). The most important sequences are the arithmetic and thegeometric sequences.

• Each term in thearithmetic sequencecan be obtained from the preceding one by addingsome fixed valued, called difference. For example, the sequence(−2,1,4,7, . . .) is anarithmetic sequence with differenced = 3. The formula for a general terman of thearithmetic sequence with the first terma0 and the differenced is

an = a0+nd.

• Each term in thegeometric sequencecan be obtained from the preceding one by mul-tiplication by the fixed nonzero valueq called quotient. As a particular example, the

sequence(

84,21,214

,2116

, . . .)

is a geometric sequence with quotient14

. The formula

for a general terman of the geometric sequence with quotientq and the first terma0 is

an = a0qn.

The fact whether a sequence is increasing or decreasing can be recognized from the differenceof two consecutive terms. This is the motivation for the following definition.

Definition (difference of a sequence). Let an, an+1 be two consecutive terms of the sequence(an). The differencean+1−an is called a(forward) difference of the sequence(an) at the pointn and denoted by∆an. Hence

∆an = an+1−an. (1.1)

Remark 1.2. The letter∆ is a capital Greek letter “delta”.

Example 1.1.Consider the sequence(an) = (n2). The difference∆an is given by the relation

∆an = (n+1)2−n2 = n2+2n+1−n2 = 2n+1

for all n∈ N0.

2. General difference equations

According to Chapter VII, a differential equation is a relationship between an unknown functiony and the derivative of this functiony′. Recall that the derivative of a function is a quantitywhich describes, how fast the functiony changes. An information about changes of terms in a

146

2. GENERAL DIFFERENCE EQUATIONS 147

sequence is included into its differences. Hence it is natural to expect that we will be interestedin a relationship between a given sequence and its difference.

Definition (difference equation of the first kind). Let f be a real function. The equation

∆yn = f (n,yn) (2.1)

is called afirst order difference equation of the first kind.

Let us rewrite the difference∆yn into yn+1−yn

yn+1−yn = f (n,yn)

and let us solve the resulting equation foryn+1. We get

yn+1 = f (n,yn)+yn.

This is a recurrence equation for the terms of the sequence(yn). Giveny0, it is possible after afinite number of steps to findyn for arbitraryn. This relation is equivalent to (2.1) and it is calleda difference equation of the second kind.

Definition (difference equation of the second kind). Let ϕ be a real function of two variables.The equation

yn+1 = ϕ(n,yn). (2.2)

is called afirst order difference equation of the second kind, shortly adifference equationor ∆E

Definition. Under asolutionof equation (2.2) we understand every sequence defined onN0which satisfies (2.2) for alln∈ N0.

Our principal problem will be to find an analytic formula for ageneraln-th term of the solution.(Remember that if we have to find for giveny0 the firstn terms of the solution, it is sufficient tousen-times formula (2.2).)The problem to find a solution of (2.2) has infinitely many solutions (since there are infinitelymany of possibilities for the starting valuey0). If there is a formula containing a real variableC,say

yn = y(n,C),

such that for everyC this formula defines a solution of (2.2), then this formula iscalled ageneralsolutionof equation (2.2).

Definition (particular solution). The problem to find the solution of (2.2) which for givennumbersα ∈ N0 andβ ∈ R satisfies aninitial condition

yα = β (2.3)

is called aninitial value problemfor equation (2.2), shortly IVP. The solution of this IVP iscalled aparticular solution.

The particular solution can be usually obtained from the general solution for a convenient choiceof the constantC (like for differential equations).

3. LINEAR DIFFERENCE EQUATION 148

3. Linear difference equation

Usually the problem to find a general solution of a differenceequation is more difficult thanthe problem to find a general solution of a differential equation. One of important classes ofdifferential equations which can be solved analytically isintroduced in the following definition.

Definition (first order linear∆E with constant coefficients). Let q ∈ R \ {0}. The differenceequation

yn+1−qyn = f (n) (3.1)

is called afirst order linear difference equation with constant coefficient, shortly L∆E.

This equation is said to behomogeneousif f (n) ≡ 0 onN0 andnonhomogeneousotherwise.

Remark 3.1. Note the sign “−” forerunning the coefficientq. This sign is only technical and itallows to formulate the later results in a relatively simpler way.

The homogeneous difference equation with the left-hand side identical to the left-hand side ofthe nonhomogeneous equation (3.1) plays a crucial role whensolving (3.1).

Definition. The first order homogeneous difference equation

yn+1−qyn = 0 (3.2)

is said to be ahomogeneous equation associated with equation(3.1).

Equation (3.2) is a recurrence equationyn+1 = qyn for the coefficients of geometric sequence andthe general solution of this equation isyn = Cqn, whereC is an arbitrary real constant.

Remark 3.2 (trivial solution). The zero sequenceyn = 0 for all n ∈ N0 is a solution of thehomogeneous equation, as can be verified by direct substitution. This solution is called atrivialsolution.

The relationship between the homogeneous and nonhomogeneous equations is the following.

Theorem 3.1(general solution of nonhomogeneous equation). Let yn be a particular solutionof the nonhomogeneous linear equation(3.1) and Yn(C) = Cqn be a general solution of theassociated homogeneous equation. Then a general solution yn(C) of nonhomogeneous equation(3.1) is the function

yn(C) = yn+Yn(C) = yn +Cqn. (3.3)

According to the last theorem it is sufficient to find the particular solution of nonhomogeneousequation – hence the situation is about the same as for lineardifferential equations.To find this particular solution in some special cases we introduce a method of “educated guess-ing” of the particular solution based on the following theorem.


Theorem 3.2(particular solution). Let ρ be a real number and Ps(n) an s-degree polynomialof the variable n. Consider the first order linear differenceequation with constant coefficient

yn+1−qyn = ρnPs(n). (3.4)

A particular solution of this equation can be found in the form

yn = ρnQs(n)nr , (3.5)

where Qs(n) is an s-degree polynomial of variable n. In this notation r= 1 if ρ = q andr = 0 otherwise. The coefficients of the polynomial can be found bysubstitution(3.5) withundetermined coefficients into(3.4).

Example 3.1.Solve∆E

∆yn = yn−2n−3.

Solution:We convert equation into the form without differences

yn+1−yn = yn−2n−3,

yn+1−2yn = −2n−3. (3.6)

The associated homogeneous equation is equation

yn+1−2yn = 0 (3.7)

with general solutionYn = K ·2n. The right-hand side of (3.6) satisfies the assumptions of Theo-rem 3.2 forρ = 1 ands= 1. The particular solution from Theorem 3.2 is

yn = 1n(an+b) = an+b.

In this notationyn+1 = a(n+1)+b= an+a+b holds and hence after substitution into (3.6) wehave

an+a+b−2(an+b) = −2n−3.

This holds for everyn∈ N0 if and only if the coefficients of the corresponding powers ofn arethe same. From here we obtain the system

−a = −2,

a−b = −3,

with solutiona = 2, b = 5. General solution of the equation is the sequenceyn = Yn + yn =K ·2n+2n+5.

Example 3.2.Solve IVP

yn+1 +2yn = (−2)n(n−1) (3.8)

with initial conditiony1 = −2.Solution:The equation is the first order linear difference equation. The associated homogeneousequation is

yn+1 +2yn = 0

with general solution

Yn = C(−2)n, C∈ R.


The right-hand side of the equation satisfies the assumptions of Theorem 3.2 forρ = −2 ands= 1. Sinceρ = a, we taker = 1. Particular solution from Theorem 3.2 is

yn = (−2)nn(an+b) = (−2)n(an2+bn)

and foryn+1 we have (after simplification)

yn+1 = (−2)n+1(n+1)[a(n+1)+b] = (−2)n+1(an2+2an+bn+a+b).

Substitution into (3.8) yields:

(−2)n+1(an+2an+bn+a+b)+2(−2)n(an2+bn) = (−2)n(n−1).

We look for the conditions ona andb which guarantee that this relation holds for everyn∈ N0.We divide equation by the factor(−2)n

(−2)(an+2an+bn+a+b)+2(an2+bn) = n−1

and simplify. Remark that the terms containing the powern2 cancel. We obtain equation

−4an−2a−2b = n−1.

From the coefficients at the powersn1 andn0 we write the system

−4a = 1,

−2a−2b = −1,

with solutiona = −14

, b =34

. The particular solution is a sequenceyn = (−2)nn(

−14

n+34

)

.

The general solution is

yn = Yn+yn = C(−2)n+(−2)nn(

−14

n+34

)

=14(−2)n(4C−n2 +3n).

We have to determine the value of the constantC. From the initial condition we substituten = 1andyn = −2 and obtain the equation

−2 =14(−2)(4C−1+3)

with solutionC =12

. Solution of the initial value problem is the sequence

yn =14(−2)n(2−n2+3n).

Remark 3.3. Let us explore what happens if we would seek the particular solution in Example3.2 in the (incorrect) form

yn = (−2)n(an+b). (3.9)

After evaluation ofyn+1 and substitution into (3.8) we obtain

−2(−2)n(an+a+b)+2(−2)n(an+b) = (−2)n(n−1),

which can be simplified into

−2a = n−1.


From the coefficients at the powers ofn we obtain system

−2a = −1,

0 = 1,

which has no solution. Hence no solution of (3.8) can be foundin form (3.9).

APPENDIX A

Graphs of basic elementary functions

In the following we present graphs of the most frequently used basic elementary functions.

y

1

2

3

x-3 -2 -1 1 2

y = x2

y

x

1

1

y =√

x

y

-3

-2

-1

1

2

x-3 -2 -1 1 2

y = x3

y

1

2

3

x-3 -2 -1 1 2

y = x4

y

-3

-2

-1

1

2

x-3 -2 -1 1 2

y =1x

y

x

a

11

y = ax, a > 1

152

A. GRAPHS OF BASIC ELEMENTARY FUNCTIONS 153

y

x

1a

1

y = ax, 0 < a < 1

y

x1 a

1

y = logax, a > 1

y

x1

a

1

y = logax, 0 < a < 1

y

-1

1

xπ2

π 3π2

2π 3π 4π

y = sinx

y

-1

1

xπ2

π 3π2

2π 3π 4π

y = cosx

y

xπ 2ππ2−π

2

y = tgx

y

xπ 2ππ2−π

2

y = cotgx

y

x1−1

π2

−π2

y = arcsinx

A. GRAPHS OF BASIC ELEMENTARY FUNCTIONS 154

y

x

π

π2

1−1

y = arccosx

y

x

π2

−π2

y = arctgx

y

x

π

π2

y = arccotgx

APPENDIX B

Differentiation and integration formulas

(I) (c)′ = 0

(II) (xn)′ = nxn−1

(III) (ax)′ = ax lna(IV) (ex)′ = ex

(V) (logax)′ =1

xlna

(VI) (lnx)′ =1x

(VII) (sinx)′ = cosx(VIII) (cosx)′ = −sinx

(IX) (tgx)′ =1

cos2x

(X) (cotgx)′ = − 1

sin2x

(XI) (arcsinx)′ =1√

1−x2

(XII) (arccosx)′ = − 1√1−x2

(XIII) (arctgx)′ =1

1+x2

(XIV) (arccotgx)′ = − 11+x2

Rules for differentiation.

u,v : R → R, c∈ R,

1. (u(x)±v(x))′ = u′(x)±v′(x)

2. (cu(x))′ = cu′(x)

3. (u(x)v(x))′ = u′(x)v(x)+u(x)v′(x)

4.(u(x)

v(x)

)′=

u′(x)v(x)−u(x)v′(x)v2(x)

5.(

f (g(x)))′

= f ′(g(x)) ·g′(x)

(I)∫

dx = x+c

(II)∫

xndx =xn+1

n+1+c

(III)∫

1x

dx = ln |x|+c

(IV)∫

axdx =ax

lna+c

(V)∫

exdx = ex +c

(VI)∫

sinxdx = −cosx+c

(VII)∫

cosxdx = sinx+c

(VIII)∫

1cos2x

dx = tgx+c

(IX)∫

1

sin2xdx = −cotgx+c

(X)∫

1√A2−x2

dx = arcsinxA

+c

(XI)∫

1√x2±B

dx = ln |x+√

x2±B|+c

(XII)∫

1A2+x2 dx =

1A

arctgxA

+c

(XIII)∫

1A2−x2 dx =

12A

ln

∣

∣

∣

∣

A+xA−x

∣

∣

∣

∣

+c

Basic methods of integration.integration by parts, substitution, expansion intopartial fractions,∫

f ′(x)f (x)

= ln | f (x)|+c

155

APPENDIX C

Slope fields

Here we present of slope field and integral curves of several differential equations.A simple tool which allows to draw the slope field of first orderdifferential equation can be foundon http://www.mendelu.cz/user/marik/jode/direction-field.html.

y

x

FIGURE 1. Slope field ofy′ = y.

y

x

FIGURE 2. Slope field ofy′ = −y.

y

x

FIGURE 3. Slope field ofy′ =y(M−y), M ∈ R

+.

y

x

FIGURE 4. Slope field ofy′ =x2 +y2.

156

C. SLOPE FIELDS 157

y

x

FIGURE 5. Slope field ofy′ =y2−x

y

x

FIGURE 6. Slope field ofy′ =

e−x− 710

y.

user.mendelu.czuser.mendelu.cz/marik/frvs/book/master.pdf · contents chapter i. precalculus 4 1....

Documents