introduction - whitman collegea study of convex functions with applications matthew liedtke may 14,...

A Study of Convex Functions with ApplicationsMatthew Liedtke

May 14, 2012

1. Introduction

The study of concave up functions in calculus is often limited to describingone of the possible classifications of an increasing function. This is unfortu-nate because the simple generalization to a convex function greatly extendsour scope for analysis. The theory developed in the study of convex func-tions, arising from intuitive geometrical observations, may be readily appliedto topics in real analysis and economics.

This paper begins with a rigorous study of convex functions with the goalof developing a theoretical framework to extend to further topics. Our ap-plications to real analysis include right-endpoint approximation for integralsand pointwise convergence. In our discussion of economics, we will use thetheory of consumer behavior under uncertainty and Jensen’s Inequality toshow the relation between convex functions and the modeling of risk lovingbehavior. While a first course in real analysis is necessary to truly graspthese results, this paper has been written to be accessible to those who havecompleted a bridge to higher mathematics course.

2. Convex functions and theory

2.1. Concave Functions. The theoretical predecessor to convexity is aconcave up function. As seen in Figure 1, the rate at which the concave upfunction f is increasing is itself increasing. This may be formally expressedin terms of f ′.

Definition 2.1. Let f be a differentiable function defined on an interval I.

(a) The function f is concave up on I if f ′ is increasing on I.(b) The function f is concave down on I if f ′ is decreasing on I.

Two important theoretical properties of a concave up function begin withgeometric observations. Consider a function f that is concave up on aninterval I. Referring to Figure 1, we see that the line tangent to the graphat a point (c, f(c)) lies below all other points on the graph of f . Similarly,a secant line drawn between (a, f(a)) and (b, f(b)) lies above the graph.

Definition 2.2. Let f be a function defined on an interval I.

(a) Suppose that f is differentiable on I. The graph of f lies above itstangent lines on I if for each c ∈ I, the inequality f(x) ≥ Tc(x) isvalid for all x ∈ I, where Tc represents the function whose graph is theline tangent to the graph of y = f(x) when x = c.

(b) The graph of f lies below its secant lines on I if for each pair ofdistinct points a, b ∈ I with a < b, the inequality f(x) ≤ Sab(x) is validfor all x ∈ [a, b], where Sab represents the function whose graph is theline joining the points (a, f(a)) and (b, f(b)).

1

2

Figure 1. A concave up function

x

y

y = f(x)

a bc

f(a)

f(b)

f(c)

Our observation concerning secant lines is particularly useful as a tool infurther proofs. Indeed, throughout this paper we will be returning to thisresult as a basis for proofs and applications. For this reason we will carefullyprove this result below while the complementary tangent line proof may befound in Gordon [1].

Theorem 2.3. Let f be a differentiable function defined on an interval I.The function f is concave up on I if and only if the graph of f lies belowits secant lines on I.

Proof. Suppose that the function f is concave up on I, and let [a, b] ⊆ I.We define the secant line between the points (a, f(a)) and (b, f(b)) by

Sab(x) =f(b)− f(a)

b− a(x− a) + f(a).

Let D : [a, b]→ R be defined

D(x) =

{f(x)−f(a)

x−a for a < x ≤ c;f ′(a) for x = a.

Since f is continuous on I and differentiable at a, it follows that D is con-tinuous on [a, b]. Furthermore, using the fact that f is concave up, and thusf ′(x) − f ′(a) ≥ 0 for x ∈ (a, b), we have that D′(x) ≥ 0 for x ∈ (a, b). TheMonotonicity Theorem states that D(x) ≤ D(b) for x ∈ (a, b), and usingthe definition of D, we find that

f(x)− f(a)

x− a≤ f(b)− f(a)

b− a

f(x) ≤ f(b)− f(a)

b− a(x− a) + f(a).

Thus, the graph of f lies below its secant lines on I.Now instead suppose that the graph of f lies below its secant lines on I.

Let a, b ∈ I such that a < b. By the Mean Value Theorem, there exists a

3

c ∈ (a, b) such that f ′(c) = f(b)−f(a)b−a . Since the graph of f lies below its

secant lines on I, we have for x ∈ [a, b] that

f(x) ≤ Sab = f ′(c)(x− a) + f(a),

and therefore,

f ′(a) = limx→a+

f(x)− f(a)

x− a≤ f ′(c).

It can be shown that f ′(c) ≤ f ′(b); the proof is analogous. It followsf ′(a) ≤ f ′(b), and therefore f is concave up on [a, b]. 1 �

2.2. Convex functions. While it is certainly convenient to work withinthe set of differentiable functions, this restriction will limit the scope of ouranalysis. For instance, there are functions that are not differentiable on thesame interval for which their graph lies below its secant lines. The absolutevalue function, which is not differentiable at zero, is an example of a functionwith this property. See Figure 2.

Figure 2. The function f(x) = |x| lies below its secant lines.

x

yf(x) = |x|

With the following definition of a convex function we will drop the require-ment that f is differentiable while maintaining the geometrical properties ofa concave up function.

Definition 2.4. Let f be a function defined on an interval I. The functionf is convex on I if for each interval [c, d] ⊆ I, the inequality

f(

(1− t)c+ td)≤ (1− t)f(c) + tf(d)

is valid for 0 ≤ t ≤ 1.

As might be expected, this generalization comes with a trade-off. Prov-ing convexity from the definition has two possible difficulties. First, we willshow in Example 2.5 that even working with a simple function is in itselfnontrivial. Second, recall that the motivating factor when introducing con-vexity was to further the reach of our analysis. As is often the case, it maybe beneficial to have an equivalent definition. In Theorem 2.8 we will returnto our original geometric observation and show that secant lines lie abovethe graph of a convex function.

1The details of this proof are referenced from Gordon [1].

4

Example 2.5. We will show that f(x) = x2 is convex from the definition.Let [c, d] ⊆ I and 0 ≤ t ≤ 1. We find that

f(

(1− t)c+ td)

= (1− t)2c2 + 2t(1− t)cd+ t2d2

= (1− t)c2 − t(1− t)c2 + 2t(1− t)cd+ t2d2

= (1− t)c2 + t(1− t)c(2d− c) + t2d2 − td2 + td2

= (1− t)c2 − t(1− t)(c2 − 2cd+ d2) + td2

= (1− t)c2 − t(1− t)(c− d)2 + td2

≤ (1− t)c2 + td2

Thus, the function f(x) = x2 is convex.

We begin the proof of Theorem 2.8 with two lemmas.

Lemma 2.6. Let f be defined on an interval [c, d]. If 0 ≤ t ≤ 1, thenc ≤ (1− t)c+ td ≤ d.

Proof. Simple algebra reveals that

0 ≤ t ≤ 1

0 ≤ t(d− c) ≤ d− c

c ≤ c+ t(d− c) ≤ d

c ≤ (1− t)c+ td ≤ d ,

which is the desired result. �

Lemma 2.7. Let Scd be the function that represents the line through thepoints (c, f(c)) and (d, f(d)) and suppose that 0 ≤ t ≤ 1. Then

Scd

((1− t)c+ td

)= (1− t)f(c) + tf(d).

Proof. Note that the function Scd is defined by

Scd(x) =f(d)− f(c)

d− c(x− c) + f(c).

For 0 ≤ t ≤ 1, we find that

Scd

((1− t)c+ td

)=

f(d)− f(c)

d− c

((1− t)c+ td− c

)+ f(c)

=f(c)− f(d)

c− d(t)(d− c) + f(c)

= t(f(d)− f(c)

)+ f(c)

= (1− t)f(c) + tf(d). �

Theorem 2.8. Let f be a function defined on an interval I. Then f isconvex on I if and only if the graph of f lies below its secant lines on I.

5

Proof. Let f be defined on an interval I. Suppose that the graph of f liesbelow its secant lines on I, and let c, d ∈ I with c < d. By hypothesis theinequality f(x) ≤ Scd(x) is valid for all x ∈ [c, d], and we note that the pointx = (1 − t)c + td for 0 ≤ t ≤ 1 is in the interval [c, d] by Lemma 2.6. Itfollows from Lemma 2.7 that

f(x) = f(

(1− t)c+ td)≤ Scd

((1− t)c+ td

)= (1− t)f(c) + tf(d).

Hence, the function f is convex by definition.Now suppose f is convex on I, and let c, d ∈ I with c < d. For each

x ∈ [c, d] there exists a t ∈ [0, 1] such that x = (1 − t)c + td. From thedefinition of convexity we have that

f(x) = f(

(1− t)c+ td)≤ (1− t)f(c) + f(d) = Scd

((1− t)c+ td

).

This shows that the function f lies below its secant lines on I. �

2.3. Continuity. With Theorem 2.8 we have sufficient theoretical tools toanswer an important question: how well behaved are convex functions? Thatis, what can we say about continuity and differentiability? Recall that afunction may not be continuous if it has a jump discontinuity, a removablediscontinuity, or is wildly oscillating about a point. The fact that the secantlines must lie above the graph of a convex function should be convincingevidence that none of these cases are possible. See Figure 3 for an illustrationof this argument with a jump discontinuity.

Figure 3. The function f with a jump discontinuity.

x

y

f(x)

In proving this result, we will make use of Lemma 2.9, a proof of whichmay be found in Gordon [1].

Lemma 2.9. Let f be a convex function defined on an interval I. If a,b,and c are points in I such that a < b < c, then

f(b)− f(a)

b− a≤ f(c)− f(a)

c− a≤ f(c)− f(b)

c− b.

6

The inequality in Lemma 2.9 represents the slope of secant lines alongthe graph of f . This allows us to visualize lower and upper bounds for theslope of a line segment by choosing appropriate secant lines. This idea willbe illustrated in the proof of the following theorem.

Theorem 2.10. If f is a convex function defined on an open interval (a, b),then f is continuous on (a, b).

Proof. Suppose f is convex on (a, b), and let [c, d] ⊆ (a, b). Choose c1 andd1 such that

a < c1 < c < d < d1 < b.

If x, y ∈ [c, d] with x < y, we have from Lemma 2.9 (see Figure 4) that

f(y)− f(x)

y − x≤ f(d)− f(y)

d− y≤ f(d1)− f(d)

d1 − dand

f(y)− f(x)

y − x≥ f(x)− f(c)

x− c≥ f(c)− f(c1)

c− c1,

showing the set {∣∣∣∣f(y)− f(x)

y − x

∣∣∣∣ : c ≤ x < y ≤ d}

is bounded by M > 0. It follows |f(y)− f(x)| ≤M |y − x|, and therefore fis uniformly continuous on [c, d]. Recalling that uniform continuity impliescontinuity, we have shown that f is continuous on [c, d]. Since the interval[c, d] was arbitrary, f is continuous on (a, b). 2 �

Figure 4. Upper and lower bounds for Theorem 2.10.

x

y

y = f(x)

a c1 c x y d d1 b

The following corollary follows immediately from the fact that any closedinterval may be contained in an open interval.


7

Corollary 2.11. If f is a convex function defined on R, then f is continuouson R.

2.4. Differentiability. We now consider the differentiability of a convexfunction. Remember that we introduced convexity in order to remove thedifferentiability constraint from a concave up function. Since a convex func-tion is still geometrically related to a concave function, we may suspect theystill share similar properties. The derivative of a concave up function isincreasing. However, we can not readily draw an analogous conclusion forconvex functions because we are not guaranteed the derivative of a convexfunction exists at a point. We need an additional definition to expand theconcept.

Definition 2.12. Let f be defined on an interval I and let c ∈ I. The leftand right derivatives of f at c are denoted and defined by

f−(c) = limx→c−

f(x)− f(c)

x− cand

f+(c) = limx→c+

f(x)− f(c)

x− crespectively, provided the appropriate limits exist. (If c is an endpoint ofI, only one of these expressions is defined.) To say that f has one-sidedderivatives at c means that both f−(c) and f+(c) exist.

The concept of left and right derivatives is similar to that of left andright limits; a function f is differentiable at c if and only if f−(c) = f+(c).The following theorem provides the analogous result for convex functions.A proof may be found in Gordon [1].

Theorem 2.13. If f is a convex function defined on an open interval (a, b),then f has one-sided derivatives at each point of (a, b), f−(x) ≤ f+(x)for all x ∈ (a, b), and the functions f− and f+ are increasing on (a, b).Furthermore, if f is continuous on [a, b], then these conclusions may beextended to the closed interval [a, b].

Thus, the difference between a convex and concave up function in terms ofdifferentiablility is related to the inequality of the left and right derivatives.The following example continues our discussion of this result.

Example 2.14. Let f be a continuous function defined on [a, b], letc ∈ (a, b), and suppose that f is convex on [a, c] and [c, b]. We will con-struct a function that shows that f may not be convex on [a, b]. The clearpoint of interest is where the two intervals intersect at c. If we can piecewisedefine a function f such that f−(x) ≥ f+(x), then we may proceed using thecontrapositive of Theorem 2.13. Suppose a > 0, and define f : [a, b] → R,shown in Figure 5 by

8

f(x) =

{x2 for a ≤ x ≤ c;(x− c)2 + c2 for c ≤ x ≤ b.

We note f is continuous and that f−(x) = 2c ≥ 0 = f+(x). By Theorem2.13, the function f is not convex on [a, b].

Figure 5. The function f from Example 2.14.

x

y

y = f(x)

y = c2

a c b

In the previous example, the function f fails to be convex on the interval[a, b] because the condition f−(x) ≤ f+(x) is not valid at the point c. Thefollowing theorem explores a special case in which f is defined by piecewiselinear segments. Since the slope of each segment is constant, we will satisfythis inequality if the slope of the first line is less than the slope of the second.

Theorem 2.15. Let f be a continuous function defined on [a, b], letc ∈ (a, b), and suppose that f is convex on [a, c] and [c, b]. If f is linearon each of the intervals [a, c] and [c, b] and the slope of f on [a, c] is less thanthe slope of f on [c, b], then f is convex on [a, b].

Proof. Let f1 and f2 denote the line segment joining the points (a, f(a)),(b, f(b))and (b, f(a)), (c, f(c)) respectively (see Figure 6). We will first consider thecase that a < x ≤ c and show

Sab(x) =f(b)− f(a)

b− a(x− a) + f(a) ≥ f(c)− f(a)

c− a(x− a) + f(a) = f1.

By hypothesis, the slope of f2 is greater than the slope of f1. This means

f(b)− f(a)

b− c− f(c)− f(a)

c− a≥ 0.

Noting that b−cb−a > 0, we find

9

0 ≤( b− cb− a

)(f(b)− f(c)

b− c− f(c)− f(a)

c− a

)=

( c− bb− a

)(f(b)− f(c)

c− b+f(c)− f(a)

c− a

)=

f(b)− f(c)

b− a+c− bb− a

(f(c)− f(a)

c− a

)=

f(b)− f(c)

b− a+( 1

b− a− 1

c− a

)f(c)− f(a)

=f(b)− f(c) + f(c)− f(a)

b− a− f(c)− f(a)

c− a

≤ f(b)− f(a)

b− a(x− a) + f(b)− f(c)− f(a)

b− a(x− a)− f(b).

It follows that Sab(x) ≥ f1. The case in which we consider c < x < b andshow Sab(x) ≥ f2 is analogous. The graph of f lies below the secant lineSab, and therefore f is convex on [a, b]. �

Figure 6. The function f as defined in Theorem 2.15.

x

f(x)

a c b

f(a)

f(c)

f(b)

f1

f2

Sab

Here we take an aside to prove an important result: the number of pointsat which a convex function defined on an open interval is not differentiable iscountable. We will then return to our previous theorem in order to providea worst case example of a convex function that is not differentiable at acountably infinite number of points.

Theorem 2.16. If f is a convex function defined on an open interval (a, b),then the set of points in (a, b) at which f is not differentiable is countable.

Proof. The function f+ is increasing by Theorem 2.13. It can be shownthat since f+ is monotone, f+ has a countable number of discontinuities.We will show that f is differentiable at every point when f+ is continuous.

10

Let ε > 0, and suppose f+ is continuous at c. There exists a δ > 0 suchthat |f+(x) − f+(c)| < ε for all x ∈ (a, b) that satisfy |x − c| < δ. Supposec − δ < x < c. By Theorem 2.13, we find f+(x) ≤ f−(c). Using this fact,we have

f+(c)− ε < f+(x) ≤ f−(c) ≤ f+(c) < f+(c) + ε.

Since ε was arbitrary, it follows that f−(c) = f+(c). Thus, f is differentiableat c. 3 �

Example 2.17. For each positive integer n, let xn = (2n−1)2n and yn = n

n+1 .

Let f be the piecewise linear function that joins the points (xn, yn). Thefunction f is convex on the interval [0.5, 1) and the set of points at which fis not differentiable is countably infinite. [1]

First note that the range of the sequence {xn} is the interval [0.5, 1). Ourgoal will be to show that the slope of any linear segment of f is less than theslope of the subsequent segment. This amounts to proving that the followinginequality is true for all positive integers n ≥ 2:

(1)yn+1 − ynxn+1 − xn

≥ yn − yn−1

xn − xn−1.

This will allow us to use Theorem 2.15 to prove that f is convex on [0.5, 1).However, before we proceed it would be prudent to simplify inequality (1)to a more workable form. Substituting for the sequence definitions of {xn}and {yn}, inequality (1) becomes

(2)n+1n+2 −

nn+1

2n+1−12n+1 − 2n−1

2n

≥n

n+1 −n−1n

2n−12n −

2n−1−12n−1

.

Consider the left hand side of inequality (2). Simple algebra reveals that

n+ 1

n+ 2− n

n+ 1=

(n+ 1)(n+ 1)− n(n+ 2)

(n+ 2)(n+ 1)

=n2 + 2n+ 1− n2 − 2n

(n+ 2)(n+ 1)

=1

(n+ 2)(n+ 1)

2n+1 − 1

2n+1− 2n − 1

2n=

2n(2n+1 − 1)− (2n − 1)(2n+1)

(2n+1)(2n)

=−2n + 2n+1

(2n+1)(2n)

=1

2n+1.


11

By performing similar operations on the right hand side, it can be shownthat inequality (2) reduces to

2n+1

(n+ 2)(n+ 1)≥ 2n

(n+ 1)(n).

Let n ≥ 2. We find that

n ≥ 22

n+ 2≥ 1

n

2n+1

(n+ 2)(n+ 1)≥ 2n

(n+ 1)(n)

Hence, inequality (2) is true for all n ≥ 2. By Theorem 2.15 it followsthat the function f is convex on [0.5,1).

Furthermore, since f is convex on [0.5, 1). Theorem 2.16 states thatthe set of points in (0.5, 1) at which f is not differentiable is countable.Extending this observation to the interval [0.5, 1) is valid because we areadding at most one more point for which f is not differentiable. Thus, theset of points in [0.5, 1) for which f is not differentiable is countable. Thefunction f is illustrated in Figure 7. While f may appear smooth in thefigure, it should be noted that the graph of f is defined as a countablyinfinite number of line segments.

Figure 7. The function f as defined in Example 2.17.

x

f(x)

x1 = 12 x2 = 3

4x3x4 ... 1

y1 = 12

y2 = 23

y3

y4

.

.

.

1f(x)

12

3. Alternative criteria for convexity

Before proceeding to applications of convexity in real analysis and eco-nomics we will first derive a useful result applicable to continuous functions.Our motivating factor is clear from Example 2.5; proving convexity fromthe definition may be very cumbersome. Any method to streamline this willmake a welcome addition to our criterion for convexity.

Theorem 3.1. If f is continuous on (a, b) and

f(x+ y

2

)≤ 1

2f(x) +

1

2f(y)

for all x, y ∈ (a, b), then f is convex on (a, b).

This is a rather remarkable result for two reasons. First, Theorem 3.1simplifies the definition of convexity so that we need only consider the casein which t = 1

2 for a large class of functions. This result also provides ameans to approach unconventional functions that we know are continuous,an example of which will be shown following the proof.

This theorem is also notable for its long and possibly unfamiliar prooftechnique. We begin by leveraging our hypothesis to show that the numberof terms in the numerator, and the corresponding integer divisor, can beextended to any power of two, and ultimately to any n ∈ Z+ (Lemma 3.3and Lemma 3.5, respectively). Using these results, we can show the functionsatisfies the definition of convexity for rational value of t. Finally, sequencesof rational numbers will be used to approximate irrational values of t.

Proof of Theorem 3.1. Each of the following lemmas will be preceded byan example in which we work through a specific case. While our algebra isnot necessarily difficult, it is quite possible to be overwhelmed in the generalproof.

Example 3.2. Suppose that f is continuous on (a, b) and

f(x+ y

2

)≤ 1

2f(x) +

1

2f(y)

for all x, y ∈ (a, b). We will illustrate how the number of terms in Theorem3.1 may be extended from two to four. By applying our hypothesis twice,we see that

f(x1 + x2 + x3 + x4

4

)= f

( x1+x22 + x3+x4

2

2

)≤ 1

2f(x1 + x2

2

)+

1

2f(x3 + x4

2

)≤ 1

2

(1

2f(x1) +

1

2f(x2) +

1

2f(x3) +

1

2f(x4)

)=

1

4f(x1) +

1

4f(x2) +

1

4f(x3) +

1

4f(x4).

13

Lemma 3.3. If f is defined on (a, b) and

f(x+ y

2

)≤ 1

2f(x) +

1

2f(y)

for all x, y ∈ (a, b), then for n ∈ Z+, f has the property that

f

(1

2n

2n∑i=1

xi

)≤ 1

2n

2n∑i=1

f(xi).

Proof. We will use the Principle of Mathematical Induction. When n = 1,the inequality is valid by hypothesis, and hence 1 ∈ S. Suppose that theinequality is true for some positive integer k. We then have that

f(x1 + x2 + . . . + x2k + . . . + x2k+1

2k+1

)= f

( x1+x2+...+x2k

2k+

x2k+1

+x2k+2

+...+x2k+1

2k

2

)

≤ 1

2f(x1 + x2 + . . . + x2k

2k

)+

1

2f(x2k+1 + x2k+2 + . . . + x2k+1

2k

)≤ 1

2

(1

2k

2k∑i=1

f(xi)

)+

1

2

(1

2k

2k+1∑i=2k+1

f(xi)

)

=1

2k+1

2k+1∑i=1

f(xi).

Thus, the inequality is true for k + 1. It follows from the Principle ofMathematical Induction that the inequality

f

(1

2n

2n∑i=1

xi

)≤ 1

2n

2n∑i=1

f(xi)

holds for all positive integers. �

Our next goal is to show that the number of terms in Theorem 3.1 maybe extended to any m ∈ Z+. Note that 2n ≤ m < 2n+1 for some n ∈ Z+.We will work through an example where m = 6.

Example 3.4. We wish to show that

f

(1

6

6∑i=1

xi

)≤ 1

6

6∑i=1

f(xi).

Let x̄ equal the average of x1, x2, . . . , x6; that is, let x̄ = 16

6∑i=1

xi. By adding

x̄ to the average of the xi’s we find that the resulting average remains un-

changed. Therefore

6∑i=1

(xi)+x̄+x̄

8 = x̄. The idea is to continue adding x̄ untilwe are considering a number of terms that is a power of two. This allows us

14

to use our previous result. We find that

f(x̄) = f

( 6∑i=1

xi + x̄+ x̄

8

)

≤

6∑i=1

f(xi) + f(x̄) + f(x̄)

8

=1

8

6∑i=1

f(xi) +2

8f(x̄)

3

4f(x̄) ≤ 1

8

6∑i=1

f(xi)

f(x̄) ≤ 1

6

6∑i=1

f(xi)

f

(1

6

6∑i=1

xi

)≤ 1

6

6∑i=1

f(xi).

Having reached the desired result, we will now prove the general case.

Lemma 3.5. If f is defined on (a, b) and

f(x+ y

2

)≤ 1

2f(x) +

1

2f(y)

for all x, y ∈ (a, b), then for m ∈ Z+ the function f has the property that

f

(1

m

m∑i=1

xi

)≤ 1

m

m∑i=1

f(xi).

Proof. Note that we have already covered the case where n is a power of

two. Let x̄ = 1m

m∑i=1

xi and choose n ∈ Z+ such that 2n < m < 2n+1. Let

15

p = 2n+1 −m. Since m+ p is a power of two, we have by Lemma 3.3 that

f(x̄) = f

( m∑i=1

xi +p∑

i=1x̄i

m+ p

)

≤

m∑i=1

f(xi) +p∑

i=1f(x̄i)

m+ p

=1

m+ p

m∑i=1

f(xi) +p

m+ pf(x̄)

m

m+ pf(x̄) ≤ 1

m+ p

m∑i=1

f(xi)

f(x̄) ≤ 1

m

m∑i=1

f(xi)

f

(1

m

m∑i=1

xi

)≤ 1

m

m∑i=1

f(xi).

�

We now have the computational tools to prove Theorem 3.1.

Theorem 3.1. If f is continuous on (a, b) and

f(x+ y

2

)≤ 1

2f(x) +

1

2f(y)

for all x, y ∈ (a, b), then f is convex on (a, b).

Proof. We will first prove the case where t = mn for m,n ∈ Z+. By Lemma

3.5 we have that

f(

(1− t)x+ ty)

= f(

(1− m

n)x+

m

ny)

= f((n−m)x+my

n

)≤ 1

n

( n−m∑i=1

f(x) +m∑i=1

f(y))

=n−mn

f(x) +m

nf(y)

= (1− t)f(x) + tf(y).

Now consider the general case where t ∈ [0, 1]. Let t ∈ [0, 1], and let {tn}be a sequence of rational numbers that converges to t. By our earlier resultwe find that

f(

(1− tn)x+ tny)≤ (1− tn)f(x) + tnf(y)

16

for every positive integer n. Since f is continuous, we have

limn→∞

f(

(1− tn)x+ tny)

= f(

(1− t)x+ ty)

and

limn→∞

((1− tn)f(x) + tnf(y)

)= (1− t)f(x) + tf(y).

Therefore f(

(1− t)x+ ty))≤ (1− t)f(x)+ tf(y). It follows that f is convex

by definition. �

The benefit of Theorem 3.1, as we alluded to earlier, is that it makesdetermining convexity for continuous functions much easier. By consideringonly the t = 1

2 case in the definition of a convex function, we may gaininsight into algebraic techniques that would otherwise be obfuscated. Thefollowing theorem is an excellent example of how our result simplifies anotherwise difficult proof.

Theorem 3.6. Suppose that f is increasing on an interval [a, b) and definea function φ by

φ(x) =

∫ x

af.

The function φ is convex on (a, b).

Proof. By the Fundamental Theorem of Calculus, the function φ is contin-uous on the interval (a, b). Let [c, d] ⊆ (a, b), and let m = c+d

2 . We mustshow

φ(m) ≤ 1

2φ(c) +

1

2φ(d)

or equivalently,

2

∫ m

af ≤

∫ c

af +

∫ d

af.

Since f is increasing, the number f(m) is an upper and lower bound for fon the intervals [c,m] and [m, d] respectively. Noting that m is the midpointof c and d, we have∫ m

cf ≤ f(m)(m− c) = f(m)(d−m) ≤

∫ d

mf.

17

From this statement we find that∫ c

af +

∫ d

af =

∫ c

af +

∫ m

af +

∫ d

mf

≥∫ c

af +

∫ m

af +

∫ m

cf

=

∫ m

af +

∫ m

af

= 2

∫ m

af.

By Theorem 3.1, the function φ is convex on [a, b).We note in passing that if the increasing function f were continuous, we

may have used the Fundamental Theorem of Calculus to show that φ′′ > 0,and hence that φ is concave up on. Using our previous observations aboutsecant lines, we know that a concave up function is also convex. Since aconvex function may not be differentiable, however, the converse of thisstatement is not true in general. �

4. Applications to Real Analysis

As we explored the behavior of convex functions we found that several ofthe results appealed to the fact that a convex function lies below its secantlines. A few of the specific applications to real analysis, however, insteadrefer to the definition of convexity. The result is that the proofs becomenotably more algebraic in nature. The extra work involved should not be adeterrent from the notable results. In this section, we will first discuss therelationship between a decreasing convex function and the sequence of rightendpoint approximations to the area under the curve. We will then explorea curious exception involving pointwise convergence.

4.1. Right Endpoint Approximation. The Riemann integral is inextri-cably tied to the concept of determining the area under a function f on aclosed interval I. Generally, the interval I is first divided into a number ofsubintervals from which a point, referred to as a tag, is chosen from each.The function evaluated at each tag, multiplied by the length of the respec-tive interval, is a rectangle estimate of the area under f on that subinterval.The sum of these rectangles provides an estimate to the area under f on theentire interval.

Right endpoint approximation is the specific case in which the right end-point of each subinterval is chosen as the tag. The sequence of right end-point approximations to a positive, continuous, decreasing function definedon [0, 1] is given by

Rn =

n∑i=1

f( in

) 1

n.

18

While each term of the sequence will underestimate the area under thecurve, it can be shown that Rn converges to the Riemann integral. However,it is not necessarily true that each iteration provides a better estimate; thatis, we need additional criteria to guarantee the sequence of right endpointapproximations {Rn} is increasing. Convexity satisfies this requirement.

The first three values of Rn are illustrated in Figure 8. Note that sinceR2 is simply a bisection of R1, the total area approximated must be greater.With this argument we see that the subsequence R2n is increasing. However,it is not immediately obvious that R2 ≤ R3.

Figure 8. The sequence {Rn} respresented for a convexfunction: (a) n = 1, (b) n = 2, (c) n = 3.

(a)

Area = 0.5

x

f(x)

f(x) = 11+x

1

12

(b)

Area ≈ 0.58

x

f(x)

f(x) = 11+x

112

12

23

(c)

Area ≈ 0.61

x

f(x)

f(x) = 11+x

113

23

34

3512

19

Lemma 4.1. If f is convex on [a, b] and i and n are positive integers, thenthe inequality

f( in

)≤(n− i

n

)f( i

n+ 1

)+i

nf( i+ 1

n+ 1

)

is valid.

Proof. We find that

i

n=

in+ i

n(n+ 1)=in− i2 + i2 + i

n(n+ 1)=

(n− i)i+ i(i+ 1)

n(n+ 1)=(n− i

n

)( i

n+ 1

)+( in

)( i+ 1

n+ 1

),

and noting that f is convex, we have that

f( in

)≤ f

((n− in

)( i

n+ 1

))+ f

(( in

)( i+ 1

n+ 1

)).

�

Theorem 4.2. If f : [0, 1]→ R is convex, positive, and decreasing on [0, 1],

then the sequence {Rn} of right endpoint approximations to∫ 1

0 is increasing.

20

Proof. We will show that Rn ≤ Rn+1. We have from Lemma 4.1 that

Rn =1

n

n∑i=1

f( in

)≤ 1

n

n∑i=1

((n− in

)f( i

n+ 1

)+i

nf( i+ 1

n+ 1

))

=1

n

n∑i=1

(n− in

)f( i

n+ 1

)+

1

n

n+1∑i=1

i− 1

nf( i

n+ 1

)=

1

n

(n∑

i=1

n− 1

nf( i

n+ 1

))+

1

nf(1)

=n∑

i=1

1

n+ 1f( i

n+ 1

)+

n∑i=1

(n− 1

n2− 1

n+ 1

)f( i

n+ 1

)+

1

nf(1)

=

n∑i=1

1

n+ 1f( i

n+ 1

)+

n∑i=1

((n− 1)(n+ 1)− n2

(n2)(n+ 1)

)f( i

n+ 1

)+

1

nf(1)

=

n∑i=1

1

n+ 1f( i

n+ 1

)− 1

(n2)(n+ 1)

n∑i=1

f( i

n+ 1

)+

1

nf(1)

and using the fact that f is a decreasing function,

≤n∑

i=1

1

n+ 1f( i

n+ 1

)− 1

(n2)(n+ 1)

n∑i=1

f(1) +1

nf(1)

=

n∑i=1

1

n+ 1f( i

n+ 1

)+ f(1)

( 1

n− 1

n(n+ 1)

)=

n∑i=1

1

n+ 1f( i

n+ 1

)+ f(1)

1

n+ 1

=1

n+ 1

( n∑i=1

f( i

n+ 1

)+ f

(n+ 1

n+ 1

))=

1

n+ 1

n+1∑i=1

f( i

n+ 1

)= Rn+1.

�

4.2. Pointwise Convergence. A sequence of functions {fn} convergespointwise to f on an interval I if the sequence {fn(x)} converges to f(x) foreach x ∈ I. One of the striking weaknesses of pointwise convergence is that

21

certain properties of {fn} may fail to carry over to the limit function. Thatis to say, even if each of the functions fn is bounded, continuous, or RiemannIntegrable, it does not necessarily follow that f shares this property. Forcounterexamples demonstrating this fact, see Gordon [1].

One may wonder if a sequence of convex functions shares a similar fate.The answer may be surprising. Pointwise convergence is indeed sufficient toguarantee that convexity transfers to the limit function.

Theorem 4.3. Let {fn} be a sequence of convex functions defined on aninterval I. If {fn} converges pointwise on I to a function f then f is convexon I.

Proof. Suppose that fn is a sequence of convex functions defined on I thatconverges pointwise on I to a function f . This means that for [c, d] ⊆ I

f(

(1− t)c+ td)

= limn→∞

fn

((1− t)c+ td

)≤ lim

n→∞

((1− t)fn(c) + tfn(d)

)= lim

n→∞(1− t)fn(c) + lim

n→∞tfn(d)

= (1− t)f(c) + tf(d).

It follows that the function f is convex on I by definition.�

5. Economic Modeling of Uncertainty

The purpose of this section will be to explore an application of convexity toeconomic consumer thoery. Economics is the study of how agents choose toallocate scarce resources. Given a set of rationality assumptions, economistshope to model and predict the choices of individuals. Since the 1960’s,economists have been increasingly concerned with developing more rigorousmathematical models. The particular model we will be discussing is that ofconsumer behavior under situations of uncertainty. Before addressing ourapplication to convexity, we will briefly discuss its theoretical foundation. 4

5.1. Consumer Preference. Let xi ∈ R+ represent the number of unitsof good i. We assume that there is a finite number of goods n. The vectorx = (x1, . . . , xn) describes a particular distribution of goods, and is referredto as a consumption bundle. Note that a consumption bundle is a pointx ∈ Rn

+. In situations of certainty, consumer theory looks at the prefer-ence of individuals over the set of consumption bundles X. An axiomaticassumption is that an individual is able to apply a preference relation to acollection of bundles. That is, given two bundles x1 and x2, an individual

4The axioms, definitions, and theorems in the section are referenced from Jehle andReny’s Advanced Microeconomic Theory.

22

has a preference or is indifferent between the two. The symbol % repre-sents a binary relation defined on the set of consumption bundles X. Forexample, the statement x1 % x2 means that x1 is at least as preferable asx2 to a particular consumer. We extend this concept to consumer behaviorin situations of uncertainty by considering a vector of probability weightedoutcomes called a gamble.

Let A = {a1, . . . , an} be a finite set of certain outcomes. Dependingon the situation, an outcome may be an amount of money, an event, oranything else. A simple gamble assigns a probability pi, using the symbol◦ as notation, to each of the outcomes ai in A such that

g = (p1 ◦ a1, . . . , pn ◦ an) where pi ≥ 0,

n∑i=1

pi = 1.

For example, the gamble in which a coin is flipped and an individual recieves$10 if heads, and $0 if tails, may be represented as (1

2 ◦ 10, 12 ◦ 0). The set

of simple gambles GS on A is given by

GS ≡ {(p1 ◦ a1, . . . , pn ◦ an)| pi ≥ 0,n∑

i=1

pi = 1}.

There are six axioms of choice under uncertainty; however, for the scope ofthis paper, we will only be concerned with two. The interested reader shouldrefer to Jehle and Reny [2]. We assume that a consumer is able to applya preference relation % over GS , just as they are able to over consumptionbundles. The symbols ∼ and � denote indifference and strict preferences,respectively.

Axiom 1 (Completeness). For any two gambles, g and g′ in GS ,

g % g′ or g′ % g.

Axiom 2 (Transitivity). For any three gambles g, g′, g′′ in GS ,

if g % g′ and g′ % g′′, then g % g′′.

5.2. Von Neumann-Morgenstern Utility Functions. It may be diffi-cult to analyze the behavior of individuals when their preference informationis expressed over a set of vectors. We therefore introduce the concept of autility function. In general, a utility function is a transformation from a setof objects to the set of real numbers that preserves consumer preferences.In situations of certainty, the domain of a utility function is the set of con-sumption bundles. The corresponding domain of the utility function u insituations of uncertainity is the set of gambles GS . We also say that u hasthe expected utility property.

Definition 5.1. The utility function u : GS → R has the expected utilityproperty if, for every g ∈ GS ,

u(g) =n∑

i=1

piu(ai)

23

where (p1 ◦ a1, . . . , pn ◦ an) is the simple gamble induced by g.

Hence, the utility of a gamble is simply the probability weighted utilitiesof the outcomes ai. Note that u(ai) is simply the utility function evaluatedat a gamble in which the probability assigned to the outcome ai is equalto one. A utility function in situations of uncertainty is refered to as a vonNeumann-Morgenstern (VNM) utility function. It is named after economistsJohn von Neumann and Oskar Morgenstern who proved its existence fromfour of the axioms of choice under uncertainty. A proof may be found inJehle and Reny [2].

Theorem 5.2. Let preferences % over gambles in GS satisfy the six axiomsof choice under uncertainty. Then there exists a utility function u : GS → Rrepresenting % on GS , such that u has the expected utility property.

5.3. Risk. One of the defining characteristics of the VNM utility function isthe preservation of consumer preferences over GS . This result allows us to vi-sualize consumer behavior. In particular, we will look at how an individual’sattitude towards risk is shown by the VNM utility function.

For illustrative purposes, we will consider the case in which our outcomesare some units of wealth wi. The expected value of a gamble g is given by

E(g) =n∑

i=1

piwi.

Intuitively, a risk averse individual is one who prefers to minimize theirexposure to uncertainty. We would therefore expect a risk averse individualto prefer the expected value of a gamble instead of accepting the uncertaintyinherently associated with a gamble. The following definitions reflect thisidea.

Definition 5.3. Let the function u represent an individual’s VNM utilityfunction for gambles over wealth. For the gamble g, an individual is

(1) risk averse at g if u(E(g)

)> u(g).

(2) risk neutral at g if u(E(g)

)= u(g).

(3) risk loving at g if u(E(g)

)< u(g).

The following hypothetical gamble illustrates risk loving behavior.

Example 5.4. Consider the following gamble

g1 = (0.75 ◦ $700, 0.25 ◦ $1900).

The expected value of the gamble is given by

E(g1) = p1w1 + p2w2 = (0.75)($700) + (0.25)($1900) = $1000.

By the definition of the expected utility property, the utility of the gambleis given by the individual’s VNM utility function evaluated at each of the

24

outcomes and weighted by the respective probabilities

u(g1) = p1u(w1) + p2u(w2) = .75u($700) + .25u($1900).

Suppose that a risk loving individual over GS is offered g1. By our definition,we have that

u($1000) = u(E(g1)) < u(g1) = .75u($700) + .25u($1900).

The individual’s preference of the risk inherent in g1 is illustrated in Figure9. Note that the individual’s VNM utility function u appears to be convexon R+.

We find intuition for this observation by recalling that the graph of aconvex function lies below its secant lines. In the case of a gamble with twooutcomes, it can be shown that u(g1) corresponds to a point along the secantline drawn from (w1, u(w1)) to (w2, u(w2)). In particular, we find that

Sw1w2(E(g1)) = u(g1).

With reference to Figure 9, we note that a convex VNM utility functionsatisfies the requirement for risk loving behavior that u(E(g1)) < u(g1).

Figure 9. Illustration of the gamble presented in Example 5.4.

$

Utility

u(x)

w1 = 700 w2 = 900

u(w1) = u(700)

u(w2) = u(1900)

E(g1) = 1000

u(E(g1)) = u(1000)

u(g1) = .75u(700) + .25u(1900)

25

5.4. Jensen’s Inequality. We now return to the overlying topic of con-vexity. As we alluded to at the conclusion of our previous example, it canbe shown that a VNM utility function that preserves risk loving behavioris convex. This follows quickly from Jensen’s Inequality, a result that weprove below.

Lemma 5.5. For any points x1, x2, . . . , xn in (a, b) and for any nonnegativenumbers p1, p2, . . . , pn, we have

a <p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn< b.

Proof. Let xm = min{x1, x2, . . . , xn} and xM = max{x1, x2, . . . , xn}. Wefind that

p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn≤

p1xM + p2xM + · · ·+ pnxM

p1 + p2 + · · ·+ pn= xM

(p1 + p2 + · · ·+ pn

p1 + p2 + · · ·+ pn

)= xM < b,

and

p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn≥

p1xm + p2xm + · · ·+ pnxm

p1 + p2 + · · ·+ pn= xm

(p1 + p2 + · · ·+ pn

p1 + p2 + · · ·+ pn

)= xm > a.

�

Theorem 5.6 (Jensen’s Inequality). A function f is convex on (a, b) if andonly if for any points x1, x2, . . . , xn in (a, b) and for any nonnegative numbersp1, p2, . . . , pn, the inequality

f(p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn

)≤ p1f(x1) + p2f(x2) + · · ·+ pnf(xn)

p1 + p2 + · · ·+ pn

is valid.

Proof. Suppose that a function f is convex on (a, b). We will use the Prin-ciple of Mathematical Induction. For any points x1, x2, . . . , xn in (a, b) andfor any nonnegative numbers p1, p2, . . . , pn, consider the inequality

f(p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn

)≤ p1f(x1) + p2f(x2) + · · ·+ pnf(xn)

p1 + p2 + · · ·+ pn.

When n = 1, we have that

f(p1x1

p1

)= f(x1) ≤ p1f(x1)

p1= f(x1).

Suppose that the inequality is true for some positive integer k. This may beexpressed in summation notation as

f

(∑k+1i=1 pixi∑k+1i=1 pi

)= f

(∑ki=1 pixi∑k+1i=1 pi

+pk+1xk+1∑k+1

i=1 pi

)

26

Since∑k

i=1 pixi∑ki=1 pi

is in (a, b) by Lemma 5.5, we may use the definition of a

convex function and the induction hypothesis. Referring to the definition ofa convex function, it may be helpful to note

t =

∑ki=1 pixi∑k+1i=1 pi

and (1− t) =pk+1∑k+1i=1 pi

.

We find that

f

(∑k+1i=1 pixi∑k+1i=1 pi

)= f

(∑ki=1 pixi∑ki=1 pi

·∑k

i=1 pi∑k+1i=1 pi

+pk+1xk+1∑k+1

i=1 pi

)

≤ f

(∑ki=1 pixi∑ki=1 pi

)∑ki=1 pi∑k+1i=1 pi

+ f(xk+1)pk+1∑k+1i=1 pi

≤∑k

i=1 pif(xi)∑ki=1 pi

·∑k

i=1 pi∑k+1i=1 pi

+f(pk+1)pk+1∑k+1

i=1 pi

=

∑ki=1 pif(xi)∑k+1

i=1 pi+f(pk+1)pk+1∑k+1

i=1 pi

=

∑k+1i=1 pif(xi)∑k+1

i=1 pi,

thus showing that the inequality is true for k+1. It follows from the Principleof Mathematical Induction that the inequality

f(p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn

)≤ p1f(x1) + p2f(x2) + · · ·+ pnf(xn)

p1 + p2 + · · ·+ pn

is true for all positive integers, for any points x1, x2, . . . , xn in (a, b), and forany nonnegative numbers p1, p2, . . . , pn.

Now suppose that for any points x1, x2, . . . , xn in (a, b) and for any non-negative numbers p1, p2, . . . , pn, the inequality

f(p1x1 + p2x2 + · · ·+ pnxn

p1 + p2 + · · ·+ pn

)≤ p1f(x1) + p2f(x2) + · · ·+ pnf(xn)

p1 + p2 + · · ·+ pn

is valid. For 0 ≤ t ≤ 1, let p1 = t, p2 = (1− t), and p3 = . . . = pn = 0. Wefind that

f(tx1 + (1− t)x2)

)≤ tf(x1) + (1− t)f(x2)

Hence, the function f is convex by definition. �

Theorem 5.7. An individual is risk loving over gambles involving nonneg-ative wealth levels if and only if their VNM utility function is strictly convexon R+.

Proof. Suppose that an individual is risk loving over gambles involving non-negative wealth levels, let u represent their VNM utility function, and let

27

g ∈ GS . This means that u(E(g)

)< u(g), which can be equivalently ex-

pressed as

u(p1w1 + p2w2 + . . .+ pnwn

)< p1u(w1) + p2u(w2) + . . .+ pnu(wn).

Noting that∑n

i=1 pi = 1, it follows from Jensen’s Inequality that the func-tion u is convex on R+.

Now suppose that an individual’s VNM utility function u is convex onR+ and let g ∈ GS . Jensen’s Inequality states that the following inequality

u(p1w1 + p2w2 + . . .+ pnwn

)< p1u(w1) + p2u(w2) + . . .+ pnu(wn)

holds for each wi ∈ R+. Hence, the individual is risk loving at g.We have thus shown that an individual is risk loving over gambles involv-

ing nonnegative wealth levels if and only if their VNM utility function isstrictly convex on R+. �

Conclusion

Using convexity to discuss risk behavior has allowed for greater appreci-ation of the underlying economic theory. In the same way, our theoreticalanalysis of convex functions that began with geometrical observations ledto rewarding topics in real analysis. We conclude our discussion of convexfunctions with a survey of areas of further research. With additional time,we may have explored the relationship between convex functions and met-ric preserving functions. For those interested in applications of convexityto economics, one may reference Jehle and Reny’s discussion of indifferencecurve analysis in [2]. Furthermore, we note that convex functions are in-extricably related to convex sets. The study of convex sets is essential inlinear optimization, as well as graduate level economic proofs.

Acknowledgements

Sincere thanks to Price Hardman for his helpful comments on severaldrafts of the paper, to Professor Gordon and Professor Balof for their ad-visement and careful attention to details, to Professor Schueller, for withouthis support with TikZ this paper would be notably less interesting, and toProfessor Crouter for her support in the economics section. Finally, I wouldlike to thank all members of the Whitman Swim Team, past and present,for their daily support and encouragement.

References

[1] Russel A. Gordon, Real Analysis: A First Course, (New York: Addison-Wesley, 2002).[2] Geoffrey A. Jehle and Philip J. Reny, Advanced Microeconomic Theory, (New York:

Addison-Wesley, 2001).

introduction - whitman collegea study of convex functions with applications matthew liedtke may 14,...

Documents