functionalanalysis - math.uni-leipzig.degittel/ma4phys/scriptma4(zimmerm).pdf

1 Functional Analysis


1.1 Banach spaces

Remark 1.1. In classical mechanics, the state of some physical system is characterized as a point x in phase space (generalized position and momentum coordinates). An observable (i.e. a measurable quantity) is then a function f on phase space, and f(x) is the value obtained by measuring f in the state x. For example, for the one-dimensional harmonic oscillator the phase space is essentially R² and the energy is the function E(x, p) := ½x² + ½p² (if units are chosen appropriately).

In quantum mechanics, the state space is a Hilbert space whose elements are usually called wave functions. The physical interpretation of such wave functions depends on the model and might not be obvious. The observables are operators on the Hilbert space, and the possible measurement outcomes are their spectral values.

Our task in this course will be to understand the mathematical objects mentioned above and how they are interrelated.

All the vector spaces mentioned below will be over the field of real (R) or complex (C) numbers. To state results that apply in both cases, we write K for the base field.

Definition 1.2. An algebra over K is a K-vector space V with a map V × V → V, called the product or multiplication and denoted by juxtaposition, such that for any x, y, z ∈ V and λ, ρ ∈ K the following hold:

(i) x(y + z) = xy + xz

(ii) (x+ y)z = xz + yz

(iii) (λx)(ρy) = (λρ)(xy)

(iv) (xy)z = x(yz).

An algebra is called unital if there is a unit element, i.e. e ∈ V such that ev = ve = v for every v ∈ V.

Remark 1.3. Structures not fulfilling (iv) are also called (non-associative) algebras (e.g. Lie algebras), but all our algebras will be associative. Note that we do not require the product map to be commutative.

Definition 1.4. A normed vector space is a pair (V, ‖ · ‖) consisting of a vector space V and a map ‖ · ‖ : V → [0,∞), called the norm, fulfilling

(i) ‖λx‖ = |λ| ‖x‖ (positive homogeneity),

(ii) ‖x+ y‖ ≤ ‖x‖+ ‖y‖ (subadditivity/triangle inequality) and

(iii) ‖x‖ = 0 if and only if x = 0 (definiteness)

for any x, y ∈ V and any λ ∈ K.

A map fulfilling all these conditions except for definiteness is called a semi-norm.

If V is an algebra, the pair (V, ‖ · ‖) is called a normed algebra if

‖xy‖ ≤ ‖x‖ ‖y‖

for every x, y ∈ V.

Remark 1.5. The geometric interpretation of ‖x‖ is the “length” of the element x ∈ V. It also allows us to define the distance of two elements x, y ∈ V as ‖x − y‖. Armed with this distance, we can now define the usual topological concepts:

(i) a sequence (xn) ⊂ V converges to x ∈ V if and only if for every ε > 0 there is N ∈ N such that ‖xn − x‖ < ε for every n ≥ N, which holds if and only if ‖xn − x‖ converges to 0,

(ii) the open ball of radius r ∈ (0,∞) around x is Br(x) = {y ∈ V | ‖y − x‖ < r},

(iii) A ⊂ V is open if and only if for every x ∈ A there is r ∈ (0,∞) such that Br(x) ⊂ A,

(iv) A ⊂ V is closed if and only if V \ A is open, which holds if and only if the limit of every convergent sequence (xn) ⊂ A is in A,

(v) a map f from (V, ‖ · ‖V) to (W, ‖ · ‖W) is continuous in x ∈ V if and only if for every ε > 0 there is δ > 0 such that ‖y − x‖V < δ implies ‖f(y) − f(x)‖W < ε; f is continuous everywhere if and only if f^{−1}(A) is open for every open A ⊂ W,

(vi) the closure A̅ of a set A ⊂ V is the smallest closed set containing A.

Example 1.6. (i) V = Kⁿ with any of the following norms for x = (x1, . . . , xn):

‖x‖1 = |x1| + · · · + |xn|

‖x‖2 = √(|x1|² + · · · + |xn|²)

‖x‖p = (|x1|^p + · · · + |xn|^p)^{1/p} for p ∈ [1,∞)

‖x‖∞ = sup {|x1| , . . . , |xn|}

In the case n = 1, V is a normed algebra.

(ii)

l1 := {(xn)_{n∈N} ⊂ K | ∑_{n=1}^∞ |xn| < ∞}

with

‖(xn)‖1 = ∑_{n=1}^∞ |xn|

(iii)

l2 := {(xn) ⊂ K | ∑_{n=1}^∞ |xn|² < ∞}

with

‖(xn)‖2 = √(∑_{n=1}^∞ |xn|²)

(iv)

l∞ := {(xn) ⊂ K | there is K ∈ [0,∞) such that |xn| ≤ K for all n ∈ N}

with

‖(xn)‖∞ = sup_{n∈N} |xn|

(normed algebra)

(v)

c0 := {(xn) ⊂ K | lim_{n→∞} xn = 0}

with

‖(xn)‖∞ = sup_{n∈N} |xn|

(normed algebra)

(vi) A a set,

B(A) = {f : A → K | f bounded}

with

‖f‖∞ := sup_{x∈A} |f(x)|

(normed algebra)

(vii) A ⊂ Rⁿ (or more generally any topological space),

C(A) = {f : A → K | f bounded and continuous}

with

‖f‖∞ := sup_{x∈A} |f(x)|

(normed algebra)

For each of these examples, the validity of the norm axioms needs to be verified (exercise). There are a number of other norms we could have considered on the spaces defined above (for example ‖ · ‖∞ on l2); however, the particular norms chosen are the “usual” ones, and we will generally not mention them explicitly. Thus if we talk of l2, we will use the norm ‖ · ‖2 if nothing else is mentioned.
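The norms from example 1.6 (i) are easy to experiment with numerically. The following sketch (plain Python, illustrative only and not part of the script) computes ‖x‖p for a few values of p and checks the triangle inequality as well as the ordering ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 on Rⁿ.

```python
# Illustrative sketch: the p-norms from example 1.6 on R^n.
def p_norm(x, p):
    """Compute ||x||_p for p in [1, inf); p = float('inf') gives the sup norm."""
    if p == float('inf'):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
y = [1.0, 2.0, -2.0]

norm1 = p_norm(x, 1)                 # |3| + |-4| + |1| = 8
norm2 = p_norm(x, 2)                 # sqrt(9 + 16 + 1) = sqrt(26)
norm_inf = p_norm(x, float('inf'))   # max(3, 4, 1) = 4

# Triangle inequality ||x + y|| <= ||x|| + ||y|| for each of the norms:
z = [a + b for a, b in zip(x, y)]
triangle_ok = all(
    p_norm(z, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    for p in (1, 2, float('inf'))
)

# The usual ordering on K^n: ||x||_inf <= ||x||_2 <= ||x||_1.
ordering_ok = norm_inf <= norm2 <= norm1
```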

Remark 1.7. Let V be a vector space and ‖ · ‖ a semi-norm on V. Then the set

V0 := {v ∈ V | ‖v‖ = 0}

is called the kernel of ‖ · ‖. It is a vector subspace of V (exercise). The quotient space is the set

W = V/V0 := {v + V0 | v ∈ V}.

Note that v1 + V0 = v2 + V0 if and only if v1 − v2 ∈ V0. The space W inherits a natural vector space structure from V by defining

(v1 + V0) + (v2 + V0) := (v1 + v2) + V0

λ(v1 + V0) := λv1 + V0

for every v1, v2 ∈ V and λ ∈ K (check that these are well defined and fulfill the vector space axioms). The neutral element in this space is then precisely the set V0 = 0 + V0. Now we can define

‖v + V0‖ := ‖v‖

(we should use a new symbol on the left-hand side, but we slightly abuse notation here). The equality v1 + V0 = v2 + V0 implies v1 − v2 ∈ V0 and hence ‖v1‖ ≤ ‖v2‖ + ‖v2 − v1‖ = ‖v2‖. Exchanging v1 and v2 yields ‖v1‖ = ‖v2‖, so the map is actually well defined. It inherits the semi-norm properties from the initial semi-norm, but it is a norm on W, since ‖v + V0‖ = ‖v‖ = 0 if and only if v ∈ V0, if and only if v + V0 = V0 is the neutral element of W.

Example 1.8. Let λ be the Lebesgue measure on Rⁿ and A ⊂ Rⁿ measurable. Define the space of (Lebesgue) integrable functions

𝓛¹(A) := {f : A → K | f is measurable and ∫_A |f| dλ < ∞}

and

‖f‖1 := ∫_A |f| dλ.

For f, g ∈ 𝓛¹(A) and κ ∈ K we have

‖f + κg‖1 = ∫_A |f + κg| dλ ≤ ∫_A (|f| + |κ||g|) dλ = ‖f‖1 + |κ| ‖g‖1 < ∞,

which proves the triangle inequality for ‖ · ‖1 and that 𝓛¹(A) is a vector space. On the other hand, there are obviously non-zero functions f such that ∫_A |f| dλ = 0. So ‖ · ‖1 is only a semi-norm.

By quotienting out the kernel of ‖ · ‖1 according to the procedure described above, we obtain the space of integrable functions L¹(A) (the script letter denotes the space before taking the quotient). Note that the elements of this vector space are not functions any more but cosets, i.e. equivalence classes of functions. Often we do not have to care, but there are situations where one needs to be careful. The most important thing to keep in mind is that the elements of L¹(A) cannot be evaluated at a single point x ∈ A any more. This is due to the fact that the equivalence class any f ∈ L¹(A) represents contains functions with arbitrary values at the point x.

Remark 1.9. The construction above can be done for any measure space (A,µ). Thespace l1 is a special case where A = N and µ is the counting measure.

Definition 1.10. Let (V, ‖ · ‖) be a normed space. A sequence (vn) ⊂ V is called a Cauchy sequence if for every ε > 0 there is N ∈ N such that for every n, m ≥ N

‖vn − vm‖ < ε.

The space (V, ‖ · ‖) is called complete if every Cauchy sequence converges. A complete normed vector space is called a Banach space. A complete normed algebra is called a Banach algebra.

Proposition 1.11. The normed space (V, ‖ · ‖) is complete if and only if every absolutely convergent series ∑_{k=1}^∞ vk in V is convergent. A series is called absolutely convergent if ∑_{k=1}^∞ ‖vk‖ < ∞.

Proof. Let (V, ‖ · ‖) be complete and ∑_{k=1}^∞ vk an absolutely convergent series. Let (sn) be the sequence of partial sums, i.e.

sn = ∑_{k=1}^n vk.

For any ε > 0, there is N ∈ N such that

∑_{k=N}^∞ ‖vk‖ < ε.

For every m > n ≥ N we have

‖sn − sm‖ = ‖∑_{k=n+1}^m vk‖ ≤ ∑_{k=n+1}^m ‖vk‖ ≤ ∑_{k=N}^∞ ‖vk‖ < ε.

Thus the sequence (sn) is a Cauchy sequence, and since the space is complete, it converges.

Suppose now that every absolutely convergent series in (V, ‖ · ‖) converges, and let (vn) ⊂ V be a Cauchy sequence. Then for each k there is Nk such that for every n, m ≥ Nk

‖vn − vm‖ < 2^{−k}.

Without loss of generality, we can choose the sequence (Nk) to be strictly increasing. Define w1 = v_{N_1} and wk = v_{N_k} − v_{N_{k−1}} for k > 1. One easily finds

∑_{l=1}^k wl = v_{N_k}. (1.1)

Moreover

∑_{l=1}^∞ ‖wl‖ = ‖v_{N_1}‖ + ∑_{l=2}^∞ ‖v_{N_l} − v_{N_{l−1}}‖ < ‖v_{N_1}‖ + ∑_{l=2}^∞ 2^{−(l−1)} < ∞,

so the series ∑_{l=1}^∞ wl is absolutely convergent and hence convergent. Thus (v_{N_k}) is a convergent subsequence of (vn). Any Cauchy sequence with a convergent subsequence is convergent (exercise).

Remark 1.12. All the example spaces from example 1.6 and example 1.8 are complete.We give the proof for one example below and for another in proposition 1.38.

Proposition 1.13. Let A ⊂ Rⁿ. The space C(A) of bounded continuous functions on A is complete (with respect to the norm ‖f‖ = sup_{x∈A} |f(x)|).

Proof. For every function f ∈ C(A) and every x ∈ A, we have |f(x)| ≤ sup_{x∈A} |f(x)| = ‖f‖. So let (fn) be a Cauchy sequence in C(A). Fix x ∈ A. For every ε > 0, there is N ∈ N such that for m, n ≥ N, ‖fn − fm‖ < ε, so in particular

|fn(x) − fm(x)| = |(fn − fm)(x)| ≤ ‖fn − fm‖ < ε.

This shows that (fn(x)) is a Cauchy sequence in K, and since K is complete, it converges. The point x was arbitrary, so the sequence of functions (fn) converges pointwise to some limit function f.

There is K > 0 such that ‖fn‖ ≤ K for every n ∈ N, since (fn) is a Cauchy sequence (exercise). Thus the sequence (fn(x)) is bounded by K independently of x, and so the limit function f is bounded by K as well.

We show next that fn → f with respect to the norm. So let ε > 0 and choose N ∈ N such that ‖fn − fm‖ < ε for every n, m ≥ N. For every n ≥ N and x ∈ A, this implies |fn(x) − fm(x)| < ε for all m ≥ N and thus

|fn(x) − f(x)| = lim_{m→∞} |fn(x) − fm(x)| ≤ ε.

Taking the supremum with respect to x yields ‖fn − f‖ ≤ ε for every n ≥ N, which proves the sought-for convergence.

Finally, we show that f is continuous. Choose x ∈ A and ε > 0. There is n ∈ N such that ‖fn − f‖ < ε/3. Since fn is continuous, there is δ > 0 such that |fn(x) − fn(y)| < ε/3 for any y ∈ A fulfilling ‖x − y‖ < δ (note that the last norm is the one on Rⁿ). For any such y we find

|f(x) − f(y)| ≤ |f(x) − fn(x)| + |fn(x) − fn(y)| + |fn(y) − f(y)| ≤ 2‖f − fn‖ + |fn(x) − fn(y)| < ε.

This proves the continuity of f in x, and since the latter point was arbitrary, f is continuous.
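Convergence in the sup norm is exactly uniform convergence, and proposition 1.13 says uniform limits of continuous functions stay continuous. A minimal numerical sketch (illustrative only, not part of the script): the partial sums of the exponential series form a Cauchy sequence in (C([0, 1]), ‖ · ‖∞), and their sup-norm distance to the limit exp shrinks as the degree grows. The sup norm is approximated here by a maximum over a sample grid.

```python
import math

# Illustrative sketch for proposition 1.13: the partial sums
# f_n(x) = sum_{k=0}^{n} x^k / k!  converge uniformly to exp on [0, 1].
grid = [i / 200.0 for i in range(201)]  # sample points in [0, 1]

def f(n, x):
    """Degree-n Taylor partial sum of exp at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def sup_dist(n):
    """Approximate sup-norm distance ||f_n - exp||_inf via the grid."""
    return max(abs(f(n, x) - math.exp(x)) for x in grid)

errors = [sup_dist(n) for n in (2, 4, 8)]
# The sup-norm error decreases as n grows: uniform convergence.
decreasing = errors[0] > errors[1] > errors[2]
```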

Definition 1.14. Let (V, ‖ · ‖) be a normed vector space. A set A ⊂ V is called compact if for any open cover (Oi)_{i∈I} of A, i.e. a family of open sets such that A ⊂ ⋃_{i∈I} Oi, there is a finite subcover, i.e. there exist n ∈ N and i1, . . . , in ∈ I such that A ⊂ ⋃_{k=1}^n O_{ik}.

Proposition 1.15. Let (V, ‖ · ‖) be a normed vector space. Any compact set A ⊂ V is closed and bounded.

Proof. For boundedness, consider the cover B1(0), B2(0), . . .; a finite subcover yields A ⊂ B_N(0) for some N ∈ N.

For closedness, suppose x is a boundary point of A. For r > 0 let

Kr(x) := {y ∈ V | ‖y − x‖ ≤ r}.

For n ∈ N, consider the open sets

On := V \ K_{1/n}(x).

Since x is a boundary point, every ball around x contains points of A, so no finite collection of the On covers A. By compactness, the family (On)_{n∈N} therefore cannot cover A at all, i.e. the union

⋃_{n∈N} On = V \ {x}

does not contain A, so x ∈ A. Since x was an arbitrary boundary point, A is closed.

Remark 1.16. The Heine-Borel theorem shows that in Kⁿ any closed and bounded set is compact. This is not true in infinite dimensions: a normed vector space (V, ‖ · ‖) is finite dimensional if and only if K1(0) is compact.

Proposition 1.17. Closed subsets of compact sets are compact.

Proof. Exercise.


Proposition 1.18. Let (V, ‖ · ‖) be a normed vector space and A ⊂ V . Then A iscompact if and only if every sequence in A has a subsequence that converges in A.

Proof. We only show the direction “ =⇒ ”. A sequence (xn) ⊂ A has a subsequenceconverging in A if and only if there is x ∈ A such that for every ε > 0 the ball Bε(x)contains infinitely many sequence elements (exercise). Suppose there is a sequence (xn)without a convergent subsequence, i.e. for every y ∈ A, there is εy > 0, such thatOy := Bεy(y) contains only finitely many of the (xn). Obviously we have

A ⊂⋃y∈A

Oy

so by compactness, there is a finite subcover Oy1 , . . . , Oyl . But then A contains onlyfinitely many of the (xn) which is a contradiction.

Definition 1.19. Let (V, ‖ · ‖) be a normed vector space. A set A ⊂ V is dense in V if and only if A̅ = V.

Proposition 1.20. Let (V, ‖ · ‖) be a normed vector space and A ⊂ V. The following are equivalent.

(i) A is dense in V ,

(ii) for every v ∈ V and every ε > 0, there is w ∈ A such that ‖v − w‖ < ε,

(iii) for every v ∈ V , there is a sequence (vn) ⊂ A that converges to v.

Proof. Exercise.

Example 1.21. The set of finite sequences

cc := {(xn) ⊂ K | there is N ∈ N such that xn = 0 for n > N}

is a dense subspace of c0, l1 and l2 (with respect to the respective norms). We prove the statement for l1. Let (xn) ∈ l1 and ε > 0. By assumption,

‖(xn)‖1 = ∑_{n=1}^∞ |xn| < ∞,

so there is N ∈ N such that ∑_{n=N}^∞ |xn| < ε. Define (yn) ⊂ cc by yn := xn for n < N and yn := 0 for n ≥ N. Then

‖(xn) − (yn)‖1 = ∑_{n=N}^∞ |xn| < ε

holds, which by the preceding proposition proves the density.

Note that in finite dimensions subspaces are always closed, so the only dense subspace of Rⁿ is Rⁿ itself.

The space cc is not dense in l∞, since the constant sequence xn = 1 for all n ∈ N has ‖ · ‖∞-distance at least 1 to any element of cc.
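The truncation argument of example 1.21 can be watched numerically. The following sketch (illustrative only) uses the l¹ sequence xn = 2^{−n}: the l¹ distance to the truncation at index N is the tail sum 2^{−(N−1)}, which goes to 0, whereas in l∞ the constant sequence (1, 1, . . .) keeps sup-distance 1 to every finite sequence.

```python
# Illustrative sketch for example 1.21: truncating the l^1 sequence
# x_n = 2^{-n} (n = 1, 2, ...), which has ||x||_1 = 1. The truncation y
# keeps only x_1, ..., x_{N-1} and is a finite sequence in c_c.
def l1_tail(N):
    """||x - y||_1 = sum_{n >= N} 2^{-n}, computed numerically up to n = 59."""
    return sum(2.0 ** (-n) for n in range(N, 60))

Ns = (5, 10, 20)
tails = [l1_tail(N) for N in Ns]

# The tail equals 2^{-(N-1)} (geometric series), so the l^1 distance to
# the truncation goes to 0: c_c is dense in l^1.
dense_ok = all(abs(tails[i] - 2.0 ** (-(N - 1))) < 1e-12
               for i, N in enumerate(Ns))
```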

Remark 1.22. Dense subsets are important because they can be substantially easier to deal with than the entire space V. Many proofs in functional analysis first establish some property on a dense subset and then show that the property is preserved under norm limits, thereby extending it to the entire space.

Theorem 1.23 (Stone-Weierstraß). Let K ⊂ Rⁿ be compact (more generally, any compact topological space). Let A ⊂ C(K) be a subalgebra fulfilling

(i) A contains the constant functions,

(ii) for any two distinct points x, y ∈ K, there is f ∈ A such that f(x) ≠ f(y) (we say that A separates the points of K),

(iii) A is closed under complex conjugation (void if the base field is R).

Then A is dense in C(K).

Remark 1.24. The function t ↦ √(1 + t) is analytic at 0 with the series expansion

∑_{n=0}^∞ an t^n,

where a0 = 1 and

an = (½ choose n) := ½(½ − 1) · · · (½ − n + 1) / n!.

The series converges to √(1 + t) for t ∈ [−1, 1).

Without proof.

Lemma 1.25. Let B be a closed (with respect to ‖ · ‖∞) subalgebra of C(K). Then for non-negative f ∈ B we have √f ∈ B. For every g, h ∈ B, the functions |g|, max{g, h} and min{g, h} are in B.

Proof. We prove the real case, to which the complex case can be reduced (exercise). Let f ∈ B be non-negative. Since B is an algebra, it contains √f if and only if it contains √(λf) for any λ > 0. Thus we may assume that ‖f‖ = 1, i.e. 0 ≤ f ≤ 1. The function g = 1 − f also fulfills 0 ≤ g ≤ 1. Then, using the previous remark, we obtain

∑_{n=0}^∞ |an| ‖g^n‖ ≤ ∑_{n=0}^∞ |an| ‖g‖^n ≤ ∑_{n=0}^∞ |an| = 2a0 + ∑_{n=0}^∞ an(−1)^{n+1} = 2,

i.e. the series ∑_{n=0}^∞ an g^n converges absolutely, and since C(K) is complete, it converges. Since B is closed, the limit function

√f = √(1 − g) = ∑_{n=0}^∞ an g^n

is in B.

This also proves that |h| = √(h²) ∈ B for any h ∈ B. Finally,

max{h, g} = ½(h + g + |h − g|) and min{h, g} = ½(h + g − |h − g|)

prove the remaining assertions.

Iterating the above result, we see that B also contains the max and min over finitely many functions from B.

Proof of theorem 1.23. We prove the real case. Let x, y ∈ K be distinct points and a, b ∈ R. By assumption, there is g ∈ A such that g(x) ≠ g(y). Then the function

g̃ := a + [(b − a)/(g(y) − g(x))] (g − g(x))

is in A and fulfills g̃(x) = a and g̃(y) = b.

Note that the closure A̅ is also a subalgebra (exercise), to which we can apply the previous lemma.

Now choose f ∈ C(K) and ε > 0. For each pair x, y ∈ K, there is a function g_{x,y} ∈ A such that g_{x,y}(x) = f(x) and g_{x,y}(y) = f(y). For fixed x ∈ K, the set

Vy := {z ∈ K | g_{x,y}(z) < f(z) + ε} = (g_{x,y} − f)^{−1}((−∞, ε))

is open (as the inverse image of an open interval under a continuous map) and contains y. So the sets (Vy)_{y∈K} form an open cover of K, and by compactness there is a finite subcover V_{y1}, . . . , V_{yn}. Set

hx := min{g_{x,y1}, . . . , g_{x,yn}} ∈ A̅.

Then hx(x) = f(x) and hx < f + ε by the definition of the Vy. Now define

Ux := {z ∈ K | hx(z) > f(z) − ε},

which is open and contains x. By analogous arguments, the sets (Ux)_{x∈K} form an open cover of K, and thus there is a finite subcover U_{x1}, . . . , U_{xk}. Then the function

h := max{h_{x1}, . . . , h_{xk}} ∈ A̅

fulfills f − ε < h < f + ε, i.e. |f − h| < ε, i.e. ‖f − h‖ < ε. Since ε was arbitrary, f is a limit point of A̅, and since the latter is closed, we find f ∈ A̅. Thus A is dense in C(K).


Example 1.26. Let K ⊂ Rⁿ be compact. Then the algebra of polynomials (in n variables) is dense in C(K).

Proof. The polynomials form an algebra of continuous functions containing the constants. Let a = (a1, . . . , an) and b = (b1, . . . , bn) be distinct points of K, i.e. there is i ∈ {1, . . . , n} such that ai ≠ bi. Then the polynomial (x1, . . . , xn) ↦ xi − ai vanishes at a but not at b, so the polynomials separate the points of K.
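Example 1.26 can be made constructive. The following sketch (illustrative only) uses Bernstein polynomials, a classical explicit proof of the Weierstrass theorem on K = [0, 1], which is a different argument from the Stone-Weierstraß proof above: the degree-n Bernstein polynomial of a continuous f converges uniformly to f. We approximate the non-smooth function f(x) = |x − ½|.

```python
from math import comb

# Illustrative sketch for example 1.26: Bernstein polynomial
# approximation on K = [0, 1] of the non-smooth f(x) = |x - 1/2|.
def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def f(x):
    return abs(x - 0.5)

grid = [i / 100.0 for i in range(101)]

def sup_error(n):
    """Approximate sup-norm error ||B_n(f) - f||_inf on the grid."""
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

errors = [sup_error(n) for n in (4, 16, 64)]
# Uniform error shrinks (roughly like 1/sqrt(n) at the kink).
improving = errors[0] > errors[1] > errors[2]
```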

Definition 1.27. A normed space (V, ‖ · ‖) is called separable, if it admits a countable,dense subset.

Proposition 1.28. A normed space (V, ‖ · ‖) is separable if and only if there is a countable set A such that

span A = {α1a1 + . . . + αkak | k ∈ N, a1, . . . , ak ∈ A, α1, . . . , αk ∈ K}

is dense.

Proof. Let A ⊂ V be a countable dense subset; then span A ⊃ A is dense as well.

We show the other implication for K = R; for K = C, replace Q by Q + iQ below. So let A ⊂ V be countable such that span A is dense. The set

span_Q A = {α1a1 + . . . + αkak | k ∈ N, a1, . . . , ak ∈ A, α1, . . . , αk ∈ Q}
         = ⋃_{k∈N} ⋃_{a1∈A} · · · ⋃_{ak∈A} ⋃_{α1∈Q} · · · ⋃_{αk∈Q} {α1a1 + . . . + αkak}

is countable (a countable union of countable sets). It is moreover dense in span A (exercise) and, since the latter is dense in V, also dense in V.

Example 1.29. By the preceding proposition, example 1.21 and example 1.26, the spaces c0, l1, l2 and C(K) for compact K ⊂ Rⁿ are separable.

The space l∞ is not separable. To see this, define the characteristic function (in this case sequence) 1_A of A ⊂ N by

(1_A)_n = 1 if n ∈ A and (1_A)_n = 0 if n ∉ A.

The set of all such characteristic sequences is uncountable (it is in bijection with P(N)), and for distinct A, B ⊂ N we have ‖1_A − 1_B‖∞ = 1. Suppose F ⊂ l∞ is dense. Then for each A ⊂ N there is f_A ∈ F such that ‖f_A − 1_A‖∞ < ½. By the triangle inequality, the elements f_A and f_B cannot coincide for distinct A, B ⊂ N, hence F must contain at least as many elements as P(N); in particular, it cannot be countable.

Remark 1.30. Any normed vector space V can be densely embedded into a larger, complete normed vector space. The latter space is essentially unique and is called the completion of V.

Without proof.


1.2 Hilbert Spaces

Definition 1.31. Let V be a vector space. A map 〈 · | · 〉 : V × V → K is called an inner product if

(i) 〈v|w + λz〉 = 〈v|w〉 + λ〈v|z〉 (linearity in the second argument),

(ii) 〈v|w〉 is the complex conjugate of 〈w|v〉 (conjugate symmetry),

(iii) 〈v|v〉 ≥ 0 (positivity),

(iv) 〈v|v〉 = 0 if and only if v = 0 (definiteness/non-degeneracy)

for any v, w, z ∈ V and λ ∈ K. The pair (V, 〈 · | · 〉) is called an inner product space. We will also write “let V be an inner product space” without explicitly mentioning the inner product. We define ‖v‖ := √〈v|v〉 (we show below that this is actually a norm).

Remark 1.32. From the first two conditions we obtain conjugate linearity in the first argument, i.e.

〈v + λw|z〉 = 〈v|z〉 + λ̄〈w|z〉.

The two linearity properties together are called sesquilinearity (for K = C) or bilinearity (for K = R). The sesquilinearity also implies that 〈v|w〉 vanishes whenever either v or w is zero.

Proposition 1.33 (Cauchy-Schwarz inequality). Let (V, 〈 · | · 〉) be an inner product space. Then for any v, w ∈ V we have

|〈v|w〉| ≤ ‖v‖ ‖w‖.

Equality holds in the above relation if and only if v and w are linearly dependent.

Proof. Both assertions are trivial if w = 0, so assume w ≠ 0. From the sesquilinearity we have

0 ≤ 〈v − λw|v − λw〉 = ‖v‖² − λ〈v|w〉 − λ̄〈w|v〉 + λλ̄ ‖w‖²

for any λ ∈ K. Substituting 〈w|v〉/‖w‖² for λ yields

0 ≤ ‖v‖² − |〈v|w〉|²/‖w‖² − |〈v|w〉|²/‖w‖² + |〈v|w〉|²/‖w‖² = ‖v‖² − |〈v|w〉|²/‖w‖²,

which is the desired inequality.

Equality holds if and only if 0 = ‖v − λw‖², which by definiteness implies v − λw = 0.

Remark 1.34. Note that definiteness is not used in the proof of the inequality.
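The inequality is easy to test numerically. A minimal sketch (illustrative only) in C³ with the inner product 〈v|w〉 = ∑ v̄i wi, which is linear in the second argument as in definition 1.31; the equality case is produced by taking w to be a scalar multiple of v.

```python
# Illustrative sketch: Cauchy-Schwarz in C^3 with <v|w> = sum conj(v_i) w_i
# (linear in the second argument, matching definition 1.31).
def inner(v, w):
    return sum(a.conjugate() * b for a, b in zip(v, w))

def norm(v):
    # <v|v> is real and non-negative.
    return inner(v, v).real ** 0.5

v = [1 + 2j, 0.5, -1j]
w = [2 - 1j, 1j, 3.0]

lhs = abs(inner(v, w))
rhs = norm(v) * norm(w)
cs_ok = lhs <= rhs + 1e-12          # the inequality |<v|w>| <= ||v|| ||w||

# Equality iff linearly dependent: w2 = lam * v gives |<v|w2>| = ||v|| ||w2||.
lam = 2 - 3j
w2 = [lam * a for a in v]
eq_gap = abs(abs(inner(v, w2)) - norm(v) * norm(w2))
```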


Proposition 1.35. The map v ↦ ‖v‖ is a norm on V. It is called the norm induced by the inner product.

Proof. We show the triangle inequality:

‖v + w‖² = 〈v + w|v + w〉 = ‖v‖² + 〈w|v〉 + 〈v|w〉 + ‖w‖²
 = ‖v‖² + ‖w‖² + 2 Re〈v|w〉 ≤ ‖v‖² + ‖w‖² + 2|〈v|w〉|
 ≤ ‖v‖² + ‖w‖² + 2‖v‖‖w‖ = (‖v‖ + ‖w‖)².

The other norm properties are immediate.

Example 1.36. (i) Kⁿ with

〈(x1, . . . , xn)|(y1, . . . , yn)〉 = x̄1 y1 + · · · + x̄n yn,

inducing

‖(x1, . . . , xn)‖ = √(|x1|² + . . . + |xn|²).

(ii)

l2 = {(xn) | ∑_{n∈N} |xn|² < ∞}

with

〈(xn)|(yn)〉 = ∑_{n∈N} x̄n yn. (1.2)

We have to show that the series actually converges. By the definition of l2, ‖(xn)‖2 is finite for any (xn) ∈ l2. We use the Cauchy-Schwarz inequality on K^N to obtain

∑_{n=1}^N |xn||yn| ≤ √(|x1|² + · · · + |xN|²) √(|y1|² + · · · + |yN|²) ≤ √(∑_{n∈N} |xn|²) √(∑_{n∈N} |yn|²) = ‖(xn)‖ ‖(yn)‖.

Since the right-hand side is independent of N, this shows that the series in (1.2) is absolutely convergent, and since K is complete, this implies convergence. The induced norm coincides with the norm introduced in example 1.6.

(iii) Let A ⊂ Rⁿ be a measurable set and define

𝓛²(A) := {f : A → K | f measurable with ∫_A |f|² dλ < ∞}

and

〈f|g〉 = ∫_A f̄ g dλ.

Again we need to show that this is well defined. In order to do that, apply the arithmetic-geometric mean inequality (pointwise) to the functions |f|² and |g|²:

|f̄g| = √(|f|² |g|²) ≤ |f|²/2 + |g|²/2,

and integrate both sides:

∫_A |f̄g| dλ ≤ ½ (∫_A |f|² dλ + ∫_A |g|² dλ) < ∞.

This shows that f̄g is integrable and thus the inner product is well defined. The map 〈f|g〉 fulfills the properties of an inner product (exercise) except for definiteness.

Similarly to example 1.8, define the subspace

𝓛²₀(A) := {f ∈ 𝓛²(A) | ‖f‖ = 0}.

Since 〈 · | · 〉 fulfills the Cauchy-Schwarz inequality |〈f|g〉| ≤ ‖f‖‖g‖, we have 〈f|g〉 = 0 if at least one of f or g is in 𝓛²₀(A). On

L²(A) := 𝓛²(A)/𝓛²₀(A)

we can define an inner product

〈f + 𝓛²₀(A)|g + 𝓛²₀(A)〉 = 〈f|g〉.

Again we should prove that this is well defined, so let f̃, g̃ be two other functions representing the classes f + 𝓛²₀(A) and g + 𝓛²₀(A) respectively, i.e. f − f̃, g − g̃ ∈ 𝓛²₀(A). Then

〈f|g〉 − 〈f̃|g̃〉 = 〈f|g − g̃〉 + 〈f − f̃|g̃〉 = 0,

so the definition does not depend on the representing function. The new inner product inherits the inner product properties from the old one, but it is also definite, since

0 = 〈f + 𝓛²₀(A)|f + 𝓛²₀(A)〉 = 〈f|f〉 = ‖f‖²

implies f ∈ 𝓛²₀(A).

The space L²(A) is called the space of square integrable functions over A, and it is the Hilbert space appearing in most models of quantum mechanics. We will from now on write f for elements of L²(A) even though the elements are strictly speaking not functions but classes of functions. The comments from example 1.8 about point evaluation of these “functions” apply. The construction above can be done for any measure space.


Definition 1.37. An inner product space H is called a Hilbert space if it is complete with respect to the induced norm.

Proposition 1.38. The space of square integrable functions L²(A) is a Hilbert space.

Proof. By proposition 1.11 it suffices to prove that any absolutely convergent series is convergent. Thus let (fn) ⊂ L²(A) be such that ∑_{n=1}^∞ ‖fn‖ = M < ∞. We fix actual functions representing the fn, which we will also denote by fn. Define

sn := ∑_{k=1}^n |fk|.

By the triangle inequality we have

‖sn‖ ≤ ∑_{k=1}^n ‖|fk|‖ = ∑_{k=1}^n ‖fk‖ ≤ M,

i.e. ∫_A sn² dλ ≤ M².

For any fixed x ∈ A, the sequence sn(x) is monotonically increasing and hence converges to some s(x) ∈ [0,∞]. By the monotone convergence theorem we have

∫_A s² dλ = ∫_A lim_{n→∞} sn² dλ = lim_{n→∞} ∫_A sn² dλ ≤ M²,

which shows that s is square integrable and, in particular, that s(x) < ∞ for almost all x ∈ A. Now define

gn(x) := ∑_{k=1}^n fk(x).

Since the series ∑_{k=1}^∞ fk(x) is absolutely convergent for almost every x ∈ A, the functions gn converge almost everywhere to some function g. By the triangle inequality

|gn| ≤ ∑_{k=1}^n |fk| = sn ≤ s,

which implies in particular |g| ≤ s and shows that gn and g are square integrable. Moreover,

|gn − g|² ≤ (|gn| + |g|)² ≤ 4s²

is an integrable bound, and thus by Lebesgue’s theorem (dominated convergence)

lim_{n→∞} ‖gn − g‖² = lim_{n→∞} ∫_A |gn − g|² dλ = ∫_A lim_{n→∞} |gn − g|² dλ = 0.

This shows that the series ∑_{n=1}^∞ fn converges with respect to the L²-norm.


Remark 1.39. The other inner product spaces we have seen so far, i.e. Kⁿ with the usual scalar product and l2, are Hilbert spaces as well.

Remark 1.40. It is easy to verify (exercise) that in an inner product space the parallelogram identity holds for all v, w ∈ V:

‖v + w‖² + ‖v − w‖² = 2‖v‖² + 2‖w‖².

In fact, a norm is induced by an inner product if and only if it fulfills the above identity, in which case the inner product can be reconstructed using the polarization identity.

Applying the parallelogram identity to v − u, w − u yields the alternative form

‖v − w‖² = 2‖v − u‖² + 2‖w − u‖² − 4‖½(v + w) − u‖². (1.3)
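The characterization in remark 1.40 can be tested numerically: the euclidean norm on R² (which comes from an inner product) satisfies the parallelogram identity, while the 1-norm violates it and is therefore induced by no inner product. A minimal sketch, illustrative only:

```python
# Illustrative sketch for remark 1.40: the parallelogram identity holds
# for the euclidean norm but fails for the 1-norm on R^2.
def norm2(v):
    return sum(t * t for t in v) ** 0.5

def norm1(v):
    return sum(abs(t) for t in v)

def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2; zero for inner product norms."""
    add = [a + b for a, b in zip(v, w)]
    sub = [a - b for a, b in zip(v, w)]
    return norm(add) ** 2 + norm(sub) ** 2 - 2 * norm(v) ** 2 - 2 * norm(w) ** 2

v, w = [1.0, 0.0], [0.0, 1.0]
defect_euclidean = parallelogram_defect(norm2, v, w)  # 0 up to rounding
defect_one_norm = parallelogram_defect(norm1, v, w)   # 4 + 4 - 2 - 2 = 4
```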

Definition 1.41. Let V be a vector space. A subset A ⊂ V is called convex if for any v, w ∈ A the line segment connecting them,

{λv + (1 − λ)w | λ ∈ [0, 1]},

is contained in A.

Definition 1.42. Let (V, ‖ · ‖) be a normed vector space, A ⊂ V and x ∈ V . We define

dist(x, A) = inf {‖x − y‖ | y ∈ A}.

Remark 1.43. In general, there is no element actually minimizing the distance, as is witnessed by the example A = (0, 1), x = 2. In infinite dimensional Banach spaces, this can arise even if A is closed.

However, dist(x, A) = 0 implies the existence of a sequence (yn) ⊂ A such that ‖yn − x‖ → 0, i.e. yn → x. Thus if A is closed, dist(x, A) = 0 implies x ∈ A.

Theorem 1.44 (Hilbert Projection Theorem). Let H be a Hilbert space, A ⊂ H convex and closed, and x ∈ H. Then there is a unique element y ∈ A that realizes the minimal distance, i.e.

dist(x, A) = ‖x − y‖.

Proof. By the definition of d := dist(x, A), there is a sequence (yn) ⊂ A such that ‖x − yn‖ → d. Hence for any ε > 0 there is N ∈ N such that ‖x − yn‖² ≤ d² + ε²/4 for n ≥ N. Apply (1.3) to get

‖ym − yn‖² = 2‖x − yn‖² + 2‖x − ym‖² − 4‖½(yn + ym) − x‖².

Since by convexity ½(yn + ym) ∈ A, we have ‖½(yn + ym) − x‖ ≥ d, so for n, m ≥ N we get the estimate

‖ym − yn‖² ≤ 2(d² + ε²/4) + 2(d² + ε²/4) − 4d² = ε².

Thus (yn) is a Cauchy sequence, and since H is complete, it converges to some y. Since A is closed, y is in A. By the continuity of the norm we get ‖x − y‖ = lim_{n→∞} ‖x − yn‖ = d.

Assume z ∈ A is another element such that ‖x − z‖ = d. Then, using (1.3) again, we get

‖y − z‖² = 2‖y − x‖² + 2‖z − x‖² − 4‖½(y + z) − x‖² ≤ 2d² + 2d² − 4d² = 0,

i.e. y = z.
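A minimal numerical sketch of theorem 1.44 (illustrative only): in R² with the euclidean norm, project a point onto the closed convex set A = [0, 1] × [0, 1]. For a coordinate box the unique minimizer happens to be obtained by clamping each coordinate, which makes the projection easy to compute and compare against other candidate points.

```python
# Illustrative sketch for theorem 1.44: projecting x in R^2 onto the
# closed convex set A = [0,1] x [0,1] (euclidean norm). For a box, the
# unique closest point is given by coordinatewise clamping.
def project_box(x):
    return [min(max(t, 0.0), 1.0) for t in x]

def dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

x = [2.0, -0.5]
y = project_box(x)   # the closest point of A to x
d = dist(x, y)       # = sqrt(1^2 + 0.5^2)

# A few other points of A are strictly farther away than the projection:
others = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
unique_min = all(dist(x, z) > d for z in others)
```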

Remark 1.45. In general Banach spaces, the corresponding theorem can fail. Uniqueness can fail even in finite dimensions, e.g. for A = {(α, 0) ∈ R² | α ∈ R} and x = (0, 1) with the maximum norm ‖(α, β)‖ = max{|α|, |β|}. In infinite dimensions, existence can fail even in the case where A is a closed subspace.

Definition 1.46. Let H be a Hilbert space. Two elements x, y ∈ H are called orthogonal if 〈x|y〉 = 0. For A ⊂ H,

A⊥ := {x ∈ H | 〈x|y〉 = 0 for every y ∈ A}

is called the orthogonal complement of A.

Proposition 1.47. Let H be a Hilbert space and A ⊂ B ⊂ H. Then A⊥ is a closed subspace and A⊥ ⊃ B⊥.

Proof. Exercise.

Proposition 1.48. Let V ⊂ H be a closed subspace. Then V and V⊥ are complementary, i.e. H = V ⊕ V⊥: for any x ∈ H there are unique x‖ ∈ V and x⊥ ∈ V⊥ such that x = x‖ + x⊥.

Proof. Let us show the uniqueness first. So assume we have x‖, y ∈ V and x⊥, z ∈ V⊥ such that x = x‖ + x⊥ = y + z. Then V ∋ x‖ − y = z − x⊥ ∈ V⊥, and from the definition of V⊥

‖z − x⊥‖² = 〈x‖ − y|z − x⊥〉 = 0.

By definiteness of the norm we get z − x⊥ = x‖ − y = 0, which is the required uniqueness.

The subspace V is in particular convex and closed, so we can apply theorem 1.44 to get the unique x‖ ∈ V such that dist(x, V) = ‖x − x‖‖. We define x⊥ := x − x‖, and it remains to show that x⊥ ∈ V⊥. For any y ∈ V, the function

t ↦ ‖x − x‖ − ty‖² = ‖x⊥‖² − 2t Re〈x⊥|y〉 + t² ‖y‖²

has a global minimum at t = 0 (since x‖ minimizes the distance from x to V). But the function is differentiable (even a polynomial in t), hence its first derivative at t = 0, which is −2 Re〈x⊥|y〉, must vanish. If K = R, this already shows that x⊥ ∈ V⊥. For K = C, consider also t ↦ ‖x − x‖ − ity‖²; an analogous argument shows Im〈x⊥|y〉 = 0.

Thus for any y ∈ V, 〈x⊥|y〉 = 0, hence x⊥ ∈ V⊥.

Corollary 1.49. For any subset A of a Hilbert space H, (A⊥)⊥ = cl span A, where cl denotes the closure; in other words, (A⊥)⊥ is the smallest closed subspace containing A. For a closed subspace V in particular, we obtain (V⊥)⊥ = V, and moreover (cl span A)⊥ = A⊥.

A subspace W ⊂ H is dense in H if and only if W⊥ = {0}.

Proof. Let y be in A. Then 〈y|x〉 = 0 for every x ∈ A⊥, hence y ∈ (A⊥)⊥. Thus A ⊂ (A⊥)⊥ and, since (A⊥)⊥ is a closed subspace (proposition 1.47) and cl span A is the smallest closed subspace containing A, also cl span A ⊂ (A⊥)⊥.

Now let x be in (A⊥)⊥. Applying the previous proposition to the closed subspace cl span A, we get x‖ ∈ cl span A and x⊥ ∈ (cl span A)⊥ ⊂ A⊥ such that x = x‖ + x⊥. We have

0 = 〈x|x⊥〉 = 〈x‖|x⊥〉 + 〈x⊥|x⊥〉 = 〈x⊥|x⊥〉,

which implies x⊥ = 0 and thus x = x‖ ∈ cl span A. Since x was arbitrary, this yields the inclusion (A⊥)⊥ ⊂ cl span A.

If V is already a closed subspace, then V = cl span V, so we get (V⊥)⊥ = V. Applying this to the closed subspace A⊥ yields

A⊥ = ((A⊥)⊥)⊥ = (cl span A)⊥.

Finally, note that {0}⊥ = H and H⊥ = {0}. So let W be dense; then cl W = H and thus W⊥ = (cl W)⊥ = H⊥ = {0}. If on the other hand W⊥ = {0}, then cl W = (W⊥)⊥ = H, i.e. W is dense.

Remark 1.50. Let H be a Hilbert space, V a closed subspace and x ∈ H. Then we can uniquely decompose x as x = x‖ + x⊥ with x‖ ∈ V and x⊥ ∈ V⊥. The vector x‖ is called the projection of x onto V. Since (V⊥)⊥ = V, x⊥ is the projection of x onto V⊥. Note that x = x‖ if and only if x ∈ V.

Remark 1.51. In the following we will often deal with sequences that can be finite or countably infinite. In order to treat both cases at once, the letter I will denote either of the index sets N or {1, . . . , N} for some N ∈ N. Some of the assertions are also true for more general index sets (in particular in the case of non-separable Hilbert spaces).

Definition 1.52. A family (en)_{n∈I} is called an orthogonal system if en and em are orthogonal whenever n ≠ m, and an orthonormal system (ONS) if in addition ‖en‖ = 1 for every n ∈ I.

An ONS is called an orthonormal basis (ONB) or complete orthonormal system if span{en | n ∈ I} is dense in H.


Example 1.53. The functions (ϕn)_{n∈Z}, defined by ϕn(x) = e^{2πinx}, form an orthonormal system in L²([0, 1]), since

〈ϕn|ϕm〉 = ∫_0^1 e^{−2πinx} e^{2πimx} dx = ∫_0^1 e^{2πi(m−n)x} dx = 1 if m = n, and 0 if m ≠ n.

These functions are actually an orthonormal basis. The proof of this fact is surprisingly non-trivial and we will not give it here.
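The orthonormality in example 1.53 is easy to check numerically. A minimal sketch (illustrative only): approximate the integrals by Riemann sums over N equidistant points; for the exponentials these sums are even exactly 0 or 1 (up to floating-point rounding) as long as |m − n| is not a multiple of N, by the geometric sum formula.

```python
import cmath

# Illustrative sketch for example 1.53: Riemann-sum approximation of
# <phi_n|phi_m> = int_0^1 exp(-2 pi i n x) exp(2 pi i m x) dx.
def riemann_inner(n, m, N=64):
    """Riemann sum of conj(phi_n) * phi_m over N equidistant points."""
    return sum(
        cmath.exp(-2j * cmath.pi * n * j / N) * cmath.exp(2j * cmath.pi * m * j / N)
        for j in range(N)
    ) / N

same = riemann_inner(3, 3)       # should be 1
different = riemann_inner(3, 5)  # should be 0 (geometric sum of a root of unity)
```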

Remark 1.54. Let (en)_{n∈I} be an orthonormal system and (αn)_{n∈I}, (βn)_{n∈I} ⊂ K sequences of coefficients. Then for any N ∈ I

〈∑_{n=1}^N αn en | ∑_{k=1}^N βk ek〉 = ∑_{n=1}^N ᾱn βn.

Since the scalar product is continuous, this equation still holds for N = ∞, provided the series ∑_{n=1}^∞ αn en and ∑_{n=1}^∞ βn en converge. In particular, the norm is given by

‖∑_{n∈I} αn en‖² = ∑_{n∈I} |αn|².

Proposition 1.55. Let (en)_{n∈N} be an orthonormal system and (αn)_{n∈N} ⊂ K. Then ∑_{n∈N} αn en converges if and only if ∑_{n∈N} |αn|² < ∞.

Proof. If ∑_{n∈N} αn en converges, then there is N ∈ N such that

1 > ‖∑_{n=1}^∞ αn en − ∑_{n=1}^N αn en‖² = ‖∑_{n=N+1}^∞ αn en‖² = ∑_{n=N+1}^∞ |αn|²,

which implies ∑_{n∈N} |αn|² < ∞.

If ∑_{n∈N} |αn|² < ∞ holds, then for every ε > 0 there is N ∈ N such that ∑_{n=N}^∞ |αn|² ≤ ε². So for M > K ≥ N,

‖∑_{n=1}^M αn en − ∑_{n=1}^K αn en‖² = ‖∑_{n=K+1}^M αn en‖² = ∑_{n=K+1}^M |αn|² ≤ ε²

shows that the sequence of partial sums is a Cauchy sequence and hence convergent.

Proposition 1.56 (Bessel’s inequality). Let H be a Hilbert space and (en)n∈I an orthonormal system. Then Bessel’s inequality

∑_{n∈I} |〈en|x〉|² ≤ ‖x‖²

holds for any x ∈ H, which implies in particular the convergence of ∑_{n∈I} en〈en|x〉.


Proof. The inequality follows from

0 ≤ ‖ x − ∑_{n=1}^N en〈en|x〉 ‖²
= ‖x‖² − ∑_{n=1}^N 〈x|en〉〈en|x〉 − ∑_{n=1}^N 〈x|en〉〈en|x〉 + ∑_{n=1}^N ∑_{k=1}^N 〈x|en〉〈ek|x〉〈en|ek〉
= ‖x‖² − ∑_{n=1}^N |〈en|x〉|²

(here we used that 〈x|en〉 is the complex conjugate of 〈en|x〉 and the orthonormality 〈en|ek〉 = δnk), which holds for any N ∈ I, proving Bessel’s inequality for finite I. For infinite I take the limit N → ∞.
The convergence of ∑_{n∈I} en〈en|x〉 follows from Bessel’s inequality and the previous
proposition.
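In a finite dimensional space Bessel's inequality can be checked directly. A minimal sketch (the dimension, the choice of standard basis vectors as the orthonormal system and the random seed are ours), using the first three standard basis vectors of R⁶:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
dim = 6
# Orthonormal system: e_1, e_2, e_3 inside R^6 (a proper subsystem, not a basis)
ons = [[1.0 if i == k else 0.0 for i in range(dim)] for k in range(3)]

x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
bessel_lhs = sum(dot(e, x) ** 2 for e in ons)   # sum of |<e_n|x>|^2
norm_sq = dot(x, x)                             # ||x||^2
```

Since the system does not span all of R⁶, the coefficient mass `bessel_lhs` is strictly smaller than `norm_sq` for a generic x; equality would require x to lie in the span of the system.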

Remark 1.57. The element y := ∑_{n∈I} en〈en|x〉 is the projection of x to the closed subspace spanned by the orthonormal system. In particular we have x = ∑_{n∈I} en〈en|x〉 if (en) is an orthonormal basis.

Proof. Define V := span {en | n ∈ I}. Obviously x = y + (x − y) with y in the closure of V, so by proposition 1.48 it suffices to show that x − y ∈ V⊥. For any k ∈ I, 〈ek|x − y〉 = 〈ek|x〉 − 〈ek|x〉 = 0, hence

x − y ∈ {en | n ∈ I}⊥ = V⊥.

Note that this also yields a method to calculate the projection onto an arbitrary subspace V by choosing an orthonormal basis (see below).
If the (en) are an orthonormal basis, then V⊥ = {0} (corollary 1.49), so

x = ∑_{n∈I} en〈en|x〉.

Corollary 1.58 (Parseval’s Identity). Let (en)n∈I be an orthonormal system in a Hilbert space H. Parseval’s identity

∑_{n∈I} |〈en|x〉|² = ‖x‖²

holds for any x ∈ H if and only if (en) is an orthonormal basis.

Proof. As we have just seen, x = ∑_{n∈I} en〈en|x〉 if (en) is an orthonormal basis. Parseval’s identity then follows from remark 1.54.
On the other hand, suppose that Parseval’s identity holds for any x ∈ H. For x ∈ {en | n ∈ I}⊥ this implies ‖x‖² = 0 and thus x = 0. Thus {en | n ∈ I}⊥ = {0} and, taking the orthogonal complement again, the closure of span {en | n ∈ I} is {0}⊥ = H, i.e. (en) is an orthonormal basis.


Remark 1.59. Given an orthonormal basis (en) of a Hilbert space, for each x we get the unique decomposition with respect to that basis

x = ∑_{n∈I} en〈en|x〉.

In example 1.53 this decomposition yields the Fourier series. Note that this series is unconditionally convergent, so the ordering of the basis elements does not matter.
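For the Fourier basis of example 1.53 one can watch the partial coefficient sums approach the Parseval value numerically. The sketch below (an illustration only; the quadrature step count and the truncation |n| ≤ 30 are arbitrary choices) takes f(x) = x on [0, 1], for which ‖f‖² = ∫_0^1 x² dx = 1/3:

```python
import cmath

def fourier_coeff(n, steps=4000):
    # <phi_n|f> = int_0^1 e^{-2 pi i n x} x dx, approximated by the midpoint rule
    h = 1.0 / steps
    return sum(cmath.exp(-2j * cmath.pi * n * (k + 0.5) * h) * ((k + 0.5) * h) * h
               for k in range(steps))

coeff_mass = sum(abs(fourier_coeff(n)) ** 2 for n in range(-30, 31))
# coeff_mass lies just below ||f||^2 = 1/3: the tail of the series is missing,
# consistent with Bessel's inequality, and tends to 1/3 as the truncation grows
```

Increasing the truncation pushes `coeff_mass` arbitrarily close to 1/3, in line with Parseval's identity for this orthonormal basis.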

Proposition 1.60 (Gram-Schmidt Orthonormalization). Let (xn)n∈I be a linearly independent sequence in a Hilbert space H. Then there exists an orthonormal system (en)n∈I (of the same cardinality) such that

span {x1, . . . , xn} = span {e1, . . . , en}

for any n ∈ I. In the infinite case, we also get

span {xn | n ∈ N} = span {en | n ∈ N}.

Proof. We show the case I = N. Since the xn are linearly independent, x1 ≠ 0, so set e1 := x1/‖x1‖. Then obviously x1 and e1 span the same subspace of H. Now we proceed by induction. Suppose we have an orthonormal system e1, . . . , en such that

Vn := span {x1, . . . , xn} = span {e1, . . . , en}.

The space Vn is closed (all finite dimensional subspaces are). Let z be the projection of xn+1 to Vn and set

en+1 := (xn+1 − z)/‖xn+1 − z‖.

Since the xn are linearly independent, xn+1 is not in Vn, so xn+1 − z is non-zero and en+1 hence well defined. Moreover

span {x1, . . . , xn, xn+1} = span {e1, . . . , en, en+1}

holds. Hence the inductively defined sequence (en) is an orthonormal system fulfilling the conclusion of the theorem.
In the infinite case, we rewrite

span {xn | n ∈ N} = ⋃_{n∈N} span {x1, . . . , xn} = ⋃_{n∈N} span {e1, . . . , en} = span {en | n ∈ N}.
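The inductive construction translates directly into code. A minimal sketch for real vectors (remark 1.57 is used to compute the projection z onto the span of the already constructed e_1, . . . , e_n; the example vectors are ours):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of real vectors."""
    ons = []
    for x in vectors:
        # z = projection of x onto span{e_1, ..., e_n} via remark 1.57
        z = [sum(dot(e, x) * e[i] for e in ons) for i in range(len(x))]
        w = [a - b for a, b in zip(x, z)]
        norm = dot(w, w) ** 0.5   # non-zero by linear independence
        ons.append([a / norm for a in w])
    return ons

e1, e2 = gram_schmidt([[1.0, 1.0], [0.0, 2.0]])
```

The result is an orthonormal pair spanning the same plane as the input vectors.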

Proposition 1.61. Let H be a separable Hilbert space. Then H admits an orthonormal basis.

Proof. Since H is separable, there is a sequence (xn)n∈N that is dense in H. Choose n1 = 1 and, given n1, . . . , nk, inductively define nk+1 to be the smallest index such that x_{nk+1} is not a linear combination of x_{n1}, . . . , x_{nk} (terminate the process if there is no such index). The (possibly finite) subsequence yk := x_{nk} so defined is linearly independent and fulfils

span {xn | n ∈ N} = span {yk | k ∈ N}.


Now apply the Gram-Schmidt orthonormalization to this subsequence, resulting in an orthonormal system (ek)k∈N still spanning the same vector space. Since

span {ek | k ∈ N} = span {xn | n ∈ N}

is dense in H (the sequence (xn) is dense by assumption), the (ek) form an orthonormal basis.

Corollary 1.62. In a separable Hilbert space, any orthonormal system can be extended to an orthonormal basis.

Proof. Let (xn)n∈I be an orthonormal system. Then V := {xn | n ∈ I}⊥ is a closed subspace of H and thus itself a Hilbert space. The space V is also separable (exercise), so it admits an orthonormal basis (yn)n∈J. The union of these two families is an orthonormal basis of H.

Remark 1.63. In quantum mechanics, the state space of a quantum system is modelled by a separable Hilbert space H (any particular one will do). More precisely, the physical states are elements x ∈ H fulfilling ‖x‖ = 1.
The most common model for the state space of a single quantum mechanical particle is L2(R³) (much like R⁶ would be the state space of a single classical particle). Usually the function |ϕ|² is interpreted as the probability density for the location of the particle; however, the state ϕ contains more (physically relevant) information than this probability density. If we are interested in observables other than location, or in dynamics (i.e. the time evolution of states), we need to consider operators on the Hilbert space.


1.3 Linear Operators

Definition 1.64. Let V, W be vector spaces. A linear map from V to W is a map T : V → W such that T(x + λy) = T(x) + λT(y) for all x, y ∈ V and λ ∈ K.
Linear maps from V to V are called (linear) operators or endomorphisms on V.
We will often omit the brackets, writing Tx for T(x).

Remark 1.65. In finite dimensions, all linear maps are automatically continuous (with respect to any norm; recall that all norms on a finite dimensional space are equivalent). This is no longer true for infinite dimensional spaces.

Proposition 1.66. Let (V, ‖·‖V ), (W, ‖·‖W ) be normed vector spaces and T : V → W a linear map. Then the following are equivalent:

(i) T is uniformly continuous

(ii) T is continuous,

(iii) T is continuous in 0,

(iv) there is M ∈ R such that ‖Tx‖W ≤M for any x ∈ V with ‖x‖V = 1,

(v) there is M ∈ R such that ‖Tx‖W ≤M ‖x‖V for any x ∈ V .

Proof. The implications “(i) =⇒ (ii)” and “(ii) =⇒ (iii)” are trivial.
If T fulfils (iii), then there is δ > 0 such that ‖Tx‖W = ‖T(x) − T(0)‖W < 1 for all x ∈ V fulfilling ‖x‖V = ‖x − 0‖V < δ. For y ∈ V with ‖y‖V = 1 we have

‖T(δy/2)‖W < 1 ⇐⇒ ‖Ty‖W < 2/δ,

which is (iv) with M = 2/δ.
If (iv) holds for M ∈ R, then for any non-zero x ∈ V

‖Tx‖W / ‖x‖V = ‖T(x/‖x‖V)‖W ≤ M

shows that ‖Tx‖W ≤ M‖x‖V (this inequality is trivial for x = 0) and hence (v).
Finally suppose (v) holds with M > 0. For any ε > 0 and any x, y ∈ V fulfilling ‖x − y‖V < ε/M we have

‖Tx − Ty‖W = ‖T(x − y)‖W ≤ M‖x − y‖V < ε,

which proves the uniform continuity of T.


Definition 1.67. Let (V, ‖·‖V ), (W, ‖·‖W ) be normed vector spaces. A linear map T : V → W is called bounded if it fulfils one of the (equivalent) conditions of the previous proposition. The space of all bounded linear maps between V and W will be denoted by B(V,W ) and we write B(V ) for B(V, V ).
On B(V,W ) we define the operator norm

‖T‖ = sup { ‖Tx‖W | x ∈ V and ‖x‖V = 1 }.    (1.4)

A linear map with values in the base field K is called a linear functional. We define the dual space V′ := B(V,K) as the space of bounded linear functionals.

Remark 1.68. (i) With the pointwise definition of addition and scalar multiplication, i.e.

(T + S)x := Tx + Sx,
(λT)x := λTx

for any x ∈ V and any λ ∈ K, the space B(V,W ) is a vector space (exercise).

(ii) For vector spaces V, W, U and linear maps T : V → W and S : W → U, the composition S ◦ T is a linear map as well. We will also write ST for this map.

(iii) We will often simply use ‖ · ‖ for the various norms appearing whenever the contextis sufficient to disambiguate the symbol.

(iv) Note that the terminology is slightly ambiguous. A map f : X → V is usually called bounded if there is M > 0 such that ‖f(x)‖ ≤ M for all x ∈ X. If f is linear, it is bounded in this sense if and only if f is the zero map. So “bounded linear map” or “bounded operator” always refers to definition 1.67.

(v) The operator norm is indeed a norm. We show the triangle inequality: for any x ∈ V with ‖x‖ = 1 we get

‖(T + S)x‖ ≤ ‖Tx‖ + ‖Sx‖ ≤ ‖T‖ + ‖S‖.

Taking the supremum over all such x yields ‖T + S‖ ≤ ‖T‖ + ‖S‖.

(vi) In definition (1.4) we can replace ‖x‖V = 1 by ‖x‖V ≤ 1 without changing it(Why?).

(vii) If for some M > 0, ‖Tx‖ ≤ M‖x‖ holds for all x ∈ V, then ‖T‖ ≤ M. On the other hand, we have ‖Tx‖ ≤ ‖T‖‖x‖ for any x ∈ V and any T ∈ B(V,W ). Thus we have

‖T‖ = inf {M > 0 | ‖Tx‖ ≤ M‖x‖ for all x ∈ V }.


(viii) For normed vector spaces V, W, U and bounded maps T : V → W and S : W → U we have (exercise)

‖ST‖ ≤ ‖S‖‖T‖.

In particular the space B(V ) is a (generally non-commutative) normed algebra.

Example 1.69. (i) Let V be any normed vector space. Then the identity operator I : V → V defined by Ix = x is a linear operator. It is always bounded.

(ii) Let A ∈ Kn×m be a matrix. Then

TA : Km → Kn, x ↦ Ax

is a bounded linear map.

(iii) Let K ⊂ Rn be compact and x ∈ K. The evaluation map

ϕx : C(K) → K, f ↦ f(x)

is a linear functional on C(K). Since

|ϕx(f)| = |f(x)| ≤ ‖f‖

holds for any f ∈ C(K), it is bounded with ‖ϕx‖ = 1.

(iv) Let A ⊂ Rn be measurable and f : A → K a bounded, measurable function. Then

Tf : L2(A) → L2(A), g ↦ fg

is a linear operator, a so-called multiplication operator. Since

‖Tf g‖² = ∫_A |fg|² dλ ≤ ∫_A ‖f‖²∞ |g|² dλ = ‖f‖²∞ ‖g‖²

holds for any g ∈ L2(A), Tf is bounded and ‖Tf‖ ≤ ‖f‖∞.

(v) On

D := { f ∈ C1((0, 1)) ∩ L2((0, 1)) | f′ ∈ L2((0, 1)) }

define

T : D → L2((0, 1)), f ↦ f′.

This is a linear operator. If we equip D with the L2-norm, the operator T is not bounded, since the functions ϕn ∈ D, ϕn(x) := e^{inx}, fulfil ‖ϕn‖ = 1 but

‖Tϕn‖² = ∫_0^1 |ϕn′(x)|² dx = ∫_0^1 n² |ϕn(x)|² dx = n²

diverges to ∞. Unbounded operators in general, and the above example in particular, do appear in physical applications. They are however more difficult to treat and we will probably not have time to talk about them.

(vi) Let A ⊂ Rn be measurable. Then

ϕ : L1(A) → K, f ↦ ∫_A f dλ

is a linear functional. It is bounded, since for f ∈ L1(A) we have

|ϕ(f)| = |∫_A f dλ| ≤ ∫_A |f| dλ = ‖f‖.

Proposition 1.70. Let V, W be normed vector spaces. Then B(V,W ) is a Banach space if W is.

Proof. Let (Tn) ⊂ B(V,W ) be such that ∑_{n=1}^∞ ‖Tn‖ < ∞. For any x ∈ V,

∑_{n=1}^∞ ‖Tnx‖ ≤ ∑_{n=1}^∞ ‖Tn‖‖x‖ = ‖x‖ ∑_{n=1}^∞ ‖Tn‖ < ∞

shows that ∑_{n=1}^∞ Tnx is absolutely convergent and hence, since W is complete, also convergent. Define Sx := ∑_{n=1}^∞ Tnx. One easily verifies that S is linear (exercise). Moreover, by the generalized triangle inequality,

‖Sx‖ ≤ ∑_{n=1}^∞ ‖Tnx‖ ≤ ( ∑_{n=1}^∞ ‖Tn‖ ) ‖x‖

holds for any x ∈ V; hence S is bounded with ‖S‖ ≤ ∑_{n=1}^∞ ‖Tn‖. By a similar argument we get

‖ (S − ∑_{n=1}^N Tn) x ‖ = ‖ ∑_{n=N+1}^∞ Tnx ‖ ≤ ( ∑_{n=N+1}^∞ ‖Tn‖ ) ‖x‖

for any x ∈ V, which implies

‖ S − ∑_{n=1}^N Tn ‖ ≤ ∑_{n=N+1}^∞ ‖Tn‖.

Since the right hand side converges to 0 for N → ∞, this proves the convergence of the series ∑_{n=1}^∞ Tn with respect to the operator norm. We proved that every absolutely convergent series in B(V,W ) is convergent, which implies completeness of the space by proposition 1.11.

Proposition 1.71. Let H, G be Hilbert spaces, x ∈ H and A : H → G a bounded linear map. Then

‖x‖ = sup { |〈x|y〉| | y ∈ H, ‖y‖ ≤ 1 }

and

‖A‖ = sup { |〈x|Ay〉| | x ∈ G, y ∈ H, ‖x‖ ≤ 1, ‖y‖ ≤ 1 }.

Proof. From the Cauchy-Schwarz inequality |〈x|y〉| ≤ ‖x‖‖y‖ we get

sup { |〈x|y〉| | y ∈ H, ‖y‖ ≤ 1 } ≤ ‖x‖.

Since moreover |〈x | x/‖x‖〉| = ‖x‖ for x ≠ 0 (the case x = 0 is trivial), we have the first equality.
Applying this to Ay,

‖Ay‖ = sup { |〈x|Ay〉| | x ∈ G, ‖x‖ ≤ 1 },

yields

‖A‖ = sup { ‖Ay‖ | y ∈ H, ‖y‖ ≤ 1 } = sup { |〈x|Ay〉| | x ∈ G, y ∈ H, ‖x‖ ≤ 1, ‖y‖ ≤ 1 }.

Definition 1.72. Let V, W be vector spaces. The kernel of a linear map T : V → W is the set

kerT = {x ∈ V | Tx = 0} ⊂ V.

The image of T is

imT = {Tx | x ∈ V }.


Remark 1.73. The sets kerT and imT are linear subspaces (exercise). If T is bounded, then kerT is closed: if (xn) ⊂ kerT converges to x, then due to the continuity of T, Tx = lim_{n→∞} Txn = 0 and hence x ∈ kerT. The image of T need not be closed, even for bounded T.
As an example, consider the operator T given by multiplication by (1/n)_{n∈N} on l1, i.e.

(x1, x2, x3, . . .) ↦ (x1, (1/2)x2, (1/3)x3, . . .).

The operator T is bounded. The image contains the space cc of finite sequences, since

T(x1, 2x2, 3x3, . . . , nxn, 0, 0, . . .) = (x1, x2, x3, . . . , xn, 0, 0, . . .),

so imT is dense in l1. However, imT is not the entire space, since

(1, 1/2², 1/3², . . .) ∉ imT:

its only candidate pre-image (1, 1/2, 1/3, . . .) is not in l1.

Remark 1.74. A linear map T : V → W is injective if and only if kerT = {0}. It is surjective if and only if imT = W (exercise).

Definition 1.75. Let V, W be normed spaces. A linear map T : V → W is called

(i) a contraction, if ‖Tx‖ ≤ ‖x‖ for any x ∈ V (equivalently if ‖T‖ ≤ 1),

(ii) an isometry, if ‖Tx‖ = ‖x‖ for any x ∈ V ,

(iii) an isometric isomorphism, if it is a surjective isometry.

Remark 1.76. Any isometry is automatically injective, since ‖Tx‖ = ‖x‖ implies that kerT = {0}.
Let T : V → W be an isometric isomorphism. Then T is in particular bijective, hence it has an inverse map T⁻¹ which is also bijective. For any w1, w2 ∈ W there are v1, v2 ∈ V such that Tv1 = w1 and Tv2 = w2. Then

T⁻¹(w1 + λw2) = T⁻¹(Tv1 + λTv2) = T⁻¹T(v1 + λv2) = v1 + λv2 = T⁻¹w1 + λT⁻¹w2

for any λ ∈ K, which proves the linearity of T⁻¹. Moreover, for w = Tv ∈ W,

‖T⁻¹w‖ = ‖v‖ = ‖Tv‖ = ‖w‖

holds, i.e. T⁻¹ is an isometric isomorphism as well.
An isometric isomorphism preserves all the structure that defines a normed vector space. So if two spaces are isometrically isomorphic, then they are essentially “the same space”; we just chose different names to label their elements.


If T is an operator between Hilbert spaces fulfilling

〈Tx|Ty〉 = 〈x|y〉

for any x, y, then it is clearly an isometry. On the other hand, the scalar product can be expressed in terms of the norm alone (via the polarization identity), so any isometry between Hilbert spaces also preserves the scalar product.

Example 1.77. The spaces (l1)′ and l∞ are isometrically isomorphic.

Proof. We will use the standard unit sequences (eᵏ), defined by eᵏn = 1 if k = n and eᵏn = 0 otherwise. Note that these have l1-norm 1.
For fixed (xn) ∈ l∞, define T(xn) : l1 → K by

(T(xn))(yn) := ∑_{n∈N} xn yn.    (1.5)

Then

|(T(xn))(yn)| = | ∑_{n∈N} xn yn | ≤ ∑_{n∈N} |xn||yn| ≤ ‖(xn)‖∞ ∑_{n∈N} |yn| = ‖(xn)‖∞ ‖(yn)‖1

shows that the sum in (1.5) converges for (yn) ∈ l1. The expression (1.5) is linear in (yn), hence T(xn) is a bounded linear functional on l1.
We saw already that ‖T(xn)‖ ≤ ‖(xn)‖∞. Since |(T(xn))(eᵏ)| = |xk| implies ‖T(xn)‖ ≥ |xk| for any k ∈ N, taking the supremum over k yields ‖T(xn)‖ ≥ ‖(xn)‖∞, i.e. ‖T(xn)‖ = ‖(xn)‖∞.
The expression (1.5) is also linear in (xn), hence

T : l∞ → (l1)′, (xn) ↦ T(xn)

is linear and, as we have shown above, an isometry.
It remains to show that T is surjective. First note that for (yn) ∈ l1,

‖ (yn) − ∑_{k=1}^N yk eᵏ ‖ = ∑_{n=N+1}^∞ |yn| → 0

implies that ∑_{k=1}^∞ yk eᵏ converges to (yn) in the l1-norm.
So let ϕ ∈ (l1)′ and consider the sequence (ϕ(eᵏ))_{k∈N}. It is bounded by ‖ϕ‖ and hence in l∞. For any (yn) ∈ l1, using the linearity and continuity of ϕ, we get

( T((ϕ(eᵏ))_{k∈N}) )(yn) = ∑_{k=1}^∞ ϕ(eᵏ) yk = ϕ( ∑_{k=1}^∞ yk eᵏ ) = ϕ((yn)).

Since this is true for any (yn) ∈ l1, we get T((ϕ(eᵏ))_{k∈N}) = ϕ, and since ϕ was arbitrary, T is surjective.


Remark 1.78. Let H be a Hilbert space and x ∈ H. Then y ↦ 〈x|y〉 is a bounded linear functional; the boundedness follows from the Cauchy-Schwarz inequality

|〈x|y〉| ≤ ‖x‖‖y‖.

Hence we can define a map

T : H → H′, x ↦ 〈x| · 〉.

This map is conjugate linear, since the scalar product is conjugate linear in the first argument. We can calculate the norm of Tx using proposition 1.71:

‖Tx‖ = sup { |〈x|y〉| | y ∈ H, ‖y‖ ≤ 1 } = ‖x‖.

In particular, T is injective, i.e. 〈x|y〉 = 〈z|y〉 for all y ∈ H implies x = z.
So T is a conjugate linear isometric map from H into H′. Thus we can think of H as a subspace of H′.

Theorem 1.79 (Riesz Representation Theorem). The map from the previous remark is actually surjective, i.e. for each bounded linear functional ϕ on H there is a unique x ∈ H fulfilling ‖x‖ = ‖ϕ‖ such that ϕ(y) = 〈x|y〉 for all y ∈ H.

Proof. Let ϕ be in H′. For ϕ = 0, x = 0 fulfils the requirements.
So assume ϕ ≠ 0, i.e. kerϕ ≠ H. Then kerϕ is a closed subspace of H. The space (kerϕ)⊥ cannot be the trivial space {0}, for otherwise ((kerϕ)⊥)⊥ = kerϕ would be the entire space H. Choose some v ≠ 0 from (kerϕ)⊥, set λ := ϕ(v) ≠ 0 and define x := (λ̄/‖v‖²)v ∈ (kerϕ)⊥. This element fulfils

ϕ(x) = |λ|²/‖v‖² = ‖x‖².

For y ∈ H there are unique y‖ ∈ kerϕ and y⊥ ∈ (kerϕ)⊥ such that y = y‖ + y⊥. There is κ ∈ K such that ϕ(y⊥) = κϕ(x), i.e. y⊥ − κx ∈ kerϕ; but since y⊥ and x are in (kerϕ)⊥, this implies y⊥ − κx = 0. Thus

〈x|y〉 = 〈x|y‖〉 + 〈x|y⊥〉 = 〈x|κx〉 = κ‖x‖² = κϕ(x) = ϕ(y⊥) = ϕ(y‖) + ϕ(y⊥) = ϕ(y)

holds for any y ∈ H. The uniqueness of x is left as an exercise.

Proposition 1.80. Let H be a Hilbert space and A ∈ B(H). Then there is a unique A∗ ∈ B(H), called the adjoint of A, such that

〈x|Ay〉 = 〈A∗x|y〉

holds for any x, y ∈ H.
The operation ∗ : B(H) → B(H), A ↦ A∗

(i) is conjugate linear,

(ii) is an involution, i.e. (A∗)∗ = A for any A ∈ B(H),

(iii) fulfils (AB)∗ = B∗A∗ for any A, B ∈ B(H),

(iv) is an isometry and

(v) fulfils ‖A∗A‖ = ‖AA∗‖ = ‖A‖² for any A ∈ B(H).

Proof. Let A be a bounded linear operator on H. For fixed x ∈ H, the map y ↦ 〈x|Ay〉 is a bounded linear functional, so there is a uniquely determined element of H, which we will denote by A∗x, such that

〈A∗x|y〉 = 〈x|Ay〉

for any y ∈ H.
Varying x, this yields a map x ↦ A∗x which fulfils

〈A∗(x + λy)|z〉 = 〈x + λy|Az〉 = 〈x|Az〉 + λ̄〈y|Az〉 = 〈A∗x|z〉 + λ̄〈A∗y|z〉 = 〈A∗x + λA∗y|z〉

for any x, y, z ∈ H and λ ∈ K. Since the equality holds for any z ∈ H, we get A∗(x + λy) = A∗x + λA∗y, so A∗ is a linear operator.
For x, y ∈ H we have

|〈x|Ay〉| = |〈A∗x|y〉| = |〈y|A∗x〉|,

from which we get ‖A‖ = ‖A∗‖ by taking the supremum over all x, y ∈ H with norm ≤ 1 and using the norm formula from proposition 1.71. This shows that A∗ is a bounded operator.
Now suppose B ∈ B(H) is another operator fulfilling 〈Bx|y〉 = 〈x|Ay〉 for any x, y ∈ H. Then 〈(A∗ − B)x|y〉 = 〈A∗x|y〉 − 〈Bx|y〉 = 〈x|Ay〉 − 〈x|Ay〉 = 0, and taking the supremum over x, y with norm ≤ 1 yields ‖A∗ − B‖ = 0, i.e. B = A∗. So the adjoint is unique.
The isometry of the map A ↦ A∗ was already shown above.
For any A, B ∈ B(H), λ ∈ K and x, y ∈ H we have

〈(A + λB)∗x|y〉 = 〈x|(A + λB)y〉 = 〈x|Ay〉 + λ〈x|By〉 = 〈A∗x|y〉 + λ〈B∗x|y〉 = 〈(A∗ + λ̄B∗)x|y〉.

Since the adjoint operator is uniquely determined, this implies (A + λB)∗ = A∗ + λ̄B∗, i.e. conjugate linearity.
Similarly we get for any x, y ∈ H

〈(AB)∗x|y〉 = 〈x|ABy〉 = 〈A∗x|By〉 = 〈B∗A∗x|y〉

and

〈x|(A∗)∗y〉 = 〈A∗x|y〉 = 〈x|Ay〉,

which, using the uniqueness of the adjoint, yields (AB)∗ = B∗A∗ and (A∗)∗ = A.
Finally, we already know that ‖A∗A‖ ≤ ‖A∗‖‖A‖ = ‖A‖². On the other hand, using the Cauchy-Schwarz inequality, we get

‖Ax‖² = 〈Ax|Ax〉 = 〈x|A∗Ax〉 ≤ ‖x‖‖A∗Ax‖ ≤ ‖A∗A‖‖x‖².

Taking the square root and the supremum over all x with ‖x‖ ≤ 1 yields ‖A‖ ≤ √‖A∗A‖ and hence ‖A‖² = ‖A∗A‖. Now apply this to A∗ to get the missing equality.
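On Kⁿ with 〈x|y〉 = ∑ x̄ᵢyᵢ, the adjoint of a matrix is its conjugate transpose, and the defining relation 〈x|Ay〉 = 〈A∗x|y〉 can be checked on a concrete example (a small sketch; the matrix and vectors are arbitrary choices):

```python
def inner(x, y):
    # scalar product, conjugate linear in the FIRST argument (the script's convention)
    return sum(a.conjugate() * b for a, b in zip(x, y))

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def adjoint(M):
    # conjugate transpose
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

A = [[1 + 2j, 3j], [0.5, -1j]]
x, y = [1j, 2.0], [3.0, -1j]
lhs = inner(x, matvec(A, y))            # <x|Ay>
rhs = inner(matvec(adjoint(A), x), y)   # <A*x|y>
```

The two values agree up to floating point error, for any choice of A, x and y.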

Remark 1.81. A Banach algebra A with a map A ∋ A ↦ A∗ ∈ A fulfilling conditions (i) to (iv) is called a Banach-∗-algebra. If it also fulfils (v), it is called a C∗-algebra. Hence we proved that for a Hilbert space H, B(H) with taking adjoints as the ∗-operation is a C∗-algebra.
Another example of a C∗-algebra is the space C(K) of continuous functions on some compact set K with complex conjugation as the ∗-operation (verify that as an exercise).
It is possible to formulate quantum mechanics purely in terms of (abstract) C∗-algebras, dispensing with Hilbert spaces altogether. This approach is used in particular in axiomatic quantum field theory.

Definition 1.82. Let H be a Hilbert space and A ∈ B(H). The operator A is called

(i) normal if AA∗ = A∗A,

(ii) self-adjoint if A = A∗,

(iii) unitary if AA∗ = A∗A = I,

(iv) a projection if A2 = A and

(v) an orthoprojection if A is a self-adjoint projection.

Remark 1.83. Self-adjoint operators are the observables of quantum mechanics. Note that for A self-adjoint,

〈x|Ax〉 = 〈Ax|x〉,

and 〈Ax|x〉 is the complex conjugate of 〈x|Ax〉, so 〈x|Ax〉 is real. We will later show that the spectrum of a self-adjoint operator is real as well.
For a unitary operator U we have

〈Ux|Uy〉 = 〈U∗Ux|y〉 = 〈x|y〉

for any x, y ∈ H, so U is an isometry. Since x = UU∗x for any x ∈ H, U is also surjective, showing that U (and U∗) is an isometric isomorphism from H to itself. The importance of unitary operators is thus that they implement symmetries on the Hilbert space modelling some physical system.


Example 1.84. (i) Let Tg be multiplication on L2(A) by some (measurable) bounded function g : A → K and write ḡ for the complex conjugate function. Then for any f1, f2 ∈ L2(A) the equation

〈f1|Tg f2〉 = ∫_A f̄1 g f2 dλ = 〈Tḡ f1|f2〉

holds (the integrand equals the complex conjugate of ḡf1 times f2), which shows that (Tg)∗ = Tḡ. Moreover Tgh f = ghf = TgTh f for any f ∈ L2(A) and any bounded g, h, so TgTh = Tgh.
Thus Tg is always normal; it is self-adjoint if and only if g is real-valued (almost everywhere), unitary if and only if g is a function of modulus 1 (almost everywhere) and an orthoprojection if and only if g is the characteristic function of some measurable set B ⊂ A.

(ii) Let x be a real number. For f ∈ L2(R) we define (Uxf)(y) := f(y − x). Then for any f, g ∈ L2(R) we have

〈f|Uxg〉 = ∫_{−∞}^{∞} f̄(y) g(y − x) dy = ∫_{−∞}^{∞} f̄(z + x) g(z) dz = 〈U−xf|g〉,

so (Ux)∗ = U−x. Since U−x = (Ux)⁻¹, this also implies that Ux is unitary.

(iii) Let V be a closed subspace of a Hilbert space H and for x ∈ H denote the projection of x onto V by Px. We already know that P is a projection. Also

(I − P)² = I − 2P + P² = I − P

shows that I − P is a projection as well. By definition of the projection P, Px ∈ V and (I − P)x ∈ V⊥ for any x ∈ H. Hence for any x, y ∈ H,

〈x|Py〉 = 〈Px + (I − P)x|Py〉 = 〈Px|Py〉 = 〈Px|Py + (I − P)y〉 = 〈Px|y〉

shows that P is self-adjoint.
Actually any orthoprojection arises in this way (exercise).

Proposition 1.85. Let H be a Hilbert space and A ∈ B(H). Then (imA)⊥ = kerA∗, and (kerA)⊥ is the closure of imA∗.

Proof. For x ∈ kerA∗ and y ∈ imA we have y = Az for some z ∈ H and hence

〈x|y〉 = 〈x|Az〉 = 〈A∗x|z〉 = 0.

Thus x is in (imA)⊥. On the other hand, if x ∈ (imA)⊥ and y ∈ H, then

0 = 〈x|Ay〉 = 〈A∗x|y〉.

Since y ∈ H is arbitrary, the equation implies A∗x = 0, hence x ∈ kerA∗.
Taking the orthogonal complement of the equation just obtained yields that the closure of imA equals

((imA)⊥)⊥ = (kerA∗)⊥,

and we get the second equation by applying this to A∗.


1.4 Invertible Operators and the Spectrum

Definition 1.86. Let V, W be normed vector spaces. A map T ∈ B(V,W ) is called continuously invertible if there is a map T⁻¹ ∈ B(W,V ) such that TT⁻¹ = IW and T⁻¹T = IV. The map T⁻¹ is called the inverse map.

Remark 1.87. The inverse map of a continuously invertible map T ∈ B(V,W ) is uniquely determined, justifying the notation T⁻¹. Let V, W, U be normed vector spaces and T ∈ B(V,W ) and S ∈ B(W,U) continuously invertible. Then ST is continuously invertible with (ST)⁻¹ = T⁻¹S⁻¹.

Example 1.88. (i) Any isometric isomorphism is continuously invertible, as was shown in 1.76.

(ii) The left shift on l2,

(x1, x2, x3, . . .) ↦ (x2, x3, . . .),

is surjective (Why?) but not invertible since it is not injective.

(iii) The operator considered in remark 1.73 on l1,

(x1, x2, x3, . . .) ↦ (x1, (1/2)x2, (1/3)x3, . . .),

is injective and has dense image but is not surjective, so it cannot be invertible.

(iv) The right shift on l2,

(x1, x2, x3, . . .) ↦ (0, x1, x2, x3, . . .),

is injective (even an isometry). Its image is not even dense (the element (1, 0, 0, . . .) has distance 1 to the image), so the right shift cannot be surjective.
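The two shifts are easy to exhibit on finitely supported sequences, viewed as elements of l2 padded with zeros. A minimal sketch (the fixed-length list representation is ours; the right shift below assumes the last entry is 0, so that no mass is truncated):

```python
def lshift(x):           # left shift: (x1, x2, ...) -> (x2, x3, ...)
    return x[1:] + [0.0]

def rshift(x):           # right shift: (x1, x2, ...) -> (0, x1, x2, ...)
    return [0.0] + x[:-1]

def norm(x):
    return sum(a * a for a in x) ** 0.5

x = [3.0, 4.0, 0.0, 0.0]
r = rshift(x)                                 # norm is preserved: isometry
kernel_vec = lshift([1.0, 0.0, 0.0, 0.0])     # e_1 is mapped to 0: not injective
```

This makes both failure modes concrete: the right shift preserves norms but misses e_1, while the left shift hits everything but kills e_1.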

Proposition 1.89. Let V be a Banach space. An operator T ∈ B(V ) is continuously invertible if and only if its image is dense and it is bounded below, i.e. there is ε > 0 such that ‖Tx‖ ≥ ε‖x‖ for any x ∈ V.

Proof. If T is continuously invertible, it is in particular surjective, so its image is dense. Moreover we have for any x ∈ V

‖x‖ = ‖T⁻¹Tx‖ ≤ ‖T⁻¹‖‖Tx‖,

which yields ‖Tx‖ ≥ (1/‖T⁻¹‖)‖x‖.
If, on the other hand, T has dense image and fulfils ‖Tx‖ ≥ ε‖x‖ for any x ∈ V, then its kernel is trivial and hence T is injective. Moreover, from the inequality we can deduce that imT is closed. To see this, let (yn) ⊂ imT be convergent to y ∈ V and (xn) such that Txn = yn for any n ∈ N. Since (yn) is convergent, it is in particular a Cauchy sequence, and since

‖xn − xm‖ ≤ (1/ε)‖Txn − Txm‖

holds, (xn) is a Cauchy sequence as well and thus converges to some x ∈ V. Since T is continuous, Txn = yn converges to Tx = y, showing that y ∈ imT. So the image of T is dense and closed and hence the entire space V.
Thus T is bijective and there is an inverse map T⁻¹ : V → V. This map T⁻¹ is linear (Why? See remark 1.76.) and fulfils

‖T⁻¹Tx‖ = ‖x‖ ≤ (1/ε)‖Tx‖.

Since T is surjective, this implies that T⁻¹ is bounded.

Remark 1.90. The previous proposition implies in particular that we can prove that an operator is not continuously invertible by finding a sequence (xn) ⊂ V such that ‖xn‖ = 1 and Txn → 0, since the existence of such a sequence shows that the operator cannot be bounded below.

Remark 1.91. Let T be a bounded operator on a normed vector space V. As for numbers, we will interpret T⁰ as I for any T ∈ B(V ). We will often suppress the symbol I when writing scalar multiples of the identity, so T − λ stands for T − λI.

Proposition 1.92 (Neumann Series). Let V be a normed vector space and T ∈ B(V ). If the Neumann series

∑_{n=0}^∞ Tⁿ

converges with respect to the operator norm, then its limit is (I − T)⁻¹. In particular this is the case if V is a Banach space and ‖T‖ < 1.

Proof. Suppose that ∑_{n=0}^∞ Tⁿ converges to some operator S ∈ B(V ). Then

0 = S − S = lim_{N→∞} ∑_{n=0}^N Tⁿ − lim_{N→∞} ∑_{n=0}^{N−1} Tⁿ = lim_{N→∞} T^N

holds. Moreover the telescoping sum

(I − T)S = (I − T) lim_{N→∞} ∑_{n=0}^N Tⁿ = lim_{N→∞} ∑_{n=0}^N (Tⁿ − Tⁿ⁺¹) = lim_{N→∞} (I − T^{N+1}) = I

shows that S is a right inverse of I − T. An analogous argument shows that S is also a left inverse.
Now suppose V is a Banach space and ‖T‖ < 1. Then B(V ) is a Banach space as well by proposition 1.70 and

∑_{n=0}^∞ ‖Tⁿ‖ ≤ ∑_{n=0}^∞ ‖T‖ⁿ

holds. The last series is a convergent geometric series, so the Neumann series is absolutely convergent and thus convergent.
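For a matrix with small norm the partial sums of the Neumann series converge quickly to the inverse of I − T. A small sketch (the 2×2 matrix and the number of terms are arbitrary choices):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

T = [[0.2, 0.1], [0.0, 0.3]]          # small norm, so the series converges
I = [[1.0, 0.0], [0.0, 1.0]]
S, P = I, I                           # S = partial sum, P = current power T^n
for _ in range(50):
    P = matmul(P, T)
    S = matadd(S, P)

IminusT = [[I[i][j] - T[i][j] for j in range(2)] for i in range(2)]
check = matmul(IminusT, S)            # should be close to the identity
```

After 50 terms the remainder is of order ‖T‖⁵¹, so `check` agrees with the identity matrix to machine precision.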

From now on, we will only consider complex vector spaces.

Definition 1.93. Let V be a normed vector space and T ∈ B(V ). The resolvent set of T is

ρ(T) := {λ ∈ C | T − λ is continuously invertible}.

The spectrum of T is σ(T) := C\ρ(T). For any λ ∈ ρ(T), the operator R(T, λ) := (T − λ)⁻¹ is called the resolvent of T in λ.

Remark 1.94. If V is a finite dimensional space, then T − λ is invertible if and only if it is injective. Hence λ is in the spectrum of T if and only if there is v ∈ V, v ≠ 0, such that Tv = λv, i.e. if and only if λ is an eigenvalue of T. Also in the infinite dimensional case, for any eigenvalue λ of T the operator T − λ is not injective and hence not invertible, so eigenvalues of T are in the spectrum. On the other hand, there are other ways for an operator to fail to be invertible (example 1.88); accordingly there can be elements of the spectrum that are not eigenvalues. As we will see below, there are (physically relevant, self-adjoint) operators that do not have any eigenvalues at all. There are, on the other hand, operators whose spectrum is characterized by their eigenvalues, and these operators have particularly nice properties. The Hamiltonian of the quantum mechanical harmonic oscillator is one such example.
If A is an observable in quantum mechanics, then the spectral values of A are the possible results of a measurement of A.

Example 1.95. (i) For λ ∈ C\{1}, the operator I − λ = (1 − λ)I is continuously invertible with inverse (1/(1 − λ))I. Thus σ(I) = {1}. Similarly one finds σ(0) = {0}.

(ii) Consider the operator Tx : L2([0, 1]) → L2([0, 1]) defined by (Txf)(x) = xf(x). Then Tx − λ is continuously invertible for λ ∉ [0, 1] (What is its inverse?). On the other hand, for any λ ∈ [0, 1) and sufficiently small ε > 0 we have

‖(Tx − λ)1[λ,λ+ε]‖² = ∫_0^1 |(x − λ)1[λ,λ+ε](x)|² dx ≤ ∫_0^1 ε² 1[λ,λ+ε](x)² dx = ε² ‖1[λ,λ+ε]‖².

Hence Tx − λ is not bounded below and thus cannot be continuously invertible, showing that λ ∈ σ(Tx). A similar argument can be made for λ = 1, so we find σ(Tx) = [0, 1].
Note that Tx does not have any eigenvalues, since there is no non-trivial function f fulfilling λf(x) = xf(x) for (almost) all x ∈ [0, 1].

(iii) More generally, it can be shown that the spectrum of a multiplication operator Tg : L2(A) → L2(A) is the “essential image” of g.

(iv) Consider the operator (check that it is linear and bounded)

T : l2 → l2, (x1, x2, x3, x4, . . .) ↦ (0, x1, 0, x3, . . .).

It obviously fulfils T² = 0, i.e. it is nilpotent. Thus the Neumann series

∑_{n=0}^∞ (T/λ)ⁿ

converges for every λ ≠ 0, showing that λ − T = λ(I − T/λ) is continuously invertible for any λ ≠ 0.

Proposition 1.96. Let V be a Banach space and T ∈ B(V ). The resolvent set ρ(T) is open; the spectrum σ(T) is compact, non-empty and contained in {λ ∈ C | |λ| ≤ ‖T‖}. The map λ ↦ R(T, λ) defined on ρ(T) is analytic.

Proof sketch. For any λ with |λ| > ‖T‖, the operator T/λ has norm < 1; thus, using the Neumann series, I − T/λ is invertible. But then so is (λI − T) = λ(I − T/λ). Thus any λ with |λ| > ‖T‖ is in the resolvent set of T, which proves σ(T) ⊂ {λ ∈ C | |λ| ≤ ‖T‖}.
We prove ρ(T) to be open next. So let λ be in the resolvent set and d := 1/‖(T − λ)⁻¹‖. For any θ ∈ Bd(λ) we have

‖(θ − λ)(T − λ)⁻¹‖ < 1.

Thus the Neumann series

(I − (θ − λ)(T − λ)⁻¹)⁻¹ = ∑_{n=0}^∞ (θ − λ)ⁿ ((T − λ)⁻¹)ⁿ

converges. We also have the equality

(T − θ)(T − λ)⁻¹ = (T − λ − (θ − λ))(T − λ)⁻¹ = I − (θ − λ)(T − λ)⁻¹,

hence the operator

T − θ = (T − θ)(T − λ)⁻¹(T − λ) = (I − (θ − λ)(T − λ)⁻¹)(T − λ)

is invertible, so Bd(λ) ⊂ ρ(T) and ρ(T) is open. For the inverse we obtain the formula

(T − θ)⁻¹ = ∑_{n=0}^∞ ((T − λ)⁻¹)ⁿ⁺¹ (θ − λ)ⁿ,

which is a B(V )-valued power series in θ that converges on Bd(λ), so the resolvent is analytic. Banach space valued analytic functions share many properties of their real or complex valued counterparts; in particular they are arbitrarily often differentiable.
We already proved that σ(T) is bounded and closed (since ρ(T) is open), thus it is a compact set.
That it is non-empty follows from a Banach-space-valued version of Liouville’s theorem (about complex analytic functions).

Remark 1.97. For a Hilbert space H and T ∈ B(H) continuously invertible, we have

I = I∗ = (TT⁻¹)∗ = (T⁻¹)∗T∗ and I = (T⁻¹T)∗ = T∗(T⁻¹)∗,

showing that T∗ is continuously invertible and (T∗)⁻¹ = (T⁻¹)∗.

Proposition 1.98. Let H be a Hilbert space and T ∈ B(H). For A ⊂ C define A∗ := {λ̄ | λ ∈ A}. Then the following equality holds:

σ(T∗) = σ(T)∗.

Proof. It suffices to show that

ρ(T∗) = ρ(T)∗

holds. Thus let λ be in the resolvent set of T, i.e. T − λ is continuously invertible. By the previous remark this implies that (T − λ)∗ = T∗ − λ̄ is continuously invertible and thus λ̄ ∈ ρ(T∗). So we have shown that ρ(T)∗ ⊂ ρ(T∗). Applying this to T∗ yields ρ(T∗)∗ ⊂ ρ(T), and taking complex conjugates again, ρ(T∗) ⊂ ρ(T)∗.

Proposition 1.99. Let H be a Hilbert space and T ∈ B(H) self-adjoint. Then σ(T) ⊂ R.

Proof. Let α, β ∈ R, λ = α + iβ and suppose β ≠ 0. Since T is self-adjoint, 〈x|(T − α)x〉 is real, so

|Im 〈x|(T − λ)x〉| = |Im 〈x|(T − α)x〉 − Im(iβ‖x‖²)| = |β|‖x‖²

holds, and we can use the Cauchy-Schwarz inequality to find

|β|‖x‖² ≤ |〈x|(T − λ)x〉| ≤ ‖x‖‖(T − λ)x‖

and see that T − λ is bounded below: ‖(T − λ)x‖ ≥ |β|‖x‖.
This inequality implies in particular that T − λ is injective for any λ ∈ C\R, and the same holds for T − λ̄. Then the closure of im(T − λ) equals

((im(T − λ))⊥)⊥ = (ker(T − λ)∗)⊥ = (ker(T − λ̄))⊥ = {0}⊥ = H,

so im(T − λ) is dense (corollary 1.49). Now we can apply proposition 1.89.


Remark 1.100. As we noted before, self-adjoint operators represent observables in models of quantum mechanical systems. The elements of the spectrum are the possible values that a measurement of the observable can return.

Definition 1.101. Let V be a Banach space. The spectral radius of a bounded operator T ∈ B(V ) is

r(T) := sup { |λ| | λ ∈ σ(T) }.

Remark 1.102. Since |·| is a continuous function and σ(T) a compact set, there is always at least one λ ∈ σ(T) such that r(T) = |λ|, i.e. the supremum above is actually a maximum.

Proposition 1.103. The spectral radius can be calculated as

r(T) = inf { ‖Tⁿ‖^(1/n) : n ∈ N } = lim_{n→∞} ‖Tⁿ‖^(1/n).

Without proof.
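In finite dimensions σ(T) is just the set of eigenvalues, so the formula can be tried out numerically. A minimal numpy sketch (the matrix is an arbitrary illustrative choice, and `gelfand` is our own helper name, not from the text):

```python
import numpy as np

# For a matrix, r(T) is the largest modulus of an eigenvalue.
T = np.array([[0.5, 1.0],
              [0.0, 0.3]])          # non-normal example matrix
r = max(abs(np.linalg.eigvals(T)))  # spectral radius, here 0.5

def gelfand(T, n):
    # the n-th term ||T^n||^(1/n) of the limit in proposition 1.103
    return np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)

# the sequence approaches r(T) from above; ||T|| = gelfand(T, 1) overshoots
approx = gelfand(T, 1000)
print(abs(approx - r) < 1e-2, gelfand(T, 1) > r)
```

Since every term ‖Tⁿ‖^(1/n) dominates r(T), the infimum and the limit coincide, which the slowly decreasing sequence illustrates.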

Remark 1.104. Unlike in finite dimensions, the lemma fails to be true if A and B do notcommute.

Corollary 1.105. Let H be a Hilbert space and T ∈ B(H) a normal operator. Then ‖T‖ = r(T).

Proof. Let T be self-adjoint first. For any k ∈ N, using the C*-property (proposition 1.80 (v)) we get

‖T^(2^k)‖ = ‖(T^(2^(k−1)))* T^(2^(k−1))‖ = ‖T^(2^(k−1))‖² = · · · = ‖T‖^(2^k)

and hence

r(T) = lim_{k→∞} ‖T^(2^k)‖^(1/2^k) = ‖T‖.

Now for general normal T, we use this result to obtain

r(T)² = lim_{n→∞} ‖Tⁿ‖^(2/n) = lim_{n→∞} ‖(Tⁿ)* Tⁿ‖^(1/n) = lim_{n→∞} ‖(T*T)ⁿ‖^(1/n) = r(T*T) = ‖T*T‖ = ‖T‖².
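The corollary, and its failure without normality, can be observed for matrices; both example matrices below are illustrative choices of our own:

```python
import numpy as np

# Symmetric (hence normal) matrix: operator norm equals spectral radius.
S = np.array([[2.0, 1.0],
              [1.0, -1.0]])
norm_S = np.linalg.norm(S, 2)            # operator norm ||S||
r_S = max(abs(np.linalg.eigvals(S)))     # spectral radius r(S)

# Non-normal matrix: the corollary fails, ||N|| > r(N).
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])               # nilpotent, sigma(N) = {0}
r_N = max(abs(np.linalg.eigvals(N)))
print(np.isclose(norm_S, r_S), np.linalg.norm(N, 2) > r_N)
```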


1.5 The Continuous Functional Calculus
Remark 1.106. A functional calculus is a way to rigorously define “functions of operators” like √T or e^(iT). Such functions do also appear in physics if an observable is a function of another one. There are various functional calculi that differ with respect to

• what functions they apply to,

• what operators they can be applied to,

• certain convergence properties.

A class of functions that can be applied to any operator are polynomials. Let T be a fixed operator. For any polynomial of one variable

p(t) = ∑ᵢ₌₀ⁿ αᵢ tⁱ

we can define

p(T) := ∑ᵢ₌₀ⁿ αᵢ Tⁱ.

This map already has some desirable properties which can be easily verified:

(p+ λq)(T ) = p(T ) + λq(T )

(pq)(T ) = p(T )q(T )

for polynomials p, q and λ ∈ C, so it is an algebra homomorphism from the algebra of polynomials P to the bounded operators. This implies in particular that p(T) and q(T) commute for any p, q.
On the other hand the map is not injective (p(t) = tⁿ maps to 0 if Tⁿ = 0) and, on a Hilbert space, does in general not preserve the ∗-operation (p(T)* = p̄(T*), where p̄ is the polynomial with complex conjugate coefficients).
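The homomorphism property (pq)(T) = p(T)q(T) can be checked numerically; `poly_of_operator` is our own illustrative helper, and the matrix and polynomials are arbitrary examples:

```python
import numpy as np

def poly_of_operator(coeffs, T):
    # evaluate p(T) = sum_i coeffs[i] * T^i via Horner's scheme
    n = T.shape[0]
    result = np.zeros_like(T)
    for a in reversed(coeffs):
        result = result @ T + a * np.eye(n)
    return result

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
p = [1.0, 0.0, 1.0]                      # p(t) = 1 + t^2
q = [0.0, 2.0]                           # q(t) = 2t
pq = np.polynomial.polynomial.polymul(p, q)   # coefficients of p*q

# multiplicativity: (pq)(T) = p(T) q(T), so p(T) and q(T) commute
lhs = poly_of_operator(pq, T)
rhs = poly_of_operator(p, T) @ poly_of_operator(q, T)
print(np.allclose(lhs, rhs))
```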

Lemma 1.107. Let V be a Banach space and A, B ∈ B(V) commuting operators. Then AB is continuously invertible if and only if A and B are continuously invertible.

Proof. The “if” direction is always true.
So suppose A is not invertible. Then by 1.89, im A is either not dense or A is not bounded below. In the former case, im AB ⊂ im A is not dense either and AB cannot be continuously invertible. In the latter case, there is a sequence (xₙ) ⊂ V such that ‖xₙ‖ = 1 and Axₙ converges to zero. Then by the continuity of B, BAxₙ = ABxₙ converges to zero as well, showing that AB is not bounded below and hence not invertible. Thus if A is not continuously invertible, AB is not continuously invertible either. Since the situation is completely symmetric when exchanging A and B, the assertion of the lemma follows.


Proposition 1.108 (Spectral Mapping Theorem for Polynomials). Let V be a Banach space, T ∈ B(V) and p a polynomial. Then the spectrum of p(T) is

σ(p(T )) = p(σ(T )). (1.6)

Proof. The assertion is trivial if p is a constant polynomial α since then (using that σ(T) is always non-empty)

σ(p(T)) = σ(αI) = {α} = p(σ(T)).

We now proceed by induction. Suppose (1.6) holds for polynomials of degree at most n and let p be a polynomial of degree n + 1.
For λ ∈ C, the polynomial p(t) − p(λ) has a zero at λ, hence there is a polynomial q of degree n such that

p(t) − p(λ) = (t − λ)q(t).

Accordingly we obtain the corresponding equation for operators

p(T) − p(λ) = (T − λ)q(T).

If λ ∈ σ(T), then T − λ is not continuously invertible and by the lemma p(T) − p(λ) isn’t either, i.e. p(λ) ∈ σ(p(T)). Since λ ∈ σ(T) was arbitrary, we get p(σ(T)) ⊂ σ(p(T)).
On the other hand, suppose κ is in σ(p(T)). We want to show that κ = p(ρ) for some ρ ∈ σ(T). Choose λ ∈ C such that p(λ) = κ (such a λ always exists by the fundamental theorem of algebra). If λ ∈ σ(T), then there is nothing left to show. If λ ∉ σ(T), then T − λ is continuously invertible. Then by the lemma, q(T) cannot be continuously invertible, i.e. 0 ∈ σ(q(T)). The polynomial q has degree n, so by the induction hypothesis this implies 0 ∈ q(σ(T)), i.e. there is ρ ∈ σ(T) such that q(ρ) = 0. But this implies p(ρ) − κ = p(ρ) − p(λ) = (ρ − λ)q(ρ) = 0, i.e. p(ρ) = κ.
This shows that p(σ(T)) = σ(p(T)), finishing the induction proof.
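In finite dimensions the theorem can be observed directly on eigenvalues; the matrix and polynomial below are arbitrary illustrative choices:

```python
import numpy as np

# sigma(T) is the set of eigenvalues, so sigma(p(T)) = p(sigma(T))
# can be checked by comparing sorted eigenvalue lists.
T = np.array([[0.0, 1.0],
              [-2.0, 3.0]])            # eigenvalues 1 and 2
def p(t):
    return t**2 - 3*t + 1              # arbitrary test polynomial

pT = T @ T - 3*T + np.eye(2)           # p(T)
lhs = np.sort_complex(np.linalg.eigvals(pT))       # sigma(p(T))
rhs = np.sort_complex(p(np.linalg.eigvals(T)))     # p(sigma(T))
print(np.allclose(lhs, rhs))
```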

Proposition 1.109. Let H be a Hilbert space and T ∈ B(H) self-adjoint. Consider the space P of (complex) polynomial functions with the norm

‖p‖ = sup { |p(λ)| : λ ∈ σ(T) }.

The map

Φ : P→ B(H)

p 7→ p(T )

is an isometric ∗-algebra homomorphism.

Proof. We already saw above that the map is linear and multiplicative. It also fulfils p(T)* = p̄(T*) = p̄(T). Thus it is a ∗-algebra homomorphism.
To see that it is also an isometry, note that p(T)* = p̄(T) commutes with p(T), hence p(T) is normal. Now use the spectral mapping theorem and 1.105 to get

‖p(T)‖ = r(p(T)) = sup { |κ| : κ ∈ σ(p(T)) } = sup { |p(λ)| : λ ∈ σ(T) } = ‖p‖.


Proposition 1.110. Let V be a normed vector space and W a Banach space, U ⊂ V a dense subspace and T : U → W a bounded linear map. Then there is a unique bounded linear map T̄ : V → W extending T, i.e. such that T̄x = Tx for any x ∈ U. Moreover ‖T̄‖ = ‖T‖. The operator T̄ is called the bounded linear extension of T.
If T is an isometry, so is T̄.

Proof. We first show the existence of such an extension. If T is the zero operator, it can be trivially extended, so assume ‖T‖ ≠ 0. For any x ∈ V, choose a sequence (xₙ) ⊂ U converging to x. This sequence is in particular Cauchy, i.e. for any ε > 0 there is N ∈ N such that ‖xₙ − xₘ‖ < ε/‖T‖ for m, n ≥ N. But then we also have

‖Txₙ − Txₘ‖ ≤ ‖T‖ ‖xₙ − xₘ‖ ≤ ε,

showing that (Txₙ) is a Cauchy sequence, which, W being a Banach space, converges. Now define T̄x := lim_{n→∞} Txₙ. This is well defined, for if (yₙ) ⊂ U also converges to x, we get

lim_{n→∞} Tyₙ − lim_{n→∞} Txₙ = lim_{n→∞} T(yₙ − xₙ) = 0

since yₙ − xₙ converges to zero and T is continuous.
Moreover this operator extends T since for x ∈ U, T̄x = lim_{n→∞} Tx = Tx. One readily verifies that T̄ is linear. Finally for any x ∈ V and (xₙ) ⊂ U converging to x,

‖T̄x‖ = ‖lim_{n→∞} Txₙ‖ = lim_{n→∞} ‖Txₙ‖ ≤ lim_{n→∞} ‖T‖ ‖xₙ‖ = ‖T‖ ‖x‖

holds, proving ‖T̄‖ ≤ ‖T‖. But since T̄ is an extension of T, its norm cannot be strictly smaller.
Finally suppose S were another bounded extension of T to all of V. Then (T̄ − S)x = 0 for all x ∈ U, so U ⊂ ker(T̄ − S). But the kernel is always closed and since U is dense, its closure V is also contained in ker(T̄ − S), i.e. T̄ − S is the zero operator.
The addendum about isometries is left as an exercise.

Proposition 1.111. Denote by A(T) the C*-subalgebra generated by T, i.e. the closure

A(T) = cl(span { Tⁿ | n ∈ N₀ }).

Then the map Φ from proposition 1.109 can be uniquely extended to an isometric algebra isomorphism from C(σ(T)) to A(T).

Proof. Since σ(T) is compact, the polynomials P are a dense subset of C(σ(T)) (example 1.26). By the construction of Φ, we get

Φ(P) = span { Tⁿ | n ∈ N₀ } ⊂ A(T).

Hence we can use the previous proposition to extend Φ to an isometric linear map Φ̄ : C(σ(T)) → A(T). As before we will use the notation f(T) := Φ̄(f). This map is also


automatically multiplicative since Φ is. To see this, let f, g ∈ C(σ(T)) and (pₙ), (qₙ) ⊂ P such that pₙ → f and qₙ → g in C(σ(T)). Due to the continuity of multiplication in C(σ(T)), pₙqₙ converges to fg. Then due to the continuity of multiplication in B(H) and the continuity of Φ̄

f(T)g(T) = lim_{n→∞} pₙ(T)qₙ(T) = lim_{n→∞} (pₙqₙ)(T) = (lim_{n→∞} pₙqₙ)(T) = (fg)(T).

By a similar argument we get

f(T)* = (lim_{n→∞} pₙ(T))* = lim_{n→∞} pₙ(T)* = lim_{n→∞} p̄ₙ(T) = f̄(T).

It remains to show that Φ̄ is also an isomorphism, i.e. that it is surjective. We already saw that its image is dense; since Φ̄ is an isometry, its image is moreover closed, hence equal to A(T).

Remark 1.112. As we did above, we usually write f(T) instead of Φ(f) or Φ̄(f).

Remark 1.113. If an operator T ∈ B(H) is invertible, then its inverse T−1 is in A(T ).

Proof. If T is invertible, then 0 ∉ σ(T), so the function

g : σ(T) → R, t ↦ 1/t

is continuous. We have t g(t) = g(t) t = 1 for any t ∈ σ(T), from which we get, using the continuous functional calculus, T g(T) = g(T) T = I. So g(T) ∈ A(T) is the uniquely determined inverse of T.

Proposition 1.114 (Spectral Mapping Theorem). Let H be a Hilbert space, T ∈B(H) self-adjoint and f : σ(T )→ C continuous. Then σ(f(T )) = f(σ(T )).

Proof. By the previous remark, if f(T) − λ is invertible, then its inverse is in A(f(T) − λ) ⊂ A(T). Since Φ̄ is an isometric isomorphism, f(T) − λ is invertible in A(T) if and only if t ↦ f(t) − λ is invertible in C(σ(T)), if and only if λ ∉ f(σ(T)). Thus λ ∈ σ(f(T)) if and only if λ ∈ f(σ(T)).

Proposition 1.115. Let H be a Hilbert space, T ∈ B(H) self-adjoint and f : σ(T) → R and g : f(σ(T)) → C continuous functions. Then g(f(T)) = (g ∘ f)(T).

Proof. The operator f(T) is self-adjoint: f(T)* = f̄(T) = f(T), since f is real-valued. By the spectral mapping theorem, σ(f(T)) = f(σ(T)), so g(f(T)) is actually well defined.
Define


For the functions 1(t) = 1 and id(t) = t we get

1(f(T )) = I = (1 ◦ f)(T )

id(f(T )) = f(T ) = (id ◦ f)(T )

hence these functions are in B.
For any h, g ∈ B and λ ∈ C,

(h+ λg)(f(T )) = h(f(T )) + λg(f(T )) = (h ◦ f)(T ) + λ(g ◦ f)(T ) = ((h+ λg) ◦ f)(T )

(hg)(f(T )) = h(f(T ))g(f(T )) = (h ◦ f)(T )(g ◦ f)(T ) = ((hg) ◦ f)(T ),

so B is an algebra. In particular, it contains the polynomials.
Finally, if (gₙ) ⊂ B converges to g in C(f(σ(T))), then gₙ ∘ f converges to g ∘ f in C(σ(T)) (Why?). Using the continuity of Φ̄ twice we get

g(f(T)) = lim_{n→∞} gₙ(f(T)) = lim_{n→∞} (gₙ ∘ f)(T) = (g ∘ f)(T),

showing that g ∈ B.
Thus B is a closed subalgebra of C(f(σ(T))) containing the polynomials. Since the latter set is dense, this implies B = C(f(σ(T))), which finishes the proof.

Remark 1.116. (i) Let T ∈ B(H) be self-adjoint. Then for any s, t ∈ R, we have

e^(ist) e^(−ist) = e^(−ist) e^(ist) = 1,

where e^(−ist) is the complex conjugate of e^(ist). Applying the functional calculus we get for any s ∈ R

e^(isT) (e^(isT))* = (e^(isT))* e^(isT) = I,

so e^(isT) is a unitary operator. The corresponding family fulfils e^(isT) e^(irT) = e^(i(s+r)T) since the number valued functions fulfil the corresponding relation.
One can show that the map ϕ(s) = e^(isT) x is differentiable for any x ∈ H and that ϕ(s) fulfils

(d/ds) ϕ(s) = iT ϕ(s).

This is the Schrödinger equation for a Hamiltonian T (although in most physically relevant cases T will be an unbounded operator).

(ii) Defining an operator using a power series

f(T) = ∑ₙ₌₀^∞ αₙ Tⁿ

works only if the entire spectrum of T is contained in the interior of the disc of convergence. This is the case for functions like eᵗ. If the operator series is well defined, then the f(T) defined this way coincides with our definition (exercise).


(iii) For an arbitrary operator T ∈ B(H), the operator T*T is positive (Why?), hence we may define |T| = √(T*T).

(iv) The continuous functional calculus can be extended to normal operators, to more general tuples of commuting operators and, in a way, even to infinite families of commuting operators. There are, on the other hand, inherent problems when dealing with functions of several non-commuting operators.

(v) The continuous functional calculus can be extended to a larger class of functions,the bounded or Borel functional calculus, but one has to relax the required conti-nuity properties somewhat.
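For a real symmetric matrix the continuous functional calculus is completely concrete: with T = V diag(λ) Vᵀ one has f(T) = V diag(f(λ)) Vᵀ. A numpy sketch (the matrix is an arbitrary example; `f_of_T` is our own helper name) checking the two examples above, that e^(isT) is unitary and that √(T*T) squares back to T*T:

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 0.0]])                 # arbitrary symmetric matrix
lam, V = np.linalg.eigh(T)                 # spectral decomposition of T

def f_of_T(f):
    # functional calculus for a symmetric matrix: f acts on eigenvalues
    return V @ np.diag(f(lam)) @ V.T.conj()

U = f_of_T(lambda t: np.exp(0.7j * t))     # e^{isT} with s = 0.7
unitary = np.allclose(U @ U.conj().T, np.eye(2))

A = T @ T                                  # A = T*T is positive
lamA, VA = np.linalg.eigh(A)
sqrtA = VA @ np.diag(np.sqrt(lamA)) @ VA.T # |T| = sqrt(T*T)
print(unitary, np.allclose(sqrtA @ sqrtA, A))
```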


2 Differential Geometry


2.1 Tensor products and multilinear algebra
Throughout this section, V and W will be finite dimensional, real vector spaces. We will use the symbol

δⁱⱼ = 1 if i = j, and δⁱⱼ = 0 if i ≠ j,

and the Einstein summation convention, i.e. any index appearing twice, once as an upper and once as a lower index, is automatically summed over its range. Thus αⁱβᵢ is shorthand for ∑ᵢ₌₁ⁿ αⁱβᵢ. Since the name of indices that we take the sum over is unimportant, we can always freely rename such double indices.
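The summation convention maps directly onto numpy's `einsum`, whose index strings spell out exactly which indices are summed; a small sketch with arbitrary component arrays:

```python
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])    # components alpha^i
beta = np.array([4.0, 0.0, -1.0])    # components beta_i

# alpha^i beta_i: the repeated index i is summed over
s = np.einsum('i,i->', alpha, beta)
print(s)   # same value as the explicit sum over i
```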

Definition 2.1. The dual space V* of V is the space of linear functionals. In physics literature, elements of V and V* are called contravariant and covariant vectors respectively, or simply vectors and covectors.

Remark 2.2. (i) V ∗ is a real vector space.

(ii) V* is the algebraic dual (all linear functionals) whereas we used V′ for the topological dual (continuous linear functionals). In the finite dimensional case we are interested in, any linear functional is continuous (with respect to any norm), hence the notions coincide.

(iii) Let e₁, . . . , eₙ be a basis of V, and α₁, . . . , αₙ ∈ R. Then there is a unique ϕ ∈ V* such that ϕ(eᵢ) = αᵢ for any i ∈ {1, . . . , n}.

Proposition 2.3. Let e₁, . . . , eₙ be a basis of V, and let e¹, . . . , eⁿ ∈ V* be the unique elements such that

eⁱ(eⱼ) = δⁱⱼ for i, j ∈ {1, . . . , n}.

Then e¹, . . . , eⁿ is a basis for V*, called the dual basis to e₁, . . . , eₙ. In particular V* and V have the same dimension.

Proof. The elements e¹, . . . , eⁿ are linearly independent, for if βᵢeⁱ = 0 for some coefficients β₁, . . . , βₙ ∈ R, then applying this functional to eⱼ for any j ∈ {1, . . . , n} yields

0 = βᵢ eⁱ(eⱼ) = βᵢ δⁱⱼ = βⱼ.

For an arbitrary functional ϕ ∈ V*, we have

ϕ = ϕ(eᵢ) eⁱ. (2.1)

One verifies easily that the equality holds on an arbitrary eⱼ: ϕ(eᵢ) eⁱ(eⱼ) = ϕ(eᵢ) δⁱⱼ = ϕ(eⱼ). Since both sides of the equation are linear functionals that coincide on a basis, they must be equal.
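In coordinates the dual basis has a familiar matrix description, sketched below under the assumption that we work in Rⁿ with an explicitly chosen basis:

```python
import numpy as np

# If the basis vectors e_j are the columns of an invertible matrix E,
# the dual basis functionals e^i are the rows of E^{-1}, since then
# e^i(e_j) = (E^{-1} E)_{ij} = delta^i_j.
E = np.array([[1.0, 1.0],
              [0.0, 2.0]])          # columns: a basis e_1, e_2 of R^2
E_dual = np.linalg.inv(E)           # rows: the dual basis e^1, e^2

print(np.allclose(E_dual @ E, np.eye(2)))
```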


Remark 2.4. For any element v = αⁱeᵢ ∈ V and any ϕ = βⱼeʲ we get

ϕ(v) = βⱼ eʲ(αⁱeᵢ) = αⁱβⱼ eʲ(eᵢ) = αⁱβᵢ. (2.2)

This implies in particular that ϕ(v) = 0 for any ϕ ∈ V* holds if and only if v = 0.

Corollary 2.5. For v ∈ V define v** ∈ (V*)* by v**(ϕ) = ϕ(v). The map v ↦ v** is an isomorphism.

Proof. The map is linear (Check that!) and V and (V*)* have the same dimension, so it suffices to show that the map's kernel is trivial. So suppose v** = 0 for some v ∈ V, i.e. ϕ(v) = 0 for any ϕ ∈ V*. Then by the previous remark, v = 0.

Remark 2.6. Note that the map v ↦ v** is “natural” since it does not depend on the choice of a basis. Having the same dimension, the spaces V and V* are isomorphic as well, but this isomorphism depends on the choice of a basis. We will always implicitly identify (V*)* with V. So the expression ϕ(v) can be read as ϕ acting on V but also as v acting on V*. Note how this symmetry becomes apparent in (2.2).
Reading the defining equation of the dual basis

eⁱ(eⱼ) = δⁱⱼ

from this perspective, we see that the dual basis of the dual basis of some basis is the original basis again.
As a corollary, we get the dual equation of (2.1), i.e. for any v ∈ V:

v = eⁱ(v) eᵢ.

As an exercise, prove that equation directly.

Remark 2.7. A map h : V × W → R is called a bilinear functional (multilinear functional for more than two arguments) if v ↦ h(v, w) is linear for any w ∈ W and w ↦ h(v, w) is linear for any v ∈ V. We will call the space of bilinear functionals V* ⊗ W*. It is a vector space with the usual addition and scalar multiplication. As for linear functionals, if h(eᵢ, fⱼ) = 0 for arbitrary basis elements of the spaces V and W, then h = 0:

h(v, w) = h(αⁱeᵢ, βʲfⱼ) = αⁱβʲ h(eᵢ, fⱼ) = 0.

So a bilinear map is uniquely determined by its values on bases.
For linear functionals ϕ ∈ V* and θ ∈ W*, there is a natural bilinear functional ϕ ⊗ θ : V × W → R defined by

ϕ ⊗ θ(v, w) = ϕ(v) θ(w).

Note that the map (ϕ, θ) ↦ ϕ ⊗ θ is itself bilinear since (for the first argument, with a second functional ϕ̃ ∈ V*)

(ϕ + λϕ̃) ⊗ θ(v, w) = (ϕ + λϕ̃)(v) θ(w) = ϕ(v)θ(w) + λϕ̃(v)θ(w)
= (ϕ ⊗ θ)(v, w) + λ(ϕ̃ ⊗ θ)(v, w) = (ϕ ⊗ θ + λ ϕ̃ ⊗ θ)(v, w)


holds for any ϕ, ϕ̃ ∈ V*, θ ∈ W*, λ ∈ R and v ∈ V, w ∈ W.
Let e¹, . . . , eⁿ and f¹, . . . , fᵐ be bases of V* and W* respectively. Then

{ eⁱ ⊗ fʲ | i ∈ {1, . . . , n}, j ∈ {1, . . . , m} }

is a basis of V* ⊗ W*. To see that they are linearly independent, apply a trivial linear combination 0 = αᵢⱼ eⁱ ⊗ fʲ to elements eₖ, fₗ of the dual bases:

0 = (αᵢⱼ eⁱ ⊗ fʲ)(eₖ, fₗ) = αᵢⱼ eⁱ(eₖ) fʲ(fₗ) = αₖₗ.

On the other hand, let h be a bilinear functional; then

h = h(eᵢ, fⱼ) eⁱ ⊗ fʲ

as can be checked by applying it to elements of the dual basis.
Thus the dimension of V* ⊗ W* is the product of the dimensions of V* and W*.

Definition 2.8. The tensor product V ⊗ W of the spaces V and W is the space of bilinear maps from V* × W* to R.
For v ∈ V and w ∈ W, the tensor product v ⊗ w ∈ V ⊗ W is the map

v ⊗ w(ϕ, θ) = ϕ(v) θ(w).

Remark 2.9. (i) The map (v, w) 7→ v ⊗ w is bilinear (exercise).

(ii) The elements v ⊗w of V ⊗W are called pure or simple tensors. Tensors need notbe pure but they are always a linear combination of pure tensors.

(iii) For chosen bases e₁, . . . , eₙ of V and f₁, . . . , fₘ of W, and elements v = αⁱeᵢ ∈ V and w = βʲfⱼ ∈ W, the tensor product is

v ⊗ w = αⁱβʲ eᵢ ⊗ fⱼ.

(iv) The tensor product is not commutative. Even in V ⊗ V , the elements v ⊗ w andw ⊗ v are different if v and w are linearly independent.

(v) In many cases, we will be dealing with (multiple) tensor products of V and V*. If we choose a basis e₁, . . . , eₙ for V, then we get the dual basis e¹, . . . , eⁿ for V* and corresponding bases for the tensor products, for example eᵢ ⊗ eʲ, i, j ∈ {1, . . . , n} on V ⊗ V*. We will always implicitly assume this choice of bases if nothing else is mentioned.

(vi) In the physics literature, a basis is often (implicitly) fixed and the tensor is referred to only by its components with respect to that basis. So “let hᵢⱼ be a tensor” actually means “let h = hᵢⱼ eⁱ ⊗ eʲ be a tensor”.


(vii) Tensors appear in various places describing physical quantities. Prominent examples are the moment of inertia tensor in rigid body mechanics, the stress tensor in continuum mechanics, the metric tensor in relativity theory, the electromagnetic field tensor in relativistic electromagnetism and the energy-momentum tensor in general relativity.
In quantum mechanics, if H and G are the Hilbert spaces associated to two quantum mechanical systems, then H ⊗ G describes the (state space of the) combined system, although in that case additional considerations concerning the scalar product have to be taken into account.
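In components a pure tensor is just the outer product of the coefficient arrays, which can be sketched in numpy (the vectors are arbitrary examples):

```python
import numpy as np

# v (x) w has components alpha^i beta^j, the outer product
# of the coefficient arrays of v and w.
v = np.array([1.0, 2.0])            # alpha^i
w = np.array([3.0, 4.0, 5.0])       # beta^j

t = np.einsum('i,j->ij', v, w)      # components of v (x) w
print(np.allclose(t, np.outer(v, w)), t.shape == (2, 3))
```

The shape `(2, 3)` reflects that dim(V ⊗ W) = dim V · dim W, as shown above for bilinear functionals.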

Proposition 2.10 (Universal property). Let U be another (finite dimensional) vector space and h : V × W → U a bilinear map. Then there is a unique linear map H : V ⊗ W → U such that

H(v ⊗ w) = h(v, w)

for any v ∈ V and w ∈ W.

Proof. Choose bases e₁, . . . , eₙ and f₁, . . . , fₘ for V and W respectively. Then the eᵢ ⊗ fⱼ form a basis for V ⊗ W and there is a unique linear map fulfilling

H(eᵢ ⊗ fⱼ) = h(eᵢ, fⱼ).

For v = αⁱeᵢ and w = βʲfⱼ, we get

H(v ⊗ w) = H(αⁱβʲ eᵢ ⊗ fⱼ) = αⁱβʲ H(eᵢ ⊗ fⱼ) = αⁱβʲ h(eᵢ, fⱼ) = h(αⁱeᵢ, βʲfⱼ) = h(v, w).

Uniqueness is obvious.

Remark 2.11. (i) The universal property can be rephrased (somewhat sloppily) as follows: If Φ is a map defined on pure tensors mapping to a vector space such that

Φ((v + λṽ) ⊗ w) = Φ(v ⊗ w) + λΦ(ṽ ⊗ w) and
Φ(v ⊗ (w + λw̃)) = Φ(v ⊗ w) + λΦ(v ⊗ w̃),

then Φ extends uniquely to a linear map on V ⊗ W.

(ii) The map v ⊗ w ↦ w ⊗ v is obviously linear in v and w, so by the universal property there is a unique linear map S : V ⊗ V → V ⊗ V extending it. This map (and its generalisations on higher rank analogues) is called a braiding map.
It fulfils S² = I (Why?) and hence all its eigenvalues must be 1 or −1. The corresponding eigenspaces of symmetric (eigenvalue 1) and anti-symmetric (eigenvalue −1) tensors are often of particular interest.
Expressed in a basis, a tensor h = hⁱʲ eᵢ ⊗ eⱼ ∈ V ⊗ V is symmetric if and only if

hⁱʲ eᵢ ⊗ eⱼ = h = Sh = hⁱʲ eⱼ ⊗ eᵢ = hʲⁱ eᵢ ⊗ eⱼ.

Since the eᵢ ⊗ eⱼ are a basis of V ⊗ V, this implies hⁱʲ = hʲⁱ for any i, j ∈ {1, . . . , n}. Similarly h is anti-symmetric if and only if hⁱʲ = −hʲⁱ.


(iii) For three vector spaces U, V, W, we can form (U ⊗ V) ⊗ W and U ⊗ (V ⊗ W) and these spaces are different (although they have the same dimension). There is however a “natural” unique isomorphism such that

(u ⊗ v) ⊗ w ↦ u ⊗ (v ⊗ w).

This map can be constructed using the universal property twice. Since this map is surjective (use a basis) and the spaces have the same dimension, it is an isomorphism. Note that the map does not depend on the choice of a basis (although one could be used to construct it).
We will generally implicitly use this map to identify the two spaces and write U ⊗ V ⊗ W. In this sense, the tensor product is associative.
This space can also be identified (use the universal property again) with the space of trilinear functionals on U × V × W.

(iv) An element of the tensor product of r copies of V and s copies of V* is called a tensor of rank (r, s), an (r, s) tensor or an r times contravariant, s times covariant tensor. Technically the order of the V and V* factors matters, but we can always choose to have all the V to the left and all the V* to the right. In components, an (r, s) tensor has r upper and s lower indices. In the physics literature, the positioning of indices is used to identify the type of tensor.

Definition 2.12. By the universal property of the tensor product, there is a unique map V ⊗ V* → R such that v ⊗ ϕ ↦ ϕ(v). This map is called the (partial) trace or contraction and we will write Tr for it.

Remark 2.13. (i) Expressed in a basis, the contraction of a tensor h = hⁱⱼ eᵢ ⊗ eʲ is

Tr h = Tr(hⁱⱼ eᵢ ⊗ eʲ) = hⁱⱼ eʲ(eᵢ) = hⁱᵢ.

An alternative formula is (exercise)

Tr h = h(eⁱ, eᵢ).

(ii) For higher rank tensors, there are several possibilities of contractions (contractions over different pairs of indices if working in coordinates). In these cases we add the factors to contract over as indices of Tr, so on V ⊗ V ⊗ V* we get Tr₁,₃ and Tr₂,₃, which both map to V and which are different maps.
If working in components, these traces are always calculated by equating and summing over the respective indices, i.e. for h = hⁱʲₖ eᵢ ⊗ eⱼ ⊗ eᵏ we get

Tr₁,₃ h = hⁱʲᵢ eⱼ and Tr₂,₃ h = hⁱʲⱼ eᵢ.

The order in which several contractions are carried out does not matter provided we contract over the same factors. So for h ∈ V ⊗ V ⊗ V* ⊗ V*, Tr Tr₁,₃ h and Tr Tr₂,₄ h coincide, the result being hⁱʲᵢⱼ if expressed in coordinates. However Tr Tr₁,₄ h yields the possibly different result hⁱʲⱼᵢ.
As can already be seen in these examples, the Tr notation gets clumsy rather quickly as soon as several traces are involved. This is one reason why the index notation is prevalent in physics literature.

(iii) Contractions only make sense (independent of a basis) for a V and a V ∗ factor (anupper and a lower index).
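The different contractions are simply different `einsum` index patterns; a sketch with an arbitrary component array of shape (n, n, n):

```python
import numpy as np

# h with components h^{ij}_k stored as h[i, j, k]
n = 3
h = np.arange(n**3, dtype=float).reshape(n, n, n)

tr13 = np.einsum('iji->j', h)       # Tr_{1,3} h has components h^{ij}_i
tr23 = np.einsum('ijj->i', h)       # Tr_{2,3} h has components h^{ij}_j
print(tr13.shape == (n,), not np.allclose(tr13, tr23))
```

Both results lie in V (one free index), but they are in general different maps, as the remark states.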

Proposition 2.14. Let B(V) be the space of linear maps from V to V (these are automatically bounded in finite dimensions). Then the map Φ : V ⊗ V* → B(V) defined by Φ(h)(v) = Tr₂,₃ h ⊗ v is an isomorphism.

Proof. Since the spaces V ⊗ V* and B(V) both have dimension (dim V)², it is sufficient to show that Φ is surjective. So let T ∈ B(V) be a linear operator. Choose a basis e₁, . . . , eₙ and calculate for some v ∈ V

Φ((Teᵢ) ⊗ eⁱ)(v) = Tr₂,₃ (Teᵢ) ⊗ eⁱ ⊗ v = eⁱ(v) Teᵢ = T(eⁱ(v) eᵢ) = Tv,

where we used remark 2.6 for the last step. Since this holds for any v ∈ V, we see that

Φ((Teᵢ) ⊗ eⁱ) = T,

showing the surjectivity of Φ.

Remark 2.15. (i) Note that this isomorphism is again “natural” in the sense that it does not depend on the choice of a basis. As with the natural isomorphism between V** and V, we will often use this isomorphism implicitly.

(ii) In components, for a tensor h = hⁱⱼ eᵢ ⊗ eʲ ∈ V ⊗ V* and v = vᵏeₖ, the map Φ is

Φ(h)(v) = Tr₂,₃ h ⊗ v = hⁱⱼ vᵏ Tr₂,₃ eᵢ ⊗ eʲ ⊗ eₖ = hⁱⱼ vʲ eᵢ.

Note how the components of Φ(h)(v) are obtained by the usual matrix multiplication formula.

(iii) By contracting the other index, we can construct another isomorphism that allows us to interpret tensors h ∈ V ⊗ V* as linear maps in B(V*).

(iv) For tensors of higher rank, there is an increasing number of similar isomorphisms. The following table shows some interpretations of a tensor h ∈ V ⊗ V ⊗ V* as (multi-)linear maps on various spaces, together with the corresponding expressions in components:

V* × V* × V → R : hⁱʲₖ vᵢ wⱼ uᵏ,
V* × V → V : hⁱʲₖ vᵢ uᵏ,
V* × V → V : hⁱʲₖ vⱼ uᵏ,
V* ⊗ V* × V → R : hⁱʲₖ gᵢⱼ uᵏ,
V* → V ⊗ V* : hⁱʲₖ vᵢ,
V* ⊗ V* ⊗ V → R : hⁱʲₖ gᵢⱼᵏ,
V* ⊗ V* ⊗ V → R : hⁱʲₖ gⱼᵢᵏ.

(v) Being an isomorphism, Φ is in particular injective, so Φ(h) = 0 implies h = 0. Expressed in components: if hⁱⱼ fulfils hⁱⱼ vʲ = 0 for any i ∈ {1, . . . , n} and any vʲ, then hⁱⱼ = 0. In other words, two tensors are the same if they “act on vectors” in the same way (similarly to linear functionals).
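The coordinate form of the isomorphism Φ is precisely matrix-vector multiplication, which a one-line `einsum` sketch makes explicit (component arrays are arbitrary examples):

```python
import numpy as np

# A tensor with components h^i_j acts on a vector v^j by the
# usual matrix product: (Phi(h)(v))^i = h^i_j v^j.
h = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # h^i_j
v = np.array([5.0, 6.0])            # v^j

Phi_h_v = np.einsum('ij,j->i', h, v)
print(np.allclose(Phi_h_v, h @ v))
```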

Remark 2.16. Up to now, everything was purely algebraic. We can talk about “vectors”, “covectors”, “tensors” (of various ranks), “triangles”, “parallel” . . . Geometric notions like “length”, “angle”, “surface area”, “volume” are not available in this context. The next definition will give meaning to all of these concepts eventually.

Definition 2.17. A tensor g ∈ V* ⊗ V* is called a metric tensor or simply metric on V if it fulfils (we interpret g as a bilinear functional on V here)

(i) g(v, w) = g(w, v) for any v, w ∈ V (symmetry) and

(ii) g(v, w) = 0 for any w ∈ V if and only if v = 0 (non-degeneracy).

If it additionally fulfils g(v, v) ≥ 0, then the metric is called positive definite.

Example 2.18. (i) A metric tensor on V is a scalar product if and only if it is positive definite.
In this case, we interpret √g(v, v) as the length of the element v ∈ V. If the metric is not positive definite, the interpretation becomes more involved.

(ii) For special relativity, the Minkowski metric on R4 given by

g((v0, v1, v2, v3), (w0, w1, w2, w3)) = −v0w0 + v1w1 + v2w2 + v3w3

plays an important role.
Note that in this metric, g(v, v) = 0 is possible for non-zero elements v. In the theory of relativity, such elements are called lightlike vectors.


Remark 2.19. The word “metric” is also used in a different sense in topology and functional analysis.

Proposition 2.20 (Diagonalization). Let g ∈ V* ⊗ V* be a metric tensor. Then there is a basis e₁, . . . , eₙ such that

g = ∑ᵢ₌₁ʳ eⁱ ⊗ eⁱ − ∑ᵢ₌ᵣ₊₁ⁿ eⁱ ⊗ eⁱ.

Proof. Choose any basis and write g = gᵢⱼ eⁱ ⊗ eʲ. Since g is symmetric, the matrix G = (gᵢⱼ) is symmetric as well (see remark 2.11). Any symmetric matrix can be diagonalized by orthogonal matrices, i.e. there is a matrix A = (aᵢⱼ) and a diagonal matrix D = (λₖδₖₗ) such that AᵀA = AAᵀ = I and G = AᵀDA. Writing out the last equation in components we get

gᵢⱼ = ∑ₗ₌₁ⁿ ∑ₖ₌₁ⁿ aₗᵢ λₖ δₗₖ aₖⱼ = ∑ₗ₌₁ⁿ λₗ aₗᵢ aₗⱼ.

Then we get the following diagonal representation for g:

g = ∑ₗ₌₁ⁿ λₗ aₗᵢ aₗⱼ eⁱ ⊗ eʲ = ∑ₗ₌₁ⁿ λₗ (aₗᵢeⁱ) ⊗ (aₗⱼeʲ) = ∑ₗ₌₁ⁿ λₗ ẽˡ ⊗ ẽˡ,

where we defined ẽˡ = aₗᵢeⁱ. Since the matrix A is invertible, the vectors ẽ¹, . . . , ẽⁿ are a basis of V*. Note that the positioning of indices on a and δ does not matter, since these are not components of a tensor anyway.
Let ẽ₁, . . . , ẽₙ be the dual basis of V. If λᵢ = 0 for some i ∈ {1, . . . , n}, then

g(ẽᵢ, ẽⱼ) = ∑ₗ₌₁ⁿ λₗ δₗᵢ δₗⱼ = 0

for any j ∈ {1, . . . , n}, hence g(ẽᵢ, v) = 0 for any v ∈ V, i.e. g is degenerate. Since g is a metric tensor and thus non-degenerate, all the λₗ are non-zero. We can then further define

σₗ = 1 if λₗ > 0, and σₗ = −1 if λₗ < 0,

and write λₗ = σₗ (√|λₗ|)². Finally we get

g = ∑ₗ₌₁ⁿ σₗ (√|λₗ| ẽˡ) ⊗ (√|λₗ| ẽˡ).

The elements √|λₗ| ẽˡ (no sum over the double index here) are yet another basis (we just rescaled the elements) and the coefficients σₗ are all +1 or −1. So after possibly reordering the basis elements, we have found the desired representation.
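The proof translates directly into a numerical procedure: eigendecompose the symmetric component matrix and rescale the eigenvectors; only the signs of the eigenvalues survive. A sketch with an arbitrary non-degenerate symmetric matrix of mixed signature:

```python
import numpy as np

G = np.array([[-1.0, 0.5],
              [ 0.5, 2.0]])             # symmetric, non-degenerate
lam, A = np.linalg.eigh(G)              # G = A diag(lam) A^T

# rescale each eigenvector by 1/sqrt(|lambda|), as in the proof
B = A @ np.diag(1.0 / np.sqrt(np.abs(lam)))

# in the new basis only the signs sigma_l = sign(lambda_l) remain
print(np.allclose(B.T @ G @ B, np.diag(np.sign(lam))))
```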


Remark 2.21. The representation of the metric tensor constructed in the previous proposition is not unique, but the number r is the same for all such representations (Sylvester's law of inertia). This number, which specifies how many plusses and minuses appear in the metric, is called the metric's signature. Thus the usual scalar product on R³ has signature (+ + +), whereas the Minkowski metric has signature (− + + +) or (+ − − −) depending on convention.

Corollary 2.22. Let g be a metric tensor. Then there is an orthonormal basis of V, i.e. a basis e₁, . . . , eₙ such that g(eᵢ, eⱼ) = 0 if i ≠ j and g(eᵢ, eᵢ) ∈ {−1, 1}.

Proof. Represent g as constructed above:

g = ∑ᵢ₌₁ⁿ σᵢ eⁱ ⊗ eⁱ

with σᵢ ∈ {−1, 1}. For the dual basis e₁, . . . , eₙ we get

g(eₗ, eₖ) = ∑ᵢ₌₁ⁿ σᵢ δᵢₗ δᵢₖ = σₗ δₗₖ.

Proposition 2.23. Let g ∈ V* ⊗ V* be a metric on V. Then the contraction map from V to V* mapping v ↦ v̄ := g(v, · ) = Tr₁,₃ g ⊗ v is an isomorphism.

Proof. The map is obviously linear (exercise) and since V and V* have the same dimension, we only have to show injectivity. Suppose g(v, · ) ∈ V* is the zero element, i.e. g(v, w) = 0 for any w ∈ V. Then by the non-degeneracy of the metric, v = 0, so the kernel of the map is trivial.

Remark 2.24. (i) The above isomorphism is “natural”, i.e. it does not depend on the choice of a basis. It does of course depend on the choice of the metric.

(ii) Contracting over the other factor, i.e. g( · , v) = Tr₂,₃ g ⊗ v, results in the same map due to the symmetry of g (exercise).

(iii) When working in components, the metric has the form g = gᵢⱼ eⁱ ⊗ eʲ and the isomorphism defined above is

v̄ = g(vᵏeₖ, · ) = gᵢⱼ vᵏ eⁱ(eₖ) eʲ = gᵢⱼ vⁱ eʲ = gᵢⱼ vʲ eⁱ,

where the last step uses the symmetry of g.

(iv) When working with a fixed metric, it is often used to implicitly identify v and v̄. In components, from the contravariant vector vʲ (short for vʲeⱼ), we define the covariant vector vᵢ := gᵢⱼvʲ (short for vᵢeⁱ). In the physics literature this is called “lowering the index” of vʲ. Yet again, the components of vᵢ are obtained by matrix multiplication.


Remark 2.25. Let g be a metric on V. As we have seen, it defines a unique isomorphism v ↦ v̄ from V to V*, which is in particular invertible. The inverse map v̄ ↦ v can be represented by a tensor ḡ : V* × V* → R again, such that

v = Tr₁,₃ ḡ ⊗ v̄.

If we write ḡ = gⁱʲ eᵢ ⊗ eⱼ, v = vʲeⱼ and v̄ = vₗeˡ, we get in components

vʲeⱼ = gⁱʲ vₗ Tr₁,₃ eᵢ ⊗ eⱼ ⊗ eˡ = gⁱʲ vᵢ eⱼ = gⁱʲ gᵢₖ vᵏ eⱼ.

Since the eⱼ form a basis, this implies vʲ = δʲₖvᵏ = gⁱʲgᵢₖvᵏ for any j ∈ {1, . . . , n}. Since the vᵏ are arbitrary, we get δʲₖ = gʲⁱgᵢₖ for any j, k ∈ {1, . . . , n}. In other words, the matrix (gʲⁱ) is the inverse matrix of (gᵢₖ).
We can use this result to calculate

ḡ(v̄, w̄) = gⁱʲ vₖ wₗ eᵢ ⊗ eⱼ(eᵏ, eˡ) = gⁱʲ vᵢ wⱼ = gⁱʲ gᵢₖvᵏ gⱼₗwˡ = δʲₖ gⱼₗ vᵏ wˡ = gₖₗ vᵏ wˡ = g(v, w).

From this formula we can deduce that ḡ is itself a metric tensor on the space V* (exercise).
When working in coordinates, it is common to use the same symbol for g and ḡ, since the two are distinguished by the positioning of indices.

Remark 2.26. (i) The metric g and the dual metric ḡ now define unique maps from V to V* and from V* to V. We already looked at the coordinate expressions above:

v = vʲeⱼ = vᵢ gⁱʲ eⱼ and v̄ = vᵢeⁱ = vʲ gᵢⱼ eⁱ.

This operation is referred to as raising and lowering indices. It is common to use the same symbol for the coordinates of v and v̄ as we did above, only distinguishing them by the positioning of indices.

This operation is referred to as raising and lowering indices. It is common, to usethe same symbol for the coordinates of v and v as we did above, only distinguishingthem by the positioning of indices.

(ii) On Rⁿ, there is the standard scalar product, which is a metric tensor. We have often used this metric tensor and the resulting isomorphism between Rⁿ and (Rⁿ)* to identify these spaces (identifying row and column vectors for example).

to identify these spaces (identifying row and column vectors for example).

(iii) Of course, we can use the same metrics to raise and lower indices on higher order tensors. So for a tensor hⁱʲₖ eᵢ ⊗ eⱼ ⊗ eᵏ ∈ V ⊗ V ⊗ V*, we can lower the second and raise the third index to obtain a tensor hⁱₗᵐ eᵢ ⊗ eˡ ⊗ eₘ ∈ V ⊗ V* ⊗ V. For the components we get

hⁱₗᵐ = hⁱʲₖ gⱼₗ gᵏᵐ.

Again, we use only the index positioning to indicate which tensor the components belong to.
In this way, we can also raise the indices of the metric tensor itself:

gₖₗ gⁱᵏ gʲˡ = δⁱₗ gʲˡ = gʲⁱ.

Thus the operation is consistent with our definition of the dual tensor: raising the indices of the metric g yields the dual metric ḡ.
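Raising and lowering indices is matrix multiplication by the metric components and their inverse; a sketch with the Minkowski metric from example 2.18 (the vector is an arbitrary example):

```python
import numpy as np

# Minkowski metric with signature (-, +, +, +)
g = np.diag([-1.0, 1.0, 1.0, 1.0])        # components g_{ij}
g_inv = np.linalg.inv(g)                  # components g^{ij}

v_up = np.array([2.0, 1.0, 0.0, 0.0])     # v^j
v_down = np.einsum('ij,j->i', g, v_up)    # lowering: v_i = g_{ij} v^j
back = np.einsum('ij,j->i', g_inv, v_down)  # raising the index again

print(np.allclose(back, v_up), np.allclose(g_inv, g))
```

That raising undoes lowering reflects δʲₖ = gʲⁱgᵢₖ from remark 2.25; for this particular metric the component matrices gⁱʲ and gᵢⱼ happen to coincide.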


2.2 Smooth Manifolds
Smooth manifolds are generalizations of Rⁿ that are still sufficiently “smooth”. For an arbitrary set X, we can still talk about functions on X, but derivatives of such functions cannot necessarily be defined. In order to define such derivatives, we will need additional structure on X. We will give such a “smooth” structure by identifying X with Rⁿ locally, i.e. in the vicinity of each point.

Example 2.27. Consider the set M = [−1, 1) × R. We want to think of M as the cylinder, i.e. we "glue" the left edge onto the right one. We define the following maps (called charts)

ϕ1 : (−1, 1) × R → (−1, 1) × R, (x, y) ↦ (x, y)

ϕ2 : ([−1, 0) ∪ (0, 1)) × R → (−1, 1) × R, (x, y) ↦ (x + 1, y) for x < 0, (x − 1, y) for x > 0.

Note that the domains of the charts cover all of M and that on the overlap of the domains ((−1, 0) ∪ (0, 1)) × R the concatenations (called transition maps)

ϕ2 ∘ ϕ1⁻¹ : (x, y) ↦ (x + 1, y) for x < 0, (x − 1, y) for x > 0

ϕ1 ∘ ϕ2⁻¹ : (x, y) ↦ (x + 1, y) for x < 0, (x − 1, y) for x > 0

are C∞ maps. The above gives a precise description of the "smooth structure" of a cylinder.
We can now slightly alter the scenario by replacing ϕ1 with the map

ϕ1 : (−1, 1) × R → (−1, 1) × R, (x, y) ↦ (x, −y),

keeping the second chart unaltered. The transition maps are still C∞ functions (we just replaced y by −y in the above formulas), hence we obtain another "smooth structure" on M. The interpretation is that we twisted the left edge upside down before gluing it to the right edge. By this construction, we obtain (one version of) the Möbius band. Unlike the cylinder, this object has no equally obvious embedding into R3, hence we have trouble visualizing it (we can however embed and visualize a bounded part of M as the familiar Möbius strip). Nonetheless, the above charts give a precise mathematical description that will allow us to differentiate functions on M.
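The chart formulas for the cylinder can be sanity-checked numerically. A minimal sketch (the function names are ours, not the script's): the two transition maps shift the x-coordinate by ±1 depending on its sign, and they undo each other on the overlap.

```python
# Numeric check of the cylinder example: phi2 o phi1^{-1} and phi1 o phi2^{-1}
# are mutually inverse on the overlap ((-1,0) u (0,1)) x R.

def phi2_after_phi1_inv(x, y):
    return (x + 1, y) if x < 0 else (x - 1, y)

def phi1_after_phi2_inv(x, y):
    return (x + 1, y) if x < 0 else (x - 1, y)

round_trips = []
for (x, y) in [(-0.5, 2.0), (0.25, -1.0), (0.9, 3.0)]:
    u, v = phi2_after_phi1_inv(x, y)
    round_trips.append(phi1_after_phi2_inv(u, v) == (x, y))
```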

Remark 2.28. Reminder: Let U, V be open subsets of Rn. A map f : U → V is called a diffeomorphism if it is bijective and both f and f⁻¹ are continuously differentiable (C1). It is called a C∞-diffeomorphism if f and f⁻¹ are arbitrarily often differentiable. Note that for any open subset Ũ ⊂ U, the restriction f : Ũ → f(Ũ) is again a C∞-diffeomorphism (exercise).

Definition 2.29 (Atlas). Let M be a set and n ∈ N. An atlas on M is a family of pairs (Ui, ϕi) (i in some index set I) of subsets Ui ⊂ M and injective maps ϕi : Ui → Rn such that

(i) ⋃_{i∈I} Ui = M,

(ii) ϕi(Ui ∩ Uj) is open for any i, j ∈ I and

(iii) ϕi ∘ ϕj⁻¹ : ϕj(Ui ∩ Uj) → ϕi(Ui ∩ Uj) is a C∞-map.

The maps ϕi are called charts (coordinate charts, coordinate maps) and the Ui the chart domains. The maps ϕi ∘ ϕj⁻¹ are called transition maps.
The number n is called the dimension.

Remark 2.30. (i) From this definition it follows in particular that ϕi(Ui) is an open subset of Rn for any i ∈ I and that ϕi : Ui → ϕi(Ui) is bijective.

(ii) Note that, since ϕj is injective, the transition map ϕi ∘ ϕj⁻¹ bijectively maps ϕj(Ui ∩ Uj) onto ϕi(Ui ∩ Uj). By assumption, this transition map and its inverse ϕj ∘ ϕi⁻¹ are both C∞, so the transition maps are C∞-diffeomorphisms.

(iii) We will only consider smooth atlases (the transition maps are C∞). There are also "less smooth" versions where Cr maps or just continuous (C0) maps are allowed as transition maps.

(iv) A chart is a way to assign coordinates to the objects in (a part of) M. There are of course many different possible coordinate systems, and the transition maps describe the change from one system to another. Even on Rn, you have already worked with non-trivial charts, e.g. polar or spherical coordinates.

(v) Since a chart ϕi maps to Rn, it can be written as ϕi(p) = (x1(p), . . . , xn(p)), where x1, . . . , xn are functions from Ui → R. We can alternatively express the coordinate functions as x^µ = π^µ ∘ ϕi, where π^µ is the canonical projection from Rn → R onto the µth coordinate. Thinking of the coordinates as functions on (part of) M will become important soon.

Definition 2.31. Let (Ui, ϕi)i∈I be an atlas on M, V ⊂ M and ψ : V → Rn injective. Then (V, ψ) is called a compatible chart (with respect to the atlas) if (Ui, ϕi)i∈I together with (V, ψ) is still an atlas.
An atlas is called maximal if it contains every compatible chart.

Remark 2.32. (i) A chart (V, ψ) is compatible if and only if ψ ∘ ϕi⁻¹ : ϕi(V ∩ Ui) → ψ(V ∩ Ui) is a C∞-diffeomorphism between open sets for each (Ui, ϕi) in the atlas.


(ii) For a given chart (Ui, ϕi), we can always shrink the domain of the chart in a compatible way. For any open subset O of ϕi(Ui), define V = ϕi⁻¹(O) and ϕ̃i = ϕi|V. Then (V, ϕ̃i) is a compatible chart, since ϕ̃i ∘ ϕj⁻¹ is the restriction of ϕi ∘ ϕj⁻¹ to an open subset.

(iii) Any atlas is contained in a unique maximal atlas, namely the collection of all compatible charts (it is necessary to show that this is actually an atlas).

Definition 2.33. A manifold (more precisely, C∞-manifold) of dimension n is a set M together with a maximal atlas (Ui, ϕi)i∈I of dimension n that fulfils two additional technical conditions:

(i) M can be covered by countably many chart domains, i.e. there is a countable J ⊂ I such that ⋃_{i∈J} Ui = M (second countability), and

(ii) for any two distinct points p, q ∈ M, there are disjoint chart domains Ui, Uj such that p ∈ Ui and q ∈ Uj (Hausdorff property).

Alternatively, we say that the atlas (Ui, ϕi)i∈I defines a smooth structure on the set M.

Remark 2.34. (i) Specifying a maximal atlas explicitly is impossible in non-trivial cases. However, a manifold can always be defined by giving any atlas on M (since that atlas is contained in a unique maximal atlas). The maximality condition is there to ensure that a different but compatible atlas defines the same manifold.

(ii) The technical conditions are there to exclude certain pathological examples.

(iii) The general idea of a manifold is to define the smooth structure ("when is some map smooth?") locally, i.e. using charts, but in such a way that the various descriptions that arise from different charts are compatible.
This paradigm can be applied to various other types of structures: vector bundles and more general fibre bundles, complex manifolds, topological manifolds, etc.

Example 2.35. (i) Any finite dimensional vector space V is a manifold. An atlas consisting of a single chart can be given by choosing a basis e1, . . . , en and mapping v^i e_i ∈ V to (v1, . . . , vn) ∈ Rn. Any other choice of basis yields a compatible chart (the transition function is an invertible linear map).

(ii) Any open subset O ⊂ Rn is a manifold. An atlas is given by the identity map on O.

(iii) Many subsets of Rn can inherit a smooth structure, e.g. the n-sphere

S^n = { (x_1, . . . , x_{n+1}) ∈ R^{n+1} | x_1² + · · · + x_{n+1}² = 1 }.


(iv) Let M, (Ui, ϕi)i∈I and N, (Vj, ψj)j∈J be two manifolds of the same dimension n. Then on the disjoint union

M ⊔ N = {(p, 0) | p ∈ M} ∪ {(q, 1) | q ∈ N}

define

ϕ̃i : Ui × {0} → Rn, (p, 0) ↦ ϕi(p)

ψ̃j : Vj × {1} → Rn, (q, 1) ↦ ψj(q)

for any i ∈ I and any j ∈ J. Then

(Ui × {0}, ϕ̃i)i∈I ∪ (Vj × {1}, ψ̃j)j∈J

defines a smooth structure on M ⊔ N.

(v) Let M, (Ui, ϕi)i∈I and N, (Vj, ψj)j∈J be two manifolds (of possibly different dimensions m and n). Then

(Ui × Vj, ϕi × ψj)_{(i,j)∈I×J}

defines a smooth structure of dimension m + n on M × N, where (ϕi × ψj)(p, q) = (ϕi(p), ψj(q)).

(vi) Let P² be the set of all straight lines through the origin of R³. Such a line can be specified by a point (x, y, z) ≠ (0, 0, 0) on the line, and another point specifies the same line if and only if it is of the form (λx, λy, λz) with λ ≠ 0.
Let U1, U2 and U3 be the subsets of lines that are not contained in the y-z-, x-z- and x-y-plane respectively and define charts ϕ1, ϕ2 and ϕ3 on these sets:

ϕ1(x, y, z) = (y/x, z/x),  ϕ2(x, y, z) = (x/y, z/y),  ϕ3(x, y, z) = (x/z, y/z).

Note that each of these maps onto the entire space R² and that their definitions do indeed only depend on the line and not on the representing point (ϕi(λx, λy, λz) = ϕi(x, y, z)). Finally note that the transition maps are C∞. For example on U1 ∩ U2 we have (ϕ1⁻¹(a, b) is the line through (1, a, b))

ϕ2 ∘ ϕ1⁻¹ : R²\({0} × R) → R²\({0} × R), (a, b) ↦ (1/a, b/a),

which is C∞. Thus we have defined a smooth structure on P². The object constructed here is called the projective plane. It cannot be embedded into R³, hence it is difficult to visualize.


Below, M will always be an n-dimensional manifold and (Ui, ϕi)i∈I an atlas.

Remark 2.36. Manifolds are generalizations of (certain aspects of) vector spaces. The charts allow us to transport objects on the manifold to objects on subsets of Rn, and hence we can define properties of these objects by the corresponding properties of the objects on Rn. For this to make sense, we need to ensure that the property we are interested in is invariant under diffeomorphisms. We will illustrate this below for the neighbourhoods of points and for the differentiability of functions.

Remark 2.37. Reminder: A subset U ⊂ Rn containing a point x ∈ Rn is called a neighbourhood of x if there is r > 0 such that Br(x) ⊂ U. Note that this in particular implies x ∈ U. A function f is continuous at x if and only if the preimage f⁻¹(U) is a neighbourhood of x for any neighbourhood U of f(x).
Most of the topological concepts on Rn were formulated using balls. The transition maps of arbitrary coordinate changes do not preserve balls. However, a transition map ϕj ∘ ϕi⁻¹ : ϕi(Ui ∩ Uj) → ϕj(Ui ∩ Uj) is continuous (even C∞), so the preimage of a neighbourhood of ϕj(p) is a neighbourhood of ϕi(p). Since the inverse map is continuous as well, the images of neighbourhoods of ϕi(p) are neighbourhoods of ϕj(p). Hence the transition maps preserve neighbourhoods and we can use those to define topological concepts.

Proposition 2.38. Let p be a point of M and V a subset containing p. Let i, j ∈ I be such that p ∈ Ui ∩ Uj. Then ϕi(V ∩ Ui) is a neighbourhood of ϕi(p) if and only if ϕj(V ∩ Uj) is a neighbourhood of ϕj(p).

Proof. The transition map ϕj ∘ ϕi⁻¹ is continuous (even C∞) and maps ϕi(p) to ϕj(p). Suppose ϕi(V ∩ Ui) is a neighbourhood of ϕi(p). Since ϕi(Ui ∩ Uj) is open,

ϕi(V ∩ Ui ∩ Uj) = ϕi(V ∩ Ui) ∩ ϕi(Ui ∩ Uj)

is also a neighbourhood of ϕi(p), and so is

(ϕj ∘ ϕi⁻¹)(ϕi(V ∩ Ui ∩ Uj)) = ϕj(V ∩ Ui ∩ Uj) ⊂ ϕj(V ∩ Uj).

Exchanging i and j yields the other implication.

Definition 2.39. A set V ⊂ M is a neighbourhood of p ∈ M if ϕi(V ∩ Ui) is a neighbourhood of ϕi(p) for some (and hence for all) charts ϕi whose domain Ui contains p.

Remark 2.40. For any chart domain Ui containing a point p, we can construct neighbourhoods of p as follows: for r > 0 small enough, Br(ϕi(p)) will be contained in ϕi(Ui). Then ϕi⁻¹(Br(ϕi(p))) is a neighbourhood of p. Note that any neighbourhood V of p contains a smaller neighbourhood of this form.
We can now define, for example:

(i) A point p ∈ V ⊂ M is an inner point of V if V is a neighbourhood of p.


(ii) A set O ⊂M is called open if each of its points is an inner point of O.

(iii) A set C ⊂ M is called closed if M\C is open.

(iv) A sequence (pk) ⊂ M converges to p ∈ M if for every neighbourhood V of p, there is an N ∈ N such that pk ∈ V for any k ≥ N.

(v) ...

The concepts can also be expressed in terms of the charts:

(i) A subset O ⊂ M is open if and only if ϕi(O ∩ Ui) is open for any i ∈ I. Note that this implies in particular that the chart domains Ui are open sets.

(ii) A sequence (pk) ⊂ M converges to p ∈ M if and only if (ϕi(pk)) converges to ϕi(p) for some (and hence for all) charts ϕi with p ∈ Ui.

(iii) ...

Proposition 2.41. Let f : M → R be a function and p ∈ Ui ∩ Uj. Then f ∘ ϕi⁻¹ is continuous (or differentiable or smooth) in ϕi(p) if and only if f ∘ ϕj⁻¹ is continuous (or differentiable or smooth) in ϕj(p).

Proof. We show the assertion for differentiability. So suppose f ∘ ϕi⁻¹ is differentiable in ϕi(p) = ϕi ∘ ϕj⁻¹(ϕj(p)). Since ϕi ∘ ϕj⁻¹ is a C∞-diffeomorphism, it is in particular differentiable in ϕj(p). Then, by the chain rule, we know that

f ∘ ϕj⁻¹ = (f ∘ ϕi⁻¹) ∘ (ϕi ∘ ϕj⁻¹)

is differentiable in ϕj(p). Exchanging i and j yields the other implication.
Since C∞-diffeomorphisms are smooth (and in particular continuous) maps, the proofs for the other properties can be done in a similar way.

Definition 2.42. A function f : M → R is continuous (or differentiable or smooth) in p if f ∘ ϕi⁻¹ : ϕi(Ui) → R is continuous (or differentiable or smooth) in ϕi(p) for some (and hence for all) i ∈ I such that p ∈ Ui.
The sets of continuous, r times continuously differentiable or smooth functions on M are denoted by C(M), Cr(M) and C∞(M) respectively.

Remark 2.43. (i) About functions on open subsets of Rn we know: polynomials, the exponential function, sin and cos are smooth. For f, g continuous (or differentiable or smooth), f + g and fg are continuous (or differentiable or smooth), and so is f/g wherever g ≠ 0.
Maps with values in Rn are continuous (or differentiable or smooth) if all their component functions are. Concatenations of smooth maps are smooth.
There is a smooth "bump function" g : Rn → R such that g = 1 on B1(0) and g = 0 outside of B2(0). By translating and rescaling this function, we can construct functions of arbitrarily small support that are still identically 1 in a neighbourhood of a given point.


(ii) It is not immediately obvious that the spaces C(M) ⊃ Cr(M) ⊃ C∞(M) contain any functions but the constants. However, let i be an index and g : ϕi(Ui) → R be a smooth function with compact support. Then the function

f : M → R, p ↦ g ∘ ϕi(p) for p ∈ Ui and p ↦ 0 for p ∉ Ui,

is smooth. Moreover, one easily checks that for f, g continuous (or differentiable or smooth), f + g and fg are continuous (or differentiable or smooth), and so is f/g wherever g ≠ 0. This shows that there are actually a lot of these functions. In particular, the spaces C(M), Cr(M) and C∞(M) are algebras.
Let p be a fixed point in M and V a neighbourhood of p. Now choose a smooth g : ϕi(Ui) → R with compact supp g ⊂ ϕi(V ∩ Ui) and g = 1 on a neighbourhood of ϕi(p), and define the corresponding f as above. The "bump function" f so constructed vanishes outside of the neighbourhood V and is identically 1 in a smaller neighbourhood of p. We will later use such bump functions.

Definition 2.44. Let N be another manifold with atlas (Vj, ψj)j∈J. Then a map Φ : M → N is continuous (or differentiable or smooth) in p ∈ M if for some (and hence for all) i ∈ I and j ∈ J such that p ∈ Ui and Φ(p) ∈ Vj, the map ψj ∘ Φ ∘ ϕi⁻¹ is continuous (or differentiable or smooth) in ϕi(p).

Remark 2.45. (i) Although we defined differentiability of functions, we did so withoutdefining (even partial) derivatives.

(ii) To check continuity (or differentiability or smoothness) of ψj ∘ Φ ∘ ϕi⁻¹ in the point p, we only need to know the function in a small neighbourhood of p. Since the transition maps preserve neighbourhoods, continuity (or differentiability or smoothness) in p can be defined this way for any Φ defined on a neighbourhood of p.

(iii) If M or N or both are open subsets of Rn, then we can use the identity functions as charts. Hence definition 2.42 is a special case of definition 2.44 (N = R) and both generalize the concept of continuity (or differentiability or smoothness) of functions from Rn → Rm.
Finally note that the definitions are such that the charts themselves are automatically smooth, since ϕi ∘ ϕi⁻¹ = id. Thus the coordinate functions x^µ are smooth as well.

Proposition 2.46 (Chain rule). Let M,N,Q be manifolds, p ∈ M and Φ : M → Nand Ψ : N → Q maps that are continuous (or differentiable or smooth) in p and Φ(p)respectively. Then Ψ ◦ Φ is continuous (or differentiable or smooth) in p.

Proof. Exercise.


Remark 2.47. Our next goal is to specify directions in a manifold. In Rn, this is easily done: just give an element v ∈ Rn or specify a straight line. On a general manifold however, there is no vector space structure. The naive attempt of specifying a line in a chart fails, since arbitrary smooth coordinate changes do not preserve lines.
Fix a point x ∈ Rn. For v ∈ Rn consider the directional derivative in direction v at the point x, i.e. for f ∈ C∞(Rn)

D_v f = lim_{h→0} (f(x + hv) − f(x))/h = ⟨grad f(x)|v⟩.

One easily checks that this map fulfils for f, g ∈ C∞(Rn) and λ ∈ R

D_v(f + λg) = D_v f + λ D_v g

D_v(fg) = (D_v f) g(x) + f(x) D_v g.

Moreover, for w ∈ Rn and λ ∈ R,

D_{v+λw} f = D_v f + λ D_w f,

so the map v ↦ D_v preserves the vector space structure of Rn. So instead of specifying directions by elements v ∈ Rn, we can specify them using the operators D_v. This looks a lot more complicated, but in this form it can be generalized to manifolds. Note how the elements x and v in Rn have fundamentally different roles in the above discussion: the former specifies a point whereas the latter specifies a direction.
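These properties of D_v can be observed numerically. In the sketch below (an illustration of ours; the test functions and points are arbitrary), D_v is approximated by a symmetric difference quotient and both the Leibniz rule and linearity in v are checked at a point:

```python
import math

def D(v, f, x, h=1e-6):
    """Approximate directional derivative D_v f at x (symmetric difference quotient)."""
    xp = [x[i] + h * v[i] for i in range(len(x))]
    xm = [x[i] - h * v[i] for i in range(len(x))]
    return (f(xp) - f(xm)) / (2 * h)

f = lambda u: u[0] ** 2 * u[1]          # arbitrary smooth test functions
g = lambda u: math.sin(u[0]) + u[1]

x, v, w = [1.0, 2.0], [3.0, -1.0], [1.0, 0.5]

# Leibniz rule: D_v(fg) = (D_v f) g(x) + f(x) D_v g
leibniz_lhs = D(v, lambda u: f(u) * g(u), x)
leibniz_rhs = D(v, f, x) * g(x) + f(x) * D(v, g, x)

# linearity in the direction: D_{v + 2w} f = D_v f + 2 D_w f
linear_lhs = D([v[0] + 2 * w[0], v[1] + 2 * w[1]], f, x)
linear_rhs = D(v, f, x) + 2 * D(w, f, x)
```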

Definition 2.48. Let p be a point of M. A linear functional Xp : C∞(M) → R is called a derivation (directional derivative) or a tangent vector at p if it fulfils the Leibniz rule

Xp(fg) = Xp(f) g(p) + f(p) Xp(g).

The set TpM of derivations at p is called the tangent space of M in the point p.

Remark 2.49. The space of linear functionals C∞(M)* has a natural vector space structure. We show that TpM is a subspace and thus carries a vector space structure itself. Let Xp, Yp be derivations at p and λ ∈ R. Then for any f, g ∈ C∞(M)

(Xp + λYp)(fg) = Xp(fg) + λYp(fg)
= Xp(f)g(p) + f(p)Xp(g) + λ(Yp(f)g(p) + f(p)Yp(g))
= (Xp + λYp)(f)g(p) + f(p)(Xp + λYp)(g)

holds, so Xp + λYp is a derivation at p.
The space C∞(M)* is infinite dimensional, but the Leibniz rule is a rather strict restriction, as we will see.


Proposition 2.50. Let p be a point of M, ε > 0 and γ : (−ε, ε) → M a differentiable curve such that γ(0) = p. Then the map

γ̇ : C∞(M) → R, f ↦ (d/dt f ∘ γ(t))(0)

is a derivation. The element γ̇ is called the tangent to γ at p.

Proof. The required properties of γ̇ follow immediately from the definition. For f, g ∈ C∞(M) and λ ∈ R we have

γ̇(f + λg) = (d/dt (f + λg)(γ(t)))(0) = (d/dt (f(γ(t)) + λg(γ(t))))(0) = γ̇(f) + λγ̇(g)

γ̇(fg) = (d/dt f(γ(t))g(γ(t)))(0) = γ̇(f)g(γ(0)) + f(γ(0))γ̇(g) = γ̇(f)g(p) + f(p)γ̇(g).

Remark 2.51. If M is embedded in Rn (think of the unit sphere S2 in R3), then any curve in M is also a curve in Rn, which has a tangent vector in the traditional sense. By identifying the "old" and "new" tangent vectors, the abstract space TpM can be identified (mapped isomorphically) with a concrete subspace of Rn. Note however that the concrete tangent spaces depend on the embedding. The embedding also creates artefacts. For example, the concrete tangent spaces of different points may have nontrivial intersection, but this intersection is not meaningful, since the abstract tangent spaces of different points never have nontrivial intersection.

Remark 2.52. Let p be a point of M. We can construct smooth curves through p using a chart covering p. Choose a chart such that p ∈ Ui and a smooth curve σ : (−ε, ε) → ϕi(Ui) such that σ(0) = ϕi(p); then ϕi⁻¹ ∘ σ is a smooth curve on M through p.
A particularly important example is given by the coordinate lines:

σ_µ : (−ε, ε) → ϕi(Ui), t ↦ (x¹(p), . . . , x^{µ−1}(p), x^µ(p) + t, x^{µ+1}(p), . . . , x^n(p)).

Now we can define the partial derivatives in p with respect to the chosen chart (or coordinates):

∂_µ f = d/dt|_{t=0} f(ϕi⁻¹ ∘ σ_µ(t)) = ∂/∂u^µ|_{ϕi(p)} f ∘ ϕi⁻¹.

Note that the notation ∂/∂x^µ f is also used, especially in physics texts. Although this is sometimes a good mnemonic, the notation is purely formal, since f is not a function of the coordinate x^µ.
Of course the coordinate lines depend on the chosen chart and so do the partial derivatives.
Note that the tangent vectors ∂_µ depend on p, but in order to reduce notational clutter, this dependence is often not made explicit.


Proposition 2.53. Let Xp be a derivation at p, V a neighbourhood of p and f ∈ C∞(M) such that f = 0 on V. Then Xp(f) = 0.

Proof. Let ρ ∈ C∞(M) be a bump function that vanishes outside of V and is identically 1 on some smaller neighbourhood Ṽ ⊂ V of p (see remark 2.43). Then fρ = 0 everywhere and by the Leibniz rule

0 = Xp(fρ) = Xp(f)ρ(p) + f(p)Xp(ρ) = Xp(f).

Corollary 2.54. If f, g ∈ C∞(M) coincide on some neighbourhood V of p, then Xp(f) = Xp(g). Moreover, Xp(f) = 0 if f is constant in some neighbourhood of p.

Proof. The function f − g vanishes on V, hence 0 = Xp(f − g) = Xp(f) − Xp(g).
For the constant function 1 we get by the Leibniz rule

Xp(1) = Xp(1²) = Xp(1)1(p) + 1(p)Xp(1) = 2Xp(1),

which implies Xp(1) = 0. By linearity we get Xp(c1) = 0 for any c ∈ R.
Putting both together, Xp(f) = 0 if f is constant in a neighbourhood of p.

Remark 2.55. The previous result shows that Xp(f), just as the directional derivative of a function on Rn, depends only on the behaviour of the function f in an arbitrarily small neighbourhood of p (just the value of f at p is not enough). We say that Xp(f) is a local property of f.
This is important, since it allows us to unambiguously apply Xp to functions that are only defined on some open neighbourhood of p. Let V be an open neighbourhood of p and f : V → R a smooth function. Then we can find f̃ ∈ C∞(M) such that f and f̃ coincide on a (possibly smaller) neighbourhood Ṽ ⊂ V, and define Xp(f) := Xp(f̃). (As an exercise, construct such a function f̃ using the techniques from remark 2.43.)
The choice of f̃ is obviously not unique, but any two such functions will coincide on a neighbourhood of p and will thus yield the same value Xp(f).

Lemma 2.56. Let U ⊂ Rn be open, v ∈ U and f : U → R smooth. Then there are smooth functions h1, . . . , hn (defined on a neighbourhood of v) such that

f(u) = f(v) + Σ_{µ=1}^n (u^µ − v^µ) h_µ(u)

holds on a neighbourhood of v.

Proof. Since f is smooth, by the fundamental theorem of calculus and the chain rule we get

f(u) − f(v) = ∫₀¹ d/dt f(v + (u − v)t) dt = ∫₀¹ Σ_{µ=1}^n (u^µ − v^µ) (∂/∂u^µ f)(v + (u − v)t) dt.


This has the required form if we define

h_µ(u) = ∫₀¹ (∂/∂u^µ f)(v + (u − v)t) dt.

Note that these functions are defined on some ball around v, since U is open. It remains to be shown that the h_µ are indeed smooth, which follows from Lebesgue's theorem (differentiation under the integral sign).

Remark 2.57. A particularly important case of functions defined only locally are the coordinate functions. Let Ui be a chart domain containing p; then the coordinate functions x^µ = π^µ ∘ ϕi are smooth functions on Ui. Thus we can apply derivations to them, in particular the partial derivatives:

∂_ν(x^µ) = ∂/∂u^ν|_{ϕi(p)} x^µ ∘ ϕi⁻¹ = ∂/∂u^ν|_{ϕi(p)} π^µ = δ^µ_ν.

Now suppose that Xp(x^µ) = 0 for every µ ∈ {1, . . . , n}. We will show that this implies Xp = 0. For an arbitrary f ∈ C∞(M), define

g : ϕi(Ui) → R, u = (u¹, . . . , u^n) ↦ f ∘ ϕi⁻¹(u).

We now apply the previous lemma to the function g and v = ϕi(p), i.e. there are smooth h1, . . . , hn such that on some neighbourhood of ϕi(p)

g(u) = g(ϕi(p)) + Σ_{µ=1}^n (u^µ − x^µ(p)) h_µ(u).

Applying this equation to u = ϕi(·), we get the following expression for f (which holds on some neighbourhood of p):

f = f(p) + Σ_{µ=1}^n (x^µ − x^µ(p)) h_µ ∘ ϕi.

Now apply Xp and use the Leibniz rule to get

Xp(f) = Σ_{µ=1}^n (Xp(x^µ − x^µ(p)) h_µ ∘ ϕi(p) + (x^µ(p) − x^µ(p)) Xp(h_µ ∘ ϕi)) = 0.

Since f was arbitrary, this implies Xp = 0.

Proposition 2.58. Let p be a point of M. For a chart ϕi such that p ∈ Ui, the partial derivatives ∂1, . . . , ∂n form a basis of TpM. In particular, the tangent space TpM has dimension n.


Proof. We show linear independence first. Suppose that α^µ ∂_µ = 0. Applying this to x^ν, we get

0 = α^µ ∂_µ x^ν = α^µ δ^ν_µ = α^ν,

which holds for any ν ∈ {1, . . . , n}.
Now let Xp ∈ TpM be any derivation. Then Xp − Xp(x^µ)∂_µ vanishes when applied to any coordinate function x^ν, so by the previous remark

Xp = Xp(x^µ)∂_µ.

Remark 2.59. The previous proposition implies in particular that the map Rn ∋ v ↦ D_v ∈ TxRn is an isomorphism, since it maps the canonical basis onto the partial derivatives ∂1, . . . , ∂n. Thus we can identify the tangent space TxRn with Rn itself, and in fact we have been doing so on many previous occasions without explicitly mentioning it.

Remark 2.60. Let us summarize the important facts about tangent spaces. For a smooth manifold, we can define a tangent space TpM, a vector space, at each point p. There is no a priori relation between the tangent spaces TpM and TqM for p ≠ q. Tangent vectors only make sense at a point, and tangent vectors at different points cannot be compared or added.
Tangent vectors act on (locally defined) smooth functions as derivations, and they can be used to specify a direction. To each smooth curve through a point p, we can associate an element of TpM, the tangent of that curve.
To each (local) coordinate system, there corresponds a canonical (local) basis ∂1, . . . , ∂n of the tangent space TpM.


2.3 The Tangent Bundle and other Vector Bundles

Remark 2.61. Let p ∈ U ∩ Ũ be a point of M. We want to determine the transformation behaviour of tangent vectors when changing from one set of coordinates ϕ to a new one ϕ̃. Denote the coordinate functions by x^µ = π^µ ∘ ϕ and x̃^µ = π^µ ∘ ϕ̃ and the corresponding partial derivatives by ∂_µ and ∂̃_µ. Denote the transition map by

Φ = ϕ ∘ ϕ̃⁻¹ : ϕ̃(U ∩ Ũ) → ϕ(U ∩ Ũ).

Since this is a map between open subsets of Rn, there are smooth component functions such that Φ = (Φ¹, . . . , Φⁿ). Then Φ^µ(u¹, . . . , uⁿ) is the value of the µth coordinate in the old system (ϕ) given the coordinates u¹, . . . , uⁿ in the new system (ϕ̃); in other words, Φ expresses the old coordinates as functions of the new ones.
Since both sets of partial derivatives are bases of TpM, there has to be a transformation matrix TΦ = (a^ν_µ) such that

∂̃_µ = a^ν_µ ∂_ν. (2.3)

We can determine the coefficients by applying the equation to x^κ and using the definition of the partial derivatives:

a^κ_µ = a^ν_µ ∂_ν(x^κ) = ∂̃_µ(x^κ) = ∂/∂u^µ|_{ϕ̃(p)} x^κ ∘ ϕ̃⁻¹ = ∂/∂u^µ|_{ϕ̃(p)} Φ^κ.

Thus TΦ is the Jacobi matrix of Φ in the point ϕ̃(p).
Now let

Xp = X^ν ∂_ν = X̃^µ ∂̃_µ ∈ TpM

be a tangent vector. Using the above formula, we get

X̃^µ ∂̃_µ = X̃^µ a^ν_µ ∂_ν = X^ν ∂_ν,

and since the basis expansion is unique, X^ν = a^ν_µ X̃^µ. Thus we obtain the components with respect to the old basis by multiplying the components with respect to the new basis by the Jacobi matrix TΦ.
In physics literature, the functions Φ^µ are often written as x^µ(x̃^ν) and the entries of the Jacobi matrix as ∂x^µ/∂x̃^ν. Again these notations are formal, but they give equation (2.3) the form

∂/∂x̃^ν = (∂x^µ/∂x̃^ν) ∂/∂x^µ.

It is also common notation to put the tilde distinguishing the old and new coordinates on the indices.
Note that the Jacobi matrix TΦ is a smooth function of the new coordinates, although this dependence is often not written explicitly. Moreover, since Φ is invertible, the Jacobi matrix is invertible as well by the chain rule.


Example 2.62. The above is already interesting on Rn, for example when changing from cartesian to polar coordinates. In this case, ϕ = id and Φ = ϕ̃⁻¹ is given by

(r, θ) ↦ (r cos θ, r sin θ).

Then the Jacobi matrix is

TΦ = ( cos θ  −r sin θ
       sin θ   r cos θ ).

At the point p = (1, 1), we want to transform the tangent vector ∂x − ∂y. We determine the cartesian coordinates ϕ(p) = (1, 1) and the polar coordinates ϕ̃(p) = (√2, π/4) of p. Then we obtain in the point p the relations

∂r = cos θ ∂x + sin θ ∂y = (1/√2) ∂x + (1/√2) ∂y

∂θ = −r sin θ ∂x + r cos θ ∂y = −∂x + ∂y

and inverting these

∂x = (1/√2) ∂r − (1/2) ∂θ and ∂y = (1/√2) ∂r + (1/2) ∂θ.

Finally we get

∂x − ∂y = −∂θ.

Note that, to transform a tangent vector like this, we need to know the base point of that vector.
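The computation in this example can be verified numerically by applying both sides to an arbitrary test function as directional derivatives in the cartesian chart (our sketch; the test function is an arbitrary choice):

```python
import math

def D(direction, f, x, h=1e-6):
    """Directional derivative via a symmetric difference quotient."""
    xp = [x[i] + h * direction[i] for i in range(2)]
    xm = [x[i] - h * direction[i] for i in range(2)]
    return (f(xp) - f(xm)) / (2 * h)

f = lambda q: q[0] ** 3 - 2 * q[0] * q[1]   # arbitrary smooth test function

p = [1.0, 1.0]                              # cartesian coordinates of the point
r, theta = math.sqrt(2.0), math.pi / 4      # its polar coordinates

# d_theta at p, expressed as a cartesian direction (a column of the Jacobi matrix)
d_theta = [-r * math.sin(theta), r * math.cos(theta)]   # = (-1, 1)

lhs = D([1.0, 0.0], f, p) - D([0.0, 1.0], f, p)   # (d_x - d_y) f at p
rhs = -D(d_theta, f, p)                            # (-d_theta) f at p
```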

Remark 2.63. The collection of all the tangent spaces of a manifold M is called the tangent bundle

TM := ⊔_{p∈M} TpM.

We can think of this space as having the tangent space TpM attached to each point of the manifold M.
The tangent bundle inherits a manifold structure from M by defining the following atlas:

Vi := ⊔_{p∈Ui} TpM

Ψi : Vi → ϕi(Ui) × Rn, Xp = X^µ ∂_µ ↦ (x¹(p), . . . , xⁿ(p), X¹, . . . , Xⁿ).

We have to check that

(i) Ψi is bijective,

(ii) Ψi(Vi ∩ Vj) = ϕi(Ui ∩ Uj) × Rn is open for each i, j ∈ I, and

(iii) the transition maps Ψj ∘ Ψi⁻¹ are smooth.

For the last point, we can use remark 2.61 to write for u ∈ ϕi(Ui ∩ Uj), v ∈ Rn

Ψj ∘ Ψi⁻¹(u, v) = (Φ(u), TΦ(u)v).

Since Φ is smooth, so is its Jacobi matrix TΦ, and since the dependence on v is linear, the transition function is smooth. Note that on the vector part v, the transformation acts linearly (for fixed p ∈ M).
The map

π : TM → M, Xp ↦ p

is called the bundle projection. It is a smooth map (expressed in a chart, it corresponds to the projection (x, v) ↦ x). The space M is called the base space of the bundle and the spaces TpM = π⁻¹({p}) are called the fibres.
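The chart change on TM can be sketched concretely. In the illustration below (our choices: Φ is the polar-to-cartesian transition from example 2.62, with the Jacobi matrix written with rows indexed by the components of Φ), the pair (u, v) of base coordinates and vector components is mapped to (Φ(u), TΦ(u)v):

```python
import math

def Phi(u):
    """Transition map Phi: polar coordinates (r, theta) -> cartesian (x, y)."""
    r, th = u
    return (r * math.cos(th), r * math.sin(th))

def TPhi(u):
    """Jacobi matrix of Phi at u (rows indexed by the components of Phi)."""
    r, th = u
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th),  r * math.cos(th)]]

def bundle_transition(u, v):
    """Tangent-bundle chart change: (u, v) -> (Phi(u), TPhi(u) v)."""
    J = TPhi(u)
    w = [J[0][0] * v[0] + J[0][1] * v[1],
         J[1][0] * v[0] + J[1][1] * v[1]]
    return Phi(u), w

# the tangent vector d_theta at (r, theta) = (sqrt(2), pi/4) has polar
# components (0, 1); in the cartesian chart it becomes -d_x + d_y
base, vec = bundle_transition((math.sqrt(2.0), math.pi / 4), [0.0, 1.0])
```

Note how the base point and the vector components transform together; this is exactly the "we need to know the base point" observation from example 2.62.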
