Matrices: A brief introduction
Basilio Bona
DAUIN – Politecnico di Torino
September 2013
Basilio Bona (DAUIN) Matrices September 2013 1 / 74
Definitions
Definition
A matrix is a set of N real or complex numbers organized in m rows and n columns, with N = mn:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & a_{ij} & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\equiv [a_{ij}], \qquad i = 1, \dots, m, \quad j = 1, \dots, n
$$
A matrix is always written as a boldface capital letter, as in A.
To indicate matrix dimensions we use the following symbols
A_{m×n} or A ∈ F^{m×n}
where F = R for real elements and F = C for complex elements.
Transpose matrix
Given a matrix A_{m×n}, we define its transpose A^T_{n×m} as the matrix obtained by exchanging rows and columns:

$$
A^T = \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{bmatrix}
$$

The following property holds: (A^T)^T = A.
Square matrix
A matrix is said to be square when m = n.

A square n×n matrix is upper triangular when a_ij = 0, ∀i > j:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix}
$$

If a square matrix is upper triangular, its transpose is lower triangular, and vice versa:

$$
A^T = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{12} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{bmatrix}
$$
Symmetric matrix
A real square matrix is said to be symmetric if A = A^T, or

A − A^T = O

In a real symmetric matrix there are at most n(n+1)/2 independent elements.

If a matrix K has complex elements k_ij = a_ij + jb_ij (where j = √−1), its conjugate is K̄, with elements k̄_ij = a_ij − jb_ij.

Given a complex matrix K, the adjoint matrix K* is defined as the conjugate transpose: K* = (K̄)^T, i.e., the transpose of the conjugate (equivalently, the conjugate of the transpose).

A complex matrix is called self-adjoint or Hermitian when K = K*. Some textbooks indicate this matrix as K† or K^H.
Diagonal matrix
A square matrix is diagonal if a_ij = 0 for i ≠ j:

$$
A = \mathrm{diag}(a_i) = \begin{bmatrix}
a_1 & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n
\end{bmatrix}
$$

A diagonal matrix is always symmetric.
Skew-symmetric matrix
A square matrix is skew-symmetric or antisymmetric if

A + A^T = O  →  A = −A^T

Given the constraints of the above relation, a generic skew-symmetric matrix has the following structure:

$$
A = \begin{bmatrix}
0 & a_{12} & \cdots & a_{1n} \\
-a_{12} & 0 & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{1n} & -a_{2n} & \cdots & 0
\end{bmatrix}
$$

In a skew-symmetric matrix there are at most n(n−1)/2 nonzero independent elements. We will see in the following some important properties of skew-symmetric 3×3 matrices.
Block matrix
It is possible to represent a matrix with blocks as

$$
A = \begin{bmatrix}
A_{11} & \cdots & A_{1n} \\
\cdots & A_{ij} & \cdots \\
A_{m1} & \cdots & A_{mn}
\end{bmatrix}
$$

where the blocks A_ij have suitable dimensions.

Given the following matrices

$$
A_1 = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ O & A_{ij} & \cdots \\ O & O & A_{mn} \end{bmatrix}
\qquad
A_2 = \begin{bmatrix} A_{11} & O & O \\ \cdots & A_{ij} & O \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}
\qquad
A_3 = \begin{bmatrix} A_{11} & O & O \\ O & A_{ij} & O \\ O & O & A_{mn} \end{bmatrix}
$$

A_1 is upper block triangular, A_2 is lower block triangular, and A_3 is block diagonal.
Matrix algebra
Matrices are elements of an algebra, i.e., a vector space together with a product operator. The main operations of this algebra are: product by a scalar, sum, and matrix product.

Product by a scalar

$$
\alpha A = \alpha \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
= \begin{bmatrix}
\alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\
\alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn}
\end{bmatrix}
$$

Sum

$$
A + B = \begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn}
\end{bmatrix}
$$
Matrix sum
Sum properties
A + O = A
A + B = B + A
(A + B) + C = A + (B + C)
(A + B)^T = A^T + B^T

The null (neutral, zero) element O takes the name of null matrix. The subtraction (difference) operation is defined using the scalar α = −1:

A − B = A + (−1)B
Matrix product
The operation is performed using the well-known rule “rows by columns”: the generic element c_ij of the matrix product C_{m×p} = A_{m×n} · B_{n×p} is

$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
$$

The bilinearity of the matrix product is guaranteed, since it is immediate to verify that, given a generic scalar α, the following identity holds:

α(A · B) = (αA) · B = A · (αB)
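The “rows by columns” rule can be checked numerically. A minimal sketch using NumPy (the matrices here are arbitrary examples, not taken from the slides):

```python
import numpy as np

# "Rows by columns" rule: c_ij = sum_k a_ik * b_kj,
# written out explicitly and compared with numpy's built-in product.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # 2x3
B = np.array([[1., 0.],
              [2., 1.],
              [0., 3.]])            # 3x2

m, n = A.shape
_, p = B.shape
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

assert np.allclose(C, A @ B)

# Bilinearity: alpha*(A@B) == (alpha*A)@B == A@(alpha*B)
alpha = 2.5
assert np.allclose(alpha * (A @ B), (alpha * A) @ B)
assert np.allclose(alpha * (A @ B), A @ (alpha * B))
```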
Product
Product properties
A · B · C = (A · B) · C = A · (B · C)
A · (B + C) = A · B + A · C
(A + B) · C = A · C + B · C
(A · B)^T = B^T · A^T

In general:

the matrix product is non-commutative: A · B ≠ B · A, apart from particular cases;

A · B = A · C does not imply B = C, apart from particular cases;

A · B = O does not imply A = O or B = O, apart from particular cases.
Identity matrix
A neutral element wrt product exists and is called the identity matrix, written as I_n or simply I when no ambiguity arises; given a rectangular matrix A_{m×n} the following identities hold:

A_{m×n} = I_m A_{m×n} = A_{m×n} I_n

Identity matrix

$$
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
$$
Idempotent matrix
Given a square matrix A ∈ R^{n×n}, the k-th power is

$$
A^k = \prod_{\ell=1}^{k} A
$$

A matrix is said to be idempotent if

A^2 = A ⇒ A^n = A
Trace
Trace
The trace of a square matrix A_{n×n} is the sum of its diagonal elements:

$$
\mathrm{tr}(A) = \sum_{k=1}^{n} a_{kk}
$$

The matrix trace satisfies the following properties:

tr(αA + βB) = α tr(A) + β tr(B)
tr(AB) = tr(BA)
tr(A) = tr(A^T)
tr(A) = tr(T⁻¹AT) for nonsingular T (see below)
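These trace properties can be spot-checked numerically; a sketch with NumPy on random matrices (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # shifted to keep T nonsingular

# tr(AB) = tr(BA), tr(A) = tr(A^T), invariance under similarity
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.trace(A), np.trace(np.linalg.inv(T) @ A @ T))
```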
Minor
A minor of order p of a matrix A_{m×n} is the determinant D_p of a square sub-matrix obtained by selecting any p rows and p columns of A_{m×n}.

The formal definition of determinant will be presented below.

There are as many minors as there are possible choices of p out of the m rows and p out of the n columns.

Given a matrix A_{m×n}, the principal minors of order k are the determinants D_k, with k = 1, ..., min{m, n}, obtained by selecting the first k rows and k columns of A_{m×n}.
Minor and cofactor
Given A ∈ R^{n×n}, we indicate with A^{(ij)} ∈ R^{(n−1)×(n−1)} the matrix obtained by taking out the i-th row and the j-th column of A.

We define the minor D_rc of a generic element a_rc of a square matrix A_{n×n} as the determinant of the matrix obtained by taking out the r-th row and the c-th column, i.e.,

D_rc = det A^{(rc)}

We define the cofactor of an element a_rc of a square matrix A_{n×n} as the product

A_rc = (−1)^{r+c} D_rc
Determinant
Once the cofactor has been defined, the determinant of a square matrix A can be defined “by row”, i.e., choosing a generic row i:

$$
\det(A) = \sum_{k=1}^{n} a_{ik} (-1)^{i+k} \det(A^{(ik)}) = \sum_{k=1}^{n} a_{ik} A_{ik}
$$

or, choosing a generic column j, we have the definition “by column”:

$$
\det(A) = \sum_{k=1}^{n} a_{kj} (-1)^{k+j} \det(A^{(kj)}) = \sum_{k=1}^{n} a_{kj} A_{kj}
$$

Since these definitions are recursive and assume the computation of determinants of smaller-order minors, it is necessary to define the determinant of a 1×1 matrix (a scalar), which is simply det(a_ij) = a_ij.
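The recursive definition maps directly to code. A sketch (the helper name det_cofactor is ours; with 0-based indexing the sign along the first row becomes (−1)^k), checked against numpy.linalg.det:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]                 # base case: det of a 1x1 matrix
    total = 0.0
    for k in range(n):
        # A^(1,k): delete row 0 and column k
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += A[0, k] * (-1) ** k * det_cofactor(minor)
    return total

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```

Cofactor expansion costs O(n!) and is only practical for tiny matrices; numerical libraries use LU factorization instead.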
Properties of determinant
det(A · B) = det(A) det(B)

det(A^T) = det(A)

det(kA) = kⁿ det(A)

if one makes s exchanges between rows or columns of A, obtaining a new matrix A_s, we have det(A_s) = (−1)^s det(A)

if A has two equal or proportional rows/columns, we have det(A) = 0

if A has a row or a column that is a linear combination of other rows or columns, we have det(A) = 0

if A is upper or lower triangular, we have det(A) = ∏_{i=1}^{n} a_ii

if A is block triangular, with p blocks A_ii on the diagonal, we have det(A) = ∏_{i=1}^{p} det A_ii
Singular matrix and rank
A matrix A is singular if det(A) = 0.
We define the rank of a matrix A_{m×n}, written ρ(A_{m×n}), as the maximum integer p such that at least one nonzero minor D_p exists.

The following properties hold:

ρ(A) ≤ min{m, n}

if ρ(A) = min{m, n}, A is said to have full rank

if ρ(A) < min{m, n}, the matrix does not have full rank and one says that there is a rank deficiency

ρ(AB) ≤ min{ρ(A), ρ(B)}

ρ(A) = ρ(A^T)

ρ(AA^T) = ρ(A^TA) = ρ(A)

if A_{n×n} and det A = 0, then A does not have full rank
Invertible matrix
Given a square matrix A ∈ R^{n×n}, it is invertible or nonsingular if an inverse matrix A⁻¹ ∈ R^{n×n} exists, such that

AA⁻¹ = A⁻¹A = I_n

The matrix is invertible iff ρ(A) = n, i.e., it has full rank; this implies det(A) ≠ 0.

The inverse matrix can be computed as

$$
A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A)
$$

The following properties hold: (A⁻¹)⁻¹ = A; (A^T)⁻¹ = (A⁻¹)^T.

The inverse matrix, if it exists, allows one to solve the matrix equation

y = Ax

obtaining the unknown x as

x = A⁻¹y.
Orthonormal matrix
A square matrix is orthonormal if A⁻¹ = A^T. The following identity holds:

A^TA = AA^T = I

Given two square matrices A and B of equal dimension n×n, the following identity holds:

(AB)⁻¹ = B⁻¹A⁻¹

An important result, called the inversion lemma, establishes the following: if A, C are square invertible matrices and B, D are matrices of suitable dimensions, then

(A + BCD)⁻¹ = A⁻¹ − A⁻¹B(DA⁻¹B + C⁻¹)⁻¹DA⁻¹

The matrix (DA⁻¹B + C⁻¹) must be invertible.

The inversion lemma is useful to compute the inverse of a sum of matrices A₁ + A₂ when A₂ is decomposable into the product BCD and C is easily invertible, for instance diagonal or triangular.
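The inversion lemma can be verified numerically; a sketch with NumPy, using arbitrary example matrices (A is shifted by a multiple of the identity just to keep it safely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 2
A = rng.standard_normal((n, n)) + 4 * np.eye(n)   # invertible by construction
B = rng.standard_normal((n, k))
C = np.diag([2.0, 3.0])                            # easily invertible (diagonal)
D = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
# (A + BCD)^-1 = A^-1 - A^-1 B (D A^-1 B + C^-1)^-1 D A^-1
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ainv - Ainv @ B @ np.linalg.inv(D @ Ainv @ B + np.linalg.inv(C)) @ D @ Ainv
assert np.allclose(lhs, rhs)
```

Inverting the k×k matrix (DA⁻¹B + C⁻¹) instead of the full n×n sum is the practical payoff when k ≪ n.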
Matrix derivative
If a matrix A(t) is composed of elements a_ij(t) that are all differentiable wrt t, then the matrix derivative is

$$
\frac{d}{dt} A(t) = \dot{A}(t) = \left[ \frac{d}{dt} a_{ij}(t) \right] = [\dot{a}_{ij}(t)]
$$

If a square matrix A(t) has rank ρ(A(t)) = n for all t, then the derivative of its inverse is

$$
\frac{d}{dt} A(t)^{-1} = -A^{-1}(t)\, \dot{A}(t)\, A^{-1}(t)
$$

Since the inversion operator is nonlinear, in general

$$
\left[ \frac{dA(t)}{dt} \right]^{-1} \neq \frac{d}{dt} \left[ A(t)^{-1} \right]
$$
Symmetric Skew-symmetric decomposition
Given a real matrix A ∈ R^{m×n}, the two matrices

A^TA ∈ R^{n×n}
AA^T ∈ R^{m×m}

are both symmetric.

Given a square matrix A, it is always possible to decompose it into a sum of two matrices, as follows:

A = A_s + A_a

where

A_s = ½(A + A^T)  (symmetric matrix)
A_a = ½(A − A^T)  (skew-symmetric matrix)
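A small NumPy sketch of this decomposition, on an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [4., 3., 5.],
              [6., 1., 2.]])
As = 0.5 * (A + A.T)   # symmetric part
Aa = 0.5 * (A - A.T)   # skew-symmetric part

assert np.allclose(As, As.T)        # As is symmetric
assert np.allclose(Aa, -Aa.T)       # Aa is skew-symmetric
assert np.allclose(As + Aa, A)      # they sum back to A
```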
Similarity transformation
Given a square matrix A ∈ R^{n×n} and a nonsingular square matrix T ∈ R^{n×n}, the new matrix B ∈ R^{n×n}, obtained as

B = T⁻¹AT  or  B = TAT⁻¹

is said to be similar to A, and the transformation T is called a similarity transformation.
Eigenvalues and eigenvectors
Consider the similarity transformation between A and Λ, where the latter is diagonal, Λ = diag(λ_i):

A = UΛU⁻¹

with

U = [u₁ u₂ · · · uₙ]

Multiplying A on the right by U one obtains

AU = UΛ

and then

Au_i = λ_i u_i

This identity is the well-known formula that relates the matrix eigenvalues to eigenvectors; the constant quantities λ_i are the eigenvalues of A, while the vectors u_i are the eigenvectors of A, usually with non-unit norm.
Eigenvalues and eigenvectors
Given a square matrix A_{n×n}, the solutions λ_i (real or complex) of the characteristic equation

det(λI − A) = 0

are the eigenvalues of A.

det(λI − A) is a polynomial in λ, called the characteristic polynomial.

If the eigenvalues are all distinct, the vectors u_i that satisfy the identity

Au_i = λ_i u_i

are the eigenvectors of A.
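The identity Au_i = λ_i u_i can be checked with numpy.linalg.eig, which returns the eigenvalues and the eigenvectors as columns; a sketch on an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
lam, U = np.linalg.eig(A)     # columns of U are the eigenvectors u_i

for i in range(2):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])   # A u_i = lambda_i u_i

# det(A) and tr(A) in terms of the eigenvalues
assert np.isclose(np.prod(lam), np.linalg.det(A))
assert np.isclose(np.sum(lam), np.trace(A))
```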
Generalized eigenvectors
If the eigenvalues are not all distinct, one obtains the so-called generalized eigenvectors, whose characterization goes beyond the scope of these notes.

From a geometrical point of view, the eigenvectors define those directions in Rⁿ (the domain of the linear transformation represented by A) that are invariant wrt the transformation A, while the eigenvalues provide the related “scale factors” along these directions.

The set of eigenvalues of a matrix A will be indicated as Λ(A), or {λ_i(A)}; the set of eigenvectors of A will be indicated as {u_i(A)}.

In general, since the eigenvectors represent the invariant directions of the transformation, they are defined up to a constant factor, so they are usually normalized; this is a tacit assumption that will be made here, unless otherwise stated.
Eigenvalues
Properties
Given a matrix A and its eigenvalues {λi (A)}, the following holds true
{λ_i(A + cI)} = {λ_i(A) + c}

Given a matrix A and its eigenvalues {λ_i(A)}, the following holds true:

{λ_i(cA)} = {c λ_i(A)}

Given an upper or lower triangular matrix

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix},
\qquad
\begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
$$

its eigenvalues are the terms on the diagonal, {λ_i(A)} = {a_ii}; the same applies to a diagonal matrix.
Invariance of the eigenvalues
Given a matrix An×n and its eigenvalues {λi (A)}, the following holds true
$$
\det(A) = \prod_{i=1}^{n} \lambda_i
$$

and

$$
\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i
$$

Given a general invertible transformation, represented by the matrix T, the eigenvalues of A are invariant under the similarity transformation

B = T⁻¹AT

that is,

{λ_i(B)} = {λ_i(A)}
Modal matrix
If we build a matrix M whose columns are the unit eigenvectors u_i(A) of A,

M = [u₁ · · · uₙ]

then the similarity transformation wrt M results in a diagonal matrix

$$
\Lambda = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
= M^{-1} A M
$$

M takes the name of modal matrix.

If A is symmetric, its eigenvalues are all real and the following identity holds:

Λ = M^T A M
In this particular case M is orthonormal.
Singular value decomposition – SVD
Given a generic matrix A ∈ R^{m×n} with rank r = ρ(A) ≤ s, where s = min{m, n}, it can be factorized according to the

Singular value decomposition (SVD)

in the following way:

$$
A = U \Sigma V^T = \sum_{i=1}^{s} \sigma_i u_i v_i^T
$$

The important elements of this decomposition are σ_i, u_i and v_i.
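A sketch of the rank-one expansion with numpy.linalg.svd (the example matrix is arbitrary); it also checks that the singular values are the square roots of the eigenvalues of A^TA:

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 3., 0.]])           # 2x3, rank 2
U, s, Vt = np.linalg.svd(A)            # s holds sigma_1 >= sigma_2 >= ...

# Rebuild A as the sum of rank-one terms sigma_i * u_i v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A_rebuilt, A)

# Singular values are the sqrt of the eigenvalues of A^T A
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(lam[:len(s)]))
```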
SVD
σ_i(A) ≥ 0 are called singular values and are equal to the non-negative square roots of the eigenvalues of the symmetric matrix A^TA:

{σ_i(A)} = {√λ_i(A^TA)},  σ_i ≥ 0

listed in decreasing order:

σ₁ ≥ σ₂ ≥ · · · ≥ σ_s ≥ 0

If the rank r < s, there are only r positive singular values; the remaining ones are zero:

σ₁ ≥ σ₂ ≥ · · · ≥ σ_r > 0;  σ_{r+1} = · · · = σ_s = 0

U is an orthonormal square matrix (m×m)

U = [u₁ u₂ · · · u_m]

whose columns are the eigenvectors u_i of AA^T.
SVD
V is an orthonormal square matrix (n×n)

V = [v₁ v₂ · · · vₙ]

whose columns are the eigenvectors v_i of A^TA.

Σ is a rectangular matrix (m×n) with the following structure:

$$
\text{if } m < n:\ \Sigma = \begin{bmatrix} \Sigma_s & O \end{bmatrix};
\qquad
\text{if } m = n:\ \Sigma = \Sigma_s;
\qquad
\text{if } m > n:\ \Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix}.
$$

Σ_s = diag(σ_i) is s×s diagonal, and its diagonal terms are the singular values.
SVD
Otherwise we can decompose A in a way that highlights the positive singular values alone:

$$
A = \underbrace{\begin{bmatrix} P & \bar{P} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \Sigma_r & O \\ O & O \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} Q^T \\ \bar{Q}^T \end{bmatrix}}_{V^T}
= P \Sigma_r Q^T
$$

where

P is an m×r orthonormal matrix; P̄ is an m×(m−r) orthonormal matrix;

Q is an n×r orthonormal matrix; Q̄ is an n×(n−r) orthonormal matrix;

Σ_r is an r×r diagonal matrix whose diagonal elements are the positive singular values σ_i, i = 1, ..., r.
SVD and rank
The rank r of A is equal to the number r ≤ s of nonzero singular values.

Given a generic matrix A ∈ R^{m×n}, the two matrices A^TA and AA^T are symmetric, have the same positive singular values, and differ only in the number of zero singular values.
Linear operators representation
Given two vector spaces X ⊆ Rⁿ and Y ⊆ R^m, with dimensions n and m, and given two generic vectors x ∈ X and y ∈ Y, the generic linear transformation between the two spaces can be represented by the matrix operator A ∈ R^{m×n}, as follows:

y = Ax;  x ∈ Rⁿ;  y ∈ R^m.

Therefore a matrix can always be interpreted as an operator that transforms a vector from the domain X to the range Y.

Conversely, a linear operator has at least one matrix that represents it.
Image space and null space
The image space or range of a transformation A is the subspace of Y defined by the following property:

R(A) = {y | y = Ax, x ∈ X};  R(A) ⊆ Y

The null space or kernel of a transformation A is the subspace of X defined by the following property:

N(A) = {x | 0 = Ax, x ∈ X};  N(A) ⊆ X

The null space represents all the vectors in X that are transformed into the origin of Y.

The dimensions of the range and kernel are called, respectively, the rank ρ(A) and the nullity ν(A):

ρ(A) = dim(R(A));  ν(A) = dim(N(A)).
Image space and null space
If X and Y have finite dimensions, then the following equalities hold:

N(A) = R(A^T)^⊥
R(A) = N(A^T)^⊥
N(A)^⊥ = R(A^T)
R(A)^⊥ = N(A^T)

where ⊥ indicates the orthogonal complement of the corresponding (sub-)space. We recall that {0}^⊥ is the whole space.

The following orthogonal decompositions of the subspaces X and Y hold:

X = N(A) ⊕ N(A)^⊥ = N(A) ⊕ R(A^T)
Y = R(A) ⊕ R(A)^⊥ = R(A) ⊕ N(A^T)
where the symbol ⊕ represents the direct sum operator between subspaces.
Generalized inverse
Given a generic real matrix A ∈ R^{m×n} with m ≠ n, the inverse matrix is not defined. Nevertheless, it is possible to define a class of matrices A⁻, called pseudo-inverses or generalized inverses, that satisfy the following relation:

AA⁻A = A

If A has full rank, i.e., ρ(A) = min{m, n}, it is possible to define two classes of generalized inverses:

if m < n (i.e., ρ(A) = m), the right inverse of A is a matrix A_r ∈ R^{n×m} such that A A_r = I_{m×m}

if n < m (i.e., ρ(A) = n), the left inverse of A is a matrix A_ℓ ∈ R^{n×m} such that A_ℓ A = I_{n×n}
Pseudo-inverse matrix
Among the possible left- or right- inverses, two classes are important:
right pseudo-inverse (m < n):

A_r⁺ = A^T(AA^T)⁻¹

When ρ(A) = m, (AA^T)⁻¹ exists.

left pseudo-inverse (n < m):

A_ℓ⁺ = (A^TA)⁻¹A^T

When ρ(A) = n, (A^TA)⁻¹ exists; this particular left pseudo-inverse is also known as the Moore–Penrose pseudo-inverse.
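A minimal NumPy sketch of the left pseudo-inverse on an arbitrary full-column-rank matrix, compared with numpy.linalg.pinv:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                      # 3x2, full column rank (n < m)
A_left = np.linalg.inv(A.T @ A) @ A.T         # (A^T A)^{-1} A^T

assert np.allclose(A_left @ A, np.eye(2))     # it is a left inverse
assert np.allclose(A_left, np.linalg.pinv(A)) # matches the Moore-Penrose pinv
```

In practice one calls np.linalg.pinv (or solves the normal equations with a factorization) rather than forming (A^TA)⁻¹ explicitly, which is numerically fragile for ill-conditioned A.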
Moore-Penrose pseudo-inverse
In general, even if A^TA is not invertible, it is always possible to define a Moore–Penrose pseudo-inverse A⁺ that satisfies the following relations:

AA⁺A = A
A⁺AA⁺ = A⁺
(AA⁺)^T = AA⁺
(A⁺A)^T = A⁺A
Left and right pseudo-inverses
The two pseudo-inverses A_r⁺ and A_ℓ⁺ coincide with the traditional inverse matrix A⁻¹ when A is square and full-rank:

A⁻¹ = A_r⁺ = A_ℓ⁺ = A⁺

The linear transformation associated to A ∈ R^{m×n},

y = Ax,

with x ∈ Rⁿ and y ∈ R^m, is equivalent to a system of m linear equations in n unknowns, whose coefficients are the elements of A; this linear system can admit one solution, no solution, or an infinite number of solutions.

If we use the pseudo-inverses to solve the linear system y = Ax, we must distinguish two cases, assuming that A has full rank.
Linear systems solution – 1
When n > m

there are more unknowns than equations; among the infinitely many possible solutions x ∈ Rⁿ, we choose the one with minimum norm ‖x‖, given by

x* = A_r⁺y = A^T(AA^T)⁻¹y

All the other possible solutions of y = Ax are obtained as

x = x* + v = A_r⁺y + v

where v ∈ N(A) is a vector belonging to the null space of A, which has dimension n − m. These other possible solutions can also be expressed as

x = A_r⁺y + (I − A_r⁺A)w

where w ∈ Rⁿ is a generic n×1 vector. The matrix I − A_r⁺A projects w onto the null space of A, transforming w into v ∈ N(A); this matrix is called a projection matrix.
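A NumPy sketch of the minimum-norm solution for an arbitrary underdetermined example; it also checks that adding a null-space component keeps the equation satisfied while increasing the norm:

```python
import numpy as np

# Underdetermined system (n > m): pick the minimum-norm solution
A = np.array([[1., 1., 0.],
              [0., 1., 1.]])                  # 2x3, full row rank
y = np.array([1., 2.])

A_right = A.T @ np.linalg.inv(A @ A.T)        # right pseudo-inverse
x_star = A_right @ y
assert np.allclose(A @ x_star, y)             # it solves the system

# Any other solution x* + v with v in N(A) has a larger (or equal) norm
P_null = np.eye(3) - A_right @ A              # projector onto N(A)
w = np.array([1., -2., 0.5])
x_other = x_star + P_null @ w
assert np.allclose(A @ x_other, y)
assert np.linalg.norm(x_star) <= np.linalg.norm(x_other) + 1e-12
```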
Figure: Solution of y = Ax when n > m.
Linear systems solution – 2
When m > n
there are more equations than unknowns; no exact solution exists for y = Ax, only approximate solutions, with an error e = y − Ax ≠ 0. Among these possible approximate solutions we choose the one minimizing the norm of the error, i.e.,

x̂ = arg min_{x∈Rⁿ} ‖y − Ax‖

The solution is

x̂ = A_ℓ⁺y = (A^TA)⁻¹A^Ty

Geometrically, Ax̂ is the orthogonal projection of y onto the range space R(A).

The approximation error, also called the projection error, is

e = (I − AA_ℓ⁺)y

and its norm is the lowest possible, as said above.
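A NumPy sketch of the least-squares solution via the normal equations on an arbitrary overdetermined example, compared with numpy.linalg.lstsq; the residual is orthogonal to the columns of A:

```python
import numpy as np

# Overdetermined system (m > n): least-squares via the left pseudo-inverse
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                      # 3x2, full column rank
y = np.array([0., 1., 4.])                    # no exact solution

x_hat = np.linalg.inv(A.T @ A) @ A.T @ y      # (A^T A)^{-1} A^T y
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_hat, x_ref)

# The residual e = y - A x_hat is orthogonal to R(A)
e = y - A @ x_hat
assert np.allclose(A.T @ e, 0.0)
```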
Figure: Solution of y = Ax when m > n.
Linear systems solution – 3
The similarity between the projection matrix I − A_r⁺A and the matrix that gives the projection error, I − AA_ℓ⁺, is important and will be studied when projection matrices are treated.

In order to compute the generalized inverses, one can use the SVD.

In particular, the pseudo-inverse is computed as

$$
A^{+} = V \begin{bmatrix} \Sigma_r^{-1} & O \\ O & O \end{bmatrix} U^T = Q \Sigma_r^{-1} P^T.
$$
Projections and projection matrices
The geometrical concept of the projection of a segment on a plane can be extended and generalized to the elements of a vector space. This concept is important for the solution of a large number of problems, such as approximation, estimation, prediction, and filtering problems.

Given an n-dimensional real vector space V(Rⁿ) endowed with a scalar product, and a subspace W(R^k) of dimension k ≤ n, it is possible to define the projection operator of vectors v ∈ V onto the subspace W.

The projection operator is the square projection matrix P ∈ R^{n×n}, whose columns are the projections of the basis elements of V onto W. A matrix is a projection matrix iff P² = P, i.e., it is idempotent.

The projection can be orthogonal or non-orthogonal; in the first case P is symmetric, in the second case it is generic. If P is a projection matrix, then I − P is also a projection matrix.
Projection matrices
Some examples of projection matrices are those associated to the left pseudo-inverse,

P₁ = AA_ℓ⁺  and  P₂ = I − AA_ℓ⁺

and to the right pseudo-inverse,

P₃ = A_r⁺A  and  P₄ = I − A_r⁺A

From a geometrical point of view, P₁ projects every vector onto the range space R(A), while P₂ projects it onto the orthogonal complement R(A)^⊥ = N(A^T).
Matrix norm – 1
Similarly to what can be established for a vector, it is possible to provide a “measure” of a matrix, i.e., give its “magnitude”, by defining the matrix norm.

Since a matrix represents a linear transformation between vectors, the matrix norm measures how big this transformation is; but it must in some way “normalize” the result, to avoid that the magnitude of the transformed vector affects the norm. Hence the following definition:

$$
\|A\| \stackrel{\mathrm{def}}{=} \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\|=1} \|Ax\|.
$$
Matrix norm – 2
Given a square matrix A ∈ R^{n×n}, its norm must satisfy the following general (norm) axioms:

1. ‖A‖ > 0 for every A ≠ O;
2. ‖A‖ = 0 iff A = O;
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ (triangle inequality);
4. ‖αA‖ = |α| ‖A‖ for any scalar α and any matrix A;
5. ‖AB‖ ≤ ‖A‖ ‖B‖.

Given A ∈ R^{n×n} and its eigenvalues {λ_i(A)}, the following inequality holds true:

$$
\frac{1}{\|A^{-1}\|} \leq |\lambda_i| \leq \|A\| \qquad \forall i = 1, \dots, n
$$
Matrix norm – 3
Considering only real matrices, the most used norms are:

Spectral norm:

$$
\|A\|_2 = \sqrt{\max_i \{\lambda_i(A^T A)\}}
$$

Frobenius norm:

$$
\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}
$$

Max singular value:

$$
\|A\|_\sigma = \max_i \{\sigma_i(A)\}
$$
Matrix norm – 4
1-norm or max-norm:

$$
\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{ij}|
$$

∞-norm:

$$
\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|
$$

In general,

‖A‖₂ = ‖A‖_σ

and

‖A‖₂² ≤ ‖A‖₁ ‖A‖_∞
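These norms and the stated inequality can be checked with numpy.linalg.norm (arbitrary example matrix):

```python
import numpy as np

A = np.array([[1., -2., 0.],
              [3., 4., -1.],
              [0., 2., 5.]])

norm2 = np.linalg.norm(A, 2)          # spectral norm = max singular value
norm1 = np.linalg.norm(A, 1)          # max absolute column sum
norminf = np.linalg.norm(A, np.inf)   # max absolute row sum
normF = np.linalg.norm(A, 'fro')      # Frobenius norm

assert np.isclose(norm2, np.linalg.svd(A, compute_uv=False)[0])
assert np.isclose(normF, np.sqrt(np.trace(A.T @ A)))
assert norm2**2 <= norm1 * norminf + 1e-12
```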
Skew-symmetric matrices
A square matrix S is called skew-symmetric or antisymmetric when

S + S^T = O  or  S = −S^T

A skew-symmetric matrix has the following structure:

$$
S = \begin{bmatrix}
0 & s_{12} & \cdots & s_{1n} \\
-s_{12} & 0 & \cdots & s_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-s_{1n} & -s_{2n} & \cdots & 0
\end{bmatrix}
$$

Therefore it has at most n(n−1)/2 independent elements.
Skew-symmetric matrices
For n = 3 it results that n(n−1)/2 = 3, hence a skew-symmetric matrix has as many elements as a 3D vector v.

Given a vector v = [v₁ v₂ v₃]^T it is possible to build S, and given a matrix S it is possible to extract the associated vector v.

We indicate this fact using the symbol S(v), where, by convention,

$$
S(v) = \begin{bmatrix}
0 & -v_3 & v_2 \\
v_3 & 0 & -v_1 \\
-v_2 & v_1 & 0
\end{bmatrix}
$$
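A sketch of the conventional S(v) in NumPy (the helper name S is ours), checking S(u)v = u × v, skew-symmetry, and linearity:

```python
import numpy as np

def S(v):
    """Skew-symmetric matrix such that S(v) @ u == np.cross(v, u)."""
    return np.array([[0.,    -v[2],  v[1]],
                     [v[2],   0.,   -v[0]],
                     [-v[1],  v[0],  0.]])

u = np.array([1., 2., 3.])
v = np.array([-1., 0., 2.])

assert np.allclose(S(u) @ v, np.cross(u, v))           # S(u)v = u x v
assert np.allclose(S(u).T, -S(u))                      # skew-symmetry
assert np.allclose(S(2*u + 3*v), 2*S(u) + 3*S(v))      # linearity in v
```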
Skew-symmetric matrices
Some properties:
Given any vector v ∈ R3:
ST(v) = −S(v) = S(−v)
Given two scalars λ1, λ2 ∈ R:
S(λ1u+ λ2v) = λ1S(u) + λ2S(v)
Given any two vectors v,u ∈ R3:
S(u)v = u× v = −v × u = S(−v)u = ST(v)u
Therefore S(u) is the representation of the operator (u×), and vice versa.
Skew-symmetric matrices
The matrix S(u)S(u) = S²(u) is symmetric, and

S²(u) = uu^T − ‖u‖² I

Hence the dyadic product

D(u, u) = uu^T = S²(u) + ‖u‖² I
Eigenvalues and eigenvectors of skew-symmetric matrices
Given a skew-symmetric matrix S(v), its eigenvalues are imaginary or zero:

λ₁ = 0,  λ₂,₃ = ±j‖v‖

The eigenvector related to the eigenvalue λ₁ = 0 is v; the other two are complex conjugate.
The set of skew-symmetric matrices is a vector space, denoted as so(3).
Given two skew-symmetric matrices S1 and S2, we call commutator or Liebracket the following operator
[S1,S2]def= S1S2 − S2S1
that is itself skew-symmetric.
Skew-symmetric matrices form a Lie algebra, which is related to the Liegroup of orthogonal matrices.
Orthogonal matrices
A square matrix A ∈ R^{n×n} is called orthogonal when

$$
A^T A = \begin{bmatrix}
\alpha_1 & 0 & \cdots & 0 \\
0 & \alpha_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \alpha_n
\end{bmatrix}
$$

with α_i ≠ 0.

A square orthogonal matrix U ∈ R^{n×n} is called orthonormal when all the constants α_i are 1:

U^T U = U U^T = I

Therefore

U⁻¹ = U^T
Orthonormal matrices
Other properties:
The columns, as well as the rows, of U are orthogonal to each other and have unit norm.

‖U‖ = 1.

The determinant of U has unit modulus:

|det(U)| = 1

therefore it can be +1 or −1.
Given a vector x, its orthonormal transformation is y = Ux.
Orthonormal matrices
If U is an orthonormal matrix, then ‖AU‖ = ‖UA‖ = ‖A‖. This property is in general valid also for unitary matrices, i.e., matrices with U*U = I.

When U ∈ R^{3×3}, only 3 out of 9 elements are independent.

The scalar product is invariant under orthonormal transformations:

(Ux) · (Uy) = (Ux)^T(Uy) = x^T U^T U y = x^T y = x · y

This means that vector lengths are invariant wrt orthonormal transformations:

‖Ux‖² = (Ux)^T(Ux) = x^T U^T U x = x^T I x = x^T x = ‖x‖²
Orthonormal matrices
When considering orthonormal transformations, it is important to distinguish two cases:

when det(U) = +1, U represents a proper rotation or simply a rotation;

when det(U) = −1, U represents an improper rotation or reflection.

The set of rotations forms a continuous non-commutative (wrt product) group; the set of reflections does not have this “quality”.

Intuitively this means that infinitesimal rotations exist, while infinitesimal reflections do not have any meaning.

Reflections are the most basic transformations in 3D spaces, in the sense that translations, rotations and roto-reflections (slidings) are obtained from the composition of two or three reflections.
Figure: Reflections.
Orthonormal matrices
If U is an orthonormal matrix, the distributive property wrt the cross product holds:

U(x × y) = (Ux) × (Uy)

(with general matrices A this is not true).

For any proper rotation matrix U and a generic vector x the following holds:

US(x)U^T y = U(x × (U^T y)) = (Ux) × (UU^T y) = (Ux) × y = S(Ux)y

where S(x) is the skew-symmetric matrix associated with x; therefore:

US(x)U^T = S(Ux)
US(x) = S(Ux)U
Bilinear and quadratic forms
A bilinear form associated to the matrix A ∈ R^{m×n} is the scalar quantity defined as

b(x, y) := x^T A y = y^T A^T x

A quadratic form associated to the square matrix A ∈ R^{n×n} is the scalar quantity defined as

q(x) := x^T A x = x^T A^T x

Every quadratic form associated to a skew-symmetric matrix S(y) is identically zero:

x^T S(y) x ≡ 0  ∀x

Indeed, setting w = S(y)x = y × x, one obtains x^T S(y) x = x^T w; but since, by definition, w is orthogonal to both y and x, the scalar product x^T w is always zero, and so is the quadratic form on the left-hand side.
Definite positive matrices – 1
Recalling the standard decomposition of a generic square matrix A into a symmetric term A_s and a skew-symmetric one A_a, one concludes that the quadratic form depends only on the symmetric part of the matrix:

q(x) = x^T A x = x^T(A_s + A_a)x = x^T A_s x

A square matrix A is said to be positive definite if the associated quadratic form x^T A x satisfies the following conditions:

x^T A x > 0  ∀x ≠ 0
x^T A x = 0  iff x = 0

A square matrix A is said to be positive semidefinite if the associated quadratic form x^T A x satisfies the following condition:

x^T A x ≥ 0  ∀x

A square matrix A is said to be negative definite if −A is positive definite; similarly, a square matrix A is negative semidefinite if −A is positive semidefinite.
Definite positive matrices – 2
Often we use the following notations:

positive definite matrix: A ≻ 0
positive semidefinite matrix: A ⪰ 0
negative definite matrix: A ≺ 0
negative semidefinite matrix: A ⪯ 0

A necessary but not sufficient condition for a square matrix A to be positive definite is that the elements on its diagonal are all strictly positive.

A necessary and sufficient condition for a square matrix A to be positive definite is that all its eigenvalues are strictly positive.
Sylvester criterion
The Sylvester criterion states that a square matrix A is positive definite iffall its principal minors are strictly positive.
A positive definite matrix has full rank and is always invertible.

The associated quadratic form x^T A x satisfies the following identity:

λ_min(A) ‖x‖² ≤ x^T A x ≤ λ_max(A) ‖x‖²

where λ_min(A) and λ_max(A) are, respectively, the minimum and the maximum eigenvalues.
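These eigenvalue bounds on the quadratic form can be spot-checked numerically; a sketch with an arbitrary symmetric positive definite matrix and random vectors:

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                    # symmetric, positive definite
lam = np.linalg.eigvalsh(A)                     # sorted ascending
assert lam[0] > 0                               # all eigenvalues positive

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(3)
    q = x @ A @ x                               # quadratic form x^T A x
    n2 = x @ x                                  # squared norm of x
    assert lam[0] * n2 - 1e-9 <= q <= lam[-1] * n2 + 1e-9
```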
Semidefinite matrix and rank
A positive semidefinite matrix A_{n×n} has rank ρ(A) = r < n, i.e., it has r strictly positive eigenvalues and n − r zero eigenvalues.

The quadratic form goes to zero for every vector x ∈ N(A).

Given a real matrix of generic dimensions A_{m×n}, we have seen that both A^TA and AA^T are symmetric; in addition we know that

ρ(A^TA) = ρ(AA^T) = ρ(A)

These matrices have all real, non-negative eigenvalues, and therefore they are positive definite or semidefinite; in particular, if A_{m×n} has full rank, then

if m < n, A^TA ⪰ 0 and AA^T ≻ 0;

if m = n, A^TA ≻ 0 and AA^T ≻ 0;

if m > n, A^TA ≻ 0 and AA^T ⪰ 0.
Matrix derivatives – 1
If the elements a_ij(x) of a matrix A are functions of a quantity x, one can define the matrix derivative wrt x as

$$
\frac{d}{dx} A(x) := \left[ \frac{d a_{ij}}{dx} \right]
$$

If x is the time t, one writes

$$
\frac{d}{dt} A(t) \equiv \dot{A}(t) := \left[ \frac{d a_{ij}(t)}{dt} \right] \equiv [\dot{a}_{ij}]
$$

If A is a function of time through the variable x(t), then

$$
\frac{d}{dt} A(x(t)) \equiv \dot{A}(x(t)) := \left[ \frac{\partial a_{ij}(x)}{\partial x} \frac{dx(t)}{dt} \right] \equiv \left[ \frac{\partial a_{ij}(x)}{\partial x} \right] \dot{x}(t)
$$
Matrix derivatives – 2
Given a scalar function φ(x), defined as φ(·) : Rⁿ → R, the gradient of the function φ wrt x is a column vector:

$$
\nabla_x \varphi = \frac{\partial \varphi}{\partial x} := \begin{bmatrix}
\dfrac{\partial \varphi(x)}{\partial x_1} \\
\vdots \\
\dfrac{\partial \varphi(x)}{\partial x_n}
\end{bmatrix},
\qquad \text{i.e.,} \quad
\nabla_x := \begin{bmatrix}
\dfrac{\partial}{\partial x_1} \\
\vdots \\
\dfrac{\partial}{\partial x_n}
\end{bmatrix} = \mathrm{grad}_x
$$

If x(t) is a differentiable time function, then

$$
\frac{d\varphi(x)}{dt} \equiv \dot{\varphi}(x) = \nabla_x^T \varphi(x)\, \dot{x}
$$

(Notice the convention: the gradient for us is a column vector, although many textbooks assume it is a row vector.)
Jacobian matrix
Given an m×1 vector function f(x) = [f₁(x) · · · f_m(x)]^T, with x ∈ Rⁿ, the Jacobian matrix (or simply the Jacobian) is an m×n matrix defined as

$$
J_f(x) = \begin{bmatrix}
\left( \dfrac{\partial f_1(x)}{\partial x} \right)^T \\
\vdots \\
\left( \dfrac{\partial f_m(x)}{\partial x} \right)^T
\end{bmatrix}
= \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\
\vdots & \dfrac{\partial f_i(x)}{\partial x_j} & \vdots \\
\dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
(\mathrm{grad}_x f_1)^T \\
\vdots \\
(\mathrm{grad}_x f_m)^T
\end{bmatrix}
$$

and if x(t) is a differentiable time function, then

$$
\dot{f}(x) \equiv \frac{df(x)}{dt} = \frac{df(x)}{dx} \dot{x}(t) = J_f(x)\, \dot{x}(t)
$$

Notice that the rows of J_f are the transposes of the gradients of the component functions.
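The Jacobian definition can be checked against finite differences; a sketch with a hypothetical example function f : R² → R² and a central-difference approximation:

```python
import numpy as np

def f(x):
    """Example vector function f: R^2 -> R^2 (chosen for illustration)."""
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def J_analytic(x):
    """Hand-computed Jacobian of f."""
    return np.array([[2*x[0],              1.0],
                     [np.cos(x[0])*x[1],   np.sin(x[0])]])

def J_numeric(f, x, h=1e-6):
    """Finite-difference Jacobian: column j is (f(x + h e_j) - f(x - h e_j)) / 2h."""
    n = len(x)
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])
assert np.allclose(J_analytic(x0), J_numeric(f, x0), atol=1e-6)
```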
Gradient
Given a bilinear form b(x, y) = x^T A y, we call gradients the following vectors:

gradient wrt x:  grad_x b(x, y) := ∂b(x, y)/∂x = Ay

gradient wrt y:  grad_y b(x, y) := ∂b(x, y)/∂y = A^T x

Given the quadratic form q(x) = x^T A x, we call gradient wrt x the following vector:

∇_x q(x) ≡ grad_x q(x) := ∂q(x)/∂x = 2Ax

(valid for symmetric A; in the general case the gradient is (A + A^T)x).