Matrices: A brief introduction
Basilio Bona
DAUIN – Politecnico di Torino
September 2013
Basilio Bona (DAUIN) Matrices September 2013 1 / 74
Definitions
Definition
A matrix is a set of N real or complex numbers organized in m rows and n columns, with N = mn:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & a_{ij} & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\equiv [a_{ij}], \qquad i = 1, \dots, m, \quad j = 1, \dots, n
$$
A matrix is always written as a boldface capital letter, as in A.
To indicate matrix dimensions we use the following symbols
A_{m×n} or A ∈ F^{m×n}
where F = R for real elements and F = C for complex elements.
Transpose matrix
Given a matrix A_{m×n}, we define its transpose A^T_{n×m} as the matrix obtained by exchanging rows and columns:

$$
A^T = \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{bmatrix}
$$

The following property holds: (A^T)^T = A.
Square matrix
A matrix is said to be square when m = n.

A square n×n matrix is upper triangular when a_ij = 0, ∀i > j:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix}
$$

If a square matrix is upper triangular, its transpose is lower triangular, and vice versa:

$$
A^T = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{12} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{bmatrix}
$$
Symmetric matrix
A real square matrix is said to be symmetric if A = A^T, or

A − A^T = O

In a real symmetric matrix there are at most n(n+1)/2 independent elements.

If a matrix K has complex elements k_ij = a_ij + jb_ij (where j = √−1), its conjugate is K̄, with elements k̄_ij = a_ij − jb_ij.

Given a complex matrix K, the adjoint matrix K* is defined as the conjugate transpose: K* = (K̄)^T, i.e., the transpose of the conjugate (equivalently, the conjugate of the transpose).

A complex matrix is called self-adjoint or Hermitian when K = K*. Some textbooks indicate this matrix as K† or K^H.
Diagonal matrix
A square matrix is diagonal if a_ij = 0 for i ≠ j:

$$
A = \mathrm{diag}(a_i) = \begin{bmatrix}
a_1 & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n
\end{bmatrix}
$$

A diagonal matrix is always symmetric.
Skew-symmetric matrix
A square matrix is skew-symmetric or antisymmetric if

A + A^T = O  →  A = −A^T

Given the constraints of the above relation, a generic skew-symmetric matrix has the following structure:

$$
A = \begin{bmatrix}
0 & a_{12} & \cdots & a_{1n} \\
-a_{12} & 0 & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{1n} & -a_{2n} & \cdots & 0
\end{bmatrix}
$$

In a skew-symmetric matrix there are at most n(n−1)/2 nonzero independent elements. We will see in the following some important properties of skew-symmetric 3×3 matrices.
Block matrix
It is possible to represent a matrix with blocks as

$$
A = \begin{bmatrix}
A_{11} & \cdots & A_{1n} \\
\cdots & A_{ij} & \cdots \\
A_{m1} & \cdots & A_{mn}
\end{bmatrix}
$$

where the blocks A_ij have suitable dimensions.

Given the following matrices

$$
A_1 = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ O & A_{ij} & \cdots \\ O & O & A_{mn} \end{bmatrix}
\qquad
A_2 = \begin{bmatrix} A_{11} & O & O \\ \cdots & A_{ij} & O \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}
\qquad
A_3 = \begin{bmatrix} A_{11} & O & O \\ O & A_{ij} & O \\ O & O & A_{mn} \end{bmatrix}
$$

A_1 is upper block triangular, A_2 is lower block triangular, and A_3 is block diagonal.
Matrix algebra
Matrices are elements of an algebra, i.e., a vector space together with a product operator. The main operations of this algebra are: product by a scalar, sum, and matrix product.

Product by a scalar

$$
\alpha A = \alpha \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
= \begin{bmatrix}
\alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\
\alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn}
\end{bmatrix}
$$

Sum

$$
A + B = \begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn}
\end{bmatrix}
$$
Matrix sum
Sum properties
A + O = A
A + B = B + A
(A + B) + C = A + (B + C)
(A + B)^T = A^T + B^T

The null (neutral, zero) element O takes the name of null matrix. The subtraction (difference) operation is defined using the scalar α = −1:

A − B = A + (−1)B
Matrix product
The operation is performed using the well-known rule “rows by columns”: the generic element c_ij of the matrix product C_{m×p} = A_{m×n} · B_{n×p} is

$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
$$

The bilinearity of the matrix product is guaranteed, since it is immediate to verify that, given a generic scalar α, the following identity holds:

α(A · B) = (αA) · B = A · (αB)
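The “rows by columns” rule can be checked numerically. A minimal sketch using NumPy (the matrices here are arbitrary examples, not taken from the slides):

```python
import numpy as np

# "Rows by columns" rule: c_ij = sum_k a_ik * b_kj,
# written out explicitly and compared with numpy's built-in product.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # 2x3
B = np.array([[1., 0.],
              [2., 1.],
              [0., 3.]])            # 3x2

m, n = A.shape
_, p = B.shape
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

assert np.allclose(C, A @ B)

# Bilinearity: alpha*(A@B) == (alpha*A)@B == A@(alpha*B)
alpha = 2.5
assert np.allclose(alpha * (A @ B), (alpha * A) @ B)
assert np.allclose(alpha * (A @ B), A @ (alpha * B))
```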
Product
Product properties
A · B · C = (A · B) · C = A · (B · C)
A · (B + C) = A · B + A · C
(A + B) · C = A · C + B · C
(A · B)^T = B^T · A^T

In general:

the matrix product is non-commutative: A · B ≠ B · A, apart from particular cases;

A · B = A · C does not imply B = C, apart from particular cases;

A · B = O does not imply A = O or B = O, apart from particular cases.
Identity matrix
A neutral element wrt product exists and is called the identity matrix, written as I_n or simply I when no ambiguity arises; given a rectangular matrix A_{m×n} the following identities hold:

A_{m×n} = I_m A_{m×n} = A_{m×n} I_n

Identity matrix

$$
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
$$
Idempotent matrix
Given a square matrix A ∈ R^{n×n}, the k-th power is

$$
A^k = \prod_{\ell=1}^{k} A
$$

A matrix is said to be idempotent if

A^2 = A ⇒ A^n = A
Trace
Trace
The trace of a square matrix A_{n×n} is the sum of its diagonal elements:

$$
\mathrm{tr}(A) = \sum_{k=1}^{n} a_{kk}
$$

The matrix trace satisfies the following properties:

tr(αA + βB) = α tr(A) + β tr(B)
tr(AB) = tr(BA)
tr(A) = tr(A^T)
tr(A) = tr(T⁻¹AT) for nonsingular T (see below)
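These trace properties can be spot-checked numerically; a sketch with NumPy on random matrices (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # shifted to keep T nonsingular

# tr(AB) = tr(BA), tr(A) = tr(A^T), invariance under similarity
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.trace(A), np.trace(np.linalg.inv(T) @ A @ T))
```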
Minor
A minor of order p of a matrix A_{m×n} is the determinant D_p of a square sub-matrix obtained by selecting any p rows and p columns of A_{m×n}.

The formal definition of determinant will be presented below.

There are as many minors as there are possible choices of p out of the m rows and p out of the n columns.

Given a matrix A_{m×n}, the principal minors of order k are the determinants D_k, with k = 1, ..., min{m, n}, obtained by selecting the first k rows and k columns of A_{m×n}.
Minor and cofactor
Given A ∈ R^{n×n}, we indicate with A^{(ij)} ∈ R^{(n−1)×(n−1)} the matrix obtained by taking out the i-th row and the j-th column of A.

We define the minor D_rc of a generic element a_rc of a square matrix A_{n×n} as the determinant of the matrix obtained by taking out the r-th row and the c-th column, i.e.,

D_rc = det A^{(rc)}

We define the cofactor of an element a_rc of a square matrix A_{n×n} as the product

A_rc = (−1)^{r+c} D_rc
Determinant
Once the cofactor has been defined, the determinant of a square matrix A can be defined “by row”, i.e., choosing a generic row i:

$$
\det(A) = \sum_{k=1}^{n} a_{ik} (-1)^{i+k} \det(A^{(ik)}) = \sum_{k=1}^{n} a_{ik} A_{ik}
$$

or, choosing a generic column j, we have the definition “by column”:

$$
\det(A) = \sum_{k=1}^{n} a_{kj} (-1)^{k+j} \det(A^{(kj)}) = \sum_{k=1}^{n} a_{kj} A_{kj}
$$

Since these definitions are recursive and assume the computation of determinants of smaller-order minors, it is necessary to define the determinant of a 1×1 matrix (a scalar), which is simply det(a_ij) = a_ij.
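The recursive definition maps directly to code. A sketch (the helper name det_cofactor is ours; with 0-based indexing the sign along the first row becomes (−1)^k), checked against numpy.linalg.det:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]                 # base case: det of a 1x1 matrix
    total = 0.0
    for k in range(n):
        # A^(1,k): delete row 0 and column k
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += A[0, k] * (-1) ** k * det_cofactor(minor)
    return total

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```

Cofactor expansion costs O(n!) and is only practical for tiny matrices; numerical libraries use LU factorization instead.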
Properties of determinant
det(A · B) = det(A) det(B)

det(A^T) = det(A)

det(kA) = kⁿ det(A)

if one makes s exchanges between rows or columns of A, obtaining a new matrix A_s, we have det(A_s) = (−1)^s det(A)

if A has two equal or proportional rows/columns, we have det(A) = 0

if A has a row or a column that is a linear combination of other rows or columns, we have det(A) = 0

if A is upper or lower triangular, we have det(A) = ∏_{i=1}^{n} a_ii

if A is block triangular, with p blocks A_ii on the diagonal, we have det(A) = ∏_{i=1}^{p} det A_ii
Singular matrix and rank
A matrix A is singular if det(A) = 0.
We define the rank of a matrix A_{m×n}, written ρ(A_{m×n}), as the maximum integer p such that at least one nonzero minor D_p exists.

The following properties hold:

ρ(A) ≤ min{m, n}

if ρ(A) = min{m, n}, A is said to have full rank

if ρ(A) < min{m, n}, the matrix does not have full rank and one says that there is a rank deficiency

ρ(AB) ≤ min{ρ(A), ρ(B)}

ρ(A) = ρ(A^T)

ρ(AA^T) = ρ(A^TA) = ρ(A)

if A_{n×n} and det A = 0, then A does not have full rank
Invertible matrix
Given a square matrix A ∈ R^{n×n}, it is invertible or nonsingular if an inverse matrix A⁻¹ ∈ R^{n×n} exists, such that

AA⁻¹ = A⁻¹A = I_n

The matrix is invertible iff ρ(A) = n, i.e., it has full rank; this implies det(A) ≠ 0.

The inverse matrix can be computed as

$$
A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A)
$$

The following properties hold: (A⁻¹)⁻¹ = A; (A^T)⁻¹ = (A⁻¹)^T.

The inverse matrix, if it exists, allows one to solve the matrix equation

y = Ax

obtaining the unknown x as

x = A⁻¹y.
Orthonormal matrix
A square matrix is orthonormal if A⁻¹ = A^T. The following identity holds:

A^TA = AA^T = I

Given two square matrices A and B of equal dimension n×n, the following identity holds:

(AB)⁻¹ = B⁻¹A⁻¹

An important result, called the inversion lemma, establishes the following: if A, C are square invertible matrices and B, D are matrices of suitable dimensions, then

(A + BCD)⁻¹ = A⁻¹ − A⁻¹B(DA⁻¹B + C⁻¹)⁻¹DA⁻¹

The matrix (DA⁻¹B + C⁻¹) must be invertible.

The inversion lemma is useful to compute the inverse of a sum of matrices A₁ + A₂ when A₂ is decomposable into the product BCD and C is easily invertible, for instance diagonal or triangular.
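The inversion lemma can be verified numerically; a sketch with NumPy, using arbitrary example matrices (A is shifted by a multiple of the identity just to keep it safely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 2
A = rng.standard_normal((n, n)) + 4 * np.eye(n)   # invertible by construction
B = rng.standard_normal((n, k))
C = np.diag([2.0, 3.0])                            # easily invertible (diagonal)
D = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
# (A + BCD)^-1 = A^-1 - A^-1 B (D A^-1 B + C^-1)^-1 D A^-1
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ainv - Ainv @ B @ np.linalg.inv(D @ Ainv @ B + np.linalg.inv(C)) @ D @ Ainv
assert np.allclose(lhs, rhs)
```

Inverting the k×k matrix (DA⁻¹B + C⁻¹) instead of the full n×n sum is the practical payoff when k ≪ n.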
Matrix derivative
If a matrix A(t) is composed of elements a_ij(t) that are all differentiable wrt t, then the matrix derivative is

$$
\frac{d}{dt} A(t) = \dot{A}(t) = \left[ \frac{d}{dt} a_{ij}(t) \right] = [\dot{a}_{ij}(t)]
$$

If a square matrix A(t) has rank ρ(A(t)) = n for all t, then the derivative of its inverse is

$$
\frac{d}{dt} A(t)^{-1} = -A^{-1}(t)\, \dot{A}(t)\, A^{-1}(t)
$$

Since the inversion operator is nonlinear, in general

$$
\left[ \frac{dA(t)}{dt} \right]^{-1} \neq \frac{d}{dt} \left[ A(t)^{-1} \right]
$$
Symmetric Skew-symmetric decomposition
Given a real matrix A ∈ R^{m×n}, the two matrices

A^TA ∈ R^{n×n}
AA^T ∈ R^{m×m}

are both symmetric.

Given a square matrix A, it is always possible to decompose it into a sum of two matrices, as follows:

A = A_s + A_a

where

A_s = ½(A + A^T)  (symmetric matrix)
A_a = ½(A − A^T)  (skew-symmetric matrix)
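A small NumPy sketch of this decomposition, on an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [4., 3., 5.],
              [6., 1., 2.]])
As = 0.5 * (A + A.T)   # symmetric part
Aa = 0.5 * (A - A.T)   # skew-symmetric part

assert np.allclose(As, As.T)        # As is symmetric
assert np.allclose(Aa, -Aa.T)       # Aa is skew-symmetric
assert np.allclose(As + Aa, A)      # they sum back to A
```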
Similarity transformation
Given a square matrix A ∈ R^{n×n} and a nonsingular square matrix T ∈ R^{n×n}, the new matrix B ∈ R^{n×n}, obtained as

B = T⁻¹AT  or  B = TAT⁻¹

is said to be similar to A, and the transformation T is called a similarity transformation.
Eigenvalues and eigenvectors
Consider the similarity transformation between A and Λ, where the latter is diagonal, Λ = diag(λ_i):

A = UΛU⁻¹

with

U = [u₁ u₂ · · · uₙ]

Multiplying A on the right by U one obtains

AU = UΛ

and then

Au_i = λ_i u_i

This identity is the well-known formula that relates the matrix eigenvalues to eigenvectors; the constant quantities λ_i are the eigenvalues of A, while the vectors u_i are the eigenvectors of A, usually with non-unit norm.
Eigenvalues and eigenvectors
Given a square matrix A_{n×n}, the solutions λ_i (real or complex) of the characteristic equation

det(λI − A) = 0

are the eigenvalues of A.

det(λI − A) is a polynomial in λ, called the characteristic polynomial.

If the eigenvalues are all distinct, the vectors u_i that satisfy the identity

Au_i = λ_i u_i

are the eigenvectors of A.
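The identity Au_i = λ_i u_i can be checked with numpy.linalg.eig, which returns the eigenvalues and the eigenvectors as columns; a sketch on an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
lam, U = np.linalg.eig(A)     # columns of U are the eigenvectors u_i

for i in range(2):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])   # A u_i = lambda_i u_i

# det(A) and tr(A) in terms of the eigenvalues
assert np.isclose(np.prod(lam), np.linalg.det(A))
assert np.isclose(np.sum(lam), np.trace(A))
```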
Generalized eigenvectors
If the eigenvalues are not all distinct, one obtains the so-called generalized eigenvectors, whose characterization goes beyond the scope of these notes.

From a geometrical point of view, the eigenvectors define those directions in Rⁿ (the domain of the linear transformation represented by A) that are invariant wrt the transformation A, while the eigenvalues provide the related “scale factors” along these directions.

The set of eigenvalues of a matrix A will be indicated as Λ(A), or {λ_i(A)}; the set of eigenvectors of A will be indicated as {u_i(A)}.

In general, since the eigenvectors represent the invariant directions of the transformation, they are defined up to a constant factor, so they are usually normalized; this is a tacit assumption that will be made here, unless otherwise stated.
Eigenvalues
Properties
Given a matrix A and its eigenvalues {λi (A)}, the following holds true
{λ_i(A + cI)} = {λ_i(A) + c}

Given a matrix A and its eigenvalues {λ_i(A)}, the following holds true:

{λ_i(cA)} = {c λ_i(A)}

Given an upper or lower triangular matrix

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix},
\qquad
\begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
$$

its eigenvalues are the terms on the diagonal, {λ_i(A)} = {a_ii}; the same applies to a diagonal matrix.
Invariance of the eigenvalues
Given a matrix An×n and its eigenvalues {λi (A)}, the following holds true
$$
\det(A) = \prod_{i=1}^{n} \lambda_i
$$

and

$$
\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i
$$

Given a general invertible transformation, represented by the matrix T, the eigenvalues of A are invariant under the similarity transformation

B = T⁻¹AT

that is,

{λ_i(B)} = {λ_i(A)}
Modal matrix
If we build a matrix M whose columns are the unit eigenvectors u_i(A) of A,

M = [u₁ · · · uₙ]

then the similarity transformation wrt M results in a diagonal matrix

$$
\Lambda = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
= M^{-1} A M
$$

M takes the name of modal matrix.

If A is symmetric, its eigenvalues are all real and the following identity holds:

Λ = M^T A M
In this particular case M is orthonormal.
Singular value decomposition – SVD
Given a generic matrix A ∈ R^{m×n} with rank r = ρ(A) ≤ s, where s = min{m, n}, it can be factorized according to the

Singular value decomposition (SVD)

in the following way:

$$
A = U \Sigma V^T = \sum_{i=1}^{s} \sigma_i u_i v_i^T
$$

The important elements of this decomposition are σ_i, u_i and v_i.
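A sketch of the rank-one expansion with numpy.linalg.svd (the example matrix is arbitrary); it also checks that the singular values are the square roots of the eigenvalues of A^TA:

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 3., 0.]])           # 2x3, rank 2
U, s, Vt = np.linalg.svd(A)            # s holds sigma_1 >= sigma_2 >= ...

# Rebuild A as the sum of rank-one terms sigma_i * u_i v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A_rebuilt, A)

# Singular values are the sqrt of the eigenvalues of A^T A
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(lam[:len(s)]))
```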
SVD
σ_i(A) ≥ 0 are called singular values and are equal to the non-negative square roots of the eigenvalues of the symmetric matrix A^TA:

{σ_i(A)} = {√λ_i(A^TA)},  σ_i ≥ 0

listed in decreasing order:

σ₁ ≥ σ₂ ≥ · · · ≥ σ_s ≥ 0

If the rank r < s, there are only r positive singular values; the remaining ones are zero:

σ₁ ≥ σ₂ ≥ · · · ≥ σ_r > 0;  σ_{r+1} = · · · = σ_s = 0

U is an orthonormal square matrix (m×m)

U = [u₁ u₂ · · · u_m]

whose columns are the eigenvectors u_i of AA^T.
SVD
V is an orthonormal square matrix (n×n)

V = [v₁ v₂ · · · vₙ]

whose columns are the eigenvectors v_i of A^TA.

Σ is a rectangular matrix (m×n) with the following structure:

$$
\text{if } m < n:\ \Sigma = \begin{bmatrix} \Sigma_s & O \end{bmatrix};
\qquad
\text{if } m = n:\ \Sigma = \Sigma_s;
\qquad
\text{if } m > n:\ \Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix}.
$$

Σ_s = diag(σ_i) is s×s diagonal, and its diagonal terms are the singular values.
SVD
Otherwise we can decompose A in a way that highlights the positive singular values alone:

$$
A = \underbrace{\begin{bmatrix} P & \bar{P} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \Sigma_r & O \\ O & O \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} Q^T \\ \bar{Q}^T \end{bmatrix}}_{V^T}
= P \Sigma_r Q^T
$$

where

P is an m×r orthonormal matrix; P̄ is an m×(m−r) orthonormal matrix;

Q is an n×r orthonormal matrix; Q̄ is an n×(n−r) orthonormal matrix;

Σ_r is an r×r diagonal matrix whose diagonal elements are the positive singular values σ_i, i = 1, ..., r.
SVD and rank
The rank r of A is equal to the number r ≤ s of nonzero singular values.

Given a generic matrix A ∈ R^{m×n}, the two matrices A^TA and AA^T are symmetric, have the same positive singular values, and differ only in the number of zero singular values.
Linear operators representation
Given two vector spaces X ⊆ Rⁿ and Y ⊆ R^m, with dimensions n and m, and given two generic vectors x ∈ X and y ∈ Y, the generic linear transformation between the two spaces can be represented by the matrix operator A ∈ R^{m×n}, as follows:

y = Ax;  x ∈ Rⁿ;  y ∈ R^m.

Therefore a matrix can always be interpreted as an operator that transforms a vector from the domain X to the range Y.

Conversely, a linear operator has at least one matrix that represents it.
Image space and null space
The image space or range of a transformation A is the subspace of Y defined by the following property:

R(A) = {y | y = Ax, x ∈ X};  R(A) ⊆ Y

The null space or kernel of a transformation A is the subspace of X defined by the following property:

N(A) = {x | 0 = Ax, x ∈ X};  N(A) ⊆ X

The null space represents all the vectors in X that are transformed into the origin of Y.

The dimensions of the range and kernel are called, respectively, the rank ρ(A) and the nullity ν(A):

ρ(A) = dim(R(A));  ν(A) = dim(N(A)).
Image space and null space
If X and Y have finite dimensions, then the following equalities hold:

N(A) = R(A^T)^⊥
R(A) = N(A^T)^⊥
N(A)^⊥ = R(A^T)
R(A)^⊥ = N(A^T)

where ⊥ indicates the orthogonal complement of the corresponding (sub-)space. We recall that {0}^⊥ is the whole space.

The following orthogonal decompositions of the subspaces X and Y hold:

X = N(A) ⊕ N(A)^⊥ = N(A) ⊕ R(A^T)
Y = R(A) ⊕ R(A)^⊥ = R(A) ⊕ N(A^T)
where the symbol ⊕ represents the direct sum operator between subspaces.
Generalized inverse
Given a generic real matrix A ∈ R^{m×n} with m ≠ n, the inverse matrix is not defined. Nevertheless, it is possible to define a class of matrices A⁻, called pseudo-inverses or generalized inverses, that satisfy the following relation:

AA⁻A = A

If A has full rank, i.e., ρ(A) = min{m, n}, it is possible to define two classes of generalized inverses:

if m < n (i.e., ρ(A) = m), the right inverse of A is a matrix A_r ∈ R^{n×m} such that A A_r = I_{m×m}

if n < m (i.e., ρ(A) = n), the left inverse of A is a matrix A_ℓ ∈ R^{n×m} such that A_ℓ A = I_{n×n}
Pseudo-inverse matrix
Among the possible left- or right- inverses, two classes are important:
right pseudo-inverse (m < n):

A_r⁺ = A^T(AA^T)⁻¹

When ρ(A) = m, (AA^T)⁻¹ exists.

left pseudo-inverse (n < m):

A_ℓ⁺ = (A^TA)⁻¹A^T

When ρ(A) = n, (A^TA)⁻¹ exists; this particular left pseudo-inverse is also known as the Moore–Penrose pseudo-inverse.
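A minimal NumPy sketch of the left pseudo-inverse on an arbitrary full-column-rank matrix, compared with numpy.linalg.pinv:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                      # 3x2, full column rank (n < m)
A_left = np.linalg.inv(A.T @ A) @ A.T         # (A^T A)^{-1} A^T

assert np.allclose(A_left @ A, np.eye(2))     # it is a left inverse
assert np.allclose(A_left, np.linalg.pinv(A)) # matches the Moore-Penrose pinv
```

In practice one calls np.linalg.pinv (or solves the normal equations with a factorization) rather than forming (A^TA)⁻¹ explicitly, which is numerically fragile for ill-conditioned A.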
Moore-Penrose pseudo-inverse
In general, even if A^TA is not invertible, it is always possible to define a Moore–Penrose pseudo-inverse A⁺ that satisfies the following relations:

AA⁺A = A
A⁺AA⁺ = A⁺
(AA⁺)^T = AA⁺
(A⁺A)^T = A⁺A
Left and right pseudo-inverses
The two pseudo-inverses A_r⁺ and A_ℓ⁺ coincide with the traditional inverse matrix A⁻¹ when A is square and full-rank:

A⁻¹ = A_r⁺ = A_ℓ⁺ = A⁺

The linear transformation associated to A ∈ R^{m×n},

y = Ax,

with x ∈ Rⁿ and y ∈ R^m, is equivalent to a system of m linear equations in n unknowns, whose coefficients are the elements of A; this linear system can admit one solution, no solution, or an infinite number of solutions.

If we use the pseudo-inverses to solve the linear system y = Ax, we must distinguish two cases, assuming that A has full rank.
Linear systems solution – 1
When n > m

there are more unknowns than equations; among the infinitely many possible solutions x ∈ Rⁿ, we choose the one with minimum norm ‖x‖, given by

x* = A_r⁺y = A^T(AA^T)⁻¹y

All the other possible solutions of y = Ax are obtained as

x = x* + v = A_r⁺y + v

where v ∈ N(A) is a vector belonging to the null space of A, which has dimension n − m. These other possible solutions can also be expressed as

x = A_r⁺y + (I − A_r⁺A)w

where w ∈ Rⁿ is a generic n×1 vector. The matrix I − A_r⁺A projects w onto the null space of A, transforming w into v ∈ N(A); this matrix is called a projection matrix.
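A NumPy sketch of the minimum-norm solution for an arbitrary underdetermined example; it also checks that adding a null-space component keeps the equation satisfied while increasing the norm:

```python
import numpy as np

# Underdetermined system (n > m): pick the minimum-norm solution
A = np.array([[1., 1., 0.],
              [0., 1., 1.]])                  # 2x3, full row rank
y = np.array([1., 2.])

A_right = A.T @ np.linalg.inv(A @ A.T)        # right pseudo-inverse
x_star = A_right @ y
assert np.allclose(A @ x_star, y)             # it solves the system

# Any other solution x* + v with v in N(A) has a larger (or equal) norm
P_null = np.eye(3) - A_right @ A              # projector onto N(A)
w = np.array([1., -2., 0.5])
x_other = x_star + P_null @ w
assert np.allclose(A @ x_other, y)
assert np.linalg.norm(x_star) <= np.linalg.norm(x_other) + 1e-12
```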
Figure: Solution of y = Ax when n > m.
Linear systems solution – 2
When m > n
there are more equations than unknowns; no exact solution exists for y = Ax, only approximate solutions, with an error e = y − Ax ≠ 0. Among these possible approximate solutions we choose the one minimizing the norm of the error, i.e.,

x̂ = arg min_{x∈Rⁿ} ‖y − Ax‖

The solution is

x̂ = A_ℓ⁺y = (A^TA)⁻¹A^Ty

Geometrically, Ax̂ is the orthogonal projection of y onto the range space R(A).

The approximation error, also called the projection error, is

e = (I − AA_ℓ⁺)y

and its norm is the lowest possible, as said above.
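A NumPy sketch of the least-squares solution via the normal equations on an arbitrary overdetermined example, compared with numpy.linalg.lstsq; the residual is orthogonal to the columns of A:

```python
import numpy as np

# Overdetermined system (m > n): least-squares via the left pseudo-inverse
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                      # 3x2, full column rank
y = np.array([0., 1., 4.])                    # no exact solution

x_hat = np.linalg.inv(A.T @ A) @ A.T @ y      # (A^T A)^{-1} A^T y
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_hat, x_ref)

# The residual e = y - A x_hat is orthogonal to R(A)
e = y - A @ x_hat
assert np.allclose(A.T @ e, 0.0)
```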
Figure: Solution of y = Ax when m > n.
Linear systems solution – 3
The similarity between the projection matrix I − A_r⁺A and the matrix that gives the projection error, I − AA_ℓ⁺, is important and will be studied when projection matrices are treated.

In order to compute the generalized inverses, one can use the SVD.

In particular, the pseudo-inverse is computed as

$$
A^{+} = V \begin{bmatrix} \Sigma_r^{-1} & O \\ O & O \end{bmatrix} U^T = Q \Sigma_r^{-1} P^T.
$$
Projections and projection matrices
The geometrical concept of the projection of a segment on a plane can be extended and generalized to the elements of a vector space. This concept is important for the solution of a large number of problems, such as approximation, estimation, prediction, and filtering problems.

Given an n-dimensional real vector space V(Rⁿ) endowed with a scalar product, and a subspace W(R^k) of dimension k ≤ n, it is possible to define the projection operator of vectors v ∈ V onto the subspace W.

The projection operator is the square projection matrix P ∈ R^{n×n}, whose columns are the projections of the basis elements of V onto W. A matrix is a projection matrix iff P² = P, i.e., it is idempotent.

The projection can be orthogonal or non-orthogonal; in the first case P is symmetric, in the second case it is generic. If P is a projection matrix, then I − P is also a projection matrix.
Projection matrices
Some examples of projection matrices are those associated to the left pseudo-inverse,

P₁ = AA_ℓ⁺  and  P₂ = I − AA_ℓ⁺

and to the right pseudo-inverse,

P₃ = A_r⁺A  and  P₄ = I − A_r⁺A

From a geometrical point of view, P₁ projects every vector onto the range space R(A), while P₂ projects it onto the orthogonal complement R(A)^⊥ = N(A^T).
Matrix norm – 1
Similarly to what can be established for a vector, it is possible to provide a “measure” of a matrix, i.e., give its “magnitude”, by defining the matrix norm.

Since a matrix represents a linear transformation between vectors, the matrix norm measures how big this transformation is; but it must in some way “normalize” the result, to avoid that the magnitude of the transformed vector affects the norm. Hence the following definition:

$$
\|A\| \stackrel{\mathrm{def}}{=} \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\|=1} \|Ax\|.
$$
Matrix norm – 2
Given a square matrix A ∈ R^{n×n}, its norm must satisfy the following general (norm) axioms:

1. ‖A‖ > 0 for every A ≠ O;
2. ‖A‖ = 0 iff A = O;
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ (triangle inequality);
4. ‖αA‖ = |α| ‖A‖ for any scalar α and any matrix A;
5. ‖AB‖ ≤ ‖A‖ ‖B‖.

Given A ∈ R^{n×n} and its eigenvalues {λ_i(A)}, the following inequality holds true:

$$
\frac{1}{\|A^{-1}\|} \leq |\lambda_i| \leq \|A\| \qquad \forall i = 1, \dots, n
$$
Matrix norm – 3
Considering only real matrices, the most used norms are:

Spectral norm:

$$
\|A\|_2 = \sqrt{\max_i \{\lambda_i(A^T A)\}}
$$

Frobenius norm:

$$
\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}
$$

Max singular value:

$$
\|A\|_\sigma = \max_i \{\sigma_i(A)\}
$$
Matrix norm – 4
1-norm or max-norm:

$$
\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{ij}|
$$

∞-norm:

$$
\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|
$$

In general,

‖A‖₂ = ‖A‖_σ

and

‖A‖₂² ≤ ‖A‖₁ ‖A‖_∞
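These norms and the stated inequality can be checked with numpy.linalg.norm (arbitrary example matrix):

```python
import numpy as np

A = np.array([[1., -2., 0.],
              [3., 4., -1.],
              [0., 2., 5.]])

norm2 = np.linalg.norm(A, 2)          # spectral norm = max singular value
norm1 = np.linalg.norm(A, 1)          # max absolute column sum
norminf = np.linalg.norm(A, np.inf)   # max absolute row sum
normF = np.linalg.norm(A, 'fro')      # Frobenius norm

assert np.isclose(norm2, np.linalg.svd(A, compute_uv=False)[0])
assert np.isclose(normF, np.sqrt(np.trace(A.T @ A)))
assert norm2**2 <= norm1 * norminf + 1e-12
```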
Skew-symmetric matrices
A square matrix S is called skew-symmetric or antisymmetric when

S + S^T = O  or  S = −S^T

A skew-symmetric matrix has the following structure:

$$
S = \begin{bmatrix}
0 & s_{12} & \cdots & s_{1n} \\
-s_{12} & 0 & \cdots & s_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-s_{1n} & -s_{2n} & \cdots & 0
\end{bmatrix}
$$

Therefore it has at most n(n−1)/2 independent elements.
Skew-symmetric matrices
For n = 3 it results that n(n−1)/2 = 3, hence a skew-symmetric matrix has as many elements as a 3D vector v.

Given a vector v = [v₁ v₂ v₃]^T it is possible to build S, and given a matrix S it is possible to extract the associated vector v.

We indicate this fact using the symbol S(v), where, by convention,

$$
S(v) = \begin{bmatrix}
0 & -v_3 & v_2 \\
v_3 & 0 & -v_1 \\
-v_2 & v_1 & 0
\end{bmatrix}
$$
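A sketch of the conventional S(v) in NumPy (the helper name S is ours), checking S(u)v = u × v, skew-symmetry, and linearity:

```python
import numpy as np

def S(v):
    """Skew-symmetric matrix such that S(v) @ u == np.cross(v, u)."""
    return np.array([[0.,    -v[2],  v[1]],
                     [v[2],   0.,   -v[0]],
                     [-v[1],  v[0],  0.]])

u = np.array([1., 2., 3.])
v = np.array([-1., 0., 2.])

assert np.allclose(S(u) @ v, np.cross(u, v))           # S(u)v = u x v
assert np.allclose(S(u).T, -S(u))                      # skew-symmetry
assert np.allclose(S(2*u + 3*v), 2*S(u) + 3*S(v))      # linearity in v
```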
Skew-symmetric matrices
Some properties:
Given any vector v ∈ R3:
ST(v) = −S(v) = S(−v)
Given two scalars λ1, λ2 ∈ R:
S(λ1u+ λ2v) = λ1S(u) + λ2S(v)
Given any two vectors v,u ∈ R3:
S(u)v = u× v = −v × u = S(−v)u = ST(v)u
Therefore S(u) is the representation of the operator (u×), and vice versa.
Skew-symmetric matrices
The matrix S(u)S(u) = S²(u) is symmetric, and

S²(u) = uu^T − ‖u‖² I

Hence the dyadic product

D(u, u) = uu^T = S²(u) + ‖u‖² I
Eigenvalues and eigenvectors of skew-symmetric matrices
Given a skew-symmetric matrix S(v), its eigenvalues are imaginary or zero:

λ₁ = 0,  λ₂,₃ = ±j‖v‖

The eigenvector related to the eigenvalue λ₁ = 0 is v; the other two are complex conjugate.
The set of skew-symmetric matrices is a vector space, denoted as so(3).
Given two skew-symmetric matrices S1 and S2, we call commutator or Liebracket the following operator
[S1,S2]def= S1S2 − S2S1
that is itself skew-symmetric.
Skew-symmetric matrices form a Lie algebra, which is related to the Liegroup of orthogonal matrices.
Orthogonal matrices
A square matrix A ∈ R^{n×n} is called orthogonal when

$$
A^T A = \begin{bmatrix}
\alpha_1 & 0 & \cdots & 0 \\
0 & \alpha_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \alpha_n
\end{bmatrix}
$$

with α_i ≠ 0.

A square orthogonal matrix U ∈ R^{n×n} is called orthonormal when all the constants α_i are 1:

U^T U = U U^T = I

Therefore

U⁻¹ = U^T
Orthonormal matrices
Other properties:
The columns, as well as the rows, of U are orthogonal to each other and have unit norm.

‖U‖ = 1.

The determinant of U has unit modulus:

|det(U)| = 1

therefore it can be +1 or −1.
Given a vector x, its orthonormal transformation is y = Ux.
Orthonormal matrices
If U is an orthonormal matrix, then ‖AU‖ = ‖UA‖ = ‖A‖. This property is in general valid also for unitary matrices, i.e., matrices with U*U = I.

When U ∈ R^{3×3}, only 3 out of 9 elements are independent.

The scalar product is invariant under orthonormal transformations:

(Ux) · (Uy) = (Ux)^T(Uy) = x^T U^T U y = x^T y = x · y

This means that vector lengths are invariant wrt orthonormal transformations:

‖Ux‖² = (Ux)^T(Ux) = x^T U^T U x = x^T I x = x^T x = ‖x‖²
Orthonormal matrices
When considering orthonormal transformations, it is important to distinguish two cases:

when det(U) = +1, U represents a proper rotation or simply a rotation;

when det(U) = −1, U represents an improper rotation or reflection.

The set of rotations forms a continuous non-commutative (wrt product) group; the set of reflections does not have this “quality”.

Intuitively this means that infinitesimal rotations exist, while infinitesimal reflections do not have any meaning.

Reflections are the most basic transformations in 3D spaces, in the sense that translations, rotations and roto-reflections (slidings) are obtained from the composition of two or three reflections.
Figure: Reflections.
Orthonormal matrices
If U is an orthonormal matrix, the distributive property wrt the cross product holds:

U(x × y) = (Ux) × (Uy)

(with general matrices A this is not true).

For any proper rotation matrix U and a generic vector x the following holds:

US(x)U^T y = U(x × (U^T y)) = (Ux) × (UU^T y) = (Ux) × y = S(Ux)y

where S(x) is the skew-symmetric matrix associated with x; therefore:

US(x)U^T = S(Ux)
US(x) = S(Ux)U
Bilinear and quadratic forms
A bilinear form associated to the matrix A ∈ R^{m×n} is the scalar quantity defined as

b(x, y) := x^T A y = y^T A^T x

A quadratic form associated to the square matrix A ∈ R^{n×n} is the scalar quantity defined as

q(x) := x^T A x = x^T A^T x

Every quadratic form associated to a skew-symmetric matrix S(y) is identically zero:

x^T S(y) x ≡ 0  ∀x

Indeed, setting w = S(y)x = y × x, one obtains x^T S(y) x = x^T w; but since, by definition, w is orthogonal to both y and x, the scalar product x^T w is always zero, and so is the quadratic form on the left-hand side.
Definite positive matrices – 1
Recalling the standard decomposition of a generic square matrix A into a symmetric term A_s and a skew-symmetric one A_a, one concludes that the quadratic form depends only on the symmetric part of the matrix:

q(x) = x^T A x = x^T(A_s + A_a)x = x^T A_s x

A square matrix A is said to be positive definite if the associated quadratic form x^T A x satisfies the following conditions:

x^T A x > 0  ∀x ≠ 0
x^T A x = 0  iff x = 0

A square matrix A is said to be positive semidefinite if the associated quadratic form x^T A x satisfies the following condition:

x^T A x ≥ 0  ∀x

A square matrix A is said to be negative definite if −A is positive definite; similarly, a square matrix A is negative semidefinite if −A is positive semidefinite.
Definite positive matrices – 2
Often we use the following notations:

positive definite matrix: A ≻ 0
positive semidefinite matrix: A ⪰ 0
negative definite matrix: A ≺ 0
negative semidefinite matrix: A ⪯ 0

A necessary but not sufficient condition for a square matrix A to be positive definite is that the elements on its diagonal are all strictly positive.

A necessary and sufficient condition for a square matrix A to be positive definite is that all its eigenvalues are strictly positive.
Sylvester criterion
The Sylvester criterion states that a square matrix A is positive definite iffall its principal minors are strictly positive.
A positive definite matrix has full rank and is always invertible.

The associated quadratic form x^T A x satisfies the following identity:

λ_min(A) ‖x‖² ≤ x^T A x ≤ λ_max(A) ‖x‖²

where λ_min(A) and λ_max(A) are, respectively, the minimum and the maximum eigenvalues.
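These eigenvalue bounds on the quadratic form can be spot-checked numerically; a sketch with an arbitrary symmetric positive definite matrix and random vectors:

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                    # symmetric, positive definite
lam = np.linalg.eigvalsh(A)                     # sorted ascending
assert lam[0] > 0                               # all eigenvalues positive

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(3)
    q = x @ A @ x                               # quadratic form x^T A x
    n2 = x @ x                                  # squared norm of x
    assert lam[0] * n2 - 1e-9 <= q <= lam[-1] * n2 + 1e-9
```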
Semidefinite matrix and rank
A positive semidefinite matrix A_{n×n} has rank ρ(A) = r < n, i.e., it has r strictly positive eigenvalues and n − r zero eigenvalues.

The quadratic form goes to zero for every vector x ∈ N(A).

Given a real matrix of generic dimensions A_{m×n}, we have seen that both A^TA and AA^T are symmetric; in addition we know that

ρ(A^TA) = ρ(AA^T) = ρ(A)

These matrices have all real, non-negative eigenvalues, and therefore they are positive definite or semidefinite; in particular, if A_{m×n} has full rank, then

if m < n, A^TA ⪰ 0 and AA^T ≻ 0;

if m = n, A^TA ≻ 0 and AA^T ≻ 0;

if m > n, A^TA ≻ 0 and AA^T ⪰ 0.
Matrix derivatives – 1
If the elements a_ij(x) of a matrix A are functions of a quantity x, one can define the matrix derivative wrt x as

$$
\frac{d}{dx} A(x) := \left[ \frac{d a_{ij}}{dx} \right]
$$

If x is the time t, one writes

$$
\frac{d}{dt} A(t) \equiv \dot{A}(t) := \left[ \frac{d a_{ij}(t)}{dt} \right] \equiv [\dot{a}_{ij}]
$$

If A is a function of time through the variable x(t), then

$$
\frac{d}{dt} A(x(t)) \equiv \dot{A}(x(t)) := \left[ \frac{\partial a_{ij}(x)}{\partial x} \frac{dx(t)}{dt} \right] \equiv \left[ \frac{\partial a_{ij}(x)}{\partial x} \right] \dot{x}(t)
$$
Matrix derivatives – 2
Given a scalar function φ(x), defined as φ(·) : Rⁿ → R, the gradient of the function φ wrt x is a column vector:

$$
\nabla_x \varphi = \frac{\partial \varphi}{\partial x} := \begin{bmatrix}
\dfrac{\partial \varphi(x)}{\partial x_1} \\
\vdots \\
\dfrac{\partial \varphi(x)}{\partial x_n}
\end{bmatrix},
\qquad \text{i.e.,} \quad
\nabla_x := \begin{bmatrix}
\dfrac{\partial}{\partial x_1} \\
\vdots \\
\dfrac{\partial}{\partial x_n}
\end{bmatrix} = \mathrm{grad}_x
$$

If x(t) is a differentiable time function, then

$$
\frac{d\varphi(x)}{dt} \equiv \dot{\varphi}(x) = \nabla_x^T \varphi(x)\, \dot{x}
$$

(Notice the convention: the gradient for us is a column vector, although many textbooks assume it is a row vector.)
Jacobian matrix
Given an m×1 vector function f(x) = [f₁(x) · · · f_m(x)]^T, with x ∈ Rⁿ, the Jacobian matrix (or simply the Jacobian) is an m×n matrix defined as

$$
J_f(x) = \begin{bmatrix}
\left( \dfrac{\partial f_1(x)}{\partial x} \right)^T \\
\vdots \\
\left( \dfrac{\partial f_m(x)}{\partial x} \right)^T
\end{bmatrix}
= \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\
\vdots & \dfrac{\partial f_i(x)}{\partial x_j} & \vdots \\
\dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
(\mathrm{grad}_x f_1)^T \\
\vdots \\
(\mathrm{grad}_x f_m)^T
\end{bmatrix}
$$

and if x(t) is a differentiable time function, then

$$
\dot{f}(x) \equiv \frac{df(x)}{dt} = \frac{df(x)}{dx} \dot{x}(t) = J_f(x)\, \dot{x}(t)
$$

Notice that the rows of J_f are the transposes of the gradients of the component functions.
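The Jacobian definition can be checked against finite differences; a sketch with a hypothetical example function f : R² → R² and a central-difference approximation:

```python
import numpy as np

def f(x):
    """Example vector function f: R^2 -> R^2 (chosen for illustration)."""
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def J_analytic(x):
    """Hand-computed Jacobian of f."""
    return np.array([[2*x[0],              1.0],
                     [np.cos(x[0])*x[1],   np.sin(x[0])]])

def J_numeric(f, x, h=1e-6):
    """Finite-difference Jacobian: column j is (f(x + h e_j) - f(x - h e_j)) / 2h."""
    n = len(x)
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])
assert np.allclose(J_analytic(x0), J_numeric(f, x0), atol=1e-6)
```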
Gradient
Given a bilinear form b(x, y) = x^T A y, we call gradients the following vectors:

gradient wrt x:  grad_x b(x, y) := ∂b(x, y)/∂x = Ay

gradient wrt y:  grad_y b(x, y) := ∂b(x, y)/∂y = A^T x

Given the quadratic form q(x) = x^T A x, we call gradient wrt x the following vector:

∇_x q(x) ≡ grad_x q(x) := ∂q(x)/∂x = 2Ax

(valid for symmetric A; in the general case the gradient is (A + A^T)x).