
Appendix B

Matrices and Matrix Algebra

The interested reader may find a complete treatment in several Italian [33, 40] or English textbooks; a classical one is [15], while for linear algebra my preference goes to the book of Strang [45]. His video lessons are available at MIT OpenCourseWare [50].

    B.1 Definitions

A matrix is a set of N real or complex numbers organized in m rows and n columns, with N = mn:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & a_{ij} & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \equiv [a_{ij}], \quad i = 1, \dots, m; \; j = 1, \dots, n$$

A matrix is always written as a boldface capital letter, as in A.

To indicate matrix dimensions we use the following symbols:

$$A_{m\times n} \qquad A \in \mathbb{F}^{m\times n}$$

where the field $\mathbb{F}$ is $\mathbb{F} = \mathbb{R}$ when the matrix has real elements and $\mathbb{F} = \mathbb{C}$ when the matrix has complex elements.

Given a matrix $A_{m\times n}$, we define as its transpose the matrix obtained exchanging rows and columns:

$$A^{\mathsf{T}}_{n\times m} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}$$


The following property holds: $(A^{\mathsf{T}})^{\mathsf{T}} = A$.

A matrix is said to be square when m = n.

A square $n\times n$ matrix is upper triangular when $a_{ij} = 0,\ \forall i > j$:

$$A_{n\times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$

A square matrix is lower triangular if its transpose is upper triangular, and vice versa:

$$A^{\mathsf{T}}_{n\times n} = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{12} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{bmatrix}$$

A real square matrix is said to be symmetric if

$$A = A^{\mathsf{T}} \quad \rightarrow \quad A - A^{\mathsf{T}} = O$$

In a real symmetric matrix there are at most $\frac{n(n+1)}{2}$ independent elements.

If a matrix K has complex elements $k_{ij} = a_{ij} + jb_{ij}$ (where $j = \sqrt{-1}$), its conjugate is $\overline{K}$, with elements $\overline{k}_{ij} = a_{ij} - jb_{ij}$.

Given a complex matrix K, its adjoint matrix $K^*$ is defined as the conjugate transpose: $K^* = \overline{K}^{\mathsf{T}} = \overline{K^{\mathsf{T}}}$.

A complex matrix is called self-adjoint or Hermitian when $K = K^*$. Some textbooks indicate this matrix as $K^\dagger$ or $K^{\mathsf{H}}$.

A square matrix is diagonal if $a_{ij} = 0$ for $i \neq j$:

$$A_{n\times n} = \mathrm{diag}(a_i) = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{bmatrix}$$

A diagonal matrix is always symmetric and has at most n nonzero elements.

A square matrix is skew-symmetric or antisymmetric if

$$A + A^{\mathsf{T}} = O \quad \rightarrow \quad A = -A^{\mathsf{T}}$$

Antisymmetric matrix properties will be described in Section B.6.


B.1.1 Block matrix

It is possible to represent a matrix with blocks, as

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & A_{ij} & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}$$

where the blocks $A_{ij}$ have suitable dimensions.

Given the following matrices

$$A_1 = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ O & A_{ij} & \vdots \\ O & O & A_{mn} \end{bmatrix} \qquad A_2 = \begin{bmatrix} A_{11} & O & O \\ \vdots & A_{ij} & O \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} \qquad A_3 = \begin{bmatrix} A_{11} & O & O \\ O & A_{ij} & O \\ O & O & A_{mn} \end{bmatrix}$$

$A_1$ is upper block triangular, $A_2$ is lower block triangular, and $A_3$ is block diagonal.

Example B.1.1

Given two matrices A and B

$$A = \begin{bmatrix} -1 & 3 \\ 7 & -2 \\ 4 & -5 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \end{bmatrix}$$

matrix A is $3\times 2$, while B is $2\times 3$. The transpose matrices are

$$A^{\mathsf{T}} = \begin{bmatrix} -1 & 7 & 4 \\ 3 & -2 & -5 \end{bmatrix}; \qquad B^{\mathsf{T}} = \begin{bmatrix} 0 & -3 \\ -5 & 1 \\ 4 & 6 \end{bmatrix}$$

The matrix

$$C = \begin{bmatrix} -4 & 6 & 3 \\ 9 & 2 & -5 \\ 7 & -1 & 8 \end{bmatrix}$$

is square and its transpose is

$$C^{\mathsf{T}} = \begin{bmatrix} -4 & 9 & 7 \\ 6 & 2 & -1 \\ 3 & -5 & 8 \end{bmatrix}$$

The matrix

$$D = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix}$$

is symmetric, since

$$D - D^{\mathsf{T}} = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix} - \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The matrix

$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

is diagonal and symmetric. The matrix

$$F = \begin{bmatrix} 0 & 1 & -4 \\ -1 & 0 & 6 \\ 4 & -6 & 0 \end{bmatrix}$$

is skew-symmetric, since

$$F + F^{\mathsf{T}} = \begin{bmatrix} 0 & 1 & -4 \\ -1 & 0 & 6 \\ 4 & -6 & 0 \end{bmatrix} + \begin{bmatrix} 0 & -1 & 4 \\ 1 & 0 & -6 \\ -4 & 6 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The $4\times 4$ matrix

$$G = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 0 & 1 & 2 \\ 3 & 4 & 5 & 6 \end{bmatrix}$$

can be written as the block matrix

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}$$

with

$$G_{11} = \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} \quad G_{12} = \begin{bmatrix} 3 & 4 \\ 7 & 8 \end{bmatrix} \quad G_{21} = \begin{bmatrix} 9 & 0 \\ 3 & 4 \end{bmatrix} \quad G_{22} = \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix}$$

or with

$$G_{11} = \begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \\ 9 & 0 & 1 \end{bmatrix} \quad G_{12} = \begin{bmatrix} 4 \\ 8 \\ 2 \end{bmatrix} \quad G_{21} = \begin{bmatrix} 3 & 4 & 5 \end{bmatrix} \quad G_{22} = \begin{bmatrix} 6 \end{bmatrix}$$


B.1.2 Matrix algebra

Matrices are elements of an algebra, i.e., a vector space together with a product operator. The main operations associated with matrix algebra are: product by a scalar, sum, and matrix product.

Product by a scalar

$$\alpha A = \alpha \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn} \end{bmatrix}$$

Sum

$$A + B = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix}$$

The matrix sum properties are:

$$\begin{aligned}
A + O &= A && \text{existence of a null element} \\
A + B &= B + A && \text{commutativity} \\
(A + B) + C &= A + (B + C) && \text{associativity} \\
(A + B)^{\mathsf{T}} &= A^{\mathsf{T}} + B^{\mathsf{T}}
\end{aligned}$$

The null (neutral, zero) element O takes the name of null matrix. The subtraction (difference) operation is defined using the scalar $\alpha = -1$:

$$A - B = A + (-1)B$$

Example B.1.2

Given the two square matrices A and B

$$A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix}$$

and the two scalars $\alpha = -2$, $\beta = 5$, the matrix C obtained as the linear combination

$$C = \alpha A + \beta B$$

is

$$C = -2 \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix} + 5 \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -31 & 10 \\ -29 & 9 & 14 \\ 43 & -16 & 29 \end{bmatrix}$$

Matrix product

The operation is performed using the well-known "rows by columns" rule: the generic element $c_{ij}$ of the matrix product $C_{m\times p} = A_{m\times n} \cdot B_{n\times p}$ is

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

The bilinearity of the matrix product is guaranteed, since it is immediate to verify that, given a generic scalar α, the following identity holds:

$$\alpha(A \cdot B) = (\alpha A) \cdot B = A \cdot (\alpha B)$$

The properties of the matrix product are:

$$\begin{aligned}
A \cdot B \cdot C &= (A \cdot B) \cdot C = A \cdot (B \cdot C) \\
A \cdot (B + C) &= A \cdot B + A \cdot C \\
(A + B) \cdot C &= A \cdot C + B \cdot C \\
(A \cdot B)^{\mathsf{T}} &= B^{\mathsf{T}} \cdot A^{\mathsf{T}}
\end{aligned}$$

In general it is important to notice that:

• the matrix product is non-commutative: $A \cdot B \neq B \cdot A$, apart from particular cases;

• $A \cdot B = A \cdot C$ does not imply $B = C$, apart from particular cases;

• $A \cdot B = O$ does not imply $A = O$ or $B = O$, apart from particular cases.

It is a common habit to omit the · product sign, so that $A \cdot B \equiv AB$.


The product between block matrices follows the general rules; for instance, given

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

the product AB yields

$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$

Example B.1.3

Given the two matrices A and B

$$A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix}$$

we compute $C_1 = AB$ and $C_2 = BA$:

$$C_1 = AB = \begin{bmatrix} 36 & -2 & 39 \\ 78 & -53 & 56 \\ -27 & 2 & 12 \end{bmatrix} \qquad C_2 = BA = \begin{bmatrix} -31 & 22 & -48 \\ 16 & 7 & -19 \\ -18 & 46 & 19 \end{bmatrix}$$

Now we compute $B^{\mathsf{T}}A^{\mathsf{T}}$ and see that it is equal to $C_1^{\mathsf{T}}$:

$$B^{\mathsf{T}}A^{\mathsf{T}} = \begin{bmatrix} 36 & 78 & -27 \\ -2 & -53 & 2 \\ 39 & 56 & 12 \end{bmatrix} = C_1^{\mathsf{T}}$$

We present a case where $AB = AC$ does not imply $B = C$. Given

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix} \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 1 & 3 \end{bmatrix} \quad C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

we compute the products, obtaining

$$AC = AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix}$$

We present a case where $AB = O$ does not imply $A = O$ or $B = O$:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix} \neq O \qquad B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{bmatrix} \neq O \qquad AB = O$$
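A minimal numerical check of these product properties, sketched in Python with NumPy (an addition to these notes, not part of the original text):

```python
import numpy as np

A = np.array([[-1, 3, 5], [7, -2, 8], [1, 3, -2]])
B = np.array([[0, -5, 4], [-3, 1, 6], [9, -2, 5]])

print(np.array_equal(A @ B, B @ A))          # False: the product is non-commutative
print(np.array_equal((A @ B).T, B.T @ A.T))  # True: (AB)^T = B^T A^T
```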


B.1.3 Special matrices

Some matrices have special structures.

Identity matrix

A neutral element with respect to the product exists and is called the identity matrix. It is a square $n\times n$ matrix, written as $I_n$ or simply $I$ when no ambiguity arises:

$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

Given a rectangular matrix $A_{m\times n}$, the following identities hold:

$$A_{m\times n} = I_m A_{m\times n} = A_{m\times n} I_n$$

Idempotent matrix

Given a square matrix $A \in \mathbb{R}^{n\times n}$, its k-th power is

$$A^k = \prod_{\ell=1}^{k} A$$

A matrix is said to be idempotent if

$$A^2 = A \quad \Rightarrow \quad A^n = A$$

Example B.1.4

Given the matrix

$$A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 4 \\ 1 & 3 & -2 \end{bmatrix}$$

we compute the power $A^3$:

$$A^3 = AAA = \begin{bmatrix} 12 & 60 & 165 \\ 295 & -68 & 25 \\ -60 & 135 & 12 \end{bmatrix}$$

Matrix B

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \end{bmatrix}$$

is idempotent, since

$$BB = B^2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \end{bmatrix} = B$$

B.1.4 Trace

The trace of a square matrix $A_{n\times n}$ is the sum of its diagonal elements:

$$\mathrm{tr}(A) = \sum_{k=1}^{n} a_{kk}$$

The trace satisfies the following properties, where α, β are scalar parameters:

$$\begin{aligned}
\mathrm{tr}(\alpha A + \beta B) &= \alpha\,\mathrm{tr}(A) + \beta\,\mathrm{tr}(B) \\
\mathrm{tr}(AB) &= \mathrm{tr}(BA) \\
\mathrm{tr}(A) &= \mathrm{tr}(A^{\mathsf{T}}) \\
\mathrm{tr}(A) &= \mathrm{tr}(T^{-1}AT) \quad \text{for nonsingular } T \text{ (for the definition see Section B.1.7)}
\end{aligned}$$

Example B.1.5

Given the two matrices A and B

$$A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & 2 & 4 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & 3 & 5 \\ 7 & -2 & 4 \\ 1 & 3 & -2 \end{bmatrix}$$

we compute the traces:

$$\mathrm{tr}(A) = -1 + 2 - 2 = -1; \qquad \mathrm{tr}(B) = 1 - 2 - 2 = -3$$

Notice that

$$\mathrm{tr}(A + B) = \mathrm{tr}\begin{bmatrix} 0 & 6 & 10 \\ 14 & 0 & 8 \\ 2 & 6 & -4 \end{bmatrix} = -4 = \mathrm{tr}(A) + \mathrm{tr}(B)$$

and that

$$\mathrm{tr}(AB) = \mathrm{tr}\begin{bmatrix} 25 & 6 & -3 \\ 25 & 29 & 35 \\ 20 & -9 & 21 \end{bmatrix} = 75 = \mathrm{tr}(BA) = \mathrm{tr}\begin{bmatrix} 25 & 24 & 7 \\ -17 & 29 & 19 \\ 18 & 3 & 21 \end{bmatrix}$$
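The trace identities above can be verified numerically; a NumPy sketch (ours, not from the original text) using the matrices of this example:

```python
import numpy as np

A = np.array([[-1, 3, 5], [7, 2, 4], [1, 3, -2]])
B = np.array([[1, 3, 5], [7, -2, 4], [1, 3, -2]])

print(np.trace(A), np.trace(B))          # -1 -3
print(np.trace(A + B))                   # -4 = tr(A) + tr(B)
print(np.trace(A @ B), np.trace(B @ A))  # 75 75: tr(AB) = tr(BA)
```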

B.1.5 Minors and cofactors

A minor of a generic matrix $A \in \mathbb{R}^{m\times n}$ is the determinant of a smaller square matrix, obtained from A by removing one or more of its rows or columns.

A minor of order p of a matrix $A \in \mathbb{R}^{m\times n}$ is the determinant $D_p$ of a square sub-matrix $B \in \mathbb{R}^{p\times p}$ obtained deleting any $(m-p)$ rows and $(n-p)$ columns of A, or equivalently selecting p rows and p columns. (The formal definition of determinant will be presented in Section B.1.6.) There are as many minors as there are possible choices of p out of m rows and of p out of n columns.

Given a matrix $A \in \mathbb{R}^{m\times n}$, the principal minors of order k are the determinants $D_k$, with $k = 1, \dots, \min\{m, n\}$, obtained selecting the first k rows and k columns of A.

Minors obtained by removing just one row and one column from a square matrix (also called first minors) are required for calculating matrix cofactors, which are necessary for computing both the determinant and the inverse of square matrices.

Given a square matrix $A \in \mathbb{R}^{n\times n}$, we indicate with $A_{(ij)} \in \mathbb{R}^{(n-1)\times(n-1)}$ the sub-matrix obtained taking out the i-th row and the j-th column of A.

We define the minor $D_{rc}$ of a generic element $a_{rc}$ of a square matrix $A \in \mathbb{R}^{n\times n}$ as the determinant of the matrix obtained deleting the r-th row and the c-th column, i.e.,

$$D_{rc} = \det A_{(rc)}$$

We define the cofactor of an element $a_{rc}$ of a square matrix $A \in \mathbb{R}^{n\times n}$ as the product

$$A_{rc} = (-1)^{r+c} D_{rc}$$

Example B.1.6

Given the $3\times 3$ matrix

$$A = \begin{bmatrix} 1 & -3 & 5 \\ 7 & 2 & 4 \\ -1 & 3 & 2 \end{bmatrix}$$

we compute a generic minor of order 2, for instance the one obtained removing the second row and the third column. The reduced matrix is

$$B = \begin{bmatrix} 1 & -3 \\ -1 & 3 \end{bmatrix}$$

whose determinant is

$$D_2 = \det(B) = 3 \times 1 - (-3 \times -1) = 0$$

Example B.1.7

Given the $3\times 4$ matrix

$$A = \begin{bmatrix} 1 & -3 & 5 & 1 \\ 7 & 2 & -4 & 2 \\ -1 & 3 & 2 & 3 \end{bmatrix}$$

we compute the principal minors of orders 1, 2, 3:

$$D_1 = \det(A_1) = \det\begin{bmatrix} 1 \end{bmatrix} = 1$$

$$D_2 = \det(A_2) = \det\begin{bmatrix} 1 & -3 \\ 7 & 2 \end{bmatrix} = 1 \times 2 - (-3 \times 7) = 23$$

$$D_3 = \det(A_3) = \det\begin{bmatrix} 1 & -3 & 5 \\ 7 & 2 & -4 \\ -1 & 3 & 2 \end{bmatrix} = 1 \times (4 + 12) + 3 \times (14 - 4) + 5 \times (21 + 2) = 161$$

We notice that the principal minors of A are equal to the principal minors of $A^{\mathsf{T}}$.

Example B.1.8

Given the $3\times 3$ matrix

$$A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix}$$

we compute the minor of $a_{32}$:

$$D_{32} = \det A_{(32)} = \det\begin{bmatrix} 1 & 1 \\ 7 & 2 \end{bmatrix} = 1 \times 2 - (1 \times 7) = -5$$

and the cofactor of $a_{21}$:

$$A_{21} = (-1)^{2+1} \det A_{(21)} = -\det\begin{bmatrix} 5 & 1 \\ 3 & 3 \end{bmatrix} = -(5 \times 3 - 3 \times 1) = -12$$


B.1.6 Determinants

Having introduced the cofactor, the determinant of a square matrix A can be defined as a linear combination of elements and cofactors "by row", i.e., choosing a generic row i,

$$\det(A) = \sum_{k=1}^{n} a_{ik} (-1)^{i+k} \det(A_{(ik)}) = \sum_{k=1}^{n} a_{ik} A_{ik}$$

or as a linear combination of elements and cofactors "by column", i.e., choosing a generic column j,

$$\det(A) = \sum_{k=1}^{n} a_{kj} (-1)^{k+j} \det(A_{(kj)}) = \sum_{k=1}^{n} a_{kj} A_{kj}$$

Since these definitions are recursive and assume the computation of determinants of smaller order minors, it is necessary to define the determinant of a $1\times 1$ matrix, i.e., a scalar, which is simply $\det(a_{ij}) = a_{ij}$.

Properties of the determinant

• $\det(AB) = \det(A)\det(B)$

• $\det(A^{\mathsf{T}}) = \det(A)$

• $\det(kA) = k^n \det(A)$

• if one exchanges s rows or s columns of A, obtaining a new matrix $A_s$, we have $\det(A_s) = (-1)^s \det(A)$

• if A has two equal or proportional rows/columns, we have $\det(A) = 0$

• if A has a row or a column that is a linear combination of other rows or columns, we have $\det(A) = 0$

• if A is upper or lower triangular, we have $\det(A) = \prod_{i=1}^{n} a_{ii}$

• if A is block triangular, with p blocks $A_{ii}$ on the diagonal, we have $\det(A) = \prod_{i=1}^{p} \det A_{ii}$

The recursive cofactor expansion defined above is sketched in code after this list.

Example B.1.9

Given the following $3\times 3$ matrix

$$A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix}$$

we compute its determinant by expansion along the first row:

$$\det(A) = (-1)^{1+1} \cdot 1 \cdot \det\begin{bmatrix} -4 & 2 \\ 3 & 3 \end{bmatrix} + (-1)^{1+2} \cdot 5 \cdot \det\begin{bmatrix} 7 & 2 \\ -1 & 3 \end{bmatrix} + (-1)^{1+3} \cdot 1 \cdot \det\begin{bmatrix} 7 & -4 \\ -1 & 3 \end{bmatrix} = -116$$

Given the following $3\times 3$ matrices

$$A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & -1 & 1 \\ 3 & -4 & 2 \\ -1 & 3 & 2 \end{bmatrix}$$

we compute the determinant of the product as the product of the two determinants:

$$\det(A) = -116; \quad \det(B) = -1; \quad \det(AB) = \det(BA) = 116$$

where

$$AB = \begin{bmatrix} 15 & -18 & 13 \\ -7 & 15 & 3 \\ 5 & -2 & 11 \end{bmatrix}; \qquad BA = \begin{bmatrix} -7 & 12 & 2 \\ -27 & 37 & 1 \\ 18 & -11 & 11 \end{bmatrix}$$

B.1.7 Rank and singular matrix

We define the rank of a matrix $A \in \mathbb{R}^{m\times n}$ as the maximum integer $\rho(A)$ such that at least one nonzero minor of order $\rho(A)$ exists.

The following properties hold:

• $\rho(A) \leq \min\{m, n\}$

• if $\rho(A) = \min\{m, n\}$, A is said to have full rank

• if $\rho(A) < \min\{m, n\}$, the matrix does not have full rank and one says that it is rank deficient

• $\rho(AB) \leq \min\{\rho(A), \rho(B)\}$

• $\rho(A) = \rho(A^{\mathsf{T}})$

• $\rho(AA^{\mathsf{T}}) = \rho(A^{\mathsf{T}}A) = \rho(A)$

• if $A \in \mathbb{R}^{n\times n}$ and $\det A = 0$, then A does not have full rank

A square matrix A is singular if its rank is not full, i.e., if $\det(A) = 0$.

Example B.1.10

Given the two matrices

$$A = \begin{bmatrix} 1 & 5 & 1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & -1 & 1 & 2 \\ 2 & -2 & 2 & 4 \\ -1 & 1 & 3 & -3 \end{bmatrix}$$

their ranks are

$$\rho(A) = 2; \quad \rho(B) = 2; \quad \rho(AB) = 2$$

We note immediately that the first and third columns of A are equal, so its rank is not full; the second row of B is twice the first row, and since the rank of B cannot be larger than 3, this linear dependence lowers the rank to 2.

B.1.8 Adjoint matrix

Given a square matrix $A \in \mathbb{R}^{n\times n}$, the adjoint matrix is defined as the square matrix $\mathrm{Adj}(A) = \{\alpha_{ij}\}$ whose elements are

$$\alpha_{ij} = (-1)^{i+j} D_{ji}$$

namely, the matrix that has in its row i and column j the cofactor of the corresponding element $a_{ji}$ of row j and column i.

Example B.1.11

Given the two square matrices

$$A = \begin{bmatrix} 1 & 5 \\ 2 & -4 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -2 & 2 \\ -1 & 2 & 3 \end{bmatrix}$$

we compute their adjoint matrices H and G:

$$\begin{aligned}
h_{11} &= (-1)^{1+1} D_{A,11} = (1) \times (-4) = -4 \\
h_{12} &= (-1)^{1+2} D_{A,21} = (-1) \times (5) = -5 \\
h_{21} &= (-1)^{2+1} D_{A,12} = (-1) \times (2) = -2 \\
h_{22} &= (-1)^{2+2} D_{A,22} = (1) \times (1) = 1
\end{aligned}
\qquad
H = \begin{bmatrix} -4 & -5 \\ -2 & 1 \end{bmatrix}$$

and

$$\begin{aligned}
g_{11} &= (-1)^{1+1} D_{B,11} = (1) \times [(-6) + (-4)] = -10 \\
g_{12} &= (-1)^{1+2} D_{B,21} = (-1) \times [(-3) + (-4)] = 7 \\
g_{13} &= (-1)^{1+3} D_{B,31} = (1) \times [(-2) + (4)] = 2 \\
g_{21} &= (-1)^{2+1} D_{B,12} = (-1) \times [(6) + (2)] = -8 \\
g_{22} &= (-1)^{2+2} D_{B,22} = (1) \times [(3) + (2)] = 5 \\
g_{23} &= (-1)^{2+3} D_{B,32} = (-1) \times [(2) + (-4)] = 2 \\
g_{31} &= (-1)^{3+1} D_{B,13} = (1) \times [(4) + (-2)] = 2 \\
g_{32} &= (-1)^{3+2} D_{B,23} = (-1) \times [(2) + (-1)] = -1 \\
g_{33} &= (-1)^{3+3} D_{B,33} = (1) \times [(-2) + (2)] = 0
\end{aligned}
\qquad
G = \begin{bmatrix} -10 & 7 & 2 \\ -8 & 5 & 2 \\ 2 & -1 & 0 \end{bmatrix}$$
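The definition $\alpha_{ij} = (-1)^{i+j} D_{ji}$ translates directly into code; a sketch assuming Python with NumPy (the helper adjugate is ours, for illustration):

```python
import numpy as np

def adjugate(A):
    """Adjoint matrix of the definition above: alpha_ij = (-1)^(i+j) * D_ji."""
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # D_ji: minor of element a_ji (delete row j and column i)
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

B = np.array([[1.0, -1, 2], [2, -2, 2], [-1, 2, 3]])
print(np.round(adjugate(B)))  # [[-10 7 2], [-8 5 2], [2 -1 0]], as computed above
```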

B.1.9 Invertible matrix

A square matrix $A \in \mathbb{R}^{n\times n}$ is invertible or nonsingular if an inverse matrix $A^{-1} \in \mathbb{R}^{n\times n}$ exists such that

$$AA^{-1} = A^{-1}A = I_n$$

The matrix is invertible iff $\rho(A) = n$, i.e., it has full rank; this implies $\det(A) \neq 0$.

The inverse matrix can be computed as

$$A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A)$$

The following properties hold: $(A^{-1})^{-1} = A$; $(A^{\mathsf{T}})^{-1} = (A^{-1})^{\mathsf{T}}$.

Given two square matrices A and B of equal dimension $n\times n$, the following identity holds:

$$(AB)^{-1} = B^{-1}A^{-1}$$

An important result, called the inversion lemma, establishes the following: if A, C are square invertible matrices and B, D are matrices of suitable dimensions, then

$$(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$$

provided the matrix $(DA^{-1}B + C^{-1})$ is invertible.

The inversion lemma is useful for computing the inverse of a sum of matrices $A_1 + A_2$ when $A_2$ is decomposable into the product BCD and C is easily invertible, being, for instance, diagonal or triangular. A numerical check is sketched below.
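A numerical sanity check of the inversion lemma, sketched with NumPy on randomly generated matrices of suitable dimensions (our assumption, not an example from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # invertible with high probability
B = rng.standard_normal((n, k))
C = np.diag(rng.uniform(1.0, 2.0, k))              # easily invertible (diagonal)
D = rng.standard_normal((k, n))

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ai - Ai @ B @ np.linalg.inv(D @ Ai @ B + np.linalg.inv(C)) @ D @ Ai
print(np.allclose(lhs, rhs))  # True
```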

Example B.1.12

Let us take the two matrices A and B of Example B.1.11

$$A = \begin{bmatrix} 1 & 5 \\ 2 & -4 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -2 & 2 \\ -1 & 2 & 3 \end{bmatrix}$$

whose adjoints are known; since

$$A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A); \qquad B^{-1} = \frac{1}{\det(B)} \mathrm{Adj}(B)$$

we compute

$$\det(A) = -14; \qquad \det(B) = 2$$

to obtain

$$A^{-1} = \begin{bmatrix} 0.2857 & 0.3571 \\ 0.1429 & -0.0714 \end{bmatrix}; \qquad B^{-1} = \begin{bmatrix} -5.0 & 3.5 & 1.0 \\ -4.0 & 2.5 & 1.0 \\ 1.0 & -0.5 & 0.0 \end{bmatrix}$$

B.1.10 Similarity transformation

Given a square matrix $A \in \mathbb{R}^{n\times n}$ and a nonsingular square matrix $T \in \mathbb{R}^{n\times n}$, the new matrix $B \in \mathbb{R}^{n\times n}$, obtained as

$$B = T^{-1}AT \quad \text{or} \quad B = TAT^{-1}$$

is said to be similar to A, and the transformation T is called a similarity transformation.

B.2 Eigenvalues and eigenvectors

Consider the similarity transformation between A and $\Lambda = \mathrm{diag}(\lambda_i)$,

$$A = U\Lambda U^{-1}$$

where

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$$

Multiplying A on the right by U, one obtains

$$AU = U\Lambda$$

which, given the matrix structures, implies the following identity:

$$Au_i = \lambda_i u_i$$

This identity is the well-known formula that relates the matrix eigenvalues to the eigenvectors: the constant quantities $\lambda_i$ are the eigenvalues of A, while the vectors $u_i$ are the eigenvectors of A, usually with non-unit norm.

Given a square matrix $A_{n\times n}$, the solutions $\lambda_i$ (real or complex) of the characteristic equation

$$P(\lambda) \stackrel{\text{def}}{=} \det(\lambda I - A) = 0$$

are the eigenvalues of A. $P(\lambda)$ is a polynomial in λ, called the characteristic polynomial.

If the eigenvalues are all distinct, the vectors $u_i$ that satisfy the identity

$$Au_i = \lambda_i u_i$$

are the eigenvectors of A. A numerical check of this identity follows.
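A quick check of $Au_i = \lambda_i u_i$ with NumPy (np.linalg.eig returns unit-norm eigenvectors as columns); the matrix is taken from Example B.1.2, and this sketch is an addition to the notes:

```python
import numpy as np

A = np.array([[-1.0, 3, 5], [7, -2, 8], [1, 3, -2]])
lam, U = np.linalg.eig(A)  # eigenvalues and unit-norm eigenvectors (as columns)

for i in range(3):
    # A u_i = lambda_i u_i for each eigenpair (possibly complex)
    print(np.allclose(A @ U[:, i], lam[i] * U[:, i]))  # True
```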

B.2.1 Generalized eigenvectors

If the eigenvalues are not all distinct, one obtains the so-called generalized eigenvectors, whose characterization goes beyond the scope of these notes.

From a geometrical point of view, the eigenvectors define those directions in $\mathbb{R}^n$ (i.e., the domain of the linear transformation represented by the matrix operator A) that are invariant with respect to the transformation A, while the eigenvalues provide the related "scale factors" along these directions.

The set of eigenvalues of a matrix A will be indicated as $\Lambda(A)$, or rather $\{\lambda_i(A)\}$; the set of eigenvectors of A will be indicated as $\{u_i(A)\}$.

In general, since the eigenvectors give the invariant directions of the transformation, they are defined up to a constant factor, so they are usually normalized; this implicit assumption will be made in these notes, unless otherwise stated.


B.2.2 Eigenvalue properties

Given a matrix A and its eigenvalues $\{\lambda_i(A)\}$, the following hold true:

$$\{\lambda_i(A + cI)\} = \{\lambda_i(A) + c\}$$

$$\{\lambda_i(cA)\} = \{c\lambda_i(A)\}$$

Given an upper or lower triangular matrix

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}, \qquad \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

its eigenvalues are the elements on the diagonal, $\{\lambda_i(A)\} = \{a_{ii}\}$; the same applies to a diagonal matrix.

B.2.3 Invariance of the eigenvalues

Given a matrix $A_{n\times n}$ and its eigenvalues $\{\lambda_i(A)\}$, the following hold true:

$$\det(A) = \prod_{i=1}^{n} \lambda_i$$

and

$$\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$$

Given a general invertible transformation, represented by the matrix T, the eigenvalues of A are invariant under the similarity transformation

$$B = T^{-1}AT$$

i.e.,

$$\{\lambda_i(B)\} = \{\lambda_i(A)\}$$

These invariance properties are checked numerically below.
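A NumPy sketch of these invariance properties (the transformation T is randomly generated, an assumption of ours):

```python
import numpy as np

A = np.array([[-1.0, 3, 5], [7, -2, 8], [1, 3, -2]])
lam = np.linalg.eigvals(A)

print(np.isclose(np.linalg.det(A), np.prod(lam)))  # det(A) = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lam)))        # tr(A)  = sum of eigenvalues

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))                    # almost surely nonsingular
B = np.linalg.inv(T) @ A @ T                       # similarity transformation
print(np.allclose(np.sort_complex(np.linalg.eigvals(B)), np.sort_complex(lam)))  # True
```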

B.2.4 Modal matrix

If we build a matrix M whose columns are the unit eigenvectors $u_i(A)$ of A,

$$M = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}$$

then the similarity transformation with respect to M results in a diagonal matrix:

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = M^{-1}AM$$

M takes the name of modal matrix.

If A is symmetric, its eigenvalues are all real and the following identity holds:

$$\Lambda = M^{\mathsf{T}}AM$$

In this particular case M is orthonormal (orthonormal matrices are described in Section B.7).

B.3 Singular value decomposition – SVD

A generic matrix $A \in \mathbb{R}^{m\times n}$ with rank $r = \rho(A) \leq s$, where $s = \min\{m, n\}$, can be factorized according to the singular value decomposition in the following way:

$$A = U\Sigma V^{\mathsf{T}} = \sum_{i=1}^{s} \sigma_i u_i v_i^{\mathsf{T}} \tag{B.1}$$

The important elements of this decomposition are $\sigma_i$, $u_i$ and $v_i$:

• $\sigma_i(A) \geq 0$ are called singular values and are equal to the non-negative square roots of the eigenvalues of the symmetric matrix $A^{\mathsf{T}}A$,

$$\{\sigma_i(A)\} = \left\{\sqrt{\lambda_i(A^{\mathsf{T}}A)}\right\}, \qquad \sigma_i \geq 0$$

listed in decreasing order,

$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_s \geq 0$$

If the rank is $r < s$, there are only r positive singular values; the remaining ones are zero:

$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0; \qquad \sigma_{r+1} = \cdots = \sigma_s = 0$$

• $U \in \mathbb{R}^{m\times m}$ is an orthonormal square matrix

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}$$

whose columns are the eigenvectors $u_i$ of $AA^{\mathsf{T}}$.

• $V \in \mathbb{R}^{n\times n}$ is an orthonormal square matrix

$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

whose columns are the eigenvectors $v_i$ of $A^{\mathsf{T}}A$.

• $\Sigma \in \mathbb{R}^{m\times n}$ is a rectangular matrix with the following structure:

$$\text{if } m < n \quad \Sigma = \begin{bmatrix} \Sigma_s & O \end{bmatrix}; \qquad \text{if } m = n \quad \Sigma = \Sigma_s; \qquad \text{if } m > n \quad \Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix}$$

$\Sigma_s = \mathrm{diag}(\sigma_i) \in \mathbb{R}^{s\times s}$ is diagonal, and its diagonal terms are the singular values:

$$\Sigma_s = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_s \end{bmatrix}$$

Alternatively, we can decompose A so as to put in evidence the positive singular values alone:

$$A = \underbrace{\begin{bmatrix} P & \bar{P} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \Sigma_r & O \\ O & O \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} Q^{\mathsf{T}} \\ \bar{Q}^{\mathsf{T}} \end{bmatrix}}_{V^{\mathsf{T}}} = P\Sigma_r Q^{\mathsf{T}}$$

where

• P is an $m\times r$ orthonormal matrix; $\bar{P}$ is an $m\times(m-r)$ orthonormal matrix;

• Q is an $n\times r$ orthonormal matrix; $\bar{Q}$ is an $n\times(n-r)$ orthonormal matrix;

• $\Sigma_r$ is an $r\times r$ diagonal matrix whose diagonal elements are the positive singular values $\sigma_i > 0$, $i = 1, \dots, r \leq s$.

B.3.1 SVD and rank

The rank r of A is equal to the number $r \leq s$ of nonzero singular values.

Given a generic matrix $A \in \mathbb{R}^{m\times n}$, the two matrices $A^{\mathsf{T}}A$ and $AA^{\mathsf{T}}$ are symmetric, have the same positive singular values, and differ only in the number of zero singular values.


Example B.3.1

Given the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

where m = 3, n = 2 and $r = \rho(A) = s = 2$, its SVD is

$$A = U\Sigma V^{\mathsf{T}}$$

where:

$$U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} = \begin{bmatrix} -\sqrt{6}/3 & 0 & -\sqrt{3}/3 \\ -\sqrt{6}/6 & -\sqrt{2}/2 & \sqrt{3}/3 \\ -\sqrt{6}/6 & \sqrt{2}/2 & \sqrt{3}/3 \end{bmatrix}$$

$$\Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix} = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \qquad V = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} -\sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix}$$

If we compute the eigenvalues of

$$A^{\mathsf{T}}A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

we obtain $\lambda_1 = 3$, $\lambda_2 = 1$, and notice that the elements on the diagonal of $\Sigma_s$ are exactly the square roots of the eigenvalues, i.e., $\sigma_1 = \sqrt{\lambda_1}$, $\sigma_2 = \sqrt{\lambda_2}$.

We now apply the right-hand part of (B.1), obtaining

$$\sigma_1 u_1 v_1^{\mathsf{T}} + \sigma_2 u_2 v_2^{\mathsf{T}} = \sqrt{3}\begin{bmatrix} -\sqrt{6}/3 \\ -\sqrt{6}/6 \\ -\sqrt{6}/6 \end{bmatrix}\begin{bmatrix} -\sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix} + \sqrt{1}\begin{bmatrix} 0 \\ -\sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix}$$

$$= \begin{bmatrix} 1.0 & 1.0 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} + \begin{bmatrix} 0.0 & 0.0 \\ -0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = A$$


B.4 Linear Transformations

Given the notion of vector space, introduced in A.1, we can now define a linear transformation.

Given two vector spaces $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$, with dimensions n and m respectively, and given two generic vectors $x \in X$ and $y \in Y$, a generic linear transformation between the two spaces can be represented by the matrix operator $A \in \mathbb{R}^{m\times n}$, as follows:

$$y = Ax; \qquad x \in \mathbb{R}^n; \quad y \in \mathbb{R}^m$$

Therefore a matrix can always be interpreted as a linear operator that transforms a vector from the domain space X to the image space Y. Conversely, any linear operator has at least one matrix that represents it.

B.4.1 Image space and null space

The image space or range (space) of a transformation A is the subspace of Y defined by the following property:

$$\mathcal{R}(A) = \{y \mid y = Ax,\ x \in X\}; \qquad \mathcal{R}(A) \subseteq Y$$

The null space or kernel of a transformation A is the subspace of X defined by the following property:

$$\mathcal{N}(A) = \{x \mid 0 = Ax,\ x \in X\}; \qquad \mathcal{N}(A) \subseteq X$$

The null space contains all the vectors in X that are transformed into the null element (or origin) of Y.

The dimensions of the range and kernel spaces are called, respectively, rank $\rho(A)$ and nullity $\nu(A)$:

$$\rho(A) = \dim(\mathcal{R}(A)); \qquad \nu(A) = \dim(\mathcal{N}(A))$$

If X and Y have finite dimensions, the following equalities hold:

$$\begin{aligned}
\mathcal{N}(A) &= \mathcal{R}(A^{\mathsf{T}})^{\perp} & \mathcal{R}(A) &= \mathcal{N}(A^{\mathsf{T}})^{\perp} \\
\mathcal{N}(A)^{\perp} &= \mathcal{R}(A^{\mathsf{T}}) & \mathcal{R}(A)^{\perp} &= \mathcal{N}(A^{\mathsf{T}})
\end{aligned}$$

where ⊥ indicates the orthogonal complement of the corresponding (sub-)space. We recall that $\{0\}^{\perp} = \mathbb{R}^n$.

The following orthogonal decompositions of the subspaces X and Y hold:

$$X = \mathcal{N}(A) \oplus \mathcal{N}(A)^{\perp} = \mathcal{N}(A) \oplus \mathcal{R}(A^{\mathsf{T}})$$

$$Y = \mathcal{R}(A) \oplus \mathcal{R}(A)^{\perp} = \mathcal{R}(A) \oplus \mathcal{N}(A^{\mathsf{T}})$$

where the symbol ⊕ represents the direct sum operator between subspaces.


B.4.2 Generalized inverse

Given a generic real matrix $A \in \mathbb{R}^{m\times n}$ with $m \neq n$, the inverse matrix is not defined. Nevertheless, it is possible to define a class of matrices $A^-$, called pseudo-inverses or generalized inverses, that satisfy the following relation:

$$AA^-A = A$$

If A has full rank, i.e., $\rho(A) = \min\{m, n\}$, it is possible to define two classes of generalized inverses:

• if m < n (i.e., $\rho(A) = m$), the right inverse of A is a matrix $A_r \in \mathbb{R}^{n\times m}$ such that

$$AA_r = I_{m\times m}$$

• if n < m (i.e., $\rho(A) = n$), the left inverse of A is a matrix $A_\ell \in \mathbb{R}^{n\times m}$ such that

$$A_\ell A = I_{n\times n}$$

B.4.3 Pseudo-inverse matrix

Among the possible left or right inverses, two classes are important:

• right pseudo-inverse (m < n):

$$A_r^+ = A^{\mathsf{T}}(AA^{\mathsf{T}})^{-1}$$

When $\rho(A) = m$, then $(AA^{\mathsf{T}})^{-1}$ exists.

• left pseudo-inverse (n < m):

$$A_\ell^+ = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$$

When $\rho(A) = n$, then $(A^{\mathsf{T}}A)^{-1}$ exists; this particular left pseudo-inverse is also known as the Moore-Penrose pseudo-inverse.

B.4.4 Moore-Penrose pseudo-inverse

In general, even if $A^{\mathsf{T}}A$ is not invertible, it is always possible to define a Moore-Penrose pseudo-inverse $A^+$ that satisfies the following relations:

$$\begin{aligned}
AA^+A &= A \\
A^+AA^+ &= A^+ \\
(AA^+)^{\mathsf{T}} &= AA^+ \\
(A^+A)^{\mathsf{T}} &= A^+A
\end{aligned} \tag{B.2}$$


B.4.5 Left and right pseudo-inverses

When A is square and full-rank, the two pseudo-inverses $A_r^+$ and $A_\ell^+$ and the Moore-Penrose pseudo-inverse coincide with the traditional inverse matrix $A^{-1}$:

$$A_r^+ = A_\ell^+ = A^+ = A^{-1}$$

B.4.6 Linear equation systems

The linear transformation associated with $A \in \mathbb{R}^{m\times n}$,

$$y = Ax,$$

with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, is equivalent to a system of m linear equations in n unknowns, whose coefficients are the elements of A; this linear system may admit one solution, no solution or an infinite number of solutions.

If we use the pseudo-inverses to solve the linear system $y = Ax$, we must distinguish three cases, assuming that A has full rank.

• n = m

There are as many unknowns as equations, and X, Y have the same dimensions. The inverse matrix exists, since A has full rank, so the unknown x is obtained as

$$x = A^{-1}y$$

• n > m

The dimension of the space of the unknowns X is larger than the dimension of Y, hence the system has more unknowns than equations; the system is underconstrained. Among the infinitely many possible solutions $x \in \mathbb{R}^n$, we choose the one with minimum norm $\|x\|$, given by

$$x^* = A_r^+ y = A^{\mathsf{T}}(AA^{\mathsf{T}})^{-1}y$$

All the other possible solutions of $y = Ax$ are obtained as

$$\bar{x} = x^* + v = A_r^+ y + v$$

where $v \in \mathcal{N}(A)$ is a vector belonging to the null space of A, which has dimension $n - m$.

These other possible solutions can also be expressed by defining $v = (I - A_r^+ A)w$, so that

$$\bar{x} = A_r^+ y + (I - A_r^+ A)w$$

Figure B.1: Solution of y = Ax when m < n; in this case m = 1, n = 2.

where $w \in \mathbb{R}^n$ is a generic $n\times 1$ vector.

The matrix $I - A_r^+ A$ projects w on the null space of A, transforming w into $v \in \mathcal{N}(A)$; this matrix is called a projection matrix. An example of such a linear transformation is sketched in Figure B.1.

• m > n

The dimension of the space of the unknowns X is smaller than the dimension of Y, hence the system has more equations than unknowns; the system is overconstrained. No exact solutions exist for $y = Ax$, but only approximate solutions, with an error $e = y - Ax \neq 0$. Among these possible approximate solutions we choose the one minimizing the norm of the error, i.e.,

$$\hat{x} = \arg\min_{x \in \mathbb{R}^n} \|y - Ax\|$$

The solution is

$$\hat{x} = A_\ell^+ y = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}y$$

Geometrically, $A\hat{x}$ is the orthogonal projection of y onto the subspace $\mathcal{R}(A) = \mathcal{N}(A^{\mathsf{T}})^{\perp}$.

The approximation error, also called the projection error, is

$$\hat{e} = (I - AA_\ell^+)y$$

and its norm is the lowest among all possible norms, as said above. An example of such a linear transformation is sketched in Figure B.2.

Figure B.2: Solution of y = Ax when m > n.

The similarity between the projection matrix $I - A_r^+ A$ and the matrix that gives the projection error, $I - AA_\ell^+$, is important and will be studied when projection matrices are treated.

In order to compute the generalized inverses, one can use the SVD. In particular, the pseudo-inverse is computed as

$$A^+ = V \begin{bmatrix} \Sigma_r^{-1} & O \\ O & O \end{bmatrix} U^{\mathsf{T}} = Q\Sigma_r^{-1}P^{\mathsf{T}}$$
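A sketch of this computation with NumPy (the helper pinv_svd is ours, for illustration; np.linalg.pinv implements the same idea), also showing the minimum-norm and least-squares solutions of Section B.4.6:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Moore-Penrose pseudo-inverse via the SVD, as in the formula above."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)  # invert only the positive singular values
    return Vt.T @ np.diag(s_inv) @ U.T

# Underdetermined (m < n): minimum-norm solution x* = A_r^+ y
A = np.array([[1.0, 2, 3]])
y = np.array([6.0])
x_star = pinv_svd(A) @ y
print(x_star, A @ x_star)   # x* with minimal norm, A x* = [6.]

# Overdetermined (m > n): least-squares solution x_hat = A_l^+ y
A2 = np.array([[1.0, 0], [0, 1], [1, 1]])
y2 = np.array([1.0, 1.0, 0.0])
print(pinv_svd(A2) @ y2)    # minimizes ||y - A x||
```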

B.5 Projections and projection matrices

The geometrical concept of the projection of a vector on a plane can be extended and generalized to the elements of a vector space. This concept is important for the solution of a large number of problems, such as approximation, estimation, prediction and filtering problems.

Given an n-dimensional real vector space $V(\mathbb{R}^n)$, endowed with the scalar product, and a $k \leq n$ dimensional subspace $W(\mathbb{R}^k)$, it is possible to define the projection operator of vectors $v \in V$ on the subspace W.

The projection operator is the square projection matrix $P \in \mathbb{R}^{n\times n}$, whose columns are the projections of the base elements of V on W. A matrix is a projection matrix iff $P^2 = P$, i.e., it is idempotent.

Notice that the projection matrix is a square matrix: although the projected vector belongs to a lower-dimensional subspace, it is nonetheless defined in $\mathbb{R}^n$.

The projection can be orthogonal or non-orthogonal; in the first case P is symmetric, in the second case it is generic. If P is a projection matrix, $I - P$ is also a projection matrix.

Some examples of projection matrices are those associated with the left pseudo-inverse,

$$P_1 = AA_\ell^+ \quad \text{and} \quad P_2 = I - AA_\ell^+$$

and with the right pseudo-inverse,

$$P_3 = A_r^+ A \quad \text{and} \quad P_4 = I - A_r^+ A$$

From a geometrical point of view, $P_1$ projects every vector $v \in V$ on the range space $\mathcal{R}(A)$, while $P_2$ projects v on its orthogonal complement $\mathcal{R}(A)^{\perp} = \mathcal{N}(A^{\mathsf{T}})$.

B.5.1 Matrix norm

Similarly to what can be established for a vector, it is possible to provide a "measure" of a matrix, i.e., give its "magnitude", by defining the matrix norm.

Since a matrix represents a linear transformation between vectors, the matrix norm measures how "big" this transformation is; but it must, in some way, be "normalized", to avoid the magnitude of the transformed vector affecting the norm. Hence the following definition:

$$\|A\| \stackrel{\text{def}}{=} \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\|=1} \|Ax\|$$

Given a square matrix $A \in \mathbb{R}^{n\times n}$, its norm must satisfy the following general (norm) axioms:

1. $\|A\| > 0$ for every $A \neq O$;
2. $\|A\| = 0$ iff $A = O$;
3. $\|A + B\| \leq \|A\| + \|B\|$ (triangular inequality);
4. $\|\alpha A\| = |\alpha| \|A\|$ for any scalar α and any matrix A;
5. $\|AB\| \leq \|A\| \|B\|$.

Given $A \in \mathbb{R}^{n\times n}$ and its eigenvalues $\{\lambda_i(A)\}$, the following inequalities hold:

$$\frac{1}{\|A^{-1}\|} \leq |\lambda_i| \leq \|A\| \qquad \forall i = 1, \dots, n$$

Taking into account only real matrices, the most used matrix norms are:

• Spectral norm:

$$\|A\|_2 = \sqrt{\max_i \{\lambda_i(A^{\mathsf{T}}A)\}}$$

• Frobenius norm:

$$\|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2} = \sqrt{\mathrm{tr}(A^{\mathsf{T}}A)}$$

• Max singular value:

$$\|A\|_\sigma = \max_i \{\sigma_i(A)\}$$

• 1-norm or max-norm:

$$\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{ij}|$$

• ∞-norm:

$$\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|$$

In general,

$$\|A\|_2 = \|A\|_\sigma$$

and

$$\|A\|_2^2 \leq \|A\|_1 \|A\|_\infty$$
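The norms above map directly onto np.linalg.norm; a sketch we add here, using the matrix A of Example B.1.2:

```python
import numpy as np

A = np.array([[-1.0, 3, 5], [7, -2, 8], [1, 3, -2]])

n2   = np.linalg.norm(A, 2)        # spectral norm = max singular value
nF   = np.linalg.norm(A, 'fro')    # Frobenius norm
n1   = np.linalg.norm(A, 1)        # max absolute column sum
ninf = np.linalg.norm(A, np.inf)   # max absolute row sum

print(n2, nF, n1, ninf)
print(n2**2 <= n1 * ninf)          # True: ||A||_2^2 <= ||A||_1 ||A||_inf
```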

B.6 Antisymmetric matrices

A square matrix S is called antisymmetric or skew-symmetric when

$$S + S^{\mathsf{T}} = O \quad \text{or} \quad S = -S^{\mathsf{T}}$$

A skew-symmetric matrix has the following structure:

$$S_{n\times n} = \begin{bmatrix} 0 & s_{12} & \cdots & s_{1n} \\ -s_{12} & 0 & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -s_{1n} & -s_{2n} & \cdots & 0 \end{bmatrix}$$

Therefore it has at most $\frac{n(n-1)}{2}$ independent elements.

For n = 3 it results that $\frac{n(n-1)}{2} = 3$, hence an antisymmetric matrix has as many independent elements as a 3D vector v.

Given a vector $v = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}^{\mathsf{T}}$ it is possible to build S, and given a matrix S it is possible to extract the associated vector v. We indicate this fact using the symbol $S(v)$, where, by convention,

$$S(v) = \begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix}$$

Some properties:

• Given any vector $v \in \mathbb{R}^3$:

$$S^{\mathsf{T}}(v) = -S(v) = S(-v)$$

• Given two scalars $\lambda_1, \lambda_2 \in \mathbb{R}$:

$$S(\lambda_1 u + \lambda_2 v) = \lambda_1 S(u) + \lambda_2 S(v) \tag{B.3}$$

• Given any two vectors $v, u \in \mathbb{R}^3$, it follows from simple inspection that

$$S(u)v = u \times v = -v \times u = S(-v)u = S^{\mathsf{T}}(v)u$$

Therefore S(u) is the representation of the cross product operator $(u\times)$, and vice versa.

The matrix $S(u)S(u) = S^2(u)$ is symmetric and

$$S^2(u) = uu^{\mathsf{T}} - \|u\|^2 I$$

Hence we can define the dyadic product as

$$D(u, u) = uu^{\mathsf{T}} = S^2(u) + \|u\|^2 I$$

Another property, which will be used to characterize the energy in a rigid body, is the following: any quadratic form associated with a skew-symmetric matrix is identically zero, i.e.,

$$x^{\mathsf{T}}S(u)x \equiv 0 \tag{B.4}$$

for any x and any u.

The proof is quite simple: if we define $w = S(u)x = u \times x$, we have $x^{\mathsf{T}}S(u)x = x^{\mathsf{T}}w$; but since w is orthogonal to both u and x, the scalar product is zero, $x^{\mathsf{T}}w = 0$.

If R is an orthonormal/rotation matrix, the distributive property with respect to the cross product holds (this property is not true for a generic matrix, but it holds when R is orthogonal):

$$R(v \times u) = (Rv) \times (Ru) \tag{B.5}$$

For any orthonormal matrix R and any vector v, the above properties allow us to state the following relations:

$$RS(v)R^{\mathsf{T}}u = R\big(v \times (R^{\mathsf{T}}u)\big) = (Rv) \times (RR^{\mathsf{T}}u) = (Rv) \times u = S(Rv)u \tag{B.6}$$

Since they are valid for any generic u, we obtain

$$RS(v)R^{\mathsf{T}} = S(Rv) \qquad RS(v) = S(Rv)R \tag{B.7}$$

B.6.1 Eigenvalues and eigenvectors of antisymmetric matrices

Given an antisymmetric matrix $S(v) \in \mathbb{R}^{3\times 3}$, its eigenvalues are imaginary or zero:

$$\lambda_1 = 0, \qquad \lambda_{2,3} = \pm j\|v\|$$

The eigenvector related to the eigenvalue $\lambda_1 = 0$ is v; the other two are complex conjugate.

The set of antisymmetric matrices is a vector space, denoted as $\mathfrak{so}(3)$.

Antisymmetric matrices form a Lie algebra, which is related to the Lie group of orthogonal matrices.

Given two antisymmetric matrices $S_1$ and $S_2$, we call commutator or Lie bracket the following operator:

$$[S_1, S_2] \stackrel{\text{def}}{=} S_1 S_2 - S_2 S_1$$

which is itself antisymmetric.

B.6.2 Symmetric-antisymmetric factorization

Given a square matrix A, it is always possible to factor it into a sum of two matrices, as follows:

$$A = A_s + A_a$$

where

$$A_s = \frac{1}{2}(A + A^{\mathsf{T}}) \ \text{(symmetric matrix)} \qquad A_a = \frac{1}{2}(A - A^{\mathsf{T}}) \ \text{(skew-symmetric matrix)}$$

Example B.6.1

Given the following vectors

$$v_1 = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^{\mathsf{T}} \qquad v_2 = \begin{bmatrix} 3 & 2 & 1 \end{bmatrix}^{\mathsf{T}}$$

we have the skew-symmetric matrices

$$S_1(v_1) = \begin{bmatrix} 0 & -3 & 2 \\ 3 & 0 & -1 \\ -2 & 1 & 0 \end{bmatrix} \qquad S_2(v_2) = \begin{bmatrix} 0 & -1 & 2 \\ 1 & 0 & -3 \\ -2 & 3 & 0 \end{bmatrix}$$

Now we compute the commutator

$$[S_1, S_2] = S_1 S_2 - S_2 S_1 = \begin{bmatrix} -7 & 6 & 9 \\ 2 & -6 & 6 \\ 1 & 2 & -7 \end{bmatrix} - \begin{bmatrix} -7 & 2 & 1 \\ 6 & -6 & 2 \\ 9 & 6 & -7 \end{bmatrix} = \begin{bmatrix} 0 & 4 & 8 \\ -4 & 0 & 4 \\ -8 & -4 & 0 \end{bmatrix}$$

B.7 Orthogonal matrices

A square matrix $A \in \mathbb{R}^{n\times n}$ is called orthogonal when

$$A^{\mathsf{T}}A = \begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_n \end{bmatrix}$$

with $\alpha_i \neq 0$.

B.7.1 Orthonormal matrices

A square orthogonal matrix $U \in \mathbb{R}^{n\times n}$ is called orthonormal when all the constants $\alpha_i$ are 1:

$$U^{\mathsf{T}}U = UU^{\mathsf{T}} = I$$

Therefore

$$U^{-1} = U^{\mathsf{T}}$$

Other properties:

• The columns, as well as the rows, of U are orthogonal to each other and have unit norm.

• $\|U\| = 1$.

• The determinant of U has unit modulus, $|\det(U)| = 1$; therefore it can be +1 or −1.

• Given a vector x, its orthonormal transformation is $y = Ux$.

If U is an orthonormal matrix, then $\|AU\| = \|UA\| = \|A\|$.

When $U \in \mathbb{R}^{3\times 3}$, only 3 out of its 9 elements are independent.

The scalar product is invariant under orthonormal transformations:

$$(Ux) \cdot (Uy) = (Ux)^{\mathsf{T}}(Uy) = x^{\mathsf{T}}U^{\mathsf{T}}Uy = x^{\mathsf{T}}y = x \cdot y$$

This means that vector lengths are invariant with respect to orthonormal transformations:

$$\|Ux\|^2 = (Ux)^{\mathsf{T}}(Ux) = x^{\mathsf{T}}U^{\mathsf{T}}Ux = x^{\mathsf{T}}Ix = x^{\mathsf{T}}x = \|x\|^2$$

A numerical check of these invariances is sketched below.
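A NumPy sketch (ours) using a 2D rotation matrix:

```python
import numpy as np

th = 0.7
U = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])  # a proper rotation, det = +1

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
print(np.isclose((U @ x) @ (U @ y), x @ y))                   # scalar product invariant
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # length invariant
print(np.isclose(np.linalg.det(U), 1.0))                      # proper rotation
```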

When considering orthonormal transformations, it is important to distinguish two cases:

• when $\det(U) = +1$, U represents a proper rotation, or simply a rotation;

• when $\det(U) = -1$, U represents an improper rotation, or reflection.

The set of rotations forms a continuous non-commutative group (with respect to the product); the set of reflections does not have this "quality". Intuitively this means that infinitesimal rotations exist, while infinitesimal reflections are not defined.

Reflections are the most basic transformations in 3D spaces, in the sense that translations, rotations and roto-reflections (slidings) are obtained from the composition of two or three reflections, as shown in Figure B.3: a composition of two reflections with respect to intersecting axes gives a rotation, while a composition of two reflections with respect to parallel axes results in a translation.

Figure B.3: Composition of reflections gives both rotations and translations.

If U is an orthonormal matrix, the distributive property with respect to the cross product holds:

$$U(x \times y) = (Ux) \times (Uy)$$

while this property does not hold for generic square matrices A.

For any proper rotation matrix U and a generic vector x the following holds:

$$US(x)U^{\mathsf{T}}y = U\big(x \times (U^{\mathsf{T}}y)\big) = (Ux) \times (UU^{\mathsf{T}}y) = (Ux) \times y = S(Ux)y$$

where S(x) is the antisymmetric matrix associated with x; therefore:

$$US(x)U^{\mathsf{T}} = S(Ux) \qquad US(x) = S(Ux)U$$

B.8 Bilinear and quadratic forms

A bilinear form associated with the matrix $A \in \mathbb{R}^{m\times n}$ is the scalar quantity defined as

$$b(x, y) \stackrel{\text{def}}{=} x^{\mathsf{T}}Ay = y^{\mathsf{T}}A^{\mathsf{T}}x$$

A quadratic form associated with the square matrix $A \in \mathbb{R}^{n\times n}$ is the scalar quantity defined as

$$q(x) \stackrel{\text{def}}{=} x^{\mathsf{T}}Ax = x^{\mathsf{T}}A^{\mathsf{T}}x$$

Every quadratic form associated with a skew-symmetric matrix S(y) is identically zero,

$$x^{\mathsf{T}}S(y)x \equiv 0 \quad \forall x$$

Indeed, setting $w = S(y)x = y \times x$, one obtains $x^{\mathsf{T}}S(y)x = x^{\mathsf{T}}w$; but since, by definition, w is orthogonal to both y and x, the scalar product $x^{\mathsf{T}}w$ is always zero, and so is the quadratic form on the left-hand side.

B.8.1 Positive definite matrices

Recalling the standard decomposition of a generic square matrix A into a symmetric term $A_s$ and an antisymmetric term $A_a$, one concludes that the quadratic form depends only on the symmetric part of the matrix:

$$q(x) = x^{\mathsf{T}}Ax = x^{\mathsf{T}}(A_s + A_a)x = x^{\mathsf{T}}A_s x$$

A square matrix A is said to be positive definite if the associated quadratic form $x^{\mathsf{T}}Ax$ satisfies the following conditions:

$$x^{\mathsf{T}}Ax > 0 \quad \forall x \neq 0; \qquad x^{\mathsf{T}}Ax = 0 \quad \Leftrightarrow \quad x = 0$$

A square matrix A is said to be positive semidefinite if the associated quadratic form $x^{\mathsf{T}}Ax$ satisfies the following condition:

$$x^{\mathsf{T}}Ax \geq 0 \quad \forall x$$

A square matrix A is said to be negative definite if −A is positive definite; similarly, a square matrix A is negative semidefinite if −A is positive semidefinite.

Often we use the following notations:

• positive definite matrix: $A \succ 0$

• positive semidefinite matrix: $A \succeq 0$

• negative definite matrix: $A \prec 0$

• negative semidefinite matrix: $A \preceq 0$

A necessary but not sufficient condition for a square matrix A to be positive definite is that the elements on its diagonal are all strictly positive. A necessary and sufficient condition for a square matrix A to be positive definite is that all its eigenvalues are strictly positive.

B.8.2 Sylvester criterion

The Sylvester criterion states that a square matrix A is positive definite iff all its principal minors (as defined in Section B.1.5, i.e., the leading ones) are strictly positive.

A positive definite matrix has full rank and is always invertible. The associated quadratic form $x^{\mathsf{T}}Ax$ satisfies the following inequality:

$$\lambda_{\min}(A)\|x\|^2 \leq x^{\mathsf{T}}Ax \leq \lambda_{\max}(A)\|x\|^2$$

where $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are, respectively, the minimum and the maximum eigenvalues. A numerical check is sketched below.
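A sketch of the Sylvester criterion with NumPy (the helper is ours and didactic; in practice one checks the eigenvalues):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Sylvester criterion: all leading principal minors strictly positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

A = np.array([[2.0, -1, 0], [-1, 2, -1], [0, -1, 2]])
print(is_positive_definite(A))             # True
print(np.all(np.linalg.eigvalsh(A) > 0))   # True: equivalent eigenvalue test

# Bound: lambda_min ||x||^2 <= x^T A x <= lambda_max ||x||^2
x = np.array([1.0, -2, 0.5])
lmin, lmax = np.linalg.eigvalsh(A)[[0, -1]]  # eigvalsh returns ascending order
q = x @ A @ x
print(lmin * x @ x <= q <= lmax * x @ x)     # True
```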

B.8.3 Semidefinite matrices and rank

A positive semidefinite matrix $A_{n\times n}$ with rank $\rho(A) = r < n$ has r strictly positive eigenvalues and $n - r$ zero eigenvalues. The quadratic form goes to zero for every vector $x \in \mathcal{N}(A)$.

Given a real matrix of generic dimensions $A_{m\times n}$, we have seen that both $A^{\mathsf{T}}A$ and $AA^{\mathsf{T}}$ are symmetric; in addition, we know that

$$\rho(A^{\mathsf{T}}A) = \rho(AA^{\mathsf{T}}) = \rho(A)$$

These matrices have all real non-negative eigenvalues, and therefore they are positive definite or semidefinite; in particular, if $A_{m\times n}$ has full rank, then:

• if m < n, $A^{\mathsf{T}}A \succeq 0$ and $AA^{\mathsf{T}} \succ 0$;

• if m = n, $A^{\mathsf{T}}A \succ 0$ and $AA^{\mathsf{T}} \succ 0$;

• if m > n, $A^{\mathsf{T}}A \succ 0$ and $AA^{\mathsf{T}} \succeq 0$.

B.9 Matrix derivatives

If a matrix A(t) is composed of elements $a_{ij}(t)$ that are all differentiable with respect to a parameter t, then the matrix derivative is

$$\frac{\mathrm{d}}{\mathrm{d}t}A(t) = \left[\frac{\mathrm{d}a_{ij}(t)}{\mathrm{d}t}\right]$$

If t is time, the matrix time derivative is also written as

$$\frac{\mathrm{d}}{\mathrm{d}t}A(t) = \dot{A}(t) = [\dot{a}_{ij}(t)]$$

If a square matrix A(t) has rank $\rho(A(t)) = n$ for all times t, then the derivative of its inverse is

$$\frac{\mathrm{d}}{\mathrm{d}t}A(t)^{-1} = -A^{-1}(t)\dot{A}(t)A(t)^{-1}$$

Since the inverse operator is a nonlinear operator, in general it results that

$$\left[\frac{\mathrm{d}A(t)}{\mathrm{d}t}\right]^{-1} \neq \frac{\mathrm{d}}{\mathrm{d}t}\left[A(t)^{-1}\right]$$

Example B.9.1

Given the matrix

$$A(t) = \begin{bmatrix} 2 & t \\ t & t^2 \end{bmatrix}$$

its derivative with respect to t is

$$\frac{\mathrm{d}}{\mathrm{d}t}A(t) = \begin{bmatrix} 0 & 1 \\ 1 & 2t \end{bmatrix}$$

We compute its inverse $A^{-1}$:

$$A^{-1} = \frac{1}{t^2}\begin{bmatrix} t^2 & -t \\ -t & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix}$$

We then compute $\frac{\mathrm{d}}{\mathrm{d}t}A(t)^{-1}$:

$$\frac{\mathrm{d}}{\mathrm{d}t}A(t)^{-1} = -\begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 2t \end{bmatrix}\begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix} = \begin{bmatrix} 0 & 1/t^2 \\ 1/t^2 & -4/t^3 \end{bmatrix}$$

and we can see that it is different from

$$\left(\frac{\mathrm{d}}{\mathrm{d}t}A(t)\right)^{-1} = \begin{bmatrix} -2t & 1 \\ 1 & 0 \end{bmatrix}$$

If A is a function of time through the variable x(t), then

$$\frac{\mathrm{d}}{\mathrm{d}t}A(x(t)) \equiv \dot{A}(x(t)) \stackrel{\text{def}}{=} \left[\frac{\partial a_{ij}(x)}{\partial x}\frac{\mathrm{d}x(t)}{\mathrm{d}t}\right] \equiv \left[\frac{\partial a_{ij}(x)}{\partial x}\right]\dot{x}(t)$$

Given a scalar function $\phi(x)$, defined as $\phi(\cdot): \mathbb{R}^n \to \mathbb{R}$, the gradient of the function φ with respect to x is the column vector

$$\nabla_x \phi = \frac{\partial \phi}{\partial x} \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{\partial \phi(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial \phi(x)}{\partial x_n} \end{bmatrix}, \qquad \text{i.e.,} \quad \nabla_x \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{\partial}{\partial x_1} \\ \vdots \\ \dfrac{\partial}{\partial x_n} \end{bmatrix} = \mathrm{grad}_x$$

If x(t) is a differentiable time function, then

$$\frac{\mathrm{d}\phi(x)}{\mathrm{d}t} \equiv \dot{\phi}(x) = \nabla_x^{\mathsf{T}}\phi(x)\,\dot{x}$$

(Notice the convention: the gradient for us is a column vector, although many textbooks assume it is a row vector.)

B.9.1 Gradient

Given a bilinear form $b(x, y) = x^{\mathsf{T}}Ay$, we call gradients the following vectors:

$$\text{gradient with respect to } x: \quad \mathrm{grad}_x\, b(x, y) \stackrel{\text{def}}{=} \frac{\partial b(x, y)}{\partial x} = Ay$$

$$\text{gradient with respect to } y: \quad \mathrm{grad}_y\, b(x, y) \stackrel{\text{def}}{=} \frac{\partial b(x, y)}{\partial y} = A^{\mathsf{T}}x$$

Given the quadratic form $q(x) = x^{\mathsf{T}}Ax$, we call gradient with respect to x the following vector:

$$\nabla_x q(x) \equiv \mathrm{grad}_x\, q(x) \stackrel{\text{def}}{=} \frac{\partial q(x)}{\partial x} = (A + A^{\mathsf{T}})x = 2Ax \ \text{ for symmetric } A$$

B.9.2 Jacobian matrix

Given an $m\times 1$ vector function $f(x) = \begin{bmatrix} f_1(x) & \cdots & f_m(x) \end{bmatrix}^{\mathsf{T}}$, $x \in \mathbb{R}^n$, the Jacobian matrix (or simply the Jacobian) is the $m\times n$ matrix defined as

$$J_f(x) = \begin{bmatrix} \dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\ \vdots & \dfrac{\partial f_i(x)}{\partial x_j} & \vdots \\ \dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n} \end{bmatrix} = \begin{bmatrix} (\mathrm{grad}_x f_1)^{\mathsf{T}} \\ \vdots \\ (\mathrm{grad}_x f_m)^{\mathsf{T}} \end{bmatrix}$$

and if x(t) is a differentiable time function, then

$$\dot{f}(x) \equiv \frac{\mathrm{d}f(x)}{\mathrm{d}t} = \frac{\mathrm{d}f(x)}{\mathrm{d}x}\dot{x}(t) = J_f(x)\dot{x}(t)$$

Notice that the rows of $J_f$ are the transposed gradients of the component functions.
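Where an analytic Jacobian is cumbersome, a forward-difference approximation is a common numerical sketch; the following NumPy helper (ours, with a hypothetical test function) illustrates the definition above:

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Numerical Jacobian J_f(x) by forward differences (a didactic sketch)."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - f0) / eps  # column j = df/dx_j
    return J

# Hypothetical example: f(x) = [x1^2 + x2, x1*x2]; analytic Jacobian [[2x1, 1], [x2, x1]]
f = lambda x: np.array([x[0]**2 + x[1], x[0] * x[1]])
x0 = np.array([1.0, 2.0])
print(jacobian_fd(f, x0))                          # ~ [[2. 1.] [2. 1.]]
print(np.array([[2*x0[0], 1.0], [x0[1], x0[0]]]))  # analytic, for comparison
```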