mathematics for economists - bauer college of business · rs- chapter 4 1 mathematics for...
RS- Chapter 4 1
Mathematics for Economists
Chapters 4-5: Linear Models and Matrix Algebra
Johann Carl Friedrich Gauss (1777–1855)
The Nine Chapters on the Mathematical Art (1000-200 BC)
Objectives of Math for Economists
To study economic problems with the formal and wonderful tools of mathematics.
To understand mathematical economics problems by stating the unknown, the data and the restrictions/conditions.
To plan solutions to these problems by finding a connection between the data and the unknown.
To carry out your plans for solving mathematical economics problems.
To examine the solutions to mathematical economics problems for general insights into current and future problems.
Remember: Math econ is like love – a simple idea but it can get complicated.
4. Linear Algebra
Some early history:
The beginnings of matrices and determinants go back to the second century BC, although traces can be seen as far back as the fourth century BC. But the ideas did not make it into mainstream math until the late 16th century.
The Babylonians around 300 BC studied problems which lead to simultaneous linear equations.
The Chinese, between 200 BC and 100 BC, came much closer to matrices than the Babylonians. Indeed, the text Nine Chapters on the Mathematical Art written during the Han Dynasty gives the first known example of matrix methods.
In Europe, 2x2 determinants were considered by Cardano at the end of the 16th century and larger ones by Leibniz and, in Japan, by Seki about 100 years later.
4. What is a Matrix?
A matrix is a set of elements, organized into rows and columns:
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
(rows run horizontally, columns vertically)
• a and d are the diagonal elements.
• b and c are the off-diagonal elements.
• Matrices are like plain numbers in many ways: they can be added, subtracted, and, in some cases, multiplied and inverted (divided).
Arthur Cayley (1821 – 1895, England)
4. Matrix: Details
Examples: a matrix A and a vector b [entries shown on the slide]
• Dimensions of a matrix: number of rows by number of columns. The matrix A is a 2x2 matrix; b is a 1x3 matrix.
• A matrix with only one column or only one row is called a vector.
• If a matrix has an equal number of rows and columns, it is called a square matrix. Matrix A, above, is a square matrix.
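• Usual Notation: Upper case letters => matrices
Lower case letters => vectors
In econometrics, we have data, say N (or T) observations, on a dependent variable, Y, and on k explanatory variables, X.
Under the usual notation, vectors will be column vectors: y and xj are Nx1 vectors:
\[
y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}
\quad \& \quad
x_j = \begin{bmatrix} x_{1j} \\ \vdots \\ x_{Nj} \end{bmatrix},
\qquad j = 1, \dots, k
\]
X is an Nxk matrix:
\[
X = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nk} \end{bmatrix}
\]
Its columns are the k Nx1 vectors xj. It is common to treat x1 as a vector of ones, ι.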
4. Matrix: Elementary Row Operations
• Elementary row operations:
- Switching: Swap the positions of two rows.
- Multiplication: Multiply a row by a non-zero scalar.
- Addition: Add to one row a scalar multiple of another.
• An elementary matrix is a matrix which differs from the identity matrix by one single elementary row operation.
• If the matrix subject to elementary row operations is associated to a system of linear equations, then these operations do not change the solution set. Row operations can make the problem easier.
• Elementary row operations are used in Gaussian elimination to reduce a matrix to row echelon form.
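The link between row operations and elementary matrices can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names (`identity`, `matmul`, `swap`, `scale`, `add_multiple`) and the test matrix are assumptions chosen for the example. Each elementary matrix is the identity with one row operation applied, and premultiplying A by it applies that same operation to A.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Multiply two conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def swap(n, r, s):
    """Elementary matrix that swaps rows r and s (switching)."""
    E = identity(n)
    E[r], E[s] = E[s], E[r]
    return E

def scale(n, r, c):
    """Elementary matrix that multiplies row r by the non-zero scalar c."""
    E = identity(n)
    E[r][r] = Fraction(c)
    return E

def add_multiple(n, r, s, c):
    """Elementary matrix that adds c times row s to row r (r != s)."""
    E = identity(n)
    E[r][s] = Fraction(c)
    return E

# Hypothetical 3x3 matrix used only for illustration.
A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
swapped = matmul(swap(3, 0, 1), A)                 # rows 0 and 1 exchanged
scaled = matmul(scale(3, 0, Fraction(1, 2)), A)    # row 0 multiplied by 1/2
added = matmul(add_multiple(3, 1, 0, -1), A)       # row 1 minus row 0
```

Exact fractions are used so that the results can be compared without rounding error.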
4.1 Matrix multiplication: Details
Multiplication of matrices requires a conformability condition.
The conformability condition for multiplication is that the column dimension of the lead matrix A must be equal to the row dimension of the lag matrix B.
If A is an (mxn) matrix and B an (nxp) matrix (A has the same number of columns as B has rows), then we define the product AB. AB is an (mxp) matrix whose (i,k)-th element is
\[
[AB]_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}
\]
What are the dimensions of the vector, matrix, and result?
\[
aB = \begin{bmatrix} a_{11} & a_{12} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix}
= \begin{bmatrix} c_{11} & c_{12} & c_{13} \end{bmatrix}
\]
\[
= \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} & a_{11}b_{13} + a_{12}b_{23} \end{bmatrix}
\]
• Dimensions: a(1x2), B(2x3) => c(1x3)
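The conformability condition and the element formula for the product can be illustrated with a short Python sketch. The function name `matmul` and the numbers in `a` and `B` are assumptions made for this example only.

```python
def matmul(A, B):
    """Return AB for matrices stored as lists of rows.

    Raises ValueError when the column dimension of A differs from the
    row dimension of B (the conformability condition).
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError(f"not conformable: A is {m}x{n}, B has {len(B)} rows")
    p = len(B[0])
    # (i,k)-th element: sum over j of a_ij * b_jk
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]

a = [[1, 2]]                    # 1x2 (hypothetical numbers)
B = [[3, 4, 5], [6, 7, 8]]      # 2x3
c = matmul(a, B)                # 1x3 result
```

Reversing the order, `matmul(B, a)`, raises a `ValueError` because B has three columns while a has only one row.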
4.1 Transpose Matrix
Example:
\[
A = \begin{bmatrix} 3 & 8 & 9 \\ 1 & 0 & 4 \end{bmatrix}
\;\Rightarrow\;
A^T = \begin{bmatrix} 3 & 1 \\ 8 & 0 \\ 9 & 4 \end{bmatrix}
\]
The transpose of a matrix A is another matrix A^T (also written A′) created by any one of the following equivalent actions:
- write the rows (columns) of A as the columns (rows) of A^T
- reflect A by its main diagonal to obtain A^T
Formally, the (i,j) element of A^T is the (j,i) element of A:
[A^T]ij = [A]ji
If A is an m × n matrix => A^T is an n × m matrix.
(A')' = A
Conformability changes unless the matrix is square.
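In econometrics, an important matrix is X′X. Recall X:
\[
X = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nk} \end{bmatrix}
\quad \text{a (Nxk) matrix}
\]
Then,
\[
X' = \begin{bmatrix} x_{11} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots \\ x_{1k} & \cdots & x_{Nk} \end{bmatrix}
\quad \text{a (kxN) matrix}
\]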
4.1 Special Matrices: Identity and Null
Identity Matrix: A square matrix with 1’s along the diagonal and 0’s everywhere else. Similar to scalar “1.”
\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Null matrix: A matrix in which all elements are 0’s. Similar to scalar “0.”
\[
0 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
Both are diagonal matrices => off-diagonal elements are zero.
Both are examples of symmetric and idempotent matrices. That is,
- Symmetric: A = A^T
- Idempotent: A = A^2 = A^3 = …
4.1 Basic Operations
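Addition, Subtraction, Multiplication
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix}
= \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}
\qquad \text{Just add elements}
\]
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} - \begin{bmatrix} e & f \\ g & h \end{bmatrix}
= \begin{bmatrix} a-e & b-f \\ c-g & d-h \end{bmatrix}
\qquad \text{Just subtract elements}
\]
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix}
= \begin{bmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{bmatrix}
\qquad \text{Multiply each row by each column and add}
\]
\[
k \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \begin{bmatrix} ka & kb \\ kc & kd \end{bmatrix}
\qquad \text{Multiply each element by the scalar}
\]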
4.1 Basic Matrix Operations: Examples
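Matrix addition: A(2x2) + B(2x2) = C(2x2)
\[
\begin{bmatrix} 2 & 1 \\ 7 & 9 \end{bmatrix} + \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 5 & 2 \\ 7 & 11 \end{bmatrix}
\]
Matrix subtraction: (2x2) - (2x2) = (2x2)
\[
\begin{bmatrix} 2 & 1 \\ 7 & 9 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 5 & 6 \end{bmatrix}
\]
Matrix multiplication: A(2x2) x B(2x2) = C(2x2)
\[
\begin{bmatrix} 2 & 1 \\ 7 & 9 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 4 & 3 \\ 25 & 27 \end{bmatrix}
\]
Scalar multiplication:
\[
\frac{1}{8} \begin{bmatrix} 2 & 4 \\ 6 & 1 \end{bmatrix}
= \begin{bmatrix} 1/4 & 1/2 \\ 3/4 & 1/8 \end{bmatrix}
\]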
4.1 Basic Matrix Operations: x′x
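xj is an Nx1 vector. Then, xj′xj (a 1x1 matrix) is a scalar:
\[
x_j'x_j = \begin{bmatrix} x_{1j} & \cdots & x_{Nj} \end{bmatrix}
\begin{bmatrix} x_{1j} \\ \vdots \\ x_{Nj} \end{bmatrix}
= \sum_{i=1}^{N} x_{ij}^2
\]
Similarly, let xk be another Nx1 vector. Then,
\[
x_j'x_k = \begin{bmatrix} x_{1j} & \cdots & x_{Nj} \end{bmatrix}
\begin{bmatrix} x_{1k} \\ \vdots \\ x_{Nk} \end{bmatrix}
= \sum_{i=1}^{N} x_{ij} x_{ik}
\]
If xk = ι (an Nx1 vector of ones):
\[
x_j'\iota = \begin{bmatrix} x_{1j} & \cdots & x_{Nj} \end{bmatrix}
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
= \sum_{i=1}^{N} x_{ij} \quad (= \iota'x_j)
\]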
4.1 Basic Matrix Operations: X′X
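A special matrix in econometrics, X′X (a kxk matrix):
Recall X (Nxk) and X′ (kxN):
\[
X = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nk} \end{bmatrix}
\quad \& \quad
X' = \begin{bmatrix} x_{11} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots \\ x_{1k} & \cdots & x_{Nk} \end{bmatrix}
\]
\[
X'X = \begin{bmatrix}
\sum_i x_{i1}^2 & \sum_i x_{i1}x_{i2} & \cdots & \sum_i x_{i1}x_{ik} \\
\sum_i x_{i2}x_{i1} & \sum_i x_{i2}^2 & \cdots & \sum_i x_{i2}x_{ik} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_i x_{ik}x_{i1} & \sum_i x_{ik}x_{i2} & \cdots & \sum_i x_{ik}^2
\end{bmatrix}
= \sum_{i=1}^{N} \begin{bmatrix}
x_{i1}^2 & x_{i1}x_{i2} & \cdots & x_{i1}x_{ik} \\
x_{i2}x_{i1} & x_{i2}^2 & \cdots & x_{i2}x_{ik} \\
\vdots & \vdots & \ddots & \vdots \\
x_{ik}x_{i1} & x_{ik}x_{i2} & \cdots & x_{ik}^2
\end{bmatrix}
\]
\[
= \sum_{i=1}^{N} \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{bmatrix}
\begin{bmatrix} x_{i1} & x_{i2} & \cdots & x_{ik} \end{bmatrix}
= \sum_{i=1}^{N} x_i x_i'
\]
where x_i is the kx1 vector holding the i-th observation on the k variables.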
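The two ways of writing X′X (as a matrix product, and as a sum of outer products of the observation vectors) can be checked numerically. The following Python sketch uses a small hypothetical data matrix; the helper names `transpose`, `matmul` and `outer` are assumptions for the example.

```python
def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def outer(x):
    """Outer product x x' of a vector with itself."""
    return [[xi * xj for xj in x] for xi in x]

# Hypothetical data: N = 3 observations, k = 2 variables, first column is iota.
X = [[1, 2], [1, 3], [1, 5]]
k = len(X[0])

XtX = matmul(transpose(X), X)          # X'X as a matrix product

S = [[0] * k for _ in range(k)]        # X'X as a sum over observations
for row in X:
    O = outer(row)
    S = [[S[i][j] + O[i][j] for j in range(k)] for i in range(k)]
```

With a column of ones in X, the (1,1) element of X′X is N and the off-diagonal element is the sum of the second variable.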
4.1 Laws of Matrix Addition & Multiplication
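Commutative law of Matrix Addition: A + B = B + A
\[
A + B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
+ \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}
\]
\[
B + A = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
+ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
= \begin{bmatrix} b_{11}+a_{11} & b_{12}+a_{12} \\ b_{21}+a_{21} & b_{22}+a_{22} \end{bmatrix}
\]
Matrix Multiplication is distributive across Additions:
A (B + C) = AB + AC (assuming conformability applies).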
4.1 Matrix Multiplication
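Matrix multiplication is generally not commutative. That is, AB ≠ BA even if BA is conformable (because the dot products of the rows and columns of A & B differ).
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & -1 \\ 6 & 7 \end{bmatrix}
\]
\[
AB = \begin{bmatrix} 1(0)+2(6) & 1(-1)+2(7) \\ 3(0)+4(6) & 3(-1)+4(7) \end{bmatrix}
= \begin{bmatrix} 12 & 13 \\ 24 & 25 \end{bmatrix}
\]
\[
BA = \begin{bmatrix} 0(1)-1(3) & 0(2)-1(4) \\ 6(1)+7(3) & 6(2)+7(4) \end{bmatrix}
= \begin{bmatrix} -3 & -4 \\ 27 & 40 \end{bmatrix}
\]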
4.1 Matrix multiplication
Exceptions to the non-commutative law: AB = BA if, for example,
B = a scalar,
B = identity matrix I, or
B = the inverse of A, i.e., A^{-1}
Theorem: It is not true that AB = AC => B = C
Proof: By counterexample. [The slide shows three 3x3 matrices A, B and C with AB = AC and B ≠ C.]
Note: If AB = AC for all matrices A, then B = C.
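Both claims are easy to check numerically. The sketch below multiplies the 2x2 pair with rows (1, 2), (3, 4) and (0, -1), (6, 7) in both orders, and then gives a small counterexample to cancellation. The counterexample matrices `S`, `I2` and `J` are hypothetical, chosen for brevity; they are not the 3x3 matrices used on the slide.

```python
def matmul(A, B):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

# Non-commutativity: AB and BA are both defined but differ.
A = [[1, 2], [3, 4]]
B = [[0, -1], [6, 7]]
AB = matmul(A, B)
BA = matmul(B, A)

# Cancellation fails: S is singular, so S*I2 = S*J although I2 != J.
S = [[1, 1], [1, 1]]
I2 = [[1, 0], [0, 1]]
J = [[0, 1], [1, 0]]
SI = matmul(S, I2)
SJ = matmul(S, J)
```

The cancellation failure relies on S having no inverse; with an invertible lead matrix, premultiplying both sides by its inverse would give B = C.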
4.1 Inverse of a Matrix
Identity matrix: AI = A
\[
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Notation: Ij is a jxj identity matrix.
Given A (mxn), the matrix B (nxm) is a right-inverse for A iff
AB = Im
Given A (mxn), the matrix C (nxm) is a left-inverse for A iff
CA = In
4.1 Inverse of a Matrix
Theorem: If A (mxn) has both a right-inverse B and a left-inverse C, then C = B.
Proof:
We have AB = Im and CA = In.
Thus,
C(AB) = C Im = C and C(AB) = (CA)B = In B = B
=> C (nxm) = B (nxm)
Note:
- This matrix is unique. (Suppose there is another left-inverse D, then D=B by the theorem, so D=C.).
- If A has both a right and a left inverse, it is a square matrix. It is usually called invertible. We say “the matrix A is non-singular.”
4.1 Inverse of a Matrix
Inversion is tricky: (ABC)^{-1} = C^{-1}B^{-1}A^{-1}
Theorem: If A (mxn) and B (nxp) have inverses, then AB is invertible and (AB)^{-1} = B^{-1}A^{-1}
Proof:
We have AA^{-1} = Im and A^{-1}A = In
BB^{-1} = In and B^{-1}B = Ip
Thus,
B^{-1}A^{-1}(AB) = B^{-1}(A^{-1}A)B = B^{-1} In B = B^{-1}B = Ip
(AB)B^{-1}A^{-1} = A(BB^{-1})A^{-1} = A In A^{-1} = AA^{-1} = Im
=> AB is invertible and (AB)^{-1} = B^{-1}A^{-1}
More on this topic later.
4.1 Transpose and Inverse Matrix
(A + B)' = A' + B'
If A' = A, then A is called a symmetric matrix.
Theorems:
- Given two conformable matrices A and B, then (AB)' = B'A'
- If A is invertible, then (A^{-1})' = (A')^{-1} (and A' is also invertible).
4.1 Partitioned Matrix
A partitioned matrix is a matrix which has been broken into sections called blocks or submatrices by horizontal and/or vertical lines extending along entire rows or columns. For example, the 3xm matrix can be partitioned as:
\[
\left[\begin{array}{cc|cc}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\ \hline
a_{31} & a_{32} & \cdots & a_{3m}
\end{array}\right]
= \begin{bmatrix}
A_{11}\,(2 \times 2) & A_{12}\,(2 \times (m-2)) \\
A_{21}\,(1 \times 2) & A_{22}\,(1 \times (m-2))
\end{bmatrix}
\]
Augmented matrices are also partitioned matrices. They have been partitioned vertically into two blocks.
Partitioned matrices are used to simplify the computation of inverses.
4.1 Partitioned Matrix
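If two matrices, A and B, are partitioned the same way, addition can be done by blocks. Similarly, if both matrices are conformably partitioned, then multiplication can be done by blocks.
A block diagonal matrix is a partitioned square matrix whose main diagonal blocks are square matrices and whose off-diagonal blocks are null matrices.
Nice Property: The inverse of a block diagonal matrix is the block diagonal matrix formed by the inverses of its diagonal blocks (provided each block is invertible).
\[
A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_n \end{bmatrix}
\;\Rightarrow\;
A^{-1} = \begin{bmatrix} A_1^{-1} & 0 & \cdots & 0 \\ 0 & A_2^{-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_n^{-1} \end{bmatrix}
\]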
4.1 Partitioned Matrix: Partitioned OLS Solution
OLS solution:
b = (X′X)^{-1}X′y
Use of the partitioned inverse result produces a fundamental result: The Frisch-Waugh (1933) Theorem. Partition X = [X1 X2] and b accordingly:
\[
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
= \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}
\]
For this, we need the southeast element of the inverse of X′X, denoted [ ]^{-1}(2,2).
With the partitioned inverse, we get: b2 = [ ]^{-1}(2,1) X1′y + [ ]^{-1}(2,2) X2′y
4.1 Partitioned Matrix: Partitioned OLS Solution
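b2 = [ ]^{-1}(2,1) X1′y + [ ]^{-1}(2,2) X2′y
We will derive later:
\[
\text{1. Matrix } X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}
\]
\[
\text{2. Inverse} = \begin{bmatrix}
(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2 \, D \, X_2'X_1(X_1'X_1)^{-1} & -(X_1'X_1)^{-1}X_1'X_2 \, D \\
-D \, X_2'X_1(X_1'X_1)^{-1} & D
\end{bmatrix}
\]
\[
\text{where } D = [X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2]^{-1}
= [X_2'(I - X_1(X_1'X_1)^{-1}X_1')X_2]^{-1}
= [X_2'M_1X_2]^{-1}
\]
The algebraic result is: [ ]^{-1}(2,1) = -D X2′X1(X1′X1)^{-1}
[ ]^{-1}(2,2) = D = [X2′M1X2]^{-1}
=> b2 = -D X2′X1(X1′X1)^{-1}X1′y + D X2′y = D X2′M1y = [X2′M1X2]^{-1}X2′M1y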
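A small numerical check of the result b2 = [X2′M1X2]^{-1}X2′M1y may help. The sketch below assumes the simplest case, chosen for illustration only: X1 is a column of ones (so M1 simply subtracts the sample mean) and X2 is a single regressor. The data values are hypothetical.

```python
from fractions import Fraction

x = [Fraction(v) for v in (1, 2, 4, 7)]   # hypothetical regressor (X2)
y = [Fraction(v) for v in (2, 3, 7, 9)]   # hypothetical dependent variable
N = len(x)

# Full regression of y on [iota, x]: solve the 2x2 normal equations X'X b = X'y.
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(u * v for u, v in zip(x, y))
det = N * sxx - sx * sx
b1_full = (sxx * sy - sx * sxy) / det
b2_full = (N * sxy - sx * sy) / det

# Partitioned route: with X1 = iota, M1 z = z - mean(z) for any vector z.
m1x = [v - sx / N for v in x]
m1y = [v - sy / N for v in y]
b2_fw = sum(u * v for u, v in zip(m1x, m1y)) / sum(u * u for u in m1x)
```

Both routes give the same slope coefficient, which is what the partitioned formula predicts for this special case.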
4.1 Properties of Symmetric Matrices
Definition:
If A' = A, then A is called a symmetric matrix.
Theorems:
- If A and B are nxn symmetric matrices, then (AB)' = BA
- If A and B are nxn symmetric matrices, then (A+B)' = B+A
- If C is any nxn matrix, then B = C'C is symmetric.
Useful symmetric matrices:
V = X′X
P = X(X′X)^{-1}X′     P: Projection matrix
M = I – P = I - X(X′X)^{-1}X′     M: Residual maker
4.1 Application 1: Linear System
There is a functional form relating a dependent variable y and k explanatory variables X. The functional form is linear, but it depends on k unknown parameters, β. The relation between y and X is not exact. There is an error, ε. We have T observations of y and X.
Then, the data is generated according to:
yi = Σj=1,..,k xj,i βj + εi     i = 1, 2, ...., T.
Or using matrix notation:
y = Xβ + ε     where y & ε are (Tx1); X is (Txk); and β is (kx1).
We will call this relation the data generating process (DGP).
The goal of econometrics is to estimate the unknown vector β.
4.1 Application 2: System of Equations
Assume an economic model as a system of linear equations with: aij parameters, where i = 1,...,n rows, j = 1,...,m columns, and n = m; xi endogenous variables; di exogenous variables and constants.
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m &= d_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m &= d_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m &= d_n
\end{aligned}
\]
Q: What is the nature of the set of solutions to this system of equations?
A general matrix form of a system of linear equations: Ax = d, where
A = matrix of parameters
x = column vector of endogenous variables
d = column vector of exogenous variables and constants
Solve for x*.
\[
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\qquad Ax = d
\]
Q: For what combinations of A and d will there be zero, one, or many (an infinite number of) solutions? How do we compute (characterize) those sets of solutions?
4.1 Solution of a General Equation System
Theorem: Given A (mxn). If A has a right-inverse, then the equation Ax = d has at least one solution for every d (mx1).
Proof: Pick an arbitrary d. Let H be a right-inverse (so AH = Im). Define x* = Hd. Thus, Ax* = AHd = Im d = d => x* is a solution. ■
Theorem: Given A (mxn). If A has a left-inverse, then the equation Ax=d has at most one solution for every d (mx1). That is, if Ax=d has a solution x* for a particular d, then x* is unique.
Proof: Suppose x* is a solution and z* is another solution. Thus, Ax* = d and Az* = d. Let G be a left-inverse for A (so GA = In).
Ax* = d => GAx* = Gd => In x* = x* = Gd.
Az* = d => GAz* = Gd => In z* = z* = Gd.
Thus, x* = z* = Gd. ■
Assume the 2x2 model:
2x + y = 12
4x + 2y = 24
Find x*, y*:
y = 12 – 2x
4x + 2(12 – 2x) = 24
4x + 24 – 4x = 24
0 = 0 ? Indeterminate!
Why?
4x + 2y = 24
2(2x + y) = 2(12)
2x + y = 12 => one equation with two unknowns
Conclusion: Not all simultaneous equation models have unique solutions (not all matrices have inverses).
Problem with the previous proof? We’re assuming the left-inverse exists (and there’s always a solution).
Theorem: Given A (mxn) invertible. Then, the equation Ax = d has one and only one solution for every d (mx1).
Proof: Trivial from previous two theorems.
4.1 Linear dependence
A set of vectors is linearly dependent if at least one of them can be expressed as a linear combination of the remaining vectors; otherwise, it is linearly independent.
Formal definition: Linear independence (LI)
The set {u1,...,uk} is called a linearly independent set of vectors iff
c1 u1 + ... + ck uk = θ => c1 = c2 = ... = ck = 0,
where θ is the zero vector.
Notes:
- Dependence prevents solving a system of equations uniquely: there are more unknowns than independent equations.
- The maximum number of linearly independent rows (or, equivalently, columns) in a matrix is the rank of the matrix (rank(A)).
4.1 Linear dependence
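Examples:
\[
v_1' = \begin{bmatrix} 5 & 12 \end{bmatrix}, \quad
v_2' = \begin{bmatrix} 10 & 24 \end{bmatrix}, \quad
A = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 5 & 10 \\ 12 & 24 \end{bmatrix}
\]
\[
2v_1' - v_2' = 0' \;\Rightarrow\; rank(A) = 1
\]
\[
v_1 = \begin{bmatrix} 2 \\ 7 \end{bmatrix}; \quad
v_2 = \begin{bmatrix} 1 \\ 8 \end{bmatrix}; \quad
v_3 = \begin{bmatrix} 4 \\ 5 \end{bmatrix}; \quad
A = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 4 \\ 7 & 8 & 5 \end{bmatrix}
\]
\[
3v_1 - 2v_2 = \begin{bmatrix} 6 \\ 21 \end{bmatrix} - \begin{bmatrix} 2 \\ 16 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} = v_3
\]
\[
3v_1 - 2v_2 - v_3 = 0 \;\Rightarrow\; rank(A) = 2
\]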
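The rank of a matrix can be computed by reducing it to row echelon form with elementary row operations and counting the pivots. The sketch below is one straightforward way to do this in Python with exact fractions; the function name `rank` is an assumption for the example, and the matrices are those from the linear dependence examples plus the coefficient matrix of the system 2x + y = 12, 4x + 2y = 24.

```python
from fractions import Fraction

def rank(A):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        # look for a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                               # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]            # switching
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            # addition: subtract a multiple of the pivot row
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

rank_1 = rank([[5, 10], [12, 24]])          # columns v1, v2 with v2 = 2 v1
rank_2 = rank([[2, 1, 4], [7, 8, 5]])       # columns v1, v2, v3
rank_sys = rank([[2, 1], [4, 2]])           # coefficients of the 2x2 model
```

A rank below the number of unknowns, as in the last case, signals that the system cannot have a unique solution.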
4.2 Application 1: One Commodity Market Model (2x2 matrix)
Economic Model
1) Qd = a – bP (a,b >0)
2) Qs = -c + dP (c,d >0)
3) Qd = Qs
Find P* and Q*
Scalar Algebra form
(Endogenous Vars :: Constants)
4) 1Q + bP = a
5) 1Q – dP = -c
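Matrix algebra
\[
\begin{bmatrix} 1 & b \\ 1 & -d \end{bmatrix}
\begin{bmatrix} Q \\ P \end{bmatrix}
= \begin{bmatrix} a \\ -c \end{bmatrix}
\qquad Ax = d
\]
\[
\begin{bmatrix} Q^* \\ P^* \end{bmatrix}
= \begin{bmatrix} 1 & b \\ 1 & -d \end{bmatrix}^{-1}
\begin{bmatrix} a \\ -c \end{bmatrix}
\qquad x^* = A^{-1}d
\]
\[
P^* = \frac{a + c}{b + d}, \qquad Q^* = \frac{ad - bc}{b + d}
\]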
4.2 Application 2: Finite Markov Chains
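Markov processes are used to measure movements over time.
Employees at time 0 are distributed over two plants A & B:
\[
x_0' = \begin{bmatrix} A_0 & B_0 \end{bmatrix} = \begin{bmatrix} 100 & 100 \end{bmatrix}
\]
The employees stay and move between each plant w/ a known probability:
\[
M = \begin{bmatrix} P_{AA} & P_{AB} \\ P_{BA} & P_{BB} \end{bmatrix}
= \begin{bmatrix} .7 & .3 \\ .4 & .6 \end{bmatrix}
\]
At the end of one year, how many employees will be at each plant?
\[
x_1' = x_0'M = \begin{bmatrix} A_0 & B_0 \end{bmatrix}
\begin{bmatrix} P_{AA} & P_{AB} \\ P_{BA} & P_{BB} \end{bmatrix}
= \begin{bmatrix} A_0P_{AA} + B_0P_{BA} & A_0P_{AB} + B_0P_{BB} \end{bmatrix}
\]
\[
= \begin{bmatrix} 100 & 100 \end{bmatrix} \begin{bmatrix} .7 & .3 \\ .4 & .6 \end{bmatrix}
= \begin{bmatrix} .7(100) + .4(100) & .3(100) + .6(100) \end{bmatrix}
= \begin{bmatrix} 110 & 90 \end{bmatrix}
\]
At the end of two years, how many employees will be at each plant?
\[
x_2' = x_1'M = x_0'M^2
= \begin{bmatrix} 110 & 90 \end{bmatrix} \begin{bmatrix} .7 & .3 \\ .4 & .6 \end{bmatrix}
= \begin{bmatrix} .7(110) + .4(90) & .3(110) + .6(90) \end{bmatrix}
= \begin{bmatrix} 113 & 87 \end{bmatrix}
\]
After k years:
\[
x_k' = \begin{bmatrix} A_k & B_k \end{bmatrix} = x_0'M^k
\]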
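The two-plant example can be reproduced by repeatedly post-multiplying the row vector of employees by the transition matrix M. This is a minimal sketch; the function names `step` and `distribution` are assumptions for the example, and exact fractions are used so the head counts come out without rounding error.

```python
from fractions import Fraction

# Transition matrix: row = current plant, column = plant next year.
M = [[Fraction(7, 10), Fraction(3, 10)],
     [Fraction(4, 10), Fraction(6, 10)]]

def step(x, M):
    """One period ahead: the row vector x' M."""
    return [sum(x[i] * M[i][j] for i in range(len(x)))
            for j in range(len(M[0]))]

def distribution(x0, M, k):
    """Distribution after k periods: x0' M^k, computed step by step."""
    x = list(x0)
    for _ in range(k):
        x = step(x, M)
    return x

x0 = [100, 100]
x1 = distribution(x0, M, 1)   # end of year one
x2 = distribution(x0, M, 2)   # end of year two
```

Because each row of M sums to one, the total number of employees stays at 200 in every period.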
4.3 Definite Matrices - Forms
A form is a polynomial expression in which each component term has a uniform degree. A quadratic form has a uniform second degree.
Examples:
9x + 3y + 2z - first degree form.
6x^2 + 2xy + 2y^2 - second degree (quadratic) form.
x^2 z + 2yz^2 + 2y^3 - third degree (cubic) form.
A quadratic form can be written as: x′A x, where A is a symmetric matrix.
4.3 Definite Matrices - Forms
For one variable, a quadratic form is the familiar: y = a x^2
If a>0, then a x^2 is always non-negative, and equals 0 only when x=0. We call a form like this positive definite.
If a<0, then a x^2 is always non-positive, and equals 0 only when x=0. We call a form like this negative definite.
There are two intermediate cases, where the form can be equal to 0 for some non-zero values of x: negative/positive semidefinite.
For a general quadratic form, y = x′A x, we say the form is
- Positive definite if y is invariably positive (y > 0) for all x ≠ 0
- Positive semi-definite if y is invariably non-negative (y ≥ 0)
- Negative semi-definite if y is invariably non-positive (y ≤ 0)
- Negative definite if y is invariably negative (y < 0) for all x ≠ 0
- Indefinite if y changes signs.
4.3 Definite Matrices - Definition
A quadratic form is said to be indefinite if y changes signs.
A symmetric (n×n) matrix A is called positive definite (pd), positive semidefinite (psd), negative semidefinite (nsd) or negative definite (nd) according to the corresponding sign of the quadratic form, y.
For example, if y = x′A x is positive for any non-zero vector x of n real numbers, we say A is positive definite.
Example: Let A = X′X.
Then, z′A z = z′X′X z = v′v ≥ 0, where v = Xz. If X has full column rank, v ≠ 0 for any z ≠ 0, so v′v > 0. ⇒ X′X is pd
In general, we use eigenvalues to determine the definiteness of a matrix (and quadratic form).
4.4 Upper and Lower Triangular Matrices
\[
LT: \begin{bmatrix} 0 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix}
\qquad
UT: \begin{bmatrix} 1 & 2 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 1 \end{bmatrix}
\]
A square (nxn) matrix C is:
- Upper Triangular (UT) iff Cij = 0 for i > j (if the diagonal elements are all equal to 1, we have an upper-unit triangular (UUT) matrix)
- Lower Triangular (LT) iff Cij = 0 for i < j (if the diagonal elements are all equal to 1, we have a lower-unit triangular (LUT) matrix)
- Diagonal (D) iff Cij = 0 for i ≠ j
• Theorems:
- The product of two UT (UUT) matrices is UT (UUT).
- The product of two LT (LUT) matrices is LT (LUT).
- The product of two D matrices is D.
4.4 UT & LT Matrices – LU Factorization
An (nxn) matrix A can be factorized, with proper row and/or column permutations, into two factors, an LT matrix L and a UT matrix U:
\[
A = LU = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\times \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}
\]
Without permutations in A, the factorization may fail. We have an n^2 by n^2 system. For example, given a11 = l11 u11, if a11 = 0, then at least one of l11 & u11 has to be 0, which implies either L or U is singular (impossible if A is non-singular).
A proper permutation matrix, P, is enough for LU factorization. It is called LU factorization with Partial Pivoting (or PA = LU).
4.4 UT & LT Matrices – Forward Substitution
• The LU decomposition requires 2n^3/3 (plus lower order terms) operations or “flops”, i.e., floating point operations (+, -, x, /). When n is large, n^3 dominates; we describe this situation with “order n^3” or O(n^3).
• Q: Why are we interested in these matrices?
Suppose Ax=d, where A is LT (with non-zero diagonal terms).
Then, the solutions are recursive (forward substitution).
Example:
x1 = d1/a11
a21 x1 + a22 x2 = d2
a31 x1 + a32 x2 + a33 x3 = d3
Note: For an nxn matrix A, this process involves n^2 flops.
4.4 UT & LT Matrices – Back Substitution
• Similarly, suppose Ax=d, where A is UT (with non-zero diagonal terms). Then, the solutions are recursive (backward substitution).
Example:
a11 x1 + a12 x2 + a13 x3 = d1
a22 x2 + a23 x3 = d2
x3 = d3/a33
Note: Again, for A (nxn), this process involves n^2 flops.
4.4 UT & LT Matrices – Linear Systems
• Finding a solution to Ax=d
Given A (nxn). Suppose we can decompose A into A=LU, where L is LUT and U is UT (with non-zero diagonal).
Then Ax=d => LUx = d.
Suppose L is invertible => Ux = L^{-1}d = c (or d = Lc) => solve by forward substitution for c.
Then, Ux = c (Gaussian elimination) => solve by backward substitution for x.
• Theorem: If A (nxn) can be decomposed A=LU, where L is LUT and U is UT (with non-zero diagonal), then Ax=d has a unique solution for every d.
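The two-stage solution (forward substitution for c, then backward substitution for x) can be sketched as follows. This is an illustrative implementation under stated assumptions: it uses the Doolittle variant (L with unit diagonal) without pivoting, so it fails on a zero pivot, which is the case where a row permutation would be needed. The function names and the right-hand side `d` are hypothetical.

```python
from fractions import Fraction

def lu(A):
    """Doolittle factorization A = LU without pivoting (L has unit diagonal).

    A zero pivot raises ZeroDivisionError; that is the situation in which
    a row permutation (PA = LU) would be required.
    """
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(v) for v in row] for row in A]
    for j in range(n):
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]
            U[i] = [u_i - L[i][j] * u_j for u_i, u_j in zip(U[i], U[j])]
    return L, U

def forward(L, d):
    """Solve Lc = d for lower-triangular L, top row first."""
    c = []
    for i in range(len(d)):
        c.append((d[i] - sum(L[i][j] * c[j] for j in range(i))) / L[i][i])
    return c

def backward(U, c):
    """Solve Ux = c for upper-triangular U, bottom row first."""
    n = len(c)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
d = [4, 5, 7]                      # hypothetical right-hand side
L, U = lu(A)
x = backward(U, forward(L, d))     # solves Ax = d without forming an inverse
```

Once L and U are available, each additional right-hand side d costs only the two substitution passes.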
4.4 UT & LT Matrices – LDU Decomposition
• We can write a “symmetric” decomposition. Since U has non-zero diagonal terms, we can write U=DU*, where U* is UUT.
Example:
\[
U = \begin{bmatrix} 2 & 4 & 8 \\ 0 & 3 & 6 \\ 0 & 0 & 5 \end{bmatrix}; \quad
D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}; \quad
U^* = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\]
• Theorems:
- If we can write A (nxn) as A=LDU, where L is LUT, D is diagonal with non-zero diagonal elements, and U is UUT, then L, D, and U are unique.
- If we can write A (nxn) as A=LDU, and A is symmetric, then we can write A=LDL′.
4.4 Cholesky Decomposition
• Theorem: Cholesky decomposition
If A is a symmetric positive definite matrix (A symmetric, A=LDL′, and all diagonal elements of D are positive), then A = HH′.
Proof: Since A is symmetric, then A=LDL′. The product of a LUT matrix and a D matrix is a LT matrix. Let D*=D^{1/2}, with L a LUT matrix. Then H=LD* is LT and HH′ = LD*D*′L′ = LDL′ = A. ■
• H is called the Cholesky factor of A (‘square root’ of a pd matrix.)
• The Cholesky decomposition is unique. It is used in the numerical solution of systems of equations, non-linear optimization, Kalman filter algorithms, IRF of VARs, etc.
4.4 Cholesky decomposition: Algorithm
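• Let’s partition matrices A=HH′ as:
\[
\begin{bmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}
= \begin{bmatrix} l_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} l_{11} & L_{21}^T \\ 0 & L_{22}^T \end{bmatrix}
= \begin{bmatrix} l_{11}^2 & l_{11}L_{21}^T \\ l_{11}L_{21} & L_{21}L_{21}^T + L_{22}L_{22}^T \end{bmatrix}
\]
• Algorithm
1. Determine l11 and L21: l11 = √a11 & L21 = (1/l11) A21
(if A is pd => a11 > 0)
2. Compute L22 from A22 − L21 L21^T = L22 L22^T
(if A is pd => A22 − L21 L21^T = A22 − A21 A21^T/a11 is pd)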
André-Louis Cholesky (1875–1918, France)
• Example: [worked numerical example shown on the slides]
Note: Again, for A (nxn), the Cholesky decomposition involves n^3/3 flops.
4.4 Cholesky decomposition: Application
• System of Equations
If A is a positive definite matrix, then we can solve Ax = b by
(1) Compute the Cholesky decomposition A=HH′.
(2) Solve Hy = b for y (forward solution).
(3) With y known, solve H′x = y for x (backward solution).
Q: How many flops? Step (1): n^3/3 flops, Steps (2)+(3): 2n^2 flops.
Note: A^{-1} is not computed (the Gauss-Jordan method needs 4n^3 flops).
• Ordinary Least Squares (OLS)
Systems of the form Ax = b with A symmetric and pd are common in economics. For example, the normal equations in OLS problems are of this form (the unknown is b):
(y - Xb)′ X = 0 => X′X b = X′y
No need to compute (X′X)^{-1} (=A^{-1}) to solve for b.
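A compact Python sketch of the Cholesky factor is given below. It computes H column by column (an inner-product arrangement of the same recursion as the partitioned algorithm, not a literal transcription of it). The function name `cholesky` and the 3x3 test matrix are assumptions for the example; the matrix was picked so that all square roots are whole numbers.

```python
import math

def cholesky(A):
    """Return the lower-triangular H with A = H H' for a symmetric pd matrix A.

    Raises ValueError if a non-positive pivot appears, which indicates
    that A is not positive definite.
    """
    n = len(A)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # diagonal element: what is left of a_jj after earlier columns
        s = A[j][j] - sum(H[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        H[j][j] = math.sqrt(s)
        # elements below the diagonal in column j
        for i in range(j + 1, n):
            H[i][j] = (A[i][j] - sum(H[i][k] * H[j][k] for k in range(j))) / H[j][j]
    return H

# Hypothetical symmetric positive definite matrix.
A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
H = cholesky(A)

# Rebuild A from the factor as a check: (H H')_ij = sum_k h_ik h_jk
HHt = [[sum(H[i][k] * H[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]
```

With H in hand, Ax = b can be solved by one forward pass on Hy = b and one backward pass on H′x = y, as in steps (2) and (3).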
4.5 Inverse matrix (Again)
Review
- AA^{-1} = I
- A^{-1}A = I
- A matrix must be square to have a unique (two-sided) inverse.
- If an inverse exists for a square matrix, it is unique.
- (A')^{-1} = (A^{-1})′
- If A is pd, then A^{-1} = H′^{-1}H^{-1}
- Solution to A x = d
A^{-1}A x* = A^{-1} d
I x* = A^{-1} d => x* = A^{-1} d (solution depends on A^{-1})
- Linear dependence is a problem for getting x*
- Determinant test! (coming soon)
4.5 Inverse of a Matrix: Calculation
\[
[A \mid I] = \left[\begin{array}{ccc|ccc}
a & b & c & 1 & 0 & 0 \\
d & e & f & 0 & 1 & 0 \\
g & h & i & 0 & 0 & 1
\end{array}\right]
\]
Process:
• Append the identity matrix to A.
• Scale rows and subtract multiples of other rows to reduce each diagonal element to 1 and each off-diagonal element to 0.
• Transform I as you go.
• When the original A matrix becomes I, the original identity has become A^{-1}.
\[
[I \mid A^{-1}] = \left[\begin{array}{ccc|ccc}
1 & 0 & 0 & r & s & t \\
0 & 1 & 0 & u & v & w \\
0 & 0 & 1 & x & y & z
\end{array}\right]
\]
• Theorem: Let A be an invertible (nxn) matrix. Suppose that a sequence of elementary row-operations reduces A to the identity matrix. Then, the same sequence of elementary row-operations when applied to the identity matrix yields A^{-1}.
4.5 Determination of the Inverse(Gauss-Jordan Elimination)
AX = I, where A, X and I are all (nxn) square matrices => X = A^{-1}
1) Augmented matrix: [A | I]
2) Transform the augmented matrix (using elementary row operations):
[A | I] → (Gauss elimination) → [UT | H] → (further row operations: Gauss-Jordan elimination) → [I | K]
I X = K
I X = X = A^{-1} => K = A^{-1}
Wilhelm Jordan (1842– 1899, Germany)
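4.5 Gauss-Jordan Elimination: Example 1
Find A^{-1} using the Gauss-Jordan method.
\[
A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\]
Process: Expand A|I. Start scaling and adding rows to get I|A^{-1}.
Notation: Ri(c) multiplies row i by c; Rij(c) adds c times row j to row i.
1. Start from A|I and apply R1(1/2):
\[
\left[\begin{array}{ccc|ccc}
2 & 1 & 1 & 1 & 0 & 0 \\
1 & 2 & 1 & 0 & 1 & 0 \\
1 & 1 & 2 & 0 & 0 & 1
\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
1 & 2 & 1 & 0 & 1 & 0 \\
1 & 1 & 2 & 0 & 0 & 1
\end{array}\right]
\]
2. R21(-1) & R31(-1):
\[
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{3}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & 1 & 0 \\
0 & \tfrac{1}{2} & \tfrac{3}{2} & -\tfrac{1}{2} & 0 & 1
\end{array}\right]
\]
3. R2(2/3):
\[
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
0 & 1 & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{2}{3} & 0 \\
0 & \tfrac{1}{2} & \tfrac{3}{2} & -\tfrac{1}{2} & 0 & 1
\end{array}\right]
\]
4. R32(-1/2):
\[
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
0 & 1 & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{2}{3} & 0 \\
0 & 0 & \tfrac{4}{3} & -\tfrac{1}{3} & -\tfrac{1}{3} & 1
\end{array}\right]
\]
5. R3(3/4):
\[
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\
0 & 1 & \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{2}{3} & 0 \\
0 & 0 & 1 & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{array}\right]
\]
Steps 1-5: Gauss elimination.
6. R23(-1/3) & R13(-1/2):
\[
\left[\begin{array}{ccc|ccc}
1 & \tfrac{1}{2} & 0 & \tfrac{5}{8} & \tfrac{1}{8} & -\tfrac{3}{8} \\
0 & 1 & 0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} \\
0 & 0 & 1 & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{array}\right]
\]
7. R12(-1/2):
\[
\left[\begin{array}{ccc|ccc}
1 & 0 & 0 & \tfrac{6}{8} & -\tfrac{2}{8} & -\tfrac{2}{8} \\
0 & 1 & 0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} \\
0 & 0 & 1 & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{array}\right]
\]
8. I|A^{-1}:
\[
A^{-1} = \begin{bmatrix}
\tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\
-\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} \\
-\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{bmatrix}
\]
Steps 6-8: Gauss-Jordan elimination.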
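The same inversion can be reproduced mechanically. The sketch below is one possible Gauss-Jordan routine in Python with exact fractions; the function name `inverse` is an assumption for the example. Unlike the hand calculation, it clears each column above and below the pivot in a single sweep, so the intermediate matrices differ although the final result is the same.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    # augmented matrix [A | I] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # find a usable pivot in column c, at or below the diagonal
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[c], M[pivot] = M[pivot], M[c]          # switching
        M[c] = [v / M[c][c] for v in M[c]]       # scale pivot row so pivot = 1
        for r in range(n):
            if r != c:
                f = M[r][c]
                # addition: clear column c in every other row
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]                # right half is the inverse

A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
A_inv = inverse(A)
```

Calling `inverse` on a singular matrix such as `[[2, 1], [4, 2]]` raises a `ValueError`, since no pivot can be found for the second column.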
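4.5 Gauss-Jordan Elimination: Example 2
Partitioned inverse (using the Gauss-Jordan method), with blocks labeled XX, XY, YX and YY.
1. Augmented partitioned matrix:
\[
\left[\begin{array}{cc|cc}
XX & XY & I & 0 \\
YX & YY & 0 & I
\end{array}\right]
\]
2. Premultiply row 1 by (XX)^{-1}, then subtract YX times row 1 from row 2:
\[
\left[\begin{array}{cc|cc}
I & (XX)^{-1}XY & (XX)^{-1} & 0 \\
0 & YY - YX(XX)^{-1}XY & -YX(XX)^{-1} & I
\end{array}\right]
\]
3. Premultiply row 2 by D, where D = [YY - YX(XX)^{-1}XY]^{-1}:
\[
\left[\begin{array}{cc|cc}
I & (XX)^{-1}XY & (XX)^{-1} & 0 \\
0 & I & -D\,YX(XX)^{-1} & D
\end{array}\right]
\]
4. Subtract (XX)^{-1}XY times row 2 from row 1:
\[
\left[\begin{array}{cc|cc}
I & 0 & (XX)^{-1} + (XX)^{-1}XY\,D\,YX(XX)^{-1} & -(XX)^{-1}XY\,D \\
0 & I & -D\,YX(XX)^{-1} & D
\end{array}\right]
\]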
4.5 Gauss-Jordan Elimination: Computations
• Q: How many flops to invert a matrix with the G-J method?
A: Avoid inverses! But, if you must... The process of zeroing out one element of the left-hand matrix requires multiplying the line to be subtracted by a constant (2n flops), and subtracting it (2n flops). This must be done for (approximately) n^2 matrix elements. Thus, the number of flops is about equal to 4n^3 by the G-J method.
• Using a standard PC (100 gigaflops, i.e., 100 x 10^9 flops per second), for a 30x30 matrix, the time required is less than a millisecond, comparing favorably with 10^21+ years for the method of cofactors.
• More sophisticated (optimal) algorithms, taking advantage of zeros, i.e., the sparseness of the matrix, can improve to n^3 flops.
4.5 Matrix inversion: Note
It is not possible to divide one matrix by another. That is, we cannot write A/B. For two matrices A and B, the quotient can be written as AB^{-1} or B^{-1}A.
In general, in matrix algebra AB^{-1} ≠ B^{-1}A.
Thus, writing A/B does not clearly identify whether it represents AB^{-1} or B^{-1}A.
We’ll say B^{-1} post-multiplies A (for AB^{-1}) and
B^{-1} pre-multiplies A (for B^{-1}A)
Matrix division is matrix inversion.
4.6 Trace of a Matrix
The trace of an nxn matrix A is defined to be the sum of the elements on the main diagonal of A:
trace(A) = tr(A) = Σi aii,
where aii is the entry on the i-th row and i-th column of A.
Properties:
- tr(A + B) = tr(A) + tr(B)
- tr(cA) = c tr(A)
- tr(AB) = tr(BA)
- tr(ABC) = tr(CAB) (invariant under cyclic permutations.)
- tr(A) = tr(A^T)
- d tr(A) = tr(dA) (differential of trace)
- tr(A) = rank(A) when A is idempotent, i.e., A = A^2.
4.6 Application: Rank of the Residual Maker
We define M, the residual maker, as:
M = In - X(X′X)^{-1}X′ = In - P
where X is an nxk matrix, with rank(X)=k
Let’s calculate the trace of M:
tr(M) = tr(In) - tr(P) = n - k
- tr(In) = n
- tr(P) = k
Recall tr(ABC) = tr(CAB)
=> tr(P) = tr(X(X′X)^{-1}X′) = tr(X′X(X′X)^{-1}) = tr(Ik) = k
Since M is an idempotent matrix, i.e., M = M^2, then
rank(M) = tr(M) = n - k
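The trace result can be verified on a tiny data set. The sketch below assumes a hypothetical X with n = 3 observations and k = 2 columns (a constant and one regressor), so that X′X is 2x2 and can be inverted with the adjugate formula. All helper names are assumptions for the example.

```python
from fractions import Fraction

def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Inverse of a 2x2 matrix via adj(A)/|A|."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(A):
    """Sum of the main diagonal."""
    return sum(A[i][i] for i in range(len(A)))

X = [[1, 1], [1, 2], [1, 4]]          # hypothetical data, n = 3, k = 2
Xt = transpose(X)
n = len(X)

P = matmul(matmul(X, inv2(matmul(Xt, X))), Xt)     # projection matrix
M = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
MX = matmul(M, X)                                  # residuals of X on itself
```

Here tr(P) = 2 = k and tr(M) = 1 = n - k, and M annihilates X, which is why it is called the residual maker.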
4.7 Determinant of a Matrix
The determinant is a number associated with any square matrix.
If A is an nxn matrix, the determinant is |A| or det(A).
Since the early days, a determinant was used to “determine” if a system of linear equations has a unique solution.
Cramer (1750) expanded the concept to sets of equations, but a bit later, determinants were recognized as independent functions by Vandermonde (1772).
Determinants are used to characterize invertible matrices. A matrix is invertible (non-singular) if and only if |A|≠0.
That is, |A|≠0 <=> A is invertible or non-singular.
Can be found using factorials, pivots, and cofactors!
Lots of interpretations.
4.7 Determinant of a Matrix
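When n is small, determinants are used for inversion and to solve systems of equations.
Example: Inverse of a 2x2 matrix:
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\qquad |A| = \det(A) = ad - bc
\]
\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]
The matrix on the right is called the adjugate of A (or adj(A)).
A^{-1} = adj(A)/|A|
4.7 Determinant of a Matrix (3x3)
\[
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= aei + bfg + cdh - afh - bdi - ceg
\]
Sarrus’ Rule: Sum the products along the diagonals running from left to right. Then, subtract the products along the diagonals running from right to left.
Note: N! terms
Q: How many flops? For A (3x3), we count 17 operations.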
4.7 Determinants: Laplace formula
The determinant of a matrix of arbitrary size can be defined by the Leibniz formula or the Laplace formula.
The Laplace formula (or expansion) expresses the determinant |A| as a sum of n determinants of (n-1) × (n-1) sub-matrices of A. There are 2n such expressions, one for each row and column of A.
Define the i,j minor Mij (usually written as |Mij|) of A as the determinant of the (n-1) × (n-1) matrix that results from deleting the i-th row and the j-th column of A.
Pierre-Simon Laplace (1749–1827, France).
Define Ci,j, the (i,j) cofactor of A, as:
\[
C_{i,j} = (-1)^{i+j} |M_{i,j}|
\]
• The cofactor matrix of A, denoted by C, is defined as the nxn matrix whose (i,j) entry is the (i,j) cofactor of A. The transpose of C is called the adjugate or adjoint of A, adj(A).
• Theorem (Determinant as a Laplace expansion)
Suppose A = [aij] is an nxn matrix and i, j ∈ {1, 2, ..., n}. Then the determinant
\[
|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}
= a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}
\]
Example:
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & 0 \\ 2 & 4 & 6 \end{bmatrix}
\]
\[
|A| = 1 \times C_{11} + 2 \times C_{12} + 3 \times C_{13}
= 1 \times (-1 \times 6) + 2 \times (-1) \times (0) + 3 \times (-(-1) \times 2) = 0
\]
\[
|A| = 2 \times (0) + (-1)(1 \times 6 - 3 \times 2) + 4 \times (0) = 0
\quad \text{(expanding along the second column)}
\]
|A|=0 => The matrix is singular. (Check!)
How many flops? For A (3x3), we count 14 operations (better!). For A (nxn), we calculate n subdeterminants, each of which requires (n-1) subdeterminants, etc. Then, computations are of order n! (plus some n terms), or O(n!).
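The Laplace expansion translates directly into a recursive function. The sketch below always expands along the first row; the names `minor` and `det` are assumptions for the example, and the O(n!) cost discussed above applies, so it is meant for small matrices only.

```python
def minor(A, i, j):
    """Sub-matrix of A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # cofactor sign is (-1)^(i+j); with 0-based row 0 this is (-1)^j
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[1, 2, 3], [0, -1, 0], [2, 4, 6]]   # third row = 2 x first row
det_A = det(A)                            # singular matrix

B = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]     # hypothetical non-singular matrix
det_B = det(B)
```

The first determinant is zero because the rows of A are linearly dependent; the second is non-zero, so B is invertible.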
4.7 Determinants: Properties
Interchange of rows and columns does not affect |A|. (Corollary, |A| = |A’|.)
To any row (column) of A we can add any multiple of any other row (column) without changing |A|.
(Corollary, if we transform A into U or L , |A|=|U| = |L|, which is equal to the product of the diagonal element of U or L.)
|I| = 1, where I is the identity matrix.
|kA| = k^n |A|, where k is a scalar.
|A| = |A′|.
|AB| = |A||B|.
|A^{-1}| = 1/|A|.
Recursive flops formula: flops_n = n × (flops_{n-1} + 2) - 1
4.7 Determinants: Computations
By today’s standards, a 30×30 matrix is small. Yet it would be impossible to calculate a 30×30 determinant by the Laplace formula. It would require over n! (30! ≈ 2.65 × 10^32) multiplications.
If a computer performs one quadrillion (1.0 × 10^15) multiplications per second (a petaflops, the 2008 record), it would have to run for over 8.4 billion years to compute a 30×30 determinant by Laplace’s method.
Using the fastest computer as of 2013 (China’s Tianhe-2, 33 petaflops), it would take 254 million years.
Not a very useful method, computationally speaking. Avoid factorials!
4.7 Determinants: Computations
Faster way of evaluating the determinant: Bring the matrix to UT (or LT) form by linear transformations. Then, the determinant is equal to the product of the diagonal elements.
For A (nxn), each linear transformation involves adding a multiple of one row to another row, that is, n or fewer additions and n or fewer multiplications. Since there are n rows, this is a procedure of order n^3, or O(n^3).
Example: For n = 30, we go from 30! = 2.65 × 10^32 flops to 30^3 = 27,000 flops.
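4.7 Determinants: Cramer’s Rule - Derivation
• Recall the solution to Ax=d, where A is an nxn matrix: x* = A^{-1}d
Using the cofactor method to get the inverse we get:
\[
x^* = A^{-1}d = \frac{1}{|A|}\,\text{adj}(A)\,d
\]
\[
\begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}
\begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix} \sum_{i=1}^{n} d_i C_{i1} \\ \sum_{i=1}^{n} d_i C_{i2} \\ \vdots \\ \sum_{i=1}^{n} d_i C_{in} \end{bmatrix}
\]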
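• Example: Let A be 3x3. Then,
\[
\text{1)} \quad
\begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix}
d_1C_{11} + d_2C_{21} + d_3C_{31} \\
d_1C_{12} + d_2C_{22} + d_3C_{32} \\
d_1C_{13} + d_2C_{23} + d_3C_{33}
\end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix} \sum_{i=1}^{3} d_i C_{i1} \\ \sum_{i=1}^{3} d_i C_{i2} \\ \sum_{i=1}^{3} d_i C_{i3} \end{bmatrix}
\]
\[
\text{2)} \quad \sum_{i=1}^{3} d_i C_{i1} = d_1C_{11} + d_2C_{21} + d_3C_{31},
\quad \text{where } C_{ij} = (-1)^{i+j}|M_{ij}|
\]
\[
\text{3)} \quad \sum_{i=1}^{3} d_i C_{i1}
= d_1 \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- d_2 \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
+ d_3 \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
\]
\[
\text{4) Find } A_1 \text{ such that } |A_1| = \sum_{i=1}^{3} d_i C_{i1}: \quad
|A_1| = \begin{vmatrix} d_1 & a_{12} & a_{13} \\ d_2 & a_{22} & a_{23} \\ d_3 & a_{32} & a_{33} \end{vmatrix}
\;\Rightarrow\; x_1^* = \frac{|A_1|}{|A|}
\]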
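\[
x_1^* = \frac{\sum_{i=1}^{3} d_i C_{i1}}{\sum_{i=1}^{3} a_{i1} C_{i1}}
= \frac{\begin{vmatrix} d_1 & a_{12} & a_{13} \\ d_2 & a_{22} & a_{23} \\ d_3 & a_{32} & a_{33} \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
= \frac{|A_1|}{|A|}
\]
\[
x_2^* = \frac{\sum_{i=1}^{3} d_i C_{i2}}{\sum_{i=1}^{3} a_{i2} C_{i2}}
= \frac{\begin{vmatrix} a_{11} & d_1 & a_{13} \\ a_{21} & d_2 & a_{23} \\ a_{31} & d_3 & a_{33} \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
= \frac{|A_2|}{|A|}
\]
\[
x_3^* = \frac{\sum_{i=1}^{3} d_i C_{i3}}{\sum_{i=1}^{3} a_{i3} C_{i3}}
= \frac{\begin{vmatrix} a_{11} & a_{12} & d_1 \\ a_{21} & a_{22} & d_2 \\ a_{31} & a_{32} & d_3 \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
= \frac{|A_3|}{|A|}
\]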
Gabriel Cramer (1704-1752, Switzerland).
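• Following the pattern, we have the general Cramer’s rule:
\[
\begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \\ \vdots \\ x_n^* \end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix} \sum_{i=1}^{n} d_i C_{i1} \\ \sum_{i=1}^{n} d_i C_{i2} \\ \sum_{i=1}^{n} d_i C_{i3} \\ \vdots \\ \sum_{i=1}^{n} d_i C_{in} \end{bmatrix}
= \begin{bmatrix} |A_1|/|A| \\ |A_2|/|A| \\ |A_3|/|A| \\ \vdots \\ |A_n|/|A| \end{bmatrix}
\]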
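4.7 Cramer’s Rule Application: Macro Model
Matrix form
\[
\begin{bmatrix} 1 & -1 & -1 \\ -b & 1 & 0 \\ -g & 0 & 1 \end{bmatrix}
\begin{bmatrix} Y \\ C \\ G \end{bmatrix}
= \begin{bmatrix} I_0 \\ a - bT_0 \\ 0 \end{bmatrix}
\]
The determinant
\[
|A| = \begin{vmatrix} 1 & -1 & -1 \\ -b & 1 & 0 \\ -g & 0 & 1 \end{vmatrix} = 1 - b - g
\]
• Applying Cramer’s rule for the 3x3 case:
\[
Y^* = \frac{|A_Y|}{|A|}
= \frac{\begin{vmatrix} I_0 & -1 & -1 \\ a - bT_0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}}{|A|}
= \frac{I_0 + a - bT_0}{1 - (b + g)}
\]
\[
C^* = \frac{|A_C|}{|A|}
= \frac{\begin{vmatrix} 1 & I_0 & -1 \\ -b & a - bT_0 & 0 \\ -g & 0 & 1 \end{vmatrix}}{|A|}
= \frac{bI_0 + (1 - g)(a - bT_0)}{1 - (b + g)}
\]
\[
G^* = \frac{|A_G|}{|A|}
= \frac{\begin{vmatrix} 1 & -1 & I_0 \\ -b & 1 & a - bT_0 \\ -g & 0 & 0 \end{vmatrix}}{|A|}
= \frac{g(I_0 + a - bT_0)}{1 - (b + g)}
\]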
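Cramer's rule for this model can be checked numerically. The sketch below builds each A_j by replacing the j-th column of A with d and divides its determinant by |A|. The parameter values (a, b, g, I0, T0) are hypothetical, chosen only so the arithmetic is easy to follow; the function names `det` and `cramer` are likewise assumptions for the example.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 0, col j
        total += (-1) ** j * A[0][j] * det(sub)
    return total

def cramer(A, d):
    """Solve Ax = d by Cramer's rule: x_j = |A_j| / |A|."""
    dA = det(A)
    if dA == 0:
        raise ValueError("|A| = 0: no unique solution")
    solution = []
    for j in range(len(A)):
        # A_j: column j of A replaced by the vector d
        Aj = [row[:j] + [d[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(Aj)) / dA)
    return solution

# Hypothetical parameter values for illustration.
a, b, g = 50, Fraction(1, 2), Fraction(1, 4)
I0, T0 = 100, 20

A = [[1, -1, -1],
     [-b, 1, 0],
     [-g, 0, 1]]
d = [I0, a - b * T0, 0]

Y, C, G = cramer(A, d)
```

With these numbers |A| = 1 - b - g = 1/4, and the solution satisfies the three structural relations implied by the matrix form: Y = C + I0 + G, C = a + b(Y - T0) and G = gY.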
Ch. 4 - Notation and Definitions: Summary
A (Upper case letters) = matrix
b (Lower case letters) = vector
nxm = n rows, m columns
rank(A) = number of linearly independent vectors of A
trace(A) = tr(A) = sum of diagonal elements of A
Null matrix = all elements equal to zero.
Diagonal matrix = all off-diagonal elements are zero.
I = identity matrix (diagonal elements: 1, off-diagonal: 0)
|A| = det(A) = determinant of A
A^{-1} = inverse of A
A′ = A^T = Transpose of A
|Mij| = Minor of A
A = A^T => Symmetric matrix
A^T A = A A^T => Normal matrix
A^T = A^{-1} => Orthogonal matrix
A = A^2 => Idempotent matrix