
DRAFT V1.2

From Math, Numerics, & Programming (for Mechanical Engineers)

Masayuki Yano, James Douglass Penn, George Konidaris, Anthony T Patera

September 2012

© The Authors. License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (CC BY-NC-SA 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and MIT OpenCourseWare source are credited; the use is non-commercial; and the CC BY-NC-SA license is retained. See also http://ocw.mit.edu/terms/ and http://creativecommons.org/licenses/by-nc-sa/3.0/us/ .

Contents

Unit III  Linear Algebra 1: Matrices and Least Squares. Regression.

15  Motivation

16  Matrices and Vectors: Definitions and Operations
    16.1  Basic Vector and Matrix Operations
        16.1.1  Definitions
            Transpose Operation
        16.1.2  Vector Operations
            Inner Product
            Norm (2-Norm)
            Orthogonality
            Orthonormality
        16.1.3  Linear Combinations
            Linear Independence
            Vector Spaces and Bases
    16.2  Matrix Operations
        16.2.1  Interpretation of Matrices
        16.2.2  Matrix Operations
            Matrix-Matrix Product
        16.2.3  Interpretations of the Matrix-Vector Product
            Row Interpretation
            Column Interpretation
            Left Vector-Matrix Product
        16.2.4  Interpretations of the Matrix-Matrix Product
            Matrix-Matrix Product as a Series of Matrix-Vector Products
            Matrix-Matrix Product as a Series of Left Vector-Matrix Products
        16.2.5  Operation Count of Matrix-Matrix Product
        16.2.6  The Inverse of a Matrix (Briefly)
    16.3  Special Matrices
        16.3.1  Diagonal Matrices
        16.3.2  Symmetric Matrices
        16.3.3  Symmetric Positive Definite Matrices
        16.3.4  Triangular Matrices
        16.3.5  Orthogonal Matrices
        16.3.6  Orthonormal Matrices
    16.4  Further Concepts in Linear Algebra
        16.4.1  Column Space and Null Space
        16.4.2  Projectors

17  Least Squares
    17.1  Data Fitting in Absence of Noise and Bias
    17.2  Overdetermined Systems
            Row Interpretation
            Column Interpretation
    17.3  Least Squares
        17.3.1  Measures of Closeness
        17.3.2  Least-Squares Formulation (l2 minimization)
        17.3.3  Computational Considerations
            QR Factorization and the Gram-Schmidt Procedure
        17.3.4  Interpretation of Least Squares: Projection
        17.3.5  Error Bounds for Least Squares
            Error Bounds with Respect to Perturbation in Data, g (constant model)
            Error Bounds with Respect to Perturbation in Data, g (general)
            Error Bounds with Respect to Reduction in Space, B

18  Matlab Linear Algebra (Briefly)
    18.1  Matrix Multiplication (and Addition)
    18.2  The Matlab Inverse Function: inv
    18.3  Solution of Linear Systems: Matlab Backslash
    18.4  Solution of (Linear) Least-Squares Problems

19  Regression: Statistical Inference
    19.1  Simplest Case
        19.1.1  Friction Coefficient Determination Problem Revisited
        19.1.2  Response Model
        19.1.3  Parameter Estimation
        19.1.4  Confidence Intervals
            Individual Confidence Intervals
            Joint Confidence Intervals
        19.1.5  Hypothesis Testing
        19.1.6  Inspection of Assumptions
            Checking for Plausibility of the Noise Assumptions
            Checking for Presence of Bias
    19.2  General Case
        19.2.1  Response Model
        19.2.2  Estimation
        19.2.3  Confidence Intervals
        19.2.4  Overfitting (and Underfitting)

Unit III

Linear Algebra 1: Matrices and Least Squares. Regression.


Chapter 15

Motivation

In odometry-based mobile robot navigation, the accuracy of the robot's dead reckoning pose tracking depends on minimizing slippage between the robot's wheels and the ground. Even a momentary slip can lead to an error in heading that will cause the error in the robot's location estimate to grow linearly over its journey. It is thus important to determine the friction coefficient between the robot's wheels and the ground, which directly affects the robot's resistance to slippage. Just as importantly, this friction coefficient will significantly affect the performance of the robot: the ability to push loads.

When the mobile robot of Figure 15.1 is commanded to move forward, a number of forces come into play. Internally, the drive motors exert a torque (not shown in the figure) on the wheels, which is resisted by the friction force $F_f$ between the wheels and the ground. If the magnitude of $F_f$ dictated by the sum of the drag force $F_{\rm drag}$ (a combination of all forces resisting the robot's motion) and the product of the robot's mass and acceleration is less than the maximum static friction force $F^{\max}_{f,\,{\rm static}}$ between the wheels and the ground, the wheels will roll without slipping and the robot will move forward with velocity $v = r\omega_{\rm wheel}$. If, however, the magnitude of $F_f$ reaches $F^{\max}_{f,\,{\rm static}}$, the wheels will begin to slip and $F_f$ will drop to a lower level $F_{f,\,{\rm kinetic}}$, the kinetic friction force. The wheels will continue to slip ($v < r\omega_{\rm wheel}$) ...


[Figure 15.3 plot: $F_{\rm friction}$ (Newtons) vs. time (seconds) for a 500 gram load, showing $F_{f,\,{\rm static}} = F_{\rm tangential,\,applied} \le \mu_s F_{\rm normal}$, the maximum $F^{\max,\,{\rm meas}}_{f,\,{\rm static}} = \mu_s F_{\rm normal} + \text{experimental error}$, and the lower kinetic level $F_{f,\,{\rm kinetic}} = \mu_k F_{\rm normal} + \text{experimental error}$.]

Figure 15.2: Experimental setup for friction measurement: Force transducer (A) is connected to contact area (B) by a thin wire. Normal force is exerted on the contact area by load stack (C). Tangential force is applied using turntable (D) via the friction between the turntable surface and the contact area. Apparatus and photograph courtesy of James Penn.

Figure 15.3: Sample data for one friction measurement, yielding one data point for $F^{\max,\,{\rm meas}}_{f,\,{\rm static}}$. Data courtesy of James Penn.

... the maximum static friction force. We expect that

$$F^{\max}_{f,\,{\rm static}} = \mu_s F_{\rm normal,\,rear}\,, \qquad (15.1)$$
where $\mu_s$ is the static coefficient of friction and $F_{\rm normal,\,rear}$ is the normal force from the ground on the rear, driving, wheels. In order to minimize the risk of slippage (and to be able to push large loads), robot wheels should be designed for a high value of $\mu_s$ between the wheels and the ground. This value, although difficult to predict accurately by modeling, can be determined by experiment.

We first conduct experiments for the friction force $F^{\max}_{f,\,{\rm static}}$ (in Newtons) as a function of normal load $F_{\rm normal,\,applied}$ (in Newtons) and (nominal) surface area of contact $A_{\rm surface}$ (in cm$^2$) with the friction turntable apparatus depicted in Figure 15.2. Weights permit us to vary the normal load and washer inserts permit us to vary the nominal surface area of contact. A typical experiment (at a particular prescribed value of $F_{\rm normal,\,applied}$ and $A_{\rm surface}$) yields the time trace of Figure 15.3, from which $F^{\max,\,{\rm meas}}_{f,\,{\rm static}}$ (our measurement of $F^{\max}_{f,\,{\rm static}}$) is deduced as the maximum of the response.

We next postulate a dependence (or model)
$$F^{\max}_{f,\,{\rm static}}(F_{\rm normal,\,applied},\, A_{\rm surface}) = \beta_0 + \beta_1 F_{\rm normal,\,applied} + \beta_2 A_{\rm surface}\,, \qquad (15.2)$$
where we expect, but do not a priori assume, from Messieurs Amontons and Coulomb that $\beta_0 = 0$ and $\beta_2 = 0$ (and of course $\beta_1 \equiv \mu_s$). In order to confirm that $\beta_0 = 0$ and $\beta_2 = 0$, or at least confirm that $\beta_0 = 0$ and $\beta_2 = 0$ is not untrue, and to find a good estimate for $\beta_1 \equiv \mu_s$, we must appeal to our measurements.

The mathematical techniques by which to determine $\mu_s$ (and $\beta_0$, $\beta_2$) with some confidence from noisy experimental data are known as regression, which is the subject of Chapter 19. Regression, in turn, is best described in the language of linear algebra (Chapter 16), and is built upon the linear algebra concept of least squares (Chapter 17).
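As a preview of those chapters, a minimal Matlab sketch of how a model such as (15.2) might be fit by least squares is given below; the data values are hypothetical placeholders, not the measurements of Figure 15.3, and the backslash solver is discussed in Chapter 18.

    % hypothetical measurements (placeholders, not the data of Figure 15.3)
    Fnormal  = [5; 10; 15; 20; 25];        % applied normal loads (N)
    Asurface = [1.5; 1.5; 3.0; 3.0; 4.5];  % nominal contact areas (cm^2)
    Ffmax    = [2.1; 4.0; 6.2; 8.1; 10.0]; % measured max static friction (N)
    X    = [ones(size(Fnormal)), Fnormal, Asurface];  % columns for beta0, beta1, beta2
    beta = X \ Ffmax;      % least-squares estimate of (beta0, beta1, beta2)
    mu_s = beta(2)         % beta1 is the estimate of the friction coefficient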


Chapter 16

Matrices and Vectors: Definitions and Operations

16.1 Basic Vector and Matrix Operations

16.1.1 Definitions
Let us first introduce the primitive objects in linear algebra: vectors and matrices. An $m$-vector $v \in \mathbb{R}^{m\times 1}$ consists of $m$ real numbers,¹
$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix}.$$
It is also called a column vector, which is the default vector in linear algebra. Thus, by convention, $v \in \mathbb{R}^m$ implies that $v$ is a column vector in $\mathbb{R}^{m\times 1}$. Note that we use subscript $(\cdot)_i$ to address the $i$-th component of a vector. The other kind of vector is a row vector $v \in \mathbb{R}^{1\times n}$ consisting of $n$ entries
$$v = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}.$$
Let us consider a few examples of column and row vectors.

Example 16.1.1 vectors
Examples of (column) vectors in $\mathbb{R}^3$ are
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}, \quad u = \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 9.1 \\ 7/3 \\ \pi \end{pmatrix}.$$

¹ The concept of vectors readily extends to complex numbers, but we only consider real vectors in our presentation of this chapter.

To address a specific component of the vectors, we write, for example, $v_1 = 1$, $u_1 = 3$, and $w_3 = \pi$. Examples of row vectors in $\mathbb{R}^{1\times 4}$ are
$$v = \begin{pmatrix} \sqrt{2} & -5 & 2 & e \end{pmatrix} \quad \text{and} \quad u = \begin{pmatrix} \pi & 1 & 1 & 0 \end{pmatrix}.$$
Some of the components of these row vectors are $v_2 = -5$ and $u_4 = 0$.

A matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}.$$
Extending the convention for addressing an entry of a vector, we use subscript $(\cdot)_{ij}$ to address the entry on the $i$-th row and $j$-th column. Note that the order in which the row and column are referred follows that for describing the size of the matrix. Thus, $A \in \mathbb{R}^{m\times n}$ consists of entries
$$A_{ij}, \quad i = 1,\dots,m, \ \ j = 1,\dots,n.$$
Sometimes it is convenient to think of a (column) vector as a special case of a matrix with only one column, i.e., $n = 1$. Similarly, a (row) vector can be thought of as a special case of a matrix with $m = 1$. Conversely, an $m\times n$ matrix can be viewed as $m$ row $n$-vectors or $n$ column $m$-vectors, as we discuss further below.

Example 16.1.2 matrices
Examples of matrices are
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 1 \\ -2 & 8 & 1 \\ 0 & 3 & 0 \end{pmatrix}.$$
The matrix $A$ is a $3\times 2$ matrix ($A \in \mathbb{R}^{3\times 2}$) and matrix $B$ is a $3\times 3$ matrix ($B \in \mathbb{R}^{3\times 3}$). We can also address specific entries as, for example, $A_{12} = \sqrt{3}$, $A_{21} = -4$, and $B_{32} = 3$.
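As an aside, vectors and matrices of this kind are entered in Matlab (see Chapter 18) as bracketed arrays, and their entries are addressed by the same subscripts; a brief sketch using the matrix $A$ above:

    % entering a vector and a matrix; addressing entries by subscripts
    v = [1; 3; 6];                  % a column 3-vector
    A = [1 sqrt(3); -4 9; pi -3];   % a 3x2 matrix
    v(1)        % first component of v
    A(1,2)      % entry on the 1st row, 2nd column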

While vectors and matrices may appear like arrays of numbers, linear algebra defines a special set of rules to manipulate these objects. One such operation is the transpose operation considered next.

Transpose Operation
The first linear algebra operator we consider is the transpose operator, denoted by superscript $(\cdot)^{\rm T}$. The transpose operator swaps the rows and columns of the matrix. That is, if $B = A^{\rm T}$ with $A \in \mathbb{R}^{m\times n}$, then
$$B_{ij} = A_{ji}, \quad 1 \le i \le n, \ 1 \le j \le m.$$
Because the rows and columns of the matrix are swapped, the dimensions of the matrix are also swapped, i.e., if $A \in \mathbb{R}^{m\times n}$ then $B \in \mathbb{R}^{n\times m}$.

If we swap the rows and columns twice, then we return to the original matrix. Thus, the transpose of a transposed matrix is the original matrix, i.e.
$$(A^{\rm T})^{\rm T} = A.$$


Example 16.1.3 transpose
Let us consider a few examples of the transpose operation. A matrix $A$ and its transpose $B = A^{\rm T}$ are related by
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & -4 & \pi \\ \sqrt{3} & 9 & -3 \end{pmatrix}.$$
The rows and columns are swapped in the sense that $A_{31} = B_{13} = \pi$ and $A_{12} = B_{21} = \sqrt{3}$. Also, because $A \in \mathbb{R}^{3\times 2}$, $B \in \mathbb{R}^{2\times 3}$. Interpreting a vector as a special case of a matrix with one column, we can also apply the transpose operator to a column vector to create a row vector, i.e., given
$$v = \begin{pmatrix} 3 \\ 7 \end{pmatrix},$$
the transpose operation yields
$$u = v^{\rm T} = \begin{pmatrix} 3 & 7 \end{pmatrix}.$$
Note that the transpose of a column vector is a row vector, and the transpose of a row vector is a column vector.
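In Matlab (treated more fully in Chapter 18) the transpose is available directly; a small sketch with the same matrix as above:

    % transpose in Matlab: .' is the (plain) transpose operator
    A = [1 sqrt(3); -4 9; pi -3];
    B = A.'          % a 2x3 matrix; (A.').' returns A
    v = [3; 7];      % a column vector
    u = v.'          % its transpose, a row vector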

16.1.2 Vector Operations
The first vector operation we consider is multiplication of a vector $v \in \mathbb{R}^m$ by a scalar $\alpha \in \mathbb{R}$. The operation yields
$$u = \alpha v,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = \alpha v_i, \quad i = 1,\dots,m.$$
In other words, multiplication of a vector by a scalar results in each component of the vector being scaled by the scalar.

The second operation we consider is addition of two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$. The addition yields
$$u = v + w,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = v_i + w_i, \quad i = 1,\dots,m.$$
In order for addition of two vectors to make sense, the vectors must have the same number of components. Each entry of the resulting vector is simply the sum of the corresponding entries of the two vectors.

We can summarize the action of the scaling and addition operations in a single operation. Let $v \in \mathbb{R}^m$, $w \in \mathbb{R}^m$, and $\alpha \in \mathbb{R}$. Then, the operation
$$u = \alpha v + w$$


Figure 16.1: Illustration of vector scaling and vector addition: (a) scalar scaling, showing $v$ and $u = \alpha v$; (b) vector addition, showing $v$, $w$, and $u = v + w$.

yields a vector $u \in \mathbb{R}^m$ whose entries are given by
$$u_i = \alpha v_i + w_i, \quad i = 1,\dots,m.$$
The result is nothing more than a combination of the scalar multiplication and vector addition rules.

Example 16.1.4 vector scaling and addition in R2
Let us illustrate scaling of a vector by a scalar and addition of two vectors in $\mathbb{R}^2$ using
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix}, \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}, \quad \text{and} \quad \alpha = \frac{3}{2}.$$
First, let us consider scaling of the vector $v$ by the scalar $\alpha$. The operation yields
$$u = \alpha v = \frac{3}{2}\begin{pmatrix} 1 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 1/2 \end{pmatrix}.$$
This operation is illustrated in Figure 16.1(a). The vector $v$ is simply stretched by the factor of $3/2$ while preserving the direction.

Now, let us consider addition of the vectors $v$ and $w$. The vector addition yields
$$u = v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix}.$$
Figure 16.1(b) illustrates the vector addition process. We translate $w$ so that it starts from the tip of $v$ to form a parallelogram. The resultant vector is precisely the sum of the two vectors. Note that the geometric intuition for scaling and addition provided for $\mathbb{R}^2$ readily extends to higher dimensions.


Example 16.1.5 vector scaling and addition in R3
Let $v = (1, 3, 6)^{\rm T}$, $w = (2, -1, 0)^{\rm T}$, and $\alpha = 3$. Then,
$$u = v + \alpha w = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + \begin{pmatrix} 6 \\ -3 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 \\ 0 \\ 6 \end{pmatrix}.$$

Inner Product
Another important operation is the inner product. This operation takes two vectors of the same dimension, $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$, and yields a scalar $\beta \in \mathbb{R}$:
$$\beta = v^{\rm T} w \quad \text{where} \quad \beta = \sum_{i=1}^m v_i w_i.$$
The appearance of the transpose operator will become obvious once we introduce the matrix-matrix multiplication rule. The inner product in a Euclidean vector space is also commonly called the dot product and is denoted by $\beta = v \cdot w$. More generally, the inner product of two elements of a vector space is denoted by $(\cdot,\cdot)$, i.e., $\beta = (v, w)$.

Example 16.1.6 inner product
Let us consider two vectors in $\mathbb{R}^3$, $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The inner product of these two vectors is
$$\beta = v^{\rm T} w = \sum_{i=1}^3 v_i w_i = 1\cdot 2 + 3\cdot(-1) + 6\cdot 0 = -1.$$

Norm (2-Norm)
Using the inner product, we can naturally define the 2-norm of a vector. Given $v \in \mathbb{R}^m$, the 2-norm of $v$, denoted by $\|v\|_2$, is defined by
$$\|v\|_2 = \sqrt{v^{\rm T} v} = \sqrt{\sum_{i=1}^m v_i^2}.$$
Note that the norm of any vector is non-negative, because it is a sum of non-negative numbers (squared values). The 2-norm is the usual Euclidean length; in particular, for $m = 2$, the expression simplifies to the familiar Pythagorean theorem, $\|v\|_2 = \sqrt{v_1^2 + v_2^2}$. While there are other norms, we almost exclusively use the 2-norm in this unit. Thus, for notational convenience, we will drop the subscript 2 and write the 2-norm of $v$ as $\|v\|$ with the implicit understanding $\|\cdot\| \equiv \|\cdot\|_2$.

By definition, any norm must satisfy the triangle inequality,
$$\|v + w\| \le \|v\| + \|w\|,$$


for any $v, w \in \mathbb{R}^m$. The theorem states that the sum of the lengths of two adjoining segments is longer than the distance between their non-joined end points, as is intuitively clear from Figure 16.1(b). For norms defined by inner products, as our 2-norm above, the triangle inequality is automatically satisfied.

Proof. For norms induced by an inner product, the proof of the triangle inequality follows directly from the definition of the norm and the Cauchy-Schwarz inequality. First, we expand the expression as
$$\|v + w\|^2 = (v + w)^{\rm T}(v + w) = v^{\rm T} v + 2 v^{\rm T} w + w^{\rm T} w.$$
The middle term can be bounded by the Cauchy-Schwarz inequality, which states that
$$v^{\rm T} w \le |v^{\rm T} w| \le \|v\|\,\|w\|.$$
Thus, we can bound the norm as
$$\|v + w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2,$$
and taking the square root of both sides yields the desired result.

Example 16.1.7 norm of a vector
Let $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The 2-norms of these vectors are
$$\|v\| = \sqrt{\sum_{i=1}^3 v_i^2} = \sqrt{1^2 + 3^2 + 6^2} = \sqrt{46}$$
and
$$\|w\| = \sqrt{\sum_{i=1}^3 w_i^2} = \sqrt{2^2 + (-1)^2 + 0^2} = \sqrt{5}.$$

Example 16.1.8 triangle inequality
Let us illustrate the triangle inequality using two vectors
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}.$$
The lengths (or the norms) of the vectors are
$$\|v\| = \sqrt{\frac{10}{9}} \approx 1.054 \quad \text{and} \quad \|w\| = \sqrt{\frac{5}{4}} \approx 1.118.$$
On the other hand, the sum of the two vectors is
$$v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix}.$$


Figure 16.2: Illustration of the triangle inequality, showing $v$, $w$, $v+w$, $\|v\|$, and $\|w\|$.

Its length is
$$\|v + w\| = \sqrt{\frac{145}{36}} \approx 2.007.$$
The norm of the sum is shorter than the sum of the norms, which is
$$\|v\| + \|w\| \approx 2.172.$$
This inequality is illustrated in Figure 16.2. Clearly, the length of $v + w$ is strictly less than the sum of the lengths of $v$ and $w$ (unless $v$ and $w$ align with each other, in which case we obtain equality).

In two dimensions, the inner product can be interpreted as
$$v^{\rm T} w = \|v\|\,\|w\| \cos(\theta), \qquad (16.1)$$
where $\theta$ is the angle between $v$ and $w$. In other words, the inner product is a measure of how well $v$ and $w$ align with each other. Note that we can show the Cauchy-Schwarz inequality from the above equality. Namely, $|\cos(\theta)| \le 1$ implies that
$$|v^{\rm T} w| = \|v\|\,\|w\|\,|\cos(\theta)| \le \|v\|\,\|w\|.$$
In particular, we see that the inequality holds with equality if and only if $\theta = 0$ or $\pi$, which corresponds to the cases where $v$ and $w$ align. It is easy to demonstrate Eq. (16.1) in two dimensions.

Proof. Noting $v, w \in \mathbb{R}^2$, we express them in polar coordinates
$$v = \|v\| \begin{pmatrix} \cos(\theta_v) \\ \sin(\theta_v) \end{pmatrix} \quad \text{and} \quad w = \|w\| \begin{pmatrix} \cos(\theta_w) \\ \sin(\theta_w) \end{pmatrix}.$$


The inner product of the two vectors yields
$$\beta = v^{\rm T} w = \sum_{i=1}^2 v_i w_i = \|v\|\cos(\theta_v)\,\|w\|\cos(\theta_w) + \|v\|\sin(\theta_v)\,\|w\|\sin(\theta_w)$$
$$= \|v\|\,\|w\|\bigl(\cos(\theta_v)\cos(\theta_w) + \sin(\theta_v)\sin(\theta_w)\bigr)$$
$$= \|v\|\,\|w\|\left( \frac{1}{2}\bigl(e^{i\theta_v} + e^{-i\theta_v}\bigr)\frac{1}{2}\bigl(e^{i\theta_w} + e^{-i\theta_w}\bigr) + \frac{1}{2i}\bigl(e^{i\theta_v} - e^{-i\theta_v}\bigr)\frac{1}{2i}\bigl(e^{i\theta_w} - e^{-i\theta_w}\bigr) \right)$$
$$= \|v\|\,\|w\|\left( \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} + e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr) - \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} - e^{i(\theta_v-\theta_w)} - e^{-i(\theta_v-\theta_w)}\bigr) \right)$$
$$= \|v\|\,\|w\|\,\frac{1}{2}\bigl(e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr) = \|v\|\,\|w\|\cos(\theta_v - \theta_w) = \|v\|\,\|w\|\cos(\theta),$$
where the last equality follows from the definition $\theta \equiv \theta_v - \theta_w$.

Begin Advanced Material

For completeness, let us introduce a more general class of norms.

Example 16.1.9 p-norms
The 2-norm, which we will almost exclusively use, belongs to a more general class of norms, called the p-norms. The p-norm of a vector $v \in \mathbb{R}^m$ is
$$\|v\|_p = \left( \sum_{i=1}^m |v_i|^p \right)^{1/p},$$
where $p \ge 1$. Any p-norm satisfies the positivity requirement, the scalar scaling requirement, and the triangle inequality. We see that the 2-norm is a case of the p-norm with $p = 2$.

Another case of the p-norm that we frequently encounter is the 1-norm, which is simply the sum of the absolute values of the entries, i.e.
$$\|v\|_1 = \sum_{i=1}^m |v_i|.$$
The other one is the infinity norm given by
$$\|v\|_\infty = \lim_{p\to\infty} \|v\|_p = \max_{i=1,\dots,m} |v_i|.$$
In other words, the infinity norm of a vector is its largest entry in absolute value.
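In Matlab the built-in norm function accepts the value of p directly; a small sketch:

    % p-norms in Matlab
    v = [1; -2; 3];
    norm(v, 1)      % 1-norm: sum of absolute values = 6
    norm(v, 2)      % 2-norm (the default)
    norm(v, Inf)    % infinity norm: largest absolute entry = 3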

End Advanced Material


Figure 16.3: Set of vectors ($u$, $v$, and $w$) considered to illustrate orthogonality.

Orthogonality
Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthogonal to each other if
$$v^{\rm T} w = 0.$$
In two dimensions, it is easy to see that
$$v^{\rm T} w = \|v\|\,\|w\|\cos(\theta) = 0 \;\Rightarrow\; \cos(\theta) = 0 \;\Rightarrow\; \theta = \pi/2.$$
That is, the angle between $v$ and $w$ is $\pi/2$, which is the definition of orthogonality in the usual geometric sense.

Example 16.1.10 orthogonality
Let us consider three vectors in $\mathbb{R}^2$,
$$u = \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \quad v = \begin{pmatrix} -3 \\ 6 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 0 \\ 5 \end{pmatrix},$$
and compute three inner products formed by these vectors:
$$u^{\rm T} v = 4\cdot(-3) + 2\cdot 6 = 0,$$
$$u^{\rm T} w = 4\cdot 0 + 2\cdot 5 = 10,$$
$$v^{\rm T} w = -3\cdot 0 + 6\cdot 5 = 30.$$
Because $u^{\rm T} v = 0$, the vectors $u$ and $v$ are orthogonal to each other. On the other hand, $u^{\rm T} w \ne 0$ and the vectors $u$ and $w$ are not orthogonal to each other. Similarly, $v$ and $w$ are not orthogonal to each other. These vectors are plotted in Figure 16.3; the figure confirms that $u$ and $v$ are orthogonal in the usual geometric sense.
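The same check is a one-line inner product in Matlab; a sketch with the vectors of Example 16.1.10:

    % checking orthogonality via inner products
    u = [4; 2];  v = [-3; 6];  w = [0; 5];
    u'*v   % = 0:  u and v are orthogonal
    u'*w   % = 10: u and w are not orthogonal
    v'*w   % = 30: v and w are not orthogonal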


Figure 16.4: An orthonormal set of vectors $u$ and $v$.

Orthonormality
Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthonormal to each other if they are orthogonal to each other and each has unit length, i.e.
$$v^{\rm T} w = 0 \quad \text{and} \quad \|v\| = \|w\| = 1.$$

Example 16.1.11 orthonormality
Two vectors
$$u = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
are orthonormal to each other. It is straightforward to verify that they are orthogonal to each other,
$$u^{\rm T} v = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 & 1 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{5}\bigl(-2\cdot 1 + 1\cdot 2\bigr) = 0,$$
and that each of them has unit length,
$$\|u\| = \sqrt{\frac{1}{5}\bigl((-2)^2 + 1^2\bigr)} = 1, \qquad \|v\| = \sqrt{\frac{1}{5}\bigl(1^2 + 2^2\bigr)} = 1.$$
Figure 16.4 shows that the vectors are orthogonal and have unit length in the usual geometric sense.

16.1.3 Linear Combinations
Let us consider a set of $n$ $m$-vectors
$$v^1 \in \mathbb{R}^m, \ v^2 \in \mathbb{R}^m, \ \dots, \ v^n \in \mathbb{R}^m.$$
A linear combination of the vectors is given by
$$w = \sum_{j=1}^n \alpha_j v^j,$$
where $\alpha_1, \alpha_2, \dots, \alpha_n$ is a set of real numbers, and each $v^j$ is an $m$-vector.


Example 16.1.12 linear combination of vectors
Let us consider three vectors in $\mathbb{R}^2$, $v^1 = (4, 2)^{\rm T}$, $v^2 = (-3, 6)^{\rm T}$, and $v^3 = (0, 5)^{\rm T}$. A linear combination of the vectors, with $\alpha_1 = 1$, $\alpha_2 = -2$, and $\alpha_3 = 3$, is
$$w = \sum_{j=1}^3 \alpha_j v^j = 1\begin{pmatrix} 4 \\ 2 \end{pmatrix} + (-2)\begin{pmatrix} -3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} + \begin{pmatrix} 6 \\ -12 \end{pmatrix} + \begin{pmatrix} 0 \\ 15 \end{pmatrix} = \begin{pmatrix} 10 \\ 5 \end{pmatrix}.$$
Another example of a linear combination, with $\alpha_1 = 1$, $\alpha_2 = 0$, and $\alpha_3 = 0$, is
$$w = \sum_{j=1}^3 \alpha_j v^j = 1\begin{pmatrix} 4 \\ 2 \end{pmatrix} + 0\begin{pmatrix} -3 \\ 6 \end{pmatrix} + 0\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix}.$$
Note that a linear combination of a set of vectors is simply a weighted sum of the vectors.

Linear Independence
A set of $n$ $m$-vectors are linearly independent if
$$\sum_{j=1}^n \alpha_j v^j = 0 \quad \text{only if} \quad \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0.$$
Otherwise, the vectors are linearly dependent.

Example 16.1.13 linear independence
Let us consider four vectors,
$$w^1 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}, \quad w^2 = \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}, \quad w^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad w^4 = \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix}.$$
The set of vectors $\{w^1, w^2, w^4\}$ is linearly dependent because
$$1\cdot w^1 + \frac{5}{3}\cdot w^2 - 1\cdot w^4 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \frac{5}{3}\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix};$$
the linear combination with the weights $\{1, 5/3, -1\}$ produces the zero vector. Note that the choice of the weights that achieves this is not unique; we just need to find one set of weights to show that the vectors are not linearly independent (i.e., are linearly dependent).

On the other hand, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent. Considering a linear combination,
$$\alpha_1 w^1 + \alpha_2 w^2 + \alpha_3 w^3 = \alpha_1\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} + \alpha_3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$


we see that we must choose $\alpha_1 = 0$ to set the first component to 0, $\alpha_2 = 0$ to set the third component to 0, and $\alpha_3 = 0$ to set the second component to 0. Thus, the only way to satisfy the equation is to choose the trivial set of weights, $\{0, 0, 0\}$. Thus, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent.

Begin Advanced Material

Vector Spaces and Bases
Given a set of $n$ $m$-vectors, we can construct a vector space, $V$, given by
$$V = \operatorname{span}(\{v^1, v^2, \dots, v^n\}),$$
where
$$\operatorname{span}(\{v^1, v^2, \dots, v^n\}) = \left\{ v \in \mathbb{R}^m : v = \sum_{k=1}^n \alpha_k v^k, \ \alpha_k \in \mathbb{R} \right\}$$
$$= \text{space of vectors which are linear combinations of } v^1, v^2, \dots, v^n.$$
In general we do not require the vectors $\{v^1, \dots, v^n\}$ to be linearly independent. When they are linearly independent, they are said to be a basis of the space. In other words, a basis of the vector space $V$ is a set of linearly independent vectors that spans the space. As we will see shortly in our example, there are many bases for any space. However, the number of vectors in any basis for a given space is unique, and that number is called the dimension of the space. Let us demonstrate the idea in a simple example.

Example 16.1.14 Bases for a vector space in R3
Let us consider a vector space $V$ spanned by vectors
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$
By definition, any vector $x \in V$ is of the form
$$x = \alpha_1 v^1 + \alpha_2 v^2 + \alpha_3 v^3 = \alpha_1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + \alpha_3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ 2\alpha_1 + \alpha_2 + \alpha_3 \\ 0 \end{pmatrix}.$$
Clearly, we can express any vector of the form $x = (x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha_1$, $\alpha_2$, and $\alpha_3$ judiciously. Thus, our vector space consists of vectors of the form $(x_1, x_2, 0)^{\rm T}$, i.e., all vectors in $\mathbb{R}^3$ with zero in the third entry.

We also note that the selection of coefficients that achieves $(x_1, x_2, 0)^{\rm T}$ is not unique, as it requires solution to a system of two linear equations with three unknowns. The non-uniqueness of the coefficients is a direct consequence of $\{v^1, v^2, v^3\}$ not being linearly independent. We can easily verify the linear dependence by considering a non-trivial linear combination such as
$$2v^1 - v^2 - 3v^3 = 2\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} - \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$


Because the vectors are not linearly independent, they do not form a basis of the space.

To choose a basis for the space, we first note that vectors in the space $V$ are of the form $(x_1, x_2, 0)^{\rm T}$. We observe that, for example,
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
would span the space because any vector in $V$, a vector of the form $(x_1, x_2, 0)^{\rm T}$, can be expressed as a linear combination,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha_1 w^1 + \alpha_2 w^2 = \alpha_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ 0 \end{pmatrix},$$
by choosing $\alpha_1 = x_1$ and $\alpha_2 = x_2$. Moreover, $w^1$ and $w^2$ are linearly independent. Thus, $\{w^1, w^2\}$ is a basis for the space $V$. Unlike the set $\{v^1, v^2, v^3\}$ which is not a basis, the coefficients for $\{w^1, w^2\}$ that yield $x \in V$ are unique. Because the basis consists of two vectors, the dimension of $V$ is two. This is succinctly written as
$$\dim(V) = 2.$$
Because a basis for a given space is not unique, we can pick a different set of vectors. For example,
$$z^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad z^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$$
is also a basis for $V$. Since $z^1$ is not a constant multiple of $z^2$, it is clear that they are linearly independent. We need to verify that they span the space $V$. We can verify this by a direct argument,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha_1 z^1 + \alpha_2 z^2 = \alpha_1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ 2\alpha_1 + \alpha_2 \\ 0 \end{pmatrix}.$$
We see that, for any $(x_1, x_2, 0)^{\rm T}$, we can find the linear combination of $z^1$ and $z^2$ by choosing the coefficients $\alpha_1 = (-x_1 + 2x_2)/3$ and $\alpha_2 = (2x_1 - x_2)/3$. Again, the coefficients that represent $x$ using $\{z^1, z^2\}$ are unique.

For the space $V$, and for any given basis, we can find a unique set of two coefficients to represent any vector $x \in V$. In other words, any vector in $V$ is uniquely described by two coefficients, or parameters. Thus, a basis provides a parameterization of the vector space $V$. The dimension of the space is two, because the basis has two vectors, i.e., the vectors in the space are uniquely described by two parameters.

While there are many bases for any space, there are certain bases that are more convenient to work with than others. Orthonormal bases, bases consisting of orthonormal sets of vectors, are such a class of bases. We recall that two vectors are orthogonal to each other if their


inner product vanishes. In order for a set of vectors $\{v^1, \dots, v^n\}$ to be orthogonal, the vectors must satisfy
$$(v^i)^{\rm T} v^j = 0, \quad i \ne j.$$
In other words, the vectors are mutually orthogonal. An orthonormal set of vectors is an orthogonal set of vectors with each vector having norm unity. That is, the set of vectors $\{v^1, \dots, v^n\}$ is mutually orthonormal if
$$(v^i)^{\rm T} v^j = 0, \quad i \ne j, \qquad \|v^i\| = \sqrt{(v^i)^{\rm T} v^i} = 1, \quad i = 1,\dots,n.$$
We note that an orthonormal set of vectors is linearly independent by construction, as we now prove.

Proof. Let $\{v^1, \dots, v^n\}$ be an orthogonal set of (non-zero) vectors. By definition, the set of vectors is linearly independent if the only linear combination that yields the zero vector corresponds to all coefficients equal to zero, i.e.
$$\alpha_1 v^1 + \cdots + \alpha_n v^n = 0 \quad \Rightarrow \quad \alpha_1 = \cdots = \alpha_n = 0.$$
To verify this indeed is the case for any orthogonal set of vectors, we perform the inner product of the linear combination with $v^1, \dots, v^n$ to obtain
$$(v^i)^{\rm T}(\alpha_1 v^1 + \cdots + \alpha_n v^n) = \alpha_1 (v^i)^{\rm T} v^1 + \cdots + \alpha_i (v^i)^{\rm T} v^i + \cdots + \alpha_n (v^i)^{\rm T} v^n = \alpha_i \|v^i\|^2, \quad i = 1,\dots,n.$$
Note that $(v^i)^{\rm T} v^j = 0$, $i \ne j$, due to orthogonality. Thus, setting the linear combination equal to zero requires
$$\alpha_i \|v^i\|^2 = 0, \quad i = 1,\dots,n.$$
In other words, $\alpha_i = 0$ or $\|v^i\|^2 = 0$ for each $i$. If we restrict ourselves to a set of non-zero vectors, then we must have $\alpha_i = 0$. Thus, a vanishing linear combination requires $\alpha_1 = \cdots = \alpha_n = 0$, which is the definition of linear independence.

Because an orthogonal set of vectors is linearly independent by construction, an orthonormal basis for a space $V$ is an orthonormal set of vectors that spans $V$. One advantage of using an orthonormal basis is that finding the coefficients for any vector in $V$ is straightforward. Suppose we have a basis $\{v^1, \dots, v^n\}$ and wish to find the coefficients $\alpha_1, \dots, \alpha_n$ that result in $x \in V$. That is, we are looking for the coefficients such that
$$x = \alpha_1 v^1 + \cdots + \alpha_i v^i + \cdots + \alpha_n v^n.$$
To find the $i$-th coefficient $\alpha_i$, we simply consider the inner product with $v^i$, i.e.
$$(v^i)^{\rm T} x = (v^i)^{\rm T}(\alpha_1 v^1 + \cdots + \alpha_i v^i + \cdots + \alpha_n v^n) = \alpha_1 (v^i)^{\rm T} v^1 + \cdots + \alpha_i (v^i)^{\rm T} v^i + \cdots + \alpha_n (v^i)^{\rm T} v^n = \alpha_i (v^i)^{\rm T} v^i = \alpha_i \|v^i\|^2 = \alpha_i, \quad i = 1,\dots,n,$$
where the last equality follows from $\|v^i\|^2 = 1$. That is, $\alpha_i = (v^i)^{\rm T} x$, $i = 1,\dots,n$. In particular, for an orthonormal basis, we simply need to perform $n$ inner products to find the $n$ coefficients. This is in contrast to an arbitrary basis, which requires a solution to an $n\times n$ linear system (which is significantly more costly, as we will see later).


Example 16.1.15 Orthonormal Basis
Let us consider the vector space $V$ spanned by
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$
Recalling every vector in $V$ is of the form $(x_1, x_2, 0)^{\rm T}$, a set of vectors
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
forms an orthonormal basis of the space. It is trivial to verify they are orthonormal, as they are orthogonal, i.e., $(w^1)^{\rm T} w^2 = 0$, and each vector is of unit length, $\|w^1\| = \|w^2\| = 1$. We also see that we can express any vector of the form $(x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha_1 = x_1$ and $\alpha_2 = x_2$. Thus, $\{w^1, w^2\}$ spans the space. Because the set of vectors spans the space and is orthonormal (and hence linearly independent), it is an orthonormal basis of the space $V$.

Another orthonormal basis is formed by
$$w^1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}.$$
We can easily verify that they are orthogonal and each has a unit length. The coefficients for an arbitrary vector $x = (x_1, x_2, 0)^{\rm T} \in V$ represented in the basis $\{w^1, w^2\}$ are
$$\alpha_1 = (w^1)^{\rm T} x = \frac{1}{\sqrt{5}}(x_1 + 2x_2), \qquad \alpha_2 = (w^2)^{\rm T} x = \frac{1}{\sqrt{5}}(2x_1 - x_2).$$

End Advanced Material

16.2 Matrix Operations

16.2.1 Interpretation of Matrices
Recall that a matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}.$$
This matrix can be interpreted in a column-centric manner as a set of $n$ column $m$-vectors. Alternatively, the matrix can be interpreted in a row-centric manner as a set of $m$ row $n$-vectors. Each of these interpretations is useful for understanding matrix operations, which is covered next.


16.2.2 Matrix Operations
The first matrix operation we consider is multiplication of a matrix $A \in \mathbb{R}^{m_1\times n_1}$ by a scalar $\alpha \in \mathbb{R}$. The operation yields
$$B = \alpha A,$$
where each entry of $B \in \mathbb{R}^{m_1\times n_1}$ is given by
$$B_{ij} = \alpha A_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
Similar to the multiplication of a vector by a scalar, the multiplication of a matrix by a scalar scales each entry of the matrix.

The second operation we consider is addition of two matrices $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{m_2\times n_2}$. The addition yields
$$C = A + B,$$
where each entry of $C \in \mathbb{R}^{m_1\times n_1}$ is given by
$$C_{ij} = A_{ij} + B_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
In order for addition of two matrices to make sense, the matrices must have the same dimensions, $m_1$ and $n_1$.

We can combine the scalar scaling and addition operations. Let $A \in \mathbb{R}^{m_1\times n_1}$, $B \in \mathbb{R}^{m_1\times n_1}$, and $\alpha \in \mathbb{R}$. Then, the operation
$$C = A + \alpha B$$
yields a matrix $C \in \mathbb{R}^{m_1\times n_1}$ whose entries are given by
$$C_{ij} = A_{ij} + \alpha B_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
Note that the scalar-matrix multiplication and matrix-matrix addition operations treat the matrices as arrays of numbers, operating entry by entry. This is unlike the matrix-matrix product, which is introduced next after an example of matrix scaling and addition.

Example 16.2.1 matrix scaling and addition
Consider the following matrices and scalar,
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{pmatrix}, \quad \text{and} \quad \alpha = 2.$$
Then,
$$C = A + \alpha B = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} + 2\begin{pmatrix} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{pmatrix} = \begin{pmatrix} 1 & \sqrt{3}+4 \\ 0 & 3 \\ 3\pi & -11 \end{pmatrix}.$$


Matrix-Matrix Product
Let us consider two matrices $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{m_2\times n_2}$ with $n_1 = m_2$. The matrix-matrix product of the matrices results in
$$C = AB \quad \text{with} \quad C_{ij} = \sum_{k=1}^{n_1} A_{ik} B_{kj}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_2.$$
Because the summation applies to the second index of $A$ and the first index of $B$, the number of columns of $A$ must match the number of rows of $B$: $n_1 = m_2$ must be true. Let us consider a few examples.

Example 16.2.2 matrix-matrix product
Let us consider matrices $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$ with
$$A = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}.$$
The matrix-matrix product yields
$$C = AB = \begin{pmatrix} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{pmatrix},$$
where each entry is calculated as
$$C_{11} = \sum_{k=1}^2 A_{1k} B_{k1} = A_{11}B_{11} + A_{12}B_{21} = 1\cdot 2 + 3\cdot 1 = 5,$$
$$C_{12} = A_{11}B_{12} + A_{12}B_{22} = 1\cdot 3 + 3\cdot 0 = 3,$$
$$C_{13} = A_{11}B_{13} + A_{12}B_{23} = 1\cdot(-5) + 3\cdot(-1) = -8,$$
$$C_{21} = A_{21}B_{11} + A_{22}B_{21} = -4\cdot 2 + 9\cdot 1 = 1,$$
$$\vdots$$
$$C_{33} = A_{31}B_{13} + A_{32}B_{23} = 0\cdot(-5) + (-3)\cdot(-1) = 3.$$
Note that because $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$, $C \in \mathbb{R}^{3\times 3}$.
This is very different from
$$D = BA = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} = \begin{pmatrix} -10 & 48 \\ 1 & 6 \end{pmatrix},$$


where each entry is calculated as
$$D_{11} = \sum_{k=1}^3 B_{1k} A_{k1} = B_{11}A_{11} + B_{12}A_{21} + B_{13}A_{31} = 2\cdot 1 + 3\cdot(-4) + (-5)\cdot 0 = -10,$$
$$\vdots$$
$$D_{22} = B_{21}A_{12} + B_{22}A_{22} + B_{23}A_{32} = 1\cdot 3 + 0\cdot 9 + (-1)\cdot(-3) = 6.$$
Note that because $B \in \mathbb{R}^{2\times 3}$ and $A \in \mathbb{R}^{3\times 2}$, $D \in \mathbb{R}^{2\times 2}$. Clearly, $C = AB \ne BA = D$; $C$ and $D$ in fact have different dimensions. Thus, the matrix-matrix product is not commutative in general, even if both $AB$ and $BA$ make sense.

Example 16.2.3 inner product as matrix-matrix product
The inner product of two vectors can be considered as a special case of the matrix-matrix product. Let
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix}.$$
We have $v, w \in \mathbb{R}^3$ ($= \mathbb{R}^{3\times 1}$). Taking the transpose, we have $v^{\rm T} \in \mathbb{R}^{1\times 3}$. Noting that the second dimension of $v^{\rm T}$ and the first dimension of $w$ match, we can perform the matrix-matrix product,
$$\beta = v^{\rm T} w = \begin{pmatrix} 1 & 3 & 6 \end{pmatrix}\begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix} = 1\cdot(-2) + 3\cdot 0 + 6\cdot 4 = 22.$$

Example 16.2.4 outer product
The outer product of two vectors is yet another special case of the matrix-matrix product. The outer product $B$ of two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ is defined as
$$B = v w^{\rm T}.$$
Because $v \in \mathbb{R}^{m\times 1}$ and $w^{\rm T} \in \mathbb{R}^{1\times m}$, the matrix-matrix product $v w^{\rm T}$ is well-defined and yields an $m\times m$ matrix. As in the previous example, let
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix}.$$
The outer product of the two vectors is given by
$$v w^{\rm T} = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}\begin{pmatrix} -2 & 0 & 4 \end{pmatrix} = \begin{pmatrix} -2 & 0 & 4 \\ -6 & 0 & 12 \\ -12 & 0 & 24 \end{pmatrix}.$$
Clearly, $\beta = v^{\rm T} w \ne v w^{\rm T} = B$, as they even have different dimensions.
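Both special cases are single expressions in Matlab; a sketch with the vectors of Examples 16.2.3 and 16.2.4:

    % inner and outer products as special cases of the matrix-matrix product
    v = [1; 3; 6];
    w = [-2; 0; 4];
    beta = v.'*w       % 1x1: inner product = 22
    B    = v*w.'       % 3x3: outer product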


In the above example, we saw that $AB \ne BA$ in general. In fact, $AB$ might not even be allowed even if $BA$ is allowed (consider $A \in \mathbb{R}^{2\times 1}$ and $B \in \mathbb{R}^{3\times 2}$). However, although the matrix-matrix product is not commutative in general, the matrix-matrix product is associative, i.e.
$$ABC = A(BC) = (AB)C.$$
Moreover, the matrix-matrix product is also distributive, i.e.
$$(A + B)C = AC + BC.$$

Proof. The associative and distributive properties of the matrix-matrix product are readily proven from its definition. For associativity, we consider the $ij$-entry of the $m_1\times n_3$ matrix $A(BC)$, i.e.
$$(A(BC))_{ij} = \sum_{k=1}^{n_1} A_{ik}(BC)_{kj} = \sum_{k=1}^{n_1} A_{ik}\left( \sum_{l=1}^{n_2} B_{kl}C_{lj} \right) = \sum_{k=1}^{n_1}\sum_{l=1}^{n_2} A_{ik}B_{kl}C_{lj} = \sum_{l=1}^{n_2}\sum_{k=1}^{n_1} A_{ik}B_{kl}C_{lj}$$
$$= \sum_{l=1}^{n_2}\left( \sum_{k=1}^{n_1} A_{ik}B_{kl} \right)C_{lj} = \sum_{l=1}^{n_2} (AB)_{il}C_{lj} = ((AB)C)_{ij}, \quad \forall\, i, j.$$
Since the equality $(A(BC))_{ij} = ((AB)C)_{ij}$ holds for all entries, we have $A(BC) = (AB)C$.

The distributive property can also be proven directly. The $ij$-entry of $(A+B)C$ can be expressed as
$$((A+B)C)_{ij} = \sum_{k=1}^{n_1} (A+B)_{ik}C_{kj} = \sum_{k=1}^{n_1} (A_{ik} + B_{ik})C_{kj} = \sum_{k=1}^{n_1} (A_{ik}C_{kj} + B_{ik}C_{kj})$$
$$= \sum_{k=1}^{n_1} A_{ik}C_{kj} + \sum_{k=1}^{n_1} B_{ik}C_{kj} = (AC)_{ij} + (BC)_{ij}, \quad \forall\, i, j.$$
Again, since the equality holds for all entries, we have $(A+B)C = AC + BC$.

Another useful rule concerning the matrix-matrix product and the transpose operation is
$$(AB)^{\rm T} = B^{\rm T} A^{\rm T}.$$
This rule is used very often.

Proof. The proof follows by checking the components of each side. The left-hand side yields
$$((AB)^{\rm T})_{ij} = (AB)_{ji} = \sum_{k=1}^{n_1} A_{jk}B_{ki}.$$
The right-hand side yields
$$(B^{\rm T} A^{\rm T})_{ij} = \sum_{k=1}^{n_1} (B^{\rm T})_{ik}(A^{\rm T})_{kj} = \sum_{k=1}^{n_1} B_{ki}A_{jk} = \sum_{k=1}^{n_1} A_{jk}B_{ki}.$$


Thus, we have
$$((AB)^{\rm T})_{ij} = (B^{\rm T} A^{\rm T})_{ij}, \quad i = 1,\dots,n_2, \ j = 1,\dots,m_1.$$

16.2.3 Interpretations of the Matrix-Vector Product
Let us consider a special case of the matrix-matrix product: the matrix-vector product. The special case arises when the second matrix has only one column. Then, with $A \in \mathbb{R}^{m\times n}$ and $w = B \in \mathbb{R}^{n\times 1} = \mathbb{R}^n$, we have
$$C = AB, \quad \text{where} \quad C_{ij} = \sum_{k=1}^n A_{ik}B_{kj} = \sum_{k=1}^n A_{ik}w_k, \quad i = 1,\dots,m, \ j = 1.$$
Since $C \in \mathbb{R}^{m\times 1} = \mathbb{R}^m$, we can introduce $v \in \mathbb{R}^m$ and concisely write the matrix-vector product as
$$v = Aw, \quad \text{where} \quad v_i = \sum_{k=1}^n A_{ik}w_k, \quad i = 1,\dots,m.$$
Expanding the summation, we can think of the matrix-vector product as
$$v_1 = A_{11}w_1 + A_{12}w_2 + \cdots + A_{1n}w_n$$
$$v_2 = A_{21}w_1 + A_{22}w_2 + \cdots + A_{2n}w_n$$
$$\vdots$$
$$v_m = A_{m1}w_1 + A_{m2}w_2 + \cdots + A_{mn}w_n.$$
Now, we consider two different interpretations of the matrix-vector product.

Row Interpretation
The first interpretation is the row interpretation, where we consider the matrix-vector multiplication as a series of inner products. In particular, we consider $v_i$ as the inner product of the $i$-th row of $A$ and $w$. In other words, the vector $v$ is computed entry by entry in the sense that
$$v_i = \begin{pmatrix} A_{i1} & A_{i2} & \cdots & A_{in} \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}, \quad i = 1,\dots,m.$$


Example 16.2.5 row interpretation of matrix-vector product
An example of the row interpretation of the matrix-vector product is
$$v = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 0 & 0 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}(3,\,2,\,1)^{\rm T} \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$

Column Interpretation
The second interpretation is the column interpretation, where we consider the matrix-vector multiplication as a sum of $n$ vectors corresponding to the $n$ columns of the matrix, i.e.
$$v = \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix}w_1 + \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix}w_2 + \cdots + \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix}w_n.$$
In this case, we consider $v$ as a linear combination of columns of $A$ with coefficients $w$. Hence $v = Aw$ is simply another way to write a linear combination of vectors: the columns of $A$ are the vectors, and $w$ contains the coefficients.

Example 16.2.6 column interpretation of matrix-vector product
An example of the column interpretation of the matrix-vector product is
$$v = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 3\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + 1\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$
Clearly, the outcome of the matrix-vector product is identical to that computed using the row interpretation.
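A short Matlab sketch showing that the built-in product and the two interpretations agree for the matrix of Examples 16.2.5 and 16.2.6:

    % row and column interpretations of the matrix-vector product
    A = [0 1 0; 1 0 0; 0 0 0; 0 0 1];
    x = [3; 2; 1];
    v = A*x                                             % = [2; 3; 0; 1]
    v_row = [A(1,:)*x; A(2,:)*x; A(3,:)*x; A(4,:)*x];   % row interpretation
    v_col = x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3);    % column interpretation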

Left Vector-Matrix Product
We now consider another special case of the matrix-matrix product: the left vector-matrix product. This special case arises when the first matrix only has one row. Then, we have $A \in \mathbb{R}^{1\times m}$ and $B \in \mathbb{R}^{m\times n}$. Let us denote the matrix $A$, which is a row vector, by $w^{\rm T}$. Clearly, $w \in \mathbb{R}^m$, because $w^{\rm T} \in \mathbb{R}^{1\times m}$. The left vector-matrix product yields
$$v = w^{\rm T} B, \quad \text{where} \quad v_j = \sum_{k=1}^m w_k B_{kj}, \quad j = 1,\dots,n.$$


The resultant vector $v$ is a row vector in $\mathbb{R}^{1\times n}$. The left vector-matrix product can also be interpreted in two different manners. The first interpretation considers the product as a series of dot products, where each entry $v_j$ is computed as a dot product of $w$ with the $j$-th column of $B$, i.e.
$$v_j = \begin{pmatrix} w_1 & w_2 & \cdots & w_m \end{pmatrix}\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{mj} \end{pmatrix}, \quad j = 1,\dots,n.$$
The second interpretation considers the left vector-matrix product as a linear combination of rows of $B$, i.e.
$$v = w_1\begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1n} \end{pmatrix} + w_2\begin{pmatrix} B_{21} & B_{22} & \cdots & B_{2n} \end{pmatrix} + \cdots + w_m\begin{pmatrix} B_{m1} & B_{m2} & \cdots & B_{mn} \end{pmatrix}.$$

16.2.4 Interpretations of the Matrix-Matrix Product
Similar to the matrix-vector product, the matrix-matrix product can be interpreted in a few different ways. Throughout the discussion, we assume $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{n_1\times n_2}$ and hence $C = AB \in \mathbb{R}^{m_1\times n_2}$.

Matrix-Matrix Product as a Series of Matrix-Vector Products
One interpretation of the matrix-matrix product is to consider it as computing $C$ one column at a time, where the $j$-th column of $C$ results from the matrix-vector product of the matrix $A$ with the $j$-th column of $B$, i.e.
$$C_{\cdot j} = A B_{\cdot j}, \quad j = 1,\dots,n_2,$$
where $C_{\cdot j}$ refers to the $j$-th column of $C$. In other words,
$$\begin{pmatrix} C_{1j} \\ C_{2j} \\ \vdots \\ C_{m_1 j} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1 n_1} \\ A_{21} & A_{22} & \cdots & A_{2 n_1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1} & A_{m_1 2} & \cdots & A_{m_1 n_1} \end{pmatrix}\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{n_1 j} \end{pmatrix}, \quad j = 1,\dots,n_2.$$

Example 16.2.7 matrix-matrix product as a series of matrix-vector products
Let us consider matrices $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$ with
$$A = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}.$$
The first column of $C = AB \in \mathbb{R}^{3\times 3}$ is given by
$$C_{\cdot 1} = A B_{\cdot 1} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \\ -3 \end{pmatrix}.$$


Similarly, the second and third columns are given by
$$C_{\cdot 2} = A B_{\cdot 2} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -12 \\ 0 \end{pmatrix}$$
and
$$C_{\cdot 3} = A B_{\cdot 3} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} -5 \\ -1 \end{pmatrix} = \begin{pmatrix} -8 \\ 11 \\ 3 \end{pmatrix}.$$
Putting the columns of $C$ together,
$$C = \begin{pmatrix} C_{\cdot 1} & C_{\cdot 2} & C_{\cdot 3} \end{pmatrix} = \begin{pmatrix} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{pmatrix}.$$

Matrix-Matrix Product as a Series of Left Vector-Matrix Products
In the previous interpretation, we performed the matrix-matrix product by constructing the resultant matrix one column at a time. We can also use a series of left vector-matrix products to construct the resultant matrix one row at a time. Namely, in $C = AB$, the $i$-th row of $C$ results from the left vector-matrix product of the $i$-th row of $A$ with the matrix $B$, i.e.
$$C_{i\cdot} = A_{i\cdot} B, \quad i = 1,\dots,m_1,$$
where $C_{i\cdot}$ refers to the $i$-th row of $C$. In other words,
$$\begin{pmatrix} C_{i1} & \cdots & C_{i n_2} \end{pmatrix} = \begin{pmatrix} A_{i1} & \cdots & A_{i n_1} \end{pmatrix}\begin{pmatrix} B_{11} & \cdots & B_{1 n_2} \\ \vdots & \ddots & \vdots \\ B_{n_1 1} & \cdots & B_{n_1 n_2} \end{pmatrix}, \quad i = 1,\dots,m_1.$$

16.2.5 Operation Count of Matrix-Matrix Product
Matrix-matrix product is ubiquitous in scientific computing, and significant effort has been put into efficient performance of the operation on modern computers. Let us now count the number of additions and multiplications required to compute this product. Consider multiplication of $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{n_1\times n_2}$. To compute $C = AB$, we perform
$$C_{ij} = \sum_{k=1}^{n_1} A_{ik}B_{kj}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_2.$$
Computing each $C_{ij}$ requires $n_1$ multiplications and $n_1$ additions, yielding the total of $2n_1$ operations. We must perform this for $m_1 n_2$ entries in $C$. Thus, the total operation count for computing $C$ is $2 m_1 n_1 n_2$. Considering the matrix-vector product and the inner product as special cases of the matrix-matrix product, we can summarize how the operation count scales.


Operation        Sizes                             Operation count
Matrix-matrix    m1 = n1 = m2 = n2 = n             2n^3
Matrix-vector    m1 = n1 = m2 = n,  n2 = 1         2n^2
Inner product    n1 = m2 = n,  m1 = n2 = 1         2n

The operation count is measured in FLoating Point Operations, or FLOPs. (Note FLOPS is different from FLOPs: FLOPS refers to FLoating Point Operations per Second, which is a speed associated with a particular computer/hardware and a particular implementation of an algorithm.)

16.2.6 The Inverse of a Matrix (Briefly)
We have now studied the matrix-vector product, in which, given a vector $x \in \mathbb{R}^n$, we calculate a new vector $b = Ax$, where $A \in \mathbb{R}^{n\times n}$ and hence $b \in \mathbb{R}^n$. We may think of this as a "forward problem," in which given $x$ we calculate $b = Ax$. We can now also ask about the corresponding "inverse problem": given $b$, can we find $x$ such that $Ax = b$? Note in this section, and for reasons which shall become clear shortly, we shall exclusively consider square matrices, and hence we set $m = n$.

To begin, let us revert to the scalar case. If $b$ is a scalar and $a$ is a non-zero scalar, we know that the (very simple linear) equation $ax = b$ has the solution $x = b/a$. We may write this more suggestively as $x = a^{-1}b$ since of course $a^{-1} = 1/a$. It is important to note that the equation $ax = b$ has a solution only if $a$ is non-zero; if $a$ is zero, then of course there is no $x$ such that $ax = b$. (This is not quite true: in fact, if $b = 0$ and $a = 0$ then $ax = b$ has an infinity of solutions, namely any value of $x$. We discuss this singular but solvable case in more detail in Unit V.)

We can now proceed to the matrix case "by analogy." The matrix equation $Ax = b$ can of course be viewed as a system of linear equations in $n$ unknowns. The first equation states that the inner product of the first row of $A$ with $x$ must equal $b_1$; in general, the $i$-th equation states that the inner product of the $i$-th row of $A$ with $x$ must equal $b_i$. Then if $A$ is "non-zero" we could plausibly expect that $x = A^{-1}b$. This statement is clearly deficient in two related ways: what do we mean when we say a matrix is non-zero? and what do we in fact mean by $A^{-1}$?

As regards the first question, $Ax = b$ will have a solution when $A$ is non-singular: non-singular is the proper extension of the scalar concept of "non-zero" in this linear systems context. Conversely, if $A$ is singular then (except for special $b$) $Ax = b$ will have no solution: singular is the proper extension of the scalar concept of "zero" in this linear systems context. How can we determine if a matrix $A$ is singular? Unfortunately, it is not nearly as simple as verifying, say, that the matrix consists of at least one non-zero entry, or contains all non-zero entries.

There are a variety of ways to determine whether a matrix is non-singular, many of which may only make good sense in later chapters (in particular, in Unit V): a non-singular $n\times n$ matrix $A$ has $n$ independent columns (or, equivalently, $n$ independent rows); a non-singular $n\times n$ matrix $A$ has all non-zero eigenvalues; a non-singular matrix $A$ has a non-zero determinant (perhaps this condition is closest to the scalar case, but it is also perhaps the least useful); a non-singular matrix $A$ has all non-zero pivots in a (partially pivoted) LU decomposition process (described in Unit V). For now, we shall simply assume that $A$ is non-singular. (We should also emphasize that in the numerical context we must be concerned not only with matrices which might be singular but also with matrices which are "almost" singular in some appropriate sense.) As regards the second question, we must first introduce the identity matrix, $I$.

Let us now define an identity matrix. The identity matrix is an $m\times m$ square matrix with ones on the diagonal and zeros elsewhere, i.e.
$$I_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}.$$


Identity matrices in $\mathbb{R}^1$, $\mathbb{R}^2$, and $\mathbb{R}^3$ are
$$I = \begin{pmatrix} 1 \end{pmatrix}, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The identity matrix is conventionally denoted by $I$. If $v \in \mathbb{R}^m$, the $i$-th entry of $Iv$ is
$$(Iv)_i = \sum_{k=1}^m I_{ik}v_k = I_{i1}v_1 + \cdots + I_{i,i-1}v_{i-1} + I_{ii}v_i + I_{i,i+1}v_{i+1} + \cdots + I_{im}v_m = v_i, \quad i = 1,\dots,m,$$
since every term except $I_{ii}v_i$ vanishes. So, we have $Iv = v$. Following the same argument, we also have $v^{\rm T} I = v^{\rm T}$. In essence, $I$ is the $m$-dimensional version of "one."

We may then define $A^{-1}$ as that (unique) matrix such that $A^{-1}A = I$. (Of course in the scalar case, this defines $a^{-1}$ as the unique scalar such that $a^{-1}a = 1$ and hence $a^{-1} = 1/a$.) In fact, $A^{-1}A = I$ and also $AA^{-1} = I$, and thus this is a case in which matrix multiplication does indeed commute. We can now derive the result $x = A^{-1}b$: we begin with $Ax = b$ and multiply both sides by $A^{-1}$ to obtain $A^{-1}Ax = A^{-1}b$ or, since the matrix product is associative, $x = A^{-1}b$. Of course this definition of $A^{-1}$ does not yet tell us how to find $A^{-1}$: we shall shortly consider this question from a pragmatic Matlab perspective and then in Unit V from a more fundamental numerical linear algebra perspective. We should note here, however, that the matrix inverse is very rarely computed or used in practice, for reasons we will understand in Unit V. Nevertheless, the inverse can be quite useful for very small systems ($n$ small) and of course more generally as a central concept in the consideration of linear systems of equations.

Example 16.2.8 The inverse of a 2x2 matrix
We consider here the case of a $2\times 2$ matrix $A$ which we write as
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
If the columns are to be independent we must have $a/b \ne c/d$, or $(ad)/(bc) \ne 1$, or $ad - bc \ne 0$, which in fact is the condition that the determinant of $A$ is nonzero. The inverse of $A$ is then given by
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Note that this inverse is only defined if $ad - bc \ne 0$, and we thus see the necessity that $A$ is non-singular. It is a simple matter to show by explicit matrix multiplication that $A^{-1}A = AA^{-1} = I$, as desired.

16.3 Special Matrices
Let us now introduce a few special matrices that we shall encounter frequently in numerical methods.


16.3.1 Diagonal Matrices
A square matrix $A$ is said to be diagonal if the off-diagonal entries are zero, i.e.
$$A_{ij} = 0, \quad i \ne j.$$

Example 16.3.1 diagonal matrices
Examples of diagonal matrices are
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 4 \end{pmatrix}.$$
The identity matrix is a special case of a diagonal matrix with all the entries in the diagonal equal to 1. Any $1\times 1$ matrix is trivially diagonal as it does not have any off-diagonal entries.

16.3.2 Symmetric Matrices
A square matrix $A$ is said to be symmetric if the off-diagonal entries are symmetric about the diagonal, i.e.
$$A_{ij} = A_{ji}, \quad i = 1,\dots,m, \ j = 1,\dots,m.$$
The equivalent statement is that $A$ is not changed by the transpose operation, i.e.
$$A^{\rm T} = A.$$
We note that the identity matrix is a special case of a symmetric matrix. Let us look at a few more examples.

Example 16.3.2 Symmetric matrices
Examples of symmetric matrices are
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & \pi & 3 \\ \pi & 1 & -1 \\ 3 & -1 & 7 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 4 \end{pmatrix}.$$
Note that any scalar, or a $1\times 1$ matrix, is trivially symmetric and unchanged under transpose.

16.3.3 Symmetric Positive Definite Matrices
An $m\times m$ square matrix $A$ is said to be symmetric positive definite (SPD) if it is symmetric and furthermore satisfies
$$v^{\rm T} A v > 0, \quad \forall\, v \in \mathbb{R}^m \ (v \ne 0).$$
Before we discuss its properties, let us give an example of an SPD matrix.


Example 16.3.3 Symmetric positive definite matrices
An example of a symmetric positive definite matrix is
$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$
We can confirm that $A$ is symmetric by inspection. To check if $A$ is positive definite, let us consider the quadratic form
$$q(v) \equiv v^{\rm T} A v = \sum_{i=1}^2 v_i\left( \sum_{j=1}^2 A_{ij}v_j \right) = \sum_{i=1}^2\sum_{j=1}^2 A_{ij}v_i v_j = A_{11}v_1^2 + A_{12}v_1 v_2 + A_{21}v_2 v_1 + A_{22}v_2^2 = A_{11}v_1^2 + 2A_{12}v_1 v_2 + A_{22}v_2^2,$$
where the last equality follows from the symmetry condition $A_{12} = A_{21}$. Substituting the entries of $A$,
$$q(v) = v^{\rm T} A v = 2v_1^2 - 2v_1 v_2 + 2v_2^2 = 2\left( v_1 - \frac{1}{2}v_2 \right)^2 - \frac{1}{2}v_2^2 + 2v_2^2 = 2\left( v_1 - \frac{1}{2}v_2 \right)^2 + \frac{3}{2}v_2^2.$$
Because $q(v)$ is a sum of two positive terms (each squared), it is non-negative. It is equal to zero only if
$$v_1 - \frac{1}{2}v_2 = 0 \quad \text{and} \quad \frac{3}{2}v_2^2 = 0.$$
The second condition requires $v_2 = 0$, and the first condition with $v_2 = 0$ requires $v_1 = 0$. Thus, we have
$$q(v) = v^{\rm T} A v > 0, \quad \forall\, v \in \mathbb{R}^2,$$
and $v^{\rm T} A v = 0$ only if $v = 0$. Thus $A$ is symmetric positive definite.

Symmetric positive definite matrices are encountered in many areas of engineering and science. They arise naturally in the numerical solution of, for example, the heat equation, the wave equation, and the linear elasticity equations. One important property of symmetric positive definite matrices is that they are always invertible: $A^{-1}$ always exists. Thus, if $A$ is an SPD matrix, then, for any $b$, there is always a unique $x$ such that
$$Ax = b.$$
In a later unit, we will discuss techniques for solution of linear systems, such as the one above. For now, we just note that there are particularly efficient techniques for solving the system when the matrix is symmetric positive definite.
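As a numerical aside, a quick sanity check of the SPD property in Matlab is to confirm symmetry and compute the eigenvalues (a concept treated only in a later unit; this sketch is offered merely as a check, not as the argument used above):

    % numerical check that A of Example 16.3.3 is SPD
    A = [2 -1; -1 2];
    isequal(A, A')     % symmetry: true
    eig(A)             % eigenvalues 1 and 3, both positive, consistent with SPD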


16.3.4 Triangular Matrices
Triangular matrices are square matrices whose entries are all zeros either below or above the diagonal. An $m\times m$ square matrix is said to be upper triangular if all entries below the diagonal are zero, i.e.
$$A_{ij} = 0, \quad i > j.$$
A square matrix is said to be lower triangular if all entries above the diagonal are zero, i.e.
$$A_{ij} = 0, \quad j > i.$$
We will see later that a linear system, $Ax = b$, in which $A$ is a triangular matrix is particularly easy to solve. Furthermore, the linear system is guaranteed to have a unique solution as long as all diagonal entries are nonzero.

Example 16.3.4 triangular matrices
Examples of upper triangular matrices are
$$A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$
Examples of lower triangular matrices are
$$C = \begin{pmatrix} 1 & 0 \\ 7 & 6 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 2 & 0 & 0 \\ 7 & 5 & 0 \\ 3 & 1 & 4 \end{pmatrix}.$$

Begin Advanced Material

16.3.5 Orthogonal Matrices
An $m\times m$ square matrix $Q$ is said to be orthogonal if its columns form an orthonormal set. That is, if we denote the $j$-th column of $Q$ by $q_j$, we have
$$Q = \begin{pmatrix} q_1 & q_2 & \cdots & q_m \end{pmatrix}, \quad \text{where} \quad q_i^{\rm T} q_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}.$$
Orthogonal matrices have a special property,
$$Q^{\rm T} Q = I.$$
This relationship follows directly from the fact that the columns of $Q$ form an orthonormal set. Recall that the $ij$ entry of $Q^{\rm T}Q$ is the inner product of the $i$-th row of $Q^{\rm T}$ (which is the $i$-th column of $Q$) and the $j$-th column of $Q$. Thus,
$$(Q^{\rm T}Q)_{ij} = q_i^{\rm T} q_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases},$$


which is the definition of the identity matrix. Orthogonal matrices also satisfy
$$QQ^{\rm T} = I,$$
which in fact is a minor miracle.

Example 16.3.5 Orthogonal matrices
Examples of orthogonal matrices are
$$Q = \begin{pmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{pmatrix} \quad \text{and} \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
We can easily verify that the columns of the matrix $Q$ are orthogonal to each other and each is of unit length. Thus, $Q$ is an orthogonal matrix. We can also directly confirm that $Q^{\rm T}Q = QQ^{\rm T} = I$. Similarly, the identity matrix is trivially orthogonal.

Let us discuss a few important properties of orthogonal matrices. First, the action by an orthogonal matrix preserves the 2-norm of a vector, i.e.
$$\|Qx\|_2 = \|x\|_2, \quad \forall\, x \in \mathbb{R}^m.$$
This follows directly from the definition of the 2-norm and the fact that $Q^{\rm T}Q = I$, i.e.
$$\|Qx\|_2^2 = (Qx)^{\rm T}(Qx) = x^{\rm T}Q^{\rm T}Qx = x^{\rm T}Ix = x^{\rm T}x = \|x\|_2^2.$$
Second, orthogonal matrices are always invertible. In fact, solving a linear system defined by an orthogonal matrix is trivial because
$$Qx = b \;\Rightarrow\; Q^{\rm T}Qx = Q^{\rm T}b \;\Rightarrow\; x = Q^{\rm T}b.$$
In considering linear spaces, we observed that a basis provides a unique description of vectors in $V$ in terms of the coefficients. As the columns of $Q$ form an orthonormal set of $m$ $m$-vectors, they can be thought of as a basis of $\mathbb{R}^m$. In solving $Qx = b$, we are finding the representation of $b$ in the coefficients of $\{q_1, \dots, q_m\}$. Thus, the operation by $Q^{\rm T}$ (or $Q$) represents a simple coordinate transformation. Let us solidify this idea by showing that a rotation matrix in $\mathbb{R}^2$ is an orthogonal matrix.

Example 16.3.6 Rotation matrix
Rotation of a vector is equivalent to representing the vector in a rotated coordinate system. A rotation matrix that rotates a vector in $\mathbb{R}^2$ by angle $\theta$ is
$$R(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}.$$
Let us verify that the rotation matrix is orthogonal for any $\theta$. The two columns are orthogonal because
$$r_1^{\rm T} r_2 = \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix}\begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix} = -\cos(\theta)\sin(\theta) + \sin(\theta)\cos(\theta) = 0, \quad \forall\, \theta.$$
Each column is of unit length because
$$\|r_1\|_2^2 = (\cos(\theta))^2 + (\sin(\theta))^2 = 1, \qquad \|r_2\|_2^2 = (-\sin(\theta))^2 + (\cos(\theta))^2 = 1, \quad \forall\, \theta.$$
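A small Matlab sketch of Example 16.3.6, checking orthogonality and the norm-preservation property for one (arbitrarily chosen) angle:

    % the rotation matrix is orthogonal for any theta
    theta = pi/6;                             % an arbitrary angle
    R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
    R'*R                                      % identity (up to round-off)
    norm(R*[1; 2]) - norm([1; 2])             % ~ 0: the 2-norm is preserved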


    Thus, the columns of the rotation matrix is orthonormal, and the matrix is orthogonal. Thisresult verifies that the action of the orthogonal matrix represents a coordinate transformation inR

    2. The interpretationofanorthogonalmatrixasacoordinatetransformationreadilyextendstohigher-dimensionalspaces.

16.3.6 Orthonormal Matrices

Let us define orthonormal matrices to be m×n matrices whose columns form an orthonormal set, i.e.

    Q = [q_1  q_2  ...  q_n] ,

with

    q_i^T q_j = 1 if i = j,  0 if i ≠ j .

Note that, unlike an orthogonal matrix, we do not require the matrix to be square. Just like orthogonal matrices, we have

    Q^T Q = I ,

where I is an n×n matrix. The proof is identical to that for the orthogonal matrix. However, Q Q^T does not yield an identity matrix,

    Q Q^T ≠ I ,

unless of course m = n.

Example 16.3.7 orthonormal matrices
An example of an orthonormal matrix is

    Q = [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] .

We can verify that Q^T Q = I because

    Q^T Q = [1/√6  2/√6  1/√6; 2/√5  -1/√5  0] [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] = [1 0; 0 1] .

However, Q Q^T ≠ I because

    Q Q^T = [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] [1/√6  2/√6  1/√6; 2/√5  -1/√5  0] = [29/30  -1/15  1/6; -1/15  13/15  1/3; 1/6  1/3  1/6] .
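The same check can be carried out numerically; the short sketch below (an added illustration, not part of the original notes) uses the 3×2 matrix Q of the example.

```python
import numpy as np

Q = np.array([[1/np.sqrt(6),  2/np.sqrt(5)],
              [2/np.sqrt(6), -1/np.sqrt(5)],
              [1/np.sqrt(6),  0.0]])

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q is the 2x2 identity
print(np.allclose(Q @ Q.T, np.eye(3)))   # False: Q Q^T is not the 3x3 identity
```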

End Advanced Material


Begin Advanced Material

16.4 Further Concepts in Linear Algebra

16.4.1 Column Space and Null Space

Let us introduce more concepts in linear algebra. First is the column space. The column space of a matrix A is the space of vectors that can be expressed as Ax. From the column interpretation of the matrix-vector product, we recall that Ax is a linear combination of the columns of A with the weights provided by x. We will denote the column space of A ∈ R^{m×n} by col(A), and the space is defined as

    col(A) = {v ∈ R^m : v = Ax for some x ∈ R^n} .

The column space of A is also called the image of A, img(A), or the range of A, range(A).

The second concept is the null space. The null space of A ∈ R^{m×n} is denoted by null(A) and is defined as

    null(A) = {x ∈ R^n : Ax = 0} ,

i.e., the null space of A is the space of vectors that result in Ax = 0. Recalling the column interpretation of the matrix-vector product and the definition of linear independence, we note that the columns of A must be linearly dependent in order for A to have a non-trivial null space. The null space defined above is more formally known as the right null space and is also called the kernel of A, ker(A).

Example 16.4.1 column space and null space
Let us consider a 3×2 matrix

    A = [0 2; 1 0; 0 0] .

The column space of A is the set of vectors representable as Ax, which are

    Ax = [0 2; 1 0; 0 0] [x_1; x_2] = x_1 [0; 1; 0] + x_2 [2; 0; 0] = [2x_2; x_1; 0] .

So, the column space of A is the set of vectors with arbitrary values in the first two entries and zero in the third entry. That is, col(A) is the 1-2 plane in R^3.

Because the columns of A are linearly independent, the only way to realize Ax = 0 is if x is the zero vector. Thus, the null space of A consists of the zero vector only.

Let us now consider a 2×3 matrix

    B = [1 2 0; 2 -1 3] .

The column space of B consists of vectors of the form

    Bx = [1 2 0; 2 -1 3] [x_1; x_2; x_3] = [x_1 + 2x_2; 2x_1 - x_2 + 3x_3] .

By judiciously choosing x_1, x_2, and x_3, we can express any vector in R^2. Thus, the column space of B is the entire R^2, i.e., col(B) = R^2.

Because the columns of B are not linearly independent, we expect B to have a nontrivial null space. Invoking the row interpretation of the matrix-vector product, a vector x in the null space must satisfy

    x_1 + 2x_2 = 0   and   2x_1 - x_2 + 3x_3 = 0 .

The first equation requires x_1 = -2x_2. The combination of the first requirement and the second equation yields x_3 = (5/3)x_2. Thus, the null space of B is

    null(B) = {α [-2; 1; 5/3] : α ∈ R} .

Thus, the null space is a one-dimensional space (i.e., a line) in R^3.
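These conclusions can be checked numerically; the sketch below (an added illustration, not from the original notes) confirms that the columns of A are independent, that B has only two independent columns, and that the vector (-2, 1, 5/3) indeed lies in null(B).

```python
import numpy as np

A = np.array([[0., 2.],
              [1., 0.],
              [0., 0.]])
B = np.array([[1.,  2., 0.],
              [2., -1., 3.]])

print(np.linalg.matrix_rank(A))   # 2: both columns independent, so null(A) = {0}
print(np.linalg.matrix_rank(B))   # 2: only 2 independent columns out of 3

x = np.array([-2., 1., 5./3.])    # candidate null-space vector of B
print(np.allclose(B @ x, 0.0))    # True: x lies in null(B)
```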

16.4.2 Projectors

Another important concept, in particular for the least squares covered in Chapter 17, is the concept of a projector. A projector is a square matrix P that is idempotent, i.e.

    P^2 = P P = P .

Let v be an arbitrary vector in R^m. The projector P projects v, which is not necessarily in col(P), onto col(P), i.e.

    w = Pv ∈ col(P),  ∀v ∈ R^m .

In addition, the projector P does not modify a vector that is already in col(P). This is easily verified because

    Pw = PPv = Pv = w,  ∀w ∈ col(P) .

Intuitively, a projector projects a vector v ∈ R^m onto a smaller space col(P). If the vector is already in col(P), then it is left unchanged.

The complementary projector of P is the projector I - P. It is easy to verify that I - P is a projector itself because

    (I - P)^2 = (I - P)(I - P) = I - 2P + PP = I - P .

It can be shown that the complementary projector I - P projects onto the null space of P, null(P).

When the space along which the projector projects is orthogonal to the space onto which the projector projects, the projector is said to be an orthogonal projector. Algebraically, orthogonal projectors are symmetric.

When an orthonormal basis for a space is available, it is particularly simple to construct an orthogonal projector onto the space. Say {q_1, ..., q_n} is an orthonormal basis for an n-dimensional subspace of R^m, n < m. Given any v ∈ R^m, we recall that

    u_i = q_i^T v


is the component of v in the direction of q_i represented in the basis {q_i}. We then introduce the vector

    w_i = q_i (q_i^T v) ;

the sum of such vectors would produce the projection of v ∈ R^m onto V spanned by {q_i}. More compactly, if we form an m×n matrix

    Q = [q_1  ...  q_n] ,

then the projection of v onto the column space of Q is

    w = Q(Q^T v) = (Q Q^T) v .

We recognize that the orthogonal projector onto the span of {q_i}, or col(Q), is

    P = Q Q^T .

Of course P is symmetric, (Q Q^T)^T = (Q^T)^T Q^T = Q Q^T, and idempotent, (Q Q^T)(Q Q^T) = Q (Q^T Q) Q^T = Q Q^T.
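The construction is easy to try out. The sketch below (added for illustration, not part of the original notes) builds P = Q Q^T from an orthonormal basis and checks that it is symmetric, idempotent, and leaves vectors in col(Q) unchanged; the basis is obtained here from NumPy's QR factorization of a random matrix, which is an assumed convenience rather than anything in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns spanning an n-dim subspace

P = Q @ Q.T                                       # orthogonal projector onto col(Q)

print(np.allclose(P, P.T))       # True: symmetric
print(np.allclose(P @ P, P))     # True: idempotent
w = Q @ rng.standard_normal(n)   # a vector already in col(Q)
print(np.allclose(P @ w, w))     # True: left unchanged by P
```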

End Advanced Material


Chapter 17
Least Squares

17.1 Data Fitting in Absence of Noise and Bias

We motivate our discussion by reconsidering the friction coefficient example of Chapter 15. We recall that, according to Amontons, the static friction, F_f,static, and the applied normal force, F_normal,applied, are related by

    F_f,static ≤ μ_s F_normal,applied ;

here μ_s is the coefficient of friction, which is only dependent on the two materials in contact. In particular, the maximum static friction is a linear function of the applied normal force, i.e.

    F_f,static^max = μ_s F_normal,applied .

We wish to deduce μ_s by measuring the maximum static friction attainable for several different values of the applied normal force.

Our approach to this problem is to first choose the form of a model based on physical principles and then deduce the parameters based on a set of measurements. In particular, let us consider a simple affine model

    y = Y_model(x; β) = β_0 + β_1 x .

The variable y is the predicted quantity, or the output, which is the maximum static friction F_f,static^max. The variable x is the independent variable, or the input, which is the maximum normal force F_normal,applied. The function Y_model is our predictive model which is parameterized by a parameter β = (β_0, β_1). Note that Amontons' law is a particular case of our general affine model with β_0 = 0 and β_1 = μ_s. If we take m noise-free measurements and Amontons' law is exact, then we expect

    (F_f,static^max)_i = μ_s (F_normal,applied)_i,  i = 1, ..., m .

The equation should be satisfied exactly for each one of the m measurements. Accordingly, there is also a unique solution to our model-parameter identification problem

    y_i = β_0 + β_1 x_i,  i = 1, ..., m ,


with the solution given by β_0^true = 0 and β_1^true = μ_s.

Because the dependency of the output y on the model parameters {β_0, β_1} is linear, we can write the system of equations as a m×2 matrix equation

    [1 x_1; 1 x_2; ... ; 1 x_m] [β_0; β_1] = [y_1; y_2; ... ; y_m] ,

where the m×2 matrix is denoted X, the parameter vector β, and the right-hand side Y; or, more compactly,

    X β = Y .

Using the row interpretation of matrix-vector multiplication, we immediately recover the original set of equations,

    X β = [β_0 + β_1 x_1; β_0 + β_1 x_2; ... ; β_0 + β_1 x_m] = [y_1; y_2; ... ; y_m] = Y .

Or, using the column interpretation, we see that our parameter fitting problem corresponds to choosing the two weights for the two m-vectors to match the right-hand side,

    X β = β_0 [1; 1; ... ; 1] + β_1 [x_1; x_2; ... ; x_m] = [y_1; y_2; ... ; y_m] = Y .

We emphasize that the linear system X β = Y is overdetermined, i.e., has more equations than unknowns (m > n). (We study this in more detail in the next section.) However, we can still find a solution to the system because the following two conditions are satisfied:

Unbiased: Our model includes the true functional dependence y = μ_s x, and thus the model is capable of representing this true underlying functional dependence. This would not be the case if, for example, we consider a constant model y(x) = β_0, because our model would be incapable of representing the linear dependence of the friction force on the normal force. Clearly the assumption of no bias is a very strong assumption given the complexity of the physical world.

Noise free: We have perfect measurements: each measurement y_i corresponding to the independent variable x_i provides the exact value of the friction force. Obviously this is again rather naive and will need to be relaxed.

Under these assumptions, there exists a parameter β^true that completely describes the measurements, i.e.

    y_i = Y_model(x_i; β^true),  i = 1, ..., m .


(The β^true will be unique if the columns of X are independent.) Consequently, our predictive model is perfect, and we can exactly predict the experimental output for any choice of x, i.e.

    Y(x) = Y_model(x; β^true),  ∀x ,

where Y(x) is the experimental measurement corresponding to the condition described by x. However, in practice, the bias-free and noise-free assumptions are rarely satisfied, and our model is never a perfect predictor of the reality.

In Chapter 19, we will develop a probabilistic tool for quantifying the effect of noise and bias; the current chapter focuses on developing a least-squares technique for solving an overdetermined linear system (in the deterministic context), which is essential to solving these data fitting problems. In particular we will consider a strategy for solving overdetermined linear systems of the form

    B z = g ,

where B ∈ R^{m×n}, z ∈ R^n, and g ∈ R^m with m > n.

Before we discuss the least-squares strategy, let us consider another example of overdetermined systems in the context of polynomial fitting. Let us consider a particle experiencing constant acceleration, e.g. due to gravity. We know that the position y of the particle at time t is described by a quadratic function

    y(t) = (1/2) a t^2 + v_0 t + y_0 ,

where a is the acceleration, v_0 is the initial velocity, and y_0 is the initial position. Suppose that we do not know the three parameters a, v_0, and y_0 that govern the motion of the particle and we are interested in determining the parameters. We could do this by first measuring the position of the particle at several different times and recording the pairs {t_i, y(t_i)}. Then, we could fit our measurements to the quadratic model to deduce the parameters.

The problem of finding the parameters that govern the motion of the particle is a special case of a more general problem: polynomial fitting. Let us consider a quadratic polynomial, i.e.

    y(x) = β_0^true + β_1^true x + β_2^true x^2 ,

where β^true = {β_0^true, β_1^true, β_2^true} is the set of true parameters characterizing the modeled phenomenon. Suppose that we do not know β^true but we do know that our output depends on the input x in a quadratic manner. Thus, we consider a model of the form

    Y_model(x; β) = β_0 + β_1 x + β_2 x^2 ,

and we determine the coefficients by measuring the output y for several different values of x. We are free to choose the number of measurements m and the measurement points x_i, i = 1, ..., m. In particular, upon choosing the measurement points and taking a measurement at each point, we obtain a system of linear equations,

    y_i = Y_model(x_i; β) = β_0 + β_1 x_i + β_2 x_i^2,  i = 1, ..., m ,

where y_i is the measurement corresponding to the input x_i.

Note that the equation is linear in our unknowns {β_0, β_1, β_2} (the appearance of x_i^2 only affects the manner in which the data enters the equation). Because the dependency on the parameters is


linear, we can write the system as a matrix equation,

    [y_1; y_2; ... ; y_m] = [1 x_1 x_1^2; 1 x_2 x_2^2; ... ; 1 x_m x_m^2] [β_0; β_1; β_2] ,

where the left-hand side is denoted Y, the m×3 matrix X, and the parameter vector β; or, more compactly,

    Y = X β .

Note that this particular matrix X has a rather special structure: each row forms a geometric series and the ij-th entry is given by X_ij = x_i^(j-1). Matrices with this structure are called Vandermonde matrices.

As in the friction coefficient example considered earlier, the row interpretation of the matrix-vector product recovers the original set of equations,

    Y = [y_1; y_2; ... ; y_m] = [β_0 + β_1 x_1 + β_2 x_1^2; β_0 + β_1 x_2 + β_2 x_2^2; ... ; β_0 + β_1 x_m + β_2 x_m^2] = X β .

With the column interpretation, we immediately recognize that this is a problem of finding the three coefficients, or parameters, of the linear combination that yields the desired m-vector Y, i.e.

    Y = [y_1; y_2; ... ; y_m] = β_0 [1; 1; ... ; 1] + β_1 [x_1; x_2; ... ; x_m] + β_2 [x_1^2; x_2^2; ... ; x_m^2] = X β .

We know that if we have three or more non-degenerate measurements (i.e., m ≥ 3), then we can find the unique solution to the linear system. Moreover, the solution is the coefficients of the underlying

polynomial, (β_0^true, β_1^true, β_2^true).

Example 17.1.1 A quadratic polynomial
Let us consider a more specific case, where the underlying polynomial is of the form

    y(x) = 1/2 + 2/3 x - 1/8 c x^2 .

We recognize that y(x) = Y_model(x; β^true) for Y_model(x; β) = β_0 + β_1 x + β_2 x^2 and the true parameters

    β_0^true = 1/2,   β_1^true = 2/3,   and   β_2^true = -1/8 c .

The parameter c controls the degree of quadratic dependency; in particular, c = 0 results in an affine function.

First, we consider the case with c = 1, which results in a strong quadratic dependency, i.e., β_2^true = -1/8. The result of measuring y at three non-degenerate points (m = 3) is shown in Figure 17.1(a). Solving the 3×3 linear system with the coefficients as the unknown, we obtain

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/8 .
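The m = 3 computation can be reproduced with a short NumPy sketch (added here for illustration, not part of the original notes); the measurement points are arbitrary assumed values, not the ones used in the figure.

```python
import numpy as np

beta_true = np.array([1/2, 2/3, -1/8])            # the c = 1 case
x = np.array([0.5, 1.5, 2.5])                     # three non-degenerate (assumed) points
y = beta_true[0] + beta_true[1]*x + beta_true[2]*x**2   # noise-free measurements

X = np.column_stack([np.ones_like(x), x, x**2])   # 3x3 Vandermonde matrix
beta = np.linalg.solve(X, y)                      # exact solution of the square system
print(beta)                                       # recovers [0.5, 0.6667, -0.125] = beta_true
```

With exact (noise-free) data and an unbiased model, any three distinct points recover the true coefficients, which is precisely the point of the example.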


Figure 17.1: Deducing the coefficients of a polynomial with a strong quadratic dependence. (a) m = 3; (b) m = 7. (Each panel plots the measurements y_meas and the fit y against x.)

Not surprisingly, we can find the true coefficients of the quadratic equation using three data points.

Suppose we take more measurements. An example of taking seven measurements (m = 7) is shown in Figure 17.1(b). We now have seven data points and three unknowns, so we must solve the 7×3 linear system, i.e., find the set β = {β_0, β_1, β_2} that satisfies all seven equations. The solution to the linear system, of course, is given by

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/8 .

The result is correct (β = β^true) and, in particular, no different from the result for the m = 3 case.

We can modify the underlying polynomial slightly and repeat the same exercise. For example, let us consider the case with c = 1/10, which results in a much weaker quadratic dependency of y on x, i.e., β_2^true = -1/80. As shown in Figure 17.2, we can take either m = 3 or m = 7 measurements. Similar to the c = 1 case, we identify the true coefficients,

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/80 ,

using either m = 3 or m = 7 (in fact using any three or more non-degenerate measurements).

In the friction coefficient determination and the (particle motion) polynomial identification problems, we have seen that we can find a solution to the m×n overdetermined system (m > n) if

(a) our model includes the underlying input-output functional dependence (no bias);

(b) the measurements are perfect (no noise).

As already stated, in practice, these two assumptions are rarely satisfied; i.e., models are often (in fact, always) incomplete and measurements are often inaccurate. (For example, in our particle motion model, we have neglected friction.) We can still construct an m×n linear system Bz = g using our model and measurements, but the solution to the system in general does not exist. Knowing that we cannot find the solution to the overdetermined linear system, our objective is


Figure 17.2: Deducing the coefficients of a polynomial with a weak quadratic dependence. (a) m = 3; (b) m = 7. (Each panel plots the measurements y_meas and the fit y against x.)

to find a solution that comes close to satisfying the system. In the following section, we will define the notion of closeness suitable for our analysis and introduce a general procedure for finding the closest solution to a general overdetermined system of the form

    B z = g ,

where B ∈ R^{m×n} with m > n. We will subsequently address the meaning and interpretation of this (non-)solution.

17.2 Overdetermined Systems

Let us consider an overdetermined linear system, such as the one arising from the regression example earlier, of the form

    B z = g ,

or, more explicitly,

    [B_11 B_12; B_21 B_22; B_31 B_32] [z_1; z_2] = [g_1; g_2; g_3] .

Our objective is to find z that makes the three-component vector equation true, i.e., find the solution to the linear system. In Chapter 16, we considered the forward problem of matrix-vector multiplication in which, given z, we calculate g = Bz. We also briefly discussed the inverse problem in which, given g, we would like to find z. But for m ≠ n, B^{-1} does not exist; as discussed in the previous section, there may be no z that satisfies Bz = g. Thus, we will need to look for a z that satisfies the equation "closely", in the sense we must specify and interpret. This is the focus of this section.^1

^1 Note later (in Unit V) we shall look at the ostensibly simpler case in which B is square and a solution z exists and is even unique. But, for many reasons, overdetermined systems are a nicer place to start.


Row Interpretation

Let us consider a row interpretation of the overdetermined system. Satisfying the linear system requires

    B_i1 z_1 + B_i2 z_2 = g_i,  i = 1, 2, 3 .

Note that each of these equations defines a line in R^2. Thus, satisfying the three equations is equivalent to finding a point that is shared by all three lines, which in general is not possible, as we will demonstrate in an example.

Example 17.2.1 row interpretation of overdetermined system
Let us consider an overdetermined system

    [1 2; 2 1; 2 -3] [z_1; z_2] = [5/2; 2; -2] .

Using the row interpretation of the linear system, we see there are three linear equations to be satisfied. The set of points x = (x_1, x_2) that satisfies the first equation,

    1x_1 + 2x_2 = 5/2 ,

forms a line

    L_1 = {(x_1, x_2) : 1x_1 + 2x_2 = 5/2}

in the two-dimensional space. Similarly, the sets of points that satisfy the second and third equations form lines described by

    L_2 = {(x_1, x_2) : 2x_1 + 1x_2 = 2}   and   L_3 = {(x_1, x_2) : 2x_1 - 3x_2 = -2} .

These sets of points in L_1, L_2, and L_3, or the lines, are shown in Figure 17.3(a).

The solution to the linear system must satisfy each of the three equations, i.e., belong to all three lines. This means that there must be an intersection of all three lines and, if it exists, the solution is the intersection. This linear system has the solution

    z = [1/2; 1] .

However, three lines intersecting in R^2 is a rare occurrence; in fact the right-hand side of the system was chosen carefully so that the system has a solution in this example. If we perturb either the matrix or the right-hand side of the system, it is likely that the three lines will no longer intersect.

A more typical overdetermined system is the following system,

    [1 2; 2 1; 2 -3] [z_1; z_2] = [0; 2; 4] .

Again, interpreting the matrix equation as a system of three linear equations, we can illustrate the set of points that satisfy each equation as a line in R^2 as shown in Figure 17.3(b). There is no solution to this overdetermined system, because there is no point (z_1, z_2) that belongs to all three lines, i.e., the three lines do not intersect at a point.
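Both observations can be checked numerically. The sketch below (an added illustration, not from the original notes) uses the standard rank test: the system Bz = g is consistent exactly when the augmented matrix [B g] has the same rank as B.

```python
import numpy as np

B = np.array([[1.,  2.],
              [2.,  1.],
              [2., -3.]])

g1 = np.array([5/2, 2., -2.])   # carefully chosen right-hand side: a solution exists
g2 = np.array([0., 2., 4.])     # typical right-hand side: no solution

for g in (g1, g2):
    rank_B   = np.linalg.matrix_rank(B)
    rank_Bg  = np.linalg.matrix_rank(np.column_stack([B, g]))
    print(rank_Bg == rank_B)    # True for g1, False for g2
```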


Figure 17.3: Illustration of the row interpretation of the overdetermined systems; each line L_1, L_2, L_3 is the set of points that satisfies B_i x = g_i, i = 1, 2, 3. (a) system with a solution; (b) system without a solution.

Column Interpretation

Let us now consider a column interpretation of the overdetermined system. Satisfying the linear system requires

    z_1 [B_11; B_21; B_31] + z_2 [B_12; B_22; B_32] = [g_1; g_2; g_3] .

In other words, we consider a linear combination of two vectors in R^3 and try to match the right-hand side g ∈ R^3. The vectors span at most a plane in R^3, so there is no weight (z_1, z_2) that makes the equation hold unless the vector g happens to lie in the plane. To clarify the idea, let us consider a specific example.

Example 17.2.2 column interpretation of overdetermined system
For simplicity, let us consider the following special case:

    [1 0; 0 1; 0 0] [z_1; z_2] = [1; 3/2; 2] .

The column interpretation results in

    z_1 [1; 0; 0] + z_2 [0; 1; 0] = [1; 3/2; 2] .

By changing z_1 and z_2, we can move in the plane

    z_1 [1; 0; 0] + z_2 [0; 1; 0] = [z_1; z_2; 0] .


Figure 17.4: Illustration of the column interpretation of the overdetermined system, showing span(col(B)), g, and Bz*.

Clearly, if g_3 ≠ 0, it is not possible to find z_1 and z_2 that satisfy the linear equation, Bz = g. In other words, g must lie in the plane spanned by the columns of B, which is the 1-2 plane in this case.

Figure 17.4 illustrates the column interpretation of the overdetermined system. The vector g ∈ R^3 does not lie in the space spanned by the columns of B, thus there is no solution to the

system. However, if g_3 is small, then we can find a z* such that Bz* is close to g, i.e., a good approximation to g. Such an approximation is shown in the figure, and the next section discusses how to find such an approximation.

17.3 Least Squares

17.3.1 Measures of Closeness

In the previous section, we observed that it is in general not possible to find a solution to an overdetermined system. Our aim is thus to find z such that Bz is close to g, i.e., z such that

    Bz ≈ g ,

for B ∈ R^{m×n}, m > n. For convenience, let us introduce the residual function, which is defined as

    r(z) ≡ g - Bz .

Note that

    r_i = g_i - (Bz)_i = g_i - Σ_{j=1}^{n} B_ij z_j,  i = 1, ..., m .

Thus, r_i is the extent to which the i-th equation (Bz)_i = g_i is not satisfied. In particular, if r_i(z) = 0, i = 1, ..., m, then Bz = g and z is the solution to the linear system. We note that the residual is a measure of closeness described by m values. It is more convenient to have a single scalar value for assessing the extent to which the equation is satisfied. A simple way to achieve this is to take a norm of the residual vector. Different norms result in different measures of closeness, which in turn produce different best-fit solutions.


Begin Advanced Material

Let us consider first two examples, neither of which we will pursue in this chapter.

Example 17.3.1 ℓ1 minimization
The first method is based on measuring the residual in the 1-norm. The scalar representing the extent of mismatch is

    J_1(z) ≡ ||r(z)||_1 = Σ_{i=1}^{m} |r_i(z)| = Σ_{i=1}^{m} |(g - Bz)_i| .

The best z, denoted by z*, is the z that minimizes the extent of mismatch measured in J_1(z), i.e.

    z* = arg min_{z ∈ R^n} J_1(z) .

The arg min_{z ∈ R^n} J_1(z) returns the argument z that minimizes the function J_1(z). In other words, z* satisfies

    J_1(z*) ≤ J_1(z),  ∀z ∈ R^n .

This minimization problem can be formulated as a linear programming problem. The minimizer is not necessarily unique and the solution procedure is not as simple as that resulting from the 2-norm. Thus, we will not pursue this option here.

Example 17.3.2 ℓ∞ minimization
The second method is based on measuring the residual in the ∞-norm. The scalar representing the extent of mismatch is

    J_∞(z) ≡ ||r(z)||_∞ = max_{i=1,...,m} |r_i(z)| = max_{i=1,...,m} |(g - Bz)_i| .

The best z* that minimizes J_∞(z) is

    z* = arg min_{z ∈ R^n} J_∞(z) .

This so-called min-max problem can also be cast as a linear programming problem. Again, this procedure is rather complicated, and the solution is not necessarily unique.

End Advanced Material

17.3.2 Least-Squares Formulation (ℓ2 minimization)

Minimizing the residual measured in (say) the 1-norm or ∞-norm results in a linear programming problem that is not so easy to solve. Here we will show that measuring the residual in the 2-norm results in a particularly simple minimization problem. Moreover, the solution to the minimization problem is unique assuming that the matrix B is full rank, that is, has n independent columns. We shall assume that B does indeed have independent columns.


The scalar function representing the extent of mismatch for ℓ2 minimization is

    J_2(z) ≡ ||r(z)||_2^2 = r^T(z) r(z) = (g - Bz)^T (g - Bz) .

Note that we consider the square of the 2-norm for convenience, rather than the 2-norm itself. Our objective is to find z* such that

    z* = arg min_{z ∈ R^n} J_2(z) ,

which is equivalent to finding z* with

    ||g - Bz*||_2^2 = J_2(z*) < J_2(z) = ||g - Bz||_2^2,  ∀z ≠ z* .

(Note arg min refers to the argument which minimizes: so min is the minimum and arg min is the minimizer.) Note that we can write our objective function J_2(z) as

    J_2(z) = ||r(z)||_2^2 = r^T(z) r(z) = Σ_{i=1}^{m} (r_i(z))^2 .

In other words, our objective is to minimize the sum of the squares of the residuals, i.e., least squares. Thus, we say that z* is the least-squares solution to the overdetermined system Bz = g: z* is that z which makes J_2(z), the sum of the squares of the residuals, as small as possible.

Note that if Bz = g does have a solution, the least-squares solution is the solution to the overdetermined system. If z is the solution, then r = g - Bz = 0 and in particular J_2(z) = 0, which is the minimum value that J_2 can take. Thus, the solution z is the minimizer of J_2: z = z*. Let us now derive a procedure for solving the least-squares problem for a more general case where Bz = g does not have a solution.

For convenience, we drop the subscript 2 of the objective function J_2, and simply denote it by J. Again, our objective is to find z* such that

    J(z*) < J(z),  ∀z ≠ z* .

Expanding out the expression for J(z), we have

    J(z) = (g - Bz)^T (g - Bz) = (g^T - (Bz)^T)(g - Bz)
         = g^T (g - Bz) - (Bz)^T (g - Bz)
         = g^T g - g^T Bz - (Bz)^T g + (Bz)^T (Bz)
         = g^T g - g^T Bz - z^T B^T g + z^T B^T B z ,

where we have used the transpose rule which tells us that (Bz)^T = z^T B^T. We note that g^T Bz is a scalar, so it does not change under the transpose operation. Thus, g^T Bz can be expressed as

    g^T Bz = (g^T Bz)^T = z^T B^T g ,

again by the transpose rule. The function J thus simplifies to

    J(z) = g^T g - 2 z^T B^T g + z^T B^T B z .

For convenience, let us define N ≡ B^T B ∈ R^{n×n}, so that

    J(z) = g^T g - 2 z^T B^T g + z^T N z .


It is simple to confirm that each term in the above expression is indeed a scalar.

The solution to the minimization problem is given by

    N z* = d ,

where d = B^T g. The equation is called the "normal equation", which can be written out as

    [N_11 N_12 ... N_1n; N_21 N_22 ... N_2n; ... ; N_n1 N_n2 ... N_nn] [z_1*; z_2*; ... ; z_n*] = [d_1; d_2; ... ; d_n] .

The existence and uniqueness of z* is guaranteed assuming that the columns of B are independent. We provide below the proof that z* is the unique minimizer of J(z) in the case in which B has independent columns.

Proof. We first show that the normal matrix N is symmetric positive definite, i.e.

    x^T N x > 0,  ∀x ∈ R^n (x ≠ 0) ,

assuming the columns of B are linearly independent. The normal matrix N = B^T B is symmetric because

    N^T = (B^T B)^T = B^T (B^T)^T = B^T B = N .

To show N is positive definite, we first observe that

    x^T N x = x^T B^T B x = (Bx)^T (Bx) = ||Bx||^2 .

That is, x^T N x is the square of the 2-norm of Bx. Recall that the norm of a vector is zero if and only if the vector is the zero vector. In our case,

    x^T N x = 0   if and only if   Bx = 0 .

Because the columns of B are linearly independent, Bx = 0 if and only if x = 0. Thus, we have

    x^T N x = ||Bx||^2 > 0,  ∀x ≠ 0 .

Thus, N is symmetric positive definite.

Now recall that the function to be minimized is

    J(z) = g^T g - 2 z^T B^T g + z^T N z .

If z* minimizes the function, then for any δz ≠ 0, we must have

    J(z*) < J(z* + δz) .

Let us expand J(z* + δz):

    J(z* + δz) = g^T g - 2(z* + δz)^T B^T g + (z* + δz)^T N (z* + δz)
               = g^T g - 2 z*^T B^T g + z*^T N z* - 2 δz^T B^T g + δz^T N z* + z*^T N δz + δz^T N δz
               = J(z*) + 2 δz^T (N z* - B^T g) + δz^T N δz ,

where the first three terms in the second line form J(z*) and we have used z*^T N δz = (z*^T N δz)^T = δz^T N^T z* = δz^T N z*.


Note that N^T = N because N^T = (B^T B)^T = B^T B = N. If z* satisfies the normal equation, N z* = B^T g, then

    N z* - B^T g = 0 ,

and thus

    J(z* + δz) = J(z*) + δz^T N δz .

The second term is always positive because N is positive definite. Thus, we have

    J(z* + δz) > J(z*),  ∀δz ≠ 0 ,

or, setting δz = z - z*,

    J(z*) < J(z),  ∀z ≠ z* .

Thus, z* satisfying the normal equation N z* = B^T g is the minimizer of J, i.e., the least-squares solution to the overdetermined system Bz = g.

Example 17.3.3 2×1 least-squares and its geometric interpretation
Consider a simple case of an overdetermined system,

    B = [2; 1],   g = [1; 2] .

Because the system is 2×1, there is a single scalar parameter, z, to be chosen. To obtain the normal equation, we first construct the matrix N and the vector d (both of which are simply scalars for this problem):

    N = B^T B = [2 1] [2; 1] = 5 ,
    d = B^T g = [2 1] [1; 2] = 4 .

Solving the normal equation, we obtain the least-squares solution

    N z* = d   ⇒   5 z* = 4   ⇒   z* = 4/5 .

This choice of z* yields

    B z* = [2; 1] (4/5) = [8/5; 4/5] ,

which of course is different from g.

The process is illustrated in Figure 17.5. The span of the column of B is the line parameterized by [2 1]^T z, z ∈ R. Recall that the solution Bz* is the point on the line that is closest to g in the least-squares sense, i.e.

    ||Bz* - g||_2 < ||Bz - g||_2,  ∀z ≠ z* .


Figure 17.5: Illustration of least-squares in R^2, showing span(col(B)), g, and Bz*.

Recalling that the ℓ2 distance is the usual Euclidean distance, we expect the closest point to be the orthogonal projection of g onto the line span(col(B)). The figure confirms that this indeed is the case. We can verify this algebraically,

    B^T (Bz* - g) = [2 1] ( (4/5)[2; 1] - [1; 2] ) = [2 1] [3/5; -6/5] = 0 .

Thus, the residual vector Bz* - g and the column space of B are orthogonal to each other. While the geometric illustration of orthogonality may be difficult for a higher-dimensional least squares, the orthogonality condition can be checked systematically using the algebraic method.
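The entire example can be reproduced in a few lines of NumPy (an added illustration, not part of the original notes); the same least-squares solution is also returned by the library routine np.linalg.lstsq, which is mentioned here only as a cross-check.

```python
import numpy as np

B = np.array([[2.],
              [1.]])
g = np.array([1., 2.])

N = B.T @ B                      # normal matrix, here the scalar 5
d = B.T @ g                      # right-hand side, here the scalar 4
z_star = np.linalg.solve(N, d)   # z* = 4/5
print(z_star)                    # [0.8]

print(np.allclose(B.T @ (B @ z_star - g), 0.0))   # True: residual orthogonal to col(B)

z_lstsq, *_ = np.linalg.lstsq(B, g, rcond=None)   # library least-squares solve
print(z_lstsq)                   # [0.8], same minimizer
```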

17.3.3 Computational Considerations

Let us analyze the computational cost of solving the least-squares system. The first step is the formulation of the normal matrix,

    N = B^T B ,

which requires a matrix-matrix multiplication of B^T ∈ R^{n×m} and B ∈ R^{m×n}. Because N is symmetric, we only need to compute the upper triangular part of N, which corresponds to performing n(n+1)/2 m-vector inner products. Thus, the computational cost is mn(n+1). Forming the right-hand side,