
DRAFT V1.2

From Math, Numerics, & Programming (for Mechanical Engineers)

Masayuki Yano, James Douglass Penn, George Konidaris, Anthony T Patera

September 2012

© The Authors. License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (CC BY-NC-SA 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and MIT OpenCourseWare source are credited; the use is non-commercial; and the CC BY-NC-SA license is retained. See also http://ocw.mit.edu/terms/ and http://creativecommons.org/licenses/by-nc-sa/3.0/us/ .

Contents

Unit III  Linear Algebra 1: Matrices and Least Squares. Regression.

15  Motivation

16  Matrices and Vectors: Definitions and Operations
    16.1  Basic Vector and Matrix Operations
        16.1.1  Definitions
            Transpose Operation
        16.1.2  Vector Operations
            Inner Product
            Norm (2-Norm)
            Orthogonality
            Orthonormality
        16.1.3  Linear Combinations
            Linear Independence
            Vector Spaces and Bases
    16.2  Matrix Operations
        16.2.1  Interpretation of Matrices
        16.2.2  Matrix Operations
            Matrix-Matrix Product
        16.2.3  Interpretations of the Matrix-Vector Product
            Row Interpretation
            Column Interpretation
            Left Vector-Matrix Product
        16.2.4  Interpretations of the Matrix-Matrix Product
            Matrix-Matrix Product as a Series of Matrix-Vector Products
            Matrix-Matrix Product as a Series of Left Vector-Matrix Products
        16.2.5  Operation Count of Matrix-Matrix Product
        16.2.6  The Inverse of a Matrix (Briefly)
    16.3  Special Matrices
        16.3.1  Diagonal Matrices
        16.3.2  Symmetric Matrices
        16.3.3  Symmetric Positive Definite Matrices
        16.3.4  Triangular Matrices
        16.3.5  Orthogonal Matrices
        16.3.6  Orthonormal Matrices
    16.4  Further Concepts in Linear Algebra
        16.4.1  Column Space and Null Space
        16.4.2  Projectors

17  Least Squares
    17.1  Data Fitting in Absence of Noise and Bias
    17.2  Overdetermined Systems
            Row Interpretation
            Column Interpretation
    17.3  Least Squares
        17.3.1  Measures of Closeness
        17.3.2  Least-Squares Formulation (l2 minimization)
        17.3.3  Computational Considerations
            QR Factorization and the Gram-Schmidt Procedure
        17.3.4  Interpretation of Least Squares: Projection
        17.3.5  Error Bounds for Least Squares
            Error Bounds with Respect to Perturbation in Data, g (constant model)
            Error Bounds with Respect to Perturbation in Data, g (general)
            Error Bounds with Respect to Reduction in Space, B

18  Matlab Linear Algebra (Briefly)
    18.1  Matrix Multiplication (and Addition)
    18.2  The Matlab Inverse Function: inv
    18.3  Solution of Linear Systems: Matlab Backslash
    18.4  Solution of (Linear) Least-Squares Problems

19  Regression: Statistical Inference
    19.1  Simplest Case
        19.1.1  Friction Coefficient Determination Problem Revisited
        19.1.2  Response Model
        19.1.3  Parameter Estimation
        19.1.4  Confidence Intervals
            Individual Confidence Intervals
            Joint Confidence Intervals
        19.1.5  Hypothesis Testing
        19.1.6  Inspection of Assumptions
            Checking for Plausibility of the Noise Assumptions
            Checking for Presence of Bias
    19.2  General Case
        19.2.1  Response Model
        19.2.2  Estimation
        19.2.3  Confidence Intervals
        19.2.4  Overfitting (and Underfitting)

Unit III

Linear Algebra 1: Matrices and Least Squares. Regression.


Chapter 15

Motivation

In odometry-based mobile robot navigation, the accuracy of the robot's dead reckoning pose tracking depends on minimizing slippage between the robot's wheels and the ground. Even a momentary slip can lead to an error in heading that will cause the error in the robot's location estimate to grow linearly over its journey. It is thus important to determine the friction coefficient between the robot's wheels and the ground, which directly affects the robot's resistance to slippage. Just as importantly, this friction coefficient will significantly affect the performance of the robot: the ability to push loads.

When the mobile robot of Figure 15.1 is commanded to move forward, a number of forces come into play. Internally, the drive motors exert a torque (not shown in the figure) on the wheels, which is resisted by the friction force $F_f$ between the wheels and the ground. If the magnitude of $F_f$ dictated by the sum of the drag force $F_{\rm drag}$ (a combination of all forces resisting the robot's motion) and the product of the robot's mass and acceleration is less than the maximum static friction force $F^{\max}_{f,\,{\rm static}}$ between the wheels and the ground, the wheels will roll without slipping and the robot will move forward with velocity $v = r\omega_{\rm wheel}$. If, however, the magnitude of $F_f$ reaches $F^{\max}_{f,\,{\rm static}}$, the wheels will begin to slip and $F_f$ will drop to a lower level $F_{f,\,{\rm kinetic}}$, the kinetic friction force. The wheels will continue to slip ($v < r\omega_{\rm wheel}$) ...


[Figure 15.3 plot: $F_{\rm friction}$ (Newtons) vs. time (seconds) for a 500 gram load, showing $F_{f,\,{\rm static}} = F_{\rm tangential,\,applied} \le \mu_s F_{\rm normal}$, the maximum $F^{\max,\,{\rm meas}}_{f,\,{\rm static}} = \mu_s F_{\rm normal} + \text{experimental error}$, and the lower kinetic level $F_{f,\,{\rm kinetic}} = \mu_k F_{\rm normal} + \text{experimental error}$.]

Figure 15.2: Experimental setup for friction measurement: Force transducer (A) is connected to contact area (B) by a thin wire. Normal force is exerted on the contact area by load stack (C). Tangential force is applied using turntable (D) via the friction between the turntable surface and the contact area. Apparatus and photograph courtesy of James Penn.

Figure 15.3: Sample data for one friction measurement, yielding one data point for $F^{\max,\,{\rm meas}}_{f,\,{\rm static}}$. Data courtesy of James Penn.

... the maximum static friction force. We expect that

$$F^{\max}_{f,\,{\rm static}} = \mu_s F_{\rm normal,\,rear}\,, \qquad (15.1)$$
where $\mu_s$ is the static coefficient of friction and $F_{\rm normal,\,rear}$ is the normal force from the ground on the rear, driving, wheels. In order to minimize the risk of slippage (and to be able to push large loads), robot wheels should be designed for a high value of $\mu_s$ between the wheels and the ground. This value, although difficult to predict accurately by modeling, can be determined by experiment.

We first conduct experiments for the friction force $F^{\max}_{f,\,{\rm static}}$ (in Newtons) as a function of normal load $F_{\rm normal,\,applied}$ (in Newtons) and (nominal) surface area of contact $A_{\rm surface}$ (in cm$^2$) with the friction turntable apparatus depicted in Figure 15.2. Weights permit us to vary the normal load and washer inserts permit us to vary the nominal surface area of contact. A typical experiment (at a particular prescribed value of $F_{\rm normal,\,applied}$ and $A_{\rm surface}$) yields the time trace of Figure 15.3, from which $F^{\max,\,{\rm meas}}_{f,\,{\rm static}}$ (our measurement of $F^{\max}_{f,\,{\rm static}}$) is deduced as the maximum of the response.

We next postulate a dependence (or model)
$$F^{\max}_{f,\,{\rm static}}(F_{\rm normal,\,applied},\, A_{\rm surface}) = \beta_0 + \beta_1 F_{\rm normal,\,applied} + \beta_2 A_{\rm surface}\,, \qquad (15.2)$$
where we expect, but do not a priori assume, from Messieurs Amontons and Coulomb that $\beta_0 = 0$ and $\beta_2 = 0$ (and of course $\beta_1 \equiv \mu_s$). In order to confirm that $\beta_0 = 0$ and $\beta_2 = 0$, or at least confirm that $\beta_0 = 0$ and $\beta_2 = 0$ is not untrue, and to find a good estimate for $\beta_1 \equiv \mu_s$, we must appeal to our measurements.

The mathematical techniques by which to determine $\mu_s$ (and $\beta_0$, $\beta_2$) with some confidence from noisy experimental data are known as regression, which is the subject of Chapter 19. Regression, in turn, is best described in the language of linear algebra (Chapter 16), and is built upon the linear algebra concept of least squares (Chapter 17).
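As a preview of those chapters, a minimal Matlab sketch of how a model such as (15.2) might be fit by least squares is given below; the data values are hypothetical placeholders, not the measurements of Figure 15.3, and the backslash solver is discussed in Chapter 18.

    % hypothetical measurements (placeholders, not the data of Figure 15.3)
    Fnormal  = [5; 10; 15; 20; 25];        % applied normal loads (N)
    Asurface = [1.5; 1.5; 3.0; 3.0; 4.5];  % nominal contact areas (cm^2)
    Ffmax    = [2.1; 4.0; 6.2; 8.1; 10.0]; % measured max static friction (N)
    X    = [ones(size(Fnormal)), Fnormal, Asurface];  % columns for beta0, beta1, beta2
    beta = X \ Ffmax;      % least-squares estimate of (beta0, beta1, beta2)
    mu_s = beta(2)         % beta1 is the estimate of the friction coefficient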


Chapter 16

Matrices and Vectors: Definitions and Operations

16.1 Basic Vector and Matrix Operations

16.1.1 Definitions
Let us first introduce the primitive objects in linear algebra: vectors and matrices. An $m$-vector $v \in \mathbb{R}^{m\times 1}$ consists of $m$ real numbers,¹
$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix}.$$
It is also called a column vector, which is the default vector in linear algebra. Thus, by convention, $v \in \mathbb{R}^m$ implies that $v$ is a column vector in $\mathbb{R}^{m\times 1}$. Note that we use subscript $(\cdot)_i$ to address the $i$-th component of a vector. The other kind of vector is a row vector $v \in \mathbb{R}^{1\times n}$ consisting of $n$ entries
$$v = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}.$$
Let us consider a few examples of column and row vectors.

Example 16.1.1 vectors
Examples of (column) vectors in $\mathbb{R}^3$ are
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}, \quad u = \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 9.1 \\ 7/3 \\ \pi \end{pmatrix}.$$

¹ The concept of vectors readily extends to complex numbers, but we only consider real vectors in our presentation of this chapter.

To address a specific component of the vectors, we write, for example, $v_1 = 1$, $u_1 = 3$, and $w_3 = \pi$. Examples of row vectors in $\mathbb{R}^{1\times 4}$ are
$$v = \begin{pmatrix} \sqrt{2} & -5 & 2 & e \end{pmatrix} \quad \text{and} \quad u = \begin{pmatrix} \pi & 1 & 1 & 0 \end{pmatrix}.$$
Some of the components of these row vectors are $v_2 = -5$ and $u_4 = 0$.

A matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}.$$
Extending the convention for addressing an entry of a vector, we use subscript $(\cdot)_{ij}$ to address the entry on the $i$-th row and $j$-th column. Note that the order in which the row and column are referred follows that for describing the size of the matrix. Thus, $A \in \mathbb{R}^{m\times n}$ consists of entries
$$A_{ij}, \quad i = 1,\dots,m, \ \ j = 1,\dots,n.$$
Sometimes it is convenient to think of a (column) vector as a special case of a matrix with only one column, i.e., $n = 1$. Similarly, a (row) vector can be thought of as a special case of a matrix with $m = 1$. Conversely, an $m\times n$ matrix can be viewed as $m$ row $n$-vectors or $n$ column $m$-vectors, as we discuss further below.

Example 16.1.2 matrices
Examples of matrices are
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 1 \\ -2 & 8 & 1 \\ 0 & 3 & 0 \end{pmatrix}.$$
The matrix $A$ is a $3\times 2$ matrix ($A \in \mathbb{R}^{3\times 2}$) and matrix $B$ is a $3\times 3$ matrix ($B \in \mathbb{R}^{3\times 3}$). We can also address specific entries as, for example, $A_{12} = \sqrt{3}$, $A_{21} = -4$, and $B_{32} = 3$.
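As an aside, vectors and matrices of this kind are entered in Matlab (see Chapter 18) as bracketed arrays, and their entries are addressed by the same subscripts; a brief sketch using the matrix $A$ above:

    % entering a vector and a matrix; addressing entries by subscripts
    v = [1; 3; 6];                  % a column 3-vector
    A = [1 sqrt(3); -4 9; pi -3];   % a 3x2 matrix
    v(1)        % first component of v
    A(1,2)      % entry on the 1st row, 2nd column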

While vectors and matrices may appear like arrays of numbers, linear algebra defines a special set of rules to manipulate these objects. One such operation is the transpose operation considered next.

Transpose Operation
The first linear algebra operator we consider is the transpose operator, denoted by superscript $(\cdot)^{\rm T}$. The transpose operator swaps the rows and columns of the matrix. That is, if $B = A^{\rm T}$ with $A \in \mathbb{R}^{m\times n}$, then
$$B_{ij} = A_{ji}, \quad 1 \le i \le n, \ 1 \le j \le m.$$
Because the rows and columns of the matrix are swapped, the dimensions of the matrix are also swapped, i.e., if $A \in \mathbb{R}^{m\times n}$ then $B \in \mathbb{R}^{n\times m}$.

If we swap the rows and columns twice, then we return to the original matrix. Thus, the transpose of a transposed matrix is the original matrix, i.e.
$$(A^{\rm T})^{\rm T} = A.$$


Example 16.1.3 transpose
Let us consider a few examples of the transpose operation. A matrix $A$ and its transpose $B = A^{\rm T}$ are related by
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & -4 & \pi \\ \sqrt{3} & 9 & -3 \end{pmatrix}.$$
The rows and columns are swapped in the sense that $A_{31} = B_{13} = \pi$ and $A_{12} = B_{21} = \sqrt{3}$. Also, because $A \in \mathbb{R}^{3\times 2}$, $B \in \mathbb{R}^{2\times 3}$. Interpreting a vector as a special case of a matrix with one column, we can also apply the transpose operator to a column vector to create a row vector, i.e., given
$$v = \begin{pmatrix} 3 \\ 7 \end{pmatrix},$$
the transpose operation yields
$$u = v^{\rm T} = \begin{pmatrix} 3 & 7 \end{pmatrix}.$$
Note that the transpose of a column vector is a row vector, and the transpose of a row vector is a column vector.
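In Matlab (treated more fully in Chapter 18) the transpose is available directly; a small sketch with the same matrix as above:

    % transpose in Matlab: .' is the (plain) transpose operator
    A = [1 sqrt(3); -4 9; pi -3];
    B = A.'          % a 2x3 matrix; (A.').' returns A
    v = [3; 7];      % a column vector
    u = v.'          % its transpose, a row vector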

16.1.2 Vector Operations
The first vector operation we consider is multiplication of a vector $v \in \mathbb{R}^m$ by a scalar $\alpha \in \mathbb{R}$. The operation yields
$$u = \alpha v,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = \alpha v_i, \quad i = 1,\dots,m.$$
In other words, multiplication of a vector by a scalar results in each component of the vector being scaled by the scalar.

The second operation we consider is addition of two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$. The addition yields
$$u = v + w,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = v_i + w_i, \quad i = 1,\dots,m.$$
In order for addition of two vectors to make sense, the vectors must have the same number of components. Each entry of the resulting vector is simply the sum of the corresponding entries of the two vectors.

We can summarize the action of the scaling and addition operations in a single operation. Let $v \in \mathbb{R}^m$, $w \in \mathbb{R}^m$, and $\alpha \in \mathbb{R}$. Then, the operation
$$u = \alpha v + w$$


Figure 16.1: Illustration of vector scaling and vector addition: (a) scalar scaling, showing $v$ and $u = \alpha v$; (b) vector addition, showing $v$, $w$, and $u = v + w$.

yields a vector $u \in \mathbb{R}^m$ whose entries are given by
$$u_i = \alpha v_i + w_i, \quad i = 1,\dots,m.$$
The result is nothing more than a combination of the scalar multiplication and vector addition rules.

Example 16.1.4 vector scaling and addition in R2
Let us illustrate scaling of a vector by a scalar and addition of two vectors in $\mathbb{R}^2$ using
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix}, \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}, \quad \text{and} \quad \alpha = \frac{3}{2}.$$
First, let us consider scaling of the vector $v$ by the scalar $\alpha$. The operation yields
$$u = \alpha v = \frac{3}{2}\begin{pmatrix} 1 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 1/2 \end{pmatrix}.$$
This operation is illustrated in Figure 16.1(a). The vector $v$ is simply stretched by the factor of $3/2$ while preserving the direction.

Now, let us consider addition of the vectors $v$ and $w$. The vector addition yields
$$u = v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix}.$$
Figure 16.1(b) illustrates the vector addition process. We translate $w$ so that it starts from the tip of $v$ to form a parallelogram. The resultant vector is precisely the sum of the two vectors. Note that the geometric intuition for scaling and addition provided for $\mathbb{R}^2$ readily extends to higher dimensions.


Example 16.1.5 vector scaling and addition in R3
Let $v = (1, 3, 6)^{\rm T}$, $w = (2, -1, 0)^{\rm T}$, and $\alpha = 3$. Then,
$$u = v + \alpha w = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + \begin{pmatrix} 6 \\ -3 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 \\ 0 \\ 6 \end{pmatrix}.$$

Inner Product
Another important operation is the inner product. This operation takes two vectors of the same dimension, $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$, and yields a scalar $\beta \in \mathbb{R}$:
$$\beta = v^{\rm T} w \quad \text{where} \quad \beta = \sum_{i=1}^m v_i w_i.$$
The appearance of the transpose operator will become obvious once we introduce the matrix-matrix multiplication rule. The inner product in a Euclidean vector space is also commonly called the dot product and is denoted by $\beta = v \cdot w$. More generally, the inner product of two elements of a vector space is denoted by $(\cdot,\cdot)$, i.e., $\beta = (v, w)$.

Example 16.1.6 inner product
Let us consider two vectors in $\mathbb{R}^3$, $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The inner product of these two vectors is
$$\beta = v^{\rm T} w = \sum_{i=1}^3 v_i w_i = 1\cdot 2 + 3\cdot(-1) + 6\cdot 0 = -1.$$

Norm (2-Norm)
Using the inner product, we can naturally define the 2-norm of a vector. Given $v \in \mathbb{R}^m$, the 2-norm of $v$, denoted by $\|v\|_2$, is defined by
$$\|v\|_2 = \sqrt{v^{\rm T} v} = \sqrt{\sum_{i=1}^m v_i^2}.$$
Note that the norm of any vector is non-negative, because it is a sum of non-negative numbers (squared values). The 2-norm is the usual Euclidean length; in particular, for $m = 2$, the expression simplifies to the familiar Pythagorean theorem, $\|v\|_2 = \sqrt{v_1^2 + v_2^2}$. While there are other norms, we almost exclusively use the 2-norm in this unit. Thus, for notational convenience, we will drop the subscript 2 and write the 2-norm of $v$ as $\|v\|$ with the implicit understanding $\|\cdot\| \equiv \|\cdot\|_2$.

By definition, any norm must satisfy the triangle inequality,
$$\|v + w\| \le \|v\| + \|w\|,$$


for any $v, w \in \mathbb{R}^m$. The theorem states that the sum of the lengths of two adjoining segments is longer than the distance between their non-joined end points, as is intuitively clear from Figure 16.1(b). For norms defined by inner products, as our 2-norm above, the triangle inequality is automatically satisfied.

Proof. For norms induced by an inner product, the proof of the triangle inequality follows directly from the definition of the norm and the Cauchy-Schwarz inequality. First, we expand the expression as
$$\|v + w\|^2 = (v + w)^{\rm T}(v + w) = v^{\rm T} v + 2 v^{\rm T} w + w^{\rm T} w.$$
The middle term can be bounded by the Cauchy-Schwarz inequality, which states that
$$v^{\rm T} w \le |v^{\rm T} w| \le \|v\|\,\|w\|.$$
Thus, we can bound the norm as
$$\|v + w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2,$$
and taking the square root of both sides yields the desired result.

Example 16.1.7 norm of a vector
Let $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The 2-norms of these vectors are
$$\|v\| = \sqrt{\sum_{i=1}^3 v_i^2} = \sqrt{1^2 + 3^2 + 6^2} = \sqrt{46}$$
and
$$\|w\| = \sqrt{\sum_{i=1}^3 w_i^2} = \sqrt{2^2 + (-1)^2 + 0^2} = \sqrt{5}.$$

Example 16.1.8 triangle inequality
Let us illustrate the triangle inequality using two vectors
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}.$$
The lengths (or the norms) of the vectors are
$$\|v\| = \sqrt{\frac{10}{9}} \approx 1.054 \quad \text{and} \quad \|w\| = \sqrt{\frac{5}{4}} \approx 1.118.$$
On the other hand, the sum of the two vectors is
$$v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix}.$$


Figure 16.2: Illustration of the triangle inequality, showing $v$, $w$, $v+w$, $\|v\|$, and $\|w\|$.

Its length is
$$\|v + w\| = \sqrt{\frac{145}{36}} \approx 2.007.$$
The norm of the sum is shorter than the sum of the norms, which is
$$\|v\| + \|w\| \approx 2.172.$$
This inequality is illustrated in Figure 16.2. Clearly, the length of $v + w$ is strictly less than the sum of the lengths of $v$ and $w$ (unless $v$ and $w$ align with each other, in which case we obtain equality).

In two dimensions, the inner product can be interpreted as
$$v^{\rm T} w = \|v\|\,\|w\| \cos(\theta), \qquad (16.1)$$
where $\theta$ is the angle between $v$ and $w$. In other words, the inner product is a measure of how well $v$ and $w$ align with each other. Note that we can show the Cauchy-Schwarz inequality from the above equality. Namely, $|\cos(\theta)| \le 1$ implies that
$$|v^{\rm T} w| = \|v\|\,\|w\|\,|\cos(\theta)| \le \|v\|\,\|w\|.$$
In particular, we see that the inequality holds with equality if and only if $\theta = 0$ or $\pi$, which corresponds to the cases where $v$ and $w$ align. It is easy to demonstrate Eq. (16.1) in two dimensions.

Proof. Noting $v, w \in \mathbb{R}^2$, we express them in polar coordinates
$$v = \|v\| \begin{pmatrix} \cos(\theta_v) \\ \sin(\theta_v) \end{pmatrix} \quad \text{and} \quad w = \|w\| \begin{pmatrix} \cos(\theta_w) \\ \sin(\theta_w) \end{pmatrix}.$$


The inner product of the two vectors yields
$$\beta = v^{\rm T} w = \sum_{i=1}^2 v_i w_i = \|v\|\cos(\theta_v)\,\|w\|\cos(\theta_w) + \|v\|\sin(\theta_v)\,\|w\|\sin(\theta_w)$$
$$= \|v\|\,\|w\|\bigl(\cos(\theta_v)\cos(\theta_w) + \sin(\theta_v)\sin(\theta_w)\bigr)$$
$$= \|v\|\,\|w\|\left( \frac{1}{2}\bigl(e^{i\theta_v} + e^{-i\theta_v}\bigr)\frac{1}{2}\bigl(e^{i\theta_w} + e^{-i\theta_w}\bigr) + \frac{1}{2i}\bigl(e^{i\theta_v} - e^{-i\theta_v}\bigr)\frac{1}{2i}\bigl(e^{i\theta_w} - e^{-i\theta_w}\bigr) \right)$$
$$= \|v\|\,\|w\|\left( \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} + e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr) - \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} - e^{i(\theta_v-\theta_w)} - e^{-i(\theta_v-\theta_w)}\bigr) \right)$$
$$= \|v\|\,\|w\|\,\frac{1}{2}\bigl(e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr) = \|v\|\,\|w\|\cos(\theta_v - \theta_w) = \|v\|\,\|w\|\cos(\theta),$$
where the last equality follows from the definition $\theta \equiv \theta_v - \theta_w$.

Begin Advanced Material

For completeness, let us introduce a more general class of norms.

Example 16.1.9 p-norms
The 2-norm, which we will almost exclusively use, belongs to a more general class of norms, called the p-norms. The p-norm of a vector $v \in \mathbb{R}^m$ is
$$\|v\|_p = \left( \sum_{i=1}^m |v_i|^p \right)^{1/p},$$
where $p \ge 1$. Any p-norm satisfies the positivity requirement, the scalar scaling requirement, and the triangle inequality. We see that the 2-norm is a case of the p-norm with $p = 2$.

Another case of the p-norm that we frequently encounter is the 1-norm, which is simply the sum of the absolute values of the entries, i.e.
$$\|v\|_1 = \sum_{i=1}^m |v_i|.$$
The other one is the infinity norm given by
$$\|v\|_\infty = \lim_{p\to\infty} \|v\|_p = \max_{i=1,\dots,m} |v_i|.$$
In other words, the infinity norm of a vector is its largest entry in absolute value.
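In Matlab the built-in norm function accepts the value of p directly; a small sketch:

    % p-norms in Matlab
    v = [1; -2; 3];
    norm(v, 1)      % 1-norm: sum of absolute values = 6
    norm(v, 2)      % 2-norm (the default)
    norm(v, Inf)    % infinity norm: largest absolute entry = 3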

End Advanced Material


Figure 16.3: Set of vectors ($u$, $v$, and $w$) considered to illustrate orthogonality.

Orthogonality
Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthogonal to each other if
$$v^{\rm T} w = 0.$$
In two dimensions, it is easy to see that
$$v^{\rm T} w = \|v\|\,\|w\|\cos(\theta) = 0 \;\Rightarrow\; \cos(\theta) = 0 \;\Rightarrow\; \theta = \pi/2.$$
That is, the angle between $v$ and $w$ is $\pi/2$, which is the definition of orthogonality in the usual geometric sense.

Example 16.1.10 orthogonality
Let us consider three vectors in $\mathbb{R}^2$,
$$u = \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \quad v = \begin{pmatrix} -3 \\ 6 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 0 \\ 5 \end{pmatrix},$$
and compute three inner products formed by these vectors:
$$u^{\rm T} v = 4\cdot(-3) + 2\cdot 6 = 0,$$
$$u^{\rm T} w = 4\cdot 0 + 2\cdot 5 = 10,$$
$$v^{\rm T} w = -3\cdot 0 + 6\cdot 5 = 30.$$
Because $u^{\rm T} v = 0$, the vectors $u$ and $v$ are orthogonal to each other. On the other hand, $u^{\rm T} w \ne 0$ and the vectors $u$ and $w$ are not orthogonal to each other. Similarly, $v$ and $w$ are not orthogonal to each other. These vectors are plotted in Figure 16.3; the figure confirms that $u$ and $v$ are orthogonal in the usual geometric sense.
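The same check is a one-line inner product in Matlab; a sketch with the vectors of Example 16.1.10:

    % checking orthogonality via inner products
    u = [4; 2];  v = [-3; 6];  w = [0; 5];
    u'*v   % = 0:  u and v are orthogonal
    u'*w   % = 10: u and w are not orthogonal
    v'*w   % = 30: v and w are not orthogonal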


Figure 16.4: An orthonormal set of vectors $u$ and $v$.

Orthonormality
Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthonormal to each other if they are orthogonal to each other and each has unit length, i.e.
$$v^{\rm T} w = 0 \quad \text{and} \quad \|v\| = \|w\| = 1.$$

Example 16.1.11 orthonormality
Two vectors
$$u = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
are orthonormal to each other. It is straightforward to verify that they are orthogonal to each other,
$$u^{\rm T} v = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 & 1 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{5}\bigl(-2\cdot 1 + 1\cdot 2\bigr) = 0,$$
and that each of them has unit length,
$$\|u\| = \sqrt{\frac{1}{5}\bigl((-2)^2 + 1^2\bigr)} = 1, \qquad \|v\| = \sqrt{\frac{1}{5}\bigl(1^2 + 2^2\bigr)} = 1.$$
Figure 16.4 shows that the vectors are orthogonal and have unit length in the usual geometric sense.

16.1.3 Linear Combinations
Let us consider a set of $n$ $m$-vectors
$$v^1 \in \mathbb{R}^m, \ v^2 \in \mathbb{R}^m, \ \dots, \ v^n \in \mathbb{R}^m.$$
A linear combination of the vectors is given by
$$w = \sum_{j=1}^n \alpha_j v^j,$$
where $\alpha_1, \alpha_2, \dots, \alpha_n$ is a set of real numbers, and each $v^j$ is an $m$-vector.


Example 16.1.12 linear combination of vectors
Let us consider three vectors in $\mathbb{R}^2$, $v^1 = (4, 2)^{\rm T}$, $v^2 = (-3, 6)^{\rm T}$, and $v^3 = (0, 5)^{\rm T}$. A linear combination of the vectors, with $\alpha_1 = 1$, $\alpha_2 = -2$, and $\alpha_3 = 3$, is
$$w = \sum_{j=1}^3 \alpha_j v^j = 1\begin{pmatrix} 4 \\ 2 \end{pmatrix} + (-2)\begin{pmatrix} -3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} + \begin{pmatrix} 6 \\ -12 \end{pmatrix} + \begin{pmatrix} 0 \\ 15 \end{pmatrix} = \begin{pmatrix} 10 \\ 5 \end{pmatrix}.$$
Another example of a linear combination, with $\alpha_1 = 1$, $\alpha_2 = 0$, and $\alpha_3 = 0$, is
$$w = \sum_{j=1}^3 \alpha_j v^j = 1\begin{pmatrix} 4 \\ 2 \end{pmatrix} + 0\begin{pmatrix} -3 \\ 6 \end{pmatrix} + 0\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix}.$$
Note that a linear combination of a set of vectors is simply a weighted sum of the vectors.

Linear Independence
A set of $n$ $m$-vectors are linearly independent if
$$\sum_{j=1}^n \alpha_j v^j = 0 \quad \text{only if} \quad \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0.$$
Otherwise, the vectors are linearly dependent.

Example 16.1.13 linear independence
Let us consider four vectors,
$$w^1 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}, \quad w^2 = \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}, \quad w^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad w^4 = \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix}.$$
The set of vectors $\{w^1, w^2, w^4\}$ is linearly dependent because
$$1\cdot w^1 + \frac{5}{3}\cdot w^2 - 1\cdot w^4 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \frac{5}{3}\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix};$$
the linear combination with the weights $\{1, 5/3, -1\}$ produces the zero vector. Note that the choice of the weights that achieves this is not unique; we just need to find one set of weights to show that the vectors are not linearly independent (i.e., are linearly dependent).

On the other hand, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent. Considering a linear combination,
$$\alpha_1 w^1 + \alpha_2 w^2 + \alpha_3 w^3 = \alpha_1\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} + \alpha_3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$


we see that we must choose $\alpha_1 = 0$ to set the first component to 0, $\alpha_2 = 0$ to set the third component to 0, and $\alpha_3 = 0$ to set the second component to 0. Thus, the only way to satisfy the equation is to choose the trivial set of weights, $\{0, 0, 0\}$. Thus, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent.

Begin Advanced Material

Vector Spaces and Bases
Given a set of $n$ $m$-vectors, we can construct a vector space, $V$, given by
$$V = \operatorname{span}(\{v^1, v^2, \dots, v^n\}),$$
where
$$\operatorname{span}(\{v^1, v^2, \dots, v^n\}) = \left\{ v \in \mathbb{R}^m : v = \sum_{k=1}^n \alpha_k v^k, \ \alpha_k \in \mathbb{R} \right\}$$
$$= \text{space of vectors which are linear combinations of } v^1, v^2, \dots, v^n.$$
In general we do not require the vectors $\{v^1, \dots, v^n\}$ to be linearly independent. When they are linearly independent, they are said to be a basis of the space. In other words, a basis of the vector space $V$ is a set of linearly independent vectors that spans the space. As we will see shortly in our example, there are many bases for any space. However, the number of vectors in any basis for a given space is unique, and that number is called the dimension of the space. Let us demonstrate the idea in a simple example.

Example 16.1.14 Bases for a vector space in R3
Let us consider a vector space $V$ spanned by vectors
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$
By definition, any vector $x \in V$ is of the form
$$x = \alpha_1 v^1 + \alpha_2 v^2 + \alpha_3 v^3 = \alpha_1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + \alpha_3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ 2\alpha_1 + \alpha_2 + \alpha_3 \\ 0 \end{pmatrix}.$$
Clearly, we can express any vector of the form $x = (x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha_1$, $\alpha_2$, and $\alpha_3$ judiciously. Thus, our vector space consists of vectors of the form $(x_1, x_2, 0)^{\rm T}$, i.e., all vectors in $\mathbb{R}^3$ with zero in the third entry.

We also note that the selection of coefficients that achieves $(x_1, x_2, 0)^{\rm T}$ is not unique, as it requires solution to a system of two linear equations with three unknowns. The non-uniqueness of the coefficients is a direct consequence of $\{v^1, v^2, v^3\}$ not being linearly independent. We can easily verify the linear dependence by considering a non-trivial linear combination such as
$$2v^1 - v^2 - 3v^3 = 2\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} - \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$


Because the vectors are not linearly independent, they do not form a basis of the space.

To choose a basis for the space, we first note that vectors in the space $V$ are of the form $(x_1, x_2, 0)^{\rm T}$. We observe that, for example,
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
would span the space because any vector in $V$, a vector of the form $(x_1, x_2, 0)^{\rm T}$, can be expressed as a linear combination,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha_1 w^1 + \alpha_2 w^2 = \alpha_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ 0 \end{pmatrix},$$
by choosing $\alpha_1 = x_1$ and $\alpha_2 = x_2$. Moreover, $w^1$ and $w^2$ are linearly independent. Thus, $\{w^1, w^2\}$ is a basis for the space $V$. Unlike the set $\{v^1, v^2, v^3\}$ which is not a basis, the coefficients for $\{w^1, w^2\}$ that yield $x \in V$ are unique. Because the basis consists of two vectors, the dimension of $V$ is two. This is succinctly written as
$$\dim(V) = 2.$$
Because a basis for a given space is not unique, we can pick a different set of vectors. For example,
$$z^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad z^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$$
is also a basis for $V$. Since $z^1$ is not a constant multiple of $z^2$, it is clear that they are linearly independent. We need to verify that they span the space $V$. We can verify this by a direct argument,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha_1 z^1 + \alpha_2 z^2 = \alpha_1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ 2\alpha_1 + \alpha_2 \\ 0 \end{pmatrix}.$$
We see that, for any $(x_1, x_2, 0)^{\rm T}$, we can find the linear combination of $z^1$ and $z^2$ by choosing the coefficients $\alpha_1 = (-x_1 + 2x_2)/3$ and $\alpha_2 = (2x_1 - x_2)/3$. Again, the coefficients that represent $x$ using $\{z^1, z^2\}$ are unique.

For the space $V$, and for any given basis, we can find a unique set of two coefficients to represent any vector $x \in V$. In other words, any vector in $V$ is uniquely described by two coefficients, or parameters. Thus, a basis provides a parameterization of the vector space $V$. The dimension of the space is two, because the basis has two vectors, i.e., the vectors in the space are uniquely described by two parameters.

While there are many bases for any space, there are certain bases that are more convenient to work with than others. Orthonormal bases, bases consisting of orthonormal sets of vectors, are such a class of bases. We recall that two vectors are orthogonal to each other if their


inner product vanishes. In order for a set of vectors $\{v^1, \dots, v^n\}$ to be orthogonal, the vectors must satisfy
$$(v^i)^{\rm T} v^j = 0, \quad i \ne j.$$
In other words, the vectors are mutually orthogonal. An orthonormal set of vectors is an orthogonal set of vectors with each vector having norm unity. That is, the set of vectors $\{v^1, \dots, v^n\}$ is mutually orthonormal if
$$(v^i)^{\rm T} v^j = 0, \quad i \ne j, \qquad \|v^i\| = \sqrt{(v^i)^{\rm T} v^i} = 1, \quad i = 1,\dots,n.$$
We note that an orthonormal set of vectors is linearly independent by construction, as we now prove.

Proof. Let $\{v^1, \dots, v^n\}$ be an orthogonal set of (non-zero) vectors. By definition, the set of vectors is linearly independent if the only linear combination that yields the zero vector corresponds to all coefficients equal to zero, i.e.
$$\alpha_1 v^1 + \cdots + \alpha_n v^n = 0 \quad \Rightarrow \quad \alpha_1 = \cdots = \alpha_n = 0.$$
To verify this indeed is the case for any orthogonal set of vectors, we perform the inner product of the linear combination with $v^1, \dots, v^n$ to obtain
$$(v^i)^{\rm T}(\alpha_1 v^1 + \cdots + \alpha_n v^n) = \alpha_1 (v^i)^{\rm T} v^1 + \cdots + \alpha_i (v^i)^{\rm T} v^i + \cdots + \alpha_n (v^i)^{\rm T} v^n = \alpha_i \|v^i\|^2, \quad i = 1,\dots,n.$$
Note that $(v^i)^{\rm T} v^j = 0$, $i \ne j$, due to orthogonality. Thus, setting the linear combination equal to zero requires
$$\alpha_i \|v^i\|^2 = 0, \quad i = 1,\dots,n.$$
In other words, $\alpha_i = 0$ or $\|v^i\|^2 = 0$ for each $i$. If we restrict ourselves to a set of non-zero vectors, then we must have $\alpha_i = 0$. Thus, a vanishing linear combination requires $\alpha_1 = \cdots = \alpha_n = 0$, which is the definition of linear independence.

Because an orthogonal set of vectors is linearly independent by construction, an orthonormal basis for a space $V$ is an orthonormal set of vectors that spans $V$. One advantage of using an orthonormal basis is that finding the coefficients for any vector in $V$ is straightforward. Suppose we have a basis $\{v^1, \dots, v^n\}$ and wish to find the coefficients $\alpha_1, \dots, \alpha_n$ that result in $x \in V$. That is, we are looking for the coefficients such that
$$x = \alpha_1 v^1 + \cdots + \alpha_i v^i + \cdots + \alpha_n v^n.$$
To find the $i$-th coefficient $\alpha_i$, we simply consider the inner product with $v^i$, i.e.
$$(v^i)^{\rm T} x = (v^i)^{\rm T}(\alpha_1 v^1 + \cdots + \alpha_i v^i + \cdots + \alpha_n v^n) = \alpha_1 (v^i)^{\rm T} v^1 + \cdots + \alpha_i (v^i)^{\rm T} v^i + \cdots + \alpha_n (v^i)^{\rm T} v^n = \alpha_i (v^i)^{\rm T} v^i = \alpha_i \|v^i\|^2 = \alpha_i, \quad i = 1,\dots,n,$$
where the last equality follows from $\|v^i\|^2 = 1$. That is, $\alpha_i = (v^i)^{\rm T} x$, $i = 1,\dots,n$. In particular, for an orthonormal basis, we simply need to perform $n$ inner products to find the $n$ coefficients. This is in contrast to an arbitrary basis, which requires a solution to an $n\times n$ linear system (which is significantly more costly, as we will see later).


Example 16.1.15 Orthonormal Basis
Let us consider the vector space $V$ spanned by
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$
Recalling every vector in $V$ is of the form $(x_1, x_2, 0)^{\rm T}$, a set of vectors
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
forms an orthonormal basis of the space. It is trivial to verify they are orthonormal, as they are orthogonal, i.e., $(w^1)^{\rm T} w^2 = 0$, and each vector is of unit length, $\|w^1\| = \|w^2\| = 1$. We also see that we can express any vector of the form $(x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha_1 = x_1$ and $\alpha_2 = x_2$. Thus, $\{w^1, w^2\}$ spans the space. Because the set of vectors spans the space and is orthonormal (and hence linearly independent), it is an orthonormal basis of the space $V$.

Another orthonormal basis is formed by
$$w^1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}.$$
We can easily verify that they are orthogonal and each has a unit length. The coefficients for an arbitrary vector $x = (x_1, x_2, 0)^{\rm T} \in V$ represented in the basis $\{w^1, w^2\}$ are
$$\alpha_1 = (w^1)^{\rm T} x = \frac{1}{\sqrt{5}}(x_1 + 2x_2), \qquad \alpha_2 = (w^2)^{\rm T} x = \frac{1}{\sqrt{5}}(2x_1 - x_2).$$

End Advanced Material

16.2 Matrix Operations

16.2.1 Interpretation of Matrices
Recall that a matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}.$$
This matrix can be interpreted in a column-centric manner as a set of $n$ column $m$-vectors. Alternatively, the matrix can be interpreted in a row-centric manner as a set of $m$ row $n$-vectors. Each of these interpretations is useful for understanding matrix operations, which is covered next.


16.2.2 Matrix Operations
The first matrix operation we consider is multiplication of a matrix $A \in \mathbb{R}^{m_1\times n_1}$ by a scalar $\alpha \in \mathbb{R}$. The operation yields
$$B = \alpha A,$$
where each entry of $B \in \mathbb{R}^{m_1\times n_1}$ is given by
$$B_{ij} = \alpha A_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
Similar to the multiplication of a vector by a scalar, the multiplication of a matrix by a scalar scales each entry of the matrix.

The second operation we consider is addition of two matrices $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{m_2\times n_2}$. The addition yields
$$C = A + B,$$
where each entry of $C \in \mathbb{R}^{m_1\times n_1}$ is given by
$$C_{ij} = A_{ij} + B_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
In order for addition of two matrices to make sense, the matrices must have the same dimensions, $m_1$ and $n_1$.

We can combine the scalar scaling and addition operations. Let $A \in \mathbb{R}^{m_1\times n_1}$, $B \in \mathbb{R}^{m_1\times n_1}$, and $\alpha \in \mathbb{R}$. Then, the operation
$$C = A + \alpha B$$
yields a matrix $C \in \mathbb{R}^{m_1\times n_1}$ whose entries are given by
$$C_{ij} = A_{ij} + \alpha B_{ij}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_1.$$
Note that the scalar-matrix multiplication and matrix-matrix addition operations treat the matrices as arrays of numbers, operating entry by entry. This is unlike the matrix-matrix product, which is introduced next after an example of matrix scaling and addition.

Example 16.2.1 matrix scaling and addition
Consider the following matrices and scalar,
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{pmatrix}, \quad \text{and} \quad \alpha = 2.$$
Then,
$$C = A + \alpha B = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} + 2\begin{pmatrix} 0 & 2 \\ 2 & -3 \\ \pi & -4 \end{pmatrix} = \begin{pmatrix} 1 & \sqrt{3}+4 \\ 0 & 3 \\ 3\pi & -11 \end{pmatrix}.$$


Matrix-Matrix Product
Let us consider two matrices $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{m_2\times n_2}$ with $n_1 = m_2$. The matrix-matrix product of the matrices results in
$$C = AB \quad \text{with} \quad C_{ij} = \sum_{k=1}^{n_1} A_{ik} B_{kj}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_2.$$
Because the summation applies to the second index of $A$ and the first index of $B$, the number of columns of $A$ must match the number of rows of $B$: $n_1 = m_2$ must be true. Let us consider a few examples.

Example 16.2.2 matrix-matrix product
Let us consider matrices $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$ with
$$A = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}.$$
The matrix-matrix product yields
$$C = AB = \begin{pmatrix} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{pmatrix},$$
where each entry is calculated as
$$C_{11} = \sum_{k=1}^2 A_{1k} B_{k1} = A_{11}B_{11} + A_{12}B_{21} = 1\cdot 2 + 3\cdot 1 = 5,$$
$$C_{12} = A_{11}B_{12} + A_{12}B_{22} = 1\cdot 3 + 3\cdot 0 = 3,$$
$$C_{13} = A_{11}B_{13} + A_{12}B_{23} = 1\cdot(-5) + 3\cdot(-1) = -8,$$
$$C_{21} = A_{21}B_{11} + A_{22}B_{21} = -4\cdot 2 + 9\cdot 1 = 1,$$
$$\vdots$$
$$C_{33} = A_{31}B_{13} + A_{32}B_{23} = 0\cdot(-5) + (-3)\cdot(-1) = 3.$$
Note that because $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$, $C \in \mathbb{R}^{3\times 3}$.
This is very different from
$$D = BA = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} = \begin{pmatrix} -10 & 48 \\ 1 & 6 \end{pmatrix},$$


where each entry is calculated as
$$D_{11} = \sum_{k=1}^3 B_{1k} A_{k1} = B_{11}A_{11} + B_{12}A_{21} + B_{13}A_{31} = 2\cdot 1 + 3\cdot(-4) + (-5)\cdot 0 = -10,$$
$$\vdots$$
$$D_{22} = B_{21}A_{12} + B_{22}A_{22} + B_{23}A_{32} = 1\cdot 3 + 0\cdot 9 + (-1)\cdot(-3) = 6.$$
Note that because $B \in \mathbb{R}^{2\times 3}$ and $A \in \mathbb{R}^{3\times 2}$, $D \in \mathbb{R}^{2\times 2}$. Clearly, $C = AB \ne BA = D$; $C$ and $D$ in fact have different dimensions. Thus, the matrix-matrix product is not commutative in general, even if both $AB$ and $BA$ make sense.

Example 16.2.3 inner product as matrix-matrix product
The inner product of two vectors can be considered as a special case of the matrix-matrix product. Let
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix}.$$
We have $v, w \in \mathbb{R}^3$ ($= \mathbb{R}^{3\times 1}$). Taking the transpose, we have $v^{\rm T} \in \mathbb{R}^{1\times 3}$. Noting that the second dimension of $v^{\rm T}$ and the first dimension of $w$ match, we can perform the matrix-matrix product,
$$\beta = v^{\rm T} w = \begin{pmatrix} 1 & 3 & 6 \end{pmatrix}\begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix} = 1\cdot(-2) + 3\cdot 0 + 6\cdot 4 = 22.$$

Example 16.2.4 outer product
The outer product of two vectors is yet another special case of the matrix-matrix product. The outer product $B$ of two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ is defined as
$$B = v w^{\rm T}.$$
Because $v \in \mathbb{R}^{m\times 1}$ and $w^{\rm T} \in \mathbb{R}^{1\times m}$, the matrix-matrix product $v w^{\rm T}$ is well-defined and yields an $m\times m$ matrix. As in the previous example, let
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} -2 \\ 0 \\ 4 \end{pmatrix}.$$
The outer product of the two vectors is given by
$$v w^{\rm T} = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}\begin{pmatrix} -2 & 0 & 4 \end{pmatrix} = \begin{pmatrix} -2 & 0 & 4 \\ -6 & 0 & 12 \\ -12 & 0 & 24 \end{pmatrix}.$$
Clearly, $\beta = v^{\rm T} w \ne v w^{\rm T} = B$, as they even have different dimensions.
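Both special cases are single expressions in Matlab; a sketch with the vectors of Examples 16.2.3 and 16.2.4:

    % inner and outer products as special cases of the matrix-matrix product
    v = [1; 3; 6];
    w = [-2; 0; 4];
    beta = v.'*w       % 1x1: inner product = 22
    B    = v*w.'       % 3x3: outer product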


In the above example, we saw that $AB \ne BA$ in general. In fact, $AB$ might not even be allowed even if $BA$ is allowed (consider $A \in \mathbb{R}^{2\times 1}$ and $B \in \mathbb{R}^{3\times 2}$). However, although the matrix-matrix product is not commutative in general, the matrix-matrix product is associative, i.e.
$$ABC = A(BC) = (AB)C.$$
Moreover, the matrix-matrix product is also distributive, i.e.
$$(A + B)C = AC + BC.$$

Proof. The associative and distributive properties of the matrix-matrix product are readily proven from its definition. For associativity, we consider the $ij$-entry of the $m_1\times n_3$ matrix $A(BC)$, i.e.
$$(A(BC))_{ij} = \sum_{k=1}^{n_1} A_{ik}(BC)_{kj} = \sum_{k=1}^{n_1} A_{ik}\left( \sum_{l=1}^{n_2} B_{kl}C_{lj} \right) = \sum_{k=1}^{n_1}\sum_{l=1}^{n_2} A_{ik}B_{kl}C_{lj} = \sum_{l=1}^{n_2}\sum_{k=1}^{n_1} A_{ik}B_{kl}C_{lj}$$
$$= \sum_{l=1}^{n_2}\left( \sum_{k=1}^{n_1} A_{ik}B_{kl} \right)C_{lj} = \sum_{l=1}^{n_2} (AB)_{il}C_{lj} = ((AB)C)_{ij}, \quad \forall\, i, j.$$
Since the equality $(A(BC))_{ij} = ((AB)C)_{ij}$ holds for all entries, we have $A(BC) = (AB)C$.

The distributive property can also be proven directly. The $ij$-entry of $(A+B)C$ can be expressed as
$$((A+B)C)_{ij} = \sum_{k=1}^{n_1} (A+B)_{ik}C_{kj} = \sum_{k=1}^{n_1} (A_{ik} + B_{ik})C_{kj} = \sum_{k=1}^{n_1} (A_{ik}C_{kj} + B_{ik}C_{kj})$$
$$= \sum_{k=1}^{n_1} A_{ik}C_{kj} + \sum_{k=1}^{n_1} B_{ik}C_{kj} = (AC)_{ij} + (BC)_{ij}, \quad \forall\, i, j.$$
Again, since the equality holds for all entries, we have $(A+B)C = AC + BC$.

Another useful rule concerning the matrix-matrix product and the transpose operation is
$$(AB)^{\rm T} = B^{\rm T} A^{\rm T}.$$
This rule is used very often.

Proof. The proof follows by checking the components of each side. The left-hand side yields
$$((AB)^{\rm T})_{ij} = (AB)_{ji} = \sum_{k=1}^{n_1} A_{jk}B_{ki}.$$
The right-hand side yields
$$(B^{\rm T} A^{\rm T})_{ij} = \sum_{k=1}^{n_1} (B^{\rm T})_{ik}(A^{\rm T})_{kj} = \sum_{k=1}^{n_1} B_{ki}A_{jk} = \sum_{k=1}^{n_1} A_{jk}B_{ki}.$$


Thus, we have
$$((AB)^{\rm T})_{ij} = (B^{\rm T} A^{\rm T})_{ij}, \quad i = 1,\dots,n_2, \ j = 1,\dots,m_1.$$

16.2.3 Interpretations of the Matrix-Vector Product
Let us consider a special case of the matrix-matrix product: the matrix-vector product. The special case arises when the second matrix has only one column. Then, with $A \in \mathbb{R}^{m\times n}$ and $w = B \in \mathbb{R}^{n\times 1} = \mathbb{R}^n$, we have
$$C = AB, \quad \text{where} \quad C_{ij} = \sum_{k=1}^n A_{ik}B_{kj} = \sum_{k=1}^n A_{ik}w_k, \quad i = 1,\dots,m, \ j = 1.$$
Since $C \in \mathbb{R}^{m\times 1} = \mathbb{R}^m$, we can introduce $v \in \mathbb{R}^m$ and concisely write the matrix-vector product as
$$v = Aw, \quad \text{where} \quad v_i = \sum_{k=1}^n A_{ik}w_k, \quad i = 1,\dots,m.$$
Expanding the summation, we can think of the matrix-vector product as
$$v_1 = A_{11}w_1 + A_{12}w_2 + \cdots + A_{1n}w_n$$
$$v_2 = A_{21}w_1 + A_{22}w_2 + \cdots + A_{2n}w_n$$
$$\vdots$$
$$v_m = A_{m1}w_1 + A_{m2}w_2 + \cdots + A_{mn}w_n.$$
Now, we consider two different interpretations of the matrix-vector product.

Row Interpretation
The first interpretation is the row interpretation, where we consider the matrix-vector multiplication as a series of inner products. In particular, we consider $v_i$ as the inner product of the $i$-th row of $A$ and $w$. In other words, the vector $v$ is computed entry by entry in the sense that
$$v_i = \begin{pmatrix} A_{i1} & A_{i2} & \cdots & A_{in} \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}, \quad i = 1,\dots,m.$$


Example 16.2.5 row interpretation of matrix-vector product
An example of the row interpretation of the matrix-vector product is
$$v = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 0 & 0 & 0 \end{pmatrix}(3,\,2,\,1)^{\rm T} \\ \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}(3,\,2,\,1)^{\rm T} \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$

Column Interpretation
The second interpretation is the column interpretation, where we consider the matrix-vector multiplication as a sum of $n$ vectors corresponding to the $n$ columns of the matrix, i.e.
$$v = \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix}w_1 + \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix}w_2 + \cdots + \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix}w_n.$$
In this case, we consider $v$ as a linear combination of columns of $A$ with coefficients $w$. Hence $v = Aw$ is simply another way to write a linear combination of vectors: the columns of $A$ are the vectors, and $w$ contains the coefficients.

Example 16.2.6 column interpretation of matrix-vector product
An example of the column interpretation of the matrix-vector product is
$$v = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 3\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + 1\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$
Clearly, the outcome of the matrix-vector product is identical to that computed using the row interpretation.
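A short Matlab sketch showing that the built-in product and the two interpretations agree for the matrix of Examples 16.2.5 and 16.2.6:

    % row and column interpretations of the matrix-vector product
    A = [0 1 0; 1 0 0; 0 0 0; 0 0 1];
    x = [3; 2; 1];
    v = A*x                                             % = [2; 3; 0; 1]
    v_row = [A(1,:)*x; A(2,:)*x; A(3,:)*x; A(4,:)*x];   % row interpretation
    v_col = x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3);    % column interpretation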

Left Vector-Matrix Product
We now consider another special case of the matrix-matrix product: the left vector-matrix product. This special case arises when the first matrix only has one row. Then, we have $A \in \mathbb{R}^{1\times m}$ and $B \in \mathbb{R}^{m\times n}$. Let us denote the matrix $A$, which is a row vector, by $w^{\rm T}$. Clearly, $w \in \mathbb{R}^m$, because $w^{\rm T} \in \mathbb{R}^{1\times m}$. The left vector-matrix product yields
$$v = w^{\rm T} B, \quad \text{where} \quad v_j = \sum_{k=1}^m w_k B_{kj}, \quad j = 1,\dots,n.$$


The resultant vector $v$ is a row vector in $\mathbb{R}^{1\times n}$. The left vector-matrix product can also be interpreted in two different manners. The first interpretation considers the product as a series of dot products, where each entry $v_j$ is computed as a dot product of $w$ with the $j$-th column of $B$, i.e.
$$v_j = \begin{pmatrix} w_1 & w_2 & \cdots & w_m \end{pmatrix}\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{mj} \end{pmatrix}, \quad j = 1,\dots,n.$$
The second interpretation considers the left vector-matrix product as a linear combination of rows of $B$, i.e.
$$v = w_1\begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1n} \end{pmatrix} + w_2\begin{pmatrix} B_{21} & B_{22} & \cdots & B_{2n} \end{pmatrix} + \cdots + w_m\begin{pmatrix} B_{m1} & B_{m2} & \cdots & B_{mn} \end{pmatrix}.$$

16.2.4 Interpretations of the Matrix-Matrix Product
Similar to the matrix-vector product, the matrix-matrix product can be interpreted in a few different ways. Throughout the discussion, we assume $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{n_1\times n_2}$ and hence $C = AB \in \mathbb{R}^{m_1\times n_2}$.

Matrix-Matrix Product as a Series of Matrix-Vector Products
One interpretation of the matrix-matrix product is to consider it as computing $C$ one column at a time, where the $j$-th column of $C$ results from the matrix-vector product of the matrix $A$ with the $j$-th column of $B$, i.e.
$$C_{\cdot j} = A B_{\cdot j}, \quad j = 1,\dots,n_2,$$
where $C_{\cdot j}$ refers to the $j$-th column of $C$. In other words,
$$\begin{pmatrix} C_{1j} \\ C_{2j} \\ \vdots \\ C_{m_1 j} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1 n_1} \\ A_{21} & A_{22} & \cdots & A_{2 n_1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1} & A_{m_1 2} & \cdots & A_{m_1 n_1} \end{pmatrix}\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{n_1 j} \end{pmatrix}, \quad j = 1,\dots,n_2.$$

Example 16.2.7 matrix-matrix product as a series of matrix-vector products
Let us consider matrices $A \in \mathbb{R}^{3\times 2}$ and $B \in \mathbb{R}^{2\times 3}$ with
$$A = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & -5 \\ 1 & 0 & -1 \end{pmatrix}.$$
The first column of $C = AB \in \mathbb{R}^{3\times 3}$ is given by
$$C_{\cdot 1} = A B_{\cdot 1} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \\ -3 \end{pmatrix}.$$


Similarly, the second and third columns are given by
$$C_{\cdot 2} = A B_{\cdot 2} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -12 \\ 0 \end{pmatrix}$$
and
$$C_{\cdot 3} = A B_{\cdot 3} = \begin{pmatrix} 1 & 3 \\ -4 & 9 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} -5 \\ -1 \end{pmatrix} = \begin{pmatrix} -8 \\ 11 \\ 3 \end{pmatrix}.$$
Putting the columns of $C$ together,
$$C = \begin{pmatrix} C_{\cdot 1} & C_{\cdot 2} & C_{\cdot 3} \end{pmatrix} = \begin{pmatrix} 5 & 3 & -8 \\ 1 & -12 & 11 \\ -3 & 0 & 3 \end{pmatrix}.$$

Matrix-Matrix Product as a Series of Left Vector-Matrix Products
In the previous interpretation, we performed the matrix-matrix product by constructing the resultant matrix one column at a time. We can also use a series of left vector-matrix products to construct the resultant matrix one row at a time. Namely, in $C = AB$, the $i$-th row of $C$ results from the left vector-matrix product of the $i$-th row of $A$ with the matrix $B$, i.e.
$$C_{i\cdot} = A_{i\cdot} B, \quad i = 1,\dots,m_1,$$
where $C_{i\cdot}$ refers to the $i$-th row of $C$. In other words,
$$\begin{pmatrix} C_{i1} & \cdots & C_{i n_2} \end{pmatrix} = \begin{pmatrix} A_{i1} & \cdots & A_{i n_1} \end{pmatrix}\begin{pmatrix} B_{11} & \cdots & B_{1 n_2} \\ \vdots & \ddots & \vdots \\ B_{n_1 1} & \cdots & B_{n_1 n_2} \end{pmatrix}, \quad i = 1,\dots,m_1.$$

16.2.5 Operation Count of Matrix-Matrix Product
Matrix-matrix product is ubiquitous in scientific computing, and significant effort has been put into efficient performance of the operation on modern computers. Let us now count the number of additions and multiplications required to compute this product. Consider multiplication of $A \in \mathbb{R}^{m_1\times n_1}$ and $B \in \mathbb{R}^{n_1\times n_2}$. To compute $C = AB$, we perform
$$C_{ij} = \sum_{k=1}^{n_1} A_{ik}B_{kj}, \quad i = 1,\dots,m_1, \ j = 1,\dots,n_2.$$
Computing each $C_{ij}$ requires $n_1$ multiplications and $n_1$ additions, yielding the total of $2n_1$ operations. We must perform this for $m_1 n_2$ entries in $C$. Thus, the total operation count for computing $C$ is $2 m_1 n_1 n_2$. Considering the matrix-vector product and the inner product as special cases of the matrix-matrix product, we can summarize how the operation count scales.


Operation        Sizes                             Operation count
Matrix-matrix    m1 = n1 = m2 = n2 = n             2n^3
Matrix-vector    m1 = n1 = m2 = n,  n2 = 1         2n^2
Inner product    n1 = m2 = n,  m1 = n2 = 1         2n

The operation count is measured in FLoating Point Operations, or FLOPs. (Note FLOPS is different from FLOPs: FLOPS refers to FLoating Point Operations per Second, which is a speed associated with a particular computer/hardware and a particular implementation of an algorithm.)

16.2.6 The Inverse of a Matrix (Briefly)
We have now studied the matrix-vector product, in which, given a vector $x \in \mathbb{R}^n$, we calculate a new vector $b = Ax$, where $A \in \mathbb{R}^{n\times n}$ and hence $b \in \mathbb{R}^n$. We may think of this as a "forward problem," in which given $x$ we calculate $b = Ax$. We can now also ask about the corresponding "inverse problem": given $b$, can we find $x$ such that $Ax = b$? Note in this section, and for reasons which shall become clear shortly, we shall exclusively consider square matrices, and hence we set $m = n$.

To begin, let us revert to the scalar case. If $b$ is a scalar and $a$ is a non-zero scalar, we know that the (very simple linear) equation $ax = b$ has the solution $x = b/a$. We may write this more suggestively as $x = a^{-1}b$ since of course $a^{-1} = 1/a$. It is important to note that the equation $ax = b$ has a solution only if $a$ is non-zero; if $a$ is zero, then of course there is no $x$ such that $ax = b$. (This is not quite true: in fact, if $b = 0$ and $a = 0$ then $ax = b$ has an infinity of solutions, namely any value of $x$. We discuss this singular but solvable case in more detail in Unit V.)

We can now proceed to the matrix case "by analogy." The matrix equation $Ax = b$ can of course be viewed as a system of linear equations in $n$ unknowns. The first equation states that the inner product of the first row of $A$ with $x$ must equal $b_1$; in general, the $i$-th equation states that the inner product of the $i$-th row of $A$ with $x$ must equal $b_i$. Then if $A$ is "non-zero" we could plausibly expect that $x = A^{-1}b$. This statement is clearly deficient in two related ways: what do we mean when we say a matrix is non-zero? and what do we in fact mean by $A^{-1}$?

As regards the first question, $Ax = b$ will have a solution when $A$ is non-singular: non-singular is the proper extension of the scalar concept of "non-zero" in this linear systems context. Conversely, if $A$ is singular then (except for special $b$) $Ax = b$ will have no solution: singular is the proper extension of the scalar concept of "zero" in this linear systems context. How can we determine if a matrix $A$ is singular? Unfortunately, it is not nearly as simple as verifying, say, that the matrix consists of at least one non-zero entry, or contains all non-zero entries.

There are a variety of ways to determine whether a matrix is non-singular, many of which may only make good sense in later chapters (in particular, in Unit V): a non-singular $n\times n$ matrix $A$ has $n$ independent columns (or, equivalently, $n$ independent rows); a non-singular $n\times n$ matrix $A$ has all non-zero eigenvalues; a non-singular matrix $A$ has a non-zero determinant (perhaps this condition is closest to the scalar case, but it is also perhaps the least useful); a non-singular matrix $A$ has all non-zero pivots in a (partially pivoted) LU decomposition process (described in Unit V). For now, we shall simply assume that $A$ is non-singular. (We should also emphasize that in the numerical context we must be concerned not only with matrices which might be singular but also with matrices which are "almost" singular in some appropriate sense.) As regards the second question, we must first introduce the identity matrix, $I$.

Let us now define an identity matrix. The identity matrix is an $m\times m$ square matrix with ones on the diagonal and zeros elsewhere, i.e.
$$I_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}.$$


Identity matrices in $\mathbb{R}^1$, $\mathbb{R}^2$, and $\mathbb{R}^3$ are
$$I = \begin{pmatrix} 1 \end{pmatrix}, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The identity matrix is conventionally denoted by $I$. If $v \in \mathbb{R}^m$, the $i$-th entry of $Iv$ is
$$(Iv)_i = \sum_{k=1}^m I_{ik}v_k = I_{i1}v_1 + \cdots + I_{i,i-1}v_{i-1} + I_{ii}v_i + I_{i,i+1}v_{i+1} + \cdots + I_{im}v_m = v_i, \quad i = 1,\dots,m,$$
since every term except $I_{ii}v_i$ vanishes. So, we have $Iv = v$. Following the same argument, we also have $v^{\rm T} I = v^{\rm T}$. In essence, $I$ is the $m$-dimensional version of "one."

We may then define $A^{-1}$ as that (unique) matrix such that $A^{-1}A = I$. (Of course in the scalar case, this defines $a^{-1}$ as the unique scalar such that $a^{-1}a = 1$ and hence $a^{-1} = 1/a$.) In fact, $A^{-1}A = I$ and also $AA^{-1} = I$, and thus this is a case in which matrix multiplication does indeed commute. We can now derive the result $x = A^{-1}b$: we begin with $Ax = b$ and multiply both sides by $A^{-1}$ to obtain $A^{-1}Ax = A^{-1}b$ or, since the matrix product is associative, $x = A^{-1}b$. Of course this definition of $A^{-1}$ does not yet tell us how to find $A^{-1}$: we shall shortly consider this question from a pragmatic Matlab perspective and then in Unit V from a more fundamental numerical linear algebra perspective. We should note here, however, that the matrix inverse is very rarely computed or used in practice, for reasons we will understand in Unit V. Nevertheless, the inverse can be quite useful for very small systems ($n$ small) and of course more generally as a central concept in the consideration of linear systems of equations.

Example 16.2.8 The inverse of a 2x2 matrix
We consider here the case of a $2\times 2$ matrix $A$ which we write as
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
If the columns are to be independent we must have $a/b \ne c/d$, or $(ad)/(bc) \ne 1$, or $ad - bc \ne 0$, which in fact is the condition that the determinant of $A$ is nonzero. The inverse of $A$ is then given by
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Note that this inverse is only defined if $ad - bc \ne 0$, and we thus see the necessity that $A$ is non-singular. It is a simple matter to show by explicit matrix multiplication that $A^{-1}A = AA^{-1} = I$, as desired.

16.3 Special Matrices
Let us now introduce a few special matrices that we shall encounter frequently in numerical methods.


16.3.1 Diagonal Matrices
A square matrix $A$ is said to be diagonal if the off-diagonal entries are zero, i.e.
$$A_{ij} = 0, \quad i \ne j.$$

Example 16.3.1 diagonal matrices
Examples of diagonal matrices are
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 4 \end{pmatrix}.$$
The identity matrix is a special case of a diagonal matrix with all the entries in the diagonal equal to 1. Any $1\times 1$ matrix is trivially diagonal as it does not have any off-diagonal entries.

16.3.2 Symmetric Matrices
A square matrix $A$ is said to be symmetric if the off-diagonal entries are symmetric about the diagonal, i.e.
$$A_{ij} = A_{ji}, \quad i = 1,\dots,m, \ j = 1,\dots,m.$$
The equivalent statement is that $A$ is not changed by the transpose operation, i.e.
$$A^{\rm T} = A.$$
We note that the identity matrix is a special case of a symmetric matrix. Let us look at a few more examples.

Example 16.3.2 Symmetric matrices
Examples of symmetric matrices are
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & \pi & 3 \\ \pi & 1 & -1 \\ 3 & -1 & 7 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 4 \end{pmatrix}.$$
Note that any scalar, or a $1\times 1$ matrix, is trivially symmetric and unchanged under transpose.

16.3.3 Symmetric Positive Definite Matrices
An $m\times m$ square matrix $A$ is said to be symmetric positive definite (SPD) if it is symmetric and furthermore satisfies
$$v^{\rm T} A v > 0, \quad \forall\, v \in \mathbb{R}^m \ (v \ne 0).$$
Before we discuss its properties, let us give an example of an SPD matrix.


Example 16.3.3 Symmetric positive definite matrices
An example of a symmetric positive definite matrix is
$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$
We can confirm that $A$ is symmetric by inspection. To check if $A$ is positive definite, let us consider the quadratic form
$$q(v) \equiv v^{\rm T} A v = \sum_{i=1}^2 v_i\left( \sum_{j=1}^2 A_{ij}v_j \right) = \sum_{i=1}^2\sum_{j=1}^2 A_{ij}v_i v_j = A_{11}v_1^2 + A_{12}v_1 v_2 + A_{21}v_2 v_1 + A_{22}v_2^2 = A_{11}v_1^2 + 2A_{12}v_1 v_2 + A_{22}v_2^2,$$
where the last equality follows from the symmetry condition $A_{12} = A_{21}$. Substituting the entries of $A$,
$$q(v) = v^{\rm T} A v = 2v_1^2 - 2v_1 v_2 + 2v_2^2 = 2\left( v_1 - \frac{1}{2}v_2 \right)^2 - \frac{1}{2}v_2^2 + 2v_2^2 = 2\left( v_1 - \frac{1}{2}v_2 \right)^2 + \frac{3}{2}v_2^2.$$
Because $q(v)$ is a sum of two positive terms (each squared), it is non-negative. It is equal to zero only if
$$v_1 - \frac{1}{2}v_2 = 0 \quad \text{and} \quad \frac{3}{2}v_2^2 = 0.$$
The second condition requires $v_2 = 0$, and the first condition with $v_2 = 0$ requires $v_1 = 0$. Thus, we have
$$q(v) = v^{\rm T} A v > 0, \quad \forall\, v \in \mathbb{R}^2,$$
and $v^{\rm T} A v = 0$ only if $v = 0$. Thus $A$ is symmetric positive definite.

Symmetric positive definite matrices are encountered in many areas of engineering and science. They arise naturally in the numerical solution of, for example, the heat equation, the wave equation, and the linear elasticity equations. One important property of symmetric positive definite matrices is that they are always invertible: $A^{-1}$ always exists. Thus, if $A$ is an SPD matrix, then, for any $b$, there is always a unique $x$ such that
$$Ax = b.$$
In a later unit, we will discuss techniques for solution of linear systems, such as the one above. For now, we just note that there are particularly efficient techniques for solving the system when the matrix is symmetric positive definite.
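As a numerical aside, a quick sanity check of the SPD property in Matlab is to confirm symmetry and compute the eigenvalues (a concept treated only in a later unit; this sketch is offered merely as a check, not as the argument used above):

    % numerical check that A of Example 16.3.3 is SPD
    A = [2 -1; -1 2];
    isequal(A, A')     % symmetry: true
    eig(A)             % eigenvalues 1 and 3, both positive, consistent with SPD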


16.3.4 Triangular Matrices
Triangular matrices are square matrices whose entries are all zeros either below or above the diagonal. An $m\times m$ square matrix is said to be upper triangular if all entries below the diagonal are zero, i.e.
$$A_{ij} = 0, \quad i > j.$$
A square matrix is said to be lower triangular if all entries above the diagonal are zero, i.e.
$$A_{ij} = 0, \quad j > i.$$
We will see later that a linear system, $Ax = b$, in which $A$ is a triangular matrix is particularly easy to solve. Furthermore, the linear system is guaranteed to have a unique solution as long as all diagonal entries are nonzero.

Example 16.3.4 triangular matrices
Examples of upper triangular matrices are
$$A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$
Examples of lower triangular matrices are
$$C = \begin{pmatrix} 1 & 0 \\ 7 & 6 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 2 & 0 & 0 \\ 7 & 5 & 0 \\ 3 & 1 & 4 \end{pmatrix}.$$

Begin Advanced Material

16.3.5 Orthogonal Matrices
An $m\times m$ square matrix $Q$ is said to be orthogonal if its columns form an orthonormal set. That is, if we denote the $j$-th column of $Q$ by $q_j$, we have
$$Q = \begin{pmatrix} q_1 & q_2 & \cdots & q_m \end{pmatrix}, \quad \text{where} \quad q_i^{\rm T} q_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}.$$
Orthogonal matrices have a special property,
$$Q^{\rm T} Q = I.$$
This relationship follows directly from the fact that the columns of $Q$ form an orthonormal set. Recall that the $ij$ entry of $Q^{\rm T}Q$ is the inner product of the $i$-th row of $Q^{\rm T}$ (which is the $i$-th column of $Q$) and the $j$-th column of $Q$. Thus,
$$(Q^{\rm T}Q)_{ij} = q_i^{\rm T} q_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases},$$


which is the definition of the identity matrix. Orthogonal matrices also satisfy
$$QQ^{\rm T} = I,$$
which in fact is a minor miracle.

Example 16.3.5 Orthogonal matrices
Examples of orthogonal matrices are
$$Q = \begin{pmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{pmatrix} \quad \text{and} \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
We can easily verify that the columns of the matrix $Q$ are orthogonal to each other and each is of unit length. Thus, $Q$ is an orthogonal matrix. We can also directly confirm that $Q^{\rm T}Q = QQ^{\rm T} = I$. Similarly, the identity matrix is trivially orthogonal.

Let us discuss a few important properties of orthogonal matrices. First, the action by an orthogonal matrix preserves the 2-norm of a vector, i.e.
$$\|Qx\|_2 = \|x\|_2, \quad \forall\, x \in \mathbb{R}^m.$$
This follows directly from the definition of the 2-norm and the fact that $Q^{\rm T}Q = I$, i.e.
$$\|Qx\|_2^2 = (Qx)^{\rm T}(Qx) = x^{\rm T}Q^{\rm T}Qx = x^{\rm T}Ix = x^{\rm T}x = \|x\|_2^2.$$
Second, orthogonal matrices are always invertible. In fact, solving a linear system defined by an orthogonal matrix is trivial because
$$Qx = b \;\Rightarrow\; Q^{\rm T}Qx = Q^{\rm T}b \;\Rightarrow\; x = Q^{\rm T}b.$$
In considering linear spaces, we observed that a basis provides a unique description of vectors in $V$ in terms of the coefficients. As the columns of $Q$ form an orthonormal set of $m$ $m$-vectors, they can be thought of as a basis of $\mathbb{R}^m$. In solving $Qx = b$, we are finding the representation of $b$ in the coefficients of $\{q_1, \dots, q_m\}$. Thus, the operation by $Q^{\rm T}$ (or $Q$) represents a simple coordinate transformation. Let us solidify this idea by showing that a rotation matrix in $\mathbb{R}^2$ is an orthogonal matrix.

Example 16.3.6 Rotation matrix
Rotation of a vector is equivalent to representing the vector in a rotated coordinate system. A rotation matrix that rotates a vector in $\mathbb{R}^2$ by angle $\theta$ is
$$R(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}.$$
Let us verify that the rotation matrix is orthogonal for any $\theta$. The two columns are orthogonal because
$$r_1^{\rm T} r_2 = \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix}\begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix} = -\cos(\theta)\sin(\theta) + \sin(\theta)\cos(\theta) = 0, \quad \forall\, \theta.$$
Each column is of unit length because
$$\|r_1\|_2^2 = (\cos(\theta))^2 + (\sin(\theta))^2 = 1, \qquad \|r_2\|_2^2 = (-\sin(\theta))^2 + (\cos(\theta))^2 = 1, \quad \forall\, \theta.$$
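A small Matlab sketch of Example 16.3.6, checking orthogonality and the norm-preservation property for one (arbitrarily chosen) angle:

    % the rotation matrix is orthogonal for any theta
    theta = pi/6;                             % an arbitrary angle
    R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
    R'*R                                      % identity (up to round-off)
    norm(R*[1; 2]) - norm([1; 2])             % ~ 0: the 2-norm is preserved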


    Thus, the columns of the rotation matrix is orthonormal, and the matrix is orthogonal. Thisresult verifies that the action of the orthogonal matrix represents a coordinate transformation inR

    2. The interpretationofanorthogonalmatrixasacoordinatetransformationreadilyextendstohigher-dimensionalspaces.

16.3.6 Orthonormal Matrices

Let us define orthonormal matrices to be m×n matrices whose columns form an orthonormal set, i.e.

    Q = [q_1  q_2  ...  q_n] ,

with

    q_i^T q_j = 1 if i = j,  0 if i ≠ j .

Note that, unlike an orthogonal matrix, we do not require the matrix to be square. Just like orthogonal matrices, we have

    Q^T Q = I ,

where I is an n×n matrix. The proof is identical to that for the orthogonal matrix. However, Q Q^T does not yield an identity matrix,

    Q Q^T ≠ I ,

unless of course m = n.

Example 16.3.7 orthonormal matrices
An example of an orthonormal matrix is

    Q = [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] .

We can verify that Q^T Q = I because

    Q^T Q = [1/√6  2/√6  1/√6; 2/√5  -1/√5  0] [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] = [1 0; 0 1] .

However, Q Q^T ≠ I because

    Q Q^T = [1/√6  2/√5; 2/√6  -1/√5; 1/√6  0] [1/√6  2/√6  1/√6; 2/√5  -1/√5  0] = [29/30  -1/15  1/6; -1/15  13/15  1/3; 1/6  1/3  1/6] .
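The same check can be carried out numerically; the short sketch below (an added illustration, not part of the original notes) uses the 3×2 matrix Q of the example.

```python
import numpy as np

Q = np.array([[1/np.sqrt(6),  2/np.sqrt(5)],
              [2/np.sqrt(6), -1/np.sqrt(5)],
              [1/np.sqrt(6),  0.0]])

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q is the 2x2 identity
print(np.allclose(Q @ Q.T, np.eye(3)))   # False: Q Q^T is not the 3x3 identity
```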

End Advanced Material


Begin Advanced Material

16.4 Further Concepts in Linear Algebra

16.4.1 Column Space and Null Space

Let us introduce more concepts in linear algebra. First is the column space. The column space of a matrix A is the space of vectors that can be expressed as Ax. From the column interpretation of the matrix-vector product, we recall that Ax is a linear combination of the columns of A with the weights provided by x. We will denote the column space of A ∈ R^{m×n} by col(A), and the space is defined as

    col(A) = {v ∈ R^m : v = Ax for some x ∈ R^n} .

The column space of A is also called the image of A, img(A), or the range of A, range(A).

The second concept is the null space. The null space of A ∈ R^{m×n} is denoted by null(A) and is defined as

    null(A) = {x ∈ R^n : Ax = 0} ,

i.e., the null space of A is the space of vectors that result in Ax = 0. Recalling the column interpretation of the matrix-vector product and the definition of linear independence, we note that the columns of A must be linearly dependent in order for A to have a non-trivial null space. The null space defined above is more formally known as the right null space and is also called the kernel of A, ker(A).

Example 16.4.1 column space and null space
Let us consider a 3×2 matrix

    A = [0 2; 1 0; 0 0] .

The column space of A is the set of vectors representable as Ax, which are

    Ax = [0 2; 1 0; 0 0] [x_1; x_2] = x_1 [0; 1; 0] + x_2 [2; 0; 0] = [2x_2; x_1; 0] .

So, the column space of A is the set of vectors with arbitrary values in the first two entries and zero in the third entry. That is, col(A) is the 1-2 plane in R^3.

Because the columns of A are linearly independent, the only way to realize Ax = 0 is if x is the zero vector. Thus, the null space of A consists of the zero vector only.

Let us now consider a 2×3 matrix

    B = [1 2 0; 2 -1 3] .

The column space of B consists of vectors of the form

    Bx = [1 2 0; 2 -1 3] [x_1; x_2; x_3] = [x_1 + 2x_2; 2x_1 - x_2 + 3x_3] .

By judiciously choosing x_1, x_2, and x_3, we can express any vector in R^2. Thus, the column space of B is the entire R^2, i.e., col(B) = R^2.

Because the columns of B are not linearly independent, we expect B to have a nontrivial null space. Invoking the row interpretation of the matrix-vector product, a vector x in the null space must satisfy

    x_1 + 2x_2 = 0   and   2x_1 - x_2 + 3x_3 = 0 .

The first equation requires x_1 = -2x_2. The combination of the first requirement and the second equation yields x_3 = (5/3)x_2. Thus, the null space of B is

    null(B) = {α [-2; 1; 5/3] : α ∈ R} .

Thus, the null space is a one-dimensional space (i.e., a line) in R^3.
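These conclusions can be checked numerically; the sketch below (an added illustration, not from the original notes) confirms that the columns of A are independent, that B has only two independent columns, and that the vector (-2, 1, 5/3) indeed lies in null(B).

```python
import numpy as np

A = np.array([[0., 2.],
              [1., 0.],
              [0., 0.]])
B = np.array([[1.,  2., 0.],
              [2., -1., 3.]])

print(np.linalg.matrix_rank(A))   # 2: both columns independent, so null(A) = {0}
print(np.linalg.matrix_rank(B))   # 2: only 2 independent columns out of 3

x = np.array([-2., 1., 5./3.])    # candidate null-space vector of B
print(np.allclose(B @ x, 0.0))    # True: x lies in null(B)
```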

16.4.2 Projectors

Another important concept, in particular for the least squares covered in Chapter 17, is the concept of a projector. A projector is a square matrix P that is idempotent, i.e.

    P^2 = P P = P .

Let v be an arbitrary vector in R^m. The projector P projects v, which is not necessarily in col(P), onto col(P), i.e.

    w = Pv ∈ col(P),  ∀v ∈ R^m .

In addition, the projector P does not modify a vector that is already in col(P). This is easily verified because

    Pw = PPv = Pv = w,  ∀w ∈ col(P) .

Intuitively, a projector projects a vector v ∈ R^m onto a smaller space col(P). If the vector is already in col(P), then it is left unchanged.

The complementary projector of P is the projector I - P. It is easy to verify that I - P is a projector itself because

    (I - P)^2 = (I - P)(I - P) = I - 2P + PP = I - P .

It can be shown that the complementary projector I - P projects onto the null space of P, null(P).

When the space along which the projector projects is orthogonal to the space onto which the projector projects, the projector is said to be an orthogonal projector. Algebraically, orthogonal projectors are symmetric.

When an orthonormal basis for a space is available, it is particularly simple to construct an orthogonal projector onto the space. Say {q_1, ..., q_n} is an orthonormal basis for an n-dimensional subspace of R^m, n < m. Given any v ∈ R^m, we recall that

    u_i = q_i^T v


is the component of v in the direction of q_i represented in the basis {q_i}. We then introduce the vector

    w_i = q_i (q_i^T v) ;

the sum of such vectors would produce the projection of v ∈ R^m onto V spanned by {q_i}. More compactly, if we form an m×n matrix

    Q = [q_1  ...  q_n] ,

then the projection of v onto the column space of Q is

    w = Q(Q^T v) = (Q Q^T) v .

We recognize that the orthogonal projector onto the span of {q_i}, or col(Q), is

    P = Q Q^T .

Of course P is symmetric, (Q Q^T)^T = (Q^T)^T Q^T = Q Q^T, and idempotent, (Q Q^T)(Q Q^T) = Q (Q^T Q) Q^T = Q Q^T.
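The construction is easy to try out. The sketch below (added for illustration, not part of the original notes) builds P = Q Q^T from an orthonormal basis and checks that it is symmetric, idempotent, and leaves vectors in col(Q) unchanged; the basis is obtained here from NumPy's QR factorization of a random matrix, which is an assumed convenience rather than anything in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns spanning an n-dim subspace

P = Q @ Q.T                                       # orthogonal projector onto col(Q)

print(np.allclose(P, P.T))       # True: symmetric
print(np.allclose(P @ P, P))     # True: idempotent
w = Q @ rng.standard_normal(n)   # a vector already in col(Q)
print(np.allclose(P @ w, w))     # True: left unchanged by P
```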

End Advanced Material


Chapter 17
Least Squares

17.1 Data Fitting in Absence of Noise and Bias

We motivate our discussion by reconsidering the friction coefficient example of Chapter 15. We recall that, according to Amontons, the static friction, F_f,static, and the applied normal force, F_normal,applied, are related by

    F_f,static ≤ μ_s F_normal,applied ;

here μ_s is the coefficient of friction, which is only dependent on the two materials in contact. In particular, the maximum static friction is a linear function of the applied normal force, i.e.

    F_f,static^max = μ_s F_normal,applied .

We wish to deduce μ_s by measuring the maximum static friction attainable for several different values of the applied normal force.

Our approach to this problem is to first choose the form of a model based on physical principles and then deduce the parameters based on a set of measurements. In particular, let us consider a simple affine model

    y = Y_model(x; β) = β_0 + β_1 x .

The variable y is the predicted quantity, or the output, which is the maximum static friction F_f,static^max. The variable x is the independent variable, or the input, which is the maximum normal force F_normal,applied. The function Y_model is our predictive model which is parameterized by a parameter β = (β_0, β_1). Note that Amontons' law is a particular case of our general affine model with β_0 = 0 and β_1 = μ_s. If we take m noise-free measurements and Amontons' law is exact, then we expect

    (F_f,static^max)_i = μ_s (F_normal,applied)_i,  i = 1, ..., m .

The equation should be satisfied exactly for each one of the m measurements. Accordingly, there is also a unique solution to our model-parameter identification problem

    y_i = β_0 + β_1 x_i,  i = 1, ..., m ,


with the solution given by β_0^true = 0 and β_1^true = μ_s.

Because the dependency of the output y on the model parameters {β_0, β_1} is linear, we can write the system of equations as a m×2 matrix equation

    [1 x_1; 1 x_2; ... ; 1 x_m] [β_0; β_1] = [y_1; y_2; ... ; y_m] ,

where the m×2 matrix is denoted X, the parameter vector β, and the right-hand side Y; or, more compactly,

    X β = Y .

Using the row interpretation of matrix-vector multiplication, we immediately recover the original set of equations,

    X β = [β_0 + β_1 x_1; β_0 + β_1 x_2; ... ; β_0 + β_1 x_m] = [y_1; y_2; ... ; y_m] = Y .

Or, using the column interpretation, we see that our parameter fitting problem corresponds to choosing the two weights for the two m-vectors to match the right-hand side,

    X β = β_0 [1; 1; ... ; 1] + β_1 [x_1; x_2; ... ; x_m] = [y_1; y_2; ... ; y_m] = Y .

We emphasize that the linear system X β = Y is overdetermined, i.e., has more equations than unknowns (m > n). (We study this in more detail in the next section.) However, we can still find a solution to the system because the following two conditions are satisfied:

Unbiased: Our model includes the true functional dependence y = μ_s x, and thus the model is capable of representing this true underlying functional dependence. This would not be the case if, for example, we consider a constant model y(x) = β_0, because our model would be incapable of representing the linear dependence of the friction force on the normal force. Clearly the assumption of no bias is a very strong assumption given the complexity of the physical world.

Noise free: We have perfect measurements: each measurement y_i corresponding to the independent variable x_i provides the exact value of the friction force. Obviously this is again rather naive and will need to be relaxed.

Under these assumptions, there exists a parameter β^true that completely describes the measurements, i.e.

    y_i = Y_model(x_i; β^true),  i = 1, ..., m .


(The β^true will be unique if the columns of X are independent.) Consequently, our predictive model is perfect, and we can exactly predict the experimental output for any choice of x, i.e.

    Y(x) = Y_model(x; β^true),  ∀x ,

where Y(x) is the experimental measurement corresponding to the condition described by x. However, in practice, the bias-free and noise-free assumptions are rarely satisfied, and our model is never a perfect predictor of the reality.

In Chapter 19, we will develop a probabilistic tool for quantifying the effect of noise and bias; the current chapter focuses on developing a least-squares technique for solving an overdetermined linear system (in the deterministic context), which is essential to solving these data fitting problems. In particular we will consider a strategy for solving overdetermined linear systems of the form

    B z = g ,

where B ∈ R^{m×n}, z ∈ R^n, and g ∈ R^m with m > n.

Before we discuss the least-squares strategy, let us consider another example of overdetermined systems in the context of polynomial fitting. Let us consider a particle experiencing constant acceleration, e.g. due to gravity. We know that the position y of the particle at time t is described by a quadratic function

    y(t) = (1/2) a t^2 + v_0 t + y_0 ,

where a is the acceleration, v_0 is the initial velocity, and y_0 is the initial position. Suppose that we do not know the three parameters a, v_0, and y_0 that govern the motion of the particle and we are interested in determining the parameters. We could do this by first measuring the position of the particle at several different times and recording the pairs {t_i, y(t_i)}. Then, we could fit our measurements to the quadratic model to deduce the parameters.

The problem of finding the parameters that govern the motion of the particle is a special case of a more general problem: polynomial fitting. Let us consider a quadratic polynomial, i.e.

    y(x) = β_0^true + β_1^true x + β_2^true x^2 ,

where β^true = {β_0^true, β_1^true, β_2^true} is the set of true parameters characterizing the modeled phenomenon. Suppose that we do not know β^true but we do know that our output depends on the input x in a quadratic manner. Thus, we consider a model of the form

    Y_model(x; β) = β_0 + β_1 x + β_2 x^2 ,

and we determine the coefficients by measuring the output y for several different values of x. We are free to choose the number of measurements m and the measurement points x_i, i = 1, ..., m. In particular, upon choosing the measurement points and taking a measurement at each point, we obtain a system of linear equations,

    y_i = Y_model(x_i; β) = β_0 + β_1 x_i + β_2 x_i^2,  i = 1, ..., m ,

where y_i is the measurement corresponding to the input x_i.

Note that the equation is linear in our unknowns {β_0, β_1, β_2} (the appearance of x_i^2 only affects the manner in which the data enters the equation). Because the dependency on the parameters is


linear, we can write the system as a matrix equation,

    [y_1; y_2; ... ; y_m] = [1 x_1 x_1^2; 1 x_2 x_2^2; ... ; 1 x_m x_m^2] [β_0; β_1; β_2] ,

where the left-hand side is denoted Y, the m×3 matrix X, and the parameter vector β; or, more compactly,

    Y = X β .

Note that this particular matrix X has a rather special structure: each row forms a geometric series and the ij-th entry is given by X_ij = x_i^(j-1). Matrices with this structure are called Vandermonde matrices.

As in the friction coefficient example considered earlier, the row interpretation of the matrix-vector product recovers the original set of equations,

    Y = [y_1; y_2; ... ; y_m] = [β_0 + β_1 x_1 + β_2 x_1^2; β_0 + β_1 x_2 + β_2 x_2^2; ... ; β_0 + β_1 x_m + β_2 x_m^2] = X β .

With the column interpretation, we immediately recognize that this is a problem of finding the three coefficients, or parameters, of the linear combination that yields the desired m-vector Y, i.e.

    Y = [y_1; y_2; ... ; y_m] = β_0 [1; 1; ... ; 1] + β_1 [x_1; x_2; ... ; x_m] + β_2 [x_1^2; x_2^2; ... ; x_m^2] = X β .

We know that if we have three or more non-degenerate measurements (i.e., m ≥ 3), then we can find the unique solution to the linear system. Moreover, the solution is the coefficients of the underlying

polynomial, (β_0^true, β_1^true, β_2^true).

Example 17.1.1 A quadratic polynomial
Let us consider a more specific case, where the underlying polynomial is of the form

    y(x) = 1/2 + 2/3 x - 1/8 c x^2 .

We recognize that y(x) = Y_model(x; β^true) for Y_model(x; β) = β_0 + β_1 x + β_2 x^2 and the true parameters

    β_0^true = 1/2,   β_1^true = 2/3,   and   β_2^true = -1/8 c .

The parameter c controls the degree of quadratic dependency; in particular, c = 0 results in an affine function.

First, we consider the case with c = 1, which results in a strong quadratic dependency, i.e., β_2^true = -1/8. The result of measuring y at three non-degenerate points (m = 3) is shown in Figure 17.1(a). Solving the 3×3 linear system with the coefficients as the unknown, we obtain

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/8 .
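The m = 3 computation can be reproduced with a short NumPy sketch (added here for illustration, not part of the original notes); the measurement points are arbitrary assumed values, not the ones used in the figure.

```python
import numpy as np

beta_true = np.array([1/2, 2/3, -1/8])            # the c = 1 case
x = np.array([0.5, 1.5, 2.5])                     # three non-degenerate (assumed) points
y = beta_true[0] + beta_true[1]*x + beta_true[2]*x**2   # noise-free measurements

X = np.column_stack([np.ones_like(x), x, x**2])   # 3x3 Vandermonde matrix
beta = np.linalg.solve(X, y)                      # exact solution of the square system
print(beta)                                       # recovers [0.5, 0.6667, -0.125] = beta_true
```

With exact (noise-free) data and an unbiased model, any three distinct points recover the true coefficients, which is precisely the point of the example.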


Figure 17.1: Deducing the coefficients of a polynomial with a strong quadratic dependence. (a) m = 3; (b) m = 7. (Each panel plots the measurements y_meas and the fit y against x.)

Not surprisingly, we can find the true coefficients of the quadratic equation using three data points.

Suppose we take more measurements. An example of taking seven measurements (m = 7) is shown in Figure 17.1(b). We now have seven data points and three unknowns, so we must solve the 7×3 linear system, i.e., find the set β = {β_0, β_1, β_2} that satisfies all seven equations. The solution to the linear system, of course, is given by

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/8 .

The result is correct (β = β^true) and, in particular, no different from the result for the m = 3 case.

We can modify the underlying polynomial slightly and repeat the same exercise. For example, let us consider the case with c = 1/10, which results in a much weaker quadratic dependency of y on x, i.e., β_2^true = -1/80. As shown in Figure 17.2, we can take either m = 3 or m = 7 measurements. Similar to the c = 1 case, we identify the true coefficients,

    β_0 = 1/2,   β_1 = 2/3,   and   β_2 = -1/80 ,

using either m = 3 or m = 7 (in fact using any three or more non-degenerate measurements).

In the friction coefficient determination and the (particle motion) polynomial identification problems, we have seen that we can find a solution to the m×n overdetermined system (m > n) if

(a) our model includes the underlying input-output functional dependence (no bias);

(b) the measurements are perfect (no noise).

As already stated, in practice, these two assumptions are rarely satisfied; i.e., models are often (in fact, always) incomplete and measurements are often inaccurate. (For example, in our particle motion model, we have neglected friction.) We can still construct an m×n linear system Bz = g using our model and measurements, but the solution to the system in general does not exist. Knowing that we cannot find the solution to the overdetermined linear system, our objective is


Figure 17.2: Deducing the coefficients of a polynomial with a weak quadratic dependence. (a) m = 3; (b) m = 7. (Each panel plots the measurements y_meas and the fit y against x.)

to find a solution that comes close to satisfying the system. In the following section, we will define the notion of closeness suitable for our analysis and introduce a general procedure for finding the closest solution to a general overdetermined system of the form

    B z = g ,

where B ∈ R^{m×n} with m > n. We will subsequently address the meaning and interpretation of this (non-)solution.

17.2 Overdetermined Systems

Let us consider an overdetermined linear system, such as the one arising from the regression example earlier, of the form

    B z = g ,

or, more explicitly,

    [B_11 B_12; B_21 B_22; B_31 B_32] [z_1; z_2] = [g_1; g_2; g_3] .

Our objective is to find z that makes the three-component vector equation true, i.e., find the solution to the linear system. In Chapter 16, we considered the forward problem of matrix-vector multiplication in which, given z, we calculate g = Bz. We also briefly discussed the inverse problem in which, given g, we would like to find z. But for m ≠ n, B^{-1} does not exist; as discussed in the previous section, there may be no z that satisfies Bz = g. Thus, we will need to look for a z that satisfies the equation "closely", in the sense we must specify and interpret. This is the focus of this section.^1

^1 Note later (in Unit V) we shall look at the ostensibly simpler case in which B is square and a solution z exists and is even unique. But, for many reasons, overdetermined systems are a nicer place to start.


Row Interpretation

Let us consider a row interpretation of the overdetermined system. Satisfying the linear system requires

    B_i1 z_1 + B_i2 z_2 = g_i,  i = 1, 2, 3 .

Note that each of these equations defines a line in R^2. Thus, satisfying the three equations is equivalent to finding a point that is shared by all three lines, which in general is not possible, as we will demonstrate in an example.

Example 17.2.1 row interpretation of overdetermined system
Let us consider an overdetermined system

    [1 2; 2 1; 2 -3] [z_1; z_2] = [5/2; 2; -2] .

Using the row interpretation of the linear system, we see there are three linear equations to be satisfied. The set of points x = (x_1, x_2) that satisfies the first equation,

    1x_1 + 2x_2 = 5/2 ,

forms a line

    L_1 = {(x_1, x_2) : 1x_1 + 2x_2 = 5/2}

in the two-dimensional space. Similarly, the sets of points that satisfy the second and third equations form lines described by

    L_2 = {(x_1, x_2) : 2x_1 + 1x_2 = 2}   and   L_3 = {(x_1, x_2) : 2x_1 - 3x_2 = -2} .

These sets of points in L_1, L_2, and L_3, or the lines, are shown in Figure 17.3(a).

The solution to the linear system must satisfy each of the three equations, i.e., belong to all three lines. This means that there must be an intersection of all three lines and, if it exists, the solution is the intersection. This linear system has the solution

    z = [1/2; 1] .

However, three lines intersecting in R^2 is a rare occurrence; in fact the right-hand side of the system was chosen carefully so that the system has a solution in this example. If we perturb either the matrix or the right-hand side of the system, it is likely that the three lines will no longer intersect.

A more typical overdetermined system is the following system,

    [1 2; 2 1; 2 -3] [z_1; z_2] = [0; 2; 4] .

Again, interpreting the matrix equation as a system of three linear equations, we can illustrate the set of points that satisfy each equation as a line in R^2 as shown in Figure 17.3(b). There is no solution to this overdetermined system, because there is no point (z_1, z_2) that belongs to all three lines, i.e., the three lines do not intersect at a point.
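Both observations can be checked numerically. The sketch below (an added illustration, not from the original notes) uses the standard rank test: the system Bz = g is consistent exactly when the augmented matrix [B g] has the same rank as B.

```python
import numpy as np

B = np.array([[1.,  2.],
              [2.,  1.],
              [2., -3.]])

g1 = np.array([5/2, 2., -2.])   # carefully chosen right-hand side: a solution exists
g2 = np.array([0., 2., 4.])     # typical right-hand side: no solution

for g in (g1, g2):
    rank_B   = np.linalg.matrix_rank(B)
    rank_Bg  = np.linalg.matrix_rank(np.column_stack([B, g]))
    print(rank_Bg == rank_B)    # True for g1, False for g2
```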


Figure 17.3: Illustration of the row interpretation of the overdetermined systems; each line L_1, L_2, L_3 is the set of points that satisfies B_i x = g_i, i = 1, 2, 3. (a) system with a solution; (b) system without a solution.

Column Interpretation

Let us now consider a column interpretation of the overdetermined system. Satisfying the linear system requires

    z_1 [B_11; B_21; B_31] + z_2 [B_12; B_22; B_32] = [g_1; g_2; g_3] .

In other words, we consider a linear combination of two vectors in R^3 and try to match the right-hand side g ∈ R^3. The vectors span at most a plane in R^3, so there is no weight (z_1, z_2) that makes the equation hold unless the vector g happens to lie in the plane. To clarify the idea, let us consider a specific example.

Example 17.2.2 column interpretation of overdetermined system
For simplicity, let us consider the following special case:

    [1 0; 0 1; 0 0] [z_1; z_2] = [1; 3/2; 2] .

The column interpretation results in

    z_1 [1; 0; 0] + z_2 [0; 1; 0] = [1; 3/2; 2] .

By changing z_1 and z_2, we can move in the plane

    z_1 [1; 0; 0] + z_2 [0; 1; 0] = [z_1; z_2; 0] .


Figure 17.4: Illustration of the column interpretation of the overdetermined system, showing span(col(B)), g, and Bz*.

Clearly, if g_3 ≠ 0, it is not possible to find z_1 and z_2 that satisfy the linear equation, Bz = g. In other words, g must lie in the plane spanned by the columns of B, which is the 1-2 plane in this case.

Figure 17.4 illustrates the column interpretation of the overdetermined system. The vector g ∈ R^3 does not lie in the space spanned by the columns of B, thus there is no solution to the

system. However, if g_3 is small, then we can find a z* such that Bz* is close to g, i.e., a good approximation to g. Such an approximation is shown in the figure, and the next section discusses how to find such an approximation.

17.3 Least Squares

17.3.1 Measures of Closeness

In the previous section, we observed that it is in general not possible to find a solution to an overdetermined system. Our aim is thus to find z such that Bz is close to g, i.e., z such that

    Bz ≈ g ,

for B ∈ R^{m×n}, m > n. For convenience, let us introduce the residual function, which is defined as

    r(z) ≡ g - Bz .

Note that

    r_i = g_i - (Bz)_i = g_i - Σ_{j=1}^{n} B_ij z_j,  i = 1, ..., m .

Thus, r_i is the extent to which the i-th equation (Bz)_i = g_i is not satisfied. In particular, if r_i(z) = 0, i = 1, ..., m, then Bz = g and z is the solution to the linear system. We note that the residual is a measure of closeness described by m values. It is more convenient to have a single scalar value for assessing the extent to which the equation is satisfied. A simple way to achieve this is to take a norm of the residual vector. Different norms result in different measures of closeness, which in turn produce different best-fit solutions.


Begin Advanced Material

Let us consider first two examples, neither of which we will pursue in this chapter.

Example 17.3.1 ℓ1 minimization
The first method is based on measuring the residual in the 1-norm. The scalar representing the extent of mismatch is

    J_1(z) ≡ ||r(z)||_1 = Σ_{i=1}^{m} |r_i(z)| = Σ_{i=1}^{m} |(g - Bz)_i| .

The best z, denoted by z*, is the z that minimizes the extent of mismatch measured in J_1(z), i.e.

    z* = arg min_{z ∈ R^n} J_1(z) .

The arg min_{z ∈ R^n} J_1(z) returns the argument z that minimizes the function J_1(z). In other words, z* satisfies

    J_1(z*) ≤ J_1(z),  ∀z ∈ R^n .

This minimization problem can be formulated as a linear programming problem. The minimizer is not necessarily unique and the solution procedure is not as simple as that resulting from the 2-norm. Thus, we will not pursue this option here.

Example 17.3.2 ℓ∞ minimization
The second method is based on measuring the residual in the ∞-norm. The scalar representing the extent of mismatch is

    J_∞(z) ≡ ||r(z)||_∞ = max_{i=1,...,m} |r_i(z)| = max_{i=1,...,m} |(g - Bz)_i| .

The best z* that minimizes J_∞(z) is

    z* = arg min_{z ∈ R^n} J_∞(z) .

This so-called min-max problem can also be cast as a linear programming problem. Again, this procedure is rather complicated, and the solution is not necessarily unique.

End Advanced Material

17.3.2 Least-Squares Formulation (ℓ2 minimization)

Minimizing the residual measured in (say) the 1-norm or ∞-norm results in a linear programming problem that is not so easy to solve. Here we will show that measuring the residual in the 2-norm results in a particularly simple minimization problem. Moreover, the solution to the minimization problem is unique assuming that the matrix B is full rank, that is, has n independent columns. We shall assume that B does indeed have independent columns.


The scalar function representing the extent of mismatch for ℓ2 minimization is

    J_2(z) ≡ ||r(z)||_2^2 = r^T(z) r(z) = (g - Bz)^T (g - Bz) .

Note that we consider the square of the 2-norm for convenience, rather than the 2-norm itself. Our objective is to find z* such that

    z* = arg min_{z ∈ R^n} J_2(z) ,

which is equivalent to finding z* with

    ||g - Bz*||_2^2 = J_2(z*) < J_2(z) = ||g - Bz||_2^2,  ∀z ≠ z* .

(Note arg min refers to the argument which minimizes: so min is the minimum and arg min is the minimizer.) Note that we can write our objective function J_2(z) as

    J_2(z) = ||r(z)||_2^2 = r^T(z) r(z) = Σ_{i=1}^{m} (r_i(z))^2 .

In other words, our objective is to minimize the sum of the squares of the residuals, i.e., least squares. Thus, we say that z* is the least-squares solution to the overdetermined system Bz = g: z* is that z which makes J_2(z), the sum of the squares of the residuals, as small as possible.

Note that if Bz = g does have a solution, the least-squares solution is the solution to the overdetermined system. If z is the solution, then r = g - Bz = 0 and in particular J_2(z) = 0, which is the minimum value that J_2 can take. Thus, the solution z is the minimizer of J_2: z = z*. Let us now derive a procedure for solving the least-squares problem for a more general case where Bz = g does not have a solution.

For convenience, we drop the subscript 2 of the objective function J_2, and simply denote it by J. Again, our objective is to find z* such that

    J(z*) < J(z),  ∀z ≠ z* .

Expanding out the expression for J(z), we have

    J(z) = (g - Bz)^T (g - Bz) = (g^T - (Bz)^T)(g - Bz)
         = g^T (g - Bz) - (Bz)^T (g - Bz)
         = g^T g - g^T Bz - (Bz)^T g + (Bz)^T (Bz)
         = g^T g - g^T Bz - z^T B^T g + z^T B^T B z ,

where we have used the transpose rule which tells us that (Bz)^T = z^T B^T. We note that g^T Bz is a scalar, so it does not change under the transpose operation. Thus, g^T Bz can be expressed as

    g^T Bz = (g^T Bz)^T = z^T B^T g ,

again by the transpose rule. The function J thus simplifies to

    J(z) = g^T g - 2 z^T B^T g + z^T B^T B z .

For convenience, let us define N ≡ B^T B ∈ R^{n×n}, so that

    J(z) = g^T g - 2 z^T B^T g + z^T N z .


It is simple to confirm that each term in the above expression is indeed a scalar.

The solution to the minimization problem is given by

    N z* = d ,

where d = B^T g. The equation is called the "normal equation", which can be written out as

    [N_11 N_12 ... N_1n; N_21 N_22 ... N_2n; ... ; N_n1 N_n2 ... N_nn] [z_1*; z_2*; ... ; z_n*] = [d_1; d_2; ... ; d_n] .

The existence and uniqueness of z* is guaranteed assuming that the columns of B are independent. We provide below the proof that z* is the unique minimizer of J(z) in the case in which B has independent columns.

Proof. We first show that the normal matrix N is symmetric positive definite, i.e.

    x^T N x > 0,  ∀x ∈ R^n (x ≠ 0) ,

assuming the columns of B are linearly independent. The normal matrix N = B^T B is symmetric because

    N^T = (B^T B)^T = B^T (B^T)^T = B^T B = N .

To show N is positive definite, we first observe that

    x^T N x = x^T B^T B x = (Bx)^T (Bx) = ||Bx||^2 .

That is, x^T N x is the square of the 2-norm of Bx. Recall that the norm of a vector is zero if and only if the vector is the zero vector. In our case,

    x^T N x = 0   if and only if   Bx = 0 .

Because the columns of B are linearly independent, Bx = 0 if and only if x = 0. Thus, we have

    x^T N x = ||Bx||^2 > 0,  ∀x ≠ 0 .

Thus, N is symmetric positive definite.

Now recall that the function to be minimized is

    J(z) = g^T g - 2 z^T B^T g + z^T N z .

If z* minimizes the function, then for any δz ≠ 0, we must have

    J(z*) < J(z* + δz) .

Let us expand J(z* + δz):

    J(z* + δz) = g^T g - 2(z* + δz)^T B^T g + (z* + δz)^T N (z* + δz)
               = g^T g - 2 z*^T B^T g + z*^T N z* - 2 δz^T B^T g + δz^T N z* + z*^T N δz + δz^T N δz
               = J(z*) + 2 δz^T (N z* - B^T g) + δz^T N δz ,

where the first three terms in the second line form J(z*) and we have used z*^T N δz = (z*^T N δz)^T = δz^T N^T z* = δz^T N z*.


Note that N^T = N because N^T = (B^T B)^T = B^T B = N. If z* satisfies the normal equation, N z* = B^T g, then

    N z* - B^T g = 0 ,

and thus

    J(z* + δz) = J(z*) + δz^T N δz .

The second term is always positive because N is positive definite. Thus, we have

    J(z* + δz) > J(z*),  ∀δz ≠ 0 ,

or, setting δz = z - z*,

    J(z*) < J(z),  ∀z ≠ z* .

Thus, z* satisfying the normal equation N z* = B^T g is the minimizer of J, i.e., the least-squares solution to the overdetermined system Bz = g.

Example 17.3.3 2×1 least-squares and its geometric interpretation
Consider a simple case of an overdetermined system,

    B = [2; 1],   g = [1; 2] .

Because the system is 2×1, there is a single scalar parameter, z, to be chosen. To obtain the normal equation, we first construct the matrix N and the vector d (both of which are simply scalars for this problem):

    N = B^T B = [2 1] [2; 1] = 5 ,
    d = B^T g = [2 1] [1; 2] = 4 .

Solving the normal equation, we obtain the least-squares solution

    N z* = d   ⇒   5 z* = 4   ⇒   z* = 4/5 .

This choice of z* yields

    B z* = [2; 1] (4/5) = [8/5; 4/5] ,

which of course is different from g.

The process is illustrated in Figure 17.5. The span of the column of B is the line parameterized by [2 1]^T z, z ∈ R. Recall that the solution Bz* is the point on the line that is closest to g in the least-squares sense, i.e.

    ||Bz* - g||_2 < ||Bz - g||_2,  ∀z ≠ z* .


Figure 17.5: Illustration of least-squares in R^2, showing span(col(B)), g, and Bz*.

Recalling that the ℓ2 distance is the usual Euclidean distance, we expect the closest point to be the orthogonal projection of g onto the line span(col(B)). The figure confirms that this indeed is the case. We can verify this algebraically,

    B^T (Bz* - g) = [2 1] ( (4/5)[2; 1] - [1; 2] ) = [2 1] [3/5; -6/5] = 0 .

Thus, the residual vector Bz* - g and the column space of B are orthogonal to each other. While the geometric illustration of orthogonality may be difficult for a higher-dimensional least squares, the orthogonality condition can be checked systematically using the algebraic method.
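The entire example can be reproduced in a few lines of NumPy (an added illustration, not part of the original notes); the same least-squares solution is also returned by the library routine np.linalg.lstsq, which is mentioned here only as a cross-check.

```python
import numpy as np

B = np.array([[2.],
              [1.]])
g = np.array([1., 2.])

N = B.T @ B                      # normal matrix, here the scalar 5
d = B.T @ g                      # right-hand side, here the scalar 4
z_star = np.linalg.solve(N, d)   # z* = 4/5
print(z_star)                    # [0.8]

print(np.allclose(B.T @ (B @ z_star - g), 0.0))   # True: residual orthogonal to col(B)

z_lstsq, *_ = np.linalg.lstsq(B, g, rcond=None)   # library least-squares solve
print(z_lstsq)                   # [0.8], same minimizer
```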

17.3.3 Computational Considerations

Let us analyze the computational cost of solving the least-squares system. The first step is the formulation of the normal matrix,

    N = B^T B ,

which requires a matrix-matrix multiplication of B^T ∈ R^{n×m} and B ∈ R^{m×n}. Because N is symmetric, we only need to compute the upper triangular part of N, which corresponds to performing n(n+1)/2 m-vector inner products. Thus, the computational cost is mn(n+1). Forming the right-hand side,