

Linköping Studies in Science and Technology
Thesis No. 1085

Parameter Estimation in Linear Descriptor Systems

Markus Gerdin


Division of Automatic Control and Communication Systems

Department of Electrical Engineering

Linköping University, SE–581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se

Email: [email protected]

Linköping 2004


Parameter Estimation in Linear Descriptor Systems

© 2004 Markus Gerdin

Department of Electrical Engineering, Linköping University, SE–581 83 Linköping, Sweden.

ISBN 91-7373-931-6
ISSN 0280-7971

LiU-TEK-LIC-2004:14

Printed by UniTryck, Linköping, Sweden 2004


To Jessica


Abstract

Linear descriptor systems form the natural way in which linear models of physical systems are delivered from an object-oriented modeling tool like Modelica. Linear descriptor systems are also known as linear differential-algebraic equations in the continuous-time case. If some parameters in such models are unknown, one might need to estimate them from measured data from the modeled system. This is a form of system identification called gray box identification. The objective of this work is to examine how gray box identification can be performed for linear descriptor systems.

To solve this problem, we use some well-known canonical forms to examine how to transform the descriptor systems into state-space form. In general, the input must be redefined to make the transformation into state-space form possible. To be able to implement the suggested identification methods, we examine how the transformations can be computed using numerical software from the linear algebra package LAPACK.

Noise modeling is an important part of parameter estimation and system identification, so we also examine how a noise model can be added to linear descriptor systems. The result is that white noise in general cannot be added to all equations of a linear continuous-time descriptor system, since this could lead to differentiation of the noise which is not well defined. It is also noted that a Kalman filter can be implemented if the model is transformed into state-space form.

We also discuss the problem of finding initial values for the parameter search. We show how to formulate a biquadratic polynomial that gives initial values for the parameter search when minimized.



Acknowledgments

There are several people who helped me during the work with this thesis. First of all I would like to thank my supervisors Professor Lennart Ljung and Professor Torkel Glad for guiding me in my research in an excellent way and always taking time to answer my questions.

Furthermore, I would like to thank everyone at the Control and Communication group for providing a nice working atmosphere. I would especially like to mention Thomas Schön for the fruitful cooperation on the work on noise modeling for linear differential-algebraic equations and for proofreading parts of the thesis, Martin Enqvist for proofreading and for discussing different topics, Doctor Johan Löfberg for helping me understand sum of squares optimization, Johan Sjöberg for proofreading and for discussions on differential-algebraic equations, Gustaf Hendeby for proofreading and for LaTeX help, and Magnus Åkerblad for taking time to discuss many spontaneous questions. I am also thankful for help with many practical issues from Ulla Salaneck.

The TUS group at the Department of Electrical Engineering deserves my gratitude for keeping the computers running at all times.

This work has been supported by the Swedish Foundation for Strategic Research (SSF) through VISIMOD and EXCEL and by the Swedish Research Council (VR), which is gratefully acknowledged.

Finally I would like to thank my family and friends for inspiration and support and Jessica for being an important part of my life, although you are far away.

Markus Gerdin

Linköping, March 2004



Contents

1 Introduction
  1.1 Problem Formulation
  1.2 Outline
  1.3 Contributions

2 Linear Descriptor Systems
  2.1 Introduction
  2.2 Regularity
  2.3 A Canonical Form
    2.3.1 Alternative Canonical Forms
  2.4 State-Space Form
  2.5 Sampling
  2.6 Discrete-Time Descriptor Systems
    2.6.1 Regularity
    2.6.2 A Canonical Form
    2.6.3 State-Space Form
  2.7 Conclusions

3 Computation of the Canonical Forms
  3.1 Generalized Eigenvalues
  3.2 Computation of the Canonical Forms
  3.3 Summary
  3.4 Discrete-Time Descriptor Systems
  3.5 Conclusions

4 Noise Modeling and Kalman Filtering
  4.1 Noise Modeling
    4.1.1 Time Domain Derivation
    4.1.2 Frequency Domain Derivation
  4.2 Example
  4.3 Sampling with Noise Model
    4.3.1 A Shortcut
  4.4 Kalman Filtering
  4.5 Discrete-Time Descriptor Systems
    4.5.1 Noise Modeling
    4.5.2 Kalman Filtering
  4.6 Conclusions

5 Parameter Estimation in Linear Descriptor Systems
  5.1 System Identification
    5.1.1 Time Domain Identification
    5.1.2 Frequency Domain Identification
  5.2 Time Domain Identification for Descriptor Systems
  5.3 Frequency Domain Identification for Descriptor Systems
  5.4 Example
    5.4.1 Implementation
  5.5 Discrete-Time Descriptor Systems
    5.5.1 Time Domain Identification
    5.5.2 Frequency Domain Identification
  5.6 Conclusions

6 Initialization of Parameter Estimates
  6.1 Introduction
  6.2 Transforming the Problem
    6.2.1 The Case of Invertible E(θ)
    6.2.2 The Case of Non-Invertible E(θ)
  6.3 Sum of Squares Optimization
  6.4 Discrete-Time Descriptor Systems
  6.5 Conclusions

7 Conclusions

Bibliography


Notation

E, J, K, L System matrices in a descriptor system, see equations (2.1) and (2.2)

A, B, C, D System matrices in a state-space system or a canonical form of a descriptor system

N Nilpotent matrix in canonical forms of descriptor systems, see Section 2.3

ξ(t) Vector of internal variables in a descriptor system at time t, see equations (2.1) and (2.2)

x(t) State vector in a state-space description at time t

u(t) Input signal at time t

y(t) Output signal at time t

e(t) Noise signal at time t

v1(t) Process noise at time t

v2(t) Measurement noise at time t

U(·) Frequency domain version of u(t)
Y(·) Frequency domain version of y(t)
G(·) Transfer function from input to output
H(·) Transfer function from noise to output
t Time variable


s Laplace transform variable
z z transform variable

L[·] Laplace transform of the argument
Z[·] z transform of the argument
det(·) The determinant of the argument
E(x) Expected value of the stochastic variable x

⊗ Kronecker product
vec(·) An ordered stack of the columns of the (matrix) argument from left to right, starting with the first column
q Shift operator, qu(t) = u(t + 1)
p Differentiation operator, pu(t) = du(t)/dt
ẋ(t) Derivative of x(t) with respect to time
P, Q Transformation matrices from Lemma 2.3
I_n Identity matrix of size n × n
θ Vector of unknown variables in a system identification problem
Z^N Measured data, {u(0), y(0), ..., u(N), y(N)} or {U(ω1), Y(ω1), ..., U(ωN), Y(ωN)}
ŷ(t|θ) A model's prediction of y(t) given θ and Z^{t−1}
ε(t, θ) Prediction error, y(t) − ŷ(t|θ)


1 Introduction

Modeling of physical systems is a fundamental problem within the engineering sciences. Examples of physical systems that can be modeled are the weather, a human cell and an electrical motor. Although models of these systems of course differ greatly in complexity, they all have in common that they can be used to make predictions. A model of the weather could be used to make weather forecasts, a model of a human cell could be used to predict how it will react to different drugs, and a model of an electrical motor could be used to predict the effect of applying a certain voltage. The models can be constructed in different ways. One method is to use well-known physical relations, such as Newton's and Kirchhoff's laws. We will call this physical modeling. Another way is to estimate a model using measurements from the system. This is called system identification. For the electrical motor, we could for example measure the applied voltage and the resulting angle on the axis of the motor and estimate a model from that. A third case, which is a combination of the two previous modeling methods, is when we have constructed a model using physical relations but do not know the values of certain parameters in the model. These parameters could then be estimated using measurements from the system even if we cannot measure them directly. We will refer to this as gray box identification.

Traditionally, physical modeling has been performed by manually writing down the equations that describe the system. If gray box identification


is then necessary, the equations must be transformed manually into a suitable form. The manual modeling has today partly been replaced by tools that automate the physical modeling process. These include both tools for modeling systems within a certain domain, such as electrical systems, and general modeling tools that allow modeling of systems that contain components from different domains. An example of a modeling language for multi-domain modeling is Modelica (Fritzson, 2004; Tiller, 2001). These tools can greatly simplify the modeling task, but if gray box identification is necessary, it is usually still necessary to transform equations manually into a format that is suitable for system identification. In this thesis we will examine how gray box identification in models generated by a modeling tool can be automated.

1.1 Problem Formulation

In the most general setting, we want to estimate parameters in a collection of equations that has been created by a modeling tool. These equations relate a vector of variables ξ(t) that vary with time, and their derivatives with respect to time ξ̇(t), with inputs to the system u(t). Here t points out the dependence on time. In the equations, there are also some unknown parameters θ that are to be estimated. An output y(t) from the system is measured. The relationships can be described by the equations

F(\dot{\xi}(t), \xi(t), u(t), t, \theta) = 0    (1.1a)
y(t) = h(\xi(t), \theta).    (1.1b)

This is called a differential-algebraic equation, or DAE. Estimation of parameters in this general structure is discussed in the book by Schittkowski (2002). Here the unknown parameters are estimated in a noise-free environment. In principle this means that the parameters are estimated by minimizing the difference between measured and simulated y(t) in a quadratic norm. Most of the discussion in Schittkowski (2002) is devoted to how the resulting optimization problem can be solved. We will in this thesis restrict the discussion to the case when the equations form a linear time-invariant DAE, or a linear descriptor system, where more can be done for example in terms of noise modeling. In this case, the equations can be written as

E(\theta)\dot{\xi}(t) = J(\theta)\xi(t) + K(\theta)u(t)    (1.2a)
y(t) = L(\theta)\xi(t)    (1.2b)


where E(θ), J(θ), K(θ), and L(θ) are matrices that contain unknown parameters θ that are to be estimated. The discrete-time counterpart of (1.2) will also be discussed.

Below we provide an example of a descriptor system. For this example it would be possible to transform the system into a form suitable for system identification manually, but it would be much more convenient if the identification software could handle the descriptor system directly.

Example 1.1 Consider a cart which is driven forward by an electrical motor connected to one pair of the wheels. The parameters of a model of the system are the mass m, the radius of the wheels r, the torque constant of the motor k, and the resistance R and inductance L of the motor coil. The variables describing the system are the velocity of the cart v(t), the acceleration of the cart a(t), the force between the wheels and the ground F(t), the torque from the motor M(t), the angular velocity of the motor axis ω(t), some voltages in the motor, uL(t), uR(t), and ug(t), and the current I(t). The input to the system is the voltage u(t). If this system is modeled with a modeling tool, we get a collection of equations describing the system, e.g.:

F(t) = m a(t) \qquad \frac{dv(t)}{dt} = a(t)    (1.3a)
F(t) = r M(t) \qquad r\omega(t) = v(t)    (1.3b)
M(t) = k I(t) \qquad u_g(t) = k\omega(t)    (1.3c)
u_R(t) = R I(t) \qquad u(t) = u_L(t) + u_g(t) + u_R(t)    (1.3d)
u_L(t) = L \frac{dI(t)}{dt}    (1.3e)

These equations could automatically be rewritten in the form (1.2). However, it would be tedious work to transform the equations into a form suitable for system identification if we wanted to estimate one or more of the parameters. How this process can be automated is one of the problems discussed in this thesis.
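As an illustration of what "rewritten in the form (1.2)" means in practice, the sketch below stacks the equations (1.3) row by row into matrices E(θ), J(θ), K and L. It is only a sketch in Python/NumPy: the ordering of the internal variables, the sign conventions and the choice of measuring the velocity are our own assumptions, not taken from the thesis.

```python
import numpy as np

def cart_model(theta):
    """Return E(theta), J(theta), K and L for Example 1.1, rows = equations (1.3)."""
    m, r, k, R, L = theta
    n = 9
    E = np.zeros((n, n)); J = np.zeros((n, n)); K = np.zeros((n, 1))
    # column ordering of xi:  v  a  F  M  omega  uL  uR  ug  I
    E[0, 0] = 1.0;  J[0, 1] = 1.0                       # dv/dt = a
    J[1, 2] = 1.0;  J[1, 1] = -m                        # F = m a
    J[2, 2] = 1.0;  J[2, 3] = -r                        # F = r M
    J[3, 4] = r;    J[3, 0] = -1.0                      # r omega = v
    J[4, 3] = 1.0;  J[4, 8] = -k                        # M = k I
    J[5, 7] = 1.0;  J[5, 4] = -k                        # ug = k omega
    J[6, 6] = 1.0;  J[6, 8] = -R                        # uR = R I
    J[7, 5] = J[7, 6] = J[7, 7] = 1.0; K[7, 0] = -1.0   # uL + ug + uR = u
    E[8, 8] = L;    J[8, 5] = 1.0                       # L dI/dt = uL
    Lout = np.zeros((1, n)); Lout[0, 0] = 1.0           # measure the velocity, say
    return E, J, K, Lout
```

With such a representation, a numerical parameter value θ immediately gives the matrices to which the transformations of Chapter 2 can be applied.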

1.2 Outline

The purpose of this thesis is to describe how unknown parameters in linear descriptor systems can be estimated from measurement data. A background on descriptor systems is provided in Chapter 2, where different transformations are examined. The chapter includes a transformation to state-space


form and sampling of continuous-time descriptor systems. In Chapter 3 it is discussed how the transformations in Chapter 2 can be performed by numerical software from the linear algebra package LAPACK. Noise modeling is an important part of system identification, so in Chapter 4 it is examined how a noise model can be added to a linear descriptor system. In Chapter 5, the results from the previous chapters are used to show how the system identification problem for linear descriptor systems can be solved. The parameter estimation is usually performed as a minimization of a function which may have several local minima, so in Chapter 6 we discuss how initial values for the parameter search can be selected. Some concluding remarks are provided in Chapter 7.

The discussion in the thesis is concentrated on continuous-time descriptor systems, but at the end of each chapter the results are extended to the discrete-time case.

1.3 Contributions

The main contributions of the thesis are:

• The idea to redefine the input of a linear descriptor system to allow a state-space description (Chapter 2).

• The sampling method for linear continuous-time descriptor systems in Chapter 2.

• The discussion on how the canonical forms in Chapter 2 can be computed with the linear algebra package LAPACK (Chapter 3).

• Theorems 4.1 and 4.3, which describe how white noise can be added to linear descriptor systems (Chapter 4).

• The discussion on how a Kalman filter can be implemented for linear continuous-time descriptor systems in Chapter 4.

• The conclusion in Chapter 5 that unknown parameters in linear descriptor systems can be estimated by calculating a state-space description for each parameter value for which it is necessary.

• The result that the parameter initialization problem under certain conditions can be transformed to the minimization of a biquadratic polynomial (Chapter 6).


The main results in Chapter 4 have been developed in cooperation with Thomas Schön and Professor Fredrik Gustafsson and have previously been published in

Schön, T., Gerdin, M., Glad, T., and Gustafsson, F. (2003). A modeling and filtering framework for linear differential-algebraic equations. In Proceedings of the 42nd IEEE Conference on Decision and Control, pages 892–897, Maui, Hawaii, USA.

The main results in Chapters 2 and 5 have been published in

Gerdin, M., Glad, T., and Ljung, L. (2003). Parameter estimation in linear differential-algebraic equations. In Proceedings of the 13th IFAC Symposium on System Identification, pages 1530–1535, Rotterdam, the Netherlands.


2 Linear Descriptor Systems

In this chapter we will discuss some concepts concerning linear descriptor systems that will be needed to motivate or develop the theory discussed in the later chapters. Linear descriptor systems are also known as singular systems, implicit systems, and in the continuous-time case, linear differential-algebraic equations (DAE).

2.1 Introduction

A linear descriptor system in continuous time is a system of equations in the form

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.1a)
y(t) = L\xi(t).    (2.1b)

In this description E and J are constant square matrices and K and L are constant matrices. Note that E may be a singular matrix. This makes it possible to include a purely algebraic equation in the description by letting a row of E be equal to zero. The vectors u(t) and y(t) are the input and the measured output respectively. Finally, the vector ξ(t) contains internal

variables that describe the current state of the system. The discrete-time


counterpart of (2.1) is

E\xi(t + 1) = J\xi(t) + Ku(t)    (2.2a)
y(t) = L\xi(t).    (2.2b)

Two good references on descriptor systems are the book by Dai (1989b) and the survey by Lewis (1986). They discuss both general properties of descriptor systems such as regularity and canonical forms, as well as controllability, observability, and different control and estimation strategies. They are both focused on the continuous-time case, but also treat discrete-time descriptor systems. Many references to earlier work are provided by both authors. Within the numerical analysis literature, Brenan et al. (1996) is worth mentioning. The main topic is the numerical solution of nonlinear differential-algebraic equations, but linear continuous-time descriptor systems are also treated. One can also note that descriptor systems are special cases of the general linear constant differential equations discussed by Rosenbrock (1970). Rosenbrock's analysis is mainly carried out in the frequency domain. Descriptor systems are also special cases of the general

differential systems discussed by Kailath (1980, Chapter 8). The viewpoint on descriptor systems as just a collection of equations describing how the internal variables of the system are related, is related to the behavioral systems

discussed by Polderman and Willems (1998).

The main topic of this chapter is to describe how the descriptor system (2.1) can be transformed into different canonical forms (Section 2.3), and how it can be further transformed into a state-space system with a redefined input (Section 2.4). It is also discussed how a descriptor system can be sampled by first transforming it to state-space form in Section 2.5. In Section 2.6 the results for the continuous-time case are extended to the discrete-time case.

The canonical forms in Section 2.3 have been discussed previously by for example Dai (1989b), but are derived here for completeness. Furthermore, the proof provided here is constructed so that the different steps can be readily implemented with numerical software, which is discussed in Chapter 3. The idea to transform a continuous-time descriptor system into state-space form by redefining the input as discussed in Section 2.4 has, as far as we know, not been published by other authors. However, for the discrete-time case, some related ideas are provided by Dai (1987). Consequently, the method for sampling descriptor systems given in Section 2.5 is also new. Dai (1989b, page 265) also points out that sampling of descriptor systems needs further work.


Before proceeding into the details of these canonical forms it may be worthwhile to note that (2.1) has the transfer function

G(s) = L(sE − J)^{-1}K.    (2.3)

The only difference between G(s) in (2.3) and the transfer function of a state-space system is that G(s) in (2.3) may be non-proper (have higher degree in the numerator than in the denominator) in the general case. This can be realized from the following example:

\Big( s \underbrace{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}}_{E} - \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{J} \Big)^{-1} = -\begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}    (2.4)

It can be noted that the transfer function in (2.3) is only well defined if (sE − J) is non-singular. In Section 2.2 we will define the non-singularity of this matrix as regularity of the system (2.1). We will also see that regularity of a descriptor system is equivalent to the existence of a unique solution.
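The non-properness can also be seen numerically by evaluating (2.3) on a frequency grid. The short sketch below reuses E and J from (2.4) together with K and L chosen by us (they are not part of the example above); the magnitude of the response grows linearly with frequency, which a proper transfer function cannot do.

```python
import numpy as np

E = np.array([[0.0, 1.0], [0.0, 0.0]])
J = np.eye(2)
K = np.array([[0.0], [1.0]])
L = np.array([[1.0, 0.0]])

def G(s):
    # G(s) = L (sE - J)^{-1} K, cf. (2.3); here G(s) = -s
    return (L @ np.linalg.solve(s * E - J, K))[0, 0]

for w in [1.0, 10.0, 100.0]:
    print(w, abs(G(1j * w)))    # grows like w: 1, 10, 100 (up to rounding)
```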

2.2 Regularity

A basic assumption which is made throughout this thesis is that the inverse in (2.3) is well defined, and therefore we formalize this with a definition.

Definition 2.1 (Regularity)
The linear continuous-time descriptor system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.5a)
y(t) = L\xi(t)    (2.5b)

is called regular if

\det(sE − J) \not\equiv 0,    (2.6)

that is the determinant is not zero for all s.

This definition is the same as the one used by Dai (1989b). The reason that regularity of a linear descriptor system is a reasonable assumption, is that it is equivalent to the existence of a unique solution, as discussed by Dai (1989b, Chapter 1). To illustrate this, we examine the Laplace transformed version of (2.5a):

sEL [ξ(t)] − Eξ(0) = JL [ξ(t)] + KL [u(t)] (2.7)


where L[·] means the Laplace transform of the argument. Rearranging this, we get

(sE − J)L [ξ(t)] = KL [u(t)] + Eξ(0) (2.8)

If the system is regular, we get that L [ξ(t)] is uniquely determined by

L[ξ(t)] = (sE − J)^{-1} (KL[u(t)] + Eξ(0)).    (2.9)

If, on the other hand, the system is not regular, there exists a vector α(s) ≢ 0 such that

(sE − J)α(s) ≡ 0. (2.10)

We get that if the system is not regular and a solution of (2.8) is L[ξ(t)], then so is L[ξ(t)] + kα(s) for any constant k. A solution is consequently not unique. It is also obvious that a solution may not even exist if the system is not regular, for example if (sE − J) ≡ 0.

To draw conclusions about ξ(t) from the existence of L[ξ(t)], we should examine if the inverse Laplace transform exists. We do not go into these technicalities here. However, in the next section we will see how a regular descriptor system can be transformed into a form where the existence of a solution is obvious.

It is usually a reasonable assumption that a system has an input which uniquely determines the value of the internal variables for each initial condition. With this motivation, it will be assumed throughout this thesis that the systems encountered are regular.

We conclude this section with a small example to illustrate the connection between solvability and regularity.

Example 2.1 (Regularity) Consider the body with mass m in Figure 2.1. The body has position ξ1(t) and velocity ξ2(t) and is affected by a force F(t). The equations describing the system are

\dot{\xi}_1(t) = \xi_2(t)    (2.11a)
m\dot{\xi}_2(t) = F(t)    (2.11b)

which also can be written as

\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & m \end{bmatrix}}_{E} \begin{bmatrix} \dot{\xi}_1(t) \\ \dot{\xi}_2(t) \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}}_{J} \begin{bmatrix} \xi_1(t) \\ \xi_2(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}_{K} F(t)    (2.12)


Figure 2.1 A body with mass m, position ξ1(t) and velocity ξ2(t), affected by a force F(t).

which is a linear descriptor system (without output equation). We get that

\det(sE − J) = ms^2    (2.13)

and the system is regular if and only if m ≠ 0. According to the discussion earlier this gives that there exists a unique solution if and only if m ≠ 0. This is also obvious from the original equations (2.11). In this example we also see that regularity is a reasonable requirement on the system.
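Numerically, regularity can be checked with a simple (if ad hoc) test: det(sE − J) is a polynomial of degree at most n in s, so if it vanishes, up to a tolerance, at n + 1 distinct points it is identically zero. The sketch below applies this idea to Example 2.1; the tolerance and the random test points are our own choices, so this is an illustration rather than a robust test.

```python
import numpy as np

def is_regular(E, J, tol=1e-9, seed=0):
    n = E.shape[0]
    s_values = np.random.default_rng(seed).uniform(-10, 10, size=n + 1)
    # Regular if det(sE - J) is nonzero at at least one test point.
    return any(abs(np.linalg.det(s * E - J)) > tol for s in s_values)

m = 2.0
E = np.array([[1.0, 0.0], [0.0, m]])
J = np.array([[0.0, 1.0], [0.0, 0.0]])
print(is_regular(E, J))    # True: det(sE - J) = m*s^2 is not identically zero
```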

2.3 A Canonical Form

In this section we examine how a linear continuous-time descriptor system can be rewritten in a form which resembles a state-space system and explicitly shows how the solution of the descriptor system can be obtained. This transformation will later play an important role in the development of the identification algorithms. Similar transformations have been considered earlier in the literature (see e.g., Dai, 1989b), but the proofs which are presented in this section have been constructed so that the indicated calculations can be computed by numerical software in a reliable manner. How the different steps of the proofs can be computed numerically is studied in detail in Chapter 3. It can be noted that the system must be regular for the transformation to exist, but as discussed in Section 2.2 regularity is equivalent to solvability of the system.

The main result is presented in Theorem 2.1, but to derive this result we use a series of lemmas as described below. The first lemma describes how the system matrices E and J simultaneously can be written in triangular form with the zero diagonal elements of E sorted to the lower right block.


Lemma 2.1
Consider a system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.14a)
y(t) = L\xi(t).    (2.14b)

If (2.14) is regular, then there exist non-singular matrices P1 and Q1 such that

P_1 E Q_1 = \begin{bmatrix} E_1 & E_2 \\ 0 & E_3 \end{bmatrix} \quad \text{and} \quad P_1 J Q_1 = \begin{bmatrix} J_1 & J_2 \\ 0 & J_3 \end{bmatrix}    (2.15)

where E1 is non-singular, E3 is upper triangular with all diagonal elements zero and J3 is non-singular and upper triangular.

Note that either the first or the second block row in (2.15) may be of size zero.

Proof. The Kronecker canonical form of a regular matrix pencil discussed in, e.g., Kailath (1980, Chapter 6) directly shows that it is possible to perform the transformation (2.15). □

In the case when the matrix pencil is regular, the Kronecker canonical form is also called the Weierstrass canonical form. The Kronecker and Weierstrass canonical forms are also discussed by Gantmacher (1960, Chapter 12). The original works by Weierstrass and Kronecker are (Weierstrass, 1867) and (Kronecker, 1890).

Note that the full Kronecker form is not computed by the numerical software discussed in Section 3.2. The Kronecker form is here just a convenient way of showing that the transformation (2.15) is possible.

The next two lemmas describe how the internal variables of the system can be separated into two parts by making the system matrices block diagonal.

Lemma 2.2
Consider (2.15). There exist matrices L and R such that

\begin{bmatrix} I & L \\ 0 & I \end{bmatrix} \begin{bmatrix} E_1 & E_2 \\ 0 & E_3 \end{bmatrix} \begin{bmatrix} I & R \\ 0 & I \end{bmatrix} = \begin{bmatrix} E_1 & 0 \\ 0 & E_3 \end{bmatrix}    (2.16)

and

\begin{bmatrix} I & L \\ 0 & I \end{bmatrix} \begin{bmatrix} J_1 & J_2 \\ 0 & J_3 \end{bmatrix} \begin{bmatrix} I & R \\ 0 & I \end{bmatrix} = \begin{bmatrix} J_1 & 0 \\ 0 & J_3 \end{bmatrix}.    (2.17)


See Kågström (1994) and references therein for a proof of this lemma.

Lemma 2.3
Consider a system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.18a)
y(t) = L\xi(t).    (2.18b)

If (2.18) is regular, there exist non-singular matrices P and Q such that the transformation

PEQ\,Q^{-1}\dot{\xi}(t) = PJQ\,Q^{-1}\xi(t) + PKu(t)    (2.19)

gives the system

\begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix} Q^{-1}\dot{\xi}(t) = \begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix} Q^{-1}\xi(t) + \begin{bmatrix} B \\ D \end{bmatrix} u(t)    (2.20)

where N is a nilpotent matrix.

Proof. Let P1 and Q1 be the matrices in Lemma 2.1 and define

P_2 = \begin{bmatrix} I & L \\ 0 & I \end{bmatrix}    (2.21a)
Q_2 = \begin{bmatrix} I & R \\ 0 & I \end{bmatrix}    (2.21b)
P_3 = \begin{bmatrix} E_1^{-1} & 0 \\ 0 & J_3^{-1} \end{bmatrix}    (2.21c)

where L and R are from Lemma 2.2. Also let

P = P_3 P_2 P_1    (2.22a)
Q = Q_1 Q_2.    (2.22b)

Then

PEQ = \begin{bmatrix} I & 0 \\ 0 & J_3^{-1} E_3 \end{bmatrix}    (2.23)

and

PJQ = \begin{bmatrix} E_1^{-1} J_1 & 0 \\ 0 & I \end{bmatrix}    (2.24)

Here N = J3^{-1}E3 is nilpotent since E3 is upper triangular with zero diagonal elements and J3^{-1} is upper triangular. J3^{-1} is upper triangular since J3 is. Defining A = E1^{-1}J1 finally gives us the desired form (2.20). □


We are now ready to present the main result in this section, which shows how a solution of the descriptor equations can be obtained. We get this result by observing that the first block row of (2.20) is a normal state-space description and showing that the solution of the second block row is a sum of the input and some of its derivatives.

Theorem 2.1
Consider a system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.25a)
y(t) = L\xi(t).    (2.25b)

If (2.25) is regular, its solution can be described by

\dot{w}_1(t) = A w_1(t) + B u(t)    (2.26a)
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u^{(i)}(t)    (2.26b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.26c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.26d)

Proof. According to Lemma 2.3 we can without loss of generality assume that the system is in the form

\begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix} \begin{bmatrix} \dot{w}_1(t) \\ \dot{w}_2(t) \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} + \begin{bmatrix} B \\ D \end{bmatrix} u(t)    (2.27a)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.27b)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.27c)

where

w(t) = \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}    (2.28)

is partitioned according to the matrices.

Now, if N = 0 we have that

w2(t) = −Du(t) (2.29)


and we are done. If N ≠ 0 we can multiply the second row of (2.27a) with N to get

N^2 \dot{w}_2(t) = N w_2(t) + N D u(t).    (2.30)

We now differentiate (2.30) and insert the second row of (2.27a). This gives

w_2(t) = -D u(t) - N D \dot{u}(t) + N^2 \ddot{w}_2(t)    (2.31)

If N^2 = 0 we are done, otherwise we just continue until N^m = 0 (this is true for some m since N is nilpotent). We would then arrive at an expression like

w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u^{(i)}(t)    (2.32)

and the proof is complete. □

Note that the internal variables of the system, and therefore also the output, may depend directly on derivatives of the input. However, it can be noted that the internal variables of physical systems seldom depend directly on derivatives of the input since this would for example lead to the internal variables taking infinite values for a step input. In the common case of no dependence on the derivative of the input, we will have

ND = 0. (2.33)

This relation will also play an important role in Chapter 4 when it is examined how a noise model can be added to the system without having to accept derivatives of the noise in the solution.

We conclude the section with an example which shows what the form (2.26) is for a simple electrical system.

Example 2.2 (Canonical form) Consider the electrical circuit in Figure 2.2. With I1(t) as the output and u(t) as the input, the equations describing the system are

\begin{bmatrix} 0 & 0 & L \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \dot{I}_1(t) \\ \dot{I}_2(t) \\ \dot{I}_3(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & -1 & -1 \\ 0 & -R & 0 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} u(t)    (2.34a)
y(t) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix}    (2.34b)


Figure 2.2 A small electrical circuit with input voltage u(t), currents I1(t), I2(t) and I3(t), a resistor R and an inductor L.

Transforming the system into the form (2.20) gives

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} \dot{I}_1(t) \\ \dot{I}_2(t) \\ \dot{I}_3(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix} + \begin{bmatrix} 1/L \\ -1/R \\ -1/R \end{bmatrix} u(t)    (2.35a)
y(t) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix}.    (2.35b)

Further transformation into the form (2.26) gives

\dot{w}_1(t) = \frac{1}{L} u(t)    (2.36a)
w_2(t) = -\begin{bmatrix} -1/R \\ -1/R \end{bmatrix} u(t)    (2.36b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix}    (2.36c)
y(t) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.36d)

We can here see how the state-space part has been singled out by the transformation. In (2.36c) we can see that the state-space variable w1(t) is equal


to I3(t). This is natural, since the only dynamic element in the circuit is the inductor. The two variables in w2(t) are I2(t) and I1(t) − I3(t). These variables depend directly on the input.

2.3.1 Alternative Canonical Forms

The transformations presented above are the ones that will be used in this thesis, mainly because they clearly show the structure of the system and because they can be computed with numerical software as will be discussed in Chapter 3. Several other transformations have been suggested in the literature, so we will review some alternative transformations here. All methods discussed assume that the descriptor system is regular.

Shuffle Algorithm

The shuffle algorithm, which was suggested by Luenberger (1978), was as the name suggests presented as an algorithm to reach a certain canonical form. The non-reduced form of the shuffle algorithm applied to the descriptor system (2.1) gives the canonical form

\dot{\xi}(t) = E^{-1} \Big( J\xi(t) + \sum_{i=0}^{m} K_i u^{(i)}(t) \Big).    (2.37)

We show below how to calculate the matrices E, J, and K_i. The shuffle algorithm has the advantage that no coordinate transformation is necessary. However, in (2.37) it looks as if the initial condition ξ(0) can be chosen arbitrarily, which is not the case. It is instead partly determined by u(0) and its derivatives. There is also a reduced form of the shuffle algorithm which explicitly shows how the initial conditions can be chosen.

The form (2.37) is computed by first transforming the matrix

\begin{bmatrix} E & J & K \end{bmatrix}    (2.38)

by row operations (for example Gauss elimination) into the form

\begin{bmatrix} E_1 & J_1 & K_1 \\ 0 & J_2 & K_2 \end{bmatrix}    (2.39)

where E_1 is non-singular. We now have the system

\begin{bmatrix} E_1 \\ 0 \end{bmatrix} \dot{\xi}(t) = \begin{bmatrix} J_1 \\ J_2 \end{bmatrix} \xi(t) + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} u(t).    (2.40)


By differentiating the second row (this is the "shuffle" step) we get

\underbrace{\begin{bmatrix} E_1 \\ -J_2 \end{bmatrix}}_{E} \dot{\xi}(t) = \underbrace{\begin{bmatrix} J_1 \\ 0 \end{bmatrix}}_{J} \xi(t) + \underbrace{\begin{bmatrix} K_1 \\ 0 \end{bmatrix}}_{K_0} u(t) + \underbrace{\begin{bmatrix} 0 \\ K_2 \end{bmatrix}}_{K_1} \dot{u}(t).    (2.41)

Note that we through this differentiation lose information about the connection between the initial conditions ξ(0) and u(0). If E is non-singular, we just multiply with E^{-1} from the left to get (2.37). If it is singular, the process is continued until we get a non-singular E.
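For concreteness, one shuffle step can be sketched in code. The version below uses an orthogonal row transformation obtained from an SVD of E instead of Gauss elimination (an implementation choice of ours, not the algorithm as stated by Luenberger), and returns the matrices of (2.41); repeating the step until the leading matrix is non-singular gives (2.37).

```python
import numpy as np

def shuffle_step(E, J, K, tol=1e-12):
    U, s, _ = np.linalg.svd(E)
    r = int(np.sum(s > tol))          # numerical row rank of E (tolerance is ad hoc)
    P = U.T                           # P @ E has (numerically) zero last n - r rows
    E1, J1, K1 = (P @ E)[:r], (P @ J)[:r], (P @ K)[:r]
    J2, K2 = (P @ J)[r:], (P @ K)[r:]
    E_new = np.vstack([E1, -J2])      # cf. (2.41)
    J_new = np.vstack([J1, np.zeros_like(J2)])
    K0 = np.vstack([K1, np.zeros_like(K2)])        # multiplies u(t)
    K1_new = np.vstack([np.zeros_like(K1), K2])    # multiplies du(t)/dt
    return E_new, J_new, K0, K1_new
```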

SVD Coordinate System

The SVD coordinate system of the descriptor system (2.1) is calculated by taking the singular value decomposition (SVD) of E,

UEV = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}    (2.42)

where Σ contains the non-zero singular values of E and U and V are orthogonal matrices. The transformation

UEV\,V^{-1}\dot{\xi}(t) = UJV\,V^{-1}\xi(t) + UKu(t)    (2.43)

then gives the system

\begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} V^{-1}\dot{\xi}(t) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} V^{-1}\xi(t) + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} u(t).    (2.44)

It can be noted that the block rows here do not need to have the same size as the block rows in the canonical form (2.20). The SVD coordinate system was discussed by Bender and Laub (1987) who use it to examine general system properties and to derive a linear-quadratic regulator for continuous-time descriptor systems. This transformation cannot immediately be used to get a state-space-like description, but it is used as a first step in other transformations, e.g., Kunkel and Mehrmann (1994).
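The SVD coordinate system is straightforward to compute; a minimal sketch using NumPy is given below (the rank tolerance is our own choice). Note that NumPy's U corresponds to the transpose of the U in (2.42).

```python
import numpy as np

def svd_coordinates(E, J, K, tol=1e-12):
    U, s, Vt = np.linalg.svd(E)
    P, Q = U.T, Vt.T                  # P E Q = diag(Sigma, 0), cf. (2.42)
    r = int(np.sum(s > tol))          # size of the Sigma block
    return P @ E @ Q, P @ J @ Q, P @ K, r
```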

Triangular Form

We get the triangular form if we stay with the description in Lemma 2.1. The transformed system is then

\begin{bmatrix} E_1 & E_2 \\ 0 & E_3 \end{bmatrix} Q_1^{-1}\dot{\xi}(t) = \begin{bmatrix} J_1 & J_2 \\ 0 & J_3 \end{bmatrix} Q_1^{-1}\xi(t) + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} u(t)    (2.45)


where E1 is non-singular, E3 is upper triangular with all diagonal elements zero and J3 is non-singular and upper triangular. Using this form we could derive an expression similar to (2.26). A drawback is that here both w1(t) and w2(t) would depend on derivatives of u(t), which can be verified by making calculations similar to those in the proof of Theorem 2.1. A good thing about this form is that the matrices L and R of Lemma 2.2 do not have to be computed.

2.4 State-Space Form

Within the control community, the theory for state-space systems is much more developed than the theory for descriptor systems. For state-space systems there are many methods available for control design, state estimation and system identification, see e.g., Glad and Ljung (2000), Kailath et al. (2000), and Ljung (1999). For state-space systems it is also well established how the systems can be sampled, that is how an exact discrete-time counterpart of the systems can be calculated under certain assumptions on the input, see e.g., Åström and Wittenmark (1984). To be able to use these results for descriptor systems, we in this section examine how a continuous-time descriptor system can be transformed to a state-space system. We will see that a descriptor system always can be transformed to a state-space system if we are allowed to redefine the input as one of its derivatives.

What we will do is to transform a continuous-time descriptor system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.46a)
y(t) = L\xi(t)    (2.46b)

into state-space form,

\dot{x}(t) = Ax(t) + Bu(t)    (2.47a)
y(t) = Cx(t) + Du(t).    (2.47b)

Here we have written u(t) in the state-space form to point out the fact that the input might have to be redefined as one of its derivatives. We will assume that the descriptor system is regular. This implies, according to


Theorem 2.1, that the system can be transformed into the form

\dot{w}_1(t) = A w_1(t) + B u(t)    (2.48a)
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u^{(i)}(t)    (2.48b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.48c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.48d)

If m = 1 no derivatives of u(t) occur in the description and we directly get that (2.48) is equivalent to the state-space description

\dot{w}_1(t) = \underbrace{A}_{A} w_1(t) + \underbrace{B}_{B} u(t)    (2.49a)
y(t) = \underbrace{LQ \begin{bmatrix} I \\ 0 \end{bmatrix}}_{C} w_1(t) + \underbrace{LQ \begin{bmatrix} 0 \\ -D \end{bmatrix}}_{D} u(t).    (2.49b)

If m > 1, the idea is to redefine the input as its (m − 1)th derivative, so the original input and some of its derivatives need to be included as state variables in the new description. We therefore define a vector with the input and some of its derivatives,

w_3(t) = \begin{bmatrix} u(t) \\ \dot{u}(t) \\ \vdots \\ u^{(m-2)}(t) \end{bmatrix}.    (2.50)

This vector will be part of the state vector in the transformed system. To be able to include w3(t) in the state vector, we need to calculate its derivative with respect to time:

\dot{w}_3(t) = \begin{bmatrix} \dot{u}(t) \\ \ddot{u}(t) \\ \vdots \\ u^{(m-1)}(t) \end{bmatrix} = \begin{bmatrix} 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ 0 & 0 & \cdots & 0 \end{bmatrix} w_3(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} u^{(m-1)}(t)    (2.51)


We can now rewrite (2.48) to depend on w3(t) instead of depending directly on the different derivatives of u(t). The new description will be

\dot{w}_1(t) = A w_1(t) + \begin{bmatrix} B & 0 & \cdots & 0 \end{bmatrix} w_3(t)    (2.52a)
w_2(t) = -\begin{bmatrix} D & ND & \cdots & N^{m-2}D \end{bmatrix} w_3(t) - N^{m-1}D\,u^{(m-1)}(t)    (2.52b)
\dot{w}_3(t) = \begin{bmatrix} 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ 0 & 0 & \cdots & 0 \end{bmatrix} w_3(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} u^{(m-1)}(t)    (2.52c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}    (2.52d)

The final step to obtain a state-space description is to eliminate w2(t) from these equations. The elimination is performed by inserting (2.52b) into (2.52d):

\begin{bmatrix} \dot{w}_1(t) \\ \dot{w}_3(t) \end{bmatrix} = \underbrace{\begin{bmatrix} A & B & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}}_{A} \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ I \end{bmatrix}}_{B} u^{(m-1)}(t)    (2.53a)
y(t) = \underbrace{LQ \begin{bmatrix} I & 0 & 0 & \cdots & 0 \\ 0 & -D & -ND & \cdots & -N^{m-2}D \end{bmatrix}}_{C} \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix} + \underbrace{LQ \begin{bmatrix} 0 \\ -N^{m-1}D \end{bmatrix}}_{D} u^{(m-1)}(t)    (2.53b)

If we let

x(t) = \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix}    (2.54)

this can be written in the compact form

\dot{x}(t) = A x(t) + B u^{(m-1)}(t)    (2.55a)
y(t) = C x(t) + D u^{(m-1)}(t).    (2.55b)


The main purpose of this thesis is to examine how unknown parameters in linear descriptor systems can be estimated, and this is what the state-space system will be used for in the following chapters. However, as pointed out in the beginning, it may be useful to do this conversion in other cases as well, for example when designing controllers. The controller would then generate the control signal u^{(m-1)}(t). In order to obtain the actual control signal u(t) we have to integrate u^{(m-1)}(t).

We conclude the section by continuing Example 2.2 and writing the system in state-space form.

Example 2.3 In Example 2.2 we saw that the equations for the electrical circuit could be written as

\dot{w}_1(t) = \frac{1}{L} u(t)    (2.56a)
w_2(t) = -\begin{bmatrix} -1/R \\ -1/R \end{bmatrix} u(t)    (2.56b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} I_1(t) \\ I_2(t) \\ I_3(t) \end{bmatrix}    (2.56c)
y(t) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.56d)

Since m = 1 (no derivatives of u(t) occur in the description), w3(t) is not necessary and (2.49) can be used. This gives us the state-space description

\dot{x}(t) = \frac{1}{L} u(t)    (2.57a)
y(t) = x(t) + \frac{1}{R} u(t).    (2.57b)

For this simple case, the state-space description could have been derived manually from the original equations, but the procedure in the example shows how we can compute the state-space description automatically. For larger systems it may be more difficult to derive the state-space description manually.
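For the common case m = 1 (and ND = 0), the assembly of (2.49) is just a few matrix products. The sketch below assumes that A, B, D, Q and L from the canonical form (2.48) are already available, for instance from computations of the kind discussed in Chapter 3; applied to Example 2.3 it reproduces the matrices in (2.57).

```python
import numpy as np

def canonical_to_state_space_m1(A, B, D, Q, L):
    """State-space matrices of (2.49), assuming m = 1 so no input derivatives occur."""
    n1 = A.shape[0]                   # dimension of w1 (the state)
    n2 = Q.shape[0] - n1              # dimension of w2
    LQ = L @ Q
    C_bar = LQ @ np.vstack([np.eye(n1), np.zeros((n2, n1))])       # LQ [I; 0]
    D_bar = LQ @ np.vstack([np.zeros((n1, D.shape[1])), -D])       # LQ [0; -D]
    return A, B, C_bar, D_bar
```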

2.5 Sampling

As discussed earlier, the theory for state-space systems is much more developed than the theory for descriptor systems. In the previous section,


we showed how a continuous-time descriptor system can be transformed into a continuous-time state-space system, which gives us the possibility to use theory for continuous-time state-space systems. However, in many cases measured data from a system is available as sampled data. This could be the case both for control, for estimation and for system identification. To handle such cases for continuous-time state-space systems, one common approach is to sample the state-space system, that is to calculate a discrete-time counterpart of the state-space system. In this section we examine how a continuous-time descriptor system can be sampled.

The basic result for sampling of state-space systems with piecewise constant input is given in Lemma 2.4 below. The main result of this section is the extension of this lemma to descriptor systems.

Lemma 2.4
Consider the state-space system

\dot{x}(t) = Ax(t) + Bu(t)    (2.58a)
y(t) = Cx(t) + Du(t).    (2.58b)

If u(t) is constant for T_s k ≤ t < T_s k + T_s for constant T_s and k = 0, 1, 2, ..., then x(T_s k) and y(T_s k) are exactly described by the discrete-time state-space system

x(T_s k + T_s) = \Phi x(T_s k) + \Gamma u(T_s k)    (2.59a)
y(T_s k) = C x(T_s k) + D u(T_s k),    (2.59b)

where

\Phi = e^{A T_s}    (2.60)
\Gamma = \int_0^{T_s} e^{A\tau}\, d\tau\, B.    (2.61)

This is a well known result, and the derivation can be found in, for example, Åström and Wittenmark (1984).
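Numerically, Φ and Γ in (2.60)–(2.61) are commonly obtained from a single matrix exponential of a block matrix, since the exponential of [[A, B], [0, 0]] scaled by T_s contains Φ and Γ in its first block row. The sketch below uses SciPy; it is a standard construction, not code from the thesis.

```python
import numpy as np
from scipy.linalg import expm

def sample(A, B, Ts):
    n, p = A.shape[0], B.shape[1]
    M = np.zeros((n + p, n + p))
    M[:n, :n] = A
    M[:n, n:] = B
    Md = expm(M * Ts)                 # = [[Phi, Gamma], [0, I]]
    Phi, Gamma = Md[:n, :n], Md[:n, n:]
    return Phi, Gamma
```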

Now, if we assume that u^{(m-1)}(t) is piecewise constant, Lemma 2.4 can be applied to (2.49) or (2.55) to give an exact discrete-time description of the original continuous-time descriptor system. We have thus arrived at the following theorem:

Theorem 2.2
Consider the regular continuous-time descriptor system

E\dot{\xi}(t) = J\xi(t) + Ku(t)    (2.62a)
y(t) = L\xi(t)    (2.62b)


with the canonical form

\dot{w}_1(t) = A w_1(t) + B u(t)    (2.63a)
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u^{(i)}(t)    (2.63b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.63c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.63d)

If u^{(m-1)}(t) is constant for T_s k ≤ t < T_s k + T_s for constant T_s and k = 0, 1, 2, ..., then y(T_s k) is exactly described by the discrete-time state-space system

x(T_s k + T_s) = \Phi x(T_s k) + \Gamma u^{(m-1)}(T_s k)    (2.64a)
y(T_s k) = C x(T_s k) + D u^{(m-1)}(T_s k).    (2.64b)

where

\Phi = e^{A T_s}    (2.65)
\Gamma = \int_0^{T_s} e^{A\tau}\, d\tau\, B.    (2.66)

and A, B, C, and D are defined in (2.49) or (2.53).

An interesting fact with the sampling procedure suggested here is that the sampled version of a continuous-time descriptor system is a discrete-time state-space system and not a discrete-time descriptor system.

Note that there are other assumptions on the behavior of u^{(m-1)}(t) between the sample points which also will allow us to calculate an exact discrete-time description. One such assumption is that it is piecewise linear.

It can be noted that it is quite uncommon that the internal variables of physical systems actually depend on derivatives of the input. If the internal variables do not depend on derivatives of the input, we will have ND = 0 in the equations above. The derivations in this section are of course valid also for this case, although many of the formulas can be written in a simpler form. For example, we will have m = 1, so we do not need to redefine the input. Note however that the matrix E in the descriptor system may


very well be singular even if there is no dependence on derivatives of the input, so it is still necessary to use the formulas above to write the system in state-space form and sample it.

2.6 Discrete-Time Descriptor Systems

In this section the discrete-time descriptor system

E\xi(t + 1) = J\xi(t) + Ku(t)    (2.67a)
y(t) = L\xi(t)    (2.67b)

will be treated. Since the sampled version of a continuous-time descriptor system is a discrete-time state-space system (see Section 2.5), there are probably fewer applications for discrete-time descriptor systems than for discrete-time state-space systems. However, applications could be found among truly discrete-time systems such as some economic systems. Discrete-time and continuous-time descriptor systems can be treated in a similar fashion, so the discussion here will be rather brief.

We will show how (2.67) can be written in different canonical forms and then transformed into state-space form, but we can directly note that (2.67) actually is a discrete-time linear system with the transfer function

G(z) = L(zE − J)^{-1}K.    (2.68)

The only difference between G(z) and the transfer function of a discrete-time state-space system is that G(z) here may be non-proper, that is have higher degree in the numerator than in the denominator. This corresponds to a non-causal system. For an example of matrices E and J that give a non-proper system, see (2.4).

Similarly to the continuous-time case, the transfer function is only well defined if (zE − J) is non-singular. In the next section we will define non-singularity of this matrix as regularity for the corresponding system and show that the system is solvable if the system is regular.

2.6.1 Regularity

A basic assumption that will be made about the discrete-time descriptor systems is that the inverse in (2.68) is well defined, and below this is formalized with a definition.


Definition 2.2 (Regularity)
The discrete-time descriptor system

E\xi(t + 1) = J\xi(t) + Ku(t)    (2.69a)
y(t) = L\xi(t)    (2.69b)

is called regular if

\det(zE − J) \not\equiv 0,    (2.70)

that is the determinant is not zero for all z.

This definition is the same as the one used by Dai (1989b). As in the continuous-time case, regularity is equivalent to the existence of a unique solution. This is discussed by for example Luenberger (1978) and Dai (1989b). To illustrate this we examine the z transform of equation (2.69a):

(zE − J)Z [ξ(t)] = KZ [u(t)] + zEξ(0) (2.71)

Z[·] represents the z transform of the argument. From this equation we can draw the conclusion that there exists a unique solution Z[ξ(t)] if and only if the system is regular.

2.6.2 A Canonical Form

In this section we present a transformation for discrete-time descriptor systems, which gives a canonical form similar to the one for the continuous-time case presented in Section 2.3. The only difference between the two forms is actually that the derivatives in the continuous-time case are replaced by time shifts in the discrete-time case.

Theorem 2.3
Consider a system

E\xi(t + 1) = J\xi(t) + Ku(t)    (2.72a)
y(t) = L\xi(t).    (2.72b)


If (2.72) is regular, its solution can be described by

w_1(t + 1) = A w_1(t) + B u(t)    (2.73a)
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u(t + i)    (2.73b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.73c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.73d)

The proof is the same as the one for Theorem 2.1 with all derivatives replaced with time shifts (also in the required lemmas), so it is omitted.

Note that the system is non-causal in the general case as the output can depend on future values of the input. If

ND = 0 (2.74)

however, the system will be causal.

2.6.3 State-Space Form

As mentioned earlier, state-space systems are much more thoroughly treated in the literature than descriptor systems are. This is also true for the discrete-time case, so in this section we examine how a discrete-time descriptor system can be transformed to a discrete-time state-space system.

We assume that the system has been converted into the form

w_1(t + 1) = A w_1(t) + B u(t)    (2.75a)
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u(t + i)    (2.75b)
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)    (2.75c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix},    (2.75d)

which according to Theorem 2.3 is possible if the system is regular. If m = 1,


we directly get the state-space description

w_1(t + 1) = \underbrace{A}_{A} w_1(t) + \underbrace{B}_{B} u(t)    (2.76a)
y(t) = \underbrace{LQ \begin{bmatrix} I \\ 0 \end{bmatrix}}_{C} w_1(t) + \underbrace{LQ \begin{bmatrix} 0 \\ -D \end{bmatrix}}_{D} u(t).    (2.76b)

If m > 1 we begin by defining a vector with time shifted inputs, corresponding to Equation (2.50):

w_3(t) = \begin{bmatrix} u(t) \\ u(t + 1) \\ \vdots \\ u(t + m - 2) \end{bmatrix}    (2.77)

To include w3(t) in the state vector, the time shifted version of it must be calculated:

w_3(t + 1) = \begin{bmatrix} u(t + 1) \\ u(t + 2) \\ \vdots \\ u(t + m - 1) \end{bmatrix} = \begin{bmatrix} 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ 0 & 0 & \cdots & 0 \end{bmatrix} w_3(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} u(t + m - 1)    (2.78)

Now (2.75) can be rewritten to depend on w3(t) instead of depending directly on the time shifted versions of u(t). The new description of the solutions will be

w_1(t + 1) = A w_1(t) + \begin{bmatrix} B & 0 & \cdots & 0 \end{bmatrix} w_3(t)    (2.79a)
w_2(t) = -\begin{bmatrix} D & ND & \cdots & N^{m-2}D \end{bmatrix} w_3(t) - N^{m-1}D\,u(t + m - 1)    (2.79b)
w_3(t + 1) = \begin{bmatrix} 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ 0 & 0 & \cdots & 0 \end{bmatrix} w_3(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} u(t + m - 1)    (2.79c)
y(t) = LQ \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.    (2.79d)


The final step to get a state-space description is to eliminate w2(t) from these equations. The elimination is performed by inserting (2.79b) into (2.79d):

\begin{bmatrix} w_1(t + 1) \\ w_3(t + 1) \end{bmatrix} = \underbrace{\begin{bmatrix} A & B & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}}_{A} \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ I \end{bmatrix}}_{B} u(t + m - 1)    (2.80a)
y(t) = \underbrace{LQ \begin{bmatrix} I & 0 & 0 & \cdots & 0 \\ 0 & -D & -ND & \cdots & -N^{m-2}D \end{bmatrix}}_{C} \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix} + \underbrace{LQ \begin{bmatrix} 0 \\ -N^{m-1}D \end{bmatrix}}_{D} u(t + m - 1)    (2.80b)

If we let

x(t) = \begin{bmatrix} w_1(t) \\ w_3(t) \end{bmatrix}    (2.81)

this can be written in the compact form

x(t + 1) = Ax(t) + Bu(t + m − 1) (2.82a)

y(t) = Cx(t) + Du(t + m − 1). (2.82b)

The state-space description will in this thesis be used for parameter estimation. It could however also have other applications, such as control design and estimation. Similar ideas have for example been used by Dai (1987) for Kalman filtering.

2.7 Conclusions

We presented the concept of regularity for descriptor systems and noted that it is equivalent to the existence of a unique solution. We also discussed a canonical form that is well-known in the literature, and provided a proof that will allow numerical computation as will be discussed in Chapter 3. This canonical form was then used to derive a state-space description. To get this state-space description, the input might have to be redefined as one


of its derivatives in the continuous-time case or future values in the discrete-time case. For the continuous-time case, the state-space description was then used to sample the descriptor system.


3 Computation of the Canonical Forms

The transformations presented in Chapter 2 will be used extensively in the thesis. Their existence was proven in Chapter 2, but it was not discussed how they could actually be computed. To be able to use the transformations in for example a numerical implementation of an identification algorithm, it is of course crucial to be able to compute them numerically in a reliable manner. We will here discuss how this computation can be performed.

The discussion will include pointers to free implementations of some algorithms in the linear algebra package LAPACK (Anderson et al., 1999). LAPACK is a free collection of routines written in Fortran77 that can be used for systems of linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. LAPACK is more or less the standard way to solve these kinds of problems, and is used by commercial software like Matlab. For operations that can be easily implemented in for example Matlab or Mathematica, such as matrix multiplication and inversion, no pointers to special implementations will be made.

Some ideas related to the method presented here for computing the canonical forms have earlier been published by Varga (1992). The presentation here is however more detailed, and is closely connected to the derivation of the canonical forms presented in Chapter 2. Furthermore, we will use software from LAPACK.


In Section 3.1 we will discuss generalized eigenvalue problems and some tools which are used for solving these problems, as these are the tools which we will use to compute the canonical forms. In Section 3.2 we then discuss how the actual computation is performed. The chapter is concluded with a summary of the algorithm for computing the canonical forms and a note on how the results can be used in the discrete-time case.

3.1 Generalized Eigenvalues

The computation of the canonical forms will be performed with tools that normally are used for computation of generalized eigenvalues. Therefore, some theory for generalized eigenvalues will be presented in this section. The theory presented here about generalized eigenvalues can be found in for example Bai et al. (2000) and Golub and van Loan (1996, Section 7.7).

Consider a matrix pencil

λE − J (3.1)

where the matrices E and J are n × n with constant real elements and λ is a scalar variable. We will assume that the pencil is regular, that is

\det(\lambda E − J) \not\equiv 0    (3.2)

with respect to λ. The generalized eigenvalues are defined as those λ for which

det(λE − J) = 0. (3.3)

If the degree p of the polynomial det(λE − J) is less than n, the pencil also has n − p infinite generalized eigenvalues. This happens when rank E < n (Golub and van Loan, 1996, Section 7.7). We illustrate the concepts with an example.

Example 3.1 (Generalized eigenvalues) Consider the matrix pencil

λ

[1 00 0

]

[−1 01 −1

]

. (3.4)

We have that

det

(

λ

[1 00 0

]

[−1 01 −1

])

= 1 + λ (3.5)

so the matrix pencil has two generalized eigenvalues, ∞ and −1.

Page 45: Parameter Estimation in Linear Descriptor Systems

3.2 Computation of the Canonical Forms 33

Generalized eigenvectors will not be discussed here, the interested reader isinstead referred to for example Bai et al. (2000).

Since it may be difficult to solve Equation (3.3) for the generalized eigen-values, different transformations of the matrices that simplifies computationof the generalized eigenvalues exist. The transformations are of the form

P (λE − J)Q (3.6)

with invertible matrices P and Q. Such transformations do not change theeigenvalues since

det(P (λE − J)Q) = det(P ) det(λE − J) det(Q). (3.7)

One such form is the Kronecker canonical form of a matrix pencil discussedin e.g., Gantmacher (1960) and Kailath (1980). However, this form cannotin general be computed numerically in a reliable manner (Bai et al., 2000).For example it may change discontinuously with the elements of the matricesE and J . The transformation which we will use here is therefore instead thegeneralized Schur form which requires fewer operations and is more stableto compute (Bai et al., 2000).

The generalized Schur form of a real matrix pencil is a transformation

P (λE − J)Q (3.8)

where PEQ is upper quasi-triangular, that is it is upper triangular withsome 2 by 2 blocks corresponding to complex generalized eigenvalues on thediagonal and PJQ is upper triangular. P and Q are orthogonal matrices.The generalized Schur form can be computed with the LAPACK commandsdgges or sgges. These commands also give the possibility to sort certaingeneralized eigenvalues to the lower right. An algorithm for ordering of thegeneralized eigenvalues is also discussed by Sima (1996). Here we will usethe possibility to sort the infinite generalized eigenvalues to the lower right.

The generalized Schur form discussed here is also called the generalizedreal Schur form, since the original and transformed matrices only containreal elements.

3.2 Computation of the Canonical Forms

The discussion in this section is based on the steps of the proof of the formin Theorem 2.1. We therefore begin by examining how the diagonalizationin Lemma 2.1 can be performed numerically.

Page 46: Parameter Estimation in Linear Descriptor Systems

34 Chapter 3 Computation of the Canonical Forms

The goal is to find matrices P1 and Q1 such that

P1(λE − J)Q1 = λ

[E1 E2

0 E3

]

[J1 J2

0 J3

]

(3.9)

where E1 is non-singular, E3 is upper triangular with all diagonal elementszero and J3 is non-singular and upper triangular. This is exactly the formwe get if we compute the generalized Schur form with the infinite gen-eralized eigenvalues sorted to the lower right. This computation can beperformed with the LAPACK commands dgges or sgges. E1 correspondsto finite generalized eigenvalues and is non-singular since it is upper quasi-triangular with non-zero diagonal elements and E3 corresponds to infinitegeneralized eigenvalues and is upper triangular with zero diagonal elements.J3 is non-singular (and thus upper triangular with non-zero diagonal ele-ments), otherwise the pencil would not be regular.

The next step is to compute the matrices L and R in Lemma 2.2, thatis we want to solve the system

[I L

0 I

] [E1 E2

0 E3

] [I R

0 I

]

=

[E1 00 E3

]

(3.10a)[I L

0 I

] [J1 J2

0 J3

] [I R

0 I

]

=

[J1 00 J3

]

. (3.10b)

Performing the matrix multiplication on the left hand side of the equationsyields

[E1 E1R + E2 + LE3

0 E3

]

=

[E1 00 E3

]

(3.11a)[J1 J1R + J2 + LJ3

0 J3

]

=

[J1 00 J3

]

(3.11b)

which is equivalent to the system

E1R + LE3 = −E2 (3.12a)J1R + LJ3 = −J2. (3.12b)

Equation (3.12) is a generalized Sylvester equation (Kågström, 1994). Thegeneralized Sylvester equation (3.12) can be solved from the linear systemof equations (Kågström, 1994)

[In ⊗ E1 ET

3 ⊗ Im

In ⊗ J1 JT3 ⊗ Im

] [vec(R)vec(L)

]

=

[− vec(E2)− vec(J2)

]

. (3.13)

Page 47: Parameter Estimation in Linear Descriptor Systems

3.3 Summary 35

Here In is an identity matrix with the same size as E3 and J3, Im is anidentity matrix with the same size as E1 and J1, ⊗ represents the Kroneckerproduct and vec(X) denotes an ordered stack of the columns of a matrix X

from left to right starting with the first column.One way to solve the generalized Sylvester equation (3.12) is to use the

linear system of equations (3.13). This system can be quite large, so it maybe a better choice to use specialized software such as the LAPACK routinesstgsyl or dtgsyl.

The steps in the proof of Lemma 2.3 and Theorem 2.1 only containstandard matrix manipulations, such as multiplication and inversion. Theyare straightforward to implement, and will not be discussed further here.

3.3 Summary

In this section a summary of the steps to compute the canonical forms isprovided. It can be used to implement the computations without studyingSection 3.2 in detail. The summary is provided as a numbered list with thenecessary computations.

1. Start with a system

Eξ(t) = Jξ(t) + Ku(t) (3.14a)y(t) = Lξ(t) (3.14b)

that should be transformed into[I 00 N

]

Q−1ξ(t) =

[A 00 I

]

Q−1ξ(t) +

[B

D

]

u(t) (3.15)

or

w1(t) = Aw1(t) + Bu(t) (3.16a)

w2(t) = −Du(t) −m−1∑

i=1

N iDu(i)(t) (3.16b)

[w1(t)w2(t)

]

= Q−1ξ(t) (3.16c)

y(t) = LQ

[w1(t)w2(t)

]

. (3.16d)

Page 48: Parameter Estimation in Linear Descriptor Systems

36 Chapter 3 Computation of the Canonical Forms

2. Compute the generalized Schur form of the matrix pencil λE − J sothat

P1(λE − J)Q1 = λ

[E1 E2

0 E3

]

[J1 J2

0 J3

]

. (3.17)

The generalized eigenvalues should be sorted so that diagonal elementsof E1 contain only non-zero elements and the diagonal elements of E3

are zero. This computation can be made with one of the LAPACKcommands dgges and sgges.

3. Solve the generalized Sylvester equation (3.18) to get the matrices L

and R.

E1R + LE3 = −E2 (3.18a)J1R + LJ3 = −J2. (3.18b)

The generalized Sylvester equation (3.18) can be solved from the linearequation system

[In ⊗ E1 ET

3 ⊗ Im

In ⊗ J1 JT3 ⊗ Im

] [vec(R)vec(L)

]

=

[− vec(E2)− vec(J2)

]

(3.19)

or with the LAPACK commands stgsyl or dtgsyl. Here In is anidentity matrix with the same size as E3 and J3, Im is an identitymatrix with the same size as E1 and J1, ⊗ represents the Kroneckerproduct and vec(X) denotes an ordered stack of the columns of amatrix X from left to right starting with the first column.

4. We now get the form (3.15) and (3.16) according to

P =

[E−1

1 0

0 J−13

] [I L

0 I

]

P1 (3.20a)

Q = Q1

[I R

0 I

]

(3.20b)

N = J−13 E3 (3.20c)

A = E−11 J1 (3.20d)

[B

D

]

= PK. (3.20e)

Page 49: Parameter Estimation in Linear Descriptor Systems

3.4 Discrete-Time Descriptor Systems 37

3.4 Discrete-Time Descriptor Systems

The method for computing the canonical forms for discrete-time descriptorsystems is identical to the computation for continuous-time systems. Thiscan be realized since the proofs of the transformations in the continuous-time and discrete-time cases (Chapter 2) are similar and the computationfor the continuous-time case is based on the proof of the transformation.For the summary in Section 3.3, the only thing that changes is actually thefirst step. For the discrete-time case it takes the following form:

1. Start with a system

Eξ(t + 1) = Jξ(t) + Ku(t) (3.21a)y(t) = Lξ(t) (3.21b)

that should be transformed into[I 00 N

]

Q−1ξ(t + 1) =

[A 00 I

]

Q−1ξ(t) +

[B

D

]

u(t) (3.22)

or

w1(t + 1) = Aw1(t) + Bu(t) (3.23a)

w2(t) = −Du(t) −

m−1∑

i=1

N iDu(t + i) (3.23b)

[w1(t)w2(t)

]

= Q−1ξ(t) (3.23c)

y(t) = LQ

[w1(t)w2(t)

]

. (3.23d)

The steps 2–4 are identical to Section 3.3.

3.5 Conclusions

We examined how the canonical forms discussed in Chapter 2 can be com-puted with numerical software. The calculation is based on tools for thesolution of generalized eigenvalue problems, so generalized eigenvalue prob-lems where briefly discussed. Implementations of the used tools for gener-alized eigenvalue problems are available in the free LAPACK package.

Page 50: Parameter Estimation in Linear Descriptor Systems

38 Chapter 3 Computation of the Canonical Forms

Page 51: Parameter Estimation in Linear Descriptor Systems

4

Noise Modeling and Kalman

Filtering

In applications such as state estimation and control, it is often beneficial tohave a noise model that describes how noise and unmeasured inputs affectthe modeled system. A noise model can also be used to make up for the factthat the model only is an approximation of the physical system. As will bediscussed in Chapter 5, a noise model is important for system identification.

For continuous-time state-space models, a noise model can be addedaccording to

x(t) = Ax(t) + B1u(t) + B2v1(t) (4.1a)y(t) = Cx(t) + v2(t), (4.1b)

where v1(t) and v2(t) are white noise signals. A Kalman filter (Kailath et al.,2000) can then be implemented to estimate the state and predict future statevalues and outputs. To be able to use a similar approach for descriptor sys-tems, we will in this chapter examine how a noise model can be added todescriptor systems, how the model can be sampled and then how a Kalmanfilter can be implemented. At the end of the chapter we will treat noisemodeling and Kalman filtering for the discrete-time case.

39

Page 52: Parameter Estimation in Linear Descriptor Systems

40 Chapter 4 Noise Modeling and Kalman Filtering

4.1 Noise Modeling

In the case of continuous-time descriptor systems we can add a noise modelaccording to

Eξ(t) = Jξ(t) + K1u(t) + K2v1(t) (4.2a)y(t) = Lξ(t) + v2(t), (4.2b)

where v1(t) represents the unmeasured inputs and and v2(t) represents themeasurement noise. K2 is a constant matrix. This is analogous to hownoise is added in a state-space model, see (4.1). It can be realized fromthe discussion in Chapter 3, that the internal variables ξ(t) can depend onderivatives of v1(t). But if v1(t) is white noise, the derivative is not welldefined (see Åström, 1970), so then the internal variables cannot dependon derivatives of v1(t). In this section we derive conditions on K2 thatguarantees that ξ(t) does not depend on derivatives of v1(t). Two equivalentconditions are derived, one using time domain methods (Section 4.1.1) andone using frequency domain methods (Section 4.1.2).

The problem with derivatives of the noise signal has been noted earlier,e.g., Darouach et al. (1997). Here the problem was treated by assuming thatE and J are such that no derivatives can appear in the solution regardlessof K2, which can be rather restrictive.

4.1.1 Time Domain Derivation

In this section we derive a condition on K2 which is equivalent to thatderivatives of v1(t) do not affect ξ(t) using time domain methods.

Consider (4.2). We can rewrite the equations as

Eξ(t) = Jξ(t) +[K1 K2

][

u(t)v1(t)

]

(4.3a)

y(t) = Lξ(t) + v2(t). (4.3b)

If we now consider the vector[

u(t)v1(t)

]

(4.4)

as the input and assume that the system is regular, we know from Lemma 2.3that there exists transformation matrices P and Q such that the transfor-mation

PEQQ−1ξ(t) = PJQQ−1ξ(t) + P[K1 K2

][

u(t)v1(t)

]

(4.5)

Page 53: Parameter Estimation in Linear Descriptor Systems

4.1 Noise Modeling 41

gives the system[I 00 N

]

Q−1ξ(t) =

[A 00 I

]

Q−1ξ(t) +

[B1 B2

D1 D2

] [u(t)v1(t)

]

(4.6)

where N is a nilpotent matrix. Furthermore, Theorem 2.1 gives that thesolution can be described by

w1(t) = Aw1(t) + B1u(t) + B2v1(t) (4.7a)

w2(t) = −D1u(t) − D2v1(t)

−m−1∑

i=1

N iD1u(i)(t) −

m−1∑

i=1

N iD2v(i)1 (t)

(4.7b)

[w1(t)w2(t)

]

= Q−1ξ(t) (4.7c)

y(t) = LQ

[w1(t)w2(t)

]

+ v2(t). (4.7d)

When we have a state-space description, v1(t) and v2(t) are white noisesignals. If they were not white noise, we would technically not have astate-space description since future noise values then would depend on thecurrent noise value. To be able to transform (4.2) into state-space form wewould like to allow that v1(t) and v2(t) are white noise also here. However,continuous-time white noise signals are delicate mathematical objects. Weare actually here, and throughout the chapter, somewhat careless with thedefinition of and notation for the white noise. It should be defined usingWiener-processes. For a treatment on this, see Åström (1970). Furthermore,the derivative of white noise signals is not well defined, see further Åström(1970). Therefore one has for example to consider carefully what (4.2)means if v1(t) and v2(t) are assumed to be white noise signals. Most im-portantly, we cannot allow that any derivatives of v1(t) occur in (4.7). Ifm = 1 this requirement is trivially fulfilled and (4.7) is equivalent to thestate-space description

w1(t) = A︸︷︷︸

A

w1(t) + B1︸︷︷︸

B1

u(t) + B2︸︷︷︸

B2

v1(t) (4.8a)

y(t) = LQ

[I

0

]

︸ ︷︷ ︸

C

w1(t) + LQ

[0

−D1

]

︸ ︷︷ ︸

D

u(t) + LQ

[0

−D2

]

︸ ︷︷ ︸

N

v1(t) + v2(t). (4.8b)

Page 54: Parameter Estimation in Linear Descriptor Systems

42 Chapter 4 Noise Modeling and Kalman Filtering

If m > 1 however, we get from 4.7b that we have to require that

ND2 = 0 (4.9)

to avoid differentiation of v1(t).Note that (4.9) is related to the impulse controllability with respect to

v1(t), see for example the book by Dai (1989b) or the original paper byCobb (1984). If the system were impulse controllable with respect to v1(t),as many derivatives of it as possible would be included. What we need isactually the opposite of impulse controllability with respect to v1(t).

The requirement (4.9) may seem difficult to check in the original model(4.2), but in the following theorem we show that it is equivalent to thematrix K2 being in the range of a certain matrix. This makes it possible toavoid derivatives of the noise already at the modeling stage. To formulatethe theorem, we need to consider the transformation (4.5) with matrices P

and Q which gives a system in the form (4.6). Let the matrix N have thesingular value decomposition

N = U

[Σ 00 0

]

V T = U

[Σ 00 0

][V1 V2

]T, (4.10)

where V2 contains the last k columns of V having zero singular values.Finally, define the matrix M as

M = P−1

[I 00 V2

]

. (4.11)

It is now possible to derive a condition on K2.

Theorem 4.1The condition (4.9) is equivalent to

K2 ∈ R(M) (4.12)

where K2 is defined in (4.2) and M is defined in (4.11).

The expression (4.12) means that K2 is in the range of M , that is thecolumns of K2 are linear combinations of the columns of M .

Proof. From Lemma 2.3 we know that there exist matrices P and Q suchthat

PEQQ−1ξ(t) = PJQQ−1ξ(t) + P[K1 K2

][

u(t)v1(t)

]

(4.13)

Page 55: Parameter Estimation in Linear Descriptor Systems

4.1 Noise Modeling 43

gives the canonical form

[I 00 N

]

Q−1ξ(t) =

[A 00 I

]

Q−1ξ(t) +

[B1 B2

D1 D2

] [u(t)v1(t)

]

. (4.14)

Note that K2 can be written as

K2 = P−1

[B2

D2

]

. (4.15)

Let the matrix N have the singular value decomposition

N = U

[Σ 00 0

]

V T (4.16)

where Σ is a diagonal matrix with nonzero elements. Since N is nilpotentit is also singular, so k singular values are zero. Partition V as

V =[V1 V2

], (4.17)

where V2 contains the last k columns of V having zero singular values. ThenNV2 = 0.

We first prove the implication (4.12) ⇒ (4.9): Assume that (4.12) is fulfilled.K2 can then be written as

K2 = M

[S

T

]

= P−1

[I 00 V2

] [S

T

]

= P−1

[S

V2T

]

(4.18)

for some matrices S and T . Comparing with (4.15), we see that B2 = S

and D2 = V2T . This gives

ND2 = NV2T = 0 (4.19)

so (4.9) is fulfilled.

Now the implication (4.9) ⇒ (4.12) is proved: Assume that (4.9) is fulfilled.We then get

0 = ND2 = U

[Σ 00 0

] [V T

1

V T2

]

D2 = U

[ΣV T

1 D2

0

]

. (4.20)

This gives that

V T1 D2 = 0, (4.21)

Page 56: Parameter Estimation in Linear Descriptor Systems

44 Chapter 4 Noise Modeling and Kalman Filtering

so the columns of D2 are orthogonal to the columns of V1, and D2 can bewritten as

D2 = V2T. (4.22)

Equation (4.15) now gives

K2 = P−1

[B2

D2

]

= P−1

[B2

V2T

]

= P−1

[I 00 V2

] [B2

T

]

= M

[B2

T

]

∈ R(M).

(4.23)(4.12) is fulfilled.

2

Now, if we assume that the matrix K2 in (4.2) is such that (4.9), orequivalently (4.12), is fulfilled, the form (4.7) can be written as

w1(t) = Aw1(t) + B1u(t) + B2v1(t) (4.24a)

w2(t) = −D1u(t) − D2v1(t) −

m−1∑

i=1

N iD1u(i)(t) (4.24b)

[w1(t)w2(t)

]

= Q−1ξ(t) (4.24c)

y(t) = LQ

[w1(t)w2(t)

]

+ v2(t). (4.24d)

We now proceed to transform (4.24) into a state-space description withu(m−1)(t) as the input using the same method as in Section 2.4. We thusdefine w3(t) according to (2.50), which gives the description

w1(t) = Aw1(t) +[B1 0 . . . 0

]w3(t) + B2v1(t) (4.25a)

w2(t) = −[D1 ND1 . . . Nm−2D1

]w3(t)

− Nm−1D1u(m−1)(t) − D2v1(t)

(4.25b)

w3(t) =

0 I . . . 0...

.... . .

...0 0 . . . I

0 0 . . . 0

w3(t) +

0...0I

u(m−1)(t) (4.25c)

y(t) = LQ

[w1(t)w2(t)

]

+ v2(t). (4.25d)

Page 57: Parameter Estimation in Linear Descriptor Systems

4.1 Noise Modeling 45

Eliminating w2(t) and stacking w1(t) and w3(t) together now gives the de-scription

[w1(t)w3(t)

]

=

A B1 0 . . . 00 0 I . . . 0...

......

. . ....

0 0 0 . . . I

0 0 0 . . . 0

︸ ︷︷ ︸

A

[w1(t)w3(t)

]

+

00...0I

︸︷︷︸

B1

u(m−1)(t) +

[B2

0

]

︸ ︷︷ ︸

B2

v1

(4.26a)

y(t) = LQ

[I 0 0 . . . 00 −D1 −ND1 . . . −Nm−2D1

]

︸ ︷︷ ︸

C

[w1(t)w3(t)

]

+

LQ

[0

−Nm−1D1

]

︸ ︷︷ ︸

D

u(m−1)(t)+

LQ

[0

−D2

]

︸ ︷︷ ︸

N

v1(t) + v2(t).

(4.26b)

Defining

x(t) =

[w1(t)w3(t)

]

(4.27)

gives the more compact notation

x(t) = Ax(t) + B1u(m−1)(t) + B2v1(t) (4.28a)

y(t) = Cx(t) + Du(m−1)(t) +[N I

][v1(t)v2(t)

]

. (4.28b)

We have shown that it is possible to construct a state-space system witha noise model that has the same solution as the descriptor system with noisemodel (4.2) if ND2 = 0 holds. Note that in the state-space model, the noiseon the output equation is in general correlated with the noise on the stateequation through the v1(t) term. This correlation is eliminated if D2 = 0.

Page 58: Parameter Estimation in Linear Descriptor Systems

46 Chapter 4 Noise Modeling and Kalman Filtering

Then N = 0 so the state-space description simplifies to

x(t) = Ax(t) + B1u(m−1)(t) + B2v1(t) (4.29a)

y(t) = Cx(t) + Du(m−1)(t) + v2(t). (4.29b)

Here, the noise on the state and output equations are correlated only if v1(t)and v2(t) are.

4.1.2 Frequency Domain Derivation

In the previous section, Theorem 4.1 gave a condition on how noise can beadded to a descriptor system without making the internal variables of thesystem depend on derivatives of the noise. The criterion was based on acanonical form. As will be shown in this section, an equivalent result canalso be derived in the frequency domain without requiring calculation of thecanonical form.

Instead of examining ifND2 = 0 (4.30)

to avoid derivatives of the noise, we will here examine if the transfer functionfrom the process noise to the internal variables is proper (i.e., does nothave higher degree in the numerator than the denominator). These twoconditions are obviously equivalent, since a transfer function differentiatesits input if and only if it is non-proper. Consider the descriptor system

Eξ(t) = Jξ(t) + K1u(t) + K2v1(t) (4.31a)y(t) = Lξ(t) + v2(t). (4.31b)

The question is if the transfer function

G(s) = (sE − J)−1K2 (4.32)

is proper. Note that we want to examine if the internal variables dependon derivatives of the noise, so L is not included in the transfer function.

Throughout the section, some concepts from the theory of matrix frac-

tion descriptions (MFD) will be needed. MFD:s are discussed in, for ex-ample, Kailath (1980) and in Rugh (1996) where they are called polynomial

fraction descriptions.We start by defining the row degree of a polynomial matrix and the con-

cept of a row reduced polynomial matrix according to Rugh (1996, page 308).

Page 59: Parameter Estimation in Linear Descriptor Systems

4.1 Noise Modeling 47

Definition 4.1The i:th row degree of a polynomial matrix P (s), written as ri[P ], is thedegree of the highest degree polynomial in the i:th row of P (s).

Definition 4.2If the polynomial matrix P (s) is square (n × n) and nonsingular, then it iscalled row reduced if

deg[det P (s)] = r1[P ] + · · · + rn[P ]. (4.33)

We will also need the following theorem from Kailath (1980):

Theorem 4.2If the n × n polynomial matrix D(s) is row reduced, then D−1(s)N(s) isproper if and only if each row or N(s) has degree less than or equal thedegree of the corresponding row of D(s), i.e., ri[N ] ≤ ri[D], i = 1, . . . , n.

For the proof, see Kailath (1980, page 385).We will examine if the transfer function (4.32) (which actually is a left

MFD) fulfills the conditions of Theorem 4.2. According to Rugh (1996, page308) a MFD can be converted into row reduced form by pre-multiplicationof a unimodular1 matrix U(s). More specifically, with

D(s) = U(s)(sE − J) (4.34)N(s) = U(s)K2, (4.35)

and consequently

D−1(s)N(s) = (sE − J)−1K2 = G(s), (4.36)

D(s) is row reduced for a certain unimodular matrix U(s). U(s) is notunique, it can for example be scaled by a constant. However, Theorem 4.2shows that for each choice of U(s), the transfer function G(s) of the systemis proper if the highest degree of the polynomials in each row in N(s) is lowerthan or equal to the highest degree of the polynomials in the correspondingrow of D(s). This gives a condition on K2 in the following way:

Writing U(s) as

U(s) =

m∑

i=0

Uisi (4.37)

1A polynomial matrix is called unimodular if its determinant is a nonzero real num-ber (Rugh, 1996, page 290).

Page 60: Parameter Estimation in Linear Descriptor Systems

48 Chapter 4 Noise Modeling and Kalman Filtering

and writing the j:th row of Ui as Uij, shows that the condition

UijK2 = 0 i > rj[D], j = 1 . . . n (4.38)

guarantees that the transfer function G(s) of the system is proper. Here, n

is the size of the square matrices E and J , or equivalently the number ofelements in the vector ξ(t).

Conversely, assume that (4.38) does not hold. Then some row degree ofN(s) is higher than the corresponding row degree of D(s), so the transferfunction G(s) is then according to Theorem 4.2 not proper.

This discussion proves the following theorem.

Theorem 4.3Consider the transfer function G(s) = (sE − J)−1K2 where the matricesE and J are n × n. Let U(s) be a unimodular matrix such that D(s) =U(s)(sE − J) is row reduced. Write U(s) as

U(s) =

m∑

i=0

Uisi (4.39)

and let Uij be the j:th row of Ui. Then G(s) is proper if and only if

UijK2 = 0 i > rj [D], j = 1 . . . n. (4.40)

Note that the criterion discussed in this section requires that the MFD istransformed into row reduced form. An algorithm for finding this transfor-mation is provided in Rugh (1996, Chapter 16).

We have now proved two theorems, one using time domain methodsand one using frequency domain methods, that give conditions which areequivalent to the fact that v1(t) is not differentiated. This means that thesetwo conditions are equivalent as well.

4.2 Example

In this section the results of the previous section are exemplified on a simplephysical descriptor system. We will use Theorem 4.1 and 4.3 to examinehow a noise model can be added to a system consisting of two rotatingmasses as shown in Figure 4.1. It will be shown that noise can only beadded in equations where it can be physically motivated. The system isdescribed by the torques M1(t), M2(t), M3(t) and M4(t) and the angular

Page 61: Parameter Estimation in Linear Descriptor Systems

4.2 Example 49

PSfrag

M1M2 M3

M4ω1

ω2

Figure 4.1 Two interconnected rotating masses.

velocities ω1(t) and ω2(t). The masses have the moments of inertia J1 andJ2. The equations describing this system are

J1ω1(t) = M1(t) + M2(t) (4.41a)J2ω2(t) = M3(t) + M4(t) (4.41b)M2(t) = −M3(t) (4.41c)ω1(t) = ω2(t). (4.41d)

Equation (4.41a) and (4.41b) describe the angular accelerations the torquesproduce, and Equation (4.41c) and (4.41d) describe how the two parts areconnected. Written in descriptor form, these equations are

J1 0 0 00 J2 0 00 0 0 00 0 0 0

ω1(t)ω2(t)

M2(t)

M3(t)

=

0 0 1 00 0 0 10 0 −1 −1−1 1 0 0

ω1(t)ω2(t)M2(t)M3(t)

+

1 00 10 00 0

[M1(t)M4(t)

]

(4.42)

if we consider M1(t) and M4(t) as inputs. Using the following transforma-tion matrices P and Q

P =

1 1 1 00 0 0 −10 0 −1 0

− J2

J1+J2

J1

J1+J2− J2

J1+J20

, (4.43)

Q =

1J1+J2

J2

J1+J20 0

1J1+J2

− J1

J1+J20 0

0 0 1 −10 0 0 1

, (4.44)

Page 62: Parameter Estimation in Linear Descriptor Systems

50 Chapter 4 Noise Modeling and Kalman Filtering

the descriptor system can be transformed into the canonical form (2.20) inLemma 2.3. The transformation

w(t) = Q−1

ω1(t)ω2(t)M2(t)M3(t)

(4.45)

gives

1 0 0 00 0 0 00 0 0 0

0 − J1J2

J1+J20 0

w(t) =

0 0 0 00 1 0 00 0 1 00 0 0 1

w(t) +

1 10 00 0

− J2

J1+J2

J1

J1+J2

[M1(t)M4(t)

]

. (4.46)

Now to the important part. If we want to incorporate noise into the de-scriptor equation (4.42) by adding K2v1(t) to the right hand side of (4.42),which K2-matrices are allowed?

To answer this question Theorem 4.1 can be consulted. We begin bycalculating the matrices P−1 and V2 from (4.43) and (4.46). We have that

N =

0 0 00 0 0

− J1J2

J1+J20 0

⇒ V2 =

0 01 00 1

(4.47)

and that

P−1 =

J1

J1+J20 1 −1

J2

J1+J20 0 1

0 0 −1 00 −1 0 0

(4.48)

The condition of Theorem 4.1 can now be calculated:

K2 ∈ R

(

P−1

[I 00 V2

])

= R

J1

J1+J21 −1

J2

J1+J20 1

0 −1 00 0 0

(4.49)

Page 63: Parameter Estimation in Linear Descriptor Systems

4.2 Example 51

This simply means that white noise cannot be added to equation (4.41d)(if J1 > 0 and J2 > 0). We will comment on this result below, but first weshow how to derive the same condition using the frequency domain methodin Theorem 4.3. Transforming the system into row reduced form gives(assuming J1 > 0 and J2 > 0)

U(s) =

− 1J1

1J2

0 s

0 1 0 00 0 1 00 0 0 1

(4.50)

=

− 1J1

1J2

0 0

0 1 0 00 0 1 00 0 0 1

︸ ︷︷ ︸

U0

+

0 0 0 10 0 0 00 0 0 00 0 0 0

︸ ︷︷ ︸

U1

s (4.51)

and

D(s) =

0 0 1J1

− 1J2

0 J2s 0 −10 0 1 11 −1 0 0

(4.52)

with notation from section 4.1.2.The row degrees of D(s) are r1[D] = 0, r2[D] = 1, r3[D] = 0, and

r4[D] = 0. Theorem 4.3 shows that the transfer function is proper if andonly if

0 0 0 10 0 0 00 0 0 0

K2 = 0. (4.53)

What equation (4.53) says is that the last row of K2 must be zero, which isthe same conclusion as was reached using the time domain method, Theo-rem 4.1.

The result that white noise cannot be added to the equation

ω1(t) = ω2(t) (4.54)

is a result that makes physical sense since this equation represents a rigidconnection. Furthermore, a noise term added to this equation would require

Page 64: Parameter Estimation in Linear Descriptor Systems

52 Chapter 4 Noise Modeling and Kalman Filtering

at least one of ω1 and ω2 to make instantaneous changes. The equations

J1ω1(t) = M1(t) + M2(t) (4.55)J2ω2(t) = M3(t) + M4(t) (4.56)

show that at least one of the torques Mi(t) would have to take infinite values.This is of course not physically reasonable. Consequently, the Theorems 4.1and 4.3 tell us how to add noise in a physically motivated way, at leastfor this example. They could therefore be used to guide users of object-oriented modeling software on how noise can be added to models. Thenoise model created by the user could then be used for state estimation orsystem identification.

4.3 Sampling with Noise Model

Also when we have a noise model, it is interesting to examine what thesampled description of a continuous-time descriptor system is. To derivea sampling result, we need the following result for sampling of stochasticstate-space systems:

Lemma 4.1Consider a state-space system with a noise model

x(t) = Ax(t) + v(t) (4.57a)y(t) = Cx(t) + e(t) (4.57b)

where v(t) and e(t) are white noise signals with

E(v(t)vT (t)

)= R1 (4.58a)

E(v(t)eT (t)

)= R12 (4.58b)

E(e(t)eT (t)

)= R2. (4.58c)

The values of the state variables and the outputs of the state-space system atdiscrete times kTs, k = 1, 2, . . . , are related through the stochastic differenceequations

x(Tsk + Ts) = Φx(Tsk) + v(Tsk) (4.59a)z(Tsk + Ts) = y(Tsk + Ts) − y(Tsk) = θx(Tsk) + e(Tsk) (4.59b)

Page 65: Parameter Estimation in Linear Descriptor Systems

4.3 Sampling with Noise Model 53

where

Φ = eATs (4.60a)

θ = C

∫ Ts

0eAτdτ (4.60b)

and the discrete stochastic variables v(t) and e(t) have zero mean valuesand the covariances

E(v(t)vT (t)

)= R1 =

∫ Ts

0eA(Ts−τ)R1

(

eA(Ts−τ))T

dτ (4.61a)

E(v(t)eT (t)

)= R12 =

∫ Ts

0eA(Ts−τ)

(R1Θ

T (τ) + R12

)dτ (4.61b)

E(e(t)eT (t)

)= R2 =∫ Ts

0Θ(τ)R1Θ

T (τ) + Θ(τ)R12 + RT12Θ

T (τ) + R2 dτ.

(4.61c)

Θ(τ) = C

∫ Ts

τ

eA(s−τ)ds. (4.61d)

This result can be found in for example (Åström, 1970, Chapter 3).We now want to use Lemma 4.1 to derive the sampled counterpart of

the descriptor system

Eξ(t) = Jξ(t) + K2v1(t) (4.62a)y(t) = Lξ(t) + v2(t). (4.62b)

To simplify the discussion we examine the case without input signal. Asystem with input signal can be handled according to what was discussedin Section 2.5. The noise signals v1(t) and v2(t) have the covariances

E(v1(t)v

T1 (t)

)= Q1 (4.63a)

E(v1(t)v

T2 (t)

)= Q12 (4.63b)

E(v2(t)v

T2 (t)

)= Q2. (4.63c)

Page 66: Parameter Estimation in Linear Descriptor Systems

54 Chapter 4 Noise Modeling and Kalman Filtering

From Section 4.1.1 we known that (4.62) can be transformed into the state-space system

x(t) = Ax(t) + Bv1(t)︸ ︷︷ ︸

v1(t)

(4.64a)

y(t) = Cx(t) +[

N I][v1(t)v2(t)

]

︸ ︷︷ ︸

v2(t)

(4.64b)

if K2 is such that v1(t) is not differentiated. The covariances of v1(t) andv2(t) are

E(v1(t)v

T1 (t)

)= R1 = BQ1B

T (4.65a)

E(v1(t)v

T2 (t)

)= R12 = B

[Q1 Q12

][NT

I

]

(4.65b)

E(v2(t)v

T2 (t)

)= R2 =

[N I

][

Q1 Q12

QT12 Q2

] [NT

I

]

. (4.65c)

Since R1, R12, and R2 are known for the state-space model (4.64), a sampledversion of the original descriptor system (4.62) can now be calculated usingLemma 4.1. We get that a sampled version of (4.62) is

x(Tsk + Ts) = Φx(Tsk) + v(Tsk) (4.66a)z(Tsk + Ts) = y(Tsk + Ts) − y(Tsk) = θx(Tsk) + e(Tsk) (4.66b)

with

Φ = eATs (4.67a)

θ = C

∫ Ts

0eAτdτ (4.67b)

Page 67: Parameter Estimation in Linear Descriptor Systems

4.4 Kalman Filtering 55

and

E(v(t)vT (t)

)= R1 =

∫ Ts

0eA(Ts−τ)R1

(

eA(Ts−τ))T

dτ (4.68a)

E(v(t)eT (t)

)= R12 =

∫ Ts

0eA(Ts−τ)

(R1Θ

T (τ) + R12

)dτ (4.68b)

E(e(t)eT (t)

)= R2 =∫ Ts

0Θ(τ)R1Θ

T (τ) + Θ(τ)R12 + RT12Θ

T (τ) + R2 dτ.

(4.68c)

Θ(τ) = C

∫ Ts

τ

eA(s−τ)ds. (4.68d)

4.3.1 A Shortcut

Above we examined how to transform a continuous-time descriptor systemwith noise model into a continuous-time state-space system and further intoa discrete-time description. If we do not have any prior knowledge aboutthe noise model, and want to identify it as a black-box model, it might bebetter to add the noise model after we have converted the descriptor system(without noise model) into the state-space form (2.55), or even after wehave sampled the system. Then we do not have to make the conversion ofthe noise model, and we do not have to consider (4.9).

4.4 Kalman Filtering

We have now established how to transfer a continuous-time descriptor sys-tem into a discrete-time state-space system which gives an equivalent de-scription of the output at the sampling instants. This opens up the possi-bility to use a discrete-time Kalman filter to estimate the states and makepredictions. To be concrete, assume that we have arrived at the discrete-time state-space model

x(Tsk + Ts) = Ax(Tsk) + Bu(Tsk) + Nv1(Tsk) (4.69a)y(Tsk) = Cx(Tsk) + Du(Tsk) + v2(Tsk). (4.69b)

The implementation of a Kalman filter is then straightforward, see for exam-ple, Anderson and Moore (1979) or Kailath et al. (2000). We could also usethe continuous-time state-space description (4.8) or (4.28) and implement a

Page 68: Parameter Estimation in Linear Descriptor Systems

56 Chapter 4 Noise Modeling and Kalman Filtering

continuous-time Kalman filter. Note that implementation of a continuous-time Kalman filter with digital hardware always involves some sort of ap-proximation since digital hardware operates in discrete-time.

Note that we only get estimations of the state vector x(t) and the outputy(t), not of the vector ξ(t), through a normal Kalman filter. However, inthis thesis we only need to predict y(t), we do not need to estimate ξ(t).

Kalman filtering of continuous-time descriptor systems has previouslybeen treated by Germani et al. (2002). The results in this contributionrequires that the noise model has a certain structure and that some mea-surements are noise-free and can be used to reduce the descriptor system tostate-space form. Another contribution is Darouach et al. (1997). Here theproblem with derivatives of the noise was treated by assuming that N = 0,which can be rather restrictive. The estimation problem itself was solved byusing the SVD coordinate system, and assuming that the internal variablescorresponding to the second block row could be calculated from u(t) andy(t).

4.5 Discrete-Time Descriptor Systems

4.5.1 Noise Modeling

A noise model can be added to a discrete-time descriptor system accordingto

Eξ(t + 1) = Jξ(t) + K1u(t) + K2v1(t) (4.70a)y(t) = Lξ(t) + v2(t), (4.70b)

similarly to the continuous-time case. Here, v1(t) and v2(t) are uncorrelatedsequences of white noise and K2 is a constant matrix. We assume that thedescriptor system is regular.

In Chapter 2 we saw that discrete-time descriptor systems may be non-causal. The noisy system discussed here might be non-causal not only withrespect to the input signal, but also with respect to the noise. This can seenby first writing the system as

Eξ(t + 1) = Jξ(t) +[K1 K2

][

u(t)v1(t)

]

(4.71a)

y(t) = Lξ(t) + v2(t) (4.71b)

Page 69: Parameter Estimation in Linear Descriptor Systems

4.5 Discrete-Time Descriptor Systems 57

and then applying Theorem 2.3. The solutions can be described by

w1(t + 1) = Aw1(t) + B1u(t) + B2v1(t) (4.72a)

w2(t) = −D1u(t) −m−1∑

i=1

N iD1u(t + i) (4.72b)

− D2v1(t) −m−1∑

i=1

N iD2v1(t + i) (4.72c)

[w1(t)w2(t)

]

= Q−1ξ(t) (4.72d)

y(t) = LQ

[w1(t)w2(t)

]

+ v2(t). (4.72e)

A difference from the continuous-time case is that we do not have toput any restriction on the noise model, as dependence on future values ofthe noise is theoretically possible. The dependence on future values of thenoise can be handled for example by time shifting the noise sequence. If wedefine

v1(t) = v1(t + m − 1) (4.73)

equation (4.72) can be written as

w1(t + 1) = Aw1(t) + B1u(t) + B2v1(t − m + 1) (4.74a)

w2(t) = −D1u(t) −

m−1∑

i=1

N iD1u(t + i) (4.74b)

− D2v1(t − m + 1) −

m−1∑

i=1

N iD2v1(t + i − m + 1) (4.74c)

[w1(t)w2(t)

]

= Q−1ξ(t) (4.74d)

y(t) = LQ

[w1(t)w2(t)

]

+ v2(t) (4.74e)

which is a causal description with respect to the noise. Note that the se-quences v1(t) and v1(t) will have the same statistical properties since theyare both white noise sequences. The noise sequences v1(t) and v2(t) mustbe uncorrelated, otherwise v2(t) will be correlated with v1(t − m + 1).

Page 70: Parameter Estimation in Linear Descriptor Systems

58 Chapter 4 Noise Modeling and Kalman Filtering

4.5.2 Kalman Filtering

The system (4.71) can be transformed into state-space form using the tech-nique in Section 2.6.3. We would then get

x(t + 1) = Ax(t) + B1u(t + m − 1) + B2v1(t + m − 1) (4.75a)

y(t) = Cx(t) + D1u(t + m − 1) + D2v1(t + m − 1) + v2(t). (4.75b)

which, using (4.73), also can be written as

x(t + 1) = Ax(t) + B1u(t + m − 1) + B2v1(t) (4.76a)

y(t) = Cx(t) + D1u(t + m − 1) + D2v1(t) + v2(t). (4.76b)

This is a state-space description if u(t + m − 1) is considered as the input.It can however be argued that dependence on future noise values is notphysical, so another approach may be to require that ND2 = 0, so that thesystem is causal with respect to the noise. We could use a similar approachas in Section 4.1 to make sure that this holds.

When the discrete-time descriptor system has been converted into state-space form, implementation of the Kalman filter is straightforward, see forexample, Anderson and Moore (1979) or Kailath et al. (2000).

Previous work on Kalman filtering of discrete-time descriptor systems isamong others Chisci and Zappa (1992); Dai (1987, 1989a); Darouach et al.(1993); Deng and Liu (1999); Nikoukhah et al. (1998, 1999). The approachtaken in this section for discrete-time descriptor systems, is similar to theone in Dai (1987). Dai (1987) also uses the idea to time-shift the noisesequence and write the system in state-space form, but he does not discusshow a system with input signal should be treated. The discussion hereis still relevant, since it includes the case with input signal and since itrelates to the approach suggested for continuous-time descriptor systems inSection 4.4.

4.6 Conclusions

We noted that if noise is added to arbitrary equations of a continuous-timedescriptor system, derivatives of the noise signal might affect the internalvariables. Since derivatives of white noise is not well defined, we derived amethod to add noise without causing derivatives of it to affect the internalvariables. Furthermore, if the descriptor system is converted to state spaceform, it is possible to implement a Kalman filter.

Page 71: Parameter Estimation in Linear Descriptor Systems

5

Parameter Estimation in Linear

Descriptor Systems

We have now developed enough theory to start discussing how the param-eter estimation problem for linear descriptor systems can be solved. Somepreliminaries on system identification will be provided in Section 5.1. Timedomain identification methods for continuous-time linear descriptor systemswill be discussed in Section 5.1.2 and frequency domain methods will be dis-cussed in Section 5.3. An example is provided in Section 5.4. In Section 5.5discrete-time descriptor systems are treated.

5.1 System Identification

System identification can be said to be mathematical modeling of dynamicalsystems using input and output data, i.e., using experimental data. Asdiscussed in Chapter 2, a regular linear descriptor system can always bedescribed by a transfer function, G(·, θ). In Chapter 4 we pointed outthat it is common that a noise model is included to describe the effects ofunknown disturbances on the system. In the linear case, the noise affectingthe output can be described as a linear filter H(·, θ) applied to a white noiseprocess. The model structure examined would then be

y(t) = G(p, θ)u(t) + H(p, θ)e(t) (5.1)

59

Page 72: Parameter Estimation in Linear Descriptor Systems

60 Chapter 5 Parameter Estimation in Linear Descriptor Systems

in the continuous-time case and

y(t) = G(q, θ)u(t) + H(q, θ)e(t) (5.2)

in the discrete-time case. H(·, θ) is assumed to have a causal inverse. Thesignal u(t) is a known input, and e(t) is a white noise signal with zeromean. The operators p and q are the differentiation and time shift operatorsrespectively.

In this section, we will discuss some well established methods for estima-tion of parameters. Time domain methods will be discussed in Section 5.1.1and frequency domain methods will be discussed in Section 5.1.2. The dif-ferent methods are only discussed briefly. For a more thorough discussion,the reader is referred to, e.g., Ljung (1999) which is a standard referencefor the material discussed in this section.

The rest of the chapter will describe what we need to do to apply thesemethods when the transfer functions are parameterized as linear descriptorsystems.

5.1.1 Time Domain Identification

With time domain methods for system identification, we here mean methodsthat compute estimates of the unknown parameters θ using measurementsof the input and output for a number of time instants, ZN = {u(0), y(0), ...,u(N), y(N)}. Several methods are available in the literature, but here wewill limit the discussion to the prediction error method and the maximum

likelihood method. The principle behind the prediction error method is tominimize the prediction errors,

ε(t, θ) = y(t) − y(t|θ), (5.3)

with respect to the unknown parameters. The predicted output y(t|θ) is themodel’s prediction of y(t) given Zt−1. For the discrete-time case (5.2), thestandard predictor is

y(t|θ) = H−1(q, θ)G(q, θ)u(t) +(1 − H−1(q, θ)

)y(t). (5.4)

For the continuous-time case (5.1), the predictor can be calculated byfirst sampling the model. Another method is to use the Kalman filter forcontinuous-time models with discrete-time data.

For the minimization of the prediction errors (5.3), one could choosedifferent norms. One common choice is the quadratic criterion, given by

VN (θ, ZN ) =1

N

N∑

t=1

1

2ε2(t, θ). (5.5)

Page 73: Parameter Estimation in Linear Descriptor Systems

5.1 System Identification 61

The criterion is usually minimized by a numerical search method, for exam-ple Gauss-Newton.

The maximum likelihood method estimates the unknown parameters bymaximizing the probability of the measured output with respect to the un-known parameters. In the case when the prediction errors have a Gaussiandistribution with known variance, the maximum likelihood criterion will co-incide with (5.5). This will be the case for the model structure (5.2) if e(t)has a Gaussian distribution with known variance.

The discussion in this section assumes that y(t), u(t), and e(t) are scalarvalued. The formulas can however be extended to the multivariable case,as discussed by (Ljung, 1999, Section 7.2).

State Space Models

Physical systems are often modeled in state-space form,

x(t) = A(θ)x(t) + B1(θ)u(t) + B2(θ)v1(t) (5.6a)y(t) = C(θ)x(t) + D(θ)u(t) + v2(t) (5.6b)

or

x(t + 1) = A(θ)x(t) + B1(θ)u(t) + B2(θ)v1(t) (5.7a)y(t) = C(θ)x(t) + D(θ)u(t) + v2(t). (5.7b)

In this description, v1(t) is process noise and v2(t) is measurement noisewith covariances according to

Ev1(t)vT1 (t) = R1(θ) (5.8a)

Ev1(t)vT2 (t) = R12(θ) (5.8b)

Ev2(t)vT2 (t) = R2(θ). (5.8c)

This is as just another parameterization of the linear systems (5.1) and (5.2).The transfer function G(·, θ) can be calculated as

G(p, θ) = C(θ) (pI − A(θ))−1B1(θ) (5.9)

in the continuous-time case and

G(q, θ) = C(θ) (qI − A(θ))−1 B1(θ) (5.10)

Page 74: Parameter Estimation in Linear Descriptor Systems

62 Chapter 5 Parameter Estimation in Linear Descriptor Systems

in the discrete-time case. If we have an output error model, that is B2(θ) =0, we have that H(·, θ) = 1. If B2(θ) 6= 0, we can for the discrete casecalculate H(q, θ) as

H(q, θ) = C(θ) (qI − A(θ))−1K(θ) + I (5.11)

where

K(θ) =(A(θ)P (θ)CT (θ) + R12(θ)

) (C(θ)P (θ)CT (θ) + R2(θ)

)−1

. (5.12)

P (θ) is the positive semidefinite solution of the stationary Riccati equation

P (θ) = A(θ)P (θ)AT (θ) + R1(θ) −(A(θ)P (θ)CT (θ) + R12(θ)

)

×(C(θ)P (θ)CT (θ) + R2(θ)

)−1 (

A(θ)P (θ)CT (θ) + R12(θ))T

.

(5.13)

Note that the predictor (5.4) is the predictor given by the stationary Kalmanfilter for (5.7). We could therefore use the stationary Kalman filter to cal-culate the predictions.

For the continuous-time case, one method to calculate the predictionsis to sample the model and use the discrete-time Kalman filter. Anothermethod is to use the Kalman filter for continuous-time models with discrete-time data.

5.1.2 Frequency Domain Identification

Frequency domain methods aim to estimate the unknown parameters θ fromfrequency domain data, ZN = {U(ω1), Y (ω1), . . . , U(ωN ), Y (ωN )}, whereY (ωk) and U(ωk) are the discrete Fourier transforms of the correspondingtime domain signals in the discrete-time case, or approximations of theFourier transforms in the continuous-time case. The Y (ωk) and U(ωk) canbe obtained directly from the system using a measurement device providingfrequency domain data, or calculated from time domain data. Assumingthat the system is described by (5.1) or (5.2), we have the approximaterelations

Y (ωk) ≈ G(iωk, θ)U(ωk) + H(iωk, θ)E(ωk) (5.14)

in continuous-time and

Y (ωk) ≈ G(eiωk , θ)U(ωk) + H(eiωk , θ)E(ωk) (5.15)

in discrete-time. The relations are only approximate since a transient termshould be added to make the relations exact if the signals are not periodic.

Page 75: Parameter Estimation in Linear Descriptor Systems

5.2 Time Domain Identification for Descriptor Systems 63

In order to estimate the parameters, a criterion like

VN (θ, ZN) =

N∑

k=1

|Y (ωk) − G(eiωk , θ)U(ωk)|2Wk (5.16)

in the discrete-time case, and

VN (θ, ZN) =N∑

k=1

|Y (ωk) − G(iωk, θ)U(ωk)|2Wk (5.17)

in the continuous-time case is minimized. The weighting functions Wk canbe selected using the noise model H(·, θ). If the noise model depends onθ a second term usually has to be added to the criterion to get consistentestimates, see further Ljung (1999) and Pintelon and Schoukens (2001).

In should be noted that an advantage with frequency domain identifica-tion methods is that continuous-time models and discrete-time models canbe handled similarly. It is an advantage that continuous-time models donot need special treatment since it is usually easiest to express models ofphysical systems in continuous time.

5.2 Time Domain Identification for Descriptor

Systems

In this section we describe how the general time domain identification meth-ods described in Section 5.1.1 can be applied to continuous-time descriptorsystems.

To use the prediction error method, we need to calculate a predictiony(t|θ). As discussed in Chapters 2 and 4, a regular descriptor system can betransformed into a state-space description which gives the same solution. Todo this we might need to redefine the input as one of its derivatives. Thiscan be a problem if only sampled data is available, since the derivativesthen would have to be calculated by difference approximations. If, on theother hand, the input can be selected, we can select a signal that can bedifferentiated analytically. It would then be good to select a signal wherethe highest derivative that affect the internal variables is piecewise constantor linear. An exact discrete-time description can then be calculated asdiscussed earlier.

We get that a straightforward way to calculate the predictions is tocalculate an equivalent state-space description for the descriptor system,and then use the methods described in Section 5.1.1.

Page 76: Parameter Estimation in Linear Descriptor Systems

64 Chapter 5 Parameter Estimation in Linear Descriptor Systems

In practice, we are however faced with the important question of howthe state-space description should be calculated. As discussed in Chapter 3,the canonical forms can be computed using numerical software. But if someelements of the matrices are unknown, numerical software cannot be used!Another approach could be to calculate the canonical forms using symboli-cal software. But this approach has not been thoroughly investigated, andsymbolical software is usually not as easily available as numerical software.For example, LAPACK is free software. The remedy is to make the con-

version using numerical software for each value of the parameters that the

identification algorithm needs. Consider for example the case when we wishto estimate the parameters by minimizing the prediction error criterion

VN (θ, ZN ) =1

N

N∑

t=1

1

2ε2(t, θ) (5.18)

using a Gauss-Newton search. For each parameter value θ that the Gauss-Newton algorithm needs, we compute a state space description using themethods in Chapter 3 and calculate the prediction errors ε(t, θ).

5.3 Frequency Domain Identification for Descrip-

tor Systems

The work which has been done this far has been based on transformingthe descriptor system into a state-space system, and using identificationmethods for state-space descriptions. As was discussed earlier, this trans-formation always exists if the system is regular, and can be computed nu-merically. We have however seen that the work to transform the descriptorsystem into state-space form might be significant in some cases. Further-more, we sometimes have to redefine the input as one of its derivatives. Ifthe original input can be selected, then it might be possible to differentiateit analytically. If, on the other hand, only a measured input is available, itmust be differentiated numerically, which can be a problem if the signal isnoisy.

Here we examine another approach to the identification problem thatoffers an alternative way to handle these potential problems, namely identi-fication in the frequency domain. The conversion into state-space form canbe avoided in the output error case as we will see below. A model whichdifferentiates the input will have a large amplification for high frequencies.In the frequency domain we could therefore handle this problem by notincluding measurements with a too high frequency in ZN .

Page 77: Parameter Estimation in Linear Descriptor Systems

5.3 Frequency Domain Identification for Descriptor Systems 65

As discussed in Section 5.1.2, it is assumed that the model structure isspecified by transfer functions (or matrices of transfer functions) accordingto

y(t) = G(p, θ)u(t) + H(p, θ)e(t) (5.19)

when performing frequency domain identification. H(p, θ) is assumed tohave a causal inverse

A linear continuous-time descriptor system with only measurement noise(an output error model),

E(θ)ξ(t) = J(θ)ξ(t) + K1(θ)u(t) (5.20a)y(t) = L(θ)ξ(t) + e(t), (5.20b)

can transformed directly into the form (5.19) under the usual assumptionthat det(sE − J) is not zero for all s. The only difference from the transferfunction of a state-space system is that G(p, θ) may be non-proper here.The transfer functions are

G(p, θ) = L(θ) (pE(θ) − J(θ))−1K1(θ) (5.21a)

H(p, θ) = 1. (5.21b)

When the transfer function has been calculated, all we have to do is toplug it into any identification algorithm for the frequency domain. Bookswhich treat this are e.g., Ljung (1999) and Pintelon and Schoukens (2001).Note that G(p, θ) here easily could be calculated using symbolical software.We can therefore compute G(p, θ) once and for all, and do not have toperform the calculation for each parameter value. One possible selection ofidentification method is to minimize the criterion

VN (θ, ZN ) =

N∑

k=1

|Y (ωk) − G(iωk, θ)U(ωk)|2 (5.22)

with respect to the parameters θ.Estimates of the Fourier transforms of the input and output signals are

needed. As discussed in Section 5.1.2, these could be provided directlyby a special measurement device or estimated from time domain data. Adrawback with identification in the frequency domain is that knowledge ofthe initial values of the internal variables is more difficult to utilize than fortime domain identification.

In the more complex case when the model also has process noise,

E(θ)ξ(t) = J(θ)ξ(t) + K1(θ)u(t) + K2(θ)v1(t) (5.23a)y(t) = L(θ)ξ(t) + v2(t), (5.23b)

Page 78: Parameter Estimation in Linear Descriptor Systems

66 Chapter 5 Parameter Estimation in Linear Descriptor Systems

the noise filter H(p, θ) cannot be calculated in a straightforward manner.One approach to calculate H(p, θ) here is to first transform the descriptorsystem to state space form and then follow the procedure described in Sec-tion 5.1.1. We now in principle need to do the same transformation thatneeds to be done when estimating the parameters in the time domain. Wetherefore do not have the possibility of calculating G(p, θ) and H(p, θ) onceand for all with symbolical software as could be done for the output errorcase.

5.4 Example

In this section the algorithms examined earlier in the chapter are exemplifiedon a physical system. The system setup examined is a model of a generator,see Figure 5.1. Dependence on time is omitted in this section.

It should be stressed that the role of these example is to show how wecan work with DAE-descriptions from Modelica-like modeling environmentsin system identification applications. In this case, despite a singular E-matrix, the model will be reduced to a standard state-space description bythe transformation mechanisms described previously. The properties of theactual estimates obtained will thus follow from well known techniques andresults, and we will therefore not discuss accuracy aspects of the estimatedmodels.

M1 M2

φ, ω

φin J

L R1

R2u1

u2 u3

u4

I

+ +

++

−−

−−

k

Figure 5.1 A model of a generator.

The model can be described as follows: The input is the angle φin onthe left axis. This axis is connected to a rotating mass with inertia J whichis rotated the angle φ and rotates with the angular velocity ω. The torqueacting on the left side of the mass is M1 and the torque on the right side is

Page 79: Parameter Estimation in Linear Descriptor Systems

5.4 Example 67

M2. The mass is then connected to a second axis which is connected to theactual generator. The variables describing the second axis and the electricalquantities are then assumed to depend on each other according to M2 = kI

and u1 = kω for some constant k. The rest of the electrical circuit consistsof two resistors and one inductor. The measured output is the voltage u4.

Note that here it is easy to spot that too many variables are includedin the model if the torques are not interesting, as the mass then will haveno impact on the system. This is to show that the identification algorithmtreats this automatically. (In larger models it may be more difficult to spotthat some variables are unnecessary.)

The system is now modeled in an object-oriented manner by just writingdown the equations describing the different parts and the ways they areconnected:

φin = φ φ = ω Jω = M1 + M2

M2 = kI u1 = kω u2 = LI

u3 = R1I u4 = R2I u1 = u2 + u3 + u4

To apply the identification methods, these equations must be written indescriptor form:

0 0 0 0 0 0 0 0 00 0 0 1 0 0 0 0 00 0 J 0 0 0 0 0 00 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 00 0 0 0 L 0 0 0 00 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0

M1

M2

ω

φ

I

u1

u2

u3

u4

=

0 0 0 −1 0 0 0 0 00 0 1 0 0 0 0 0 01 1 0 0 0 0 0 0 00 −1 0 0 k 0 0 0 00 0 k 0 0 −1 0 0 00 0 0 0 0 0 1 0 00 0 0 0 R1 0 0 −1 00 0 0 0 R2 0 0 0 −10 0 0 0 0 −1 1 1 1

M1

M2

ω

φ

I

u1

u2

u3

u4

+

100000000

φin (5.24a)

Page 80: Parameter Estimation in Linear Descriptor Systems

68 Chapter 5 Parameter Estimation in Linear Descriptor Systems

y =[0 0 0 0 0 0 0 0 1

]

M1

M2

ω

φ

I

u1

u2

u3

u4

(5.24b)

The values R1 and R2 are assumed to be known (measured beforehand),so the values of L and k are to be estimated. (The value of J will not affectthe output). Below the unknown parameters are identified both with timeand frequency domain methods. To generate input and output data foridentification experiments, the model was simulated in the MathModelicaadd-on (Fritzson, 2004) for Mathematica (Wolfram, 1999).

5.4.1 Implementation

In this section it is shown how the identification method suggested in Sec-tion 5.1.1 can be implemented.

To use, e.g., the System Identifictaion Toolbox for Matlab,(Ljung, 2000), the model was put into the idgrey object format. Thismeans that an m-file must be written, which, for each parameter vectorproduces the A,B,C,D,K, x0. This means that this m-file will call thetransformation routines described in Chapter 3, which include calls to toolsform LAPACK. The model object is created by

mi = idgrey(’generator_id’,[10 -10],’c’,[],0, ...

’DisturbanceModel’,’None’);

and the model-defining m-file generator_id has the format

function [A,B,C,D,K,X0]=generator_id(pars,Tsm,Auxarg)

%Model of generator

global Es Fs Gs Ms

%Symbolic matrices into which the following

%numerical values are inserted:

%Known:

J=1;

k=-1;

R=1;

Page 81: Parameter Estimation in Linear Descriptor Systems

5.5 Discrete-Time Descriptor Systems 69

RL=1;

%To be estimated

L=pars(1);

k=pars(2);

E=subs(Es);%Inserting the values

F=subs(Fs);

G=subs(Gs);

H=subs(Ms);

[A,B,C,D,m]=normform3(E,F,G,H); % Call to Lapack routine

K=0; %First order model

X0=0;

In this case a well defined state-space model is generated for all parametervalues except L = 0, so the estimation command

m = pem(data,mi)

will work in a familiar fashion.

5.5 Discrete-Time Descriptor Systems

5.5.1 Time Domain Identification

As discussed in Section 2.6.3 and 4.5, a discrete-time descriptor system canbe transformed into a discrete-time state-space system. We can therefore usethe prediction error method or maximum likelihood method as described inSection 5.1.1. However, as in the continuous-time case, we are faced with thechoice of either calculating the state-space description symbolically, or doingit numerically. The approach suggested here is to compute it numericallyfor each parameter value that a state-space description is necessary, sinceit is discussed in the previous chapters how this transformation can beperformed.

Consider for example the case when we wish to estimate the parameters by minimizing the prediction error criterion

\[
V_N(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{2}\varepsilon^2(t, \theta)
\qquad (5.25)
\]


using a Gauss-Newton search. As in the continuous-time case, for each parameter value θ that the Gauss-Newton algorithm needs, we compute a state-space description using the methods in Chapter 3 and then calculate the prediction errors ε(t, θ).
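To make this concrete, the following is a minimal sketch, not the thesis implementation, of how the criterion (5.25) could be evaluated for one parameter value in the output error case. Here descriptor2ss is a hypothetical routine standing in for the numerical transformation of Chapter 3, and u, y are the sampled input and output sequences.

function V = pe_criterion(theta,u,y)
% Evaluate the prediction error criterion (5.25) for one parameter value.
% descriptor2ss is a hypothetical routine returning a discrete-time
% state-space realization of the descriptor model for this theta.
[A,B,C,D] = descriptor2ss(theta);
n = size(A,1);
N = length(y);
x = zeros(n,1);                    % zero initial state
V = 0;
for t = 1:N
    e = y(t) - (C*x + D*u(t));     % prediction error (output error predictor)
    V = V + e^2/(2*N);             % accumulate (1/N) * sum of (1/2) e^2
    x = A*x + B*u(t);              % propagate the state
end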

5.5.2 Frequency Domain Identification

Analogously to the continuous-time case, frequency domain identification is a way to avoid having to transform the descriptor system into state-space form. For frequency domain identification in discrete time, it is assumed that the system is described by

y(t) = G(q, θ)u(t) + H(q, θ)e(t), (5.26)

as discussed in Section 5.1.2. H(q, θ) is assumed to have a causal inverse. A linear discrete-time descriptor system with an output error noise model,

E(θ)ξ(t + 1) = J(θ)ξ(t) + K1(θ)u(t)   (5.27a)
y(t) = L(θ)ξ(t) + e(t),               (5.27b)

has the transfer functions

G(q, θ) = L(θ)(qE(θ) − J(θ))⁻¹K1(θ)   (5.28a)

H(q, θ) = 1. (5.28b)

We can here plug G(q, θ) directly into a criterion like

\[
V_N(\theta, Z^N) = \sum_{k=1}^{N} \left| Y(\omega_k) - G(e^{i\omega_k}, \theta) U(\omega_k) \right|^2.
\qquad (5.29)
\]
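As an illustration, here is a minimal sketch of evaluating (5.29) for a single-input single-output system. Etheta, Jtheta, K1theta and Ltheta are hypothetical functions returning the matrices of (5.27) for a given θ, and Y, U, w hold the frequency domain data.

function V = fd_criterion(theta,Y,U,w)
% Evaluate the frequency domain criterion (5.29) for one parameter value.
% Etheta, Jtheta, K1theta, Ltheta are hypothetical functions returning the
% matrices of the descriptor model (5.27) for the given theta.
V = 0;
for kk = 1:length(w)
    z  = exp(1i*w(kk));                                            % evaluation point e^{i w_k}
    Gk = Ltheta(theta) * ((z*Etheta(theta) - Jtheta(theta)) \ K1theta(theta));
    V  = V + abs(Y(kk) - Gk*U(kk))^2;                              % add |Y - G U|^2
end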

As in the continuous-time case, the situation is more complicated if we have a full noise model as in

E(θ)ξ(t + 1) = J(θ)ξ(t) + K1(θ)u(t) + K2(θ)v1(t)   (5.30a)
y(t) = L(θ)ξ(t) + v2(t).                            (5.30b)

Also here, the simplest way to calculate H(q, θ) is probably to go via a state-space description. Consequently, not much is gained compared to using a time-domain method.


5.6 Conclusions

After an introduction on system identification, we described how descriptor systems fit into the system identification framework. Both time domain and frequency domain methods were examined. For the time domain case, a transformation of the system must be made for each parameter value that the minimization algorithm needs. For the frequency domain case, this calculation can be avoided for output error models, but here it is instead required that frequency domain data is available.


6

Initialization of Parameter Estimates

Since descriptor equations can be formed by simply writing down basic physical relations, the matrix elements are often physical parameters or known constants. This special structure is not used by the parameter estimation methods discussed in the earlier chapters. In this chapter we will discuss how to utilize the structure that models on descriptor form often have to initialize the parameter search.

6.1 Introduction

The parameter estimation methods discussed in the earlier chapters have in common that they construct a criterion function V(θ) that should be minimized to estimate the unknown parameters. For the physically parameterized model structures discussed in this thesis, V(θ) is a complex function of the parameters θ. This means that the criterion function in general cannot be minimized analytically. Instead, we have to resort to numerical search methods as discussed by Ljung (1999). Such methods only guarantee convergence to a local minimum, and experience shows that it can be difficult to find the global minimum of V(θ) for physically parameterized model structures. One remedy is to use physical insight when selecting initial values for the numerical search, and another is to do several searches with different starting values. Although these remedies can work well in many cases, there is still no guarantee that the global optimum is found. In this chapter we will therefore discuss how initial parameter values for the numerical search can be chosen by minimization of a polynomial. To illustrate this, consider the following example:

Example 6.1 (Initialization through transfer function coefficients)
Consider a body with mass m to which a force F(t) is applied. The motion of the body is damped by friction with damping coefficient k. If x(t) is the position of the body, the equation for the motion of the body is mẍ(t) = F(t) − kẋ(t). The position is the measured output of the model. With v(t) denoting the velocity of the body, this can be written in descriptor form as

\[
\begin{bmatrix} 1 & 0 \\ 0 & m \end{bmatrix}
\begin{bmatrix} \dot{x}(t) \\ \dot{v}(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 0 & -k \end{bmatrix}
\begin{bmatrix} x(t) \\ v(t) \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \end{bmatrix} F(t)
\qquad (6.1a)
\]
\[
y =
\begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} x(t) \\ v(t) \end{bmatrix}.
\qquad (6.1b)
\]

The transfer function for this system is

\[
G(s, \theta) =
\begin{bmatrix} 1 & 0 \end{bmatrix}
\left(
s \begin{bmatrix} 1 & 0 \\ 0 & m \end{bmatrix}
-
\begin{bmatrix} 0 & 1 \\ 0 & -k \end{bmatrix}
\right)^{-1}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \frac{1}{ms^2 + ks}.
\qquad (6.2)
\]

If a black-box estimation procedure has given the transfer function

\[
G(s) = \frac{1}{2s^2 + 3s + 0.01}
\qquad (6.3)
\]

a polynomial which measures the difference of the transfer function coefficients is

p(θ) = (m − 2)² + (k − 3)² + 0.01².   (6.4)

This polynomial is minimized by m = 2 and k = 3.
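For reference, the transfer function (6.2) can also be obtained symbolically from the descriptor matrices. The following is a small sketch assuming the Symbolic Math Toolbox; it is not needed for this example, but it shows how the coefficient comparison can be mechanized.

% Symbolic derivation of (6.2) for Example 6.1 (illustrative sketch).
syms s m k
E = [1 0; 0 m];  J = [0 1; 0 -k];  K = [0; 1];  L = [1 0];
G = simplify(L*inv(s*E - J)*K)   % should simplify to 1/(m*s^2 + k*s)
% Comparing the denominator coefficients with those of 2s^2 + 3s + 0.01
% term by term gives the polynomial (6.4).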

As shown in the example, we assume that a black-box model of the system has been estimated beforehand, for example by a subspace method (Ljung, 1999). The polynomial is then formed as a measure of the “distance” between the black-box model and the physically parameterized model. Although the measure is formed in an ad hoc manner, it should in many cases give a better initial value than a pure guess. However, we will have no guarantees for the quality of the initial values selected. Therefore the results should be compared to the results for initial values selected from physical insight, or for randomly selected initial values.

We saw in Example 6.1 that if the black-box model and the physically parameterized model both are in transfer function form, one straightforward way to get initial values for the parameter search is to try to make the coefficients of the numerator and denominator polynomials as equal as possible. Note that linear state-space and linear descriptor models easily can be converted to transfer functions, as discussed earlier.

Although the polynomial p(θ) in Example 6.1 was trivial to minimize, one can note that p(θ) can be a high order polynomial with as many variables as there are unknown parameters. In some cases it could be preferable to have a polynomial with a lower degree, but with a higher number of variables. For parameterized state-space systems, Parrilo and Ljung (2003) discuss a method to find a polynomial which is biquadratic in its variables (this work is based on the paper Xie and Ljung, 2002). This method requires that the elements of the state-space matrices are unknown parameters or constants. It is also proposed that the polynomial could be minimized by sum of squares optimization. The price of getting a biquadratic polynomial to minimize is that more variables than the unknown parameters must be included in the polynomial.

The requirement that the elements of the state-space matrices should be unknown physical parameters or known constants can be rather strict. Since one usually needs to make different transformations to get a state-space description, the elements of the matrices are usually functions of the unknown physical parameters. It is much more likely that the elements of the matrices of a linear descriptor system are unknown parameters or constants, since basic physical equations often are simple integrators and static relations. By applying the technique in Parrilo and Ljung (2003) to descriptor systems, we can therefore utilize the structure that often is present in descriptor systems. This is what this chapter is about. We will also discuss sum of squares optimization (Parrilo, 2000; Prajna et al., 2002), which in some cases can be used to find the global minimum of a polynomial.

That descriptor systems often have nice structure is also motivated by Example 6.2 below.

Example 6.2 Consider the system in Example 6.1:

\[
\begin{bmatrix} 1 & 0 \\ 0 & m \end{bmatrix}
\begin{bmatrix} \dot{x}(t) \\ \dot{v}(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 0 & -k \end{bmatrix}
\begin{bmatrix} x(t) \\ v(t) \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \end{bmatrix} F(t)
\qquad (6.5)
\]


In descriptor form, the elements of the matrices are clearly known or physical parameters. This is however not the case if the system is written in state-space form:

\[
\begin{bmatrix} \dot{x}(t) \\ \dot{v}(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 0 & -\frac{k}{m} \end{bmatrix}
\begin{bmatrix} x(t) \\ v(t) \end{bmatrix}
+
\begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} F(t)
\qquad (6.6)
\]

6.2 Transforming the Problem

In this section, we describe how the problem of finding initial values for the parameters to be estimated can be posed as the minimization of a biquadratic polynomial. The transformation is based on the assumption that we have a consistently estimated black-box model in state-space form,

ẋ(t) = A0x(t) + B0u(t)   (6.7a)
y(t) = C0x(t),           (6.7b)

which could have been estimated using for example a subspace method, see Ljung (1999). The idea is then that there should exist a transformation between the parameterized descriptor model and the black-box model for the optimal parameter values. Because of modeling errors and noise, there will however not in general exist an exact transformation between the systems, and we therefore choose to minimize a norm which measures the difference between the two systems as a function of the parameters.

As the transformations are simplified considerably in the special case when E(θ) is invertible, this case is discussed separately in Section 6.2.1. The general case is discussed in Section 6.2.2.

6.2.1 The Case of Invertible E(θ)

Consider the descriptor system

E(θ)ξ̇(t) = J(θ)ξ(t) + K(θ)u(t)   (6.8a)
y(t) = L(θ)ξ(t)                   (6.8b)

and let E(θ) be invertible. Lemma 2.3 gives that a transformation

PE(θ)QQ⁻¹ξ̇(t) = PJ(θ)QQ⁻¹ξ(t) + PK(θ)u(t)   (6.9a)
y(t) = L(θ)QQ⁻¹ξ(t)                           (6.9b)


with invertible P and Q results in a state-space description,

ẋ(t) = PJ(θ)Qx(t) + PK(θ)u(t)   (6.10a)
y(t) = L(θ)Qx(t)                (6.10b)
ξ(t) = Qx(t).                   (6.10c)

It is clear that it is possible to achieve all state-space descriptions that are equivalent to (6.8) in this way by including a further similarity transformation of the state-space description in P and Q.

If we now have a consistent estimate of the system in the form (6.7), we want to find parameter values θ that make the input-output behavior of (6.7) and (6.8) as equal as possible. If it were possible to make them exactly equal, there would be matrices P and Q and parameter values θ such that

PE(θ)Q = I (6.11a)

PJ(θ)Q = A0 (6.11b)

PK(θ) = B0 (6.11c)

L(θ)Q = C0 (6.11d)

which also can be written as

PE(θ) = Q⁻¹      (6.12a)
PJ(θ) = A0Q⁻¹    (6.12b)
PK(θ) = B0       (6.12c)
L(θ) = C0Q⁻¹.    (6.12d)

As there will always be some noise and modeling errors, we cannot expect these equations to hold exactly. Therefore we form a polynomial that measures how well these equations are satisfied:

\[
\begin{aligned}
p_1(\theta, P, Q^{-1}) = {}& \|PE(\theta) - Q^{-1}\|_F^2
+ \|PJ(\theta) - A_0 Q^{-1}\|_F^2 \\
& + \|PK(\theta) - B_0\|_F^2
+ \|L(\theta) - C_0 Q^{-1}\|_F^2
\end{aligned}
\qquad (6.13)
\]

Here ‖ · ‖²_F denotes the squared Frobenius norm, i.e., the sum of all squared matrix elements. This polynomial is always biquadratic in the unknown parameters θ and the elements of the matrices P and Q⁻¹, if the elements of the descriptor matrices are constants or unknown parameters. When the polynomial is formed as in Example 6.1, the polynomial is not guaranteed to be biquadratic, but could have higher degree. The method in this section consequently guarantees that the polynomial to be minimized is biquadratic, at the price of a higher number of variables. If minimization of (6.13) does not give good results, one may instead try to minimize

\[
\begin{aligned}
p_2(\theta, P^{-1}, Q) = {}& \|E(\theta)Q - P^{-1}\|_F^2
+ \|J(\theta)Q - P^{-1}A_0\|_F^2 \\
& + \|K(\theta) - P^{-1}B_0\|_F^2
+ \|L(\theta)Q - C_0\|_F^2.
\end{aligned}
\qquad (6.14)
\]

This polynomial is biquadratic in the unknown parameters θ and the elements of the matrices P⁻¹ and Q if the elements of the descriptor matrices are constants or unknown parameters. It also measures how well (6.11) is satisfied.
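As an illustration, a minimal sketch of evaluating p1 in (6.13) numerically is given below. Etheta, Jtheta, Ktheta and Ltheta are hypothetical functions returning the descriptor matrices for a given θ, A0, B0, C0 are the black-box matrices, and Qi plays the role of Q⁻¹. A general-purpose local search such as fminsearch applied to the stacked variables [θ; P(:); Qi(:)] is one simple, if only local, way to minimize it; global alternatives are discussed in Section 6.3.

function val = p1_value(theta,P,Qi,A0,B0,C0)
% Evaluate the biquadratic polynomial p1 in (6.13).
% Etheta, Jtheta, Ktheta, Ltheta are hypothetical functions returning
% E(theta), J(theta), K(theta) and L(theta) of the descriptor model (6.8).
val = norm(P*Etheta(theta) - Qi,    'fro')^2 ...
    + norm(P*Jtheta(theta) - A0*Qi, 'fro')^2 ...
    + norm(P*Ktheta(theta) - B0,    'fro')^2 ...
    + norm(Ltheta(theta)   - C0*Qi, 'fro')^2;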

6.2.2 The Case of Non-Invertible E(θ)

In the case when E(θ) is not invertible, it is still possible to formulate a polynomial that can give good initial values for the parameter search when minimized. However, in this more complex case, it cannot in general be guaranteed that the polynomial will be biquadratic in the unknown variables. Therefore we will also discuss additional assumptions to achieve this.

As the output of linear continuous-time descriptor systems can depend on derivatives of the input, we must assume that the estimated black-box model of the system is in the form

ẋ(t) = A0x(t) + B0u(t)                          (6.15a)
y(t) = C0x(t) + ∑_{k=0}^{m−1} D0k u^(k)(t).     (6.15b)

Furthermore, we know from Lemma 2.3 that for each selection of parameter values θ there exists a transformation

PE(θ)QQ⁻¹ξ̇(t) = PJ(θ)QQ⁻¹ξ(t) + PK(θ)u(t)   (6.16a)
y(t) = L(θ)QQ⁻¹ξ(t)                           (6.16b)


that gives the system

\[
\begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix}
\begin{bmatrix} \dot{w}_1(t) \\ \dot{w}_2(t) \end{bmatrix}
=
\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}
+
\begin{bmatrix} B \\ D \end{bmatrix} u(t)
\qquad (6.17a)
\]
\[
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)
\qquad (6.17b)
\]
\[
y(t) = L(\theta)Q
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.
\qquad (6.17c)
\]

According to Theorem 2.1 this can be further transformed into the form

\[
\dot{w}_1(t) = A w_1(t) + B u(t)
\qquad (6.18a)
\]
\[
w_2(t) = -D u(t) - \sum_{i=1}^{m-1} N^i D u^{(i)}(t)
\qquad (6.18b)
\]
\[
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} = Q^{-1}\xi(t)
\qquad (6.18c)
\]
\[
y(t) = L(\theta)Q
\begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix}.
\qquad (6.18d)
\]

We now want to find parameter values θ and transformation matrices P and Q such that the models (6.15) and (6.18) have the same input-output behavior. From (6.15)–(6.18), we see that this is the case if the following equations are satisfied.

\[
PE(\theta)Q = \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix}
\qquad (6.19a)
\]
\[
PJ(\theta)Q = \begin{bmatrix} A_0 & 0 \\ 0 & I \end{bmatrix}
\qquad (6.19b)
\]
\[
PK(\theta) = \begin{bmatrix} B_0 \\ D \end{bmatrix}
\qquad (6.19c)
\]
\[
L(\theta)Q = \begin{bmatrix} C_0 & C_2 \end{bmatrix}
\qquad (6.19d)
\]
\[
D_{00} = -C_2 D
\qquad (6.19e)
\]
\[
D_{0k} = -C_2 N^k D, \quad k = 1, \ldots, m-1
\qquad (6.19f)
\]
\[
N^m = 0
\qquad (6.19g)
\]

Here we introduced the matrix C2 to simplify the notation. Equation (6.19g) guarantees that N is nilpotent. This can also be achieved by for example parameterizing N as an upper triangular matrix with zero diagonal elements, but then extra care would have to be taken to guarantee that N is nilpotent of the correct order. A polynomial that measures how well these equations are satisfied can now be formed:

\[
\begin{aligned}
p_3(\theta, P, Q^{-1}, N, D, C_2) = {}& \left\| PE(\theta) - \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix} Q^{-1} \right\|_F^2
+ \left\| PJ(\theta) - \begin{bmatrix} A_0 & 0 \\ 0 & I \end{bmatrix} Q^{-1} \right\|_F^2 \\
& + \left\| PK(\theta) - \begin{bmatrix} B_0 \\ D \end{bmatrix} \right\|_F^2
+ \left\| L(\theta) - \begin{bmatrix} C_0 & C_2 \end{bmatrix} Q^{-1} \right\|_F^2 \\
& + \|D_{00} + C_2 D\|_F^2
+ \sum_{k=1}^{m-1} \|D_{0k} + C_2 N^k D\|_F^2
+ \|N^m\|_F^2
\end{aligned}
\qquad (6.20)
\]

This polynomial can unfortunately not be guaranteed to be biquadratic in its variables, even if the elements of the descriptor matrices are constants or unknown parameters. However, if the true system has

D0k = 0, k = 0 . . . m − 1 (6.21)

and the descriptor model is such that

C2D = 0                             (6.22a)
C2N^kD = 0,  k = 1, . . . , m − 1   (6.22b)
N^m = 0                             (6.22c)

then (6.20) simplifies to

\[
\begin{aligned}
p_4(\theta, P, Q^{-1}, N, D, C_2) = {}& \left\| PE(\theta) - \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix} Q^{-1} \right\|_F^2
+ \left\| PJ(\theta) - \begin{bmatrix} A_0 & 0 \\ 0 & I \end{bmatrix} Q^{-1} \right\|_F^2 \\
& + \left\| PK(\theta) - \begin{bmatrix} B_0 \\ D \end{bmatrix} \right\|_F^2
+ \left\| L(\theta) - \begin{bmatrix} C_0 & C_2 \end{bmatrix} Q^{-1} \right\|_F^2.
\end{aligned}
\qquad (6.23)
\]


This polynomial is biquadratic in its variables (the elements of θ and the unknown matrices).

The relation (6.21) can in many cases be physically motivated, since it is common that the output of physical systems does not depend directly on the input or its derivatives. If this is the case, the descriptor matrices should be parameterized so that (6.22) holds for all or almost all parameter values. Note that it always can be tested afterwards if (6.22) is fulfilled. This is simply done by testing if C2D = 0, if C2N^kD = 0, and if N^m = 0.
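A small sketch of such an a posteriori check, given numerical values of C2, N, D and the nilpotency index m (a tolerance is used since the quantities are computed numerically):

% Check the conditions (6.22) after the minimization (illustrative only).
tol = 1e-8;
ok  = norm(C2*D,'fro') < tol && norm(N^m,'fro') < tol;
for k = 1:m-1
    ok = ok && norm(C2*N^k*D,'fro') < tol;
end
% ok == true indicates that the simpler polynomial p4 in (6.23) was justified.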

6.3 Sum of Squares Optimization

The polynomials that are formed in this chapter could be minimized by any method that gives the global minimum. One family of methods that could be used are algebraic methods, such as Gröbner bases. Here we will discuss another method which relaxes the minimization to a sum of squares problem, as described by, e.g., Parrilo (2000) and Prajna et al. (2002). To describe this procedure, we first need to note that the problem

min_θ  p(θ)    (6.24)

also can be written as

max_{θ,λ}  λ                  (6.25a)
subject to  p(θ) − λ ≥ 0.     (6.25b)

Now, since a sum of squared real polynomials f_i(θ, λ) always is greater than or equal to zero, a relaxation of (6.25) is

max_{θ,λ}  λ                                 (6.26a)
subject to  p(θ) − λ = ∑_i f_i²(θ, λ).       (6.26b)

As described in the references, the relaxed problem always gives a lower bound on the optimal value, and for many problems this lower bound is also tight. The relaxed problem can be solved using semidefinite programming as described by Prajna et al. (2002). The algorithms for finding the lower bound also often find variable values that attain this lower bound, and in this case we of course have the actual optimum.
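A minimal sketch of the relaxation (6.26), following the pattern in the SOSTOOLS user's guide, is given below. It assumes that the polynomial p has been formed symbolically in the variables collected in vartable; gam plays the role of λ.

% Lower bound on min p via a sum of squares relaxation (sketch).
syms gam
prog = sosprogram(vartable, gam);    % gam is the decision variable lambda
prog = sosineq(prog, p - gam);       % require p - gam to be a sum of squares
prog = sossetobj(prog, -gam);        % maximize gam (sossetobj minimizes)
prog = sossolve(prog);
lower_bound = sosgetsol(prog, gam);  % the bound in (6.26)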

The reason that the algorithm gives a lower bound, which is not guaranteed to be the actual optimum, is that non-negativity of a polynomial is not equivalent to the polynomial being a sum of squares. However, in the following cases non-negativity and the existence of a sum of squares decomposition are equivalent (Prajna et al., 2002):

• Univariate polynomials of any (even) degree.

• Quadratic polynomials in any number of variables.

• Quartic polynomials in two variables.

Unfortunately, the polynomials we have formed are biquadratic, so we are not guaranteed to find the minimum. If the optimal value is zero we will anyway have equivalence between non-negativity and the existence of a sum of squares decomposition, since (6.13), (6.14), (6.20), or (6.23) then are themselves suitable sum of squares decompositions for λ = 0. We will have this case if there exists a parameter value that makes the input-output behavior of the descriptor system and the black-box model exactly equal.

6.4 Discrete-Time Descriptor Systems

The discussion in Section 6.2 is valid also for the discrete-time case. The only difference is that we have a discrete-time descriptor system

E(θ)ξ(t + 1) = J(θ)ξ(t) + K(θ)u(t)   (6.27a)
y(t) = L(θ)ξ(t)                      (6.27b)

for which we need to find initial values for parameter estimation. In the case when E(θ) is invertible, we assume that a consistently estimated black-box model

x(t + 1) = A0x(t) + B0u(t)   (6.28a)
y(t) = C0x(t)                (6.28b)

is available. The polynomials p1 and p2 in (6.13) and (6.14) can then be minimized to find initial parameter values. In the case where E(θ) is not invertible, we instead assume that a black-box model according to

x(t + 1) = A0x(t) + B0u(t)                      (6.29a)
y(t) = C0x(t) + ∑_{k=0}^{m−1} D0k u(t + k)      (6.29b)


is available. The polynomial p3 in (6.20) can then be used to find initial values. If the assumptions (6.21) and (6.22) are fulfilled, the simpler polynomial p4 in (6.23) can be used instead.

6.5 Conclusions

We noted that the standard system identification problem often is a minimization with many local minima. As the minimization problem normally is solved using a standard optimization method, it is important to have good initial values for the parameters that are to be estimated. We noted that a polynomial which measures the difference between the coefficients of transfer functions can be formed. If this polynomial is minimized, it should give good initial values. However, this polynomial can have a high degree, so we examined how a polynomial which is biquadratic can be formed. This polynomial also gives an initial guess for the parameters if it is minimized, but has more unknown variables. To guarantee that this polynomial is biquadratic, we used the special structure that often is present in linear descriptor systems.


7

Conclusions

The aim of this work is to examine how unknown parameters in linear descriptor systems can be estimated. An important reason for posing this question is that object-oriented modeling tools like Modelica generate models of this form. After the discussion in the thesis we in principle know how the estimation could be performed: the descriptor system is transformed into state-space form or into the frequency domain so that standard identification methods can be used. However, in practice there is still work to be done. A goal is that a user of an object-oriented modeling tool should be able to estimate unknown parameters in a model with the press of a button, after providing the necessary data files. To make this possible, identification software must be linked to or included in the modeling tool. To implement this, we should use the method from Chapter 3 for computation of the canonical forms. Another aspect is the noise modeling. A user might very well have insight on where disturbances affect the system. This could then be marked in the model. The modeling tool should here use the results from Chapter 4 to show the user where disturbances can be added. The noise model could then be used both for parameter estimation and for construction of a Kalman filter for estimation of the internal variables.


Bibliography

Anderson, B. and Moore, J. (1979). Optimal Filtering. Information and System Sciences Series. Prentice-Hall, Englewood Cliffs, N.J.

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Croz, J. D., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. (1999). LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, 3. edition.

Åström, K. (1970). Introduction to Stochastic Control Theory. Mathematics in Science and Engineering. Academic Press, New York and London.

Åström, K. and Wittenmark, B. (1984). Computer Controlled Systems, Theory and Design. Information and System Sciences Series. Prentice-Hall, Englewood Cliffs, N.J.

Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H. (2000). Templates for the Solution of Algebraic Eigenvalue Problems, A Practical Guide. SIAM, Philadelphia.

Bender, D. and Laub, A. (1987). The linear-quadratic optimal regulator for descriptor systems. IEEE Transactions on Automatic Control, AC-32(8):672–688.


Brenan, K., Campbell, S., and Petzold, L. (1996). Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. Classics in Applied Mathematics. SIAM, Philadelphia.

Chisci, L. and Zappa, G. (1992). Square-root Kalman filtering of descriptor systems. Systems & Control Letters, 19(4):325–334.

Cobb, D. (1984). Controllability, observability, and duality in singular systems. IEEE Transactions on Automatic Control, AC-29(12):1076–1082.

Dai, L. (1987). State estimation schemes for singular systems. In Preprints of the 10th IFAC World Congress, Munich, Germany, volume 9, pages 211–215.

Dai, L. (1989a). Filtering and LQG problems for discrete-time stochastic singular systems. IEEE Transactions on Automatic Control, 34(10):1105–1108.

Dai, L. (1989b). Singular Control Systems. Lecture Notes in Control and Information Sciences. Springer-Verlag, Berlin, New York.

Darouach, M., Boutayeb, M., and Zasadzinski, M. (1997). Kalman filtering for continuous descriptor systems. In Proceedings of the American Control Conference, pages 2108–2112, Albuquerque, New Mexico. AACC.

Darouach, M., Zasadzinski, M., and Mehdi, D. (1993). State estimation of stochastic singular linear systems. International Journal of Systems Science, 24(2):345–354.

Deng, Z. and Liu, Y. (1999). Descriptor Kalman filtering. International Journal of Systems Science, 30(11):1205–1212.

Fritzson, P. (2004). Principles of Object-Oriented Modeling and Simulation with Modelica 2.1. Wiley-IEEE, New York.

Gantmacher, F. (1960). The Theory of Matrices, volume 2. Chelsea Publishing Company, New York.

Gerdin, M., Glad, T., and Ljung, L. (2003). Parameter estimation in linear differential-algebraic equations. In Proceedings of the 13th IFAC symposium on system identification, pages 1530–1535, Rotterdam, the Netherlands.


Germani, A., Manes, C., and Palumbo, P. (2002). Kalman-Bucy filtering for singular stochastic differential systems. In Proceedings of the 15th IFAC World Congress, Barcelona, Spain.

Glad, T. and Ljung, L. (2000). Control Theory, Multivariable and Nonlinear Methods. Taylor and Francis, New York.

Golub, G. and van Loan, C. (1996). Matrix Computations. The Johns Hopkins University Press, Baltimore and London, 3 edition.

Kågström, B. (1994). A perturbation analysis of the generalized Sylvester equation. SIAM Journal on Matrix Analysis and Applications, 15(4):1045–1060.

Kailath, T. (1980). Linear Systems. Information and Systems Sciences Series. Prentice Hall, Englewood Cliffs, N.J.

Kailath, T., Sayed, A., and Hassibi, B. (2000). Linear Estimation. PrenticeHall Information and System Sciences Series. Prentice Hall, Upper SaddleRiver, N.J.

Kronecker, L. (1890). Algebraische reduction der schaaren bilinearer formen.S.-B. Akad. Berlin, pages 763–776.

Kunkel, P. and Mehrmann, V. (1994). Canonical forms for linear differential-algebraic equations with variable coefficients. Journal of Computational and Applied Mathematics, 56(3):225–251.

Lewis, F. (1986). A survey of linear singular systems. Circuits Systems and Signal Processing, 5(1):3–36.

Ljung, L. (1999). System Identification - Theory for the User. Information and System Sciences Series. Prentice Hall PTR, Upper Saddle River, N.J., 2. edition.

Ljung, L. (2000). System Identification Toolbox for use with Matlab - User’s Guide, version 5. MathWorks, Natick, Mass.

Luenberger, D. (1978). Time-invariant descriptor systems. Automatica,14:473–480.

Nikoukhah, R., Campbell, S., and Delebecque, F. (1998). Kalman filtering for general discrete-time LTI systems. In Proceedings of the 37th IEEE Conference on Decision & Control, Tampa, Florida, USA, pages 2886–2891. IEEE.


Nikoukhah, R., Campbell, S., and Delebecque, F. (1999). Kalman filtering for general discrete-time linear systems. IEEE Transactions on Automatic Control, 44(10):1829–1839.

Parrilo, P. (2000). Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, Pasadena, California.

Parrilo, P. and Ljung, L. (2003). Initialization of physical parameter estimates. In Proceedings of the 13th IFAC symposium on system identification, pages 1524–1529, Rotterdam, the Netherlands.

Pintelon, R. and Schoukens, J. (2001). System Identification: A frequency domain approach. IEEE Press, New York.

Polderman, J. and Willems, J. (1998). Introduction to Mathematical Systems Theory: a behavioral approach. Number 26 in Texts in Applied Mathematics. Springer-Verlag, New York.

Prajna, S., Papachristodoulou, A., and Parrilo, P. A. (2002). SOSTOOLS. Sum of squares optimization toolbox for Matlab. User’s guide. Available at http://www.cds.caltech.edu/sostools and http://www.aut.ee.ethz.ch/˜parrilo/sostools.

Rosenbrock, H. H. (1970). State-Space and Multivariable Theory. JohnWiley & Sons, Inc., New York.

Rugh, W. (1996). Linear System Theory. Prentice Hall, Upper Saddle River,N.J.

Schittkowski, K. (2002). Numerical Data Fitting in Dynamical Systems.Kluwer Academic Publishers, Dordrecht.

Schön, T., Gerdin, M., Glad, T., and Gustafsson, F. (2003). A modeling and filtering framework for linear differential-algebraic equations. In Proceedings of the 42nd IEEE Conference on Decision and Control, pages 892–897, Maui, Hawaii, USA.

Sima, V. (1996). Algorithms for linear-quadratic optimization. Dekker, NewYork.

Tiller, M. (2001). Introduction to Physical Modeling with Modelica. Kluwer,Boston, Mass.


Varga, A. (1992). Numerical algorithms and software tools for analysis and modelling of descriptor systems. In Prepr. of 2nd IFAC Workshop on System Structure and Control, Prague, Czechoslovakia, pages 392–395.

Weierstrass, K. (1867). Zur theorie der bilinearen und quadratischen formen. Monatsh. Akad. Wiss., Berlin, pages 310–338.

Wolfram, S. (1999). The Mathematica Book. Cambridge Univ. Press, Cambridge, 4. edition.

Xie, L. and Ljung, L. (2002). Estimate physical parameters by black-box modeling. In Proceedings of the 21st Chinese Control Conference, pages 673–677.


Tekn. lic. Dissertations
Division of Automatic Control and Communication Systems

Linköping University

P. Andersson: Adaptive Forgetting through Multiple Models and Adaptive Control of Car Dynamics. Thesis No. 15, 1983.
B. Wahlberg: On Model Simplification in System Identification. Thesis No. 47, 1985.
A. Isaksson: Identification of Time Varying Systems and Applications of System Identification to Signal Processing. Thesis No. 75, 1986.
G. Malmberg: A Study of Adaptive Control Missiles. Thesis No. 76, 1986.
S. Gunnarsson: On the Mean Square Error of Transfer Function Estimates with Applications to Control. Thesis No. 90, 1986.
M. Viberg: On the Adaptive Array Problem. Thesis No. 117, 1987.
K. Ståhl: On the Frequency Domain Analysis of Nonlinear Systems. Thesis No. 137, 1988.
A. Skeppstedt: Construction of Composite Models from Large Data-Sets. Thesis No. 149, 1988.
P. A. J. Nagy: MaMiS: A Programming Environment for Numeric/Symbolic Data Processing. Thesis No. 153, 1988.
K. Forsman: Applications of Constructive Algebra to Control Problems. Thesis No. 231, 1990.
I. Klein: Planning for a Class of Sequential Control Problems. Thesis No. 234, 1990.
F. Gustafsson: Optimal Segmentation of Linear Regression Parameters. Thesis No. 246, 1990.
H. Hjalmarsson: On Estimation of Model Quality in System Identification. Thesis No. 251, 1990.
S. Andersson: Sensor Array Processing; Application to Mobile Communication Systems and Dimension Reduction. Thesis No. 255, 1990.
K. Wang Chen: Observability and Invertibility of Nonlinear Systems: A Differential Algebraic Approach. Thesis No. 282, 1991.
J. Sjöberg: Regularization Issues in Neural Network Models of Dynamical Systems. Thesis No. 366, 1993.
P. Pucar: Segmentation of Laser Range Radar Images Using Hidden Markov Field Models. Thesis No. 403, 1993.
H. Fortell: Volterra and Algebraic Approaches to the Zero Dynamics. Thesis No. 438, 1994.
T. McKelvey: On State-Space Models in System Identification. Thesis No. 447, 1994.


T. Andersson: Concepts and Algorithms for Non-Linear System Identifiability. Thesis No. 448, 1994.
P. Lindskog: Algorithms and Tools for System Identification Using Prior Knowledge. Thesis No. 456, 1994.
J. Plantin: Algebraic Methods for Verification and Control of Discrete Event Dynamic Systems. Thesis No. 501, 1995.
J. Gunnarsson: On Modeling of Discrete Event Dynamic Systems, Using Symbolic Algebraic Methods. Thesis No. 502, 1995.
A. Ericsson: Fast Power Control to Counteract Rayleigh Fading in Cellular Radio Systems. Thesis No. 527, 1995.
M. Jirstrand: Algebraic Methods for Modeling and Design in Control. Thesis No. 540, 1996.
K. Edström: Simulation of Mode Switching Systems Using Switched Bond Graphs. Thesis No. 586, 1996.
J. Palmqvist: On Integrity Monitoring of Integrated Navigation Systems. Thesis No. 600, 1997.
A. Stenman: Just-in-Time Models with Applications to Dynamical Systems. Thesis No. 601, 1997.
M. Andersson: Experimental Design and Updating of Finite Element Models. Thesis No. 611, 1997.
U. Forssell: Properties and Usage of Closed-Loop Identification Methods. Thesis No. 641, 1997.
M. Larsson: On Modeling and Diagnosis of Discrete Event Dynamic systems. Thesis No. 648, 1997.
N. Bergman: Bayesian Inference in Terrain Navigation. Thesis No. 649, 1997.
V. Einarsson: On Verification of Switched Systems Using Abstractions. Thesis No. 705, 1998.
J. Blom, F. Gunnarsson: Power Control in Cellular Radio Systems. Thesis No. 706, 1998.
P. Spångéus: Hybrid Control using LP and LMI methods – Some Applications. Thesis No. 724, 1998.
M. Norrlöf: On Analysis and Implementation of Iterative Learning Control. Thesis No. 727, 1998.
A. Hagenblad: Aspects of the Identification of Wiener Models. Thesis No. 793, 1999.
F. Tjärnström: Quality Estimation of Approximate Models. Thesis No. 810, 2000.
C. Carlsson: Vehicle Size and Orientation Estimation Using Geometric Fitting. Thesis No. 840, 2000.
J. Löfberg: Linear Model Predictive Control: Stability and Robustness. Thesis No. 866, 2001.


O. Härkegård: Flight Control Design Using Backstepping. Thesis No. 875, 2001.
J. Elbornsson: Equalization of Distortion in A/D Converters. Thesis No. 883, 2001.
J. Roll: Robust Verification and Identification of Piecewise Affine Systems. Thesis No. 899, 2001.
I. Lind: Regressor Selection in System Identification using ANOVA. Thesis No. 921, 2001.
R. Karlsson: Simulation Based Methods for Target Tracking. Thesis No. 930, 2002.
P-J. Nordlund: Sequential Monte Carlo Filters and Integrated Navigation. Thesis No. 945, 2002.
M. Östring: Identification, Diagnosis, and Control of a Flexible Robot Arm. Thesis No. 948, 2002.
C. Olsson: Active Engine Vibration Isolation using Feedback Control. Thesis No. 968, 2002.
J. Jansson: Tracking and Decision Making for Automotive Collision Avoidance. Thesis No. 965, 2002.
N. Persson: Event Based Sampling with Application to Spectral Estimation. Thesis No. 981, 2002.
D. Lindgren: Subspace Selection Techniques for Classification Problems. Thesis No. 995, 2002.
E. Geijer Lundin: Uplink Load in CDMA Cellular Systems. Thesis No. 1045, 2003.
M. Enqvist: Some Results on Linear Models of Nonlinear Systems. Thesis No. 1046, 2003.
T. Schön: On Computational Methods for Nonlinear Estimation. Thesis No. 1047, 2003.
F. Gunnarsson: On Modeling and Control of Network Queue Dynamics. Thesis No. 1048, 2003.
S. Björklund: A Survey and Comparison of Time-Delay Estimation Methods in Linear Systems. Thesis No. 1061, 2003.