
SUBMITTED TO IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS: NOVEMBER 26, 2003

A Survey of Maneuvering Target Tracking, Part V: Multiple-Model Methods

X. Rong Li and Vesselin P. Jilkov

Department of Electrical Engineering, University of New Orleans

New Orleans, LA 70148, USA

504-280-7416 (phone), 504-280-3950 (fax)

Abstract

This is the fifth part of a series of papers that provide a comprehensive survey of techniques for tracking maneuvering targets without addressing the so-called measurement-origin uncertainty. Part I and Part II deal with target motion models. Part III covers measurement models and associated techniques. Part IV is concerned with tracking techniques that are based on decisions regarding target maneuvers. This part surveys the multiple-model methods, that is, the simultaneous use of multiple models (and filters), which has been the prevailing approach to maneuvering target tracking in recent years. The survey is presented in a structured way, centered around three generations of algorithms: autonomous, cooperating, and variable structure. It emphasizes the underpinning of each algorithm and covers various issues in algorithm design, application, and performance.

Index Terms

Target Tracking, Multiple Model, IMM, Survey

CONTENTS

I Introduction

II Hybrid Estimation

III Overview of Multiple-Model Approach
    III-A Basic Idea of MM Approach
    III-B Structures of MM Algorithms
    III-C Three Generations of MM Algorithms
    III-D Optimality Criteria

IV The First Generation: Autonomous MM Estimation
    IV-A Optimal Autonomous MM Estimation
    IV-B Autonomous Operations of Conditional Filtering
    IV-C Output Processing for MM Estimation
    IV-D Convergence of AMM Estimates
    IV-E Tracking Applications

V The Second Generation: Cooperating MM Estimation
    V-A Optimal Cooperating MM Estimation
    V-B Cooperation Strategies for MM Estimation
        V-B.1 GPB and IMM Merging Strategies
        V-B.2 Other Merging-Based Algorithms
        V-B.3 Hypothesis Reduction by Pruning
        V-B.4 Merging vs. Pruning
        V-B.5 Iteration Based Algorithms
    V-C Multiple-Model Smoothing
    V-D Convergence of CMM Estimation Algorithms
    V-E Tracking Applications
        V-E.1 Surveillance for Air Traffic Control
        V-E.2 Defence Applications
        V-E.3 Tracking in Presence of Correlated Noise, Glint, or Multipath

VI The Third Generation: Variable-Structure MM Estimation
    VI-A Theoretical Foundation of VSMM Estimation
    VI-B Model-Set Decision Given Candidate Sets
    VI-C MM Estimation Given Model-Set Sequence
    VI-D VSMM Algorithms
    VI-E Tracking Applications

VII MM Algorithm Design Issues
    VII-A Model-Set Design
    VII-B Determination of Transition Probabilities
    VII-C Various MM Designs and Performance Studies

VIII Nonstatistical Methods

IX Concluding Remarks

References

Research supported by NSF grant ECS-9734285 and NASA/LEQSF grant (2001-4)-01.

I. INTRODUCTION

This is the fifth part of a series of papers that provide a comprehensive survey of techniques for tracking maneuvering targets without addressing the so-called measurement-origin uncertainty. Part I [209] and Part II [205] deal with general target motion models and ballistic target motion models, respectively. Part III [206] covers measurement models, including measurement model-based techniques. Part IV [208] surveys various decision-based methods.

In the absence of measurement-origin uncertainty, maneuvering target tracking faces two interrelated main challenges: target motion uncertainty and nonlinearity. Multiple-model (MM) methods have been generally considered the mainstream approach to maneuvering target tracking under motion (model) uncertainty. This part surveys such methods, that is, methods in which multiple models are used simultaneously for maneuvering target tracking. Nonlinearity is best handled by nonlinear filtering techniques, to be surveyed in subsequent parts. MM methods and nonlinear filters are clearly complementary to each other, and their integration is certainly appealing.

Estimation can be classified as point estimation and density estimation. Density estimation aims at approximating the entire density (distribution) of the estimatee (i.e., the quantity to be estimated, here the target state), while point estimation approximates the estimatee directly. MM methods are applicable to density estimation as well as point estimation. To be more focused, however, this part only handles MM methods for point estimation, leaving those for density estimation to subsequent parts. For the same reason, this part focuses on the interplay of model-based filters, rather than on individual model-based filtering. As such, it may be helpful for the reader to assume that each single-model based filter is a Kalman filter. The more general case of MM nonlinear filtering is covered in subsequent parts.

This survey is meant to be as structured as possible. It emphasizes the underlying ideas, concepts, and assumptions of the methods, rather than particular implementations for specific applications. This should help the reader understand not only how these methods work but also their pros and cons. It is hoped that a distinctive feature of this survey is that it reveals the interrelationships among various methods well. However, the reader should keep in mind that many such statements are based on our personal views and preferences and are not always accurate or unbiased, although a good deal of effort has been made toward this goal. In addition to such discussions, a considerable amount of material included in this survey has not appeared elsewhere, including a significant number of open problems and ideas for further research. Also, more recent results are discussed in greater detail.

Within maneuvering target tracking, MM methods probably have the most extensive literature, as evidenced by the reference list and the length of this paper. Regrettably, many important issues associated with applications of the MM methods, particularly those of implementation and tuning of the MM algorithms for specific applications, cannot be discussed in as much detail as many readers would hope. Our energy and time are limited, and we hope the reader will accept our apology for the omission or oversight of any work that deserves to be mentioned or discussed at greater length. As stated repeatedly in the previous parts, we appreciate receiving comments, corrections, and missing material that should be included in this part. While we may not be able to respond to each input, information received will be considered seriously for the refinement of this part for its final publication in a journal or book.

The rest of the paper is organized as follows. Sec. II formulates the problem of hybrid estimation, which is what the MM methods are good for. Sec. III introduces and provides an overview of the MM approach, including its strengths, structures, underlying criteria, and three generations. The three generations (autonomous, cooperating, and variable-structure MM estimation) are surveyed in Secs. IV, V, and VI, respectively, including their tracking applications. Sec. VII covers MM algorithm design issues, including model-set design and transition probability determination. Sec. VIII is dedicated to nonstatistical MM methods. Concluding remarks are given in the final section.


II. HYBRID ESTIMATION

In simple terms, hybrid estimation is the estimation of a quantity (a parameter or process) that has both continuous and discrete components [192], [321]. It is particularly good for process estimation with structural uncertainty. Target tracking is a hybrid estimation problem involving both continuous and discrete uncertainties [192]. In the prevailing approaches to target tracking, the modeling of the target motion/dynamics and the sensory system is essential. It is customary in these approaches to use the continuous-valued process/plant noise and measurement noise to cover the unknown modeling errors of the system. However, the major challenge of maneuvering target tracking arises from the target motion uncertainty, which is discrete-valued, not to mention the discrete-valued uncertainties in the measurement origins and the number of targets.

The target motion uncertainty exhibits itself in situations where a target may undergo a known or unknown maneuver during an unknown time period. In general, a nonmaneuvering motion and different maneuvers can be described only by different motion models. The use of an incorrect model often leads to unacceptable results.

A primary approach to target tracking in the presence of motion uncertainty is the so-called multiple-model (MM) method, which is one of the most natural approaches to hybrid estimation. While the MM method can be applied to hybrid estimation of a parameter, in this paper we are concerned with its application to estimation of the target state process as a hybrid process. Generally speaking, a hybrid process is one with both continuous and discrete components, such as the state of a hybrid system. A continuous-time (stochastic) hybrid system is described by the following dynamic and measurement equations

$$\dot{x}(t) = f[x(t), s(t), w(t), t]$$

$$z(t) = h[x(t), s(t), v(t), t]$$

along with a law that governs the evolution of $s$, which is often given in probabilistic terms. Here $x$ is called the base state, which (usually) varies continuously, just like the state of a conventional system; $s$ is known as the system mode or modal state, which has a staircase-type trajectory, that is, it may either stay unchanged or jump; $z$ is the measurement; and $w$, $v$ are the process and measurement noise, respectively. We refer to the set of modes as the mode space and denote it by $S$. In simple terms, we say that $x$ is continuous-valued and $s$ is discrete-valued (even though $S$ may be a continuum in some cases). In this sense, the whole state $\xi = [x', s']'$ of a hybrid system is a hybrid process. For such a system, hybrid estimation refers to the problem of estimating $x$, $s$, or $\xi$. This system is a Markov jump system if $s$ is a Markov chain, that is, if (for discrete-time systems) $P\{s_{k+1}^{(j)} \mid s_k^{(i)}\} = p_{ij,k}$, $\forall i, j, k$, where $s_k^{(i)}$ signifies that mode $i$ is in effect at time $k$. Often, $s$ is assumed to be a homogeneous Markov chain, that is, the transition probabilities $p_{ij,k} = p_{ij}$, $\forall i, j$, are not a function of the time index $k$.

One of the simplest discrete-time hybrid systems is the so-called jump-linear system, given by

$$x_{k+1} = F_k(s_{k+1})\, x_k + G_k(s_{k+1})\, w_k(s_{k+1}) \tag{1}$$

$$z_k = H_k(s_k)\, x_k + v_k(s_k) \tag{2}$$

This system is nonlinear because, for example, $x$ or $z$ does not depend on the state $\xi$ of the system in a linear fashion. Were the system modes given, however, the system would be linear. In fact, $s$ may actually jump at unknown time instants, hence the name. This system is known as a Markov jump linear system if $s$ is a Markov chain. If $s$ is unknown but time-invariant, it can be viewed as an unknown parameter of the system, and the system is linear from this perspective.
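To make the roles of the base state and the mode in (1) and (2) concrete, the following is a minimal simulation sketch of a Markov jump linear system. It is only an illustration under stated assumptions: the function name `simulate_jump_linear`, the two-mode position-velocity example, and all numerical values are hypothetical and not taken from the survey; the noises are assumed zero-mean Gaussian and the mode chain homogeneous.

```python
import numpy as np

def simulate_jump_linear(F, G, H, Q, R, Pi, x0, s0, steps, rng):
    """Simulate the jump-linear system (1)-(2) with a homogeneous Markov mode chain.

    F, G, H, Q, R are lists indexed by mode; Pi[i, j] = P{s_{k+1} = j | s_k = i}.
    All names here are illustrative assumptions, not the survey's notation.
    """
    x, s = np.asarray(x0, dtype=float), int(s0)
    modes, states, meas = [], [], []
    for _ in range(steps):
        # The measurement at time k depends on the mode in effect at time k, as in (2).
        v = rng.multivariate_normal(np.zeros(R[s].shape[0]), R[s])
        modes.append(s)
        states.append(x.copy())
        meas.append(H[s] @ x + v)
        # The mode jumps (or stays) first; the base state then evolves under s_{k+1}, as in (1).
        s = int(rng.choice(len(Pi), p=Pi[s]))
        w = rng.multivariate_normal(np.zeros(Q[s].shape[0]), Q[s])
        x = F[s] @ x + G[s] @ w
    return np.array(modes), np.array(states), np.array(meas)

# Hypothetical two-mode example: position-velocity state, sampling period T = 1.
T = 1.0
F = [np.array([[1.0, T], [0.0, 1.0]])] * 2
G = [np.array([[T**2 / 2], [T]])] * 2
H = [np.array([[1.0, 0.0]])] * 2
Q = [np.array([[0.01]]), np.array([[4.0]])]   # mode 0: quiet, mode 1: maneuvering
R = [np.array([[1.0]])] * 2
Pi = np.array([[0.95, 0.05], [0.10, 0.90]])

modes, states, meas = simulate_jump_linear(
    F, G, H, Q, R, Pi, x0=[0.0, 10.0], s0=0, steps=50, rng=np.random.default_rng(0)
)
```

Given the simulated mode sequence `modes`, each step is an ordinary linear-Gaussian update, which is the sense in which the system would be linear if the modes were known.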

A variant of (1), also with first-order dependence on $\{s_k\}$, is $x_{k+1} = F_k(s_k)x_k + G_k(s_k)w_k(s_k)$. More generally, the following second-order $\{s_k\}$-dependence model

$$x_{k+1} = F_k(s_{k+1}, s_k)\, x_k + G_k(s_{k+1}, s_k)\, w_k(s_{k+1}, s_k) \tag{3}$$

was advocated by [57], [59]. It can describe jumps of $x$ that occur simultaneously with and due to jumps of $s$. In other words, this model is capable of characterizing the system's behavior at the instant of a jump in $s$ and thus eliminates the need for introducing such a model explicitly, as is done in most practical designs for MM estimation. Important algorithms for the first-order dependence models can be generalized to this second-order dependence model of greater modeling power. For example, it has been shown [59] that a time-reversed version of (the autonomous part of) (3), $x_{k+1} = F_k(s_{k+1}, s_k)x_k$, has in general a second-order dependence on the time-reversed $\{s_k\}$, even if $F_k(s_{k+1}, s_k)$ is actually $\{s_k\}$-invariant. Such time-reversed models are useful in, for example, the well-known backward filters for smoothing.

III. OVERVIEW OF MULTIPLE-MODEL APPROACH

A. Basic Idea of MM Approach

Conventional solutions to hybrid estimation problems follow the strategy that can be characterized as “estimation after decision,” “decision followed by estimation,” or simply “decision-estimation.” At any time, it first decides on a (best) model and then runs a single filter based on the model as if it were the true one. The decision-based methods for maneuvering target tracking, surveyed in Part IV [208], belong to this class. This approach has several obvious drawbacks. First, possible errors in deciding on the model are not accounted for in the estimation. Second, the decision is made irrevocably before estimation, although estimation results are often beneficial to decision making. Although these drawbacks are well perceived, their remedies are hard to come by within this conventional strategy. For example, accounting for decision errors would require estimation in the presence of an unknown model-truth mismatch, which is very challenging and is still an open problem. Also, traditional model-based estimation cannot be done before the decision since it relies on the use of a single model. One possibility here is to use a model-free (nonparametric) estimation method, which appears to be overkill for maneuvering target tracking, where the true mode, although uncertain, ranges only over a fairly limited set. Rather, the semi-parametric methodology seems more attractive here in general (see, e.g., [33]). Another possible improvement is to have an iterated version with several decision-estimation cycles at each time to take advantage of the estimation results in the decision step. This actually amounts to a degenerate MM approach, and its benefit is probably not commensurate with the increased complexity.

The multiple-model approach gets around the difficulty due to the model uncertainty by using more than one model. Its basic idea is to assume a set $M$ of models as possible candidates of the true mode in effect at the time; run a bank of elemental filters, each based on a unique model in the set; and generate the overall estimates by a process based on the results of these elemental filters. As such, the MM method provides an integrated approach to the joint decision and estimation problem of maneuvering target tracking. It can be classified as a semi-parametric approach. In optimization-theoretic parlance, the MM method has the potential of arriving at a globally optimal solution, which is inherently superior in performance to the two-stage optimization strategy of the conventional approach.

In this survey, we maintain that MM and non-MM estimation methods are separated as follows. At any single time, the latter actually runs only one model-based filter, possibly out of a set of candidates, although the filters used at different times may differ, while the former runs multiple model-based filters at least at some time.

For simplicity, here we describe the MM method only for Markov jump linear systems, for two main reasons: almost all MM algorithms are theoretically valid only for this class of systems, and extensions of our description here to other hybrid systems are straightforward in theory, although this is not necessarily the case for the development of the corresponding MM algorithms. For a Markov jump linear system, the $i$th model in the MM method obeys the following equations:

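$$x_{k+1} = F_k^{(i)} x_k + G_k^{(i)} w_k^{(i)}, \quad E[w_k^{(i)}] = \bar{w}_k^{(i)}, \quad \mathrm{cov}(w_k^{(i)}) = Q_k^{(i)} \tag{4}$$

$$z_k = H_k^{(i)} x_k + v_k^{(i)}, \quad E[v_k^{(i)}] = \bar{v}_k^{(i)}, \quad \mathrm{cov}(v_k^{(i)}) = R_k^{(i)} \tag{5}$$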

where superscript $(i)$ denotes quantities pertinent to model $m^{(i)}$ in $M$, and the jumps, if any, of the system mode are assumed to have the following homogeneous transition probabilities

$$P\{m_{k+1}^{(j)} \mid m_k^{(i)}\} \triangleq P\{s_{k+1} = m^{(j)} \mid s_k = m^{(i)}\} = \pi_{ij}, \quad \forall m^{(i)}, m^{(j)}, k \tag{6}$$

where $m_k^{(i)}$ denotes the event that model $m^{(i)}$ matches the system mode in effect at time $k$:

$$m_k^{(i)} \triangleq \{s_k = m^{(i)}\} \tag{7}$$

Similarly, their finite sequences are denoted as $s^k = \{s_1^{(i_1)}, \ldots, s_k^{(i_k)}\}$ and $m^k = \{m_1^{(i_1)}, \ldots, m_k^{(i_k)}\}$, respectively.
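As a small worked illustration of the transition probabilities in (6) and of model sequences, the sketch below enumerates all sequences of a given length for a two-model set and evaluates their prior probabilities under a homogeneous Markov chain. The transition matrix, the initial probabilities, and the helper name `sequence_probability` are hypothetical choices made only for this example.

```python
import itertools
import numpy as np

def sequence_probability(seq, mu0, Pi):
    """Prior probability of a model sequence (i_1, ..., i_k) under a homogeneous Markov chain.

    mu0[i] is the assumed initial probability of model i; Pi[i, j] plays the role of pi_ij in (6).
    """
    p = mu0[seq[0]]
    for i, j in zip(seq[:-1], seq[1:]):
        p *= Pi[i, j]
    return p

# Hypothetical two-model example.
Pi = np.array([[0.95, 0.05],
               [0.10, 0.90]])
mu0 = np.array([0.5, 0.5])
k = 4

sequences = list(itertools.product(range(len(mu0)), repeat=k))
probs = [sequence_probability(seq, mu0, Pi) for seq in sequences]

# The number of possible sequences is M**k, so it grows exponentially with time k.
num_sequences = len(sequences)
```

The exponential growth in the number of sequences is the reason a filter conditioned on every possible sequence is impractical, and why sequences must be merged, pruned, or otherwise reduced in practice.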

Mode vs. model. The use of the terms mode and model in the literature is chaotic. To be more precise, in this paper a mode refers to a pattern of a phenomenon or a structure of a system, and a model is a mathematical representation or description of the phenomenon pattern (system structure) at a certain accuracy level. It is models, not modes, on which an estimator is based. For example, the behavior pattern of an aircraft during a turn is a mode of turning. Many mathematical models are available at different accuracy levels that describe such a mode [209]. Equivalently, we may think that a mode precisely describes the truth and a model is an approximation of the mode. Such a distinction is necessary whenever the mismatch between the model and the mode is of concern. The model set $M$ differs in general from the mode space $S$ in two aspects: (a) they have different numbers of elements, with $M$ usually having far fewer elements than $S$; and (b) a model is usually a simplified description of a mode. For example, one may use a small set of models, such as a nonmaneuver model plus several constant-turn models, for tracking a target that may undergo various (complex) maneuvers (modes).

B. Structures of MM Algorithms

In general, four key components of MM estimation algorithms can be identified as follows.

• Model-set determination. This includes both offline design and possibly online adaptation of the model set. An MM estimation algorithm distinguishes itself from non-MM estimators by the use of a set of models, instead of a single model. The performance of an MM estimator depends largely on the set of models used. The major task in the application of MM estimation is the design (and possibly adaptation) of the set $M$ of multiple models.

• Cooperation strategy. This refers to all measures taken to deal with the discrete-valued uncertainties within the model set, particularly those for hypotheses about the model sequences. It includes not only pruning of unlikely model sequences, merging of “similar” model sequences, and selection of (most) likely model sequences, but also iterative strategies, such as those based on the EM algorithm.

• Conditional filtering. This is the recursive (or batch) estimation of the continuous-valued components of the hybrid process conditioned on some assumed mode sequence. It is conceptually the same as state estimation of a conventional system with only continuous-valued state.

• Output processing. This is the process that generates overall estimates using results of all filters as well as measurements. It includes fusing/combining estimates from all filters and selecting the best ones.

[Figure 1: block diagram with the blocks Data, Filter 1, Filter 2, Cooperation Strategy, and Output.]

Fig. 1. General structure of MM estimation algorithms (with two model-based filters).

The operation of MM estimation algorithms has a general structure, as depicted in Fig. 1 with only two models. In the figure, the outer loops between the filters and the cooperation strategy represent (possibly multiscan) recursions; the vertical arrows between the filters and the cooperation strategy represent their cooperation/interaction within one recursion. The three components other than conditional filtering are essentially absent in a non-MM algorithm. Output processing is covered in Sec. IV-C. Cooperation strategies are the main topic of Sec. V-B. Design and adaptation of model sets are addressed in detail in Secs. VII-A and VI-B, respectively. In some MM algorithms, conditional filtering and cooperation strategies are tightly coupled and can hardly be separated. Output processing, cooperation strategies, and model-set adaptation, respectively, are the cornerstones of the three generations of MM algorithms developed so far, which are discussed next.

C. Three Generations of MM Algorithms

Three generations of MM algorithms have been identified in [195]. This identification is very beneficial: different generations have fundamental differences in operations, structures, and limitations/capabilities/potentials; the three generations came into existence sequentially; and, more convincingly, later generations do inherit the superior characteristics of the earlier generations. This identification also helps reveal possible directions for further development.

The first generation MM method was pioneered by Magill [232] and Lainiotis (see, e.g., [179], [180], [342]), and widely applied and promoted by Maybeck [239] and others. The second generation, represented unquestionably by Blom's interacting MM (IMM) algorithm [56], [55], [58], has earned an enviable reputation for MM estimation via its significant number of successful applications. Its popularization and further development have been spearheaded by Bar-Shalom (see, e.g., [13], [14], [18], [19], [21]). Its practical value in tracking has been strongly advocated and well demonstrated by Bar-Shalom and others, notably Blom and Blair [15]. The third generation, characterized by its variable structure, is gaining momentum rapidly and is becoming the state of the art of MM estimation. Its initiation [198], [191], [201] and advancement have been led by Li's team (see, e.g., [196], [221], [213], [192], [195]).

There are two types of estimation problems for hybrid systems. The first one involves an unknown (random or nonrandom) but time-invariant mode. This is the case for estimating the state of a system with an unknown model that does not change over time, or with a known model involving an unknown (time-invariant) parameter. In contrast, the mode in the second type may jump from time to time.

The first generation is characterized by the fact that each of its elemental filters operates individually and independently of all the other elemental filters. Its advantage over many non-MM approaches stems from its superior output processing of results from elemental filters to generate the overall estimate. It would be optimal if the true mode were time-invariant but unknown over a set that is identical to the model set used. These MM algorithms have been known under various names. A better name is “autonomous MM” (AMM) algorithms, for several reasons that will become clear later.
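The independence of the elemental filters and the role of output processing can be sketched as follows. This is a minimal illustration, not the survey's own formulation (which is developed in Sec. IV): it assumes linear-Gaussian models with zero-mean noises, takes `Q` to be the process noise covariance already mapped into state space, and uses the commonly presented Bayes-rule update of model probabilities with a probability-weighted fusion of estimates. The names `kf_step` and `amm_step` are hypothetical.

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One Kalman predict/update cycle under a single model.

    Returns the updated mean, covariance, and the Gaussian likelihood of the
    measurement residual (assumption: zero-mean Gaussian noises).
    """
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    resid = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ resid
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    d = len(z)
    lik = np.exp(-0.5 * resid @ np.linalg.solve(S, resid)) / np.sqrt(
        (2 * np.pi) ** d * np.linalg.det(S)
    )
    return x_new, P_new, lik

def amm_step(xs, Ps, mu, z, models):
    """One recursion of an autonomous MM sketch.

    models is a list of (F, Q, H, R) tuples. Each filter uses only its own
    previous estimate; the filters are coupled only through the model
    probabilities mu used in output processing.
    """
    liks = np.empty(len(models))
    for i, (F, Q, H, R) in enumerate(models):
        xs[i], Ps[i], liks[i] = kf_step(xs[i], Ps[i], z, F, Q, H, R)
    mu = mu * liks                  # Bayes rule for a time-invariant mode
    mu = mu / mu.sum()
    # Output processing: probability-weighted fusion of the filter results.
    x_fused = sum(m * x for m, x in zip(mu, xs))
    P_fused = sum(
        m * (P + np.outer(x - x_fused, x - x_fused)) for m, x, P in zip(mu, xs, Ps)
    )
    return xs, Ps, mu, x_fused, P_fused
```

Note that no filter is ever reinitialized with information from another filter; this is what distinguishes this sketch from the cooperating algorithms of the second generation.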

The second generation inherits the first generation's superior output processing, and its elemental filters work together as a team via effective internal cooperation, rather than working independently as in the first generation. The cooperation includes all measures taken to achieve a better performance, such as individualized reconditioning of each filter (e.g., reinitialization as in the IMM algorithm, see Sec. V-B.1), performance enhancement via interactive iterations and competitions among filters (e.g., those based on the EM algorithms, see Sec. V-B.5), joint parameter adaptation (e.g., cohesive online identification of transition probabilities, see Sec. VII-B), and other hypothesis reduction strategies, discussed in Sec. V-B. This generation has the potential of optimal performance when the true mode may jump among members of a set that is identical to the model set used. Many of the algorithms in this generation have previously been called “switching” or “dynamic” MM algorithms, names that are neither accurate nor representative. We refer to the second generation as cooperating MM (CMM) algorithms.
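One form of such cooperation, the reinitialization of each filter, can be sketched as below. The code follows the mixing step of the IMM algorithm as it is commonly presented in the tracking literature; it is an assumption-laden sketch rather than the survey's own derivation (which belongs to Sec. V-B.1), and the function name `imm_mixing` and its argument layout are hypothetical.

```python
import numpy as np

def imm_mixing(xs, Ps, mu, Pi):
    """Reinitialize each elemental filter by mixing the previous estimates.

    xs: (M, n) previous estimates, Ps: (M, n, n) covariances,
    mu: (M,) previous model probabilities, Pi[i, j]: transition probability pi_ij.
    Returns mixed estimates x0 (M, n), mixed covariances P0 (M, n, n),
    and predicted model probabilities c (M,).
    """
    xs = np.asarray(xs, dtype=float)
    Ps = np.asarray(Ps, dtype=float)
    mu = np.asarray(mu, dtype=float)
    c = Pi.T @ mu                          # c_j = sum_i pi_ij * mu_i
    W = (Pi * mu[:, None]) / c[None, :]    # W[i, j] = P{model i at k-1 | model j at k}
    x0 = np.einsum("ij,in->jn", W, xs)     # x0_j = sum_i W[i, j] * x_i
    P0 = np.empty_like(Ps)
    for j in range(len(mu)):
        d = xs - x0[j]                     # spread of the old estimates around x0_j
        P0[j] = sum(W[i, j] * (Ps[i] + np.outer(d[i], d[i])) for i in range(len(mu)))
    return x0, P0, c
```

With an identity transition matrix the mixing leaves every filter untouched, so this sketch reduces to autonomous operation; any off-diagonal transition probability makes each filter's initial condition depend on the other filters' results.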

The model groups or teams in the first two generations have a fixed membership over time and thus have a fixed structure. They are allowed to have a variable membership in the third generation, leading to a variable structure, that is, a variable set of models. Still under active development, this generation is potentially much more advanced than its ancestors in the sense that it has an open architecture, whereas they have a closed architecture. Not only does it inherit the second generation's effective internal cooperation and the first generation's superior output processing, but it also adapts to the outside world by producing new elemental filters if the existing ones are not good enough and by eliminating elemental filters that are harmful. This generation has been known as variable-structure MM (VSMM) algorithms. It is most suitable in the case where there is a significant truth-model mismatch: the model set used does not match the set of possible true modes.

A non-MM algorithm relies entirely on the performance of the single “best” individual decided prior to his/her performance. In contrast, an MM algorithm asks all individuals in a group to perform simultaneously and produces the overall estimate after their performance. In the first generation, these individuals work independently. Its superiority to non-MM algorithms stems from its flexibility of generating its output reports based on the individual results a posteriori. The second generation focuses on internal cooperation. Its individuals form a cooperative team. It outperforms the first generation by teamwork. The third generation explores the best team makeup. It determines an adaptive, cooperative team with a variable membership: it may recruit new members and fire bad or incompetent members. Each generation is more capable than its predecessors at the price of an increased sophistication level. It is interesting to note that the development of MM algorithms has been along the direction from the final product to the underlying structure through the internal mechanisms; that is, from the output report of the team, to the internal cooperation of the team, and then to the makeup of the team.

Outside of the target tracking area, almost all MM research and development have dealt only with the first generation so far; knowledge of the second generation is limited; and the third generation is hardly known.

D. Optimality Criteria

Consider a hybrid random variable $\xi = (x, m)$, where $x$ is continuous-valued and $m$ has $M$ possible values. The complete Bayesian solution of estimating $(x, m)$ using data $z$ is the mixed (joint) pdf-pmf $p(x, m|z) = f(x|m, z)p(m|z)$. This clearly involves a density estimation problem. Since a density function requires in general infinitely many numbers to describe it completely, this solution generally has an infinite dimension. For a hybrid process $\{\xi_k\}$, this solution requires in general recursive estimation of the density function, known as nonlinear filtering. This is the topic of more than one subsequent part of this survey, which cover various exact and approximate nonlinear filtering methods. In this part, we deal only with point estimation; that is, estimators that have the same, finite dimension as the estimatee (i.e., the quantity to be estimated, which could be the base state, modal state, or hybrid state in our case).

Least squares (LS), maximum likelihood (ML), minimum mean-square error (MMSE), maximum a posteriori (MAP), and the method of moments are probably the most widely used methods for point estimation. Virtually all point estimation algorithms using multiple models developed so far are in essence based on either the MMSE or the MAP criterion. This is understandable since MMSE and MAP are two primary Bayesian criteria for estimating a random quantity, and the state of a hybrid system is more naturally treated as random than as deterministic.

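For convenience, we adopt the following notation for estimators of continuous-valued $x$ and discrete-valued $m$:

$$\hat{x}^{\mathrm{MMSE}} = E[x|z], \quad \hat{m}^{\mathrm{MMSE}} = E[m|z], \quad (\hat{x}^{\mathrm{JMAP}}, \hat{m}^{\mathrm{JMAP}}) = \arg\max_{(x,m)} p(x, m|z) \tag{8}$$

$$\hat{x}^{\mathrm{MAP}} = \arg\max_{x} f(x|z), \quad \hat{m}^{\mathrm{MAP}} = \arg\max_{m} p(m|z), \quad \hat{x}^{\mathrm{MAP}}(\hat{m}) = \arg\max_{x} f(x|z, \hat{m}) \tag{9}$$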

where $f(\cdot)$ is a pdf, $p(\cdot)$ is a pmf, $p(x, m) \triangleq f(x|m)p(m)$ is a mixed (joint) pdf-pmf, and $\arg\max_x g(x)$ stands for “the argument $x$ that maximizes $g(x)$,” meaning the maximizer (i.e., the location of the largest peak) of $g(x)$. Note that

• In $\hat{x}^{\mathrm{MAP}}(\hat{m})$, $\hat{m}$ can be any mode estimator in general, but is almost always taken to be $\hat{m}^{\mathrm{MAP}}$.

• As defined above, $\hat{m}^{\mathrm{MMSE}}$ does not exist whenever the convex sum $\alpha_1 m_1 + \cdots + \alpha_M m_M$ is meaningless or does not exist, which is the case, for example, if $m^{(i)}$ and $m^{(j)}$ have different dimensions.

• $f(x) = \sum_{i=1}^{M} f(x|m^{(i)})\, p(m^{(i)})$ is $p(x, m)$ averaged over possible values of $m$.

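Fig. 2 illustrates the differences among $\hat{x}_1 = \hat{x}^{\mathrm{MMSE}}$, $\hat{x}_2 = \hat{x}^{\mathrm{JMAP}}$, $\hat{x}_3 = \hat{x}^{\mathrm{MAP}}$, $\hat{x}_4 = \hat{x}^{\mathrm{MAP}}(\hat{m}^{\mathrm{MAP}})$, and $\hat{x}_5 = \hat{x}^{\mathrm{MAP}}(m)$ for a Gaussian mixture density:

$$f(x) = p_1 \mathcal{N}(x; \bar{x}_1, \sigma_1^2) + p_2 \mathcal{N}(x; \bar{x}_2, \sigma_2^2) + p_3 \mathcal{N}(x; \bar{x}_3, \sigma_3^2), \quad \mathcal{N}(x; \bar{x}_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(x - \bar{x}_i)^2}{2\sigma_i^2}\right]$$

with $(\bar{x}_1, \bar{x}_2, \bar{x}_3) = (-3, 0, 5)$, $(\sigma_1^2, \sigma_2^2, \sigma_3^2) = (1.6^2, 2.25^2, 2^2)$, and $p_i = P\{m = m^{(i)}\}$. In the figure, the thicker line is the mixture density. Note that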


[Figure 2, two panels: (a) $(p_1, p_2, p_3) = (0.21, 0.41, 0.38)$; (b) $(p_1, p_2, p_3) = (0.25, 0.51, 0.24)$, for which $\hat{x}_2 = \hat{x}_4$.]

Fig. 2. Illustration of MMSE and MAP estimators.

• $\hat{x}_1 = \hat{x}^{\mathrm{MMSE}} = p_1 \bar{x}_1 + p_2 \bar{x}_2 + p_3 \bar{x}_3$ is the center of probability mass, that is, the balance point at which the line with $f(x)$ as the mass density function would not tip to the left or right if a pivot is placed.

• $\hat{x}_2 = \hat{x}^{\mathrm{JMAP}}$ is the location of the peak with the largest weighted peak value of all component densities: $p_i[\max_x \mathcal{N}(x; \bar{x}_i, \sigma_i^2)] = p_i \frac{1}{\sigma_i \sqrt{2\pi}}$.

• $\hat{x}_3 = \hat{x}^{\mathrm{MAP}}$ is the location of the (largest) peak of the mixture density $f(x)$. It is always within the interval between the smallest and the largest locations of peaks of all component densities.

• $\hat{x}_4 = \hat{x}^{\mathrm{MAP}}(\hat{m}^{\mathrm{MAP}})$ is the location of the (largest) peak of the component density $\mathcal{N}(x; \bar{x}_i, \sigma_i^2)$ with the largest $p_i$.

• $\hat{x}_5 = \hat{x}^{\mathrm{MAP}}(m = m^{(i)})$ is the location of the (largest) peak of the component density $\mathcal{N}(x; \bar{x}_i, \sigma_i^2)$.

• Fig. 2(a) and (b) are for two mixture densities with the same component densities but different sets of weights: $\hat{x}^{\mathrm{MMSE}}$, $\hat{x}^{\mathrm{JMAP}}$, $\hat{x}^{\mathrm{MAP}}$, and $\hat{x}^{\mathrm{MAP}}(\hat{m}^{\mathrm{MAP}})$ are relatively close to each other in (b), but quite different in (a), although the sets of weights do not differ much. For the two cases, $\hat{x}^{\mathrm{MMSE}}$ changes least and $\hat{x}^{\mathrm{JMAP}}$ and $\hat{x}^{\mathrm{MAP}}(\hat{m}^{\mathrm{MAP}})$ change most.
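The estimators compared above can be evaluated numerically for the three-component Gaussian mixture of Fig. 2. The sketch below uses the component means, standard deviations, and weights stated in the text; the function names are hypothetical, and the peak of the mixture density is located by a simple grid search, which is an implementation choice made for this illustration only.

```python
import numpy as np

# Component means and standard deviations of the mixture used in Fig. 2.
MEANS = np.array([-3.0, 0.0, 5.0])
SIGMAS = np.array([1.6, 2.25, 2.0])
GRID = np.linspace(-10.0, 12.0, 22001)   # search grid for the mixture peak (assumption)

def normal_pdf(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_estimators(p):
    """Return (MMSE, JMAP, MAP, MAP given the MAP mode) estimates for weights p."""
    p = np.asarray(p, dtype=float)
    x_mmse = float(p @ MEANS)                         # center of probability mass
    # Largest weighted component peak p_i / (sigma_i * sqrt(2 pi)); the constant is dropped.
    x_jmap = float(MEANS[np.argmax(p / SIGMAS)])
    mixture = sum(w * normal_pdf(GRID, m, s) for w, m, s in zip(p, MEANS, SIGMAS))
    x_map = float(GRID[np.argmax(mixture)])           # largest peak of the mixture density
    x_map_given_mode = float(MEANS[np.argmax(p)])     # peak of the most probable component
    return x_mmse, x_jmap, x_map, x_map_given_mode

est_a = mixture_estimators([0.21, 0.41, 0.38])   # weights of panel (a)
est_b = mixture_estimators([0.25, 0.51, 0.24])   # weights of panel (b)
```

For the weights of panel (a), the MMSE estimate is 1.27, the JMAP estimate sits at the third component mean (5), and the MAP estimate given the MAP mode sits at the second component mean (0). For panel (b), the JMAP estimate and the MAP estimate given the MAP mode coincide at 0, in agreement with the panel label.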

These estimators minimize the expectations of the Bayes cost functions $C_i$, $i = 1, 2, 3, 4, 5$, respectively (see, e.g., [94], [99]). For example, $C_1(x - \hat{x}) = (x - \hat{x})^2$ and $C_3(x - \hat{x}) = \lim_{\varepsilon \to 0} \mathbf{1}(|x - \hat{x}| - \varepsilon)$, where $\mathbf{1}(x)$ is the unit-step function and

$$\mathbf{1}(|x - \hat{x}| - \varepsilon) = \begin{cases} 0, & |x - \hat{x}| < \varepsilon \\ 1, & |x - \hat{x}| > \varepsilon \end{cases}$$

describes a “golf hole” of radius $\varepsilon$.

Parallel to MAP estimation, ML estimators can also be obtained with the involved posterior pdf or pmf replaced by the corresponding likelihood functions, although they have not been proposed systematically in MM estimation. It turns out, however, that the application of the likelihood principle to estimation of a nonrandom hybrid quantity differs significantly from the MAP case. Assume that $\xi = (x, m)$ is nonrandom with continuous-valued $x$ and $M$ possible values $\{m^{(1)}, \ldots, m^{(M)}\}$ for $m$. Then $f(z|x, m)$ represents a set of likelihood functions $\{f(z|x, m^{(1)}), \ldots, f(z|x, m^{(M)})\}$. The joint ML estimator of $x$ and $m$, and $\hat{x}^{\mathrm{ML}}(m)$ given $m$, are well defined as

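$$\hat{\xi}^{\mathrm{ML}} = (\hat{x}^{\mathrm{JML}}, \hat{m}^{\mathrm{JML}}) = \arg\max_{(x,m)} f(z|x,m) = \arg\max_{(x,m^{(i)})} \{f(z|x,m^{(1)}),\ldots,f(z|x,m^{(M)})\} \tag{10}$$

$$\hat{x}^{\mathrm{ML}}(m) = \arg\max_{x} f(z|x,m^{(i)}) \quad \text{if } m = m^{(i)}$$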

but the ML estimators of $x$ and $m$ separately are not well defined, because neither $f(z|x)$ nor $f(z|m)$ has a generally accepted definition, although several definitions are possible. One such definition of the likelihood functions $L(x|z)$ and $L(m|z)$ is based on the so-called generalized likelihood principle, leading to

$$L(x|z) = \max_{m^{(i)}} \{f(z|x,m^{(1)}),\ldots,f(z|x,m^{(M)})\}, \qquad L(m|z)|_{m=m^{(i)}} = \max_{x} f(z|x,m^{(i)}), \quad i = 1, 2, \ldots, M$$

and the corresponding maximizers are then taken to be the generalized ML estimators $\hat{x}^{\mathrm{GML}}$ and $\hat{m}^{\mathrm{GML}}$, respectively, which are however equal to $\hat{x}^{\mathrm{JML}}$ and $\hat{m}^{\mathrm{JML}}$ because $\max_x L(x|z) = \max_{(x,m^{(i)})}\{f(z|x,m^{(1)}),\ldots,f(z|x,m^{(M)})\}$, which is (10).

We choose the continuous-valued part of $\hat{\xi}^{\mathrm{ML}}$ as $\hat{x}^{\mathrm{ML}}$, namely, $\hat{x}^{\mathrm{ML}} := \hat{x}^{\mathrm{JML}}$. As such, $\hat{x}^{\mathrm{ML}}$ is the ML counterpart of $\hat{x}^{\mathrm{JMAP}}$, not $\hat{x}^{\mathrm{MAP}}$. The likelihood functions can also be defined by expectation or marginalization (see, e.g., [127])

$$L(x|z) = f(z|x) = E[f(z|x,m)|x] = \sum_{i} f(z|x,m^{(i)})\,p(m^{(i)}|x)$$

$$L(m|z) = f(z|m) = E[f(z|x,m)|m] = \int f(z|x,m)\,f(x|m)\,dx \tag{11}$$


which, however, requires treating the quantity being averaged out as random, in violation of the previous assumption that $x$ and $m$ are nonrandom. Nevertheless, for MM estimation $f(z|m) = E[f(z|x,m)|m]$ is a sensible way to go because it is natural to treat the base state $x$ as random and the model as nonrandom. Therefore, we use $\hat{m}^{\mathrm{ML}}$ based on (11). With this, $\hat{x}^{\mathrm{ML}}(\hat{m}^{\mathrm{ML}}) = \arg\max_x f(z|x,\hat{m}^{\mathrm{ML}})$ is well defined.

IV. THE FIRST GENERATION: AUTONOMOUS MM ESTIMATION

A. Optimal Autonomous MM Estimation

Fundamental assumptions. The first-generation, autonomous MM (AMM) algorithms were developed based on the following two fundamental assumptions:

A1. The true mode $s$ is time-invariant (i.e., $s_k = s, \forall k$).

A2. The true mode $s$ at any time has a mode space $\mathbb{S}$ that is time-invariant$^1$ and identical to the finite model set $\mathbb{M}$ used (i.e., $\mathbb{S}_k = \mathbb{M}, \forall k$).

Assumptions A1 and A2 allow in principle the true mode $s$ to be deterministic or random. Almost all AMM algorithms developed so far assume a random $s$, although assuming an unknown but nonrandom $s$ appears, in our opinion, somewhat more natural for many applications. Furthermore, A2 is in fact not needed if $s$ is assumed nonrandom, such as for the maximum likelihood estimation implied above and in Sec. IV-D. By A2, there is no mode-model mismatch and thus we will use $m$ to denote both the mode and the model and treat a model (which is deterministic) as a realization of a mode (which is random) in this section. Note that $\mathbb{S}_k = \mathbb{M}$ implies, but is not implied by, $s_k \in \mathbb{M}$ (i.e., $\mathbb{S}_k \subseteq \mathbb{M}$). A1 implies that $m^{(i)}$ is the true model for all times if and only if it is so at some single time and thus, conditioned on anything $A$, the mode probability and mode-sequence probability are equal:

$$P\{m_k^{(i)}|A, \mathrm{A1}\} = P\{m_1^{(i)},\ldots,m_k^{(i)}|A, \mathrm{A1}\}$$

where $m_k^{(i)}$ denotes that model $m^{(i)}$ matches the true mode at time $k$. Denote by $M = |\mathbb{M}|$ the number of models used. Then, all possible model sequences (through time $k$) are constant (by A1) and there are exactly $M$ of them (by A2), given by:

$$m^k_{(i)} = \{m_1^{(i)},\ldots,m_k^{(i)}\}, \quad m^{(i)} \in \mathbb{M} \tag{12}$$

Note that Assumptions A1 and A2 are embedded in this definition of $m^k_{(i)}$.

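MMSE-AMM. As first proposed in [232], the MMSE-optimal AMM base-state estimator is given by the total expectation theorem as (see, e.g., [21])

$$\hat{x}_{k|k} = E[x_k|z^k, \mathrm{A1}, \mathrm{A2}] = \sum_{i=1}^{M} E[x_k|z^k, m^k_{(i)}]\,P\{m^k_{(i)}|z^k, \mathrm{A1}, \mathrm{A2}\} = \sum_{i=1}^{M} \hat{x}^{(i)}_{k|k}\,\mu^{(i)}_k \tag{13}$$

where $z^k = (z_1,\ldots,z_k)$ is the set of measurements through time $k$, $\mu_k^{(i)} = P\{m_k^{(i)}|z^k, \mathrm{A1}, \mathrm{A2}\}$ is the posterior mode probability under A1 and A2 that the mode in effect is constant and equal to one and only one but possibly any one of the models in $\mathbb{M}$, and $\hat{x}^{(i)}_{k|k} = E[x_k|z^k, m^k_{(i)}]$ is the MMSE estimate from the $i$th elemental filter assuming $m^{(i)}$ is true throughout time.

Under A1 and A2, $\hat{x}^{\mathrm{MMSE}}_{k|k}$ is unbiased in the sense $E[x_k - \hat{x}^{\mathrm{MMSE}}_{k|k}] = 0$; its mean-square error (MSE) matrix (often called error covariance loosely) is minimum of all base-state estimators $\hat{x}_{k|k}$, given originally in [183] (see also [21]) by

$$P_{k|k} = \mathrm{MSE}(\hat{x}_{k|k}|z^k, \mathrm{A1}, \mathrm{A2}) = \sum_{i=1}^{M} [P^{(i)}_{k|k} + (\hat{x}^{(i)}_{k|k} - \hat{x}_{k|k})(\hat{x}^{(i)}_{k|k} - \hat{x}_{k|k})']\,\mu^{(i)}_k \tag{14}$$

where $P^{(i)}_{k|k} = \mathrm{MSE}(\hat{x}^{(i)}_{k|k}|\mathrm{A1}, \mathrm{A2})$ is the MSE matrix of the MMSE estimator $\hat{x}^{(i)}_{k|k}$ under A1 and A2. The corresponding mode estimator is given by

$$\hat{m}_{k|k} = E[m_k|z^k, \mathrm{A1}, \mathrm{A2}] = \sum_{i=1}^{M} E[m_k|z^k, m^k_{(i)}]\,P\{m^k_{(i)}|z^k, \mathrm{A1}, \mathrm{A2}\} = \sum_{i=1}^{M} m^{(i)}\mu^{(i)}_k \tag{15}$$

$$\mathrm{MSE}(\hat{m}_{k|k}|z^k) = E[(m_k - \hat{m}_{k|k})(m_k - \hat{m}_{k|k})'|z^k, \mathrm{A1}, \mathrm{A2}] = \sum_{i=1}^{M} (m^{(i)} - \hat{m}_{k|k})(m^{(i)} - \hat{m}_{k|k})'\mu^{(i)}_k \tag{16}$$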

This mode estimator exists and is meaningful only if the convex sum (15) is well defined and meaningful, which is not the case if $m^{(i)}$ is only an index of the system structure or behavior pattern, e.g., $\mathbb{M} = \{1, 2, \ldots, M\}$. For instance, if $m^{(1)} = 1$ is a booster model of a missile and $m^{(2)} = 2$ represents a climbing motion of an aircraft, their weighted sum is meaningless. Even if $m^{(2)} = 2$ is also for the missile (as a reentry model), their weighted sum is still hard to interpret: what does it mean in this case if $\hat{m}^{\mathrm{MMSE}}_{k|k} = 1.63$? Also, $(m^{(1)}, m^{(2)}) = (1, 2)$ and $(m^{(1)}, m^{(2)}) = (2, 3)$ would lead to different $\hat{m}^{\mathrm{MMSE}}_{k|k}$, which renders interpretation difficult. This mode estimator is meaningful in general when all $m^{(i)}$ are points in a vector space.

1 This part of A2 is implied by A1, but A2 may be invoked without A1, as the second generation does.
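The label dependence of the convex sum (15) can be checked with a few lines of Python. This is an illustrative sketch only: the mode probabilities (0.37, 0.63) are assumed values chosen so that the first labeling gives about 1.63, and the turn-rate values in the last call are hypothetical.

```python
import numpy as np

def mmse_mode(labels, mu):
    """Convex sum (15): probability-weighted average of the model values."""
    return float(np.dot(labels, mu))

mu = np.array([0.37, 0.63])      # assumed posterior mode probabilities

# Models that are mere indices: the result changes with an arbitrary relabeling.
as_12 = mmse_mode([1, 2], mu)    # about 1.63
as_23 = mmse_mode([2, 3], mu)    # about 2.63

# Models that are points in a vector space (hypothetical turn rates in rad/s):
# here the weighted average is itself a turn rate and is easy to interpret.
turn_rate = mmse_mode([0.0, 0.1], mu)

print(round(as_12, 2), round(as_23, 2), round(turn_rate, 3))
```

The same probabilities thus give different "mode estimates" under different index labels, whereas the turn-rate average keeps a physical meaning.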

Table I gives the MMSE-AMM algorithm (of the base state) under Assumptions A1 and A2 for a Gaussian system (4)–(5) where the Kalman filter is optimal given the mode.

TABLE I

AMM ALGORITHM (ONE CYCLE) UNDER A1 AND A2 FOR A GAUSSIAN SYSTEM WHERE KALMAN FILTER IS OPTIMAL GIVEN THE MODE.

1. Model-conditioned filtering (for $i = 1, 2, \ldots, M$):

   Predicted state: $\hat{x}^{(i)}_{k|k-1} = F^{(i)}_{k-1}\hat{x}^{(i)}_{k-1|k-1} + G^{(i)}_{k-1}\bar{w}^{(i)}_{k-1}$

   Predicted covariance: $P^{(i)}_{k|k-1} = F^{(i)}_{k-1}P^{(i)}_{k-1|k-1}(F^{(i)}_{k-1})' + G^{(i)}_{k-1}Q^{(i)}_{k-1}(G^{(i)}_{k-1})'$

   Measurement residual: $\tilde{z}^{(i)}_k = z_k - H^{(i)}_k\hat{x}^{(i)}_{k|k-1} - \bar{v}^{(i)}_k$

   Residual covariance: $S^{(i)}_k = H^{(i)}_kP^{(i)}_{k|k-1}(H^{(i)}_k)' + R^{(i)}_k$

   Filter gain: $K^{(i)}_k = P^{(i)}_{k|k-1}(H^{(i)}_k)'(S^{(i)}_k)^{-1}$

   Updated state: $\hat{x}^{(i)}_{k|k} = \hat{x}^{(i)}_{k|k-1} + K^{(i)}_k\tilde{z}^{(i)}_k$

   Updated covariance: $P^{(i)}_{k|k} = P^{(i)}_{k|k-1} - K^{(i)}_kS^{(i)}_k(K^{(i)}_k)'$

2. Mode probability update (for $i = 1, 2, \ldots, M$):

   Model likelihood: $L^{(i)}_k \triangleq p[\tilde{z}^{(i)}_k|m^k_{(i)}, z^{k-1}] = \dfrac{\exp[-(1/2)(\tilde{z}^{(i)}_k)'(S^{(i)}_k)^{-1}\tilde{z}^{(i)}_k]}{|2\pi S^{(i)}_k|^{1/2}}$

   Mode probability: $\mu^{(i)}_k = \dfrac{\mu^{(i)}_{k-1}L^{(i)}_k}{\sum_j \mu^{(j)}_{k-1}L^{(j)}_k}$

3. Estimate fusion:

   Overall estimate: $\hat{x}_{k|k} = \sum_i \hat{x}^{(i)}_{k|k}\mu^{(i)}_k$

   Overall covariance: $P_{k|k} = \sum_i [P^{(i)}_{k|k} + (\hat{x}_{k|k} - \hat{x}^{(i)}_{k|k})(\hat{x}_{k|k} - \hat{x}^{(i)}_{k|k})']\,\mu^{(i)}_k$
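The cycle of Table I can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `amm_cycle` and the dictionary keys are hypothetical, the noise means $\bar{w}$ and $\bar{v}$ are taken to be zero, and the term $GQG'$ is assumed to be supplied directly as `Q`.

```python
import numpy as np

def amm_cycle(x, P, mu, z, models):
    """One AMM cycle (Table I) with zero-mean noises.

    x, P   : lists of per-model estimates and covariances at time k-1
    mu     : array of mode probabilities at time k-1
    z      : measurement vector at time k
    models : list of dicts with keys "F", "Q", "H", "R" (Q stands for G Q G')
    """
    M = len(models)
    x_new, P_new, lik = [], [], np.empty(M)
    for i, m in enumerate(models):
        F, Q, H, R = m["F"], m["Q"], m["H"], m["R"]
        # 1. Model-conditioned filtering: each filter runs on its own
        xp = F @ x[i]
        Pp = F @ P[i] @ F.T + Q
        r = z - H @ xp                      # measurement residual
        S = H @ Pp @ H.T + R                # residual covariance
        K = Pp @ H.T @ np.linalg.inv(S)     # filter gain
        x_new.append(xp + K @ r)
        P_new.append(Pp - K @ S @ K.T)
        # Gaussian model likelihood of the residual
        lik[i] = np.exp(-0.5 * r @ np.linalg.solve(S, r)) / np.sqrt(
            np.linalg.det(2 * np.pi * S)
        )
    # 2. Mode probability update (Bayes' rule with the previous probabilities)
    mu_new = np.asarray(mu, dtype=float) * lik
    mu_new /= mu_new.sum()
    # 3. Estimate fusion
    x_fused = sum(w * xi for w, xi in zip(mu_new, x_new))
    P_fused = sum(
        w * (Pi + np.outer(x_fused - xi, x_fused - xi))
        for w, xi, Pi in zip(mu_new, x_new, P_new)
    )
    return x_new, P_new, mu_new, x_fused, P_fused

# Usage: a scalar random-walk state tracked with a low- and a high-noise model.
base = {"F": np.eye(1), "H": np.eye(1), "R": np.eye(1)}
models = [dict(base, Q=0.01 * np.eye(1)), dict(base, Q=4.0 * np.eye(1))]
x0, P0 = [np.zeros(1)] * 2, [np.eye(1)] * 2
out = amm_cycle(x0, P0, np.array([0.5, 0.5]), np.array([3.0]), models)
print(out[2], out[3], out[4])
```

Note that the elemental filters never exchange estimates inside the loop; only the output step combines them, which is the autonomous structure discussed in Sec. IV-B.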

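MAP-AMM. The mixed pdf-pmf of the base state and mode at time $k$ is

$$p(x_k, m_k|z^k, \mathrm{A1}, \mathrm{A2}) = f(x_k|z^k, m_k, \mathrm{A1})\,p(m_k|z^k, \mathrm{A1}, \mathrm{A2}) = \{f^{(i)}(x_k|z^k)\mu^{(i)}_k,\ i \le M\}$$

where $f^{(i)}(x_k|z^k) = f(x_k|z^k, m^k_{(i)})$ is the density assuming the mode sequence is $m^{(i)}_1,\ldots,m^{(i)}_k$ (i.e., $m^{(i)}$ is the true model). It thus follows from the total probability theorem that the base state has the posterior mixture density

$$f(x_k|z^k, \mathrm{A1}, \mathrm{A2}) = \sum_{i=1}^{M} f(x_k|z^k, m^k_{(i)})\,P\{m^{(i)}_k|z^k, \mathrm{A1}, \mathrm{A2}\} = \sum_{i=1}^{M} f^{(i)}(x_k|z^k)\mu^{(i)}_k \tag{17}$$

and $f(x_k|z^k, m^k) = f^{(j)}(x_k|z^k)$ if $m^k = m^k_{(j)}$. The corresponding MAP-AMM estimators are given by

$$\hat{x}^{\mathrm{MAP}}_{k|k} = \arg\max_{x_k} f(x_k|z^k, \mathrm{A1}, \mathrm{A2}) \tag{18}$$

$$\hat{\xi}^{\mathrm{MAP}}_{k|k} = (\hat{x}^{\mathrm{JMAP}}_{k|k}, \hat{m}^{\mathrm{JMAP}}_{k|k}) = \arg\max_{(x_k, m^{(i)})} \{f^{(i)}(x_k|z^k)\mu^{(i)}_k,\ i \le M\}$$

$$\hat{m}^k_{\mathrm{MAP}} = (\hat{m}_1,\ldots,\hat{m}_k), \quad \hat{m}_1 = \cdots = \hat{m}_k = \hat{m}^{\mathrm{MAP}}_k = \arg\max_{m^{(i)}} \mu^{(i)}_k$$

$$\hat{x}^{\mathrm{MAP}}_{k|k}(m^k) = \arg\max_{x_k} f^{(j)}(x_k|z^k) \ \text{ if } m^k = m^k_{(j)}, \qquad \hat{x}^{\mathrm{MAP}}_{k|k}(\hat{m}^k_{\mathrm{MAP}}) = \hat{x}^{\mathrm{MAP}}_{k|k}(m^k)|_{m^k = \hat{m}^k_{\mathrm{MAP}}}$$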

In words, the MAP base-state estimator $\hat{x}^{\mathrm{MAP}}_{k|k}$ is the peak location of the mixture density $f(x_k|z^k, \mathrm{A1}, \mathrm{A2})$, which is a probabilistically weighted sum of $f^{(i)}(x_k|z^k)$; the joint MAP estimator $(\hat{x}^{\mathrm{JMAP}}_{k|k}, \hat{m}^{\mathrm{JMAP}}_{k|k})$ is the maximizer of the pdf-pmf $p(x_k, m_k|z^k, \mathrm{A1}, \mathrm{A2})$ or the set of $f^{(i)}(x_k|z^k)\mu^{(i)}_k$; the MAP mode estimator $\hat{m}^{\mathrm{MAP}}_k$ is the one with the largest posterior probability and thus the MAP mode-sequence estimator $\hat{m}^k_{\mathrm{MAP}}$ is the constant sequence of this mode throughout time ($\hat{m}^{\mathrm{MAP}}_k$ and $\hat{m}^k_{\mathrm{MAP}}$ can also be interpreted as outcomes of the corresponding MAP tests); the model-sequence conditioned MAP estimator $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$ is the peak location of the component density $f^{(j)}(x_k|z^k)$ corresponding to the model $m = m^{(j)}$.

It should be clear that the special MAP estimators $\hat{x}^{\mathrm{MAP}}_{k|k}(\hat{m}^k_{\mathrm{MAP}})$ and $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$ can be obtained from the set of component MAP estimators $\hat{x}^{(i)\mathrm{MAP}}_{k|k} = \arg\max_{x_k} f^{(i)}(x_k|z^k)$, but the MAP estimator $\hat{x}^{\mathrm{MAP}}_{k|k}$ and the joint MAP estimator $\hat{x}^{\mathrm{JMAP}}_{k|k}$ cannot in general, although $\hat{x}^{\mathrm{JMAP}}_{k|k}$ coincides with the $\hat{x}^{(i)\mathrm{MAP}}_{k|k}$ with the largest value of $f^{(i)}(x_k|z^k)\mu^{(i)}_k$ over $x_k$ and $m^{(i)}$.

Many of these MAP-AMM estimators were presented in [99] in simpler terms and evaluated along with the MMSE estimator in terms of several performance measures via computer simulations for a simplified aircraft tracking example.


The MSE matrices (error covariances) of these MAP estimators do not have a known, explicit, analytic form, but they are related to that of the MMSE estimator $\hat{x}^{\mathrm{MMSE}}_{k|k}$ via the following easily obtainable general relationship:

$$P_{k|k} = \mathrm{MSE}(\hat{x}_{k|k}|z^k) = \mathrm{MSE}(\hat{x}^{\mathrm{MMSE}}_{k|k}|z^k) + (\hat{x}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})(\hat{x}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})' \tag{19}$$

where $\hat{x}_{k|k}$ is any estimator, including the above MAP estimators.
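Relationship (19) can be verified numerically for a scalar Gaussian mixture posterior. The sketch below is illustrative only: it reuses the mixture parameters of Fig. 2(a) as an assumed posterior, the helper name `mse_about` is hypothetical, and the estimator value 5.0 is an arbitrary choice. The conditional MSE about any point is computed directly as the probability-weighted sum of component variances plus squared offsets, in the same form as (14).

```python
import numpy as np

def mse_about(x_hat, mu, means, variances):
    """Posterior MSE of a fixed estimate x_hat under a scalar Gaussian mixture."""
    return float(np.sum(mu * (variances + (means - x_hat) ** 2)))

# Assumed posterior: weights, component means, and component variances.
mu = np.array([0.21, 0.41, 0.38])
means = np.array([-3.0, 0.0, 5.0])
variances = np.array([1.6, 2.25, 2.0]) ** 2

x_mmse = float(mu @ means)                           # as in (13)
p_mmse = mse_about(x_mmse, mu, means, variances)     # as in (14)

x_any = 5.0                                          # any other estimate
lhs = mse_about(x_any, mu, means, variances)         # MSE computed directly
rhs = p_mmse + (x_any - x_mmse) ** 2                 # right-hand side of (19)
print(lhs, rhs)
```

The two printed values agree up to rounding, and the second term of (19) shows how much MSE is paid for moving away from the MMSE estimate.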

B. Autonomous Operations of Conditional Filtering

It is clear from the above that there are two key functions in the operation of an AMM algorithm: model-based conditional filtering and output processing. They are discussed next.

Assumption A1 is the defining assumption of the AMM algorithms. Under A1, each elemental filter does conditional filtering based on a constant model sequence. In other words, each filter works individually and independently of other filters. As such, each conditional filtering operation is autonomous, hence the name "autonomous multiple-model algorithms."

For elemental filter $i$, the goal is to compute the pdf $f^{(i)}(x_k|z^k) = f(x_k|z^k, m^k_{(i)})$ for the MAP case and the estimate $\{\hat{x}^{(i)}_{k|k}, P^{(i)}_{k|k}\}$ for the MMSE case, where $\hat{x}^{(i)}_{k|k} = E[x_k|z^k, m^k_{(i)}] = \int x_k f^{(i)}(x_k|z^k)\,dx_k$ and $P^{(i)}_{k|k} = \mathrm{MSE}(\hat{x}^{(i)}_{k|k})$. If base state $x_k$ and measurements $z^k$ are jointly Gaussian under $m^k_{(i)}$, then $f^{(i)}(x_k|z^k) = \mathcal{N}(x_k; \hat{x}^{(i)}_{k|k}, P^{(i)}_{k|k})$ and thus conditional filtering operations for MMSE and MAP estimation are theoretically the same. For the general, non-Gaussian case, neither class has an explicit, analytic form. However, if model $m^{(i)}$ is linear (i.e., the system conditioned on $m^k_{(i)}$ is linear) and satisfies the whiteness and uncorrelatedness assumption of the Kalman filter for the process and measurement noises, recursive linear MMSE estimation $(\hat{x}^{(i)}_{k-1|k-1}, P^{(i)}_{k-1|k-1}) \to (\hat{x}^{(i)}_{k|k}, P^{(i)}_{k|k})$ is given by the Kalman filter explicitly regardless of Gaussianity. This is not the case for the conditional filtering for MAP estimation.

Adaptive estimation was studied by many (see, e.g., [184]), including the case of unknown time-invariant parameters with independent measurements. These results were extended by Magill [232] in a non-recursive form to scalar, dependent measurements generated by state-space models with unknown discrete parameters. Magill's results were further extended by Lainiotis et al. [310], [183], [179], [182], [180] to a recursive form, with an exact error covariance form, for vector measurements and arbitrary continuous and discrete parameters. Early works dealt only with estimation for systems with a time-invariant mode that is unknown (a nonrandom constant) or uncertain (a random variable, but not a random process), which led to the autonomous operations of conditional filtering [310], [303], [118], [185]. Many reinventions, extensions, and applications of this generation can be found in the literature under various names, including the "partition (partitioned or partitioning) filter" [179], [182], [180], [342], the "multiple model adaptive filter" [179], [182], the "parallel processing algorithm" [7], the "multiple model adaptive estimator" [239], the "static multiple-model algorithm" [18], [214], the "filter bank method" [65], the "self-tuning estimator" [65], the "operating regime approach" [156], and in the same spirit the "mixture of experts" [251], [73]. These names suggest the structure, features, and capability of the first generation, particularly in comparison with non-MM algorithms. For example, this MM algorithm was applied recently in [181] to state estimation of a nonlinear system without mode uncertainty. It runs a bank of perturbed (linearized) Kalman filters, each with a nominal state trajectory that is a random realization of the true one, as proposed in [179]. A large number of nominal trajectories are generated first and then clustered to achieve better cost effectiveness [181].

C. Output Processing for MM Estimation

The output processor generates the overall estimate using information available from all elemental filters as well as data. It is in general an information fusion process. This type of fusion, however, differs from multisensor data fusion in at least two fundamental aspects [195]: (i) at most one (and at least one under Assumption A2) estimate is correct in the MM fusion (but which one is unknown), whereas more than one estimate may be correct in sensor fusion, and (ii) different filters use the same data in the MM fusion but different data in sensor fusion. Note, however, that the same data can be treated artificially as multiple pieces of data with perfect coupling.

The principles proposed for output processing of AMM algorithms turn out to be general, applicable to later generations as well, although their specific implementations are tuned for AMM algorithms. In view of this, the following discussion is directed to the general MM, not just AMM, algorithms. These principles can be classified into two groups: hard decision and soft decision [192], [195].

Hard decision. This approach identifies a "good" subset $\mathbb{B}$ of model sequences by a hard decision procedure, and then generates the overall estimate from the estimates conditioned on these model sequences. The set $\mathbb{B}$ may change with respect to time as more data are collected, although only constant model sequences are considered in the AMM algorithms.

The most natural subset $\mathbb{B}$ is the set of model sequences deemed most likely or "not too unlikely." Several different decision procedures for this purpose have been proposed, including what can be called the $B$-best approach and the Viterbi algorithm. This hard decision amounts to pruning of unlikely model sequences. It is applied here to output processing. As discussed later in Sec. V-B, the same procedure has also been applied to hypothesis reduction as a cooperation strategy. These two applications have always appeared together so far, which is natural, although each actually stands alone without the other.


The Viterbi algorithm or forward dynamic programming (see, e.g., [115]) can identify the single best (i.e., most probable) model history for each model at any time. It always has $M$ (i.e., number of models) survived model histories at any time $k$. Each of them is the best model history for a model-based filter at a time. These $M$ histories definitely include the best one $\hat{m}^k_{\mathrm{MAP}}$, but they are not necessarily the $M$ best ones (for second and third generations) because, for example, the second best model history for a model may be better than the best one for another model. In short, this procedure selects "the fittest" for each model; some of these fittest can be quite unfit to the overall truth, but they are needed to guarantee "survival of the fittest" for the future. In the $B$-best approach, the subset $\mathbb{B}$ of the $B$ most probable model sequences at time $k$ can be found by MAP hypothesis tests (see, e.g., [129], [333]) using all data through $k$ in a batch format. However, recursive implementations are usually needed. Recursive $B$-best algorithms rely on MAP hypothesis tests using data available at the time, where the test statistic is usually a function of mode sequence probabilities and conditional measurement residuals. Many MM algorithms for adaptive control make decisions in this way. However, they are actually not guaranteed to yield the $B$ most probable sequences over the entire time horizon because some or all of these best sequences may have less probable partial histories and have been deleted earlier in the process. On the contrary, the Viterbi algorithm can be implemented recursively if the model sequence is a (hidden) Markov chain with mode-history independent measurements. Another drawback of the $B$-best approach is that some or many of the $B$ most probable model sequences may be quite similar and had better be merged to reduce processing load.
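The "fittest for each model" behavior of the Viterbi algorithm can be sketched in Python for the Markov case mentioned above. This is a simplified illustration under explicit assumptions, not the procedure of any cited reference: the mode sequence is taken to be a Markov chain with known transition matrix, and the per-step log-likelihood of each model is assumed to be given and independent of the earlier mode history. The function name `viterbi_survivors` and its arguments are hypothetical.

```python
import numpy as np

def viterbi_survivors(log_lik, log_trans, log_init):
    """Return the M survivor model histories, one ending in each model.

    log_lik   : (K, M) array, log-likelihood of model j at step k (assumed given)
    log_trans : (M, M) array, log transition probabilities, row i -> column j
    log_init  : (M,) array, log initial mode probabilities
    """
    K, M = log_lik.shape
    score = log_init + log_lik[0]            # best log score ending in each model
    back = np.zeros((K, M), dtype=int)       # best predecessor of each model
    for k in range(1, K):
        cand = score[:, None] + log_trans    # cand[i, j]: best path to i, then i -> j
        back[k] = np.argmax(cand, axis=0)
        score = cand[back[k], np.arange(M)] + log_lik[k]
    # Trace back one survivor per terminal model.
    paths = np.empty((M, K), dtype=int)
    paths[:, -1] = np.arange(M)
    for k in range(K - 1, 0, -1):
        paths[:, k - 1] = back[k, paths[:, k]]
    return paths, score

# Usage with made-up numbers: two models, sticky transitions, four time steps.
log_lik = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]))
log_trans = np.log(np.array([[0.95, 0.05], [0.05, 0.95]]))
log_init = np.log(np.array([0.5, 0.5]))
paths, score = viterbi_survivors(log_lik, log_trans, log_init)
print(paths)                     # one survivor history per terminal model
print(paths[np.argmax(score)])   # the single most probable history
```

Exactly $M$ histories survive at every step, and the overall best history is the survivor with the highest final score; the other survivors need not be the next-best histories overall.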

Knowing the above pros and cons of these two approaches, it appears sensible to combine them as follows. Use the Viterbi algorithm for hypothesis reduction (i.e., to decide which model sequences should be maintained), which involves $M^2$ conditional filtering operations (see Sec. V-B.5); but use the $B$-best algorithm to select the $B$ most probable ones out of the $M^2$ model sequences for output processing if we can only afford processing $B$ sequences at a time for output, which may be the case for the MAP base state estimator $\hat{x}^{\mathrm{MAP}}_{k|k}$ as the peak location of a mixture density of $B$ components.

In the extreme case of the "survival of the fittest" where the set $\mathbb{B}$ has only one model sequence, the estimate based on the single most probable model sequence $i$ is taken to be the overall estimate: $(\hat{x}_{k|k}, P_{k|k}) = (\hat{x}^{(i)}_{k|k}, P^{(i)}_{k|k})$. We emphasize that compared with the conventional "decision-estimation" non-MM approach, this "estimation-decision" MM approach is superior even in this extreme case because, as explained at the beginning of Sec. III-A, it makes a more informed decision since the decision is made after the completion of conditional filtering (estimation).

Other hard decision procedures are possible, including heuristic rules, expert systems, neural networks, and so on.

The overall estimate given a subset $\mathbb{B}$ of more than one model sequence is usually taken to be a probabilistically weighted sum of estimates based on model sequences in $\mathbb{B}$, but other ways as described next could also be used.

Soft decision. The output processing does not need to involve hard decisions. In fact, the most widely used MMSE based estimators generate the overall estimate by a weighted sum of MMSE estimates $\hat{x}^{(i)}_{k|k}$ from all elemental filters with mode-sequence probabilities $\mu^k_{(i)}$ as the weights: $\hat{x}^{\mathrm{MMSE}}_{k|k} = \sum_i \hat{x}^{(i)}_{k|k}\mu^k_{(i)}$. Likewise for $\hat{m}^{\mathrm{MMSE}}_{k|k}$. This can be thought of as a soft decision for convenience.$^2$

This soft decision is often applied in practice to fuse non-MMSE estimates as well. This is particularly popular for a hybrid system with non-Gaussian linear subsystems to which the Kalman filters are applicable. In this case, $\hat{x}^{(i)}_{k|k}$ are linear MMSE estimators (i.e., have minimum MSE of all linear estimators), but not MMSE estimators. As a result, $\hat{x}_{k|k} = \sum_i \hat{x}^{(i)}_{k|k}\mu^k_{(i)}$ is neither MMSE nor linear MMSE. It is a nonlinear estimator hopefully close to the MMSE estimator. Although this hope lacks strong theoretical support, this $\hat{x}_{k|k}$ can be expected to beat the overall linear MMSE estimator $\hat{x}^{\mathrm{LMMSE}}_{k|k}$ (see the end of Sec. V-A for evidence).

As for any discrete-valued quantity, MAP estimation $\hat{m}^k_{\mathrm{MAP}}$ and MAP decision (test) for a mode sequence are actually aliases. The special MAP estimator $\hat{x}^{\mathrm{MAP}}_{k|k}(\hat{m}^k_{\mathrm{MAP}})$ is hard decision based, equal to the component MAP estimator $\hat{x}^{(i)\mathrm{MAP}}_{k|k}$ corresponding to the most probable mode sequence $\hat{m}^k_{\mathrm{MAP}}$. Likewise for $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$. The joint MAP estimator $\hat{x}^{\mathrm{JMAP}}_{k|k}$ is also equal to one of the component MAP estimators $\hat{x}^{(j)\mathrm{MAP}}_{k|k}$ by a hard decision. However, the MAP estimator $\hat{x}^{\mathrm{MAP}}_{k|k}$ relies on a soft decision that requires in general all the component densities as well as the mode-sequence probabilities [99].

Still another class of soft decision procedures is non-probabilistic, such as those based on Dempster-Shafer evidence theory, fuzzy logic, neural networks, as discussed in Sec. VIII.

Other approaches to output processing are possible, such as combinations of the above approaches.

D. Convergence of AMM Estimates

Magill's original work [232] includes sufficient conditions on the convergence of the mode probabilities for a single-output linear system. This was extended to multiple outputs by others [310], [183], [179], [180]. It was shown in [130], [25] that the correct (true) model has a probability that tends to unity (almost surely) as time goes on under Assumptions A1 and A2 and that [25] the correct model and any of the incorrect models (a) cannot be perfectly distinguished in finite time by their likelihood functions $f(z^k|m^k_{(i)})$ and (b) do not have identical marginal likelihood functions $f(z_k|z^{k-1}, m^k_{(i)})$ as $k$ increases.

2 However, this is not to be confused with the soft decision in communication and coding, which refers to a decision that is revocable.


As such, $\hat{m}^{\mathrm{ML}}_{k|k}$, $\hat{m}^{\mathrm{MAP}}_{k|k}$, and $\hat{m}^{\mathrm{MMSE}}_{k|k}$ are all consistent in that they converge with probability one to the true model ($\hat{m}^{\mathrm{MMSE}}$ is also mean-square consistent) under the assumptions stated above [25]. Here the maximum likelihood estimator $\hat{m}^{\mathrm{ML}}_{k|k}$ is the model with the largest model likelihood. For a set $\mathbb{M}$ of linear time-invariant systems with white, uncorrelated, stationary Gaussian noise under Assumptions A1 and A2, the (a) and (b) above can be replaced by the easily verifiable condition [25] $\rho^{(i)}_s = |\rho^{(i)} - \rho_s| \neq 0$, $\forall i \neq s$, $m^{(i)} \in \mathbb{M}$ with

$$\rho^{(i)} = \log\det(S^{(i)}) + \mathrm{tr}[(S^{(i)})^{-1}\bar{S}^{(i)}], \qquad \rho_s = \rho^{(i)}|_{m^{(i)}=s} = \log\det(S) + \dim(z_k) \tag{20}$$

or by the even more easily verifiable condition [22]: $\det(S - S^{(i)}) \neq 0$, $\forall i \neq s$, $m^{(i)} \in \mathbb{M}$, where $S$ and $S^{(i)}$ are the steady-state measurement prediction covariances (MSE matrices) of the correct model $s$ and an incorrect model $m^{(i)}$, respectively, as calculated in the corresponding Kalman filters, and $\bar{S}^{(i)}$ is the true steady-state measurement prediction MSE matrix of the incorrect model $m^{(i)}$. For AMM estimation of the base state, we then have

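$$\hat{x}^{\mathrm{MMSE}}_{k|k} \to \hat{x}^{s,\mathrm{MMSE}}_{k|k}, \qquad \hat{x}^{\mathrm{MAP}}_{k|k},\ \hat{x}^{\mathrm{JMAP}}_{k|k},\ \hat{x}^{\mathrm{MAP}}_{k|k}(\hat{m}^{\mathrm{MAP}}_{k|k}) \to \hat{x}^{s,\mathrm{MAP}}_{k|k}, \qquad \hat{x}^{\mathrm{ML}}_{k|k},\ \hat{x}^{\mathrm{ML}}_{k|k}(\hat{m}^{\mathrm{ML}}_{k|k}),\ \hat{x}^{\mathrm{ML}}_{k|k}(\hat{m}^{\mathrm{ML}}_{k|k}) \to \hat{x}^{s,\mathrm{ML}}_{k|k}$$

as $k \to \infty$ with probability one under the stated assumptions. Here $\hat{x}^{s,\mathrm{MMSE}}_{k|k}$, $\hat{x}^{s,\mathrm{MAP}}_{k|k}$, and $\hat{x}^{s,\mathrm{ML}}_{k|k}$ are MMSE, MAP, and ML estimators based on the true model, and $\hat{x}^{\mathrm{ML}}_{k|k}$, $\hat{x}^{\mathrm{ML}}_{k|k}(\hat{m}^{\mathrm{ML}}_{k|k})$, $\hat{x}^{\mathrm{ML}}_{k|k}(\hat{m}^{\mathrm{ML}}_{k|k})$ were defined in Sec. III-D.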

The above results hold when the true model $s$ is in $\mathbb{M}$ (Assumption A2). For any model pair $m^{(i)}$ and $m^{(j)}$ in $\mathbb{M}$, as shown in [24], the likelihood ratio $f(z^k|m^k_{(i)})/f(z^k|m^k_{(j)})$ goes to zero with probability one if $\rho^{(j)} < \rho^{(i)}$ and only if $\rho^{(j)} \le \rho^{(i)}$ under Assumption A1, provided that the corresponding measurement residual sequences of the linear-Gaussian system are ergodic and have a finite and positive-definite steady-state mean-square matrix. It follows that regardless of whether the true model $s$ is in $\mathbb{M}$ or not, the probability of the model $m^{(i)}$ in $\mathbb{M}$ with the smallest "distance" $\rho^{(i)}_s$ to the true model $s$ tends to unity almost surely as time goes on under the assumptions stated above (but without A2) if this smallest "distance" is unique [24]. Consequently, all the above convergence results for $\hat{m}_{k|k}$ and $\hat{x}_{k|k}$ of the linear-Gaussian systems hold true if the true model $s$ is replaced by the (assumed unique) model closest to it.$^3$ Closely related, but slightly less convenient results were obtained in [130] based on the Kullback-Leibler information. Note that in this case MMSE- and MAP-AMM estimators all (implicitly and incorrectly) assume the true model is in $\mathbb{M}$, but ML-AMM estimators need not and would not benefit from this assumption.

It was argued in [231] that in the case when the true model $s$ is not in $\mathbb{M}$, the convergence of the probability of the closest-to-truth model to unity is not necessarily desirable because many models in $\mathbb{M}$ may capture distinct characteristics of the true one, which is thus better reflected in the limit. An AMM algorithm with mode "probabilities" converging to constants other than zero and one was proposed. It modifies the Bayes' rule for mode probability update by replacing the predicted mode probabilities with constant (e.g., uniform) prior probabilities.
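The effect of this modification can be illustrated with a toy Python comparison. This is only one reading of the idea described above, not the algorithm of [231]: the per-step model likelihoods are assumed to be the same constants at every step, and the function name `normalize` is hypothetical.

```python
import numpy as np

def normalize(w):
    """Scale nonnegative weights so that they sum to one."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

lik = np.array([0.6, 0.4])       # assumed per-step model likelihoods
prior = np.array([0.5, 0.5])     # constant (uniform) prior

mu_standard = prior.copy()
mu_modified = prior.copy()
for _ in range(50):
    # Standard AMM update: the previous posterior acts as the prior.
    mu_standard = normalize(mu_standard * lik)
    # Modified update: the constant prior replaces the predicted probabilities.
    mu_modified = normalize(prior * lik)

print(mu_standard)   # drifts toward (1, 0)
print(mu_modified)   # stays at the normalized likelihoods
```

In the standard recursion the likelihood ratios accumulate over time, which drives one probability to unity, whereas in the modified rule past evidence is not accumulated through the prior.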

E. Tracking Applications

The AMM (first generation) algorithms have numerous applications outside of the target tracking area. They are particularly popular for problems involving unknown parameters (see, e.g., [342]). They form an important approach for dealing with systems subject to faults (see, e.g., [26]). However, several more recent publications [106], [250], [365], [292] demonstrated that the second- and third-generation MM algorithms still outperform the AMM algorithms for such applications, wherein the system mode does jump, in violation of the basic assumptions of the AMM estimation. This is easily understandable for people engaged in target tracking research and development, in view of the duality between fault detection and maneuvering target tracking [164]. For obvious reasons, here we only survey limited AMM applications in maneuvering target tracking.

Although not a straightforward implementation of the AMM estimator, the maneuvering target tracking algorithm proposed in [327] has some features of the AMM algorithms. It combines a nonmaneuver filter and a maneuver estimator chosen by a hard decision logic. The maneuver estimator itself is a two-model AMM procedure, which generates the overall estimate by a probabilistically weighted sum.

In [242], a two-model AMM algorithm was designed, which includes a Singer or 3D constant-turn (CT) model [209] as well as a constant-acceleration (CA) model, for line-of-sight angular tracking of a close-range, highly maneuverable, airborne target using forward looking infrared sensor measurements. Further enhancements of this image tracker were proposed and analyzed in [329]. Briefly, some of the elemental filters were allowed to have a rectangular field of view; the algorithm was tuned to harsher target dynamics by considering both Gauss-Markov acceleration and constant turn-rate models; and an initial target acquisition algorithm was devised to remove significant biases in the estimated target template to be used in a correlator within the tracker. Further along this line, a 3-model AMM configuration based on a second-order Markov acceleration model [209] together with the CA and Singer models was investigated in [353]. For a different application, in [178] a 3-model AMM algorithm with a first-order Gauss-Markov acceleration, first-order Gauss-Markov velocity, and a nearly constant position model of a pilot's head motion was applied as a predictor for a virtual environment flight simulator. Also, an AMM algorithm preceding the above applications can be found in [249]. These publications, via their demonstrations of the superiority of the MM approach to the single-model based trackers (e.g., EKF), have also helped establish that using well selected, possibly more sophisticated models for certain tracking applications can reduce the number of elemental filters significantly since they provide a better coverage of possible motion modes than simple models.

3 The "distance" $\rho^{(i)}_s$ was given the following Kullback-type information theoretic interpretation in [24]. Let

$$\lambda_k(i,j) = \log[f(z^k|m^k_{(i)})/f(z^k|m^k_{(j)})] - \log[f(z^{k-1}|m^k_{(i)})/f(z^{k-1}|m^k_{(j)})], \quad d(i,j) = \lim_{k\to\infty} |E[\lambda_k(i,j)|s^k]|, \quad d(i) = d(s,i)$$

where $s$ is the true model and $s^k$ is the constant true model sequence through time $k$. Then $d(i) \ge d(j)$ if and only if $\rho^{(i)}_s \ge \rho^{(j)}_s$. As shown in [24], $d(i,j)$ is a pseudo distance (i.e., satisfies the triangle inequality and is symmetric and nonnegative definite, but not positive definite, hence the prefix pseudo and the need for the extra uniqueness assumption), although it is closely related to the Kullback-Leibler information measure $I(s,j) = E(\log[f(z^k|s^k)/f(z^k|m^k_{(j)})]|s^k)$, which does not satisfy the triangle inequality and is not symmetric. Despite this connection of $\rho^{(i)}_s$ to information "distance," no general results were available for nonlinear, non-Gaussian systems based on $d(i)$ directly.

Results of a comprehensive study were presented recently in [273] on the capabilities of an AMM algorithm for tracking and interception of a highly maneuverable fighter aircraft armed with electronic countermeasures (ECM) by an air-to-air missile equipped with a monopulse radar seeker. The scenario involves a sequence of periodical evasive maneuvers (EM) of the aircraft and electronic jinking$^4$ (EJ) generated by the aircraft ECM system. The system mode space in terms of the EM-EJ pairs of evasion strategies is unknown to the tracker (the missile homing system), and was approximated via "quantization" by a set of 45 strategies, serving as the "ground truth" in the simulations. Due to feasibility considerations, however, the design of the MM tracker included only a small fixed set of six most representative models, selected by a "trial-and-error" process to cover the mode space reasonably. Each of the six elemental EKFs with an 11-dimensional state was carefully tuned to a particular evasion strategy (motion-jinking model). While demonstrating a significant improvement over previous, non-MM filters (e.g., single EKF), the simulation results presented therein again exposed some typical deficiencies of the AMM algorithms. These include failed or delayed identification (the filter may lock on an erroneous model and fail to switch to the true one) and poor estimation, mainly due to the mismatch between the dominant model and the true mode. Overall, besides the important practical results as well as demonstrating the superiority of the MM approach to single filters, this study illustrates that AMM estimation is not well suited to situations with frequent modal changes, for which later generations are more suitable. Also, covering a large mode space by a small fixed set of models inevitably causes a large model-truth mismatch that normally leads to inadequate estimation performance. Instead, the third-generation, variable-structure MM algorithms appear much more attractive and promising for this challenging, practical problem, characterized by a large mode space and high motion dynamics. Another study for implementation of AMM estimation for a similar air-to-air missile guidance problem was reported in [266].

The AMM approach was employed for another important class ofproblems—ballistic target tracking [104], [103], [307].Tracking of a ballistic target was considered in [104], [103] for all flight phases: boost (including post-boost), coast(free flight),and reentry (possibly maneuverable) [205]. The tracker designed consists of seven autonomous filters running in parallel, eachcorresponding to one of the three specific flight phases. The output of the algorithm is selected from those of the elementalfilters by a hard decision logic, e.g., from the one based on the most likely model:x(mML ). Using an autonomous system offilters appears more reasonable here than in such applications as aircraft tracking since the modal changes here are infrequent.The use of a hard decision for output was motivated by two considerations. First, models of the different flight phases havestate vectors of different dimensions and fusion of estimates of these vectors by soft decision for output is nontrivialandlacks theoretical support. Second, such a fused estimate isstatistically inferior to the single best estimate for interception anddestruction applications where the hit rate is more important than average errors [218]. These arguments for hard decision aredebatable, though. For example, it can be argued that a hard decision is in general more prone to false alarms than a softdecision, possibly resulting in much poorer estimates. Forsuch reasons soft decision based schemes (e.g., IMM) have beenadopted more often for similar problems [160], [254], [39],[141], [142], [80].

In [307], the focus is on tracking a tactical ballistic missile in its incoming phase capable of random (bang-bang) evasive maneuvers for the purposes of interception and destruction. From the estimation point of view, the main uncertainty facing the tracker in such a scenario, embedded in a differential game framework, reduces to the unknown maneuver onset time. Thus the models involved differ only in the maneuver onset time.$^{5}$ The choice of the AMM configuration appears appropriate within this formulation as far as estimation of the target state after the maneuver is concerned, since one mode cannot jump to another.

V. THE SECOND GENERATION: COOPERATING MM ESTIMATION

A. Optimal Cooperating MM Estimation

Fundamental assumptions. In the second generation, cooperating MM (CMM) algorithms, the fundamental assumption A2 of the first generation is retained without change, but A1 is relaxed to allow a time-varying mode sequence $\{s_k\}$. As in the first generation, this sequence can in principle be random or deterministic, but for mathematical tractability it is almost always assumed to be a random process, in particular a Markov (or semi-Markov) process, as stated formally below.

$^{4}$ It is a periodic switching of the aircraft’s apparent radar reflection center from one wing tip to the other.

$^{5}$ This idea of multiple models with respect to the maneuver onset time was not new: the AMM of [232] was mentioned in [64] as an option for the input estimation method; however, it was abandoned and a hard decision (generalized likelihood ratio test approach) was used instead.


A1’. The true mode sequence $\{s_k\}$ is Markov (or semi-Markov).

If $\{s_k\}$ is indeed random, the above Markovian assumption is justifiable for many applications. A hybrid system with A1’ is known as a Markov jump system (MJS). As in the first generation, under A2 there is no mode-model mismatch, and thus we will again use $m$ to denote both the mode and the model and treat a model as a realization of a mode in this section. Under A2, there are $M^k$ possible model sequences (or realizations) through time $k$, a number that increases exponentially with time, where $M = |\mathbb{M}|$ is the number of possible models at each time. This full tree (or more precisely, trellis) is illustrated in Fig. 3(a) for a three-model case. Under A1, it reduces to the corresponding “tree” for the first generation, as shown in Fig. 3(b). We denote a generic model sequence through time $k$ as

$$m^k_{(i^k)} = m^k_{(i_1,\ldots,i_k)} = \{m^{(i_1)}_1, \ldots, m^{(i_k)}_k\} = \{(s_1, \ldots, s_k) = (m^{(i_1)}, \ldots, m^{(i_k)})\}, \quad m^{(i_n)} \in \mathbb{M}$$

and the set of all such sequences as $\mathbb{M}^k$. For simplicity we use $i^k \in \mathbb{M}^k$ to mean $m^k_{(i^k)} \in \mathbb{M}^k$.

[Figure 3: two trellis diagrams over Models 1 to 3 (vertical) and time steps k = 1, ..., 6 (horizontal): (a) CMM, (b) AMM.]

Fig. 3. Possible model sequences of CMM and AMM algorithms.
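The growth in the number of hypotheses can be made concrete with a small enumeration. The following is a minimal sketch, not taken from the survey; the function names `cmm_sequences` and `amm_sequences` are hypothetical, and models are simply labeled 1, ..., M.

```python
from itertools import product


def cmm_sequences(num_models, k):
    """All possibly time-varying model index sequences through time k (A1')."""
    return list(product(range(1, num_models + 1), repeat=k))


def amm_sequences(num_models, k):
    """Only the constant model index sequences through time k (A1)."""
    return [(i,) * k for i in range(1, num_models + 1)]


# Three-model case: the CMM count is M**k, the AMM count stays at M.
for k in range(1, 7):
    print(k, len(cmm_sequences(3, k)), len(amm_sequences(3, k)))
```

For three models the CMM count is 3, 9, 27, ... while the AMM count stays at 3, which is why brute-force CMM estimation quickly becomes infeasible.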

The discussion in the rest of this subsection is largely parallel to that of the optimal AMM estimation in Sec. IV-A. It is nonetheless presented here because of its importance for understanding the suboptimal CMM algorithms presented later.

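MMSE-CMM. The MMSE-CMM base-state estimator at time $k$ is given by (see, e.g., [21])

$$\hat{x}_{k|k} = E[x_k | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[x_k | z^k, m^k_{(i^k)}]\, P\{m^k_{(i^k)} | z^k, \mathrm{A1}', \mathrm{A2}\} = \sum_{i^k \in \mathbb{M}^k} \hat{x}^{(i^k)}_{k|k}\, \mu^k_{(i^k)} \tag{21}$$

where $\mu^k_{(i^k)} = P\{m^k_{(i^k)} | z^k, \mathrm{A1}', \mathrm{A2}\}$ is the posterior mode-sequence probability assuming that the mode sequence in effect is one and only one but possibly any one in the set $\mathbb{M}^k$, and $\hat{x}^{(i^k)}_{k|k} = E[x_k | z^k, m^k_{(i^k)}]$ is the conditional MMSE estimate assuming sequence $m^k_{(i^k)}$ is true. As in the first generation, under A1’ and A2 $\hat{x}^{\mathrm{MMSE}}_{k|k}$ is unbiased with an MSE matrix that is the minimum over all base-state estimators $\hat{x}_{k|k}$, given by (see, e.g., [21])

$$P_{k|k} = \mathrm{MSE}(\hat{x}_{k|k} | z^k, \mathrm{A1}', \mathrm{A2}) = \sum_{i^k \in \mathbb{M}^k} \left[ \mathrm{MSE}(\hat{x}^{(i^k)}_{k|k} | \mathrm{A1}', \mathrm{A2}) + (\hat{x}^{(i^k)}_{k|k} - \hat{x}_{k|k})(\hat{x}^{(i^k)}_{k|k} - \hat{x}_{k|k})' \right] \mu^k_{(i^k)} \tag{22}$$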
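The probabilistically weighted sums in (21) and (22) can be sketched in a few lines. This is an illustrative sketch only: the function name `mixture_moments` is hypothetical, vectors and matrices are plain nested lists, and the conditional estimates, their MSE matrices, and the sequence probabilities are assumed to be given.

```python
def mixture_moments(means, covs, weights):
    """Overall mean and MSE matrix of a weighted set of conditional estimates.

    means[h]   : conditional estimate for hypothesis h (list of length n)
    covs[h]    : its MSE matrix (n x n nested list)
    weights[h] : posterior probability of hypothesis h (assumed to sum to 1)
    """
    n = len(means[0])
    # Weighted sum of conditional estimates, as in (21).
    mean = [sum(w * m[r] for m, w in zip(means, weights)) for r in range(n)]
    # Weighted sum of conditional MSE plus spread-of-means term, as in (22).
    cov = [[0.0] * n for _ in range(n)]
    for m, P, w in zip(means, covs, weights):
        d = [m[r] - mean[r] for r in range(n)]
        for r in range(n):
            for c in range(n):
                cov[r][c] += w * (P[r][c] + d[r] * d[c])
    return mean, cov


identity = [[1.0, 0.0], [0.0, 1.0]]
mean, cov = mixture_moments([[0.0, 0.0], [2.0, 0.0]], [identity, identity], [0.5, 0.5])
print(mean, cov)
```

The second term inside the sum is what inflates the overall MSE when the conditional estimates disagree.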

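The corresponding mode estimator at time $k$ is given by

$$\hat{m}_{k|k} = E[m_k | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[m_k | z^k, m^k_{(i^k)}]\, P\{m^k_{(i^k)} | z^k, \mathrm{A1}', \mathrm{A2}\} = \sum_{i^k \in \mathbb{M}^k} m^{(i_k)} \mu^k_{(i^k)}$$

$$\mathrm{MSE}(\hat{m}_{k|k} | z^k) = E[(m_k - \hat{m}_{k|k})(m_k - \hat{m}_{k|k})' | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} (m^{(i_k)} - \hat{m}_{k|k})(m^{(i_k)} - \hat{m}_{k|k})' \mu^k_{(i^k)}$$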

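Although not seen in the literature, the estimators for the base state sequence and mode sequence using a batch of data through $k$ can be obtained as

$$\hat{m}^{k|k} = E[m^k | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[m^k | z^k, m^k_{(i^k)}]\, \mu^k_{(i^k)} = \sum_{i^k \in \mathbb{M}^k} m^{(i^k)} \mu^k_{(i^k)} = (\hat{m}_{1|k}, \hat{m}_{2|k}, \ldots, \hat{m}_{k|k})$$

$$\hat{x}^{k|k} = E[x^k | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[x^k | z^k, m^k_{(i^k)}]\, \mu^k_{(i^k)} = \sum_{i^k \in \mathbb{M}^k} \hat{x}^{k|k}_{(i^k)} \mu^k_{(i^k)} = (\hat{x}_{1|k}, \hat{x}_{2|k}, \ldots, \hat{x}_{k|k})$$

where $m^{(i^k)} = (m^{(i_1)}, \ldots, m^{(i_k)})$ stands for a possibly time-varying model sequence through $k$ with the index sequence $i^k = (i_1, \ldots, i_k)$, $\hat{x}^{k|k}_{(i^k)} = E[x^k | z^k, m^k_{(i^k)}]$ is the corresponding estimate of the base state sequence, and $\hat{m}^{\mathrm{MMSE}}_{n|k}$ and $\hat{x}^{\mathrm{MMSE}}_{n|k}$ are smoothed MMSE estimates, given by

$$\hat{m}_{n|k} = E[m_n | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[m_n | z^k, m^k_{(i^k)}]\, \mu^k_{(i^k)} = \sum_{i^k \in \mathbb{M}^k} m^{(i_n)} \mu^k_{(i^k)}$$

$$\hat{x}_{n|k} = E[x_n | z^k, \mathrm{A1}', \mathrm{A2}] = \sum_{i^k \in \mathbb{M}^k} E[x_n | z^k, m^k_{(i^k)}]\, \mu^k_{(i^k)} = \sum_{i^k \in \mathbb{M}^k} \hat{x}^{(i^k)}_{n|k} \mu^k_{(i^k)}$$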


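MAP-CMM. The mixed pdf-pmf of the base state at time $k$ and the mode sequence through time $k$ is

$$p(x_k, m^k | z^k, \mathrm{A1}', \mathrm{A2}) = f(x_k | z^k, m^k)\, p(m^k | z^k, \mathrm{A1}', \mathrm{A2}) = \{ f^{(i^k)}(x_k | z^k)\, \mu^k_{(i^k)},\ i^k \in \mathbb{M}^k \}$$

where $f^{(i^k)}(x_k | z^k) = f(x_k | z^k, m^k_{(i^k)})$ is the density assuming the mode sequence in effect is $m^k_{(i^k)}$. It follows that the base state has the posterior mixture density

$$f(x_k | z^k, \mathrm{A1}', \mathrm{A2}) = \sum_{i^k \in \mathbb{M}^k} f(x_k | z^k, m^k_{(i^k)})\, P\{m^k_{(i^k)} | z^k, \mathrm{A1}', \mathrm{A2}\} = \sum_{i^k \in \mathbb{M}^k} f^{(i^k)}(x_k | z^k)\, \mu^k_{(i^k)} \tag{23}$$

and $f(x_k | z^k, m^k) = f^{(i^k)}(x_k | z^k)$ if $m^k = m^k_{(i^k)}$. Thus the MAP-CMM estimators are given by

$$\hat{x}^{\mathrm{MAP}}_{k|k} = \arg\max_{x_k} \sum_{i^k \in \mathbb{M}^k} f^{(i^k)}(x_k | z^k)\, \mu^k_{(i^k)} \tag{24}$$

$$(\hat{x}^{\mathrm{JMAP}}_{k|k}, \hat{m}^{k|k}_{\mathrm{JMAP}}) = \arg\max_{(x_k,\, m^{(i^k)})} \{ f^{(i^k)}(x_k | z^k)\, \mu^k_{(i^k)},\ i^k \in \mathbb{M}^k \}$$

$$\hat{m}^{k|k}_{\mathrm{MAP}} = \arg\max_{m^{(i^k)}} \{ \mu^k_{(i^k)},\ i^k \in \mathbb{M}^k \} = (\hat{m}_{1|k}, \hat{m}_{2|k}, \ldots, \hat{m}_{k|k})_{\mathrm{MAP}}$$

$$\hat{x}^{k|k}_{\mathrm{MAP}} = \arg\max_{x^k} \sum_{i^k \in \mathbb{M}^k} f^{(i^k)}(x^k | z^k)\, \mu^k_{(i^k)} = (\hat{x}_{1|k}, \hat{x}_{2|k}, \ldots, \hat{x}_{k|k})_{\mathrm{MAP}}$$

$$\hat{x}^{\mathrm{MAP}}_{k|k}(m^k) = \arg\max_{x_k} f^{(i^k)}(x_k | z^k) \ \text{ if } m^k = m^{(i^k)}, \qquad \hat{x}^{\mathrm{MAP}}_{k|k}(\hat{m}^{k|k}_{\mathrm{MAP}}) = \hat{x}^{\mathrm{MAP}}_{k|k}(m^k)\big|_{m^k = \hat{m}^{k|k}_{\mathrm{MAP}}}$$

Unlike in MMSE estimation, the component $\hat{m}_{n|k}$ of a MAP sequence estimate $\hat{m}^{k|k}_{\mathrm{MAP}}$ is not equal to $\hat{m}^{\mathrm{MAP}}_{n|k}$, the MAP estimate of the $n$th component $m_n$ of the mode sequence $m^k$. Likewise for the base state.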

Compared with Sec. IV-A, these optimal CMM estimators clearly correspond to the respective optimal AMM estimators by replacing the constant mode sequence $m^k_{(i)}$ therein with a possibly time-varying mode sequence $m^k_{(i^k)}$. As a result, all discussions of the AMM estimators regarding interpretations, dependence on component MAP estimators, and MSE matrix apply to the CMM estimators accordingly.

MMSE vs. MAP. As explained in Sec. III-D, the MMSE estimator $\hat{x}^{\mathrm{MMSE}}_{k|k}$ minimizes the mean-square error, while the MAP estimator $\hat{x}^{\mathrm{MAP}}_{k|k}$ maximizes the rate of hitting a tiny “golf hole” centered at $x_k$ in the base state space. They are suitable for different applications, as elaborated in [218]. $(\hat{x}^{\mathrm{JMAP}}_{k|k}, \hat{m}^{k|k}_{\mathrm{JMAP}})$ is a joint estimator that maximizes the rate of hitting the “golf hole” and simultaneously choosing the correct model sequence. As explained in Sec. III-D, $\hat{x}^{\mathrm{MMSE}}_{k|k}$ is the mean of the posterior mixture density; $\hat{x}^{\mathrm{MAP}}_{k|k}$ is the location of the highest peak of the mixture density; $\hat{x}^{\mathrm{JMAP}}_{k|k}$ and $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$ are locations of the highest peaks of the corresponding component densities. When the mixture density consists of many components, as is the case for CMM estimation with a random mode sequence having an exponentially increasing number of realizations, the use of $\hat{x}^{\mathrm{JMAP}}_{k|k}$ or $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$ for any $m^k$ is on shaky ground: they are based on a component density which is likely to have a very small (say, 1%), albeit largest of all components, probability of being the true density.$^{6}$ The use of $\hat{x}^{\mathrm{MMSE}}_{k|k}$ and $\hat{x}^{\mathrm{MAP}}_{k|k}$ is on much firmer ground. If, however, the mode sequence is not random, the mixture density and hence $\hat{x}^{\mathrm{MMSE}}_{k|k}$ and $\hat{x}^{\mathrm{MAP}}_{k|k}$ are meaningless. In this case, the maximum likelihood estimators $\hat{x}^{\mathrm{JML}}_{k|k}$ and $\hat{x}^{\mathrm{ML}}_{k|k}(\hat{m}^{k|k}_{\mathrm{ML}})$ appear to be reasonable choices.

The above discussion is based on Assumption A2 that the true mode is exactly one of the models in the set $\mathbb{M}$ used. While this assumption is exact or very reasonable for most communication or coding problems, it is almost never true for maneuvering target tracking, where the motion uncertainty is almost never resolved exactly by the models used. For instance, a target almost never takes an exact constant turn, and even if it does the turn rate would not be exactly equal to one of those used in the models. $\hat{x}^{\mathrm{JMAP}}_{k|k}$, $\hat{x}^{\mathrm{MAP}}_{k|k}(m^k)$, $\hat{x}^{\mathrm{JML}}_{k|k}$, and $\hat{x}^{\mathrm{ML}}_{k|k}(m^k)$ rely heavily on the impossibility of such mismatches between modes and models. While assuming no such mismatch explicitly, $\hat{x}^{\mathrm{MMSE}}_{k|k}$ and $\hat{x}^{\mathrm{MAP}}_{k|k}$ are more robust against (less sensitive to) this mismatch because they rely on all component densities, rather than put all their eggs in one basket as the other MAP and ML estimators do. This in effect “covers” modes between any two models used (i.e., the convex set formed by the models used).

As explained in Secs. III-D and IV-B, given probabilistic weights, the entire density function $f^{(i^k)}(x_k | z^k)$ of every component is needed to compute $\hat{x}^{\mathrm{MAP}}_{k|k}$, while $\hat{x}^{\mathrm{MMSE}}_{k|k}$ relies only on its first two moments $\{\hat{x}^{(i^k)}_{k|k}, P^{(i^k)}_{k|k}\}$. Note, however, that calculation of the moments needed for $\hat{x}^{\mathrm{MMSE}}_{k|k}$ is an integration problem, often harder than solving the maximization problem needed for $\hat{x}^{\mathrm{MAP}}_{k|k}$ given every component density, which can usually be reduced to differentiation and equation solving. Note also that finding the global maximizer of a mixture density numerically is much simpler than that of a general multivariate function. For instance, the global maximizer of a mixture density is bound to be in the convex set formed by the outermost peak locations of all component densities.
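The difference between the mean and the highest peak of a mixture can be seen in a scalar toy case. The sketch below is only illustrative: it assumes Gaussian component densities, uses hypothetical function names, and replaces an analytic maximization by a plain grid search restricted to the interval spanned by the component peaks, in line with the remark above.

```python
import math


def gauss(x, m, var):
    """Scalar Gaussian density with mean m and variance var."""
    return math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, means, variances, weights):
    return sum(w * gauss(x, m, v) for m, v, w in zip(means, variances, weights))


def mmse_estimate(means, weights):
    """Mean of the mixture: needs only the component means and weights."""
    return sum(w * m for m, w in zip(means, weights))


def map_estimate(means, variances, weights, num_points=20001):
    """Highest peak of the mixture, searched between the outermost component peaks."""
    lo, hi = min(means), max(means)
    if lo == hi:
        return lo
    step = (hi - lo) / (num_points - 1)
    grid = [lo + n * step for n in range(num_points)]
    return max(grid, key=lambda x: mixture_pdf(x, means, variances, weights))


means, variances, weights = [0.0, 4.0], [1.0, 0.25], [0.6, 0.4]
print(mmse_estimate(means, weights))            # weighted mean of the two components
print(map_estimate(means, variances, weights))  # close to the sharper component's peak
```

In this example the less probable but sharper component carries the highest peak, so the two estimates differ markedly.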

$^{6}$ The MAP estimator $\hat{m}^{k|k}_{\mathrm{MAP}}$ of the mode sequence is still one of the most reasonable choices.


While the output processing of an MMSE-based MM algorithm is usually soft-decision based in that its overall estimate is a weighted sum of results from each elemental filter, a hard-decision based output processing could also be used. [98] proposed for certain applications to use the estimate from the single elemental filter that has the smallest mean-square error among all elemental filters. The best elemental filter $\hat{x}^{(i)}_{k|k}$ can be identified as the one with the smallest deviation $(\hat{x}^{(i)}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})'(\hat{x}^{(i)}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})$ from the optimal estimate because it can be easily shown that

$$\mathrm{mse}(\hat{x}^{(i)}_{k|k} | z^k) = \mathrm{mse}(\hat{x}^{\mathrm{MMSE}}_{k|k} | z^k) + (\hat{x}^{(i)}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})'(\hat{x}^{(i)}_{k|k} - \hat{x}^{\mathrm{MMSE}}_{k|k})$$

where $\mathrm{mse}(\hat{x} | z) = E[(x - \hat{x})'(x - \hat{x}) | z]$. This in general requires knowledge of the MMSE-optimal estimate $\hat{x}^{\mathrm{MMSE}}_{k|k}$.
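The decomposition can be verified in a few lines. The following is a sketch of the standard argument, under the assumption that $\hat{x}^{(i)}_{k|k}$ is a function of $z^k$ only and that $\hat{x}^{\mathrm{MMSE}}_{k|k} = E[x_k | z^k]$; the shorthand $e$ and $d$ is introduced here for illustration.

```latex
% Shorthand (introduced here): e = x_k - \hat{x}^{MMSE}_{k|k},  d = \hat{x}^{(i)}_{k|k} - \hat{x}^{MMSE}_{k|k}.
\begin{align*}
\mathrm{mse}(\hat{x}^{(i)}_{k|k} \mid z^k)
  &= E[(e - d)'(e - d) \mid z^k] \\
  &= E[e'e \mid z^k] - 2\, d'\, E[e \mid z^k] + d'd \\
  &= \mathrm{mse}(\hat{x}^{\mathrm{MMSE}}_{k|k} \mid z^k) + d'd ,
\end{align*}
% since d is fixed given z^k and E[e | z^k] = E[x_k | z^k] - \hat{x}^{MMSE}_{k|k} = 0.
```

The cross term vanishes only because the reference estimate is the conditional mean, which is why the comparison has to be made against $\hat{x}^{\mathrm{MMSE}}_{k|k}$.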

Linear MMSE. An optimal linear MMSE estimator was proposed in [81] for a Markov jump-linear system, defined similarly by (1)–(2) except that $w$ and $v$ are replaced by mode-independent $w$ and $Dv$, with $D$ a matrix. The cornerstone of this LMMSE estimator is the introduction of the $(M \cdot n)$-dimensional stacked vector $y_k = [(y^{(1)}_k)', \ldots, (y^{(M)}_k)']'$ to represent the $n$-dimensional base state subject to the assumed model uncertainty among $M$ known models, where every $y^{(i)}_k$ is an $n$-dimensional zero vector except $y^{(j)}_k = x_k$ if model $m^{(j)}$ is true. The problem of estimating the hybrid state $(x_k, s_k)$ is thus reduced to the conventional problem of estimating $y_k$. As a result, the recursive optimal LMMSE estimator $\hat{y}_{k|k} = [(\hat{y}^{(1)}_{k|k})', \ldots, (\hat{y}^{(M)}_{k|k})']'$ of $y_k$ is available. The base state estimator is simply $\hat{x}_{k|k} = \sum_i \hat{y}^{(i)}_{k|k}$, and $\mathrm{MSE}(\hat{x}_{k|k})$ is equal to the sum over all $n \times n$ blocks of $\mathrm{MSE}(\hat{y}_{k|k})$. A more informative but concise description of this LMMSE estimator was given in [192]. Simulation results given in [81] show that this optimal linear MMSE estimator in general does not perform as well as the suboptimal nonlinear IMM estimator, and there was no single case considered in which the LMMSE estimator outperforms the IMM estimator significantly. This demonstrates the high nonlinearity of hybrid estimation problems and the need for good nonlinear estimators. More recently, [83] showed that for a mean-square stable Markov jump-linear system with an ergodic Markov chain, $\mathrm{MSE}(\hat{y}_{k|k})$ of this LMMSE estimator converges to a unique positive semidefinite solution of an algebraic Riccati equation. The corresponding matrix in the LMMSE estimator was replaced by this steady-state solution to arrive at a steady-state estimator, similar in spirit to the development of the steady-state Kalman filter. This steady-state estimator was generalized in [82] to the case where the system given a mode sequence involves some uncertain parameters.

Most probable trajectory estimation. Conceptually similar to maximum likelihood sequence estimation, most probable trajectory (MPT) estimation is a nonlinear filtering approach that determines a state sequence fitting best to the data in some sense (see, e.g., [265]). [364], [224] considered state estimation of a continuous-time hybrid system with a fixed number $M$ of possible mode processes $s(t)$, where $s(t)$ has $N$ known, possible distributions. A finite-dimensional hybrid filter in recursive form was presented in [364] that is optimal in the MPT sense, which includes optimal base-state sequence estimation and identification of the most probable distribution of $s(t)$. The MPT optimality here is with respect to a cost function defined for a system that is an average of the hybrid system over possible modes. This approach is quite robust with regard to noise characterization. It was applied in [224] to state estimation of a jump-linear system as a piecewise-linear approximation of a highly nonlinear system, including a satellite orbit determination example.

B. Cooperation Strategies for MM Estimation

Since the number of possible model sequences (hypotheses) increases exponentially with time (more precisely, geometrically with discrete time), brute-force implementations of the above optimal CMM estimators $\hat{x}^{\mathrm{MMSE}}_{k|k}$ and $\hat{x}^{\mathrm{MAP}}_{k|k}$ are infeasible. Consequently, strategies have been developed to cope with this difficulty. We refer to them as cooperation strategies. Different CMM algorithms are distinct from one another in the cooperation strategies used. Cooperation strategies developed so far can be classified into two general categories: hypothesis reduction strategies and iterative strategies.

Different hypothesis reduction strategies have been proposed to keep the number of hypotheses within a certain limit:

• Merging of “similar” model sequences, resulting in a tree with combined branches, which can be achieved by soft decisions and in effect reinitialization of the filters with an (approximately) “equivalent” estimate and covariance;

• Pruning of “unlikely” model sequences, resulting in a truncated tree, which is necessarily hard-decision based;

• Selection of the best model sequence(s), including the globally best single sequence or the best sequence for each current model;

• Random selection of a subset of the possible hypotheses;

• Others, such as decoupling weakly coupled mode sequences to form clusters, as briefly reviewed in [277].

The basic idea of merging and pruning strategies is to replace the ever growing mixture tree with a simpler tree that approximates the original tree in some sense. In general, since the base state under A1’ and A2 has a mixture density of an ever growing number of components, the problem of hypothesis reduction is in essence that of mixture density reduction, which has been studied extensively in other areas within target tracking (see, e.g., [294]) as well as in statistics (see, e.g., [328], [245]), and thus various mixture density reduction techniques can be applied here. Although closely related, hypothesis reduction is not to be confused with the output processing discussed in Sec. IV-C. Selection is a special pruning strategy, which deletes all model sequences except the best one(s).
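The two basic reduction operations can be sketched for a scalar mixture whose components are (weight, mean, variance) triples. This is a minimal illustration under stated assumptions, not an algorithm from the cited references: the names `prune` and `merge` and the fixed weight threshold are hypothetical choices, and merging is done by simple moment matching.

```python
def prune(components, threshold):
    """Hard decision: drop components whose weight is below threshold, then renormalize.

    Assumes at least one component survives the threshold.
    """
    kept = [c for c in components if c[0] >= threshold]
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, v) for w, m, v in kept]


def merge(components):
    """Soft decision: replace several components by one with matched weight, mean, variance."""
    w = sum(c[0] for c in components)
    mean = sum(c[0] * c[1] for c in components) / w
    var = sum(c[0] * (c[2] + (c[1] - mean) ** 2) for c in components) / w
    return (w, mean, var)


mixture = [(0.50, 0.0, 1.0), (0.45, 2.0, 1.0), (0.05, 10.0, 1.0)]
print(prune(mixture, 0.1))   # the unlikely third component is deleted
print(merge(mixture[:2]))    # the two "similar" components are combined
```

Pruning discards information outright, whereas merging keeps the first two moments of what it combines; this is the hard-decision versus soft-decision distinction made in the list above.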


Random selection decides on a subset $C$ of all possible model sequences at random, performs the corresponding conditional filtering operations, and then generates the overall estimate from the conditional estimates. Clearly, this approach can be viewed as a special, albeit extreme, pruning strategy. It has a straightforward batch implementation and several possible recursive implementations. For example, one natural recursive implementation is to select a set of model histories $m^{k-1}_{*}$ at random for each elemental filter at time $k$, where $m^{k-1}_{*} = (m^{k-2}_{*}, m^{(i)}_{k-1})$ is formed from some model history $m^{k-2}_{*}$ selected before and some model $m^{(i)}$ at time $k-1$. Another one is to select a set of model sequences $m^{k}_{*} = (m^{k-1}_{*}, m^{(i)}_{k})$ at random at time $k$. Note that the first implementation runs elemental filters based on every model at $k$, but this is not generally the case in the second implementation. An early publication of this random selection approach is [2]. More recently developed MM particle filtering algorithms also belong to this class. These results are surveyed in a subsequent part of this survey.

Another class is iterative strategies. Disengaged from the tree (mixture) structure of the distribution, they try to solve the estimation problem with recourse to the power of iteration. More developed in this class are those based on the so-called EM algorithm, an elegant and powerful optimization method particularly suitable for ML and MAP estimation.

All results in Sec. V-B depend on A1’ and A2, but we drop explicit indications of this dependence for simplicity.

1) GPB and IMM Merging Strategies:

In these strategies, the ever growing hypothesis tree is approximated repeatedly by a simpler tree, each branch of which lumps “close” or “similar” branches of the original tree. In the following discussion, for simplicity we omit formulas for MSE matrices (error covariances).

GPB. A straightforward and probably the most natural implementation of this idea is the so-called Generalized Pseudo-Bayesian algorithms of order $n$ (GPBn) [148], [316], [18]. They reduce the hypothesis tree by having a fixed memory depth such that all the hypotheses that are the same in the latest $n$ time steps are merged, and thus each of the $M$ filters runs $M^{n-1}$ times at each recursion. The GPB1 and GPB2 algorithms are the most popular ones in this class [1], [148], [74], [316], [18]. Although for simplicity our discussion below is based on the GPB2 algorithm explicitly, it can be extended to the general GPBn case straightforwardly. Here the effects of different estimates $\hat{x}^{(i^k)}_{k|k} = E[x_k | z^k, m^k_{(i^k)}]$ with probabilities $\mu^k_{(i^k)} = P\{m^k_{(i^k)} | z^k\}$ based on the same model pair $m^{(i)}$ and $m^{(j)}$ at $k-1$ and $k$, respectively, are lumped (merged) into the single estimate

$$\hat{x}^{(i,j)}_{k|k} = E[x_k | z^k, m^{(i)}_{k-1}, m^{(j)}_k] \quad \text{with probability} \quad \mu^{(i,j)}_{k-1,k} = P\{m^{(i)}_{k-1}, m^{(j)}_k | z^k\}$$

Since $m^k_{(i^k)} = (m^{k-2}_{(i^{k-2})}, m^{(i)}_{k-1}, m^{(j)}_k)$, where $m^{k-2}_{(i^{k-2})} = (m^{(i_1)}_1, m^{(i_2)}_2, \ldots, m^{(i_{k-2})}_{k-2})$ is the mode history through time $k-2$, it follows from the total probability (expectation) theorem that $\hat{x}^{(i,j)}_{k|k}$ and $\mu^{(i,j)}_{k-1,k}$ are actually averages of $\hat{x}^{(i^{k-2},i,j)}_{k|k} = E[x_k | z^k, m^{k-2}_{(i^{k-2})}, m^{(i)}_{k-1}, m^{(j)}_k]$ and $\mu^k_{(i^{k-2},i,j)} = P\{m^{k-2}_{(i^{k-2})}, m^{(i)}_{k-1}, m^{(j)}_k | z^k\}$ over $m^{k-2}_{(i^{k-2})}$:

$$\hat{x}^{(i,j)}_{k|k} = \sum_{i^{k-2} \in \mathbb{M}^{k-2}} \hat{x}^{(i^{k-2},i,j)}_{k|k}\, P\{m^{k-2}_{(i^{k-2})} | z^k, m^{(i)}_{k-1}, m^{(j)}_k\}, \qquad \mu^{(i,j)}_{k-1,k} = \sum_{i^{k-2} \in \mathbb{M}^{k-2}} \mu^k_{(i^{k-2},i,j)}$$

This lumping or merging is exact for MMSE output processing in that$^{7}$

$$\hat{x}^{\mathrm{MMSE}}_{k|k} = \sum_{i^k \in \mathbb{M}^k} \hat{x}^{(i^k)}_{k|k}\, \mu^k_{(i^k)} = \sum_{(i,j) \in \mathbb{M}^2} \hat{x}^{(i,j)}_{k|k}\, \mu^{(i,j)}_{k-1,k} \tag{25}$$

which stems from the linearity of the conditional expectation. This lumping is, however, approximate for conditional filtering, which is a nonlinear operation in that the output $\hat{x}^{(j)}_{k|k}$ is nonlinear in the input $\hat{x}^{(j)}_{k-1|k-1}$. This is the case even for a linear-Gaussian system with a known mode sequence, where only one term of $\hat{x}^{(j)}_{k|k}$ is linear in $\hat{x}^{(j)}_{k-1|k-1}$ and the other term depends on the measurement at $k$. For recursive conditional filtering at $k$, this lumping approximates the recursion

$$\left\{ \hat{x}^{(i^{k-1})}_{k-1|k-1},\ \mu^{k-1}_{(i^{k-1})} : i^{k-1} \in \mathbb{M}^{k-1} \right\} \rightarrow \left\{ \hat{x}^{(i^k)}_{k|k},\ \mu^k_{(i^k)} : i^k \in \mathbb{M}^k \right\}$$

which consists of $M^k$ conditional filtering operations, by the simplified recursion

$$\left\{ \hat{x}^{(i,j)}_{k-1|k-1},\ \mu^{(i,j)}_{k-2,k-1} : (i,j) \in \mathbb{M}^2 \right\} \rightarrow \left\{ \hat{x}^{(i,j)}_{k|k},\ \mu^{(i,j)}_{k-1,k} : (i,j) \in \mathbb{M}^2 \right\}$$

which has only $M^2$ conditional filtering operations, $M$ for each model. In simple words, the lumping amounts to assuming

$$\hat{x}^{(i^{k-3},i,j)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{k-3}_{(i^{k-3})}, m^{(i)}_{k-2}, m^{(j)}_{k-1}] \approx \hat{x}^{(i,j)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_{k-2}, m^{(j)}_{k-1}] \tag{26}$$

which can be called the GPB2’s fundamental assumption. The conditional estimates of a jump-linear system (4)–(5) are given explicitly by

$$\hat{x}^{(i,j)}_{k|k} = E[x_k | z^k, m^{(i)}_{k-1}, m^{(j)}_k] = F^{(j)}_{k-1} \hat{x}^{(i)}_{k-1|k-1} + K^{(j)}_k \left( z_k - H^{(j)}_k F^{(j)}_{k-1} \hat{x}^{(i)}_{k-1|k-1} \right), \quad (i,j) \in \mathbb{M}^2$$

where $K^{(j)}_k$ is the Kalman filter gain at $k$, and $\hat{x}^{(i)}_{k-1|k-1}$ is a lumped estimate:

$$\hat{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_{k-1}] = \sum_{l \in \mathbb{M}} \hat{x}^{(l,i)}_{k-1|k-1}\, P\{m^{(l)}_{k-2} | z^{k-1}, m^{(i)}_{k-1}\}, \quad i \in \mathbb{M} \tag{27}$$

$^{7}$ This is the same as replacing bodies of distributed mass by their total masses placed at their respective centroids for the calculation of the centroid of the total body system.
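The lumping in (27) and the exactness of merging for the overall MMSE estimate can be checked numerically in a scalar toy case. In this sketch, which is only illustrative, `x_pair[l][i]` and `mu_pair[l][i]` are hypothetical containers for the pair-conditioned estimates and the joint probabilities of model $l$ at $k-2$ and model $i$ at $k-1$; the function name `gpb2_lump` is also an assumption.

```python
def gpb2_lump(x_pair, mu_pair):
    """Lump pair-conditioned scalar estimates into model-conditioned ones, as in (27).

    x_pair[l][i]  : estimate given model l at k-2 and model i at k-1
    mu_pair[l][i] : joint posterior probability of that model pair
    Returns (x_lumped, mu_marginal), both indexed by i.
    """
    num_models = len(x_pair)
    x_lumped, mu_marginal = [], []
    for i in range(num_models):
        # Marginal probability of model i at k-1.
        mu_i = sum(mu_pair[l][i] for l in range(num_models))
        # Weights P{m_{k-2} = l | z^{k-1}, m_{k-1} = i} = mu_pair[l][i] / mu_i.
        x_i = sum(x_pair[l][i] * mu_pair[l][i] / mu_i for l in range(num_models))
        x_lumped.append(x_i)
        mu_marginal.append(mu_i)
    return x_lumped, mu_marginal


x_pair = [[1.0, 2.0], [3.0, 4.0]]
mu_pair = [[0.1, 0.2], [0.3, 0.4]]
x_lumped, mu_marginal = gpb2_lump(x_pair, mu_pair)
overall_pairs = sum(x_pair[l][i] * mu_pair[l][i] for l in range(2) for i in range(2))
overall_lumped = sum(x * mu for x, mu in zip(x_lumped, mu_marginal))
print(overall_pairs, overall_lumped)  # equal up to rounding: merging is exact for the mean
```

The overall mean is preserved, in the spirit of the centroid analogy of footnote 7; what is lost is the distinction between histories when the lumped estimates are fed to the next conditional filtering step.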

This GPB2 approximation can also be applied to CMM estimation based on non-MMSE criteria. For example,

$$f^{(i^{k-3},i,j)}(x_{k-1} | z^{k-1}) = f(x_{k-1} | z^{k-1}, m^{k-3}_{(i^{k-3})}, m^{(i)}_{k-2}, m^{(j)}_{k-1}) \approx f^{(i,j)}(x_{k-1} | z^{k-1}) = f(x_{k-1} | z^{k-1}, m^{(i)}_{k-2}, m^{(j)}_{k-1})$$

Clearly, this approximation implies (26) but is not implied by (26) and thus is actually more fundamental than (26). Similarly to the above, $f^{(i,j)}(x_{k-1} | z^{k-1})$ is actually the probabilistically weighted average of $f^{(i^{k-3},i,j)}(x_{k-1} | z^{k-1})$ over $m^{k-3}_{(i^{k-3})}$:

$$f^{(i,j)}(x_{k-1} | z^{k-1}) = \sum_{i^{k-3} \in \mathbb{M}^{k-3}} f^{(i^{k-3},i,j)}(x_{k-1} | z^{k-1})\, P\{m^{k-3}_{(i^{k-3})} | z^{k-1}, m^{(i)}_{k-2}, m^{(j)}_{k-1}\}$$

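The (MMSE-based) GPB1 algorithm [1] is based on approximately lumping (merging) the effects of all past model histories on $\hat{x}^{(i^{k-2},i)}_{k-1|k-1}$ with $\mu^{k-1}_{(i^{k-2},i)}$ to yield:

$$\hat{x}^{(i^{k-2},i)}_{k-1|k-1} \approx \hat{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_{k-1}], \quad i \in \mathbb{M} \tag{28}$$

$$\mu^{(i)}_{k-1} = P\{m^{(i)}_{k-1} | z^{k-1}\} = \sum_{i^{k-2} \in \mathbb{M}^{k-2}} \mu^{k-1}_{(i^{k-2},i)}, \quad i \in \mathbb{M} \tag{29}$$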

which requires only $M$ conditional filtering operations, one for each model, at each recursion.

The standard GPBn strategy requires storage of $M^{n-1}$ estimates $\hat{x}^{(i_{k-n},\ldots,i_{k-1})}_{k-1|k-1}$ and $M^n$ conditional filtering operations. [334], [342] proposed to trade computation for storage by storing only the $nM$ estimates $\{\hat{x}^{(i)}_{k-n|k-n}, \ldots, \hat{x}^{(i)}_{k-1|k-1},\ i \in \mathbb{M}\}$, together with $\{z_{k-n}, z_{k-n+1}, \ldots, z_{k-1}\}$, and recomputing all $\hat{x}^{(i_{k-n},\ldots,i_{k-1})}_{k-1|k-1}$, which requires on the order of $(M^2 + M^3 + \cdots + M^n)$ conditional filtering operations.

Reinitialization. Hypothesis reduction for recursive single-scan CMM estimation amounts to reinitialization of each elemental filter since it is reflected in the inputs to elemental filters at each recursion [192]. Consider the recursion at time $k$ and denote by $X^{(i)}_{k-1}$ the set of input quantities (omitting MSE matrices) to the elemental filter based on model $m^{(i)}$, as depicted in Fig. 4. Then,

$$X^{(i)}_{k-1} = \begin{cases}
\left\{ \hat{x}^{(i^{k-1})}_{k-1|k-1},\ \mu^{k-1}_{(i^{k-1})} : i^{k-1} \in \mathbb{M}^{k-1} \right\} & \text{MMSE-CMM} \\
\left\{ f^{(i^{k-1})}(x_{k-1} | z^{k-1}),\ \mu^{k-1}_{(i^{k-1})} : i^{k-1} \in \mathbb{M}^{k-1} \right\} & \text{MAP-CMM} \\
\left\{ \hat{x}^{(i_{k-n},\ldots,i_{k-1})}_{k-1|k-1},\ \mu^{(i_{k-n},\ldots,i_{k-1})}_{k-n,\ldots,k-1} : (i_{k-n}, \ldots, i_{k-1}) \in \mathbb{M}^{n-1} \right\} & \text{GPBn} \\
\left\{ \hat{x}^{(j)}_{k-1|k-1},\ \mu^{(j)}_{k-1},\ j \in \mathbb{M} \right\} & \text{GPB2}
\end{cases}$$

For GPB1 and MMSE-AMM (first generation), $X^{(i)}_{k-1}$ has only one element $\bar{x}^{(i)}_{k-1|k-1}$, given by

$$\bar{x}^{(i)}_{k-1|k-1} = \begin{cases}
\hat{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{k-1}_{(i)}] & \text{MMSE-AMM} \\
\hat{x}_{k-1|k-1} = E[x_{k-1} | z^{k-1}] = \sum_{j \in \mathbb{M}} \hat{x}^{(j)}_{k-1|k-1}\, P\{m^{(j)}_{k-1} | z^{k-1}\} & \text{GPB1}
\end{cases}$$

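IMM. A significantly more cost-effective reinitialization than that of the GPB1 is

$$\bar{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_k] = E\left\{ E[x_{k-1} | m^{(j)}_{k-1}, m^{(i)}_k, z^{k-1}] \,\big|\, z^{k-1}, m^{(i)}_k \right\} = \sum_{j \in \mathbb{M}} \hat{x}^{(j)}_{k-1|k-1}\, P\{m^{(j)}_{k-1} | z^{k-1}, m^{(i)}_k\} \tag{30}$$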

This leads to the Interacting Multiple-Model (IMM) algorithm [56], [55], [58]. Like the GPB1 algorithm, the IMM algorithm also runs each of the $M$ filters only once at each recursion. Compared with the GPB1 reinitialization $\bar{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}]$, the extra conditioning on $m^{(i)}_k$ in the IMM reinitialization $\bar{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_k]$ is both legitimate and effective [190], [192]. It is legitimate because $m^{(i)}_k$ is assumed true anyway when calculating $\hat{x}^{(i)}_{k|k}$ in the conditional filtering; it is effective because $m^{(i)}_k$ carries valuable information about $m_{k-1}$ since the mode sequence is dependent, which in turn affects $x_{k-1}$.

The reinitialization in the GPB1 and IMM algorithms is illustrated in Fig. 5 and explained as follows [192]. The GPB1 algorithm reinitializes each filter with the “best possible” common single quasi-sufficient statistic, namely the previous overall estimate; the elemental filters interact with one another only through the use of this common input at each recursion, which carries information from all elemental filters. In the IMM algorithm, each filter $i$ at time $k$ has its own, individualized reinitialization $\bar{x}^{(i)}_{k-1|k-1}$, which forms the “best possible” quasi-sufficient statistic of all old information and the knowledge/assumption that model $m^{(i)}$ matches the system mode at $k$. Such individualized reinitialization is clearly intuitively more appealing than the GPB1’s common reinitialization. The superiority of the IMM to the GPB1 stems from this smart extra conditioning or individualized reinitialization, known as mixing, as evidenced by numerous applications detailed later in Sec. V-E. Note that GPB1 uses a single merging operation for both output and conditional filtering, whereas IMM and GPB2 both use two separate ones.

[Figure 4: block diagram in which the measurement $z_k$ and the reinitialized inputs $X^{(1)}_{k-1}, X^{(2)}_{k-1}, \ldots, X^{(M)}_{k-1}$ feed the Model 1, Model 2, ..., Model M based filters; their outputs $\hat{x}^{(1)}_{k|k}, \ldots, \hat{x}^{(M)}_{k|k}$ go to the estimate fusion block (output $\hat{x}_{k|k}$) and to the filter reinitialization block.]

Fig. 4. Structure of single-scan recursive MM estimation algorithms.

[Figure 5: two diagrams over Models 1 to 3 and the estimates at times 1|1, 2|2, 3|3: (a) GPB1, (b) IMM.]

Fig. 5. GPB1 and IMM reinitializations.

Note that the IMM reinitialization (30) differs from (27) of the GPB2 algorithm only in the time at which the model is assumed to be true. In this sense, the IMM algorithm does the mixing at a better time (before conditional filtering) than the GPB2 algorithm (after conditional filtering) [58]. For the case where each model-based system dynamics is linear, state-prediction mixing,$^{8}$ with $\hat{x}^{(j,i)}_{k|k-1} = E[x_k | m^{(j)}_{k-1}, z^{k-1}, m^{(i)}_k]$,

$$\hat{x}^{(i)}_{k|k-1} = E[x_k | z^{k-1}, m^{(i)}_k] = E\left\{ E[x_k | m^{(j)}_{k-1}, z^{k-1}, m^{(i)}_k] \,\big|\, z^{k-1}, m^{(i)}_k \right\} = \sum_{j \in \mathbb{M}} \hat{x}^{(j,i)}_{k|k-1}\, P\{m^{(j)}_{k-1} | z^{k-1}, m^{(i)}_k\}$$

is equivalent to the IMM reinitialization (30). If the model-based system dynamics is nonlinear, it is in general more accurate than reinitialization (30) but computationally less efficient because it requires $M$ predictions in each conditional filtering operation, rather than a single one if (30) is used.
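The equivalence in the linear case can be seen directly. The following is a sketch of the argument, assuming the linear model-$i$ prediction with process-noise mean $\bar{w}^{(i)}_{k-1}$ used in Table II and writing $\mu^{j|i}_{k-1} = P\{m^{(j)}_{k-1} | z^{k-1}, m^{(i)}_k\}$ for the mixing weights, which sum to one over $j$:

```latex
\begin{align*}
\hat{x}^{(j,i)}_{k|k-1}
  &= F^{(i)}_{k-1}\, \hat{x}^{(j)}_{k-1|k-1} + G^{(i)}_{k-1}\, \bar{w}^{(i)}_{k-1}, \\
\sum_{j} \hat{x}^{(j,i)}_{k|k-1}\, \mu^{j|i}_{k-1}
  &= F^{(i)}_{k-1} \Big( \sum_{j} \hat{x}^{(j)}_{k-1|k-1}\, \mu^{j|i}_{k-1} \Big)
     + G^{(i)}_{k-1}\, \bar{w}^{(i)}_{k-1}
   = F^{(i)}_{k-1}\, \bar{x}^{(i)}_{k-1|k-1} + G^{(i)}_{k-1}\, \bar{w}^{(i)}_{k-1} .
\end{align*}
```

That is, mixing the $M$ predictions gives the same predicted state as predicting once from the mixed estimate of (30). With a nonlinear prediction function the sum and the function no longer commute, which is where the two schemes part ways.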

The architecture of the IMM algorithm is illustrated in Fig. 6 with three models. A complete recursion of the IMM algorithm with Kalman filters as its elemental filters is summarized in Table II for the Markov jump linear system (4)–(5) with white Gaussian process and measurement noises. A straightforward implementation may have numerical problems. A numerically robust version of the IMM algorithm (as well as the AMM algorithm) was presented in [214].
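The four steps of Table II can be sketched compactly for scalar models. This is a minimal illustration under simplifying assumptions that are not part of the survey: scalar state and measurement, zero-mean noises, time-invariant model parameters held in hypothetical dictionaries with keys `F`, `Q`, `H`, `R` (with the noise gain absorbed into `Q`), strictly positive predicted mode probabilities, and none of the numerical safeguards of [214].

```python
import math


def imm_cycle(x, P, mu, Pi, models, z):
    """One IMM recursion (k-1 -> k) for scalar models.

    x, P, mu : per-model estimates, variances, mode probabilities at k-1
    Pi[j][i] : transition probability from model j to model i
    models   : list of dicts with scalar keys F, Q, H, R (zero-mean noises assumed)
    z        : scalar measurement at k
    """
    M = len(models)
    # 1. Model-conditioned reinitialization (mixing).
    mu_pred = [sum(Pi[j][i] * mu[j] for j in range(M)) for i in range(M)]
    x_mix, P_mix = [], []
    for i in range(M):
        w = [Pi[j][i] * mu[j] / mu_pred[i] for j in range(M)]  # mixing weights
        xb = sum(w[j] * x[j] for j in range(M))
        Pb = sum(w[j] * (P[j] + (xb - x[j]) ** 2) for j in range(M))
        x_mix.append(xb)
        P_mix.append(Pb)
    # 2. Model-conditioned filtering (one Kalman filter per model).
    x_new, P_new, lik = [], [], []
    for i, m in enumerate(models):
        xp = m["F"] * x_mix[i]                      # predicted state
        Pp = m["F"] * P_mix[i] * m["F"] + m["Q"]    # predicted variance
        r = z - m["H"] * xp                         # measurement residual
        S = m["H"] * Pp * m["H"] + m["R"]           # residual variance
        K = Pp * m["H"] / S                         # filter gain
        x_new.append(xp + K * r)
        P_new.append(Pp - K * S * K)
        lik.append(math.exp(-0.5 * r * r / S) / math.sqrt(2.0 * math.pi * S))
    # 3. Mode probability update.
    c = sum(mu_pred[i] * lik[i] for i in range(M))
    mu_new = [mu_pred[i] * lik[i] / c for i in range(M)]
    # 4. Estimate fusion (output only; not fed back into the filters).
    x_out = sum(mu_new[i] * x_new[i] for i in range(M))
    P_out = sum(mu_new[i] * (P_new[i] + (x_out - x_new[i]) ** 2) for i in range(M))
    return x_new, P_new, mu_new, x_out, P_out


models = [{"F": 1.0, "Q": 0.01, "H": 1.0, "R": 1.0},
          {"F": 1.0, "Q": 1.00, "H": 1.0, "R": 1.0}]
Pi = [[0.95, 0.05], [0.05, 0.95]]
print(imm_cycle([0.0, 0.0], [1.0, 1.0], [0.5, 0.5], Pi, models, z=3.0))
```

Note that the fused output of step 4 is not used in the next recursion; only the per-model estimates, variances, and mode probabilities are carried forward, which is what distinguishes this scheme from the GPB1 reinitialization.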

[Figure 6: block diagram with three model-based filters; each filter $i$ takes $z_k$ and the mixed inputs $\bar{x}^{(i)}_{k-1|k-1}$, $\bar{P}^{(i)}_{k-1|k-1}$ and produces $\hat{x}^{(i)}_{k|k}$, $P^{(i)}_{k|k}$, which are combined into the overall $\hat{x}_{k|k}$, $P_{k|k}$.]

Fig. 6. Structure of the IMM estimation algorithm (with three models).

TABLE II
ONE CYCLE OF THE IMM ESTIMATOR.

1. Model-conditioned reinitialization (for $i = 1, 2, \ldots, M$):
   Predicted mode probability: $\mu^{(i)}_{k|k-1} \triangleq P\{m^{(i)}_k | z^{k-1}\} = \sum_j \pi_{ji}\, \mu^{(j)}_{k-1}$
   Mixing weight: $\mu^{j|i}_{k-1} \triangleq P\{m^{(j)}_{k-1} | m^{(i)}_k, z^{k-1}\} = \pi_{ji}\, \mu^{(j)}_{k-1} / \mu^{(i)}_{k|k-1}$
   Mixing estimate: $\bar{x}^{(i)}_{k-1|k-1} \triangleq E[x_{k-1} | m^{(i)}_k, z^{k-1}] = \sum_j \hat{x}^{(j)}_{k-1|k-1}\, \mu^{j|i}_{k-1}$
   Mixing covariance: $\bar{P}^{(i)}_{k-1|k-1} = \sum_j [P^{(j)}_{k-1|k-1} + (\bar{x}^{(i)}_{k-1|k-1} - \hat{x}^{(j)}_{k-1|k-1})(\bar{x}^{(i)}_{k-1|k-1} - \hat{x}^{(j)}_{k-1|k-1})']\, \mu^{j|i}_{k-1}$

2. Model-conditioned filtering (for $i = 1, 2, \ldots, M$):
   Predicted state: $\hat{x}^{(i)}_{k|k-1} = F^{(i)}_{k-1} \bar{x}^{(i)}_{k-1|k-1} + G^{(i)}_{k-1} \bar{w}^{(i)}_{k-1}$
   Predicted covariance: $P^{(i)}_{k|k-1} = F^{(i)}_{k-1} \bar{P}^{(i)}_{k-1|k-1} (F^{(i)}_{k-1})' + G^{(i)}_{k-1} Q^{(i)}_{k-1} (G^{(i)}_{k-1})'$
   Measurement residual: $\tilde{z}^{(i)}_k = z_k - H^{(i)}_k \hat{x}^{(i)}_{k|k-1} - \bar{v}^{(i)}_k$
   Residual covariance: $S^{(i)}_k = H^{(i)}_k P^{(i)}_{k|k-1} (H^{(i)}_k)' + R^{(i)}_k$
   Filter gain: $K^{(i)}_k = P^{(i)}_{k|k-1} (H^{(i)}_k)' (S^{(i)}_k)^{-1}$
   Updated state: $\hat{x}^{(i)}_{k|k} = \hat{x}^{(i)}_{k|k-1} + K^{(i)}_k \tilde{z}^{(i)}_k$
   Updated covariance: $P^{(i)}_{k|k} = P^{(i)}_{k|k-1} - K^{(i)}_k S^{(i)}_k (K^{(i)}_k)'$

3. Mode probability update (for $i = 1, 2, \ldots, M$):
   Model likelihood: $L^{(i)}_k \triangleq p[\tilde{z}^{(i)}_k | m^{(i)}_k, z^{k-1}] \overset{\text{assume}}{=} \exp[-(1/2)(\tilde{z}^{(i)}_k)' (S^{(i)}_k)^{-1} \tilde{z}^{(i)}_k] \,/\, |2\pi S^{(i)}_k|^{1/2}$
   Mode probability: $\mu^{(i)}_k = \mu^{(i)}_{k|k-1} L^{(i)}_k \,/\, \sum_j \mu^{(j)}_{k|k-1} L^{(j)}_k$

4. Estimate fusion:
   Overall estimate: $\hat{x}_{k|k} = \sum_i \hat{x}^{(i)}_{k|k}\, \mu^{(i)}_k$
   Overall covariance: $P_{k|k} = \sum_i [P^{(i)}_{k|k} + (\hat{x}_{k|k} - \hat{x}^{(i)}_{k|k})(\hat{x}_{k|k} - \hat{x}^{(i)}_{k|k})']\, \mu^{(i)}_k$

$^{8}$ The benefit of the extra conditioning is more evident for state-prediction mixing than for reinitialization.

IMM with semi-Markov models. As explained in Sec. III.I of Part I [209], a semi-Markov process model has greater modeling power and suits more real-world problems better than a Markov model. The evolution of a semi-Markov process can be visualized as follows (Fig. 7). From any mode $m^{(i)}$, the next mode $m^{(j)}$ to take place is chosen at random according to transition probability $\pi_{ij}$, and the time between $m^{(i)}$ and $m^{(j)}$ (i.e., the sojourn time $\tau^{(i)}$ in mode $m^{(i)}$) is chosen at random according to the sojourn-time pdf $f_{ij}(\tau)$. A class of semi-Markov models, highly relevant to MM tracking, is the sojourn-time dependent Markov (STDM) process [255], [69]. Here the process representing the modal state is characterized by the sojourn-time dependent transition probabilities, defined as

$$\pi_{ij}(\tau) = P\{ s_k = m^{(j)} | s_{k-1} = \cdots = s_{k-\tau} = m^{(i)} \neq s_{k-\tau-1} \} = P\{ m^{(j)}_k | m^{(i)}_{k-1}, \tau^{(i)}_{k-1} = \tau \}$$

where $\tau^{(i)}_{k-1}$ is the time already stayed in mode $m^{(i)}$ at time $k-1$.

An extension of the IMM configuration to the case of an STDM process was proposed in [69]. It operates in the same manner as the standard IMM algorithm except that for the recursive cycle $(k-1 \rightarrow k)$ each transition probability $\pi_{ij}$ is replaced by the average (expected value) of $\pi_{ij}(\tau)$ over all possible $\tau$ values:

$$\pi_{ij}(k-1) \triangleq P\{m^{(j)}_k | m^{(i)}_{k-1}, z^{k-1}\} = \sum_{\tau=1}^{k-1} \pi_{ij}(\tau)\, p^{(i)}_{k-1}(\tau) = E[\pi_{ij}(\tau) | m^{(i)}_{k-1}, z^{k-1}]$$


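[Figure 7: sample path of the mode $s$ versus time $t$, taking the values $m^{(0)}$, $m^{(1)}$, $m^{(2)}$, with the time axis marked at 0, $\tau_0$, $\tau_1$, $\tau_2$, $\tau_3$.]

Fig. 7. A semi-Markov process.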

An “exact” recursion for $p^{(i)}_k(\tau) = P\{\tau^{(i)}_k = \tau | m^{(i)}_k, z^k\}$ was derived in [256]. Later [279] showed that this recursion is actually approximate, and the IMM reinitialization $\bar{x}^{(i)}_{k-1|k-1} = E[x_{k-1} | z^{k-1}, m^{(i)}_k]$ loses its magic in the case of an STDM process and turns out to be similar to that of GPB2:

$$E[x_{k-1} | z^{k-1}, m^{(i)}_k] = E\left\{ E[x_{k-1} | m^{(j)}_{k-1}, m^{(i)}_k, z^{k-1}] \,\big|\, z^{k-1}, m^{(i)}_k \right\} = \sum_j \hat{x}^{(j,i)}_{k-1|k-1}\, P\{m^{(j)}_{k-1} | z^{k-1}, m^{(i)}_k\}$$

Use of the standard IMM reinitialization (30) here essentially amounts to ignoring the sojourn-time dependence because $\hat{x}^{(j,i)}_{k-1|k-1} \triangleq E[x_{k-1} | m^{(j)}_{k-1}, m^{(i)}_k, z^{k-1}] = E[x_{k-1} | m^{(j)}_{k-1}, z^{k-1}] \triangleq \hat{x}^{(j)}_{k-1|k-1}$ holds only if the Markov chain is not sojourn-time dependent.

From a practical point of view, the STDM model appears rather complicated to design, mainly because the required knowledge of the sojourn-time dependent transition probabilities is hard to come by. The problem can be simplified slightly by considering a narrower class, homogeneous semi-Markov processes [332], [238], specified by an embedded Markov chain with given initial mode probabilities, transition probabilities, and sojourn-time pmfs of each mode $m^{(i)}$, defined as

$$\mu^{(i)}_k(\tau) = P\{ s_k \neq m^{(i)} | s_{k-1} = \cdots = s_{k-\tau} = m^{(i)} \neq s_{k-\tau-1} \} = P\{ s_k \neq m^{(i)} | m^{(i)}_{k-1}, \tau^{(i)}_{k-1} = \tau \}$$

That is, $\mu^{(i)}_k(\tau)$ is the probability that a jump occurs at time $k$ given that the last jump was at time $k - \tau$ to mode $m^{(i)}$. It is time invariant and $\sum_{\tau=1}^{\infty} \mu^{(i)}_k(\tau) = 1$ due to homogeneity of the process. In this case, the STDM transition probabilities can be determined by $\pi_{ij}(\tau) = \mu^{(i)}_k(\tau)\, \pi_{ij}$ [332], [238], which can serve as a useful guideline to build an STDM process model.

2) Other Merging-Based Algorithms:

Bayesian filter bank. Generally speaking, the mode transition may be base-state dependent and thus neither Markov nor semi-Markov. [291] proposed a general Bayesian density-based scheme for hybrid estimation with nonlinear measurements and such non-Markovian mode jumps. The scheme is linear in the number of models, and a computational implementation based on Gaussian sum approximation techniques was proposed. For the particular case of a homogeneous Markov jump linear system this scheme can be used for point estimation and can be implemented via standard techniques of merging close components and/or pruning unlikely components. A more detailed description will be given in a subsequent part of this survey. A systematic treatment of mixture component reduction techniques in a more general setting can be found in [294], [295].

Change of measure based. By change of measure, a measure-theoretic technique based on the Radon-Nikodym theorem, [321], [320], [322] developed a Gaussian wavelet estimator (GWE)$^{9}$ based on hypothesis merging to limit the growth of the Gaussian mixture. Its merging strategy very much resembles the GPB2 strategy.$^{10}$ However, it differs fundamentally from the IMM (and GPB) algorithms in computing the mode probabilities. It accounts for the effect of the measurement residuals $\tilde{z}^{(i)}_k$ on the state estimate update directly in the state space:

$$P\{m^{(i)}_{k-1}, m^{(j)}_k \mid z^k\} \propto \exp\Big[\tfrac{1}{2}\Big(\big\|\hat{x}^{(i,j)}_{k|k}\big\|^2_{(P^{(i,j)}_{k|k})^{-1}} - \big\|\hat{x}^{(i,j)}_{k|k-1}\big\|^2_{(P^{(i,j)}_{k|k-1})^{-1}}\Big)\Big]$$

This formula gives more weight to the models that have a larger normalized change in the magnitude of the updated mean, unlike the conventional, intuitively appealing IMM and GPB1 formulas, which give more weight to those with a smaller normalized measurement residual: $P\{m^{(i)}_k \mid z^k\} \propto \exp\big(-\tfrac{1}{2}\|\tilde{z}^{(i)}_k\|^2_{(S^{(i)}_k)^{-1}}\big)$. [323] included a comparative study between GWE and the IMM algorithm for scenarios with different data rates. While for high rates the differences were tiny, for low rates the GWE showed significant improvement.

More generally, an exact estimator with all components of the mixture was presented in [109] by change of measure, along with simulation results illustrating its improvement over the IMM algorithm. Recently, [108] presented simulation results of an approximate implementation of a fixed complexity by “exact pruning” (without explanation), showing superior performance to the IMM algorithm for a passive tracking example.

$^{9}$ A Gaussian wavelet is simply a Gaussian mixture, where the mother wavelet is simply the Gaussian distribution function.

$^{10}$ In [321], [320], [322], dynamic modes and measurement models are denoted by different indices, resulting in triple indices.


Enhancement by mode observations. Tracking performance can be improved by using additional observations of target features, such as target attitude and image, provided by, e.g., an imaging sensor. These observations are related more directly to the target motion mode than the kinematic measurements. Many of them are actually used in target recognition and feature-aided tracking. As such, incorporation of such observations in tracking is closely related to joint tracking and recognition. Issues in modeling and utilization of such information for target tracking have been studied extensively (see, e.g., [163], [188], [8], [324], [325], [93]). Many of them are beyond the scope of this paper and their coverage is planned in a subsequent part. We present here only a brief review of recent results using mode observations within the MM context.

Loosely speaking, a modal sensor can be modeled as a classifier over a set $\mathbb{M}$ [322] (possibly augmented by a “no-decision” event [359], [360], [112]). Let $y_k$ and $y^k$ be the mode observations at and through time $k$, respectively, along with the kinematic measurements $z^k$. It follows from straightforward Bayesian calculus [110], [112] that for the Markov jump-linear system (1)–(2), the joint posterior density $p(x_k, s_k \mid z^k, y^k)$ of the hybrid state is again a mixture density with an exponentially increasing number of components, similar to the case with kinematic measurements only. This fact was established earlier (see [109], [321]) in terms of unnormalized$^{11}$ joint posterior densities $q(x_k, s_k \mid z^k, y^k)$ by change of measure in discrete time [107]. Main efforts thereafter have been focused on developing tractable approximate estimators with fixed computation/memory.$^{12}$

Based on the optimal Bayesian mixture-density representation, [111], [110], [112] proposed an extension of the IMM algorithm, referred to as image-enhanced IMM (IE-IMM), derived by IMM-like merging. It consists of all the familiar IMM steps plus an additional step to update the mode probabilities with the mode (image) observations as$^{13}$

$$P\{m^{(i)}_k \mid z^k, y^k\} = \frac{1}{c}\,p(s_k \mid m^{(i)}_k, z^k, y^{k-1})\,P\{m^{(i)}_k \mid z^k, y^{k-1}\} \qquad (31)$$

where the likelihood $p(s_k \mid m^{(i)}_k, z^k, y^{k-1})$ is provided by the model of the modal sensor and $P\{m^{(i)}_k \mid z^k, y^{k-1}\}$ comes from the standard IMM part after the update with the kinematic measurement $z_k$. The IE-IMM design for tracking included a constant-velocity (CV) and two CT models with known turn rates. Simulation results demonstrated that mode observations indeed enhance the IMM’s performance significantly in the case of a high quality modal sensor, but the enhancement diminishes as the quality of the mode observations becomes poorer.
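The extra IE-IMM step (31) is a single Bayes update of the mode probabilities. The sketch below assumes the modal sensor is summarized by one likelihood value per mode for the observation actually received (for instance a column of a classifier confusion matrix); the function name and the numbers are hypothetical.

```python
def update_with_mode_observation(prior, likelihood):
    """Bayes update of mode probabilities with one mode observation.

    prior[i]      : P{m_k^(i) | z^k, y^(k-1)} from the kinematic IMM update
    likelihood[i] : modal-sensor likelihood of the received observation under mode i
    """
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    c = sum(unnormalized)  # normalization constant c in (31)
    if c <= 0.0:
        raise ValueError("observation has zero likelihood under every mode")
    return [u / c for u in unnormalized]


prior = [0.5, 0.3, 0.2]
print(update_with_mode_observation(prior, [0.8, 0.1, 0.1]))    # informative sensor
print(update_with_mode_observation(prior, [1/3, 1/3, 1/3]))    # uninformative sensor
```

With a flat likelihood the posterior equals the prior, which mirrors the reported loss of benefit as the modal sensor quality degrades.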

As mentioned above, a recursion for the unnormalized joint posterior density $q(x_k, s_k \mid z^k, y^k)$ can be obtained by change of measure. An exact estimator in this class was presented in [109]. Based on the recursion for the unnormalized density, [321], [320], [322] proposed the approximate implementation, Gaussian wavelet estimator (GWE), described above. Again, the entire effect of the mode observations is on the mode probability, also given by (31). The comparative simulation results among IMM, IE-IMM, and GWE algorithms presented in [322] indicated that generally GWE provides an improvement over IE-IMM and IMM, which in some cases (e.g., good imager, poor kinematic data) may become considerable enough (25% over IE-IMM and 50% over IMM) to justify its increase in computation. An exact hybrid filter based on change of measure that accounts for intermittent mode measurements was presented in [3], along with an approximate, GPB-type implementation and EM-based estimation of the transition probability matrix. More details of GWE and image-enhanced MM estimation can be found in [318], [189], [319].

Multirate IMM. [138], [139] proposed to use the IMM strategy to combine a bank of filters, each with a different data rate that is consistent with its assumed target dynamics (e.g., a CA filter would benefit from a higher data rate more than a CV filter). A filter with a lower data rate updates less frequently, and thus compared with the corresponding full-rate IMM algorithm using similar models, this multirate IMM algorithm provides certain computational savings at a cost of somewhat larger peak errors at maneuver onset. The multirate data bank is obtained from the original data by a discrete wavelet transform.

Weighted-model based single filter. To reduce the computational complexity of MM algorithms, [234] proposed an MM based single filter algorithm, where the filter is based on an average model, which is a probabilistically weighted sum of the models used. The weights are updated over time by the recursive formula for hypothesis probabilities in the multihypothesis version of the Shiryayev sequential probability ratio test (i.e., the quickest change-point detector under some conditions) [235] and approximate model likelihoods assuming Gaussianity.

3) Hypothesis Reduction by Pruning:

The basic idea of hypothesis reduction by pruning has been explained in Sec. IV-C under the title “hard decision,” although that subsection is for output processing. More specifically, for hypothesis reduction at time $k$ a “good” subset $\mathbb{B}^k$ of model sequences is identified and maintained while discarding/pruning less likely ones in the set $\mathbb{M}^k$ of all possible model sequences. The number of model sequences in $\mathbb{B}^k$ may or may not change over time.

B best. This approach intends to maintain only a number $B_k$ of the best (i.e., most probable or likely) model sequences at time $k$. This idea has an exact and straightforward batch implementation: select $B_k$ sequences with the largest probabilities (or likelihoods). But recursive implementations are more realistic. Consider a recursion at time $k$ with $B_{k-1}$ “best” model

$^{11}$ Using unnormalized density is advantageous in providing linearity of the fundamental Bayesian recursion [291].

$^{12}$ Fortunately, it turns out that for the special case of mode-observation-only tracking, optimal estimates for $P\{m^{(i)}_k \mid y^k\}$ and $f(x_k \mid y^k)$ can be obtained recursively with a fixed (non-increasing) computational requirement. The corresponding algorithms were proposed in [360] for $P\{m^{(i)}_k \mid y^k\}$ and in [100] for $E[x_k \mid y^k]$. Further results along this line can be found in [176], [177].

$^{13}$ This makes perfect sense since the mode observations are modeled as a classifier, which carries no information about the target’s kinematic state directly.


histories (with associated base-state estimates $\hat{x}^{(i_{k-1})}_{k-1|k-1}$ and mode-sequence probabilities $\mu^{k-1}_{(i_{k-1})}$) and $M$ models (elemental filters). Each elemental filter performs $B_{k-1}$ conditional filtering operations, using one $\hat{x}^{(i_{k-1})}_{k-1|k-1}$ as input in each operation, yielding as many as $B_{k-1}M$ updated estimates $\{\hat{x}^{(i_{k-1},j)}_{k|k}, (i_{k-1}, j) \in \mathbb{B}^{k-1} \times \mathbb{M}\}$ and the corresponding mode-sequence probabilities $\mu^{k}_{(i_{k-1},j)}$. Only $B_k$ of these $B_{k-1}M$ model sequences (and the associated base-state estimates) with the largest $\mu^{k}_{(i_{k-1},j)}$ are retained$^{14}$ (and renormalized) for the next recursion (and output at $k$). The above mode-sequence probabilities $\mu^{k}_{(i_{k-1},j)}$ can be replaced by model-sequence likelihoods. As explained in Sec. IV-C, this recursive implementation does not guarantee that the $B_k$ model sequences are actually the most probable among all possible sequences, because the most probable ones may have less probable partial histories and thus have been discarded at an earlier recursion. Another drawback is that some or many of the $B$ most probable model sequences may be quite similar and had better be merged to save processing resources. This B-best idea is the one underlying many hypothesis reduction techniques, such as the one in the Multiple Hypothesis Tracking (MHT) algorithms (see, e.g., [289], [277], [34]). For maneuvering target tracking, it was first proposed in [333], [129], [333], [331], [332], and extended to Markov jump systems with unknown transition probabilities in [330] and to MM smoothing in [238]. $B_k$ is usually fixed over time, as in these publications, but the approach works for time-varying $B_k$. For instance, $B_k$ can be determined automatically by requiring all survived sequences to have a probability above a threshold, as proposed in [213], [92], [63], or, probably better, to have their ratios of probability to the largest one above a threshold [213].
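One recursion of the B-best idea can be sketched as follows. This is an illustration under stated assumptions, not the implementation of any cited reference: the conditional filters are abstracted into a hypothetical callback `likelihood(seq, j)` that returns the measurement likelihood of extending history `seq` with model `j`, and the mode sequence is assumed Markov with matrix `transition`.

```python
import heapq


def b_best_step(hypotheses, transition, likelihood, B):
    """Expand every retained history by every model, keep the B most probable.

    hypotheses : list of (sequence tuple, probability) pairs from time k-1
    transition : transition[i][j] = probability of jumping from model i to j
    likelihood : callable (sequence, j) -> likelihood of the new measurement
    B          : number of histories to retain
    """
    expanded = []
    for seq, prob in hypotheses:
        i = seq[-1]
        for j in range(len(transition)):
            # Unnormalized probability of the extended model sequence.
            expanded.append((seq + (j,), prob * transition[i][j] * likelihood(seq, j)))
    best = heapq.nlargest(B, expanded, key=lambda h: h[1])
    total = sum(p for _, p in best)
    # Renormalize over the survivors only.
    return [(s, p / total) for s, p in best]


kept = b_best_step(
    [((0,), 0.6), ((1,), 0.4)],
    [[0.9, 0.1], [0.2, 0.8]],
    lambda seq, j: [1.0, 2.0][j],  # toy likelihoods, independent of the history
    B=2,
)
print(kept)
```

Renormalizing over the survivors hides how much probability mass has been discarded, which is the issue taken up in the merging versus pruning discussion.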

[Figure 8: three panels, each showing a three-mode hypothesis trellis over $k = 1, 2, 3, 4$. (a) Hypotheses with link likelihoods. (b) Most likely paths ($k = 3$). (c) Most likely paths ($k = 4$).]

Fig. 8. Viterbi algorithm.

Viterbi algorithm. The Viterbi algorithm or forward dynamic programming (see, e.g., [115]) aims at finding the best path (model sequence) over a time horizon in a recursive manner. Fig. 8 illustrates this procedure for a hypothesis tree of three modes similar to the one in Fig. 3(a). The number next to a link (mode transition) in Fig. 8(a) is its log-likelihood of being the true one; the number next to a node (mode) in Figs. 8(b) and 8(c) is the log-likelihood of reaching the node by the most likely path (sequence of transitions), which is assumed to be the sum of the log-likelihoods of the links on the path; only the most likely paths reaching each node are indicated in Figs. 8(b) and 8(c). Clearly, the most likely paths (the thicker lines) through time $k = 3$ and $k = 4$ are $(1, 2, 2)$ and $(2, 3, 3, 1)$, respectively. Note that at $k = 3$ the path $(2, 3, 3)$ is not most likely, but it must be the most likely one arriving at node 3. This idea has a straightforward implementation for hypothesis reduction [11]. Consider a recursion at time $k$ with $M$ model histories (with associated $\hat{x}^{(i_{k-1})}_{k-1|k-1}$ and $\mu^{k-1}_{(i_{k-1})}$) and $M$ models. Each elemental filter $j$ performs $M$ conditional filtering operations, using one $\hat{x}^{(i_{k-1})}_{k-1|k-1}$ as input in each operation, yielding $M$ updated estimates $\hat{x}^{(i_{k-1},j)}_{k|k}$ and the corresponding mode-sequence probabilities $\mu^{k}_{(i_{k-1},j)}$. Only the sequence with the largest $\mu^{k}_{(i_{k-1},j)}$ for each $j$ is retained for the next recursion (and output at $k$). In this way, the best model history for each model at any time is identified. It always has $M$ survived model histories at any time. Each of them is the best model history for an elemental filter at the time. These $M$ histories include the best one $\hat{m}^{k}_{\mathrm{MAP}}$ but are not necessarily the $M$ best ones because, for example, the second best model history for a model may be better than the best one for another model. These suboptimal histories are needed to guarantee the inclusion of the best sequence for the future.
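The per-node bookkeeping described above can be sketched for the idealized case in which link log-likelihoods are additive and do not depend on how the left node was reached. The data layout `link_scores[t][i][j]` and the example scores are assumptions for illustration; they are not the values of Fig. 8.

```python
def viterbi(link_scores, n_modes):
    """Keep, for every mode, the best-scoring path that ends in it.

    link_scores[t][i][j] : log-likelihood of the link from mode i to mode j
                           between consecutive time steps t and t + 1
    Returns the final node scores and the surviving path for each node.
    """
    score = [0.0] * n_modes                  # all start nodes scored equally
    paths = [[j] for j in range(n_modes)]
    for step in link_scores:
        new_score, new_paths = [], []
        for j in range(n_modes):
            # Best predecessor of node j: previous node score plus link score.
            i_best = max(range(n_modes), key=lambda i: score[i] + step[i][j])
            new_score.append(score[i_best] + step[i_best][j])
            new_paths.append(paths[i_best] + [j])
        score, paths = new_score, new_paths
    return score, paths


scores, paths = viterbi([[[1, 5], [2, 1]], [[1, 1], [4, 0]]], n_modes=2)
best = max(range(2), key=lambda j: scores[j])
print(scores, paths[best])
```

Exactly one path survives per node, so the number of retained histories stays equal to the number of modes.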

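The above recursive procedure intends to find the most probable sequence $\hat{m}^{k}_{\mathrm{MAP}}$, namely, one with the largest probability

$$\mu^{k}_{(i_k)} = P\{m^{k}_{(i_k)} \mid z^k\} = \frac{1}{c}\,f(z_k \mid m^{k}_{(i_k)}, z^{k-1})\,P\{m^{(i_k)}_k \mid m^{k-1}_{(i_{k-1})}, z^{k-1}\}\,P\{m^{k-1}_{(i_{k-1})} \mid z^{k-1}\}$$

$$\ln \mu^{k}_{(i_k)} = \ln \mu^{k-1}_{(i_{k-1})} + \Delta_{k-1,k}, \qquad \Delta_{k-1,k} = \ln f(z_k \mid m^{k}_{(i_k)}, z^{k-1}) + \ln P\{m^{(i_k)}_k \mid m^{k-1}_{(i_{k-1})}, z^{k-1}\} - \ln c \qquad (32)$$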

If the mode transition log-likelihood $\Delta_{k-1,k}$ were independent of the mode history $m^{k-2}_{(i_{k-2})} = (m^{(1)}_1, m^{(2)}_2, \ldots, m^{(i_{k-2})}_{k-2})$, the above Viterbi algorithm would yield the most probable sequence $\hat{m}^{k}_{\mathrm{MAP}}$. However, for a hybrid system $\Delta_{k-1,k}$ actually depends on the mode history $m^{k-2}_{(i_{k-2})}$, which is analogous to the case with Fig. 8(a) in which the number next to a link depends on the

$^{14}$ As such, updated estimates $\hat{x}^{(i_{k-1},j)}_{k|k}$ not among the $B_k$ best sequences actually need not be computed, but the corresponding model-sequence likelihoods are needed to obtain $\mu^{k}_{(i_{k-1},j)}$.


path reaching its left node. Consequently, the above procedure is only suboptimal even if the mode sequence is a Markov chain such that $P\{m^{(i_k)}_k \mid m^{k-1}_{(i_{k-1})}, z^{k-1}\} = P\{m^{(i_k)}_k \mid m^{(i_{k-1})}_{k-1}\} = \pi_{i_{k-1},i_k}$, because in this case $\ln \mu^{k}_{(i_k^*)} = \ln \mu^{k-1}_{(i_{k-1}^*)} + \Delta^*_{k-1,k}$ could be larger than $\ln \mu^{k}_{(i_k)}$ even if $\ln \mu^{k-1}_{(i_{k-1}^*)} < \ln \mu^{k-1}_{(i_{k-1})}$ for some mode history $m^{k-2}_{(i_{k-2}^*)}$ different from $m^{k-2}_{(i_{k-2})}$. This dependence of $\Delta_{k-1,k}$ on $m^{k-2}_{(i_{k-2})}$ is a major (but subtle) difference between the hidden Markov models (HMM) of a hybrid system or for target tracking (see [280] for a tutorial on HMM for tracking) and the more standard HMM found in such applications as speech processing (see [285], [286] for a tutorial), where $f(z_k \mid m^{k}_{(i_k)}, z^{k-1})$ reduces to $p(z_k \mid m^{(i_k)}_k)$ because direct (continuous- or discrete-valued) observations of the mode are available. However, this dependence may be removed in the framework of the EM algorithm [283], discussed later in Sec. V-B.5.

4) Merging vs. Pruning:

Experience indicates that base-state estimators for maneuvering target tracking based on merging strategies are usually superior to those based on pruning. We believe that this performance difference stems from their difference in sensitivity to the mode-model mismatch, similar to our discussion in Sec. V-A that contrasts MMSE estimation and MAP estimation. Simply put, the resultant sequence by merging is not limited to the assumed set of model sequences and in this sense merging is less sensitive than pruning to the mismatch between the assumed models and the true model. On the contrary, pruning appears more intuitively appealing than merging for target tracking in the presence of clutter, where a measurement is better treated as coming either from a target or from clutter, rather than possibly from something in between.

Another major difference between merging and pruning is that the sum of the probabilities of the merged model sequences is always one, while without renormalization that of the survived model sequences of a fixed number after pruning is ever decreasing, usually dramatically, as time goes. For example, for a problem with ten possible models at each time, if after pruning ten model sequences are retained, their sum of probabilities at time $k$ could be as low as $10/10^k = 10^{-(k-1)}$ (assuming a uniform distribution) or possibly even lower because many good sequences could have been deleted earlier in a recursive implementation. Even if the most likely sequences have a probability a million times that of the average probability of a sequence, this sum has an upper bound $10 \times 10^6/10^k = 10^{-(k-7)}$, $\forall k \geq 7$, which still drops dramatically as time goes. There is hardly any reason to expect that the mean or global maximizer of a mixture density of $10^k$ components can be approximated well by one with these 10 components.
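The arithmetic in this example is easy to reproduce exactly. The helper names below are hypothetical; `ratio` stands for the assumed factor by which each retained sequence exceeds the average sequence probability.

```python
from fractions import Fraction


def retained_mass_uniform(num_models, kept, k):
    """Total probability of `kept` sequences if all num_models**k are equally likely."""
    return Fraction(kept, num_models ** k)


def retained_mass_bound(num_models, kept, ratio, k):
    """Upper bound when each kept sequence is `ratio` times the average probability."""
    return min(Fraction(1), Fraction(kept * ratio, num_models ** k))


for k in (7, 9, 12):
    print(k, retained_mass_uniform(10, 10, k), retained_mass_bound(10, 10, 10**6, k))
```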

It should be clear from the above discussion that a test scenario in which the true model (sequence) is one of those assumed by the MM algorithm favors implicitly the pruning or selection based techniques, which is however not very realistic. In other words, merging outperforms pruning and selection particularly when the true model differs significantly from those assumed.

IMM vs. B-best. Surprisingly few implementations of pruning strategies for tracking a single maneuvering target (without clutter) have been reported in the literature. One reason is probably that early simulation results by the authors of the B-best pruning indicated that generally the performance of the B-best pruning is inferior to GPB-type merging of the same computational complexity [331]. Nevertheless, [297] reported an implementation of a B-best strategy based tracker for homing missile guidance that features rapid and large variation in target acceleration. For this fairly realistic scenario the B-best strategy implemented with 11 models (quantized acceleration levels in $[-9g, 9g]$) showed fairly satisfactory accuracy, but at the very high cost of keeping 253 model sequences. A comparison between a two-model (CV and Singer) IMM algorithm and two B-best strategy based MMSE and MAP filters, respectively, over a simplistic 1D scenario was reported in the short note [352]. It claimed that the B-best strategy provided better accuracy than the IMM algorithm, but no indication was given of how many filters were used. A thorough and realistic comparison between two trackers based on the IMM and B-best strategies, respectively, was presented in [146], [147] for a civilian (2D ATC tracking) and a military (3D anti-aircraft gun tracker) scenario. The results showed that at a comparable computational complexity the two trackers were highly competitive in terms of estimation accuracy but had complementary strengths and trade-off patterns: The IMM algorithm led to smaller peak errors at maneuver onset, while the B-best algorithm resulted in smaller steady-state errors during nonmaneuvering and maneuvering motions. Although these results make sense intuitively, they provide a rather indirect comparison of the two techniques as hypothesis reduction strategies because different underlying designs were used for the filters: While the IMM strategy used two kinematic models with a Markov model for mode transitions, the B-best strategy used three kinematic models and a sequence of independent binary random variables for the mode evolution. It would be more representative to compare the two strategies with more similar designs and parameters.

IMM vs. Viterbi. Both the Viterbi and IMM algorithms were implemented and compared in [11] for a model-set design of thirteen models quantizing a 2D acceleration vector of a magnitude up to $4g$. The simulation results showed a comparable accuracy of the Viterbi and IMM algorithms with generally smaller peak errors of the latter during maneuver onset. The Viterbi algorithm was slightly more accurate in cases with small modal separation. A closely related algorithm, along with simulation comparison results, was presented in [283] using the EM technique, discussed later in Sec. V-B.5.

Combined pruning and merging. It is intuitively appealing to combine pruning and merging strategies, or more generally, hard decision with soft decision. Obviously, numerous ways of combination exist. An integrated idea is to prune all mode histories ending at model $m^{(j)}$ at time $k$ if its probability

$$\mu^{(j)}_k = P\{m^{(j)}_k \mid z^k\} = \sum_{i_k \in \mathbb{M}^k} P\{m^{(j)}_k \mid z^k, m^{k-1}_{(i_{k-1})}, m^{(j)}_k\}\,P\{m^{k-1}_{(i_{k-1})}, m^{(j)}_k \mid z^k\} = \sum_{i_{k-1} \in \mathbb{M}^{k-1}} \mu^{k}_{(i_{k-1},j)}$$

is below a threshold or prune those with the same pair $(m^{(i)}_{k-1}, m^{(j)}_k)$ if its probability

$$\mu^{(i,j)}_{k-1,k} = P\{m^{(i)}_{k-1}, m^{(j)}_k \mid z^k\} = \sum_{i_{k-2} \in \mathbb{M}^{k-2}} \mu^{k}_{(i_{k-2},i,j)}$$

is below a threshold. This is actually an integration of merging and pruning since $\mu^{(j)}_k$ and $\mu^{(i,j)}_{k-1,k}$ correspond to merging (see Sec. V-B.1) and can be obtained (approximately) by the IMM (or GPB1) and GPB2 algorithms, respectively, although the corresponding $\hat{x}^{(i)}_{k|k}$ and $\hat{x}^{(i,j)}_{k|k}$ are not needed directly here.
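A minimal sketch of the first variant follows: sequence probabilities are summed by terminal model (the merging part), and every history ending in a model whose merged probability falls below the threshold is dropped (the pruning part). The dictionary layout and function name are assumptions for illustration.

```python
def prune_by_terminal_mode(seq_probs, threshold):
    """Prune all histories that end in a model of low merged probability.

    seq_probs : dict mapping model-sequence tuples to probabilities
    threshold : minimum merged probability mu_k^(j) for model j to survive
    Returns the merged mode probabilities and the renormalized survivors.
    """
    mode_prob = {}
    for seq, p in seq_probs.items():
        # Merging: accumulate probability over all histories ending in seq[-1].
        mode_prob[seq[-1]] = mode_prob.get(seq[-1], 0.0) + p
    # Pruning: drop every history whose terminal model is unlikely.
    kept = {s: p for s, p in seq_probs.items() if mode_prob[s[-1]] >= threshold}
    total = sum(kept.values())
    return mode_prob, {s: p / total for s, p in kept.items()}


probs = {(0, 0): 0.5, (1, 0): 0.3, (0, 1): 0.15, (1, 1): 0.03, (0, 2): 0.01, (1, 2): 0.01}
print(prune_by_terminal_mode(probs, threshold=0.05))
```

Note that the history (1, 1) survives although its own probability is below the threshold, because the decision is made on the merged mode probability rather than per sequence.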

5) Iteration Based Algorithms:

MMSE estimation and MAP estimation are actually problems of integration and maximization of a posterior density, respectively. ML estimation amounts to finding the global maximizer of the likelihood function. The above hypothesis reduction strategies take advantage of the structure (i.e., mixture density) of the base-state distribution. Alternatively, these problems can be solved numerically without explicit reliance on the mixture-density structure (see, e.g., [175]).

Although a large number of numerical integration algorithms are available, to our knowledge, none has been proposed for finding an MMSE estimator $\hat{x}^{\mathrm{MMSE}}$ specifically in the MM context. Effort along this line appears worthwhile. The situation is better for MAP-based MM estimation. A class of iterative search based algorithms has been proposed, almost all of which rely on the so-called EM algorithm.

EM algorithm. The EM algorithm [87] is an iterative procedure of finding a maximizer of a likelihood function, particularly suitable for the so-called incomplete-data problems (see, e.g., [246]). Consider the problem of estimating a parameter $\theta$ using data $Z$ by the maximum likelihood method, given by

$$\hat{\theta}^{\mathrm{ML}} = \arg\max_{\theta} f_\theta(Z) \qquad (33)$$

This is often hard or intractable if the likelihood function $f_\theta(Z)$ is unavailable (e.g., without a closed form). In many cases, however, $f_\theta(Y, Z)$ for some “complete” data $(Y, Z)$ is available and has a simple form, where $Y$ is some “nuisance” random parameter, known as missing data, hidden data, unobservable data, or left-out data. Estimation based on $f_\theta(Z)$ is in this sense an “incomplete-data” problem. It is intuitively appealing to replace $f_\theta(Z)$ with $f_\theta(Z \mid \hat{\theta}) \triangleq E[f_\theta(Y, Z) \mid Z, \hat{\theta}]$ given the best available estimate $\hat{\theta}$ of $\theta$, where the average is over $Y$ for the given $Z$ and $\hat{\theta}$ in $E[f_\theta(Y, Z) \mid Z, \hat{\theta}]$, which is, for example, sometimes equal to $f_\theta(\hat{Y}(Z, \hat{\theta}), Z)$ with $\hat{Y}(Z, \hat{\theta}) = E[Y \mid Z, \hat{\theta}]$. The basic idea of the EM algorithm is to solve the problem of $\arg\max_\theta f_\theta(Z)$ by the iteration $\hat{\theta}_{j+1} = \arg\max_\theta f_\theta(Z \mid \hat{\theta}_j)$ with a better and better estimate $\hat{\theta}$ of $\theta$ in the hope that

$$\hat{\theta}_{j+1} = \arg\max_\theta E[f_\theta(Y, Z) \mid Z, \hat{\theta}_j] \;\xrightarrow{\,j \to \infty\,}\; \arg\max_\theta E[f_\theta(Y, Z) \mid Z, \theta] = \arg\max_\theta f_\theta(Z) = \hat{\theta}^{\mathrm{ML}}$$

More specifically, starting with some initial estimate $\hat{\theta}_0$ of $\theta$, each iteration of the EM algorithm consists of two conceptual steps (they are often combined actually):

• E-step (expectation): $Q(\theta \mid \hat{\theta}_j) = E[\ln f_\theta(Y, Z) \mid Z, \hat{\theta}_j]$
• M-step (maximization): $\hat{\theta}_{j+1} = \arg\max_\theta Q(\theta \mid \hat{\theta}_j)$

The iteration stops when $\|\hat{\theta}_{j+1} - \hat{\theta}_j\|$ is below some threshold and then $\hat{\theta}_{j+1}$ is taken to be $\hat{\theta}^{\mathrm{ML}}$. Clearly, $Q(\theta \mid \hat{\theta}_j)|_{\theta = \hat{\theta}_{j+1}} \geq Q(\theta \mid \hat{\theta}_j)|_{\theta = \hat{\theta}_j}$. It follows from Jensen’s inequality that this iteration enjoys the monotone property $f_\theta(Z)|_{\theta = \hat{\theta}_{j+1}} \geq f_\theta(Z)|_{\theta = \hat{\theta}_j}$, which guarantees global convergence$^{15}$ of the likelihood values $f_\theta(Z)|_{\theta = \hat{\theta}_j}$ for a bounded $f_\theta(Z)$ under mild regularity conditions and likewise for global convergence of $\hat{\theta}_j$ under more stringent conditions (see, e.g., [246]).

The key in the application of the EM algorithm is to identify the missing data $Y$ and come up with a (Baum’s auxiliary) function $Q(\theta \mid \hat{\theta}_j)$ that is equivalent to $E[\ln f_\theta(Y, Z) \mid Z, \hat{\theta}_j]$ and has an easy-to-find maximum. The EM algorithm is attractive mainly for its simplicity, wide applicability, low computation and storage per iteration, and global convergence. Its main shortcoming is the lack of guarantee to converge to global maxima. A main application domain of the EM algorithm is estimation problems with a mixture density [288], [246], to which most target tracking and MM estimation problems belong. The best known applications so far of the EM algorithm in target tracking are those with the Probabilistic Multiple Hypothesis Tracking (PMHT) method [313], [314], [315], [119], [355], where the above likelihood function $f_\theta(Z)$ is replaced by the posterior density $f(\theta \mid Z)$ or equivalently the joint density $f(\theta, Z)$.
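To make the E-step and M-step concrete, here is a toy sketch that is not taken from any cited reference: the data $Z$ come from a two-component Gaussian mixture with known means and unit variances, the missing data $Y$ are the component labels, and the only unknown $\theta$ is the weight of the second component. All names and numbers are illustrative assumptions.

```python
import math


def normal_pdf(z, mean, std=1.0):
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def log_likelihood(theta, data, m0, m1):
    """Incomplete-data log-likelihood ln f_theta(Z) of the mixture."""
    return sum(math.log((1 - theta) * normal_pdf(z, m0) + theta * normal_pdf(z, m1))
               for z in data)


def em_mixing_weight(data, m0, m1, theta0=0.5, tol=1e-8, max_iter=500):
    theta = theta0
    history = [log_likelihood(theta, data, m0, m1)]
    for _ in range(max_iter):
        # E-step: posterior probability that each sample came from component 1.
        resp = [theta * normal_pdf(z, m1)
                / ((1 - theta) * normal_pdf(z, m0) + theta * normal_pdf(z, m1))
                for z in data]
        # M-step: the maximizer of Q is the average responsibility.
        new_theta = sum(resp) / len(resp)
        history.append(log_likelihood(new_theta, data, m0, m1))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


data = [-1.2, -0.8, 0.1, 2.9, 3.2, 3.1, -0.3, 2.7]
theta, history = em_mixing_weight(data, m0=0.0, m1=3.0, theta0=0.1)
print(theta, history[0], history[-1])
```

The recorded log-likelihood values never decrease, which is the monotone property stated above. This toy likelihood is concave in the weight, so local maxima are not an issue here, unlike in the general case.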

EM-based MM estimation. For estimation of a hybrid system with base-state sequence $x^k$, mode sequence $m^k$, and data $z^k$, the complete-data problem is $\{\theta, Y, Z\} = \{x^k, m^k, z^k\}$ (the set is unordered here). Then, two natural choices are $(\theta, Y, Z) = (x^k, m^k, z^k)$ and $(\theta, Y, Z) = (m^k, x^k, z^k)$, depending on what is to be estimated. The first choice aims at estimating the base-state sequence by treating the mode sequence as missing/hidden data, leading to the EM formulation for the base-state sequence MAP estimation $\hat{x}^{k}_{\mathrm{MAP}} = \arg\max_{x^k} f(x^k \mid z^k)$ with

$$\text{E: } Q(x^k \mid \hat{x}^k_{[j]}) = E[\ln p(x^k, m^k, z^k) \mid z^k, \hat{x}^k_{[j]}], \qquad \text{M: } \hat{x}^k_{[j+1]} = \arg\max_{x^k} Q(x^k \mid \hat{x}^k_{[j]}) \qquad (34)$$

$^{15}$ That is, convergence from any starting point to a stationary point, including saddle points as well as (local and global) maxima.


where subscript $[j]$ stands for iteration $j$. The second choice is for mode-sequence estimation, which treats the base-state sequence as hidden data, leading to the EM formulation $\hat{m}^{k}_{\mathrm{MAP}} = \arg\max_{m^k} f(m^k \mid z^k)$ with

$$\text{E: } Q(m^k \mid \hat{m}^k_{[j]}) = E[\ln p(x^k, m^k, z^k) \mid z^k, \hat{m}^k_{[j]}], \qquad \text{M: } \hat{m}^k_{[j+1]} = \arg\max_{m^k} Q(m^k \mid \hat{m}^k_{[j]}) \qquad (35)$$

For a Markov jump system, we have the following key decomposition

$$p(x^k, m^k, z^k) = f(z_k \mid x_k, m_k)\,f(x_k \mid m_k, x_{k-1})\,p(m_k \mid m_{k-1})\,p(x^{k-1}, m^{k-1}, z^{k-1})$$

The results in the remainder of this subsection are valid only for a Markov jump linear system (MJLS) with white, mutually independent Gaussian process and measurement noises.

As stated in [225], [226], $\max_{x^k} Q(x^k \mid \hat{x}^k_{[j]})$ is equivalent to minimizing a sum of weighted squares of base-state prediction errors, measurement prediction errors, and initial state estimation error of a linear Gaussian system. This system is an “average” Gaussian MJLS over possible modes at each time in that it depends on the mode only through its probability $\mu^{(i)}_\kappa = P(m^{(i)}_\kappa, z^k \mid \hat{x}^k_{[j]}) = \frac{1}{c_\kappa}\alpha^{(i)}_\kappa \beta^{(i)}_\kappa$, $\kappa \leq k$, where $c_\kappa$ is the normalization factor, and $\alpha^{(i)}_\kappa$ and $\beta^{(i)}_\kappa$ are computed via HMM-type forward and backward recursions, respectively, as given in [225], [226]. From the equivalence of the optimal weighted LS, MAP, and Kalman smoothing for such systems it thus follows: in each batch iteration (using $z^k$) of the EM algorithm for MAP estimation of the base-state sequence $\hat{x}^{k}_{\mathrm{MAP}} = \arg\max_{x^k} f(x^k \mid z^k)$, the required $\{\mu^{(i)}_\kappa, \kappa \leq k\}$ based on $\hat{x}^k_{[j]}$ can be computed (in the E-step) by HMM-type forward and backward recursions for the above “average” linear Gaussian system; and then $\hat{x}^k_{[j+1]}$ can be obtained (in the M-step) by a fixed-interval Kalman smoother [225], [226].

As derived in [225], [226], $\max_{m^k} Q(m^k \mid \hat{m}^k_{[j]})$ is equivalent to $\max_l \delta_k(l)$. What is important here is that $\delta_k(l)$, the maximum score of a model sequence $m^k$ with $m^{(l)}$ in effect at time $k$ (i.e., $m_k = m^{(l)}$), has a recursive form $\delta_\kappa(l) = \max_i[\delta_{\kappa-1}(i) + \ln \pi_{il}] + g_l(z_\kappa, \hat{y}_\kappa, \hat{y}^2_\kappa)$, where $\pi_{il}$ is the mode transition probability (6), $g_l(\cdot)$ is a known function of model $m^{(l)}$, $y_\kappa = [x'_\kappa, x'_{\kappa-1}]'$, $\hat{y}_\kappa = E[y_\kappa \mid z^k, \hat{m}^k_{[j]}]$, and $\hat{y}^2_\kappa = E[y_\kappa y'_\kappa \mid z^k, \hat{m}^k_{[j]}]$. As such, within each batch iteration (using $z^k$) of the EM algorithm for MAP estimation of the mode sequence $\hat{m}^{k}_{\mathrm{MAP}} = \arg\max_{m^k} p(m^k \mid z^k)$, the required $\{\hat{y}_\kappa, \hat{y}^2_\kappa, \kappa \leq k\}$ can be obtained (in the E-step) by an efficient fixed-interval Kalman smoother for a system with base state $y_\kappa$ equivalent to the original linear system based on $\hat{m}^k_{[j]}$; and then $\hat{m}^k_{[j+1]}$ can be obtained (in the M-step) by solving the optimal path problem with the above score $\delta_\kappa(l)$ efficiently via dynamic programming (Viterbi algorithm) [225], [226]. Along a similar line, a simpler EM-based ML estimation algorithm of the constant mode sequence (i.e., for the first generation AMM algorithm) was given in [251] and [73] in the context of the so-called mixture of experts (see Sec. VIII).
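The HMM-type forward and backward recursions used in the E-step for the base-state formulation can be sketched generically as below. The per-time mode likelihoods `lik[t][i]` are taken as given inputs here; in the algorithms of [225], [226] they would be derived from the current base-state iterate, a step this sketch does not attempt to reproduce. Per-step rescaling is an implementation choice to avoid underflow.

```python
def forward_backward(init, trans, lik):
    """Smoothed mode probabilities mu[t][i] proportional to alpha[t][i] * beta[t][i].

    init[i]     : initial mode probabilities
    trans[i][j] : mode transition probabilities pi_ij
    lik[t][i]   : likelihood of the data at time t under mode i (assumed given)
    """
    T, M = len(lik), len(init)
    alpha = [[0.0] * M for _ in range(T)]
    beta = [[1.0] * M for _ in range(T)]
    for t in range(T):                        # forward recursion
        for j in range(M):
            pred = init[j] if t == 0 else sum(alpha[t - 1][i] * trans[i][j] for i in range(M))
            alpha[t][j] = pred * lik[t][j]
        s = sum(alpha[t])
        alpha[t] = [a / s for a in alpha[t]]  # rescale; absorbed by c_kappa
    for t in range(T - 2, -1, -1):            # backward recursion
        for i in range(M):
            beta[t][i] = sum(trans[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(M))
        s = sum(beta[t])
        beta[t] = [b / s for b in beta[t]]
    mu = []
    for t in range(T):
        w = [a * b for a, b in zip(alpha[t], beta[t])]
        c = sum(w)                            # normalization factor c_kappa
        mu.append([x / c for x in w])
    return mu


mu = forward_backward([0.6, 0.4], [[0.9, 0.1], [0.2, 0.8]],
                      [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])
print(mu)
```

The cost is proportional to $M^2$ per time step, which is the source of the $M^2 k$ term in the per-iteration complexity quoted in the remarks at the end of this subsection.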

EM-based MM estimation for tracking. Clearly, the above EM-based hybrid estimation techniques can be applied to maneuvering target tracking using multiple models explicitly or implicitly. Indeed, a number of such algorithms have been developed in [281], [282], [283], [228], [227], [354], [355], [27], [28]. Here the key question is: What amounts to a maneuver model? Similar to the decision-based methods surveyed in Part IV [208], two answers have been proposed: (a) the unknown input (acceleration) $u_k$ [281], [282], [283], [228], [227] and (b) the statistics (mean and covariance) of process noise $w_k$ [354], [355], [27], [28].

In [281], [282], [283], [228], [227], the target maneuver is described by a linear time-invariant system in white Gaussian noise

$$x_k = F x_{k-1} + G u_k + w_k, \qquad w_k \sim \mathcal{N}(0, Q_k)$$

$$z_k = H x_k + v_k, \qquad v_k \sim \mathcal{N}(0, R_k)$$

with an unknown input (acceleration) $u_k$ modeled as a homogeneous Markov chain having $M$ possible levels $u^{(1)}, \ldots, u^{(M)}$ and initial and transition probabilities $p(u_1)$, $p(u_k \mid u_{k-1})$. The base-state sequence $x^k$ is treated as missing data in [281], [282], [283] for mode-sequence estimation, where $Q(m^k \mid \hat{m}^k_{[j]})$ of (35) reduces to

$$Q(u^k \mid \hat{u}^k_{[j]}) = \sum_{\kappa=1}^{k} \Big\{ \ln p(u_\kappa \mid u_{\kappa-1}) - \frac{1}{2} \Big\| G u_{\kappa-1} - \hat{x}^{[j]}_{\kappa|k} + F \hat{x}^{[j]}_{\kappa-1|k} \Big\|^2_{Q_\kappa^{-1}} \Big\}, \qquad \|x\|^2_A \triangleq x' A x$$

The E-step clearly boils down to the computation of $\hat{x}^{[j]}_{\kappa|k} \triangleq E[x_\kappa \mid z^k, \hat{u}^k_{[j]}]$, obtainable by a fixed-interval Kalman smoother given $\hat{u}^k_{[j]}$, and the M-step can be implemented exactly by the Viterbi algorithm given $\hat{x}^{[j]}_{\kappa|k}$ since the score $Q(u^\kappa \mid \hat{u}^k_{[j]}) - Q(u^{\kappa-1} \mid \hat{u}^k_{[j]})$ at time $\kappa$ depends only on the transition $(u_{\kappa-1}, u_\kappa)$ [281], [282], [283]. Note that the transition score $\Delta_{k-1,k}$ of (32) used in [11] depends on the entire history $m^{k-1}$ only through $x_k$ because $f(z_k \mid m^k, z^{k-1}) = E[f(z_k \mid x_k, m_k) \mid m^k, z^{k-1}]$. It would be independent of $m^{k-1}$ if $f(z_k \mid m_k, z^{k-1}) = E[f(z_k \mid x_k, m_k) \mid m_k, z^{k-1}]$ were used, as in the EM formulation, where the average is over $x_k$ only.


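As is typical for the PMHT approach, [228], [227] treats discrete uncertainties ($u^k$ here) as missing data$^{16}$ for base-state sequence estimation via the EM formulation (34), leading to

$$Q(x^k \mid \hat{x}^k_{[j]}) = -\frac{1}{2} \sum_{\kappa=1}^{k} \Big\{ \|z_\kappa - H x_\kappa\|^2_{R_\kappa^{-1}} + \|x_\kappa - F x_{\kappa-1} - G \bar{u}^{[j]}_\kappa\|^2_{Q_\kappa^{-1}} \Big\} + \|x_0 - \bar{x}_0\|^2_{P_0^{-1}}$$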

where $\bar{u}^{[j]}_\kappa = \sum_{i=1}^{M} u^{(i)} \mu^{(i)}_\kappa$ and $\mu^{(i)}_\kappa = P\{u_\kappa = u^{(i)} \mid z^k, \hat{x}^k_{[j]}\}$ is computed (E-step) via a forward-backward HMM smoother given $\hat{x}^k_{[j]}$. The maximizer of $Q(x^k \mid \hat{x}^k_{[j]})$ given $\bar{u}^k_{[j]}$ is found (M-step) by a fixed-interval Kalman smoother [228], [227].

The approach of [354], [355] differs from that of [228], [227] in that the maneuver models differ in their process noise $w_k$, governed by a hidden Markov chain with different covariance levels $Q^{(i)}$, rather than the unknown input levels. This approach was combined with the so-called turbo-PMHT in [293] for maneuvering target tracking in clutter. Further, by choosing the complete-data problem as $(\theta, Y, Z) = ((x^k, m^k), r^k, z^k)$, where $r^k$ stands for the sequence of data association events, another version was proposed in [293], in which the forward-backward smoothing for M-step is replaced by an IMM smoother to handle the uncertainty in $r^k$. The EM formulation here is, however, directed to data association, not motion uncertainty. Comparative results of all the above EM-based trackers were also reported in [293]. Similarly, [27], [28] also models various maneuvers by process noise of $M$ covariance levels (two levels were implemented). However, the base state is treated as a “nuisance.” Conceptually, the E-step amounts to fixed-interval Kalman smoothing, but the M-step is greatly simplified by assuming jumps among the levels are independent over time: it is trivially implemented for each $\kappa = 1, 2, \ldots, k$ independently without a need for dynamic programming. Still another popular formulation of maneuvers is in terms of turn rate. To our knowledge there is no such EM-based formulation in the literature so far.

Remarks. The optimal MMSE-CMM estimator has an exponentially (geometrically) increasing computational complexity on the order of $M^k$, while the above EM-based MAP-CMM algorithms have a linearly increasing complexity on the order of $(M^2 + n_x^3)k$ in each iteration for the batch from initial time to time $k$, where $n_x$ is the base state dimension. The price paid by these EM-based algorithms to achieve this linear complexity is: In general there is no hope to obtain the exact MAP estimate in finite iterations and no guarantee to converge to the exact MAP estimate even after infinite iterations. Like almost all iterative algorithms for optimization problems, there is no guarantee for the global convergence of an EM algorithm to the global maximum: it may well converge to a local maximum or saddle point or possibly even to a minimum in rare cases (see [246] for a simple example). A widely used strategy here is to try different initial points to enhance the chance of converging to the global maximum at the cost of substantially increased computation. While practitioners would not insist on having an exact optimal estimate, giving up the requirement to be close to the global maximum is a major relaxation of the MAP estimation goal. There is no reason to believe that with such a relaxation the corresponding MMSE-based algorithm with a comparable complexity could not be developed. Nevertheless, the EM-based approach has certain undeniable merits: It is systematic, theoretically elegant, and very powerful. This does not imply that it is free of serious drawbacks for practical, real-time applications in maneuvering target tracking. (See [355] for a comprehensive discussion.) What is probably worst is that the EM algorithm requires batch processing of data, just like other MAP estimators. This is acceptable for some applications, such as trajectory determination, but not for most maneuvering target tracking problems, where real-time processing is necessary. Although it has no recursive form in general, the EM algorithm may serve as a basis for developing approximate, recursive algorithms, as described next.
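To give a feel for the gap between the two orders of complexity quoted above, the following sketch evaluates $M^k$ and $(M^2 + n_x^3)k$ for one hypothetical configuration. The values of $M$, $n_x$, and $k$ are illustrative assumptions, and constants hidden by the order notation are ignored.

```python
def optimal_hypotheses(M: int, k: int) -> int:
    """Number of mode sequences the optimal MMSE-CMM estimator must consider."""
    return M ** k


def em_cost_per_iteration(M: int, n_x: int, k: int) -> int:
    """Order-of-magnitude operation count of one EM iteration over the batch."""
    return (M ** 2 + n_x ** 3) * k


# Hypothetical setting: 3 models, 4-dimensional base state, 20 time steps.
M, n_x, k = 3, 4, 20
print(optimal_hypotheses(M, k))         # 3486784401
print(em_cost_per_iteration(M, n_x, k)) # 1460
```

The EM count applies per iteration, so the total cost also scales with the number of iterations and of restarts from different initial points.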

EM-based recursive MM estimation for tracking. Two such EM-based recursive algorithms for maneuvering target tracking have been proposed recently in [281], [282], [283], [157], [158]. The recursive algorithm proposed in [281], [282], [283] is an approximation of the batch solution for mode-sequence MAP estimation presented therein. It reduces the batch processing to the Viterbi algorithm combined with a one-step Kalman filter by (a) modifying Baum's auxiliary function $Q(u^k|u^k_{[j]}) = E[\ln p(x^k, u^k, z^k)|z^k, u^k_{[j]}]$ to an approximate sequential version $q(u_k|u^{k-1}) = E[\ln p(x^k, u^k, z^k)|z^k, u^{k-1}]\big|_{u^{k-1}=\hat{u}^{k-1}}$ suitable for recursion, (b) replacing the smoothed state estimates by the filtered state estimates, and (c) ignoring the dependence of $x_k$ on $u^{k-1}$ given $E[x_k|u^{k-1}, z^k]$, $\mathrm{cov}(x_k|u^{k-1}, z^k)$, and $u_k$. The resulting recursive algorithm uses $M^2$ Kalman filtering cycles at each recursion and is essentially the same as that of [11]: At time $k$, for each transition (link) $(u_{k-1}, u_k)$ of the Viterbi trellis, the measurement residual $\tilde{z}_k$ and its covariance $S_k$ are obtained by a one-step Kalman filter, and then the best path for each level of $u_k$ is determined by the Viterbi algorithm based on the transition costs

$$\Delta_{k-1,k} = \ln p(u_k|u_{k-1}) - \frac{1}{2}\|\tilde{z}_k\|^2_{\bar{S}_k^{-1}}, \qquad \bar{S}_k = S_k(H_kQ_kH_k')^{-1}S_k$$

As pointed out in [281], [282], [283], it differs in effect from that of [11] only in that $\bar{S}_k$ is in place of $S_k$. No interpretation for this replacement was given, although it appears beneficial judging from simulation results presented therein with an assumed model set $\{0, \pm 0.15, \pm 0.3\}$ m/s$^2$ in each of the two coordinates (thus there are 25 models). A simple scenario with a true input level jumping from 0 to 0.15 m/s$^2$ and then back to 0 again was considered, which does not seem representative of reality. Contradicting the results shown in [11], far poorer results from the IMM algorithm were also given in [283], possibly due to the particular design/implementation. More surprisingly, [283] claimed that the computational time of the IMM algorithm, which has $M$ complexity, is 6 times that of the recursive EM-based algorithms of [281], [282], [283] and [11], which both have $M^2$ complexity.

$^{16}$ In fact this work accounted also for clutter measurements in a PMHT framework.
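The Viterbi recursion over the trellis of input levels described above can be sketched as follows. This is only a toy illustration under stated assumptions: in the surveyed algorithms the residual and its covariance for each link come from a one-step Kalman filter, whereas here the hypothetical function `penalty` uses a scalar stand-in residual `data[k] - levels[j]` with a fixed variance `S`. The link score plays the role of $\Delta_{k-1,k}$ and is maximized.

```python
import math
from typing import Callable, List, Tuple


def viterbi(init_score: List[float],
            link_score: Callable[[int, int, int], float],
            T: int) -> Tuple[List[int], float]:
    """Best level sequence; link_score(k, i, j) scores level i at k-1 -> j at k."""
    M = len(init_score)
    score = list(init_score)
    back: List[List[int]] = []
    for k in range(1, T):
        new_score, ptr = [], []
        for j in range(M):
            cands = [score[i] + link_score(k, i, j) for i in range(M)]
            best_i = max(range(M), key=cands.__getitem__)
            new_score.append(cands[best_i])
            ptr.append(best_i)          # survivor predecessor of level j
        score = new_score
        back.append(ptr)
    # Backtrack from the best terminal level.
    last = max(range(M), key=score.__getitem__)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Toy setup (all values hypothetical).
levels = [0.0, 1.0]
trans = [[0.9, 0.1], [0.1, 0.9]]
data = [0.0, 0.1, 1.0, 0.9]
S = 0.05


def penalty(k: int, j: int) -> float:
    r = data[k] - levels[j]            # stand-in for the Kalman filter residual
    return 0.5 * r * r / S


def link_score(k: int, i: int, j: int) -> float:
    return math.log(trans[i][j]) - penalty(k, j)


init = [math.log(0.5) - penalty(0, j) for j in range(len(levels))]
path, total = viterbi(init, link_score, len(data))
print(path)  # [0, 0, 1, 1]
```

With $M$ levels, each recursion evaluates $M^2$ links, which matches the $M^2$ filtering cycles per step noted above.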

The above recursive EM-based algorithm is based on mode-sequence estimation. More natural is a recursive EM-based algorithm for base-state sequence estimation directly. Such an algorithm was proposed in [157], [158], named the “reweighted IMM (RIMM)” algorithm by its authors. It is identical to the IMM algorithm except that the mixing and output formulas are replaced by a new weighted sum in which the weights account for not only the probability of each model being true but also the accuracy of the estimate from each elemental filter:

$$\hat{x}^{(j)}_{k|k-1} = P^{(j)}_{k|k-1}\sum_{i\in M}(P^{(i,j)}_{k|k-1})^{-1}\hat{x}^{(i,j)}_{k|k-1}\mu^{i|j}_{k-1}, \qquad (P^{(j)}_{k|k-1})^{-1} = \sum_{i\in M}(P^{(i,j)}_{k|k-1})^{-1}\mu^{i|j}_{k-1} \tag{36}$$

$$\hat{x}_{k|k} = P_{k|k}\sum_{i\in M}(P^{(i)}_{k|k})^{-1}\hat{x}^{(i)}_{k|k}\mu^{(i)}_{k}, \qquad P^{-1}_{k|k} = \sum_{i\in M}(P^{(i)}_{k|k})^{-1}\mu^{(i)}_{k} \tag{37}$$

where the mixing weights $\mu^{i|j}_{k-1}$ and model probability $\mu^{(i)}_k$ are computed as in the IMM algorithm (see Table II) and

$$\hat{x}^{(i,j)}_{k|k-1} = E[x_k|z^{k-1}, m^{(i)}_{k-1}, m^{(j)}_k] = F_j\hat{x}^{(i)}_{k-1|k-1} + G_ju^{(j)}_{k-1}, \qquad P^{(i,j)}_{k|k-1} = \frac{\pi_{ij}}{\mu^{(j)}_{k|k-1}}F^{(j)}_kP^{(i)}_{k-1|k-1}(F^{(j)}_k)' + G^{(j)}_kQ^{(j)}_k(G^{(j)}_k)'$$
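For intuition, the reweighted output formula (37) can be written out for a scalar state. The sketch below is a scalar illustration only (variances instead of covariance matrices), and the function name `reweighted_fusion` is hypothetical. It shows how a more accurate elemental filter receives more weight than its model probability alone would give it.

```python
from typing import List, Tuple


def reweighted_fusion(estimates: List[float],
                      variances: List[float],
                      probs: List[float]) -> Tuple[float, float]:
    """Scalar form of (37): P^{-1} = sum_i mu_i / P_i, x = P * sum_i mu_i x_i / P_i."""
    info = sum(mu / P for P, mu in zip(variances, probs))
    P_fused = 1.0 / info
    x_fused = P_fused * sum(mu * x / P
                            for x, P, mu in zip(estimates, variances, probs))
    return x_fused, P_fused


# Two filters with equal model probabilities but different accuracies.
x_rw, P_rw = reweighted_fusion([1.0, 3.0], [1.0, 4.0], [0.5, 0.5])

# Standard probabilistically weighted sum (IMM output) for comparison.
x_imm = 0.5 * 1.0 + 0.5 * 3.0
print(x_rw, P_rw, x_imm)  # approximately 1.4, 1.6, and 2.0
```

When all elemental variances are equal, the reweighted estimate coincides with the ordinary probabilistically weighted sum.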

Both “reweighted” sum formulas are combinations of a probabilistic weighted sum for MMSE-based MM estimation and the “parallel resistors” formula for fusion of (probabilistically correct) estimates with uncorrelated errors (see Sec. 8.3.3 of [19]). Note that at each recursion, the RIMM algorithm requires $M^2$ predictions but $M$ updates, while the IMM algorithm requires $M$ predictions and $M$ updates (see Sec. V-B.1). Similar to the GPB2 case, let $X_{k-1} = \{\hat{x}^{(i)}_{k-1|k-1}, P^{(i)}_{k-1|k-1}, \mu^{(i)}_{k-1}, i \in M\}$. The reweighted output formula follows from minimizing a sum of quadratic errors of fitting $(x_{k-1}, x_k)$ to measurement $z_k$, dynamics, and the estimates $\hat{x}^{(i)}_{k-1|k-1}, i \in M$ with a given probabilistic weight $\mu^{(j)}_k, j \in M$, that is, $\hat{x}_{k|k} = \arg\min_{x_k} q(x_k|X_{k-1})$ with $q(x_k|X_{k-1}) = \min_{x_{k-1}} q(x_{k-1}, x_k|X_{k-1})$, where

$$q(x_{k-1}, x_k|X_{k-1}) = \sum_{i\in M}\|x_{k-1} - \hat{x}^{(i)}_{k-1|k-1}\|^2_{(P^{(i)}_{k-1})^{-1}}\mu^{(i)}_{k-1} + \sum_{j\in M}\left\{\|z_k - H^{(j)}_kx_k\|^2_{(R^{(j)}_k)^{-1}} + \|x_k - F^{(j)}_{k-1}x_{k-1}\|^2_{(G^{(j)}_kQ^{(j)}_kG^{(j)\prime}_k)^{-1}}\right\}\mu^{(j)}_k$$

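As shown in [157], [158], $q(x_k|X_{k-1})$ can be written either in the GPB2 form of $M^2$ pairs of quadratics

$$q(x_k|X_{k-1}) = \sum_{i,j\in M}\left\{\|z_k - H^{(j)}_kx_k\|^2_{(R^{(j)}_k)^{-1}} + \|x_k - F^{(j)}_{k-1}\hat{x}^{(i)}_{k-1|k-1}\|^2_{(P^{(i,j)}_{k|k-1}/\mu^{i|j}_{k-1})^{-1}}\right\}\mu^{(i)}_k$$

or in the IMM form of $M$ pairs of quadratics:

$$q(x_k|X_{k-1}) = \sum_{j\in M}\left\{\|z_k - H^{(j)}_kx_k\|^2_{(R^{(j)}_k)^{-1}} + \|x_k - \hat{x}^{(j)}_{k|k-1}\|^2_{(P^{(j)}_{k|k-1})^{-1}}\right\}\mu^{(j)}_k$$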

with mixing estimate $\hat{x}^{(j)}_{k|k-1}$ and covariance $P^{(j)}_{k|k-1}$ given by (36). (37) thus follows easily from minimizing $q(x_k|X_{k-1})$ in this IMM form. As explained in Sec. V-B.1, these two forms are equivalent for output processing, but not for conditional filtering. Each pair of the quadratics above is minimized by a Kalman conditional filter. As argued in [157], [158], $-q(x_{k-1}, x_k|X_{k-1})$ can be interpreted/justified as an approximation of Baum's auxiliary function $Q(x^k|x^K_{[l]}) = E[\ln p(x^K, m_{k+1}, z^K)|z^K, x^K_{[l]}], \forall k \le K$ for a data batch $z^K = (z_1, \ldots, z_k, z_{k+1}, \ldots, z_K)$ for an Alternating Expectation Conditional Maximization (AECM) algorithm [252], [246]. The approximations arise from truncation of $x^k$ to $(x_{k-1}, x_k)$ and replacement of the smoothed estimates from multiple iterations by filtered estimates from a single forward path. The AECM algorithm is an extension of the Expectation Conditional Maximization algorithm [253], which replaces a complex M-step of the EM algorithm with several computationally simpler maximization steps conditioned on some constraints on the estimatee, called CM steps. In the AECM algorithm the specification of the complete data could be different on each CM step. This provides more of the flexibility needed for formulating sequential problems than most other EM-based algorithms. Compared with the standard two-model and three-model IMM designs of [18], simulation results included in [158] suggest that a two-model RIMM design had a more favorable speed error.

Other iterative algorithms. An iterative algorithm for joint MAP estimation of the base state and mode sequences was proposed in [226] based on the block component optimization [125]:

$$\text{M1: } x^k_{[j+1]} = \arg\max_{x^k} f(x^k, z^k|m^k_{[j]}), \qquad \text{M2: } m^k_{[j+1]} = \arg\max_{m^k} p(m^k, z^k|x^k_{[j+1]})$$

which belongs to the general class of the so-called coordinate ascent (or descent) methods [229], [30]. For a Gauss-Markov jump-linear system, the M1 and M2 steps can be accomplished by a Kalman smoother and Viterbi algorithm, respectively [226]. In fact, we believe it is better to swap the above M1 and M2 steps as follows

$$\text{M1: } m^k_{[j+1]} = \arg\max_{m^k} p(m^k, z^k|x^k_{[j]}), \qquad \text{M2: } x^k_{[j+1]} = \arg\max_{x^k} f(x^k, z^k|m^k_{[j+1]})$$

because $m^k$ is usually not sensitive to $x^k$ (think, e.g., of the case with $m^k = u^k$ above) but $x^k$ depends on $m^k$ significantly. Another coordinate ascent algorithm was proposed in [90] to obtain the MAP estimate of the mode sequence $m^k$ by treating each $m_\kappa$ as one coordinate through iterations $m^{[j]}_\kappa = \arg\max_{m_\kappa} p(m_\kappa|z^k, m^{[j]}_1, \ldots, m^{[j]}_{\kappa-1}, m^{[j-1]}_{\kappa+1}, \ldots, m^{[j-1]}_k)$ for $\kappa = 1, \ldots, k$. Likewise for the MAP estimation of the base-state sequence $x^k$. It was demonstrated in [90] by simulation results from maneuvering target tracking that these two MAP algorithms outperform the corresponding EM-based MAP algorithms of [226] discussed above.
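The alternation between a mode step and a base-state step described above can be illustrated on a deliberately tiny problem. The sketch below is a toy, not the algorithm of [226] or [90]: it assumes a single time step, a scalar state $x = u^{(m)} + w$ with measurement $z = x + v$, and hypothetical names throughout. The mode step picks the best input level given the current $x$, and the state step has a closed form given the mode.

```python
import math
from typing import List, Tuple


def coordinate_ascent(z: float, levels: List[float], prior: List[float],
                      R: float, Q: float, x0: float,
                      iters: int = 20) -> Tuple[float, int, float]:
    """Alternate m = argmax p(m, z | x) and x = argmax f(x, z | m) (toy, scalar)."""

    def log_joint(x: float, m: int) -> float:
        return (-(z - x) ** 2 / (2 * R)
                - (x - levels[m]) ** 2 / (2 * Q)
                + math.log(prior[m]))

    x, m = x0, -1
    for _ in range(iters):
        # Mode step: best level for the current state iterate.
        m_new = max(range(len(levels)), key=lambda i: log_joint(x, i))
        # State step: closed-form maximizer of the two quadratics.
        x_new = (z / R + levels[m_new] / Q) / (1 / R + 1 / Q)
        if m_new == m and abs(x_new - x) < 1e-12:
            break
        x, m = x_new, m_new
    return x, m, log_joint(x, m)


levels = [-1.0, 0.0, 1.0]
prior = [1 / 3, 1 / 3, 1 / 3]
good = coordinate_ascent(0.9, levels, prior, R=1.0, Q=1.0, x0=0.9)
poor = coordinate_ascent(0.9, levels, prior, R=1.0, Q=1.0, x0=-0.9)
print(good[:2], poor[:2])
```

In this toy the two starting points end at different fixed points, and the one reached from `x0=-0.9` has the lower objective value, which illustrates why such coordinate-wise schemes are often restarted from several initial points.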

C. Multiple-Model Smoothing

In maneuvering target tracking a number of problems exist that allow offline processing. One example is trajectory reconstruction (see, e.g., [270]). Also, if an estimation delay can be tolerated the tracking performance may be improved dramatically by smoothing [233], [95], [96], [97], [173], [174], [77], [78]. Besides, smoothing can also be used as an integral part of a nonlinear filtering algorithm to improve performance without time delay.

Smoothing is estimation (or more precisely “retrodiction” [95], [96], [97]) of a process at or through time $n$ using data $z^k$ through time $k$ ($k > n$). In the Bayesian setting, the complete solution amounts to finding the distribution function $f(y_n|z^k)$ or $f(y^n|z^k)$. For hybrid systems, $y$ could be the base state $x$, mode $m$, or hybrid state $\xi = (x, m)$. Then, finding $f(y_n|z^k)$ and $f(y^n|z^k)$ are state smoothing and state sequence (or trajectory) smoothing, respectively. Formal solutions to many of these smoothing problems in the sense of MMSE and MAP point estimation have been presented in Sec. V-A. Here we focus on base-state smoothing for point (not density) estimation, that is, $\hat{x}_{n|k}$ with $n < k$, since almost all existing results are limited to this case. As given in Sec. V-A, the MMSE-optimal smoother $\hat{x}^{\mathrm{MMSE}}_{n|k} = E[x_n|z^k]$ is a weighted sum of $\hat{x}^{(i_k)}_{n|k} = E[x_n|z^k, m^k_{(i_k)}]$ given mode sequence $m^k_{(i_k)}$ with weights $\mu^k_{(i_k)}$. The probabilistic weights $\mu^k_{(i_k)} = P\{m^k_{(i_k)}|z^k\}$ can be obtained by Bayes' rule. For a jump-linear system with white Gaussian noise, each conditional smoothed estimate $\hat{x}^{(i_k)}_{n|k}$ is given by the well-known Kalman smoother (see, e.g., [287], [248], [247], [7]). As in the case of MMSE filtering, unfortunately, this optimal solution has an exponentially increasing complexity and is thus infeasible for real-time applications. So a number of suboptimal solutions with polynomial or even linear complexity have been proposed [238], [59], [132], [133], [76], [174], [78]. Some of the issues associated with smoothing for target tracking are discussed in [95], [96], [97].

For smoothing, particularly recursive smoothing, three common classes have been traditionally considered: fixed interval ($\hat{x}_{n|k}$, $n = 1, 2, \ldots, k$ with a fixed $k$), fixed point ($\hat{x}_{p|k}$, $k = 1, 2, \ldots$ with a fixed $p$), and fixed lag ($\hat{x}_{k-L|k}$, $k = 1, 2, \ldots$ with a fixed $L$). MM smoothing has been largely limited to the cases of fixed interval and fixed lag except that of [154].

Fixed-interval smoothing. [59] presented general results for time-reversion of discrete-time Markov jump systems (MJS) and, in particular, models in reverse time that are equivalent to the original Markov jump-linear systems (MJLS). As an application, it presented an optimal solution for fixed-interval smoothing of an MJS based on fusion of posterior distributions obtained by two optimal MM estimators, one running forward for the original system and the other running backward using the equivalent reverse-time model. The approach is quite general, not limited to an MJLS or point estimation. The main difficulty is to obtain the equivalent reverse-time model and the optimal forward and backward MM estimators. For an approximate implementation the IMM algorithm was suggested to replace the optimal MM estimators. This implementation was demonstrated by simulation via trajectory smoothing of the state of an MJLS to have very good estimation accuracy and mode identification at a low computational cost. The approximate fixed-interval smoother of [132] is conceptually similar in that it also consists of fusing forward and backward IMM estimators, but with two significant differences. First, fusion is based on a simpler but more restrictive rule, the parallel-resistor formula (see, e.g., [116] or Sec. 8.3.3 of [19]). As a result, the backward IMM estimator has to be initialized without any prior knowledge of the state.$^{17}$ Second, it bypassed the task of finding an equivalent reverse-time model and derived the required backward IMM algorithm directly from the original MJLS with white Gaussian noise. The simulation results for maneuvering target tracking demonstrated a dramatic improvement of the smoothed estimates in comparison with the forward/backward IMM estimators alone. In both smoothers, fusion is done between every pair of forward and backward conditional filters, resulting in $M^2$ fusion operations per time step, although fusion between the overall estimates of the two IMM estimators would reduce it to just one fusion operation per time step, but with some performance loss. These IMM smoothers are both MMSE based, although the general approach of [59] is not limited to the MMSE criterion.

$^{17}$ This fusion rule can be replaced by more general ones (e.g., those of [222]) so that the backward estimator can be initialized as desired.

As pointed out in Sec. V-A, the components of an MMSE sequence estimate $\hat{x}^{k|k}_{\mathrm{MMSE}} = E[x^k|z^k]$ are MMSE smoothed estimates $\hat{x}^{\mathrm{MMSE}}_{n|k} = E[x_n|z^k]$: $\hat{x}^{k|k}_{\mathrm{MMSE}} = (\hat{x}^{\mathrm{MMSE}}_{1|k}, \ldots, \hat{x}^{\mathrm{MMSE}}_{k|k})$, but the components of a MAP sequence estimate $\hat{x}^{k|k}_{\mathrm{MAP}} = \arg\max_{x^k} f(x^k|z^k)$ are not MAP smoothed estimates $\hat{x}^{\mathrm{MAP}}_{n|k} = \arg\max_{x_n} f(x_n|z^k)$:

$$\hat{x}^{k|k}_{\mathrm{MAP}} = (\hat{x}_{1|k}, \ldots, \hat{x}_{k|k})_{\mathrm{MAP}} \neq (\hat{x}^{\mathrm{MAP}}_{1|k}, \ldots, \hat{x}^{\mathrm{MAP}}_{k|k})$$

because the peak location of the joint pdf $f(x, y)$ is in general not $(x^*, y^*)$, where $x^*$ and $y^*$ are the peak locations of the marginal pdfs $f(x)$ and $f(y)$, respectively. Thus, the EM-based algorithms, discussed in Sec. V-B.5, for MAP sequence estimation do not provide fixed-interval MAP smoothed estimates in one shot. However, they can be modified easily for MAP smoothing of a state sequence. More importantly, a MAP sequence estimate appears more meaningful and useful in practice than a sequence of such MAP smoothed estimates.
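The point that the joint peak need not coincide with the pair of marginal peaks is easy to check on a small discrete example. The probability table below is a made-up illustration with two binary variables; it is not taken from any of the cited works.

```python
# Hypothetical joint pmf over (x, y), each taking values 0 or 1.
joint = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}

# Peak of the joint distribution (analogue of the MAP sequence estimate).
joint_mode = max(joint, key=joint.get)

# Marginals and their peaks (analogue of the MAP smoothed estimates).
marg_x = {v: sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)}
marg_y = {v: sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)}
x_star = max(marg_x, key=marg_x.get)
y_star = max(marg_y, key=marg_y.get)

print(joint_mode)        # (0, 0)
print((x_star, y_star))  # (1, 0)
```

Here the pair of marginal peaks has joint probability 0.3, lower than the 0.4 at the joint peak.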

Fixed-lag smoothing. The fixed-lag smoothing algorithm of [238] differs from the MM filtering algorithm of [333], [129], [331], [332] based on the B-best pruning strategy (see Sec. V-B.3) only in that conditional filtering is replaced by conditional smoothing, achieved by fixed-lag Kalman smoothers [7]. Two MM algorithms were developed in [131], [133] for one-step fixed-lag smoothing based on two different representations of $f(x_{k-1}|z^k)$ via the total probability theorem

$$\text{A: } f(x_{k-1}|z^k) = \sum_{i=1}^{M} f(x_{k-1}|m^{(i)}_{k-1}, z^k)\mu^{(i)}_{k-1|k}, \qquad \text{B: } f(x_{k-1}|z^k) = \sum_{i=1}^{M} f(x_{k-1}|m^{(i)}_{k}, z^k)\mu^{(i)}_{k}$$

Algorithm A involves $M^2$ one-step smoothers after approximating $f(x_{k-1}|m^{(i)}_{k-1}, z^k)$ and $f(x_{k-1}|m^{(i)}_{k-1}, m^{(j)}_k, z^k)$ by single Gaussians via moment matching,$^{18}$ while algorithm B involves only $M$ one-step smoothers based on the standard IMM approximation of $f(x_{k-1}|m^{(i)}_k, z^k)$ by a single Gaussian. Algorithm B was extended to $L$-step fixed-point smoothing in [154]. The central idea of the $L$-step fixed-interval smoother (over a sliding window) of [76] is to form a grand state $(x_k', x_{k-1}', \ldots, x_{k-L}')'$ by state stacking (augmentation) and run the IMM estimator for this augmented system to produce the smoothed estimate $E[(x_k', x_{k-1}', \ldots, x_{k-L}')'|z^k]$ “automatically.” A key enabling approximation is that $f(x_k, x_{k-1}, \ldots, x_{k-L}|m^{(j)}_k, z^k)$ is Gaussian, which is considerably stronger than the approximation B above. Another underlying implication is that no mode jumps within the interval $(k-L, \ldots, k-1]$ are accounted for explicitly. Additional strong approximations were made to evaluate the retrodicted probabilities $P\{m^{(i)}_{k-n}|z^k\}, n = 1, \ldots, L$ since the IMM estimator gives only $P\{m^{(i)}_k|z^k\}$. Another approximate way of evaluating these mode probabilities was given in [275]. Simulation results over the scenario of [132] demonstrated that even with only a small lag this smoother can outperform the IMM filtering algorithm very significantly at the cost of roughly an $(L+1)$-fold increase in computation. Further successful tracking applications of this state-stacking based IMM smoother were reported in [76] for a single maneuvering target in clutter (smoother coupled with PDAF), and in [78] for multiple maneuvering targets in clutter (smoother coupled with JPDAF). It was found during our investigation for [154] that this state-stacking based IMM smoother and the one-step IMM smoother of [133] had almost equal accuracy for the scenario simulated, while the latter is much less computationally demanding.
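The state-stacking idea mentioned above can be made concrete by writing out the transition matrix of the augmented state $(x_k', x_{k-1}', \ldots, x_{k-L}')'$. The sketch below is the standard augmentation under the assumption of a linear model $x_{k+1} = Fx_k + w_k$ for one mode; the helper name `stack_transition` is hypothetical and the details in [76] may differ.

```python
from typing import List

Matrix = List[List[float]]


def stack_transition(F: Matrix, L: int) -> Matrix:
    """Transition matrix of the stacked state (x_k, x_{k-1}, ..., x_{k-L}).

    The top block row propagates the current state with F; every other block
    row copies the block above it from the previous stacked state (a shift).
    """
    n = len(F)
    N = n * (L + 1)
    A = [[0.0] * N for _ in range(N)]
    for r in range(n):
        for c in range(n):
            A[r][c] = F[r][c]
    for r in range(n, N):
        A[r][r - n] = 1.0   # identity block on the first block sub-diagonal
    return A


# Scalar state with lag L = 2 (illustrative values).
A = stack_transition([[0.9]], 2)
for row in A:
    print(row)
```

Process noise enters only the top block, so the augmented noise covariance is singular, and the stacked dimension $n(L+1)$ accounts for the growth in computation with the lag.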

For real-time applications, the above smoothing results can only provide time-delayed estimates. Nevertheless, such delayed estimation can be used to improve filtering results (i.e., without a delay) of an MM algorithm, which is inherently nonlinear, by refining its past estimates and rerunning the MM algorithm using the refined estimates. This approach was taken consciously and promoted in [154] for performance enhancement, which reduces the peak error significantly, but increases steady-state error slightly.

D. Convergence of CMM Estimation Algorithms

For a stochastic jump-linear system, a hybrid estimation algorithm that converges exponentially exists if a set of conditions, including those on observability, given in [145] are satisfied. Here the exponential convergence [145] refers to an algorithm which correctly identifies the system mode in finite time and has a base state estimate sequence with a unique mean and convergent covariance, and an estimation-error mean converging exponentially to a bounded set with a guaranteed rate. Earlier, [278] considered the problem of mode-sequence identification. It defines

$$\bar{\mu}^{(j)}_k = \frac{1}{c}\sum_{i=1}^{M}(\Pi^L)_{ij}\,\bar{\mu}^{(i)}_{k-1}\exp[-A_{sj}], \qquad A_{sj} = \lim_{L\to\infty}\|Z^{(s)}_k - Z^{(j)}_k\|^2/(L\sigma^2)$$

where $Z^{(j)}_k = [z^{(j)\prime}_{(k-1)L+1}, z^{(j)\prime}_{(k-1)L+2}, \ldots, z^{(j)\prime}_{kL}]'$ is the vector of measurements over the block (interval) of $L$ discrete times, $[(k-1)L+1, (k-1)L+2, \ldots, kL]$, if $m^{(j)}$ is the correct model over the block; $s$ stands for the true model; $(\Pi^L)_{ij}$ stands for the $(i, j)$th element of the $L$th power of the transition probability matrix $\Pi$; and $c$ is the sum of the numerators over $i = 1, \ldots, M$. For a hybrid system, it was shown in [278] that the weights $\bar{\mu}^{(j)}_k$ will converge and the true model $s$ has the largest steady-state value $\lim_{k\to\infty}\bar{\mu}^{(s)}_k$ under the condition that all mode transitions are possible but infrequent and the true model has the best fit to the data. Note that $\bar{\mu}^{(j)}_k$ is an approximation of the following de facto mode probability in the Gaussian case

$$\mu^{(j)}_k = \frac{1}{c}\sum_{i=1}^{M}(\Pi^L)_{ij}\,\mu^{(i)}_{k-1}\exp[-\|Z^{(s)}_k - Z^{(j)}_k\|^2/(L\sigma^2)]$$

assuming no mode transition within the block, by replacing the exponential factor with its steady-state value as the block size increases. As such, the above results for $\bar{\mu}^{(j)}_k$ hold approximately true for $\mu^{(j)}_k$, which is meaningful if $\mu^{(j)}_k$ is used as a fitness measure for an MM estimator. In fact, such a block MM estimator was proposed in [278] for mode sequence MAP estimation.

$^{18}$ These $M^2$ one-step smoothers were lumped into $M$ one-step smoothers by a hardly justifiable heuristic in [131].

By a theoretical analysis of a generic CV-CA IMM algorithm, [88] concluded that even if the true system has a CA plant, the error of the constant-velocity (CV) filter remains bounded and the steady-state RMS error can be estimated from the acceleration estimates of the CA filter.
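The block-wise mode-probability recursion discussed above can be sketched numerically as follows. This is an illustration under assumptions: the function name `block_mode_update` and all numbers are hypothetical, the measurement blocks are given as flat lists of scalars, and the constant $c$ is taken as the sum of the unnormalized weights so that the result sums to one.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(P: Matrix, L: int) -> Matrix:
    n = len(P)
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(L):
        out = mat_mul(out, P)
    return out


def block_mode_update(mu_prev: List[float], Pi: Matrix, L: int,
                      Z_true: List[float], Z_models: List[List[float]],
                      sigma2: float) -> List[float]:
    """One block step: predict with Pi^L, weight by exp(-||Z_s - Z_j||^2/(L sigma^2))."""
    PiL = mat_pow(Pi, L)
    M = len(mu_prev)
    unnorm = []
    for j in range(M):
        dist2 = sum((a - b) ** 2 for a, b in zip(Z_true, Z_models[j]))
        pred = sum(PiL[i][j] * mu_prev[i] for i in range(M))
        unnorm.append(pred * math.exp(-dist2 / (L * sigma2)))
    c = sum(unnorm)
    return [u / c for u in unnorm]


# Two models, block length 2; model 0 reproduces the observed block exactly.
mu = block_mode_update([0.5, 0.5],
                       [[0.95, 0.05], [0.05, 0.95]], 2,
                       [1.0, 2.0], [[1.0, 2.0], [1.0, 1.0]], 0.5)
print(mu)
```

Repeating the update over successive blocks with infrequent transitions drives the weight of the best-fitting model upward, in line with the convergence statement above.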


Almost all tracking applications of the second generation MM algorithms so far are those of the IMM algorithm [12], [14], [15]. Many of these applications have been documented in the survey of [243]. Since then numerous new successful applications have been reported. They further demonstrate the excellent performance of the IMM algorithm for various tracking problems. While addressing older results briefly, many already discussed in [243], we will pay more attention to more recent ones.

1) Surveillance for Air Traffic Control: The first real application of the IMM algorithm is probably the jump-diffusion prototype tracker developed by Blom's team as the track maintenance part of a multisensor multitarget tracking system [60] (see also [61], [55, 1992]) for Eurocontrol, the European organization for the safety of air navigation. This sophisticated tracker included four models for horizontal motion (straight constant/changing speed motion, and left/right turns) and two for vertical motion (level motion and changing altitude). Such a model set well represents typical en-route motions of a civilian aircraft: horizontal CV motions most of the time with occasional changes in speed, or horizontal constant turns, or small vertical climb/descent maneuvers. It also allows decoupled tracking of the altitude and horizontal motions. An EKF was employed to handle the nonlinearity involved in some of the horizontal models (with 2D position, ground speed, course and transversal acceleration as the state components) (see Sec. V.B.2 of [209]). The more efficient second-order-dependence model (3) was employed to govern the mode transitions. The comprehensive performance evaluation with both simulation and real air traffic control (ATC) data presented in [60] demonstrated accurate state estimation, fast response to mode changes, and high credibility of the tracker. Overall the tracker outperformed by far the existing ones based on a single model (α-β-γ or EKF based). This IMM-based prototype tracker has been implemented and installed in the Air Traffic Management Surveillance Tracker and Server (ARTAS) of Eurocontrol, which is operational in many European countries and is the basis for the Surveillance Data Processing and Distribution (SDPD) system in Europe [136], [54].

Another comprehensive study of the capabilities of the IMM algorithm for advanced ATC systems within the Hadamard project of the French civil aviation administration was reported in [335], including detailed design and evaluation of six different MM configurations. It was found that the best trade-off between performance and computation was achieved by a two-model IMM configuration of a CV model and a CT model with an unknown turn rate (Sec. V.B of [209]) as a state component and fictitious process noise for the longitudinal acceleration. In contrast to [60], no explicit model for the longitudinal acceleration was included because it is rare and small in civil aircraft motion. This two-model CV-CT IMM configuration was shown to meet the fairly stringent requirements of the project very well. A principal conclusion of this study was that “stringent speed estimation can be obtained only with MM algorithms.”

A detailed design and evaluation of an IMM algorithm with a two-model (CV-CT) configuration was given in [199] (see also [18], [21]), along with many guidelines and insights on parameter selection and tuning of the algorithm. In particular, it was demonstrated that the CT model is well suited to the ATC applications and best results are obtained if it is included as a maneuver model. An implementation of a two-model (CV-CT) IMM algorithm was reported in [161] for the Micro En Route Automated Radar Tracking System (µEARTS) that again showed the superiority of the IMM estimator to single-model (adaptive α-β or EKF) based filters. The series of papers [362], [340], [171] presented the work for a large-scale (about 1000 targets) multisensor multitarget tracking system for ATC surveillance, which was developed into the software package MATSurv (for “Multisensor Air Traffic Surveillance”) [361]. It combines the IMM algorithm for state estimation with assignment algorithms for radar data association in a dense environment. This combination is supported by the prior comparative study of IMMPDA vs. IMM-assignment on real ATC surveillance data reported in [172]. More specifically, [362] implemented an IMM configuration with two second-order linear models. [340] reported a significant performance enhancement by utilizing a nonlinear CT model in the (CV-CT) IMM configuration, with an improvement of 10-50% in the horizontal prediction errors on real data obtained from five ATC radars. Additionally, the IMM configuration facilitated data association better than the Kalman filter. More results on real and simulated data were presented in [171], along with a discussion of parallelized algorithms with superlinear speedup for multitarget tracking using the IMM estimator. Other references addressing the design of IMM tracking filters for ATC surveillance include [337], [127].

2) Defence Applications: Compared with the ATC applications, which involve relatively benign maneuvers, tracking hostile or non-cooperative targets, such as evasive manned aircraft or antiship missiles, is incomparably more difficult and challenging. For example, these targets possess very strong maneuverability, often not well known to trackers; their motion behavior is quite unpredictable without sufficient knowledge of their types, missions, tactics, etc.; they may apply countermeasures to degrade the quality of measurements and hamper tracking efforts. The good news is that data quality and rates are usually higher than in the ATC case. As evidenced by the vast majority of studies, the MM approach appears to be the most powerful framework capable of meeting the challenges of maneuvering target tracking in a feasible way.

Benchmark tracking problems. Several benchmark problems were initiated in 1994 for a unified performance evaluation and fair comparison of tracking algorithms using a phased-array radar, which will be discussed in greater detail in a subsequent part. Comparative studies of a variety of tracking algorithms and designs were reported using the first benchmark [40], [48] and second benchmark [47], [45], [49]. They suggested that among all solutions proposed only the IMM algorithm was able to handle satisfactorily the wide range of maneuver scenarios, varying from mild 2-3g turns of cargo aircraft to intense series of severe 5-7g turns of fighter aircraft [44], [86], [350], [168], [35] (using 3 models) and [155] (using 2 models). Some of these performance studies were verified by real experiments. The IMM design of [350] used CV, constant-thrust, and 3D CT models. The constant-thrust model was implemented adaptively within the standard CA filter by correcting the predicted acceleration vector (before and after mixing) so as to make it parallel to the predicted velocity vector. In a similar manner the predicted acceleration vector of the 3D CT filter was made perpendicular to the predicted velocity and the speed was kept “nearly” constant by means of the kinematic-constraint technique of [4]. The main approaches to the second benchmark problem, including the one in [50], the IMMIPDAF of [350], the IMMPDA solutions of [167], and the IMM-MHT solution of [36], all use the IMM algorithm as a base-state estimator. [167] used three coordinate-uncoupled models: a CV model with low process noise for benign motion, a CV model with high process noise for ongoing maneuver, and a CA model with high process noise for maneuver onset/termination. [36] employed a horizontal CT model with polar velocity (see Sec. V.B.2 of [209]), a CV model, and a CA model, where the CA and CT models allow altitude changes (see also [35], [68]). Generally speaking, the IMMPDA solution reduced more radar time while the IMM-MHT reduced more radar energy. More recently [293] presented comparative results for the second benchmark problem between maneuvering (i.e., MM) PMHTs (see Sec. V-B.5) showing that the PMHT performed reasonably well, almost as well as the above two solutions. The use of the IMM algorithm in the subsequent benchmark problems as a base tracking filter appears beyond question. Additional information for adaptive sampling and waveform selection in phased-array radars using the IMM algorithm can be found in [308], [194], [211], [311].

Ballistic targets. In recent years the IMM algorithm proved very useful for another practical problem of vital importance: tracking of tactical ballistic missiles (TBM) in all flight phases: boost (including post-boost), coast (free flight), and reentry (possibly maneuverable). The motion of a TBM is much more constrained than that of a manned maneuvering aircraft and can be modeled relatively well during any particular flight phase (mode) (see [205]). In contrast to the hard-decision based AMM approach of [104], [103] (see Sec. IV-E), various IMM-based algorithms, which make soft decisions, have been proposed to avoid the deficiencies associated with a hard decision [317], [254], [39], [141], [142], [80]. The IMM algorithm was employed in a prototype system for TBM in [317]. Four models were used: specialized boost, coast, and reentry models and an auxiliary “general-purpose” CA model intended to provide a “back-up” estimate to the other filters (through the mixing mechanism) in cases when the specialized models are inadequate (e.g., due to unexpected maneuvers, such as trajectory corrections, retargeting). A critical issue in the MM tracking of ballistic targets is design of the transition probabilities since they are time-varying, for possible transitions are strongly dependent on the current mode of flight. This dependence was accounted for in [317] by switching among five transition probability matrices depending on the estimated target altitude and flight phase. For example, the boost and coast models/filters are dropped when the reentry phase is established. Another major issue, mixing of different target states/covariances, was avoided by mixing only the common position and velocity components. The implementations of [39], [254], [141] considered only the boost and coast phases. [39] used a three-model configuration with CV for coaster, CA for a “generic” filter, and a detailed flight dynamics based ten-state model for booster [205]. Specific to this implementation is the unconventional ad hoc state/covariance mixing, allowed only between booster and CA, and between coaster and CA. This seems to make sense if the CA filter is accurate enough in all conditions to provide a backup in case the booster or coaster filter does not perform well due to a mode change. For a similar implementation (2-model boost-coast IMM) [254] proposed and analyzed different time-varying distributions of the boost to coast transition based on a predicted burnout time and the uncertainty of this prediction. The study showed that the 2-model IMM algorithm is able to “detect” the burnout and provide highly accurate estimates shortly thereafter. No rocket staging, however, was allowed in this study. It seems that a CA or other (e.g., correlated) generic filter for acceleration would help cope with possible staging. For modeling the boost-to-coast transition in a two-model IMM configuration, [141], [142] proposed a sigmoidal function for transition probability, depending on the estimated altitude and the a priori altitude at which the booster cutoff is likely to occur. Performance comparison of this version with an IMM algorithm having constant transition probabilities, EKF, and α-β-γ trackers over simulated and real data demonstrated its better capabilities provided the guessed parameters are not far from the truth. Another implementation of IMM for tracking of a TBM in the entire flight was given in [80].
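An altitude-dependent sigmoidal transition probability of the kind mentioned above could look like the following sketch. The logistic parameterization, the `width` parameter, and both function names are assumptions made for illustration; the exact form used in [141], [142] is not given here. The coast-to-boost probability is set to zero, which corresponds to the no-staging assumption.

```python
import math
from typing import List


def boost_to_coast_prob(alt_est: float, alt_cutoff: float, width: float) -> float:
    """Logistic transition probability rising around the a priori cutoff altitude."""
    z = (alt_est - alt_cutoff) / width
    z = max(-50.0, min(50.0, z))   # clamp to keep exp() well within range
    return 1.0 / (1.0 + math.exp(-z))


def transition_matrix(alt_est: float, alt_cutoff: float, width: float) -> List[List[float]]:
    """Row-stochastic matrix over the modes (boost, coast)."""
    p = boost_to_coast_prob(alt_est, alt_cutoff, width)
    return [[1.0 - p, p],
            [0.0, 1.0]]


# Hypothetical numbers: cutoff expected near 30 km, transition width 2 km.
for alt in (20e3, 30e3, 40e3):
    print(alt, transition_matrix(alt, 30e3, 2e3)[0])
```

A poorly guessed cutoff altitude shifts the whole curve, which is consistent with the sensitivity to parameter mismatch noted above.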

GPB1 applications. The early paper [264] formulated the GPB1 algorithm for a semi-Markov mode sequence, but the simulated case had exponentially distributed sojourn times, which actually makes the sequence Markov. [257] applied this algorithm to passive tracking of a submarine with vertical maneuvers. By quantizing the unknown input into several known levels, the GPB1 algorithm reduces, for a linear target motion, to a single Kalman filter with a probabilistically weighted input. Submarine tracking was studied further in [259], [260], [258], [261] by GPB1 tracking in range, velocity, and depth using passive time delay measurements. [123], [263] presented detailed GPB1 designs for realistic 3D manned maneuvering aircraft tracking scenarios. A Singer model with several known quantized mean levels of acceleration was employed for an MM description of the target dynamics, and the target maneuver was modeled by (semi-)Markov transitions between the levels. Versions in rectangular and spherical coordinates were developed and investigated, showing good accuracy and filter stability over a wide range of target acceleration, unreachable by a conventional single filter (e.g., an EKF). Other related implementations and applications of the GPB1 algorithm can be found in [336], [262], [71]. [160] proposed a six-model GPB1-type MM tracker for a maneuvering reentry vehicle (MaRV) that quantizes the acceleration vector to represent the possible maneuvers (left/right turns, climb/dive, and deceleration) within the MaRV model [75], [205].

3) Tracking in Presence of Correlated Noise, Glint, or Multipath:

Correlated noise. Radar tracking at a high sampling rate leads to significant temporal correlation of the measurement errors, which degrades the performance of those trackers relying on whiteness of the measurement noise. Techniques for handling correlated measurements within the IMM framework were proposed in [128] and [357]. The approach of [128] is based on modeling the errors as a first-order Markov process with known coefficients. After decorrelation using the standard measurement difference method, the IMM algorithm was applied straightforwardly (see also [243] for a discussion). [357] proposed a more general technique that performs the decorrelation in an adaptive manner within the IMM framework without the assumption of known correlation parameters. It was demonstrated that this adaptive version and the one with known correlation parameters have similar performance.

Glint. When tracking large targets at short distance, the resulting radar measurement errors (known as glint noise) are characterized by non-Gaussian distributions with a heavy tail due to the interference caused by reflections from different elements of the target. The presence of glint can seriously degrade tracking performance if white Gaussian measurement errors are assumed by the tracker. Modifications of the Kalman filter capable of accounting for glint can be found in [134], [236], [237] (see also [243]). A good approximate model for the distribution of glint is a mixture of a Gaussian with a moderate variance and a Laplacian distribution with a large variance and a small weight [356]. This model is now generally accepted. It was used in [356] to develop a tracking filter implementing the Masreliez filter for non-Gaussian noise [236] for the approximate spherical target-measurement model of [123]. In the context of maneuvering target tracking, this approach was extended in [358], where a two-model IMM configuration with two modified Masreliez filters (instead of Kalman filters) was proposed (see footnote 19). A different approach was proposed in [84], [85]. Two measurement models were included in the IMM design: one matched to the system with Gaussian observation noise and the other to the system with Laplacian noise of a large variance. Conditional filtering was implemented in [84] by EKFs for both models (see footnote 20). Further, to handle the maneuvering target case, the IMM design was expanded in [85] with a (Singer) maneuver model. Under the assumption that the model sets describing the target motion and measurement noise, respectively, are independent (see footnote 21), a “layered” version of the IMM algorithm was developed, which has computational advantages. A somewhat similar approach was followed in [368]. The performance evaluation over two scenarios from the benchmark problem of [49] showed a significant advantage of this layered IMM over the algorithm of [358] in terms of noise reduction, faster response to mode changes, and better mode identification. A possible enhancement here is to replace the EKFs in [84] by better filters, such as those based on measurement conversion (see [206]) or the approximate best linear unbiased filters for polar/spherical measurements of [367], [366]. For a 2D homing missile scenario, [312] implemented an IMM algorithm with two decoupled models for range and bearing in Gaussian and Laplacian noise, respectively, as in [84]. In this setting the measurement equations are linear. Another study of target tracking in glint was presented in [326]. It employed mixture reduction techniques [295] for MM estimation.
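The Gaussian-plus-Laplacian glint model described above can be written down directly. The sketch below evaluates such a mixture density for a scalar measurement error; the weight `eps` and the two standard deviations are hypothetical values chosen only to show the heavy tail, not parameters taken from [356].

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density with standard deviation sigma.

    A Laplacian with scale b has variance 2*b**2, so b = sigma / sqrt(2).
    """
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)


def glint_pdf(x, eps=0.1, sigma_gauss=1.0, sigma_laplace=10.0):
    """Mixture: weight (1 - eps) on a moderate-variance Gaussian and a small
    weight eps on a large-variance Laplacian (all values are assumptions)."""
    return ((1.0 - eps) * gaussian_pdf(x, sigma_gauss)
            + eps * laplacian_pdf(x, sigma_laplace))


# Far from the origin the mixture density is dominated by the Laplacian term.
tail_ratio = glint_pdf(8.0) / gaussian_pdf(8.0, 1.0)
```

In a two-model IMM design of the kind used in [84], each mixture component would serve as the measurement-noise model of one elemental filter rather than being evaluated jointly as here.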

Multipath. Multipath propagation effects arise in radar or sonar tracking, especially when the target is in the close vicinity of a reflecting surface. For example, due to the combination of the return from a low elevation target and sea-surface reflected returns, the measurement errors can be huge compared with the “normal” ones. The effect is very complex and may be devastating to a tracker that assumes normal and uncorrelated measurement errors. An elegant solution to this problem based on the IMM method was proposed in [17]. As shown therein analytically, this is essentially a hybrid estimation problem due to the jumpwise behavior of the multipath error process arising on top of the standard measurement error process. It was demonstrated that the multipath effect can be successfully alleviated by an IMM mechanism with a “no multipath” model and a first-order autocorrelated multipath model, without the need for a detailed physical model. Somewhat related, unknown noise can be identified by an IMM algorithm, as proposed in [200], [20].

19 Since the Masreliez filter requires in general performing a convolution operation, an efficient approximation based on normal expansion of the predicted measurement distribution was developed.

20 The use of an LMMSE-based EKF, rather than the MMSE filter, for Laplacian noise was motivated by its great computational advantage at an acceptable loss of accuracy as compared to the exact nonlinear MMSE filter (derived therein).

21 Target motion and glint seem coupled due to the target attitude changes during maneuver on/off that could cause glint to appear/disappear or change.


VI. THE THIRD GENERATION: VARIABLE-STRUCTURE MM ESTIMATION

A. Theoretical Foundation of VSMM Estimation

The first two generations have a fixed structure in the sense that they use a fixed set of models at all times because of their fundamental Assumption A2. The third generation abandons A2, but retains A1’, resulting in a variable structure, hence the name variable-structure MM (VSMM) estimation.

State dependency of mode set. A key concept in VSMM estimation is the state dependency of a mode set [191], [201], [192]. Simply put, given the current mode (and base state), the set of possible modes at the next time is a subset of the mode space, determined by the mode transition law. Consider tracking a car with three models: straight ($m^{(1)} = 1$), left-turn ($m^{(2)} = 2$), and right-turn ($m^{(3)} = 3$). Initially it goes straight on a street at $k = 1$; it arrives at a four-way intersection at $k = 10$, where it could go straight or take a left or right turn; at $k = 11$, if it took a left turn at $k = 10$ the car could either go straight or continue the left turn (making it a U turn); then it goes straight until it enters an open space at $k = 20$, where any motion pattern could be converted to any other. The state-dependent mode sets through $k = 20$ are

$$S_1^{(1)} = \cdots = S_9^{(1)} = \{1\}, \quad S_{10}^{(1)} = \{1, 2, 3\}, \quad S_{11}^{(1)} = \{1\}, \quad S_{11}^{(2)} = \{1, 2\}, \quad S_{11}^{(3)} = \{1\}, \quad \ldots, \quad S_{20}^{(i)} = \{1, 2, 3\}, \; i = 1, 2, 3$$

The sequence of possible mode sets through $k = 20$ is

$$S^{20} = \{S_1, \ldots, S_{10}, S_{11}, S_{12}, \ldots, S_{20}\} = \{\{1\}, \ldots, \{1, 2, 3\}, \{1, 2\}, \{1\}, \ldots, \{1, 2, 3\}\}$$

where $S_k = \cup_i S_k^{(i)}$ is the union of the state-dependent mode sets at $k$. Note that the set of possible modes at time $k$ depends on the mode $s_{k-1}$ in effect at $k-1$ and on the base states $x_{k-1}$ and $x_k$. It was shown in [191], [201], [192] that an MM estimator cannot be optimal if at some time $k$ it uses a model set $M_k$ different from the mode space $S_k$. As such, the use of a fixed model set, say, $M = \{1, 2, 3\}$, is clearly not preferable for this example.

Clearly, the state dependency of the mode set cannot be described by the mode set itself. That is why a graph-theoretic formulation of the MM estimation was proposed in [198], [201], [192], where a mode and a possible transition from one mode to another are represented by a node and a directed edge, respectively, resulting in a directed graph (digraph) as a representation of a mode set and its associated state dependency. This formulation has certain advantages, as elaborated in [201], [192], and is the basis of a class of VSMM algorithms [213], [195], [341].
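The car example and its digraph representation can be reproduced with a few lines of code. In the sketch below each digraph is stored as a dictionary from a mode to the set of modes it may switch to at the next time; the dictionary names and the function `mode_set_sequence` are hypothetical, and the per-step digraphs are an assumed encoding of the road layout described in the example.

```python
def mode_set_sequence(initial_modes, digraphs):
    """Propagate the set of possible modes through a list of digraphs.

    digraphs[n] maps each mode at one time to the set of modes reachable at
    the next time (the directed edges leaving that node).
    Returns the list [S_1, S_2, ...] of possible mode sets.
    """
    sets = [set(initial_modes)]
    for graph in digraphs:
        successors = set()
        for mode in sets[-1]:
            successors |= graph[mode]  # union of state-dependent mode sets
        sets.append(successors)
    return sets


# Modes: 1 = straight, 2 = left turn, 3 = right turn.
STREET = {1: {1}, 2: {1}, 3: {1}}           # only "straight" is possible next
INTERSECTION = {1: {1, 2, 3}}               # four-way intersection at k = 10
AFTER_TURN = {1: {1}, 2: {1, 2}, 3: {1}}    # a left turn may continue (U turn)

# Transitions into k = 2, ..., 9 (street), k = 10, k = 11, and k = 12.
digraphs = [STREET] * 8 + [INTERSECTION, AFTER_TURN, STREET]
sets = mode_set_sequence({1}, digraphs)     # sets[0] is S_1, sets[9] is S_10
```

With this encoding `sets[9]`, `sets[10]`, and `sets[11]` come out as {1, 2, 3}, {1, 2}, and {1}, matching $S_{10}$, $S_{11}$, and $S_{12}$ in the sequence given above.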

The second generation abandons the constant mode assumption of the first generation; instead it imposes some type of Markovian property on the mode sequence. Somewhat similarly, the third generation abandons the constant mode-space assumption of the first two generations and explores the state dependency of the mode set.

Optimal VSMM estimation. As presented in [198], [191], [201], [192], the MMSE-optimal VSMM estimator is given by

$$\hat{x}_{k|k} = E\big[E[x_k | S^k, z^k] \,\big|\, z^k\big] = \sum_{S^k} \hat{x}_{k|k}^{(S^k)} P\{S^k | z^k\} \qquad (38)$$

$$P_{k|k} = \sum_{S^k} \Big[P_{k|k}^{(S^k)} + (\hat{x}_{k|k} - \hat{x}_{k|k}^{(S^k)})(\hat{x}_{k|k} - \hat{x}_{k|k}^{(S^k)})'\Big] P\{S^k | z^k\} \qquad (39)$$

where $\hat{x}_{k|k}^{(S^k)}$ and $P_{k|k}^{(S^k)}$ are the optimal estimate and its error covariance, respectively, at time $k$ assuming that the true mode-set sequence is $S^k$, given by

$$\hat{x}_{k|k}^{(S^k)} = E\big[E[x_k | s^k, S^k, z^k] \,\big|\, S^k, z^k\big] = \sum_{s^k \in S^k} \hat{x}_{k|k}^{(s^k)} P\{s^k | S^k, z^k\}$$

$$P_{k|k}^{(S^k)} = \sum_{s^k \in S^k} \Big[(\hat{x}_{k|k}^{(S^k)} - \hat{x}_{k|k}^{(s^k)})(\hat{x}_{k|k}^{(S^k)} - \hat{x}_{k|k}^{(s^k)})' + P_{k|k}^{(s^k)}\Big] P\{s^k | S^k, z^k\}$$

where $\hat{x}_{k|k}^{(s^k)}$ is the optimal estimate at time $k$ assuming the true mode sequence is $s^k$, and $P_{k|k}^{(s^k)}$ is its error covariance. The summations in (38)–(39) are over all mode-set sequences such that every possible mode sequence is in one and only one $S^k$.

Note, however, that a mode-set sequence may contain more than one possible mode sequence.

The optimal VSMM estimator in the MAP sense is given by $\hat{x}_{k|k} = \arg\max_{x_k} f(x_k | z^k)$, where

$$f(x_k | z^k) = \sum_{S^k} f(x_k | S^k, z^k) P\{S^k | z^k\} = \sum_{S^k} \sum_{s^k \in S^k} f(x_k | s^k, z^k) P\{s^k | S^k, z^k\} P\{S^k | z^k\}$$

is a mixture density, each component $f(x_k | S^k, z^k)$ of which is itself a mixture density.

RAMS approach. The optimal VSMM estimator is computationally infeasible. For most applications, its higher level with multiple model-set sequences should be replaced, due to limited computational resources, by a single (hopefully “best”) model-set sequence, obtained in practice through model-set adaptation in a recursive manner. This is the Recursive Adaptive Model-Set (RAMS) approach [196], [198], [191], [201], [195]. In general, each recursion of a RAMS algorithm has two tasks:


• Model-set adaptation determines at each time the model set to use for the MM estimation, utilizing posterior information contained in the data as well as prior knowledge. This is unique to VSMM estimation. Different RAMS algorithms differ from each other primarily with respect to how the model set adapts.

• Model-set sequence conditioned estimation intends to provide the best possible estimates given a model-set sequence. It consists of (a) initialization (assigning initial probabilities to new models and initializing the filters based on them), which is absent in the first two generations, and (b) cooperation strategies and conditional filtering, similar to those in the first two generations.

Model-set adaptation. Model-set adaptation can be decomposed as model-set expansion and model-set reduction [196], [195]. This decomposition has several significant advantages over naive model-set switching in terms of tractability, performance, and generality [196], [221], [195]. Model-set expansion is often more important than model-set reduction: Although inclusion of an impossible model is as bad as missing a possible model, the performance of an MM algorithm will suffer greatly if a highly likely model is missed, but only slightly if a highly unlikely model is included. As a result, a delay in including the correct model will always result in significant performance deterioration, while a delay in terminating an incorrect model usually does not incur great performance loss if the correct model is in the set. Unfortunately, model-set expansion is in general much more difficult than model-set reduction.

Both expansion and reduction of a model set require two functional tasks: model-set candidation, which determines candidate sets for expansion or reduction, and model-set decision, which selects and retains the best candidate set(s). Model-set candidation for expansion amounts to activation or generation of a set of new models, which is the main task of each model-set adaptation algorithm, discussed in Sec. VI-D. This candidation is much easier for model-set reduction.

B. Model-Set Decision Given Candidate Sets

Model-set decision may be formulated as a statistical decision problem, in particular, a problem of testing statistical hypotheses in a sequential setting, which is natural since observations are available sequentially, and beneficial in terms of decision delay and threshold determination. Since hypothesis testing always assumes hypotheses are fixed, in this subsection, the true modes in effect and the model sets are assumed constant during the time period over which the test is performed.

Model-set likelihood and probability. Since the task is to decide on the right model set, the probabilities and/or likelihoods of the model sets involved are naturally of major interest. The marginal likelihood of a model set $M$ at time $k$ is the sum of the predicted probabilities $P\{m_k^{(i)} | s \in M, z^{k-1}\}$ times the marginal likelihoods $f[\tilde{z}_k | s = m^{(i)}, z^{k-1}]$ of all the models $m^{(i)}$ in $M$ [196], [221], [195]:

$$L_k^M \triangleq f[\tilde{z}_k | s \in M, z^{k-1}] = \sum_{m^{(i)} \in M} f[\tilde{z}_k | s = m^{(i)}, z^{k-1}] \, P\{s = m^{(i)} | s \in M, z^{k-1}\}$$

where $\tilde{z}_k$ is the measurement residual. The joint likelihood of the model set $M$ is defined as $L_M^k \triangleq f[\tilde{z}^k | s \in M]$. Let $\Lambda^k$ be the joint likelihood ratio of model set $M_1$ to $M_2$, which is equal to the product of model-set marginal likelihood ratios under mild conditions [196]

$$\Lambda^k \triangleq \frac{L_{M_1}^k}{L_{M_2}^k} = \prod_{k_0 \le \kappa \le k} \frac{L_\kappa^{M_1}}{L_\kappa^{M_2}}$$

where $k_0$ is the test starting time. The (posterior) probability that the true mode is in $M$ is defined as

$$\mu_k^M \triangleq P\{s \in M | s \in \mathbf{M}, z^k\} = \sum_{m^{(i)} \in M} P\{m_k^{(i)} | s \in \mathbf{M}, z^k\} = \sum_{m^{(i)} \in M} \mu_k^{(i)}$$

which is the sum of the probabilities of all modes in $M$, where $\mathbf{M}$ is the union of all model sets under consideration, including $M$ as a subset. The mode probability $\mu_k^{(i)}$ is available from an MM estimator using $\mathbf{M}$.
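The two quantities just defined are simple to compute from the outputs of an MM estimator running on the union set. The sketch below assumes the estimator supplies, for every model, a marginal likelihood and a predicted probability conditioned on the union; the predicted probabilities are renormalized over the subset $M$ to obtain the conditioning on $s \in M$. The function and argument names are hypothetical.

```python
def model_set_marginal_likelihood(model_set, likelihoods, predicted_probs):
    """Marginal likelihood of a model set at one time step.

    likelihoods[m]:     f[z_k | s = m, z^{k-1}] for model m
    predicted_probs[m]: predicted probability of m given the union set;
                        dividing by their sum over model_set conditions
                        on the true mode being in model_set.
    """
    norm = sum(predicted_probs[m] for m in model_set)
    return sum(likelihoods[m] * predicted_probs[m] for m in model_set) / norm


def mode_set_probability(model_set, mode_probs):
    """Posterior probability that the true mode is in model_set: the sum of
    the posterior mode probabilities of its member models."""
    return sum(mode_probs[m] for m in model_set)


likelihoods = {1: 0.20, 2: 0.05, 3: 0.01}
predicted = {1: 0.5, 2: 0.3, 3: 0.2}
ratio = (model_set_marginal_likelihood({1, 2}, likelihoods, predicted)
         / model_set_marginal_likelihood({2, 3}, likelihoods, predicted))
```

Multiplying such one-step ratios over successive measurements gives the joint likelihood ratio $\Lambda^k$ used by the sequential tests.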

Several hypothesis tests were proposed in [196], [195], [212] and applied in [221], [217], [195], [204], [203], [219], [341] for model-set decision given candidate sets based on model-set likelihoods or probabilities.

Model-set decision given two model sets. Assume $s \in \mathbf{M}$. Consider the problem of choosing between two model sets $M_1$ and $M_2$, that is, testing

$$H_1: s \in M_1 \quad \text{vs.} \quad H_2: s \in M_2$$

in a sequential setting, where their union $\mathbf{M} = M_1 \cup M_2$ is used before a decision is made. Note that $M_1$ and $M_2$ may include common models. This problem is solved optimally in [196], [195], [212] by the following model-set sequential likelihood ratio test (MS-SLRT) for some thresholds $A$ and $B$: choose $M_1$ when $\Lambda^k \ge B$; choose $M_2$ when $\Lambda^k \le A$; otherwise use $\mathbf{M}$, go to the next time cycle, ask for one more measurement, and continue to test. This test is optimal in the sense of making quickest decisions with guaranteed decision error bounds

$$P\{\text{Choose } M_2 | s \in M_1\} \le \alpha, \quad P\{\text{Choose } M_1 | s \in M_2\} \le \beta, \quad 0 < \alpha, \beta < 1$$


for any given $\alpha$ and $\beta$. Replacing the likelihood ratio $\Lambda^k$ in the above by the probability ratio $P^k = \mu_k^{M_1} / \mu_k^{M_2}$ yields the model-set sequential probability ratio test (MS-SPRT) [196], [195], [212]. It is optimal in the sense of making quickest decisions with guaranteed decision error bounds

$$P\{\text{Choose } M_2, s \in M_1\} \le \alpha, \quad P\{\text{Choose } M_1, s \in M_2\} \le \beta, \quad 0 < \alpha, \beta < 1$$

The thresholds $A$ and $B$ are given approximately by $A = \frac{\beta}{1-\alpha}$, $B = \frac{1-\beta}{\alpha}$. Clearly, MS-SLRT and MS-SPRT can be used to answer such important questions as “Which model set is better to use, $M_1$ or $M_2$?” and “Is it better to delete a subset $M_1$ from the current model set $M$?”. They are also the basis for solutions of problems involving more than two model sets.

Model-set decision given more than two model sets. Consider the problem of whether it is better to add one of the model sets $M_1, \ldots, M_N$ to the current set $M$. This can be formulated as testing

$$H_0: s \in M \quad \text{vs.} \quad H_1: s \in M_1 \quad \cdots \quad \text{vs.} \quad H_N: s \in M_N$$

in a sequential setting, where $M$ is used before a decision is made. The following multiple model-set sequential likelihood ratio test (MMS-SLRT) was proposed in [196], [195], [212] as a solution to this problem:

S1. Perform $N$ MS-SLRTs simultaneously for $N$ pairs of hypotheses $(H_0: s \in M \text{ vs. } H_1: s \in M_1), \ldots, (H_0: s \in M \text{ vs. } H_N: s \in M_N)$. These tests are one-sided in the sense that $H_0$ is never rejected, which is implemented by using thresholds $B = \frac{\alpha}{1-\beta}$ and $A = -\infty$. This step ends when only one of the hypotheses $H_1, H_2, \ldots, H_N$ remains. Specifically, reject all $M_i$ for which $\Lambda^k \triangleq L_M^k / L_{M_i}^k \ge B$ and continue to the next time cycle to test the remaining pairs with one more measurement until only one of the hypotheses $H_1, H_2, \ldots, H_N$, say $H_j$, is not rejected.

S2. Perform an MS-SLRT to test $H_0: s \in M$ vs. $H_j: s \in M_j$, where $H_j$ is the winning hypothesis in S1.

With a slight modification, this test can be used to answer other important questions, such as “Is it better to delete one of the model sets $M_1, \ldots, M_N$ from the current set $M$?”.
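As an illustration of the sequential mechanics, the sketch below implements the two-set MS-SLRT described earlier in this subsection, using the approximate thresholds $A = \beta/(1-\alpha)$ and $B = (1-\beta)/\alpha$ quoted above. It accumulates the log of the one-step model-set marginal likelihood ratios, which is assumed to be a valid way of forming $\Lambda^k$ (the product form holds under the mild conditions of [196]). The function name and the string return values are hypothetical.

```python
import math


def ms_slrt(marginal_ratios, alpha, beta):
    """Sequential likelihood ratio test between model sets M1 and M2.

    marginal_ratios: iterable of one-step ratios L_k^{M1} / L_k^{M2},
                     starting at the test starting time k0
    Returns (decision, number_of_measurements_used).
    """
    log_a = math.log(beta / (1.0 - alpha))   # lower threshold A
    log_b = math.log((1.0 - beta) / alpha)   # upper threshold B
    log_lambda = 0.0
    steps = 0
    for ratio in marginal_ratios:
        steps += 1
        log_lambda += math.log(ratio)        # product of ratios, in log form
        if log_lambda >= log_b:
            return "M1", steps
        if log_lambda <= log_a:
            return "M2", steps
        # Otherwise keep running the union set and take one more measurement.
    return "undecided", steps


decision, used = ms_slrt([2.0] * 10, alpha=0.05, beta=0.05)
```

With one-step ratios of 2 and $\alpha = \beta = 0.05$, the upper threshold $B = 19$ is crossed after five measurements, since $2^4 = 16 < 19 \le 2^5$.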

Probably the most versatile test developed so far for model-set decision is the mode-set probability sequential ranking test (MSP-SRT) [196], [195], [212]. It is based on ranking of mode-set probability: At each time $k$, rank all $N_k$ of the model sets $M_1, M_2, \ldots, M_N$ that have survived (i.e., not yet rejected or accepted) by time $k$ as $M_{(1)}, M_{(2)}, \ldots, M_{(N_k)}$ such that their mode-set probabilities $\mu_k^{M_{(i)}} \triangleq P\{s \in M_{(i)} | z^k\}$ are in a decreasing order:

$$\mu_k^{M_{(1)}} \ge \mu_k^{M_{(2)}} \ge \cdots \ge \mu_k^{M_{(N_k)}}$$

Then, a sequential decision is made by comparing a mode-set probability ratio $P^k$ with a pair of thresholds $A$ and $B$, where $P^k = \mu_k^{M_{(i)}} / \mu_k^{M}$ if the current model set $M$ is involved, such as to answer the question “Is it better to add/delete some (unknown) of the model sets $M_1, M_2, \ldots, M_N$ to/from $M$?”; otherwise, $P^k = \mu_k^{M_{(i)}} / \mu_k^{M_{(1)}}$, such as for the problems of choosing one, $L$ (known), or some (unknown) out of the model sets $M_1, M_2, \ldots, M_N$.
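The ranking step of the MSP-SRT can be sketched as follows for the case in which the current model set is not involved, so that each surviving candidate is compared with the top-ranked one. Only the ranking and the ratios $P^k$ are computed; how the ratios are compared with the thresholds $A$ and $B$ to accept or reject sets is left out because the text does not spell it out. The function name and the data are hypothetical.

```python
def rank_model_sets(candidates, mode_probs):
    """Rank surviving candidate model sets by mode-set probability.

    candidates: {name: set of models} for the sets still under test
    mode_probs: {model: posterior mode probability} from the MM estimator
    Returns a list of (name, mode_set_probability, ratio_to_top) in
    decreasing order of mode-set probability.
    """
    probs = {name: sum(mode_probs[m] for m in models)
             for name, models in candidates.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    top = probs[ranked[0]]
    return [(name, probs[name], probs[name] / top) for name in ranked]


ranking = rank_model_sets(
    {"M1": {1, 2}, "M2": {2, 3}, "M3": {3}},
    {1: 0.6, 2: 0.3, 3: 0.1},
)
```

Here the sets are ordered M1, M2, M3, and the last entry of each tuple is the probability ratio that would be tested against the thresholds at this time step.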

Alternative solutions were also presented in [196], [195], where the MS-SLRTs and model-set likelihood ratio $\Lambda^k$ in MMS-SLRT are replaced by MS-SPRTs and mode-set probability ratio $P^k$, respectively, and $P^k$ in MSP-SRT is replaced by $\Lambda^k$. In addition, a so-called multiple-level test was also presented in [196]. See [196], [195], [212] for more details, along with simulation results for some simple models typically used for maneuvering target tracking.

These tests are general, intuitively appealing, computationally efficient, and easy to implement because they use only model-set likelihoods or probabilities, which are available in MMSE-based MM algorithms if the model sets are already used. Note that an adaptation of the model set is accomplished whenever a model set other than the current one is accepted. As such, model-set adaptation requires a series of hypothesis tests.

C. MM Estimation Given Model-Set Sequence

In this subsection, it is assumed that the sequence of model sets has been determined by model-set adaptation. For simplicity, $M_k$ and $M^k$ are also used to denote the events $\{s_k \in M_k\}$ and $\{s^k \in M^k\}$, respectively, and a perfect match between modes and models is assumed.

Initialization of new models and filters. A model is a new one if it is in $M_k$ but not in $M_{k-1}$. Two important questions arise naturally: (a) How to assign initial probabilities to the new models? (b) How to obtain initial estimates and error covariances for the filters based on the new models? Answers to these questions are essential for the implementation of any MM algorithm of a truly variable structure. Many heuristics and ad hoc treatments have appeared in the literature. In fact, the key to the optimal initialization of new models and the corresponding elemental filters is the state dependency of the mode set, explained in Sec. VI-A. As applied to model and filter initialization here with a single state estimate and mode probability, the optimal assignment of the initial probability to a new model accounts only for the probabilities of those models that may switch to it; and the optimal initial state estimate for a filter based on a (new or old) model is determined only from the estimates (and the probabilities) of the filters based on those models that may switch to the new model.

After writing down formulas for the optimal initialization, it can be recognized that they are similar to those in the model-conditioned reinitialization (mixing) step of the IMM estimator (see Table II). This recognition leads to the VSIMM recursion (Table III), presented in [193], [195]. It gives a generic recursion for VSMM estimation based on a time-varying model set. It was shown in [193] that the VSIMM recursion is optimal in the MMSE sense for a Markov jump-linear system under the following two fundamental assumptions of the RAMS approach (of zero depth):

$$\hat{x}_{k|k}^{(i)} = E[x_k | m_k^{(i)}, z^k] = E[x_k | m_k^{(i)}, M^{k-1}, z^k], \quad P\{m_k^{(i)} | M_k, z^k\} = P\{m_k^{(i)} | M_k, M^{k-1}, z^k\}, \quad \forall m^{(i)} \in M_k$$

and the linear-Gaussian assumption of the Kalman filter given the system mode.

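TABLE III

VSIMM RECURSION.

1. Model-set conditioned (re)initialization [$\forall m^{(i)} \in M_k$]:

predicted mode probability:
$$\mu_{k|k-1}^{(i)} \triangleq P\{m_k^{(i)} | M_k, M^{k-1}, z^{k-1}\} = \sum_{m^{(j)} \in M_{k-1}} \pi_{ji} \mu_{k-1}^{(j)}$$

mixing weight:
$$\mu_{k-1}^{j|i} \triangleq P\{m_{k-1}^{(j)} | m_k^{(i)}, M^{k-1}, z^{k-1}\} = \pi_{ji} \mu_{k-1}^{(j)} / \mu_{k|k-1}^{(i)}$$

mixing estimate:
$$\bar{x}_{k-1}^{(i)} \triangleq E[x_{k-1} | m_k^{(i)}, M^{k-1}, z^{k-1}] = \sum_{m^{(j)} \in M_{k-1}} \hat{x}_{k-1|k-1}^{(j)} \mu_{k-1}^{j|i}$$

mixing covariance:
$$\bar{P}_{k-1}^{(i)} = \sum_{m^{(j)} \in M_{k-1}} \Big[P_{k-1|k-1}^{(j)} + (\bar{x}_{k-1}^{(i)} - \hat{x}_{k-1|k-1}^{(j)})(\bar{x}_{k-1}^{(i)} - \hat{x}_{k-1|k-1}^{(j)})'\Big] \mu_{k-1}^{j|i}$$

2. Model-conditioned filtering [$\forall m^{(i)} \in M_k$]:

predicted state:
$$\hat{x}_{k|k-1}^{(i)} \triangleq E[x_k | m_k^{(i)}, M^{k-1}, z^{k-1}] = F_{k-1}^{(i)} \bar{x}_{k-1}^{(i)} + G_{k-1}^{(i)} \bar{w}_{k-1}^{(i)}$$

predicted covariance:
$$P_{k|k-1}^{(i)} = F_{k-1}^{(i)} \bar{P}_{k-1}^{(i)} (F_{k-1}^{(i)})' + G_{k-1}^{(i)} Q_{k-1}^{(i)} (G_{k-1}^{(i)})'$$

measurement residual:
$$\tilde{z}_k^{(i)} \triangleq z_k - E[z_k | m_k^{(i)}, M^{k-1}, z^{k-1}] = z_k - H_k^{(i)} \hat{x}_{k|k-1}^{(i)} - \bar{v}_k^{(i)}$$

residual covariance:
$$S_k^{(i)} = H_k^{(i)} P_{k|k-1}^{(i)} (H_k^{(i)})' + R_k^{(i)}$$

filter gain:
$$K_k^{(i)} = P_{k|k-1}^{(i)} (H_k^{(i)})' (S_k^{(i)})^{-1}$$

updated state:
$$\hat{x}_{k|k}^{(i)} \triangleq E[x_k | m_k^{(i)}, M^{k-1}, z^k] = \hat{x}_{k|k-1}^{(i)} + K_k^{(i)} \tilde{z}_k^{(i)}$$

updated covariance:
$$P_{k|k}^{(i)} = P_{k|k-1}^{(i)} - K_k^{(i)} S_k^{(i)} (K_k^{(i)})'$$

3. Mode probability update [$\forall m^{(i)} \in M_k$]:

model likelihood:
$$L_k^{(i)} \triangleq p[\tilde{z}_k^{(i)} | m_k^{(i)}, M^{k-1}, z^{k-1}] \overset{\text{assume}}{=} \frac{\exp[-(1/2)(\tilde{z}_k^{(i)})' (S_k^{(i)})^{-1} \tilde{z}_k^{(i)}]}{|2\pi S_k^{(i)}|^{1/2}}$$

mode probability:
$$\mu_k^{(i)} \triangleq P\{m_k^{(i)} | M_k, M^{k-1}, z^k\} = \frac{\mu_{k|k-1}^{(i)} L_k^{(i)}}{\sum_{m^{(j)} \in M_k} \mu_{k|k-1}^{(j)} L_k^{(j)}}$$

4. Fusion:

overall estimate:
$$\hat{x}_{k|k} \triangleq E[x_k | M_k, M^{k-1}, z^k] = \sum_{m^{(i)} \in M_k} \hat{x}_{k|k}^{(i)} \mu_k^{(i)}$$

overall covariance:
$$P_{k|k} = \sum_{m^{(i)} \in M_k} \Big[P_{k|k}^{(i)} + (\hat{x}_{k|k} - \hat{x}_{k|k}^{(i)})(\hat{x}_{k|k} - \hat{x}_{k|k}^{(i)})'\Big] \mu_k^{(i)}$$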
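The four steps of Table III can be followed in code. The sketch below is a deliberately reduced version for a scalar state and scalar measurement with zero-mean noises and the noise gain absorbed into `Q`, so that it needs no matrix library; the general case replaces the scalar products by the matrix expressions of the table. The function name, the data layout, and the numerical values are assumptions for illustration.

```python
import math


def vsimm_step(prev, models_k, trans, z):
    """One VSIMM cycle (scalar state and measurement).

    prev:     {j: (x, P, mu)} for the models in M_{k-1}
    models_k: {i: (F, Q, H, R)} for the models in M_k
    trans:    trans[j][i] = pi_ji, defined on the total model set
    z:        measurement at time k
    """
    updated = {}
    for i, (F, Q, H, R) in models_k.items():
        # Step 1: model-set conditioned (re)initialization
        mu_pred = sum(trans[j][i] * mu for j, (_, _, mu) in prev.items())
        if mu_pred <= 0.0:
            continue  # no model in M_{k-1} can switch to model i
        w = {j: trans[j][i] * mu / mu_pred for j, (_, _, mu) in prev.items()}
        x_mix = sum(w[j] * x for j, (x, _, _) in prev.items())
        P_mix = sum(w[j] * (P + (x_mix - x) ** 2)
                    for j, (x, P, _) in prev.items())
        # Step 2: model-conditioned Kalman filtering
        x_pred = F * x_mix
        P_pred = F * P_mix * F + Q
        resid = z - H * x_pred
        S = H * P_pred * H + R
        K = P_pred * H / S
        x_upd = x_pred + K * resid
        P_upd = P_pred - K * S * K
        # Step 3: Gaussian likelihood times predicted mode probability
        lik = math.exp(-0.5 * resid * resid / S) / math.sqrt(2.0 * math.pi * S)
        updated[i] = (x_upd, P_upd, mu_pred * lik)
    total = sum(weight for _, _, weight in updated.values())
    updated = {i: (x, P, weight / total)
               for i, (x, P, weight) in updated.items()}
    # Step 4: fusion of the model-conditioned estimates
    x_hat = sum(mu * x for x, _, mu in updated.values())
    P_hat = sum(mu * (P + (x_hat - x) ** 2) for x, P, mu in updated.values())
    return updated, x_hat, P_hat


# Model 2 is new at time k: it is absent from prev but present in models_k.
prev = {1: (0.0, 1.0, 1.0)}
models_k = {1: (1.0, 0.01, 1.0, 1.0), 2: (1.0, 4.0, 1.0, 1.0)}
trans = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
updated, x_hat, P_hat = vsimm_step(prev, models_k, trans, z=3.0)
```

Note that the new model needs no special handling: its initial probability and initial condition come out of Step 1 from the models in $M_{k-1}$ that may switch to it, and the transition probabilities are looked up in a single table defined on the total model set.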

The VSIMM recursion automatically initializes all new models and filters “optimally”: All new models are assigned the optimal initial probabilities and the filters based on these models are initialized with the optimal initial conditions (estimates and error covariances). This VSIMM recursion is almost identical to one cycle of the IMM algorithm (compare Tables III and II). It is a natural extension of the IMM algorithm given a time-varying model set. It is extremely useful for VSMM estimation because of its cost-effectiveness, efficiency, and applicability: Other than the model sets $M_k$ and $M_{k-1}$ and the transition law among their models, it requires exactly the same thing as the IMM algorithm does. This recursion is used in most VSMM algorithms developed so far. Another nice feature of the VSIMM recursion is that, as shown in [193], it uses the transition probabilities $\pi_{ij}$ with respect to the total set $M$, rather than $M_k$ or $M_{k-1}$. Were this not true, each possible model set would require a distinct design of the corresponding set of transition probabilities.

Fusion of two MM estimates. A question important for MM estimation is the following: Given two separate MM estimates based on two model sets, respectively, how can the estimate based on all models in these two sets be obtained? For example, a model-set adaptation algorithm may decide to add a set of models $M_2$ to the current model set $M_1$ after the estimates based on model set $M_1$ have been obtained. The solution to this problem is the following optimal fusion rule, presented in [193], [195]. Consider two optimal MM estimators based on a common model-set history $M^{k-1}$ but two distinct model sets $M_1$ and $M_2$ at time $k$, respectively:

$$\big\{\hat{x}_{k|k}^{(i)}, P_{k|k}^{(i)}, L_k^{(i)}, \mu_{k|k-1}^{(i)}\big\}_{m^{(i)} \in M_1}, \quad \big\{\hat{x}_{k|k}^{(i)}, P_{k|k}^{(i)}, L_k^{(i)}, \mu_{k|k-1}^{(i)}\big\}_{m^{(i)} \in M_2}$$

where $L_k^{(i)}$ and $\mu_{k|k-1}^{(i)}$ are the model likelihood and predicted model probability, respectively, of model $m^{(i)}$ in set $M_1$ or $M_2$. It was shown in [193] that the optimal MM estimator based on model set $M = M_1 \cup M_2$ at time $k$ and a common history $M^{k-1}$ is given by

$$\hat{x}_{k|k} = \sum_{m^{(i)} \in M} \hat{x}_{k|k}^{(i)} \mu_k^{(i)}, \quad P_{k|k} = \sum_{m^{(i)} \in M} \Big[P_{k|k}^{(i)} + (\hat{x}_{k|k}^{(i)} - \hat{x}_{k|k})(\hat{x}_{k|k}^{(i)} - \hat{x}_{k|k})'\Big] \mu_k^{(i)}$$

where

$$\mu_k^{(i)} \triangleq P\{m_k^{(i)} | M, M^{k-1}, z^k\} = \frac{L_k^{(i)} \mu_{k|k-1}^{(i)}}{\sum_{m^{(l)} \in M} L_k^{(l)} \mu_{k|k-1}^{(l)}}$$

Note that for a common model $m^{(i)}$ of the two sets, its $\hat{x}_{k|k}^{(i)}$, $P_{k|k}^{(i)}$, $L_k^{(i)}$, $\mu_{k|k-1}^{(i)}$ are identical for the two MM estimators (i.e., they do not depend on which model set is used at $k$). If $M_1$ and $M_2$ have a common model $m^{(j)}$, then $\mu_k^{(i)}$ above can be obtained from all model probabilities $\mu_k^{(i|M_1)} \triangleq P\{m_k^{(i)} | s_k \in M_1, z^k\}$ and $\mu_k^{(i|M_2)} \triangleq P\{m_k^{(i)} | s_k \in M_2, z^k\}$ of the two MM estimators directly without knowledge of $L_k^{(i)}$ and $\mu_{k|k-1}^{(i)}$:

$$\mu_k^{(i)} = \frac{1}{\alpha} \mu_k^{(i|M_1)}, \; \forall m^{(i)} \in M_1, \qquad \mu_k^{(i)} = \frac{\mu_k^{(j|M_1)}}{\alpha \mu_k^{(j|M_2)}} \mu_k^{(i|M_2)}, \; \forall m^{(i)} \in M_2$$

where

$$\alpha = \sum_{m^{(i)} \in M_1} \mu_k^{(i|M_1)} + \frac{\mu_k^{(j|M_1)}}{\mu_k^{(j|M_2)}} \sum_{m^{(i)} \in (M_2 - M_1)} \mu_k^{(i|M_2)}$$

Most VSMM algorithms developed so far, including [221], [217], [213], [195], [207], [210], [341], use this optimal fusion rule.
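The probability part of the fusion rule for two model sets with a common model is easy to check numerically. The sketch below fuses the mode probabilities of two MM estimators using only their own outputs and one shared model, following the expressions for $\mu_k^{(i)}$ and $\alpha$ above; the function name and the dictionary representation are hypothetical.

```python
def fuse_mode_probabilities(mu1, mu2, common):
    """Fuse mode probabilities of two MM estimators sharing model `common`.

    mu1: {model: probability} from the estimator using M1 (sums to 1)
    mu2: {model: probability} from the estimator using M2 (sums to 1)
    Returns {model: probability} over the union M1 | M2.
    """
    ratio = mu1[common] / mu2[common]
    alpha = sum(mu1.values()) + ratio * sum(
        p for m, p in mu2.items() if m not in mu1)
    fused = {m: p / alpha for m, p in mu1.items()}
    for m, p in mu2.items():
        if m not in fused:
            fused[m] = ratio * p / alpha
    return fused


# Unnormalized weights L * mu_pred of 0.4, 0.2, 0.1 for models 1, 2, 3 give
# the following probabilities within M1 = {1, 2} and M2 = {2, 3}.
mu_m1 = {1: 2.0 / 3.0, 2: 1.0 / 3.0}
mu_m2 = {2: 2.0 / 3.0, 3: 1.0 / 3.0}
fused = fuse_mode_probabilities(mu_m1, mu_m2, common=2)
```

The fused probabilities are 4/7, 2/7, and 1/7, which is what normalizing the weights 0.4, 0.2, 0.1 over the union would give directly; the common model acts as the reference that fixes the relative scale of the two estimators.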

D. VSMM Algorithms

MM estimation is believed to eventually develop into a “kit of tools,” represented by various variable- and fixed-structure algorithms. Development of good model-set adaptation algorithms is perhaps the most important task in VSMM estimation. As stated before, model-set adaptation consists of model-set candidation and model-set decision given candidate sets. Fairly general and satisfactory results have been obtained for model-set decision, but no such results are available or even in sight for model-set candidation, which, as a result, becomes the main task for each individual model-set adaptation algorithm. In other words, different VSMM algorithms may have the same procedure to select the best set from the candidate sets but they differ from one another primarily in model-set candidation, namely, how the candidate sets are determined.

An adaptive structure is a variable structure in which the structure varies via adaptation in real time. Many adaptive structures are possible. They can be classified into two broad families, active model-set and model-set generation [195], depending on whether the total model-set (i.e., the set of all possible models) can be specified in advance or not.

Active model-sets. In the active model-set family [195], the total model-set is finite and can be determined in advance before any measurement is received. At any given time it uses an active or working subset of the total model-set determined adaptively, hence the name. Its underlying idea is somewhat similar to that of the active-set method for constrained optimization problems: At each time some models may be terminated and others may be activated.

As outlined in [198], [201], [192], model-set switching is one of the simplest classes of active model-set structures, in which the active set is determined by switching among a number of predetermined subsets of the total model-set. These subsets are the candidate sets for model-set adaptation. The switching can be soft as well as hard, similar to soft and hard decision for output processing. The soft switching assumes that each predetermined subset at any time has a certain probability of having a member model matching the true mode [221]. A hard switching is one based on a set of “hard” rules. The key tasks with this class of structures are the design of the model subsets, determination of the candidate subsets, and the decision procedure for switching. Such a structure, called the model-group switching (MGS) algorithm, was presented in [221], [216] with a comprehensive design example given in [217], [215], where each “group” represents a certain cluster of closely related system modes, hence the name. This MGS algorithm uses a two-stage hard switching: In addition to the current group, a candidate group is activated first if deemed appropriate by a hard rule; the union of the groups is then run, with the help of the optimal fusion rule of two MM estimates in Sec. VI-C, until a decision is made between the two groups by the sequential tests of Sec. VI-B. It runs only one group most of the time and thus provides a substantial saving in computation over a fixed-structure (FS) algorithm using the total model-set, as demonstrated in [217], [215], [195]. Different groups may have common models, which facilitate group switching and initialization of newly activated filters. Several designs of the model subset switching (see, e.g., [223], [149], [150], [217], [215], [170]) have been reported for maneuvering target tracking, along with illustrations of their superior cost effectiveness to the FS-IMM algorithm.

Another simple class of active model-set structures is called the likely-model set (LMS) structure, outlined in [198], [201], [192]. Simply put, its active set is formed by deleting the models in the total set that are unlikely to match the true mode at the given time. In order to follow the true mode that may jump, it must have a mechanism of expanding the active model set, i.e., determination of the candidate sets. There are various possible ways of expansion. One of the most natural ideas is based on the concept of the state dependency of the mode set (see Sec. VI-A): given the current mode, the set of possible modes at the next time is a subset of the mode space, determined by the mode transition law (i.e., the adjacency relations of the modes). A simple implementation is the following, resulting in the so-called likely-model set algorithm [213], [195]. Identify each model in the model set $M_{k-1}$ in effect at time $k-1$ to be unlikely (e.g., if its probability is below a threshold $t_1$), principal (if its probability exceeds $t_2$), or significant (if its probability is between $t_1$ and $t_2$). Then, the model set $M_k$ can be obtained as follows: (a) discard the unlikely ones; (b) keep the significant ones; and (c) activate the models to which the principal ones may switch directly. Clearly, the model activation relies on the graph-theoretic representation of the model set [198], [201], [195], as briefly mentioned in Sec. VI-A. The unlikely models in $M_{k-1}$ are those whose ratios of probability to the largest one are below a certain threshold and can be eliminated following the sequential ranking test of Sec. VI-B. Alternatively, a simpler but less accurate way is to delete all the models in $M_{k-1}$ except the $B$ models of the largest probabilities, where $B$ is a constant, determined from computational considerations. As demonstrated in [213], [195] for a maneuvering target tracking example, this LMS algorithm is somewhat more cost effective than the MGS algorithm of [221] and substantially outperforms the FS-IMM algorithm. A simplified version of the LMS idea was proposed in [341], where the model set used at any time is the state-dependent set of models that can be switched from a principal model, called minimal sub-model set in [341], including the principal model itself; the significant models are not necessarily kept and the unlikely models are not necessarily deleted. The principal model is identified as the one with the largest probability and likelihood or the one that is closest to the estimated true mode at the time. Adaptation of the model set then amounts to switching among the state-dependent model sets, determined by the sequential test of Sec. VI-B. It is substantially more cost effective than the fixed-structure IMM algorithm, as demonstrated in [341].
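The three-way classification and the adaptation rules (a)–(c) of the likely-model set algorithm can be sketched as follows. Two points are assumptions of this sketch rather than statements of the text: principal models are themselves retained, and a model activated through adjacency stays in the set even if it was classified as unlikely. The function name, thresholds, and adjacency table are hypothetical.

```python
def lms_adapt(mode_probs, successors, t1, t2):
    """One model-set adaptation step of a likely-model set scheme.

    mode_probs: {model: probability} for the models in M_{k-1}
    successors: {model: set of models it may switch to directly}
                (the directed edges of the mode digraph)
    t1, t2:     thresholds with t1 < t2
    Returns the model set M_k.
    """
    unlikely = {m for m, p in mode_probs.items() if p < t1}
    principal = {m for m, p in mode_probs.items() if p > t2}
    # (a) discard unlikely models; (b) keep significant ones. Principal
    # models are kept as well (assumption of this sketch).
    kept = set(mode_probs) - unlikely
    # (c) activate the models adjacent to the principal ones.
    activated = set()
    for m in principal:
        activated |= successors.get(m, set())
    return kept | activated


new_set = lms_adapt(
    mode_probs={1: 0.70, 2: 0.25, 3: 0.04, 4: 0.01},
    successors={1: {1, 2, 5}, 2: {2, 3}, 3: {3}, 4: {4}},
    t1=0.05,
    t2=0.5,
)
```

In this example models 3 and 4 are dropped as unlikely, model 2 is kept as significant, and model 5 is activated because the principal model 1 may switch to it, giving the new set {1, 2, 5}.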

Still another simple class consists of those with a hierarchical architecture. The active set in this hierarchical model-set structure consists of hierarchical levels of models [73], [292]. The makeup (i.e., model subset) of a lower level as a candidate set is determined under the guidance of the higher level(s). An MM estimator typically operates at each level, but interactions among levels are generally beneficial [292]. If some models that form one or more levels are generated (instead of activated) in real time, the corresponding hierarchical structure may be deemed to belong to the model-set generation family. Not all hierarchical MM algorithms have an adaptive structure. For example, those proposed in [118], [343], [85], [202] are hierarchical MM algorithms of a fixed structure since the model set used is time-invariant, albeit of a hierarchical structure.

Model-set generation. In the model-set generation family [195], new models are generated in real time and thus it is impossible to specify the total model set in advance.

A natural idea for model-set generation is to augment the working set $\mathcal{M}_k$ of models by one (or more) that matches an estimate $\hat m_k$ of the true mode at time $k$, leading to the so-called estimated-mode augmentation. The augmented model $\hat m_k$ can in principle be an estimate of the mode under any optimality criterion, such as the expected mode (conditional mean) $\hat m^{\rm MMSE}_{k|k} = E[s_k | s_k \in \mathcal{M}_k, z^k] = \sum_{m^{(i)} \in \mathcal{M}_k} m^{(i)} P\{s_k = m^{(i)} | s_k \in \mathcal{M}_k, z^k\}$, resulting in the expected-mode augmentation [207], [210]; the estimate $\hat m^{\rm ML}_{k|k} = \arg\max_m f(z^k | s_k = m)$, which is the model with the largest likelihood, resulting in the maximum-likelihood model augmentation [292]; and the maximum a posteriori estimate $\hat m^{\rm MAP}_k = \arg\max_m P\{s_k = m | z^k\}$, which is the model with the largest posterior probability. A promising alternative is to augment the model set also by the predicted modes, such as $\hat m^{\rm MMSE}_{k+1|k} = E[s_{k+1} | s_k \in (\mathcal{M}_k \cup \hat m_{k|k}), z^k]$, $\hat m^{\rm ML}_{k+1|k} = \arg\max_m f(z^k | s_{k+1} = m)$, and $\hat m^{\rm MAP}_{k+1|k} = \arg\max_m P\{s_{k+1} = m | z^k\}$, to anticipate the next mode transition, leading to what can be called predicted-mode augmentation. As shown theoretically in [207], [210], such an augmentation improves the accuracy of MM estimation, which is supported by the demonstrations given in [186], [207], [210], [292] for a variety of applications. Since the estimated mode is added constantly to the working set $\mathcal{M}_k$ of models, the optimal fusion rule of two MM estimators of Sec. VI-C is instrumental here. More generally, the model set $\mathcal{M}_k$ can be augmented by two or more models using average modes over a (likely) subset of $\mathcal{M}_k$ [207], [210] or models with the largest likelihoods or probabilities. Augmentations based on MMSE, ML, and MAP estimates have distinct characteristics. While the expected mode is in the convex hull formed by the models in $\mathcal{M}_k$, this is not necessarily so for the MAP and ML mode estimates. MMSE-based augmentation is limited to the case where the models $m^{(i)}$ are in the same vector space (and thus their sum is meaningful) and depends on the current model set $\mathcal{M}_k$ used, but it clearly allows an iteration $\hat m^{[j]}_k = E[s_k | s_k \in \mathcal{M}^{[j-1]}_k, z^k]$, $\mathcal{M}^{[j]}_k = \mathcal{M}^{[j-1]}_k \cup \hat m^{[j]}_{k|k}$ to improve the mode estimation. This is not the case for ML and MAP estimates. Note that the mode estimates need not be obtained by filters based on models in $\mathcal{M}_k$. For example, an adaptive IMM algorithm for radar tracking of a maneuvering target was proposed in [186] that uses an acceleration model determined by a separate Kalman filter on top of a fixed set of models (i.e., augmented by this model).
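As a small illustration of expected-mode augmentation for scalar modes, the sketch below computes the probability-weighted average of the models in the working set and appends it as a new model. The function names, the acceleration values, and the probabilities are hypothetical; the mode probabilities are assumed to come from an MM estimator that is not shown, and the conditional filtering and fusion steps are omitted.

```python
def expected_mode(models, probs):
    """Probability-weighted mean of scalar models (the expected mode)."""
    total = sum(probs)  # normalize in case the probabilities do not sum to 1
    return sum(m * p for m, p in zip(models, probs)) / total


def augment_with_expected_mode(models, probs, tol=1e-9):
    """Return (augmented model set, expected mode).

    The expected mode is appended only if it does not coincide
    (within tol) with a model already in the set.
    """
    m_hat = expected_mode(models, probs)
    augmented = list(models)
    if all(abs(m_hat - m) > tol for m in models):
        augmented.append(m_hat)
    return augmented, m_hat


# Hypothetical acceleration models (m/s^2) and their posterior probabilities.
models = [-20.0, 0.0, 20.0]
probs = [0.1, 0.3, 0.6]
augmented, m_hat = augment_with_expected_mode(models, probs)
print(augmented, m_hat)
```

Because the weights are nonnegative and sum to one, the expected mode always lies between the smallest and largest models of the set, in line with the convex-hull property noted above.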

Another natural and simple class of adaptive structure is the so-called adaptive grid structure, where a model is represented by a grid point. As outlined in [198], [201], [192], it quantizes the mode space unevenly and adaptively. It usually starts from a coarse grid and adjusts the grid in real time based on data as well as prior information. The grid adjustment usually includes a local grid refinement over one or more highly likely subsets of the mode space. The possible locally refined grids form the candidate model sets here, although they are not given explicitly. This structure belongs to the model-set generation family unless all models in all the grid levels can be determined in advance. The problem here is closely related to model-set design, where theoretical results of Sec. VII-A provide guidelines. Many practical schemes for adaptation of the grid are possible. The algorithms/designs of [185], [120], [241], [268], [105], [150], [339], [301], [114], [284] are examples of this structure, where illustrations of their superior cost effectiveness to the FS-IMM algorithm were also given.

More specifically, adaptation of an initial coarse grid to a subsequent fine grid was proposed in [120] for an AMM algorithm combined with the PDA filter. Also for an AMM algorithm, [241] presented a filter bank that moves over a predefined fixed grid according to a decision logic, including five versions based on measurement residuals, expected mode, variation in mode estimates, mode probabilities, and error covariance, respectively. This moving-bank method was adopted in several applications [126], [296], [338]. [198], [201] suggested employing the expected mode $\hat m^{\rm MMSE}_k$ as the center of an adaptive grid for an example of nonstationary noise identification. A set of target acceleration models was proposed in [267], [268] where the acceleration of the center model is determined by an additional Kalman filter in a two-stage filtering setting (see Sec. 6.3 of [208]). It appears that the performance can be enhanced if the (conditional and/or fused) estimates from the MM estimator are utilized in the two-stage filter as well. [89] proposed replacing the above two-stage filter with a fuzzy Kalman filter characterized by a fuzzified process noise covariance. [105] proposed a moving set of CT models centered around one with a turn rate determined by the magnitude of the acceleration divided by the speed of the target. In [149], [150], a set $\mathcal{M}_k$ of three CT models (left, center, and right) was made adaptive by online adjustment of their assumed turn rates $\omega^L_k$, $\omega^C_k$, and $\omega^R_k$, centered around $\omega^C_k$, based on their posterior mode probabilities $\mu^L_k$, $\mu^C_k$, $\mu^R_k$, with the expected turn rate taken to be $\omega^C_{k+1}$; that is, $\omega^C_{k+1} := \hat\omega_{k|k} = E[\omega_k | z^k, \mathcal{M}_k] = \omega^L_k \mu^L_k + \omega^C_k \mu^C_k + \omega^R_k \mu^R_k$. As pointed out in [207], [210], an alternative is to use $\omega^C_{k+1} := \hat\omega_{k+1|k} = E[\omega_{k+1} | z^k, \mathcal{M}_k] = \omega^L_k \mu^L_{k+1|k} + \omega^C_k \mu^C_{k+1|k} + \omega^R_k \mu^R_{k+1|k}$, where $\mu_{k+1|k}$ are predicted mode probabilities. As presented in [284], choosing $\omega^C_{k+1} := \hat\omega_{k|k}$ or $\hat\omega_{k+1|k}$ and the corresponding $\omega^L_{k+1}$ and $\omega^R_{k+1}$ by a marginal model-set likelihood ratio test yields improved performance. An enhancement of the motion and sizing of the moving bank was proposed in [339] based on a probabilistic discretization of the mode space locally centered around $\hat m^{\rm MMSE}_k$, using the probability of the normalized measurement residual squared $z^{(i)\prime} (S^{(i)})^{-1} z^{(i)}$ of each elemental filter as a measure of the model-mode mismatch. The “filter spawning” technique proposed in [114] for fault detection and estimation first detects a mode change (by a MAP test) and decides on the direction of the new mode; locally refined models (grid points) are then spawned along that direction with the help of the current mode estimate. Performance superiority of these adaptive-grid structures to the corresponding fixed structures was demonstrated in all the publications above. Two perturbation-model-based adaptive grid schemes were proposed in [302], [299], [300], [301] for ship and aircraft tracking. The first scheme uses a fixed grid, but each elemental filter includes the deviation (perturbation) of its assumed mode value from the true mode as a state variable and estimates it, resulting in a de facto adaptive-grid scheme. The second scheme differs from the first one in that the fixed grid is replaced by an adaptive grid, where each grid point is updated by the corresponding estimated perturbation at each recursion. To avoid the masking of the model differences by their corresponding elemental filters, it was proposed in [230] to adjust assumed model parameters in real time to keep the inter-residual distance measure $\varepsilon_{ij}' S_{ij} \varepsilon_{ij}$ below a threshold, where $\varepsilon_{ij} = z^{(i)} - z^{(j)}$ and $S_{ij} > 0$ is some weight matrix. [271] reported simulation results of automatically adjusting model parameters by an if-then rule for process noise covariance and heuristic estimation of turn rate based on kinematics. [185] proposed a simplex-directed mathematical programming scheme for an AMM algorithm, where the grid is formed by the vertices of a simplex and updated by certain rules (e.g., replacement of worst vertices by their mirror images and scaling up or down of the simplex) based on mode (vertex) probabilities. Other programming schemes are of course possible here, as discussed next.
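The recentering of a three-model turn-rate grid on the expected turn rate can be sketched as follows. Only the weighted-average formula for the new center is taken from the text; the fixed half-width used to place the left and right models, the sign convention (left taken as the smaller turn rate), the transition matrix, and all names and numbers are assumptions made for illustration and are not the rules of [149], [150], [284].

```python
def predict_mode_probs(mus, tpm):
    """Predicted mode probabilities: mu_{k+1|k}[j] = sum_i mus[i] * tpm[i][j]."""
    n = len(mus)
    return [sum(mus[i] * tpm[i][j] for i in range(n)) for j in range(n)]


def recenter_turn_rates(omegas, weights, half_width):
    """New (left, center, right) turn rates for the next cycle.

    omegas     : current (left, center, right) turn rates, rad/s
    weights    : mode probabilities used for averaging (updated or predicted)
    half_width : assumed fixed spacing between center and side models
    """
    total = sum(weights)
    center = sum(w * p for w, p in zip(omegas, weights)) / total
    return center - half_width, center, center + half_width


# Hypothetical numbers.
omegas = (-0.10, 0.0, 0.10)
mus = (0.1, 0.2, 0.7)
tpm = [[0.90, 0.05, 0.05],
       [0.05, 0.90, 0.05],
       [0.05, 0.05, 0.90]]

grid_updated = recenter_turn_rates(omegas, mus, 0.05)      # center from mu_k
mus_pred = predict_mode_probs(mus, tpm)
grid_predicted = recenter_turn_rates(omegas, mus_pred, 0.05)  # center from mu_{k+1|k}
print(grid_updated, grid_predicted)
```

With these numbers the center moves toward the right-turn model in both variants; the predicted probabilities pull it back slightly because the transition matrix spreads probability over the other two models.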

Closely related to the adaptive-grid structures, still another class of variable structures relies on optimization techniques. Here, the problem is formulated as that of finding the optimal model set given data. Although applicable to model-set adaptation, it is actually more suitable for model-set design, to be discussed in Sec. VII-A. Such an algorithm was proposed in [162] based on the genetic algorithm (GA) [137]. This algorithm uses a population of $n$ strings (chromosomes) of real or binary codes, $\mathcal{M} = \{m^{(1)}, m^{(2)}, \ldots, m^{(n)}\}$, where each string $m^{(i)}$ (not necessarily an index) represents a possible model. Starting from a random sample uniformly distributed over the mode space as the initial population $\mathcal{M}_0$, it runs an AMM algorithm to obtain posterior model probabilities in each generation. The posterior model probability serves as the objective function, known as the fitness function in GA. The next generation is produced by the genetic operations of selection, crossover, and mutation. This process is repeated as desired. Selection is a process by which individual strings are tentatively selected as candidate parents of the next generation based on their fitness values. It is an implementation of the “survival of the fittest.” Crossover (or recombination) produces the new generation of strings with hopefully improved fitness by randomly selecting mating pairs from the tentative parent pool and crossing over these pairs (e.g., crossover of ABCDE and abcde to generate ABcde and abCDE). It guarantees the diversity and improvement of each generation and is generally considered the most important and representative GA operation. Mutation is the occasional (with a small probability) random alteration of (a single digit of) the value of a string. Other less popular/fundamental GA operations [124], such as inversion, were not used in [162]. Although numerous possibilities exist, no concrete information was provided therein as to how the operations of selection, crossover, and mutation were implemented, except that the so-called biased roulette wheel selection was hinted at, in which a string is selected at random with a probability proportional to its fitness. It was demonstrated in [162] via three simple examples that this GA-based algorithm converges to the true model in dozens of generations, each using only one measurement update and having a population of size 10 (i.e., 10 models), although a more typical size is 50 in most GA applications. The MAP estimate and the associated error covariance were chosen to reinitialize all elemental filters; however, how the prior probabilities of the models in the new generation are assigned is not clear. GA was also suggested in [72] to update the parameters of the entire model set in the context of “mixture of experts.” Other generally applicable optimization techniques (see, e.g., the survey of [175]), such as the popular simulated annealing and tabu search, are potentially applicable here as well. The simplex-directed scheme of [185] and the recursive quadratic programming of [72], [73] also belong to this class in some sense.
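Since [162] does not spell out how the genetic operations were implemented, the following is only a generic sketch of the three operations named above, written for character strings: fitness-proportional (biased roulette wheel) selection, single-point crossover, and per-character mutation. All function names and parameters are hypothetical, and the fitness values would in this context be posterior model probabilities supplied by an AMM estimator that is not shown.

```python
import random


def roulette_select(population, fitness, n, rng):
    """Draw n candidate parents with probability proportional to fitness."""
    return rng.choices(population, weights=fitness, k=n)


def single_point_crossover(a, b, cut):
    """Swap the tails of two equal-length strings after position cut."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(string, rate, alphabet, rng):
    """Replace each character by a random one with probability rate."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else ch
                   for ch in string)


rng = random.Random(0)  # fixed seed so that a run is repeatable
population = ["ABCDE", "abcde", "01234"]
fitness = [0.5, 0.4, 0.1]  # hypothetical model probabilities
parents = roulette_select(population, fitness, 2, rng)
children = single_point_crossover("ABCDE", "abcde", 2)
mutated = mutate(children[0], 0.1, "ABCDEabcde", rng)
print(parents, children, mutated)
```

Cutting after the second character reproduces the crossover pattern described in the text, in which two parents exchange their tails.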

It is possible to include one or more adaptive models or elemental filters, in addition to fixed models, within an adaptive structure. The above estimated-mode augmentation is an example. This makes sense intuitively since the fixed models can rapidly obtain a rough initial estimate for the adaptive models, which fine-tune themselves automatically to yield accurate estimates. This leads to a two-level adaptive structure, meaning that both the models (or their elemental filters) and the model sets are adaptive. It belongs to the model-set generation family in general.

These adaptive structures are easily implementable and substantially more cost-effective than the state-of-the-art second-generation algorithm. They are particularly suitable for different classes of problems and thus are complementary to each other. Their combinations are certainly possible and may be advantageous for certain problems.


A challenging application tackled very actively in recent years is the tracking of ground targets, in particular, in a road network. This is usually aided by reports of a ground moving target indicator (GMTI). This problem is characterized mainly by the presence of a large number of constraints on the target motion, depending on the target type as well as the terrain conditions, available in the form of topography information, such as road maps. The existence of these constraints requires the use of a model set that is too large for conventional FS-MM estimation algorithms. That is why the only application of the first two generations of MM algorithms known to us is that of an FS-IMM algorithm to tracking dim ground targets in heavy clutter without any road and terrain constraints by a ground-based infrared search and track (IRST) sensor, reported in [38]. The actual mode set for any given target and the transitions between modes are naturally time varying and state dependent, making the VSMM method ideally suited to this problem. To our knowledge, the VS-IMM solution is the only effective one to this problem, used by all contractors in the Affordable Moving Surface Target Engagement (AMSTE) study by the U.S. Defense Advanced Research Projects Agency. A formulation of the problem and a comprehensive VS-IMM solution were first published in [169], [170]. The specific design implemented the VS-IMM recursive algorithm (see Table III) with an individual model-set adaptation for each target based on its current and predicted state and the known topography of the surveillance region. The main situations that require addition or deletion of a model at each revisit time are on-road/off-road motion, motion in junctions/intersections, road entry/exit conditions, and road obscuration. Furthermore, [166] incorporated an additional “stopped-target” model into the total model set to cope with the possible “move-stop-move” evasion strategy of targets, since the GMTI is incapable of detecting slow-moving or stationary targets. Quite significant advantages of the VS-IMM algorithm over the FS-IMM algorithm were demonstrated by the simulations. Similar VS-IMM approaches and results were also presented in [305] and [272]. [79], [53] proposed combining a VS-IMM estimator with a joint belief-probability data association approach to track, identify, and group multiple moving targets. VS was used to capture the behavior of highly maneuverable targets through the move-stop-move cycle, incorporating features such as motion constraints on road networks and high-maneuver terrain. [9] incorporated particle filters into the framework of [170] to cope with the non-Gaussianity of the posterior densities when constraints are applied. The VS method was employed in [113] to handle another interesting ground target problem: monitoring the motion of aircraft and vehicles in an airport area based on surface movement radar (SMR) data. Such surveillance is an essential part of airport movement guidance and control systems. Map information was embedded by incorporating an elegant kinematic constraint technique [309] (e.g., by using the curvature of a taxiway) in the constant-turn model with polar velocity (see Sec. V.B.2 of [209]). Its comparison with the EKF and FS-IMM algorithm using synthesized and real data demonstrated again that VS is highly beneficial in terms of mode identification, accuracy, and computational savings.

VII. MM ALGORITHM DESIGN ISSUES

A. Model-Set Design

Model-set design and choice/decision are closely related but different. Model-set choice/decision deals with the problem of deciding which set is the best given a family of candidate model sets, discussed in Sec. VI-B. Model-set design does not have a given family of candidate model sets in general. It determines the model set to be used for a given problem. Clearly, model-set choice can be viewed as an integral part of model-set design. Model-set design is the most important issue in the application of MM estimation. The performance of an MM algorithm for a given problem depends largely on the set of models used, and the primary difficulty in the application of the MM method is the design of the model set. Numerous publications have appeared in which ad hoc designs were presented, as surveyed in Secs. V-E and VII-C.

There are two types of model-set design: offline and online. Offline design is for the total model set $\mathcal{M}$ used by the first two generations or the initial model set in VSMM estimation. In an FS-MM algorithm, the model set used cannot vary and is determined a priori by model-set design. In a VSMM algorithm, the model set in effect at any time is determined by model-set adaptation, discussed in Sec. VI-B, which may be viewed as an online (real-time) design process. This section focuses on offline model-set design.

General design methods. Offline model-set design was formulated mathematically in [219], [197] as a problem of approximating the true mode as a random variable $s$ with a certain distribution by a discrete random variable $m$ (random model) with a certain probability mass function (pmf). The range of the variable $m$ is the model set and the pmf is the initial model probabilities needed for MM estimation. Three general design methods were proposed in [219], [197] based on this formulation: minimum distribution-mismatch, minimum modal-distance, and moment matching.

The minimum distribution-mismatch design minimizes the mismatch (or distance) $\|F_s - F_m\| = \max_x |F_s(x) - F_m(x)|$ between the cumulative distribution functions (cdfs) $F_s(x)$ and $F_m(x)$ of the mode $s$ and model $m$. Given the number of models $M = |\mathcal{M}|$, it was shown in [219], [197] that this method yields in the scalar case the optimal model set $\mathcal{M} = \{m^{(1)}, \ldots, m^{(M)}\}$ such that $F_m(m^{(i)}) = \frac{2i-1}{2M} = F_s(x)|_{x=m^{(i)}}$, meaning that $m^{(i)}$ can be determined as follows: divide the range of the cdf $F_s(x)$ into $2M$ equal intervals; the value of $x$ such that $F_s(x) = \frac{2i-1}{2M}$ is then the optimal location of $m^{(i)}$. This optimal model set has a uniform pmf $p_m(m^{(i)}) = 1/M$. A procedure that uses a minimal number of models given any tolerance on the distribution mismatch was also given in [219], [197] for the case where $m$ is a vector.
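For a scalar mode with an invertible cdf, the rule above amounts to evaluating the inverse cdf at the probabilities $(2i-1)/(2M)$. The sketch below does this for a uniform and a Gaussian mode distribution; the function name and the two example distributions are illustrative assumptions, and `statistics.NormalDist` from the Python standard library supplies the Gaussian inverse cdf.

```python
from statistics import NormalDist


def min_mismatch_model_set(inv_cdf, M):
    """Place model i at the point where the mode cdf equals (2i - 1) / (2M)."""
    return [inv_cdf((2 * i - 1) / (2 * M)) for i in range(1, M + 1)]


# Mode uniformly distributed on [0, 1]: the inverse cdf is the identity.
uniform_models = min_mismatch_model_set(lambda p: p, 4)

# Mode distributed as a zero-mean Gaussian with standard deviation 10
# (hypothetical numbers, e.g., an acceleration in m/s^2).
gauss_models = min_mismatch_model_set(NormalDist(0.0, 10.0).inv_cdf, 3)

pmf = [1.0 / len(gauss_models)] * len(gauss_models)  # uniform model probabilities
print(uniform_models, gauss_models, pmf)
```

For the uniform case with four models, the models fall at the midpoints of four equal subintervals; for the Gaussian case the three models are symmetric about the mean, with the center model at the mean itself.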

The minimum modal-distance design minimizes the distance $\|s - m\|$ between $s$ and $m$ in the mode/model space, rather than in the space of distribution functions. The problem of (scalar or vector) quantization and data compression [121] is in a sense a special case of the model-set design problem in this formulation. Significant theoretical results for this design were presented in [219], [197], including the following conditions for a model set to be optimal and properties of an optimal model set. Assume that $\mathcal{S} = \{S_1, \ldots, S_N\}$ is a partition of the mode space where $S_i$ is covered by model $m^{(i)}$ exclusively in the sense $\{s \in S_i\} = \{m = m^{(i)}\}$. Then, the following conditions hold for optimality in the sense of minimizing the distance $\|s - m\|^2 = E[(s-m)'(s-m)]$ (and some more general metrics): (a) given any partition $\mathcal{S} = \{S_1, \ldots, S_N\}$ of the mode space, a model set $\mathcal{M} = \{m^{(1)}, \ldots, m^{(M)}\}$ is optimal if each model $m^{(i)}$ is a centroid (mean) of the corresponding partition member $S_i$: $m^{(i)} = \arg\min_m E[(s-m)'(s-m) | s \in S_i]$; (b) given any model set $\mathcal{M} = \{m^{(1)}, \ldots, m^{(M)}\}$, a partition is optimal if and only if points in any partition member $S_i$ are closer to $m^{(i)}$ than to any other $m^{(j)} \in \mathcal{M}$; that is, a point $s$ must be assigned to its nearest neighbor $m^{(i)}$ in $\mathcal{M}$. Simply put, the optimal model set is within the class in which models are located at the centroid (mean) of members of a nearest-neighbor partition of the mode space. This result suggests iterative procedures to find an optimal model set. For example, we may start with an initial partition of the mode space; find a candidate of the model set as the centroid of each partition member; use the nearest-neighbor rule to update the partition; and repeat this process. Alternatively, we may start with an initial model set; use the nearest-neighbor rule to obtain the corresponding partition; obtain an update of the model set as the centroid of each partition member; and repeat this process. This centroid model set that covers each $S_i$ by its centroid $m^{(i)} = E[s | s \in S_i]$ exclusively has several nice and intuitive properties. For example, the (random) model and mode have the same mean: $E[m] = E[s]$; the modeling error is orthogonal to the model: $E[m(s-m)'] = 0$; the cross power of the mode and model is equal to the power of the model: $E[m's] = E[s'm] = E[m'm]$. Many of these results were inspired by those in vector quantization and data compression.
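The second iterative procedure (start from a model set, partition by the nearest-neighbor rule, move each model to the centroid of its cell, repeat) can be sketched for scalar modes as follows. This is a generic Lloyd-type iteration over a finite sample of mode values standing in for the mode distribution; the function names, the sample, and the initial model set are hypothetical, and the code is not taken from [219], [197].

```python
def nearest(models, s):
    """Index of the model closest to the mode value s."""
    return min(range(len(models)), key=lambda i: abs(s - models[i]))


def centroid_model_set(samples, init_models, max_iter=100):
    """Alternate between conditions (a) and (b) until the partition is stable."""
    models = [float(m) for m in init_models]
    labels = [nearest(models, s) for s in samples]
    for _ in range(max_iter):
        # condition (a): each model moves to the centroid of its cell
        for i in range(len(models)):
            cell = [s for s, lab in zip(samples, labels) if lab == i]
            if cell:  # an empty cell keeps its previous model
                models[i] = sum(cell) / len(cell)
        # condition (b): nearest-neighbor partition for the updated models
        new_labels = [nearest(models, s) for s in samples]
        if new_labels == labels:
            break
        labels = new_labels
    return models, labels


samples = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]  # hypothetical mode values
models, labels = centroid_model_set(samples, [0.0, 12.0])

n = len(samples)
mean_mode = sum(samples) / n
mean_model = sum(models[lab] for lab in labels) / n
# sample version of E[m (s - m)], which should vanish for a centroid model set
cross = sum(models[lab] * (s - models[lab]) for s, lab in zip(samples, labels)) / n
print(models, mean_mode, mean_model, cross)
```

On this sample the iteration stops after one update, and the sample versions of the equal-mean and orthogonality properties stated above hold.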

The minimum distribution-mismatch and minimum modal-distance designs require knowledge of the distribution of the true mode, which is hard to come by in practice. The moment-matching design matches the moments of the model to those of the mode, which are much more easily available. Let $\bar s$ and $C_s$ be the mean and covariance matrix of $s$. It was shown in [219], [197] that we can always use a model set with as few as, but not fewer than, $\mathrm{rank}(C_s) + 1$ models to match $\bar s$ and $C_s$ exactly. A set of concrete moment-matching designs was presented in [219], [197], including those with a minimum number of models, those with symmetric pmfs, and those with equal inter-model distance (called diamond-set designs in [219], [197] because the model locations form a diamond geometrically). In each of these designs, the probability mass and the location of every model are determined. The simplest possible diamond-set design (with one at the center and six on the first layer) was implemented in [207] for an example of maneuvering target tracking using MM algorithms.

Examples of some of these design methods can be found in [219], [220].

As pointed out in Sec. VI-D, model-set design can be formulated as that of finding the optimal model set given data based on optimization techniques. Such an algorithm was proposed in [162] based on the genetic algorithm [137]. Similar to the algorithm described in Sec. VI-D, this algorithm uses a population of $N$ strings (vectors), where each string is $M$-dimensional, representing a set of $M$ models. Starting from $N$ random samples uniformly distributed over the mode space as the initial population, it runs $N$ AMM algorithms in parallel for $K$ measurement updates to obtain the probabilities of all models in each generation. The maximum model probability $P_{\mathcal{M}_j} = \max_i \{P\{s = m^{(i)} | s \in \mathcal{M}_j\}, i = 1, \ldots, M\}$ in each string $\mathcal{M}_j$ serves as the fitness of the string. The next generation is produced by the genetic operations of selection, crossover, and mutation, as described in Sec. VI-D. This process is repeated as desired. This algorithm was demonstrated in [162] to converge to the true model in 40 generations, each over a batch of $K = 50$ measurement updates. Note that this GA-based method is applicable only to the case with a known, fixed number of models. As pointed out in Sec. VI-D, other generally applicable optimization techniques, such as simulated annealing and tabu search, are potentially applicable here and some would allow a variable number of models. A key issue here is the choice of objective (fitness) function for optimization. Many objective functions are possible, particularly those discussed in [219], [204], [203], [220]. The use of model probability calculated within each model set as the fitness function in [162] does not appear desirable since the probability of a model is relative only to others within the set and thus is meaningful for comparison only within a model set but not across different sets. For example, $m^{(1)}$ in the set $\mathcal{M}_1 = \{m^{(1)}, m^{(2)}, m^{(4)}\}$ may have a larger probability than $m^{(3)}$ in the set $\mathcal{M}_2 = \{m^{(1)}, m^{(2)}, m^{(3)}\}$ even if $m^{(3)}$ is closer to the true model: $P\{m = m^{(1)} | m \in \mathcal{M}_1\} > P\{m = m^{(3)} | m \in \mathcal{M}_2\} > P\{m = m^{(1)} | m \in \mathcal{M}_2\}$. A simple way out is to calculate model probabilities over the union of the model sets, obtainable from the model likelihoods in the $N$ AMM estimators, rather than within each model set.

Guidelines for model-set design. Clearly, many criteria/measures for model-set design are possible and their choice is important. An array of such criteria and measures was proposed and discussed in [219], [204], [203] for different purposes of MM estimation. These include: $\|x - \hat x_{\mathcal{M}}\|$ for base-state estimation, where $\hat x_{\mathcal{M}}$ is an estimate of the base state $x$ using model set $\mathcal{M}$; $\|s - \hat s\|$ for mode estimation, where $\hat s$ is the estimate of the mode $s$; rates (probabilities) of correct, incorrect, and no identification for mode identification; $\|\xi - \hat\xi_{\mathcal{M}}\|$ for hybrid-state estimation, where $\hat\xi_{\mathcal{M}}$ is an estimate of the hybrid state $\xi$ using model set $\mathcal{M}$; and more general information-theoretic measures based on the Kullback-Leibler information and mutual information.

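One of the most natural and simplest measures is $\|x - \hat x_{\mathcal{M}}\|$ for base-state estimation, formally introduced in [191], [201] for model-set design. More theoretical results for model-set design are available based on this measure than on other measures. Let $\hat x_{\mathcal{M}} = E[x | z, \mathcal{S} = \mathcal{M}]$ be the optimal MM estimator assuming that the model set $\mathcal{M}$ is the optimal model set, where $z$ is the data and will be omitted below for simplicity. Given an arbitrary model set $\mathcal{M}$, let $\mathcal{D} = (\mathcal{M} - \mathcal{S}) \cup (\mathcal{S} - \mathcal{M})$ be its symmetric difference from the optimal set $\mathcal{S}$. Note first that it follows from (19) that $\|x - \hat x_{\mathcal{B}}\| \le \|x - \hat x_{\mathcal{A}}\|$ if and only if $\|\hat x_{\mathcal{S}} - \hat x_{\mathcal{B}}\| \le \|\hat x_{\mathcal{S}} - \hat x_{\mathcal{A}}\|$, where $\hat x_{\mathcal{S}}$ is the optimal MM estimator using the mode space $\mathcal{S}$ and $\|u - v\|^2 = E[(u-v)'(u-v)]$. With this definition of the norm, $\|x - \hat x_{\mathcal{B}}\|^2$ is actually the mean-square error (mse) of $\hat x_{\mathcal{B}}$. It was shown in [191], [201], [192] that $\|\hat x_{\mathcal{S}} - \hat x_{\mathcal{M}}\| = |1 - c| \, \|\hat x_{\mathcal{S}} - \hat x_{\mathcal{D}}\|$, where $c = \frac{P\{s = m^{(i)} | s \in \mathcal{M}\}}{P\{s = m^{(i)} | s \in \mathcal{S}\}}$ for any model $m^{(i)}$ common to $\mathcal{M}$ and $\mathcal{S}$, which implies that use of too many models is as bad as use of too few models. Moreover, consider the problem of adding an arbitrary model set $\mathcal{A}$ to another arbitrary model set $\mathcal{M}$ without overlap (i.e., $\mathcal{M} \cap \mathcal{A} = \emptyset$). Let $\mathcal{M}' = \mathcal{M} \cup \mathcal{A}$. It was shown in [191], [201], [192] that set $\mathcal{M}'$ is better than set $\mathcal{M}$ in the sense that $\hat x_{\mathcal{M}'}$ has a smaller mse than $\hat x_{\mathcal{M}}$ if and only if

$$r < \frac{\sqrt{b^2 \cos^2\theta + 1 - b^2} - b\cos\theta}{1 - b} \qquad (40)$$

where $b = \frac{P\{s = m^{(i)} | s \in \mathcal{M}'\}}{P\{s = m^{(i)} | s \in \mathcal{M}\}}$ for any model $m^{(i)}$ in $\mathcal{M}$, $r = \frac{\|\hat x_{\mathcal{S}} - \hat x_{\mathcal{A}}\|}{\|\hat x_{\mathcal{S}} - \hat x_{\mathcal{M}}\|}$, and $\cos\theta = \frac{(\hat x_{\mathcal{S}} - \hat x_{\mathcal{M}})'(\hat x_{\mathcal{S}} - \hat x_{\mathcal{A}})}{\|\hat x_{\mathcal{S}} - \hat x_{\mathcal{M}}\| \, \|\hat x_{\mathcal{S}} - \hat x_{\mathcal{A}}\|}$. Note that (40) describes a ball of radius $1/(1-b)$ centered at $(-\frac{b}{1-b}, 0, 0, \ldots, 0)$ if $\hat x_{\mathcal{M}}$ and $\hat x_{\mathcal{S}}$ are placed at $(1, 0, 0, \ldots, 0)$ and $(0, 0, 0, \ldots, 0)$, respectively.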
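The geometric reading of (40) can be checked numerically with the following sketch. It makes two simplifying assumptions that are not from the source: the estimates are treated as fixed vectors in a plane with the Euclidean norm standing in for the mean-square norm, and the estimate based on the enlarged set is taken to be the weighted combination of the two component estimates with weight $b$ on the original set (by total expectation, since $b$ equals the probability of the original set within the enlarged one). The function names are hypothetical.

```python
import math


def bound_40(b, cos_theta):
    """Right-hand side of (40), valid for 0 < b < 1."""
    return (math.sqrt(b * b * cos_theta ** 2 + 1.0 - b * b) - b * cos_theta) / (1.0 - b)


def improves_by_40(b, r, cos_theta):
    return r < bound_40(b, cos_theta)


def improves_directly(b, r, cos_theta):
    """Compare the distances to x_S (at the origin) before and after augmentation.

    x_M = (1, 0), x_A = r (cos, sin); the fused estimate is assumed to be
    b * x_M + (1 - b) * x_A.
    """
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    fx = b + (1.0 - b) * r * cos_theta
    fy = (1.0 - b) * r * sin_theta
    return math.hypot(fx, fy) < 1.0


def inside_ball(b, r, cos_theta):
    """Is x_A inside the ball of radius 1/(1-b) centered at (-b/(1-b), 0)?"""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    cx = -b / (1.0 - b)
    return math.hypot(r * cos_theta - cx, r * sin_theta) < 1.0 / (1.0 - b)


grid = [(b, r, c)
        for b in (0.2, 0.5, 0.8)
        for r in (0.3, 0.9, 1.7, 2.5, 4.1)
        for c in (-0.9, -0.3, 0.2, 0.7)]
agree = all(improves_by_40(*g) == improves_directly(*g) == inside_ball(*g)
            for g in grid)
print(agree)
```

On this grid of test points, chosen away from the boundary of the ball, the three tests give the same answer, which is consistent with the statement that (40) describes the ball.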

A simple example was given in [219], [220] that illustrates how this result, which requires knowledge of the optimal MM estimator $\hat x_{\mathcal{S}}$, can be used in practice. The above result holds even if $\mathcal{M}'$ is not a subset of the mode space $\mathcal{S}$. In the case with $\mathcal{M}' \subset \mathcal{S}$ (e.g., when $\mathcal{M}'$ is a set of discrete points of a continuous mode space $\mathcal{S}$ as a parameter space), as shown in [207], set $\mathcal{M}'$ is better than set $\mathcal{M}$ (i.e., adding $\mathcal{A}$ to $\mathcal{M}$ is better) in the sense that $\hat x_{\mathcal{M}'}$ has a smaller mse than $\hat x_{\mathcal{M}}$ if and only if the posterior probability of the model set $\mathcal{A}$ is below a threshold:

$$P\{s \in \mathcal{A} | z\} < \frac{2\|x - \hat x_{\mathcal{M}}\|^2}{\|x - \hat x_{\mathcal{M}}\|^2 + \|x - \hat x_{\mathcal{A}}\|^2}$$

This condition always holds if $\|x - \hat x_{\mathcal{A}}\| < \|x - \hat x_{\mathcal{M}}\|$, which should be the case if $\hat x_{\mathcal{A}} =: \hat x_{\hat s} = E[x | z, s = \hat s]$ (i.e., augment $\mathcal{M}$ by $\mathcal{A} = \{\hat s\}$). This result provides theoretical support for the estimated-mode augmentation (see Sec. VI-D) for VSMM estimation, as presented in [207], [210].

In order to apply the MM method to problems with uncertain parameters over a space $\mathcal{S}$, two important questions are: (a) which quantity is best selected as the estimatee (i.e., the quantity to be estimated) and (b) how to quantize the parameter space $\mathcal{S}$. The following general guideline was presented in [194] for estimatee selection: if the ultimate goal is to estimate a parameter $s$, which is related to another parameter $p$ nonlinearly, then a model set $\{s_1, \ldots, s_M\}$ in the space of $s$ is superior to a model set $\{p_1, \ldots, p_M\}$ in the space of $p$ even if $p$ has a better physical interpretation. For the second question, a procedure to determine the choice of the quantization points $\mathcal{M} = \{m^{(1)}, \ldots, m^{(M)}\}$ was presented in [306], given the number of quantization points $M$. It minimizes $J(\mathcal{M}) = \int_{\mathcal{S}} \|x - \hat x_{\mathcal{M}}\|^2_W f(s)\,ds$, where $W$ is a weighting matrix specified by the designer and the pdf is $f(s) = 1/\int_{\mathcal{S}} ds$. The resultant choice is optimal in the sense of having the minimum average weighted mse for the true mode over the space $\mathcal{S}$. In the Gaussian case, this vector minimization problem can be solved numerically in a straightforward fashion. An example was given in [306] that demonstrates its superiority to several heuristic choices, including the simple, popular uniform quantization scheme.

Caution must be exercised in model-set design. For example, there should be enough separation between models so that they are “distinguishable,” “observable,” or “identifiable.” This separation should exhibit itself well in the measurement residuals [145], especially between the filters based on the true mode and mismatched models. Otherwise, the MM estimator will not be selective in terms of choosing the correct model because the residuals have a dominant effect on estimation results. A necessary condition for the effective performance of MM estimation was presented in [70] for a stochastic linear time-invariant system with an uncertain parameter. For a single-input single-output system with uncertain input bias, the dc gain $G_{dc}$ of the system transfer function from the input to the output (measurement) must be nonzero. This makes sense intuitively. The steady-state output (measurement residual) $z^{(i)}_{ss}$ is proportional to the dc gain times the bias difference (as the step input), which is the actual input bias $b$ minus the input bias $b^{(i)}$ assumed in filter $i$: $z^{(i)}_{ss} = (I - B) G_{dc} (b - b^{(i)})$, with $B$ depending on the steady-state Kalman filter gain for filter $i$, where $I$ is the identity matrix. For the case of uncertain system matrix parameters but known input $u$, $z^{(i)}_{ss} = (I - B)[G_{dc} - G^{(i)}_{dc}] u$, where $G^{(i)}_{dc}$ is the input gain for filter $i$. For a multiple-input multiple-output system, the necessary condition becomes that each column of the dc gain matrix (difference) must have at least one nonzero element. Other relevant results can be found in, e.g., [23]. Such results are beneficial for performance enhancement of MM estimation, such as those presented in [240].

Model efficacy. Each model has a certain effective coverage region of the true mode within the model set in use. Knowledge about such relative efficacy is quite useful in model-set design. A concept of relative efficacy of a model in terms of its coverage, along with its quantitative measures, was introduced in [219]. More specifically, a “window” function $w_i(x)$ was introduced to quantify the efficacy of model $m^{(i)}$ in covering the true mode $s = x$ relative to other models in the set, where $w_i(x)$ is a function of $x$. The larger $w_i(x)$ is, the more effective the model $m^{(i)}$ is (relative to other models in the set) given $s = x$. Two versions of $w_i(x)$ were defined in [219]. Consider a model set $M = \{m^{(1)}, \ldots, m^{(M)}\}$. A probability-based efficacy of model $m^{(i)}$ in $M$ is $w_i(x) = P\{m = m^{(i)} | s = x, m \in M\}$; that is, $w_i(x)$ is the probability that the random model $m$ will take on the value $m^{(i)}$ given that it has to take on a value in $M$ and the fact that the true mode $s$ is equal to $x$. Its effect on model probability is clear through

$$P\{m = m^{(i)} | m \in M\} = \int w_i(x) f_s(x)\,dx,$$

where $f_s(x)$ is the pdf of the true mode. Alternatively, the testing-based model efficacy, defined by $w_i(x) = P\{H_i \text{ not rejected} | s = x\}/L$, is the probability that $H_i$ is not rejected by an (optimal) test given $s = x$ divided by the number $L$ of hypotheses that are not rejected at the end of the test for the hypothesis testing problem

$$H_1: m = m^{(1)} \quad \text{vs.} \quad \cdots \quad \text{vs.} \quad H_M: m = m^{(M)}$$

using all available data. This definition is theoretically equivalent to, but easier to implement than, the definition $w_i(x) = P\{\text{accept } H_i | s = x\}$. A simple example of these two model efficacies can be found in [219].
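As an illustration of the integral relation above, the following minimal sketch evaluates $P\{m = m^{(i)} | m \in M\}$ numerically for a scalar mode. The Gaussian-shaped windows, their width, the three model values, and the Gaussian mode pdf are all hypothetical choices made for this example; they are not the window functions of [219]. The only property taken from the text is that the probability-based windows sum to one over the model set for each $x$.

```python
import math

def gaussian_pdf(x, mean, std):
    """Gaussian pdf, used here as an assumed pdf f_s(x) of the true mode."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def window_functions(x, model_values, width):
    """Hypothetical probability-based windows w_i(x).

    Models closer to x get more weight; the values are normalized so that
    sum_i w_i(x) = 1, as required of P{m = m(i) | s = x, m in M}.
    """
    raw = [math.exp(-0.5 * ((x - m) / width) ** 2) for m in model_values]
    total = sum(raw)
    return [r / total for r in raw]

def model_probabilities(model_values, width, mode_mean, mode_std, lo, hi, n=2001):
    """Trapezoidal approximation of the integral of w_i(x) f_s(x) over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    probs = [0.0] * len(model_values)
    for k in range(n):
        x = lo + k * step
        end_weight = 0.5 if k in (0, n - 1) else 1.0
        fs = gaussian_pdf(x, mode_mean, mode_std)
        for i, w in enumerate(window_functions(x, model_values, width)):
            probs[i] += end_weight * w * fs * step
    return probs

# Assumed example: three acceleration models and a zero-mean true mode.
models = [-20.0, 0.0, 20.0]
probs = model_probabilities(models, width=10.0, mode_mean=0.0, mode_std=15.0,
                            lo=-150.0, hi=150.0)
```

In this setup the central model receives the largest probability because the assumed mode pdf is concentrated around zero; widening the mode pdf shifts probability toward the outer models.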

B. Determination of Transition Probabilities

Theoretically, post-first-generation MM estimators assume that the transition probability matrix (TPM) governing the mode jumps is completely known. In target tracking, however, it is practically unknown, since it depends critically on the unknown control inputs, or worse, the mode sequence is not really Markov. The determination (design, tuning, adaptation) of the TPM amounts to identifying a Markov transition law that “best” fits the unknown truth, similar to that of the process noise covariance $Q$ in Kalman filtering. Fortunately, the performance of MM estimation is not very sensitive to the choice of the TPM provided it is not too far off; but to a certain degree this choice provides a trade-off between the peak estimation errors at the onset and termination of a maneuver and the steady-state errors during CV motion (see, e.g., [199], [21]).

Offline design. Traditionally, the TPM has been considered in tracking as a design parameter chosen a priori. Numerous designs and tuning results have been reported in the literature. Most of them are ad hoc, but some are more or less systematic, including those proposed and studied in [43], [18], [68], [39], [254], [62]. A simple design of the TPM $\Pi = (\pi_{ij})_{M \times M}$, which appeared as early as in [264], [257], [123] and has been used by many, is $\pi_{ii} = q$, $\pi_{ij} = \frac{1-q}{M-1}$ for $i \neq j$, $i, j = 1, \ldots, M$, directly in discrete time for some large $q$ (e.g., $q = 0.97$ [11]). Such a design directly in discrete time is questionable for a discretized system with a nonuniform sampling (revisit) interval since the TPM depends on the sampling intervals as well as target behavior (in continuous time). A more systematic design is based on modeling the Markov jump process in continuous time [43], [68], [39]. It follows from the forward and backward Kolmogorov equations [276] that $\Pi(T) = e^{\Lambda T}$, where $\Lambda = (\lambda_{ij})_{M \times M}$ is the transition density matrix of the process, defined similarly as for $\Pi$, with $\lambda_{ii} < 0$, $\lambda_{ij} > 0$, $i \neq j$, and $\sum_{j=1}^{M} \lambda_{ij} = 0$. The diagonal elements $\lambda_{ii}$ of $\Lambda$ and the mean sojourn time $\tau_i$ of mode $m^{(i)}$ are related by $\lambda_{ii} = -1/\tau_i$ since the sojourn time $\tau_i$ of a state (mode) $m^{(i)}$ of a Markov jump process has an exponential distribution with parameter $-\lambda_{ii}$. Its direct discrete-time counterpart is $\pi_{ii} = 1 - 1/\tau_i$, where $\tau_i$ is in terms of discrete time [18], [19]. From $\lambda_{ii} = -1/\tau_i$ and $\Pi(T) = e^{\Lambda T}$ it follows approximately that $\pi_{ii} = e^{-T/\tau_i}$, which is more widely used, such as in [60] for the design of an IMM-based ATC surveillance system and in [155] for TPM adaptation in a two-model IMM tracker with adaptive sampling for the first benchmark problem [48]. Another design, used in [161], is $\pi_{ii} = 1 - T/\tau_i$, which is in fact the above direct discrete-time design but with $\tau_i$ in continuous time and is equal to the linear approximation of the approximate continuous-time model $\pi_{ii} = e^{-T/\tau_i}$. This was modified to $\pi_{ii} = \max\{q_i, 1 - T/\tau_i\}$ in [199], [19], where $q_i$ is the minimum probability for mode $m^{(i)}$ to stay on, as opposed to jumping to any other mode. The choice of $q_i = \mu^{(i)}_\infty$ was suggested in [39], where $\mu^{(i)}_\infty$ is the steady-state probability of mode $m^{(i)}$, independent of the initial mode. As presented in [68], $\Pi$ can be designed as follows if $\mu^{(i)}_\infty$ is known for each mode. First determine (numerically) $\Lambda$ from the relationship $\lim_{T \to \infty}(e^{\Lambda T})' = \Pi(\infty)' = [\mu_\infty, \mu_\infty, \ldots, \mu_\infty]$, where $\mu_\infty = [\mu^{(1)}_\infty, \mu^{(2)}_\infty, \ldots, \mu^{(M)}_\infty]'$, and then compute $\Pi = \Pi(T) = e^{\Lambda T}$ for the given sampling interval $T$. This method was used in [68], [39] for different IMM configurations with a nonuniform $T$ for air defence system applications. Compared with those based on mean sojourn time $\tau_i$ above, this method has pros and cons: It obtains off-diagonal elements $\pi_{ij}$ as well as diagonal elements $\pi_{ii}$, but it relies on knowledge of $\mu^{(i)}_\infty$, which is often harder to come by than $\tau_i$, and on an asymptotic relationship, based on which results are usually less accurate. This reliance on the asymptotics can be replaced by $\Pi(T') = e^{\Lambda T'}$ if $\Pi(T')$ is known for some $T'$. Note that a simple use of $\pi_{ij}$ and $\pi_{ii}$ from the above two methods would usually violate the requirement $\sum_{j=1}^{M} \pi_{ij} = 1$. It would be better to combine them by solving (approximately, if needed) $\lim_{T \to \infty}(e^{\Lambda T})' = (\mu_\infty, \mu_\infty, \ldots, \mu_\infty)$, or $\Pi(T') = e^{\Lambda T'}$ if $\Pi(T')$ is known, for $\Lambda$ with $\lambda_{ii} = -1/\tau_i$, $\forall i$, if possible.
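The offline designs above can be compared numerically. The sketch below, in plain Python, builds the uniform design $\pi_{ii} = q$, evaluates the two sojourn-time formulas $e^{-T/\tau_i}$ and $\max\{q_i, 1 - T/\tau_i\}$, and computes $\Pi(T) = e^{\Lambda T}$ for a two-mode process. The function names, the mean sojourn times (20 s and 5 s), the sampling interval, and the truncated-series matrix exponential are assumptions made for illustration; in practice a library routine for the matrix exponential would normally be preferred.

```python
import math

def uniform_tpm(num_models, q):
    """Simple design: pi_ii = q and pi_ij = (1 - q)/(M - 1) for i != j."""
    off = (1.0 - q) / (num_models - 1)
    return [[q if i == j else off for j in range(num_models)]
            for i in range(num_models)]

def stay_probabilities(T, tau, q_min=0.0):
    """Return (exp(-T/tau), max(q_min, 1 - T/tau)) for one mode."""
    return math.exp(-T / tau), max(q_min, 1.0 - T / tau)

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=20, squarings=10):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    small = [[a / scale for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[v / order for v in row] for row in mat_mul(term, small)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def tpm_from_density(Lam, T):
    """Pi(T) = exp(Lam * T) for a transition density matrix Lam."""
    return mat_exp([[lam * T for lam in row] for row in Lam])

# Assumed two-mode example: lambda_ii = -1/tau_i, rows of Lam sum to zero.
T = 1.0
tau = [20.0, 5.0]
Lam = [[-1.0 / tau[0], 1.0 / tau[0]],
       [1.0 / tau[1], -1.0 / tau[1]]]
Pi = tpm_from_density(Lam, T)
exp_stay, lin_stay = stay_probabilities(T, tau[0])
```

For this example the linear form, the exponential form, and the exact diagonal element of $e^{\Lambda T}$ are close but not equal, which is consistent with the statement that $\pi_{ii} = e^{-T/\tau_i}$ follows only approximately.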

Online adaptation. The offline design, being completely a priori in nature, does not provide estimates of the TPM using online data. In some cases, prior information about the TPM may be inadequate or even lacking. The “unreasonable” need to provide the TPM a priori even in the case of insufficient information has been cited by some as one of the main reservations about using Markov-chain based MM estimation algorithms (see, e.g., [114]), which involve the TPM. A number of algorithms have been proposed recently in [151], [153], [152], [91] for online estimation of the TPM. These algorithms are naturally and easily incorporated into a typical MM (e.g., IMM) estimation algorithm, resulting in TPM-adaptive MM estimation. More specifically, [151], [153], [152] developed a Bayesian framework and proposed several suboptimal algorithms for recursive MMSE estimation of the TPM starting from an initial estimate $\Pi_0$. Among them, the most cost-effective one, the so-called quasi-Bayesian estimator, assumes a Dirichlet prior distribution of the TPM and its recursion is given by

$$g^{(ij)}_k = 1 + \frac{\mu^{(i)}_k \left[L^{(j)}_k - \pi^{(i)\prime}_{k-1} L_k\right]}{\mu'_k \Pi_{k-1} L_k}, \qquad \alpha^{(ij)}_k = \alpha^{(ij)}_{k-1} + \frac{\alpha^{(ij)}_{k-1}\, g^{(ij)}_k}{\sum_{j=1}^{M} \alpha^{(ij)}_{k-1}\, g^{(ij)}_k}, \qquad \pi^{(ij)}_k = \frac{1}{k+1}\,\alpha^{(ij)}_k$$

where $L_k = [L^{(1)}_k, \ldots, L^{(M)}_k]'$ and $\mu_k = [\mu^{(1)}_k, \ldots, \mu^{(M)}_k]'$ are vectors of model likelihoods and probabilities, respectively.

This algorithm was shown in [151], [153], [152] to have fairly reasonable performance at an almost negligible computational expense. Also adopting the Dirichlet prior, [91] derived recursive hybrid estimation schemes with an unknown TPM by obtaining posterior marginal pdfs of the base and modal states analytically. Note that these online adaptation algorithms are more generally applicable than the offline designs.
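One step of the quasi-Bayesian recursion can be sketched as follows. This is only a reading of the formulas as printed here, under three assumptions that are not stated in the text: $\pi^{(i)}_{k-1}$ is taken to be the $i$th row of $\Pi_{k-1}$, the recursion is initialized with $\alpha_0 = \Pi_0$, and $\Pi_{k-1}$ is recovered as $\alpha_{k-1}/k$. The function name and the numerical values are hypothetical; see [151], [153], [152] for the actual algorithm.

```python
def quasi_bayes_tpm_step(alpha_prev, mu, L, k):
    """One quasi-Bayesian TPM update at time k >= 1 (illustrative sketch).

    alpha_prev : M x M list of alpha^{(ij)}_{k-1}; each row is assumed to sum to k
    mu         : model probabilities mu^{(i)}_k
    L          : model likelihoods L^{(j)}_k
    Returns (alpha_k, Pi_k).
    """
    M = len(L)
    # Assumption: Pi_{k-1} = alpha_{k-1} / k, so that alpha_0 = Pi_0.
    Pi_prev = [[a / k for a in row] for row in alpha_prev]
    # pi^{(i)'}_{k-1} L_k for each row i, and the scalar mu' Pi_{k-1} L_k.
    row_lik = [sum(Pi_prev[i][j] * L[j] for j in range(M)) for i in range(M)]
    denom = sum(mu[i] * row_lik[i] for i in range(M))
    alpha = []
    for i in range(M):
        g = [1.0 + mu[i] * (L[j] - row_lik[i]) / denom for j in range(M)]
        weighted = [alpha_prev[i][j] * g[j] for j in range(M)]
        total = sum(weighted)
        alpha.append([alpha_prev[i][j] + weighted[j] / total for j in range(M)])
    Pi = [[a / (k + 1) for a in row] for row in alpha]
    return alpha, Pi

# Hypothetical two-model example starting from a noninformative Pi_0.
Pi0 = [[0.5, 0.5], [0.5, 0.5]]
alpha1, Pi1 = quasi_bayes_tpm_step(Pi0, mu=[0.5, 0.5], L=[0.1, 0.9], k=1)
```

Under these assumptions each row of $\alpha_k$ sums to $k + 1$, so the division by $k + 1$ keeps every row of the estimated TPM summing to one. In the example, the likelihood favors the second model, and the estimate moves probability toward transitions into that model.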

It is intuitively appealing to combine offline design with online adaptation to take advantage of both prior knowledge and online information: The a priori designed TPM is refined by the online TPM adaptation using online data from the current scenario; the adaptation may be slow if the prior TPM is (nearly) “noninformative” (e.g., uniformly distributed) but could be sped up by a good initial TPM.

C. Various MM Designs and Performance Studies

Successful application of any particular MM algorithm to a real-world maneuvering target tracking problem largely amounts to design of the model set, ad hoc adjustment of the algorithm, tuning of parameters, and choice of the best trade-off variant by performance evaluation and comparison. The tracking literature is abundant in various studies on model-set design, parameter choice/tuning, and performance evaluation/comparison. Many of them are generic enough to be applied to a wide range of problems and situations, although a “universal” approach best for all applications is impossible. Here we give a brief review of these more generic designs. More problem-specific implementations were addressed in Secs. IV-E, V-E, and VI-E.

Most MM algorithm designs follow two basic ideas concerning how maneuvers are modeled: parametric and structural. Parametric designs select one or more parameters to represent the effect of the target maneuvers; each model is characterized by a quantized value of the parameter(s). Structural designs use models of different structures to describe different maneuvers. All designs include one or more CV models. The reader is referred to [209] for model details.

Parametric designs. Typical parameters to be quantized are input (acceleration), process noise level, and turn rate. The most common examples are: the exponentially correlated acceleration (ECA) Singer model with constant mean levels [257], [123], [263]; the second-order linear kinematic model with multiple acceleration levels (see, e.g., [11]); and the second-order linear kinematic model with multiple process noise levels (see, e.g., [362], [293]). In particular, [257] designed a linear target motion model, with unknown input $u_k$ quantized into several known levels $u^{(i)}$. The corresponding GPB1 algorithm reduces to a single Kalman filter with input $\bar{u}_k$, which is a probabilistically weighted sum of the known input levels $u^{(i)}$ (see Sec. 5.3.8 of Part IV [208]). This technique has been used later by many others. It is also applicable in principle when the input level is over a continuum if the transition pdf between any pair $(u_{k-1}, u_k)$ is known. This, however, involves integration in general to obtain the average $\bar{u}_k$. Assuming the transition pdf between $(u_{k-1}, u_k)$ is Gaussian and $u_{k-1}$ is Gaussian distributed, the average $\bar{u}_k$ can be obtained in a closed form, and such a form was obtained in [187] for a constant-position model with $u$ as velocity bias. [11], [10] investigated a 2D design consisting of one CV and 12 CA models with acceleration values distributed symmetrically over a 2D region centered at zero and bounded by 40 m/s$^2$. The tracking performance was found to depend on $|\Delta a| T^2/\sigma$, where $|\Delta a|$ is the quantization step, $T$ the sampling time, and $\sigma^2$ the measurement noise variance. This design has later been used in many theoretical and comparative studies (see, e.g., [223], [217], [213], [207]). A shortcoming associated with quantization of the input acceleration is that many models/filters are needed to cover even a moderate range. A much smaller number of models are needed if instead the process noise level is quantized [362], [293].
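The probabilistically weighted input used in the GPB1 reduction of [257], and the ratio $|\Delta a| T^2/\sigma$ reported in [11], [10], are simple enough to sketch directly. The five acceleration levels, the model probabilities, and the values of $T$ and $\sigma$ below are hypothetical and are not the 13-model layout of [11]; the function names are likewise assumed for this example.

```python
def weighted_input(levels, probs):
    """Probabilistically weighted sum of quantized input levels.

    levels : list of equal-length tuples (known input levels, one per model)
    probs  : model probabilities; normalized here in case they do not sum to 1
    """
    total = sum(probs)
    dim = len(levels[0])
    return tuple(sum(p * u[d] for u, p in zip(levels, probs)) / total
                 for d in range(dim))

def quantization_ratio(delta_a, T, sigma):
    """The quantity |delta_a| * T**2 / sigma on which performance was found to depend."""
    return abs(delta_a) * T ** 2 / sigma

# Hypothetical 2D levels in m/s^2: one zero-input (CV) model and four CA models.
levels = [(0.0, 0.0), (20.0, 0.0), (-20.0, 0.0), (0.0, 20.0), (0.0, -20.0)]
probs = [0.6, 0.2, 0.05, 0.1, 0.05]
u_bar = weighted_input(levels, probs)
ratio = quantization_ratio(delta_a=20.0, T=2.0, sigma=100.0)
```

The averaged input is then fed to a single Kalman filter in place of the unknown $u_k$; covering a wider acceleration range at the same quantization step requires more levels, which is the shortcoming noted above.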


Structural designs. Structural and hybrid (structural/parametric) designs can be much more efficient than quantization only. Typically, CA and/or CT models are used here. The most common examples are: CV-CA [16], [18], CV-CT (see, e.g., [101], [21]), and CV-CA-CT [36]. More specifically, [16] presented two IMM configurations with two models (CV-CA) and three models (CV-2CA) (denoted as IMM2 and IMM3), respectively, and demonstrated that they beat the input-estimation method [64] significantly in accuracy and dramatically in computation. [140] suggested the inclusion of a CA model with higher process noise to cover the transitions between CV and CA motions and provide a faster response. [52] introduced an innovative model, exponentially increasing acceleration (EIA), particularly suitable for fast maneuver detection, and demonstrated that a CV-EIA-ECA (Singer model) IMM algorithm improved the accuracy achieved by the IMM2 during maneuvers. Explicit modeling of CT motion was proposed in [101], [102], where two model sets were designed, one CA-2CT (left/right turns with known turn rates) and the other CV-2DCT (with estimated turn rate). Using explicit CT models proved to be very beneficial for precision tracking during turns. Further significant enhancements were presented in the series of papers [346], [347], [348]. While utilizing the EIA for speedy maneuver responses, the proposed IMM configuration includes a 3D CT model, implemented via the kinematic constraint (KC) method [4], to provide constant speed prediction. The resulting sophisticated CV-EIA-3DCT/KC tracker outperformed considerably the CV-2DCT of [101] in tracking non-horizontal planar maneuvers [346]. These and many other comprehensive studies of various IMM designs, proposed by Blair and Watson [46], [51], [43], [44], [351], [349], preceded and led to the solution of the second benchmark problem proposed in [350]. Solutions to the benchmark problems were discussed in Sec. V-E and more discussions are given in a subsequent part.

Although using only three filters, the above designs are computationally involved since nonlinear and/or correlated models are used. If computational load is of great concern, a viable alternative is the interacting multiple bias model (IMBM) scheme [42], [345], [344] (see also [243]). The main idea is to model the maneuver acceleration as an isolated system bias and employ the two-stage (bias free + bias only) reduced-order estimator [117] rather than the complete Kalman filter for the augmented state (including the bias state). (See [208] for a survey and discussion of issues associated with the two-stage filtering.) Its two-model (CV-Bias) version, referred to as the interacting acceleration compensation (IAC) algorithm, was demonstrated in [345] to achieve about 50% reduction in computation as compared to the CV-CA configuration with similar performance if the data rate is high enough.

A comprehensive study reported in [68], [37], [39], [36] evaluated configurations with CA, CV, ECA-Singer models and several new models, including a horizontal CT model with polar velocity [122] and a 3D version with two additional states (velocity elevation angle and its rate) [66]. Comprehensive simulations over a great variety of scenarios showed that the horizontal CT model combined with decoupled altitude filtering performed slightly better than the complete 3D model overall. The former was included into the IMM-MHT solution to the second benchmark problem (see Sec. V-E). [32] includes an evaluation of an IMM design with normal and tangential accelerations within its proposed 2D curvilinear model (see [209]).

Hybrid designs. Parametric and structural designs can of course be integrated, leading to hybrid designs, which involve both models with different structures and quantized parameter(s) within structures. Examples of hybrid designs include: CV and CA models with multiple process noise levels (see, e.g., [18], [21]) and CV and CT models with multiple turn rates (see, e.g., [101], [199], [18]).

Additional references concerning design, performance evaluation/comparison, and/or other aspects related to IMM configurations for maneuvering target tracking include [67], [159], [88], [41], [165], [274], [135], [298], [143], [31], [144], [290].

VIII. NONSTATISTICAL METHODS

The multiple-model approach is a general methodology, not limited to the probabilistic/statistical setting of the previous sections. Many nonstatistical methods have been proposed (see, e.g., [269]). In this section, we discuss only those that have been proposed for maneuvering target tracking. They are based on alternative means of modeling and handling of the target motion uncertainty, including evidential reasoning [363], neural networks [72], [73], [6], fuzzy logic [244], [89], [369], genetic algorithms [29], [162], and deterministic algorithms [278], [5], [135].

Many of the pros and cons of the statistical methods of MM estimation stem from their reliance on the total probability/expectation theorem and Bayes’ rule. These theorems require partitioning of the probability space (i.e., one and only one member is true) even when information is not sufficient. In other words, they force us to overstate the degree of certainty when evidence or knowledge is actually incomplete or subjective, and thus the claim of probabilistic exactitude or optimality is actually more or less artificial. This is probably the main weakness of the probabilistic formulation. Its main strengths are that it is rigorous, systematic, and particularly suitable for sequential processing. Nonstatistical approaches offer a variety of alternatives with distinctive flexibilities that are valuable for handling many real-world problems. For example, Dempster-Shafer reasoning overcomes the partitioning limitation of the Bayesian methods by allowing representation of hypotheses of posterior evidence that are neither exclusive nor exhaustive. As such, in the context of MM estimation it may offer a framework potentially better suited to handling possible model-truth mismatch. Like other nonprobabilistic approaches, however, it does not provide a way of fusing old knowledge and new information as Bayes’ rule does in the probabilistic setting. As a result, not surprisingly, we are not aware of any effective nonstatistical methods for conditional filtering, which requires sequential processing (fusion) of old evidence and new data with perfect knowledge of the underlying model.


Nonstatistical methods can be applied to all the other components of MM estimation: model-set determination, cooperation strategy, and output processing. The use of genetic algorithms for model-set determination (adaptation and design) has been discussed in Secs. VI-D and VII-A. Application of nonstatistical methods to output processing is most natural (see, e.g., [72], [73], [244], [369]) since it amounts to fusion of evidence obtained at the same time. By the similarity between output processing and (hard and soft) decision based cooperation strategies, these methods can also be used for cooperation strategies, although we are unaware of such use in the literature.

[363] proposed an approach that integrates maneuver detection with two-model MM estimation based on Dempster-Shafer evidential reasoning (see, e.g., [304], [39]). Consider sequential testing of no-maneuver ($H_0$) against maneuver ($H_1$) hypotheses. The belief $\mathrm{Bel}(H_i)$ and plausibility $\mathrm{Pls}(H_i)$ of $H_i$ are obtained by an extension of the Dempster-Shafer theory. They can be interpreted as lower and upper probabilities, respectively, in that probability is an interval (not a single number) $P\{H_i\} = [\mathrm{Bel}(H_i), \mathrm{Pls}(H_i)]$.$^{22}$ The interval length $\mathrm{Pls}(H_i) - \mathrm{Bel}(H_i)$ reflects residual ignorance in our knowledge. If $\mathrm{Bel}(H_0) > \mathrm{Pls}(H_1)$, or equivalently (for the two-model case) $\mathrm{Bel}(H_0) > 0.5$, the target is deemed not maneuvering, its estimate is based on the nonmaneuver model alone, and a maneuver onset detector is turned on, which declares a maneuver onset if $\mathrm{Bel}(H_0, H_1) > \mathrm{Pls}(H_0, H_0)$. If $\mathrm{Bel}(H_1) > \mathrm{Pls}(H_0)$, or equivalently $\mathrm{Bel}(H_1) > 0.5$, the target is deemed maneuvering, its estimate is based on the maneuver model alone, and a maneuver termination detector is turned on, which declares a maneuver termination if $\mathrm{Bel}(H_1, H_0) > \mathrm{Pls}(H_1, H_1)$. In other cases (i.e., when the intervals $P\{H_0\}$ and $P\{H_1\}$ overlap), including after a maneuver onset or termination is declared, the target state estimate is the weighted sum of the estimates $\hat{x}^{(0)}$ and $\hat{x}^{(1)}$ from both models using their normalized belief values as weights:

$$\hat{x} = \frac{\hat{x}^{(0)}\,\mathrm{Bel}(H_0) + \hat{x}^{(1)}\,\mathrm{Bel}(H_1)}{\mathrm{Bel}(H_0) + \mathrm{Bel}(H_1)}$$

Given new measurements, the belief is updated by Dempster’s rule of combination. By random-set theory, however, this rule holds only if the bodies of evidence being combined are independent, which is actually not the case here. Earlier, this way of integrating hard decision and soft decision was proposed in [327] in a probabilistic setting.
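The hard/soft decision logic described above can be sketched as follows. The sketch covers only the comparison of belief and plausibility intervals and the belief-weighted combination; the onset and termination detectors and the Dempster update of the beliefs are omitted. The function names, the string labels, and the equal-weight fallback for zero total belief are assumptions made for illustration, not part of [363].

```python
def ds_mode_decision(bel0, pls0, bel1, pls1):
    """Classify the target from the intervals [Bel(H_i), Pls(H_i)], i = 0, 1."""
    if bel0 > pls1:
        return "nonmaneuver"   # interval of H0 lies entirely above that of H1
    if bel1 > pls0:
        return "maneuver"      # interval of H1 lies entirely above that of H0
    return "mixed"             # overlapping intervals

def ds_state_estimate(x0, x1, bel0, pls0, bel1, pls1):
    """Combine the two model-conditioned estimates x0 (H0) and x1 (H1)."""
    decision = ds_mode_decision(bel0, pls0, bel1, pls1)
    if decision == "nonmaneuver":
        return list(x0)
    if decision == "maneuver":
        return list(x1)
    total = bel0 + bel1
    if total == 0.0:
        # Assumed fallback (not from the source): equal weights under total ignorance.
        return [0.5 * (a + b) for a, b in zip(x0, x1)]
    return [(a * bel0 + b * bel1) / total for a, b in zip(x0, x1)]

# Hypothetical example with overlapping intervals [0.3, 0.5] and [0.5, 0.7].
x_hat = ds_state_estimate([0.0, 10.0], [8.0, 2.0],
                          bel0=0.3, pls0=0.5, bel1=0.5, pls1=0.7)
```

In the example neither interval dominates, so the output is the belief-weighted average of the two estimates, with normalized weights 0.375 and 0.625.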

[72], [73] considered both AMM and VSMM estimation by means of the so-called “mixture-of-experts” system, which has recently gained popularity in the neural network literature. Here each conditional filter (rather than neural network) is viewed as an expert. The overall estimate is a weighted sum (mixture) of the output of all experts. The weights are computed by the softmax operation

$$\mu^{(i)}_{k|k-1} = \frac{e^{z'_k a^{(i)}}}{\sum_j e^{z'_k a^{(j)}}},$$

which is a differentiable version of the “winner takes all” strategy, meaning that the filter (expert) with the best performance will have a weight close to unity. Here $z_k$ is the measurement and the internal weight vector $a^{(i)}$ is updated by a steepest descent search with a step size (learning rate) $\eta$: $a^{(i)} := a^{(i)} + \eta(\mu^{(i)}_k - \mu^{(i)}_{k|k-1})z_k$, where $\mu^{(i)}_k$ follows from $\mu^{(i)}_{k|k-1}$ and the likelihood by Bayes’ rule as in the IMM algorithm (see Table II). A hierarchical version was also explored in [73]. The simulation results presented seem to suggest that this estimator is slightly better than the probabilistic AMM algorithm in terms of response time to mode jumps as well as best filter selection when the truth is not in the model set. Its performance, however, is sensitive to the design parameter $\eta$ and its competitiveness relative to the IMM algorithm is not known.
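The gating computation just described (softmax prior weights, a Bayes update with the filter likelihoods, and a gradient step on the internal weight vectors) can be sketched in a few lines. The function names, the zero initialization, and the numerical values are assumptions for illustration; the conditional filters that would supply the likelihoods are not modeled here.

```python
import math

def softmax_weights(z, a):
    """Prior weights mu_{k|k-1}^{(i)} proportional to exp(z' a^{(i)})."""
    scores = [sum(zj * aj for zj, aj in zip(z, a_i)) for a_i in a]
    top = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bayes_update(prior, likelihoods):
    """Posterior weights mu_k^{(i)} from prior weights and filter likelihoods."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def gating_step(z, a, likelihoods, eta):
    """One step: a^{(i)} := a^{(i)} + eta * (mu_k^{(i)} - mu_{k|k-1}^{(i)}) * z."""
    prior = softmax_weights(z, a)
    post = bayes_update(prior, likelihoods)
    new_a = [[aij + eta * (post[i] - prior[i]) * zj for aij, zj in zip(a[i], z)]
             for i in range(len(a))]
    return prior, post, new_a

# Hypothetical two-expert example with a 2D measurement.
z = [1.0, 2.0]
a0 = [[0.0, 0.0], [0.0, 0.0]]
prior, post, a1 = gating_step(z, a0, likelihoods=[0.9, 0.1], eta=0.1)
```

After the step, the softmax evaluated at the same measurement favors the expert whose likelihood was higher, and the size of that shift is governed by $\eta$, which is consistent with the noted sensitivity to this design parameter.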

[244], [369] proposed to use fuzzy weights as an alternative to the probabilistic weights in the IMM algorithm. In [244], the space of true modes (e.g., acceleration $a$ [244] or turn rate $\omega$ [369]) is quantized to obtain a mode set $\{m^{(1)}, \ldots, m^{(M)}\}$. A corresponding set of (Gaussian or triangular) membership functions $\mu^{(i)}(s) \in [0, 1]$, centered at $m^{(i)}$, is designed as a measure of the “validity” of each model $m^{(i)}$. A mode estimate $\hat{s}_k = \hat{a}_{k|k}$ is obtained by an independent CA filter. The model weights are then computed as $\mu^{(i)}_k = \mu^{(i)}(\hat{s}_k)/\sum_j \mu^{(j)}(\hat{s}_k)$ and an IMM algorithm is run using these weights as if they were the posterior probability weights $P\{m^{(i)}_k | z^k\}$. Such total reliance on the not-so-accurate estimate $\hat{s}_k = \hat{a}_{k|k}$ for weight update is undesirable. The inferior performance of this fuzzy variant relative to the standard IMM algorithm, as indicated by the simulation results, does not justify the extra design effort required here. [369] differs from [244] by using ad hoc fuzzy if-then rules to obtain $\mu^{(i)}(s)$ based on the normalized measurement residual squared. Note that these fuzzy weights do not fit particularly well into the IMM configuration; they can be used equally well/poorly for any other merging-based MM estimation algorithm. As mentioned in Sec. VI-D, [89] proposed to use a fuzzified process noise covariance in the Kalman filter to provide an acceleration estimate for grid adaptation in VSMM estimation; simulation results of automatically adjusting model parameters by an if-then rule for process noise covariance were reported in [271].
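The fuzzy weight computation amounts to evaluating each membership function at the mode estimate and normalizing. A minimal sketch follows, with Gaussian and triangular memberships as mentioned above; the function names, the mode centers, the spread values, and the uniform fallback when all memberships vanish are assumptions made for illustration and are not taken from [244] or [369].

```python
import math

def gaussian_membership(s, center, spread):
    """Gaussian-shaped membership in [0, 1], equal to 1 at the center."""
    return math.exp(-0.5 * ((s - center) / spread) ** 2)

def triangular_membership(s, center, half_width):
    """Triangular membership in [0, 1], zero beyond center +/- half_width."""
    return max(0.0, 1.0 - abs(s - center) / half_width)

def fuzzy_model_weights(s_hat, centers, membership, width):
    """Normalized weights mu^{(i)}(s_hat) / sum_j mu^{(j)}(s_hat)."""
    values = [membership(s_hat, c, width) for c in centers]
    total = sum(values)
    if total == 0.0:
        # Assumed fallback (not from the source): uniform weights if no model is valid.
        return [1.0 / len(centers)] * len(centers)
    return [v / total for v in values]

# Hypothetical acceleration modes (m/s^2) and a mode estimate from a CA filter.
centers = [0.0, 10.0, 20.0]
tri_weights = fuzzy_model_weights(5.0, centers, triangular_membership, 10.0)
gauss_weights = fuzzy_model_weights(10.0, centers, gaussian_membership, 5.0)
```

Because the weights depend only on the single mode estimate, any error in that estimate passes directly into the weights, which is the reliance criticized above.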

As a side effect of their flexibility, many nonstatistical methods are more susceptible to misuse or abuse than statistical methods. For example, a main weakness of these nonstatistical methods stems from their lack of a solid, systematic weight update, while the statistical methods have a built-in solid mechanism for sequential update of the weights thanks to Bayes’ rule. The above fuzzy variants rely on ad hoc heuristics for weight update, while the “mixture of experts” of [72], [73] had recourse to an optimization (search) algorithm with the help of Bayes’ rule.

A two-model MM algorithm was derived in [5] using a deterministic approach. As pointed out in [135], however, an IMM algorithm with uniform transition probabilities $\pi_{ij} = 0.5$, $\forall i, j$, has the same state estimate formula but a superior formula for the error covariance.

22 For instance, we may think the probability of rain tomorrow is between 50% and 70%.


IX. CONCLUDING REMARKS

The MM estimation approach provides state-of-the-art solutions to many maneuvering target tracking problems. There are basically two directions to improve the existing solutions.

The first one is to design a better set of models. Numerous publications have appeared in which various ad hoc designs were presented. This will certainly continue. It is extremely challenging to obtain effective, systematic, and generally applicable results for model-set design. Relevant theoretical results are scarce and this deserves more attention.

The other direction is to develop and design better algorithms. The MM approach started with the first-generation autonomous MM algorithms in which elemental filters work independently. Its advantage over many other non-MM approaches stems from its superior processing of results from elemental filters for output a posteriori. The first generation has significant applications for non-tracking problems, but limited value for maneuvering target tracking, because of its inability to account for information contained in one elemental filter for better performance of another filter.

Represented by the IMM algorithm, the second-generation (cooperating MM) algorithms explore effective cooperation strategies among elemental filters while inheriting the first generation’s superior rule for output processing. They are significantly more developed than the first generation. The IMM algorithm has been so successful in solving maneuvering target tracking problems in the real world that it has become a standard tool for maneuvering target tracking. Significant advancement has been made in recent years and further developments are sure to continue. However, their fundamental limitations are clear and not minor: They believe at any given time one of their elemental filters is perfect and none of them may provide incorrect, misleading, confusing, or any other harmful information. In short, they trust themselves so much that they refuse to adapt themselves to the outside world, although their estimates are adaptive to a certain degree. They cannot be expected to perform well if they are exposed to an environment to which none of the existing elemental filters fit well, such as one that is unknown or new to them.

The third generation is potentially much more advanced than its ancestors in the sense of having an open architecture (a variable structure), whereas its ancestors have a closed architecture. Not only does it inherit the second generation’s effective cooperation strategies and the first generation’s superior output processing, but it also adapts to the outside world by producing new elemental filters if the existing ones are not good enough and by eliminating those elemental filters that provide harmful information. The decisions on terminating harmful ones are relatively easier: general and fairly successful rules have been obtained. The task of producing new filters systematically in a general setting is much more challenging. A breakthrough here would be a new milestone in MM estimation. Similar to model-set design, ad hoc designs for producing new filters are almost always obtainable that outperform the first two generations given a particular problem. Many products in this area can be expected in the future. The main drawback of most algorithms in this generation is their sophistication.

The first two generations do have certain “intelligence” at different levels in that they learn the environment during the course of estimation, along with their capability of self-assessment, but they stop short of drastically adjusting themselves for better performance. The third generation is more intelligent as characterized by its self-adjustment to the outside world, in particular its ability to (re)produce elemental filters, for the best performance possible.

The operation of a non-MM algorithm amounts to deciding on the best single individual first, letting him/her perform, and then sending out his/her estimation results. For the MM estimation, the first generation can be thought of as a fixed group of individuals working independently. Its superiority to non-MM algorithms stems from the fact that its output is generated after all individuals have performed, which allows, for example, use of the best performance a posteriori and optimal combination of individual results. The price paid is that all these individuals have to perform in the MM algorithm. The elemental filters in the second generation in effect form a cooperative team with a fixed membership. It outperforms the first generation because of its team work via cooperation. The third generation can be likened to an adaptive, cooperative team with a possibly variable membership. It may recruit new members and fire bad or incompetent members or put them on probation. This additional flexibility enables the third generation to handle a wide spectrum of intricate and challenging problems in uncertain, complex, and changing situations.

All three generations have their reasons to exist because they have their best domains of application. Clearly, a non-MM algorithm would be optimal if the best possible individual for the task at the time could always be chosen, which is possible only in the absence of uncertainty about the task. If the task were fixed in time but unknown over a set and the group were formed by the best possible individuals for every task in the set, the first generation would be optimal. The second generation would potentially be optimal if the task might be changing over time within a set and the team were formed by the best possible individuals for every task in the set. If either the best possible individual for each of the tasks is not part of the team or some team members do not match any of the tasks, it would be possible for the third generation with a variable team to outperform the champions of the first two generations.


NOMENCLATURE

$\arg\max_x g(x)$ = argument $x$ that maximizes $g(x)$ = maximizer (i.e., location of largest peak) of $g(x)$

$E[y]$ = expectation of random variable $y$

$f(x)$ = probability density function (pdf) of continuous random variable $x$

$(i)$ = quantity pertaining to model $m^{(i)}$

$(i_k)$ = quantity pertaining to mode/model sequence $m^k(i_k)$

$k$ = time index (subscript for quantity at time $k$; superscript for sequence through time $k$)

$m$ = model (mathematical model at certain accuracy level)

$M$ = model set (set of models used)

$M$ = number of models in the model set $M$

$m^{(i)}_k = \{s_k = m^{(i)}\}$ = event that model $m^{(i)}$ matches the true mode at time $k$

$m^k(i_k) = \{m^{(i_1)}_1, \ldots, m^{(i_k)}_k\}$ = sequence of events that models match the true mode

$P\{A\}$ = probability of event $A$

$p(m)$ = probability mass function (pmf) of discrete random variable $m$

$P(x, A) = f(x|A)P\{A\}$ = mixed pdf-probability of random variable $x$ and event $A$

$p(x, m) = f(x|m)p(m)$ = mixed pdf-pmf of random variables $x$ and $m$

$s$ = mode (true behavior pattern, system structure, or exact mathematical model)

$S$ = mode space (set of possible modes)

$x$ = base state (continuous valued)

$\hat{y}$ = estimate of $y$

$z$ = data (measurement)

REFERENCES

[1] G. A. Ackerson and K. S. Fu, “On State Estimation in Switching Environments,” IEEE Trans. Automatic Control, AC-15(1):10–17, Jan. 1970.
[2] H. Akashi and H. Kumamoto, “Random Sampling Approach to State Estimation in Switching Environments,” Automatica, 13(4):429–433, July 1977.
[3] S. Allam, F. Dufour, and P. Bertrand, “Discrete Time Estimation of a Markov Chain with Marked Point Process Observations. Application to Markovian Jump Filtering,” IEEE Trans. Automatic Control, 46(6):903–908, 2001.
[4] A. T. Alouani and W. D. Blair, “Use of a Kinematic Constraint in Tracking Constant Speed, Maneuvering Targets,” IEEE Trans. Automatic Control, AC-38(7):1107–1111, Jul. 1993.
[5] A. T. Alouani and T. R. Rice, “Single-Model Multiple-Process Noise Soft Switching Filter,” in Proc. SPIE Conference on Sensor Fusion: Architectures, Algorithms and Applications, pp. 260–278, 1999.
[6] F. Amoozegar, “Neural-Network-Based Target Tracking State-of-The-Art Survey,” Society of Photo-Optical Instrumentation Engineers, 37(3):836–846, Mar. 1998.
[7] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[8] D. Andrisani, F. P. Kuhl, and D. Gleason, “A Nonlinear Tracker Using Attitude Measurements,” IEEE Trans. Aerospace and Electronic Systems, 22(3):533–539, Sept. 1986.
[9] M. S. Arulampalam, N. Gordon, M. Orton, and B. Ristic, “A Variable Structure Multiple Model Particle Filter for GMTI Tracking,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 927–934, July 2002.
[10] A. Averbuch, S. Itzikowitz, and T. Kapon, “Parallel Implementation of Multiple Model Tracking Algorithms,” IEEE Trans. Parallel and Distributed Systems, PDS-2(2):242–252, Apr. 1991.
[11] A. Averbuch, S. Itzikowitz, and T. Kapon, “Radar Target Tracking—Viterbi versus IMM,” IEEE Trans. Aerospace and Electronic Systems, AES-27(3):550–563, May 1991.
[12] Y. Bar-Shalom, “Recursive Tracking Algorithms: From the Kalman Filter to Intelligent Trackers for Cluttered Environment,” in Proc. 1989 IEEE Int. Conf. Control and Applications, (Jerusalem, Israel), Apr. 1989.
[13] Y. Bar-Shalom, ed., Multitarget-Multisensor Tracking: Advanced Applications. Norwood, MA: Artech House, 1990.
[14] Y. Bar-Shalom, ed., Multitarget-Multisensor Tracking: Applications and Advances, vol. II. Norwood, MA: Artech House, 1992.
[15] Y. Bar-Shalom and W. D. Blair, eds., Multitarget-Multisensor Tracking: Applications and Advances, vol. III. Boston, MA: Artech House, 2000.
[16] Y. Bar-Shalom, K. C. Chang, and H. A. P. Blom, “Tracking a Maneuvering Target Using Input Estimation Versus the Interacting Multiple Model Algorithm,” IEEE Trans. Aerospace and Electronic Systems, AES-25(2):296–300, Apr. 1989.
[17] Y. Bar-Shalom, A. K. Kumar, W. D. Blair, and G. W. Groves, “Tracking Low Elevation Targets in the Presence of Multipath Propagation,” IEEE Trans. Aerospace and Electronic Systems, AES-30(4), Oct. 1994.
[18] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques, and Software. Boston, MA: Artech House, 1993. (Reprinted by YBS Publishing, 1998).
[19] Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques. Storrs, CT: YBS Publishing, 1995.
[20] Y. Bar-Shalom, X. R. Li, and K. C. Chang, “Non-Stationary Noise Identification with Interacting Multiple Model Algorithm,” in Proc. 5th International Symp. Intelligent Control, (Philadelphia, PA), pp. 585–589, Sept. 1990.
[21] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software. New York: Wiley, 2001.
[22] Y. Baram, “A Sufficient Condition for Consistent Discrimination Between Stationary Gaussian Models,” IEEE Trans. Automatic Control, AC-23(5):958–960, Oct. 1978.
[23] Y. Baram, “Nonstationary Model Validation from Finite Data Records,” IEEE Trans. Automatic Control, AC-25(1):10–19, Feb. 1980.


[24] Y. Baram and N. R. Sandell, Jr., “An Information Theoretic Approach to Dynamical Systems Modeling and Identification,” IEEE Trans. Automatic Control, AC-23(1):61–66, Feb. 1978.

[25] Y. Baram and N. R. Sandell, Jr., “Consistent Estimation on Finite Parameter Sets with Application to Linear Systems Identification,” IEEE Trans. Automatic Control, AC-23(3):451–454, June 1978.

[26] M. Basseville, “Detecting Changes in Signals and Systems,” Automatica, 24(3):309–326, May 1988.

[27] N. Bergman, Recursive Bayesian Estimation. Navigation and Tracking Applications. PhD thesis, Department of Electrical Engineering, Linkoping University, Sweden, 1999.

[28] N. Bergman and F. Gustafsson, “Three Statistical Batch Algorithms for Tracking Manoeuvring Targets,” in Proc. 5th European Control Conference, (Karlsruhe, Germany), 1999.

[29] K. Berketis, S. K. Katsikas, and S. Likothanassis, “Multimodel Partitioning Filters and Genetic Algorithms,” Nonlinear Analysis, Theory, Methods and Applications, 30(4):2421–2427, 1997.

[30] D. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd ed., Sept. 1999. ISBN: 1886529000.

[31] A. Bessell, B. Ristic, A. Farina, X. Wang, and M. S. Arulampalam, “Error Performance Bounds for Tracking a Manoeuvring Target,” in Proc. 2003 International Conf. on Information Fusion, (Cairns, Australia), pp. 903–910, July 2003.

[32] R. A. Best and J. P. Norton, “A New Model and Efficient Tracker for a Target with Curvilinear Motion,” IEEE Trans. Aerospace and Electronic Systems, AES-33(3):1030–1037, Jul. 1997.

[33] P. J. Bickel, C. Klassen, Y. Ritov, and J. Wellner, Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer, 1998.

[34] S. S. Blackman, Multiple Target Tracking with Radar Applications. Norwood, MA: Artech House, 1986.

[35] S. S. Blackman, M. T. Busch, and R. F. Popoli, “IMM/MHT Tracking and Data Association for Benchmark Tracking Problem,” in Proc. 1995 American Control Conf., (Seattle, WA), pp. 2606–2610, June 1995.

[36] S. S. Blackman, M. T. Busch, and R. F. Popoli, “IMM/MHT Solution to Radar Benchmark Tracking Problem,” IEEE Trans. Aerospace and Electronic Systems, AES-35(2):730–737, Apr. 1999. Also appeared in Proc. 1995 American Control Conf., Seattle, WA, June 1995, pp. 2606–2610.

[37] S. S. Blackman, R. J. Dempster, and S. H. Roszkowski, “IMM/MHT Application to Radar and IR Multitarget Tracking,” in Proc. 1997 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3163, pp. 429–439, 1997.

[38] S. S. Blackman, R. J. Dempster, D. M. Sasaki, P. F. Singer, and G. K. Tucker, “Application of IMM/MHT Tracking with Spectral Features to Ground Targets,” in Proc. 1999 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3809, (Denver, CO, USA), pp. 456–467, July 1999.

[39] S. S. Blackman and R. F. Popoli, Design and Analysis of Modern Tracking Systems. Norwood, MA: Artech House, 1999.

[40] W. D. Blair, “Toward the Integration of Tracking and Signal Processing for Phased Array Radar,” in Proc. 1994 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2235, (Orlando, FL, USA), 1994.

[41] W. D. Blair and Y. Bar-Shalom, “Tracking Maneuvering Targets with Multiple Sensors: Does More Data Always Mean Better Estimates,” IEEE Trans. Aerospace and Electronic Systems, 32(1):450–456, Jan. 1996.

[42] W. D. Blair and G. A. Watson, “Interacting Multiple Bias Model Algorithm with Application to Tracking Maneuvering Targets,” in Proc. 31st IEEE Conf. on Decision and Control, (Tucson, AZ), pp. 3790–3795, Dec. 1992.

[43] W. D. Blair and G. A. Watson, “Interacting Multiple Model Algorithm with Aperiodic Data,” in Proc. SPIE Symp. on Acquisition, Tracking and Pointing, (Orlando, FL), Apr. 1992.

[44] W. D. Blair and G. A. Watson, “IMM Algorithm for Solution to Benchmark Problem for Tracking Maneuvering Targets,” in Proc. SPIE Symp. on Acquisition, Tracking and Pointing, (Orlando, FL), Apr. 1994.

[45] W. D. Blair and G. A. Watson, “Benchmark Problem for Radar Resource Allocation and Tracking Maneuvering Targets in the Presence of False Alarms and ECM,” Tech. Rep. NSWCDD/TR-96/10, Naval Surface Warfare Center Dahlgren Division, Dahlgren, VA, Feb. 1996.

[46] W. D. Blair, G. A. Watson, and A. T. Alouani, “Tracking Constant Speed Targets Using a Kinematic Constraint,” in Proc. 1991 IEEE Southeast Conf., 1991.

[47] W. D. Blair, G. A. Watson, G. L. Gentry, and S. A. Hoffman, “Benchmark Problem for Beam Pointing Control of Phased Array Radar Against Maneuvering Target in the Presence of ECM and FA,” in Proc. 1995 American Control Conf., (Seattle, WA), pp. 2601–2605, June 1995.

[48] W. D. Blair, G. A. Watson, and S. A. Hoffman, “Benchmark Problem for Beam Pointing Control of Phased Array Radar Against Maneuvering Target,” in Proc. 1994 American Control Conf., (Baltimore, MD), pp. 2071–2075, June 1994.

[49] W. D. Blair, G. A. Watson, T. Kirubarajan, and Y. Bar-Shalom, “Benchmark for Radar Resource Allocation and Tracking Targets in the Presence of ECM,” IEEE Trans. Aerospace and Electronic Systems, AES-34(4):1097–1114, Oct. 1998. Also appeared in Proc. 1995 American Control Conf., Seattle, WA, June 1995, pp. 2601–2605.

[50] W. D. Blair, G. A. Watson, T. Kirubarajan, and Y. Bar-Shalom, “Benchmark for Radar Resource Allocation and Tracking Targets in the Presence of ECM,” IEEE Trans. Aerospace and Electronic Systems, AES-34(4):1097–1114, Oct. 1998.

[51] W. D. Blair, G. A. Watson, and T. R. Rice, “Interacting Multiple Model Filter for Tracking Maneuvering Targets in Spherical Coordinates,” in Proc. of IEEE Southeastcon 1991, (Williamsburg, VA), pp. 1055–1059, Apr. 1991.

[52] W. D. Blair, G. A. Watson, and T. R. Rice, “Tracking Maneuvering Targets with an Interacting Multiple Model Filter Containing Exponentially Correlated Acceleration Models,” in Southeastern Symp. Systems Theory, (Columbia, SC), Mar. 1991.

[53] E. Blasch and T. Connare, “Improving Track Maintenance Through Group Tracking,” in Proc. Workshop on Estimation, Tracking, and Fusion — A Tribute to Yaakov Bar-Shalom, (Monterey, CA, USA), pp. 360–371, May 2001.

[54] E. A. Bloem, H. A. P. Blom, and F. J. van Schaik, “Advanced Data Fusion for Airport Surveillance,” in Proc. JISSA 2001 Int. Conf. On Airport Surveillance Sensors, (Paris), Dec. 2001.

[55] H. A. P. Blom, “A Sophisticated Tracking Algorithm for ATC Surveillance Data,” in Proc. International Radar Conf., (Paris, France), May 1984.

[56] H. A. P. Blom, “An Efficient Filter for Abruptly Changing Systems,” in Proc. 23rd IEEE Conf. Decision and Control, (Las Vegas, NV), Dec. 1984.

[57] H. A. P. Blom, “Overlooked Potential of Systems with Markovian Switching Coefficients,” in Proc. 25th IEEE Conf. Decision and Control, (Athens, Greece), Dec. 1986.

[58] H. A. P. Blom and Y. Bar-Shalom, “The Interacting Multiple Model Algorithm for Systems with Markovian Switching Coefficients,” IEEE Trans. Automatic Control, AC-33(8):780–783, Aug. 1988.

[59] H. A. P. Blom and Y. Bar-Shalom, “Time-Reversion of a Hybrid State Stochastic Difference System with a Jump-Linear Smoothing Application,” IEEE Trans. Information Theory, IT-36(4):836–847, July 1990.

[60] H. A. P. Blom, R. A. Hogendoorn, and B. A. van Doorn, “Design of a Multisensor Tracking System for Advanced Air Traffic Control,” in Multitarget-Multisensor Tracking: Applications and Advances (Y. Bar-Shalom, ed.), vol. II, ch. 2, Norwood, MA: Artech House, 1992.

[61] H. A. P. Blom, R. A. Hogendoorn, and F. J. van Schaik, “Bayesian Multisensor Tracking for Advanced Air Traffic Control Systems,” in Aircraft Trajectories: Computation, Prediction and Control (A. Benoit, ed.), AGARDOgraph 301, 1990.

[62] L. Bloomer and J. E. Gray, “Are More Models Better?: The Effect of the Model Transition Matrix on the IMM Filter,” in The 34th Southeastern Symposium on System Theory (SSST), (Huntsville, AL), March 2002.

[63] Y. Boers and H. Driessen, “A Multiple Model Multiple Hypothesis Filter for Systems with Possibly Erroneous Measurements,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 700–704, July 2002.

[64] P. L. Bogler, “Tracking a Maneuvering Target Using Input Estimation,” IEEE Trans. Aerospace and Electronic Systems, AES-23(3):298–310, May 1987.

[65] R. G. Brown, Introduction to Random Signals and Kalman Filtering. New York: Wiley, 1983.


[66] T. E. Bullock and S. Sangsuk-Iam, “Maneuver Detection and Tracking with a Nonlinear Target Model,” in Proc. 23rd IEEE Conf. Decision and Control, (Las Vegas, NV), Dec. 1984.

[67] S. Burassa, P. Fontaine, E. Shahbazian, and M.-A. Simard, “Comparison of Different Parallel Filtering Techniques,” in Proc. 1993 SPIE Conf. on Signal and Data Processing of Small Targets, vol. , (Orlando, FL), pp. 319–330, April 1993.

[68] M. Busch and S. Blackman, “Evaluation of IMM Filtering for an Air Defence System Application,” in Proc. 1995 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2561, pp. 435–447, 1995.

[69] L. Campo, P. Mookerjee, and Y. Bar-Shalom, “State Estimation for Systems with Sojourn-Time-Dependent Markov Model Switching,” IEEE Trans. Automatic Control, AC-36(2):238–243, Feb. 1991.

[70] M. J. Caputi, “A Necessary Condition for Effective Performance of the Multiple Model Adaptive Estimator,” IEEE Trans. Aerospace and Electronic Systems, AES-31(3):1132–1139, July 1995.

[71] M. J. Caputi and R. L. Moose, “A Modified Gaussian Sum Approach to Estimation of Non-Gaussian Signals,” IEEE Trans. Aerospace and Electronic Systems, 29(2):446–451, April 1993.

[72] W. S. Chaer, R. H. Bishop, and J. Ghosh, “A Mixture-of-Experts Framework for Adaptive Kalman Filtering,” IEEE Trans. Systems, Man, and Cybernetics—Part B: Cybernetics, 27(3):452–464, June 1997.

[73] W. S. Chaer, R. H. Bishop, and J. Ghosh, “Hierarchical Adaptive Kalman Filter for Interplanetary Orbit Determination,” IEEE Trans. Aerospace and Electronic Systems, 34(3):883–895, July 1998.

[74] C. B. Chang and M. Athans, “State Estimation for Discrete Systems with Switching Parameters,” IEEE Trans. Aerospace and Electronic Systems, AES-14(5):418–425, May 1978.

[75] C. B. Chang, R. H. Whiting, and M. Athans, “On the State and Parameter Estimation for Maneuvering Reentry Vehicles,” IEEE Trans. Automatic Control, AC-22(2):99–105, Feb. 1977.

[76] B. Chen and J. K. Tugnait, “Interacting Multiple Model Fixed-Lag Smoothing Algorithm for Markovian Switching Systems,” IEEE Trans. Aerospace and Electronic Systems, 36(1):243–250, Jan. 2000.

[77] B. Chen and J. K. Tugnait, “Multisensor Tracking of a Maneuvering Target in Clutter by Using IMMPDA Fixed-Lag Smoothing,” IEEE Trans. Aerospace and Electronic Systems, 36(3):983–991, Jan. 2000.

[78] B. Chen and J. K. Tugnait, “Tracking of Multiple Maneuvering Targets in Clutter Using IMM/JPDA Filtering and Fixed-Lag Smoothing,” Automatica, 37(2), Feb. 2001.

[79] T. Connare, E. Blasch, J. Greenewald, J. Schmitz, F. Salvatore, and F. Scarpino, “Group IMM Tracking Utilizing Track and Identification Fusion,” in Proc. Workshop on Estimation, Tracking, and Fusion — A Tribute to Yaakov Bar-Shalom, (Monterey, CA, USA), pp. 205–220, May 2001.

[80] R. L. Cooperman, “Tactical Ballistic Missile Tracking Using the Interacting Multiple Model Algorithm,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 824–831, July 2002.

[81] O. L. V. Costa, “Linear Minimum Mean Square Error Estimation for Discrete-Time Markovian Jump Linear Systems,” IEEE Trans. Automatic Control, AC-39(8):1685–1689, Aug. 1994.

[82] O. L. V. Costa and S. Guerra, “Robust Linear Filtering for Discrete-Time Hybrid Markov Linear Systems,” Int. J. Control, 75(10):712–727, 2002.

[83] O. L. V. Costa and S. Guerra, “Stationary Filter for Linear Minimum Mean Square Error Estimator of Discrete-Time Markovian Jump Systems,” IEEE Trans. Automatic Control, 47(8):1351–1356, Aug. 2002.

[84] E. Daeipour and Y. Bar-Shalom, “An Interacting Multiple Model Approach for Target Tracking with Glint Noise,” IEEE Trans. Aerospace and Electronic Systems, AES-31(2):706–715, Apr. 1995.

[85] E. Daeipour and Y. Bar-Shalom, “IMM Tracking of Maneuvering Targets in the Presence of Glint,” IEEE Trans. Aerospace and Electronic Systems, AES-34(3):996–1003, July 1998.

[86] E. Daeipour, Y. Bar-Shalom, and X. R. Li, “Adaptive Beam Pointing Control of a Phased Array Radar Using an IMM Estimator,” in Proc. 1994 American Control Conf., (Baltimore, MD), pp. 2093–2097, June 1994.

[87] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data Via the EM Algorithm,” J. R. Statist. Soc. B, 39:1–38, 1977.

[88] E. Derbez, B. Remillard, and A. Jouan, “A Comparison of Fixed Gain IMM Against Two Other Filters,” in Proc. 2000 International Conf. on Information Fusion, (Paris, France), pp. ThB2–3–ThB2–9, July 2000.

[89] Z. Ding, H. Leung, and K. Chan, “Model-Set Adaption Using a Fuzzy Kalman Filter,” in Proc. International Conf. on Information Fusion, (Paris, France), p. MoD2, Jul. 2000.

[90] A. Doucet and C. Andrieu, “Iterative Algorithms for State Estimation of Jump Markov Linear Systems,” IEEE Trans. Signal Processing, 49(6):1216–1227, June 2001.

[91] A. Doucet and B. Ristic, “Recursive State Estimation for Multiple Switching Models with Unknown Transition Probabilities,” IEEE Trans. Aerospace and Electronic Systems, AES-38:1098–1104, July 2002.

[92] J. N. Driessen and Y. Boers, “A Multiple Model Multiple Hypothesis Filter for Tracking Maneuvering Targets,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 279–288, 2001.

[93] O. Drummond, “Feature, Attribute, and Classification Aided Target Tracking,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 542–548, 2001.

[94] O. E. Drummond, Multiple-Object Estimation. PhD thesis, University of California, Los Angeles, 1992.

[95] O. E. Drummond, “Multiple Target Tracking with Multiple Frame, Probabilistic Data Association,” in Proc. 1993 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 1954, Apr. 1993.

[96] O. E. Drummond, “Multiple Sensor Tracking with Multiple Frame, Probabilistic Data Association,” in Proc. 1995 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2561, Apr. 1995.

[97] O. E. Drummond, “Target Tracking with Retrodicted Discrete Probabilities,” in Proc. 1997 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3163, pp. 249–268, 1997.

[98] O. E. Drummond, “Best Hypothesis Target Tracking and Sensor Fusion,” in Proc. 1999 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3809, (Denver, CO, USA), pp. 586–599, July 1999.

[99] O. E. Drummond, X. R. Li, and C. He, “Comparison of Various Static Multiple-Model Estimation Algorithms,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, pp. 510–527, 1998.

[100] F. Dufour and P. Bertrand, “An Image-Based Filter for Discrete-Time Markov Jump Linear Systems,” Automatica, 32(2):241–247, 1996.

[101] F. Dufour and M. Mariton, “Tracking a 3D Maneuvering Target with Passive Sensors,” IEEE Trans. Aerospace and Electronic Systems, AES-27(4):725–739, Jul. 1991.

[102] F. Dufour and M. Mariton, “Passive Sensor Data Fusion and Maneuvering Target Tracking,” in Multitarget-Multisensor Tracking: Applications and Advances (Y. Bar-Shalom, ed.), vol. II, ch. 3, Norwood, MA: Artech House, 1992.

[103] P. F. Easthope, “Using TOTS for More Accurate and Responsive Multi-Sensor, End-to-End Ballistic Missile Tracking,” in Proc. SPIE Conf. on Signal and Data Processing of Small Targets 2000, vol. 4048, Apr. 2000.

[104] P. F. Easthope and N. W. Heys, “Multiple-Model Target-Oriented Tracking System,” in Proc. SPIE Conf. on Signal and Data Processing of Small Targets 1994, vol. 2235, Apr. 1994.

[105] M. Efe and D. P. Atherton, “Maneuvering Target Tracking Using Adaptive Turn Rate Models in the Interacting Multiple Model Algorithm,” in Proc. 35th IEEE Conf. on Decision and Control, (Kobe, Japan), pp. 3151–3156, Dec. 1996.


[106] M. Efe and D. P. Atherton, “The IMM Approach to the Fault Detection Problem,” in 11th IFAC Symp. on System Identification, (Fukuoka, Japan), July 1997.

[107] R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models. New York: Springer-Verlag, 1997.

[108] R. J. Elliott, F. Dufour, and W. P. Malcolm, “A Comparison of Angle-Only Tracking Algorithms,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 270–278, 2001.

[109] R. J. Elliott, F. Dufour, and D. D. Sworder, “Exact Hybrid Filters in Discrete Time,” IEEE Trans. Automatic Control, AC-41(12):1807–1810, Dec. 1996.

[110] J. S. Evans, Studies in Nonlinear Filtering Theory–Random Parameter Linear Systems, Target Tracking and Communication Constrained Estimation. PhD thesis, University of Melbourne, Melbourne, Australia, Jan. 1998.

[111] J. S. Evans and R. J. Evans, “State Estimation for Markov Switching Systems with Modal Observations,” in Proc. 36th IEEE Conf. on Decision and Control, (San Diego, CA), pp. 1688–1693, Dec. 1997.

[112] J. S. Evans and R. J. Evans, “Image-Enhanced Multiple Model Tracking,” Automatica, 35(11):1769–1786, Nov. 1999.

[113] A. Farina, L. Ferranti, and G. Colino, “Constrained Tracking Filters for A-SMGCS,” in Proc. 2003 International Conf. on Information Fusion, (Cairns, Australia), pp. 414–421, July 2003.

[114] K. A. Fisher and P. S. Maybeck, “Multiple Model Adaptive Estimation with Filtering Spawning,” IEEE Trans. Aerospace and Electronic Systems, AES-38(3):755–768, 2002.

[115] G. D. Forney, “The Viterbi Algorithm,” Proc. IEEE, 61(3):268–278, Mar. 1973.

[116] D. C. Fraser and J. E. Potter, “The Optimum Linear Smoother as a Combination of Two Optimum Linear Filters,” IEEE Trans. Automatic Control, AC-14(4):387–390, 1969.

[117] B. Friedland, “Treatment of Bias in Recursive Filtering,” IEEE Trans. Automatic Control, AC-14:359–367, Aug. 1969.

[118] C. M. Fry and A. P. Sage, “On Hierarchical Structure Adaptation and Systems Identification,” Int. J. Control, 20(3):433–452, 1974.

[119] H. Gauvrit, J. P. L. Cadre, and C. Jauffret, “A Formulation of Multitarget Tracking as an Incomplete Data Problem,” IEEE Trans. Aerospace and Electronic Systems, 33(4):1242–1257, Oct. 1997.

[120] M. Gauvrit, “Bayesian Adaptive Filter for Tracking with Measurements of Uncertain Origin,” Automatica, 20:217–224, Mar. 1984.

[121] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1992.

[122] J. L. Gertz, “Multisensor Surveillance for Improved Aircraft Tracking,” Lincoln Laboratory Journal, 2(3):381–396, 1989.

[123] N. H. Gholson and R. L. Moose, “Maneuvering Target Tracking Using Adaptive State Estimation,” IEEE Trans. Aerospace and Electronic Systems, AES-13:310–317, May 1977.

[124] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[125] J. Goutsias and J. M. Mendel, “Optimal Simultaneous Detection and Estimation of Filtered Discrete semi-Markov Chains,” IEEE Trans. Information Theory, IT-34(3):551–568, May 1988.

[126] J. A. Gustafson and P. S. Maybeck, “Flexible Spacestructure Control Via Moving-Bank Multiple Model Algorithms,” IEEE Trans. Aerospace and Electronic Systems, AES-30:750–757, July 1994.

[127] F. Gustafsson, Adaptive Filtering and Change Detection. Wiley, 2001.

[128] J. A. Guu and C. H. Wei, “Maneuvering Target Tracking Using IMM Method at High Measurement Frequency,” IEEE Trans. Aerospace and Electronic Systems, AES-27(3):514–519, May 1991.

[129] M. T. Hadidi and S. C. Schwartz, “Sequential Detection with Markov Interrupted Observations,” in Proc. 16th Allerton Conf. on Communication, Control and Computing, (Univ. of Illinois), Oct. 1978.

[130] R. M. Hawkes and J. B. Moore, “Performance Bounds for Adaptive Estimation,” Proc. IEEE, 64(8):1143–1150, 1976.

[131] R. E. Helmick, W. D. Blair, and S. A. Hoffman, “One-Step Fixed-Lag Smoothers for Markovian Switching Systems,” in Proc. of the American Control Conference, pp. 782–786, 1994.

[132] R. E. Helmick, W. D. Blair, and S. A. Hoffman, “Fixed-Interval Smoothing for Markovian Switching Systems,” IEEE Trans. Information Theory, IT-41(6):1845–1855, Nov. 1995.

[133] R. E. Helmick, W. D. Blair, and S. A. Hoffman, “One-Step Fixed-Lag Smoothers for Markovian Switching Systems,” IEEE Trans. Automatic Control, AC-41(7):1051–1056, July 1996.

[134] G. A. Hewer, R. D. Martin, and J. Zeh, “Robust Preprocessing for Kalman Filtering of Glint Noise,” IEEE Trans. Aerospace and Electronic Systems, 23(1):120–128, Jan. 1987.

[135] T.-J. Ho and M. Farooq, “Comparing an IMM Algorithm and a Multiple-Process Soft Switching Algorithm: Equivalence Relationship and Tracking Performance,” in Proc. 2000 International Conf. on Information Fusion, (Paris, France), pp. MoD2.17–MoD2.24, July 2000.

[136] R. A. Hogendoorn, C. Rekkas, and W. H. L. Neven, “ARTAS: An IMM-Based Multisensor Tracker,” in Proc. 1999 International Conf. on Information Fusion, (Sunnyvale, CA), pp. 1021–1028, July 1999.

[137] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press, 1975.

[138] L. Hong, Z. Ding, and R. A. Wood, “Development of Multirate Model and Multirate Interacting Multiple Model Algorithm for Multiplatform Multisensor Tracking,” Optical Engineering, 37(2):453–467, 1998.

[139] L. Hong, “Multirate Interacting Multiple Model Filtering for Target Tracking Using Multirate Models,” IEEE Trans. on Automatic Control, 44(7):1326–1340, Jul. 1999.

[140] A. Houles and Y. Bar-Shalom, “Multisensor Tracking of a Maneuvering Target in Clutter,” IEEE Trans. Aerospace and Electronic Systems, AES-25(2):176–189, Mar. 1989.

[141] R. G. Hutchins and A. San Jose, “IMM Tracking of a Theater Ballistic Missile during Boost Phase,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, pp. 528–531, 1998.

[142] R. G. Hutchins and A. San Jose, “Trajectory Tracking and Backfitting Techniques Against Theater Ballistic Missiles,” in Proc. 1999 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3809, pp. 532–526, 1999.

[143] R. G. Hutchins, D. Wilson, L. K. Allred, and R. Duren, “Alternative Architectures for IMM Tracking of Maneuvering Aircraft,” in Proc. 2002 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4728, (Orlando, FL, USA), April 2002.

[144] I. Hwang, H. Balakrishnan, and C. Tomlin, “Flight-Mode-Based Aircraft Conflict Detection using a Residual-Mean Interacting Multiple Model Algorithm,” in Proceedings of AIAA Guidance, Navigation, and Control Conference, (Austin, TX), Aug. 2003.

[145] I. Hwang, H. Balakrishnan, and C. Tomlin, “Observability Criteria and Estimator Design for Stochastic Linear Hybrid Systems,” in Proc. IEE European Control Conf., (Cambridge, UK), Sept. 2003.

[146] A. Isaksson and F. Gustafsson, “Comparison of Some Kalman Filter Based Methods for Manoeuvre Tracking and Detection,” in Proc. 34th IEEE Conf. on Decision and Control, (New Orleans, LA, USA), pp. 1525–1531, Dec. 1995.

[147] A. Isaksson, F. Gustafsson, and N. Bergman, “Pruning versus Merging in Kalman Filter Banks for Manoeuvre Tracking.” URL: citeseer.nj.nec.com/isaksson97pruning.html, 1997.

[148] A. G. Jaffer and S. C. Gupta, “On Estimation of Discrete Processes Under Multiplicative and Additive Noise Conditions,” Information Science, 3:267, 1971.

[149] V. P. Jilkov and D. S. Angelova, “Performance Evaluation and Comparison of Variable Structure Multiple-Model Algorithms for Tracking Maneuvering Radar Targets,” in Proc. 26th European Microwave Conf., (Prague, Czech), Sept. 1996.


[150] V. P. Jilkov, D. S. Angelova, and T. A. Semerdjiev, “Mode-Set Adaptive IMM for Maneuvering Target Tracking,” IEEE Trans. Aerospace and Electronic Systems, AES-35(1):343–350, Jan. 1999.

[151] V. P. Jilkov and X. R. Li, “Adaptation of Transition Probability Matrix for Multiple Model Estimators,” in Proc. 2001 International Conf. on Information Fusion, (Montreal, QC, Canada), pp. ThB1.3–ThB1.10, Aug. 2001.

[152] V. P. Jilkov and X. R. Li, “On-Line Bayesian Estimation of Transition Probabilities for Markovian Jump Systems,” IEEE Trans. Signal Processing (to appear), SP-51, 2003.

[153] V. P. Jilkov, X. R. Li, and D. Angelova, “Bayesian Estimation of Transition Probabilities for Markovian Jump Systems by Stochastic Simulation,” in Springer Lecture Notes in Computer Science, vol. 2542, pp. 307–315, 2003.

[154] V. P. Jilkov, X. R. Li, and L. Lu, “Performance Enhancement of IMM Estimation by Smoothing,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 713–720, July 2002.

[155] V. P. Jilkov, L. Mihaylova, and X. R. Li, “An Alternative IMM Solution to Benchmark Radar Tracking Problem,” in Proc. International Conf. on Multisource-Multisensor Information Fusion, pp. 924–929, Jul. 1998.

[156] T. A. Johansen and R. Murray-Smith, “The Operating Regime Approach,” in Multiple Model Approaches to Modelling and Control (R. Murray-Smith and T. A. Johansen, eds.), ch. 1, pp. 3–73, Taylor & Francis, 1997.

[157] L. A. Johnston and V. Krishnamurthy, “Mode-Matched Filtering via the EM Algorithm,” in Proc. 1999 American Control Conf., (San Diego, CA), pp. 1930–1934, June 1999.

[158] L. A. Johnston and V. Krishnamurthy, “An Improvement to the Interacting Multiple Model (IMM) Algorithm,” IEEE Trans. Signal Processing, 49(12):2893–2908, 2001.

[159] A. Jouan, E. Bosse, M.-A. Simard, and E. Shahbazian, “Comparison of Various Schema of Filter Adaptivity for the Tracking of Maneuvering Targets,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, Florida, USA), pp. 247–258, 1998.

[160] H. Kameda, S. Tsujimichi, and Y. Kosuge, “Target Tracking for Maneuvering Reentry Vehicles Using Multiple Maneuvering Models,” in Proc. 36th SICE (Society of Instrument and Control Engineers) Annual Conference, (Japan), pp. 1031–1036, SICE, 1997.

[161] K. Kastella and M. Biscuso, “Tracking Algorithms for Air Traffic Control Applications,” Air Traffic Control Quarterly, 3(1):19–43, Jan. 1996.

[162] S. K. Katsikas, S. D. Likothanassis, G. N. Beligiannis, K. G. Berketis, and D. A. Fotakis, “Genetically Determined Variable Structure Multiple Model Estimation,” IEEE Trans. Signal Processing, 49(10):2253–2261, Oct. 2001.

[163] J. D. Kendrick, P. S. Maybeck, and J. G. Reid, “Estimation of Aircraft Target Motion Using Orientation Measurements,” IEEE Trans. Aerospace and Electronic Systems, 17:254–260, Mar. 1981.

[164] T. H. Kerr, “Duality Between Failure Detection and Radar/Optical Maneuver Detection,” IEEE Trans. Aerospace and Electronic Systems, AES-25:520–528, July 1989.

[165] T. Kirubarajan and Y. Bar-Shalom, “Kalman Filter vs. IMM Estimator: When We Need the Latter?,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, (Orlando, Florida, USA), pp. 576–582, April 2000.

[166] T. Kirubarajan and Y. Bar-Shalom, “Tracking Evasive Move-Stop-Move Targets with a GMTI Radar Using a VS-IMM Estimator,” IEEE Trans. Aerospace and Electronic Systems, 39(3):1098–1103, 2003.

[167] T. Kirubarajan, Y. Bar-Shalom, W. D. Blair, and G. A. Watson, “IMMPDAF for Radar Management and Tracking Benchmark with ECM,” IEEE Trans. Aerospace and Electronic Systems, AES-34(4):1115–1134, Oct. 1998.

[168] T. Kirubarajan, Y. Bar-Shalom, and E. Daeipour, “Adaptive Beam Pointing Control of a Phased Array Radar in the Presence of ECM and False Alarms Using IMMPDAF,” in Proc. 1995 American Control Conf., (Seattle, WA), pp. 2616–2620, June 1995.

[169] T. Kirubarajan, Y. Bar-Shalom, K. R. Pattipati, and I. Kadar, “Ground Target Tracking with Topography-Based Variable-Structure IMM Estimator,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, Florida, USA), pp. 222–233, 1998.

[170] T. Kirubarajan, Y. Bar-Shalom, K. R. Pattipati, and I. Kadar, “Ground Target Tracking with Topography-Based Variable Structure IMM Estimator,” IEEE Trans. Aerospace and Electronic Systems, AES-36(1):26–46, Jan. 2000.

[171] T. Kirubarajan, K. R. Pattipati, R. L. Popp, and H. Wang, “Large-Scale Air Surveillance Using an IMM Estimator,” in Proceedings of the Workshop on Estimation, Tracking and Fusion: A Tribute to Yaakov Bar-Shalom, (Monterey, CA), pp. 427–466, May 2001.

[172] T. Kirubarajan, M. Yeddanapudi, Y. Bar-Shalom, and K. R. Pattipati, “Comparison of IMMPDA and IMM-Assignment Algorithms on Real Air Traffic Surveillance Data,” in Proc. 1996 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2759, (Orlando, FL, USA), 1996.

[173] W. Koch, “Retrodiction for Bayesian Multiple Hypothesis/Multiple Target Tracking in Densely Cluttered Environment,” in Proc. 1996 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2759, (Orlando, FL, USA), 1996.

[174] W. Koch, “Fixed-Interval Retrodiction Approach to Bayesian IMM-MHT for Maneuvering Multiple Targets,” IEEE Trans. Aerospace and Electronic Systems, 36(1):2–14, Jan. 2000.

[175] T. G. Kolda, R. M. Lewis, and V. Torczon, “Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods,” SIAM Review, 45(3):385–482, 2003.

[176] V. Krishnamurthy and R. J. Elliott, “Filters for Estimating Markov Modulated Poisson Processes and Image-Based Tracking,” Automatica, 33(5):821–833, 1997.

[177] V. Krishnamurthy and J. Evans, “Finite Dimensional Filters for Passive Tracking of Markov Jump Linear Systems,” Automatica, 33(5):821–833, 1998.

[178] D. W. Kyger and P. S. Maybeck, “Reducing Lag in Virtual Display Using Multiple Model Adaptive Estimation,” IEEE Trans. Aerospace and Electronic Systems, AES-34(4):1237–1248, Oct. 1998.

[179] D. G. Lainiotis, “Optimal Adaptive Estimation: Structure and Parameter Adaptation,” IEEE Trans. Automatic Control, 16(2):160–170, April 1971.

[180] D. G. Lainiotis, “Partitioning: A Unifying Framework for Adaptive Systems, I: Estimation,” Proc. IEEE, 64(8):1126–1143, Aug. 1976.

[181] D. G. Lainiotis and P. Papaparaskeva, “Efficient Algorithms of Clustering Adaptive Nonlinear Filters,” IEEE Trans. Automatic Control, 44(7):1454–1459, July 1999.

[182] D. G. Lainiotis and S. K. Park, “On Joint Detection, Estimation and System Identification,” Int. J. Control, 17(3):609–633, 1973.

[183] D. G. Lainiotis and F. L. Sims, “Performance Measure for Adaptive Kalman Estimators,” IEEE Trans. Automatic Control, AC-15:249–250, Apr. 1970.

[184] D. G. Lainiotis and F. L. Sims, “Estimation: A Brief Survey,” Information Sciences, 7(3):191–20, 1974. Also in Estimation Theory, D. G. Lainiotis, ed., American Elsevier, New York, 1974.

[185] P. R. Lamb and L. C. Westphal, “Simplex-Directed Partitioned Adaptive Filters,” Int. J. Control, 30(4):617–627, 1979.

[186] J. Layne, “Monopulse Radar Tracking Using an Adaptive Interacting Multiple Model Method with Extended Kalman Filters,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, FL), Apr. 1998.

[187] J. Layne and S. Weaver, “Stochastic Estimation Using a Continuum of Models,” in Proc. 2000 International Conf. on Information Fusion, (Paris, France), July 2000.

[188] C. C. Lefas, “Using Roll-Angle Measurement to Track Aircraft Maneuvers,” IEEE Trans. Aerospace and Electronic Systems, AES-20:672–681, Nov. 1984.

[189] C. T. Leondes, D. Sworder, and J. E. Boyd, “Multiple Model Methods in Path Following,” Journal of Mathematical Analysis and Applications, 251(2):609–623, Nov. 2000.

[190] X. R. Li, Hybrid State Estimation and Performance Prediction with Applications to Air Traffic Control and Detection Threshold Optimization. PhD thesis, University of Connecticut, 1992.

[191] X. R. Li, “Multiple-Model Estimation with Variable Structure: Some Theoretical Considerations,” in Proc. 33rd IEEE Conf. on Decision and Control, (Orlando, FL), pp. 1199–1204, Dec. 1994.


[192] X. R. Li, “Hybrid Estimation Techniques,” in Control and Dynamic Systems: Advances in Theory and Applications (C. T. Leondes, ed.), vol. 76, pp. 213–287, New York: Academic Press, 1996.

[193] X. R. Li, “Model-Set Sequence Conditioned Estimation in Multiple-Model Estimation with Variable Structure,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, Florida, USA), April 1998.

[194] X. R. Li, “Optimal Selection of Estimatee for Multiple-Model Estimation with Uncertain Parameters,” IEEE Trans. Aerospace and Electronic Systems, AES-34(2):653–657, Apr. 1998.

[195] X. R. Li, “Engineer’s Guide to Variable-Structure Multiple-Model Estimation for Tracking,” in Multitarget-Multisensor Tracking: Applications and Advances (Y. Bar-Shalom and W. D. Blair, eds.), vol. III, ch. 10, pp. 499–567, Boston, MA: Artech House, 2000.

[196] X. R. Li, “Multiple-Model Estimation with Variable Structure—Part II: Model-Set Adaptation,” IEEE Trans. Automatic Control, AC-45(11):2047–2060, Nov. 2000.

[197] X. R. Li, “Model-Set Design for Multiple-Model Estimation—Part I,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 26–33, July 2002.

[198] X. R. Li and Y. Bar-Shalom, “Mode-Set Adaptation in Multiple-Model Estimators for Hybrid Systems,” in Proc. 1992 American Control Conf., (Chicago, IL), pp. 1794–1799, June 1992.

[199] X. R. Li and Y. Bar-Shalom, “Design of an Interacting Multiple Model Algorithm for Air Traffic Control Tracking,” IEEE Trans. Control Systems Technology, 1(3):186–194, Sept. 1993. Special issue on Air Traffic Control.

[200] X. R. Li and Y. Bar-Shalom, “A Recursive Multiple Model Approach to Noise Identification,” IEEE Trans. Aerospace and Electronic Systems, AES-30(3):671–684, July 1994.

[201] X. R. Li and Y. Bar-Shalom, “Multiple-Model Estimation with Variable Structure,” IEEE Trans. Automatic Control, AC-41(4):478–493, Apr. 1996.

[202] X. R. Li and J. Dezert, “Layered Multiple-Model Algorithm with Application to Tracking Maneuvering and Bending Extended Target in Clutter,” in Proc. 1998 International Conf. on Information Fusion, (Las Vegas, NV), pp. 207–214, July 1998.

[203] X. R. Li and C. He, “Model-Set Choice for Multiple-Model Estimation,” in Proc. IFAC 14th World Congress, (Beijing, China), pp. 196–174, July 1999. Paper no. 3a-154.

[204] X. R. Li and C. He, “Model-Set Design, Choice, and Comparison for Multiple-Model Estimation,” in Proc. 1999 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3809, (Denver, CO, USA), pp. 501–513, July 1999.

[205] X. R. Li and V. P. Jilkov, “A Survey of Maneuvering Target Tracking—Part II: Ballistic Target Models,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 559–581, July-Aug. 2001.

[206] X. R. Li and V. P. Jilkov, “A Survey of Maneuvering Target Tracking—Part III: Measurement Models,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 423–446, July-Aug. 2001.

[207] X. R. Li and V. P. Jilkov, “Expected-Mode Augmentation for Multiple-Model Estimation,” in Proc. 2001 International Conf. on Information Fusion, (Montreal, QC, Canada), pp. WeB1.3–WeB1.10, Aug. 2001.

[208] X. R. Li and V. P. Jilkov, “A Survey of Maneuvering Target Tracking—Part IV: Decision-Based Methods,” in Proc. 2002 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4728, (Orlando, Florida, USA), April 2002.

[209] X. R. Li and V. P. Jilkov, “Survey of Maneuvering Target Tracking—Part I: Dynamic Models,” IEEE Trans. Aerospace and Electronic Systems, AES-39(4), Oct. 2003.

[210] X. R. Li, V. P. Jilkov, J.-F. Ru, and A. Bashi, “Expected-Mode Augmentation Algorithms for Variable-Structure Multiple-Model Estimation,” in Proc. IFAC 15th World Congress, (Barcelona, Spain), July 2002. Paper no. 2816.

[211] X. R. Li, B. J. Slocumb, and P. D. West, “Tracking in the Presence of Range Deception ECM and Clutter by Decomposition and Fusion,” in Proc. 1999 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3809, (Denver, CO, USA), pp. 198–210, July 1999.

[212] X. R. Li and T. Solanky, “Applications of Sequential Tests to Target Tracking by Multiple Models,” in Applications of Sequential Methodologies (N. Mukhopadhyay, S. Datta, and S. Chattopadhyay, eds.), pp. 219–247, New York: Marcel Dekker, 2003.

[213] X. R. Li and Y. M. Zhang, “Multiple-Model Estimation with Variable Structure—Part V: Likely-Model Set Algorithm,” IEEE Trans. Aerospace and Electronic Systems, AES-36(2):448–466, Apr. 2000.

[214] X. R. Li and Y. M. Zhang, “Numerically Robust Implementation of Multiple-Model Algorithms,” IEEE Trans. Aerospace and Electronic Systems, AES-36(1):266–278, Jan. 2000.

[215] X. R. Li, Y. M. Zhang, and X. R. Zhi, “Design and Evaluation of Model-Group Switching Algorithm for Multiple-Model Estimation with Variable Structure,” in Proc. 1997 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3163, (San Diego, CA), July 1997.

[216] X. R. Li, Y. M. Zhang, and X. R. Zhi, “Multiple-Model Estimation with Variable Structure: Model-Group Switching Algorithm,” in Proc. 36th IEEE Conf. on Decision and Control, (San Diego, CA), Dec. 1997.

[217] X. R. Li, Y. M. Zhang, and X. R. Zhi, “Multiple-Model Estimation with Variable Structure—Part IV: Design and Evaluation of Model-Group Switching Algorithm,” IEEE Trans. Aerospace and Electronic Systems, AES-35(1):242–254, Jan. 1999.

[218] X. R. Li and Z.-L. Zhao, “Measures of Performance for Evaluation of Estimators and Filters,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 530–541, July-August 2001.

[219] X. R. Li, Z.-L. Zhao, P. Zhang, and C. He, “Model-Set Design, Choice, and Comparison for Multiple-Model Approach to Hybrid Estimation,” in Proc. Workshop on Signal Processing, Communications, Chaos and Systems, (Newport, RI, USA), pp. 59–92, June 2002.

[220] X. R. Li, Z.-L. Zhao, P. Zhang, and C. He, “Model-Set Design for Multiple-Model Estimation—Part II: Examples,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 1347–1354, July 2002.

[221] X. R. Li, X. R. Zhi, and Y. M. Zhang, “Multiple-Model Estimation with Variable Structure—Part III: Model-Group Switching Algorithm,” IEEE Trans. Aerospace and Electronic Systems, AES-35(1):225–241, Jan. 1999.

[222] X. R. Li, Y. M. Zhu, J. Wang, and C. Z. Han, “Optimal Linear Estimation Fusion—Part I: Unified Fusion Rules,” IEEE Trans. Information Theory, IT-49(9):2192–2208, Sept. 2003.

[223] H.-J. Lin and D. P. Atherton, “An Investigation of the SFIMM Algorithm for Tracking Manoeuvring Targets,” in Proc. 32nd IEEE Conf. Decision and Control, (San Antonio, TX), pp. 930–935, Dec. 1993.

[224] R. H. Liu and Q. Zhang, “Nonlinear Filtering: A Hybrid Approximation Scheme,” IEEE Trans. Aerospace and Electronic Systems, 37(2):470–480, Apr. 2001.

[225] A. Logothetis and V. Krishnamurthy, “MAP State Sequence Estimation for Jump Markov Linear Systems via the Expectation-Maximization Algorithm,” in Proc. 36th IEEE Conf. on Decision and Control, (San Diego, CA), pp. 1700–1705, Dec. 1997.

[226] A. Logothetis and V. Krishnamurthy, “Expectation Maximization Algorithms for MAP Estimation of Jump Markov Linear Systems,” IEEE Trans. Signal Processing, 47(8):2139–2156, August 1999.

[227] A. Logothetis and V. Krishnamurthy, “A Bayesian EM Algorithm for Optimal Tracking of a Maneuvering Target in Clutter,” Signal Processing, 82(3):473–490, 2002.

[228] A. Logothetis, V. Krishnamurthy, and J. Holst, “On Maneuvering Target Tracking via the PMHT,” in Proc. 36th IEEE Conf. on Decision and Control, (San Diego, CA), pp. 5024–5029, Dec. 1997. Also in [315], pp. 157–162.

[229] D. G. Luenberger, Linear and Nonlinear Programming. Reading, Massachusetts: Addison-Wesley, 2nd ed., 1984.

[230] E. J. Lund, J. G. Balchen, and B. A. Foss, “Multiple Model Estimation with Inter-Residual Distance Feedback,” Modeling, Identification and Control, 13(3):127–140, 1992.


[231] M. F. Magalhaes and Z. Binder, “A True Multimodel Estimation Algorithm,” in Preprints of 10th World Congress of IFAC, vol. 10, (Munich), pp. 260–264, July 1987.

[232] D. T. Magill, “Optimal Adaptive Estimation of Sampled Stochastic Processes,” IEEE Trans. Automatic Control, AC-10:434–439, 1965.

[233] A. K. Mahalanabis, B. Zhou, and N. K. Bose, “Improved Multi-Target Tracking in Clutter by PDA Smoothing,” IEEE Trans. Aerospace and Electronic Systems, 26, 1990.

[234] D. P. Malladi and J. L. Speyer, “A New Approach to Multiple Model Adaptive Estimation,” in Proc. 1997 IEEE Conf. on Decision and Control, (San Diego, CA), pp. 3460–3467, 1997.

[235] D. P. Malladi and J. L. Speyer, “A Generalized Shiryaev Sequential Probability Ratio Test for Change Detection and Isolation,” IEEE Trans. Automatic Control, AC-44(8):1522–1534, 1999.

[236] C. J. Masreliez, “Approximate Non-Gaussian Filtering with Linear State and Observation Relations,” IEEE Trans. Automatic Control, AC-20:107–110, 1975.

[237] C. J. Masreliez and R. D. Martin, “Robust Bayesian Estimation for the Linear Model and Robustifying the Kalman Filter,” IEEE Trans. Automatic Control, AC-22:361–371, June 1977.

[238] V. J. Mathews and J. K. Tugnait, “Detection and Estimation with Fixed Lag for Abruptly Changing Systems,” IEEE Trans. Aerospace and Electronic Systems, AES-19(5):730–739, Sept. 1983.

[239] P. S. Maybeck, Stochastic Models, Estimation and Control, Vols. II, III. New York: Academic Press, 1982.

[240] P. S. Maybeck and P. D. Hanlon, “Performance Enhancement of a Multiple Model Adaptive Estimator,” in Proc. 32nd IEEE Conf. on Decision and Control, (San Antonio, TX), pp. 462–268, Dec. 1993. Also in IEEE Trans. Aerospace and Electronic Systems, Oct. 1995.

[241] P. S. Maybeck and K. P. Hentz, “Investigation of Moving-Bank Multiple Model Adaptive Algorithms,” AIAA J. Guidance, Control, and Dynamics, 10(1):90–96, Jan.-Feb. 1987.

[242] P. S. Maybeck and R. I. Suizu, “Adaptive Tracker Field-of-View Variation via Multiple Model Filtering,” IEEE Trans. Aerospace and Electronic Systems, AES-21:529–539, July 1985.

[243] E. Mazor, A. Averbuch, Y. Bar-Shalom, and J. Dayan, “Interacting Multiple Model Methods in Target Tracking: A Survey,” IEEE Trans. Aerospace and Electronic Systems, AES-34(1):103–123, 1998.

[244] S. McGinnity and G. W. Irwin, “Fuzzy Logic Approach to Manoeuvring Target Tracking,” IEE Proc.–Radar, Sonar, and Navigation, 145(6):337–341, Dec. 1998.

[245] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker, 1988.

[246] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1997.

[247] J. Meditch, Stochastic Linear Estimation and Control. McGraw-Hill, 1969.

[248] J. S. Meditch, “A Survey of Data Smoothing for Linear and Nonlinear Dynamic Systems,” Automatica, 9(3):151–162, Mar. 1973.

[249] D. E. Meer and P. S. Maybeck, “Multiple Model Adaptive Estimation for Space-Time Point Process Observations,” in Proc. 23rd IEEE Conf. Decision and Control, (Las Vegas, NV), pp. 811–818, Dec. 1984.

[250] R. K. Mehra, C. Rago, and S. Seereeram, “Failure Detection and Identification using a Nonlinear Interactive Multiple Model (IMM) Filtering Approach with Aerospace Applications,” in 11th IFAC Symp. on System Identification, (Fukuoka, Japan), July 1997.

[251] M. Meila and M. Jordan, “Markov Mixtures of Experts,” in Multiple Model Approach to Modelling and Control (R. Murray-Smith and T. A. Johansen, eds.), ch. 5, pp. 145–166, Taylor & Francis, 1997.

[252] X. L. Meng and D. V. Dyk, “The EM Algorithm – an Old Folk-Song Sung to a Fast New Tune,” J. R. Statist. Soc. B, 59(3):511–567, 1997.

[253] X. L. Meng and D. B. Rubin, “Maximum Likelihood Estimation via the ECM Algorithm: A General Framework,” Biometrika, 80:267–278, 1993.

[254] M. Miller, O. Drummond, and A. Perrella, “Multiple-Model Filters for Boost-to-Coast Transition of Theater Ballistic Missiles,” in Proceedings of SPIE Conference on Signal and Data Processing of Small Targets 1998, vol. SPIE 3373, pp. 355–376, 1998.

[255] P. Mookerjee, L. Campo, and Y. Bar-Shalom, “Estimation in Systems with Semi-Markov Switching Model,” in Proc. of the 26th Conf. on Decision and Control, December 1987.

[256] P. Mookerjee, L. Campo, and Y. Bar-Shalom, “Sojourn Time Distribution in a Class of Semi-Markov Chains,” in Proc. 1987 Conf. Inform. Sci. Syst., Johns Hopkins University, March 1987.

[257] R. L. Moose, “An Adaptive State Estimator Solution to the Maneuvering Target Problem,” IEEE Trans. Automatic Control, AC-20(3):359–362, June 1975.

[258] R. L. Moose, “Passive Range Estimation of an Underwater Maneuvering Target,” IEEE Trans. Acoustic, Speech, and Signal Processing, ASSP-35(3):274–285, March 1987.

[259] R. L. Moose and T. E. Dailey, “Adaptive Underwater Target Tracking Using Passive Multipath Time-Delay Measurements,” IEEE Trans. Acoustic, Speech, and Signal Processing, ASSP-33:777–787, Aug. 1985.

[260] R. L. Moose and P. M. Godiwala, “Passive Depth Tracking of Underwater Maneuvering Targets,” IEEE Trans. Acoustic, Speech, and Signal Processing, ASSP-33:1040–1044, Aug. 1985.

[261] R. L. Moose, M. Sistanizadeh, and G. Skagfjord, “Adaptive State Estimation for a System With Unknown Input and Measurement Bias,” IEEE Journal of Oceanic Engineering, pp. 222–227, Jan. 1987.

[262] R. L. Moose, M. K. Sistanizadeh, and G. Skagejord, “Adaptive Estimation for a System with Unknown Measurement Bias,” IEEE Trans. Aerospace and Electronic Systems, AES-22(6):732–739, Nov. 1986.

[263] R. L. Moose, H. F. VanLandingham, and D. H. McCabe, “Modeling and Estimation of Tracking Maneuvering Targets,” IEEE Trans. Aerospace and Electronic Systems, AES-15(3):448–456, May 1979.

[264] R. L. Moose and P. L. Wang, “An Adaptive Estimator with Learning for a Plant Containing Semi-Markov Switching Parameters,” IEEE Trans. Systems, Man, Cybernetics, SMC-3:277–281, May 1973.

[265] R. E. Mortensen, “Maximum Likelihood Recursive Nonlinear Filtering,” J. Optim. Theory Application, 2:386–394, 1968.

[266] D. Mosier and M. Sundareshan, “A Multiple Model for Passive Ranging,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 222–233, 2001.

[267] A. Munir and D. P. Atherton, “Maneuvering Target Tracking Using an Adaptive Interacting Multiple Model Algorithm,” in Proc. 1994 American Control Conf., (Baltimore, MD), June 1994.

[268] A. Munir and D. P. Atherton, “Adaptive Interacting Multiple Model Algorithm for Tracking a Manoeuvring Target,” IEE Proc.–Radar, Sonar, and Navigation, 142(1):11–17, Feb. 1995.

[269] R. Murray-Smith and T. A. Johansen, eds., Multiple Model Approaches to Modelling and Control. Taylor & Francis, 1997.

[270] W. H. L. Neven, H. A. P. Blom, and P. C. de Kraker, “Jump Linear Model Based Aircraft Trajectory Reconstruction,” in Proc. 1994 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2235, (Orlando, FL, USA), pp. 540–556, 1994.

[271] C. W. Ng, A. Lau, and K. Y. How, “Auto-Tuning Interactive Multiple Model,” in Proc. SPIE Conf. on Acquisition, Tracking, and Pointing, XII, (Orlando, FL), Apr. 1998.

[272] B. J. Noe and N. Collins, “Variable Structure Interacting Multiple Model Filter (VS-IMM) for Tracking Targets with Transportation Network Constraints,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, (Orlando, Florida, USA), pp. 247–258, April 2000.

[273] Y. Oshman, J. Shinar, and S. A. Weizman, “Using a Multiple Model Adaptive Estimator in a Random Evasion Missile/Aircraft Encounter,” AIAA Journal of Guidance, Control, and Dynamics, 24(6):1176–1186, 2001.


[274] M. W. Owen and S. C. Stubberud, “Interacting Multiple Model Tracking Using a Neural Extended Kalman Filter,” in Proc. International Joint Conference on Neural Networks, pp. 2788–2791, 1999.

[275] Q. Pan, Y. G. Jia, and H. G. Zhang, “A d-step Fixed-Lag Smoothing Algorithm for Markovian Switching Systems,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 721–726, July 2002.

[276] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 4th ed., 2002.

[277] K. R. Pattipati and N. R. Sandell Jr., “A Unified View of State Estimation in Switching Environments,” in Proc. 1983 American Control Conf., pp. 458–465, 1983.

[278] V. Petridis and A. Kehagias, “A Multi-Model Algorithm for Parameter Estimation of Time Varying Nonlinear Systems,” Automatica, 34:469–475, 1998.

[279] A. I. Petrov and A. G. Zubov, “On Applicability of the Interacting Multiple-Model Approach to State Estimation for Systems with Sojourn-Time Dependent Markov Model Switching,” IEEE Trans. Automatic Control, 41(1):136–140, Jan. 1996.

[280] G. Pulford and R. J. Evans, “A Survey of HMM Tracking with Emphasis on Over-The-Horizon Radar,” Tech. Rep. 7, CSSIP, Australia, May 1995.

[281] G. Pulford and B. La Scala, “Manoeuvring Target Tracking Using the Expectation-Maximisation Algorithm,” in 4th Int. Conf. On Control, Automation, Robotics & Vision, (Singapore), December 1996. Also in [315], pp. 295–299.

[282] G. Pulford and B. La Scala, “MAP Estimation of Target Manoeuvre Sequence with the Expectation-Maximisation Algorithm,” in Studies in Probabilistic Multi-Hypothesis Tracking and Related Topics, vol. SES-98-01, pp. 277–292, Newport, Rhode Island: Naval Undersea Warfare Center Division, Feb. 1998.

[283] G. Pulford and B. La Scala, “MAP Estimation of Target Manoeuvre Sequence with the Expectation-Maximisation Algorithm,” IEEE Trans. Aerospace and Electronic Systems, 38(2):367–377, April 2002.

[284] X. Qiao and B. Wang, “A New Approach to Grid Adaptation of AGIMM Algorithm,” in Proc. 2003 International Conf. on Information Fusion, (Cairns, Australia), pp. 400–405, July 8–11, 2003.

[285] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, 77(2):257–286, Feb. 1989.

[286] L. R. Rabiner and B. H. Juang, “An Introduction to Hidden Markov Models,” IEEE ASSP Magazine, pp. 4–16, Jan. 1986.

[287] H. E. Rauch, F. Tung, and C. T. Striebel, “Maximum Likelihood Estimation of Linear Dynamic Systems,” AIAA Journal, 3:1445–1450, Aug. 1965.

[288] R. A. Redner and H. F. Walker, “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Review, 26(2), Apr. 1984.

[289] D. B. Reid, “An Algorithm for Tracking Multiple Targets,” IEEE Trans. Automat. Control, AC-24:843–854, Dec. 1979.

[290] B. Ristic and M. S. Arulampalam, “Tracking a Manoeuvring Target Using Angle-Only Measurements: Algorithms and Performance,” Signal Processing, 83(6):1223–1238, 2003.

[291] B. L. Rozovskii, A. Petrov, and R. B. Blazek, “Interacting Banks of Bayesian Matched Filters,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, (Orlando, Florida, USA), April 2000.

[292] J.-F. Ru and X. R. Li, “Interacting Multiple-Model Algorithm with Maximum Likelihood Estimation for FDI,” in Proc. 2003 IEEE International Symp. Intelligent Control, (Houston, TX), Oct. 2003.

[293] Y. Ruan and P. Willett, “Maneuvering PMHTs,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 186–197, 2001.

[294] D. J. Salmond, “Mixture Reduction Algorithms for Uncertain Tracking,” Tech. Rep. 88004, Royal Aerospace Establishment, Farnborough, England, Jan. 1988.

[295] D. J. Salmond, “Mixture Reduction Algorithms for Target Tracking in Clutter,” in Proc. 1990 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 1305, pp. 434–445, 1990.

[296] G. J. Schiller and P. S. Maybeck, “Control of a Large Space Structure Using MMAE/MMAC Techniques,” IEEE Trans. Aerospace and Electronic Systems, AES-33(4):1122–1131, Oct. 1997.

[297] K. Schnepper, “A Comparison of GLR and Multiple Model Filters for a Target Tracking Problem,” in Proceedings of the 25th Conference on Decision and Control, (Athens, Greece), pp. 666–670, December 1986.

[298] R. Schutz, B. Engelberg, W. Soper, and R. Mottl, “IMM Modeling for AEW Applications,” in Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, (San Diego, CA, USA), pp. 210–221, 2001.

[299] E. Semerdjiev and L. Mihaylova, “Adaptive Interacting Multiple Model Algorithm for Manoeuvring Ship Tracking,” in Proc. 1998 International Conf. on Information Fusion, (Las Vegas, NV), pp. 974–979, July 1998.

[300] E. Semerdjiev, L. Mihaylova, and X. R. Li, “An Adaptive IMM Estimator for Aircraft Tracking,” in Proc. 1999 International Conf. on Information Fusion, (Sunnyvale, CA, USA), pp. 770–776, July 1999.

[301] E. Semerdjiev, L. Mihaylova, and X. R. Li, “Variable- and Fixed-Structure Augmented IMM Algorithm Using Coordinate Turn Model,” in Proc. 2000 International Conf. on Information Fusion, (Paris, France), pp. MoD2.25–MoD2.32, July 2000.

[302] E. Semerdjiev, L. Mihaylova, and T. Semerdjiev, “Manoeuvring Ship Model Identification and Interacting Multiple Model Tracking Algorithm Design,” in Proc. 1998 International Conf. on Information Fusion, (Las Vegas, NV), pp. 968–973, July 1998.

[303] R. L. Sengbush and D. G. Lainiotis, “Simplified Parameter Quantization Procedure for Adaptive Estimation,” IEEE Trans. Automatic Control, AC-14:424–425, Aug. 1969.

[304] G. Shafer, A Mathematical Theory of Evidence. Princeton Univ. Press, 1976.

[305] P. J. Shea, T. Zadra, D. Klamer, E. Frangione, and R. Brouillard, “Improved State Estimation Through Use of Roads in Ground Tracking,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, (Orlando, Florida, USA), pp. 321–332, April 2000.

[306] S. N. Sheldon and P. S. Maybeck, “An Optimizing Design Strategy for Multiple Model Adaptive Estimation and Control,” IEEE Trans. Automatic Control, AC-38(4):651–654, Apr. 1993.

[307] T. Shima, Y. Oshman, and J. Shinar, “Efficient Multiple Model Adaptive Estimation in Ballistic Missile Interception Scenarios,” AIAA Journal of Guidance, Control, and Dynamics, 25(4):667–675, 2002.

[308] H.-J. Shin, S.-M. Hong, and D.-H. Hong, “Adaptive-Update-Rate Target Tracking for Phased-Array Radar,” IEE Proc., G, 142(2), 1995.

[309] D. Simon and T. L. Chia, “Kalman Filtering with State Equality Constraints,” IEEE Trans. Aerospace and Electronic Systems, 38(1):128–136, Jan. 2002.

[310] F. L. Sims, D. G. Lainiotis, and D. T. Magill, “Recursive Algorithm for the Calculation of the Adaptive Kalman Filter Weighting Coefficients,” IEEE Trans. Automatic Control, AC-14:215–218, Apr. 1969.

[311] B. J. Slocumb, P. D. West, and X. R. Li, “Implementation and Analysis of the Decomposition-Fusion ECCM Technique,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, (Orlando, Florida, USA), pp. 486–497, April 2000.

[312] T. L. Song and D. G. Lee, “Effective Filtering of Target Glint,” IEEE Trans. Aerospace and Electronic Systems, 36(1):234–240, Jan. 2000.

[313] R. Streit and T. E. Luginbuhl, “A Probabilistic Multi-Hypothesis Tracking Algorithm without Enumeration and Pruning,” in Proc. 6th Joint Service Data Fusion Symposium, (Laurel, MD), pp. 1015–1024, June 14–18, 1993.

[314] R. Streit and T. E. Luginbuhl, “Maximum Likelihood Method for Probabilistic Multi-Hypothesis Tracking,” in Proc. 1994 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2335, Apr. 1994.

[315] R. L. Streit, ed., Studies in Probabilistic Multi-Hypothesis Tracking and Related Topics, vol. SES-98-01 of Scientific and Engineering Studies. Newport, Rhode Island: Naval Undersea Warfare Center Division, February 1998.

[316] S. Sugimoto and I. Ishizuka, “Identification and Estimation Algorithms for Markov Chain Plus AR Process,” in Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, ICASSP 83, pp. 247–250, 1983.


[317] E. Sviestins, “Multi-Radar Tracking for Theater Missile Defence,” in Proc. 1995 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2561, (San Diego, CA), pp. 384–394, July 1995.

[318] D. Sworder and J. Boyd, “Enhanced Multiple Model Algorithms,” Automatica, 2000.

[319] D. Sworder and J. Boyd, “Maneuver Sequence Identification,” in Proc. 2003 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 5204, (San Diego, CA, USA), Aug. 2003.

[320] D. Sworder, J. Boyd, and R. Elliott, “Modal Estimation in Hybrid Systems,” Journal of Mathematical Analysis and Applications, 245(1):225–247, 2000.

[321] D. D. Sworder and J. Boyd, Estimation Problems in Hybrid Systems. Cambridge University Press, 1999.

[322] D. D. Sworder and J. E. Boyd, “A New Merging Formula for Multiple Model Trackers,” in Proc. 2000 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4048, pp. 498–509, Apr. 2000.

[323] D. D. Sworder and J. E. Boyd, “Measurement Rate Reduction in Hybrid Systems,” AIAA Journal of Guidance, Control, and Dynamics, 24(2):411–414, 2001.

[324] D. D. Sworder and R. G. Hutchins, “Utility of Imaging Sensors in Tracking Systems,” Automatica, 29(2):445–449, March 1993.

[325] D. D. Sworder, P. F. Singer, and R. G. Hutchins, “Image-Enhanced Estimation Methods,” Proc. IEEE, 81(6):797–812, June 1993.

[326] G. Tanner, “Accounting for Glint In Target Tracking,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, FL, USA), April 1998.

[327] J. S. Thorp, “Optimal Tracking of Maneuvering Targets,” IEEE Trans. Aerospace and Electronic Systems, AES-9:512–519, July 1973.

[328] D. M. Titterington, A. F. M. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985.

[329] D. M. Tobin and P. S. Maybeck, “Enhancements to a Multiple Model Adaptive Estimator—Target Image Tracker,” IEEE Trans. Aerospace and Electronic Systems, AES-24:417–425, July 1988.

[330] J. K. Tugnait, “Adaptive Estimation and Identification for Discrete Systems with Markov Jump Parameters,” IEEE Trans. Automatic Control, AC-27(5):1054–1065, October 1982.

[331] J. K. Tugnait, “Detection and Estimation for Abruptly Changing Systems,” Automatica, 18(5):607–615, Sept. 1982.

[332] J. K. Tugnait, “Detection and Identification of Abrupt Changes in Linear Systems,” in Proc. of the 1983 American Control Conference, pp. 960–965, June 1983.

[333] J. K. Tugnait and A. H. Haddad, “A Detection-Estimation Scheme for State Estimation in Switching Environments,” Automatica, 15(4):477–481, July 1979.

[334] S. Tzafestas and K. Watanabe, “Techniques for Adaptive Estimation and Control of Discrete-Time Stochastic Systems with Abruptly Changing Systems,” in Advances in Control and Dynamic Systems (C. T. Leondes, ed.), vol. 55, pp. 111–148, Academic Press, 1993.

[335] P. Vacher, I. Barret, and M. Gauvrit, “Design of a Tracking Algorithm for an Advanced ATC System,” in Multitarget-Multisensor Tracking: Applications and Advances, vol. II (Y. Bar-Shalom, ed.), ch. 1, Norwood, MA: Artech House, 1992.

[336] H. F. VanLandingham and R. L. Moose, “Digital Control of High Performance Aircraft Using Adaptive Estimation Techniques,” IEEE Trans. Aerospace and Electronic Systems, AES-13(2):112–120, Mar. 1977.

[337] D. Varon, “New Advances in Air Traffic Control Tracking of Aircraft,” Journal of Air Traffic Control, pp. 6–12, Oct.-Dec. 1994.

[338] J. R. Vasquez and P. S. Maybeck, “Density Algorithm Based Moving-Bank MMAE,” in Proc. 1999 IEEE Conf. on Decision and Control, (Phoenix, AZ), pp. 4117–4122, Dec. 1999.

[339] J. R. Vasquez and P. S. Maybeck, “Enhanced Motion and Sizing of Bank in Moving-Bank MMAE,” in Proc. 1999 American Control Conf., (San Diego, CA), pp. 1555–1562, June 1999.

[340] H. Wang, T. Kirubarajan, and Y. Bar-Shalom, “Precision Large Scale Air Traffic Surveillance Using IMM/Assignment Estimators,” IEEE Trans. Aerospace and Electronic Systems, AES-35(1):255–266, Jan. 1999.

[341] X. Wang, S. Challa, R. Evans, and X. R. Li, “Minimal Sub-Model-Set Algorithm for Maneuvering Target Tracking,” IEEE Trans. Aerospace and Electronic Systems, AES-39(4), Oct. 2003.

[342] K. Watanabe, Adaptive Estimation and Control: Partitioning Approach. New York: Prentice Hall, 1992.

[343] K. Watanabe and S. G. Tzafestas, “A Hierarchical Multiple Model Adaptive Control of Discrete-time Stochastic Systems for Sensor and Actuator Uncertainties,” Automatica, 26(5):875–886, Sept. 1990.

[344] G. A. Watson, “IMAM Algorithm for Tracking Maneuvering Targets in Clutter,” in Proc. 1996 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2759, pp. 304–315, 1996.

[345] G. A. Watson and W. D. Blair, “Interacting Acceleration Compensation Algorithm for Tracking Maneuvering Targets,” IEEE Trans. Aerospace and Electronic Systems, 31(3):1152–1159, July 1995.

[346] G. A. Watson and W. D. Blair, “IMM Algorithm for Tracking Targets That Maneuver Through Coordinated Turns,” in Proc. of Signal and Data Processing for Small Targets, vol. SPIE 1698, pp. 236–247, April 1992.

[347] G. A. Watson and W. D. Blair, “Multiple Model Estimation for Control of Phased Array Radar,” in Proc. 1993 SPIE Conf. on Signal and Data Processing of Small Targets, vol. , (Orlando, FL), pp. 275–286, April 1993.

[348] G. A. Watson and W. D. Blair, “Tracking Targets with Multiple Sensors Using the Interacting Multiple Model Algorithm,” in Proc. 1993 SPIE Conf. on Signal and Data Processing of Small Targets, vol. , (Orlando, FL), April 1993.

[349] G. A. Watson and W. D. Blair, “Revisit Control of a Phased Array Radar for Tracking Maneuvering Targets when Supported by a Precision ESM Sensor,” in Proc. 1994 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2235, (Orlando, FL, USA), 1994.

[350] G. A. Watson and W. D. Blair, “Solution to Second Benchmark Problem for Tracking Maneuvering Targets in the Presence of FA and ECM,” in Proc. 1995 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2561, (San Diego, CA), July 1995.

[351] G. Watson and W. Blair, “IMM Algorithm for Solution to Benchmark Problem for Tracking Maneuvering Targets,” in Proc. Acquisition, Tracking and Pointing IX, vol. SPIE 2221, pp. 476–488, 1994.

[352] I. H. Whang and J. G. Lee, “Maneuvering Target Tracking via Model Transition Hypotheses,” in Proc. 35th IEEE Conf. on Decision and Control, (Japan), pp. 3157–3158, Dec. 1996.

[353] B. J. Wheaton and P. S. Maybeck, “Second-Order Acceleration Model for an MMAE Target Tracker,” IEEE Trans. Aerospace and Electronic Systems, 31(1):151–166, 1995.

[354] P. Willett, Y. Ruan, and R. Streit, “The PMHT for Maneuvering Target Tracking,” in Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, (Orlando, Florida, USA), pp. 416–427, 1998. Also in [315], pp. 165–176.

[355] P. Willett, Y. Ruan, and R. Streit, “PMHT: Problems and Some Solutions,” IEEE Trans. Aerospace and Electronic Systems, 38(3):738–753, July 2002.

[356] W.-R. Wu, “Target Tracking with Glint Noise,” IEEE Trans. Aerospace and Electronic Systems, 29(1):174–185, Jan. 1993.

[357] W.-R. Wu and D.-C. Chang, “Maneuvering Target Tracking with Colored Noise,” IEEE Trans. Aerospace and Electronic Systems, 32(4):1311–1319, Oct. 1996.

[358] W.-R. Wu and P.-P. Cheng, “A Nonlinear IMM Algorithm for Maneuvering Target Tracking,” IEEE Trans. Aerospace and Electronic Systems, 30(3):875–885, July 1994.

[359] C. Yang and Y. Bar-Shalom, “Discrete-Time Point Process Filter for Image-Based Target Mode Estimation,” in Proc. 29th IEEE Conf. Decision and Control, (Honolulu, HI), Dec. 1990.

[360] C. Yang, Y. Bar-Shalom, and C.-F. Lin, “Discrete-Time Point Process Filter for Mode Estimation,” IEEE Trans. Automatic Control, 37(11):1812–1816, 1992.


[361] M. Yeddanapudi, Y. Bar-Shalom, and K. R. Pattipati, “MATSurv: Multisensor Air Traffic Surveillance Data,” in Proc. 1995 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 2561, 1995.

[362] M. Yeddanapudi, Y. Bar-Shalom, and K. R. Pattipati, “IMM Estimation for Multitarget-Multisensor Air Traffic Surveillance,” Proc. IEEE, 85(1):80–94, Jan. 1997.

[363] J. Yoon, Y. H. Park, I. H. Whang, and J. H. Seo, “An Evidential Reasoning Approach to Maneuvering Target Tracking,” in Proc. AIAA Conf. Guidance, Navigation, and Control, (New Orleans, LA), Aug. 1997.

[364] Q. Zhang, “Hybrid Filtering for Linear Systems with Non-Gaussian Disturbances,” IEEE Trans. Automatic Control, 45(1):50–61, 2000.

[365] Y. M. Zhang and X. R. Li, “Detection and Diagnosis of Sensor and Actuator Failures Using IMM Estimator,” IEEE Trans. Aerospace and Electronic Systems, AES-34(4):1293–1312, Oct. 1998.

[366] Z.-L. Zhao, X. R. Li, and V. P. Jilkov, “Best Linear Unbiased Filtering with Nonlinear Measurements for Target Tracking,” in Proc. 2003 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 5204, (San Diego, CA, USA), Aug. 2003.

[367] Z.-L. Zhao, X. R. Li, V. P. Jilkov, and Y.-M. Zhu, “Optimal Linear Unbiased Filtering with Polar Measurements for Target Tracking,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 1527–1534, July 2002.

[368] D. Zuo, C. Han, S. Bian, L. Zheng, and H. Zhu, “Tracking Maneuvering Target in Glint Environment,” in Proc. 2003 International Conf. on Information Fusion, (Cairns, Australia), pp. 1394–1399, July 2003.

[369] D. Zuo, C. Han, Z. Lin, H. Zhu, and H. Hong, “Fuzzy Multiple Model Tracking Algorithm for Maneuvering Target,” in Proc. 2002 International Conf. on Information Fusion, (Annapolis, MD, USA), pp. 818–823, July 2002.


X. Rong Li (S’90-M’92-SM’95-F’04) received the B.S. and M.S. degrees from Zhejiang University, Hangzhou, Zhejiang, PRC, in 1982 and 1984, respectively, and the M.S. and Ph.D. degrees from the University of Connecticut, USA, in 1990 and 1992, respectively. He joined the Department of Electrical Engineering, University of New Orleans in 1994, where he is now University Research Professor, Department Chair, and Director of Information and Systems Laboratory. During 1986–1987 he did research on electric power at the University of Calgary, AB, Canada. He was an Assistant Professor at the University of Hartford, West Hartford, CT, from 1992 to 1994. He has authored or coauthored four books: Estimation and Tracking (with Yaakov Bar-Shalom, Norwood, MA: Artech House, 1993), Multitarget-Multisensor Tracking (with Yaakov Bar-Shalom, Storrs, CT: YBS Publishing, 1995), Probability, Random Signals, and Statistics (Boca Raton, FL: CRC Press, 1999), and Estimation with Applications to Tracking and Navigation (with Yaakov Bar-Shalom and T. Kirubarajan, New York: Wiley, 2001); six book chapters; and more than 160 journal and conference proceedings papers. His current research interests include signal and data processing, target tracking and information fusion, stochastic systems, statistical inference, and electric power.

Dr. Li has served the International Society of Information Fusion as President (2003), Vice President (1998–2002), and a member of the Board of Directors (since 1998); served as General Chair for the 2002 International Conference on Information Fusion, and Steering Chair or General Vice-Chair for the 1998, 1999, and 2000 International Conferences on Information Fusion; served IEEE Transactions on Aerospace and Electronic Systems as an Associate Editor from 1995 to 1996 and as Editor from 1996 to 2003; served Communications in Information and Systems as an Editor since 2001; and received a CAREER award and an RIA award from the U.S. National Science Foundation. He received the 1996 Early Career Award for Excellence in Research from the University of New Orleans and has given numerous seminars and short courses in the U.S., Europe, and Asia. He won several outstanding paper awards, is listed in Marquis’ Who’s Who in America and Who’s Who in Science and Engineering, and has consulted for several companies.


Vesselin P. Jilkov (M’01) received his B.S. and M.S. degrees in mathematics from the University of Sofia, Bulgaria, in 1982, the Ph.D. degree in the technical sciences in 1988, and the academic rank of senior research fellow of the Bulgarian Academy of Sciences in 1997.

He was a research scientist with the R&D Institute of Special Electronics, Sofia (1982–1988), where he was engaged in research and development of radar tracking systems. From 1989 to 1999 he was a research scientist with the Central Laboratory for Parallel Processing – Bulgarian Academy of Sciences, Sofia, where he worked as a key researcher in numerous academic and industry projects (Bulgarian and international) in the areas of Kalman filtering, target tracking, multisensor data fusion, and parallel processing. Since 1999 Dr. Jilkov has been with the Department of Electrical Engineering, University of New Orleans, where he is currently an assistant professor and is engaged in teaching and conducting research in the areas of hybrid estimation and target tracking. His current research interests include stochastic systems, nonlinear filtering, applied estimation, target tracking, and information fusion.

Dr. Jilkov is author/coauthor of over 50 journal articles and conference papers. He is a member of ISIF (International Society of Information Fusion).
