running linux on a dsp? exploiting the computational resources … · 2010. 12. 12. · •...

18
Running Linux on a DSP? Exploiting the Computational Resources of a programmable DSP Micro-Processor with uClinux Michael Durrant Jeff Dionne Michael Leslie 1 Introduction Many software developers in recent years have turned to Linux as their operating system of choice. Until the advent of uClinux, how- ever, developers of smaller embedded sys- tems, usually incorporating microprocessors with no MMU, (Memory Management Unit) could not take advantage of Linux in their designs. uClinux is a variant of mainstream Linux that runs on “MMU-less” processor ar- chitectures. Perhaps a DSP? If a general pur- pose DSP has enough of a useable instruction set why not! Component costs are of primary concern in embedded systems, which are typically re- quired to be small and inexpensive. Micropro- cessors with on-chip MMU hardware tend to be complex and expensive, and as such are not typically selected for small, simple embedded systems, which do not require them. Benefits Using Linux in devices, which require some in- telligence, is attractive for many reasons: • It is a mature, robust operating system • It already supports a large number of de- vices, filesystems, and networking proto- cols • Bug fixes and new features are constantly being added, tested and refined by a large community of programmers and users • It gives everyone from developers to end users complete visibility of the source code • A large number of applications (such as GNU software) exist which require little to no porting effort • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux may be configured in many ways other than that of the familiar UNIX-like Linux distribution. Nev- ertheless, an example of a system running uClinux in this way will help to illustrate how it may be used. Kernel / Root Filesystem The Arcturus uCdimm is a complete computer in an so-DIMM form factor, built around ei- ther a Motorola 68VZ328 “DragonBall” mi- crocontroller, the latest processor in a family widely popularized by the “Palm Pilot” or a Motorola CF5272 “ColdFire” microcontroller. It is equipped with 2M of flash memory, 8M of SDRAM, both synchronous and asynchronous serial ports, and an Ethernet controller. There

Upload: others

Post on 31-Dec-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Running Linux on a DSP?Exploiting the Computational Resources of a

programmable DSP Micro-Processor with uClinux

Michael Durrant Jeff Dionne Michael Leslie

1 Introduction

Many software developers in recent years haveturned to Linux as their operating system ofchoice. Until the advent of uClinux, how-ever, developers of smaller embedded sys-tems, usually incorporating microprocessorswith no MMU, (Memory Management Unit)could not take advantage of Linux in theirdesigns. uClinux is a variant of mainstreamLinux that runs on “MMU-less” processor ar-chitectures. Perhaps a DSP? If a general pur-pose DSP has enough of a useable instructionset why not!

Component costs are of primary concern inembedded systems, which are typically re-quired to be small and inexpensive. Micropro-cessors with on-chip MMU hardware tend tobe complex and expensive, and as such are nottypically selected for small, simple embeddedsystems, which do not require them.

Benefits

Using Linux in devices, which require some in-telligence, is attractive for many reasons:

• It is a mature, robust operating system

• It already supports a large number of de-vices, filesystems, and networking proto-cols

• Bug fixes and new features are constantlybeing added, tested and refined by a largecommunity of programmers and users

• It gives everyone from developers to endusers complete visibility of the sourcecode

• A large number of applications (such asGNU software) exist which require littleto no porting effort

• Linux’s very low cost

2 uClinux System Configurations

Embedded systems running uClinux may beconfigured in many ways other than that of thefamiliar UNIX-like Linux distribution. Nev-ertheless, an example of a system runninguClinux in this way will help to illustrate howit may be used.

Kernel / Root Filesystem

The Arcturus uCdimm is a complete computerin an so-DIMM form factor, built around ei-ther a Motorola 68VZ328 “DragonBall” mi-crocontroller, the latest processor in a familywidely popularized by the “Palm Pilot” or aMotorola CF5272 “ColdFire” microcontroller.It is equipped with 2M of flash memory, 8M ofSDRAM, both synchronous and asynchronousserial ports, and an Ethernet controller. There

Page 2: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 131

is a custom resident boot monitor on the de-vice, which is capable of downloading a bi-nary image to flash memory and executing it.The image that is downloaded consists of auClinux kernel and root filesystem. In UNIXterms, the kernel makes a block device out ofthe memory range where the root filesystem re-sides, and mounts this device as root. The rootfilesystem is in a read-only UNIX-like formatcalled “ROMFS”. Since the DragonBall runsat 32MHz, the kernel and optionally user pro-grams execute in-place in flash memory. Fastersystems like the MCF5272 ColdFire running at48 MHz or 66 MHz benefit from copying thekernel and root filesystem into RAM and exe-cuting there.

The Micro Signal Architecture MSA devel-oped by ADI (and Intel) is very interesting.While a DSP processor, it possesses enoughgeneral-purpose processor instruction supportto allow operating systems like uClinux to bedeployed. The ADI BlackFin is one such pro-cessor.

Other embedded systems may be inherentlynetwork-based, so a kernel in flash memorymight mount a root filesystem being servedvia nfs (Network File System). An even morenetwork-centric device might request its kernelimage and root filesystems via dhcp (DynamicHost Configuration Protocol) and bootp. Notethat drivers for things like IDE and SCSI disk,CD, and floppy support are all still present inthe uClinux kernel.

User Space

The contents of the root filesystem vary moredramatically between embedded systems usinguClinux than between Linux workstations.

The uClinux distribution contains a rootfilesystem which implements a small UNIX-like server, with a console on the serial port,

a telnet daemon, a web server, nfs (NetworkFile System) client support, and a selection ofcommon UNIX tools.

A system such as an mp3 (MPEG layer 3 com-pressed audio) CD player might not even havea console. The kernel might contain only sup-port for a CD drive, parallel I/O, and an audioDAC. User space might consist only of an in-terface program to drive buttons and LEDs, tocontrol the CD, and which could invoke oneother program; an MPEG audio player. Suchan application specific system would obviouslyrequire much less memory than the full-fledgeduClinux distribution as it is shipped.

3 Development Under uClinux

Development Tools

Developing software for uClinux systems typ-ically involves a cross-compiler toolchain builtfrom the familiar GNU compiler tools. Soft-ware that builds under gcc (GNU C Compiler)for x86 architectures, for example, often buildswithout modification on any uClinux target.

Debugging a target via gdb (GNU debugger)presents a debugging interface common to allthe platforms supported by gdb. The debug-ging interface to a uClinux kernel on a targetdepends on debugging support for that target.If the target processor has hardware support fordebugging, such as IEEE’s JTAG or Motorola’sBDM, gdb may connect non-intrusively to thetarget to debug the kernel. If the processorlacks such support, a gdb “stub” may be in-corporated into the kernel. gdb communicateswith the stub via a serial port, or via Ethernet.

uClibc

uClibc, the C library used in uClinux, is asmaller implementation than those which ship

Page 3: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 132

with most modern Linux distributions. The li-brary has been designed to provide most of thecalls that UNIX-like C programs will use. Ifan application requires a feature that is not im-plemented in uClibc, the feature may be addedto uClibc, it may me linked in as a separate li-brary, or it may be added to the application it-self.

Differences Between uClinux And Linux

Considering that the absence of MMU sup-port in uClinux constitutes a fundamental dif-ference from mainstream Linux, surprisinglylittle kernel and user space software is affected.Developers familiar with Linux will notice lit-tle difference working under uClinux. Embed-ded systems developers will already be familiarwith some of the issues peculiar to uClinux.

Two differences between mainstream Linuxand uClinux are a consequence of the removalof MMU support from uClinux. The lack ofboth memory protection and of a virtual mem-ory model are of importance to a developerworking in either kernel or user space. Certainsystem calls to the kernel are also affected.

Memory Protection

One consequence of operating without mem-ory protection is that an invalid pointer refer-ence by even an unprivileged process may trig-ger an address error, and potentially corrupt oreven shut down the system. Obviously coderunning on such a system must be programmedcarefully and tested diligently to ensure robust-ness and security.

Virtual Memory

There are three primary consequences of run-ning Linux without virtual memory. One isthat processes, which are loaded by the ker-nel, must be able to run independently of their

position in memory. One way to achieve thisis to “fix up” address references in a programonce it is loaded into RAM. The other is togenerate code that uses only relative address-ing (referred to as PIC, or Position IndependentCode). uClinux supports both of these meth-ods.

Another consequence is that memory alloca-tion and deallocation occurs within a flat mem-ory model. Very dynamic memory allocationcan result in fragmentation, which can starvethe system. One way to improve the robustnessof applications that perform dynamic memoryallocation is to replacemalloc() calls with re-quests from a preallocated buffer pool.

Since virtual memory is not used in uClinux,swapping pages in and out of memory is notimplemented, since it cannot be guaranteedthat the pages would be loaded to the same lo-cation in RAM. In embedded systems it is alsounlikely that it would be acceptable to suspendan application in order to use more RAM thanis physically available.

System Calls

The lack of memory management hardware onuClinux target processors has meant that somechanges needed to be made to the Linux systeminterface. Perhaps the greatest difference is theabsence of thefork() andbrk() system calls.

A call to fork() clones a process to create achild. Under Linux,fork() is implemented us-ing copy-on-write pages. Without an MMU,uClinux cannot completely and reliably clonea process, nor does it have access to copy-on-write.

uClinux implementsvfork() in order to com-pensate for the lack offork(). When a parentprocess callsvfork() to create a child, both pro-cesses share all their memory space includingthe stack.vfork()then suspends the parent’s ex-

Page 4: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 133

ecution until the child process either callsexit()or execve(). Note that multitasking is not oth-erwise affected. It does, however, mean thatolder-style network daemons that make exten-sive use offork() must be modified. Since childprocesses run in the same address space as theirparents, the behaviour of both processes mayrequire modification in particular situations.

Many modern programs rely on child pro-cesses to perform basic tasks, allowing the sys-tem to maintain an interactive “feel” even if theprocessing load is quite heavy. Such programsmay require substantial reworking to performthe same task under uClinux. If a key applica-tion depends heavily on such structuring, thenit may be necessary to either re-create the ap-plication, or an MMU-enabled processor mayalso be needed.

A hypothetical, simple network daemon,hyped, will illustrate the use offork(). hypedalways listens on a well-known network port(or socket) for connections from a networkclient. When the client connects,hypedgivesit new connection information (a new socketnumber) and callsfork(). The child processthen accepts the client’s reconnection to thenew socket, freeing the parent to listen for newconnections.

uClinux has neither an autogrow stack norbrk() and so user space programs must use themmap()command to allocate memory. Forconvenience, our C library implementsmal-loc() as a wrapper tommap(). There is acompile-time option to set the stack size of aprogram.

4 Brief Anatomy Of The uClinuxKernel

This section describes the changes that weremade to the Linux kernel to allow it to run on

MMU-less processors.

Architecture-Generic Kernel Changes

The architecture-generic memory managementsubsystem was modified to remove reliance onMMU hardware by providing basic memorymanagement functions within the kernel soft-ware itself. For those who are familiar withuClinux, this is the role of the directory/mm-nommuderived from and replacing the direc-tory /mm. Several subsystems needed to bemodified, added, removed, or rewritten. Kerneland user memory allocation and deallocationroutines had to be reimplemented. Support fortransparent swapping / paging was removed.Program loaders which support PIC (PositionIndependent Code) were added. A new binaryobject code format, named “flat” was created,which supports PIC and which has a very com-pact header. Other program loaders, such asthat for ELF, were modified to support otherformats which, instead of using PIC, use abso-lute references which it is the responsibility ofthe kernel to “fix up” at run time. Each methodhas advantages and disadvantages. TraditionalPIC is quick and compact but has a size re-striction on some architectures. For example,the 16-bit relative jump in Motorola 68k archi-tectures limits PIC programs to 32K. The run-time fix-up technique removes this size restric-tion, but incurs overhead when the program isloaded by the kernel.

Porting uClinux To New Platforms

The task of adding support for a new CPU ar-chitecture in uClinux is similar to doing so inLinux proper. Fortunately, there is a great dealof code in Linux that can be ported with mi-nor adaptations and reused in uClinux. Ma-chine dependent startup code and header filesalready exist in Linux for MMU versions ofprocessors in the ARM, Motorola 68k, MIPS,

Page 5: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 134

SPARC and other families. This code may beadapted to support non-MMU versions of theseprocessors in uClinux.

Driver code, which already exists in Linux, isoften easily portable to run under uClinux. Is-sues in porting such code may involve endianissues or memory handling code, which as-sumes the presence of MMU support.

5 The Future of uClinux

Numerous enhancements are in the works foruClinux. The diversity of the innovations thatmainstream Linux receives from the commu-nity pave a good path for the development ofuClinux. The uClinux developer community isvery active; enhancements and innovations arefrequently made.

Real-Time

Linux is now a platform for hard real-timeapplication development (that is, applicationswith deterministic latency under varying pro-cessor loads). The Linux kernel scheduleralready provides non-deterministic, or “soft”,real-time, and systems such as RT-Linux andRTAI (Real-Time Application Interface) up-grade the Linux kernel to provide hard (deter-ministic) real-time support. Real-time applica-tions in Linux have access to the extensive re-sources of the Linux kernel without sacrificinghard real-time performance. Efforts are under-way to provide the RTAI subsystem for use onvarious MMU-less processors.

Adding DSP support or adding general purposeprocessor support to a DSP

Processors like the ColdFire MCF5272 are pri-marily general purpose processors with a RSICinstruction set. Yet Motorola included limitedDSP functionality with a multiply and accu-

mulate DSP functions. This is an approachof adding functionality into general purposeprocessors. In this case the ColdFire proces-sor can and does have uClinux support. Otherprocessors like the Analog Devices BlackFin(MSA DSP), the processor is primarily a DSPwith added general purpose processor support.uClinux is portable to the BlackFin and ex-pected to be publicly available in the fall of2002.

The attached paper“Exploiting the Compu-tational Resources of a Programmable DSPMicro-processor (Micro Signal ArchitectureMSA) in the field of Multiple Target Tracking”(Hussain et al 2001), is a technical represen-tation of the computational requirements for amultiple target acquisition system. Such a sys-tem would require a DSP and would greatlybenefit from a UNIX like operating system af-forded by uClinux. Commercial uses for sucha system would include traditional sonar/radardevices allowing for affordable collision detec-tion systems, for robots, and automobiles.

uClinux 2.4

uClinux 2.4, with support for Motorola Drag-onBall and ColdFire, was released in Januaryof 2001. New ports, including MIPS, HitachiSH2, ARM, and SPARC, will be made to theuClinux 2.4 tree, which is based on Linux 2.4.but enhancements are also still being made tothe uClinux 2.0 tree. uClinux 2.4 will give de-velopers access to many of the new featuresadded to Linux since 2.0, including supportfor USB, IEEE Firewire, IrDA, and new net-working features such as bandwidth allocation,(a.k.a. QoS: Quality of Service) IP Tables, andIPv6.

Since uClinux is Open Source, developmenteffort spent on uClinux will never be lost. En-gineering professionals world-wide, are usinguClinux to create commercial products and a

Page 6: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 135

significant portion of their work is contributedback to the open source community.

**********

References:

“Running Linux on low cost, low power,MMU-less processors”, Michael Durrant, Arc-turus Networks Inc.

“Building Low Cost, Embedded, Network Ap-pliance with Linux”, Greg Ungerer, SnapGearInc.

“Embedded Coldfire-Taking Linux On-Board”, Nigel Dick, Motorola Ltd

“When hard real-time goes soft”, D. JeffDionne, Arcturus Networks Inc.

Real-Time Application Interface (RTAI):http://www.rtai.org

The uClinux project:http://www.uClinux.org

Page 7: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 136

EXPLOITING THE COMPUTATIONAL RESOURCES OF A PROGRAMMABLE DSPMICRO-PROCESSOR (Micro Signal Architecture MSA) IN THE FIELD OF MULTIPLE

TARGET TRACKING

Principal Researcher: Dr. D.M. Akbar HussainContributions: Michael Durrant [email protected]

Jeff Dionne [email protected]

Arcturus Networks Inc. 195 The West Mall, Suite 608, M9C 5K1, Toronto CANADATel: 416 621 0125 Fax: 416 621 0190

Abstract: During the last few decades the im-proved technology available for surveillancesystems has generated a great deal of interestin algorithms capable of tracking large num-ber of objects.. Typical sensor systems, suchas radar or sonar using information from oneor more sensors can obtain noisy informationdata returns from true targets and other pos-sible objects. The tracking problem requiresthe processing of this data to produce accurateestimates of the position and velocity of thetargets. There are two types of uncertaintiesinvolved with the measurement data, first theposition inaccuracy, as the measurements arecorrupted by noise, and second the measure-ment origin since there may be uncertainty asto which measurement originates from whichtarget. These uncertainties lead to a data asso-ciation problem and the tracking performancedepends not only on the measurement noise butalso upon the uncertainty in the measurementorigin. Therefore, in a multiple target environ-ment extensive computation may be requiredto establish the correspondence between mea-surements and tracks every radar scan.

It is also true that tremendous advancement hasalso been made in the computational capabili-ties of a processing unit to deal with such de-manding tasks. In this paper we present a sim-ulated study of implementing a recursive mul-

tiple target tracking (MTT) algorithm using atrack splitting filter, study uses a MSA proces-sor from Analog Devices Inc. (ADI) which is aprogrammable Digital Signal Processor (DSP)with the added functionality to realize manyof the programming advancements more nor-mally associated with Micro-controllers. In ad-dition, the study also explores the porting andsupport of an embedded real time operatingsystem to such an architecture.

Keywords: DSP, RTAI, Kalman Filter, TargetTracking, State Estimation, uClinux.

1. INTRODUCTION

In the ideal situation of tracking a single tar-get, where one noisy measurement is obtained,standard Kalman filter technique can be used ateach radar scan In the multi-target case, an un-known number of measurements are receivedat each radar scan and, assuming no false mea-surements, each measurement has to be associ-ated with an existing or new target tracking fil-ter. When the targets are well apart from eachother forming a measurement prediction ellipsearound a track to associate the measurementwith the appropriate track is a standard tech-nique [1]. When targets are near to each other,more than one measurement may fall withinthe prediction ellipse of a filter and prediction

Page 8: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 137

ellipses of different filters may interact. Thenumber of measurements accepted by a filterwill therefore be quite sensitive in this situationto the accuracy of the prediction ellipse. Sev-eral approaches may be used for this situation[2, 3]. One such approach is called the TrackSplitting Filter algorithm. In this algorithm, ifn measurements occur inside a prediction el-lipse, then the filter branches or splits intontracking filters.

This situation, which results in an increasednumber of filters, requires more processingpower and in some cases the system may satu-rate. A mechanism for restricting excess trackssplitting is required, since eventually this pro-cess may result in more than one filter trackingthe same target. The first criterion is the sup-port function, which uses the likelihood func-tion of a track as the pruning criterion. Thesecond, similarity criterion, which uses a dis-tance threshold to prune similar filter trackingthe same target [4]. The flow chart shown inFig. 1 depicts the actual processing sequenceof a recursive MTT algorithm.

2. TARGET MOTION MODEL ANDSTATE ESTIMATION

The motion of a target being tracked is as-sumed to be approximately linear and modeledby the following equations

xn+1 = Φxn + Γωn (1)

zn+1 = Hxn+1 + νn+1 (2)

where the state vector

xTn+1 = (x x• y y•)n+1 (3)

is a four dimensional vector,ωn is the two di-mensional disturbance vector,zn+1 is the twodimensional measurement vector andνn+1 is

the two dimensional measurement error vec-tor. AlsoΦ is the assumed (4x4) state transitionmatrix,Γ is the excitation matrix (4x2) and H isthe measurement matrix (2x4) and are definedrespectively by,

Φ =

1 ∆t 0 00 1 0 00 0 1 ∆t0 0 0 1

(4)

Γ =

∆t2/2 0

∆t 00 ∆t2/20 ∆t

(5)

H =

[1 0 0 00 0 1 0

](6)

Here ∆t is the sampling interval and corre-sponds to the time interval (scan interval) as-sumed constant, at which radar measurementdata is received. The system noise sequenceωn is a two dimensional Gaussian sequence forwhich

E(ωn) = 0 (7)

where E is the expectation operator. The co-variance ofωn is

E(ωnωTn ) = Qnδnm (8)

whereQn is a positive semi-definite (2x2) di-agonal matrix andδnm is the Kronecker deltadefined as

δnm =0 n 6= m1 n = m

The measurement noise sequenceνn is a twodimensional zero mean Gaussian white se-quence with a covariance of

Page 9: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 138

E(νn νTn) = Rnδnm (9)

whereRn is a positive semi-definite symmetric(2x2) matrix given by

Rn =

[σ2

x σxy

σxy σ2y

](10)

σ2x andσ2

y are the variances in the error of thex, y position measurements, andσxy is the co-variance between the x and y measurements er-rors. It is assumed that the measurement noisesequence and the system noise sequence are in-dependent of each other, that is

E(νn ωTn) = 0 (11)

The initial statex0 is also assumed independentof theνn andωn sequences that is

E(x0 ωT n) = 0 (12)

E(x0 νT n) = 0 (13)

x0 is a four dimensional random vector withmeanE(x0) = x0/0 and a (4x4) positive semi-definite covariance matrix defined by

P0 = E[(x0 − x−0)(x0 − x−

0)T] (14)

wherex−0 is the mean of the initial statex0.

The Kalman filter is an optimal filter as it min-imizes the mean squared error between the esti-mated state and the true state (actual) providedthe target dynamics are correctly modeled.

The standard Kalman filter equations for esti-mating the position and velocity of the targetmotion described by eqns. (1) and (2) are

Figure 1: Recursive MTT

x∧n+1/n = Φx∧

n (15)

x∧n+1 = x∧

n+1/n + Kn+1 νn+1 (16)

Kn+1 = Pn+1/nHTB−1

n+1 (17)

Pn+1/n = ΦPnΦT + ΓQFnΓT (18)

Bn+1 = Rn+1 + HPn+1/nHT (19)

Page 10: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 139

Pn+1 = (I−Kn+1H)Pn+1/n (20)

νn+1 = zn+1 −Hx∧n+1/n (21)

wherex∧n+1/n, x∧

n+1, Kn+1, Pn+1/n, Bn+1, andPn+1 are the predicted state, estimated state,the Kalman gain matrix, the prediction covari-ance matrix, the covariance matrix of innova-tion, and the covariance matrix of estimationrespectively.QF

n is the covariance of the mea-surement noise assumed by the filter which isnormally taken equal toQn. In a practical sit-uation, however, the value of this covarianceis not known so the choice should be such thatthe filter can adequately track any possible mo-tion of the target. To start the computation aninitial value is chosen forP0. Even if this isa diagonal matrix, then clearly from the aboveequations the covariance matricesBn+1, Pn+1

andPn+1/n for a given n, do not remain diago-nal whenRn+1 is not diagonal.

3. MSA

The MSA processor has five independent,full functional computation units: TwoArithmetic/Logic Units (ALU), Two Multi-plier/Accumulator Units and a Barrel Shifter,Fig. 2 shows the MSA. The processor unitscan process 8-bit, 16 bit, 32 bit and 40 bit data,depending upon the type of function being per-formed.

• Data Address Generator (DAG)

The Micro Signal Architecture processor baseis a dual data path, modified Harvard Architec-ture. The two DAG support the sophisticatedoperations required in DSP algorithms, suchas reversed addressing, circular buffering. Inaddition, auto increment, auto decrement andbase plus immediate offset addressing are pos-sible. These units update dedicated register

files, the DAG register file and the pointer reg-ister file. In general, DSP mathematical opera-tions involving circular buffers uses the DAGregister file. The DAG register file containsfour sets of 32 bit Index, Length, Modify andBase registers, for a total of sixteen 32 bit regis-ters. With these two independent DAGs, MSAcan generate two 32 bit addresses in a singlecycle, fetching or storing two 32 bit or four 16bit operands. The pointer register file is usedfor more general operations. It has six gen-eral purpose 32 bit addressing registers and twodedicated 32 bit Stack Pointers for stack ma-nipulation. The Micro Signal Architecture pro-cessor can access a unified 4 GB linear addressspace.

Figure 2: MSA

• Memory Management Unit (MMU)

The memory Management unit (MMU) pro-vides protection to individual tasks that maybe operating on the Micro Signal Architec-ture Processor and may protect system Mem-ory Mapped Registers from unintended access.The architecture includes two Memory Man-agement Units: one for instruction memoryand the other one for data memory. TheseMMUs control accesses to caches, on chip

Page 11: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 140

SRAM and off chip memories. The Micro Sig-nal Architecture Processor supports multiplepages of memory, in four page sizes: 1 K byte,4 K byte, 1 M byte, 4 M byte. The instructionMMU may designate as many as sixteen dis-tinct memory pages, each with a separate setof criteria governing its cache and protectionproperties. The Data MMU may designate afurther sixteen memory pages.

The Micro Signal Architecture Processor usesa modified Harvard Architecture in combi-nation with a hierarchical memory structure.Memory closest to the Core are referred to aslevel 1 (L1), and generally have a single cycle,zero latency access by the core. Other mem-ories on chip are referred to as level 2 (L2)and may have multiple-cycle access or latency.Off-chip memories are usually seen at the samehierarchical level as the on chip L2 memory. Atthe L1 level, the instruction memory holds in-structions only and the two data memories holdonly data. At L2 level a single unified mem-ory space exists, holding both instructions anddata. Also L1 instruction memory and L1 datamemories may be configured as either SRAMor caches. The Micro Signal Architecture Pro-cessor has a dedicated scratch data memorythat is always configured as an additional L1data SRAM. Scratchpad memory is designedto store stack and local variables.

• Program Sequencer

The program sequencer is used to get instruc-tions from the L1 instruction memory and de-termine if these instructions are 16 bit ‘Con-trol’ instructions, 32 bit ‘DSP’ instructions or64 bit ‘DSP multi-function’ instruction. Thesequencer manages the control of data throughthe processor core, insuring that the pipe line isfully interlocked and that zero overhead loop-ing is correctly managed.

• Event Controller, Timer and JTAG Inter-face

The event controller supports nested and priori-tized events. The controller has five basic typesof events: Emulation, Reset, Non-maskable In-terrupt (NMI), Exception and Interrupts. Theprogrammable interval timer is used to gener-ate periodic interrupts. The 8-bit pre-scale reg-ister can be used to set the number of cyclesfor decrementing a 32 bit counter register. Thenumber of clock cycles per timer decrementmay be one to 256. An interrupt is generatedwhen this count register reaches zero. The reg-ister may be automatically reloaded from a 16bit period register and the count resumed.

The JTAG interface provides the method bywhich the Micro Signal Architecture emula-tions interact with the processor core. The em-ulation unit contains an instruction register anddata register that is accessed through the JTAGport. These two registers are used to controland interrogate the processor during emulationmode. Additionally the trace unit can be usedto store the last 16 non-sequential PC values,which can be used to reconstruct the processorsequence.

• Performance Monitor Unit (PMU)

During the operation of the Micro Signal Ar-chitecture Processor, the performance moni-tor unit can be used to review the efficiencyof certain operations, e.g., cache misses, andprovide the information for code optimization.The performance unit consists of six instruc-tion address watch points and two data addresswatch points. These address watch points maybe combined in pairs to create address rangewatch points, which additionally may be asso-ciated with various counters for evaluating theperformance and profiling the processor code.

Page 12: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 141

• Instruction Set

The Micro Signal Architecture Processor as-sembly level instruction set employs an alge-braic syntax, presenting code to the program-mer that is very readable even at the assemblylevel. The instructions have been specificallytuned to provide a very flexible, yet denselyencoded instruction set that will compile to avery small final memory size. The instructionset also provides fully featured multi-functioninstructions that allow the programmer to useany of the Micro Signal Architecture Processorcore resources in a single instruction. Coupledwith a variety of enhanced features more oftenseen on micro-controllers, this instruction set isvery efficient when compiling code for C andC++ languages.

• Modes of Operation

The Micro Signal Architecture Processor hasfive distinct modes of operations: User, Su-pervisor, Emulation, Idle and Reset. Usermode has restricted access to certain systemresources, thus providing a protected soft-ware environment. Supervisor and Emula-tion modes have unrestricted access to core re-sources. Idle and Reset modes prevent soft-ware execution, so resource access is not anissue.

4. DSP PROGRAMMING MODEL

The instruction set for the Micro Signal Archi-tecture processor provides two types of instruc-tions: one primarily for micro-controller andgeneral tasks, and those used for DSP orientedcomputation. Specific Micro Signal Architec-ture instructions are tuned for their correspond-ing task, but instructions can easily be com-bined. Table 1 shows the resources available tothe DSP processing. DSP instructions usuallyread two 32 bit operands from the register file,

compute results and either store results back tothe register file or accumulate them in the twoaccumulators as shown in Fig. 3.

Resources Description

Data Exe-cution Unit0

• 16 x 16 bit MAC unit (MAC0)• 40 bit accumulator (a0)• 40 bit shifter

Data Exe-cution Unit1

• 16 x 16 bit MAC unit (MAC1)• 40 bit accumulator (a1)

Data Regis-ter File

8, 32 bit wide, accessible asregister halves

Table 1: Execution Units and Register File

Figure 3: Register Files and Execution Unit

The register file delivers two 32 bit operandsto MAC units and accepts two 32 bit resultsin return. In addition, the register file deliverstwo 32 bit values to the memory system or itreceives two 32 bit values from memory. Toperform multiplication, each MAC unit selecttwo 16 bit operands from the two 32 bit wordsthat it receives from the register file as shownin Fig. 4.

It also means that each MAC unit usesfour possible combinations of input operands.Therefore, when both MAC units operate inparallel, sixteen possible combinations of 16

Page 13: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 142

Figure 4: Operand Selection

bit input operand results. Multiply and accu-mulate operations can be performed on fourcombinations of input operands, as shown inFig. 5, for MAC 0. Assembly instruction ex-ample of dual MAC operation.

A0 += R1.H*R2.L, A0+=R1.L*R2.L;

In addition to performing a multiply accumu-late, the multiplier results may optionally bewritten to the register file, independently ofeach other. In addition to multiply accumulate,in which the contents of the accumulator areeffected, MSA also has multiply instructionswhich does not effect the contents of the ac-cumulator.

The ALUs have a different mechanism thanMACs, ALU 0 as the primary source of arith-metic and logic operation, it can perform thefollowing operations:

• 32 bit operation on two 32 bit inputs pro-ducing one output.

• One 16 bit operation on two 16 bit inputsresiding in arbitrary register halves pro-ducing one 16 bit output.

• Two 16 bit (dual) operation on four 16 bitinputs (in two registers). Dual 16 bit ALUoperations can perform four combinations

Figure 5: MAC 0 Possible Choices

of addition and subtraction on the inputoperands as shown in Fig. 6.

Figure 6: Dual 16-bit ALU Operations

The result from the upper halves are placed intothe upper half of the destination register andlower halves to the lower half of the destina-tion register. It also supports the cross option,in which case the order of the two 16 bit resultis inverted. In addition to the operations sup-ported on the primary computation unit ALU

Page 14: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 143

0, a small number of instructions support thesecondary arithmetic unit ALU 1 in parallelwith ALU 0. The Micro Signal Architecturecore does not support full dual ALU function-ality, because the maximum number of inputoperands that may be transferred from the reg-ister file to the execution units is limited to two32 bit operands. Therefore, parallel ALU oper-ations on ALU 0 and ALU 1 can be performedonly on the same two 32 bit words.

5. IMPLEMENTATION

For the implementation of the target trackingalgorithm employing a track splitting filter, asensor was simulated in two dimensions togenerate data for different scenarios. The sim-ulating program is capable of generating up to20 targets in a predefined scan window, twoversions of the program can be used: one, inwhich initial position, heading, noise and otherparameters are taken as default, in the secondcase all this data can be entered by the pro-grammer. The generated data basically pro-vides the following input information corre-sponding to each individual target.

x y σ2x σ2

y σxy Flag CID

Where x, y is the noisy position measurements,σ2

x, σ2y andσxy are the measurement noise co-

variance values, Flag is an index used to see ifa measurement has been used by the filter andCID is another index color ID used for iden-tification of each tracking filter/measurement.Basically, first five variables will be availablein real time to the tracking algorithm throughan interface, in an actual implementation. Herethe input data is given to the algorithm throughhand coded assembly instructions.

6. RESULT AND CONCLUSION

For the evaluation of our algorithm, a simula-tor for MSA hardware architecture was writ-ten as the actual silicon for MSA is expected

in September, 2001. For our evaluation a twocrossing target scenario was used, the sensorwas moving parallel to the target, towards thetargets at a very high speed compared with thetargets. The two targets at start up were ap-proximately 30 Km from the sensor and crosseach other at 30th scan. After successful initial-ization of two targets, tracking proceeded witheach tracking filter accepting only one mea-surement as they are well apart. Track split-ting (branching) starts at the 24th scan, maxi-mum number of tracks occur at 31st scan. At38th scan all the redundant tracks are prunedand normal tracking of two targets resume. Itshould be noted here that lots of detail abouttracking is not revealed [5][6] as the actual em-phasis is about utilizing the architecture advan-tages of the processor.

The main concern was to implement the com-putationally expensive algorithm in such a wayto take the advantages of the DSP instructionset for MSA also, one of the reason of selectingBlackfin was its DSP assembly language alge-braic syntax which is probably ideal for suchan application. As pointed out earlier, the algo-rithm is hand coded in assembly in order to ex-ploit the DSP. Where ever necessary C is usedfor the actual implementation. As the archi-tecture supports richly encoded instruction setfor such a demanding task, in some cases mul-tiple operations are performed with less num-ber of cycles, this obviously improves real timeprocessing. Here, we just present one particu-lar sequence where the implementation helpsin achieving our mission of exploiting its ar-chitecture. At each radar scan incoming mea-surements (returns) are to be tested with eachestablished track (path) for a possible match.The following computation sequence that in-cludes multiplication of three matrices of di-mensions [(1x2)*(2x2)*(2x1)] is to be calcu-lated and then compared with a known value tofind out if a possible match exits. Therefore,each track should accept the true measurement

Page 15: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 144

from the same target, although this situationchanges at the time of multiple measurementsexisting in the same vicinity. In our case whenthe target cross each other. Now to evaluatethis expression

d2n = νn

TB−1nνn

supposeνTn of dimension (1x2) is stored in data

register R0 such that:

R0.L = νnT(0, 0) R0.H = νn

T(0, 1)

(0, 0) represents first row, first column elementof the matrix and so on. SimilarlyB−1

n is storedin two registers as

R2.L = B−1n (0, 0) R2.H = B−1

n (0, 1)

R3.L = B−1n (1, 0) R3.H = B−1

n (1, 1)

By using MSA instruction set we can performthis task in just four steps:

A0 = R0.L ∗ R2.LA1 = R0.L ∗ R2.L

](Parallel Op)

R4.L = A0+ = R0.H ∗ R3.LR4.H = A1+ = R0.H ∗ R3.H

](Parallel Op)

A0 = R4.L ∗ R0.LA1 = R4.H ∗ R0.H

](Parallel Op)

d2n = A0 + A1

The above calculation is just a replica ofcalculations performed repeatedly in a recur-sive filter, which basically suite such DSParchitecture. Because DSP processors arecharacterized by tight code loop and in-factdata flow driven. Therefore, this type ofloop/calculations can be implemented veryefficiently. Actually, the number of timesd2

n is evaluated to find a probable track-measurement pair increases exponentially withincreasing number of track splitting. If it ispossible to do such calculations quickly the ef-ficiency will increase in terms of time. Thesimulated study carried out here has logicallydetermined the advantage of using MSA corefor such an application. The simulated studyafter running a number of scenarios and carefulevaluation reveals that MSA may provide 20 to30% improved performance in processing forsuch a computationally intensive application.Although, no bench mark evaluation criterionis used for such evaluation.

The selection of an operating system suitablefor implementation in a target tracking sys-tem presents many challenges. The main chal-lenge is selecting a processor and complimen-tary operating system able to handle the com-putationally expensive load. The combinationof Blackfin architecture and Linux operatingsystem meets this challenge with Linux Ker-nel. The second being the ability of the op-erating system to respond in a deterministicmanner. This can be achieved by operatingsystem that are designed to operate in RealTime. Linux was selected in this study asits characteristic performance achieves close toa predictable real time response under knownloads. However, Linux on its own is not a suit-able Real Time Operating System (RTOS) andsome characterize Linux’s response time as“Soft Real Time”, the observed jitter is largerthan the jitter associated with running Linuxunder the control of a real time scheduler suchas found in “Real Time Executive in C for DSP

Page 16: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Ottawa Linux Symposium 2002 145

(RTXCDSP)” or the Real Time Application In-terface (RTAI) [11].

Recently, efforts are underway to provide theRTAI subsystems for use on various MMU-lessprocessors. If one is to develop an embeddedsystem using such a DSP (MSA) for real timeapplications, uClinux with uClibc may providean excellent platform for such real time targettracking implementation. The work is alreadyunderway for porting uClinux and uClibc forthis processor (MSA).

7. REFERENCES

[1] P. L. Smith and G. Buechler, “A branchingalgorithm for discriminating and trackingmultiple objects,” IEEE Transactions Au-tomatic Control, Vol. AC-20, February1975, pp 101-104.

[2] Y. Bar-Shalom and T. E. Fortmann,“Tracking and Data Association”, Aca-demic Press, Inc. 1988.

[3] S. S. Blackman, “ Multiple Target Track-ing with Radar Applications”, ArtechHouse, Inc. 1986.

[4] D. P. Atherton, E. Gul, A. Kountzeris andM. Kharbouch, “Tracking Multiple Tar-gets using parallel processing,” Proc. IEE,Part D, No. 4, July 1990, pp 225-234.

[5] D. P. Atherton, D. M. A. Hussain andE. Gul, “Target tracking using transputersas parallel processors,” 9th IFAC Sympo-sium on Identification and System Param-eter Estimation, Budapest, Hungary July1991.

[6] M. Kharbouch, “Some investigations ontarget tracking,” D. Phil thesis, SussexUniversity, 1991.

[7] The Carmel DSP core user’s manual, infi-neon, 1999.

[8] Clifford Liem, Pierre Paulin, Ahmed Jer-raya, “Address calculation for retarget-able compilation and exploration of in-struction set architecture.”

[9] The scientist and engineer’s guide to Dig-ital Signal Processing.

[10] 1999, DSP architecture directory.

[11] Michael Durrant, Michael Leslie, UsingLinux for MMU-less micro-processors.Electronics Engineering UK, Feb. 2001.

8. ACKNOWLEDGEMENTS

The author would like to thank Arcturus Net-works Inc. and Lineo Canada Corp for theirsupport in every aspect for doing such a study,and special thanks to Analog Devices Inc., forproviding all the technical details, documentsetc., about MSA which really encouraged theauthor to carry out such investigation. Theauthor also likes to extend his appreciationand gratitude to his colleagues, having beenvery kind to extend their comments at difficulttimes. Also there is lot of literature available onthe Internet about DSP, few of the referencesare given here for the acknowledgement of thiswonderful technology feast, thanks Internet.

Page 17: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Proceedings of theOttawa Linux Symposium

June 26th–29th, 2002Ottawa, Ontario

Canada

Page 18: Running Linux on a DSP? Exploiting the Computational Resources … · 2010. 12. 12. · • Linux’s very low cost 2 uClinux System Configurations Embedded systems running uClinux

Conference Organizers

Andrew J. Hutton,Steamballoon, Inc.Stephanie Donovan,Linux SymposiumC. Craig Ross,Linux Symposium

Proceedings Formatting Team

John W. Lockhart,Wild Open Source, Inc.

Authors retain copyright to all submitted papers, but have granted unlimited redistribution rightsto all as a condition of submission.