Use a Microprocessor, a DSP, or Both?

Upload: nqdinh

Post on 30-May-2018


  • 8/14/2019 Use a Microprocessor, A DSP, Or Both

    1/11

    Workshop ESC-304: Use a Microprocessor, a DSP, or Both?

    Bjorn Hori, Jeff Bier, and Jennifer Eyre

    Berkeley Design Technology, Inc.

    Introduction

    The number and variety of products that include some form of digital signal processing has grown dramatically over the last ten years. Digital signal processing has become a key component in many consumer, communications, medical, and industrial products. These products use a variety of hardware approaches to implement the required signal processing, ranging from off-the-shelf general-purpose microprocessors (GPPs) to application-specific integrated circuits (ASICs) to fixed-function chips. Off-the-shelf programmable processors are often the preferred choice because they avoid the need for designing a custom chip and because they can be reprogrammed in the field, allowing product upgrades or fixes. They are often more cost-effective (and less risky) than custom chips, particularly for low- and moderate-volume applications, where the development cost of a custom ASIC may be prohibitive.

    Digital signal processors (or DSPs) are microprocessors specially designed for signal processing applications. In the 1980s and early 1990s, typical DSPs provided signal processing performance that was significantly better than what was achievable with typical GPPs. Since then, the performance gap between DSPs and GPPs has narrowed; today many GPPs are capable of handling serious digital signal processing applications.

    While both GPPs and DSPs provide sufficient processing power for many signal processing applications, each type of processor brings with it important strengths and weaknesses that can have an enormous influence on how effectively the processor meets the needs of a particular application. In this article, we examine the key attributes that distinguish DSP algorithms and applications and show how these attributes have led to the architectural specialization found in DSP processors. We then investigate how GPP and DSP architectures compare with regard to these key DSP algorithm attributes. In addition, we examine the architectural techniques employed in high-performance GPPs and DSPs, which have led to rapid gains in signal processing performance in both classes of processor.

    DSP and GPP Architecture Fundamentals

    Before we examine the key attributes of digital signal processing algorithms, we'll broadly classify the processors we will be discussing. We've already introduced the terms GPP and DSP, but we would like to refine our definitions of these two classes of processor to include both low-end and high-performance variants of each class.

    The most common GPPs are microcontrollers (or MCUs). Microcontrollers are designed for cost-sensitive embedded applications and are available in 4-, 8-, 16-, and 32-bit varieties. As a group, MCUs account for the vast majority of all microprocessors sold. Our focus will be on 32-bit MCUs, such as ARM's ARM7 architecture. We will refer to this type of MCU as a low-end GPP.


    Fast Multipliers

    The FIR filter is mathematically expressed as

        y[n] = Σ (k = 0 to N−1) h[k] · x[n−k]

    where x[n] is a vector of input data and h[n] is a vector of filter coefficients. To produce an output sample, a set of recent input samples is multiplied by a set of constant coefficients. Hence, the main component of the FIR filter algorithm is a dot product: a series of multiplications where the multiplication products are summed, or accumulated. These operations are not unique to the FIR filter algorithm; in fact, multiplication (often combined with accumulation of products) is one of the most common operations performed in signal processing: convolution, infinite impulse response (IIR) filtering, and Fourier transforms all also involve heavy use of multiply-accumulate (or MAC) operations.

    Early microprocessors implemented multiplications via a series of shift and add operations, each of which consumed one or more clock cycles. In 1982, however, Texas Instruments (TI) introduced the first commercially successful DSP processor, the TMS32010, which incorporated specialized hardware to enable it to compute a multiplication in a single clock cycle. As might be expected, faster multiplication hardware yields faster performance in many digital signal processing algorithms. Today, hardware multipliers are standard in both low-end GPPs and DSPs. The multiplier in a low-end DSP is usually combined with an adder to facilitate single-cycle multiply-accumulate operations; the resulting hardware is referred to as a MAC unit.
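    The FIR dot product above can be sketched in software to make the MAC structure explicit. The following is a minimal illustration (not from the article; the function name and data are hypothetical): each loop iteration is one multiply-accumulate, which a MAC unit would complete in a single cycle.

```python
def fir_output(x, h, n):
    """y[n] = sum over k of h[k] * x[n-k], treating x[i] as 0 for i < 0."""
    acc = 0.0                      # the running sum a MAC accumulator would hold
    for k in range(len(h)):        # one multiply-accumulate per coefficient
        sample = x[n - k] if n - k >= 0 else 0.0
        acc += h[k] * sample       # multiply, then accumulate
    return acc

# A 3-tap moving-average filter: every coefficient is 1/3.
h = [1 / 3, 1 / 3, 1 / 3]
x = [3.0, 6.0, 9.0, 12.0]
y2 = fir_output(x, h, 2)           # averages x[0..2]: (3 + 6 + 9) / 3 = 6.0
```

    On a low-end GPP each iteration costs several instructions (loads, multiply, add, pointer updates); a DSP's MAC unit and dual data buses collapse it to one cycle.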

    Multiple Execution Units

    Digital signal processing applications typically have very high computational requirements in comparison to other types of processor applications. This is because they generally execute math-intensive signal processing algorithms (such as FIR filtering) in real time on signals sampled at data rates of 10-100 kHz or higher. Hence, low-end DSP processors often include several independent execution units that are capable of operating in parallel; for example, in addition to the MAC unit, they typically contain an arithmetic-logic unit (ALU), shifter, and address generation unit. Low-end GPPs, in contrast, are generally not capable of parallel operations.

    Efficient Memory Accesses

    Executing MAC operations at a rate of one MAC per clock cycle requires more than just a single-cycle MAC unit. It also requires the ability to fetch the MAC instruction and the operands from memory in a single cycle. Hence, good digital signal processing performance requires high memory bandwidth, higher than is typically supported on low-end GPPs, which usually rely on a single-bus connection to memory and can only make one access per clock cycle. To address the need for increased memory bandwidth, designers of early low-end DSP processors developed memory architectures that could support multiple memory accesses per cycle. The most common approach (which is still used in many DSP architectures) was to use two or more separate banks of memory, each of which was accessed by its own bus and could be read or written during every clock cycle. Often, instructions were stored in one memory bank while data was stored in another. With this arrangement, the processor could fetch both an instruction and a data operand in parallel in every cycle. Since many digital signal processing algorithms (such as FIR filters) consume two data operands per instruction (e.g., a data sample and a coefficient), some architects have adopted an extension to this idea: divide data memory into two separate banks (e.g., X data and Y data), each allowing one access per instruction cycle, thus enabling the processor to execute a MAC in a single cycle. In contrast, most low-end GPPs require one instruction cycle per memory access and per arithmetic operation. Thus, in the context of sustained sequential MAC operations, a low-end GPP typically requires several instruction cycles to complete each MAC operation, even if a single-cycle multiplier is available.

    Low-end DSPs often further support high memory bandwidth requirements via dedicated hardware for calculating memory addresses. These address generation units operate in parallel with the DSP processor's main execution units, enabling the processor to access data at new locations in memory (for example, stepping through a vector of coefficients) without pausing to calculate the new address. In contrast, low-end GPPs do not provide separate address generation units and must rely on the main arithmetic logic unit (ALU) to perform address calculations. This typically requires one instruction cycle per address update, widening the performance gap between low-end DSPs and GPPs in tasks such as sustained sequential MAC operations.

    Memory accesses in digital signal processing algorithms tend to exhibit very predictable patterns; for example, for each sample in an FIR filter, the filter coefficient vector is accessed sequentially from start to finish; when processing the next input sample, accesses start over from the beginning of the coefficient vector. (This is in contrast to other types of computing tasks, such as database processing, where accesses to memory are less predictable.) Address generation units in low-end DSP processors take advantage of this predictability by supporting specialized addressing modes that enable the processor to access data efficiently in patterns commonly found in digital signal processing algorithms. The most common addressing mode is register-indirect addressing with post-increment, which automatically increments the address pointer for algorithms where repetitive computations are performed on a series of data stored sequentially in memory. Without this feature, the processor would spend instruction cycles incrementing the address pointer, as is the case with low-end GPPs. Many DSP processors also support circular addressing, which allows the processor to access a block of data sequentially and then automatically wrap around to the beginning address, which is exactly the pattern used to access coefficients in FIR filtering, for example. Circular addressing is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR filter input data delay lines. Another specialized DSP addressing mode is bit-reversed addressing, which is used in certain fast Fourier transform (FFT) implementations. Low-end GPPs do not support specialized DSP addressing modes like circular or bit-reversed addressing. These addressing modes, if needed, must be implemented in software, or alternative algorithms must be selected that do not require them.
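    When these addressing modes must be emulated in software, as on a low-end GPP, each pointer update costs explicit instructions. A hypothetical sketch (not from the article; function names are ours) of the two modes just described:

```python
def circular_next(ptr, start, length):
    """Post-increment within a circular buffer: wrap back to start past the end."""
    ptr += 1
    return start if ptr >= start + length else ptr

def bit_reverse(index, bits):
    """Bit-reversed address of a `bits`-wide index, as used to order FFT data."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)  # shift the low bit into the result
        index >>= 1
    return out

# Stepping a coefficient pointer through a 4-entry circular buffer at base 0:
p = circular_next(3, 0, 4)   # past the last entry, so it wraps to 0
# Bit-reversing the 3-bit index 6 (binary 110) gives 3 (binary 011):
r = bit_reverse(6, 3)
```

    A DSP's address generation unit performs these updates in hardware, in parallel with the arithmetic, so they cost no extra cycles.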

    Data Format

    Most DSP processors (whether low-end or high-performance) use a fixed-point numeric data type instead of the floating-point format most commonly used in scientific applications. In a fixed-point format, the binary point (analogous to the decimal point in base-10 math) is located at a fixed position in the data word. In contrast, most high-performance GPPs use a floating-point format, where numbers are expressed using an exponent and a mantissa. The binary point essentially floats based on the value of the exponent. Floating-point formats allow a much wider range of values to be represented and virtually eliminate the risk of numeric overflow in many applications.
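    As a concrete illustration of a fixed binary point, consider the common 16-bit "Q15" convention, where the binary point sits just after the sign bit, so a word represents a value in [−1, 1). This sketch is ours, not from the article:

```python
def to_q15(value):
    """Encode a real value in [-1, 1) as a signed 16-bit Q15 integer."""
    return int(round(value * 32768))   # scale by 2**15

def from_q15(word):
    """Decode a Q15 integer back to a real value."""
    return word / 32768

half = to_q15(0.5)       # 0.5 is stored as the integer 16384
back = from_q15(16384)   # and decodes back to 0.5
```

    The scale factor 2**15 is fixed at design time; the hardware itself just does integer arithmetic, which is what makes fixed-point multipliers cheap.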

    Digital signal processing applications typically must pay careful attention to numeric fidelity (e.g., avoiding overflow). Since numeric fidelity is far more easily maintained using a floating-point format, it may seem surprising that most DSP processors use a fixed-point format. In many applications, however, DSP processors face additional constraints: they must be inexpensive and provide good energy efficiency. At comparable speeds, fixed-point processors tend to be cheaper and less power-hungry than floating-point processors because floating-point formats require more complex hardware to implement, so DSP processors predominantly use a fixed-point numeric format. Low-end GPPs also generally use fixed-point numeric formats, for many of the same reasons that DSPs do.

    Sensitivity to cost and energy consumption also influences the data word width used in DSP processors. DSP processors tend to support the shortest data word that will provide adequate precision in the target applications. Most fixed-point DSP processors support 16-bit data words because 16 bits is sufficient for many digital signal processing applications. (Some fixed-point DSP processors support 20, 24, or 32 bits to enable higher precision in some applications, such as high-fidelity audio processing. A few also support 8-bit data, which is useful for video applications.) Low-end GPPs generally support 32-bit fixed-point data, offering a better match for applications that require more precision than a 16-bit format provides.

    To ensure adequate signal quality while using fixed-point data, DSP processors typically include specialized hardware to help programmers maintain numeric fidelity throughout a series of computations. For example, most DSP processors include one or more accumulator registers to hold the results of summing several multiplication products. Accumulator registers are typically wider than other registers; the extra bits, called guard bits, extend the range of values that can be represented and thus help avoid overflow. In addition, DSP processors usually include good support for saturation arithmetic, rounding, and shifting, all of which are useful for maintaining numeric fidelity. Low-end GPPs, in contrast, usually have uniform register widths; i.e., they do not provide larger accumulator-style registers, and they do not support saturation arithmetic or rounding. To some extent, however, the larger native data word size typical of GPPs (32 bits vs. 16 bits for DSPs) reduces the need for saturation, scaling, and rounding.
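    The interplay of guard bits and saturation can be sketched as follows. This is a hypothetical model (ours, not the article's; the 40-bit accumulator width is typical but assumed): many 16x16-bit products are summed into a wide accumulator without overflow, and only the final store back to 32 bits saturates.

```python
ACC_MAX = 2**39 - 1      # limits of a signed 40-bit accumulator (8 guard bits)
ACC_MIN = -2**39
OUT_MAX = 2**31 - 1      # limits of a signed 32-bit result register
OUT_MIN = -2**31

def mac(acc, a, b):
    """Multiply two 16-bit values and accumulate into the 40-bit accumulator."""
    acc += a * b
    return max(ACC_MIN, min(ACC_MAX, acc))   # the accumulator itself saturates

def store_saturated(acc):
    """Clamp the wide accumulator to a 32-bit result instead of wrapping."""
    return max(OUT_MIN, min(OUT_MAX, acc))

acc = 0
for _ in range(8):                  # eight worst-case 16-bit products (~2**30 each)
    acc = mac(acc, 32767, 32767)    # the guard bits absorb the intermediate growth
result = store_saturated(acc)       # final value clamps to 2**31 - 1
```

    Without guard bits the intermediate sum would have wrapped after two products; without saturation the final store would wrap to a large negative value rather than clamping.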

    Zero-Overhead Looping

    Digital signal processing algorithms typically spend the vast majority of their processing time in relatively small sections of software that are executed repeatedly, i.e., in loops. Hence, most DSP processors provide special support for efficient looping. Often, a special loop or repeat instruction is provided that allows the programmer to implement a for-next loop without expending any clock cycles for updating and testing the loop counter or branching back to the top of the loop. This feature is often referred to as zero-overhead looping. In contrast, low-end GPPs do not provide special loop-oriented operations beyond normal conditional branch instructions.

    Specialized Instruction Sets

    DSP processor instruction sets have traditionally been designed with two goals in mind. The first is to make maximum use of the processor's underlying hardware, thus increasing efficiency. The second is to minimize the amount of memory space required to store DSP programs, since digital signal processing applications are often quite cost-sensitive and the cost of memory contributes substantially to overall chip and/or system cost. To accomplish the first goal, low-end DSP processor instruction sets generally allow the programmer to specify several parallel operations in a single instruction, typically including one or two data fetches from memory (along with address pointer updates) in parallel with the main arithmetic operation. With the second goal in mind, instructions are kept short (thus using less program memory) by restricting the registers that can be used with various operations, and also restricting which operations can be combined in an instruction. To further reduce the number of bits required to encode instructions, DSP processors often have only a small number of registers, divided into special-purpose groups. DSP processors may use mode bits to control some features of processor operation (for example, rounding or saturation) rather than encoding this information as part of the instructions.


    The overall result of this approach is that low-end DSP processors tend to have highly specialized, complicated, and irregular instruction sets. Combined with multiple memory spaces, multiple buses, and highly specialized arithmetic and addressing hardware, these instruction sets make low-end DSP architectures very poor compiler targets. While a compiler can certainly take C source code and generate assembly code for a low-end DSP, this compiler-generated code is usually far less efficient than hand-written assembly code. But digital signal processing applications typically have very high computational demands coupled with strict cost constraints, making efficient code essential. For these reasons, when selecting a processor, programmers often need to consider the quality of compiler-generated code and the ease or difficulty of writing key routines in assembly language.

    Low-end GPPs, with their simple single-operation RISC instructions, regular register sets, and simpler memory models, are excellent compiler targets. Programmers who write software for low-end GPPs typically don't have to worry much about the ease of use of the processor's instruction set, and they generally develop applications in a high-level language, such as C or C++. The main motivation to hand-write assembly for a low-end GPP is to gain access to instructions the compiler does not support, although this scenario is rare in the context of low-end GPPs.

    High-Performance Processors

    High-Performance DSPs

    Processor architects who want to improve performance beyond the gains afforded by faster clock speeds must find a way to get more useful work out of every clock cycle. One way to do this is to use a multi-issue architecture, as is used in most current high-performance DSP processors. Multi-issue processors achieve a high level of parallelism by issuing and executing more than one instruction at a time. In general, the instructions used in multi-issue DSPs are simple, RISC-like instructions, more similar to those of GPPs than to those of low-end DSPs. The advantage of the multi-issue approach is a processor with a significant increase in parallelism and a simpler, more regular architecture and instruction set that lends itself to efficient compiler code generation. And because VLIW DSPs tend to use simple instructions that ease the task of instruction decoding and execution, they are able to execute at much higher clock rates than low-end DSPs, thus further increasing their performance advantage.

    TI was the first vendor to use a multi-issue approach in a mainstream commercial DSP processor. TI's VLIW (very long instruction word) TMS320C62x, introduced in 1996, was dramatically faster than other mainstream DSP processors available at the time. Other vendors have since followed suit, and now all of the major DSP processor vendors (TI, Analog Devices, and Freescale) employ multi-issue architectures for their latest high-performance DSPs.

    In a VLIW architecture, the assembly language programmer (or code-generation tool) specifies which instructions will be executed in parallel. Hence, instructions are grouped at the time the program is assembled, and the grouping does not change during program execution. VLIW architectures provide multiple execution units, each of which executes its own instruction. The TMS320C62x, for example, contains eight independent execution units and can execute up to eight instructions per cycle. In general, VLIW processors issue a maximum of between four and eight instructions per clock cycle; these instructions are fetched and issued as part of one long super-instruction, hence the name very long instruction word.


    Although their instructions typically encode only one or two operations, some current VLIW DSPs use wider instruction words than low-end DSP processors; for example, 32 bits instead of 16. There are a number of reasons for using a wider instruction word. In VLIW architectures, a wide instruction word may be required in order to specify which functional unit will execute the instruction. Wider instructions allow the use of larger, more uniform register sets (rather than the small sets of specialized registers common among low-end DSP processors), which in turn enables higher performance. Relatedly, the use of wide instructions allows a higher degree of consistency and regularity in the instruction set; the resulting instructions have few restrictions on register usage and addressing modes, making VLIW processors better compiler targets (and easier to program in assembly language).

    There are disadvantages, however, to using wide, simple instructions. Since each VLIW instruction is simpler than a low-end DSP processor instruction, VLIW processors tend to require many more instructions to perform a given task. Combined with instruction words that are wider than those of low-end DSP processors, this characteristic results in relatively high program memory usage. High program memory usage, in turn, may result in higher chip or system cost because of the need for additional ROM or RAM.

    Most current VLIW DSPs use mixed-width instruction sets; 32-bit/16-bit instruction sets are common. Using a mixed-width instruction set allows the processor to use small, simple instructions where appropriate (such as in the control-oriented sections of code), while still being able to use longer, more powerful instructions for the computationally intensive portions of the application. VLIW processors with mixed-width instruction sets typically achieve much better code density than those that only support wide (e.g., 32-bit) instructions.

    The architectures of VLIW DSP processors are in some ways more like those of general-purpose processors than like highly specialized low-end DSP architectures. VLIW DSP processors often omit some of the features that were, until recently, considered virtually part of the definition of a DSP processor. For example, VLIW DSPs do not typically include zero-overhead looping instructions; they require the processor to execute instructions to perform the operations associated with maintaining a loop. This does not necessarily result in a loss of performance, however, since VLIW processors are able to execute many instructions in parallel. The operations needed to maintain a loop, for example, can be executed in parallel with several arithmetic computations, achieving the same effect as if the processor had dedicated looping hardware operating in the background.

    High-Performance GPPs

    Like high-performance DSPs, high-performance GPPs also employ multi-issue architectures that execute multiple instructions in parallel. Many of the multi-issue concepts discussed in the section above on high-performance DSPs hold true for high-performance multi-issue GPPs. For example, both types of processors issue multiple independent instructions in parallel, both provide multiple execution units to execute the parallel instructions, and both are effective at improving performance compared to their single-issue counterparts. But high-performance GPPs, by and large, are not VLIW architectures but rather superscalar architectures. The difference in the way these two types of architectures schedule instructions for parallel execution is important in the context of using them in real-time digital signal processing applications.

    Superscalar processors contain specialized hardware that determines, at run time, which instructions will be executed in parallel based on inter-instruction data dependencies and resource contention. This shifts the burden of scheduling parallel instructions from the programmer or tools to the processor. An important benefit of the superscalar approach is the ability to maintain binary compatibility with previous generations of the same processor family. For example, some superscalar processors simply duplicate the execution units of an existing single-issue GPP or DSP processor, producing two execution paths, each one identical to the single execution path of the earlier single-issue processor. In the worst case, software written and compiled for the earlier single-issue processor runs on the new multi-issue processor using a single execution path while the other path remains idle, resulting in performance similar to that of the original processor (assuming similar clock rates and underlying microarchitectures). In the best case, the scheduling hardware can issue two instructions in parallel in every instruction cycle, effectively doubling throughput and potentially improving performance by a factor of two. Superscalar processors typically issue and execute fewer instructions per cycle than VLIW processors, usually between two and four.

    A challenge for programmers using a superscalar processor to implement signal processing software is that superscalar processors may group the same set of instructions differently at different times in the program's execution; for example, the processor may group instructions one way the first time it executes a loop, then group them differently for subsequent iterations. Because superscalar processors dynamically schedule parallel operations, it may be difficult for the programmer to predict exactly how long a given segment of software will take to execute. The execution time may vary based on the particular data accessed, whether the processor is executing a loop for the first time or the third, or whether it has just finished processing an interrupt, for example. This uncertainty in execution times can pose a problem for digital signal processing software developers who need to guarantee that real-time application constraints will be met in every case. Measuring the execution time on hardware doesn't solve the problem, since the execution time is often variable. Determining the worst-case timing requirements and using them to ensure that real-time deadlines are met is another approach, but this tends to leave much of the processor's speed untapped.

    A processor's dynamic features also complicate software optimization. As a rule, DSP processors have traditionally avoided dynamic features (such as superscalar execution, dynamic caches, and dynamic branch prediction) for just these reasons.

    These are some of the reasons why, in general, high-performance DSPs are VLIW architectures, whereas high-performance GPPs are superscalar architectures. This distinction isn't arbitrary; rather, it is driven by a key requirement of each class of processor: execution-time predictability in DSPs and binary compatibility in GPPs.

    Common Characteristics

    VLIW and superscalar processors often suffer from high energy consumption relative to their single-issue counterparts; in general, multi-issue processors are designed with an emphasis on speed rather than energy efficiency. These processors often have more execution units active in parallel than single-issue processors, and they require wide on-chip buses and memory banks to accommodate multiple parallel instructions and to keep the multiple execution units supplied with data, all of which contribute to increased energy consumption.

    Because they have often had high memory usage and energy consumption, VLIW and superscalar processors have historically mainly targeted applications that have very demanding computational requirements but are not very sensitive to cost or energy efficiency. For example, a VLIW processor might be used in a cellular base station, but not in a portable cellular phone. This trend is changing, however, as portable devices increasingly require more computational power and processor vendors continually strive to produce energy-efficient versions of their high-performance architectures. For example, the StarCore SC1400 core is based on a high-performance VLIW architecture, but has sufficiently low energy consumption to enable its use in portable products.

    SIMD

    High-performance DSPs and GPPs often further increase the amount of parallelism available by incorporating single-instruction, multiple-data (SIMD) operations. SIMD is not a class of architecture itself, but is instead an architectural technique that can be used within any of the classes of architectures we have described. SIMD improves performance on some algorithms by allowing the processor to execute multiple instances of the same operation in parallel using different data. For example, a SIMD multiplication instruction could perform two or more multiplications on different sets of input operands in parallel in a single clock cycle. This technique can greatly increase the rate of computation for some vector operations that are heavily used in signal processing applications. Where SIMD is included in a multi-issue architecture, the resulting parallelism can be quite high. For example, if two multiplication instructions can be issued per instruction cycle and these two instructions are each four-way SIMD instructions, the result is eight independent multiplications per clock cycle.
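    The effect of a four-way SIMD multiply can be modeled as one "instruction" that operates on four packed operand pairs at once. This is a purely illustrative sketch (ours, not the article's):

```python
def simd_mul4(a, b):
    """Model of one 4-way SIMD multiply: element-wise product of two 4-lane packs."""
    return [x * y for x, y in zip(a, b)]

# One modeled instruction produces four products at once; issuing two such
# instructions per cycle would yield the eight multiplications per clock
# described in the text.
lanes_a = [1, 2, 3, 4]
lanes_b = [5, 6, 7, 8]
products = simd_mul4(lanes_a, lanes_b)   # [5, 12, 21, 32]
```

    In real hardware the four lanes share one instruction decode and execute in lock-step, which is why SIMD adds parallelism at little control-logic cost.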

    High-performance GPPs such as Pentiums and PowerPCs have been enhanced to increase signal processing performance via the addition of SIMD-based instruction-set extensions, such as MMX and SSE for the Pentium and AltiVec for the PowerPC. This approach is a good fit for GPPs, which typically have wide resources (buses, registers, ALUs) that can be treated as multiple smaller resources to increase parallelism. For example, Freescale's PowerPC 74xx with AltiVec has very powerful SIMD capabilities; its wide, 128-bit vector registers are logically partitioned into four 32-bit, eight 16-bit, or sixteen 8-bit elements. The PowerPC 74xx can perform eight 16-bit multiplications per cycle. Using this approach, high-performance GPPs are often able to achieve performance on DSP algorithms that is better than that of even the fastest DSP processors. This surprising result is partly due to the effectiveness of SIMD, but also because high-performance GPPs operate at extremely high clock speeds in comparison to DSP processors; they typically operate at 2500 MHz or more, while the fastest DSP processors are typically in the 600-1000 MHz range.

    On DSP processors with SIMD capabilities, the underlying hardware that supports SIMD operations varies widely. Analog Devices, for example, modified its low-end floating-point DSP architecture, the ADSP-2106x, by adding a second set of execution units that exactly duplicates the original set. The resulting architecture, the ADSP-21xxx, has two sets of execution units that each include a MAC unit, ALU, and shifter, and each has its own set of operand registers. The augmented architecture can issue a single instruction and execute it in parallel in both sets of execution units using different data, effectively doubling performance in some algorithms.

    Alternatively, some DSP processors split their execution units into multiple sub-units, similar to the technique used on high-performance GPPs. Perhaps the most extensive SIMD capabilities in a DSP processor to date are found in Analog Devices' TigerSHARC processor. TigerSHARC is a VLIW architecture and combines the two types of SIMD: one instruction can control execution of the processor's two sets of execution units, and an instruction can specify a split-execution-unit (e.g., split-ALU or split-MAC) operation that will be executed in each set. Using this hierarchical SIMD capability, TigerSHARC can execute eight 16-bit multiplications per cycle, like the PowerPC 74xx.


    Making effective use of a processor's SIMD capabilities can require significant creativity on the part of the programmer. Programmers often must arrange data in memory so that SIMD processing can proceed at full speed (e.g., arranging data so that it can be retrieved in groups of two, four, or eight operands at a time), and they may also have to re-organize algorithms, or write additional code that reorganizes data in registers, to make maximum use of the processor's resources. SIMD is only effective in algorithms that can process data in parallel; for algorithms that are inherently serial (that is, algorithms that tend to use the result of one operation as an input to the next operation), SIMD is generally not of use.
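    One common form of the data rearrangement described above is de-interleaving, for example splitting interleaved stereo samples into separate channels so each channel occupies contiguous memory and can be loaded in SIMD-friendly groups. A hypothetical sketch (ours, not from the article):

```python
def deinterleave(samples):
    """Split interleaved [L0, R0, L1, R1, ...] into separate channel lists."""
    return samples[0::2], samples[1::2]   # even indices = left, odd = right

left, right = deinterleave([10, 20, 11, 21, 12, 22])
# left  is [10, 11, 12]; right is [20, 21, 22]; each channel can now be
# fetched as contiguous groups of operands for SIMD processing
```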

    Development Support

    Development support refers to the software and hardware infrastructure that is available to assist users in developing products based on the processor. This typically includes software development tools, software component libraries, hardware emulators, and evaluation boards. Development support varies significantly between GPPs and DSPs, between low-end and high-performance processors, and from one processor to another within these categories.

    As a group, DSPs have good to excellent signal-processing-specific tool support, while GPPs typically have poor signal-processing-specific tool support. The situation is similar in terms of third-party signal processing software components: DSP processors have poor to excellent support, depending on the processor, whereas GPPs typically have poor support. Vendor-supplied signal processing software, however, is improving for GPPs, as GPP vendors increasingly strive for competitive advantages, particularly in multimedia applications. For third-party software for tasks other than signal processing, DSP processors typically have poor support, whereas GPPs have poor to excellent support, depending on the specific processor.

    An important class of third-party software is real-time operating systems. A typical DSP may have a small handful of real-time operating systems available for it, but usually these don't include the most popular ones. In contrast, GPPs often have extensive support from a wide range of real-time operating systems, including some of the most popular, such as VxWorks from Wind River Systems.

    Finally, DSP processor tools tend to have links to high-level signal processing tools such as MATLAB. General-purpose processor tools typically lack these links, but instead have links to other kinds of high-level tools, such as user interface builders.

    On-Chip Integration

    We use the phrase "on-chip integration" to refer to all of the elements on a chip besides the processor core itself. A key consideration is the appropriateness of the on-chip integration to the application at hand. If the elements on the chip suit the needs of the application well, they can be an enormous benefit to the system designer. They can result in reduced cost, lower design complexity, reduced power consumption, reduced product size and weight, and even reduced electromagnetic interference (because fewer printed circuit board traces will be required).

    Most processors incorporate some type of on-chip integration. Low-cost GPPs and DSPs generally offer a wide range of on-chip peripherals and I/O interfaces. These are typically geared toward cost-sensitive application areas such as consumer electronics; examples of commonplace peripherals include USB ports and LCD controllers. High-performance GPPs and DSPs tend to offer lower levels of integration. The peripherals and I/O interfaces in high-performance processors generally target communications infrastructure applications. For example, high-performance DSPs typically include features such as Viterbi coprocessors and UTOPIA ports. High-performance PC CPUs are an exception to this rule: unlike high-end embedded GPPs, PC CPUs have little on-chip integration.

    Not surprisingly, the on-chip integration found in DSPs tends to be oriented toward signal processing functions, while the integration found in GPPs tends to be oriented toward general-purpose functions. For example, high-performance DSPs generally have integration appropriate for baseband processing, while high-performance GPPs generally have integration appropriate for network processing.

    Conclusions

    For applications with moderate signal processing requirements, low-end DSPs and GPPs are often both viable candidates. In most cases, low-end DSPs will offer better speed, energy efficiency, memory use, and on-chip integration. If the application has extensive functionality outside of its signal processing tasks (for example, user interfaces or communications protocols), then a low-end GPP may be preferred because of its efficiency in those types of tasks and its generally superior development tools and support.

    Applications with heavy signal processing requirements will invariably require a high-performance processor. But since high-performance GPPs often have signal processing performance similar to that of high-performance DSPs, which one should you use?

    Since high-performance GPPs are competitive with high-performance DSPs in terms of speed and memory use, other factors become important in selecting between the two. Energy efficiency and chip cost, for example, often favor DSPs. It is also important to consider on-chip integration and development support; in neither of these areas does either category of processor have a consistent advantage. PC CPUs are the exception to this rule: with their minimal on-chip integration, PC CPUs are at a clear disadvantage in embedded applications.

    Though their general-purpose tools are usually very good, high-performance GPPs used in signal processing applications typically suffer from weak signal-processing-specific development infrastructure. This is an area where high-performance DSP processors still hold a significant advantage, though vendors of high-performance GPPs and their tool vendors are making progress in signal-processing-oriented tool support.

    Finally, high-performance GPPs are less predictable in their execution timing than high-performance DSPs, a consequence of dynamic features such as superscalar instruction execution. This timing variability can make it difficult to ensure real-time behavior in some situations, and can complicate software optimization, which is often required in performance-hungry signal processing applications.

    Resources

    Berkeley Design Technology, Inc., DSP Processor Fundamentals: Architectures and Features, Berkeley Design Technology, Inc., 1996.

    www.BDTI.com for numerous white papers and presentation slides on processor architectures, DSP software development, and processor benchmark results.