TRANSCRIPT
DSP VLSI Design
Numeric Representations and Arithmetic
Byungin Moon
Yonsei University
Outline
- Fixed-point representation
- Floating-point representation
- Fixed-point versus floating-point
  - Precision
  - Dynamic range
  - Development time
  - Product cost
- Native data word width
- Extended precision
- Floating-point emulation and block floating-point
- IEEE-754 floating-point standard
- Relationship between data and instruction word sizes
Numeric Representations in DSP Processors
[Figure omitted. Source: DSP Fundamentals]
Fixed-Point Arithmetic
- Represented numbers are uniformly spaced
- Typically represents each number with 16 bits, although other lengths can be used
- Divided into two categories: signed/unsigned integer and signed/unsigned fractional
- E.g., 16-bit fixed-point arithmetic:
  - Unsigned integer: 0 to 65,535
  - Signed integer: -32,768 to 32,767
  - Unsigned fractional: 65,536 levels spread uniformly between 0 and 1
  - Signed fractional: 65,536 levels equally spaced between -1 and 1
- The algorithms and hardware used to implement fractional arithmetic are virtually identical to those for integer arithmetic
  - The main difference is how the multiplication results are handled
Example of Integer Representation (Two's Complement)
[Figure omitted. Source: DSP Fundamentals]
Example of Fractional Representation
[Figure omitted. Source: DSP Fundamentals]
Floating-Point Arithmetic
- Numbers are represented by the combination of a mantissa and an exponent
  value = mantissa × 2^exponent
- Mantissa
  - A signed fractional value with a single implied integer bit
  - Magnitude in the ranges +1.0 to +2.0 and -1.0 to -2.0
- Exponent
  - A signed integer giving the number of places the binary point of the mantissa must be shifted left or right to obtain the number represented
- Typically 32 bits
- Represented numbers are not uniformly spaced
Example of Floating-Point Representation
[Figure omitted. Source: DSP Fundamentals]
Precision
- Defined in terms of quantization error; effectively a signal-to-noise ratio
- Quantization error
  - Numerical error introduced when a longer numeric format is converted to a shorter one
  - Related to the gap between two adjacent representable numbers
  - E.g., a quantization error of 0.005 when the value 1.325 is rounded to 1.33
- Precision: the ratio of the size of the value represented to the size of the maximum quantization error
- Maximum precision = log2(|maximum value| / maximum quantization error)
Precision of Fixed-Point
- Maximum precision (of 16 bits) = log2(|-1.0| / 2^-16) = 16 (the same as the bit width)
- The smaller the value being represented, the lower the precision (signal-to-noise ratio)
  - E.g., the signal-to-noise ratio is 100,000 for the value 10,000, whereas the ratio is only 1,000 for the value 100
- In fixed point, scaling is important to maintain precision
  - Scaling is used to avoid overflow
  - Round-off caused by scaling adds quantization noise at each step (in the worst case, the noise simply accumulates)
  - E.g., for a 500-coefficient FIR filter, the signal-to-noise ratio can drop to 1/500 of the original precision
  - Hence the need for an extended-precision accumulator with 2-3 times as many bits as the other memory locations (an important feature of DSPs)
    - The only round-off error is then suffered when the accumulator is scaled and stored back to 16-bit memory
Precision of Floating-Point
- Maximum precision is the number of bits of the mantissa (larger than that of fixed point)
- Relatively uniform precision regardless of the value being represented
  - The magnitude of the mantissa is restricted to be at least 1 (due to the implied integer bit), guaranteeing that the precision of any floating-point value is no less than half the maximum precision
  - In other words, floating-point notation places large gaps between large numbers but small gaps between small numbers
  - The gap between any two adjacent numbers is about ten million times smaller than the value of the numbers
  - Scaling is not needed
Dynamic Range
- The ratio between the largest and smallest numbers representable in a given data format
- In applications, translated into the range of signal magnitudes that can be processed while maintaining sufficient fidelity
- Different applications have different dynamic range needs
  - For high-fidelity audio applications, 90 dB is a common benchmark
- DSPs need somewhat more dynamic range than the application demands
  - Frees the programmer from some of the painstaking scaling that would otherwise be needed to preserve adequate dynamic range
Dynamic Range
- In fixed point (relatively low)
  - 16-bit fixed point: 1 / 2^-15 = 32,768 ≈ 90 dB
  - 32-bit fixed point: 1 / 2^-31 = 2.15 × 10^9 ≈ 187 dB
- In floating point (relatively high)
  - 32-bit floating-point format with 24-bit mantissa and 8-bit exponent (from DSP Fundamentals): 5.88 × 10^-39 to 3.40 × 10^38, a ratio of 5.79 × 10^76 ≈ 1535 dB
  - IEEE 754 standard: 1.2 × 10^-38 to 3.4 × 10^38, a ratio of 2.8 × 10^76 ≈ 1529 dB
Development Time
- In fixed point
  - Scaling is needed to provide adequate precision and dynamic range to applications
  - Algorithms are difficult to develop
    - The possibility of an overflow or underflow must be considered after each operation
    - The programmer must continually keep track of the amplitude of the numbers that will arise
  - Longer development time
- In floating point
  - Scaling is not needed
  - Algorithm development is simpler
  - Shorter development time
Fixed vs. Floating Point Instructions (Used in the SHARC DSPs)
[Figure omitted. Source: DSP Guide]
Product Cost
- Fixed-point DSP
  - Simple architecture
  - Cheap ($5 to $100 in 1999)
- Floating-point DSP
  - Complicated architecture
    - All registers and data paths must be 32 bits wide instead of only 16
    - The multiplier and ALU must be able to perform floating-point arithmetic quickly
    - The instruction set must be larger
  - All floating-point DSPs can also handle fixed-point numbers (carried out as quickly as, or more slowly than, the floating-point operations), a necessity for implementing counters, loops, and signals coming from the ADC and going to the DAC
Fixed versus Floating Point
[Figure omitted. Source: DSP Guide]
Major Trends in DSPs
[Figure omitted. Source: DSP Guide]
Native Data Word Width
- The width of data that the processor's buses and data path can manipulate in a single cycle
- Influence on cost
  - The size of the chip and the number of package pins
  - The size and number of external memory devices connected to the DSP
- Influence on development complexity
  - The smaller the width, the more complex the algorithms and programming become
- In fixed point: typically 16 bits, but 20-, 24-, and 32-bit DSPs exist
Extended Precision
- Provides higher precision than the processor's native data format
- Can be obtained in two ways
  - Built-in support for an extended-precision format
    - Works as long as a series of arithmetic operations is carried out exclusively within the processor's data path and does not involve transferring intermediate results to and from memory
    - E.g., an extended-precision accumulator
  - Multiprecision arithmetic
    - Constructs larger data words out of sequences of native-width data words
    - Examples of hardware support:
      - Preserving the carry bit for use as an input to a subsequent addition (e.g., ADDC)
      - The ability to treat multiplication operands as signed or unsigned
Floating-Point Emulation and Block Floating-Point
- Floating-point emulation
  - Obtains the precision and dynamic range of floating-point arithmetic by using software routines
  - Some manufacturers provide a library of emulation routines
- Block floating-point representation
  - Another approach to obtaining increased precision and dynamic range
  - A block of data: a group of numbers with different mantissas but a single, common exponent
  - The common (negative) exponent
    - Determined by the data element with the largest magnitude
    - Mantissas are shifted left by the magnitude of the exponent
  - Some DSPs have hardware features for block floating-point
    - E.g., an "exponent detect" instruction
Example of Block Floating-Point Representation
[Figure omitted. Source: DSP Fundamentals]
IEEE-754 Floating-Point
- The IEEE standard (1985) that defines standard formats for floating-point data representations and a set of standard rules for floating-point arithmetic
- Types of support for IEEE-754
  - Hardware support
    - The Motorola DSP96002 features hardware support for single-precision floating-point arithmetic as specified in IEEE-754
    - ADSP-210xx family processors provide nearly complete hardware support for the single-precision format
  - Special hardware for fast conversion of numbers between the processor's internal and IEEE-754 representations
    - E.g., the AT&T DSP32xx
  - Software routines for conversion
Data Word Size versus Instruction Word Size
- Most DSP processors use an instruction word size equal to their data word size, but not all do
  - Analog Devices ADSP-21xx family and IBM MDSP2780: 16-bit data word and 24-bit instruction word
  - Zoran ZR3800x: 20-bit data word and 30-bit instruction word
- Storing data in program memory in processors with dissimilar word sizes
  - Allowed in some processors
  - Not the most efficient use of memory: a significant portion of each program-memory word used to store data is wasted