signal processing in fpga.pdf

Upload: srinivas-cheruku

Post on 14-Apr-2018


  • 7/27/2019 signal processing in fpga.pdf

    1/7

    Implementing a Quantitative Model for the Effective Signal Processing in the Auditory System on a Dedicated Digital VLSI Hardware

    A. Schwarz, B. Mertsching: University of Hamburg, Computer Science Department, IMA Group, D-22527 Hamburg, Germany, [email protected]

    M. Brucke, W. Nebel: University of Oldenburg, Computer Science Department, VLSI Group, D-26111 Oldenburg, Germany, [email protected]

    J. Tschorz, B. Kollmeier: University of Oldenburg, Physics Science Department, Medical Physics Group, D-26111 Oldenburg, Germany, [email protected]

    1. Introduction

    The binaural perception model introduced in [1] describes the effective signal processing in the human auditory system and provides an appropriate internal representation of acoustic signals. Its capabilities were successfully demonstrated as a preprocessing algorithm for speech recognition [2], objective speech quality measurement [3] and digital hearing aids. The algorithm processes stereo signals and includes a gammatone filter bank (30 bandpass filters equidistantly distributed on the ERB scale from 73 to 6700 Hz) to model spectral properties of the human ear such as spectral masking and the frequency-dependent bandwidth of auditory filters.
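
    The channel layout described above can be sketched in a few lines. The paper does not say which ERB formula was used; the sketch below assumes the widely used Glasberg and Moore ERB-rate mapping, and the function names are hypothetical.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate scale after Glasberg and Moore (assumed, not stated in the paper)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_low=73.0, f_high=6700.0, channels=30):
    """Center frequencies spaced equidistantly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (channels - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(channels)]

fc = center_frequencies()
print([round(f) for f in fc[:3]], round(fc[-1]))
```

    Under this assumed mapping, 30 channels between 73 and 6700 Hz come out roughly one ERB apart.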

    Figure 1. Processing scheme of the binaural perception model introduced in [1]. Blocks shown for the stereo input: gammatone filterbank, envelope extraction (halfwave rectification and 1 kHz lowpass filtering), absolute threshold, adaptation loops with time constants t1 to t5, and 8 Hz lowpass filtering.

    Abstract

    A digital VLSI implementation of an algorithm modeling the effective signal processing of the human auditory system is presented. The model consists of several stages that are psychoacoustically and physiologically motivated by the signal processing in the human ear, and it was successfully applied to various speech processing applications. The processing scheme was partitioned for implementation in a set of three chips. Because the signal dynamics and the necessary arithmetical precision differ locally, different approaches for number representation and appropriate arithmetic operators were investigated and implemented. It is demonstrated how an application of the model has been used to determine the necessary wordlengths for a transfer of the algorithm into a version suitable for hardware implementation. Fix point arithmetic is used in the linear parts of the original algorithm, and a special small floating point operator set was developed for the nonlinear part. This part was coded in behavioral VHDL and synthesized with Synopsys Behavioral Compiler. The hardware algorithm is being evaluated on different implementation levels for an FPGA and will be manufactured as ASICs in a later version. The presented FPGA chip set will be combined with a commercial DSP system (TMS320C6201) for real time and reconfigurable signal processing.


    A stage modeling inner hair cell behavior (envelope extraction) is followed by five adaptation loops (with time constants between 5 and 500 ms) to account for dynamical effects such as nonlinear adaptive compression and temporal masking (see Fig. 1). The demonstrated VLSI design contains additional components to determine differences in phase and magnitude of each channel (Fig. 2).

    2. Hardware design specifications

    Due to its complexity the design was partitioned into three chips (Fig. 2). Besides the serial data interfaces, chip 1 contains the 30-channel binaural filter bank, the envelope extraction, and a module computing phase differences and magnitude quotients for the stereo outputs of each channel. A single bandpass filter is multiplexed through all 30 channels for both the left and the right stereo signal. This kernel, which includes a six-stage pipelined multiplier and one adder/subtractor, is realized as a quad cascade of a single-stage complex IIR filter. This saves chip area but requires a 50 MHz system clock to operate with a 16276 Hz sampling frequency. RAM units save temporary data, and filter constants are read from a ROM (see Fig. 2).

    For further processing three system outputs are available. A high speed interface (30 MBit/s) provides the real and imaginary parts of the right and left stereo data for all filter bank channels (chip 1). The adaptively compressed data of the left (1st chip 2) and the right stereo signal (2nd chip 2) and the phase and amplitude information of the filter bank outputs are combined within a second data stream (12 MBit/s). The 4-wire serial interfaces of the chip set (16 bit data words) support a direct interface to the serial ports of most DSP devices. In combination with a DSP device able to serve the fast serial ports (TI TMS320C6201), a system solution for auditory signal processing is provided.
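
    The relation between the 50 MHz clock and the 16276 Hz sampling frequency can be made concrete with a rough cycle budget for the single multiplexed filter kernel. The clock, sampling rate, channel count and cascade depth are taken from the text; how the cycles are actually spent per filter stage is not given in the paper, so the last figure is only an average derived here.

```python
CLOCK_HZ = 50_000_000      # system clock of chip 1
SAMPLE_RATE_HZ = 16_276    # input sampling frequency
CHANNELS = 30              # filter bank channels
EARS = 2                   # left and right stereo signal
STAGES = 4                 # quad cascade of the first-order complex filter

# Whole clock cycles available between two input samples.
cycles_per_sample = CLOCK_HZ // SAMPLE_RATE_HZ

# First-order stage updates the one shared kernel has to perform per sample.
stage_updates = CHANNELS * EARS * STAGES

# Average cycle budget per stage update (an estimate, not a figure from the paper).
cycles_per_stage_update = cycles_per_sample / stage_updates

print(cycles_per_sample, stage_updates, cycles_per_stage_update)  # 3072 240 12.8
```

    With roughly a dozen cycles per first-order update, a single pipelined multiplier shared between all channels is plausible, which is consistent with the area-saving argument above.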

    Figure 2. Structure and wiring scheme of the internal components of the chip set. ASIC 1 (50 MHz clock, serial input from DSP/codec): input interface with serial/parallel converter, gammatone filter bank (controller, multiplexers, multiplier, adder, subtractor, registers, ROM for constants, RAM for state memory), halfwave rectification, 1st order 1 kHz IIR lowpass, phase difference and magnitude quotient modules, decimation lowpass, a high speed output interface (30 Mbit/s) and a low speed output interface (12 Mbit/s). ASIC 2 (one each for the left and the right signal, 50 MHz and 24 MHz clocks): input interface, five multiplexed ("rolled") adaptation loops with divider, 1st order lowpass, controller, ROM and RAM, scale and lowpass unit, and output interface with parallel/serial converter (12 Mbit/s). Control signals shown include Sync, Valid, Panic, Init, Busy and Reset.


    Each of the five adaptation loops contains a divider whose quotient is fed back through a 1st order IIR lowpass that provides the divisor. This feedback, the necessary precision and the signal dynamics require large fix point wordlengths or a logarithmic number format. The dividers are very area expensive and are therefore the most critical components in the design. A fourfold subsampling and data serialization in chip 1 allow a multiplexed loop kernel, implemented monaurally in two chips (two instances of chip 2). The loop kernel contains RAM cells storing the states of all lowpass filters for the 30 serially processed frequency channels.

    3. Floating point to fix point to floating point - arithmetic suitable for auditory signal processing

    A direct implementation of the model in IEEE 32 bit single precision floating point arithmetic is not possible due to limitations of area and timing. To obtain an optimal implementation, different methods are applied to the linear filter bank and to the nonlinear adaptation loops.

    The main problem when converting number formats and designing dedicated arithmetic is the determination of the required numerical precision. Because the necessary quantization depends on the application and on typical signal dynamics, the perception model was recoded in C++ using new classes of scalable data types and the necessary operators. Each class takes the internal wordlength as a parameter and saves values in exactly the same format as they would be saved in a register on an ASIC. Thus numerical effects of imprecise arithmetic can be simulated in target applications.

    The kernel arithmetic of the gammatone filterbank was designed and successfully evaluated in a fix point notation. After a scalable fix point version of the nonlinear adaptation loops had been evaluated and the high area consumption, especially of the dividers, had been recognized, a small floating point class was successfully tested.
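
    The idea of a data type that takes its wordlength as a parameter and stores values exactly as a register would can be sketched as follows. The authors' classes were written in C++ and are not shown in the paper; this Python class, its name and its rounding and overflow behavior (unsigned values, truncation, saturation) are assumptions chosen for illustration.

```python
import math

class FixPoint:
    """Unsigned fix point value held as an integer register (hypothetical sketch)."""

    def __init__(self, value, int_bits, frac_bits):
        self.int_bits = int_bits
        self.frac_bits = frac_bits
        raw = math.floor(value * (1 << frac_bits))         # truncate extra fraction bits
        self.raw = max(0, min(raw, self._max_raw()))       # saturate to the register range

    def _max_raw(self):
        return (1 << (self.int_bits + self.frac_bits)) - 1

    def __float__(self):
        return self.raw / (1 << self.frac_bits)

    def __mul__(self, other):
        # Full-length integer product, then truncation back to this operand format.
        product = (self.raw * other.raw) >> other.frac_bits
        result = FixPoint(0.0, self.int_bits, self.frac_bits)
        result.raw = min(product, self._max_raw())
        return result

x = FixPoint(0.3, int_bits=4, frac_bits=15)
y = FixPoint(0.3, int_bits=4, frac_bits=8)
print(float(x), float(y))   # the shorter wordlength lands further from 0.3
```

    Running the same algorithm with different `frac_bits` values is then enough to observe how the quantization error changes with the wordlength.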

    3.1. Arithmetic transformation for linear gammatone filters

    Principle. The necessary internal wordlength for the gammatone filter bank can be assessed in a straightforward way, because the filters are linear time invariant systems to which classical numerical measures like SNR can be applied. It is sufficient to record the filter responses to δ-pulses for each filter, parameterized with different internal wordlengths. Figure 3 shows the mean square error (relative error, i.e. noise-to-signal ratio) between one of these implementations and the original specification with IEEE single precision floating point arithmetic. The choice of a certain maximal square error (e.g. 10^-3 for all channels) leads directly to the necessary internal wordlength. Allowing an error of 0.001, a minimal wordlength of 24 bits is necessary for the lowest filter bank channel (Fig. 3).

    Figure 3. Error introduced by fix point quantization in the gammatone filter bank.

    Numerical operations. The filter algorithm consists of a fourfold first-order filter which contains only add and multiply-by-constant operations.

    Number formats. Due to the increasing analysis bandwidth, the error for a given wordlength decreases with increasing center frequency, i.e. with increasing channel number. All channels use the same operator structure, thus a general number format of 24 bits fix point is required.
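
    The wordlength assessment described in this section can be sketched as a small experiment: run a fourfold cascade of a first-order complex filter on a δ-pulse, once in full precision and once with the state rounded to a given number of fraction bits, and compare the two responses. The filter coefficients, the per-stage gain, the 33 Hz bandwidth used for the 73 Hz channel and the rounding mode are assumptions for illustration, and the reference here is Python double precision rather than IEEE single precision.

```python
import cmath
import math

FS = 16276.0  # sampling frequency from the paper

def quantize(x, frac_bits):
    """Round a real value to a grid of 2**-frac_bits (None keeps full precision)."""
    if frac_bits is None:
        return x
    scale = 1 << frac_bits
    return round(x * scale) / scale

def impulse_response(fc_hz, bw_hz, n_samples, frac_bits=None):
    """Fourfold cascade of y[n] = g*x[n] + a*y[n-1] with a complex constant a."""
    lam = math.exp(-2.0 * math.pi * bw_hz / FS)        # pole radius (assumed mapping)
    a = lam * cmath.exp(2j * math.pi * fc_hz / FS)     # complex pole at the center frequency
    g = 1.0 - lam                                      # per-stage gain normalization (assumed)
    states = [0j] * 4
    out = []
    for n in range(n_samples):
        sig = complex(1.0 if n == 0 else 0.0, 0.0)     # delta pulse input
        for k in range(4):
            y = g * sig + a * states[k]
            y = complex(quantize(y.real, frac_bits), quantize(y.imag, frac_bits))
            states[k] = y
            sig = y
        out.append(sig.real)
    return out

def relative_mse(fc_hz, bw_hz, frac_bits, n_samples=4000):
    """Noise-to-signal ratio between the quantized and the reference response."""
    ref = impulse_response(fc_hz, bw_hz, n_samples)
    test = impulse_response(fc_hz, bw_hz, n_samples, frac_bits)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    signal = sum(r * r for r in ref)
    return noise / signal

errors = {bits: relative_mse(73.0, 33.0, bits) for bits in (16, 20, 24)}
print(errors)
```

    Sweeping `frac_bits` and the channel parameters in this way yields the kind of error-versus-wordlength curves that Figure 3 reports, although the absolute numbers depend on the assumed arithmetic details.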

    3.2. Arithmetic transformation for nonlinear adaptation loops

    Principle. The determination of an optimal quantization in the adaptation loops is much more difficult because they show a strongly nonlinear behavior.

    It was demonstrated in [3] that the perception model can supply an objective speech quality measure q. Speech signals distorted by low-bit-rate codecs used in mobile telephone devices are compared to their undistorted versions, and a quality measure q is given which is correlated with a subjective Mean Opinion Score (MOS) of the test signals. Because this testbench is very sensitive to limited number precision and signal dynamics in the perception model, it can be used to evaluate modifications caused by limited quantization and arithmetic (Fig. 4). An optimized quantization of the nonlinear adaptation loops (small wordlengths, i.e. small chip area, vs. reliable signal processing) was found by empirical wordlength variation. The results were verified by processing two different large speech signal sets while varying the input signal levels from -10 to 50 dB.

    [Figure 3 plot: mean square error (logarithmic axis, 1e-08 to 1e+00) versus number of filter-bank channel (5 to 30), with one curve each for internal wordlengths of 16, 18, 20, 22, 24, 26, 28 and 30 bit.]


    Data analysis. Histograms were recorded at internal nodes to investigate signal levels during the processing of typical speech (ETSI test data [4][5]) and noise input signals (Fig. 5).

    Figure 5. Histograms of output and divisor in the adaptation loops for typical speech signals.

    The divisors of the loops have an individual threshold, and their lower bounds are introduced to reduce unwanted peaks. The dynamic range is obviously limited. Only positive values occur in the loops, divisors never exceed 1.0, and the loop outputs are concentrated near zero. This is to be expected since small amplitudes are very frequent in typical speech signals according to their probability density distribution [6].

    Numerical operations. In the loops and the following scaling and lowpass unit, the original C code contains all basic arithmetic operators (Table 1). The current quotients qi[n] in the loops are calculated from the local lowpass filter outputs bi[n-1] of the last cycle. The current lowpass output is derived from its last output bi[n-1] and the new quotient qi[n]. The output of the last loop q5[n] is shifted and scaled to s[n] in the scaling unit, and after the last lowpass filter the result o[n] is given to the output interface. All Cx(i) are constants.

    Table 1. Operations in the adaptation loops, i is the loop number and n represents sample numbers.

    division in loop i, i = [0, 1, 2, 3, 4]    q0[n] = x[n] / b0[n-1] (1st loop)
                                               qi[n] = qi-1[n] / bi[n-1] (others)
    lowpass in loop i                          bi[n] = C1i*qi[n] + C2i*bi[n-1]
    scaling unit                               s[n] = (q5[n] - C3) * C4
    completing lowpass                         o[n] = C5*s[n] + C6*o[n-1]

    A useful simplification for the hardware specification is the fact that all values remain in the positive range up to the output of the last loop. However, the scaling unit introduces a sign bit which propagates to the output.

    Figure 4. Speech quality measurement used as a testbench for changes in kernel arithmetic of the adaptation loops in the perception model. The original signal and the codec-distorted signal each pass through the perception model (IEEE 32 bit floating, fixed or small floating point arithmetic) and a frequency weighting; their cross-correlation yields q, which is compared and correlated with subjective MOS data.

    [Figure 5 plots: histograms of frequency (logarithmic axis) versus value, for the outputs of loop0 to loop4 (values 0 to 30, frequencies 10^0 to 10^10) and for divisor0 to divisor4 (values 0 to 1, frequencies 10^0 to 10^8).]

    Number formats. Considering the necessary precision of the kernel arithmetic and the arithmetic cores available in the synthesis tool libraries (Synopsys DesignWare), two approaches are possible. Simulations with the integer prototype show that, using the available fix point operators, a number format of 4 integer bits (int part) and 15 fraction bits (frac part) is sufficient and that all constants Cx(i) have to be quantized in 19 fraction bits.
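
    The division and lowpass operations of the adaptation loops (qi[n] = qi-1[n] / bi[n-1] and bi[n] = C1i*qi[n] + C2i*bi[n-1]) can be sketched in floating point as follows. The paper only states that the time constants lie between 5 and 500 ms; the three intermediate values, the way C1i and C2i are derived from them, the initial states and the divisor lower bound are assumptions, and the scaling unit and final lowpass are left out.

```python
import math

FS_LOOPS = 16276.0 / 4.0                       # rate after the fourfold subsampling
TAUS = (0.005, 0.050, 0.129, 0.253, 0.500)    # seconds; the inner three values are assumed

def make_constants(taus=TAUS, fs=FS_LOOPS):
    """Return (C1i, C2i) pairs for unity-gain 1st order lowpasses (assumed design)."""
    pairs = []
    for tau in taus:
        c2 = math.exp(-1.0 / (tau * fs))
        pairs.append((1.0 - c2, c2))
    return pairs

def adaptation_loops(samples, min_divisor=1e-5):
    """Run five cascaded divide-and-lowpass loops over positive input samples."""
    consts = make_constants()
    b = [1.0] * len(consts)                    # lowpass states bi[n-1] (assumed start value)
    out = []
    for x in samples:
        q = x
        for i, (c1, c2) in enumerate(consts):
            q = q / max(b[i], min_divisor)     # qi[n] = q(i-1)[n] / bi[n-1]
            b[i] = c1 * q + c2 * b[i]          # bi[n] = C1i*qi[n] + C2i*bi[n-1]
        out.append(q)
    return out

y = adaptation_loops([0.25] * 60000)
print(y[0], y[-1])
```

    For a constant input each loop settles where its quotient equals its divisor, so every loop takes a square root and five loops compress a stationary input x towards x**(1/32). This strong compression of stationary signals is what makes the divider precision and dynamic range so critical.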

    When dividing or multiplying these fix point numbers, the internal wordlengths must be greater to hold all possible digits: 34 bits in the case of the divider (eq. 1) and 38 bits for the multiplier (eq. 2). The dividend has to be prescaled (shifted) because the integer part of the quotient can grow by the fraction bits of the divisor (complementary to multipliers).

    The product wordlength is the sum of the wordlengths of the operands a and b. Operand b (the filter constants) only has a fraction part (frac part b). In addition a 20 bit fix point adder and subtractor are necessary. The most expensive operator is the 34 bit divider, which has an unacceptably large area demand and seems to be near the limit of what the design tools can handle.
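
    The two wordlength figures follow directly from the rules stated in the text: the quotient needs (int part + frac part) integer bits plus (frac part) fraction bits, and the product needs the integer bits of a plus the fraction bits of both operands. A short check with the values given above (4 integer bits, 15 fraction bits, constants with 19 fraction bits); the function names are hypothetical.

```python
def divider_wordlength(int_bits, frac_bits):
    """Quotient width: the integer part grows by the fraction bits of the divisor."""
    quotient_int = int_bits + frac_bits
    quotient_frac = frac_bits
    return quotient_int + quotient_frac

def multiplier_wordlength(int_a, frac_a, frac_b):
    """Product width when operand b (a filter constant) has only a fraction part."""
    return int_a + frac_a + frac_b

div_bits = divider_wordlength(4, 15)
mul_bits = multiplier_wordlength(4, 15, 19)
print(div_bits, mul_bits)  # 34 38
```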

    A floating point number format has been introduced for the adaptation loops to reduce the area requirements and the long signal propagation delays through the combinational nets of the operators (Table 2).

    The speech quality measure testbench shows that the small floating point divider with 6 significant bits and a 6 bit exponent in the unsigned operands is sufficient (Fig. 6) and has an impressively reduced area demand (see Table 4).

    Table 2. Properties of the small floating point number format.

    Furthermore, this number format matches the requirements of speech processing systems much better than a fix point system with an equidistant resolution, since its logarithmic range partitioning has the best resolution at the lower end (near zero) of the representable dynamic range, where speech signals are concentrated. For the same reason, i.e. the probability density distribution of speech signals, the A- and µ-law characteristics in AD and DA converters with companding are efficient standards for telecommunication systems. A similar approach is introduced in [8] for a neural net implementation for speech recognition purposes, where the net weights could be successfully quantized in a floating point format of only 1 sign bit, 1 bit mantissa and 3 bit exponent.

    Prototype and VHDL implementation. Since design tool libraries do not support scalable floating point data types and operators, a dedicated prototype was developed. Similar to the proposal in [9], floating point operators have been designed which incorporate fix point subunits provided by the synthesis tools.

    However, a test and simulation environment which can evaluate signal distortions with meaningful coverage by processing large data streams (ETSI test data [5]) is not feasible at the VHDL logic simulation level. Therefore, a C++ class was designed whose operators work identically to the desired hardware version and allow extensive tests of different wordlengths.

    Multiplication (eq. 3) and division (eq. 4) use fix point library elements for multiplication/division of the significands and addition/subtraction of the exponents, respectively [10].

    The small floating point division is enclosed in normalization operations for each operand and for the result, in order to get a leading 1 in the MSBs and to reduce complexity in data handling. Under- or overflow during normalization forces signal clipping to zero or full scale. The internal wordlength of the divider is twice the length of the operands to preserve the precision of the operands. Normalization and shrinking to the operand wordlength follow. Adder and subtractor need exponent alignment before the mantissas can be summed or subtracted. If the operands are very different in magnitude, one of them can disappear during alignment. When subtracting values of similar size, an additional "dirty zero" problem can occur, i.e. calculation errors grow. In this case, however, a generally sufficient distance between subtrahend and minuend was observed.
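
    The operations described above (normalization to a leading 1, division on a double-length dividend, clipping, and exponent alignment in the adder) can be sketched for unsigned values as follows. The tuple encoding, the function names, the representation of zero as (0, 0) and the use of truncation are assumptions made for this sketch; the bit-level format of the authors' operators may differ.

```python
import math

def _clip(sig, exp, sig_bits, exp_bits):
    """Clip to zero on underflow and to full scale on overflow."""
    if sig == 0 or exp < 0:
        return 0, 0
    if exp >= (1 << exp_bits):
        return (1 << sig_bits) - 1, (1 << exp_bits) - 1
    return sig, exp

def encode(x, sig_bits=6, exp_bits=6):
    """Return (significand, biased exponent) with a leading 1 in the significand."""
    if x <= 0.0:
        return 0, 0
    m, e = math.frexp(x)                          # x = m * 2**e with 0.5 <= m < 1
    sig = int(m * (1 << sig_bits))                # truncate to sig_bits
    return _clip(sig, e + (1 << (exp_bits - 1)), sig_bits, exp_bits)

def decode(value, sig_bits=6, exp_bits=6):
    sig, exp = value
    return math.ldexp(sig, exp - (1 << (exp_bits - 1)) - sig_bits)

def divide(a, b, sig_bits=6, exp_bits=6):
    """Divide significands on a double-length dividend, subtract exponents."""
    (s1, e1), (s2, e2) = a, b
    if s2 == 0:
        return (1 << sig_bits) - 1, (1 << exp_bits) - 1   # division by zero: full scale
    q = (s1 << sig_bits) // s2                    # dividend widened to twice the operand length
    exp = e1 - e2 + (1 << (exp_bits - 1))
    if q >> sig_bits:                             # renormalize to a leading 1 in the MSB
        q >>= 1
        exp += 1
    return _clip(q, exp, sig_bits, exp_bits)

def add(a, b, sig_bits=14, exp_bits=6):
    """Align the smaller operand to the larger exponent, then add significands."""
    (s1, e1), (s2, e2) = (a, b) if a[1] >= b[1] else (b, a)
    s2 >>= e1 - e2                                # a much smaller operand can vanish here
    total = s1 + s2
    if total >> sig_bits:
        total >>= 1
        e1 += 1
    return _clip(total, e1, sig_bits, exp_bits)

print(decode(divide(encode(1.0), encode(0.5))))   # 2.0
```

    With a 6 bit significand the truncation error of `encode` stays below 2**-5 = 0.03125 relative to the value, and adding a value that is 2**-20 times smaller to 1.0 in the 14 bit format returns 1.0 unchanged, which illustrates the disappearing operand mentioned above.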

    Divider:                               significand s=6, exponent e=6 (precision p=5)
    Multiplier, Adder, Subtractor:         significand s=14, exponent e=6 (precision p=13)
    binary excess                          100000
    largest error (machine epsilon) [7]    β/2 * β^-p = 0.03125 (div)
                                           β/2 * β^-p = 0.00012207 (mul, add, sub)
    max binary value (div)                 111111.111111
    min binary value (div)                 100000.000000
    binary zero (div)                      000000.100000

    div wordlength = (int part + frac part), (frac part)    (1)

    mul wordlength = (int part a), (frac part a + frac part b)    (2)

    (s_1 \cdot 2^{e_1}) \cdot (s_2 \cdot 2^{e_2}) = (s_1 \cdot s_2) \cdot 2^{e_1 + e_2}    (3)

    (s_1 \cdot 2^{e_1}) / (s_2 \cdot 2^{e_2}) = (s_1 / s_2) \cdot 2^{e_1 - e_2}    (4)


    The use of pure behavioral code synthesizable by Synopsys Behavioral Compiler requires some additional work. In brief, the Behavioral Compiler analyzes data dependencies and the required operator usage, schedules the design, and builds a controller. The type of the automatically created finite state machine for the controller may be specified; a binary encoding is used in this case. All operators are implemented as combinational nets for easy timing and scheduling and are handled as dedicated multi-cycle (delayed) blocks if necessary. Overloading the operators (+, -, *, /) allows them to be inferred in VHDL and permits a straightforward coding of the algorithm. In addition, a RAM module of the target library was manually created and is handled by wrappers in behavioral code in order to have indexed cell access to the lowpass values via an array data type.

    Except for the RAM block, the design is coded completely independently of a target library, because no specific cores of the FPGA technology are instantiated. Thus there is no need for code modifications when the target library changes.

    4. Synthesis and simulation results

    A prototype of the core design of chip 1 (input interface, gammatone filterbank, halfwave rectification, lowpass filter, and output interface) was implemented on a Xilinx XC4062XL-2 device. A completely mapped FPGA cell netlist is transferred to the Xilinx place&route tools. When the temporary values are stored in an external RAM, 2186 logic cells are allocated. The FPGA utilization is about 40% (Table 3). The timing constraints resulting from the sampling rate of the whole system are met even though the RAM access limits the clock to 32 MHz.

    Table 3. Allocated resources of a Xilinx XC4062XL-2 device for the chip 1 design.

    After compiling and mapping the chip 2 design to the FPGA look-up-table cell level (not mapped to FPGA gates), an EDIF netlist is transferred to the vendor specific place&route tool. Here, the design is mapped to physical cells and connected. Table 4 presents the allocated hardware resources and timing analysis results when targeting an Altera Flex10K100A-1 device.

    Table 4. Allocated resources of an Altera Flex10K100A-1 device for the chip 2 design.

    The state vector of the controller has eight bits storing 142 states. Timing analysis shows that the most critical path is a part of this controller, reducing the maximum clock frequency. Since 50 MHz could not be reached for a common clock, one of the two FPGA clock networks drives the kernel with 24 MHz while the other is used for the interface parts. Because very few I/O pins are used by the design, pin locking causes no routing problems.

    Simulation in the testbench was performed extensively on prototype level (C++) with large sample data streams. The enormous simulation times on VHDL logic level allow only single value or short data stream evaluation.

    The following results for versions of the chip 2 arithmetic were calculated (Fig. 6) using the perception model as a testbench. Diagram (a) shows that the model works correctly and that the objective speech quality measure is well correlated with the subjective MOS (indicated by the linear correlation coefficient r). Diagram (c) shows nearly no losses due to fixed point quantization errors when the resolution is 4 integer and 30 fraction bits. In (d) enormous losses in the data correlation appear after reducing the wordlength to 4 integer and 24 fraction bits. The small floating point implementation works well with an operand width of 6 bits mantissa for division, 14 bits mantissa for all other operations, and 6 bits exponent for all operations.
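
    The linear correlation coefficient r used to compare the arithmetic versions can be computed as sketched below. The (q, MOS) pairs are invented solely to exercise the function and are not data from the paper; the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical objective measures q and subjective MOS values (illustration only).
q_values = [0.80, 0.85, 0.90, 0.95, 1.00]
mos_values = [1.5, 2.4, 2.9, 3.8, 4.3]
r = pearson_r(q_values, mos_values)
print(round(r, 3))
```

    Repeating this computation for each candidate number format, with q produced by the quantized model, gives one r value per format, so a drop in r directly indicates that a wordlength is too small.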

    Real time experiments become possible with the com-

    pletion of the demonstrator board and, after installing it on

    the DSP card, a powerful signal processing system with a

    reconfigurable coprocessor is available.

    Table 3 (Xilinx XC4062XL-2, chip 1):

    interfaces                                   273 logic cells = 5 % LC usage
    kernel                                       1913 logic cells = 35 % LC usage
    memory                                       external RAM
    max clock frequency (external RAM access)    32 MHz

    Table 4 (Altera Flex10K100A-1, chip 2):

    interfaces                                                            195 logic cells = 5 % LC usage
    kernel, scaling unit and lowpass                                      2983 logic cells = 59 % LC usage
    memory (in Flex10K EAB blocks)                                        3600 bits = 14 % EAB usage
    max clock frequency (kernel) (timing constraints violation)           24 MHz
    small float divider (6 bit mantissa, 6 bit exponent operand width)    94 logic cells, 205 ns delay
    fix point divider (34 bit)                                            1186 logic cells, 1527 ns delay


    5. Conclusion

    In this paper we present our work on the digital VLSI implementation of a speech perception model. The hardware design of the algorithm was derived from a version of the model recoded in C/C++ using special classes for fix point and small floating point quantization. An application of the model (speech quality measurement) is used to determine optimized wordlengths for dedicated hardware. The development of the perception model as an FPGA/ASIC for a target system, e.g. a PC card, provides efficient co-processing power and allows real time implementations of complex auditory-based speech processing algorithms.

    References

    [1] Dau, T., Püschel, D. and Kohlrausch, A.: A quantitative model of the effective signal processing in the auditory system I. Journal of the Acoustical Society of America (JASA) 99 (6): 3631-3633, 1996.

    [2] Tchorz, T., Wesselkamp, M. and Kollmeier, B.: Gehörgerechte Merkmalsextraktion zur robusten Spracherkennung in Störgeräuschen. Fortschritte der Akustik - DAGA 96: 532-533, DEGA, Oldenburg, Germany, 1996.

    [3] Hansen, M. and Kollmeier, B.: Using a quantitative psychoacoustical signal representation for objective speech quality measurement. In: Proc. ICASSP-97, Intl. Conf. on Acoustics, Speech and Signal Proc.: 1387, Munich, Germany, 1997.

    [4] Hansen, M.: Assessment and prediction of speech transmission quality with an auditory processing model, Dissertation, Oldenburg, Germany, 1998.

    [5] ETSI, TM/TM5/TCH-HS.: Selection Test Phase II: Listening test results with German speech samples. Technical Report 92/35, FI/DBP-Telekom. Experiment 1, IM4, 1992.

    [6] Vary, P., Heute, U., Hess, W.: Digitale Sprachsignalverarbeitung. Teubner, Stuttgart, Germany, 1998.

    [7] Goldberg, D.: What Every Computer Scientist Should Know About Floating-Point Arithmetic, Computing Surveys, March 1991.

    [8] Wüst, H., Kasper, K., Reininger, H.: Hybrid Number Representation for the FPGA-Realization of a Versatile Neuro-Processor. Proc. EUROMICRO 98, 694-701, Västerås, Sweden, 1998.

    [9] Shirazi, N., Walters, A., Athanas, P.: Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines. Technical Report, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, 1995.

    [10] Hennessy, J. L., Patterson, D. A.: Computer Architecture - A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Francisco, California, 1996.

    Figure 6. Results for a complete objective speech quality measurement with the ETSI half-rate selection test data [4][5]. Each panel plots the subjective MOS (1 to 4.5) against the objective measure q (0.75 to 1): (a) IEEE float single precision, r=0.935; (b) small floating point with Add, Sub, Mul: 14(M) 6(E) and Div: 6(M) 6(E), r=0.928; (c) 4 int bits / 30 frac bits, r=0.927; (d) 4 int bits / 24 frac bits, r=0.63.