high speed approximate multiplier with ...jscglobal.org/gallery/7-dec-1466.pdfsignal processing...

HIGH SPEED APPROXIMATE MULTIPLIER WITH CONFIGURABLE ERROR RECOVERY

R. SOWNDARYA1, S. SIVAPRAKASAM2

PG Scholar1, Assistant Professor2

[email protected] [email protected]

Department of Electronics and Communication Engineering 1,2

Narasu’s Sarathy Institute of Technology, India.1,2

ABSTRACT

Approximate circuits have been

considered for applications that can tolerate

some loss of accuracy with improved

performance and/or energy efficiency.

Multipliers are key arithmetic circuits in

many of these applications including digital

signal processing (DSP). In this paper, a

novel approximate multiplier with low

power consumption and a short critical

path is proposed for high-performance DSP

applications. Errors are reduced by using either OR gates or the proposed approximate adder; the approximate multipliers using these two error reduction strategies are referred to as AM1 and AM2, respectively. Both AM1 and AM2 have a

low mean error distance, i.e., most of the

errors are not significant in magnitude.

Compared with a Wallace multiplier

optimized for speed, an 8×8 AM1 using four

most significant bits for error reduction

shows a 60% reduction in delay (when

optimized for delay) and a 42% reduction

in power dissipation (when optimized for

area). Compared with the Wallace multiplier, the truncated 16 × 16 designs TAM1 and TAM2 save from

50% to 66% in power, when optimized for

area. Compared with existing approximate

multipliers, AM1, AM2, TAM1, and TAM2

show significant advantages in accuracy

with a low power-delay product. AM2 has a

better accuracy compared with AM1 but

with a longer delay and higher power

consumption. Image processing

applications, including image sharpening

and smoothing, are considered to show the

quality of the approximate multipliers in

error-tolerant applications. By utilizing an

appropriate error recovery scheme, the

proposed approximate multipliers achieve processing accuracy similar to that of exact multipliers, but with significant improvements in power.

Keywords: Approximate multiplier,

partial product, approximate adder, error

recovery, low-power, image processing,

digital systems.

OBJECTIVE:

The main aim of our project is to minimize the working time of the computing process. Approximate computing has emerged as a potential solution for the design of energy-efficient digital systems. Applications such as multimedia, recognition

and data mining are inherently error-tolerant

and do not require a perfect accuracy in

computation. For digital signal processing

(DSP) applications, the result is often left to

interpretation by human perception. Therefore,

strict exactness may not be required and an

imprecise result may suffice due to the

limitation of human perception. For these

applications, approximate circuits play an

important role as a promising alternative for

reducing area, power and delay, thereby

achieving better performance in energy

Journal of Scientific Computing

Volume 8 Issue 12 2019

ISSN NO: 1524-2560

http://jscglobal.org/57


efficiency. A multiplier usually consists of

three stages: partial product generation, partial

product accumulation and a carry propagation

adder (CPA) at the final stage. In the

underdesigned multiplier (UDM), approximate

partial products are computed using inaccurate

2 × 2 multiplier blocks, while accurate adders

are used in an adder tree to accumulate the

approximate partial products. In a separate design, approximate 4 ×

4 and 8 × 8 bit Wallace multipliers are

designed by using a carry-in prediction

method. Then, they are used in the design of

approximate 16 × 16 Wallace multipliers,

referred to as AWTM. The AWTM is

configured into four different modes by using

a different number of approximate 4 × 4 and 8

× 8 multipliers.

EXISTING SYSTEM

Generally, a multiplier consists of

stages of partial product generation,

accumulation and final addition. The

commonly used partial product accumulation

structures include the Wallace, Dadda trees

and a carry-save adder array. In a Wallace tree,

log2(n) layers are required for an n-bit

multiplier. The adders in each layer operate in

parallel without carry propagation, and the

same operation repeats until two rows of

partial products remain. Therefore, the delay

of the partial product accumulation stage is

O(log2(n)). Moreover, the adders in a Wallace

tree can be considered as a 3:2 compressor and

can be replaced by other counters or

compressors (e.g. a 4:2 compressor) to further

reduce the delay. The Dadda tree has a similar

structure as the Wallace tree, but it uses as few

adders as possible. For a carry-save adder

array, the carry and sum signals generated by

the adders in a row are connected to the adders

in the next row, so the critical path grows linearly with n and is longer than in a tree. However, an array requires a

smaller area and thus a lower power

dissipation due to the simple and symmetric

structure. Three methodologies are applicable

to approximate a multiplier: i) approximation

in generating the partial products, ii)

approximation (including truncation) in the

partial product tree and iii) using approximate

designs of adders, counters and/or compressors

to accumulate the partial products. Following

this classification, existing designs of

approximate multipliers are briefly reviewed

next.

Drawbacks of Existing System:

Longer delay

High power consumption

Low accuracy

Only luminance (grayscale) multiplication is achieved

Errors in the final product.

PROPOSED SYSTEM:

The proposed multiplier can be configured into two designs by using either OR gates or the proposed approximate adders for error

reduction, referred to as approximate

multiplier 1 (AM1) and approximate multiplier

2 (AM2), respectively. Different levels of error

recovery can also be achieved by using a

different number of MSBs for error recovery

in both AM1 and AM2. Compared to the

traditional Wallace tree, the proposed

multipliers have significantly shorter critical

paths. Functional and circuit simulations are

performed to evaluate the performance of the

multipliers. Image sharpening and smoothing

are considered as approximate multiplication

based DSP applications. Experimental results





indicate that the proposed approximate

multipliers perform well in these error tolerant

image processing applications. The proposed

designs can be used as effective library cells

for the synthesis of approximate circuits. In the

proposed approximate multiplier, a simple tree

of the approximate adders is used for partial

product accumulation, and the error signals are used to compensate for the error and obtain better accuracy.

Main features of proposed system

Short delay and low power consumption

High accuracy

Multiplication of both luminance and chrominance is achieved

A. Approximation in generating partial products:

The underdesigned multiplier (UDM)

utilizes an approximate 2 × 2 bit multiplier

block obtained by altering a single entry in the

Karnaugh Map (K-Map) of its function. In this

approximation, the accurate result “1001” for

the multiplication of “11” and “11” is

simplified to “111” to save one output bit.

Assuming the value of each input bit is equally

likely, the error rate of the 2 × 2 bit multiplier block is $(1/2)^4 = 1/16$.
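The 1/16 error rate can be checked by enumerating all 16 input pairs. The following is a minimal sketch; the function name `udm_2x2` is a hypothetical helper, and the block is modelled only from the description above (exact everywhere except 3 × 3, which yields "111").

```python
def udm_2x2(a, b):
    """Approximate 2x2 multiplier: exact except 3 * 3, which returns 0b111 (7)."""
    if a == 3 and b == 3:
        return 0b111          # the exact result 0b1001 needs a fourth output bit
    return a * b


# Enumerate every pair of 2-bit operands, assuming all are equally likely.
inputs = [(a, b) for a in range(4) for b in range(4)]
wrong = [(a, b) for a, b in inputs if udm_2x2(a, b) != a * b]
error_rate = len(wrong) / len(inputs)

print(wrong, error_rate)
```

Only the single K-map entry that was altered produces a wrong result, which is why the error rate equals the probability that all four input bits are 1.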

B. Approximation in the partial product

tree:

A NOR gate based control block is

used to deal with two cases: i) if the product of

the MSBs is zero, then the multiplication

section is activated to multiply the LSBs

without any approximation, and ii) if the

product of the MSBs is nonzero, the non-

multiplication section is used as an

approximate multiplier to process the LSBs,

while the multiplication section is activated to

multiply the MSBs.

C. Using approximate counters or compressors in the partial product tree:

The proposed compressor halves the

height of the partial product tree and generates

a vector to recover accuracy. Furthermore,

with 0.28% normalized mean error distance,

the silicon area required to implement the

multiplier is reduced by 50.1%.

D. Automated partial product calculation:

Unlike the error-tolerant multiplier (ETM), the static segment multiplier (SSM) applies no approximation to the LSBs. Either the MSBs or the LSBs of each operand are accurately multiplied, depending on whether its MSBs are all zeros.

A power and area-efficient approximate

Wallace tree multiplier (AWTM) is based on a

bit-width aware approximate multiplication

and a carry-in prediction method.
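The segment selection described for the SSM can be sketched as follows. This is an illustrative model under stated assumptions, not the published circuit: the segment width `m`, the operand width `n` and the helper names `segment` and `ssm_multiply` are all hypothetical choices made for this example.

```python
def segment(value, n, m):
    """Pick an m-bit segment of an n-bit operand; returns (segment, shift)."""
    if value >> m == 0:
        # The upper n - m bits are all zeros: the low m bits hold the full value.
        return value, 0
    # Otherwise keep the top m bits and drop the n - m low bits.
    shift = n - m
    return value >> shift, shift


def ssm_multiply(a, b, n=8, m=4):
    """Multiply the two selected segments exactly, then restore the weights."""
    seg_a, shift_a = segment(a, n, m)
    seg_b, shift_b = segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)


print(ssm_multiply(0xF3, 0x05), 0xF3 * 0x05)
```

In this sketch, small operands are multiplied exactly, while large operands lose only their low-order bits, so the approximate product never exceeds the exact one.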

BLOCK DIAGRAM:

Fig 1. Block diagram of approximate

unsigned multiplier

ERROR REDUCTION:

The approximate adder generates two

signals: the approximate sum S and the error

E; the use of the error signal is considered next

to reduce the inaccuracy of the multiplier. Two

steps are required to reduce errors: i) error





accumulation and ii) error recovery by the

addition of the accumulated errors to the adder

tree output using a CPA.

A. Error Accumulation for Approximate

Multiplier 1

As shown in the figure, each approximate adder Ai generates a sum vector Si and an error vector Ei, where i = 1, 2, ..., 7. If the error signals are added using accurate adders, the accumulated error can fully compensate for the inaccurate product; however, to reduce complexity, an approximate error accumulation is introduced. Hence, an OR gate is used to approximately compute the sum of the errors for a single bit.

Fig 2. Symbols for (a) an OR gate, (b) a full adder or a half adder and (c) an approximate adder cell

To reduce errors, an accumulated error

vector is added to the adder tree output using a

conventional CPA (e.g. a carry lookahead

adder). However, only several (e.g. k) MSBs

of the error signals are used to compensate the

outputs to further reduce the overall

complexity. The number of MSBs is selected

according to the extent that errors must be

compensated. For example in an 8 × 8 adder

tree, there are a total of 7 error vectors,

generated by the 7 approximate adders in the

tree.

Fig 3: Error accumulation tree for AM1.
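The AM1 scheme described above can be modelled functionally in a few lines. This is a sketch under assumptions: the adder is modelled word-wide as S = (A xor B) OR carry and E = (A xor B) AND carry, where carry is (A AND B) shifted left by one bit, a form that is consistent with the properties stated in the text but is not spelled out there; the function names are hypothetical, and n is assumed to be a power of two.

```python
def approx_add(a, b):
    """Carry-free approximate adder: returns (S, E) such that a + b == S + E."""
    x = a ^ b                # per-bit sum without any carry
    g = (a & b) << 1         # carries generated by the neighbouring lower bit
    return x | g, x & g


def adder_tree(x, y, n=8):
    """Accumulate the n partial products with a binary tree of approximate adders."""
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]
    errors = []
    while len(rows) > 1:
        next_rows = []
        for i in range(0, len(rows), 2):
            s, e = approx_add(rows[i], rows[i + 1])
            next_rows.append(s)
            errors.append(e)
        rows = next_rows
    return rows[0], errors   # 7 error vectors for n = 8


def am1(x, y, k, n=8):
    """n/k AM1 sketch: OR-accumulate the errors, add back only the k MSBs."""
    s, errors = adder_tree(x, y, n)
    accumulated = 0
    for e in errors:
        accumulated |= e     # OR gates replace exact error addition
    mask = ((1 << k) - 1) << (2 * n - k)
    return s + (accumulated & mask)


print(am1(200, 150, 4), 200 * 150)
```

Because a bitwise OR never exceeds the arithmetic sum of its operands, this model can only underestimate the exact product; adding all error vectors exactly recovers it.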

B. Error Accumulation for Approximate

Multiplier 2

The error accumulation scheme for AM2 is shown in Fig. 4. To introduce the design of AM2, an 8 × 8 multiplier with two inputs X and Y is considered. For example, consider the first two partial product vectors X0Y7, X0Y6, ..., X0Y0 and X1Y7, X1Y6, ..., X1Y0 accumulated by the first approximate adder (A1 in Fig. 1), where Xi and Yi are the i-th least significant bits of X and Y, respectively.

After applying the OR gates to

accumulate E1 and E2 as well as E3 and E4,

the four error vectors are compressed into two.

E5, E6 and E7, however, are generated from the approximate sums of the partial products rather than from the partial products themselves. Therefore, they cannot be accurately accumulated by OR gates.





Fig 4. Error accumulation tree for AM2.

Another interesting feature of the proposed approximate adder is as follows. Assume $E_i = 1$ in (6); then $A_{i-1} = B_{i-1} = 1$ and $A_i \neq B_i$. Since $A_{i-1} = B_{i-1} = 1$, i.e., $A_{i-1} \oplus B_{i-1} = 0$, it is easy to show that $E_{i-1} = 0$. Moreover, as $A_i \neq B_i$, i.e., $A_i B_i = 0$, then $E_{i+1} = 0$. Thus, once there is an error in one bit, its neighboring bits are error free, i.e., there are no consecutive error bits in one row. Simulation results (found in later sections) show that the modified error accumulation outperforms the OR-gate error accumulation with little overhead on delay and power. Hereafter, the proposed n × n approximate multiplier with k-MSB OR-gate based error reduction is referred to as an n/k AM1, while an n × n approximate multiplier with k-MSB approximate adder based error reduction is referred to as an n/k AM2.

ROLE OF APPROXIMATE UNSIGNED

MULTIPLIER:

Proposed System:

To develop image multiplication for both luminance and chrominance (YCbCr).

To develop the AM1 and AM2 multipliers and compare them in terms of area, delay and power.

The Approximate Adder:

In this section, the design of a new

approximate adder is presented. This adder

operates on a set of pre-processed inputs. The

input pre-processing (IPP) is based on the

interchangeability of bits with the same

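weights in different addends. If $\dot{A}_i$ and $\dot{B}_i$ are the i-th bits in the pre-processed inputs, the IPP functions are given by:

$\dot{A}_i = A_i + B_i$ ……(1)

$\dot{B}_i = A_i B_i$ ……(2)

where + denotes logical OR and juxtaposition denotes logical AND.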

TABLE 1. Truth table of an approximate adder cell. “X” indicates that no such combination occurs due to the IPP.

By replacing $\dot{A}_i$ and $\dot{B}_i$ using (1) and (2), respectively, the logic functions for the sum $S_i$ and the error $E_i$ are expressed with respect to the original inputs, where i is the bit index, i.e., i = 0, 1, ..., n for an n-bit adder. Let $A_{-1} = B_{-1} = 0$ when i is 0; thus, $S_0 = A_0 \oplus B_0$ and $E_0 = 0$. Also, $E_i = 0$ when $A_{i-1}$ or $B_{i-1}$ is 0.
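The stated properties of the adder cell can be checked with a small bit-level model. The cell equations used here, S_i = (A_i xor B_i) OR (A_{i-1} AND B_{i-1}) and E_i = (A_i xor B_i) AND (A_{i-1} AND B_{i-1}), are an assumption: they are consistent with every property listed in the text, but they are not printed in it. The function name is hypothetical.

```python
def approx_adder_bits(a, b, n):
    """Bit-level sketch of an n-bit approximate adder; returns (S, E)."""
    s = e = 0
    for i in range(n + 1):                       # i = 0, 1, ..., n
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if i == 0:
            prev = 0                             # A_{-1} = B_{-1} = 0
        else:
            prev = (a >> (i - 1)) & (b >> (i - 1)) & 1
        x = ai ^ bi
        s |= (x | prev) << i                     # assumed sum function S_i
        e |= (x & prev) << i                     # assumed error function E_i
    return s, e


s, e = approx_adder_bits(0b0111, 0b0011, 4)
print(bin(s), bin(e), 0b0111 + 0b0011 == s + e)
```

Under these assumed equations the exact sum is always S + E, the least significant cell never raises an error, and two adjacent error bits cannot both be 1.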





Fig 7. Delay, power and area comparisons of the proposed 16 × 16 approximate multipliers and the optimized Wallace multiplier.

16 × 16 Approximate Multipliers

In both AM1 and AM2, all the error

vectors are compressed to one error vector,

which is then added back to the approximate

output of the partial product tree. Compared to

8 × 8 designs, 16 × 16 multipliers generate

more error vectors, and too much information

would be ignored if the same error reduction

strategies are used. That is, using only one

compressed error vector does not make a good

estimation of the overall error.

Fig 8. Accuracy comparison of the

approximate 16× 16 multipliers vs. the

number of bits used for error reduction.

In present-day VLSI technology,

errors cannot be avoided and it is not always

feasible to overcome all errors. An attempt to

get rid of all errors results in excessive power

consumption and also slows down the

system.

Fig.9 (a) An exact full adder and (b) the

approximate adder cell.

IMPLEMENTATION RESULTS:

Electrical Performance:

All the investigated circuits have been described in HDL and synthesized in

TSMC 40 nm technology, imposing proper

timing constraints. In all the investigated

circuits, the final carry-propagate adder (CPA)

was implemented by using a fast parallel-

prefix topology. To make the comparison

meaningful, area and power are obtained by

synthesizing all the circuits with the same

timing constraint, equal to the delay of the

exact multiplier. For all the approximate

multipliers we have also calculated the

variations with respect to the exact multiplier

(reported as a percentage, where a negative

percentage means improvement, while a

positive one means worsening).

Error Performance:

The error metrics considered in the following

for the approximate multipliers are:

The probability of an incorrect result,

Error Rate, (ER).





The average value of the errors

produced by the multiplier, Mean

Error, (ME).

The root mean square of the errors

produced by the multiplier (ERMS).
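The three metrics above can be computed exhaustively for a small design. The sketch below assumes the error is defined as approximate minus exact result and that all operand pairs are equally likely; the helper names are hypothetical, and the 2 × 2 block from Section A is used only as a convenient test subject.

```python
import math


def error_metrics(approx, exact, operands):
    """Return (ER, ME, ERMS) of `approx` against `exact` over all operand pairs."""
    errors = [approx(a, b) - exact(a, b) for a, b in operands]
    count = len(errors)
    er = sum(1 for d in errors if d != 0) / count       # error rate
    me = sum(errors) / count                            # mean error
    erms = math.sqrt(sum(d * d for d in errors) / count)  # RMS error
    return er, me, erms


def udm_2x2(a, b):
    """Approximate 2x2 block: exact except 3 * 3, which returns 7."""
    return 7 if (a, b) == (3, 3) else a * b


operands = [(a, b) for a in range(4) for b in range(4)]
er, me, erms = error_metrics(udm_2x2, lambda a, b: a * b, operands)
print(er, me, erms)
```

A nonzero mean error indicates a systematic bias, which is what system-level mean error compensation targets.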

SOFTWARE REQUIREMENT:

Verification Tool: ModelSim 6.4c

Synthesis Tool: Xilinx ISE 13.2

Software Description:

ModelSim:

ModelSim is a hardware simulation and debug environment primarily targeted at smaller ASIC and FPGA designs. ModelSim combines simulation performance and capacity with the code coverage and debugging capabilities required to simulate multiple blocks and systems and attain ASIC gate-level sign-off. Comprehensive support of Verilog, SystemVerilog for Design, VHDL, and SystemC provides a solid foundation for single and multi-language design verification environments. ModelSim’s easy-to-use, unified debug and simulation environment provides today’s FPGA designers with both the advanced capabilities that they increasingly need and an environment that makes their work productive. ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed-language designs.

Project Flow:

Fig.10 Project Flow

A project is a collection mechanism

for an HDL design under specification or test.

Even though you don’t have to use projects in

ModelSim, they may ease interaction with the

tool and are useful for organizing files and

specifying simulation settings.

TABLE. 2 PSNR of image processing

applications for AM1 and AM2.

Functional analysis has shown that on a

statistical basis, the proposed multipliers have

very small error distances and thus, they

achieve a high accuracy. Simulation has also

shown that AM2 has a higher accuracy than

AM1 at the cost of a longer delay and a higher

power consumption.
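For reference, PSNR for 8-bit images is commonly computed from the mean squared error against the exact result. The following is a minimal sketch assuming images are given as nested lists of pixel values with a peak value of 255; the function name and the tiny test images are illustrative only and are not data from this work.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref_pixels = [p for row in reference for p in row]
    test_pixels = [p for row in test for p in row]
    mse = sum((r - t) ** 2 for r, t in zip(ref_pixels, test_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf          # identical images
    return 10 * math.log10(peak ** 2 / mse)


exact_image = [[10, 20], [30, 40]]
approx_image = [[11, 21], [31, 41]]   # every pixel off by one grey level
print(psnr(exact_image, approx_image))
```

A higher PSNR means the image produced with the approximate multiplier is closer to the one produced with the exact multiplier.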

APPLICATIONS

Image Filtering:

We have investigated the quality-power trade-off of approximate multipliers in Gaussian smoothing image filtering, which is a typical error-resilient application.





Fig. 11 Partial filtering of image

LMS Filtering:

Adaptive Least-Mean-Square (LMS)

is the most popular adaptive filtering

algorithm, with applications ranging from

adaptive noise cancellation to system

identification and channel equalization. LMS

aims to minimize the mean squared error

(MSE) between the output of the adaptive

filter (typically a Finite Impulse Response

filter, FIR) and a desired signal.
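The LMS update can be illustrated with a short floating-point sketch of system identification. It departs from the test case of this work in several labelled ways: the unknown system is a hypothetical 4-tap FIR filter, exact floating-point multiplication is used instead of approximate fixed-point multipliers, and the step size `mu` is an arbitrary choice for this example.

```python
import random


def lms_identify(unknown, taps, mu, iterations, seed=0):
    """Adapt an FIR filter with LMS so that its output tracks the unknown system."""
    rng = random.Random(seed)
    weights = [0.0] * taps
    history = [0.0] * max(taps, len(unknown))    # most recent sample first
    for _ in range(iterations):
        # White Gaussian input, zero mean, 0.3 standard deviation, clipped to [-1, 1].
        x = max(-1.0, min(1.0, rng.gauss(0.0, 0.3)))
        history = [x] + history[:-1]
        desired = sum(h * xi for h, xi in zip(unknown, history))
        output = sum(w * xi for w, xi in zip(weights, history))
        error = desired - output
        # LMS update: step along the negative gradient of the squared error.
        weights = [w + mu * error * xi for w, xi in zip(weights, history)]
    return weights


unknown_system = [0.5, -0.3, 0.2, 0.1]           # hypothetical filter to identify
estimate = lms_identify(unknown_system, taps=4, mu=0.2, iterations=5000)
print(estimate)
```

Every tap update costs one multiplication per sample, which is why the multiplier dominates the power of an adaptive filter with many taps.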

Fig. 11 LMS system identification

In our test case, the filter to be identified is an IIR Butterworth low-pass filter with a passband gain of 0 dB (from zero to 0.2π rad/sample) and a stopband attenuation of −60 dB at 0.3125π rad/sample. The LMS FIR filter uses 125 taps and is implemented using a 16-bit fixed-point representation (only 16 × 16 signed multipliers are employed).

Fig. 12 Adaptive filter variations

The convergence of the algorithm has been evaluated through 100 simulations, each with a duration of $5 \times 10^4$ iterations, using as input x(n) a white Gaussian signal with zero mean, a standard deviation of 0.3 and a range of (−1, 1). The adaptive filters have been synthesized

by imposing a maximum clock frequency of

290 MHz. Mean error compensation has been

applied at system level for all the investigated

approximate circuits.







FUTURE ENHANCEMENT:

In the third step, the final sums and carries are added to generate the result. A modified Booth multiplier should focus on the following points. The first is reducing the total number of partial products generated; this may involve coding methods or a reduction in the computational complexity of generating the partial products.

Fig 13. Images sharpened using the

proposed multipliers.

A large amount of delay is consumed in finding the two’s complement of the multiplicand, so this delay should be reduced.

The adder structure should be optimized. Once the partial products are generated, they need to be sorted and added in a systematic manner that consumes less delay.

Fig.14 Images multiplied by different

multipliers.

This low-power, fast and area-efficient multiplier can be used for FIR filter and MAC design as an extension to this paper. Hardware implementations of the approximate multiplier, including one for unsigned and two for signed operations, can be developed. The design can also be downloaded to an FPGA for further improvement and observation.





SNAP SHOT AS BINARY VALUE:

Fig 15. Output as binary value

SNAP SHOT AS UNSIGNED VALUES:

Fig 16. Output as unsigned values.

LITERATURE REVIEW:

[1] IMPACT: IMPrecise adders for low-

power Approximate CompuTing:

Low-power is an imperative requirement for

portable multimedia devices employing

various signal processing algorithms and

architectures. In most multimedia applications,

the final output is interpreted by human senses,

which are not perfect. This fact obviates the

need to produce exactly correct numerical

outputs. Previous research in this context

exploits error-resiliency primarily through

voltage overscaling, utilizing algorithmic and

architectural techniques to mitigate the

resulting errors. In this paper, we propose

logic complexity reduction as an alternative

approach to take advantage of the relaxation of

numerical accuracy.

[2] Approximate Computing: An

Emerging Paradigm For Energy-Efficient

Design:

Approximate computing has recently emerged

as a promising approach to energy-efficient

design of digital systems. Approximate

computing relies on the ability of many

systems and applications to tolerate some loss

of quality or optimality in the computed result.

By relaxing the need for fully precise or

completely deterministic operations,

approximate computing techniques allow

substantially improved energy efficiency. This

paper reviews recent progress in the area,

including design of approximate arithmetic

blocks, pertinent error and quality measures,

and algorithm-level techniques for

approximate computing.

[3] Bio-Inspired Imprecise Computational

Blocks for Efficient VLSI Implementation of

Soft-Computing Applications:

The conventional digital hardware

computational blocks with different structures

are designed to compute the precise results of

the assigned calculations. The main

contribution of our proposed Bio-inspired

Imprecise Computational blocks (BICs) is that

they are designed to provide an applicable

estimation of the result instead of its precise

value at a lower cost.

[4] Variable Latency Speculative

Addition: A New Paradigm for Arithmetic

Circuit Design:





Adders are one of the key components in

arithmetic circuits. Enhancing their

performance can significantly improve the

quality of arithmetic designs. This is the

reason why the theoretical lower bounds on

the delay and area of an adder have been

analysed, and circuits with performance close

to these bounds have been designed.

CONCLUSION

This paper proposes a high-

performance and low-power approximate

partial product accumulation tree for a

multiplier using a newly designed approximate

adder. The proposed approximate adder

ignores the carry propagation by generating

both an approximate sum and an error vector.

OR gate and approximate adder based error

reduction schemes are utilized, yielding two

different approximate 8 × 8 multiplier designs:

AM1 and AM2. Moreover, the error reduction schemes are modified for 16 × 16 multiplier designs, and TAM1 and TAM2 are obtained by truncating 16 LSBs of the partial products. The proposed

approximate multipliers have been shown to

have a lower power dissipation than an exact

Wallace multiplier optimized for speed.

REFERENCES:

[1] J. Han and M. Orshansky, “Approximate

computing: An emerging paradigm for energy-

efficient design,” in Proc. 18th IEEE Eur. Test

Symp., May 2013, pp. 1–6.

[2] S.-L. Lu, “Speeding up processing with

approximation circuits,” Computer, vol. 37,

no. 3, pp. 67–73, Mar. 2004.

[3] A. K. Verma, P. Brisk, and P. Ienne,

“Variable latency speculative addition: A new

paradigm for arithmetic circuit design,” in

Proc. Design, Automat. Test Eur., Mar. 2008,

pp. 1250–1255.

[4] N. Zhu, W. L. Goh, and K. S. Yeo, “An

enhanced low-power high-speed adder for

error-tolerant application,” in Proc. 12th Int.

Symp. Integr. Circuits, Dec. 2009, pp. 69–72.

[5] H. R. Mahdiani, A. Ahmadi, S. M.

Fakhraie, and C. Lucas, “Bio-inspired

imprecise computational blocks for efficient

VLSI implementation of soft-computing

applications,” IEEE Trans. Circuits Syst. I,

Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr.

2010.

[6] V. Gupta, D. Mohapatra, S. P. Park, A.

Raghunathan, and K. Roy, “IMPACT:

IMPrecise adders for low-power approximate

computing,” in Proc. IEEE/ACM Int. Symp.

Low Power Electron. Design, Aug. 2011, pp.

409–414.

[7] A. B. Kahng and S. Kang, “Accuracy-

configurable adder for approximate arithmetic

designs,” in Proc. Design Automat. Conf., Jun.

2012, pp. 820–825.

[8] K. Du, P. Varman, and K. Mohanram,

“High performance reliable variable latency

carry select addition,” in Proc. Design,

Automat. Test Eur. Conf. Exhib., Mar. 2012,

pp. 1257–1262.

[9] J. Liang, J. Han, and F. Lombardi, “New

metrics for the reliability of approximate and

probabilistic adders,” IEEE Trans. Comput.,

vol. 62, no. 9, pp. 1760–1771, Jun. 2012.

[10] J. Huang, J. Lach, and G. Robins, “A

methodology for energy-quality tradeoff using

imprecise hardware,” in Proc. Design

Automat. Conf., Jun. 2012, pp. 504–509.





[11] J. Miao, K. He, A. Gerstlauer, and M.

Orshansky, “Modeling and synthesis of

quality-energy optimal approximate adders,”

in Proc. IEEE/ACM Int. Conf. Comput.-Aided

Design, Nov. 2012, pp. 728–735.

[12] R. Venkatesan, A. Agarwal, K. Roy, and

A. Raghunathan, “MACACO: Modeling and

analysis of circuits for approximate

computing,” in Proc. IEEE/ACM Int. Conf.

Comput.-Aided Design, Nov. 2011, pp. 667–

673.

[13] H. Jiang, C. Liu, L. Liu, F. Lombardi, and

J. Han, “A review, classification, and

comparative evaluation of approximate

arithmetic circuits,” ACM J. Emerg. Technol.

Comput. Syst., vol. 13, no. 4, 2017, Art. no.

60.

[14] P. Kulkarni, P. Gupta, and M. D.

Ercegovac, “Trading accuracy for power in a

multiplier architecture,” J. Low Power

Electron., vol. 7, no. 4, pp. 490–501, 2011.


