HIGH SPEED APPROXIMATE MULTIPLIER WITH CONFIGURABLE ERROR
RECOVERY
R. SOWNDARYA1, S. SIVAPRAKASAM2
1PG Scholar, 2Assistant Professor
Department of Electronics and Communication Engineering1,2
Narasu’s Sarathy Institute of Technology, India1,2
ABSTRACT
Approximate circuits have been
considered for applications that can tolerate
some loss of accuracy with improved
performance and/or energy efficiency.
Multipliers are key arithmetic circuits in
many of these applications including digital
signal processing (DSP). In this paper, a
novel approximate multiplier with low
power consumption and a short critical
path is proposed for high-performance DSP
applications. Two error reduction strategies
are considered, one based on OR gates and
one based on the proposed approximate
adder; the resulting approximate multipliers
are referred to as AM1 and AM2,
respectively. Both AM1 and AM2 have a
low mean error distance, i.e., most of the
errors are not significant in magnitude.
Compared with a Wallace multiplier
optimized for speed, an 8×8 AM1 using four
most significant bits for error reduction
shows a 60% reduction in delay (when
optimized for delay) and a 42% reduction
in power dissipation (when optimized for
area). Compared with the Wallace
multiplier, the truncated 16 × 16 designs
TAM1 and TAM2 save from 50% to 66% in
power when optimized for area. Compared
with existing approximate
multipliers, AM1, AM2, TAM1, and TAM2
show significant advantages in accuracy
with a low power-delay product. AM2 has a
better accuracy compared with AM1 but
with a longer delay and higher power
consumption. Image processing
applications, including image sharpening
and smoothing, are considered to show the
quality of the approximate multipliers in
error-tolerant applications. By utilizing an
appropriate error recovery scheme, the
proposed approximate multipliers achieve a
processing accuracy similar to that of exact
multipliers, but with significant
improvements in power.
Keywords: Approximate multiplier,
partial product, approximate adder, error
recovery, low-power, image processing,
digital systems.
OBJECTIVE:
The main aim of our project is to
apply approximate computing, which has
emerged as a potential solution for the
design of energy-efficient digital systems,
and to minimize the working time of the
computing process.
Applications such as multimedia, recognition
and data mining are inherently error-tolerant
and do not require a perfect accuracy in
computation. For digital signal processing
(DSP) applications, the result is often left to
interpretation by human perception. Therefore,
strict exactness may not be required and an
imprecise result may suffice due to the
limitation of human perception. For these
applications, approximate circuits play an
important role as a promising alternative for
reducing area, power and delay, thereby
achieving better performance in energy
Journal of Scientific Computing
Volume 8 Issue 12 2019
ISSN NO: 1524-2560
http://jscglobal.org/57
efficiency. A multiplier usually consists of
three stages: partial product generation, partial
product accumulation and a carry propagation
adder (CPA) at the final stage. In the
underdesigned multiplier (UDM), approximate
partial products are computed using inaccurate
2 × 2 multiplier blocks, while accurate adders
are used in an adder tree to accumulate the
approximate partial products. In another
design, approximate 4 × 4 and 8 × 8 bit
Wallace multipliers are built using a carry-in
prediction method and are then used to
construct approximate 16 × 16 Wallace
multipliers, referred to as AWTM. The AWTM
can be configured into four different modes by
using a different number of approximate 4 × 4
and 8 × 8 multipliers.
EXISTING SYSTEM
Generally, a multiplier consists of
stages of partial product generation,
accumulation and final addition. The
commonly used partial product accumulation
structures include the Wallace tree, the Dadda
tree and the carry-save adder array. In a
Wallace tree, on the order of log2(n) layers are
required for an n-bit multiplier. The adders in
each layer operate in
parallel without carry propagation, and the
same operation repeats until two rows of
partial products remain. Therefore, the delay
of the partial product accumulation stage is
O(log2(n)). Moreover, the adders in a Wallace
tree can be considered as 3:2 compressors and
can be replaced by other counters or
compressors (e.g. a 4:2 compressor) to further
reduce the delay. The Dadda tree has a
structure similar to that of the Wallace tree,
but it uses as few adders as possible. For a
carry-save adder array, the carry and sum
signals generated by the adders in a row are
connected to the adders in the next row, so
the critical path grows linearly with n and is
longer than in a tree. However, an array
requires a smaller area and thus a lower power
dissipation due to its simple and symmetric
structure. Three methodologies are applicable
to approximate a multiplier: i) approximation
in generating the partial products, ii)
approximation (including truncation) in the
partial product tree and iii) using approximate
designs of adders, counters and/or compressors
to accumulate the partial products. Following
this classification, existing designs of
approximate multipliers are briefly reviewed
next.
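The layered carry-save reduction described above for the Wallace tree can be illustrated with a short behavioural sketch in Python. It is not the authors' circuit: the function names (partial_products, compress_3_2, wallace_multiply) and the grouping of rows in threes are assumptions made for illustration, and whole rows are modelled as integers rather than as individual full adders.

```python
def partial_products(x, y, n):
    """Row i is x shifted left by i when bit i of y is 1, otherwise 0."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(n)]


def compress_3_2(a, b, c):
    """Bitwise 3:2 compression (carry-save addition): a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_multiply(x, y, n):
    """Reduce the n partial product rows layer by layer until two remain.

    Returns (product, number_of_reduction_layers).
    """
    rows = partial_products(x, y, n)
    layers = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete groups of three
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])               # leftover rows pass through unchanged
        rows = nxt
        layers += 1
    return sum(rows), layers                  # the final sum plays the role of the CPA


product8, layers8 = wallace_multiply(200, 123, 8)
product16, layers16 = wallace_multiply(65535, 65535, 16)
```

With this grouping an 8 × 8 multiplication takes four reduction layers and a 16 × 16 multiplication takes six, which is the logarithmic growth referred to in the text.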
Drawbacks of Existing System:
Longer delay
High power consumption
Low accuracy
Only luminance multiplication is
achieved (grayscale)
Errors in the final product.
PROPOSED SYSTEM:
The proposed multiplier can be
configured into two designs by using either OR
gates or the proposed approximate adders for
error reduction, referred to as approximate
multiplier 1 (AM1) and approximate multiplier
2 (AM2), respectively. Different levels of error
recovery can also be achieved by using a
different number of MSBs for error recovery
in both AM1 and AM2. Compared to the
traditional Wallace tree, the proposed
multipliers have significantly shorter critical
paths. Functional and circuit simulations are
performed to evaluate the performance of the
multipliers. Image sharpening and smoothing
are considered as approximate multiplication
based DSP applications. Experimental results
indicate that the proposed approximate
multipliers perform well in these error tolerant
image processing applications. The proposed
designs can be used as effective library cells
for the synthesis of approximate circuits. In the
proposed approximate multiplier, a simple tree
of the approximate adders is used for partial
product accumulation, and the error signals are
used to compensate the error and obtain a
better accuracy.
Main features of proposed system
Low delay and power
consumption
High accuracy
Multiplication of both luminance and
chrominance is achieved
A. Approximation in generating partial
products:
The underdesigned multiplier (UDM)
utilizes an approximate 2 × 2 bit multiplier
block obtained by altering a single entry in the
Karnaugh Map (K-Map) of its function. In this
approximation, the accurate result “1001” for
the multiplication of “11” and “11” is
simplified to “111” to save one output bit.
Assuming the value of each input bit is equally
likely, the error rate of the 2 × 2 bit multiplier
block is (1/2)^4 = 1/16.
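The 1/16 error rate can be checked by enumerating all 16 input pairs. The sketch below is a behavioural model only; the gate-level equations (an OR gate in place of the exact XOR for the middle bit, and no fourth output bit) are an assumed realisation that reproduces the "11" × "11" = "111" behaviour described above, and the names udm_2x2 and error_cases are illustrative.

```python
def udm_2x2(a, b):
    """Approximate 2 x 2 multiplier with a 3-bit output."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0
    p1 = (a1 & b0) | (a0 & b1)   # OR replaces the exact XOR (assumed realisation)
    p2 = a1 & b1                 # the exact circuit would also produce a bit p3
    return p0 | (p1 << 1) | (p2 << 2)


def error_cases():
    """All input pairs for which the approximate product differs from a * b."""
    return [(a, b) for a in range(4) for b in range(4) if udm_2x2(a, b) != a * b]


cases = error_cases()
error_rate = len(cases) / 16
```

Only the pair (3, 3) is affected, so one input combination out of 16 gives an incorrect product.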
B. Approximation in the partial product
tree:
In the error tolerant multiplier (ETM),
the operands are split into a multiplication
section for the MSBs and a non-multiplication
section for the LSBs. A NOR gate based
control block is used to deal with two cases:
i) if the product of the MSBs is zero, then the
multiplication section is activated to multiply
the LSBs without any approximation, and ii) if
the product of the MSBs is nonzero, the
non-multiplication section is used as an
approximate multiplier to process the LSBs,
while the multiplication section is activated to
multiply the MSBs.
C. Approximation using counters or compressors:
The proposed compressor halves the
height of the partial product tree and generates
a vector to recover accuracy. Furthermore,
with 0.28% normalized mean error distance,
the silicon area required to implement the
multiplier is reduced by 50.1%.
D. Automated partial product calculation:
Unlike the ETM, the static segment
multiplier (SSM) applies no approximation to
the LSBs. Either the MSBs or the LSBs of each
operand are accurately multiplied, depending
on whether the MSBs of that operand are all
zeros. A power and area-efficient approximate
Wallace tree multiplier (AWTM) is based on
bit-width aware approximate multiplication
and a carry-in prediction method.
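The segment selection used by the SSM can be sketched as follows. This is a simplified behavioural model under stated assumptions: each n-bit operand is split into two equal halves, the function names ssm_segment and ssm_multiply are hypothetical, and the segment width of the cited design may differ.

```python
def ssm_segment(v, n):
    """Pick the half of an n-bit operand that is multiplied exactly.

    Returns (segment, shift): the low half if the high half is all zeros,
    otherwise the high half together with the shift that restores its weight.
    """
    half = n // 2
    high = v >> half
    if high == 0:
        return v, 0          # MSBs are all zero, so v is already the low half
    return high, half        # LSBs are dropped, MSBs are kept


def ssm_multiply(x, y, n=8):
    """Multiply the two selected segments and shift the result back into place."""
    sx, shift_x = ssm_segment(x, n)
    sy, shift_y = ssm_segment(y, n)
    return (sx * sy) << (shift_x + shift_y)


small = ssm_multiply(5, 7)          # both high halves are zero
mixed = ssm_multiply(0xAB, 0x05)    # high half of the first operand is non-zero
```

When both operands fit in the low half the product is exact; otherwise the dropped LSBs make the result an underestimate of the exact product.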
BLOCK DIAGRAM:
Fig 1. Block diagram of approximate
unsigned multiplier
ERROR REDUCTION:
The approximate adder generates two
signals: the approximate sum S and the error
E; the use of the error signal is considered next
to reduce the inaccuracy of the multiplier. Two
steps are required to reduce errors: i) error
accumulation and ii) error recovery by the
addition of the accumulated errors to the adder
tree output using a CPA.
A. Error Accumulation for Approximate
Multiplier 1
As shown in the figure, each approximate
adder Ai generates a sum vector Si and an error
vector Ei, where i = 1, 2, ..., 7. If the error
signals are added using accurate adders, the
accumulated error can fully compensate the
inaccurate product; however, to reduce
complexity, an approximate error
accumulation is introduced: an OR gate
is used to approximately compute the sum of
the errors for a single bit.
Fig 2. Symbols for (a) an OR gate, (b) a
full adder or a half adder and (c) an
approximate adder cell
To reduce errors, an accumulated error
vector is added to the adder tree output using a
conventional CPA (e.g. a carry lookahead
adder). However, only several (e.g. k) MSBs
of the error signals are used to compensate the
outputs to further reduce the overall
complexity. The number of MSBs is selected
according to the extent that errors must be
compensated. For example in an 8 × 8 adder
tree, there are a total of 7 error vectors,
generated by the 7 approximate adders in the
tree.
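The two steps, OR-gate error accumulation followed by recovery with the k MSBs of the accumulated error, can be sketched behaviourally for an n × n multiplier. Several things here are assumptions rather than the authors' implementation: the adder cell is modelled as Si = (Ai XOR Bi) OR Ai−1Bi−1 and Ei = (Ai XOR Bi) AND Ai−1Bi−1, the k MSBs are taken from a 2n-bit error word, n is a power of two, and the names approx_add and am1_multiply are illustrative.

```python
def approx_add(a, b):
    """Approximate adder on whole rows: returns (sum vector S, error vector E)."""
    x = a ^ b            # A_i xor B_i
    g = (a & b) << 1     # A_{i-1} B_{i-1}, aligned with bit i
    return x | g, x & g


def am1_multiply(x, y, n=8, k=4):
    """n x n multiplication with OR-accumulated errors and k-MSB recovery."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    err = 0
    while len(rows) > 1:                 # binary tree of n - 1 approximate adders
        nxt = []
        for i in range(0, len(rows), 2):
            s, e = approx_add(rows[i], rows[i + 1])
            nxt.append(s)
            err |= e                     # OR gates approximate the sum of the errors
        rows = nxt
    width = 2 * n
    mask = ((1 << k) - 1) << (width - k) # keep only the k most significant bits
    return rows[0] + (err & mask)        # final addition stands in for the CPA


no_recovery = am1_multiply(183, 229, k=0)
four_msbs = am1_multiply(183, 229, k=4)
all_bits = am1_multiply(183, 229, k=16)
```

In this model a larger k never increases the error: the result moves toward the exact product but never exceeds it, because the OR of the error vectors cannot be larger than their arithmetic sum.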
Fig 3: Error accumulation tree for AM1.
B. Error Accumulation for Approximate
Multiplier 2
The error accumulation scheme for
AM2 is shown in Fig. 4. To introduce the
design of AM2, an 8 × 8 multiplier with two
inputs X and Y is considered. For example,
consider the first two partial product vectors
X0Y7, X0Y6, ..., X0Y0 and X1Y7, X1Y6, ...,
X1Y0 accumulated by the first approximate
adder (A1 in Fig. 1), where Xi and Yi are the
i-th least significant bits of X and Y,
respectively.
After applying the OR gates to
accumulate E1 and E2 as well as E3 and E4,
the four error vectors are compressed into two.
E5, E6 and E7 are generated from
the approximate sums of the partial products
rather than from the partial products
themselves. Therefore, they cannot be
accurately accumulated by OR gates.
Fig 4. Error accumulation tree for AM2.
Another interesting feature of the
proposed approximate adder is as follows.
Assume Ei = 1 in (6), then Ai−1 = Bi−1 = 1 and
Ai ≠ Bi. Since Ai−1 = Bi−1 = 1, i.e., Ai−1 ⊕
Bi−1 = 0, it is easy to show that Ei−1 = 0.
Moreover, as Ai ≠ Bi, i.e., AiBi = 0, then
Ei+1 = 0. Thus, once there is an error in one
bit, its neighboring bits are error free, i.e.,
there are no consecutive error bits in one row.
Simulation results (found in later sections)
show that the modified error accumulation
outperforms the OR-gate error accumulation
with little overhead on delay and power.
Hereafter, the proposed n × n approximate
multiplier with k-MSB OR-gate based error
reduction is referred to as an n/k AM1, while
an n × n approximate multiplier with k-MSB
approximate adder based error reduction is
referred to as an n/k AM2.
ROLE OF APPROXIMATE UNSIGNED
MULTIPLIER:
Proposed System:
To develop image
multiplication that covers both
luminance and chrominance
(Y Cb Cr).
To develop the AM1 and AM2
multipliers and compare them in
terms of area, delay and power.
The Approximate Adder:
In this section, the design of a new
approximate adder is presented. This adder
operates on a set of pre-processed inputs. The
input pre-processing (IPP) is based on the
interchangeability of bits with the same
weights in different addends. If Ȧi and Ḃi are the
i-th bits in the pre-processed inputs, the IPP
functions are given by:
Ȧi = Ai + Bi ……(1)
Ḃi = AiBi ……(2)
TABLE 1. Truth table of an approximate
adder cell. “X” represents that no such
combination occurs due to the IPP.
By replacing Ȧi and Ḃi using (1) and
(2) respectively, the logic functions with
respect to the original inputs are given
by:
Si = (Ai ⊕ Bi) + Ai−1Bi−1 ……(5)
Ei = (Ai ⊕ Bi)Ai−1Bi−1 ……(6)
where + denotes logical OR and i is the bit
index, i.e., i = 0, 1, ..., n for an n-bit adder.
Let A−1 = B−1 = 0 when i is 0; thus,
S0 = A0 ⊕ B0 and E0 = 0.
Also, Ei = 0 when Ai−1 or Bi−1 is 0.
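The behaviour of the adder can be checked with a small bit-parallel model. The sketch assumes that each cell computes Si = (Ai XOR Bi) OR Ai−1Bi−1 and Ei = (Ai XOR Bi) AND Ai−1Bi−1, which is consistent with the properties stated in this section (S0 = A0 ⊕ B0, E0 = 0, and Ei = 0 whenever Ai−1 or Bi−1 is 0); the function name approx_add is illustrative.

```python
def approx_add(a, b):
    """Return (S, E) for two non-negative integers treated as bit vectors.

    Each sum bit only looks at the neighbouring lower position, so no carry
    propagates through the adder.
    """
    x = a ^ b            # A_i xor B_i
    g = (a & b) << 1     # A_{i-1} B_{i-1}, aligned with bit i
    s = x | g            # approximate sum bits
    e = x & g            # error bits: positions where a carry was dropped
    return s, e


s, e = approx_add(0b0110, 0b0011)   # exact sum is 9
```

Under this assumption the exact sum is always recovered as S + E, and E never contains two adjacent 1s, which matches the observation that there are no consecutive error bits in one row.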
Fig 7. Delay, power and area comparisons
of the proposed 16 × 16 approximate
multipliers and the optimized Wallace
multipliers.
16 × 16 Approximate Multipliers
In both AM1 and AM2, all the error
vectors are compressed to one error vector,
which is then added back to the approximate
output of the partial product tree. Compared to
8 × 8 designs, 16 × 16 multipliers generate
more error vectors, and too much information
would be ignored if the same error reduction
strategies are used. That is, using only one
compressed error vector does not make a good
estimation of the overall error.
Fig 8. Accuracy comparison of the
approximate 16× 16 multipliers vs. the
number of bits used for error reduction.
In present-day VLSI technology,
errors cannot be avoided and it is not always
feasible to overcome all errors. An attempt to
get rid of all errors results in excessive power
consumption and also slows down the
system.
Fig.9 (a) An exact full adder and (b) the
approximate adder cell.
IMPLEMENTATION RESULTS:
Electrical Performance:
All the investigated circuits have been
described in an HDL and synthesized in
TSMC 40 nm technology, imposing proper
timing constraints. In all the investigated
circuits, the final carry-propagate adder (CPA)
was implemented by using a fast parallel-
prefix topology. To make the comparison
meaningful, area and power are obtained by
synthesizing all the circuits with the same
timing constraint, equal to the delay of the
exact multiplier. For all the approximate
multipliers we have also calculated the
variations with respect to the exact multiplier
(reported as a percentage, where a negative
percentage means improvement, while a
positive one means worsening).
Error Performance:
The error metrics considered in the following
for the approximate multipliers are:
The probability of an incorrect result,
Error Rate (ER).
The average value of the errors
produced by the multiplier, Mean
Error (ME).
The root mean square of the errors
produced by the multiplier (ERMS).
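The three metrics can be computed by exhaustive simulation over all input pairs. The sketch below is illustrative: the error is taken as the approximate product minus the exact product (an assumed sign convention), all inputs are assumed equally likely, and demo_2x2 is a small stand-in multiplier that returns 7 instead of 9 for 3 × 3.

```python
import math


def error_metrics(approx_mul, n):
    """Return (ER, ME, ERMS) of an n x n unsigned approximate multiplier."""
    errors = [approx_mul(a, b) - a * b
              for a in range(1 << n) for b in range(1 << n)]
    total = len(errors)
    er = sum(1 for e in errors if e != 0) / total        # probability of a wrong result
    me = sum(errors) / total                             # mean error
    erms = math.sqrt(sum(e * e for e in errors) / total) # root mean square error
    return er, me, erms


def demo_2x2(a, b):
    """Stand-in approximate multiplier used only to exercise the metrics."""
    return 7 if (a, b) == (3, 3) else a * b


er, me, erms = error_metrics(demo_2x2, 2)
```

For the stand-in multiplier a single pair out of 16 is wrong by −2, giving ER = 1/16, ME = −0.125 and ERMS = 0.5.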
SOFTWARE REQUIREMENT:
Verification Tool
Modelsim 6.4c
Synthesis Tool
Xilinx ISE 13.2
Software Description:
Modelsim:
ModelSim is a hardware simulation
and debug environment primarily targeted at
smaller ASIC and FPGA designs.
ModelSim combines simulation performance
and capacity with the code coverage and
debugging capabilities required to simulate
multiple blocks and systems and attain ASIC
gate-level sign-off. Comprehensive support of
Verilog, SystemVerilog for Design, VHDL,
and SystemC provides a solid foundation for
single and multi-language design verification
environments. ModelSim’s easy-to-use,
unified debug and simulation environment
provides today’s FPGA designers with both the
advanced capabilities that they are growing to
need and the environment that makes their
work productive. ModelSim is a verification
and simulation tool for VHDL, Verilog,
SystemVerilog, and mixed language designs.
Project Flow:
Fig.10 Project Flow
A project is a collection mechanism
for an HDL design under specification or test.
Even though you don’t have to use projects in
ModelSim, they may ease interaction with the
tool and are useful for organizing files and
specifying simulation settings.
TABLE. 2 PSNR of image processing
applications for AM1 and AM2.
Functional analysis has shown that on a
statistical basis, the proposed multipliers have
very small error distances and thus, they
achieve a high accuracy. Simulation has also
shown that AM2 has a higher accuracy than
AM1 at the cost of a longer delay and a higher
power consumption.
APPLICATIONS
Image Filtering:
We have investigated the quality-power
trade-off of approximate multipliers in
Gaussian smoothing image filtering, which is a
typical error-resilient application.
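A minimal sketch of how such a quality evaluation could be set up is given below. Everything in it is an assumption for illustration: the 3 × 3 integer kernel, the synthetic 8 × 8 test image, and in particular standin_approx_mul, which simply drops the two LSBs of the pixel and is not AM1 or AM2. The point is only to show a smoothing filter with a pluggable multiplier and a PSNR comparison against the exact result.

```python
import math

KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # integer Gaussian weights, sum = 16


def smooth(img, mul):
    """3 x 3 smoothing of a 2-D list of 8-bit pixels; borders are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += mul(img[r + dr - 1][c + dc - 1], KERNEL[dr][dc])
            out[r][c] = acc >> 4             # divide by the kernel sum
    return out


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2
              for ra, rb in zip(ref, test) for a, b in zip(ra, rb)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def exact_mul(a, b):
    return a * b


def standin_approx_mul(a, b):
    """Hypothetical approximate multiplier: ignores the two LSBs of the pixel."""
    return (a & ~3) * b


image = [[(17 * r + 31 * c) % 256 for c in range(8)] for r in range(8)]
reference = smooth(image, exact_mul)
approximate = smooth(image, standin_approx_mul)
quality_db = psnr(reference, approximate)
```

Replacing standin_approx_mul with a model of the multiplier under study gives the PSNR of the approximate filter relative to the exact one.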
Fig. 11 Partial filtering of image
LMS Filtering:
Adaptive Least-Mean-Square (LMS)
is the most popular adaptive filtering
algorithm, with applications ranging from
adaptive noise cancellation to system
identification and channel equalization. LMS
aims to minimize the mean squared error
(MSE) between the output of the adaptive
filter (typically a Finite Impulse Response
filter, FIR) and a desired signal.
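The LMS update can be sketched in a few lines for a system identification setup. This is a simplified floating-point illustration, not the fixed-point design evaluated here: the unknown system is a short hypothetical FIR filter instead of the IIR Butterworth filter, the step size mu and the seed are arbitrary choices, and the function name lms_identify is illustrative.

```python
import random


def lms_identify(unknown, mu=0.1, iterations=5000, seed=1):
    """Adapt FIR weights so the filter output tracks the unknown system output."""
    rng = random.Random(seed)
    taps = len(unknown)
    w = [0.0] * taps                 # adaptive filter weights
    x_hist = [0.0] * taps            # most recent input samples, newest first
    for _ in range(iterations):
        # white Gaussian input, zero mean, 0.3 standard deviation, clipped to [-1, 1]
        x = max(-1.0, min(1.0, rng.gauss(0.0, 0.3)))
        x_hist = [x] + x_hist[:-1]
        d = sum(h * v for h, v in zip(unknown, x_hist))    # desired signal
        y = sum(wi * v for wi, v in zip(w, x_hist))        # adaptive filter output
        e = d - y                                          # instantaneous error
        w = [wi + mu * e * v for wi, v in zip(w, x_hist)]  # LMS weight update
    return w


target = [0.5, -0.3, 0.2]
weights = lms_identify(target)
```

Every product in the filter output and in the weight update is a multiplication, which is where an approximate multiplier would be substituted in a hardware implementation.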
Fig. 11 LMS system identification
In our test case, the filter to be
identified is an IIR Butterworth low-pass
filter with a passband gain of 0 dB (from
zero to 0.2π rad/sample) and a stopband
attenuation of −60 dB at 0.3125π rad/sample.
The LMS FIR filter uses 125 taps and is
implemented using a 16-bit fixed-point
representation (only 16 × 16 signed multipliers
are employed).
Fig. 12 Adaptive filter variations
The convergence of the algorithm has
been evaluated through 100 simulations, each
one with a duration of 5×10^4 iterations, using
as input x(n) a white Gaussian signal with zero
mean, a standard deviation of 0.3 and a range
of (−1, 1). The adaptive filters have been
synthesized
by imposing a maximum clock frequency of
290 MHz. Mean error compensation has been
applied at system level for all the investigated
approximate circuits.
FUTURE ENHANCEMENT:
In the third step, the final sums and
carries are added to generate the result. A
modified Booth multiplier should focus on the
following points:
Reducing the total number of partial
products generated. This may involve coding
methods or a reduction in the computational
complexity of partial product generation.
Fig 13. Images sharpened using the
proposed multipliers.
A large amount of delay is spent
finding the two’s complement of the
multiplicand, so this delay should be
reduced.
The adder structure should be optimized.
Once the partial products are generated,
they need to be grouped and added in a
systematic manner that consumes less
delay.
Fig.14 Images multiplied by different
multipliers.
This low-power, fast and area-efficient
multiplier can be used for FIR filter and
MAC designs as an extension to this paper.
Hardware implementations of the approximate
multiplier, including one for unsigned and
two for signed operations, can be developed.
The design can also be downloaded onto an
FPGA for further improvements and
observations.
SNAP SHOT AS BINARY VALUE:
Fig 15. Output as binary value
SNAP SHOT AS UNSIGNED VALUES:
Fig 16.Output as unsigned values.
LITERATURE REVIEW:
[1] IMPACT: IMPrecise adders for low-
power Approximate CompuTing:
Low-power is an imperative requirement for
portable multimedia devices employing
various signal processing algorithms and
architectures. In most multimedia applications,
the final output is interpreted by human senses,
which are not perfect. This fact obviates the
need to produce exactly correct numerical
outputs. Previous research in this context
exploits error-resiliency primarily through
voltage overscaling, utilizing algorithmic and
architectural techniques to mitigate the
resulting errors. In this paper, we propose
logic complexity reduction as an alternative
approach to take advantage of the relaxation of
numerical accuracy.
[2] Approximate Computing: An
Emerging Paradigm For Energy-Efficient
Design:
Approximate computing has recently emerged
as a promising approach to energy-efficient
design of digital systems. Approximate
computing relies on the ability of many
systems and applications to tolerate some loss
of quality or optimality in the computed result.
By relaxing the need for fully precise or
completely deterministic operations,
approximate computing techniques allow
substantially improved energy efficiency. This
paper reviews recent progress in the area,
including design of approximate arithmetic
blocks, pertinent error and quality measures,
and algorithm-level techniques for
approximate computing.
[3] Bio-Inspired Imprecise Computational
Blocks for Efficient VLSI Implementation of
Soft-Computing Applications:
The conventional digital hardware
computational blocks with different structures
are designed to compute the precise results of
the assigned calculations. The main
contribution of our proposed Bio-inspired
Imprecise Computational blocks (BICs) is that
they are designed to provide an applicable
estimation of the result instead of its precise
value at a lower cost.
[4] Variable Latency Speculative
Addition: A New Paradigm for Arithmetic
Circuit Design:
Adders are one of the key components in
arithmetic circuits. Enhancing their
performance can significantly improve the
quality of arithmetic designs. This is the
reason why the theoretical lower bounds on
the delay and area of an adder have been
analysed, and circuits with performance close
to these bounds have been designed.
CONCLUSION
This paper proposes a high-
performance and low-power approximate
partial product accumulation tree for a
multiplier using a newly designed approximate
adder. The proposed approximate adder
ignores the carry propagation by generating
both an approximate sum and an error vector.
OR gate and approximate adder based error
reduction schemes are utilized, yielding two
different approximate 8 × 8 multiplier designs:
AM1 and AM2. Moreover, modifications are
made on the error reduction schemes for 16 ×
16 multiplier designs, such that TAM1 and
TAM2 are obtained by truncating 16 LSBs of
the partial products. The proposed
approximate multipliers have been shown to
have a lower power dissipation than an exact
Wallace multiplier optimized for speed.
REFERENCES:
[1] J. Han and M. Orshansky, “Approximate
computing: An emerging paradigm for energy-
efficient design,” in Proc. 18th IEEE Eur. Test
Symp., May 2013, pp. 1–6.
[2] S.-L. Lu, “Speeding up processing with
approximation circuits,” Computer, vol. 37,
no. 3, pp. 67–73, Mar. 2004.
[3] A. K. Verma, P. Brisk, and P. Ienne,
“Variable latency speculative addition: A new
paradigm for arithmetic circuit design,” in
Proc. Design, Automat. Test Eur., Mar. 2008,
pp. 1250–1255.
[4] N. Zhu, W. L. Goh, and K. S. Yeo, “An
enhanced low-power high-speed adder for
error-tolerant application,” in Proc. 12th Int.
Symp. Integr. Circuits, Dec. 2009, pp. 69–72.
[5] H. R. Mahdiani, A. Ahmadi, S. M.
Fakhraie, and C. Lucas, “Bio-inspired
imprecise computational blocks for efficient
VLSI implementation of soft-computing
applications,” IEEE Trans. Circuits Syst. I,
Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr.
2010.
[6] V. Gupta, D. Mohapatra, S. P. Park, A.
Raghunathan, and K. Roy, “IMPACT:
IMPrecise adders for low-power approximate
computing,” in Proc. IEEE/ACM Int. Symp.
Low Power Electron. Design, Aug. 2011, pp.
409–414.
[7] A. B. Kahng and S. Kang, “Accuracy-
configurable adder for approximate arithmetic
designs,” in Proc. Design Automat. Conf., Jun.
2012, pp. 820–825.
[8] K. Du, P. Varman, and K. Mohanram,
“High performance reliable variable latency
carry select addition,” in Proc. Design,
Automat. Test Eur. Conf. Exhib., Mar. 2012,
pp. 1257–1262.
[9] J. Liang, J. Han, and F. Lombardi, “New
metrics for the reliability of approximate and
probabilistic adders,” IEEE Trans. Comput.,
vol. 62, no. 9, pp. 1760–1771, Jun. 2012.
[10] J. Huang, J. Lach, and G. Robins, “A
methodology for energy-quality tradeoff using
imprecise hardware,” in Proc. Design
Automat. Conf., Jun. 2012, pp. 504–509.
[11] J. Miao, K. He, A. Gerstlauer, and M.
Orshansky, “Modeling and synthesis of
quality-energy optimal approximate adders,”
in Proc. IEEE/ACM Int. Conf. Comput.-Aided
Design, Nov. 2012, pp. 728–735.
[12] R. Venkatesan, A. Agarwal, K. Roy, and
A. Raghunathan, “MACACO: Modeling and
analysis of circuits for approximate
computing,” in Proc. IEEE/ACM Int. Conf.
Comput.-Aided Design, Nov. 2011, pp. 667–
673.
[13] H. Jiang, C. Liu, L. Liu, F. Lombardi, and
J. Han, “A review, classification, and
comparative evaluation of approximate
arithmetic circuits,” ACM J. Emerg. Technol.
Comput. Syst., vol. 13, no. 4, 2017, Art. no.
60.
[14] P. Kulkarni, P. Gupta, and M. D.
Ercegovac, “Trading accuracy for power in a
multiplier architecture,” J. Low Power
Electron., vol. 7, no. 4, pp. 490–501, 2011.