
2116 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 55, NO. 7, AUGUST 2008

Reducing Lookup-Table Size in Direct Digital Frequency Synthesizers Using Optimized Multipartite Table Method

Davide De Caro, Member, IEEE, Nicola Petra, Member, IEEE, and Antonio G. M. Strollo, Senior Member, IEEE

Abstract—The use of multipartite table methods (MTMs) to implement high-performance direct digital frequency synthesizers (DDFSs) is investigated in this paper. A closed-form expression for the spurious-free dynamic range (SFDR) is obtained when a single table of offsets (TO) is used in the multipartite approximation. In this case, the optimal design that minimizes the storage requirement for a given SFDR can be obtained analytically. A numerical algorithm is also presented to obtain the optimal design when two or more TOs are employed in the approximation. The VLSI implementation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of multipartite table methods for the realization of high-performance direct digital synthesizers.

Index Terms—CMOS digital integrated circuits, direct digital synthesis (DDS), direct digital frequency synthesizers (DDFS), frequency synthesis, multipartite table method (MTM), phase-to-sinusoid amplitude conversion, read-only memory (ROM) compression.

I. INTRODUCTION

MODERN digital communication systems require frequency synthesizers with fine frequency resolution, fast channel switching speed, and large bandwidth. These requirements are surpassing the capabilities of conventional analog phase-locked loops. Direct digital frequency synthesizers (DDFSs) are ideally suited for these demanding applications, being characterized by ultrahigh-precision frequency control, short tuning latency, fast frequency switching with phase continuity, and excellent stability [1]–[4].

The basic DDFS architecture was originally proposed by Tierney et al. [1] and is shown in Fig. 1. The phase accumulator generates instantaneous phase values, while the sine generator produces a digital sinewave signal. Analog output is obtained by using a digital-to-analog converter (DAC) followed by a low-pass reconstruction filter.

Manuscript received November 23, 2006; revised March 13, 2007 and June 19, 2007. First published February 7, 2008; last published August 13, 2008 (projected). This paper was recommended by Associate Editor J. R. (a.k.a. Rong-Jian) Chen.

The authors are with the Department of Electronics and Telecommunication Engineering, University of Naples, 80125 Naples, Italy (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCSI.2008.918008

Fig. 1. Simplified schematic of a DDFS. DAC and low-pass filter are included when analog output is needed.

The frequency $f_{out}$ of the generated sinewave is proportional to the frequency control word (FCW) and is given by

$$f_{out} = \frac{FCW}{2^{N}}\, f_{clk} \qquad (1)$$

where $f_{clk}$ is the clock frequency. By increasing the wordlength $N$ of the phase accumulator, the DDFS can achieve an excellent frequency resolution.
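As an illustration of relation (1), the following minimal sketch models the phase accumulator in software. The function and parameter names (`fcw`, `f_clk`, `n_bits`) are assumptions chosen for this example and do not come from the paper.

```python
def ddfs_output_frequency(fcw, f_clk, n_bits):
    """Output frequency of an N-bit phase accumulator: FCW * f_clk / 2**N."""
    return fcw * f_clk / (1 << n_bits)


def frequency_resolution(f_clk, n_bits):
    """Smallest frequency step, obtained for FCW = 1."""
    return f_clk / (1 << n_bits)


def accumulate_phase(fcw, n_bits, steps):
    """Successive phase words; the accumulator wraps modulo 2**N."""
    mask = (1 << n_bits) - 1
    phase = 0
    out = []
    for _ in range(steps):
        out.append(phase)
        phase = (phase + fcw) & mask  # overflow is simply discarded
    return out
```

With a 24-bit accumulator and a hypothetical 100-MHz clock, `frequency_resolution(100e6, 24)` evaluates to roughly 5.96 Hz, which shows how the resolution improves by a factor of two for every additional accumulator bit.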

The most critical block in a DDFS is the sine generator. In the simplest implementation, the output of the accumulator addresses a read-only memory (ROM). The ROM implements a lookup table (LUT) storing a $P$-bit digitized sine waveform. In this brute-force approach, a very large ROM is needed, with the total ROM size being $2^N \times P$ bits for an $N$-bit accumulator.

To reduce the ROM size without impairing frequency resolution, the phase value passed to the sine generator is normally truncated to $M$ bits. Phase truncation reduces the ROM size to $2^M \times P$ bits; however, it introduces spurious noise in the DDFS outputs [1]–[4], which should be carefully taken into account in the design phase.

Another well-known technique to reduce ROM size is to store the sine values only for angles in $[0, \pi/2)$. Sine values for the full range of input phase $[0, 2\pi)$ are generated by exploiting the quarter-wave symmetry of trigonometric functions and trigonometric identities. In this way, the ROM size can be reduced by a factor of four.
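The storage figures discussed so far can be compared with a few lines of arithmetic. The sketch below is only illustrative: the parameter names `n_acc`, `m_phase`, and `p_out` (accumulator bits, truncated phase bits, and output bits) are assumptions for this example, and the quarter-wave entry counts only the reduction in the number of stored words.

```python
def rom_bits(address_bits, word_bits):
    """Size in bits of a ROM with 2**address_bits words of word_bits each."""
    return (1 << address_bits) * word_bits


def sine_rom_sizes(n_acc, m_phase, p_out):
    """ROM size, in bits, for the three table layouts described in the text."""
    return {
        "brute_force": rom_bits(n_acc, p_out),         # addressed by all accumulator bits
        "phase_truncated": rom_bits(m_phase, p_out),   # addressed by the truncated phase
        "quarter_wave": rom_bits(m_phase - 2, p_out),  # one quadrant only
    }
```

For a 24-bit accumulator with hypothetical values of 12 phase bits and 10 output bits, the three sizes are 167 772 160, 40 960, and 10 240 bits, respectively.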

Even after performing phase truncation and exploiting quarter-wave symmetry, the size of the LUT is usually prohibitive. For this reason, several alternative approaches for the implementation of the sine generator have been proposed. A comprehensive review of phase-to-sinusoid amplitude conversion techniques has been recently published in [5]. In general, optimization of the sine generator architecture involves trading off numerical precision and the sine computation method against the sine wave spectral purity and maximum clock rate. The spurious-free dynamic range (SFDR), which is defined as the ratio between the amplitude of the wanted sinusoid and the amplitude of the largest undesired frequency component, is the parameter commonly used to characterize the DDFS spectral purity.
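The SFDR definition given above can be evaluated numerically on one period of a sampled waveform. The following is a minimal sketch, assuming that the wanted sinusoid falls in DFT bin 1 and using a direct (non-fast) DFT from the standard library; `sfdr_dbc` is a name introduced here for illustration.

```python
import cmath
import math


def sfdr_dbc(samples):
    """SFDR in dBc of one period of a sampled waveform.

    Bin 1 is taken as the wanted sinusoid; the largest of the remaining
    bins below the Nyquist bin is taken as the worst spur.
    """
    n = len(samples)
    mags = []
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc))
    worst = max(mags[1:])
    if worst == 0:
        return math.inf
    return 20 * math.log10(mags[0] / worst)
```

As a usage example, a unit-amplitude sinusoid with a third harmonic of amplitude 0.001 yields an SFDR of about 60 dBc.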

The DDFS architectures based on CORDIC-like angle-rotation algorithms [7]–[9] are well suited when a large SFDR is required but are rather ineffective for high-clock-frequency applications. These approaches, in fact, require very small lookup memories but use cumbersome arithmetic circuitry that reduces clock frequency and increases power dissipation.

1549-8328/$25.00 © 2008 IEEE

In polynomial and piecewise polynomial interpolation architectures [10]–[15], a small ROM (often implemented as random logic) is employed to store polynomial coefficients. Adders and multipliers are required to implement the polynomial approximation. When high speed is the primary concern, piecewise-linear approximation with optimized coefficients appears to be one of the most effective approaches [10], [15].

ROM compression techniques use approximations in which the LUT storing sine values is subdivided in two smaller parts (a “coarse” ROM and a “fine” ROM). The outputs of coarse and fine ROMs are added together to yield the final sine value. The first ROM compression technique was proposed by Hutchinson [1], [5]. An improved approach, based on trigonometric approximations, was developed by Sunderland and then improved by Nicholas [5], [6]. The DDFS presented in [16] uses another technique in which total ROM size is further reduced by decomposing both coarse and fine ROMs as the sum of an “error” ROM and a “quantization” ROM.

The multipartite table method (MTM) is a very effective technique for table-based function evaluation, recently proposed in [18]. The MTM can be seen as a generalization of the Nicholas technique. The LUT is decomposed into small ROMs [a table of initial values (TIV) plus one or more tables of offsets (TOs)], whose outputs are added together to obtain the required function value. From an implementation point of view, the optimal number of TOs comes from a tradeoff between the total ROM size (which decreases as the number of TOs increases) and the multi-operand adder complexity. The implementation results reported in [19] highlight that the MTM is a technique ideally suited for DDFS implementations, requiring both a small ROM and a minimal arithmetic overhead.

This paper investigates the optimization of the MTM for DDFS implementation. An analytical technique is presented that yields the optimal multipartite table approximation, which minimizes the overall ROM size while guaranteeing a target SFDR value. As shown in [19], optimizing for SFDR results in a substantial memory saving with respect to the original approach of [18], where the optimal multipartite table was searched by imposing a bound on the maximum absolute approximation error.

First, this paper investigates the MTM with a single TO, the bipartite table method (BTM). In this case, the optimal SFDR value and the ROM content which maximizes the SFDR are both derived in closed form. This allows us to easily determine, with straightforward calculations, the optimal ROM decomposition. The developed analytical technique yields the optimal bipartite table decomposition while avoiding the numerical search technique used in the approach of [19].

In the second step, the analytical approach developed for the BTM is extended to the general MTM, with two or more TOs. In this case, the optimal MTM approximation is found by using a novel search algorithm which, extending our BTM technique, is able to find the optimal MTM decomposition while avoiding any brute-force exhaustive search, which would require an unacceptably large computing time to evaluate the SFDR for each possible input decomposition.

The paper also studies the implementation tradeoffs involved in MTM-based DDFS. It is shown that the optimal number of TOs depends on the required SFDR. Optimal table decomposition and content are given for the 60- and 80-dBc cases.

Fig. 2. Sine generator architecture, exploiting half-wave sine symmetry.

This paper is organized as follows. Section II reports a brief review of ROM compression algorithms. Section III investigates the MTM with a single TO (the BTM). The extension to MTM with two or more TOs is described in Section IV. The tradeoff between silicon area, clock speed, and power is investigated in Section V by comparing the simulated performances of several MTM-based DDFSs implemented in a 0.25-µm CMOS technology.

This paper focuses only on the digital portion of the system shown in Fig. 1. When analog outputs are needed, the DAC characteristics should be taken into account since the SFDR and power dissipation of the DAC could limit system performances.

II. ROM COMPRESSION ALGORITHMS

A. Quadrant Compression

The architecture of the sine generator block using quadrant compression is shown in Fig. 2. The -bits input signal represents the input phase in $[0, 2\pi)$. The two most significant bits of determine the quadrant in which the input phase lies. The signal , composed of the least-significant bits of , represents an angle in $[0, \pi/2)$, scaled to a binary fraction [0,1). The input of the sine calculation block is the -bit signal and the sine calculation block computes the -bit output

(2)

where is the weight of the least significant bit of . As shown in Fig. 2, in the first quadrant, , and the output of the sine calculation block is straightforwardly sent to the DDFS output. In the other quadrants, the output sine wave is reconstructed by conditionally complementing the input and the output of the sine calculation block. The offset in (2) allows using a simple 1’s complementor for the signal , in place of a more complex 2’s complementor [20].
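The quadrant reconstruction described above can be sketched in software as follows. This is a floating-point illustration under stated assumptions: the names `m_bits`, `quarter_sine`, and `full_sine` are introduced here, the sine block is modeled as an ideal first-quadrant sine with a half-LSB phase offset, and the output is negated arithmetically rather than complemented bitwise.

```python
import math


def quarter_sine(z, q_bits):
    """First-quadrant sine block: sin(pi/2 * (z + 1/2) / 2**q_bits)."""
    return math.sin(math.pi / 2 * (z + 0.5) / (1 << q_bits))


def full_sine(x, m_bits):
    """Rebuild a full sine period from the first-quadrant block."""
    q_bits = m_bits - 2
    mask = (1 << q_bits) - 1
    msb = (x >> (m_bits - 1)) & 1    # selects the negative half-period
    msb2 = (x >> (m_bits - 2)) & 1   # selects a mirrored quadrant
    y = x & mask
    # Thanks to the half-LSB offset, mirroring is an exact 1's complement.
    z = y ^ mask if msb2 else y
    s = quarter_sine(z, q_bits)
    return -s if msb else s
```

Because the sample points sit half an LSB away from the quadrant boundaries, the bitwise complement of the quadrant-local phase lands exactly on the mirrored sample, so no carry-propagating 2's complement is needed on the input path.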

B. Sunderland and Nicholas Algorithms

The Sunderland technique reduces ROM size by employing simple trigonometric identities. The signal is decomposed as , where , , and correspond to the most significant bits (MSBs), the middle bits, and the least significant bits (LSBs) of .

Due to the relative magnitudes of , , and , the function in (2) is approximated as

(3)

The last equation is implemented by using a coarse ROM, implementing the function , a fine ROM for the function , and an adder.

The Nicholas algorithm [6] uses the same coarse/fine ROM partitioning of the Sunderland architecture. In the Nicholas algorithm, however, the ROM coefficients are obtained by using numerical optimization, instead of closed-form expression (3). Fine ROM samples are obtained by minimizing either the maximum absolute error or the mean-square error between and .
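The coarse/fine partitioning can be illustrated with the commonly cited form of the Sunderland approximation, $\sin\frac{\pi}{2}(a+b+c) \approx \sin\frac{\pi}{2}(a+b) + \cos\frac{\pi}{2}a \,\sin\frac{\pi}{2}c$. The sketch below assumes this form; the names `a`, `b`, `c` for the MSB, middle, and LSB subwords and the helper functions are choices made for this example, and the half-LSB offset and output rounding are omitted. It does not implement the numerical optimization of the Nicholas algorithm.

```python
import math

HALF_PI = math.pi / 2


def build_sunderland_roms(a_bits, b_bits, c_bits):
    """Return (coarse, fine) tables for a phase z = a + b + c in [0, 1)."""
    ab_bits = a_bits + b_bits
    total = ab_bits + c_bits
    # Coarse ROM: sin(pi/2 * (a + b)), addressed by the upper a+b bits.
    coarse = [math.sin(HALF_PI * i / (1 << ab_bits))
              for i in range(1 << ab_bits)]
    # Fine ROM: cos(pi/2 * a) * sin(pi/2 * c), addressed by (a, c).
    fine = [[math.cos(HALF_PI * ia / (1 << a_bits)) *
             math.sin(HALF_PI * ic / (1 << total))
             for ic in range(1 << c_bits)]
            for ia in range(1 << a_bits)]
    return coarse, fine


def sunderland_sine(z, a_bits, b_bits, c_bits, coarse, fine):
    """Sum of the coarse and fine ROM outputs for the integer phase z."""
    ab = z >> c_bits
    a = z >> (b_bits + c_bits)
    c = z & ((1 << c_bits) - 1)
    return coarse[ab] + fine[a][c]
```

With a 12-bit quadrant phase split as 4 + 4 + 4 bits, the two tables hold 256 words each instead of the 4096 words of a single table, at the cost of a small approximation error and one adder.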

C. BTM

The BTM, introduced in [17], is a table-based approach that implements a particular piecewise-linear approximation for . The range [0,1) of is divided in equal-length segments. In each segment the function is approximated as

(4)

The starting point of the th segment is , with and .

The idea behind the BTM is to group the segments into larger intervals (with ) and to use the same interpolating slope in each larger interval. Therefore, the same interpolating slope is employed for adjacent segments, where

(5)

Fig. 3 shows an example of the BTM algorithm, for and . In this case, we have eight segments while only two slope values are used to perform the linear interpolation. The first slope value is used for the segments from 1 to 4, while the second slope value is employed for the segments from 5 to 8.

The implementation of the BTM requires a TIV that stores and a TO for . The TIV is addressed by the MSBs of and the total number of TIV entries is . The total number of slopes considered in the approximation is , therefore the MSBs of address the TO. Moreover, the term corresponds to the subword composed of the LSBs of (with ). This subword, therefore, also addresses the TO and the total number of TO entries is .

Fig. 3. BTM example with eight segments and two slopes. The entries of the TIV are represented with the heavy dots. The expanded view highlights the linear interpolation implemented through the TO.

Fig. 4. Implementation of the BTM, exploiting the symmetry in the TO entries.

Fig. 5. MTM input signal decomposition.

Fig. 4 shows the symmetric [17] implementation of the BTM. In this case, the TIV stores the value of the interpolating function in the middle of each segment. The TO stores the offsets between the approximating function and the TIV value in each segment. As highlighted in the expanded view of Fig. 3, this choice makes the values stored in the TO symmetric for each segment. This property is exploited to halve the size of the TO: the TO stores only the absolute offset values, corresponding to the right half of each segment, and an adder–subtractor is employed to perform the interpolation, as shown in Fig. 4.
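The symmetric bipartite scheme can be sketched as follows. This is a floating-point illustration, not the optimized design of the paper: the parameter names `w` (quadrant phase bits), `alpha` (TIV address bits), and `beta` (slope-select bits) are assumptions, the TIV holds plain function samples at the segment midpoints, and each slope is simply the derivative at the middle of its group of segments.

```python
import math

HALF_PI = math.pi / 2


def build_btm(w, alpha, beta):
    """Return (tiv, to) for sin(pi/2 * x), x = (i + 1/2) / 2**w."""
    gamma = w - alpha                  # LSBs inside one segment
    half = 1 << (gamma - 1)            # only the right half is stored
    # TIV: function value in the middle of each of the 2**alpha segments.
    tiv = [math.sin(HALF_PI * (s + 0.5) / (1 << alpha))
           for s in range(1 << alpha)]
    # One slope is shared by 2**(alpha - beta) adjacent segments.
    slopes = [HALF_PI * math.cos(HALF_PI * (g + 0.5) / (1 << beta))
              for g in range(1 << beta)]
    # TO: absolute offsets from the segment midpoint, right half only.
    to = [[slopes[g] * (k + 0.5) / (1 << w) for k in range(half)]
          for g in range(1 << beta)]
    return tiv, to


def btm_sine(i, w, alpha, beta, tiv, to):
    """Adder-subtractor evaluation of the bipartite approximation."""
    gamma = w - alpha
    half = 1 << (gamma - 1)
    seg = i >> gamma
    grp = seg >> (alpha - beta)
    lsb = i & ((1 << gamma) - 1)
    if lsb >= half:                    # right half: add the offset
        return tiv[seg] + to[grp][lsb - half]
    return tiv[seg] - to[grp][half - 1 - lsb]  # left half: mirror and subtract
```

For `w = 10`, `alpha = 6`, and `beta = 3`, the TIV has 64 entries and the TO has 64 entries, compared with 1024 entries for a direct table of the quadrant.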

D. MTM

The MTM, described in [18], generalizes the BTM. In the multipartite table approach, the -bit input signal is decomposed in nonoverlapping subwords: of lengths respectively. The value of the input operand is and the length is ; see Fig. 5.

The MTM approximation of starts from the observation that the TO computes the multiplication: . Now, the term can be seen as the sum of the subwords and the multiplication implemented by the TO can be written as: . Therefore, the TO can be distributed into the sum of smaller TOs:

with (6)

Fig. 6. Implementation of the MTM using two TOs.

The is indexed by the and bits shown in Fig. 5. In [18] it is shown that a significant reduction in memory size can be achieved by indexing the -th table of offsets with and , where: .

Fig. 6 shows the implementation of the MTM approximation when two TOs are employed. Symmetry is also exploited in the MTM to reduce the size of the TOs.

The values to be stored in the ROMs are obtained in [18] by minimizing the maximum absolute approximation error. Closed-form expressions for ROM coefficients and error bounds are also provided in [18].

In general, increasing the number of TOs reduces the total memory size. However, each TO requires an additional adder input, which creates a tradeoff between the total ROM size and the multi-operand adder complexity. Moreover, using more tables increases the discretization error, requiring the introduction of guard bits [18] that may partly offset the advantages in terms of memory size.
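The idea of splitting one TO into smaller TOs can be sketched as follows for the case of two TOs. All names (`w`, `alpha`, `gamma1`, `beta1`, `beta2`) are assumptions for this illustration. For brevity the sketch does not exploit symmetry, stores TIV samples at the segment start, uses the derivative at the middle of each slope group, and works in floating point, so it should not be read as the optimized tables of the paper or of [18].

```python
import math

HALF_PI = math.pi / 2


def slope_at_group_middle(g, group_bits):
    """Derivative of sin(pi/2 * x) at the middle of group g."""
    return HALF_PI * math.cos(HALF_PI * (g + 0.5) / (1 << group_bits))


def build_tripartite(w, alpha, gamma1, beta1, beta2):
    """Return (tiv, to1, to2) for x = i / 2**w with two offset subwords."""
    gamma2 = w - alpha - gamma1
    tiv = [math.sin(HALF_PI * s / (1 << alpha)) for s in range(1 << alpha)]
    # TO1 covers the upper offset subword, with the finer slope selection.
    to1 = [[slope_at_group_middle(g, beta1) * d / (1 << (alpha + gamma1))
            for d in range(1 << gamma1)] for g in range(1 << beta1)]
    # TO2 covers the lower subword; a coarser slope selection suffices.
    to2 = [[slope_at_group_middle(g, beta2) * d / (1 << w)
            for d in range(1 << gamma2)] for g in range(1 << beta2)]
    return tiv, to1, to2


def tripartite_sine(i, w, alpha, gamma1, beta1, beta2, tiv, to1, to2):
    """Three-operand sum: TIV + TO1 + TO2."""
    gamma2 = w - alpha - gamma1
    seg = i >> (w - alpha)
    d1 = (i >> gamma2) & ((1 << gamma1) - 1)
    d2 = i & ((1 << gamma2) - 1)
    return tiv[seg] + to1[i >> (w - beta1)][d1] + to2[i >> (w - beta2)][d2]
```

For `w = 12`, `alpha = 6`, `gamma1 = 3`, `beta1 = 4`, and `beta2 = 2`, the two TOs hold 128 and 32 entries. A single TO addressed by the same four slope bits and all six offset bits would need 1024 entries, which illustrates the memory saving obtained in exchange for one extra adder input.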

III. BTM WITH SFDR OPTIMIZATION

Here, we focus on the bipartite table approximation. First, an expression for the harmonic content will be obtained. Then a simple closed-form expression giving the SFDR upper bound will be derived. Finally, we will address the problem of determining the optimal decomposition that minimizes the ROM size for a given SFDR.

A. Harmonics Calculation

For the time being, let us neglect the effects of quantization, as that will be considered in subsequent sections. Therefore, we will assume so that only two parameters ( and ) characterize the bipartite decomposition. Moreover, to simplify the discussion and without loss of generality, let us also neglect the LSB/2 phase offset in (2).

As discussed previously, the BTM corresponds to a piecewise-linear approximation of . Therefore, we can use the analytical approach developed in [10] and [14] to obtain the amplitude of the generated harmonics.

Let us indicate as the DDFS output, obtained by applying quadrant compression to . The function has period and odd symmetry. Therefore, it can be represented by a Fourier sine series as follows:

(7)

Even harmonics are zero, since has quadrant symmetry. The amplitude of the odd harmonics, following the analysis of [10], can be calculated as

(8)

for odd and 0 otherwise, with

(9)

(10)

where coefficients and are given by

(11)

Since the same interpolating slope is employed for adjacent subintervals, the slopes are constrained as follows:

(12)

By imposing the conditions (12) in (11), we observe that all coefficients with index not divisible by are zero. As a consequence, the function can be written as

(13)

where . Equations (8)–(13) allow the harmonic amplitude to be computed from the knowledge of the and values and from the value of the two parameters and that characterize the bipartite decomposition.

B. SFDR Optimization

The problem of SFDR optimization consists in determining the and that maximize the SFDR, for given values of and . This problem is treated in Appendix I, where an analytical technique able to obtain the optimal coefficients and is described and the value of the optimal SFDR is also calculated. In Appendix I, it is shown that the optimal SFDR can be written in two different forms [cf. (31) and (41)], depending on the value.

Fig. 7. Optimal SFDR in BTM as a function of the two decomposition parameters.

Equations (41) and (31) simplified for can be grouped in a single handy expression as follows:

(14)

where is the Kronecker delta function: for and 0 otherwise.

Fig. 7 shows the behavior of the optimal SFDR, given by (14). The case corresponds to the upper bound derived in [10]. The SFDR increases with . As displayed in Fig. 7, for a constant value, the SFDR decreases by decreasing . In fact, the lower is, the lower the number of different slopes that we use in the BTM approximation.

Fig. 8 shows the calculated harmonics, for and , corresponding to a BTM approximation with segments and different slopes. The analysis of Appendix I shows that, for this set of parameters, the dominant harmonics are , , and , where from (21) and (22) and . Fig. 8 shows that the largest harmonics are indeed the 15th, 49th, and 65th, as expected, with an SFDR slightly lower than 60 dBc, in agreement with (14). The amplitude of higher order harmonics is well below this value. The inset of Fig. 8 shows the approximation error , which is discontinuous at the segment boundaries. The approximation error is in the range , . It can be noted that the developed SFDR optimization is different from a min–max approximation, since the maximum positive and negative approximation errors are not equal in modulus.

C. Optimal Design

The first step to determine the optimal BTM design, which minimizes the ROM size for a target SFDR, is the selection of the and values (see Fig. 2). The value of is obtained taking into account the spurs introduced by phase quantization [1]–[4]

(15)

hence

(16)

where the function rounds to the nearest integer, larger than (or equal to) .

Fig. 8. Calculated harmonics, for a DDFS using the BTM with SFDR optimization. In this case, and , corresponding to a BTM approximation with segments and different slopes. The inset shows the approximation error.

The value of is related to amplitude quantization errors. There is no expression available for the effect of amplitude quantization on SFDR. On the other hand, the effect of amplitude quantization on signal-to-noise ratio (SNR) is well known. If we consider the SNR, we have from [21] that the phase quantization dominates the SNR when , otherwise amplitude quantization dominates. Therefore, one of the two values, either or is selected. In the following, will be assumed for dBc, while is assumed for larger SFDR values.

Now let us determine the optimal values of and . To that purpose, a few candidate solutions that reach the target SFDR are initially obtained. The best BTM decomposition is selected as the candidate solution with the minimal ROM size.

The first candidate solution is the BTM decomposition with the minimum number of segments that reaches the target SFDR. This is the decomposition with . The value of can easily be obtained from (14) as

(17)

The second candidate solution has and the minimum value of that reaches the target SFDR. This value of is obtained through a simple search, using again (14) to compute the SFDR.

The other candidate solutions are obtained in a similar way, assuming and selecting the minimum value to reach the target SFDR, for increasing values.


Fig. 9. Design of a 60-dBc DDFS, based on optimized BTM. At the optimal point and , with a total ROM size of 352 b.

As an example, for 60-dBc SFDR, the following four candidate solutions are obtained: , , , and .

Now, the ROM size is computed for each candidate solution and the best BTM decomposition, with the minimal ROM size, is selected.

The ROM size can easily be computed. The TIV size is . In order to determine the TO size, let us observe that the maximum slope of the BTM can be estimated as the maximum derivative of , given by . Therefore, the largest value to be stored in the TO can be approximated as and the TO values can be represented with bits. The total ROM size can hence be written as

(18)

where the symmetry of the fine ROM entries has been taken into account and .

The design example for a 60-dBc SFDR ( , ) is shown in Fig. 9. In this case, is assumed. When increases, the TIV size increases while the TO decreases. The optimal point is achieved for , , with a total ROM of 352 b and an SFDR (before rounding) of 65.2 dBc.

D. Tables Entries

The TIV stores the value of the interpolating function in the middle of each segment. The content of the TIV (before rounding) is given by

(19)

where is a constant, equal to one-half the maximum value and the LSB/2 phase offset of (2) is also taken into account.

The TO stores the offsets between the approximating function and the TIV value in each segment

(20)

TABLE I
OPTIMAL DECOMPOSITIONS AND ROM CONTENT FOR 60-dBc SFDR, USING OPTIMIZED BTM

Fig. 10. Total ROM size as a function of SFDR, for optimized MTM algorithms.

The actual values stored in the TIV and the TO are obtained by rounding the right-hand sides of (19) and (20). To provide improved performance in the presence of amplitude quantization, the values stored in LUTs can be scaled before rounding [6] (amplitude optimization). To that purpose, a search is performed by multiplying the right-hand sides of (19) and (20) by a factor , with . For each trial value, (19) and (20) are rounded to fill the LUTs, and the SFDR is computed by performing a -point fast Fourier transform (FFT). The best rounded TIV and TO are selected as the ones yielding the largest SFDR. The use of a very small step size (less than one LSB) during the search guarantees a negligible reduction of the output sinewave.
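The scale-then-round search described above can be sketched as follows. To keep the example short it is applied to a plain full-period quantized sine table rather than to the TIV/TO pair, and it uses a direct DFT instead of an FFT. The search range `k_min`, the number of steps, and all function names are assumptions made for this illustration.

```python
import cmath
import math


def sfdr_dbc(samples):
    """SFDR in dBc, taking DFT bin 1 as the wanted sinusoid."""
    n = len(samples)
    mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)))
            for k in range(1, n // 2)]
    worst = max(mags[1:])
    return math.inf if worst == 0 else 20 * math.log10(mags[0] / worst)


def round_half_away(v):
    """Symmetric rounding, so that the table keeps its odd symmetry."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantized_sine(n_points, full_scale, scale):
    """One period of a sine table, scaled by `scale` before rounding."""
    return [round_half_away(scale * full_scale *
                            math.sin(2 * math.pi * (i + 0.5) / n_points))
            for i in range(n_points)]


def amplitude_search(n_points, out_bits, k_min=0.98, steps=40):
    """Try scale factors in [k_min, 1] and keep the one with the best SFDR."""
    full_scale = (1 << (out_bits - 1)) - 1
    best = None
    for j in range(steps + 1):
        k = k_min + (1.0 - k_min) * j / steps
        sfdr = sfdr_dbc(quantized_sine(n_points, full_scale, k))
        if best is None or sfdr > best[1]:
            best = (k, sfdr)
    return best
```

A call such as `amplitude_search(64, 8)` returns the best scale factor found and the corresponding SFDR. Since the scale factor stays close to one, the amplitude of the output sinewave is only marginally reduced.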

The ROM content for the optimized 60-dBc DDFS is reported in Table I. The total ROM size is 352 b and the obtained SFDR is 64.69 dBc. As a comparison, the DDFS recently proposed in [16] (using quad line range compression, the Sunderland technique, and additional “quantization and error” ROM compression) uses 368 b of memory, reaches 55-dBc SFDR, and requires a six-input multi-operand adder.

Fig. 10 shows the total memory required by the optimized BTM algorithm, as a function of SFDR. As can be seen, the storage requirements are much smaller than for an uncompressed memory, with a compression ratio increasing with SFDR. On the other hand, the ROM size still increases exponentially with the SFDR. Therefore, the BTM algorithm becomes ineffective when SFDR values larger than 80 dBc are required.

IV. MTM WITH SFDR OPTIMIZATION

The MTM can be seen as a piecewise-linear approximation, with an additional error component due to the splitting of the TO into smaller TOs. The SFDR reduction due to this additional error component cannot easily be determined analytically. Therefore, we developed a search algorithm to find the optimal MTM decomposition (that minimizes the ROM size for a given SFDR). The proposed algorithm imposes an upper bound on the maximum error in the time domain to reduce the search space (see Appendix II). Moreover, in the developed algorithm, according to [18], the slopes employed in , , etc., are obtained by averaging the slopes used in (see (42) in Appendix II). Note that using more tables increases the discretization error. In our algorithm, this problem is solved by considering the addition of a maximum of two guard bits in the tables.

Fig. 11. Algorithm to determine the optimal MTM decomposition, with two TOs.

A. Using Two TOs

Let us start by considering a multipartite approximation with two TOs, that is, a tripartite approximation. The input word decomposition is shown in Fig. 6.

The proposed algorithm is shown in Fig. 11 and is composed of three main steps. A time-consuming numerical SFDR calculation is performed only in the third step of the algorithm.

TABLE II
MTM APPROXIMATION WITH TWO TOS. OPTIMAL DECOMPOSITIONS AND ROM CONTENT FOR 60- AND 80-dBc SFDR

In the first step, the candidate couples of the BTM decomposition and the total ROM size corresponding to the optimal BTM decomposition are obtained, as described in the previous section. In the second step of the algorithm, for each candidate couple, the possible tripartite decompositions are enumerated by varying and . To limit the search space, in the enumeration only the decompositions with are considered. This condition, as described in Appendix II, guarantees a bound on the additional error component of the tripartite approximation. Moreover, only the decompositions characterized by a ROM size smaller than the one obtained with the BTM are saved. For each , , , and values, which satisfy the above conditions, three candidate decompositions, with a number of guard bits equal to 0, 1, and 2, are saved to be considered in the final phase of the algorithm. A final search is performed in the third step of the algorithm. After sorting the candidate decompositions in ascending ROM size, the SFDR is computed for each decomposition. The first candidate decomposition that meets the target SFDR is the optimal one, with the minimal ROM size.
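The third step of the search can be sketched as a sort followed by an early-exit scan, so that the expensive SFDR evaluation is run on as few candidates as possible. The candidate representation (a pair of ROM size and an opaque decomposition object) and the `evaluate_sfdr` callback are hypothetical placeholders for this illustration.

```python
def first_meeting_target(candidates, evaluate_sfdr, target_dbc):
    """Return the smallest-ROM candidate whose SFDR meets the target.

    candidates    -- iterable of (rom_bits, decomposition) pairs
    evaluate_sfdr -- callable returning the SFDR in dBc of a decomposition
    target_dbc    -- required SFDR in dBc
    """
    for rom_bits, decomposition in sorted(candidates, key=lambda c: c[0]):
        # The costly SFDR evaluation stops at the first success.
        if evaluate_sfdr(decomposition) >= target_dbc:
            return rom_bits, decomposition
    return None  # no candidate reaches the target
```

Because the candidates are visited in ascending ROM size, the first one that passes is by construction the one with the minimal ROM size among those that meet the target.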

Table II shows the obtained tripartite decompositions and the ROM contents for 60- and 80-dBc SFDR. No guard bits are needed in this case to reach the target SFDR.

The CPU time needed to find the optimal decomposition on a 3-GHz Pentium IV PC varies from about 1 s, for a target SFDR of 60 dBc, up to about 150 s, for a target SFDR of 120 dBc.

B. General Case (More Than Two TOs)

For the general case, when more than two TOs are employed, the search algorithm is very similar to that described in Fig. 11. The only difference is the enumeration of the possible decompositions in the second step of the algorithm. Also, in this case, the time-consuming SFDR computation is carried out only in the third step of the algorithm. The CPU time needed to find the optimal decomposition for the most complex case investigated in this paper (four TOs and 120-dBc target SFDR) is about 15 min on a 3-GHz Pentium IV PC.

Fig. 10 shows the total ROM size required by DDFSs using the optimized MTM algorithm as a function of SFDR. As can be seen, a significant decrease in memory size is obtained by using the MTM algorithm with two TOs, while the improvement is less evident when three or four TOs are employed.


TABLE III
COMPARISON BETWEEN ROM SIZE IN RECENTLY PROPOSED DDFSS

Table III shows a comparison between the ROM size used in recently proposed DDFS architectures. As can be seen, the optimized MTM approach compares favorably even with multiplier-based techniques such as those by Curticapean [22], [23], Bellauar [12], and De Caro [14]. However, it is worth noting that the introduction of additional fine ROMs requires the utilization of a larger multi-operand adder that might become a bottleneck in terms of speed or silicon area.

V. VLSI SIMULATIONS RESULTS

We have implemented several DDFSs for SFDR values of 60, 80, and 100 dBc. A 24-b accumulator was used in every DDFS. All circuits have been synthesized by using a standard design flow, starting from a synthesizable VHDL description, followed by gate-level optimization and standard-cell place and route. The technology is 0.25-µm CMOS, with one poly and five metals.

In the designed DDFSs, we have not used slow and power-hungry full-custom ROMs, but instead we have implemented the ROMs by using standard cells, with the help of automatic synthesis tools. As discussed in [13], this not only facilitates design reuse, but also allows high clock frequencies to be reached with reduced dissipation. In addition, the circuits can be designed to meet different system requirements, by specifying speed and area constraints during synthesis.

Table IV shows the simulation results in the absence of pipelining in the sine generator, with a single pipeline level in the accumulator. Timing constraints have been imposed during synthesis and optimization. The considered clock periods are 3, 4, and 5 ns for 60-, 80-, and 100-dBc DDFSs, respectively. A fast carry look-ahead adder (Brent–Kung parallel-prefix architecture [28]) was employed both in the accumulator and in the sine generator.

As shown in Table IV, for the 60-dBc DDFS there is no advantage in increasing the number of TOs, and the best architecture corresponds to the bipartite approximation. In this architecture, the TIV and the TO are synthesized by using only 73 and 25 gates, respectively, with a total ROM area of about m , which is less than 15% of the total DDFS area. The total ROM area decreases if two (or more) TOs are used. This improvement, however, is more than compensated for by the multi-operand adder needed to sum the TIV and TO outputs. The overall effect is a larger (and slightly slower) circuit.

TABLE IV
IMPLEMENTATION RESULTS FOR OPTIMIZED DDFSS

TABLE V
HIGH-SPEED DDFS IMPLEMENTATION RESULTS

For 100-dBc SFDR, the best design uses four TOs. In this case, an area reduction of about 45% is obtained with respect to the BTM implementation. In this implementation, in fact, the percentage of total circuit area taken by the ROMs is relevant, and any reduction in ROM size results in an improvement in total DDFS area. Note, however, that using more than three TOs yields only a marginal ROM size (and circuit area) reduction with a slightly larger power dissipation.

For 80-dBc DDFSs, the best tradeoff between adder and ROM size is achieved by using two TOs.

It is interesting to note that the tradeoff found here between ROM and adder complexities is similar to the tradeoff existing in piecewise polynomial DDFSs between ROM and arithmetic circuitry (see [14]).

Reaching a high clock frequency is an important issue for DDFSs. The circuits designed with the MTM are ideally suited for high clock frequency operation, requiring both small LUTs and very simple arithmetic circuitry.

Table V shows the simulated performances of 60- and 80-dBc high-speed DDFSs. In these circuits, high-speed operation is gained by introducing two pipeline levels in the accumulator and three additional pipeline levels in the sine generator. In particular, the first pipeline stage is introduced on the ROM address lines, the second pipeline stage is inserted on the ROM outputs, while the third pipeline stage is introduced in the multi-operand adder. Several implementations have been realized by varying timing constraints during synthesis and optimization and by using different parallel-prefix adder topologies (e.g., either Brent–Kung or Kogge–Stone [28]).

TABLE VI
COMPARISON BETWEEN RECENTLY PROPOSED DDFSS

As shown in Table V, a clock frequency larger than 800 MHzfor a 60-dBc DDFs and larger than 700 MHz for an 80-dBcDDFS can be reached with a price in term of silicon area.

For 60-dBc circuits, using high-speed and area-hungryKogge–Stone topology in the sine generator is ineffective dueto the reduced adder wordlength. On the other hand, in theaccumulator, the use of Kogge–Stone architecture is mandatoryto reach a clock frequency larger than 700 MHz.

For the high-speed 80-dBc DDFS, using the Kogge–Stone architecture in the sine–cosine generator gives the best performance, owing to the larger wordlengths.

A fair comparison between the performance of the DDFSs developed in this paper and previously proposed circuits is not easy. This is due to the wide range of possible architectural and implementation choices, like the desired SFDR, the accumulator wordlength, the structure of the sine generator (which can be either single phase or quadrature), the implementation technology, the standard-cell library, and so on. The data shown in Table VI have been obtained by selecting some recently published DDFS circuits, using CMOS technology, SFDR, and clock frequency similar to the ones considered in this paper.

For 60-dBc SFDR, the design in [26] uses only 60% of the area of our DDFS, but is four times slower, dissipates 20% more power, and uses a more advanced 0.18-μm technology.

Considering 60-dBc high-speed implementations, our proposed technique allows reaching a high clock frequency without requiring the parallel operation of multiple sine–cosine generators, as in [16]. Therefore, the silicon area of our DDFS is only a small fraction of that of [16], and moreover the proposed circuit dissipates about one third of the power of that in [16].

For an 80-dBc SFDR, the 250-MHz version of the proposed circuit is both faster and smaller than piecewise-linear DDFSs. However, some of the difference could be attributed to the use of a simpler single-phase architecture in this paper. A similar consideration applies to the high-speed version of the 80-dBc DDFS.

Interestingly, the data in Table VI show that the developed DDFS largely outperforms recently proposed circuits even when a high SFDR (100 dBc) is required. As an example, the area occupation and the power dissipation of the proposed circuit are about one third and one sixth, respectively, of those of the design proposed in [27]. This clearly demonstrates the effectiveness of the ROM compression technique.

VI. CONCLUSION

We have investigated the optimization of the MTM for DDFS implementation. An analytical technique has been presented that yields the optimal bipartite approximation, i.e., the one that minimizes the overall ROM size while guaranteeing a target SFDR value. An effective search algorithm has been proposed to select the optimal multipartite approximation.

Several DDFSs have been implemented and simulated in a 0.25-μm CMOS technology. We have found that the optimal number of TOs to be used depends on the required SFDR. For an SFDR of the order of 60 dBc, the best results are obtained by employing a single TO, while using two TOs is the recommended choice for an 80-dBc SFDR. For a 100-dBc SFDR, either three or four TOs should be considered.

The obtained VLSI simulation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of MTMs for the realization of high-performance direct digital synthesizers.

APPENDIX I

In order to minimize harmonic contents, we set to zero the amplitude of as many harmonics as possible, while keeping fixed the amplitude of the fundamental harmonic .
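The link between harmonic amplitudes and SFDR can be checked numerically. The following Python sketch is not part of the analysis in this appendix: it assumes that one full period of the synthesized waveform is available as a list of samples, evaluates the harmonic amplitudes with a direct DFT, and reports the ratio between the fundamental and the largest other harmonic. The function names are hypothetical.

```python
import cmath
import math

def harmonic_amplitudes(samples):
    """Amplitudes of harmonics 0..n//2 of one period of a real waveform."""
    n = len(samples)
    amps = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        # DC and (for even n) the Nyquist bin appear once; other bins twice.
        scale = 1.0 if (k == 0 or 2 * k == n) else 2.0
        amps.append(scale * abs(acc) / n)
    return amps

def sfdr_dbc(samples):
    """SFDR in dBc: fundamental (harmonic 1) over the largest higher harmonic.

    Assumes at least one nonzero harmonic above the fundamental; a perfectly
    clean sinewave would make the ratio unbounded.
    """
    amps = harmonic_amplitudes(samples)
    worst_spur = max(amps[2:])
    return 20.0 * math.log10(amps[1] / worst_spur)

# Example: a unit sinewave with a small third harmonic added on purpose.
n = 64
example = [math.sin(2 * math.pi * i / n) + 0.001 * math.sin(2 * math.pi * 3 * i / n)
           for i in range(n)]
example_sfdr = sfdr_dbc(example)
```

Cancelling a harmonic removes it from the set over which the worst spur is taken, so the SFDR is then set by the largest harmonic that remains.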

Let us start by observing, from (13), that function has odd symmetry and is periodic in with period

(21)

The function , from (9), has instead even symmetry and is periodic in with period

(22)

Combining the periodicity and symmetry of and , one has

(23)

(24)

Let us now focus our attention on the amplitude of the fundamental and of the harmonics for . We will show a posteriori that, at the optimal point, the higher order harmonics are smaller in amplitude with respect to the ones in the range .

In order to minimize harmonic contents, we impose the following conditions:

for (25)

Owing to (23), the last equation implies that all values with index from up to are also zero. Therefore, we have

for (26)

and the only values different from zero are ,, , etc.

Conditions similar to (26) are imposed for the function

for (27)

From (24) and (27), the only values different from zero are

, etc.

Due to (26) and (27), all of the harmonics of are cancelled, with the exception of the fundamental and of

, , , etc.

From (8), (9), (13), and (21)–(27), we can write

(28)

for (29)

for (30)

Please note that, for , the only harmonics to be considered are , and (29) and (30) do not apply. This corresponds to the (unconstrained) piecewise-linear sine approximation considered in [10] and the corresponding optimal SFDR is calculated in [10] as

(31)

For , we have to take into account only and, so that (30) does not apply.

In the general case , considered in the following, we have to consider also the harmonics given by (30).

From (29), it follows that the parameter appears only in the expression of harmonics and

. Since these two harmonics are linearly dependent on , their maximum absolute value is minimized when

. Therefore, at the optimal point, one of the following two conditions is met: either

or . It can easily be seen that the condition to be imposed to reduce the harmonic amplitudes is the last one: . Substituting into (29), one obtains

for (32)
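The min-max argument used for this pair of harmonics can be illustrated generically. The symbols u, v, p, a_i, and b_i below are illustrative placeholders, not the notation of this paper; the sketch assumes that both quantities depend linearly on a single free parameter with nonzero slopes.

```latex
\begin{aligned}
u(p) &= a_1 + b_1 p, \qquad v(p) = a_2 + b_2 p, \qquad b_1 \neq 0,\; b_2 \neq 0,\\
p^\ast &= \arg\min_{p}\, \max\bigl(|u(p)|,\,|v(p)|\bigr)
\;\Longrightarrow\; |u(p^\ast)| = |v(p^\ast)|,\\
\text{so either}\quad u(p^\ast) &= v(p^\ast)
\quad\text{or}\quad u(p^\ast) = -v(p^\ast).
\end{aligned}
```

Indeed, if one magnitude were strictly larger than the other at the optimum, it would be nonzero, and a small change of p would reduce it while the other magnitude remained smaller by continuity, contradicting optimality. Which of the two sign conditions gives the smaller common magnitude depends on the coefficients.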


Similarly, from (30), in order to minimize the amplitude of harmonics and , we impose

, thus obtaining

for (33)

From (32) and (33), it can easily be seen that

for (34)

with

(35)

Therefore, the dominant harmonic between is the one at lower frequency and is given by (35). Equation (35) also holds in the case .

Let us now consider (28). By imposing a unitary value for the fundamental , one obtains

(36)

(37)

The maximum SFDR is achieved by choosing the value that minimizes the maximum amplitude of harmonics , from (37), and given by (35). The three harmonics

, , and are linearly dependent on . The best solution is found when .

One has

(38)

(39)

The above equation can also be expressed in terms of and as follows:

(40)

For the last equation can be simplified as follows:

(41)

In order to obtain the optimal coefficients and for a given decomposition , , we first compute and according to (36) and (38). The other and values are calculated by using (26), (27), (32), and (33). Then, the system of linear equations (9) and (13) is solved to obtain the optimal ,

, and values. Coefficients and are finally computed from (11).

APPENDIX II

Let us consider a multipartite approximation with two TOs, i.e., a tripartite approximation. The input word decomposition is shown in Fig. 6. While uses different slopes, computed according to the technique presented in Section III, uses slope values, with . Therefore, the slopes

of (with ) are approximated with a single value in . This results in an error component that is minimized by assuming

(42)

The corresponding slope error is given by. The error on the computed function is hence

(43)

The value of can be estimated with the help of Taylor approximation

(44)

where is the maximum value of the modulus of the second derivative of and is the distance between the midpoints of the first segment using and the last segment using

(45)
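The kind of bound invoked here can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the derivation of (44): it uses the standard fact that the secant slopes of two equal-width segments differ by at most the maximum of the modulus of the second derivative times the distance between the segment midpoints. The test function (a sine on the first quadrant, where that maximum is 1), the segment width, and the function names are all hypothetical choices.

```python
import math

def secant_slope(f, x, h):
    """Slope of the segment joining (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def max_slope_spread(f, x0, h, n):
    """Largest difference between the slopes of n consecutive segments."""
    slopes = [secant_slope(f, x0 + k * h, h) for k in range(n)]
    return max(slopes) - min(slopes)

def slope_spread_bound(m2, h, n):
    """Bound m2 * d, with d the distance between first and last midpoints."""
    return m2 * (n - 1) * h

# Example: 8 consecutive segments out of 64 covering [0, pi/2].
h = (math.pi / 2) / 64
spread = max_slope_spread(math.sin, 0.0, h, 8)
bound = slope_spread_bound(1.0, h, 8)
```

Replacing a group of neighboring slopes with a single value therefore introduces a slope error that shrinks as the group of segments becomes narrower.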

Substituting, we obtain

(46)

Following the analysis of [17], the algorithmic error of the BTM can be bounded as

(47)

From our simulations, we have noted that the optimal MTM decompositions always satisfy the following inequality:

(48)

By substituting (46) and (47) into (48), we have

(49)

The last condition is satisfied when

(50)

In general, it can be shown that the error component due to is smaller than when (50) is imposed in addition

to the following inequalities:

(51)


REFERENCES

[1] J. Tierney, C. M. Rader, and B. Gold, “A digital frequency synthesizer,” IEEE Trans. Audio Electroacoust., vol. AU-19, no. 1, pp. 48–57, Mar. 1971.

[2] B. G. Goldberg, Digital Frequency Synthesis Demystified. Eagle Rock, VA: LLH Technology, 1999.

[3] V. F. Kroupa, Direct Digital Frequency Synthesizer. New York: IEEE Press, 1998.

[4] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Norwell, MA: Kluwer, 2001.

[5] J. M. P. Langlois and D. Al-Khalili, “Phase to sinusoid amplitude conversion techniques for direct digital frequency synthesis,” Inst. Proc. Elect. Eng. Circuits Devices Syst., vol. 151, no. 6, pp. 519–528, Dec. 2004.

[6] H. T. Nicholas, III and H. Samueli, “A 150-MHz direct digital frequency synthesizer in 1.25-micron CMOS with −90 dBc spurious performance,” IEEE J. Solid-State Circuits, vol. 26, no. 12, pp. 1959–1969, Dec. 1991.

[7] A. Madisetti, A. Y. Kwentus, and A. N. Willson, “A 100-MHz, 16-b, direct digital frequency synthesizer with a 100-dBc spurious-free dynamic range,” IEEE J. Solid-State Circuits, vol. 34, no. 8, pp. 1034–1043, Aug. 1999.

[8] F. Curticapean, K. I. Palomaki, and J. Niittylahti, “Quadrature direct digital frequency synthesizer using angle rotation algorithm,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2003, vol. II, pp. 81–84.

[9] Y. Song and B. Kim, “Quadrature direct digital frequency synthesizer using interpolation based angle rotation,” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 12, no. 7, pp. 701–710, Jul. 2004.

[10] J. M. P. Langlois and D. Al Khalili, “Novel approach to the design of direct digital frequency synthesizers based on linear interpolation,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 50, no. 9, pp. 567–578, Sep. 2003.

[11] D. De Caro, E. Napoli, and A. G. M. Strollo, “Direct digital frequency synthesizers with polynomial hyperfolding technique,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 7, pp. 337–344, Jul. 2004.

[12] A. Bellaouar, M. S. O’Brecht, A. M. Fahim, and M. I. Elmasry, “Low-power direct digital frequency synthesis for wireless communications,” IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 385–390, Mar. 2000.

[13] J.-S. Wang, S.-J. Lin, and C. Yeh, “A low-power high-SFDR CMOS direct digital frequency synthesizer,” in Proc. IEEE Circuits Syst. Symp., May 2005, vol. 2, pp. 1670–1673.

[14] D. De Caro and A. G. M. Strollo, “High performance direct digital frequency synthesizers using piecewise polynomial approximation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 2, pp. 324–337, Feb. 2005.

[15] D. De Caro and A. G. M. Strollo, “High performance direct digital frequency synthesizers in 0.25 μm CMOS using dual-slope approximation,” IEEE J. Solid-State Circuits, vol. 40, no. 11, pp. 2220–2227, Nov. 2005.

[16] B. D. Yang, J. H. Choi, S. H. Han, L. S. Kim, and H. K. Yu, “An 800 MHz low-power direct digital frequency synthesizer with on-chip D/A converter,” IEEE J. Solid-State Circuits, vol. 39, no. 5, pp. 761–774, May 2004.

[17] J. E. Stine and M. J. Schulte, “Approximating elementary functions with symmetric bipartite tables,” IEEE Trans. Comput., vol. 48, no. 8, pp. 842–847, Aug. 1999.

[18] F. De Dinechin and A. Tisserand, “Multipartite table methods,” IEEE Trans. Comput., vol. 54, no. 3, pp. 319–330, Mar. 2005.

[19] A. G. M. Strollo, D. De Caro, and N. Petra, “A 630 MHz, 76 mW direct digital frequency synthesizer using enhanced ROM compression technique,” IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 350–360, Feb. 2007.

[20] J. Vankka, “Methods of mapping from phase to sine amplitude in direct digital synthesis,” IEEE Trans. Ultrasonic Ferroelectr. Freq. Control, vol. 44, no. 2, pp. 526–534, Mar. 1997.

[21] J. Vankka, L. Lindemberg, and K. Halonen, “Direct digital synthesizer with tunable phase and amplitude error feedback structures,” Proc. IEE Circuits Devices Syst., vol. 151, no. 6, pp. 529–535, Dec. 2004.

[22] F. Curticapean and J. Niittylahti, “A hardware efficient direct digital frequency synthesizer,” in Proc. IEEE Int. Conf. Electron., Circuits Syst., Sep. 2–5, 2001, pp. 51–54.

[23] F. Curticapean and J. Niittylahti, “Low power direct digital frequency synthesizer,” in Proc. 43rd IEEE Midwest Symp. Circuits Syst., Lansing, MI, Aug. 9–11, 2000, pp. 822–825.

[24] S. Liao and L.-G. Chen, “A low-power low-voltage direct digital frequency synthesizer,” in Proc. Int. Symp. VLSI Technol., Syst., Applications, Jun. 1997, pp. 265–269.

[25] J. F. Ardekani, “M × N Booth encoded multiplier generator using optimized Wallace trees,” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120–125, Jun. 1993.

[26] J. M. P. Langlois and D. Al Khalili, “Low power direct digital frequency synthesizer in 0.18 μm CMOS,” in Proc. Custom Integr. Circuits Conf., Sep. 2003, pp. 21–24.

[27] Y. Song and B. Kim, “A 14-b direct digital frequency synthesizer with sigma-delta noise shaping,” IEEE J. Solid-State Circuits, vol. 39, no. 5, pp. 847–851, May 2004.

[28] B. Parhami, Computer Arithmetic, Algorithms and Hardware Designs. New York: Oxford Univ. Press, 2000.

Davide De Caro (M’05) received the M.S. degree in electronic engineering (with honors) and the Ph.D. degree in electronic engineering and computer science from the University of Napoli Federico II, Naples, Italy, in 1999 and 2003, respectively.

Since March 2003, he has been a Researcher with the Department of Electronics and Telecommunication Engineering, University of Naples, Naples, where he is involved with high-performance flip-flops (including both low-power and high-speed structures), VLSI implementation of arithmetic circuits, direct digital frequency synthesizers, and digital mixers. He is the author or coauthor of more than 30 technical papers in international journals and refereed international conferences. He has acted as a reviewer for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, the IEEE TRANSACTIONS ON VERY LARGE-SCALE INTEGRATED (VLSI) SYSTEMS, and the IEEE JOURNAL OF SOLID-STATE CIRCUITS.

Nicola Petra (M’08) received the Laurea degree (summa cum laude) and the Ph.D. degree from the University of Napoli Federico II, Naples, Italy, in 2002 and 2007, respectively.

His research interests include design of digital VLSI circuits for telecommunications and high-performance arithmetic circuits. He is now a Researcher with the Department of Electronics and Telecommunications Engineering, University of Napoli “Federico II.” He has acted as a reviewer for the IEEE TRANSACTIONS ON VLSI SYSTEMS.

Antonio G. M. Strollo (SM’06) received the Laurea degree (cum laude) in electronic engineering and the Ph.D. degree in electronic engineering and computer science from the University of Napoli Federico II, Naples, Italy, in 1988 and 1992, respectively.

From 1990 to 1998, he was a Research Assistant with the Department of Electronic Engineering, University of Napoli, Naples. In November 1998, he was appointed an Associate Professor with the University of Napoli Federico II and has been a full Professor since November 2002. His initial research activities covered the area of power electronics. In this field, he has worked on switching power converter simulation, modeling and simulation of power devices (SIT, IGBT, superjunction), SPICE modeling of PiN diodes and IGBTs, characterization techniques, electro-thermal modeling of power devices, and optimization techniques of power bipolar devices with local lifetime control. His current research interests include design and analysis of VLSI circuits. In particular, he is working on advanced architectures for direct-digital frequency synthesis, techniques for clock dithering in digital ASICs, high-performance arithmetic circuits, and high-speed flip-flops. He has published more than 100 papers in international journals and refereed conferences.
