
TDC SAR Algorithm with Continuous Disassembly (SAR-CD) for Time-Based ADCs

Karim O. Ragab1, Hassan Mostafa2 and Ahmed Eladawy3

1,2,3Electronics and Communications Engineering Department, Cairo University, Giza 12613, Egypt
2Center for Nanoelectronics and Devices, AUC and Zewail City of Science and Technology, New Cairo 11835, Egypt

{[email protected], [email protected], [email protected], [email protected]}

Abstract—This paper introduces a new algorithm and circuit design for a Time-to-Digital Converter (TDC) based on a modified Successive Approximation Register (SAR) algorithm. The design enables continuous pulse disassembly. The input pulse is compared in absolute value to pulses of widths proportional to Vfs/2, Vfs/4, ..., Vfs/N, and each bit is evaluated independently of the previous bit result. Bit correction is then applied after the sample evaluation. A 4-bit case-study circuit is realized in TSMC 65 nm CMOS technology. The design demonstrates an Effective Number Of Bits (ENOB) of 3.67 at a sampling frequency of 666 MS/s.

Index Terms—time-to-digital converter, software defined radio, successive approximation register, time-based analog-to-digital converter.

I. INTRODUCTION

Software Defined Radio (SDR) is becoming an important target for System-on-Chip (SoC) design, with the aim of consuming less power and chip area across various applications. This increases the need for Ultra Wide-Band (UWB) ADCs with higher sampling rates, which allow more analog blocks to be pushed towards the digital domain. A time-based ADC is a special kind of data converter in which the input voltage is first transformed into an intermediate change in frequency, pulse position or pulse width using a Voltage-to-Time Converter (VTC) circuit; a Time-to-Digital Converter (TDC) then completes the conversion. In the proposed circuit, a pulse-width-based TDC uses a modified SAR algorithm in which the conversion is done in stages. The number of stages equals the number of bits, so power and area grow linearly with the number of bits, in contrast to [1], in which they grow exponentially with the number of bits. The algorithm and the proposed circuit can easily be fitted to high-speed VTC circuits such as the one in [2]. There is no linear counting mechanism like the one in [3].

The rest of this paper is organized as follows. Section II describes the proposed algorithm and shows how the modified algorithm leads to a simpler circuit than the ordinary SAR algorithm, with a minimum of analog components. Section III describes a full circuit design as a proof of the algorithm, together with its design challenges and weaknesses. Section IV presents the simulation results and proposes significant enhancements. Section VI draws the conclusion.

II. PROPOSED ALGORITHM

Before describing the algorithm, we recall the conventional procedure for converting a decimal number to its binary form. For a given decimal value, starting from the Most Significant Bit (MSB) and for each bit, we try to subtract the weight of the bit from the current value. If the result is not negative, the bit is ‘1’; otherwise it is ‘0’. For example, when converting the decimal number 9 to 4 binary-weighted digits with weights 8, 4, 2 and 1, starting from the MSB, we first try to subtract the weight 8 from the number 9. The result, 1, is positive, so the MSB is ‘1’ and the subtraction result becomes the new value. We then try to subtract the next weight, 4, from this new value, which fails with a result of ‘-3’. In this case the second bit is ‘0’; we discard the subtraction, keep the result of the previous subtraction, which is ‘1’, and test it again for the third bit, of weight 2, and so on. This process is valid as long as the remaining decimal value passed on is the result of a successful subtraction, that is, one that does not yield a negative number.
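The conventional procedure recalled above can be sketched in a few lines of Python. The function name `conventional_bits` is only illustrative, and a zero remainder is treated as a successful subtraction so that the example value 9 yields 1001.

```python
def conventional_bits(value, n_bits):
    """Return the bits MSB-first using trial subtraction of each binary weight."""
    bits = []
    for counter in range(n_bits - 1, -1, -1):
        weight = 1 << counter
        trial = value - weight
        if trial >= 0:
            # Successful subtraction: the bit is 1 and the remainder is kept.
            bits.append(1)
            value = trial
        else:
            # Failed subtraction: the bit is 0 and the old value is kept.
            bits.append(0)
    return bits


print(conventional_bits(9, 4))  # [1, 0, 0, 1]
```

Note that each step needs the outcome of the previous one before it knows which value to pass on; this is the dependency the proposed algorithm removes.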

Fig. 1: Flow chart for the algorithm

In this algorithm we neglected the result of the unsuccessful subtraction in the second bit evaluation of our example, which is -3, and in particular the modulus of this result. This modulus holds all the information we need to evaluate the next bits. The value ‘3’ is the complement of the first result to the current bit weight, that is, the complement of 1 to the weight 4. In the new algorithm, we use this value (the modulus) to evaluate the third bit, of weight 2, and then we invert the output of this evaluation. Likewise, when evaluating the last bit, the value passed from the third bit, of weight 2, is ‘1’: the absolute difference between ‘3’ and ‘2’, which is the same value obtained with the first method. The new algorithm for converting the decimal number ‘9’ to the equivalent binary “b3 b2 b1 b0” is shown in Fig. 2. The variable “value” is initially set to ‘9’. Then, for the next iteration, when the counter “counter” is ‘1’, the new “value” equals abs(value - Dfs/2), where Dfs is the full-scale value of 16, irrespective of the calculated “b3” value. The bits “b1” and “b0” are inverted after their subtractions because their previous bits, “b2” and “b1” respectively, hold the value ‘0’.

Fig. 2: Evaluation of the binary “b3 b2 b1 b0” for the number ‘9’ using the new algorithm

A general flowchart for the algorithm is depicted in Fig. 1. The input number “D IN” is assigned to the variable “value”. In each iteration, “value” is updated with the absolute difference between its previous value and the loop reference value (2^counter). The bit value in each iteration depends on the sign of the result. The process ends when “counter” is ‘0’.
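A minimal Python sketch of this loop follows, assuming the counter runs from the MSB index down to 0. Two details are assumptions rather than something read from Fig. 1: the MSB is never inverted (it has no previous bit), and an exact tie between “value” and the reference sets the current bit to 1 and leaves the remaining bits at 0, mirroring the preset mechanism of Section III. The name `sar_cd_bits` is illustrative.

```python
def sar_cd_bits(d_in, n_bits):
    """Bits MSB-first; 'value' is always replaced by |value - 2^counter|."""
    bits = []
    value = d_in
    prev_bit = 1  # assumption: the MSB has no previous bit and is not inverted
    for counter in range(n_bits - 1, -1, -1):
        reference = 1 << counter
        if value == reference:
            # Assumption: a tie forces this bit to 1 and the rest to 0.
            bits.append(1)
            bits.extend([0] * counter)
            break
        raw = 1 if value > reference else 0           # sign of value - reference
        bit = raw if prev_bit == 1 else 1 - raw       # invert if previous bit is 0
        bits.append(bit)
        prev_bit = bit
        value = abs(value - reference)                # independent of 'bit'
    return bits


print(sar_cd_bits(9, 4))  # [1, 0, 0, 1]
```

Under these assumptions the result matches the ordinary binary code for every 4-bit input, while the update of `value` never waits for the bit decision.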

It should be noted that “value” does not depend on the evaluation of the current bit. Also, the evaluated bit is corrected later using the previous bit value. This compensates for the fact that the “value” used may be the complement of the correct number to the previous bit weight. Fortunately, this bit correction can be done after the pulse disassembly. In Fig. 1, the correction of the evaluated bit can be moved outside the loop (to a separate loop).

The new algorithm is very useful for time-based digital converters, in which the pulse width is converted to a digital number by comparison with the full-scale reference pulses. The equivalent of abs(value - 2^counter) is a simple XOR operation between the two pulses. The output of this operation is passed directly to the next iteration, irrespective of the currently evaluated bit value.
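The equivalence between the XOR of two pulses and the absolute difference of their widths can be illustrated with sampled waveforms. This sketch assumes that both pulses start at the same instant (in the circuit, the input is delayed until the reference is ready) and that widths are integer numbers of samples; `pulse` and `xor_width` are illustrative names.

```python
def pulse(width, length):
    """Sampled pulse: high for 'width' samples starting at t = 0, then low."""
    return [1 if t < width else 0 for t in range(length)]


def xor_width(width_a, width_b):
    """Number of samples for which the XOR of two edge-aligned pulses is high."""
    length = max(width_a, width_b) + 1
    a = pulse(width_a, length)
    b = pulse(width_b, length)
    return sum(x ^ y for x, y in zip(a, b))


print(xor_width(9, 8))  # 1, i.e. abs(9 - 8)
print(xor_width(1, 4))  # 3, i.e. abs(1 - 4)
```

The XOR output carries only the magnitude of the difference; which pulse was longer has to be decided separately, which is the role of the comparator in Section III.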

III. CIRCUIT DESIGN

The proposed circuit is a 4-bit system. It consists of 4 stages; each stage is responsible for evaluating the current bit value and correcting it using the bit evaluated by the previous stage, which is an input to this stage. Fig. 3 shows the unit cell. A general n-bit circuit has ‘n’ successive stages of the same cell type.

A general kth cell is triggered once an input pulse from the previous stage (k-1) is detected. The input pulse “Pin” presets the digital value “b[k]” to ‘1’ and triggers a pulse generator to generate a pulse of width proportional to Vfs/2^k, where Vfs is the full-scale voltage. The input signal is delayed until the reference signal is ready and is then compared to it by a simple XOR gate. The result of this comparison is a pulse whose width is the difference between the two widths. This pulse is used to trigger the next stage. The output pulse also triggers the comparator, through a short pulse, to indicate which pulse is larger. The inputs to the comparator are delayed versions of the input pulse “Pin” and the generated reference pulse, to compensate for the XOR gate delay. The output of the comparator becomes the correct value for the bit “b[k]” after it passes through two conditions. The first condition is the previous bit value (b[k-1]). As stated in the algorithm, if the previous bit is ‘0’, the value of the current bit should be inverted. This is done using the 2-to-1 multiplexer with selection “Sel (b[k-1])”, such that the comparator output is selected when b[k-1] is ‘1’ and its inverse is selected when b[k-1] is ‘0’. The second condition resolves the problem of the limited resolution of the XOR gate, as explained next.

Fig. 3: Bit unit cell

When the XOR inputs are very close in length, the XOR may not produce an output sufficient to trigger the next stage because of its limited resolution. In this case, which is more likely to happen at high sampling rates, there will be an error comparable to the reference pulse that the input is compared to. For example, in a 4-bit system, if this problem is encountered for the MSB, the output will be ‘0000’ instead of ‘0111’, which is a great loss. To solve this problem, we preset the digital bit of the current cell (“pre-set”). If this cell is triggered but the XOR gate produces no output, so that the successive stages are not triggered, then the input pulse is very close to the reference pulse and the current bit should be forced to ‘1’. In normal operation, however, when the XOR output is sufficient to trigger the next stage, the output of the comparator should be taken as the correct value. This is done by the “eval[k+1]” feedback signal taken from the reference pulse generated by the next stage. This signal indicates that the pulse survived to the next stage, and the comparator output is selected instead of the preset value ‘1’. The error in this case is defined by the resolution of the XOR gate. In other words, the resolution of the XOR gate defines the resolution of the system for this architecture.
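The effect of the preset can be illustrated with a purely behavioral model of the stage chain. This is a sketch under stated assumptions, not the transistor-level circuit: the XOR gate is modeled as producing no output when the width difference is below a hypothetical `xor_resolution_ps` of 5 ps, untriggered stages are assumed to hold ‘0’, and all gate delays are ignored. The function name `convert` and its parameters are illustrative.

```python
def convert(pin_width_ps, full_scale_ps=750.0, n_bits=4,
            xor_resolution_ps=5.0, use_preset=True):
    """Behavioral model of the stage chain; returns the bits MSB-first."""
    bits = [0] * n_bits           # assumption: untriggered stages hold 0
    prev_bit = 1                  # assumption: the first stage is not inverted
    width = pin_width_ps
    for k in range(1, n_bits + 1):
        reference = full_scale_ps / 2 ** k
        diff = abs(width - reference)          # width of the XOR output pulse
        if diff < xor_resolution_ps:
            # XOR output too short: comparator and next stage never fire.
            # With the preset the bit stays at 1, otherwise it stays at 0.
            bits[k - 1] = 1 if use_preset else 0
            break
        raw = 1 if width > reference else 0    # comparator decision
        bit = raw if prev_bit == 1 else 1 - raw
        bits[k - 1] = bit
        prev_bit = bit
        width = diff                           # passed on to the next stage
    return bits


print(convert(375.0, use_preset=False))  # [0, 0, 0, 0]
print(convert(375.0, use_preset=True))   # [1, 0, 0, 0]
```

For an input equal to the MSB reference, the model without the preset collapses to 0000, while the preset version is within one LSB of the 0111 quoted in the text.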

Fig. 4: The comparator circuit (panels: stage 1, stage 2)

Fig. 5: Pulse generator circuit

Fig. 6: DFF based on SDFF

This solution imposes a correlation between the signal path and the comparison path when the two signals are compared for each bit. This correlation adds design conditions that require a time and power budget to satisfy. These conditions can be summarized in two points. The first is meeting the timing requirements of the comparator of the earlier stage: when the “eval” signal comes back from stage k+1, the comparator output should be ready with the correct comparison result. A typical failure occurs when the signal path starting from the reference pulse output and passing through the XOR gate, the buffer and the pulse generator of the next stage has a smaller delay than the comparator. One solution, which is used in this design, is to delay the signal “eval” until the comparator output is ready. This delay is found to be relatively high, reaching more than 120 ps for the comparator used, which costs area and also power, up to 30% of the circuit power for a high-speed delay line. The second point is a condition stating that the sum of the delays of the XOR gate, the buffer and the reference pulse generator should be greater than the sum of the DFF access time, the DFF setup time and the multiplexer setup time. This exposes another main function of the buffer, which is to add some delay to satisfy this condition. This condition is addressed in the simulation results section.

Fig. 7: FFT output

The comparator used is depicted in Fig. 4. It consists of two stages. When the clock signal is high, the first stage is activated and stores the input values during the short high period of the triggering clock. At the same time, the PMOS transistors in the second stage pre-charge the output to VDD, preparing the coupled transistors for the evaluation phase. The evaluation phase starts once the clock goes low again: the second stage is activated and the first stage is deactivated, holding the captured input values as the input to the second stage. The evaluation is completed by this stage and the output is held until the next clock edge. Choosing the clock pulse width is critical: it should be long enough for the pre-charging phase through the PMOS transistors, but not so long that the logic-1 input (the input or the reference pulse) goes low in the meantime, weakening the differential signal held in the first stage. The comparator is triggered by a pulse from a pulse generator, which is itself triggered by the XOR output.

The XOR gate used is a simple CMOS circuit. The error in the CMOS circuit is almost constant across different input combinations and can be compensated in the next stage by changing the width of the generated reference pulse. XOR gate designs based on pass transistors, by contrast, add distortion to the output signal.

The pulse generator developed is depicted in Fig. 5. It consists of a DFF and a delay element. The data input of the DFF is tied to the supply, and the output is connected to a delay array that ends at the asynchronous reset of the DFF. Once a clock trigger is detected, the output goes high and stays high until the feedback signal reaches the “Rst” pin, at which point the output goes low again. The width of the generated pulse is the sum of the array delay and the DFF loop delay. The delay of the DFF defines the minimum pulse width that can be generated, imposing another resolution limit besides the XOR gate resolution. The DFF used is depicted in Fig. 6. This architecture is based on the edge-triggered SDFF in [4] (Fig. 1 in [4]), which offers higher speed than the transmission-gate-based DFF.
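The relation between the pulse width, the delay array and the DFF loop delay can be written down directly. In this sketch the DFF loop delay of 30 ps is a placeholder rather than a figure from the paper, and `array_delay_for` is an illustrative helper name.

```python
def generated_pulse_width(array_delay_ps, dff_loop_delay_ps):
    """Pulse width = delay array + DFF loop (clock-to-Q and reset) delay."""
    return array_delay_ps + dff_loop_delay_ps


def array_delay_for(target_width_ps, dff_loop_delay_ps):
    """Delay the array must provide for a target reference pulse width."""
    if target_width_ps < dff_loop_delay_ps:
        # The DFF loop alone is already longer than the requested pulse.
        raise ValueError("target width is below the minimum pulse width")
    return target_width_ps - dff_loop_delay_ps


# Reference widths for a 750 ps full scale, with a placeholder 30 ps DFF loop.
for k in range(1, 5):
    width = 750.0 / 2 ** k
    print(k, width, array_delay_for(width, 30.0))
```

The smallest reference pulse (46.875 ps for a 4-bit, 750 ps full scale) is the one most exposed to this minimum-width limit.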

Fig. 8: Feedback synchronization problem

IV. SIMULATION RESULTS AND ANALYSIS

The simulation is done for input pulses whose widths range from an offset of 10 ps to a full-scale width of 750 ps. The input is mapped from a sine wave with a frequency of 41 MHz, at a sampling rate of 666.7 MHz (1.5 ns sampling period). The offset is optional and is compensated in the first stage by increasing the reference pulse by the same amount. As mentioned, the same applies to any error from the XOR operation, since the reference pulse in the next stage can be modified by design.
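A test stimulus of this kind could be generated as follows. The linear mapping from the sine amplitude to the pulse width (minimum amplitude to the 10 ps offset, maximum amplitude to the 750 ps full scale) is an assumption, since the text does not spell out the mapping; `input_pulse_widths` is an illustrative name.

```python
import math


def input_pulse_widths(n_samples, f_in_hz=41e6, f_s_hz=666.7e6,
                       offset_ps=10.0, full_scale_ps=750.0):
    """Pulse widths (ps) for a sampled sine, assuming a linear mapping."""
    widths = []
    for n in range(n_samples):
        s = math.sin(2 * math.pi * f_in_hz * n / f_s_hz)   # in [-1, 1]
        # Map [-1, 1] linearly onto [offset, full scale].
        widths.append(offset_ps + (full_scale_ps - offset_ps) * (s + 1) / 2)
    return widths


widths = input_pulse_widths(64)
print(min(widths), max(widths))
```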

The FFT output of the system is depicted in Fig. 7. The SNR corresponds to an ENOB of 3.67. The circuit runs with a 1.5 ns sampling period to resolve a pulse of 750 ps maximum width. Ideally, the two numbers should be very close; however, fixed delays in the circuit cause a deviation, which can be attributed to four main delay sources. The first is the XOR gate, which consumes 43 ps for each bit. The second is the pulse generator, which consumes 35 ps for each bit. The third is the feedback delay inserted in the “eval” signal path, mentioned earlier, to compensate for the comparator delay (“Dcomp” in Fig. 8). Only the delay in the last bit stage, needed to take the final digital word, counts here, because in the other stages this delay is consumed in parallel with the signal path. This delay can be removed entirely if the digital word is taken after the start of the new sample, when this stage is not in use (care should be taken when resetting the digital bits for the new sample). The last is a forced 20 ps delay introduced by the XOR gate buffer for each bit to satisfy the condition pointed out in Section III.
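For reference, the ENOB quoted above relates to a signal-to-noise-and-distortion ratio through the standard formula ENOB = (SNDR - 1.76 dB) / 6.02 dB. The sketch below assumes this usual relation was applied to the FFT of Fig. 7; the paper itself quotes only the ENOB, so the dB figure is derived, not reported.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard relation)."""
    return (sndr_db - 1.76) / 6.02


def sndr_from_enob(enob):
    """Inverse relation: SNDR in dB implied by a given ENOB."""
    return 6.02 * enob + 1.76


# An ENOB of 3.67 implies an SNDR of roughly 23.85 dB.
print(round(sndr_from_enob(3.67), 2))
```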

To explain the reason for this condition, consider Fig. 8, which shows 3 stages, indexed from “k-1” to “k+1” (stages “k-1” and “k+1” are clipped to fit). Starting from stage “k”, consider the signal at the output of the reference pulse generator, which is ready to enter the XOR gate and also to be fed back to stage “k-1”. This signal takes two paths. The first path passes through the XOR gate and the buffer of stage “k”, then through the reference pulse generator of stage “k+1”, then through the feedback “Dcomp” (eval[k+1]). The sum of the delays along this path is:

$$T_1 = T_{XOR} + T_{common} + T_{buff} + T_{PulseGen} + T_{Dcomp} \tag{1}$$

where $T_{XOR}$ is the delay of the XOR gate and $T_{common}$ is the common part of the signal pulse and the reference pulse, during which both signals are high. The second path passes through the “Dcomp” delay element and the DFF of stage “k-1”, then through the MUX to the input of the DFF of stage “k”. The sum of the delays along this path is:

$$T_2 = T_{Dcomp} + T_{DFFaccess} + T_{mux} + T_{DFFsetup} \tag{2}$$

For proper operation, the input to the MUX of stage “k” should be ready before the clock signal “eval[k+1]” arrives. This means that $T_1 > T_2$ must be satisfied. All of these delays are fixed except $T_{common}$, which can take a value from 0 to $T_{fs2}$, where $T_{fs2}$ is the width of Pvfs2. Considering the worst case, in which $T_{common}$ equals 0, the condition in equation (3) should be met for proper operation. The buffer used after the XOR gate strengthens the signal and adds delay to ensure that this condition is met.
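The worst-case requirement can be checked numerically from the two path delays. In the sketch below, the XOR, buffer and pulse-generator delays (43 ps, 20 ps and 35 ps) are the per-bit figures quoted in this section, while the DFF access time, MUX delay and DFF setup time (40 ps, 25 ps and 20 ps) are placeholders chosen only for illustration; `timing_margin_ps` is an illustrative name.

```python
def timing_margin_ps(t_xor, t_buff, t_pulse_gen,
                     t_dff_access, t_mux, t_dff_setup,
                     t_common=0.0, t_dcomp=0.0):
    """Return T1 - T2; a positive margin means the MUX input is ready in time."""
    t1 = t_xor + t_common + t_buff + t_pulse_gen + t_dcomp
    t2 = t_dcomp + t_dff_access + t_mux + t_dff_setup
    return t1 - t2


# Worst case (t_common = 0) with the buffer in place, placeholder DFF/MUX values.
print(timing_margin_ps(43.0, 20.0, 35.0, 40.0, 25.0, 20.0))   # 13.0
# Same placeholder values without the 20 ps buffer delay: the margin turns negative.
print(timing_margin_ps(43.0, 0.0, 35.0, 40.0, 25.0, 20.0))    # -7.0
```

Because the “Dcomp” delay appears in both paths, it cancels out of the margin, which is why it does not show up in the final condition.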

$$T_{XOR} + T_{buff} + T_{PulseGen} > T_{DFFaccess} + T_{mux} + T_{DFFsetup} \tag{3}$$

A significant enhancement to the architecture is to perform the bit correction at the end of the disassembly phase. In the disassembly phase we only do the XOR operation and the comparison for each bit. Then, after the signal evaluation is done for all the possible stages, we apply the XNOR operation to the sample bits, as follows from the algorithm in Fig. 1 (noting that the XNOR operation should be done for the triggered stages only, and that the bit of the last triggered stage should be forced to 1 to satisfy the resolution condition mentioned earlier). This would eliminate the need for the “Dcomp” delay lines and the MUX in each stage.
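The two-phase scheme described above can be sketched on integer values. The first function records only the raw comparator decisions of the triggered stages; the second applies the XNOR correction afterwards. As an assumption of this sketch, the last triggered bit is forced to 1 only when the disassembly stopped on an exact tie (no XOR output), and untriggered stages are left at 0; `disassemble` and `correct` are illustrative names.

```python
def disassemble(d_in, n_bits):
    """Phase 1: XOR and compare only. Returns (raw bits of triggered stages, tie)."""
    raw_bits = []
    value = d_in
    for counter in range(n_bits - 1, -1, -1):
        reference = 1 << counter
        if value == reference:
            raw_bits.append(1)        # placeholder; forced to 1 during correction
            return raw_bits, True     # no XOR output: later stages not triggered
        raw_bits.append(1 if value > reference else 0)
        value = abs(value - reference)
    return raw_bits, False


def correct(raw_bits, tie, n_bits):
    """Phase 2: XNOR each raw bit with the corrected previous bit."""
    bits = [0] * n_bits               # assumption: untriggered stages read 0
    prev_bit = 1                      # assumption: the MSB is not inverted
    last = len(raw_bits) - 1
    for i, raw in enumerate(raw_bits):
        if tie and i == last:
            bits[i] = 1               # last triggered stage forced to 1
        else:
            bits[i] = 1 - (raw ^ prev_bit)   # XNOR(raw, previous bit)
        prev_bit = bits[i]
    return bits


raw, tie = disassemble(9, 4)
print(raw, tie, correct(raw, tie, 4))
```

Since the correction runs only after all stages have finished, no stage needs the corrected bit of its predecessor during disassembly, which is what removes the per-stage “Dcomp” delay and MUX.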

V. ACKNOWLEDGMENT

This research was funded by NTRA, ITIDA, Cairo University, Zewail City of Science and Technology, AUC, the STDF, Intel, Mentor Graphics, and MCIT.

VI. CONCLUSION

The conventional SAR algorithm is modified to suit the continuous pulse-width evaluation needed in time-based digital converters. The algorithm offers a lower circuit scaling factor than competing circuits. The theoretical time needed for each bit evaluation is reduced to the delay of a simple XOR gate and a comparator. New design techniques are developed to take full advantage of the algorithm, and modifications are proposed for significant enhancements.

REFERENCES

[1] G. Li and H. Chou, “A high resolution time-to-digital converter using two-level vernier delay line technique,” in Nuclear Science Symposium Conference Record, 2007. NSS ’07. IEEE, vol. 1, 2007, pp. 276–280.

[2] H. M. M. Wagih Ismail, “A new design methodology for voltage-to-time converters (VTCs) circuits suitable for time-based analog-to-digital converters (T-ADC),” in press.

[3] S. Naraghi, M. Courcy, and M. Flynn, “A 9-bit, 14 µW and 0.06 mm² pulse position modulation ADC in 90 nm digital CMOS,” vol. 45, no. 9, pp. 1870–1880, 2010.

[4] F. Klass, “Semi-dynamic and dynamic flip-flops with embedded logic,” in VLSI Circuits, 1998. Digest of Technical Papers. 1998 Symposium on, 1998, pp. 108–109.
