ISSCC 2014 – Tutorial Transcription Analog Front-End Design for Gb/s Wireline Receivers

Instructor: Elad Alon

1. Intro
Thank you very much, Frank, for the kind introduction. As Frank mentioned, today I’m going to be telling you a little bit about Analog Front-End Design for Gb/s Wireline Receivers.

2. Importance of Wireline Link
So before I go too far, I thought it’s worthwhile to just mention a little bit about why this is an increasingly important topic these days – and perhaps, for this audience, I will not have to work very hard to convince you of that. As you can imagine, with the rise of cloud computing and multi-core, and in fact even specialized processing, the ability to shovel data back and forth between these large computing servers – and in fact even inside of your tablets and mobile phones – is increasingly becoming an important differentiator in terms of the overall performance, as well as the power consumption, of the systems that we deal with. Just to give you an example in the pictures that I’m showing you here: this is taken from the Open Compute Project, where you can see that inside these chassis you can easily have 10s of 1000s of these links, because each one of those riser cards shown in the picture can easily have 10s, even 100s, of links, and you can see that many of them are shoved into a single backplane.

3. The High-Speed Wireline Challenge
This is not to say that building these systems and getting them to work is an easy thing to do. From a 1000-foot point of view, the whole challenge here is that we have to deliver as much throughput on and off of the chip as we can – to give you an idea, this is typically on the order of multiple Tb/s for each individual die if you look at the bi-directional bandwidth – while at the same time using as little power as you possibly can. To give you a number for what’s a very good target – perhaps somewhat aggressive, but if you get there, you’re definitely good – you have to get something on the order of 1 pJ/bit.
To put this into context – that means for a 10-Gb/s I/O, you should only be spending 10mW for the entire link. This is all compounded by the fact that when you send a particular analog voltage waveform from the transmitter, by the time it shows up at the receiver it has generally been heavily distorted. In the picture here, the so-called communication channel unfortunately really does distort the signal. This is the frequency-domain view of things – we’ll actually be focusing more on the time-domain view of things shortly. But really, the point is that what you get at the receiver is not what you originally sent, and so we’re going to have to deal with fixing that up and figuring out what the right data really is.

4. Focus of this Tutorial: RX
As you can imagine, these systems – as we’ve increased the data rates and become much more stringent on the power consumption budgets that we’d like to meet – have actually become fairly large and complex. I don’t want to even attempt to cover the entire transceiver; instead, today we’ll be focusing mostly on the receiver, and in particular the receiver front-end itself. You may wonder: why did I choose the receiver as opposed to any other component? The first thing is that, by definition, the transmitter knows the data. In several places, that actually makes things much easier on that side. In other words, the constraints on the receiver tend to be more stringent, because again, it doesn’t actually know the data a priori – that’s what it’s trying to recover in the first place. The second is that (as we’ll see in a little more detail shortly) the transmitter generally has to drive the impedance of the channel, which typically speaking is a 50-Ohm single-ended equivalent, whereas on the receive side we really get to control the impedance within our circuits much more directly. Typically speaking, the impedance levels are actually higher in those circuits, and because of that, a lot of the processing we want to do – particularly in the analog/mixed-signal domain – tends to be much more efficient when you implement it on the receiver rather than on the transmitter. The final point I’ll mention is that if you really understand how to design the receiver – particularly the specific blocks that we’ll be walking through today – those techniques map over extremely well to the transmitter also.

5. Tutorial Outline
What are we going to do for the rest of this tutorial?
First, I’m going to walk through a few of the basics of what receiver analog front-ends (AFEs) really need to do. I’ll then spend most of the time going into the detailed circuits themselves, and in particular give you some simple analytical equations you can use to predict the power consumption of these designs. I’ll then take a step back, walk back through a few of these circuits, and point out a few of the practical issues and gotchas that I’ve seen myself in the past, so that when you potentially go to build these things, you can avoid them. Finally, I’ll summarize what I think the key lessons of this overall tutorial are – not just for high-speed links or high-speed receivers per se, but more generally for anyone trying to build efficient analog/mixed-signal systems.

6. Tutorial Outline
Let me dive into the Receive Analog Front-End portion of things. As I said, I’m first going to walk through the most basic requirements for these analog front-ends, and then I’ll spend some time mostly focusing on Receive Equalizers – I’ll explain shortly what these things really need to do.

7. Most Basic RX
If we take a step back and just think about a very simple digital communication system, where at the transmitter I’m sending just 1s or 0s, the picture actually ends up being very simple. I have this transmitter, which is going to send 1s or 0s; the signal is going to go over some “wire” to get to the receiver; and then at the receive side, barring any other complexities which we’ll introduce in one second, the most basic receiver would essentially just be a comparator, because all it really needs to do is take the potentially small analog voltage that shows up at the end of the “wire” and convert it into a nice digital level. The natural question, of course, is: in this simple picture, what can go wrong?

8. Key Constraint: Bit Error Rate
Hopefully it is obvious to everyone that the #1, most important constraint on any receiver is that it actually receives the bits that were transmitted, and not something else. The constraint this translates into is called the Bit Error Rate. It turns out you can never build something that perfectly receives every single bit that has ever been transmitted. You can only drive the probability of an error down to a sufficiently low level, and so this Bit Error Rate (BER) is really just the average number of received bits that were incorrect, divided by the total number of transmitted bits. As you can imagine, once you get into the guts of these things, it’s actually a whole discipline in and of itself to calculate what this BER actually is, given a particular high-speed link design and the channels, etc.
But just to give you an idea why errors might happen, let’s consider that my comparator has some residual (input-referred) offset, meaning that the DC level at which it decides between a 1 and a 0 is not exactly where you wanted it to be, and let’s also keep in mind that any circuit we build will always have some thermal noise associated with it. If that’s the case and we look at this eye diagram – which is just basically taking the analog voltages that show up at the receiver and overlaying them in time – what will happen is that, because of thermal noise, every once in a while you’ll get a noise event that’s actually larger than the input amplitude you received. If that happens, that implicitly means the input is now (in the analog domain) lower than the crossing point of my comparator – so my comparator will get the wrong answer. If I restrict myself to this very simple scenario, it is very easy to show that the BER is equal to ½ of the complementary error function of essentially the analog input voltage amplitude I have, minus the residual offset, divided by the square root of 2 times the standard deviation of the thermal noise. It turns out you don’t really need to remember this equation – it’s much easier to just think about what’s inside of the argument of the complementary error function. As I’ve pointed out here below, usually you get told that you need something like a BER of 10^-12 or 10^-15, perhaps even 10^-20. Once you know that, it just tells you how big your residual signal amplitude (which is the analog level minus the offset) needs to be relative to the standard deviation of the noise. In this particular case, if you want 10^-12, you need something like 7 sigma of noise. If you want 10^-20, it’s something like 9.25 sigma of the noise. This tells you, signal-to-noise-ratio-wise, how much you have to guarantee for a given BER.

9. Noise Not the Only Source of Errors: ISI
Having said all of this, life would actually be quite good if the only cause of errors in our links was noise. It turns out that these days, the much more dominant cause of errors tends to be the distortion introduced by the channel itself. The most common type of distortion – a term I’m going to repeat many times, so let me define it right now – is what’s called Inter-Symbol Interference (ISI). The reason this comes up is that the “wire” we have between the transmitter and receiver is really not ideal. It has several effects that distort the signal. The first is that the channel is band-limited, meaning that it’s dispersive, or typically low-pass. If at the transmitter we sent this nice, short, clean blue pulse, by the time it arrives at the receiver it looks like this green thing, where it has been spread out. The black dots are sort of my symbol-spaced samples, showing what the analog levels would have been if I sampled this right at the data rate. These dispersion or low-pass filtering effects spread the energy out, but relatively close to the originally-transmitted signal.
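Going back to the slicer BER expression from a moment ago: it can be sanity-checked numerically with Python’s standard library. This is just a sketch – the function and variable names are mine, not from the talk:

```python
from math import erfc, sqrt

def ber(amplitude, offset, sigma):
    """BER of a simple slicer: 0.5 * erfc((amplitude - offset) / (sqrt(2) * sigma))."""
    return 0.5 * erfc((amplitude - offset) / (sqrt(2) * sigma))

# A residual amplitude of ~7 sigma of noise gives a BER near 1e-12,
# and ~9.25 sigma gives a BER near 1e-20, matching the numbers in the talk:
print(ber(7.0, 0.0, 1.0))
print(ber(9.25, 0.0, 1.0))
```

Note how any residual offset eats directly into the amplitude, and therefore directly into the achievable BER.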
Unfortunately, we won’t just get those – we’ll also get reflections from various discontinuities within those channels. If you remember, we had those riser cards that plugged into the backplane, and those connectors – it’s very hard to maintain an actual 50 Ohms throughout the entire thing. Those tend to cause reflections; in this waveform here, you can see these wiggles right where my mouse pointer is now – that’s typically indicative of a reflection. Unfortunately, because those reflections come from fairly far away from the transmitter, they can take a long time, in terms of the symbol period, to actually show up in our received response. Basically, we are going to have to deal with not only the thermal noise, but also with trying to get rid of this ISI, which as you can imagine can cause us to have bit errors. For example, if I was transmitting a “1” for a very long time and then all of a sudden I transmit “1,0”, as you can see in this response here, you wouldn’t even get down far enough to cross the receiver’s threshold.

10. Equalization
So what’s the technique that people use to deal with this? That’s actually a very old and very well-known technique called equalization. The idea is very straightforward. If, in the frequency domain, my channel has this low-pass characteristic, but at my receiver I pass it

through something that looks like the inverse of that channel (or, to a good approximation, the inverse of that channel), then – again in the frequency domain – I should get something that looks flat. “Equalization” comes from this frequency-domain picture, where you are trying to equalize the gain at all frequencies. In time, this just means that if we sent a nice, clean, well-defined pulse at the transmitter, and we’ve done everything properly, then at the receiver we should also get a nice, clean, well-defined pulse.

11. More Complete RX AFE
If we now look at what a more complete receive analog front-end would look like: in general, I’m still going to have this comparator, which is sometimes referred to as a slicer, but in front of it I’m going to have some kind of equalizer, and inevitably there is always going to be some kind of digital control for all of these things – for example, to try to cancel any residual offsets in that slicer, as well as to actually change the response of the equalizer so it really matches the particular channel that we’re working with at this point in time. What I want to do now is look at the types of equalizers that we’re going to need, particularly on the receive side. I would like to point out that there is often equalization done on the transmitter as well, but if you look at the more modern link architectures, increasingly the burden is moving to the receiver, and so that’s why I’m going to focus on things there.

12. Continuous Time Linear Equalizer (CTLE)
The first and perhaps most basic type of equalizer is what’s known as a continuous time linear equalizer (CTLE). The idea here is exactly what I proposed when I described equalization in the first place. Imagine that I have my channel, which has a certain response in the frequency domain. What I’m going to try to do is build an analog circuit in my receiver which tries to approximate the inverse of that channel.
As I’ve highlighted here, you can imagine that if I really had a low-pass channel, then my analog circuit that implements the inverse would have to have infinite gain out at infinite frequency. Obviously, that’s not a very practical thing to do – though I should mention that if anyone knows how to build infinite gain at infinite frequency, please come give me some stock in the startup; I’d love to get in on that particular action. But the good news is, we don’t actually have to do that, because at the end of the day we are really only transmitting up to a certain baud rate – we have a certain data rate we are trying to get through this thing. So really, you only have to maintain this inverse shape up to about the bandwidth of the signals that you’re interested in. It turns out that the bandwidth you’re typically interested in is about 2/3 of the bit rate. If you’re at 10 Gb/s, this would be about 6.67 GHz, which is roughly the bandwidth that you would need to maintain in this equalizer.
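The “inverse of the channel, but only up to the signal bandwidth” idea can be made concrete with a small numerical sketch. The one-pole channel model and the pole/zero placements below are my own illustrative assumptions, not numbers from the talk:

```python
F_P = 1e9                      # assumed channel pole: simple one-pole low-pass model
BITRATE = 10e9
F_BW = (2.0 / 3.0) * BITRATE   # ~6.67 GHz: bandwidth the equalizer must maintain

def h_channel(f):
    """First-order low-pass model of the channel."""
    return 1.0 / (1.0 + 1j * f / F_P)

def h_ctle(f):
    """Idealized CTLE: a zero placed on top of the channel pole, rolled off
    at F_BW rather than chasing infinite gain at infinite frequency."""
    return (1.0 + 1j * f / F_P) / (1.0 + 1j * f / F_BW)

def h_total(f):
    return h_channel(f) * h_ctle(f)

# The cascade is roughly flat across the signal band, even where the raw
# channel has already rolled off significantly:
for f in (0.1e9, 1e9, 3e9):
    print(f / 1e9, abs(h_channel(f)), abs(h_total(f)))
```

The equalized response only needs to stay flat up to about F_BW; beyond that, the residual roll-off doesn’t matter for the data we’re sending.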

13. CTLE Implementation, Limitations (1)
Let’s think conceptually about how these CTLEs would actually be built; as we’ll see, I’ll immediately point out some of the limitations associated with them. The most common implementation of these CTLEs is actually to use negative feedback. The idea is that you take some kind of amplifier, which originally had its own (generally speaking) flat response, and you feed its output back through a low-pass filter. If you think about it, that means that from the channel input to the output, if you have enough gain in that amplifier, you should be building the inverse of that filter’s low-pass response. That means that if I want my equalizer to match any particular channel, I just have to make this low-pass filter have the same response as my channel did. Typically the way you do that is to build the low-pass filter out of resistors and capacitors. That’s actually nice, because it’s pretty easy and efficient to implement fairly low time constants just using these resistor-capacitor networks. As I’ve pointed out here, this is really most effective with a fairly small number of discrete poles and zeros. You can imagine that if you have a very complicated response, you have to build a very complicated analog RC network to try to mimic it, and that tends to become infeasible fairly quickly. So if you really have a complicated channel – particularly one that has a lot of reflections, which in the frequency domain tend to manifest themselves as deep notches in the response – you usually don’t want to deal with that with these CTLEs, because again, it’s very difficult to mimic that response, particularly in a programmable and robust way. Oftentimes, this will not be your only equalizer; rather, you’ll combine this with some sort of discrete-time equalizer.
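The feedback construction just described is the usual closed-loop result: H = A/(1 + A·H_lpf) ≈ 1/H_lpf as long as the loop gain is large. A quick sketch (the gain and pole values are illustrative assumptions, not from the talk):

```python
A = 100.0      # assumed open-loop amplifier gain
F_LPF = 1e9    # assumed pole of the RC feedback filter

def h_lpf(f):
    """One-pole RC low-pass filter in the feedback path."""
    return 1.0 / (1.0 + 1j * f / F_LPF)

def h_closed(f):
    """Closed loop: A / (1 + A * H_lpf), ~= 1 / H_lpf while |A * H_lpf| >> 1."""
    return A / (1.0 + A * h_lpf(f))

# DC gain is knocked down from A to ~1, while the response rises roughly
# 10x by 10 GHz, tracking 1/|H_lpf| as expected:
print(abs(h_closed(1e3)), abs(h_closed(10e9)), 1.0 / abs(h_lpf(10e9)))
```

Notice that nothing here ever exceeds the amplifier’s own gain A – the apparent high-frequency “boost” is really low-frequency gain being thrown away, which is exactly the limitation the talk turns to next.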
The advantage of discrete time is that every time you add one additional tap, that basically gives you an extra degree of freedom – it’s easier to deal with more arbitrary responses in those types of equalizers.

14. CTLE Implementation, Limitations (2)
Before I move on from this CTLE, I do want to point out one limitation that is fundamental to the way we are building these things. We typically build these equalizers by placing some amplifier in a feedback loop through a low-pass filter. If you think about that, what it really means is that we took some amplifier that originally had some response that looked like this – some gain and some bandwidth associated with it – and then, when we pass it through the low-pass filter, what we are actually doing is taking the gain at low frequencies and throwing it away. That is by no means the same as actually increasing the gain at high frequencies.

15. The Fundamental Issue: Noise
If what you’re really doing is throwing away gain at low frequencies, then the problem you run into is that what you’re actually doing when you build this equalizer is decreasing the

amplitude of the signal. Again, for the low-frequency parts, you are actually reducing the gain, not increasing the high-frequency gain. So these equalizers certainly can eliminate ISI, but think about it from the signal-to-noise-ratio standpoint: if I have a certain fixed amount of noise coming into the input of my equalizer, and I just said that I reduced the amplitude of my signal, then relatively speaking, the noise looks larger than it was before. We should actually be careful here. I’ve focused on noise so far, but it turns out that another very important source of potential error in these links is crosstalk – meaning that even though your channels are differential, you still get coupling from one channel onto another. Another one of the main limitations of these CTLEs is that they also effectively make the high-frequency crosstalk look larger than it would have looked otherwise. As I’ve hinted at, this is particularly bad on channels that have deep notches like this, because you basically have to knock the gain down all the way below the deepest point just so that, afterwards, you can keep things relatively flat. The more equalization you try to do, the more you’re making the noise look larger than it otherwise would have.

16. Good News: There is Another Way
Given that I spent as much time as I did on the limitations here, you have hopefully figured out that there is indeed another approach we can take to get around some of these issues. In particular, it comes down to the fact that even though I’m at the receiver, let me assume that the bit I receive is actually the correct one.
If I really did receive the correct bit, then if I go back and look at this channel: once I have received, for example, a “1”, I know that one symbol later I’m going to be getting some residual ISI, which in this particular case has an amplitude of about 0.2; two symbols later I’m going to be getting 0.1; and so on and so forth. Because I know what ISI should be coming, why don’t I just take my received bit, which is now digital, and use that to directly cancel the ISI that I know is coming afterwards?

17. Decision Feedback Equalization
This is what’s known as decision feedback equalization. The name hopefully should be clear based on the description I just gave. What you have to do is decide what bit you actually just received, take that decision, and feed it back through some kind of filter, where that filter is predicting the future ISI that will be caused by the bit you just received. The advantage here – which is really the critical reason why you do this – is that you pay no noise penalty. There is no amplification of random garbage, because once you’ve decided, this is a digital bit. As a reminder, the whole point of digital is that the system behaves as if there is no noise anymore. If I decided it was a “1”, and it really was a “1”, there is no noise in that estimate any further. I just take those “perfect bits” and use them to subtract out the future ISI that I know is coming.
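A behavioral sketch of this idea, using the pulse response quoted above (main cursor 1.0, post-cursors 0.2 and 0.1); everything else – the tap count, the bit count, the ±1 signaling – is my own illustrative choice:

```python
import random

random.seed(0)

PULSE = [1.0, 0.2, 0.1]   # main cursor plus the two post-cursor ISI terms
DFE_TAPS = [0.2, 0.1]     # feedback taps predicting the ISI from past decisions

bits = [random.choice([-1.0, 1.0]) for _ in range(1000)]

def channel_output(tx):
    """Symbol-spaced channel: each bit leaks 0.2 and 0.1 into the next two symbols."""
    return [sum(p * tx[n - k] for k, p in enumerate(PULSE) if n - k >= 0)
            for n in range(len(tx))]

def dfe_receive(rx):
    """Slicer with decision feedback: subtract the ISI predicted from the
    previous decisions before making the current decision."""
    decisions = []
    for y in rx:
        for k, tap in enumerate(DFE_TAPS):
            if len(decisions) > k:
                y -= tap * decisions[-1 - k]
        decisions.append(1.0 if y >= 0 else -1.0)
    return decisions

rx = channel_output(bits)
errors = sum(d != b for d, b in zip(dfe_receive(rx), bits))
print(errors)   # 0: with correct taps and no noise, the ISI cancels exactly
```

Because the subtraction uses the (noise-free) decisions rather than the analog waveform, no noise is amplified – the key property the talk is emphasizing.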

18. Key Constraint: Timing
As you can hopefully tell from this, the number-one constraint in building these decision feedback equalizers (DFEs) is that you have a very tight timing loop to deal with, because you have to do all of the following in at most one bit time: take the potentially small analog input signal, resolve it up to a good digital level, feed it back through this scaling, sum it, allow that sum to settle to sufficient precision, and then finally repeat the process. As you can imagine, if I’m going at a very high speed and I need a reasonably high gain to get the small analog signals up to digital levels, that can very easily limit the performance of this system.

19. Complete Signal Chain
Before I dive into the circuit details, I want to give you a picture of what the overall signal chain now looks like. We have some transmitter; we go over a channel; typically we are going to have some CTLE, which we’re then going to combine with these DFEs; and as I’ve indicated here, we’ll sometimes include a variable gain amplifier (VGA). The reason is that if I have a lot of ISI left over, that kind of means there is a lot of junk – I don’t just have a nice digital “1” and “0”; I actually have a lot of junk spread around those levels – and I want to make sure that my analog circuit in this DFE processes that analog input signal in a linear way, meaning it doesn’t compress. If it compresses, it turns out you have to use a much more complicated feedback filter; as long as it behaves linearly, the feedback filter itself can remain a linear FIR. I’ll say some more about this a little bit later on, but just keep in mind this is implicitly the signal chain that we are thinking about.

20.
Tutorial Outline
Now what I’d like to do is dive into a few more details of the individual circuits; as I’ve said before, I really want to focus on predicting the power of each one of these designs.

21. Comparator Circuit Design
Let’s start from the end of the signal chain in some sense, or at least from the most basic receiver, which is just a comparator. I’m not going to spend a lot of time or detail on this, because there are many comparators out there, and there is actually a lot of great literature on how one designs them. But I will point out that by far the most common design used today is the so-called StrongARM comparator. That’s what’s drawn in the core here on the right. We basically have a differential input pair used to steer some current into a cross-coupled, or

something that looks a lot like a cross-coupled inverter. That cross-coupled inverter then provides you regenerative gain to decide which input was higher or lower. I will, however, mention that it is typically required to include a pre-amp in front of these things, for a couple of reasons. One is that this pre-amp provides a little bit of analog gain to ease the job of the comparator, but even more importantly for these types of designs, it acts as a way of isolating the so-called kickback from the comparator. As you can notice, this comparator has a lot of precharge devices in it, and basically has signal swings that go from 0 to VDD. If you get any coupling from the drains of these input devices back to their inputs, that can cause substantial voltage error at the input of the comparator, unless you have a nice low-impedance driver in front of it. That’s exactly what this pre-amp is designed to do – get rid of that kickback and make sure the thing behaves robustly.

22. Comparator Power Consumption
Now let’s think about the power consumption of this comparator. Let’s focus first on the core and the latch that typically follows it. The good news here is that these are typically very much just digital gates. Just like with digital gates, you can compute the power with CV²f – the exact same thing. Once you know how much capacitance is switching, the supply voltage, and the clock frequency, that basically tells you the power consumption. Let’s instead focus more on the pre-amp power. As I’ll describe shortly, if you can predict the power of the pre-amp, you can predict the power of all the other equalizers basically immediately.

23. Preamp Power Consumption (1)
Let’s go ahead and do that. That pre-amp, as I mentioned before, is typically just a resistively loaded Class A amplifier. I’ve drawn a single-ended equivalent circuit over here.
Hopefully, as most of you remember, if I have this type of circuit with a resistive load, the gain-bandwidth of that analog circuit is simply set by the gm (transconductance) of the input device, divided by the total capacitance that it’s driving. Notice I’ve actually included the capacitance due to the transistor itself (I’ve called it Cself here). The key thing to remember about that capacitance is that as you make the transistor larger to give it a larger gm, Cself will also increase, because that basically implies you’re making the width larger, using more fingers, etc. In order to get to a single equation that doesn’t depend upon itself, I want to remind you that this Cself is basically proportional to the gm – I just have to normalize by some factor to get back to a capacitance. It turns out that factor is the ωT (the transition frequency), along with this γ here, which is just a factor that relates the drain cap to the gate cap. If you divide that gm by ωT and multiply by γ, that tells you how much capacitance is really there. If you do two lines of algebra, you’ll quickly come to the following result: the transconductance is simply equal to the gain-bandwidth times the capacitance, divided by 1 minus a normalized quantity, where that normalized quantity is the gain-bandwidth divided by the ωT, effectively, of

the transistor. I’ll mention this now and I’ll mention it again: if you remember nothing else from this tutorial, please remember the form of this equation, because it actually drives a lot of the design constraints for everything else we’re going to be talking about afterwards. This is basically the gain-bandwidth times the capacitance, divided by 1 minus a normalized quantity, where that normalized quantity tells you how hard you are working to get gain-bandwidth relative to the intrinsic speed of the transistor.

24. Preamp Power Consumption (2)
Now let’s turn this into real current consumption, as opposed to just transconductance. In almost all of these circuits, we essentially design them so they’re operating at a constant current density. If you fix the current density, that basically means that your gm is proportional to the current that you’re using. I’m going to define this so-called V* here – you can think of it like an overdrive voltage, which is just, by definition, the drain current divided by the gm. Again, it’s a lot like the overdrive, because the lower the V*, the larger the gm you get per unit drain current. Once you’ve done that, the bias current is exactly the expression we have for gm, just multiplied by V*. So again, for those of you who are digital people and like CVf as the current, it’s the exact same thing here – the f is actually the gain-bandwidth, and the V is not the supply voltage, it’s actually V*. Now that we’ve gotten everything into this form, it’s useful to go look at some typical numbers that one gets from this. How much power does it actually take to build one of these things? Let’s take a typical 20-Gb/s design: a gain of about 2, a bandwidth of 80 Grad/s, a V* of 200mV, a load cap of 10fF, and a transition frequency (ωT) of 400 Grad/s. The bias current is only about 530μA.
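Plugging these numbers into the gm expression reproduces the quoted figure (assuming γ = 1 and, for the energy number, a 1 V supply – neither is stated explicitly in the talk):

```python
GAIN = 2.0        # pre-amp voltage gain
BW = 80e9         # bandwidth, rad/s (80 Grad/s)
V_STAR = 0.2      # V* = Id/gm, volts
C_LOAD = 10e-15   # load capacitance, farads (10 fF)
W_T = 400e9       # transistor omega_T, rad/s
GAMMA = 1.0       # assumed drain-cap/gate-cap factor

gbw = GAIN * BW                                  # gain-bandwidth product
gm = gbw * C_LOAD / (1.0 - GAMMA * gbw / W_T)    # the key gm equation
i_bias = gm * V_STAR                             # ~530 uA

energy_per_bit = i_bias * 1.0 / 20e9             # ~27 fJ/bit at 1 V, 20 Gb/s
print(i_bias, energy_per_bit)
```

Note the denominator: here gbw/ωT = 0.4, so the self-loading term already inflates the current by 1/0.6, and it blows up as the gain-bandwidth approaches the transistor’s intrinsic speed.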
To put this into the context of the pJ/bit metric I mentioned earlier, this is on the order of 25 fJ/bit. As you can see, this is most likely not going to be the dominant source of power in our link. This is usually actually a very small fraction of the overall budget. 25. CTLE Circuit Design Now that we’ve looked at this pre-amp, let’s go to the next step, which is basically looking at how we actually build this CTLE. For reasons that I’ll expand upon a little further later on, the most common design is this source-degenerated design. What I mean here is that I’ve essentially taken my normal pair, but inside of the source, I’ve put this RC network, which, if you twist your head a little bit, is essentially doing some local negative feedback. When the input comes in, for example this goes high, this side here also wants to go high, but that means the VGS across the device has actually gotten smaller. A very natural tendency for most of us (I include myself in this crowd) is: you give a circuit designer a circuit, and you give them Rs and Cs, and the first thing they try to figure out is: what are the values of those components that I need to be using? Well, I want to advise you to resist that temptation for now, because if you think about these CTLEs, one level up from the circuit there is really only a small number of things you care
about. You care about: where is the zero – remember, I’m trying to build a high-pass filter, so that zero is supposed to match where the bandwidth of my channel is. I care about where that is, and again, that’s going to be set by the channel pole or poles as the case may be. I care about how much bandwidth my CTLE is going to give me, because that has to match up with my data rate. I care about the peak gain that my CTLE is going to give me, because I want to know how much signal swing is going to come out of this thing. And, I also care about the DC gain, which if you remember, we said we’re knocking off that DC gain. Well, I would like to know what that DC gain actually is because that sets how much noise I’m enhancing, or really what the signal levels coming out actually will be. 26. CTLE Design Equations (Simplified) Once you take that perspective, I actually claim that the design of the CTLE itself is fairly straightforward. I can just go and take these system level parameters, for example the peak gain is simply the transconductance times the load resistance, the bandwidth as usual is just 1 over the load resistance times the load capacitance, and so on. Once I can relate those system level parameters to the circuit level parameters, I can do a little algebra, and immediately tell you what the circuit parameters need to be in order to implement the system that I’m interested in. Given that it is early in the morning on a Sunday, I don’t want to walk through a ton of detail on each of these equations, but trust me, you can go and do the math, and you’ll get these fairly straightforwardly. 27. CTLE Power Consumption Let’s think about how much power these things actually will be dissipating. The good news here is, remember I told you before you should remember that one equation for the pre-amp? Well, that’s because this is going to keep on showing up over and over again. 
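As one way to make that “little algebra” concrete, here is a sketch of the inversion for a simplified single-ended source-degenerated model; the exact expressions and factor-of-2 conventions depend on the differential topology, so treat these relations and numbers as illustrative:

```python
# Simplified single-ended source-degenerated CTLE relations (illustrative):
#   peak gain  A_pk = gm * RL
#   DC gain    A_dc = gm * RL / (1 + gm * Rs)
#   zero       wz   = 1 / (Rs * Cs)
#   bandwidth  wbw  = 1 / (RL * CL)
# Invert them: given the system-level targets and a bias point (gm),
# back out the circuit component values.
def ctle_design(gm, A_pk, A_dc, wz, wbw):
    RL = A_pk / gm               # load resistance from peak gain
    Rs = (A_pk / A_dc - 1) / gm  # degeneration R from peak/DC gain ratio
    Cs = 1 / (wz * Rs)           # degeneration C places the zero
    CL = 1 / (wbw * RL)          # implied load cap for the target bandwidth
    return RL, Rs, Cs, CL

# Illustrative 20-Gb/s-ish targets
RL, Rs, Cs, CL = ctle_design(gm=2.7e-3, A_pk=2.0, A_dc=0.5, wz=10e9, wbw=80e9)
print(f"RL={RL:.0f} ohm, Rs={Rs:.0f} ohm, Cs={Cs*1e15:.0f} fF, CL={CL*1e15:.1f} fF")
```

For those targets this yields roughly 740Ω load, 1.1kΩ degeneration, 90fF of degeneration cap and a ~17fF load-cap budget.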
For the CTLE in particular, this is, at the end of the day, simply an amplifier. The equation for power consumption is exactly the same. Once again, it’s simply this gain-bandwidth times V*, times the load capacitance, divided by this normalized quantity, which indicates how much gain-bandwidth you are demanding from the circuit, relative to the intrinsic bandwidth available from the transistors. 28. VGA Implementation and Power I’m jumping around in the signal chain, but this is sort of in order of complexity of how hard it is to analyze these things. The next thing I’m going to briefly mention is how one might build this VGA, and again how much power it would actually take. There are many possible VGA implementations. I will mention that there are issues and potential drawbacks to each of them,
but I want to point out one in particular that has many nice characteristics, and really the most important thing to take home from this is: what is it about this that actually makes it nice? I would claim that source degeneration is an attractive way of doing things for a few different reasons. The first is that you have to remember: the whole reason you build a VGA in the first place is that if you have signals coming in that are too large, you want to make sure that the stages afterwards, or even your own stage itself, are not becoming too non-linear – not actually clipping that signal. If you do this source degeneration thing, and you do it “right”, then as I increase this degeneration resistance, I’m basically applying stronger negative feedback, which implies that my linearity will improve as I reduce the gain; which is exactly the characteristic you want. 29. CTLE Power Consumption The other (perhaps even more important) point is: if we look back at these equations, my power is really directly set by that load capacitance. If you increase that by a factor of 2, power goes up by a factor of 2. 30. VGA Implementation and Power Notice this variable resistor here is not really affecting the load – it’s sitting in the source network. As long as I’ve done things reasonably well (we’ll see later what some of the gotchas may be), the fact that I have this tunable resistor here should have no impact on what happens at my output. In other words, I’m not adding any extra load capacitance, and therefore my power consumption really should not be any higher than what I would have had simply from building a normal amplifier without the variable gain. 
The last thing I do want to point out is that if you’ve done things right, you can make it so that the bandwidth of this amplifier does not change as you change its gain, whereas, as an example, if you tune the load resistance, the bandwidth will immediately change along with the gain setting, and of course, any extra parasitics you get from that tuning show up directly at the output node. As I’ve mentioned a couple times now, the equation for power consumption for this is exactly identical to that of the pre-amp. It’s set by the maximum achievable gain, the bandwidth, and so on and so forth. 31. DFE Implementation (1) Let’s launch into what’s the most complicated of the equalizer circuits we’re going to be looking at today, and in particular, the one that will have the largest implications from the form of the equation I kept pointing out to you earlier. Before I dive into that, let me point out how these things are typically implemented. As shown here in this conceptual picture, what you’re trying to do is take the input, take the decided output, feed that back through an FIR filter,
and basically subtract the result of that FIR filter from your input. It turns out that the most common way to implement these things is with a mixed-signal FIR filter. As I’ve indicated here on the right, if I have digital bits already, I might as well keep them digital, and do the delays simply with the equivalent of flip-flops or latches. Any z-inverse term in this FIR filter, I’m going to be implementing with flip-flops. On the other hand, the scaling and summation is really what I’m going to do in the analog domain. As an example here, imagine I have a differential representation for my data. What that means is that I can take my data and steer some current to one side or the other of this differential circuit, where the side I steer to is just based on the data, of course. By adjusting the amount of current I use for that steering operation, that’s how I’m implementing the scaling. As I increase that current, that’s like increasing the coefficient value that I’m subtracting off from the fed-back digital bit. The last point that I want to mention here is that you typically will build these things in the current domain, simply because, when you want to do the summation, summation really is just tying the outputs together – you just have currents feeding into the same node, and by definition, by KCL, that’s equivalent to doing summation, particularly if you do it in this differential format. This whole DFE over here translates basically into something that takes the input voltage, converts it into a current, takes the fed-back bits and converts them into a sequence of currents, sums them all together, and then decides on the new value. 32. DFE Implementation (2) As I mentioned before, the whole design of this thing is really driven by the feedback latency constraint. We’ve said before that we have to take this thing, decide, come back, go through the scaling and summation operation, and then settle that to sufficient resolution. 
All that has to happen within 1-bit time. 33. DFE Summer Design The implication of that is that we basically have one key design constraint that we can use to figure out how much power this thing is going to be taking. That key constraint is very simple. We’ve said that in 1-bit time, we basically have to get through whatever digital piece we have – this could either be the comparator or, in the simplest example, the clock-to-Q of the flip-flop. Then I have to get through that switching circuit that steers current from one side to the other, and then we have to settle that out with sufficient resolution, such that the accuracy of the analog value that we create is good enough to get the right answer once we do the slicing operation. In this Tbit, we have the digital feedback delay (in this picture you can think of that like a clock-to-Q), plus however much analog settling time we need. For this analog settling time, you usually need some number of time constants of settling to get sufficient accuracy in the value you’re creating. As an example, all I’ve done is rearrange things here to say that if I have a certain amount of analog settling time left, which is really just the bit time minus that digital
feedback delay, and if I need some number of time constants (Nτ), the summer circuit’s time constant has to be Tbit minus the digital feedback delay, divided by that Nτ. Just to give you some representative numbers, typically, people target about 4τ for settling, because what that means is that the final value has settled to about 98% of its ideal final target, which means that the residual errors are below the point where we would really care about them. For those of you who have forgotten, for these simple first-order circuits, every additional ~2.3τ of settling time gets you another factor of 10 reduction in the residual settling error. So, 2.2τ is basically 90%, 4τ is 98% (not 99%), but you can proceed from that point straightforwardly. 34. A Reminder: Self-Loading Here’s where this relative form of things really comes into the picture strongly. Here I’m not actually saying anything about the DFE right now. I’m just going back and looking, even for a simple pre-amp, at what the implication of this form was. I’ve really highlighted this piece on the bottom here. I’ve said that we have something like this nominal power, which was that AvC, basically, or AvfC, divided by 1−x. As you can all hopefully tell, if x ever gets close to 1, this whole expression goes off to infinity. From a circuit level, the implication of this is very straightforward. It basically says that there is an absolute gain-bandwidth limit to your circuit. No matter how much power you spend, you cannot get higher than that gain-bandwidth, because you’ve hit the intrinsic limits of the transistor. Intuitively, if you think about it, what’s really happening is that you try to burn more current to get more gain-bandwidth, but the capacitance of the transistor itself also increases. If you push that to the limit, you’re just driving your own parasitic capacitance and not actually getting anything faster out of the circuit. The reason I bring this up specifically in the DFE context… 35. 
DFE Summer Design …is that if we look carefully at this structure, I don’t just have the analog circuit I had before to convert the voltage into a current; I have all these extra so-called taps hanging off there as well. Each one of them has its own parasitic capacitance. As it turns out, that parasitic cap creates additional self-loading in our circuit. 36. A Reminder: Self-Loading As I’ll show you in one second, that self-loading is going to turn out to be the dominant limiter in terms of how complicated, or how fast, of a DFE one can build.
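The settling budget described a moment ago is easy to check numerically; the bit-time and delay numbers below are illustrative:

```python
import math

# Settling budget: one bit time = digital feedback delay plus N_tau
# analog time constants, so tau_summer = (Tbit - t_dig) / N_tau.
# The settled fraction after N time constants is 1 - exp(-N).
for N in (2.2, 4.0):
    print(f"{N} tau -> settled to {100 * (1 - math.exp(-N)):.1f}%")
# each additional ~2.3 tau (= ln 10) buys another 10x reduction in error

Tbit, t_dig, N_tau = 100e-12, 30e-12, 4      # illustrative 10-Gb/s numbers
tau = (Tbit - t_dig) / N_tau
print(f"tau_summer = {tau * 1e12:.1f} ps")   # 17.5 ps
```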

37. DFE Summer Power Consumption Let’s actually go through and analyze that. I’ve dramatically simplified things here, but it turns out that the basic analysis really does hold, even if you do the real, full circuit. I’m going to model my DFE as being some input device. Again, I’ve done the single-ended equivalent to keep things simple here. This is just converting the input into a current. Then I’m going to lump together all those feedback taps – the ones that are actually steering current back and forth – into one thing, where I’m just going to define, in a normalized way, the amount of voltage signal created at this node due to all of those taps as this MDFE. That MDFE is just the Vout created by all of the taps, divided by what happens at this output node due to the actual input that really carries the information about the bit. That input, by the way, is often referred to as the cursor, as opposed to the pre-cursor or post-cursor. The DFE is what’s canceling the after-the-cursor ISI. To again keep the equations simple, if I assume I have the same biasing for both my input devices and all the devices inside the DFE itself, then it’s very straightforward to analytically predict how much power this DFE needs to be dissipating. It turns out it is very similar to what we’ve looked at so far, with maybe just a couple of things substituted. The nominal power consumption is the gain times V*, times the load cap. This term over here on the right, which is just Nτ divided by Tbit minus tdigital – that’s really the bandwidth of the structure. You remember we needed a certain number of time constants; I just plug that in to get the equivalent bandwidth. 
That’s the nominal power consumption, but notice that down here on the bottom, rather than just having the bandwidth normalized against the intrinsic transistor bandwidth, I actually have this extra term over here, which is proportional to that MDFE, which was telling you how much extra ISI your DFE is cancelling, relative to the real input information. As you can see, the larger this thing is, the larger this self-loading term gets. 38. Implication What’s the implication of all this? Why did I bother slogging through this particular detail? What this basically tells you is that there’s an absolute maximum amount of ISI that you can cancel in your circuit, in a given process technology and at a given data rate. As you can see in these curves on the right over here, as you can imagine, the longer it takes me to get that digital feedback in, the less settling time I have, and because of that, the smaller the amount of ISI cancellation I can actually do. If I have a really fast digital feedback – in this example I’m using a 10-Gb/s design with a 30ps feedback – I can get all the way out to about MDFE = 8, which implies that all of the ISI that I’m getting rid of is 8 times larger than that one cursor input that I’m making the decision off of. Unfortunately, if I actually have 50ps of delay, now I can maybe only get to about MDFE = 6 before the thing goes off to infinity, and no matter how much power I spend, it’s not going to be able to handle that. You should keep in mind that usually when you build these circuits, you’re not dealing with just one particular variable in the channel; in fact, you want to build a front-end that will work over a broad variety of channels.
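The blow-up point can be sketched from the denominator of the power expression. The model below is a simplified reading of that equation, with illustrative constants (Nτ = 3, Av = 1, ωT = 400 Grad/s are my assumptions), not the exact design values:

```python
# Simplified model: the summer current blows up when the self-loading
# term in the denominator reaches 1, i.e.
#   1 - (N_tau / (Tbit - t_dig)) * Av * (1 + M_DFE) / wT = 0.
# Solving for the maximum cancellable ISI ratio:
def mdfe_max(Tbit, t_dig, wT, N_tau=3, Av=1.0):
    return wT * (Tbit - t_dig) / (N_tau * Av) - 1

wT = 400e9                                   # rad/s, 65nm-ish assumption
for t_dig in (30e-12, 50e-12):
    m = mdfe_max(100e-12, t_dig, wT)
    print(f"t_dig = {t_dig * 1e12:.0f} ps -> M_DFE,max ~ {m:.1f}")
# 30 ps -> ~8.3 and 50 ps -> ~5.7: the same trend as the ~8 and ~6
# quoted for the 10-Gb/s curves
```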

39. DFE Summer Design The implication of that, if we go back and look at how this thing is actually built, is that each one of those taps is sized not just for one specific channel, but to deal with the worst possible channel you may see in any possible usage. That means that each one of these actually has parasitic capacitance associated with it that scales not with the nominal amount of ISI you’re cancelling in one given usage, but with the worst possible case over all possible channels you’ll ever be dealing with. 40. Implication Even though you may think these numbers that I’ve quoted here are large – you may think there’s never going to be a channel I’m realistically dealing with where the ISI is 6 times larger than my input amplitude – well, if you start dealing with many different channels, where they have many reflections moving around in time, you can pretty easily hit these numbers quickly. In fact, in practice, what often happens is there is just a certain number of taps you have to build, and every time you add one of those, there’s an extra parasitic cap you add in, and that basically sets this overall limit. As I’ve mentioned a couple of times already, unfortunately, it’s almost always the case that for whatever design you’re doing, you find that your requirement is just slightly past what’s feasible, or sometimes you’re really unlucky and it’s ridiculously past what’s feasible. The question you may be asking yourself is: am I just stuck, can I just not build this thing? Or are there other tricks that I might be able to play to try to get around this limitation? 41. Current Integrating Summer The good news is there are actually a couple of different tricks that one can indeed play – I’m not going to cover all of them, but I am going to cover what’s now becoming the most popular version, which is what’s called a current-integrating summer. 
Let me just briefly describe how this circuit works, and then I’ll explain why it does give you a substantial benefit. I should point out that there’s a minor typo in the slide if you’re trying to follow: this reset signal, when it’s applied to the PMOS transistors, should actually be inverted. This is a logical reset, not the literal gate signal going into the PMOS transistors. The way this thing works is as follows – let’s start from over here. Initially, I set reset high, so I force both of these nodes up to VDD. When reset goes low, let’s assume I have a new analog signal at exactly that time. Well, I’ve pre-charged the parasitic cap on this node up to VDD, and I have current flowing through both my input devices and my tap devices. Both of the differential nodes will start going down, but because of the difference in the input, as well as any current I’ve steered one way or the other because of the tap feedback, I’ll start developing a differential voltage, which should (to
zero-order) increase linearly with time. When I hit this final point, if I then make a decision based off of that voltage, I should have a good, reasonable differential voltage, before I then go back and reset it again to begin the operation once more. In this particular example, this reset, or implicitly the clock, would be operating at basically the full rate of the data. You would have a low period and a high period during each one of the input data bits. 42. Why Current Integration Helps You may be wondering: I have this weird, funky circuit – why in the world would you bother doing this? If you think about it, what we are trying to do is develop the largest signal we can, within the shortest amount of time and at the lowest power. The best way to get as large a voltage signal as you can, given a certain amount of current, is to have the highest impedance you possibly can. 43. DFE Summer Power Consumption Let’s go back here. This is a reasonable picture. The highest impedance I can get at any particular node is to not actually put a resistor there, but to just have a capacitor. Whatever capacitance is there – that’s the highest impedance I can possibly get. Anything I put in parallel is just going to make things worse. 44. Current Integrating Summer Indeed, that’s what this current integration is doing: when I turn off these reset devices, I’m just left with some cap at the output – that’s really maximizing the load impedance. 45. Why Current Integration Helps There is a good reason why people do actually put those resistors there, because if I did just have a capacitor and some current being dumped into it, I’d remember everything I ever did. I’d have infinite history. Obviously, I don’t want infinite history; I just want information from the current bit or set of bits I’m actually dealing with. 
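To get a feel for the numbers, the linear ramp of the integrating summer works out as follows (all values below are illustrative assumptions, not slide numbers):

```python
# First-order view of the integrating summer: after reset releases, the
# differential output ramps linearly, dV = I_diff * t / C.
I_diff = 100e-6    # net differential current (input + taps), A -- assumed
C_out  = 30e-15    # capacitance per output node, F -- assumed
T_int  = 25e-12    # integration window (half a 50 ps bit time), s -- assumed
dV = I_diff * T_int / C_out
print(f"dV = {dV * 1e3:.0f} mV")   # ~83 mV differential at the decision instant
```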
What those reset devices are doing is enabling you to make it look like you have the highest possible impedance during the time you’re trying to develop the voltage, while eliminating any previous history. In each cycle, you just explicitly reset to get rid of that history. At the end of the day, what does this buy you? It turns out that there’s a direct improvement. Remember there was the Nτ factor from how precisely you wanted to settle. Basically, when you do current integration, you get to get rid of that. This equation here is essentially the same as it was before, just notice there’s no Nτ appearing in the
numerator, which means my nominal current gets better by 2-4x. But actually, even more importantly in terms of the problem we were describing earlier, my self-loading also goes down by that same factor. As you can see on the plot here, if before I could only do about 10 taps or so before I started hitting this limit, now I can go all the way out to about 40, or maybe a little more than that. The intuition behind this is that because I can get the same voltage signal with smaller current, the parasitic cap introduced by each one of my taps is now smaller by exactly this Nτ
factor. It’s basically saying that the intrinsic gain-bandwidth of my device has been improved by that factor, and that’s why I can implement many more taps than I could before. This is becoming an increasingly popular technique, because it not only gives you a benefit in terms of the nominal power consumption, it also allows you to extend the reach of these designs out to a much larger number of taps than you would otherwise have been able to. 46. Example DFE Power Numbers Once again, I think it’s very useful to go through some example numbers/calculations, to give you some context for how much power these things actually take when you’ve really done them optimally. Let’s take a 20-Gb/s design, and let’s assume it’s a 10-tap DFE, and once again I’m going to use 65nm because I have those numbers handy – the same process and biasing example I used in the pre-amp. Let’s say I need to cancel a total of 2.5 times my cursor, my cursor input voltage is 75mV, and I have 25ps of feedback delay, which I’ve chosen to be about half the bit time. Somehow, magically, don’t ask me why, every time I’ve built one of these links, the digital delay ends up being about half of the bit time. There’s no particular physical reason for that, but I think it has something to do with the designer: “okay, yeah, I got it to about half, so that’s probably good enough”. Gain of 1, load cap of about 30fF. This entire complicated summer thing I was telling you about would dissipate about 337μA, if you used current integration. Again, we’re talking in the 18fJ/bit range – extremely low. It doesn’t sound like this is going to be the dominant source. It turns out that actually in these DFEs, what tends to cost you the most power, frustratingly enough, are those digital flip-flops. This is somewhat process-specific, but it doesn’t seem to be scaling all that well, unfortunately. 
Every time you add those flops (well, actually you need extra stuff too, like logic to do sign inversion and things like that), it takes you about 10fF or so of capacitance to implement each one of those unit cells, and that thing is running at the clock frequency. In this example, with 10fF per cell and 10 of those cells, that’s 100fF at 20GHz and 1V, which is about 2mA. That’s now 200fJ/bit. As you can see, despite all of the magic that’s happening in that analog circuit, just storing the data digitally for that period of time actually burns way more power than all the rest of the arithmetic. This is a general thing I’ll remind you of later. Oftentimes, your power consumption is dominated by just this: anytime you have a simple digital structure with a certain minimum switching cap, you can almost immediately predict how much power your thing will take just by multiplying those numbers out. If you’ve done the analog stuff right, usually that’s not the dominant limiter.
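That 2mA figure is just the CVf arithmetic from the paragraph above:

```python
# CVf estimate for the DFE's digital feedback cells
C_cell = 10e-15   # switching cap per unit cell (flop + sign logic), F
n_taps = 10
VDD    = 1.0      # V
f_clk  = 20e9     # Hz, full-rate clock in this example
I_dig  = C_cell * n_taps * VDD * f_clk
print(f"I_dig = {I_dig * 1e3:.1f} mA")   # 2.0 mA, versus ~0.34 mA for the summer
```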

47. Tutorial Outline Now that we’ve gone through and seen how one can predict the power consumption of these circuits, I do want to give you a more practical flavor as to: when you go and put these things down, what are some of the gotchas to watch out for? 48. Comparator Offset Cancellation The first one, which is maybe not so much a gotcha as a philosophy that is very worthwhile to adopt is – remember how I said we have those comparators and we want to make their offset as small as we can? At the end of the day, if you have too much offset, that can really limit your bit error rate. One of the dominant sources of that offset tends to be just mismatch between the transistors inside of the circuit. Unfortunately, if you look at the underlying physics, every time you want to get a factor of 2 reduction in the standard deviation of that mismatch, you have to increase the device size by a factor of 4. Just to give you an idea: if you go and build one of these comparators so that it actually runs at speed, but you don’t pay any attention to the mismatch, you can easily end up with 60mV, 70mV, maybe even 100mV of offset, when you’d like to be receiving signals that are on the order of 50mV or so. So you can start working out how many factors of 4 it would take to make that negligible. Whenever my students come and talk to me about these things, I immediately tell them: don’t ever fight offsets with sizing, because that basically guarantees you something that is very nasty. Instead, what almost everyone has adopted these days is offset cancellation. By offset cancellation, I mean: inject some known DC signal that cancels the DC offset of the comparator. There are many different ways of doing this, but I want to highlight one particular one that I like, and explain why I like it, because you may contrast it with some of the alternatives and find that they do not meet those criteria. 
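The “how many factors of 4” exercise works out as follows (the ~70mV starting offset is from the discussion above; the 5mV target is an illustrative assumption):

```python
import math

# Pelgrom-style mismatch: sigma_Vt ~ A_vt / sqrt(W*L), so halving the
# offset sigma costs 4x the area. Area needed to size a ~70 mV offset
# down to an (assumed) 5 mV target:
sigma0, target = 70e-3, 5e-3
area_ratio = (sigma0 / target) ** 2
factors_of_4 = math.log(area_ratio, 4)
print(f"area x{area_ratio:.0f} (~{factors_of_4:.1f} factors of 4)")
# x196 -- which is exactly why you don't fight offset with sizing
```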
My personal favorite way of doing this (in most applications) is to just cancel the offset in the pre-amp. In particular, the way I do this is: I build a current DAC that I can control, typically with quite a bit of resolution – say 8, 9, or 10 bits or so. This changes how much current I’m going to be steering, and then I pull it out of either one side or the other side of the pre-amp, again to create a DC offset that will cancel whatever the equivalent offset of both the pre-amp and the comparator core turns out to be. The reason I like this is, as I mentioned a second ago, I may need 8, 9, or 10 bits in that thing, which implies up to 1000 or so devices, each of which can actually be pretty big, meaning it has a lot of parasitic cap. Well, because I have these extra switches up here on the top, which can act like cascode transistors that really isolate you from any parasitics sitting on that node, all of that DAC junk doesn’t actually have to load my high-speed signal path. So, I’ve reduced the extra parasitic loading to just these two transistors, which only have to carry that current. They don’t need to be programmable or anything like that. This is not to say that there are no parasitics associated with this solution, but relative to, for example, just taking this DAC and shoving it directly onto the output node, it really is a huge win.
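One common way to drive such a cancellation DAC is a foreground binary search at startup. The sketch below uses a hypothetical comparator-readout interface just to show the loop structure; it is not a specific design from the tutorial:

```python
# Foreground offset-calibration sketch: short the inputs, then binary-search
# the DAC code until the comparator's decision flips. comparator(code) is a
# hypothetical stand-in for programming the DAC and reading the comparator.
def calibrate_offset(comparator, bits=8):
    lo, hi = 0, (1 << bits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if comparator(mid):        # still deciding '1' -> need more correction
            lo = mid + 1
        else:
            hi = mid
    return lo

# Toy model: residual offset flips the decision at code 137 on an 8-bit DAC
code = calibrate_offset(lambda c: c < 137, bits=8)
print(code)   # 137
```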

49. CTLE Design Issues (1) Next, I want to point out some practical gotchas you can run into with the CTLE design, and give you further background on this. If you look over the past few years, there have been many publications that have talked about things like: “I was able to build a CTLE that had 20dB, 25dB, or 30dB of peaking.” So there was kind of this perception that building designs with very high peaking ratios was problematic. As I’ll show you in a second, that’s really just driven by some very simple considerations, and if you watch out for them, you can actually build CTLEs that have extremely high levels of peaking, without doing anything too sophisticated. The first issue that people tended to run into is: they go and build their CTLE and, let’s say they trust me, they actually use the design equations I gave you before, and they find that unfortunately, the actual poles and zeros they got didn’t match the equations I gave them. It turns out the first source of this issue is typically the output resistance of those transconductance devices. In the equations I gave you earlier, I was really assuming that I could treat all the dynamics at the source node totally independently of the dynamics at the drain. If you have a finite output resistance, unfortunately, that simplicity goes away. Practically speaking, if you find yourself in a scenario where you’ve set everything up, and you find it should be giving you certain poles and certain zeros and it’s not giving them to you at all, oftentimes the first thing to do is to go and try increasing the output impedance of these devices – either by increasing the drain-to-source voltage, or for example making the channel length longer, or finally by adding some explicit cascode devices just to nicely isolate the drain node from the source node. 50. 
CTLE Design Issues (2) Another very similar issue, which sometimes is actually the more dominant one, is that unfortunately I do have to build some current sources to bias this thing up. Again, no current source is perfect, so in this particular instance there are really two parasitics to worry about. The first is that I'm going to have some output resistance associated with that current source, and the second is that I have an output capacitance as well. If you kind of think about it, and you squint a little bit – you should remember that this ground symbol just means that you're connected to a common potential, so this side is actually shorted to that side. Well, in terms of the output resistance, it shows up in parallel with this R over here. So the dominant issue that happens when people say "I want to do 20dB, 30dB of peaking" is: they go and compute what this resistor needs to be – just to give you an example, they come up with 100kΩ – but it turns out that the output resistance of their current source is actually 10kΩ. As you can imagine, you don't actually get the peaking you wanted, because at the end of the day you have just about 10kΩ of degeneration resistance here, and your peaking is off by a factor of 10. Similarly, the parasitic cap, just like the resistance, ends up showing up in parallel with this intentional degeneration capacitor. The role of this capacitor, as a reminder for those of you who haven't looked at these too much, is that once you get beyond the RC that's sitting at this node,
the cap is supposed to act like a short. Once there's a short, you've gotten rid of the negative feedback in the structure, or the degeneration in the structure. If you have a huge capacitance from these tail current source devices, that means that at a fairly low frequency you're already going to be shorting out one side to the other, and well, if they're shorted, you don't get the degeneration anymore. What that usually means is that your peaking kind of gets shifted, or you don't get anywhere near as high of a peak as you expected. To say this more technically, the pole that's at that source node just ends up being at too low of a frequency. As you can imagine, there is fortunately a straightforward way of dealing with this, which is adding some cascoding, or being more careful in how you design these current sources by making sure they have a sufficiently high output impedance without causing an excessively large capacitance. 51. DFE Design: Closing the First Tap As was hinted at before, it's usually the DFE that causes the most complexity, and this is true from a practical gotcha standpoint as well. As I've hinted a couple of times earlier, it's typically that first tap – meaning this thing that has to take the small analog input voltage, get it up to a digital level, and then feed it back – where the most challenging timing problem ends up coming into the picture. If you think about it: for these other taps, I've already got a digital signal here, so I really just have a flip-flop that has to get through and do the scaling part, whereas the comparator actually has to apply some analog gain, and that typically takes time. In terms of our analysis, that often means that the first tap has a larger t_digital delay than all of the other taps. 52. Option 1: Split Summation Because of that, there has been a whole series of techniques people have introduced to try and really close that first critical timing path.
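Before going through those techniques, the constraint they all target is easy to write down: the first tap's decision, feedback, and summer settling all have to fit in one unit interval. The sketch below is purely illustrative – the delay names and the picosecond numbers are my own assumptions, not figures from the tutorial.

```python
def first_tap_closes(bit_rate_gbps, t_comp_ps, t_fb_ps, t_settle_ps):
    """Check whether the first DFE tap's feedback loop fits in one UI.

    t_comp_ps:   comparator clock-to-decision delay (the 'analog gain' time)
    t_fb_ps:     wire/buffer delay feeding the decision back
    t_settle_ps: settling time at the summing node
    All three are assumed, illustrative quantities.
    """
    ui_ps = 1e3 / bit_rate_gbps  # one unit interval, in picoseconds
    return t_comp_ps + t_fb_ps + t_settle_ps <= ui_ps

print(first_tap_closes(10, 40, 15, 30))  # 85 ps of loop delay vs. a 100 ps UI -> True
print(first_tap_closes(20, 40, 15, 30))  # same loop, but only a 50 ps UI -> False
```

A loop that closes comfortably at 10 Gb/s fails at 20 Gb/s with the exact same circuit, which is why techniques like the ones below exist.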
The first is what I refer to as split summation. You can go check out the reference, which was published at ISSCC originally. The idea is the following: as I said before, this first tap has different constraints than all of the later taps, so why don't I just go and build a summer whose only job is to deal with that first tap, and then isolate all the rest of the stuff my DFE is dealing with – in my earlier example, say taps 2 through 10 – onto a separate structure that sits in front of this? The advantage here is that this guy over here, because it now only has to deal with that one tap, can potentially be built to be faster. I don't have to pay for all the extra parasitics from all the other taps at that node. This does indeed work quite nicely, and in fact, oftentimes what people do is perform this summation on the pre-amp, which you were planning to use inside of your comparator core anyway. Remember that the pre-amp needed to be there for kickback purposes. But a word of warning: you have to be a little careful with this trick. In particular, the nastiest issue you can sometimes run into is non-linearity. Remember, the job of this thing here is to take an input that has, for example, the real input you want plus one extra tap of ISI, basically subtract out that ISI, and only then make the
appropriate decision. Well, if the signal here is now too large for this particular input stage to handle, you'll instead clip first and then do that subtraction. That can actually cause problems with the DFE, and if you're not careful, it can actually cause errors. You really have to watch out to balance the overall linearity of this chain, to make sure that up until the point where you've gotten rid of at least the vast majority of the ISI, you're maintaining the linearity of the signal chain. 53. Option 2: Loop Unrolling Another approach, which right now is actually the dominant approach to dealing with this first tap, is what's called loop unrolling. For those of you who are digital folks, this should be very familiar in the sense of a carry-select adder. The basic idea is the following, and I'll map it to the DFE as well. In a carry-select adder, for my carry input there are only two possibilities: it's either a '1' or it's a '0', so let me just compute both possibilities, and then some time later decide which the right answer actually was. It turns out you can do a very similar thing with the DFE. If I have only 2-level modulation and just one tap of ISI: if my previous bit was a '1', it will shift my analog input level up by +α (here I've called α the value/coefficient of that tap); if my previous bit was a '0', or equivalently here a '-1', it will shift the level down by -α. So if I want to do this unrolling trick, I'll just have two comparators, where each of their thresholds has been shifted by +α and -α, respectively. I'll get both of those decisions, and then I'll feed them through a MUX, where the job of this MUX is to say: once I know what the right answer actually is, I'll feed that back and make the selection.
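The unrolling trick can be sketched behaviorally in a few lines. This is a toy model, not any published implementation: the channel adds one tap of post-cursor ISI with coefficient alpha, and the receiver speculatively slices against both +alpha and -alpha, using the previous decision only to select between the two precomputed results.

```python
def unrolled_dfe(received, alpha):
    """1-tap loop-unrolled DFE: slice against both possible thresholds,
    then MUX on the previous decision (carry-select style)."""
    prev = 1  # assume the loop starts out with a known/correct decision
    out = []
    for x in received:
        d_if_prev_one  = 1 if x > +alpha else 0  # threshold shifted up by alpha
        d_if_prev_zero = 1 if x > -alpha else 0  # threshold shifted down by alpha
        prev = d_if_prev_one if prev == 1 else d_if_prev_zero  # the MUX
        out.append(prev)
    return out

# Toy channel: symbols are +/-1, and each sample carries alpha times the
# previous symbol as post-cursor ISI (the symbol before the first is +1).
alpha = 0.5
bits = [1, 0, 1, 1, 0, 0, 1]
syms = [2 * b - 1 for b in bits]
rx = [s + alpha * p for s, p in zip(syms, [1] + syms[:-1])]

print(unrolled_dfe(rx, alpha))  # recovers [1, 0, 1, 1, 0, 0, 1]
```

Note the equivalence: comparing x against +/-alpha is the same as first subtracting alpha times the previous symbol and then comparing against 0, which is what the conventional (non-unrolled) DFE does.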
By the way, in case your head hurts a little bit because it's recursive: the DFE itself is recursive too, so you always have to assume it starts with the right answer and then everything works out. The same is true here. What's nice about this is that it really pushes the feedback loop into the digital domain. Here, it doesn't really matter in principle how long it takes to make this decision or this decision, because the feedback loop is now only here. If you're really, for example, just dealing with this one tap, it's a very powerful way to go. Unfortunately, however, if I actually have other taps that are not unrolled like this, that have to be physically summed in front of here, now the delay of this comparator actually matters again. Even more importantly, relative to building a design that just had that comparator and made it as fast as I could go, I've added an extra MUX onto the critical path. You may think: "come on, it's just one small MUX". At these speeds, you're talking about delays at or below a single inverter delay, which actually matter a lot, so you really have to watch out for every extra component you put onto that signal path. 54. Option 3: All CML Design The third option, which I think is going to gain increasing popularity, especially as data rates are pushed up, is what I call the all-CML design. CML means current-mode logic; in
particular, meaning the type of structure I've drawn here, where the main advantage is that I'm not actually going to try to generate digital signals at the full supply voltage of swing. Instead, I'm going to constrain them to typically 300mV to 400mV or so of amplitude. The reason that's a good thing to do is: remember I said that I have to take this small analog signal and get it to be large enough here to be interpreted digitally – well, I can interpret digital signals as things that are 300mV-400mV; I don't actually need a full 1V. That reduces how much gain I have to put into these latches. If you remember from before, the more gain I have to put in, the closer I get to either those self-loading limits, or just directly, the more power I have to spend. In other words, by doing this with just enough amplification, and then feeding only that back, you can actually speed things up quite substantially. I apologize for the minor typo here – this is actually reference #8 – these are the people that proposed this particular technique. They were able to demonstrate over 20Gb/s in a 65nm CMOS design. 55. DFE Physical Design The final practical issue that I want to spend a little bit of time on is related to the physical design of the DFE itself. As I've mentioned before, you typically have to deal with a broad variety of different channels, and every one of those channels has its own individual coefficients you have to set for those cancelation currents. The implication of that, especially if you have a lot of taps where you have to do the cancelation, is that you typically need a pretty high resolution, programmable current source – in other words, a programmable current DAC. As I mentioned before, if you have a large number of bits and you really want that DAC to be reasonably precise, that tends to make the DAC physically large, and therefore have a lot of parasitic capacitance.
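As a rough illustration of this resolution trade-off (with made-up numbers, not figures from the tutorial): each tap coefficient has to be rounded to the nearest code of its current DAC, and whatever ISI the rounding leaves behind eats directly into your margin. A minimal sketch:

```python
def quantize_tap(alpha, full_scale, bits):
    """Round a desired (normalized) tap weight to the nearest N-bit DAC code.

    Returns (code, realized_weight, residual_isi). full_scale and bits are
    assumed design parameters for this illustration only.
    """
    lsb = full_scale / (2 ** bits - 1)   # one DAC step
    code = round(alpha / lsb)
    realized = code * lsb
    return code, realized, alpha - realized

# A desired tap of 0.173 with a DAC full-scale of 0.5 (both assumed):
for n_bits in (4, 6, 8):
    code, realized, resid = quantize_tap(0.173, 0.5, n_bits)
    print(n_bits, code, round(realized, 4), round(resid, 4))
```

More bits shrink the residual ISI, but make the DAC physically larger – which is exactly the parasitic-capacitance problem described here.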
To make life even worse, if you think about it, you really don't want this big, huge DAC integrated right next to the DFE, because if you put it there, all of your high speed wires would have to be much longer – and I think this is best seen in a picture. This is taken from some work by one of my earlier students. The DFE core here is actually quite small. That DFE core, by the way, is everything here that's not a current source. That thing is quite small, and this design, by the way, had quite a few taps – I believe something like 40, or even 80. So you can see these DACs are 3, 4, 5 times larger than the DFE is, so you really have to place them far away, and then route a bunch of wires in to carry those currents to each one of these nodes. The implication of this is that, like it or not, you almost always end up with a substantial amount of parasitic cap on this tail node here. 56. Impact of Tap Tail Capacitance So you may ask yourself: "alright, well, I had argued before why it was kind of okay for capacitance to be there" – well, the DFE is actually a little bit more complicated. As a reminder, what I'm typically doing in the DFE is turning on either in+ or in-, but obviously when the data transitions, I have to flip from one side to the other. Depending upon where those data
transitions actually happen – do I tend to cross over at a relatively low point versus a relatively high point? – I can actually introduce glitches onto this tail node. In the simplest example, let's say that I cross at a low point, meaning that for some small period of time both of those transistors are off. Since the current source is still going, it's just going to yank charge off of that cap – that's why you get this downwards glitch. Similarly, if you have a high crossover point, then in order to maintain a constant current, the VGS of each of these transistors has to go down – and that will yank that node up. What this means is that you can end up in a very funny situation: what you were trying to do is build a DFE that had a certain response associated with it, but because of these glitches, you get some other response that looks really weird and has all kinds of other stuff in it that you weren't expecting. The good news is that almost all of these designs, because of the fact that you actually have to deal with different channels, have an adaptation algorithm built into them that goes and tweaks those coefficients to really match them to the individual channel you're actually dealing with. So as long as you're a little careful to make sure that your last few taps don't suffer from this effect, what will happen is that rather than getting the digital codes that you thought would directly represent your channel, you'll get a different set of digital codes that will just compensate for all of the weirdness that the DFE itself is introducing because of these extra parasitic caps on those tail nodes. This is one of those instances where you really need to think about the system as a whole, and not go and try to build the best possible analog circuit – because believe me, dealing with this, especially for those longer DFEs, to really make it exactly precise can quickly become a pain. 57.
Complete Signal Path: Linearity There's one last practical point that I want to highlight here. As I've mentioned a couple of times now, linearity can actually be a critical constraint in these signal paths, and you really want to stay as linear as you can all the way up until the point where you've canceled all of the ISI. The implication of that is that the ordering of the components you use on the receiver is actually quite critical. By the way, any time you have linearity issues, the right answer is always to put any stage that gets rid of as much of the dynamic range – the variation in the signal – as possible, as early as possible in the chain. It turns out that in any practical instance that I've seen, you always want to put the CTLE first. The reason for that is very simple. Remember, what that CTLE is doing is trying to compensate some of the channel response. If I actually had a flat channel, I would get just one or two discrete levels, and I don't need any linearity for that. Really, I'm using the CTLE to clean up some of the dynamic range, as you can see visually in the after versus before, and then if I do need a VGA, I'll put it there to make sure that my DFE still operates reasonably. But you do not want to do the reverse: if you put the VGA first, it always has to deal with all the dynamic range, and only then would the CTLE clean things up. Do pay attention – in almost all the cases I've seen, this is the right order. 58. Tutorial Outline
Let me just briefly summarize what I think are a few of the key lessons, and then I'll open it up for questions. 59. Key Take-Home Lessons (1) The first and perhaps most important lesson that I hope you take home from this is that it's actually quite straightforward to predict the power consumption of these high speed link circuits. As you saw, it was basically two or three equations, and they all look the same as each other. My #1 take-home feedback to you guys, particularly for circuit designers, is: calculate the power early and often. When your system guys come to you with a spec and say "this is what we want you to build", you should just immediately calculate and say "alright, this is about what it should take me to implement." They may realize that actually "we shouldn't be implementing that, maybe we should try something else, because this is either not sophisticated enough, or too sophisticated and causing huge power". The other thing, as I've pointed out a couple of times now, is that if you've really done things right, the analog circuits themselves can typically consume very little power, and that's true even for a large number of taps – even, for example, in the DFE we had looked at. Oftentimes your power consumption is dominated by repeated digital structures that have to run at the full clock rate; since, as I've said, you can just immediately compute this, you'll see that's really oftentimes where the power is being spent. 60. Key Take-Home Lessons (2) The other key thing to take home, which I think is true not just for high speed links but more generally, is: any time you can reduce dynamic range on the signal path, you should do that. In fact, very much related to this: if you find that you're solving problems on the high-speed path that, like DC offsets, have a lot of dynamic range but are actually completely calibratable or correctable at DC, and you've created extra loading in the high-speed path to do it, that's probably not the right thing to do.
As I mentioned before, you should watch out for block ordering to make sure dynamic range is compressed as early as you can, but much more commonly: keep any kind of DC-correction type of stuff – like offset cancelation, or adaptation, or gain control – off the signal path, because if you put it on the signal path, trust me, it will be painful. The final point, which again I think is a general design paradigm that I like to encourage in my own students who are doing analog design, is: you should really be paying attention to the system as a whole. There are some problems which, yes, you could solve in the analog domain, but it simply doesn't make sense to do so because they'll cost you so much extra power, etc., whereas there's actually a fairly simple knob somewhere else in the system that would immediately knock the problem out. So there are oftentimes opportunities to save both power and complexity just by moving things around from one point to another.
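The block-ordering point – compress dynamic range before any stage runs out of linear range – can be shown with a toy numeric model. Everything here is an assumption for illustration: a channel with two post-cursor taps of 0.6 each, a hard-clipping gain stage with a ±1 limit, and an idealized equalizer that is simply handed the true previous symbols.

```python
def clip(x, lim=1.0):
    # crude model of a gain stage running out of linear range
    return max(-lim, min(lim, x))

def equalize(x, s1, s2, a1=0.6, a2=0.6):
    # idealized 2-tap post-cursor canceller; previous symbols s1, s2 known
    return x - a1 * s1 - a2 * s2

syms = [1, 1, 1, -1, 1, -1, -1, 1]   # +/-1 data symbols
hist = [1, 1]                        # two assumed preamble symbols
rx = []
for s in syms:                       # channel: s + 0.6*prev + 0.6*prev2
    rx.append(s + 0.6 * hist[-1] + 0.6 * hist[-2])
    hist.append(s)

errs_eq_first = errs_clip_first = 0
for n, (s, x) in enumerate(zip(syms, rx)):
    s1, s2 = hist[n + 1], hist[n]    # the two previous symbols
    if (1 if clip(equalize(x, s1, s2)) > 0 else -1) != s:
        errs_eq_first += 1
    if (1 if equalize(clip(x), s1, s2) > 0 else -1) != s:
        errs_clip_first += 1

print(errs_eq_first, errs_clip_first)  # prints 0 3
```

Equalizing first leaves nothing for the clipping stage to corrupt; clipping first destroys information on the largest samples (runs of identical bits) that no later equalizer can recover – the same reason the CTLE goes before the VGA.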
61. ISSCC 2014 Papers to Check Out If this stuff strikes your fancy, I just want to point you to a couple of other papers at ISSCC this year that are really specifically hitting on some of the topics we discussed today. As an example, papers 2.1 and 26.1 are both looking at some of these front-end design issues within state-of-the-art (20-28 Gb/s) transceivers. If you check out papers 2.4 and 2.5, they've got some really clever techniques that they're using to get extremely energy-efficient equalizers working at about 16-25 Gb/s, using some sort of charge integration or current integration like we had described, or another alternative – the so-called charge-domain techniques – that do indeed appear to be quite efficient. 62. Acknowledgements So with that, I'd just like to acknowledge a couple of my colleagues: Vladimir Stojanovic and Bora Nikolic at UC Berkeley, as well as Jared Zerbe at Apple, for contributing to these slides. That's what I have, and I think we are now open for questions. Thank you all for your attention.