Design of an Asynchronous Reconfigurable Cell for Conformal Computing
![Page 1: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/1.jpg)
North Dakota State University April 2009
Mariam Hoseini
Advisor: Dr. Chao You
Supervisor: Dr. Mark Pavicic
Committee members: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, Dr. Deying Li
![Page 2: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/2.jpg)
![Page 3: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/3.jpg)
![Page 4: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/4.jpg)
![Page 5: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/5.jpg)
![Page 6: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/6.jpg)
• Two major styles of circuit design: synchronous and asynchronous
• Advantages of asynchronous design, in terms of:
– Clock skew
– Speed
– Metastability
– Modularity
– Power
• Disadvantages of asynchronous design:
– Hazard-free behavior and a correct ordering of operations are more difficult to design for.
– Additional hardware is needed to initiate, advance, and indicate the completion of operations.
• Asynchronous systems are specified by a handshake protocol, a data encoding, and an underlying delay model.
![Page 7: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/7.jpg)
• Handshaking is the alternative to clocking in asynchronous systems.
• Data transfer between two processes is synchronized with signals generated by the processes themselves.
• Asynchronous operation can also be done without handshaking.
– Handshaking is used to separate successive uses of a component.
– Separating the uses of a component may not be necessary, or the separation can be achieved by delaying the operations.
• Handshaking can be done at higher levels in an asynchronous system.
![Page 8: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/8.jpg)
• Bundled data:
– Normal Boolean levels encode data values
– Separate request and acknowledge wires are used
• Dual rail:
– Two wires are used to carry a single bit
– The request is encoded in the dual rail data wires
– Dual rail data encoding is used in the PCC cell design

| Dual rail encoding | Meaning |
|---|---|
| 00 | No data |
| 01 | 0 |
| 10 | 1 |
| 11 | Forbidden |
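
The dual rail code words listed above can be sketched in Python as follows. This is only an illustration: the function names and the two-element tuple layout are assumptions, not part of the PCC cell design.

```python
from typing import Optional, Tuple

Rails = Tuple[int, int]  # assumed layout: the two wires, written in the same order as the table

def encode(bit: Optional[int]) -> Rails:
    """Map a bit (or None for 'no data') to its dual rail code word."""
    if bit is None:
        return (0, 0)  # 00 = no data
    if bit == 0:
        return (0, 1)  # 01 = logic 0
    if bit == 1:
        return (1, 0)  # 10 = logic 1
    raise ValueError("bit must be 0, 1, or None")

def decode(rails: Rails) -> Optional[int]:
    """Recover the bit from a code word; None means no data has arrived yet."""
    if rails == (0, 0):
        return None
    if rails == (0, 1):
        return 0
    if rails == (1, 0):
        return 1
    raise ValueError("11 is a forbidden dual rail code word")

def has_data(rails: Rails) -> bool:
    """The request is implicit: data is present when exactly one rail is high."""
    return rails[0] != rails[1]
```

Because the request is carried by the data wires themselves, a receiver only needs `has_data` to know when a value has arrived.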
![Page 9: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/9.jpg)
• Pulse signaling:
– Each request and acknowledge is a pulse
– Simple, with a small cycle, like transition signaling
– Deals with levels, like level signaling
– Better noise immunity than single-track signaling
– Potential problem: robustness of sending pulses over long wires
– Pulse signaling is used in the PCC cell design, where long wires are not a problem.
[Figure: pulse signaling timing. A Request pulse starts an event and an Acknowledge pulse marks the event as done; together they form one cycle.]
![Page 10: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/10.jpg)
• Bounded delay
– Simplest model
– Delays of circuit elements and wires are assumed to be known or bounded.
• Delay insensitive (DI)
– Both gates and wires have unbounded and unknown delays.
– A completion detection mechanism is needed at the receiver.
• Quasi delay insensitive (QDI)
– DI + isochronic forks = QDI
– Isochronic forks are capable of indication.
– All input transitions should be indicated by an output signal transition.
[Figure: a fork connecting A, B, and C, with wire delays d1, d2, and d3.]
![Page 11: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/11.jpg)
• In an asynchronous system, the interfaces and the insides of modules can be designed with different timing models.
• In the PCC cell design, for timing management:
– The inside of a cell is governed by a bounded delay model.
– Communication between cells follows a QDI model.
![Page 12: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/12.jpg)
• In synchronous systems, Boolean circuits can be constructed from a primitive like a NAND-gate
• Logic gates provide only logic functionality, not timing functionality, so they are not sufficient to build asynchronous circuits
• Asynchronous systems can be made from a set of primitives
• The set of primitives must provide both universal logic and timing functionalities
• Different sets of primitives have been introduced, such as Keller's, Patra's, and Lee's
![Page 13: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/13.jpg)
The set of primitives used in a PCC cell:
• Wire
– Transfers the output of one component to the input of another.
• Fork
– The output of one component is the input to several components.
• Merge
– Sends one of its inputs to the output.
• Join
– Synchronizes data coming from several independent components.
[Figure: symbols of the four primitives, with inputs labeled I, I1, I2 and outputs labeled O, O1, O2.]
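
The four primitives can be sketched behaviorally in Python. This is a minimal illustration under stated assumptions: a pulse is modeled as a token that is either a bit or `None` (no pulse), and Merge is assumed to see at most one active input at a time. None of these names come from the PCC design.

```python
from typing import Optional, Tuple

Token = Optional[int]  # assumed model: 0 or 1 is a pulse carrying that bit, None is no pulse

def wire(i: Token) -> Token:
    """Wire: pass the output of one component to the input of another."""
    return i

def fork(i: Token) -> Tuple[Token, Token]:
    """Fork: deliver one output to several (here two) components."""
    return (i, i)

def merge(i1: Token, i2: Token) -> Token:
    """Merge: forward whichever input carries a pulse (inputs assumed mutually exclusive)."""
    if i1 is not None and i2 is not None:
        raise ValueError("merge assumes at most one active input")
    return i1 if i1 is not None else i2

def join(i1: Token, i2: Token) -> Optional[Tuple[int, int]]:
    """Join: produce an output only once both independent inputs have arrived."""
    if i1 is None or i2 is None:
        return None  # still waiting for the slower input
    return (i1, i2)
```

Logic alone would correspond to `wire`, `fork`, and `merge`; `join` is the piece that adds the timing (synchronization) behavior.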
![Page 14: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/14.jpg)
• An array of cells each having a simple one-bit processing unit
• Von Neumann neighborhood for local connections
• A routing problem occurs:
• A possible solution:
![Page 15: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/15.jpg)
• Another approach is to combine every two cells into a double cell
– The same routing capability with fewer neighboring connections
• A further step is to group 4 cells together into a quad cell
– The same routing capability with simple connections to the 4 nearest neighbors
![Page 16: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/16.jpg)
• Logic Unit Design
• Synchronization
• Pulse Regenerator
• Top Level Design
• Configuration Circuitry
• PCC Cell Simulations
– One-bit full adder
– Ring oscillator
– Shift register
• Implementing Pipelines
![Page 17: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/17.jpg)
• There is a logic unit (LU) and an output register in each quarter
• Each LU has two inputs and one output
![Page 18: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/18.jpg)
• Dual rail inputs
• Dual rail outputs
• Switches must be set before the inputs arrive
• 8 switches to define a function
• 16 functions
• Pull-down resistors avoid floating nodes
![Page 19: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/19.jpg)
• AND function
• D, E, F, G are "0001"

| A | B | Z |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
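
One way to read the AND example is as a 4-entry lookup table. The sketch below assumes that D, E, F, G are the outputs for AB = 00, 01, 10, 11 in that order; this ordering is an assumption that happens to match the AND setting "0001", and the function name is hypothetical.

```python
def logic_unit(config: str, a: int, b: int) -> int:
    """Evaluate a two-input logic unit configured by a 4-character string.

    Assumed convention: config holds D, E, F, G, the outputs for
    AB = 00, 01, 10, 11 respectively.
    """
    if len(config) != 4 or any(ch not in "01" for ch in config):
        raise ValueError("config must be four characters of 0 or 1")
    return int(config[2 * a + b])

AND = "0001"  # the setting given on the slide

# Reproduce the truth table row by row.
truth_table = [(a, b, logic_unit(AND, a, b)) for a in (0, 1) for b in (0, 1)]

# Four configuration bits give 2**4 = 16 possible two-input functions.
num_functions = 2 ** 4
```

Under the same assumed ordering, "0110" would behave as XOR and "0111" as OR.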
![Page 20: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/20.jpg)
• Wire: an output pulse triggers the LU inputs of the neighbor cell in the same direction.
• Merge: realized by 2:1 muxes; pulses make right (90 degree) turns.
• Fork: each turn triggers a neighbor quarter and also a neighbor cell, so a single computation forks into multiple parallel computations.
![Page 21: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/21.jpg)
Join
• A completion detection circuitry
• All the participating quarters should have their LU outputs ready
• Complements a fork by combining multiple parallel computations into a single computation.
• QDI Communications
![Page 22: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/22.jpg)
• Fork1
– Only when a pulse turns
– LU should use only the turned pulse
• Fork2 & Fork4
– No timing assumptions
• Fork3 & Fork5
– Bounded delay model
![Page 23: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/23.jpg)
• When a pulse travels through many cells, the width of the pulse may increase or decrease
• A pulse that is too short may not be detected at all; a pulse that is too long may run into other pulses
• A pulse regenerator (PRG) produces an output pulse with a constant width, independent of the width of the input pulse.
• D1 is the delay by which the input pulse is stretched
• D2 determines the width of the output pulse
[Figure: pulse regenerator circuit with delay elements D1 and D2 and nodes A, B, C, D, E.]
![Page 24: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/24.jpg)
![Page 25: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/25.jpg)
In an inverter:
• Equivalent resistance of a MOS transistor: $R \propto L/W$
• To match the PMOS and NMOS resistances: $(W/L)_p / (W/L)_n = 3 \sim 3.5$
• $t_{pHL} = 0.69\,R_n C_L$ and $t_{pLH} = 0.69\,R_p C_L$; if $R_n = R_p$, then $t_{pHL} = t_{pLH}$
• A bigger PMOS improves $t_{pLH}$ by increasing the charging current.
• A bigger PMOS degrades $t_{pHL}$ by causing a larger parasitic capacitance.
• With matched resistances, $t_p = (t_{pHL} + t_{pLH})/2$ is not minimal.
• The ratio for optimal speed equals $\sqrt{R_p/R_n}$.
• The device can be sped up by reducing the size of the PMOS.
In a PCC cell: $(W/L)_p / (W/L)_n \approx 1.6$
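
As a rough worked check, assume that the $R_p/R_n$ under the square root is the same 3 ~ 3.5 resistance ratio quoted for matching the two transistors (this reading is an interpretation, not something the slide states). The speed-optimal sizing ratio would then fall in the range

$$\sqrt{3} \approx 1.73 \;\le\; \frac{(W/L)_p}{(W/L)_n} \;\le\; \sqrt{3.5} \approx 1.87,$$

which is well below the matched-resistance ratio and close to the 1.6 used in the PCC cell.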
![Page 26: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/26.jpg)
• Configuration bits must be loaded: 16 bits for the LU switches, 8 bits for the Merge muxes, and 4 bits for Join, for a total of 28 bits
• Only some parts of the array may need to be configured
• One solution is to make a long chain of shift registers out of all the cells and configure all of them
• A better solution is to form the chain of shift registers only from the cells that need to be configured.
• In each cell, a controller:
– decides whether the cell should be configured
– directs the bit flow to one of the cell's neighbors
– stops the shift registers once all the intended cells are configured
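
The 28-bit configuration word can be sketched as a small Python model. This is only an illustration: the class name, the order of the three fields, and the simplification that bits are stored in arrival order (rather than physically shifted along a register) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LU_BITS, MERGE_BITS, JOIN_BITS = 16, 8, 4        # sizes given on the slide
TOTAL_BITS = LU_BITS + MERGE_BITS + JOIN_BITS    # 28

@dataclass
class CellConfig:
    """Hypothetical per-cell configuration register."""
    bits: List[int] = field(default_factory=list)

    @property
    def filled(self) -> bool:
        # Corresponds to the controller knowing the shift register is full.
        return len(self.bits) == TOTAL_BITS

    def shift_in(self, bit: int) -> None:
        """Accept one configuration bit; refuse once the register is full."""
        if self.filled:
            raise RuntimeError("shift register is already filled")
        self.bits.append(bit)

    def fields(self) -> Tuple[List[int], List[int], List[int]]:
        """Split the word into (LU switches, Merge muxes, Join); field order is assumed."""
        if not self.filled:
            raise RuntimeError("configuration is incomplete")
        lu = self.bits[:LU_BITS]
        merge = self.bits[LU_BITS:LU_BITS + MERGE_BITS]
        join = self.bits[LU_BITS + MERGE_BITS:]
        return lu, merge, join

# Example: load an arbitrary 28-bit pattern into one cell.
cell = CellConfig()
for k in range(TOTAL_BITS):
    cell.shift_in(k % 2)
lu_bits, merge_bits, join_bits = cell.fields()
```

In a chain built only from the cells that need configuring, each such register would hand further bits on to the next selected neighbor once `filled` becomes true.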
![Page 27: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/27.jpg)
[Figure: configuration controller of a cell. Blocks: Controller, Decoder, OR. Signals: data-N, data-S, data-E, data-W and clk-N, clk-S, clk-E, clk-W. Decoder codes: 00, 01, 10, 11.]
Control bits labeled in the figure:
– Shows that the shift register is filled
– Shows that the cell is the last one in the chain of shift registers
– Determines whether the cell should be configured
– Defines the neighbor to which the bits should be forwarded
![Page 28: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/28.jpg)
![Page 29: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/29.jpg)
• The PCC cell was implemented in TSMC 250 nm CMOS using S-Edit.
• The simulation was done in PSpice.
• The supply voltage is 5 V.
• Input pulse widths are 400 ps.
• Propagation delay through a cell is 480 ps ~ 500 ps.
• Better speed: slope ≤ gate propagation delay
• The slope of the external inputs is 12 ps.
• No overshoots or undershoots
![Page 30: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/30.jpg)
For 20 pulses:
Voltage source = 5 V
Average current = 6 mA for 1.4 ns and 17 mA for 8.6 ns
Energy = (5 V × 6 mA × 1.4 ns) + (5 V × 17 mA × 8.6 ns) = 773 pJ
![Page 31: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/31.jpg)
For 1 pulse (1 bit of operation):
Voltage source = 5 V, average current = 5 mA: Energy = 5 V × 5 mA × 1.5 ns = 37.5 pJ
Voltage source = 3.3 V, average current = 3 mA: Energy = 3.3 V × 3 mA × 1.8 ns = 17.8 pJ
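
The energy figures for 20 pulses and for a single pulse all follow from energy = voltage × average current × duration. A short Python check, with a hypothetical helper name, reproduces them:

```python
def energy_pj(voltage_v: float, current_ma: float, duration_ns: float) -> float:
    """Energy in picojoules: volts x milliamps x nanoseconds gives pJ directly."""
    return voltage_v * current_ma * duration_ns

# 20 pulses at 5 V: two intervals with different average currents.
twenty_pulses = energy_pj(5, 6, 1.4) + energy_pj(5, 17, 8.6)

# 1 pulse (1 bit of operation) at each supply voltage.
one_pulse_5v = energy_pj(5, 5, 1.5)
one_pulse_3v3 = energy_pj(3.3, 3, 1.8)

print(round(twenty_pulses, 1), round(one_pulse_5v, 1), round(one_pulse_3v3, 1))
```

The unit bookkeeping is the only subtle part: 1 V × 1 mA × 1 ns = 1 pJ, so no scale factors are needed.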
![Page 32: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/32.jpg)
• Sum = A ⊕ B ⊕ C; for A = B = C = 1: 1 ⊕ 1 ⊕ 1 = 1
• Carry = AB + BC + AC = AB + (A+B)C; for A = B = C = 1: 1·1 + (1+1)·1 = 1
• The sum and carry outputs are ready after 0.5 ns and 1.8 ns, respectively
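
The one-bit full adder equations can be checked exhaustively in Python. This is a plain Boolean sketch of the logic, not a model of the PCC cell array that implements it; the function name is illustrative.

```python
from itertools import product

def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                           # Sum = A xor B xor C
    carry = (a & b) | (b & c) | (a & c)     # Carry = AB + BC + AC
    return s, carry

for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    # The factored form AB + (A+B)C gives the same carry.
    assert carry == (a & b) | ((a | b) & c)
    # Together, carry and sum encode the arithmetic total a + b + c.
    assert 2 * carry + s == a + b + c
```

For the simulated case A = B = C = 1, `full_adder(1, 1, 1)` gives a sum of 1 and a carry of 1.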
![Page 33: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/33.jpg)
• Loops are important for many circuits, such as sequential circuits, iterative computations, and For, If, and While constructs
• The ring oscillator demonstrates two capabilities of the PCC cell:
– A loop can be controlled externally (started and stopped)
– By using Join of pulses, communications can be QDI
[Figure: ring oscillator loop started by a '0' start pulse; the output is always a '1'.]
![Page 34: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/34.jpg)
• Ring oscillator implemented in an array of PCC cells
[Figure: array of PCC cells configured with the functions One, Nand, XOR, WR, and Pass.]
• '0' pulses are shown in blue; '1' pulses are shown in red
• The input mux is configured to receive a '0' pulse only from outside the 1st cell and a '1' pulse only from a turn.
![Page 35: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/35.jpg)
Simulation results:
![Page 36: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/36.jpg)
An input bit stream of "1010" is used.

| Cell 1 | Cell 2 | Cell 3 | Cell 4 |
|---|---|---|---|
| D1 | x | x | x |
| D2 | D1 | x | x |
| D3 | D2 | D1 | x |
| D4 | D3 | D2 | D1 |
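
The way data moves through the four cells can be reproduced with a short Python sketch. It assumes the stream "1010" is applied left to right, so that D1 = 1, D2 = 0, D3 = 1, D4 = 0; `None` stands for the "x" (no data yet) entries, and the function name is hypothetical.

```python
from typing import List, Optional

def shift_register_trace(stream: List[int], n_cells: int = 4) -> List[List[Optional[int]]]:
    """Return the contents of the cells after each input bit is shifted in."""
    cells: List[Optional[int]] = [None] * n_cells   # None plays the role of 'x'
    trace = []
    for bit in stream:
        # The new bit enters cell 1; every older bit moves one cell along.
        cells = [bit] + cells[:-1]
        trace.append(list(cells))
    return trace

trace = shift_register_trace([1, 0, 1, 0])
for row in trace:
    print(row)
```

Each printed row corresponds to one row of the table, with the newest bit always in cell 1.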
![Page 37: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/37.jpg)
• If handshaking is done for every component, the components can form a pipeline.
• Each component should supply an Ack to indicate that it is available for reuse.
Delay(1) = 3X + (n-2)·5X + 3X = (5n - 4)X
[Figure: pipeline of LU stages with Ack signals between them; labels: "Ack", "Ack is received".]
![Page 38: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/38.jpg)
• Some cells do not handshake and are cascaded. The cascaded cells form one unit of a pipeline, so handshaking is done only at a higher level.
Delay(2) = 3X + (n-2)·2X + 3X = (2n + 2)X
Delay(2)/Delay(1) = (2n + 2)X / (5n - 4)X ≈ 2/5 for large n
[Figure: cascaded LU stages grouped into units of the pipeline, with Ack signals exchanged only between units; labels: "Ack", "Ack is received", "A unit of the pipeline".]
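
The two delay formulas can be compared numerically. The sketch below keeps the slides' symbols (n and the time unit X) and simply evaluates both expressions in units of X; the function names are illustrative and n ≥ 2 is assumed.

```python
from fractions import Fraction

def delay_per_component_handshake(n: int) -> int:
    """Delay(1) in units of X: 3X + (n-2)*5X + 3X = (5n - 4)X."""
    return 3 + (n - 2) * 5 + 3

def delay_cascaded(n: int) -> int:
    """Delay(2) in units of X: 3X + (n-2)*2X + 3X = (2n + 2)X."""
    return 3 + (n - 2) * 2 + 3

def delay_ratio(n: int) -> Fraction:
    """Delay(2) / Delay(1); X cancels out."""
    return Fraction(delay_cascaded(n), delay_per_component_handshake(n))

for n in (2, 10, 100, 1000):
    print(n, float(delay_ratio(n)))
```

The ratio equals 1 at n = 2 and approaches 2/5 only as n grows, so the quoted 2/5 is best read as the large-n limit.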
![Page 39: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/39.jpg)
| Parameter | PCC Cell |
|---|---|
| Technology | TSMC 250 nm |
| Voltage source | 5 V (3.3 V) |
| Transistor count | 760 |
| Propagation delay | 500 ps (600 ps) |
| Minimum input pulse width | 400 ps |
| Energy consumption for 1-bit operation | 37.5 pJ (17.8 pJ) |
| Routing capability | Data can be routed in 4 directions |
| QDI communications | Yes, by performing Join |
| Implementing comb/seq circuits | Yes |
| Controlling a loop externally | Yes |
| Implementing pipelines | Yes |
![Page 40: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/40.jpg)
![Page 41: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/41.jpg)
![Page 42: Design of an Asynchronous Reconfigurable Cell for Conformal Computing](https://reader031.vdocument.in/reader031/viewer/2022020417/5681437b550346895daffa8c/html5/thumbnails/42.jpg)